New research tool: zanran

Felix Salmon mentioned a new data research tool in a recent post and asked for feedback on its usefulness. I seriously doubt he reads this three-month-old blog, but I thought I would test-drive it anyway, since I needed to look up some data on labor and food markets.

The tool is called zanran, and it takes advantage of something I only recently figured out myself, which is that the best way to search for data is often through the Images search in Google. The zanran folks call this ‘semi-structured’ data, and describe their technology approach as follows:

Zanran doesn’t work by spotting wording in the text and looking for images – it’s the other way round. The system examines millions of images and decides for each one whether it’s a graph, chart or table – whether it has numerical content.

The core technology is patented computer vision algorithms that decide whether an image is numerical – and they’re accurate (about 98%). But the huge majority of images on the internet are not graphs etc. So even though the accuracy is high, you will still get some non-numerical images.

In comparison, looking for tables is relatively simple. Once we’ve found a table we then have to decide whether it’s essentially numerical – and we have algorithms for that.

Our programmes then take suitable text near that image and build the search engine around that text. At present, we extract tables and images from HTML, PDF and Excel files and will be processing PowerPoint and Word documents in the near future.
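To put rough numbers on the base-rate point in that description, here is a minimal sketch. The 1-in-100 share of numerical images is purely my own illustrative assumption, not a zanran figure, and I am reading "about 98%" as applying both to catching real charts and to rejecting non-charts, which the quote does not actually specify.

```python
def share_of_true_hits(base_rate, sensitivity, specificity):
    """Fraction of images flagged as numerical that really are numerical.

    base_rate:   share of all images that are charts/graphs (assumed, hypothetical)
    sensitivity: chance a real chart is flagged as numerical
    specificity: chance a non-chart is correctly rejected
    """
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 1% of images are numerical, classifier is 98% accurate both ways.
precision = share_of_true_hits(base_rate=0.01, sensitivity=0.98, specificity=0.98)
print(f"Share of flagged images that are really numerical: {precision:.0%}")
```

Under those assumed inputs only about a third of the flagged images would be genuine charts, which is consistent with the warning that some non-numerical images still get through despite the high accuracy.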

The two founders sound like interesting people, and it’s worth checking out their About page just to learn more about how this came together. And they adopted a shelter dog, always a sign of superior judgment in my somewhat biased view.

As for usefulness, I think it’s generally effective at what it sets out to do, though getting to actual data for your own analysis often requires an extra step or two, as it would with any traditional search tool. For example, searching for “farmland values” took me to a page with results from a wide range of governmental and think-tank research reports. Hovering the mouse over any of the PDF images to the left of the screen pops up a very helpful view of the page in the document that includes the data (typically in a chart or graph). Equally helpful, the search results are indexed by source, and the page gives you a link to all of the other search results from that one source.

The latter features are the differentiators vs. just going to Google Images. I will continue to use this in my research this summer, and am adding it (along with the invaluable Scribd) in a new category of permanent links to the right.

2 Responses to New research tool: zanran

  1. Jon Goldhill says:

    Nice post – thanks.
    If you have ideas for improvements or new features – do get in touch

    regards

    Jon

  2. Anchard says:

    Thanks Jon, and I absolutely will. I didn’t realize until after I posted that you were still in beta – I look forward to seeing what you do next, as I think it’s a very useful tool.
