The law of darkness works the same way for the hidden, the forgotten and the lost. But on the internet, darkness also falls on anything that has no search tags. Millions of images from the history of the world were in danger of disappearing forever. They were filed away in digitized books but had no tags, so there was no way to retrieve them. That is the same as being forgotten.
Last December, big data expert Kalev Leetaru began recovering millions of photographs and drawings from the more than 600 million book pages scanned by the Internet Archive. Today more than 2.6 million of those images are available free of charge and without copyright restrictions on a new Flickr page called Internet Archive Book Images.
“The purpose of this project is to re-imagine the book. I wanted to find images based on a set of criteria and find imagery of objects over time, not just today,” explains the communication technology expert in an email interview.
Until now, words had prevailed over images. The organization tagged only the text of its digitized books, so there was no way to reach these photographs and drawings, dated from 1500 to 1922, through an online search. The Yahoo! researcher at Georgetown University (Washington, USA) saw that, in digitizing their archives, libraries had turned books into PDF files (a format that makes it hard to extract images) and that all search criteria referred only to text. Leetaru thought those images held much of the information of the past five centuries, information that would never be seen in museums and galleries, and so they had to be recovered. These images have escaped the darkness. They have escaped even the past. And now they are on the starting line, at the beginning of what Kalev Leetaru calls “a time travel through images.”
“For example, when you view images of phones from different periods, you realize that the phone has gone from being a device used by men in the office to an essential household appliance. I realized that there were many digitized books about the phone, but there was no way to see a collage of all the images in those works. My intention was to search for images rather than words. That is how the project was born.”
And so he carried it out. “The Internet Archive had already digitized the books using OCR. This process recognizes the text on scanned pages so that it can be searched by keyword. The OCR software identifies where all the images on the pages are, but it simply ignores them and moves on to the text. What I did was create a tool that re-reads the OCR results, locates the images, extracts them, tags them automatically and saves them as separate files.”
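The approach Leetaru describes can be illustrated with a minimal sketch. It is not his tool. It assumes a simplified, hypothetical OCR output in XML, where each page lists its blocks with a type and pixel coordinates, and it assumes that the text blocks directly before and after a picture are a reasonable source of tags. The element names, attribute names and the `find_images` function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified OCR output: one page with two text blocks
# and one picture block. Real OCR formats differ in names and detail.
SAMPLE_OCR = """<document>
  <page number="1">
    <block type="text" l="100" t="80" r="900" b="400">The telephone exchange of 1890.</block>
    <block type="picture" l="120" t="420" r="880" b="1100"/>
    <block type="text" l="100" t="1120" r="900" b="1300">Fig. 3. An operator at the switchboard.</block>
  </page>
</document>"""


def find_images(ocr_xml):
    """Return one record per picture block: page, crop box, nearby text."""
    root = ET.fromstring(ocr_xml)
    records = []
    for page in root.iter("page"):
        blocks = list(page.iter("block"))
        for i, block in enumerate(blocks):
            if block.get("type") != "picture":
                continue  # text blocks are only used as context
            # Crop box as (left, top, right, bottom) in page pixels.
            box = tuple(int(block.get(k)) for k in ("l", "t", "r", "b"))
            # Assumption: the blocks just before and after the picture
            # hold its caption or the paragraph that discusses it.
            neighbors = blocks[max(0, i - 1):i] + blocks[i + 1:i + 2]
            tags = " ".join(
                (b.text or "").strip()
                for b in neighbors
                if b.get("type") == "text"
            )
            records.append(
                {"page": int(page.get("number")), "box": box, "tags": tags}
            )
    return records


for record in find_images(SAMPLE_OCR):
    print(record)
```

Each record carries what a later step would need: the crop box says which region of the page scan to cut out and save as a separate file, and the nearby text gives the search terms to attach to it.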