Millions of books, records and documents have been published online through optical character recognition (OCR), a process in which a page is scanned and a computer interprets the image and converts it into digital text. Computers are awesome at this, but the older the book, the more faded the text and the more unusual the typeface or handwriting, the harder it is for the computer to recognise the text.
So if a computer can’t interpret the text, how do you get people to help digitize the book for free? Well, you’ve done it, possibly hundreds of times, without even knowing it!
Did you know that when you complete one of these captchas, you are not only validating that you are a human being rather than a computer, you are also interpreting two words that a computer couldn’t read?
Because these are completed millions of times a day, millions of books can be digitized by human beings every year, quite literally two words at a time.
The brilliance of this idea is that its creators had already solved one problem: how human beings can authenticate themselves online in a way that a computer can’t yet do. But they didn’t stop there. They pushed the boundaries of what their idea could accomplish by resetting the problem. Having solved the first problem, they defined a new one: “How can we put to use the millions of people who are interpreting millions of words every day?”
The video from TED is so worth a watch. Click here or on the image below for the full video.