Handwriting recognition and Optical Character Recognition (OCR) have been around for decades, yet they have remained a less than effective way to transcribe documents into accurate data. Although OCR traces back to the late 1800s, it first saw practical application in the late 1950s. With RCA Laboratories’ development of a photocell-based machine that could read text, standardization on the OCR-A font, and the work of Intelligent Machines Research (IMR), led by visionary David Shepard, the military, the post office and Reader’s Digest rolled out solutions. The ability to type a document, have it scanned and pushed over analog modems, and have the receiver print it out minutes later was one of the 1960s’ biggest uses of technology; it sounds like today’s email.
OCR was a great thing, but then Adobe® released PostScript in 1984, and Apple® and Microsoft® followed with TrueType in 1991 and 1992. With so many new fonts and sizes, OCR technology was crippled and needed serious rework. OCR was further damaged by the fax machine, which by then had evolved to transfer binary image data across modems; whether handwriting or text, one could quickly send a message to someone and be done. As a result, technology innovators put OCR development on the back burner.
Today the need for paper-based information remains a reality, and paper won’t be replaced for a long time to come; therefore, we need to find a way to make it more useful and less costly to convert to data. The hope now is that Machine Learning (ML) will close the gap, making handwriting and text transcription as accurate and efficient as people are.
Machine Learning Pattern Recognition
ML has been around for a while; in 1957 Frank Rosenblatt designed the perceptron, the first neural network in which computers learned autonomously from data and information relationships, a starting point for Artificial Intelligence (AI). The approach today is to take small pieces of image input called “words” or “patterns” (e.g. an image of the first name John), have a human expert identify each image, and categorize it by the expert’s answer; this is referred to as “Machine Learning with Supervised Training”. Without boring you with the mathematics or algorithm details such as the Sequential Multitask Problem, CART, Bagging and others, inbound images are simply compared to stored images, a prediction is made and the result is returned. Researchers using the Perceptron Learning Algorithm (PLA), the recommended set of algorithms for word image recognition, still see a 15% false-positive error rate. The error rate is directly attributed to what is called the “Stability Factor”, which requires bounding the bias and variance of estimators of the expected error. I like to explain it this way: because handwriting samples vary so much, a base of samples must be collected and compared in order to forecast or predict the result. The challenge then becomes how many samples to collect before you can rely on the results; in other words, the stability assumptions.
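To make the supervised training idea concrete, here is a minimal sketch of the perceptron learning rule in Python. It is not the setup used by the researchers mentioned above; the two numeric features per image (for example ink density and stroke count), the toy samples and the function names are all assumptions chosen for illustration. The expert's answers are the labels, and the weights are nudged every time the prediction disagrees with the expert.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Perceptron learning rule; labels are +1 or -1 as supplied by a human expert."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if activation >= 0 else -1
            if predicted != label:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:
            break  # every training sample is now classified correctly
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 for a new, unlabeled feature vector."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation >= 0 else -1


# Hypothetical, hand-made feature vectors for two classes of word images.
samples = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, -1, -1]  # the "expert's answer" for each sample

weights, bias = train_perceptron(samples, labels)
```

With only four clean samples the rule converges quickly. Real handwriting is far less separable, which is where the sample-count and stability questions come in.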
People begin the process by expert-labeling small images for the ML to collect and reference. Depending on the end solution, this process could become never-ending. University College London (UCL) researchers required 85,000 simulated spectra sample images to achieve 99.7% accuracy for deep-space planet atmosphere classification. In that case, they were able to create the sample images programmatically and train the ML solution, called RobERt.
The real problem now becomes how large an image sample pool we need. People are unique; they have first names, last names, birth dates, social security numbers (SSN), driver license numbers, etc. Some of these have commonality and some are unique. The point is that a first name can have many sample images, a last name would have fewer, and an SSN has a sample size of one; with so few samples the result is not stable, hence 85% effective. Critical to meeting a standard of 99.9% accuracy is the validation process. ML can electronically validate against third-party resources to determine the accuracy.
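The sample-pool problem can be illustrated with a small Python sketch that counts how many labeled samples exist per distinct value in a field. The field values and the `singleton_share` helper are hypothetical, invented only for this illustration; the idea is simply that a field where nearly every value appears once gives the ML nothing stable to compare against.

```python
from collections import Counter


def singleton_share(labels):
    """Fraction of distinct labels that have exactly one sample."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


# Hypothetical labeled fields taken from a batch of forms.
first_names = ["John", "Mary", "John", "Ann", "Mary", "John"]
ssns = ["000-00-0001", "000-00-0002", "000-00-0003"]  # made-up values

first_name_share = singleton_share(first_names)  # some names repeat
ssn_share = singleton_share(ssns)                # every value is unique
```

In this toy batch only one of the three first names is a one-off, while every SSN is, so the SSN field offers no repeated samples at all.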
The question becomes how to correct the errors that are found. People must fix the errors, which negates the ideal of a black box or appliance-based solution.
Neural Networking OCR
The next area of active innovation is combining Neural Networking (NN) and optical character recognition. This approach does not recognize the word image but deconstructs it to the character level and recognizes each character. It promotes the building of large samples (millions of imaged characters), creating stability and lending itself to the most current ML algorithms. This sounds like the best approach, but even in its most controlled setting, with a known font and known size, 97.0% was its best result.
NN-OCR still produces errors because of character classification (e.g. font and size) and segmentation problems (e.g. how clean the image is). The two most common error-correcting methods are dictionary lookup and the statistical approach. The simplest is the dictionary method, but in many cases corrections cannot be applied if the source represents a contractual relationship (e.g. insurance application, new account form, mortgage closing).
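As a rough illustration of the dictionary lookup method, the Python sketch below replaces an OCR token with its closest dictionary word when the match is strong enough and otherwise leaves the token alone. It uses the standard library's `difflib.get_close_matches`; the tiny vocabulary, the 0.8 cutoff and the `correct_token` name are assumptions for the example, not a description of any particular OCR product.

```python
import difflib


def correct_token(token, vocabulary, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# Hypothetical dictionary of expected words on a form.
vocabulary = ["insurance", "mortgage", "account", "application"]

fixed = correct_token("insuranee", vocabulary)   # one misread character
untouched = correct_token("Xq7z", vocabulary)    # nothing close, left as-is
```

Note that a token with no close match, such as a surname or an account number, passes through uncorrected. That is exactly the kind of unique data where the dictionary cannot help and a person still has to decide.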
The question again becomes how to correct the errors that are found. People must fix the errors, which again negates the ideal of a black box or appliance-based solution.
By comparison, people performing the transcription of small amounts of image data do better than ML and NN-OCR. A study conducted by Coleman Data Solutions in 2013 concluded that there was no significant difference in accuracy between a fast data entry person (professional) and an average one (freelance).
- Average speed: 3 errors/40 wpm = 8% errors (92% accuracy rate)
- Fast speed: 4 errors/60 wpm = 7% errors (93% accuracy rate)
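The two figures above can be checked with a few lines of Python. This sketch assumes the rates mean errors per minute divided by words typed per minute, which is an interpretation rather than something the study states; the function name is likewise just for illustration.

```python
def error_percent(errors, words):
    """Errors as a percentage of words keyed."""
    return 100 * errors / words


average_errors = error_percent(3, 40)  # 7.5, which the list rounds to 8%
fast_errors = error_percent(4, 60)     # about 6.67, which the list rounds to 7%

average_accuracy = 100 - average_errors
fast_accuracy = 100 - fast_errors
```

Under that reading the unrounded accuracies are 92.5% and roughly 93.3%, less than one point apart, which is consistent with the finding of no significant difference.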
The point here is that a human can transcribe at 92% accuracy, whereas ML and NN-OCR reach ~85%. Validation remains a critical phase, and it is still best done by people. Before imaging technology, data was keyed from paper using the double or blind entry method. Today, the image and the transcribed results are presented to people who accept or reject the results. Any errors are reprocessed and revalidated until both parties agree, delivering 99.9%.
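The double or blind entry idea can be sketched in Python as a field-by-field comparison of two independent keyings: fields where both people agree are accepted, and the rest are sent back for reprocessing. The field names, the made-up values and the `double_entry_check` function are assumptions for illustration only and do not describe any specific product workflow.

```python
def double_entry_check(first_pass, second_pass):
    """Compare two independent keyings of the same form, field by field."""
    accepted = {}
    disputed = []
    for field in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(field)
        b = second_pass.get(field)
        if a is not None and a == b:
            accepted[field] = a      # both keyers agree
        else:
            disputed.append(field)   # mismatch or missing: re-key and re-validate
    return accepted, disputed


# Hypothetical keyings of one form by two different people.
keyer_one = {"first_name": "John", "ssn": "000-00-0001"}
keyer_two = {"first_name": "John", "ssn": "000-00-0010"}

accepted, disputed = double_entry_check(keyer_one, keyer_two)
```

Here the first name is accepted and the SSN is flagged, so only the disputed field goes around the loop again until both parties agree.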
When ML is given an unstable environment, such as people with unique identifying information, it is not the best solution. ML cannot consume unique samples and return accurate results. Variances in unknown word or character attributes can return false-positive results, corrupting the data.
The table below compares the above findings by their respective accuracy results:
Supervised Machine Learning
|Neural Network OCR||97%||85%||70-80%||0%|
PaperClip has been working with Artificial Intelligence for the past decade, including a service we created 8 years ago called Image-In. This service performs Doctyping of forms from inbound unassociated pages or blobs. We experienced the 15% error rate and worked hard over the years to create unique algorithms that achieve a 5% error rate. This AI has been brought forward into Mojo’s Forms Classification and augmented with other techniques to achieve a 0% error rate today.
PaperClip uses ML and NN-OCR today, but in a much different application. We have focused on the “on-boarding” or setup of forms, which can be tedious and error-prone. Another area where we use ML is the mapping of form SnipIts to industry-standard messaging deliverables. We also use technology to determine image data results with various verification options. Diving much deeper into PaperClip’s technology would compromise our intellectual property, which is patent pending.
Mojo is designed as a “Platform as a Service” for the transcription, translation and interpretation of big data into usable data. I believe economics plays into the value proposition: an inexpensive global labor pool providing 99.9% accuracy, compared to a highly paid professional (e.g. accounting clerk, case manager, underwriter assistant) correcting ML and NN-OCR errors and results. On average, a highly paid individual costs 20 times as much as the global market. More important, though, if that highly paid individual is correcting flawed results, then they’re not doing their primary job. If costs and turnaround times are equal, then accuracy remains paramount. Would you sit in the back of a driverless car with those results?
Machine Learning and Neural Networking OCR represent the latest improvements to old technology, and even though they deliver some improvements, people still win the day. Humans are still critical to these technologies working at all: Supervised Machine Learning needs humans to train it, and NN-OCR requires humans to correct its results. See a trend here? When you think about global application, Mojo just needs the appropriate crowd source, whereas the technology solutions will need months if not years and millions of dollars to begin processing. PaperClip pursued the technology approach, and our customers wanted more. Humans remain the best solution when the challenge is transcribing, translating and interpreting big data.
When ML and NN-OCR are applied to random textual third-party images for transcription, the task becomes a workflow challenge: who does the training and corrections?
“Recognizing the characters that make up neatly laser-printed computer text is relatively easy compared to decoding someone's scribbled handwriting. That's the kind of simple-but-tricky, everyday problem where human brains beat clever computers hands-down: we can all make a rough stab at guessing the message hidden in even the worst human writing. How? We use a combination of automatic pattern recognition, feature extraction, and—absolutely crucially—knowledge about the writer and the meaning of what's being written”, concluded Chris Woodford, a British science writer, January 9, 2017. Or we bring back the OCR-A font...