Optical Character Recognition

Many people wonder whether there is a way to pull the text out of an image or picture so that they do not have to retype it into Word or Excel character by character. There is: a technology called OCR. OCR stands for Optical Character Recognition and, as the name indicates, it recognizes text characters using optical technology. Once text has been extracted with OCR, you can edit it, search it for a phrase, display a copy free of scanning artifacts, store it more compactly, or apply techniques such as text-to-speech, machine translation and text mining.


From a technical point of view, OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize office record-keeping systems, and for similar tasks. OCR systems require calibration to read a specific font. Some systems can reproduce formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.


The principles of OCR technology - Pattern-matching OCR systems recognize only machine print. Using pattern matching, OCR translates the shapes and patterns of machine-made characters into the corresponding computer codes. Although most advanced systems can recognize multiple fonts, they can process only standard fonts such as Arial and Times New Roman. Once all characters in a given word have been recognized, the word is compared against a vocabulary of candidate words to produce the final displayed result.
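
The two steps described above, matching each character shape against known patterns and then checking the whole word against a vocabulary, can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the 3x5 bitmap "font" in TEMPLATES, the helper names hamming, match_char and recognize_word, and the use of the standard library's difflib for the vocabulary lookup. Real OCR engines are far more elaborate, so this is not how any particular product works.

```python
from difflib import get_close_matches

# Hypothetical 3x5 bitmap templates for a toy font ('#' = ink, '.' = background).
TEMPLATES = {
    "C": ("###", "#..", "#..", "#..", "###"),
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def match_char(glyph):
    """Pattern matching: return the label of the template closest to the glyph."""
    return min(TEMPLATES, key=lambda label: hamming(glyph, TEMPLATES[label]))


def recognize_word(glyphs, vocabulary, cutoff=0.6):
    """Recognize each glyph, then compare the word against the vocabulary."""
    raw = "".join(match_char(g) for g in glyphs)
    best = get_close_matches(raw, vocabulary, n=1, cutoff=cutoff)
    # Fall back to the raw character string if no vocabulary word is close enough.
    return best[0] if best else raw


# A scanned "A" with one missing pixel is still matched to the "A" template.
noisy_a = ("...", "#.#", "###", "#.#", "#.#")
print(match_char(noisy_a))  # prints A

# The glyphs spell "COT", which is not in the vocabulary; the closest word wins.
glyphs = [TEMPLATES["C"], TEMPLATES["O"], TEMPLATES["T"]]
print(recognize_word(glyphs, ["CAT", "CAR"]))  # prints CAT
```

The sketch also shows why such systems are tied to the fonts they know: a glyph from an unfamiliar font would simply be assigned to whichever stored template happens to differ in the fewest pixels.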


