Binarization

Digital documents offer many advantages over paper ones: they can be preserved indefinitely without deterioration, easily copied and transmitted, and searched or otherwise processed automatically. However, much of our knowledge is still stored in paper documents.

Digitizing and processing documents can be laborious. This is why techniques that automate and optimize digitization are very useful, in particular when applied to a processing pipeline that goes from a digital image to a final document after optical character recognition (OCR) has been performed.

I have worked with Marte Ramírez-Ortegón on a method to assess the efficacy of various unsupervised measures of image binarization based on their OCR performance. We also proposed a novel measure that outperformed existing ones.

This work has been published in the journal Pattern Recognition. You can view the abstract here.

About Me

Edgar A. Duéñez Guzmán is a Senior Research Engineer at DeepMind. Previously he was at Google, where he developed the first machine learning system to select the index for Image Search. During his academic career, he was a Postdoctoral Fellow at the Department of Biology at KU Leuven, working with Tom Wenseleers on social evolution in microbes, and a Research Associate at the Department of Organismic and Evolutionary Biology at Harvard University, working with David Haig on social evolution and imprinting.
Learn more...

Contact Info

E-mail: eaduenez {at} gmail {dot} com