PhotoOCR: Reading Text in Uncontrolled Conditions
Mark Cummins (Google)
COMPUTER VISION AND ROBOTICS SERIES
DATE: 2013-10-10
TIME: 16:00:00 - 17:00:00
LOCATION: RSISE Seminar Room, ground floor, building 115, cnr. North and Daley Roads, ANU
ABSTRACT:
The talk will describe PhotoOCR, a system for extracting text from images under general conditions. Our system builds on recent progress in deep learning for character classification, and applies datacenter-scale distributed language modeling to the problem for the first time. We set new records on public benchmark datasets by a large margin while maintaining sub-second latency. Our motivating use case is enabling OCR as a reliable input modality for smartphones, particularly for applications such as translation, where the text may be difficult for a user to input by other means. We must therefore handle a wide variety of challenging imaging conditions in which traditional OCR systems fail, notably substantial blur, low resolution, low contrast, background clutter and other issues. I will also discuss the importance of training-data scale, exploring performance with up to millions of labelled images and trillions of tokens of language model training data.
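The combination of a character classifier with a language model, as the abstract describes, is commonly realised as a beam-search decoder that sums classifier and language-model log-probabilities per hypothesis. The sketch below is purely illustrative and is not the PhotoOCR implementation: the toy alphabet, the per-position classifier distributions and the bigram model are all invented for demonstration.

```python
import math

# Illustrative only: per-position classifier log-probabilities over a toy
# alphabet. A real system would produce these from a learned character
# classifier applied to segmented image regions.
CLASSIFIER_LOGPROBS = [
    {"c": math.log(0.6), "o": math.log(0.4)},
    {"a": math.log(0.5), "o": math.log(0.5)},
    {"t": math.log(0.7), "l": math.log(0.3)},
]

# Toy bigram language model; unseen bigrams get a small floor probability.
BIGRAM_LOGPROBS = {("c", "a"): math.log(0.8), ("a", "t"): math.log(0.9)}
FLOOR = math.log(0.05)


def lm_score(prev: str, cur: str) -> float:
    """Log-probability of the bigram (prev, cur) under the toy model."""
    return BIGRAM_LOGPROBS.get((prev, cur), FLOOR)


def beam_decode(logprobs, beam_width: int = 2, lm_weight: float = 1.0) -> str:
    """Return the highest-scoring string under a weighted sum of
    classifier and language-model log-probabilities, via beam search."""
    beams = [("", 0.0)]  # (hypothesis, cumulative log-score)
    for dist in logprobs:
        candidates = []
        for hyp, score in beams:
            for ch, lp in dist.items():
                s = score + lp
                if hyp:  # add LM score once there is a previous character
                    s += lm_weight * lm_score(hyp[-1], ch)
                candidates.append((hyp + ch, s))
        # Keep only the top beam_width hypotheses.
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
    return beams[0][0]


print(beam_decode(CLASSIFIER_LOGPROBS))  # → cat
```

The language model rescues ambiguous characters: at position two the classifier is split evenly between "a" and "o", but the bigram score for "ca" outweighs the floor probability of "co", so the decoder prefers "cat".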
BIO:
Mark Cummins was most recently a senior software engineer at Google, working in the Visual Search group. The group works on a number of computer vision systems, notably large-scale particular object retrieval, large-scale category recognition and robust text extraction, with applications in products ranging from Street View to Google Goggles. Prior to joining Google, Mark was co-founder of Plink, a visual search engine company acquired by Google in 2010. Mark completed his PhD at the University of Oxford, working in the Mobile Robotics Group of Prof. Paul Newman. His PhD work developed the FAB-MAP algorithm for place recognition and loop closure, which was awarded best vision paper at ICRA 2008 and is the most cited IJRR publication of the last ten years.