ISBN-13: 9781447125181 / English / Paperback / 2012 / 325 pp.
The original motivations for developing optical character recognition technologies were modest: to convert printed text on flat physical media to digital form, producing machine-readable digital content. By doing this, words that had been inert and bound to physical material would be brought into the digital realm and thus gain new and powerful functionalities and analytical possibilities.
First-generation digital OCR researchers in the 1970s quickly realized that by limiting their ambitions primarily to contemporary documents printed in standard font type from the modern Roman alphabet (and of these, mostly English language materials), they were constraining the possibilities for future research and technologies considerably. Domain researchers also saw that the trajectory of OCR technologies, if left unchanged, would exclude a large portion of the human record. Digital conversion of documents and manuscripts in other alphabets, scripts, and cursive styles was of critical importance. Embedded in non-Roman alphabet source documents, including ancient manuscripts, papyri scrolls, clay tablets, and other inscribed artifacts, was not only a wealth of scholarly information but also new opportunities and challenges for advancing OCR, imaging sciences, and other computational research areas. The limiting circumstances at the time included the rudimentary capability (and high cost) of computational resources and lack of network-accessible digital content.
Since then computational technology has advanced at a very rapid pace and networking infrastructure has proliferated. Over time, this exponential decrease in the cost of computation, memory, and communications bandwidth combined with the exponential increase in Internet-accessible digital content has transformed education, scholarship, and research. Large numbers of researchers, scholars, and students use and depend upon Internet-based content and computational resources.
The chapters in this book describe a critically important area of investigation: addressing conversion of Indic script into machine-readable form.
Rough estimates have it that currently more than a billion people use Indic scripts. Collectively, Indic historic and cultural documents contain a vast richness of human knowledge and experience. The state-of-the-art research described in this book demonstrates the multiple values associated with these activities. Technically, the problems associated with Indic script recognition are very difficult and will contribute to and inform related script recognition efforts. The work also has enormous consequence for enriching and enabling the study of Indic cultural heritage materials and the historic record of its people. This in turn broadens the intellectual context for domain scholars focusing on other societies, ancient and modern.
Digital character recognition has brought about another milestone in collective communication by bringing inert, fixed-in-place text into an interactive digital realm. In doing so, the information has gained additional functionalities which expand our abilities to connect, combine, contextualize, share, and collaboratively pursue knowledge making. High-quality Internet content continues to grow in an explosive fashion. In the new global cyber environment, the functionalities and applications of digital information continue to transform knowledge into new understandings of human experience and the world in which we live. The possibilities for the future are limited only by available research resources and capabilities and the imagination and creativity of those who use them.
Arlington, Virginia    Stephen M.
This unique guide/reference is the very first comprehensive book on the subject of OCR (Optical Character Recognition) for Indic scripts. Features: contains contributions from the leading researchers in the field; discusses data set creation for OCR development; describes OCR systems that cover 8 different scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu (Perso-Arabic); explores the challenges of Indic script handwriting recognition in the online domain; examines the development of handwriting-based text input systems; describes ongoing work to increase access to Indian cultural heritage materials; provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts; investigates different techniques for word spotting in Indic scripts; reviews mono-lingual and cross-lingual information retrieval in Indic languages. This is an excellent reference for researchers and graduate students studying OCR technology and methodologies.