Deep Learning-Based Multifunctional End-to-End Model for Optical Character Classification and Denoising

Shuguang Xiong; Xiaoyang Chen; Huitao Zhang

doi:10.62836/jcmea.v3i1.030103

Optical Character Recognition (OCR) converts scanned documents, PDFs, and camera-captured images into editable, searchable text. The technology is central to digitizing historical documents, streamlining data entry, and improving accessibility for visually impaired users through text-to-speech. Despite its wide use, OCR still struggles to recognize text accurately in noisy or degraded images. Conventional OCR systems treat noise reduction and character classification as separate stages, which can compromise overall recognition quality. Our research introduces a Multifunctional End-to-End Model for Optical Character Classification and Denoising that integrates both functions within a unified framework. Using a dual-output autoencoder, the model denoises images and classifies characters concurrently, improving both the efficiency and the accuracy of OCR. This paper outlines the model's development and implementation, explores the interplay between denoising and classification, and presents experimental results that demonstrate marked improvements over conventional OCR methods.
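
The dual-output idea described in the abstract can be illustrated with a minimal sketch: a shared encoder feeds two heads, one reconstructing the clean image and one predicting the character class, and a single joint loss combines both objectives. This is not the authors' implementation. The layer sizes, activations, the weighting parameter `alpha`, and every name below (`DualOutputAutoencoder`, `joint_loss`, `init_layer`) are assumptions made for illustration, and only the forward pass and loss are shown, not training.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualOutputAutoencoder:
    """Shared encoder with a denoising head and a classification head.

    Assumed toy architecture; the source does not specify layer details.
    """

    def __init__(self, n_pixels, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(rng, n_hidden, n_pixels)
        self.decoder = init_layer(rng, n_pixels, n_hidden)
        self.classifier = init_layer(rng, n_classes, n_hidden)

    def forward(self, noisy):
        code = relu(dense(noisy, *self.encoder))           # shared latent code
        denoised = sigmoid(dense(code, *self.decoder))     # output 1: clean image
        probs = softmax(dense(code, *self.classifier))     # output 2: class probabilities
        return denoised, probs


def joint_loss(denoised, clean, probs, label, alpha=0.5):
    """Weighted sum of reconstruction MSE and classification cross-entropy.

    `alpha` is an assumed weighting, not a value taken from the paper.
    """
    mse = sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)
    cross_entropy = -math.log(probs[label] + 1e-12)
    return alpha * mse + (1.0 - alpha) * cross_entropy


# Usage: a flattened 4x4 binary "character" corrupted with Gaussian noise.
rng = random.Random(1)
clean = [rng.choice([0.0, 1.0]) for _ in range(16)]
noisy = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.2))) for p in clean]

model = DualOutputAutoencoder(n_pixels=16, n_hidden=8, n_classes=10)
denoised, probs = model.forward(noisy)
loss = joint_loss(denoised, clean, probs, label=3)
```

Because both heads read the same latent code, minimizing the combined loss pushes the encoder toward features that serve reconstruction and recognition together, which is one plausible reading of the interplay between denoising and classification that the abstract mentions.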

Keywords: Optical Character Classification; denoising; autoencoder; deep learning
