Enhanced Textual Data Reconstruction from Scanned Receipts Using Normalized Cross-Correlation and Deep Learning-Based Recognition with Superior Analytical Robustness and Computational Efficacy

Authors

  • M. Kathiravan
  • A. Mohan
  • M. Vijayakumar
  • M. Manikandan
  • Terrance Frederick Fernandez
  • Arumugam S S

DOI:

https://doi.org/10.22399/ijcesen.3284

Keywords:

Optical Character Recognition, Auto Text Extraction, Normalized Cross-Correlation, Template Matching, Deep Learning

Abstract

Text extraction from images plays a crucial role in optical character recognition (OCR) applications such as invoice and receipt recognition. Recent character recognition approaches work well on good-quality scanned receipts but lose accuracy markedly on low-quality ones. This paper proposes invoice receipt identification using normalized cross-correlation (NCC)-based template matching together with a novel auto-text extraction approach based on a deep learning algorithm. The proposed technique comprises three major steps: preprocessing, character recognition, and post-processing. The preprocessing step involves noise removal, quality enhancement, and image de-skewing. In the second step, auto-text extraction is carried out using a deep learning algorithm. The final post-processing step configures the extracted text and exports it to Word/Excel. Experimental results show that the proposed approach outperforms existing approaches in accuracy.
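As context for the template-matching step (this definition is the standard one and is supplied here, not taken from the paper), the normalized cross-correlation score for an image $f$ and a template $t$ placed at offset $(u, v)$ is

\[
\mathrm{NCC}(u,v) \;=\; \frac{\sum_{x,y}\bigl[f(x,y)-\bar{f}_{u,v}\bigr]\,\bigl[t(x-u,\,y-v)-\bar{t}\,\bigr]}{\sqrt{\sum_{x,y}\bigl[f(x,y)-\bar{f}_{u,v}\bigr]^{2}\;\sum_{x,y}\bigl[t(x-u,\,y-v)-\bar{t}\,\bigr]^{2}}}
\]

where $\bar{f}_{u,v}$ is the mean of $f$ under the template window and $\bar{t}$ is the template mean. The score lies in $[-1, 1]$; because both means are subtracted, it is insensitive to the uniform brightness shifts common in low-quality scans, which is what makes it attractive for receipt matching.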

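The paper's implementation is not reproduced on this page. The following is a minimal Python sketch of the kind of pipeline the abstract describes, assuming OpenCV for preprocessing and NCC template matching. The deep-learning recognizer is the authors' contribution and is not public, so pytesseract serves purely as a stand-in for it here; every function name, threshold, and file name below is a hypothetical placeholder, not the authors' code.

```python
# Illustrative sketch only (not the authors' released code): a three-step
# receipt pipeline matching the abstract's description. pytesseract stands
# in for the paper's deep-learning recognizer; thresholds and file names
# are assumptions.
import cv2
import numpy as np
import pandas as pd          # Excel export requires openpyxl installed
import pytesseract           # requires the Tesseract binary on PATH


def preprocess(path: str) -> np.ndarray:
    """Step 1: noise removal, quality enhancement, and de-skewing."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.fastNlMeansDenoising(gray, h=10)   # noise removal
    gray = cv2.equalizeHist(gray)                 # contrast enhancement
    # De-skew: estimate the dominant text angle from the ink pixels.
    binary = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # minAreaRect's angle convention changed in OpenCV 4.5; this branch
    # follows the classic [-90, 0) convention and may need adjusting.
    angle = -(90 + angle) if angle < -45 else -angle
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)


def locate_field(image: np.ndarray, template: np.ndarray,
                 threshold: float = 0.8):
    """Template branch: return (x, y) offsets whose NCC score >= threshold."""
    scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(scores >= threshold)
    return list(zip(xs.tolist(), ys.tolist()))


def extract_and_export(image: np.ndarray, out_path: str = "receipt.xlsx"):
    """Steps 2-3: recognize text (stand-in OCR) and export to Excel."""
    text = pytesseract.image_to_string(image)
    rows = [{"line": i + 1, "text": t}
            for i, t in enumerate(text.splitlines()) if t.strip()]
    pd.DataFrame(rows).to_excel(out_path, index=False)


if __name__ == "__main__":
    receipt = preprocess("receipt.png")           # hypothetical input file
    extract_and_export(receipt)
```

Note that OpenCV's TM_CCOEFF_NORMED mode computes the mean-subtracted normalized cross-correlation given in the equation above; the plain TM_CCORR_NORMED mode omits the mean subtraction.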

References

[1] Automatic receipt recognition system based on artificial intelligence technology. (2022). Applied Sciences, 12(2), 853. https://doi.org/10.3390/app12020853

[2] Patel, S., & Bhatt, D. (2020). Abstractive information extraction from scanned invoices (AIESI) using end-to-end sequential approach. arXiv preprint arXiv:2009.05728. https://arxiv.org/abs/2009.05728

[3] Le, A. D., Van Pham, D., & Nguyen, T. A. (2019). Deep learning approach for receipt recognition. In International Conference on Future Data and Security Engineering (pp. 705–712). Springer. https://doi.org/10.1007/978-3-030-35649-1_48

[4] Audebert, N., Herold, C., Slimani, K., & Vidal, C. (2019). Multimodal deep networks for text and image-based document classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 427–443). Springer. https://doi.org/10.1007/978-3-030-43887-6_27

[5] Kerroumi, M., Sayem, O., & Shabou, A. (2020). Visual word grid: Information extraction from scanned documents using a multimodal approach. arXiv preprint arXiv:2010.02358. https://arxiv.org/abs/2010.02358

[6] Kim, D., Kwak, M., Won, E., Shin, S., & Nam, J. (2020). TLGAN: Document text localization using generative adversarial nets. arXiv preprint arXiv:2010.11547. https://arxiv.org/abs/2010.11547

[7] Park, S., Shin, S., Lee, B., Lee, J., Surh, J., Seo, M., & Lee, H. (2020). CORD: A consolidated receipt dataset for post-OCR parsing. arXiv preprint arXiv:2005.00642v3. https://arxiv.org/abs/2005.00642

[8] Sharma, S., Gaur, M., Kumar, Y., & Varma, M. (2022). A hybrid model for invoice document processing using NLP and deep learning. International Journal of Computer Applications, 184(19), 9–14. https://doi.org/10.5120/ijca2022922440

[9] Majumder, A., Singla, A., & Mahajan, A. (2021). Structured information extraction from scanned documents using deep learning. Procedia Computer Science, 192, 3403–3412. https://doi.org/10.1016/j.procs.2021.09.111

[10] Burie, J. C., Nicolaou, A., Rusiñol, M., Karatzas, D., Mouchère, H., & Vincent, N. (2017). ICDAR2017 competition on recognition of documents with complex layouts—RDCL2017. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (Vol. 1, pp. 1403–1410). IEEE. https://doi.org/10.1109/ICDAR.2017.231

[11] Huang, Z., Liu, W., Li, J., Li, Z., & Li, H. (2022). Document information extraction using BERT with layout features. Pattern Recognition, 128, 108676. https://doi.org/10.1016/j.patcog.2022.108676

[12] Katti, A. R., Reisswig, C., Guder, C., Brarda, S., Höhne, J., Bickel, S., & Faddoul, J. B. (2018). Chargrid: Towards understanding 2D documents. arXiv preprint arXiv:1809.08799. https://arxiv.org/abs/1809.08799

[13] Xu, Y., Xu, J., Lv, T., Cui, L., Wei, F., Wang, Y., ... & Zhou, M. (2020). LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192–1200). https://doi.org/10.1145/3394486.3403172

[14] Xu, Y., Lv, T., Cui, L., Lu, Y., Lu, Y., Wei, F., & Zhou, M. (2021). LayoutLMv2: Multi-modal pre-training for visually-rich document understanding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (pp. 2579–2591). https://doi.org/10.18653/v1/2021.acl-long.203

[15] Xu, Y., Lv, T., Cui, L., Lu, Y., Wang, G., & Zhou, M. (2022). LayoutXLM: Multimodal pre-training for multilingual visually-rich document understanding. Findings of the Association for Computational Linguistics: ACL 2022, 1632–1644. https://doi.org/10.18653/v1/2022.findings-acl.130

[16] Garncarek, Ł., Powalski, R., Stanisławek, T., & Śmieja, M. (2021). LAMBERT: Layout-aware language modeling for information extraction. arXiv preprint arXiv:2002.08087. https://arxiv.org/abs/2002.08087

[17] Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., ... & Park, S. (2021). Donut: Document understanding transformer without OCR. arXiv preprint arXiv:2111.15664. https://arxiv.org/abs/2111.15664

[18] Appalaraju, S., & Chao, C. (2021). DocTr: Document image transformer for geometric unwarping and text recognition. arXiv preprint arXiv:2106.03060. https://arxiv.org/abs/2106.03060

[19] Hong, J., Lee, J., Lee, S., Yim, J., & Park, S. (2022). BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents. Pattern Recognition, 129, 108766. https://doi.org/10.1016/j.patcog.2022.108766

[20] Lee, B., Yim, J., Park, S., Kim, G., Shin, J., Lee, S., & Hong, J. (2022). LiteBERT: An efficient pre-trained language model for document understanding. arXiv preprint arXiv:2202.13634. https://arxiv.org/abs/2202.13634

[21] Hwang, W., Yim, J., & Park, S. (2021). Spatial-aware BERT for document-level layout understanding. arXiv preprint arXiv:2103.14470. https://arxiv.org/abs/2103.14470

[22] Lee, B., Yim, J., & Park, S. (2021). PILOT: Position Aware Text Representation for Key Information Extraction. arXiv preprint arXiv:2103.12213. https://arxiv.org/abs/2103.12213

[23] Ahmad, W., Chakraborty, T., Yuan, X., & Chang, K.-W. (2019). Context-aware layout analysis for document image understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). https://doi.org/10.1109/CVPRW.2019.00113

[24] Khan, M. A., Akram, T., Zhang, Y.-D., & Sharif, M. (2021). Attributes based skin lesion detection and recognition: A mask RCNN and transfer learning-based deep learning framework. Pattern Recognition Letters, 143, 58–66. https://doi.org/10.1016/j.patrec.2020.12.015

[25] Ramshaw, L. A., & Marcus, M. P. (1995). Text chunking using transformation-based learning. In Third Workshop on Very Large Corpora (pp. 82–94). https://aclanthology.org/W95-0107/

[26] Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 260–270). https://doi.org/10.18653/v1/N16-1030

[27] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT (pp. 4171–4186). https://doi.org/10.18653/v1/N19-1423

[28] Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250. https://arxiv.org/abs/1606.05250

[29] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Dollár, P. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740–755). Springer. https://doi.org/10.1007/978-3-319-10602-1_48

[30] Afzal, M. Z., Kölsch, A., Ahmed, S., & Dengel, A. (2017). Cutting the error by half: Investigation of very deep CNN and advanced training strategies for document image classification. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (Vol. 1, pp. 883–888). IEEE. https://doi.org/10.1109/ICDAR.2017.148

[31] He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969). https://doi.org/10.1109/ICCV.2017.322

[32] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767. https://arxiv.org/abs/1804.02767

[33] Zhang, Y., Qiu, M., Chen, Y., & Huang, J. (2018). End-to-end information extraction based on deep reinforcement learning. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics (pp. 1–12). https://aclanthology.org/C18-1001/

[34] Liu, P., Yuan, H., Fu, J., Jiang, Z., & Zhang, Y. (2021). Structured information extraction from noisy semi-structured documents using structure-aware pre-training. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 6177–6186). https://doi.org/10.18653/v1/2021.emnlp-main.500

[35] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. https://arxiv.org/abs/1907.11692

[36] Su, J., Xu, Y., Li, S., Cui, L., Wang, G., Wei, F., & Zhou, M. (2021). VIES: A novel video information extraction system. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 4375–4383). https://doi.org/10.1145/3447548.3467323

[37] Zhang, Y., Xu, J., & Cui, L. (2020). LayoutLM: A unified model for understanding documents. arXiv preprint arXiv:2004.14797. https://arxiv.org/abs/2004.14797

Published

2025-07-01

How to Cite

Kathiravan, M., A. Mohan, M. Vijayakumar, M. Manikandan, Terrance Frederick Fernandez, & Arumugam S S. (2025). Enhanced Textual Data Reconstruction from Scanned Receipts Using Normalized Cross-Correlation and Deep Learning-Based Recognition with Superior Analytical Robustness and Computational Efficacy. International Journal of Computational and Experimental Science and Engineering, 11(3). https://doi.org/10.22399/ijcesen.3284

Section

Research Article
