Scalable Named Entity Recognition in social media using Bi-MEMM in a Distributed Environment

K. Syed Kousar Niasi; K. Prakash; M. Krishna Kumar; P. Murugesan

doi:10.22399/ijcesen.2065

Authors

K. Syed Kousar Niasi, Assistant Professor, Department of Computer Science, Jamal Mohamed College (Affiliated to Bharathidasan University), Tiruchirappalli-620020, Tamil Nadu, India
K. Prakash, Assistant Professor, Department of Mathematics, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
M. Krishna Kumar, Assistant Professor, Department of Electronics and Communication Engineering, Grace College of Engineering, Thoothukudi, Tamil Nadu, India
P. Murugesan, Professor, Department of Mechanical Engineering, K.S.R. College of Engineering, Tiruchengode, Tamil Nadu, India

Keywords:

Trend Detection, User-generated Content, Information Extraction, Distributed Computing, Parallel Processing

Abstract

Data mining yields actionable intelligence for improving internet-based, query-driven AI. Within that setting, Named Entity Recognition (NER) is central to extracting useful information from the large, fast-changing body of social media text. This paper introduces a method for performing NER in a distributed setting, designed for the particular difficulties of social media data. The method, Bi-MEMM, combines a Bidirectional Long Short-Term Memory network (Bi-LSTM) with a Maximum Entropy Markov Model (MEMM). The Bi-LSTM captures bidirectional context in social media text, which helps the model identify complex named entities, while the MEMM models the dependencies between labels, improving the accuracy and precision of entity recognition. Because social media data is generated rapidly, the system runs in a distributed environment and uses distributed computing resources for parallel processing. The model is evaluated on identifying named entities in user-generated content across diverse datasets, where it reaches a reported accuracy of 99.3%. The resulting system targets both precision and efficiency and addresses the needs of social media platforms, where recognising named entities is crucial to understanding and analysing user-generated content.
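
The label-dependency idea in the abstract can be illustrated with a small decoding sketch. This is not the authors' implementation: the label set, the scores, and the function names are hypothetical, and the per-token scores are hand-written numbers standing in for what a Bi-LSTM would produce. The sketch shows only the MEMM side, where each step is a locally normalised distribution over the next label given the previous label and the current token, decoded with Viterbi.

```python
import math

# Hypothetical label set, chosen only for illustration.
LABELS = ["O", "B-PER", "I-PER"]


def log_softmax(scores):
    """Normalise a list of scores into log-probabilities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def memm_viterbi(emissions, transitions, start):
    """Return the most probable label-index sequence under an MEMM.

    emissions[t][y]   score of label y at token t (stand-in for Bi-LSTM output)
    transitions[p][y] score of moving from previous label p to label y
    start[y]          score of label y at the first token
    """
    if not emissions:
        return []
    n = len(start)
    # First token: one local distribution over labels.
    delta = log_softmax([emissions[0][y] + start[y] for y in range(n)])
    backpointers = []
    for t in range(1, len(emissions)):
        # local[p][y] = log P(y_t = y | y_{t-1} = p, x_t), normalised per p.
        local = [
            log_softmax([emissions[t][y] + transitions[p][y] for y in range(n)])
            for p in range(n)
        ]
        new_delta, pointers = [], []
        for y in range(n):
            best_p = max(range(n), key=lambda p: delta[p] + local[p][y])
            pointers.append(best_p)
            new_delta.append(delta[best_p] + local[best_p][y])
        delta = new_delta
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    y = max(range(n), key=lambda i: delta[i])
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


# Made-up scores for the tokens "thanks", "Ada", "Lovelace".
EMISSIONS = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.5], [0.5, 1.0, 1.0]]
TRANSITIONS = [
    [1.0, 0.5, -4.0],  # from O: I-PER is discouraged
    [0.0, -1.0, 2.0],  # from B-PER: I-PER is favoured
    [0.0, -1.0, 1.0],  # from I-PER
]
START = [0.5, 0.5, -4.0]

path = memm_viterbi(EMISSIONS, TRANSITIONS, START)
print([LABELS[i] for i in path])  # ['O', 'B-PER', 'I-PER']
```

In this toy input the third token's own scores tie between B-PER and I-PER; the transition scores break the tie in favour of I-PER after B-PER, which is the kind of label dependency the abstract attributes to the MEMM component.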

