International Journal of Engineering Technology and Management Sciences

2023, Volume 7 Issue 5

Exploring Preprocessing Techniques for Natural Language Text: A Comprehensive Study Using Python Code

AUTHOR(S)

MR. ADEPU RAJESH, DR. TRYAMBAK HIWARKAR

DOI: https://doi.org/10.46647/ijetms.2023.v07i05.047

ABSTRACT
This paper highlights the significance of efficient text preprocessing strategies in Natural Language Processing (NLP), a field focused on enabling machines to understand and interpret human language. Text preprocessing is a crucial step in converting unstructured text into a machine-readable format, and it plays a vital role in applications such as web search, document classification, chatbots, and virtual assistants. Techniques such as tokenization, stop word removal, and lemmatization are studied and applied in a specific order to support accurate and efficient information retrieval. The paper emphasizes that the selection and ordering of preprocessing techniques must be chosen carefully to achieve high-quality results, and that effective preprocessing involves cleaning and filtering textual data to eliminate noise and improve efficiency. Using a Python implementation, it also provides insights into the effect of each stage by comparing the raw text with the output of tokenization, stop word removal, and stemming.
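The ordered stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the suffix list, and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `preprocess`) are hypothetical, and the toy suffix stripper stands in for a real stemmer or lemmatizer, which a full study would more likely take from a library such as NLTK.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Suffixes stripped by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return the output of every stage so each step's effect is visible."""
    tokens = tokenize(text)
    filtered = remove_stop_words(tokens)
    stems = [naive_stem(t) for t in filtered]
    return {"raw": text, "tokens": tokens, "filtered": filtered, "stems": stems}


if __name__ == "__main__":
    result = preprocess("The cats are chasing the mice in the gardens.")
    for stage, value in result.items():
        print(f"{stage}: {value}")
```

Stop words are removed before stemming here because the stop-word list holds unstemmed forms; reversing the two steps could let a stemmed stop word slip through. The crude stem "chas" for "chasing" shows why a proper stemmer or lemmatizer is preferable in practice.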

Page No: 390 - 399


How to Cite This Article:
MR. ADEPU RAJESH, DR. TRYAMBAK HIWARKAR. Exploring Preprocessing Techniques for Natural Language Text: A Comprehensive Study Using Python Code. ijetms;7(5):390-399. DOI: 10.46647/ijetms.2023.v07i05.047