Exploring Word Embeddings for Text Classification: A Comparative Analysis
AUTHOR(S)
Satya Mohan Chowdary G, T Ganga Bhavani, D Konda Babu, B Prasanna Rani, K Sireesha
DOI: https://doi.org/10.46647/ijetms.2023.v07i05.007
ABSTRACT
For language tasks such as text classification and sequence labeling, word embeddings are essential for providing input features to deep models. Many word embedding techniques have been proposed over the past decade, and they can be broadly divided into classic and context-based embeddings. In this study, two encoders, CNN and BiLSTM, are used in a downstream network architecture to analyze both families of embeddings in the context of text classification. Four benchmark classification datasets, covering single-label and multi-label tasks with a range of average sample lengths, are selected to evaluate the effect of word embeddings across different datasets. The evaluation results, reported with confidence intervals, show that CNN consistently outperforms BiLSTM, especially on datasets where document context is less predictive of class membership; CNN is therefore recommended over BiLSTM for such document classification tasks. Concatenating multiple classic embeddings, or increasing their dimensionality, does not significantly improve performance, although marginal gains appear in a few cases. The context-based embeddings ELMo and BERT are investigated as well, with BERT showing better overall performance, particularly on datasets with longer documents. Both context-based embeddings outperform the classic ones on short datasets, but no significant improvement is observed on longer ones. In conclusion, this study emphasizes the significance of word embeddings and their impact on downstream tasks, highlighting the advantages of BERT over ELMo, especially for longer documents, and of CNN over BiLSTM for certain document classification scenarios.
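As a concrete illustration of the downstream architecture the abstract describes, the following is a minimal sketch of a frozen pretrained embedding lookup feeding either a CNN or a BiLSTM encoder and a linear classifier. PyTorch, the hyperparameters, and the pooling choices here are illustrative assumptions, not the paper's reported settings.

import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, embeddings: torch.Tensor, n_classes: int,
                 n_filters: int = 100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # freeze=True keeps the pretrained embedding matrix fixed during training
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        dim = embeddings.size(1)
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, n_filters, k) for k in kernel_sizes])
        self.out = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, dim, seq_len)
        # max-over-time pooling per filter size, then concatenate
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))    # (batch, n_classes)

class BiLSTMClassifier(nn.Module):
    def __init__(self, embeddings: torch.Tensor, n_classes: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.lstm = nn.LSTM(embeddings.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                    # (batch, seq_len, dim)
        h, _ = self.lstm(x)                          # (batch, seq_len, 2*hidden)
        return self.out(h.max(dim=1).values)         # max-pool over time steps

# Toy usage with a random 1000-word, 300-dimensional embedding matrix:
emb = torch.randn(1000, 300)
model = CNNClassifier(emb, n_classes=4)
logits = model(torch.randint(0, 1000, (8, 50)))      # 8 documents, 50 tokens each

For the single-label datasets the logits would feed a cross-entropy loss; for the multi-label datasets, a per-class sigmoid with binary cross-entropy (nn.BCEWithLogitsLoss) is the usual choice.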
Page No: 52-66
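For the context-based side of the comparison, the sketch below shows one common way to use BERT as a contextual-embedding feature extractor; the Hugging Face transformers package, the bert-base-uncased checkpoint, and mean pooling are assumptions for illustration, not the authors' exact setup.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

docs = ["an example document", "another, somewhat longer, example document"]
batch = tok(docs, padding=True, truncation=True, max_length=512,
            return_tensors="pt")

with torch.no_grad():
    out = bert(**batch)

# Per-token contextual vectors (batch, seq_len, 768); these could feed the same
# CNN/BiLSTM encoders used with the classic (static) embeddings.
token_vecs = out.last_hidden_state

# Mean-pool over real tokens (masking out padding) for a fixed-size document vector.
mask = batch["attention_mask"].unsqueeze(-1)
doc_vecs = (token_vecs * mask).sum(1) / mask.sum(1)

Note the 512-token input limit: longer documents must be truncated or chunked, a practical consideration for the longer-document datasets discussed in the abstract.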
References:
- Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. “Scoring, term weighting and the vector space model.” Introduction to information retrieval 100 (2008): 2-4.
- Mikolov, Tomáš, Wen-tau Yih, and Geoffrey Zweig. “Linguistic regularities in continuous space word representations.” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.
- Santos, Cicero D., and Bianca Zadrozny. “Learning character-level representations for part-of-speech tagging.” Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.
- dos Santos, Cícero, et al. “Boosting Named Entity Recognition with Neural Character Embeddings.” Proceedings of NEWS 2015: The Fifth Named Entities Workshop. 2015.
- Mikolov, Tomas, et al. “Efficient Estimation of Word Representations in Vector Space.” ICLR (Workshop Poster). 2013.
- Mikolov, Tomas, et al. “Distributed representations of words and phrases and their compositionality.” Advances in neural information processing systems. 2013.
- Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “GloVe: Global vectors for word representation.” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.
- Shubham Agarwal, Nishant Moghe, and Vaishali Wadhe. “Big Data Analytics for Supply Chain Optimization: A Review of Methodologies and Applications.” International Research Journal on Advanced Science Hub 5.7 (2023): 215-221. doi: 10.47392/irjash.2023.046.
- Hanumanthappa S, and Guruprakash C D. “Feature Extraction from Brain MR Images for Detecting Brain Tumor using Deep Learning Techniques.” International Research Journal on Advanced Science Hub 5.7 (2023): 242-247. doi: 10.47392/irjash.2023.049.
- Shakti Punj, Lavkush Sharma, and Brajesh Kumar Singh. “Enhancing Face Mask Detection Using Convolutional Neural Networks: A Comparative Study.” International Research Journal on Advanced Science Hub 5.8 (2023): 280-289. doi: 10.47392/irjash.2023.054.
- Sirajudeen S, and Sudha S. “A Review - Smoke-Fire Detection and YOLO (You Only Look Once).” International Research Journal on Advanced Science Hub 5.8 (2023): 248-256. doi: 10.47392/irjash.2023.051.
- Buddesab, Nanda Ashwin, Shruthi M, Rekha P, and Pavan Mulgund. “Real Time Eye Based Password Authentication by Eye Blinking System.” International Research Journal on Advanced Science Hub 5.05S (2023): 1-6. doi: 10.47392/irjash.2023.S001.
- Rekha P, Buddesab, Shruthi K, Sathya M, and Nanda Ashwin. “Tabib: Chatbot for Healthcare Automation with Audio Assistance using Artificial Intelligence.” International Research Journal on Advanced Science Hub 5.05S (2023): 7-14. doi: 10.47392/irjash.2023.S002.
- Krishnan S, Pranay Varma, Aravind J, and Indra Gandhi K. “Analysis of Supervised and Unsupervised Deep Learning Approaches for Identifying and Localizing Image Forgeries.” International Research Journal on Advanced Science Hub 5.05S (2023): 15-25. doi: 10.47392/irjash.2023.S003.
- Yu, Mo, and Mark Dredze. “Learning composition models for phrase embeddings.” Transactions of the Association for Computational Linguistics 3 (2015): 227-242.
- Zhou, Zhihao, Lifu Huang, and Heng Ji. “Learning Phrase Embeddings from Paraphrases with GRUs.” Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora. 2017.
- Kiros, Ryan, et al. “Skip-thought vectors.” Advances in neural information processing systems. 2015.
- Le, Quoc, and Tomas Mikolov. “Distributed representations of sentences and documents.” International conference on machine learning. 2014.
- Levy, Omer, and Yoav Goldberg. “Linguistic regularities in sparse and explicit word representations.” Proceedings of the eighteenth conference on computational natural language learning. 2014.
- Schnabel, Tobias, et al. “Evaluation methods for unsupervised word embeddings.” Proceedings of the 2015 conference on empirical methods in natural language processing. 2015.
- Ratnaparkhi, Adwait. “A maximum entropy model for part-of-speech tagging.” Conference on Empirical Methods in Natural Language Processing. 1996.
- Yadav, Vikas, and Steven Bethard. “A Survey on Recent Advances in Named Entity Recognition from Deep Learning models.” Proceedings of the 27th International Conference on Computational Linguistics. 2018.
- Liu, Bing. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press, 2015.
- Cho, Kyunghyun, et al. “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.
- Finkelstein, Lev, et al. “Placing search in context: The concept revisited.” Proceedings of the 10th international conference on World Wide Web. 2001.
- Hill, Felix, Roi Reichart, and Anna Korhonen. “SimLex-999: Evaluating semantic models with (genuine) similarity estimation.” Computational Linguistics 41.4 (2015): 665-695.
- Gerz, Daniela, et al. “SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity.” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.
- Bojanowski, Piotr, et al. “Enriching word vectors with subword information.” Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
- Joulin, Armand, et al. “Bag of Tricks for Efficient Text Classification.” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
- Joulin, Armand, et al. “FastText.zip: Compressing text classification models.” arXiv preprint arXiv:1612.03651 (2016).
- Salle, Alexandre, Aline Villavicencio, and Marco Idiart. “Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations.” Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016.
- Bullinaria, John A., and Joseph P. Levy. “Extracting semantic representations from word co-occurrence statistics: A computational study.” Behavior research methods 39.3 (2007): 510-526.
- Speer, Robyn, and Catherine Havasi. “Representing General Relational Knowledge in ConceptNet 5.” Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). 2012.
- Ganitkevitch, Juri, Benjamin Van Durme, and Chris Callison-Burch. “PPDB: The paraphrase database.” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.
- Speer, Robyn, and Joshua Chin. “An ensemble method to produce high-quality word embeddings.” arXiv preprint arXiv:1604.01692 (2016).
- Speer, Robyn, Joshua Chin, and Catherine Havasi. “ConceptNet 5.5: An open multilingual graph of general knowledge.” Thirty-First AAAI Conference on Artificial Intelligence. 2017.
- Berardi, Giacomo, Andrea Esuli, and Diego Marcheggiani. “Word Embeddings Go to Italy: A Comparison of Models and Training Datasets.” IIR. 2015.
- Makrai, Márton, et al. “Comparison of distributed language models on medium-resourced languages.” XI. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2015) (2015).
- Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. “Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2014.
- Ghannay, Sahar, et al. “Word embedding evaluation and combination.” Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). 2016.
- Schwenk, Holger. “CSLM - a modular open-source continuous space language modeling toolkit.” INTERSPEECH. 2013.
- Levy, Omer, and Yoav Goldberg. “Dependency-based word embeddings.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.
- Dhillon, Paramveer S., et al. “Two step CCA: a new spectral method for estimating vector models of words.” Proceedings of the 29th International Conference on International Conference on Machine Learning. 2012.
- Collobert, Ronan, et al. “Natural language processing (almost) from scratch.” Journal of Machine Learning Research 12 (2011): 2493-2537.
- Lebret, Rémi, and Ronan Collobert. “Word Embeddings through Hellinger PCA.” Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014.
- Li, Ping, Trevor J. Hastie, and Kenneth W. Church. “Very sparse random projections.” Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006.
- Pakhomov, Serguei VS, et al. “Corpus domain effects on distributional semantic modeling of medical terms.” Bioinformatics 32.23 (2016): 3635-3644.
- Wang, Yanshan, et al. “A comparison of word embeddings for the biomedical natural language processing.” Journal of biomedical informatics 87 (2018): 12-20.
- Agirre, Eneko, et al. “A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches.” Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2009.
- Kliegr, Tomáš, and Ondřej Zamazal. “Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353.” Data & Knowledge Engineering 115 (2018): 174-193.
- Peters, Matthew E., et al. “Deep Contextualized Word Representations.” Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
- Devlin, Jacob, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
How to Cite This Article:
Satya Mohan Chowdary G, T Ganga Bhavani, D Konda Babu, B Prasanna Rani, K Sireesha. Exploring Word Embeddings for Text Classification: A Comparative Analysis. IJETMS 2023;7(5):52-66. DOI: 10.46647/ijetms.2023.v07i05.007