Machine Learning Research

Research Article

Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach

Idiomatic expressions are natural components of all languages, and their meaning cannot be derived directly from the words that compose them. Vector representations are a key method for bridging human and machine understanding of language, and they help solve many NLP problems. Representing idiomatic expressions is necessary for machine learning, deep learning, and natural language processing applications. In the previous literature, machine learning and deep learning techniques have not been used to process such expressions as input for NLP applications. To study idioms with machine learning and deep learning methods, vector or numeric representations of idiomatic expressions are needed. This research therefore proposes vector representations of Amharic idioms for NLP applications, built with vector representation models. NLP researchers can use this representation as input to machine learning and deep learning techniques for classification or regression, since any NLP application research on Amharic idioms first requires a vector or numeric representation produced by a suitable method. As the dataset, we used five hundred two-word idiomatic expressions from the book Amharic Idioms. Performance was evaluated with accuracy, precision, recall, and F-score, and experiments on this dataset yielded an accuracy of 95.5%.
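The abstract does not specify how the two words of an idiom are combined into a single vector or which classifier is used. As a minimal sketch, assuming one common Word2vec-style choice (averaging the two word vectors), the code shows that composition step and how the four reported metrics are computed for a binary labelling task. The toy embeddings, the placeholder tokens `word_a` and `word_b`, and the helper names `idiom_vector` and `binary_metrics` are hypothetical illustrations, not the author's implementation; in practice the word vectors would come from a trained Word2vec model.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def idiom_vector(idiom: str, embeddings: Dict[str, Vector]) -> Vector:
    """Represent a multi-word idiom as the mean of its word vectors.

    ASSUMPTION: mean pooling is an illustrative composition method,
    not necessarily the one used in the article.
    """
    vectors = [embeddings[w] for w in idiom.split() if w in embeddings]
    if not vectors:
        raise ValueError(f"no known words in idiom: {idiom!r}")
    dim = len(vectors[0])
    # Element-wise mean over the available word vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score for labels in {0, 1} (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for two placeholder tokens.
    toy_embeddings = {"word_a": [1.0, 0.0], "word_b": [0.0, 1.0]}
    print(idiom_vector("word_a word_b", toy_embeddings))  # [0.5, 0.5]
    # Hypothetical gold labels and predictions.
    print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

Averaging keeps the idiom vector in the same space and dimensionality as the word vectors, which makes it a convenient input for standard classifiers; concatenating the two vectors would instead preserve word order at the cost of doubling the dimension.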

Amharic Idiom, Machine Learning, Vector Representation, Word2vec

APA Style

Abebe Fenta, A. (2023). Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Machine Learning Research, 8(2), 17-22.

ACS Style

Abebe Fenta, A. Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Mach. Learn. Res. 2023, 8(2), 17-22. doi: 10.11648/j.mlr.20230802.11

AMA Style

Abebe Fenta A. Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Mach Learn Res. 2023;8(2):17-22. doi: 10.11648/j.mlr.20230802.11

Copyright © 2023. Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.