University of Bahrain
Scientific Journals

A Corpus Based Transformation-Based Learning for Hausa Text Parts of Speech Tagging

dc.contributor.author Awwalu, Jamilu
dc.contributor.author Abdullahi, Saleh Elyakub
dc.contributor.author Evwiekpaefe, Abraham Eseoghene
dc.date.accessioned 2020-07-21T13:44:18Z
dc.date.available 2020-07-21T13:44:18Z
dc.date.issued 2021-04-21
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/4028
dc.description.abstract Parts of Speech tagging, also known as POS tagging, is a division under semantic analysis in Natural Language Processing. It has long been an active research area, especially for languages such as English, Arabic, Mandarin, Czech, Bahasa Melayu, Wolof, and Igbo. The Hausa language belongs to the West Chadic languages and is spoken in parts of several countries, including Nigeria, Niger, Benin, Cameroon, Chad, Burkina Faso, Sudan, Congo, Ghana, and Togo. However, despite its large number of speakers, the Hausa language lacks Natural Language Processing (NLP) resources such as POS taggers. This limits NLP research on the Hausa language in areas such as Information Retrieval, Machine Translation, and Word Sense Disambiguation. Different Machine Learning (ML) approaches have yielded varying performance in POS tagging, indicating the critical role the ML approach plays in the performance of POS taggers. In this study, we create a Hausa language POS tagset, called the Hausa tagset (HTS), and apply Transformation-Based Learning (TBL) as a hybrid tagger, and Hidden Markov Model (HMM) and N-gram taggers as probabilistic taggers, to perform POS tagging of the Hausa language. Tagger test results on precision, recall, and f1-measure show that the TBL tagger scored 64%, 52%, and 53% respectively, outperforming the HMM tagger, which scored 55%, 7%, and 5%. Comparing TBL with the N-gram taggers, the TBL and Unigram taggers achieved a 53% f1-measure, while the Bigram and Trigram taggers achieved 52%. On recall, the TBL tagger achieved 6% more than the Unigram tagger and 7% more than the Bigram and Trigram taggers. On precision, the TBL tagger scored lower than all the N-gram taggers at 64%, while the Unigram tagger achieved 70%, followed by the Bigram and Trigram taggers, both at 69%. Overall, the TBL tagger outperformed the HMM tagger on all three metrics and the N-gram taggers on recall, but trailed the N-gram taggers on precision. The TBL and Unigram taggers achieved the same f1-measure and differed on precision and recall by a balanced margin: TBL exceeded the Unigram tagger by 6% on recall, while the Unigram tagger exceeded the TBL tagger by 6% on precision. en_US
dc.language.iso en en_US
dc.publisher University of Bahrain en_US
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/ *
dc.subject Machine Learning, Transformation Based Learning, Hidden Markov Model, N-gram, Hausa, Parts of Speech Tagging en_US
dc.title A Corpus Based Transformation-Based Learning for Hausa Text Parts of Speech Tagging en_US
dc.type Article en_US
dc.identifier.doi http://dx.doi.org/10.12785/ijcds/100146
dc.volume 10 en_US
dc.pagestart 473 en_US
dc.pageend 490 en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.