A Corpus Based Transformation-Based Learning for Hausa Text Parts of Speech Tagging

Awwalu, Jamilu; Abdullahi, Saleh Elyakub; Evwiekpaefe, Abraham Eseoghene

doi:http://dx.doi.org/10.12785/ijcds/100146

International Journal of Computing and Digital Systems, Volume 10, Issue 01

dc.contributor.author	Awwalu, Jamilu
dc.contributor.author	Abdullahi, Saleh Elyakub
dc.contributor.author	Evwiekpaefe, Abraham Eseoghene
dc.date.accessioned	2020-07-21T13:44:18Z
dc.date.available	2020-07-21T13:44:18Z
dc.date.issued	2021-04-21
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4028
dc.description.abstract	Parts of Speech tagging, also known as POS tagging, is a division under semantic analysis in Natural Language Processing. It has long been an active research area, especially for languages such as English, Arabic, Mandarin, Czech, Bahasa Melayu, Wolof, and Igbo. The Hausa language belongs to the West Chadic languages and is spoken in parts of several countries, including Nigeria, Niger, Benin, Cameroun, Chad, Burkina Faso, Sudan, Congo, Ghana, and Togo. However, despite this large number of speakers, the Hausa language lacks Natural Language Processing (NLP) resources such as POS taggers. This limits NLP research such as Information Retrieval, Machine Translation, and Word Sense Disambiguation on the Hausa language. Different Machine Learning (ML) approaches have yielded varying performance in POS tagging, indicating the critical role the ML approach plays in the performance of POS taggers. In this study, we create a Hausa language POS tagset, called the Hausa tagset (HTS), and apply Transformation-based Learning (TBL) as a hybrid tagger, and Hidden Markov Model (HMM) and N-gram as probabilistic taggers, to perform POS tagging of the Hausa language. Test results based on precision, recall, and f1-measure show that the TBL tagger scored 64%, 52%, and 53%, outperforming the HMM tagger, which scored 55%, 7%, and 5%. Comparing TBL with the N-gram taggers, the TBL and Unigram taggers achieved 53% f1-measure, while the Bigram and Trigram taggers achieved 52%. On recall, the TBL tagger scored 6 percentage points higher than the Unigram tagger and 7 percentage points higher than the Bigram and Trigram taggers. In terms of precision, the TBL tagger scored lowest of these at 64%, while the Unigram tagger achieved 70%, followed by the Bigram and Trigram taggers, both at 69%. Overall, the TBL tagger outperformed the HMM tagger on all three metrics and the N-gram taggers on recall, but trailed the N-gram taggers on precision. The TBL and Unigram taggers achieved the same f1-measure and differed by a balanced margin on the other two metrics: TBL exceeded the Unigram tagger by 6 percentage points on recall, while the Unigram tagger exceeded TBL by 6 percentage points on precision.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Machine Learning, Transformation Based Learning, Hidden Markov Model, N-gram, Hausa, Parts of Speech Tagging	en_US
dc.title	A Corpus Based Transformation-Based Learning for Hausa Text Parts of Speech Tagging	en_US
dc.type	Article	en_US
dc.identifier.doi	http://dx.doi.org/10.12785/ijcds/100146
dc.volume	10	en_US
dc.pagestart	473	en_US
dc.pageend	490	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
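
The abstract compares taggers on precision, recall, and f1-measure but does not say how these were averaged across tags. The reported TBL f1-measure of 53% is lower than the harmonic mean of 64% and 52% (about 57%), which would be consistent with per-tag (macro) averaging, though that is only an assumption here. The sketch below shows how macro-averaged scores could be computed from gold and predicted tag sequences. The function name `macro_prf` and the tag labels are hypothetical and are not taken from the paper or from the Hausa tagset (HTS).

```python
from collections import Counter


def macro_prf(gold, pred):
    """Return macro-averaged (precision, recall, f1) over all tags seen.

    Assumption: scores are computed per tag and then averaged with equal
    weight per tag. The paper's actual averaging scheme is not stated.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for tag g
        else:
            fp[p] += 1          # tag p was predicted but wrong
            fn[g] += 1          # tag g was missed

    tags = set(gold) | set(pred)
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        p = tp[t] / p_den if p_den else 0.0
        r = tp[t] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(tags)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical tag labels, for illustration only.
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "NOUN", "NOUN", "VERB"]
precision, recall, f1 = macro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under macro averaging, a tag the tagger never predicts correctly contributes zero to every average, so a tagger that collapses most tokens onto a few frequent tags can keep moderate precision on those tags while its recall and f1-measure fall sharply.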