A Corpus Based Transformation-Based Learning for Hausa Text Parts of Speech Tagging

Awwalu, Jamilu; Abdullahi, Saleh Elyakub; Evwiekpaefe, Abraham Eseoghene

DOI: http://dx.doi.org/10.12785/ijcds/100146


International Journal of Computing and Digital Systems (UOB Journals), Volume 10, Issue 01

ISSN: 2210-142X

Date: 2021-04-21

Abstract:

Part-of-speech (POS) tagging is a task within syntactic analysis in Natural Language Processing (NLP). It has long been an active research area, especially for languages such as English, Arabic, Mandarin, Czech, Bahasa Melayu, Wolof, and Igbo. Hausa belongs to the West Chadic languages and is spoken in parts of several countries, including Nigeria, Niger, Benin, Cameroon, Chad, Burkina Faso, Sudan, Congo, Ghana, and Togo. Despite its large number of speakers, however, Hausa lacks NLP resources such as POS taggers, which limits NLP research on the language in areas such as Information Retrieval, Machine Translation, and Word Sense Disambiguation. Different Machine Learning (ML) approaches have yielded varying performance in POS tagging, indicating that the choice of ML approach plays a critical role in tagger performance. In this study, we create a Hausa POS tagset, called the Hausa tagset (HTS), and apply three approaches to tagging Hausa text: Transformation-Based Learning (TBL) as a hybrid tagger, and Hidden Markov Model (HMM) and N-gram taggers as probabilistic taggers. In testing, the TBL tagger scored 64% precision, 52% recall, and 53% F1-measure, outperforming the HMM tagger, which scored 55%, 7%, and 5%, respectively. Against the N-gram taggers, TBL and the Unigram tagger both achieved 53% F1-measure, while the Bigram and Trigram taggers achieved 52%. On recall, TBL scored 6 percentage points higher than the Unigram tagger and 7 points higher than the Bigram and Trigram taggers. On precision, TBL scored lower than all three N-gram taggers: 64%, against 70% for the Unigram tagger and 69% for both the Bigram and Trigram taggers. Overall, TBL outperformed the HMM tagger on every metric and the Bigram and Trigram taggers on recall and F1-measure, but trailed all N-gram taggers on precision. TBL and the Unigram tagger reached the same F1-measure with a balanced trade-off: TBL exceeded the Unigram tagger by 6 points on recall, while the Unigram tagger exceeded TBL by 6 points on precision.
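The abstract names TBL as the hybrid tagger without describing its rule templates or its initial-state tagger. As a hedged illustration of the general technique (Brill-style error-driven rule learning), the sketch below assumes a most-frequent-tag initial tagger and a single hypothetical template, "change tag A to tag B when the previous tag is C"; it is not the authors' implementation. The toy corpus, tag names, default tag, and all function names are hypothetical placeholders, not HTS entries or Hausa data.

```python
from collections import Counter, defaultdict

START = "<S>"  # pseudo-tag used as the "previous tag" of the first token


def train_unigram(corpus):
    """Most-frequent-tag lexicon: the (assumed) initial-state tagger."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def initial_tags(words, lexicon, default="NOUN"):
    # Unknown words fall back to a hypothetical default tag.
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Apply rule (from_tag, to_tag, prev_tag) to every position at once,
    reading contexts from the tags as they were before this rule."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i, t in enumerate(tags):
        prev = tags[i - 1] if i > 0 else START
        if t == from_tag and prev == prev_tag:
            out[i] = to_tag
    return out


def score_rule(rule, tagged, gold):
    """Errors fixed minus correct tags broken, over the whole corpus."""
    from_tag, to_tag, prev_tag = rule
    score = 0
    for tags, ref in zip(tagged, gold):
        for i, t in enumerate(tags):
            prev = tags[i - 1] if i > 0 else START
            if t == from_tag and prev == prev_tag:
                if ref[i] == to_tag:
                    score += 1
                elif ref[i] == from_tag:
                    score -= 1
    return score


def learn_rules(corpus, lexicon, max_rules=10):
    words = [[w for w, _ in s] for s in corpus]
    gold = [[t for _, t in s] for s in corpus]
    tagged = [initial_tags(ws, lexicon) for ws in words]
    rules = []
    for _ in range(max_rules):
        # Instantiate the template once per remaining tagging error.
        candidates = set()
        for tags, ref in zip(tagged, gold):
            for i, (t, g) in enumerate(zip(tags, ref)):
                if t != g:
                    candidates.add((t, g, tags[i - 1] if i > 0 else START))
        if not candidates:
            break
        # Greedy step: keep the best-scoring rule (sorted for determinism).
        best = max(sorted(candidates), key=lambda r: score_rule(r, tagged, gold))
        if score_rule(best, tagged, gold) <= 0:
            break
        rules.append(best)
        tagged = [apply_rule(tags, best) for tags in tagged]
    return rules


def tag_sentence(words, lexicon, rules):
    """Initial tagging, then every learned rule in the order it was learned."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Hypothetical toy training data: (word, tag) pairs.
TRAIN = [
    [("a", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("they", "PRON"), ("run", "VERB"), ("home", "NOUN")],
    [("we", "PRON"), ("run", "VERB"), ("fast", "ADV")],
]

lexicon = train_unigram(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)                                       # [('VERB', 'NOUN', 'DET')]
print(tag_sentence(["a", "run"], lexicon, rules))  # ['DET', 'NOUN']
```

On this toy corpus the initial tagger labels "run" as VERB everywhere, and the learner keeps one rule that retags it as NOUN after a determiner. Practical TBL taggers typically combine many templates (neighbouring words and tags, affixes) and a minimum-score threshold; the single template here only keeps the example short.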
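Precision, recall, and F1-measure are reported for each tagger without stating how they were averaged over tags. The following sketch shows one common scheme, assumed here rather than taken from the paper: per-tag scores from true positives, false positives, and false negatives, followed by an unweighted (macro) mean. The tag names are hypothetical, and this sketch scores the precision of a tag that is never predicted as 0.

```python
from collections import Counter


def per_tag_scores(gold, predicted):
    """Precision, recall and F1 for each tag, from aligned tag sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # tag p was predicted where it was wrong
            fn[g] += 1  # gold tag g was missed
    scores = {}
    for t in set(gold) | set(predicted):
        prec = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        rec = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[t] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-tag (precision, recall, F1)."""
    if not scores:
        return (0.0, 0.0, 0.0)
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


gold = ["N", "N", "V", "ADJ"]
predicted = ["N", "V", "V", "N"]
scores = per_tag_scores(gold, predicted)
print(tuple(round(x, 3) for x in macro_average(scores)))  # (0.333, 0.5, 0.389)
```

With macro-averaging, the mean F1 (7/18, about 0.389, in this example) generally differs from the harmonic mean of the mean precision and mean recall (0.4 here), so averaged F1 values need not follow directly from averaged precision and recall.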
