Abstract:
Part-of-speech (POS) tagging, a branch of semantic analysis, is an active research area that has been studied for several languages, including English, Mandarin, Arabic, Czech, Bahasa Melayu, Igbo, and Wolof. Hausa is a West Chadic language spoken in parts of Nigeria, Niger, Cameroun, Benin, Burkina Faso, Chad, Congo, Sudan, Ghana, Togo, and much of North Africa. Despite this large number of speakers, Hausa suffers from a scarcity of Natural Language Processing (NLP) resources and POS taggers, which limits NLP research on the language in areas such as Machine Translation, Information Retrieval, and Word Sense Disambiguation. Different Machine Learning approaches to POS tagging have reported competitive results in terms of performance, that is, accuracy and speed. In this study, we model and implement a Hausa POS tagset called HTS, and apply Transformation-based Learning (TBL) as a hybrid tagger, and Hidden Markov Model (HMM) and N-gram taggers as non-hybrid taggers, to the POS tagging of Hausa. The goal of this corpus-based approach is to build a generic Hausa POS tagger with little or no dependence on the corpus domain. Test results for precision, recall, and accuracy show that the TBL tagger scored 64%, 52%, and 53% respectively, outperforming the HMM tagger, which scored 55%, 7%, and 5%. Compared with the N-gram taggers, the TBL and Unigram taggers both achieved an F1-measure of 53%, while the Bigram and Trigram taggers achieved 52%. On recall, the TBL tagger scored 6 percentage points more than the Unigram tagger and 7 percentage points more than the Bigram and Trigram taggers. On precision, the TBL tagger scored lowest at 64%, while the Unigram tagger achieved 70% and the Bigram and Trigram taggers both achieved 69%. Overall, the TBL tagger outperformed the HMM tagger on every metric and matched or exceeded the N-gram taggers on recall and F1-measure, but trailed them on precision. The TBL and Unigram taggers reached the same F1-measure with a balanced trade-off: TBL exceeded the Unigram tagger by 6 percentage points on recall, while the Unigram tagger exceeded TBL by 6 percentage points on precision.
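
To make the comparison above concrete, the following is a minimal sketch of a unigram tagger and a per-tag precision, recall, and F1 evaluation in plain Python. It is not the implementation used in this study: the function names (`train_unigram`, `tag`, `macro_scores`), the three-tag label set (`N`, `PRON`, `V`), the toy sentences, and the choice of macro averaging are all assumptions made for illustration, and the tags are not the HTS tagset.

```python
from collections import Counter, defaultdict

# Toy training data (illustrative only; tags are hypothetical, not HTS).
TRAIN = [
    [("Yaro", "N"), ("ya", "PRON"), ("ci", "V"), ("abinci", "N")],
    [("Yarinya", "N"), ("ta", "PRON"), ("sha", "V"), ("ruwa", "N")],
]

def train_unigram(tagged_sents):
    """Map each lowercased word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, t in sent:
            counts[word.lower()][t] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="N"):
    """Tag each word with its learned tag; unseen words get the default tag."""
    return [model.get(w.lower(), default) for w in words]

def macro_scores(gold, pred):
    """Return macro-averaged (precision, recall, F1) over all observed tags."""
    labels = sorted(set(gold) | set(pred))
    p_sum = r_sum = f_sum = 0.0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    n = len(labels)
    return p_sum / n, r_sum / n, f_sum / n

model = train_unigram(TRAIN)
gold = ["N", "PRON", "V"]                    # assumed gold tags for the test words
pred = tag(model, ["Malam", "ya", "tafi"])   # two of these words are unseen
print(macro_scores(gold, pred))
```

Under macro averaging, the F1 score is the mean of the per-tag F1 values, so it need not equal the harmonic mean of the averaged precision and averaged recall. Whether the figures reported here were averaged this way is not stated, so this sketch should be read only as one possible evaluation setup.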