Abstract:
Interest in Natural Language Processing (NLP) has grown rapidly over the past few decades, largely because it
provides tools to represent and analyze human language computationally. A key challenge in NLP is classifying a word
according to its meaning in a given context, a problem known as word-sense disambiguation (WSD). WSD arises in every
language, but it is especially challenging for North-East Indian languages because of the scarcity of digital resources.
This work addresses WSD in Bodo, a low-resource and text-sparse language, using an adapted Convolutional Neural Network
(CNN) model with an attention mechanism. Bodo is spoken predominantly in the northeastern region of India, and its data
requires careful consideration when constructing NLP models. An attention layer identifies the features most strongly
associated with a particular sense label, enabling the model to focus on the most informative parts of the input. The CNN
layer extracts local semantic features from sentences, which further helps capture subtle nuances of meaning. Test results
were promising: the proposed framework achieved an accuracy of 71.43% on a small dataset, demonstrating that a deep CNN
with soft attention is effective at inferring the meaning of words in Bodo. The study thus shows that NLP, using advanced
methodologies such as the CNN-Attention model, has considerable potential to overcome these challenges in low-resource languages.
By bringing powerful attention mechanisms and convolutional neural networks together, the model is better able to
capture fine-grained semantic differences, offering a glimpse of the possibilities for improved language processing tools in
Bodo and other similarly resource-limited languages.
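As an illustrative aside, the sketch below shows one way a CNN-with-soft-attention sense classifier of the kind described in the abstract could be wired up. The choice of PyTorch, the layer sizes, vocabulary size, and number of senses are assumptions made for illustration, not the configuration reported in this work.

```python
# Minimal sketch of a CNN + soft-attention sense classifier (illustrative only;
# hyperparameters and framework are assumptions, not the paper's actual setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNAttentionWSD(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, num_filters=64,
                 kernel_size=3, num_senses=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution extracts local semantic features from the sentence.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        # Soft attention scores each position's relevance to the sense label.
        self.attn = nn.Linear(num_filters, 1)
        self.classifier = nn.Linear(num_filters, num_senses)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed, seq)
        h = F.relu(self.conv(x)).transpose(1, 2)       # (batch, seq, filters)
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, seq, 1)
        context = (weights * h).sum(dim=1)             # attention-weighted sum
        return self.classifier(context)                # sense logits

# Example forward pass on a dummy batch of two 10-token sentences.
model = CNNAttentionWSD()
logits = model(torch.randint(1, 5000, (2, 10)))
print(logits.shape)  # torch.Size([2, 4])
```

In this sketch the attention weights form a distribution over token positions, so the sentence representation fed to the classifier emphasizes the tokens most indicative of the target word's sense, mirroring the role the abstract assigns to the attention layer.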