Abstract:
Part-of-speech (POS) tagging is the process of assigning an appropriate POS tag to each word in a natural language sentence, and it is a vital part of most natural language processing (NLP) applications. Compared with other languages, few studies have addressed NLP applications for the Arabic language. Recently, neural networks (NNs) and deep learning technologies have shown excellent results for some English and Latin NLP applications. For Arabic, however, the practice is still in its infancy, and more work is needed to determine whether neural technologies will lead to convincing results for Arabic NLP applications. In this paper, a long short-term memory (LSTM) model is used to investigate the effectiveness of NNs in Arabic NLP. The model is applied specifically to identify the POS tags of Arabic words and morphemes taken from the Quranic Arabic Corpus (QAC) dataset. QAC is a well-known gold-standard dataset prepared by researchers at the University of Leeds. The LSTM tagger achieved 99.72% accuracy for tagging morphemes and 99.18% for tagging words, while the Word2Vec tagger achieved 99.55% for tagging morphemes and 97.33% for tagging words.
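
The accuracy figures above can be read as the share of tokens (words or morphemes) that receive the correct tag. The abstract does not state how accuracy was computed, so the following is only a minimal sketch under that assumption. The function name `tagging_accuracy` and the toy tag sequences are hypothetical and are not taken from the paper or from QAC.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the percentage of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is token-level, pooled over all sentences.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence needs one predicted tag per gold tag")
        # Count positions where the predicted tag matches the gold tag.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)

    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total


# Toy, made-up tag sequences for two short sentences (illustrative only).
gold = [["P", "N", "PN"], ["V", "N"]]
predicted = [["P", "N", "N"], ["V", "N"]]

accuracy = tagging_accuracy(gold, predicted)
print(f"{accuracy:.2f}%")  # 4 of 5 tags match -> 80.00%
```

The same function would apply at either granularity: passing one tag per word gives word-level accuracy, and passing one tag per morpheme gives morpheme-level accuracy.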