Abstract:
"Most Arab users express their opinions on social media platforms in colloquial dialect, which produces a huge quantity of unstructured and ambiguous data. These data must be processed with sentiment analysis (SA) techniques to discover the polarity of opinions, which is useful to stakeholders. The Palestinian dialect poses many problems inherent to its nature, such as abbreviations and a lack of standardized rules for grammar and spelling. These issues make opinion extraction a very challenging task in the SA area. In this paper, a rule-based sentiment lexicon for the Palestinian dialect is proposed, with novel rules and concepts, to classify users' comments as positive, negative, or neutral. In addition, a grouping-terms technique is proposed for the preprocessing step to overcome issues specific to texts in the Palestinian dialect, such as a single word being written in different spellings (shapes) that share one meaning. A polarity lexicon for the Palestinian dialect was also created during the grouping-terms process. The proposed lexical classifier achieved better results with the grouping-terms technique than with stemming. It achieved an accuracy of 85% with two classes and 80.5% with three classes, which is considered very good performance for SA approaches. Our results show that the development of rules and a polarity lexicon for Palestinian dialect terms has good implications for further related studies."
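
To make the two ideas named in the abstract concrete, the following is a minimal Python sketch of grouping spelling variants under one canonical term and then scoring a comment against a polarity lexicon. It is an illustration under stated assumptions, not the authors' implementation: the transliterated entries in `VARIANT_GROUPS` and `POLARITY`, the function names `group_terms` and `classify`, and the single negation rule are all hypothetical, since the abstract does not list the actual lexicon entries or rules.

```python
# Toy data: illustrative transliterated entries only, NOT taken from the
# paper's lexicon. Each canonical term lists its assumed spelling variants.
VARIANT_GROUPS = {
    "mnih": ["mnih", "mnee7", "mni7", "mneeh"],
    "zift": ["zift", "zeft", "zfit"],
}

# Hypothetical polarity lexicon keyed by canonical term (+1 / -1).
POLARITY = {"mnih": 1, "zift": -1}

# Assumed rule: a negator flips the next sentiment-bearing term.
NEGATORS = {"mish", "mesh"}

# Reverse index: spelling variant -> canonical term.
VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in VARIANT_GROUPS.items()
    for variant in variants
}


def group_terms(tokens):
    """Replace each known spelling variant with its canonical term."""
    return [VARIANT_TO_CANONICAL.get(tok, tok) for tok in tokens]


def classify(comment):
    """Return 'positive', 'negative', or 'neutral' for a comment."""
    tokens = group_terms(comment.lower().split())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = POLARITY.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False  # the negator is consumed by this term
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["el film mnee7", "mish mni7", "el film"]:
        print(text, "->", classify(text))
```

In this sketch, grouping happens before lexicon lookup, so the lexicon needs only one entry per meaning rather than one per spelling. That is the property that distinguishes the approach from stemming, which strips affixes but does not merge unrelated spellings of the same word.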