Abstract:
Question Answering Systems (QAS) are designed to automatically return precise answers to user questions phrased in natural
language. Building a QAS for Arabic is particularly challenging because of the language’s rich and complex morphology. Text
representation is a critical step in natural language processing tasks such as information retrieval, text summarization, and
question answering. Compared with more traditional approaches such as bag-of-words and word embeddings, sentence embedding (SE)
representation has shown encouraging results.
In this study, we introduce a novel QA approach for the Arabic language that is based on passage retrieval and SE representation.
It consists of three steps: “Question classification and query formulation”, “Document and passage retrieval”, and “Answer
extraction”. We adopt the pre-trained AraBERT model to compute vector representations, which lets us capture implicit
semantics and the context of words within the text. Furthermore, to collect candidate passages for user questions, we investigate
a method for retrieving Arabic passages that combines the BM25 model, a query expansion process, and SE representation. The final
answer is extracted with a fine-tuned AraBERT model, which ranks the passages and extracts the most relevant ones. We run several
experiments on the CLEF and TREC datasets, following two different taxonomies. The results demonstrate the effectiveness of our approach