Abstract:
In recent years, the volume of unstructured digital text has risen sharply, owing to the ever-increasing use
of the internet. To extract knowledge from this text, the enormous amount of it must be organized
into different categories. This is why text classification is considered a critical task in NLP (Natural Language
Processing). This research proposes a hybrid model (C-LSTM-ATC) that combines the benefits of two deep learning models,
namely the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), to categorize Assamese
text, a topic that remains largely unexplored. A second hybrid model was also evaluated, combining LSTM with the
Support Vector Machine (LSTM-SVM). The C-LSTM-ATC model achieves an accuracy of 97.2%, while the
LSTM-SVM model achieves an accuracy of 92.2%, when tested on the prepared dataset. The model was trained using
768 Assamese text documents, and the test results show that the proposed C-LSTM-ATC model produces more
accurate classification and higher F1 scores than the LSTM-SVM model and than the CNN and LSTM models used separately.