Abstract:
Microblogging and social networking sites such as Twitter and Facebook now generate an enormous volume of data in everyday life. The data broadcast through microblogging can provide useful information in many situations if it is captured and analyzed properly and promptly. In a Smart City setting, automatically identifying event types from Twitter messages can contribute to situation awareness about the city and surface useful information for interested people. This work focuses on automatically categorizing microblogging data from a given location and on identifying the sentiment level within each category, to provide a better understanding of public needs and concerns. Because processing Twitter messages is challenging, we propose an algorithm that preprocesses them automatically. In our experiments, we used Twitter messages covering sixteen event types from one geo-location. After preprocessing, a Random Forest classifier automatically categorizes the tweets into the predefined event types. Applying sentiment analysis to the tweets in each category then reveals whether people are discussing that category in a negative or positive context, which provides valuable information for timely decision making when recommending local services. The results show that Random Forest outperforms Support Vector Machine and Naive Bayes classifiers, and that combining the sentiment score with the cosine similarity of event types provides a more detailed understanding of the identified public categories.
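As a rough illustration of how tweet preprocessing, cosine similarity to event types, and a sentiment score might fit together, the following minimal sketch may help. It is not the algorithm proposed in this work: the stopword list, the two event types with their keywords, the sentiment lexicon, and the function names `preprocess`, `cosine`, and `describe` are all hypothetical, and a simple keyword match stands in for the Random Forest classifier.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, event-type keywords, and sentiment lexicon;
# none of these come from the paper.
STOPWORDS = {"a", "an", "the", "is", "on", "at", "in", "to", "of", "and", "so"}
EVENT_KEYWORDS = {
    "traffic": ["traffic", "jam", "road", "accident"],
    "concert": ["concert", "music", "band", "stage"],
}
SENTIMENT = {"terrible": -1, "stuck": -1, "great": 1, "amazing": 1}


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, keep hashtag words, remove stopwords."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    tokens = re.findall(r"[a-z]+", text)        # '#traffic' becomes 'traffic'
    return [t for t in tokens if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def describe(tweet):
    """Return (closest event type, its cosine similarity, summed sentiment score)."""
    tokens = preprocess(tweet)
    vec = Counter(tokens)
    sims = {name: cosine(vec, Counter(words)) for name, words in EVENT_KEYWORDS.items()}
    best = max(sims, key=sims.get)
    score = sum(SENTIMENT.get(t, 0) for t in tokens)
    return best, sims[best], score
```

Calling `describe("Stuck in a terrible #traffic jam on Main Road @cityhall http://t.co/x")` pairs the closest hypothetical event type with a negative sentiment score, which is the kind of combined view described above.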