SGAKE: Semantic Graph-based Automatic Keyword Extraction from Hindi Text Documents

Joshi, Manju Lata; Mittal, Namita; Joshi, Nisheeth

doi:https://dx.doi.org/10.12785/ijcds/120130

International Journal of Computing and Digital Systems, Volume 12, Issue 01

dc.contributor.author	Joshi, Manju Lata
dc.contributor.author	Mittal, Namita
dc.contributor.author	Joshi, Nisheeth
dc.date.accessioned	2021-07-14T11:21:50Z
dc.date.available	2021-07-14T11:21:50Z
dc.date.issued	2021-07-14
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4290
dc.description.abstract	Automatic keyword extraction is the automated identification of terms that best describe the subject of a document. These terms may be key terms or key phrases that represent the most relevant information conveyed by the document. Keyword extraction techniques can be statistical, linguistic, machine learning based, graph-based, or a hybrid of any of these. Each approach has its strengths and limitations. This paper focuses on graph-based approaches, which rely on exploring network properties such as Degree, Structural Diversity Index, Strength, Clustering Coefficient, Neighborhood Size, PageRank, Closeness, Betweenness, Eigenvector Centrality, Hub, and Authority Score. In the proposed approach, the graph is constructed from semantic linkages between the terms in the document, extracted using Hindi WordNet as a background knowledge source. Fourteen different graph measures are then applied to extract the keywords. The experiments are conducted on a Hindi-language Tourism and Health data set. The results of the proposed approach are evaluated against the state-of-the-art approach TextRank as well as against human-annotated keywords. The results show that the closeness centrality measure yields better precision and recall than the other graph measures when matched against human-annotated keywords, while authority score proves to be a good measure for producing keywords that match TextRank. The experiments show that the proposed semantic graph-based approach performs better than the state-of-the-art approach TextRank. This paper also explores the correlation between different graph-theoretic measures using different correlation methods.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Automatic Keyword Extraction	en_US
dc.subject	Semantic Graph-based Keyword Extraction	en_US
dc.subject	Semantic Network	en_US
dc.subject	Hindi Text Documents	en_US
dc.subject	Hindi WordNet	en_US
dc.title	SGAKE: Semantic Graph-based Automatic Keyword Extraction from Hindi Text Documents	en_US
dc.identifier.doi	https://dx.doi.org/10.12785/ijcds/120130
dc.contributor.authorcountry	India	en_US
dc.contributor.authorcountry	India	en_US
dc.contributor.authorcountry	India	en_US
dc.contributor.authoraffiliation	Banasthali University & ISIM	en_US
dc.contributor.authoraffiliation	MNIT Jaipur	en_US
dc.contributor.authoraffiliation	Banasthali University	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
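
The abstract describes building a graph from semantic links between document terms and ranking terms by graph measures such as closeness centrality. The sketch below illustrates that general idea only; it is not the authors' implementation. The term list and the `related_pairs` list are hypothetical English placeholders standing in for relations that would be looked up in Hindi WordNet, and the closeness formula used here (scaled by the fraction of reachable nodes so that disconnected terms score low) is an assumption, since this record does not state which definition the paper uses.

```python
from collections import deque


def build_graph(terms, related_pairs):
    """Undirected graph: one node per term, one edge per semantically related pair."""
    graph = {t: set() for t in terms}
    for a, b in related_pairs:
        if a in graph and b in graph and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def closeness_centrality(graph, node):
    """Closeness of `node`, scaled by the fraction of nodes it can reach."""
    # Breadth-first search gives shortest-path distances in an unweighted graph.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in graph[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    reachable = len(dist) - 1
    if total == 0 or len(graph) <= 1:
        return 0.0  # isolated node or trivial graph
    return (reachable / total) * (reachable / (len(graph) - 1))


def top_keywords(graph, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = {n: closeness_centrality(graph, n) for n in graph}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical candidate terms and semantic links (placeholders, not real data).
terms = ["tourism", "travel", "temple", "hotel", "health"]
related_pairs = [
    ("tourism", "travel"),
    ("tourism", "temple"),
    ("tourism", "hotel"),
    ("travel", "hotel"),
]

graph = build_graph(terms, related_pairs)
print(top_keywords(graph, 3))
```

Swapping `closeness_centrality` for another measure named in the abstract (degree, betweenness, authority score, and so on) would leave the rest of the pipeline unchanged, which is presumably what makes comparing several measures on the same graph straightforward.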