Record Details

Automatic extraction of significant terms from the title and abstract of scientific papers using the machine learning algorithm: A multiple module approach

NOPR - NISCAIR Online Periodicals Repository

 
 
Field Value
 
Title Automatic extraction of significant terms from the title and abstract of scientific papers using the machine learning algorithm: A multiple module approach
 
Creator Mukherjee, Bhaskar
Majhi, Debasis
 
Subject Data mining
Title extraction
Natural Language Processing
YAKE
NLTK
Keyword Extraction-NLP
 
Description 33-40
Keyword extraction is the task of identifying the important terms or phrases that are most representative of a source
document. Although automatic extraction of keywords from titles is an old method, it has mainly been applied to
single web documents. Our approach differs from previous research on keyword extraction in several
aspects. For non-experts in a scientific field, understanding research trends is difficult. The purpose
of this study is to develop an automatic method of obtaining overviews of a scientific field for non-experts by capturing
research trends. This empirical study investigates significant-term extraction using Natural Language Processing (NLP) tools.
Our dataset comprised more than 15,000 titles saved in a .csv file, and scripts written in Python were used to compare how far
the significant terms of the scientific title corpus are similar to or different from the terms in the abstracts of the same
article corpus. A light-weight unsupervised keyword extractor, Yet Another Keyword Extractor (YAKE), was used to produce the
results. Based on our analysis, we conclude that these algorithms can also be used in other fields by non-experts
in those fields to extract significant words automatically and to understand trends. Our algorithm could help
reduce the labour-intensive manual indexing process.
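
The title-versus-abstract comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' scripts: the column names `title` and `abstract`, the helper names, the tiny stop-word list, and the use of Jaccard overlap as the similarity measure are all assumptions. The frequency-based `top_terms` function is only a stand-in for YAKE; with the `yake` package installed, a wrapper around `yake.KeywordExtractor().extract_keywords(text)` could presumably be passed as `extractor` instead.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
             "of", "on", "the", "to", "using", "with"}


def top_terms(text, k=10):
    """Stand-in extractor (NOT YAKE): the k most frequent non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return {term for term, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard overlap of two term sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_rows(csv_file, extractor=top_terms):
    """For each CSV row, return (shared terms, Jaccard score) for title vs abstract.

    Assumes hypothetical columns named 'title' and 'abstract'.
    """
    results = []
    for row in csv.DictReader(csv_file):
        title_terms = extractor(row["title"])
        abstract_terms = extractor(row["abstract"])
        results.append((title_terms & abstract_terms,
                        jaccard(title_terms, abstract_terms)))
    return results


# Made-up one-row sample standing in for the real .csv file.
SAMPLE = (
    "title,abstract\n"
    "Keyword extraction from scientific titles,"
    "Keyword extraction identifies important terms in scientific documents\n"
)

results = compare_rows(io.StringIO(SAMPLE))
for shared, score in results:
    print(sorted(shared), round(score, 3))
```

Keeping the extractor as a parameter means the overlap logic stays the same whichever keyword extraction method is plugged in.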
 
Date 2023-05-03T08:45:35Z
2023-05
 
Type Article
 
Identifier 0975-2404 (Online); 0972-5423 (Print)
http://nopr.niscpr.res.in/handle/123456789/61836
https://doi.org/10.56042/alis.v70i1.71272
 
Language en
 
Publisher NIScPR-CSIR, India
 
Source ALIS Vol.70(1) [March 2023]