Mold - A framework for entity extraction and summarization

Sarthak Tiwari; Bharat Goel; Srividya Bansal

doi:10.1109/ICSC.2020.00086

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

With the dawn of Wikis, the collaboration of countless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become necessary to create a framework that can serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more supporting applications. In short, the process still requires human intervention, which is what we aim to eliminate with the framework proposed in this paper. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples built from predefined predicates; these triples are provided to the machine users (autonomous agents). The generated triples are complemented with their own schema, thus enabling machine-based reasoning on the text. The second part, summarization, takes the triples generated by the previous step and creates natural language text from them, so that human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates will be stored separately in an XML-based syntax, which lets users add or update the schema and predicates as the need arises and makes it easy to adapt the framework to a specific domain.
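
As a rough illustration of the two-part flow described in the abstract (text to triples, then triples back to text, with predicates kept in an XML store), the following Python sketch may help. It is not the authors' implementation: in place of PoS taggers and a trained model it uses plain trigger-phrase matching, and the XML element names, the attribute names (`name`, `trigger`, `template`), the function names, and the example predicates are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical predicate store. The abstract only says predicates live in an
# XML-based syntax; this element/attribute layout is an assumption.
PREDICATES_XML = """
<predicates>
  <predicate name="bornIn" trigger="was born in" template="{s} was born in {o}."/>
  <predicate name="worksFor" trigger="works for" template="{s} works for {o}."/>
</predicates>
"""


def load_predicates(xml_text):
    """Parse the XML store into a list of (name, trigger, template) tuples."""
    root = ET.fromstring(xml_text)
    return [(p.get("name"), p.get("trigger"), p.get("template"))
            for p in root.findall("predicate")]


def extract_triple(sentence, predicates):
    """Return (subject, predicate, object) for the first trigger found, else None.

    Simplification: the sentence is split around a trigger phrase instead of
    being analysed with a PoS tagger.
    """
    text = sentence.strip().rstrip(".")
    for name, trigger, _ in predicates:
        marker = f" {trigger} "
        if marker in text:
            subject, obj = text.split(marker, 1)
            return (subject.strip(), name, obj.strip())
    return None


def summarize(triples, predicates):
    """Render triples back into natural language using each predicate's template."""
    templates = {name: template for name, _, template in predicates}
    return " ".join(templates[p].format(s=s, o=o) for s, p, o in triples)


predicates = load_predicates(PREDICATES_XML)
sentences = ["Ada Lovelace was born in London.", "Grace Hopper works for the Navy."]
triples = [t for t in (extract_triple(s, predicates) for s in sentences) if t]
summary = summarize(triples, predicates)
print(triples)
print(summary)
```

Because the predicates are read from the XML text at run time, adapting this sketch to another domain would only require editing the `<predicate>` entries, which mirrors the adaptability the abstract attributes to the XML-based predicate store.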

Original language	English (US)
Title of host publication	Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	445-450
Number of pages	6
ISBN (Electronic)	9781728163321
DOIs	https://doi.org/10.1109/ICSC.2020.00086
State	Published - Feb 2020
Event	14th IEEE International Conference on Semantic Computing, ICSC 2020, San Diego, United States; Duration: Feb 3 2020 → Feb 5 2020

Publication series

Name	Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020

Conference

Conference	14th IEEE International Conference on Semantic Computing, ICSC 2020
Country/Territory	United States
City	San Diego
Period	2/3/20 → 2/5/20

Keywords

Entity extraction
Entity summarization
Natural language processing
RDF triples
Semantic web

ASJC Scopus subject areas

Artificial Intelligence
Computer Science Applications
Computer Vision and Pattern Recognition

Access to Document

10.1109/ICSC.2020.00086

Cite this

Tiwari, S., Goel, B., & Bansal, S. (2020). Mold - A framework for entity extraction and summarization. In Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020 (pp. 445-450). Article 9031507 (Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSC.2020.00086

Mold - A framework for entity extraction and summarization. / Tiwari, Sarthak; Goel, Bharat; Bansal, Srividya.
Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020. Institute of Electrical and Electronics Engineers Inc., 2020. p. 445-450 9031507 (Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020).

Tiwari, S, Goel, B & Bansal, S 2020, Mold - A framework for entity extraction and summarization. in Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020., 9031507, Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020, Institute of Electrical and Electronics Engineers Inc., pp. 445-450, 14th IEEE International Conference on Semantic Computing, ICSC 2020, San Diego, United States, 2/3/20. https://doi.org/10.1109/ICSC.2020.00086

@inproceedings{a705c90017d44c2da23f316e373b7f7b,

title = "Mold - A framework for entity extraction and summarization",

abstract = "With the dawn of Wikis, the collaboration of countless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become necessary to create a framework that can serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more supporting applications. In short, the process still requires human intervention, which is what we aim to eliminate with the framework proposed in this paper. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples built from predefined predicates; these triples are provided to the machine users (autonomous agents). The generated triples are complemented with their own schema, thus enabling machine-based reasoning on the text. The second part, summarization, takes the triples generated by the previous step and creates natural language text from them, so that human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates will be stored separately in an XML-based syntax, which lets users add or update the schema and predicates as the need arises and makes it easy to adapt the framework to a specific domain.",

keywords = "Entity extraction, Entity summarization, Natural language processing, RDF triples, Semantic web",

author = "Sarthak Tiwari and Bharat Goel and Srividya Bansal",

year = "2020",

month = feb,

doi = "10.1109/ICSC.2020.00086",

language = "English (US)",

series = "Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "445--450",

booktitle = "Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020",

note = "14th IEEE International Conference on Semantic Computing, ICSC 2020 ; Conference date: 03-02-2020 Through 05-02-2020",

}

TY - GEN

T1 - Mold - A framework for entity extraction and summarization

AU - Tiwari, Sarthak

AU - Goel, Bharat

AU - Bansal, Srividya

PY - 2020/2

Y1 - 2020/2

N2 - With the dawn of Wikis, the collaboration of countless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become necessary to create a framework that can serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more supporting applications. In short, the process still requires human intervention, which is what we aim to eliminate with the framework proposed in this paper. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples built from predefined predicates; these triples are provided to the machine users (autonomous agents). The generated triples are complemented with their own schema, thus enabling machine-based reasoning on the text. The second part, summarization, takes the triples generated by the previous step and creates natural language text from them, so that human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates will be stored separately in an XML-based syntax, which lets users add or update the schema and predicates as the need arises and makes it easy to adapt the framework to a specific domain.

KW - Entity extraction

KW - Entity summarization

KW - Natural language processing

KW - RDF triples

KW - Semantic web

UR - http://www.scopus.com/inward/record.url?scp=85083451579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85083451579&partnerID=8YFLogxK

U2 - 10.1109/ICSC.2020.00086

DO - 10.1109/ICSC.2020.00086

M3 - Conference contribution

AN - SCOPUS:85083451579

T3 - Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020

SP - 445

EP - 450

BT - Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 14th IEEE International Conference on Semantic Computing, ICSC 2020

Y2 - 3 February 2020 through 5 February 2020

ER -
