Mold - A framework for entity extraction and summarization

Sarthak Tiwari, Bharat Goel, Srividya Bansal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

With the dawn of wikis, the collaboration of countless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become necessary to create a framework that can serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more supporting applications. In short, the process still requires human intervention, and that is what the framework proposed in this paper aims to eliminate. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples made up of predefined predicates, which are provided to the machine users (autonomous agents). The generated triples are also complemented with their own schema, thus enabling machine-based reasoning on the text. The second part, summarization, takes the triples generated by the previous step and creates natural language text from them, so that the human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates are stored separately in an XML-based syntax, giving users the ability to add or update the schema and predicates as the need arises and enabling easy adaptation of the framework to a specific domain.
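The round trip described in the abstract (text to triples over predefined predicates, then triples back to text) can be illustrated with a toy sketch. This is not Mold's implementation: the XML predicate format, the trigger-word predicate lookup, the proper-noun heuristic for subject and object, and every function name below are hypothetical simplifications, whereas the paper trains on a large English corpus to choose the subject, object, and most probable predicate. The input is assumed to be already PoS-tagged with Penn Treebank tags, as an off-the-shelf tagger would produce.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Hypothetical XML predicate store. The element and attribute names are
# illustrative assumptions, not the syntax used by Mold.
PREDICATE_XML = """
<predicates>
  <predicate name="capitalOf" trigger="capital" template="{s} is the capital of {o}."/>
  <predicate name="authorOf" trigger="wrote" template="{s} wrote {o}."/>
</predicates>
"""

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def load_predicates(xml_text: str) -> dict:
    """Map each trigger word to its (predicate name, rendering template)."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("trigger"): (p.get("name"), p.get("template"))
            for p in root.iter("predicate")}

def proper_noun_phrases(tagged):
    """Group consecutive NNP/NNPS tokens into multi-word names."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag.startswith("NNP"):
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

def extract_triple(tagged, predicates) -> Optional[Triple]:
    """Naive heuristic: first name is the subject, last name is the object,
    and the first word matching a schema trigger selects the predicate."""
    names = proper_noun_phrases(tagged)
    trigger = next((w.lower() for w, _ in tagged if w.lower() in predicates), None)
    if len(names) < 2 or trigger is None:
        return None
    return Triple(names[0], predicates[trigger][0], names[-1])

def summarize(triples, predicates) -> str:
    """Render triples back into sentences using each predicate's template."""
    templates = {name: tmpl for name, tmpl in predicates.values()}
    return " ".join(templates[t.predicate].format(s=t.subject, o=t.obj)
                    for t in triples)

preds = load_predicates(PREDICATE_XML)
t1 = extract_triple([("Paris", "NNP"), ("is", "VBZ"), ("the", "DT"),
                     ("capital", "NN"), ("of", "IN"), ("France", "NNP"),
                     (".", ".")], preds)
t2 = extract_triple([("Jane", "NNP"), ("Austen", "NNP"), ("wrote", "VBD"),
                     ("Emma", "NNP"), (".", ".")], preds)
print(t1)  # Triple(subject='Paris', predicate='capitalOf', obj='France')
print(summarize([t1, t2], preds))  # Paris is the capital of France. Jane Austen wrote Emma.
```

Keeping the predicates in an external XML document, rather than hard-coding them, mirrors the abstract's point that users can add or update predicates to adapt the framework to a new domain: adding a `<predicate>` element extends both extraction and summarization without code changes.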

Original language: English (US)
Title of host publication: Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 445-450
Number of pages: 6
ISBN (Electronic): 9781728163321
State: Published - Feb 2020
Event: 14th IEEE International Conference on Semantic Computing, ICSC 2020 - San Diego, United States
Duration: Feb 3 2020 to Feb 5 2020

Publication series

Name: Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020

Conference

Conference: 14th IEEE International Conference on Semantic Computing, ICSC 2020
Country/Territory: United States
City: San Diego
Period: 2/3/20 to 2/5/20

Keywords

  • Entity extraction
  • Entity summarization
  • Natural language processing
  • RDF triples
  • Semantic web

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
