TY - GEN
T1 - Mold - A framework for entity extraction and summarization
AU - Tiwari, Sarthak
AU - Goel, Bharat
AU - Bansal, Srividya
PY - 2020/2
Y1 - 2020/2
N2 - With the dawn of Wikis, the collaboration of endless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become a necessity to create a framework that could serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more applications. In short, the process still requires human intervention, and that is what we aim to eliminate using the framework suggested in this paper. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples made up of predefined predicates, which are provided to the machine users (autonomous agents). The generated triples are also complemented with their own schema, thus enabling machine-based reasoning on the text. The other part, summarization, takes the triples generated by the previous step and creates natural language text out of them so that the human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates will be stored separately in an XML-based syntax, giving users the ability to add or update the schema and predicates as the need arises and enabling easy adaptation of the framework to a specific domain.
AB - With the dawn of Wikis, the collaboration of endless users, and the rise of automated agents scraping through the internet in search of actionable information, it has become a necessity to create a framework that could serve as a translator between these two parties. Currently, moderators or developers serve this purpose by manually adding the required structure to natural language input, assisted by one or more applications. In short, the process still requires human intervention, and that is what we aim to eliminate using the framework suggested in this paper. The framework is divided into two parts. The first is a natural language processing module that uses state-of-the-art Part-of-Speech (PoS) taggers and our own algorithms to convert natural language text into triples made up of predefined predicates, which are provided to the machine users (autonomous agents). The generated triples are also complemented with their own schema, thus enabling machine-based reasoning on the text. The other part, summarization, takes the triples generated by the previous step and creates natural language text out of them so that the human users of the system can access the information without any knowledge of the underlying triples and schema. The framework is trained on a large corpus of English text to optimally find the subject and the object of a given sentence along with the most probable predicate. The predicates will be stored separately in an XML-based syntax, giving users the ability to add or update the schema and predicates as the need arises and enabling easy adaptation of the framework to a specific domain.
KW - Entity extraction
KW - Entity summarization
KW - Natural language processing
KW - RDF triples
KW - Semantic web
UR - http://www.scopus.com/inward/record.url?scp=85083451579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083451579&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2020.00086
DO - 10.1109/ICSC.2020.00086
M3 - Conference contribution
AN - SCOPUS:85083451579
T3 - Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020
SP - 445
EP - 450
BT - Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Semantic Computing, ICSC 2020
Y2 - 3 February 2020 through 5 February 2020
ER -