TY - JOUR
T1 - Semantic similarity measure of natural language text through machine learning and a keyword-aware cross-encoder-ranking summarizer—A case study using UCGIS GIS&T body of knowledge
AU - Tian, Yuanyuan
AU - Li, Wenwen
AU - Wang, Sizhe
AU - Gu, Zhining
N1 - Publisher Copyright: © 2023 John Wiley & Sons Ltd.
PY - 2023/6
Y1 - 2023/6
AB - Initiated by the University Consortium of Geographic Information Science (UCGIS), the GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics related to geographic information science and technologies (GIS&T). In recent years, GIS&T BoK has undergone rigorous development in terms of its topic re-organization and content updating, resulting in a new digital version of the project. While the BoK topics provide useful materials for researchers and students to learn about GIS, the semantic relationships among the topics, such as semantic similarity, should also be identified so that a better and automated topic navigation can be achieved. Currently, the related topics are either defined manually by editors or authors, which may result in an incomplete assessment of topic relationships. To address this challenge, our research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text, including both deep neural networks and traditional machine learning approaches. Besides, a novel text summarization—KACERS (Keyword-Aware Cross-Encoder-Ranking Summarizer)—is proposed to generate a semantic summary of scientific publications. By identifying the semantic linkages among key topics, this work guides the future development and content organization of the GIS&T BoK project. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications and demonstrates the potential of the KACERS summarizer in semantic understanding of long text documents.
UR - http://www.scopus.com/inward/record.url?scp=85158162452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85158162452&partnerID=8YFLogxK
U2 - 10.1111/tgis.13059
DO - 10.1111/tgis.13059
M3 - Article
AN - SCOPUS:85158162452
SN - 1361-1682
VL - 27
SP - 1068
EP - 1089
JO - Transactions in GIS
JF - Transactions in GIS
IS - 4
ER -