Automated Paragraph Detection Using Cohesion Network Analysis

Robert Mihai Botarleanu, Mihai Dascalu, Scott Andrew Crossley, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The ability to express yourself concisely and coherently is a crucial skill, both for academic purposes and professional careers. An important aspect to consider in writing is an adequate segmentation of ideas, which in turn requires a proper understanding of where to place paragraph breaks. However, these decisions are often made intuitively, with little systematicity in sequencing ideas. Thus, an automated method of detecting the optimal hierarchical structure of texts using quantifiable features could be a valuable tool for learners. Here, we aim to define a framework grounded in Cohesion Network Analysis to establish the structure of a text by modeling paragraphs as clusters of sentences. The analogy to clustering enables us to identify paragraph breaks that maximize inter-paragraph separation while ensuring high intra-paragraph cohesion. Our approach consists of two steps applied to texts without paragraph breaks. First, the number of paragraphs is automatically inferred with an absolute error of 1.02 using a Recurrent Neural Network, which relies on text features and cohesion flow. Second, paragraph splits are detected using two algorithms: top-k, which selects the largest cohesion gaps between adjacent utterances, and divisive clustering, which iteratively splits the text into paragraphs. Silhouette scores are used to assess performance, and the obtained values denote adequately inferred structures.
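The top-k step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that cohesion between adjacent sentences can be approximated by the cosine similarity of their sentence embeddings, and that a "cohesion gap" is a position where that similarity is lowest. The function names (`cosine`, `top_k_breaks`, `split_paragraphs`) and the toy two-dimensional embeddings are hypothetical; in the paper the number of paragraphs is inferred by a Recurrent Neural Network, whereas here it is simply passed in.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_breaks(embeddings, n_paragraphs):
    """Return sorted indices i where a paragraph break goes before sentence i.

    Assumption: cohesion between adjacent sentences is approximated by
    cosine similarity, so the largest gaps are the lowest similarities.
    """
    n_breaks = max(0, min(n_paragraphs - 1, len(embeddings) - 1))
    cohesion = [
        cosine(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    # Pick the n_breaks weakest links between adjacent sentences.
    weakest = sorted(range(len(cohesion)), key=lambda i: cohesion[i])[:n_breaks]
    return sorted(i + 1 for i in weakest)


def split_paragraphs(sentences, breaks):
    """Cut the sentence list at the given break indices."""
    bounds = [0, *breaks, len(sentences)]
    return [sentences[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy data (hypothetical): five sentences with hand-made 2-D embeddings.
sentences = ["A1", "A2", "B1", "B2", "C1"]
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0)]

breaks = top_k_breaks(embeddings, n_paragraphs=3)
print(breaks)                               # [2, 4]
print(split_paragraphs(sentences, breaks))  # [['A1', 'A2'], ['B1', 'B2'], ['C1']]
```

Selecting the weakest adjacent links is a single pass over the text and is independent of the order in which breaks are chosen; the divisive clustering alternative mentioned in the abstract would instead re-evaluate candidate splits after each cut.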

Original language: English (US)
Title of host publication: Polyphonic Construction of Smart Learning Ecosystems - Proceedings of the 7th Conference on Smart Learning Ecosystems and Regional Development
Editors: Mihai Dascalu, Patrizia Marti, Francesca Pozzi
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 14
ISBN (Print): 9789811952395
State: Published - 2023
Event: 7th International Conference on Smart Learning Ecosystems and Regional Development, SLERD 2022 - Bucharest, Romania
Duration: Jul 5 2022 → Jul 6 2022

Publication series

Name: Smart Innovation, Systems and Technologies
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026


Conference: 7th International Conference on Smart Learning Ecosystems and Regional Development, SLERD 2022


  • Clustering
  • Cohesion network analysis
  • Paragraph marking
  • Sentence embeddings

ASJC Scopus subject areas

  • General Decision Sciences
  • General Computer Science

