Table detection via probability optimization

Yalin Wang, Ihsin T. Phillips, Robert M. Haralick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations


In this paper, we define the table detection problem as a probability optimization problem. As in our previous algorithm, we begin by finding and validating each detected table candidate. We then compute a set of probability measurements for each of the table entities. The computation of these measurements takes into consideration tables, table text separators, and table neighboring text blocks. An iterative updating method is then used to optimize the page segmentation probability and obtain the final result. This new algorithm shows a great improvement over our previous algorithm. The training and testing data set for the algorithm includes 1,125 document pages having 518 table entities and a total of 10,934 cell entities. Compared with our previous work, it raised the accuracy rate to 95.67% from 90.32% and to 97.05% from 92.04%.
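The iterative updating step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the probability model, the function names `page_log_prob` and `iterative_update`, and the `overlap_penalty` context term are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of starting from an initial labeling of table candidates and repeatedly flipping one label at a time whenever the flip raises the overall page probability.

```python
import math


def page_log_prob(labels, table_prob, overlap_penalty):
    """Log-probability of one labeling under a toy model (an assumption).

    labels[i] is True if candidate i is kept as a table.
    table_prob[i] is a hypothetical per-candidate table probability in (0, 1).
    overlap_penalty maps a pair (i, j) to a cost paid when both are kept.
    """
    total = 0.0
    for i, is_table in enumerate(labels):
        p = table_prob[i] if is_table else 1.0 - table_prob[i]
        total += math.log(p)
    # Hypothetical context term standing in for neighbor/separator evidence.
    for (i, j), penalty in overlap_penalty.items():
        if labels[i] and labels[j]:
            total -= penalty
    return total


def iterative_update(table_prob, overlap_penalty, max_iters=20):
    """Greedy coordinate ascent: flip one label at a time while it helps."""
    labels = [p >= 0.5 for p in table_prob]  # initial labeling
    best = page_log_prob(labels, table_prob, overlap_penalty)
    for _ in range(max_iters):
        improved = False
        for i in range(len(labels)):
            labels[i] = not labels[i]  # tentative flip
            score = page_log_prob(labels, table_prob, overlap_penalty)
            if score > best + 1e-12:
                best = score  # keep the flip
                improved = True
            else:
                labels[i] = not labels[i]  # undo the flip
        if not improved:
            break  # local optimum reached
    return labels, best


if __name__ == "__main__":
    # Three made-up candidates; candidates 0 and 1 conflict with each other.
    labels, best = iterative_update([0.9, 0.6, 0.2], {(0, 1): 2.0})
    print(labels, round(best, 4))
```

In this toy run the second candidate starts out labeled as a table but is dropped, because keeping it alongside the stronger first candidate costs more than it gains. A greedy update of this kind reaches only a local optimum, so the result depends on the initial labeling.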

Original language: English (US)
Title of host publication: Document Analysis Systems V - 5th International Workshop, DAS 2002, Proceedings
Editors: Daniel Lopresti, Jianying Hu, Ramanujan Kashi
Publisher: Springer Verlag
Number of pages: 11
ISBN (Print): 3540440682, 9783540440680
State: Published - 2002
Externally published: Yes
Event: 5th International Workshop on Document Analysis Systems, DAS 2002 - Princeton, United States
Duration: Aug 19 2002 to Aug 21 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 5th International Workshop on Document Analysis Systems, DAS 2002
Country/Territory: United States

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

