Support Estimation with Sampling Artifacts and Errors

Eli Chien, Olgica Milenkovic, Angelia Nedich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations


The problem of estimating the support of a distribution is of great importance in many areas of machine learning, computer science and molecular biology. Almost all existing work in this area assumes perfectly accurate sampling, an assumption that seldom holds in practice. Here we introduce the first known theoretical approach to support estimation in the presence of sampling artifacts, where each sample is assumed to be observed through a Poisson channel that simultaneously captures repetitions and deletions. The proposed estimator is based on regularized weighted Chebyshev approximations, with weights governed by evaluations of Touchard (Bell) polynomials. The supports in the presence of sampling artifacts are calculated via discretized semi-infinite programming methods. The newly proposed estimation approach is tested on synthetic and textual data, as well as on GISAID data for the purpose of estimating the mutational diversity of genes in the SARS-CoV-2 viral genome. In all experiments performed, our integrated method significantly outperformed suitably modified versions of known noiseless support estimation methods.
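As a rough illustration of why Touchard polynomials are natural companions to a Poisson channel, the sketch below evaluates T_n(x) = sum_k S(n, k) x^k, where S(n, k) are Stirling numbers of the second kind, and compares it against the empirical n-th moment of Poisson draws (a standard identity: the n-th moment of a Poisson variable with mean x equals T_n(x)). This is not the estimator from the paper; the function names `touchard`, `poisson_draw` and `empirical_moment` are hypothetical, and how the paper actually turns these evaluations into weights is not specified in the abstract.

```python
import math
import random


def touchard(n: int, x: float) -> float:
    """Evaluate the Touchard polynomial T_n(x) = sum_k S(n, k) * x**k."""
    # Build row n of the Stirling numbers of the second kind using
    # S(i, k) = k * S(i-1, k) + S(i-1, k-1), starting from S(0, 0) = 1.
    row = [1]
    for i in range(1, n + 1):
        new = [0] * (i + 1)
        for k in range(1, i + 1):
            above = row[k] if k < len(row) else 0
            new[k] = k * above + row[k - 1]
        row = new
    return sum(s * x ** k for k, s in enumerate(row))


def poisson_draw(rate: float, rng: random.Random) -> int:
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def empirical_moment(n: int, rate: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[X**n] for X ~ Poisson(rate)."""
    rng = random.Random(seed)
    return sum(poisson_draw(rate, rng) ** n for _ in range(trials)) / trials


if __name__ == "__main__":
    # Assumed toy setting: a count of 0 plays the role of a deletion,
    # a count above 1 plays the role of repetitions.
    rate = 1.5
    print("T_2(rate) exact:    ", touchard(2, rate))
    print("T_2(rate) simulated:", empirical_moment(2, rate, 20000))
```

Evaluating at x = 1 recovers the Bell numbers, which is where the alternative name comes from: `[touchard(n, 1) for n in range(6)]` gives `[1, 1, 2, 5, 15, 52]`.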

Original language: English (US)
Title of host publication: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781538682098
State: Published - Jul 12 2021
Event: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Virtual, Melbourne, Australia
Duration: Jul 12 2021 → Jul 20 2021

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095


Conference: 2021 IEEE International Symposium on Information Theory, ISIT 2021
City: Virtual, Melbourne

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

