Removing data with noisy responses in regression analysis

Alan Wisler, Visar Berisha, Karthikeyan Ramamurthy, Andreas Spanias, Julie Liss

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In regression analysis, outliers in the data can induce a bias in the learned function, resulting in larger errors. In this paper we derive an empirically estimable bound on the regression error based on a Euclidean minimum spanning tree generated from the data. Using this bound as motivation, we propose an iterative approach to remove data with noisy responses from the training set. We evaluate the performance of the algorithm on experiments with real-world pathological speech (speech from individuals with neurogenic disorders). Comparative results show that removing noisy examples during training using the proposed approach yields better predictive performance on out-of-sample data.
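The abstract does not spell out the bound or the removal rule, so the following is only a minimal sketch of the general idea of using a Euclidean minimum spanning tree to flag training points with suspicious responses. It is not the authors' algorithm. The scoring rule (mean absolute response gap between a point and its tree neighbors), the fixed removal budget `n_remove`, and all function names are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_edges(X):
    """Return the edges (i, j) of a Euclidean minimum spanning tree over the rows of X.

    Note: SciPy treats zero entries as missing edges, so exact duplicate
    rows in X should be merged or jittered beforehand.
    """
    dist = squareform(pdist(X))                 # dense pairwise Euclidean distances
    tree = minimum_spanning_tree(dist).tocoo()  # sparse matrix holding the tree edges
    return list(zip(tree.row, tree.col))


def neighbor_disagreement(X, y):
    """Assumed score: mean absolute response gap between a point and its MST neighbors."""
    total = np.zeros(len(y))
    degree = np.zeros(len(y))
    for i, j in mst_edges(X):
        gap = abs(y[i] - y[j])
        total[i] += gap
        total[j] += gap
        degree[i] += 1
        degree[j] += 1
    return total / np.maximum(degree, 1)


def remove_noisy(X, y, n_remove):
    """Iteratively drop the highest-scoring point, rebuilding the tree each round.

    Returns the indices (into the original arrays) of the points that are kept.
    """
    keep = np.arange(len(y))
    for _ in range(n_remove):
        scores = neighbor_disagreement(X[keep], y[keep])
        keep = np.delete(keep, np.argmax(scores))
    return keep
```

A regression model would then be fit on `X[keep]` and `y[keep]`. In practice the stopping point would need to be chosen by some criterion, such as held-out error, rather than a fixed budget.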

Original language: English (US)
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2066-2070
Number of pages: 5
ISBN (Electronic): 9781467369978
State: Published - Aug 4 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: Apr 19 2015 to Apr 24 2015

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2015-August
ISSN (Print): 1520-6149

Other

Other: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country/Territory: Australia
City: Brisbane
Period: 4/19/15 to 4/24/15

Keywords

  • Friedman-Rafsky statistic
  • minimum spanning tree
  • noisy data
  • outlier removal
  • robust regression

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
