Abstract
Clustering sequences is important in a variety of applications, including the development of nonredundant databases, function prediction, and the identification of gene expression patterns. Current clustering methods rely on a prealignment as supplementary information to guide the construction of clusters. This chapter introduces a novel algorithm for clustering nucleotide and peptide sequences. The algorithm is a no-reference approach that uses only the sequences themselves as input. We also introduce a novel metric that describes the relationship between biological sequences and serves as the distance measure for clustering. Results are presented for real biological sequences, comparing the proposed algorithm with similar available tools.
Original language | English (US) |
---|---|
Title of host publication | Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology |
Subtitle of host publication | Algorithms and Software Tools |
Publisher | Elsevier Inc. |
Pages | 203-220 |
Number of pages | 18 |
ISBN (Electronic) | 9780128026465 |
ISBN (Print) | 9780128025086 |
State | Published - Aug 7 2015 |
Keywords
- Biological sequences
- Clustering
- Databases
- Graph cuts
- Hashing
- Nucleotide
- Peptide
ASJC Scopus subject areas
- Computer Science (all)