A new protein motif extraction framework based on constrained co-clustering
CORDERO, Francesca; VISCONTI, Alessia; BOTTA, Marco
2009-01-01
Abstract
Signal finding (pattern discovery) in biological sequences is a fundamental problem in both computer science and molecular biology. From a biological point of view, knowing which motifs are the most frequent in a given set of sequences is only partially informative. In general, one can identify different subsets of sequences, each characterized by different motifs. Superimposing a single set of frequent motifs on the whole dataset is often an oversimplification that fails to expose important information. We propose a de novo framework that simultaneously builds partitions of protein sequences and groups of associated patterns. In this way we are able to identify a richer set of motifs, each one possibly characterizing only some of the sequences in the dataset.