983 resultados para ¹H and 13C spectral data
Resumo:
In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.
Resumo:
Complex networks have been extensively used in the last decade to characterize and analyze complex systems, and they have been recently proposed as a novel instrument for the analysis of spectra extracted from biological samples. Yet, the high number of measurements composing spectra, and the consequent high computational cost, make a direct network analysis unfeasible. We here present a comparative analysis of three customary feature selection algorithms, including the binning of spectral data and the use of information theory metrics. Such algorithms are compared by assessing the score obtained in a classification task, where healthy subjects and people suffering from different types of cancers should be discriminated. Results indicate that a feature selection strategy based on Mutual Information outperforms the more classical data binning, while allowing a reduction of the dimensionality of the data set in two orders of magnitude
Resumo:
Molecular and fragment ion data of intact 8- to 43-kDa proteins from electrospray Fourier-transform tandem mass spectrometry are matched against the corresponding data in sequence data bases. Extending the sequence tag concept of Mann and Wilm for matching peptides, a partial amino acid sequence in the unknown is first identified from the mass differences of a series of fragment ions, and the mass position of this sequence is defined from molecular weight and the fragment ion masses. For three studied proteins, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags. However, three of the data base proteins differed by having an extra methionine or by missing an acetyl or heme substitution. The positions of these modifications in the protein examined were greatly restricted by the mass differences of its molecular and fragment ions versus those of the data base. To characterize the primary structure of an unknown represented in the data base, this method is fast and specific and does not require prior enzymatic or chemical degradation.