986 results for Structural classification
Abstract:
In this paper, mixed spectral-structural kernel machines are proposed for the classification of very-high-resolution images. The simultaneous use of multispectral and structural features (computed using morphological filters) allows a significant increase in the classification accuracy of remote sensing images. Subsequently, weighted summation kernel support vector machines are proposed and applied in order to take into account the multiscale nature of the scene considered. Such classifiers use the Mercer property of kernel matrices to compute a new kernel matrix accounting simultaneously for two scale parameters. Tests on a Zurich QuickBird image show the relevance of the proposed method: using the mixed spectral-structural features, the classification accuracy increases by about 5%, achieving a Kappa index of 0.97. The proposed multikernel approach provides an overall accuracy of 98.90% with a related Kappa index of 0.985.
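The weighted summation kernel mentioned in this abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the Gaussian RBF base kernels, the mixing weight `mu`, and the names `sigma_spec` and `sigma_struct` (standing in for the two scale parameters) are all assumptions. The sketch relies on the fact that a non-negative weighted sum of Mercer kernels is again a Mercer kernel.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel between two feature vectors (assumed base kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def weighted_sum_kernel(x_spec, y_spec, x_struct, y_struct,
                        mu, sigma_spec, sigma_struct):
    """Convex combination of a spectral and a structural kernel.

    mu weights the spectral kernel and (1 - mu) the structural one.
    Both weights are non-negative, so the sum stays a valid Mercer kernel.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    k_spec = rbf_kernel(x_spec, y_spec, sigma_spec)
    k_struct = rbf_kernel(x_struct, y_struct, sigma_struct)
    return mu * k_spec + (1.0 - mu) * k_struct
```

Evaluating `weighted_sum_kernel` over all pairs of pixels would give a single kernel matrix that a standard support vector machine could consume. How `mu` and the two bandwidths are tuned is not stated in the abstract.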
Abstract:
Hardness is a property widely used in material specifications, mechanical and metallurgical research, and quality control of several materials. Specifically for timber, Janka hardness is a simple, quick and easy test, with good correlations with the compression parallel to grain strength, a strong reference in structural classification for this material. More recently, international studies have reported the use of Brinell hardness for timber assessment, which retains the advantages previously mentioned for Janka hardness and is easier to perform in the field, especially because of the lower magnitude of the loads involved. A first generation of a device for field evaluation of hardness in wood - Portable Hardness tester for wood - based on Brinell hardness has already been developed by the Research Group on Forest Products from FCA/UNESP, Brazil, with very good correlations between the evaluated hardness and several other mechanical properties of the material when performing tests with different species of native and reforested wood (traditionally used as ties - sleepers - in railways). This paper presents results obtained in the experimental program with the first generation of this equipment and preliminary tests with its second generation, which uses accelerometers to replace the indentation measurements in wood. For the first generation of the equipment, functional and calibration tests were carried out using 16 native and reforestation timber lots, among them E. citriodora, E. tereticornis, E. saligna, E. urophylla, E. grandis, Goupia glabra and Bagassa guianensis, with different origins and ages. The results obtained confirm its potential in the classification of specimens, with inclusion errors varying from 4.5% to 16.6%.
Abstract:
The parasitic bacterium Mycoplasma genitalium has a small, reduced genome with close to a basic set of genes. As a first step toward determining the families of protein domains that form the products of these genes, we have used the multiple sequence programs psi-blast and geanfammer to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. PDB sequences (274) match all of 106 M. genitalium sequences and some parts of another 85; thus, 41% of its total sequences are matched in all or part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the structural classification of proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by using pairwise sequence comparisons. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain one domain and the rest have combinations of two or more domains.
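The 41% coverage figure follows directly from the counts given in this abstract. The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
# Counts taken from the abstract above.
total_sequences = 467      # gene products of M. genitalium
fully_matched = 106        # sequences matched over their whole length
partly_matched = 85        # sequences matched over some parts only

matched = fully_matched + partly_matched
coverage = matched / total_sequences

print(f"{matched} of {total_sequences} sequences matched: {coverage:.0%}")
```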
Abstract:
A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms were done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provided additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not present in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison with other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins.
Abstract:
INTRODUCTION: The authors describe the human schistosomal granuloma in the late chronic phase from the morphological and evolutionary viewpoints. METHODS: The study was based on a histological analysis of two fragments obtained from a surgical biopsy of the peritoneum and large intestine of a 42-year-old patient with a pseudotumoral form mimicking peritoneal carcinomatosis, associated with the hepatointestinal form of schistosomiasis. RESULTS: Two hundred and three granulomas were identified in the pseudotumor and 27 in the intestinal biopsy, with similar morphological features, most in the late chronic phase, in fibrotic healing. A new structural classification was suggested for granulomas: zone 1 (internal), 2 (intermediate) and 3 (external). CONCLUSIONS: Regarding the granuloma as a whole, we may conclude that fibrosis is likely to be controlled by different and independent mechanisms in the three zones of the granuloma. Lamellar fibrosis in zone 3 seems to be controlled by matrix mesenchymal cells (fibroblasts and myoepithelial cells) and by inflammatory exudate cells (lymphocytes, plasmocytes, neutrophils, eosinophils). Annular fibrosis in zone 2, comprising a dense fibrous connective tissue with few cells in the advanced phase, would be controlled by epithelioid cells involving zone 1 in recent granulomas. In zone 1, replacing periovular necrosis, an initially loose and tracery connective neoformation, housing stellate cells or cells with fusiform nuclei, gives way to a dense paucicellular nodular connective tissue, probably induced by fibroblasts. In several granulomas, one of the zones is missing and the granuloma is represented by two of them (Z3 and Z2, Z3 and Z1, or Z2 and Z1) and, ultimately, by a scar.
Abstract:
Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods have been chosen because they represent different paradigms of learning and have been widely used in the Bioinformatics literature. Aiming to improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balance (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. In order to evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured using the mean classification error rate on independent test sets. These means are compared, two by two, by a hypothesis test in order to evaluate whether there is a statistically significant difference between them.
With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, a superior or similar performance when compared to that achieved by the individual classifiers used, especially Boosting with Decision Tree and StackingC with Linear Regression as meta-classifier. The Voting method, despite its simplicity, has been shown to be adequate for solving the problem presented in this work. The techniques for class balance, on the other hand, have not produced a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique has been shown to be more appropriate.
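Of the class-balancing techniques listed in this abstract, random undersampling is the simplest to illustrate. The sketch below is a minimal version under stated assumptions, not the procedure used in the work: it discards randomly chosen examples until every class has as many examples as the smallest one. The function name and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Balance classes by randomly keeping n_min examples per class,
    where n_min is the size of the smallest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    n_min = min(len(items) for items in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, items in by_class.items():
        # rng.sample draws without replacement, so no example is duplicated.
        for sample in rng.sample(items, n_min):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels
```

Undersampling should be applied to the training folds only. Balancing the test folds as well would distort the error rates that the cross-validation is meant to estimate.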
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Latex is the main product extracted from rubber trees (Hevea brasiliensis). In Brazil, at the end of the latex production cycle, the wood of the rubber tree is traditionally used for energy purposes, but several international studies have reported consolidated practices for adding value to it. The objective of this paper was to evaluate the quality of the wood and classify it structurally based on its mechanical properties. Six 20-year-old trees of the rubber tree clone GT 1 from Itajobi, State of Sao Paulo, Brazil, were sampled. Reduced-dimension specimens in the radial direction of the wood were produced to evaluate the quality by compression parallel to the grain, static bending and Janka hardness tests. Two specimens, one from the lower log (from the ground up to breast height) and one from the higher log (from breast height up to 2.50 m), were produced for structural classification of the wood based on the characteristic strength in compression parallel to the grain (NBR 7190 norm, 1997). The wood was classified as class C40 (fc0k ≥ 40 MPa). Results revealed that the strength was not statistically different in the radial direction (except for the Janka hardness), though tending to increase from pith to bark.
Abstract:
Graduate Program in Agronomy (Energy in Agriculture) - FCA
Abstract:
Graduate Program in Psychology - FCLAS
Abstract:
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. Moreover, the agreement between the P values that we derive from this distribution and those reported by standard programs (e.g., blast and fasta) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be distantly related shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure, whereas many pairs have significant similarity in terms of structure but not sequence.
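The step from a comparison score to a P value can be sketched with the standard Gumbel (type I extreme-value) form, whose cumulative distribution is exp(-exp(-z)) with z = (score - location) / scale. This is an illustration only: the abstract does not give the exact parametrisation or the fitted values, so `location` and `scale` below are assumed to come from a separate fit to the all-vs.-all scores.

```python
import math


def gumbel_p_value(score, location, scale):
    """Probability that a score at least this high occurs by chance,
    assuming scores follow a Gumbel distribution for maxima."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (score - location) / scale
    if z < -700.0:
        # exp(-z) would overflow; the P value is 1 to machine precision.
        return 1.0
    # Survival function 1 - exp(-exp(-z)), written with expm1 so that
    # very small P values keep their precision.
    return -math.expm1(-math.exp(-z))
```

For scores far above the location parameter the P value decays roughly as exp(-z), which is why high-scoring comparisons receive very small P values.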
Abstract:
Scorpion toxins are common experimental tools for studies of biochemical and pharmacological properties of ion channels. The number of functionally annotated scorpion toxins is steadily growing, but the number of identified toxin sequences is increasing at a much faster pace. With an estimated 100,000 different variants, bioinformatic analysis of scorpion toxins is becoming a necessary tool for their systematic functional analysis. Here, we report a bioinformatics-driven system involving scorpion toxin structural classification, functional annotation, database technology, sequence comparison, nearest neighbour analysis, and decision rules, which produces highly accurate predictions of scorpion toxin functional properties. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Kernel methods provide a convenient way to apply a wide range of learning techniques to complex and structured data by shifting the representational problem from one of finding an embedding of the data to that of defining a positive semidefinite kernel. One problem with the most widely used kernels is that they neglect the locational information within the structures, resulting in less discrimination. Correspondence-based kernels, on the other hand, are in general more discriminating, at the cost of sacrificing positive-definiteness due to their inability to guarantee transitivity of the correspondences between multiple graphs. In this paper we generalize a recent structural kernel based on the Jensen-Shannon divergence between quantum walks over the structures by introducing a novel alignment step which, rather than permuting the nodes of the structures, aligns the quantum states of their walks. This results in a novel kernel that maintains localization within the structures, but still guarantees positive definiteness. Experimental evaluation validates the effectiveness of the kernel for several structural classification tasks. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
Recent studies have shown that features extracted from brain MRIs can discriminate well between Alzheimer’s disease and Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods to find the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies were then used to solve a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the predictive power of features has not yet been investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) IntraCranial Volume (ICV) normalization can lead to overfitting and worsen the prediction accuracy on the test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machine-based wrapper improves the accuracy of binary classification.
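The one-versus-one strategy mentioned in this abstract turns a set of binary classifiers into a multi-class one: each pair of classes gets its own classifier, and the class collecting the most pairwise votes wins. The sketch below assumes plain majority voting, with the binary classifiers passed in as ordinary callables. The function name and the `pairwise` dictionary are hypothetical, and the abstract does not say how ties are resolved.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Predict a class for x by majority vote over pairwise classifiers.

    pairwise maps each pair (a, b), in the order produced by
    itertools.combinations(classes, 2), to a callable that returns
    either a or b for the input x.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = pairwise[(a, b)](x)
        votes[winner] += 1
    # On a tie, Counter.most_common keeps the class that was voted for first.
    return votes.most_common(1)[0][0]
```

With k classes this needs k(k - 1)/2 binary classifiers, so a three-class problem uses three of them.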
Abstract:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant, as the accuracy is often worsened when compared to a Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement in classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.