19 resultados para Classification, Decimal
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
Epidendrum L. is the largest genus of Orchidaceae in the Neotropical region; it has an impressive morphological diversification, which imposes difficulties in delimitation of both infrageneric and interspecific boundaries. In this study, we review infrageneric boundaries within the subgenus Amphiglottium and try to contribute to the understanding of morphological diversification and taxa delimitation within this group. We tested the monophyly of the subgenus Amphiglottium sect. Amphiglottium, expanding previous phylogenetic investigations and reevaluated previous infrageneric classifications proposed. Sequence data from the trnL-trnF region were analyzed with both parsimony and maximum likelihood criteria. AFLP markers were also obtained and analyzed with phylogenetic and principal coordinate analyses. Additionally, we obtained chromosome numbers for representative species within the group. The results strengthen the monophyly of the subgenus Amphiglottium but do not support the current classification system proposed by previous authors. Only section Tuberculata comprises a well-supported monophyletic group, with sections Carinata and Integra not supported. Instead of morphology, biogeographical and ecological patterns are reflected in the phylogenetic signal in this group. This study also confirms the large variability of chromosome numbers for the subgenus Amphiglottium (numbers ranging from 2n = 24 to 2n = 240), suggesting that polyploidy and hybridization are probably important mechanisms of speciation within the group.
Resumo:
Predictive performance evaluation is a fundamental issue in design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to its simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
Resumo:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Resumo:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
Resumo:
Credit scoring modelling comprises one of the leading formal tools for supporting the granting of credit. Its core objective consists of the generation of a score by means of which potential clients can be listed in the order of the probability of default. A critical factor is whether a credit scoring model is accurate enough in order to provide correct classification of the client as a good or bad payer. In this context the concept of bootstraping aggregating (bagging) arises. The basic idea is to generate multiple classifiers by obtaining the predicted values from the fitted models to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. In this paper we propose a new bagging-type variant procedure, which we call poly-bagging, consisting of combining predictors over a succession of resamplings. The study is derived by credit scoring modelling. The proposed poly-bagging procedure was applied to some different artificial datasets and to a real granting of credit dataset up to three successions of resamplings. We observed better classification accuracy for the two-bagged and the three-bagged models for all considered setups. These results lead to a strong indication that the poly-bagging approach may promote improvement on the modelling performance measures, while keeping a flexible and straightforward bagging-type structure easy to implement. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Extending our previous work `Fields on the Poincare group and quantum description of orientable objects` (Gitman and Shelepin 2009 Eur. Phys. J. C 61 111-39), we consider here a classification of orientable relativistic quantum objects in 3 + 1 dimensions. In such a classification, one uses a maximal set of ten commuting operators (generators of left and right transformations) in the space of functions on the Poincare group. In addition to the usual six quantum numbers related to external symmetries (given by left generators), there appear additional quantum numbers related to internal symmetries (given by right generators). Spectra of internal and external symmetry operators are interrelated, which, however, does not contradict the Coleman-Mandula no-go theorem. We believe that the proposed approach can be useful for the description of elementary spinning particles considered as orientable objects. In particular, it gives a group-theoretical interpretation of some facts of the existing phenomenological classification of spinning particles.
Resumo:
In this paper, we present a study on a deterministic partially self-avoiding walk (tourist walk), which provides a novel method for texture feature extraction. The method is able to explore an image on all scales simultaneously. Experiments were conducted using different dynamics concerning the tourist walk. A new strategy, based on histograms. to extract information from its joint probability distribution is presented. The promising results are discussed and compared to the best-known methods for texture description reported in the literature. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Shape provides one of the most relevant information about an object. This makes shape one of the most important visual attributes used to characterize objects. This paper introduces a novel approach for shape characterization, which combines modeling shape into a complex network and the analysis of its complexity in a dynamic evolution context. Descriptors computed through this approach show to be efficient in shape characterization, incorporating many characteristics, such as scale and rotation invariant. Experiments using two different shape databases (an artificial shapes database and a leaf shape database) are presented in order to evaluate the method. and its results are compared to traditional shape analysis methods found in literature. (C) 2009 Published by Elsevier B.V.
Resumo:
Differently from theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network has relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements, by studying the possibility of using agglomerative clustering methods, in order to obtain a set of functional groups of nodes, considering particular institutional collaboration network nodes, including various known communities (departments of the University of Sao Paulo). Among the interesting obtained findings, we emphasize the scale-free nature of the network obtained, as well as identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, contrariwise to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 1 14215 domains, 2178 Homologous superfamilies and 1110 fold groups. We have assigned 20 330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28 064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a `Protein Chart` and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, similar to 4% of superfamilies are very diverse and these are the superfamilies that are most highly populated in both the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and structural overlaps between superfamilies can now be downloaded from the CATH website.
Resumo:
Bothropasin is a 48 kDa hemorrhagic PIII snake venom metalloprotease (SVMP) isolated from Bothrops jararaca, containing disintegrin/cysteine-rich adhesive domains. Here we present the crystal structure of bothropasin complexed with the inhibitor POL647. The catalytic domain consists of a scaffold of two subdomains organized similarly to those described for other SVMPs, including the zinc and calcium-binding sites. The free cysteine residue Cys(189) is located within a hydrophobic core and it is not available for disulfide bonding or other interactions. There is no identifiable secondary structure for the disintegrin domain, but instead it is composed mostly of loops stabilized by seven disulfide bonds and by two calcium ions. The ECD region is in a loop and is structurally related to the RGD region of RGD disintegrins, which are derived from I`ll SVMPs. The ECD motif is stabilized by the Cys(117)_Cys(310) disulfide bond (between the disintegrin and cysteine-rich domains) and by one calcium ion. The side chain of Glu(276) of the ECD motif is exposed to solvent and free to make interactions. In bothropasin, the HVR (hyper-variable region) described for other Pill SVMPs in the cysteine-rich domain, presents a well-conserved sequence with respect to several other Pill members from different species. We propose that this subset be referred to as PIII-HCR (highly conserved region) SVMPs. The differences in the disintegrin-like, cysteine-rich or disintegrin-like cysteine-rich domains may be involved in selecting target binding, which in turn could generate substrate diversity or specificity for the catalytic domain. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
In this paper we present a novel approach for multispectral image contextual classification by combining iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach to combine two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, to regularize the solution in the presence of noisy data. Hence, the classification problem is stated according to a Maximum a Posteriori (MAP) framework. In order to approximate the MAP solution we apply several combinatorial optimization methods using multiple simultaneous initializations, making the solution less sensitive to the initial conditions and reducing both computational cost and time in comparison to Simulated Annealing, often unfeasible in many real image processing applications. Markov Random Field model parameters are estimated by Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustments in the choice of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen`s Kappa coefficient), allowing a robust and accurate statistical analysis. The obtained results clearly show that combining sub-optimal contextual algorithms significantly improves the classification performance, indicating the effectiveness of the proposed methodology. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Let n >= 3. We classify the finite groups which are realised as subgroups of the sphere braid group B(n)(S(2)). Such groups must be of cohomological period 2 or 4. Depending on the value of n, we show that the following are the maximal finite subgroups of B(n)(S(2)): Z(2(n-1)); the dicyclic groups of order 4n and 4(n - 2); the binary tetrahedral group T*; the binary octahedral group O*; and the binary icosahedral group I(*). We give geometric as well as some explicit algebraic constructions of these groups in B(n)(S(2)) and determine the number of conjugacy classes of such finite subgroups. We also reprove Murasugi`s classification of the torsion elements of B(n)(S(2)) and explain how the finite subgroups of B(n)(S(2)) are related to this classification, as well as to the lower central and derived series of B(n)(S(2)).
Resumo:
We classify all unital subalgebras of the Cayley algebra O(q) over the finite field F(q), q = p(n). We obtain the number of subalgebras of each type and prove that all isomorphic subalgebras are conjugate with respect to the automorphism group of O(q). We also determine the structure of the Moufang loops associated with each subalgebra of O(q).
Resumo:
We classify the ( finite and infinite) virtually cyclic subgroups of the pure braid groups P(n)(RP(2)) of the projective plane. The maximal finite subgroups of P(n)(RP(2)) are isomorphic to the quaternion group of order 8 if n = 3, and to Z(4) if n >= 4. Further, for all n >= 3, the following groups are, up to isomorphism, the infinite virtually cyclic subgroups of P(n)(RP(2)): Z, Z(2) x Z and the amalgamated product Z(4)*(Z2)Z(4).