35 results for Feature Classification
Abstract:
Feature selection is one of the most important and frequently used techniques in data preprocessing. It can improve the efficiency and effectiveness of data mining by reducing the dimension of the feature space and removing irrelevant and redundant information. Feature selection can be viewed as a global optimization problem: finding a minimum set of M relevant features that describes the dataset as well as the original N attributes do. In this paper, we apply the adaptive partitioned random search strategy to our feature selection algorithm. Under this search strategy, a partition structure and an evaluation function are proposed for the feature selection problem. The algorithm ensures the global optimal solution in theory and avoids complete randomness in the search direction. The good properties of the algorithm are shown through theoretical analysis.
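As a rough illustration of the idea of partitioning the subset space and sampling within each region, the following Python sketch may help. It is not the authors' algorithm: it is a simplified, greedy variant that splits the space on one feature at a time (kept or dropped), draws random subsets in each of the two regions, and commits to the region whose best sample scores higher. The function names, the sample budget, and the toy evaluation function are all assumptions made for this example, and the sketch carries no global-optimality guarantee.

```python
import random

def partitioned_random_search(n_features, score, samples_per_region=20, seed=0):
    """Simplified partitioned random search over feature subsets (illustrative only)."""
    rng = random.Random(seed)
    fixed = {}  # feature index -> True (kept) or False (dropped)
    best_subset = frozenset(range(n_features))  # start from the full feature set
    best_score = score(best_subset)

    def sample(region):
        # Random subset that respects the decisions already fixed in this region.
        return frozenset(
            i for i in range(n_features)
            if (region[i] if i in region else rng.random() < 0.5)
        )

    for feature in range(n_features):
        region_best = {}
        for keep in (True, False):
            # Partition: one region keeps the feature, the other drops it.
            region = dict(fixed)
            region[feature] = keep
            top = float("-inf")
            for _ in range(samples_per_region):
                subset = sample(region)
                value = score(subset)
                top = max(top, value)
                if value > best_score:
                    best_subset, best_score = subset, value
            region_best[keep] = top
        # Commit to the more promising region before partitioning further.
        fixed[feature] = region_best[True] > region_best[False]
    return best_subset, best_score

# Hypothetical evaluation function: reward covering the relevant features,
# penalise subset size so that smaller subsets are preferred.
RELEVANT = frozenset({0, 3})

def toy_score(subset):
    return len(subset & RELEVANT) - 0.1 * len(subset)

if __name__ == "__main__":
    print(partitioned_random_search(6, toy_score))
```

In practice the evaluation function would be computed from data (for example, a consistency or classifier-based measure); the toy score above only stands in for it.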
Abstract:
The phylogenetic relationships of members of Eudorylini (Diptera: Pipunculidae: Pipunculinae) were explored. Two hundred and fifty-seven species of Eudorylini from all biogeographical regions and all known genera were examined. Sixty species were included in an exemplar-based phylogeny for the tribe. Two new genera are described, Clistoabdominalis and Dasydorylas. The identity of Eudorylas Aczél, the type genus for Eudorylini, has been obscure since its inception. The genus is re-diagnosed and a proposal to stabilize the genus and tribal names is discussed. An illustrated key to the genera of Pipunculidae is presented and all Eudorylini genera are diagnosed. Numerous new generic synonyms are proposed. Moriparia nigripennis Kozánek & Kwon is preoccupied by Congomyia nigripennis Hardy when both are transferred to Claraeola, so Cla. koreana Skevington is proposed as a new name for Mo. nigripennis.
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
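The gene-ranking step described above can be sketched in a few lines of Python. This is not EMMIX-GENE itself: for brevity it fits normal components rather than the t components used in the paper, it uses a basic EM loop with a median-split start, and it omits the additional threshold on cluster size. The function names, the variance floor, and the threshold value are assumptions chosen for the example.

```python
import math

VAR_FLOOR = 1e-3  # assumed floor that keeps a component from collapsing onto one point

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def loglik_one(xs):
    """Maximised log-likelihood of a single normal component."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, VAR_FLOOR)
    return sum(normal_logpdf(x, mean, var) for x in xs)

def loglik_two(xs, iters=200):
    """Log-likelihood of a two-component normal mixture fitted by EM."""
    n = len(xs)
    s = sorted(xs)
    lo, hi = s[: n // 2], s[n // 2:]
    m1, m2 = sum(lo) / len(lo), sum(hi) / len(hi)  # start from a median split
    mean = sum(xs) / n
    v1 = v2 = max(sum((x - mean) ** 2 for x in xs) / n, VAR_FLOOR)
    pi = 0.5

    def log_parts(x):
        return (math.log(pi) + normal_logpdf(x, m1, v1),
                math.log(1 - pi) + normal_logpdf(x, m2, v2))

    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        r = []
        for x in xs:
            a, b = log_parts(x)
            top = max(a, b)
            ea, eb = math.exp(a - top), math.exp(b - top)
            r.append(ea / (ea + eb))
        # M-step: update the mixing proportion, means and variances.
        n1 = min(max(sum(r), 1e-9), n - 1e-9)
        n2 = n - n1
        pi = min(max(n1 / n, 1e-6), 1 - 1e-6)
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = max(sum(ri * (x - m1) ** 2 for ri, x in zip(r, xs)) / n1, VAR_FLOOR)
        v2 = max(sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, xs)) / n2, VAR_FLOOR)

    total = 0.0
    for x in xs:
        a, b = log_parts(x)
        top = max(a, b)
        total += top + math.log(math.exp(a - top) + math.exp(b - top))
    return total

def lr_statistic(xs):
    """-2 log(lambda) for one versus two components."""
    return 2.0 * (loglik_two(xs) - loglik_one(xs))

def rank_genes(expression):
    """Return (gene, statistic) pairs in order of increasing statistic."""
    return sorted(((g, lr_statistic(xs)) for g, xs in expression.items()),
                  key=lambda pair: pair[1])

def select_genes(expression, threshold):
    """Keep the genes whose statistic exceeds the (assumed) threshold."""
    return [g for g, stat in rank_genes(expression) if stat > threshold]
```

Here `expression` is assumed to map each gene name to its expression values across the tissue samples; genes whose values split into two well-separated groups receive a large statistic and are retained.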
Abstract:
The vascular and bryophyte floras of subantarctic Heard Island were classified using cluster analysis into six vegetation communities: Open Cushion Carpet, Mossy Feldmark, Wet Mixed Herbfield, Coastal Biotic Vegetation, Saltspray Vegetation, and Closed Cushion Carpet. Multidimensional scaling indicated that the vegetation communities were not well delineated but were continua. Discriminant analysis and a classification tree identified altitude, wind, peat depth, bryophyte cover and extent of bare ground, and particle size as discriminating variables. The combination of small area, glaciation, and harsh climate has resulted in reduced vegetation variety in comparison to those subantarctic islands north of the Antarctic Polar Front Zone. Some of the functional groups and vegetation communities found on warmer subantarctic islands are not present on Heard Island, notably ferns and sedges and fernbrakes and extensive mires, respectively.
Abstract:
Development of a unified classification system to replace four of the systems currently used in disability athletics (i.e., track and field) has been widely advocated. The definition and purpose of classification, underpinned by taxonomic principles and collectively endorsed by relevant disability sport organizations, have not been developed but are required for successful implementation of a unified system. It is posited that the International Classification of Functioning, Disability and Health (ICF), published by the World Health Organization (2001), and current disability athletics systems are, fundamentally, classifications of the functioning and disability associated with health conditions and are highly interrelated. A rationale for basing a unified disability athletics system on the ICF is established. Following taxonomic analysis of the current systems, the definition and purpose of a unified disability athletics classification are proposed and discussed. The proposed taxonomic framework and definitions have implications for other disability sport classification systems.