930 results for classification algorithm
Abstract:
The phylogenetic relationships of members of Eudorylini (Diptera: Pipunculidae: Pipunculinae) were explored. Two hundred and fifty-seven species of Eudorylini from all biogeographical regions and all known genera were examined. Sixty species were included in an exemplar-based phylogeny for the tribe. Two new genera are described, Clistoabdominalis and Dasydorylas. The identity of Eudorylas Aczél, the type genus for Eudorylini, has been obscure since its inception. The genus is re-diagnosed and a proposal to stabilize the genus and tribal names is discussed. An illustrated key to the genera of Pipunculidae is presented and all Eudorylini genera are diagnosed. Numerous new generic synonyms are proposed. Moriparia nigripennis Kozánek & Kwon is preoccupied by Congomyia nigripennis Hardy when both are transferred to Claraeola, so Cla. koreana Skevington is proposed as a new name for Mo. nigripennis.
Abstract:
A new algorithm, PfAGSS, for predicting 3' splice sites in Plasmodium falciparum genomic sequences is described. Application of this program to the published P. falciparum chromosome 2 and 3 data suggests that existing programs result in a high error rate in assigning 3' intron boundaries.
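The abstract does not detail how PfAGSS works. As general background only, the sketch below shows a minimal scanner for candidate 3' acceptor sites (the canonical AG dinucleotide preceded by a pyrimidine-rich tract); the window size and threshold are hypothetical and this is not the PfAGSS algorithm.

```python
# Illustrative sketch only: locate candidate 3' splice (acceptor) sites by the
# canonical AG dinucleotide preceded by a pyrimidine-rich tract. This is a
# generic heuristic, not PfAGSS; window and threshold values are hypothetical.
def candidate_acceptor_sites(seq, window=12, min_pyrimidine_frac=0.6):
    seq = seq.upper()
    hits = []
    for i in range(window, len(seq) - 1):
        if seq[i:i + 2] == "AG":
            tract = seq[i - window:i]
            pyr = sum(base in "CT" for base in tract) / window
            if pyr >= min_pyrimidine_frac:
                hits.append((i, pyr))  # position of the A in AG, pyrimidine fraction
    return hits

if __name__ == "__main__":
    example = "ATTTTCTTTTTCAGGATGAAATTTAAATTTTTTCTTCCTTAGGTT"
    print(candidate_acceptor_sites(example))
```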
Abstract:
Motivation: This paper introduces the software EMMIX-GENE, developed specifically for a model-based approach to the clustering of microarray expression data, in particular of tissue samples measured on a very large number of genes. This is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach first selects a subset of genes relevant to the clustering of the tissue samples: mixtures of t distributions are fitted to each gene, and the genes are ranked in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. A threshold on the likelihood ratio statistic, used in conjunction with a threshold on cluster size, allows the selection of a relevant set of genes. Even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, so mixtures of factor analyzers are used to effectively reduce the dimension of the gene feature space. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
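A minimal sketch of the gene-selection step described above, with Gaussian mixtures (scikit-learn) standing in for the mixtures of t distributions used by EMMIX-GENE; the threshold value and the synthetic data are illustrative assumptions, not the paper's settings.

```python
# Sketch of the gene-selection idea: for each gene, compare a one-component vs
# two-component mixture fit across tissues and rank genes by the likelihood
# ratio statistic. Gaussian mixtures stand in for the t mixtures used by
# EMMIX-GENE; the threshold is hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

def gene_lr_statistic(expr_one_gene):
    x = expr_one_gene.reshape(-1, 1)
    ll1 = GaussianMixture(n_components=1).fit(x).score(x) * len(x)
    ll2 = GaussianMixture(n_components=2, n_init=5).fit(x).score(x) * len(x)
    return 2.0 * (ll2 - ll1)  # -2 log lambda for 1 vs 2 components

def select_genes(expr_matrix, threshold=8.0):
    # expr_matrix: genes x tissues
    stats = np.array([gene_lr_statistic(row) for row in expr_matrix])
    return np.argsort(stats)[::-1], stats > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(50, 40))          # 50 genes, 40 tissues (synthetic)
    expr[:5, :20] += 3.0                      # five genes with a two-group pattern
    order, keep = select_genes(expr)
    print("top genes:", order[:5], "selected:", keep.sum())
```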
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
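A minimal simulated-annealing sketch of the idea above: propose point edits to a candidate consensus and accept moves that reduce the summed edit distance to the input sequences, occasionally accepting uphill moves with Boltzmann probability. The move set, distance function, and cooling schedule here are simplified assumptions, not the paper's algorithm.

```python
# Sketch of consensus finding by simulated annealing: minimize the sum of edit
# distances from a candidate consensus to the input sequences. The move set,
# cooling schedule, and parameters are illustrative assumptions.
import math, random

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def consensus_sa(seqs, alphabet="ACGT", steps=2000, t0=2.0, seed=0):
    rng = random.Random(seed)
    cand = list(max(seqs, key=len))            # start from the longest input
    cost = sum(edit_distance("".join(cand), s) for s in seqs)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-6
        new = cand[:]
        new[rng.randrange(len(new))] = rng.choice(alphabet)   # point substitution
        new_cost = sum(edit_distance("".join(new), s) for s in seqs)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cand, cost = new, new_cost
    return "".join(cand), cost

if __name__ == "__main__":
    print(consensus_sa(["ACGTACGT", "ACGTTCGT", "ACCTACGT"]))
```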
Abstract:
The vascular and bryophyte floras of subantarctic Heard Island were classified using cluster analysis into six vegetation communities: Open Cushion Carpet, Mossy Feldmark, Wet Mixed Herbfield, Coastal Biotic Vegetation, Saltspray Vegetation, and Closed Cushion Carpet. Multidimensional scaling indicated that the vegetation communities were not well delineated but formed continua. Discriminant analysis and a classification tree identified altitude, wind, peat depth, bryophyte cover, extent of bare ground, and particle size as discriminating variables. The combination of small area, glaciation, and harsh climate has resulted in reduced vegetation variety compared with the subantarctic islands north of the Antarctic Polar Front Zone. Some of the functional groups and vegetation communities found on warmer subantarctic islands are absent from Heard Island: notably ferns and sedges among the functional groups, and fernbrakes and extensive mires among the communities.
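As a hedged illustration of the cluster-then-classify analysis pattern described (cluster the plots, then fit a classification tree to see which environmental variables separate the clusters), the sketch below uses scikit-learn on synthetic stand-in data; the plot data and variable values are placeholders, not the Heard Island measurements.

```python
# Sketch of the analysis pattern: cluster vegetation plots, then use a
# classification tree to identify which environmental variables discriminate
# the clusters. Data and values are synthetic placeholders.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
variables = ["altitude", "wind", "peat_depth", "bryophyte_cover", "bare_ground", "particle_size"]
X = rng.normal(size=(120, len(variables)))      # 120 synthetic plots

clusters = AgglomerativeClustering(n_clusters=6).fit_predict(X)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, clusters)
importance = sorted(zip(variables, tree.feature_importances_), key=lambda t: -t[1])
print(importance)  # variables ranked by how well they separate the clusters
```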
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
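The paper's own estimator is not reproduced here; as a hedged illustration of maximum-likelihood fitting of an alignment score distribution, the sketch below fits an extreme value (Gumbel) distribution to simulated unrelated-sequence scores with SciPy and converts an observed score to a p-value. The simulated scores and parameter values are placeholders.

```python
# Illustration only: fit a Gumbel (extreme value) distribution to the scores of
# unrelated sequences by maximum likelihood, then convert an observed score to
# a p-value P(S >= s). This is a generic ML fit, not the paper's estimator.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
null_scores = stats.gumbel_r.rvs(loc=20.0, scale=5.0, size=5000, random_state=rng)

loc_hat, scale_hat = stats.gumbel_r.fit(null_scores)          # ML estimates

def p_value(score):
    return stats.gumbel_r.sf(score, loc=loc_hat, scale=scale_hat)

print(f"loc={loc_hat:.2f} scale={scale_hat:.2f} p(40)={p_value(40.0):.3g}")
```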
Abstract:
A new algorithm has been developed for smoothing the surfaces in finite element formulations of contact-impact. A key feature of this method is that the smoothing is done implicitly by constructing smooth signed distance functions for the bodies. These functions are then employed for the computation of the gap and other variables needed for implementation of contact-impact. The smoothed signed distance functions are constructed by a moving least-squares approximation with a polynomial basis. Results show that when nodes are placed on a surface, the surface can be reproduced with an error of about one per cent or less with either a quadratic or a linear basis. With a quadratic basis, the method exactly reproduces a circle or a sphere even for coarse meshes. Results are presented for contact problems involving the contact of circular bodies.
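A hedged one-dimensional sketch of the moving least-squares idea referenced above: fit a locally weighted quadratic polynomial to nearby surface nodes and evaluate it at the query point. The Gaussian weight function and support radius are illustrative choices; this is not the paper's signed-distance contact formulation.

```python
# Minimal moving least-squares sketch: approximate a curve y(x) from scattered
# surface nodes using a locally weighted quadratic basis [1, dx, dx^2]. The
# Gaussian weight and support radius are illustrative choices.
import numpy as np

def mls_eval(x_query, x_nodes, y_nodes, radius=0.5):
    w = np.exp(-((x_nodes - x_query) / radius) ** 2)         # locality weights
    B = np.vander(x_nodes - x_query, N=3, increasing=True)   # columns [1, dx, dx^2]
    A = B.T @ (w[:, None] * B)
    b = B.T @ (w * y_nodes)
    coeffs = np.linalg.solve(A, b)
    return coeffs[0]                                         # fitted value at dx = 0

if __name__ == "__main__":
    theta = np.linspace(0.1, 1.4, 15)
    x, y = np.cos(theta), np.sin(theta)                      # nodes on a unit circle arc
    xq = 0.5
    print(mls_eval(xq, x, y), np.sqrt(1 - xq**2))            # MLS estimate vs exact arc
```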
Abstract:
Development of a unified classification system to replace four of the systems currently used in disability athletics (i.e., track and field) has been widely advocated. The definition and purpose of classification, underpinned by taxonomic principles and collectively endorsed by relevant disability sport organizations, have not been developed but are required for successful implementation of a unified system. It is posited that the International Classification of Functioning, Disability, and Health (ICF), published by the World Health Organization (2001), and current disability athletics systems are, fundamentally, classifications of the functioning and disability associated with health conditions and are highly interrelated. A rationale for basing a unified disability athletics system on the ICF is established. Following taxonomic analysis of the current systems, the definition and purpose of a unified disability athletics classification are proposed and discussed. The proposed taxonomic framework and definitions have implications for other disability sport classification systems.
Abstract:
A detailed analysis procedure is described for evaluating rates of volumetric change in brain structures based on structural magnetic resonance (MR) images. In this procedure, a series of image processing tools have been employed to address the problems encountered in measuring rates of change based on structural MR images. These tools include an algorithm for intensity non-uniformity correction, a robust algorithm for three-dimensional image registration with sub-voxel precision, and an algorithm for brain tissue segmentation. However, a unique feature of the procedure is the use of a fractional volume model that has been developed to provide a quantitative measure of the partial volume effect. With this model, the fractional constituent tissue volumes are evaluated for voxels at tissue boundaries that manifest the partial volume effect, thus allowing tissue boundaries to be defined at a sub-voxel level and in an automated fashion. Validation studies are presented on key algorithms, including segmentation and registration. An overall assessment of the method is provided through the evaluation of the rates of brain atrophy in a group of normal elderly subjects, for whom the rate of brain atrophy due to normal aging is predictably small. An application of the method is given in Part II, where the rates of brain atrophy in various brain regions are studied in relation to normal aging and Alzheimer's disease.
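The fractional volume model is described only at a high level in the abstract. As a hedged sketch of a common two-tissue partial-volume formulation, the code below estimates the fraction of each tissue in a boundary voxel by linear mixing of the two pure-tissue mean intensities; the linear-mixing assumption and the example intensities are illustrative, not the paper's exact model.

```python
# Sketch of a two-tissue partial volume (fractional volume) estimate: model a
# boundary voxel's intensity as a linear mix of the two pure-tissue means and
# solve for the mixing fraction. Assumption-based illustration only.
import numpy as np

def tissue_fraction(voxel_intensity, mean_a, mean_b):
    """Fraction of tissue A in a voxel assumed to contain only tissues A and B."""
    frac = (voxel_intensity - mean_b) / (mean_a - mean_b)
    return float(np.clip(frac, 0.0, 1.0))

if __name__ == "__main__":
    gm_mean, wm_mean = 60.0, 110.0             # example grey/white matter means
    boundary_voxels = np.array([70.0, 85.0, 100.0])
    fractions = [(v, tissue_fraction(v, gm_mean, wm_mean)) for v in boundary_voxels]
    print(fractions)  # grey-matter fraction estimated per boundary voxel
```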
Abstract:
Libraries of cyclic peptides are being synthesized using combinatorial chemistry for high-throughput screening in the drug discovery process. This paper describes the min_syn_steps.cpp program (available at http://www.imb.uq.edu.au/groups/smythe/tran), which, given an input list of cyclic peptides to be synthesized, removes cyclically redundant sequences and calculates synthetic strategies that minimize the synthetic steps as well as the reagent requirements. The synthetic steps and reagent requirements can be minimized by finding common subsets within the sequences for block synthesis. Since the search space for a brute-force search over synthetic strategies is impractically large, a subset-orientated approach is used here to limit the size of the search.
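min_syn_steps.cpp itself is not reproduced here; as an illustration of the redundancy-removal step it is described as performing, the sketch below drops cyclically redundant peptides by mapping each cyclic sequence to a canonical rotation. This is a generic technique, not the program's code.

```python
# Sketch of cyclic-redundancy removal: two cyclic peptides are the same molecule
# if one sequence is a rotation of the other, so keep one representative per
# canonical (lexicographically smallest) rotation. Generic illustration only.
def canonical_rotation(seq):
    rotations = [seq[i:] + seq[:i] for i in range(len(seq))]
    return min(rotations)

def remove_cyclic_redundancy(peptides):
    seen, unique = set(), []
    for pep in peptides:
        key = canonical_rotation(pep)
        if key not in seen:
            seen.add(key)
            unique.append(pep)
    return unique

if __name__ == "__main__":
    print(remove_cyclic_redundancy(["ACDE", "CDEA", "DEAC", "ACDF"]))
    # -> ['ACDE', 'ACDF']  (the first three inputs are rotations of one cyclic peptide)
```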