901 results for Clustering search algorithm


Relevance: 20.00%

Abstract:

A new algorithm, PfAGSS, for predicting 3' splice sites in Plasmodium falciparum genomic sequences is described. Application of this program to the published P. falciparum chromosome 2 and 3 data suggests that existing programs result in a high error rate in assigning 3' intron boundaries. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevance: 20.00%

Abstract:

The problem of designing spatially cohesive nature reserve systems that meet biodiversity objectives is formulated as a nonlinear integer programming problem. The multiobjective function minimises a combination of boundary length, area and failed representation of the biological attributes we are trying to conserve. The task is to reserve a subset of sites that best meet this objective. We use data on the distribution of habitats in the Northern Territory, Australia, to show how simulated annealing and a greedy heuristic algorithm can be used to generate good solutions to such large reserve design problems, and to compare the effectiveness of these methods.
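A greedy heuristic of the kind mentioned above can be sketched as follows. This is only an illustrative simplification, not the authors' method: it scores each site by how much unmet representation it covers per unit area and ignores boundary length. The function name, site names, and data layout are all hypothetical.

```python
def greedy_reserve(sites, targets):
    """Pick sites until every target is met or no remaining site helps.

    sites:   {name: (area, {feature: amount})}   (hypothetical layout)
    targets: {feature: required amount}
    Returns (chosen site names in pick order, unmet amount per feature).
    """
    remaining = dict(targets)
    candidates = dict(sites)
    chosen = []

    def gain(name):
        # Unmet representation this site would cover, per unit area.
        area, features = candidates[name]
        covered = sum(min(amount, remaining.get(f, 0))
                      for f, amount in features.items())
        return covered / area

    while candidates and any(r > 0 for r in remaining.values()):
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining site contributes to any unmet target
        _, features = candidates.pop(best)
        chosen.append(best)
        for f, amount in features.items():
            if f in remaining:
                remaining[f] = max(0, remaining[f] - amount)
    return chosen, remaining


sites = {
    "A": (10, {"woodland": 5, "wetland": 2}),
    "B": (4, {"wetland": 4}),
    "C": (8, {"woodland": 3}),
}
targets = {"woodland": 6, "wetland": 4}
chosen, unmet = greedy_reserve(sites, targets)
```

A greedy pass like this is fast but can get stuck with a needlessly large reserve system, which is one reason to compare it against simulated annealing.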

Relevance: 20.00%

Abstract:

Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
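The search described above can be sketched as a small simulated annealing loop. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes plain edit (Levenshtein) distance as the pairwise distance, single-character substitute/insert/delete moves, and a geometric cooling schedule. All function names and parameter values are hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def total_distance(candidate, seqs):
    """Objective: sum of distances from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in seqs)


def mutate(seq, alphabet, rng):
    """Random single-character substitution, insertion, or deletion."""
    move = rng.choice(("substitute", "insert", "delete"))
    if move == "insert" or len(seq) <= 1:
        pos = rng.randrange(len(seq) + 1)
        return seq[:pos] + rng.choice(alphabet) + seq[pos:]
    pos = rng.randrange(len(seq))
    if move == "delete":
        return seq[:pos] + seq[pos + 1:]
    return seq[:pos] + rng.choice(alphabet) + seq[pos + 1:]


def anneal_consensus(seqs, start, alphabet="ACGT", steps=2000,
                     t0=2.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current, cost = start, total_distance(start, seqs)
    best, best_cost = current, cost
    temp = t0
    for _ in range(steps):
        cand = mutate(current, alphabet, rng)
        cand_cost = total_distance(cand, seqs)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) (Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = current, cost
        temp *= cooling
    return best, best_cost


seqs = ["ACGTACGT", "ACGTTCGT", "ACGACGT", "ACGTACGA"]
best, best_cost = anneal_consensus(seqs, start="TTTT")
```

Each objective evaluation here costs one dynamic-programming table per input sequence, so the work grows linearly with the number of sequences, in line with the scaling stated in the abstract.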

Relevance: 20.00%

Abstract:

The prevalence of type 2 diabetes among Australian residents is 7.5%; however, prevalence rates up to six times higher have been reported for indigenous Australian communities. Epidemiological evidence implicates genetic factors in the susceptibility of indigenous Australians to type 2 diabetes and supports the hypothesis of the thrifty genotype, but, to date, the nature of the genetic predisposition is unknown. We have ascertained clinical details from a community of indigenous Australian descent in North Stradbroke Island, Queensland. In this population, the phenotype is characterized by severe insulin resistance. We have conducted a genomewide scan, at an average resolution of 10 cM, for type 2 diabetes-susceptibility genes in a large multigeneration pedigree from this community. Parametric linkage analysis undertaken using FASTLINK version 4.1p yielded a maximum two-point LOD score of +2.97 at marker D2S2345. Multipoint analysis yielded a peak LOD score of +3.9

Relevance: 20.00%

Abstract:

Understanding the ecological role of benthic microalgae, a highly productive component of coral reef ecosystems, requires information on their spatial distribution. The spatial extent of benthic microalgae on Heron Reef (southern Great Barrier Reef, Australia) was mapped using data from the Landsat 5 Thematic Mapper sensor, integrated with field measurements of sediment chlorophyll concentration and reflectance. Field-measured sediment chlorophyll concentrations, ranging from 23-1,153 mg chl a m(-2), were classified into low, medium, and high concentration classes (1-170, 171-290, and > 291 mg chl a m(-2)) using a K-means clustering algorithm. The mapping process assumed that areas in the Thematic Mapper image exhibiting similar reflectance levels in red and blue bands would correspond to areas of similar chlorophyll a levels. Regions of homogeneous reflectance values corresponding to low, medium, and high chlorophyll levels were identified over the reef sediment zone by applying a standard image classification algorithm to the Thematic Mapper image. The resulting distribution map revealed large-scale (> 1 km(2)) patterns in chlorophyll a levels throughout the sediment zone of Heron Reef. Reef-wide estimates of chlorophyll a distribution indicate that benthic microalgae may constitute up to 20% of the total benthic chlorophyll a at Heron Reef and thus contribute significantly to total primary productivity on the reef.
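The K-means step, grouping scalar concentrations into three classes, can be illustrated with a short one-dimensional version of Lloyd's algorithm. This is only a sketch: the sample values are invented for illustration and are not the study's measurements, and the deterministic initialisation is an assumption made so the example is reproducible.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalar data.

    Returns (centroids sorted ascending, label per input value),
    so label 0 is the lowest class and k - 1 the highest.
    """
    data = sorted(values)
    # Deterministic start: centroids spread evenly across the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean; keep it if empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    centroids = sorted(centroids)
    labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
              for v in values]
    return centroids, labels


# Hypothetical chlorophyll values (mg chl a per square metre).
samples = [25, 40, 60, 150, 200, 220, 260, 400, 800, 1100]
centroids, labels = kmeans_1d(samples)
```

The class boundaries fall midway between adjacent centroids, which is how a clustering of field values can yield break points such as those quoted in the abstract.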

Relevance: 20.00%

Abstract:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.

Relevance: 20.00%

Abstract:

A new algorithm has been developed for smoothing the surfaces in finite element formulations of contact-impact. A key feature of this method is that the smoothing is done implicitly by constructing smooth signed distance functions for the bodies. These functions are then employed for the computation of the gap and other variables needed for implementation of contact-impact. The smoothed signed distance functions are constructed by a moving least-squares approximation with a polynomial basis. Results show that when nodes are placed on a surface, the surface can be reproduced with an error of about one per cent or less with either a quadratic or a linear basis. With a quadratic basis, the method exactly reproduces a circle or a sphere even for coarse meshes. Results are presented for contact problems involving the contact of circular bodies. Copyright (C) 2002 John Wiley & Sons, Ltd.

Relevance: 20.00%

Abstract:

We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
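To make the "complexity lying between" statement concrete, the sketch below counts free parameters in a single p-dimensional component-covariance matrix under three structures. It assumes the usual factor-analytic form (a p x q loading matrix times its transpose, plus a diagonal matrix), for which the standard count is pq + p - q(q - 1)/2. The abstract does not state this formula, and the function name and example dimensions are hypothetical.

```python
def covariance_parameter_counts(p, q):
    """Free parameters in one p-dimensional component-covariance matrix."""
    return {
        # sigma^2 * I: a single variance.
        "isotropic": 1,
        # B B^T + D: p*q loadings and p diagonal variances, minus
        # q(q - 1)/2 for the rotational indeterminacy of the loadings.
        "factor_analytic": p * q + p - q * (q - 1) // 2,
        # Unrestricted symmetric matrix.
        "full": p * (p + 1) // 2,
    }


# Example: 1000 genes and 5 latent factors (illustrative numbers only).
counts = covariance_parameter_counts(1000, 5)
```

With these illustrative dimensions the factor-analytic structure needs 5,990 parameters per component against 500,500 for a full covariance matrix, which is why the model remains fittable when p is large relative to n.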

Relevance: 20.00%

Abstract:

Using benthic habitat data from the Florida Keys (USA), we demonstrate how siting algorithms can help identify potential networks of marine reserves that comprehensively represent target habitat types. We applied a flexible optimization tool, simulated annealing, to represent a fixed proportion of different marine habitat types within a geographic area. We investigated the relative influence of spatial information, planning-unit size, detail of habitat classification, and magnitude of the overall conservation goal on the resulting network scenarios. With this method, we were able to identify many adequate reserve systems that met the conservation goals, e.g., representing at least 20% of each conservation target (i.e., habitat type) while fulfilling the overall aim of minimizing the system area and perimeter. One of the most useful types of information provided by this siting algorithm comes from an irreplaceability analysis, which is a count of the number of times unique planning units were included in reserve system scenarios. This analysis indicated that many different combinations of sites produced networks that met the conservation goals. While individual 1-km(2) areas were fairly interchangeable, the irreplaceability analysis highlighted larger areas within the planning region that were chosen consistently to meet the goals incorporated into the algorithm. Additionally, we found that reserve systems designed with a high degree of spatial clustering tended to have considerably less perimeter and larger overall areas in reserve, a configuration that may be preferable particularly for sociopolitical reasons. This exercise illustrates the value of using the simulated annealing algorithm to help site marine reserves: the approach makes efficient use of available resources, can be used interactively by conservation decision makers, and offers biologically suitable alternative networks from which an effective system of marine reserves can be crafted.
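The irreplaceability count described above amounts to tallying how often each planning unit appears across the reserve scenarios produced by repeated runs. A minimal sketch, with hypothetical unit identifiers and a hypothetical function name:

```python
from collections import Counter


def selection_frequency(scenarios):
    """Count, for each planning unit, the scenarios that include it."""
    counts = Counter()
    for scenario in scenarios:
        counts.update(set(scenario))  # count a unit once per scenario
    return counts


# Three hypothetical reserve scenarios from separate annealing runs.
scenarios = [{"u1", "u2"}, {"u2", "u3"}, {"u2", "u4"}]
frequency = selection_frequency(scenarios)
```

Units selected in nearly every scenario (here `u2`) are the ones such an analysis would flag as hard to replace, while units with low counts are interchangeable.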

Relevance: 20.00%

Abstract:

The Lanczos algorithm is appreciated in many situations due to its speed and economy of storage. However, the advantage that the Lanczos basis vectors need not be kept is lost when the algorithm is used to compute the action of a matrix function on a vector. Either the basis vectors need to be kept, or the Lanczos process needs to be applied twice. In this study we describe an augmented Lanczos algorithm to compute a dot product relative to a function of a large sparse symmetric matrix, without keeping the basis vectors.
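One simple way to see why the basis vectors can be discarded when only a dot product is needed is sketched below. It is not the augmented algorithm of the paper: it assumes the particular function f(A) = A^{-1} with A symmetric positive definite, records the scalars u . q_j as each Lanczos vector q_j is generated, and then uses the approximation u^T A^{-1} v ≈ ||v|| (Q^T u)^T T^{-1} e_1, where T is the Lanczos tridiagonal matrix. All names are hypothetical, and a dense list-of-lists matrix stands in for a sparse one.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_bilinear_inverse(A, u, v, steps):
    """Approximate u^T A^{-1} v holding only two Lanczos vectors at a time."""
    norm_v = math.sqrt(dot(v, v))
    q_prev = [0.0] * len(v)
    q = [x / norm_v for x in v]
    beta = 0.0
    alphas, betas, coeffs = [], [], []
    for _ in range(steps):
        coeffs.append(dot(u, q))  # u . q_j, saved before q_j is dropped
        w = matvec(A, q)
        alpha = dot(w, q)
        # Three-term recurrence: orthogonalise against q_j and q_{j-1}.
        w = [wi - alpha * qi - beta * pi
             for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:
            break  # Krylov subspace exhausted
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]

    # Solve T y = e_1 with the Thomas algorithm (T is tridiagonal).
    m = len(alphas)
    off = betas[:m - 1]
    diag = list(alphas)
    rhs = [1.0] + [0.0] * (m - 1)
    for i in range(1, m):
        factor = off[i - 1] / diag[i - 1]
        diag[i] -= factor * off[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    y = [0.0] * m
    y[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        y[i] = (rhs[i] - off[i] * y[i + 1]) / diag[i]
    return norm_v * dot(coeffs, y)


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
estimate = lanczos_bilinear_inverse(A, u=[1.0, 1.0, 1.0],
                                    v=[1.0, 2.0, 3.0], steps=3)
```

For this 3 x 3 example, three steps span the whole space, so the estimate agrees with the exact value 16/9 up to rounding. Storage is a handful of scalars per step plus two working vectors, regardless of the number of steps.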