24 results for APPROXIMATIONS


Relevance: 10.00%

Abstract:

Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered.
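To make the idea of closed-form E and M steps concrete, here is a minimal sketch of EM for a plain univariate normal mixture. It is not the random-effects model of the abstract: there are no random effects, no covariates and no correlation between profiles, and the function names `normal_pdf` and `em_normal_mixture` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_normal_mixture(data, means, variances, weights, n_iter=100):
    """Fit a univariate normal mixture by EM (illustrative sketch only).

    Assumption: a plain mixture, not the random-effects extension.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(n_iter):
        # E step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M step: closed-form weighted updates of the mixture parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # small floor to avoid a degenerate component
    return means, variances, weights, resp
```

Both steps are deterministic given the starting values, which is the property the abstract contrasts with Monte Carlo approximations. The sketch assumes well-separated starting values and does no underflow protection.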

Relevance: 10.00%

Abstract:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File), based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to compute the approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
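The two-stage search described above (rank slices by the distance from their centers to the query, then scan only the best delta slices) can be sketched as follows. This is a simplified illustration under stated assumptions: slices hold exact vectors rather than VA-file bit approximations, centers are plain means, and the names `slice_center` and `approximate_knn` are hypothetical, not the paper's API.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def slice_center(vectors):
    """Mean vector of a slice (assumed definition of its center)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def approximate_knn(slices, query, k, delta):
    """Approximate kNN: scan only the delta slices whose centers are closest.

    slices: list of non-empty lists of vectors (exact vectors stand in for
    the quantized approximations a real VA-file would store).
    """
    # Stage 1: rank slices by center-to-query distance.
    ranked = sorted(slices, key=lambda s: euclidean(slice_center(s), query))
    # Stage 2: visit every vector in the selected slices only.
    candidates = [v for s in ranked[:delta] for v in s]
    return heapq.nsmallest(k, candidates, key=lambda v: euclidean(v, query))
```

A larger delta scans more slices and moves the result toward the exact kNN at higher cost, which is the trade-off the abstract attributes to that parameter.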

Relevance: 10.00%

Abstract:

In the past, the accuracy of facial approximations has been assessed by resemblance ratings (i.e., the comparison of a facial approximation directly to a target individual) and recognition tests (e.g., the comparison of a facial approximation to a photo array of faces including foils and a target individual). Recently, several research studies have indicated that recognition tests hold major strengths in contrast to resemblance ratings. However, resemblance ratings remain popularly employed and/or are given weighting when judging facial approximations, thus indicating that no consensus has been reached. This study aims to further investigate the matter by comparing the results of resemblance ratings and recognition tests for two facial approximations which clearly differed in their morphological appearance. One facial approximation was constructed by an experienced practitioner privy to the appearance of the target individual (the practitioner had direct access to an antemortem frontal photograph during face construction), while the other facial approximation was constructed by a novice under blind conditions. Both facial approximations, whilst clearly morphologically different, were given similar resemblance scores even though the recognition tests produced vastly different results. One facial approximation was correctly recognized almost without exception while the other was not correctly recognized above chance rates. These results suggest that resemblance ratings are insensitive measures of the accuracy of facial approximations and lend further weight to the use of recognition tests in facial approximation assessment. (c) 2006 Elsevier Ireland Ltd. All rights reserved.

Relevance: 10.00%

Abstract:

Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. 
[Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]

Relevance: 10.00%

Abstract:

This paper has three primary aims: to establish an effective means for modelling mainland-island metapopulations inhabiting a dynamic landscape; to investigate the effect of immigration and dynamic changes in habitat on metapopulation patch occupancy dynamics; and to illustrate the implications of our results for decision-making and population management. We first extend the mainland-island metapopulation model of Alonso and McKane [Bull. Math. Biol. 64:913-958, 2002] to incorporate a dynamic landscape. It is shown, for both the static and the dynamic landscape models, that a suitably scaled version of the process converges to a unique deterministic model as the size of the system becomes large. We also establish that, under quite general conditions, the density of occupied patches, and the densities of suitable and occupied patches, for the respective models, have approximate normal distributions. Our results not only provide us with estimates for the means and variances that are valid at all stages in the evolution of the population, but also provide a tool for fitting the models to real metapopulations. We discuss the effect of immigration and habitat dynamics on metapopulations, showing that mainland-like patches heavily influence metapopulation persistence, and we argue for adopting measures to increase connectivity between this large patch and the other island-like patches. We illustrate our results with specific reference to examples of populations of butterfly and the grasshopper Bryodema tuberculata.

Relevance: 10.00%

Abstract:

In early generation variety trials, large numbers of new breeders' lines (varieties) may be compared, with each having little seed available. A so-called unreplicated trial has each new variety on just one plot at a site, but includes several replicated control varieties, making up between 10% and 20% of the trial. The aim of the trial is to choose some (usually around one third) good performing new varieties to go on for further testing, rather than precise estimation of their mean yields. Now that spatial analyses of data from field experiments are becoming more common, there is interest in an efficient layout of an experiment given a proposed spatial analysis and an efficiency criterion. Common optimal design criteria values depend on the usual C-matrix, which is very large, and hence it is time consuming to calculate its inverse. Since most varieties are unreplicated, the variety incidence matrix has a simple form, and some matrix manipulations can dramatically reduce the computation needed. However, there are many designs to compare, and numerical optimisation lacks insight into good design features. Some possible design criteria are discussed, and approximations to their values considered. These allow the features of efficient layouts under spatial dependence to be given and compared. (c) 2006 Elsevier Inc. All rights reserved.

Relevance: 10.00%

Abstract:

In this paper, we examine the problem of fitting a hypersphere to a set of noisy measurements of points on its surface. Our work generalises an estimator of Delogne (Proc. IMEKO-Symp. Microwave Measurements 1972, 117-123) which he proposed for circles and which has been shown by Kasa (IEEE Trans. Instrum. Meas. 25, 1976, 8-14) to be convenient for its ease of analysis and computation. We also generalise Chan's 'circular functional relationship' to describe the distribution of points. We derive the Cramér-Rao lower bound (CRLB) under this model and we derive approximations for the mean and variance for fixed sample sizes when the noise variance is small. We perform a statistical analysis of the estimate of the hypersphere's centre. We examine the existence of the mean and variance of the estimator for fixed sample sizes. We find that the mean exists when the number of sample points is greater than M + 1, where M is the dimension of the hypersphere. The variance exists when the number of sample points is greater than M + 2. We find that the bias approaches zero as the noise variance diminishes and that the variance approaches the CRLB. We provide simulation results to support our findings.
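For readers unfamiliar with this family of estimators, the sketch below shows the algebraic least-squares sphere fit commonly associated with Kasa's circle method. It minimises the sum of (||x_i - c||^2 - r^2)^2, which becomes linear in the unknowns (c, b) with b = r^2 - ||c||^2, since 2 c.x_i + b = ||x_i||^2. Whether this matches the paper's generalisation in every detail is an assumption, and the names `solve_linear` and `fit_hypersphere` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_hypersphere(points):
    """Algebraic least-squares fit of a centre and radius in any dimension."""
    dim = len(points[0])
    rows = [[2.0 * v for v in p] + [1.0] for p in points]  # design row [2x, 1]
    rhs = [sum(v * v for v in p) for p in points]          # ||x||^2
    n = dim + 1
    # Normal equations (A^T A) theta = A^T y, with theta = (centre, b).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    theta = solve_linear(ata, aty)
    centre = theta[:dim]
    radius = math.sqrt(theta[dim] + sum(c * c for c in centre))
    return centre, radius
```

Because the problem reduces to one linear solve, the fit needs no iteration or starting guess, which is consistent with the ease of computation noted in the abstract. With noisy data and very few points the quantity under the square root can be poorly determined; the sketch does not guard against that.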

Relevance: 10.00%

Abstract:

Biologists are increasingly conscious of the critical role that noise plays in cellular functions such as genetic regulation, often in connection with fluctuations in small numbers of key regulatory molecules. This has inspired the development of models that capture this fundamentally discrete and stochastic nature of cellular biology - most notably the Gillespie stochastic simulation algorithm (SSA). The SSA simulates a temporally homogeneous, discrete-state, continuous-time Markov process, and of course the corresponding probabilities and numbers of each molecular species must all remain positive. While accurately serving this purpose, the SSA can be computationally inefficient due to very small time stepping, so faster approximations such as the Poisson and Binomial τ-leap methods have been suggested. This work places these leap methods in the context of numerical methods for the solution of stochastic differential equations (SDEs) driven by Poisson noise. This allows analogues of Euler-Maruyama, Milstein and even higher order methods to be developed through the Itô-Taylor expansions as well as similar derivative-free Runge-Kutta approaches. Numerical results demonstrate that these novel methods compare favourably with existing techniques for simulating biochemical reactions by more accurately capturing crucial properties such as the mean and variance.
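As a point of reference for the leap methods mentioned above, here is a minimal sketch of the basic Poisson τ-leap step: over a fixed interval tau, each reaction fires a Poisson number of times with mean equal to its propensity times tau. It illustrates only the baseline method, not the higher-order schemes the abstract develops. The function names, the reaction representation and the clamping of negative counts are all assumptions made for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson variate by Knuth's method (suitable for modest lam)."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tau_leap(state, reactions, tau, n_steps, rng=None):
    """Advance molecule counts with fixed-step Poisson tau-leaping.

    reactions: list of (propensity_function, change_vector) pairs, where
    propensity_function maps the state list to a non-negative rate.
    """
    rng = rng or random.Random()
    state = list(state)
    for _ in range(n_steps):
        # Propensities are frozen at the start of the leap.
        firings = [poisson(prop(state) * tau, rng) for prop, _ in reactions]
        for count, (_, change) in zip(firings, reactions):
            for i, d in enumerate(change):
                state[i] += count * d
        # Crude guard (an assumption): a plain Poisson leap can overshoot
        # below zero, so negative counts are clamped.
        state = [max(s, 0) for s in state]
    return state
```

For example, a single decay reaction could be written as `[(lambda s: 0.1 * s[0], [-1])]`. The clamp is one simple way to respect the positivity requirement noted in the abstract; the Binomial τ-leap addresses the same issue more carefully.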

Relevance: 10.00%

Abstract:

This paper is concerned with evaluating the performance of loss networks. Accurate determination of loss network performance can assist in the design and dimensioning of telecommunications networks. However, exact determination can be difficult and generally cannot be done in reasonable time. For these reasons there is much interest in developing fast and accurate approximations. We develop a reduced load approximation which improves on the famous Erlang fixed point approximation (EFPA) in a variety of circumstances. We illustrate our results with reference to a range of networks for which the EFPA may be expected to perform badly.
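The baseline that the abstract improves on, the Erlang fixed point approximation, can be sketched as follows. Each link's blocking probability is taken to be the Erlang B value at a reduced load, where traffic on each route is thinned by the blocking on the route's other links, and the resulting equations are solved by repeated substitution. This shows the standard EFPA only, not the authors' improved approximation. It assumes each call uses one circuit on every link of its route, and the names `erlang_b` and `erlang_fixed_point` are hypothetical.

```python
def erlang_b(load, capacity):
    """Erlang B blocking probability via the standard stable recursion."""
    b = 1.0
    for n in range(1, capacity + 1):
        b = load * b / (n + load * b)
    return b


def erlang_fixed_point(capacities, routes, tol=1e-12, max_iter=10000):
    """Per-link blocking probabilities under the Erlang fixed point approximation.

    capacities: circuits per link.
    routes: list of (offered_load, [link indices]) pairs.
    Assumption: plain repeated substitution, which is not guaranteed to
    converge for every network.
    """
    blocking = [0.0] * len(capacities)
    for _ in range(max_iter):
        updated = []
        for j, cap in enumerate(capacities):
            reduced = 0.0
            for load, links in routes:
                if j in links:
                    # Thin the route's load by blocking on its other links.
                    thin = 1.0
                    for i in links:
                        if i != j:
                            thin *= 1.0 - blocking[i]
                    reduced += load * thin
            updated.append(erlang_b(reduced, cap))
        if max(abs(a - b) for a, b in zip(updated, blocking)) < tol:
            return updated
        blocking = updated
    return blocking
```

The loss probability on a route then follows as one minus the product of (1 - blocking) over its links. The approximation treats links as blocking independently, which is the kind of assumption that can fail in the networks the abstract says the EFPA handles badly.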