114 results for Computational routines
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. A naive implementation of the procedure can be computationally inefficient. To reduce the computational cost, a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
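As an illustration only, the following sketch shows the univariate version of the quantity that must be evaluated at every EM iteration: the probability mass of each bin under the mixture, renormalized over the observed (truncated) range, and the resulting binned log-likelihood. The function names and the choice of normal components are assumptions made for this example; the multivariate case treated in the paper replaces the one-dimensional CDF differences with multidimensional integrals.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a univariate normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(edges, weights, means, sds):
    """Mixture mass of each bin, renormalized over the truncated range.

    edges   : sorted bin boundaries; data outside [edges[0], edges[-1]]
              are assumed to be truncated (unobserved).
    weights : mixing proportions (hypothetical parameter names).
    """
    masses = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        masses.append(mass)
    total = sum(masses)  # mass of the observed region
    return [mass / total for mass in masses]


def binned_log_likelihood(counts, edges, weights, means, sds):
    """Multinomial log-likelihood of bin counts (up to an additive constant)."""
    probs = bin_probabilities(edges, weights, means, sds)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
```

For example, `bin_probabilities([-2, -1, 0, 1, 2], [0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])` returns four probabilities that sum to one and are symmetric about zero.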
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
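The search described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the distance measure, the move set, or the cooling schedule, so edit (Levenshtein) distance, single-character substitute/insert/delete moves, and geometric cooling are all choices made for this example, and every function name is hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def total_distance(candidate, sequences):
    """Objective: sum of distances from the candidate to every input sequence."""
    return sum(edit_distance(candidate, s) for s in sequences)


def propose(candidate, alphabet, rng):
    """Random single-character substitution, insertion, or deletion."""
    move = rng.choice(("substitute", "insert", "delete"))
    pos = rng.randrange(len(candidate) + 1)
    if move == "insert" or not candidate:
        return candidate[:pos] + rng.choice(alphabet) + candidate[pos:]
    pos = min(pos, len(candidate) - 1)
    if move == "delete" and len(candidate) > 1:
        return candidate[:pos] + candidate[pos + 1:]
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def anneal_consensus(sequences, alphabet="ACGT", steps=2000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current = sequences[0]  # start from one of the inputs
    current_cost = total_distance(current, sequences)
    best, best_cost = current, current_cost
    for step in range(steps):
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = propose(current, alphabet, rng)
        cost = total_distance(candidate, sequences)
        accept = cost <= current_cost or \
            rng.random() < math.exp((current_cost - cost) / temperature)
        if accept:
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost
```

In this sketch each evaluation of the objective is linear in the number of input sequences and, through the dynamic-programming distance, roughly quadratic in sequence length. A call such as `anneal_consensus(["ACGTAC", "ACGTTC", "ACCTAC"])` returns the best candidate found together with its total distance.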
Abstract:
This paper presents a method of evaluating the expected value of a path integral for a general Markov chain on a countable state space. We illustrate the method with reference to several models, including birth-death processes and the birth, death and catastrophe process. (C) 2002 Elsevier Science Inc. All rights reserved.
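To make the idea concrete, here is a small sketch for one special case, and not necessarily the method of the paper. It assumes a continuous-time birth-death process on the states 0, 1, ..., N with state 0 absorbing and no births out of state N (a finite truncation introduced for this example). Under those assumptions the expected path integral e_i = E_i[∫ f(X_t) dt], taken up to absorption, satisfies the standard linear system μ_i e_{i-1} − (λ_i + μ_i) e_i + λ_i e_{i+1} = −f(i) with e_0 = 0, which is tridiagonal and can be solved directly. The function name and argument layout are hypothetical.

```python
def expected_path_integral(birth, death, f):
    """Expected integral of f along the path until absorption at state 0.

    birth[k], death[k], f[k] refer to state k + 1 (states 1..N).
    The birth rate of the top state is treated as zero (truncation assumption),
    and all death rates are assumed positive so that absorption is certain.
    Returns [e_1, ..., e_N].
    """
    n = len(f)
    birth = list(birth)
    birth[-1] = 0.0  # assumption: no births out of the top state

    # Tridiagonal system: lower*e[k-1] + diag*e[k] + upper*e[k+1] = rhs
    lower = [float(d) for d in death]
    diag = [-(b + d) for b, d in zip(birth, death)]
    upper = [float(b) for b in birth]
    rhs = [-float(value) for value in f]

    # Thomas algorithm: forward elimination
    for k in range(1, n):
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]

    # Back substitution
    e = [0.0] * n
    e[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        e[k] = (rhs[k] - upper[k] * e[k + 1]) / diag[k]
    return e
```

With f identically 1 the result is the expected time to absorption. For a pure death process with rates 1, 2, 3 out of states 1, 2, 3, this gives the partial harmonic sums 1, 1 + 1/2, and 1 + 1/2 + 1/3.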
Abstract:
A finite-element method is used to study the elastic properties of random three-dimensional porous materials with highly interconnected pores. We show that Young's modulus, E, is practically independent of Poisson's ratio of the solid phase, nu(s), over the entire solid fraction range, and Poisson's ratio, nu, becomes independent of nu(s) as the percolation threshold is approached. We represent this behaviour of nu in a flow diagram. This interesting but approximate behaviour is very similar to the exactly known behaviour in two-dimensional porous materials. In addition, the behaviour of nu versus nu(s) appears to imply that information in the dilute porosity limit can affect behaviour in the percolation threshold limit. We summarize the finite-element results in terms of simple structure-property relations, instead of tables of data, to make it easier to apply the computational results. Without using accurate numerical computations, one is limited to various effective medium theories and rigorous approximations like bounds and expansions. The accuracy of these equations is unknown for general porous media. To verify a particular theory it is important to check that it predicts both isotropic elastic moduli, i.e. prediction of Young's modulus alone is necessary but not sufficient. The subtleties of Poisson's ratio behaviour actually provide a very effective method for showing differences between the theories and demonstrating their ranges of validity. We find that for moderate- to high-porosity materials, none of the analytical theories is accurate and, at present, numerical techniques must be relied upon.
Abstract:
Observations of an insect's movement lead to theory on the insect's flight behaviour and the role of movement in the species' population dynamics. This theory leads to predictions of the way the population changes in time under different conditions. If a hypothesis on movement predicts a specific change in the population, then the hypothesis can be tested against observations of population change. Routine pest monitoring of agricultural crops provides a convenient source of data for studying movement into a region and among fields within a region. Examples of the use of statistical and computational methods for testing hypotheses with such data are presented. The types of questions that can be addressed with these methods and the limitations of pest monitoring data when used for this purpose are discussed. (C) 2002 Elsevier Science B.V. All rights reserved.
Abstract:
C,C-Dicyanoketenimines 10a-c were generated by flash vacuum thermolysis of ketene NS-acetals 9a-c or by thermal or photochemical decomposition of alpha-azido-,beta-cyanocinnamonitrile 11. In the latter reaction, 3,3-dicyano-2-phenyl-1-azirine 12 is also formed. IR spectroscopy of the ketenimines isolated in Ar matrixes or as neat films, NMR spectroscopy of 10c, and theoretical calculations (B3LYP/6-31G*) demonstrate that these ketenimines have variable geometry, being essentially linear along the CCN-R framework in polar media (neat films and solution), but in the gas phase or Ar matrix they are bent, as is usual for ketenimines. Experiments and calculations agree that a single CN substituent as in 13 is not enough to enforce linearity, and sulfonyl groups are less effective than cyano groups in causing linearity. C,C-Bis(methylsulfonyl)ketenimines 4-5 and a C-cyano-C-(methylsulfonyl)ketenimine 15 are not linear. The compound p-O2NC6H4N=C=C(COOMe)2 previously reported in the literature is probably somewhat linearized along the CCNR moiety. A computational survey (B3LYP/6-31G*) of the inversion barrier at nitrogen indicates that electronegative C-substituents dramatically lower the barrier; this is also true of N-acyl substituents. Increasing polarity causes lower barriers. Although N-alkylbis(methylsulfonyl)ketenimines are not calculated to be linear, the barriers are so low that crystal lattice forces can induce planarity in N-methylbis(methylsulfonyl)ketenimine 3.
Abstract:
An efficient representation method for arbitrarily shaped image segments is proposed. The method includes a way of selecting a wavelet basis to approximate the given image segment, giving improved image quality at a reduced computational load.
Abstract:
The Las Cañadas caldera is a nested collapse caldera formed by the successive migration and collapse of shallow magmatic chambers. Among the pyroclastic products of this caldera are phonolitic fallout deposits that crop out in the caldera wall and on the extracaldera slopes. These deposits exhibit an uninterrupted facies gradation from nonwelded to lava-like and record continuous volcanic deposition. Densely welded and lava-like facies result from the extreme attenuation and complete homogenization of juvenile clasts that destroy original clast outlines and any evidence of fallout deposition. Agglutination contributes significantly to the final degree of flattening observed in the welded facies. After deposition, rheomorphic flowage occurs. Emplacement temperatures for one of the welding sequences are calculated from magmatic temperatures and a model of tephra cooling during fallout. Results are 486 °C for the nonwelded facies and 740 °C for the moderately welded facies. For the same welding sequence, a cooling time between 25 and 54 days is estimated from published experimental and computational data as the possible duration of welding and rheomorphism. Following deposition and agglutination, the lava-like pyroclastic facies had the rheological properties of viscous lavas and flowed down the outer slopes away from the caldera. Some lava-like masses detached from proximal areas to more distal regions. During deposition, the eruptive style evolved from Plinian fallout to fountain-fed spatter deposition. This evolution was accompanied by a decrease in explosive power and a lower height of the eruptive column, which produce higher emplacement temperatures and more effective heat retention of pyroclasts.