854 resultados para data gathering algorithm


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Much progress has been made on inferring population history from molecular data. However, complex demographic scenarios have been considered rarely or have proved intractable. The serial introduction of the South-Central American cane Load Bufo marinas in various Caribbean and Pacific islands involves four major phases: a possible genetic admixture during the first introduction, a bottleneck associated with founding, a transitory, population boom, and finally, a demographic stabilization. A large amount of historical and demographic information is available for those introductions and can be combined profitably with molecular data. We used a Bayesian approach to combine this information With microsatellite (10 loci) and enzyme (22 loci) data and used a rejection algorithm to simultaneously estimate the demographic parameters describing the four major phases of the introduction history,. The general historical trends supported by microsatellites and enzymes were similar. However, there was a stronger support for a larger bottleneck at introductions for microsatellites than enzymes and for a more balanced genetic admixture for enzymes than for microsatellites. Verb, little information was obtained from either marker about the transitory population boom observed after each introduction. Possible explanations for differences in resolution of demographic events and discrepancies between results obtained with microsatellites and enzymes were explored. Limits Of Our model and method for the analysis of nonequilibrium populations were discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

For zygosity diagnosis in the absence of genotypic data, or in the recruitment phase of a twin study where only single twins from same-sex pairs are being screened, or to provide a test for sample duplication leading to the false identification of a dizygotic pair as monozygotic, the appropriate analysis of respondents' answers to questions about zygosity is critical. Using data from a young adult Australian twin cohort (N = 2094 complete pairs and 519 singleton twins from same-sex pairs with complete responses to all zygosity items), we show that application of latent class analysis (LCA), fitting a 2-class model, yields results that show good concordance with traditional methods of zygosity diagnosis, but with certain important advantages. These include the ability, in many cases, to assign zygosity with specified probability on the basis of responses of a single informant (advantageous when one zygosity type is being oversampled); and the ability to quantify the probability of misassignment of zygosity, allowing prioritization of cases for genotyping as well as identification of cases of probable laboratory error. Out of 242 twins (from 121 like-sex pairs) where genotypic data were available for zygosity confirmation, only a single case was identified of incorrect zygosity assignment by the latent class algorithm. Zygosity assignment for that single case was identified by the LCA as uncertain (probability of being a monozygotic twin only 76%), and the co-twin's responses clearly identified the pair as dizygotic (probability of being dizygotic 100%). In the absence of genotypic data, or as a safeguard against sample duplication, application of LCA for zygosity assignment or confirmation is strongly recommended.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many organisations need to extract useful information from huge amounts of movement data. One example is found in maritime transportation, where the automated identification of a diverse range of traffic routes is a key management issue for improving the maintenance of ports and ocean routes, and accelerating ship traffic. This paper addresses, in a first stage, the research challenge of developing an approach for the automated identification of traffic routes based on clustering motion vectors rather than reconstructed trajectories. The immediate benefit of the proposed approach is to avoid the reconstruction of trajectories in terms of their geometric shape of the path, their position in space, their life span, and changes of speed, direction and other attributes over time. For clustering the moving objects, an adapted version of the Shared Nearest Neighbour algorithm is used. The motion vectors, with a position and a direction, are analysed in order to identify clusters of vectors that are moving towards the same direction. These clusters represent traffic routes and the preliminary results have shown to be promising for the automated identification of traffic routes with different shapes and densities, as well as for handling noise data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Image segmentation is an ubiquitous task in medical image analysis, which is required to estimate morphological or functional properties of given anatomical targets. While automatic processing is highly desirable, image segmentation remains to date a supervised process in daily clinical practice. Indeed, challenging data often requires user interaction to capture the required level of anatomical detail. To optimize the analysis of 3D images, the user should be able to efficiently interact with the result of any segmentation algorithm to correct any possible disagreement. Building on a previously developed real-time 3D segmentation algorithm, we propose in the present work an extension towards an interactive application where user information can be used online to steer the segmentation result. This enables a synergistic collaboration between the operator and the underlying segmentation algorithm, thus contributing to higher segmentation accuracy, while keeping total analysis time competitive. To this end, we formalize the user interaction paradigm using a geometrical approach, where the user input is mapped to a non-cartesian space while this information is used to drive the boundary towards the position provided by the user. Additionally, we propose a shape regularization term which improves the interaction with the segmented surface, thereby making the interactive segmentation process less cumbersome. The resulting algorithm offers competitive performance both in terms of segmentation accuracy, as well as in terms of total analysis time. This contributes to a more efficient use of the existing segmentation tools in daily clinical practice. Furthermore, it compares favorably to state-of-the-art interactive segmentation software based on a 3D livewire-based algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Precise needle puncture of renal calyces is a challenging and essential step for successful percutaneous nephrolithotomy. This work tests and evaluates, through a clinical trial, a real-time navigation system to plan and guide percutaneous kidney puncture. Methods: A novel system, entitled i3DPuncture, was developed to aid surgeons in establishing the desired puncture site and the best virtual puncture trajectory, by gathering and processing data from a tracked needle with optical passive markers. In order to navigate and superimpose the needle to a preoperative volume, the patient, 3D image data and tracker system were previously registered intraoperatively using seven points that were strategically chosen based on rigid bone structures and nearby kidney area. In addition, relevant anatomical structures for surgical navigation were automatically segmented using a multi-organ segmentation algorithm that clusters volumes based on statistical properties and minimum description length criterion. For each cluster, a rendering transfer function enhanced the visualization of different organs and surrounding tissues. Results: One puncture attempt was sufficient to achieve a successful kidney puncture. The puncture took 265 seconds, and 32 seconds were necessary to plan the puncture trajectory. The virtual puncture path was followed correctively until the needle tip reached the desired kidney calyceal. Conclusions: This new solution provided spatial information regarding the needle inside the body and the possibility to visualize surrounding organs. It may offer a promising and innovative solution for percutaneous punctures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One of the current frontiers in the clinical management of Pectus Excavatum (PE) patients is the prediction of the surgical outcome prior to the intervention. This can be done through computerized simulation of the Nuss procedure, which requires an anatomically correct representation of the costal cartilage. To this end, we take advantage of the costal cartilage tubular structure to detect it through multi-scale vesselness filtering. This information is then used in an interactive 2D initialization procedure which uses anatomical maximum intensity projections of 3D vesselness feature images to efficiently initialize the 3D segmentation process. We identify the cartilage tissue centerlines in these projected 2D images using a livewire approach. We finally refine the 3D cartilage surface through region-based sparse field level-sets. We have tested the proposed algorithm in 6 noncontrast CT datasets from PE patients. A good segmentation performance was found against reference manual contouring, with an average Dice coefficient of 0.75±0.04 and an average mean surface distance of 1.69±0.30mm. The proposed method requires roughly 1 minute for the interactive initialization step, which can positively contribute to an extended use of this tool in clinical practice, since current manual delineation of the costal cartilage can take up to an hour.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper deals with the establishment of a characterization methodology of electric power profiles of medium voltage (MV) consumers. The characterization is supported on the data base knowledge discovery process (KDD). Data Mining techniques are used with the purpose of obtaining typical load profiles of MV customers and specific knowledge of their customers’ consumption habits. In order to form the different customers’ classes and to find a set of representative consumption patterns, a hierarchical clustering algorithm and a clustering ensemble combination approach (WEACS) are used. Taking into account the typical consumption profile of the class to which the customers belong, new tariff options were defined and new energy coefficients prices were proposed. Finally, and with the results obtained, the consequences that these will have in the interaction between customer and electric power suppliers are analyzed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks is proposed in this paper. The mentioned methodology uses clustering algorithms to group the buses in typical classes that include a set of buses with similar LMP values. Two different clustering algorithms have been used to determine the LMP clusters: the two-step and K-means algorithms. In order to evaluate the quality of the partition as well as the best performance algorithm adequacy measurements indices are used. The paper includes a case study using a Locational Marginal Prices (LMP) data base from the California ISO (CAISO) in order to identify zonal prices.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

25th Annual Conference of the European Cetacean Society, Cadiz, Spain 21-23 March 2011.