933 resultados para Clustering analysis


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Choosing a single similarity threshold for cutting dendrograms is not sufficient for performing hierarchical clustering analysis of heterogeneous data sets. In addition, alternative automated or semi-automated methods that cut dendrograms in multiple levels make assumptions about the data in hand. In an attempt to help the user to find patterns in the data and resolve ambiguities in cluster assignments, we developed MLCut: a tool that provides visual support for exploring dendrograms of heterogeneous data sets in different levels of detail. The interactive exploration of the dendrogram is coordinated with a representation of the original data, shown as parallel coordinates. The tool supports three analysis steps. Firstly, a single-height similarity threshold can be applied using a dynamic slider to identify the main clusters. Secondly, a distinctiveness threshold can be applied using a second dynamic slider to identify “weak-edges” that indicate heterogeneity within clusters. Thirdly, the user can drill-down to further explore the dendrogram structure - always in relation to the original data - and cut the branches of the tree at multiple levels. Interactive drill-down is supported using mouse events such as hovering, pointing and clicking on elements of the dendrogram. Two prototypes of this tool have been developed in collaboration with a group of biologists for analysing their own data sets. We found that enabling the users to cut the tree at multiple levels, while viewing the effect in the original data, is a promising method for clustering which could lead to scientific discoveries.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Satellites have great potential for diagnosis of surface air quality conditions, though reduced sensitivity of satellite instrumentation to the lower troposphere currently impedes their applicability. One objective of the NASA DISCOVER-AQ project is to provide information relevant to improving our ability to relate satellite-observed columns to surface conditions for key trace gases and aerosols. In support of DISCOVER-AQ, this dissertation investigates the degree of correlation between O3 and NO2 column abundance and surface mixing ratio during the four DISCOVER-AQ deployments; characterize the variability of the aircraft in situ and model-simulated O3 and NO2 profiles; and use the WRF-Chem model to further investigate the role of boundary layer mixing in the column-surface connection for the Maryland 2011 deployment, and determine which of the available boundary layer schemes best captures the observations. Simple linear regression analyses suggest that O3 partial column observations from future satellite instruments with sufficient sensitivity to the lower troposphere may be most meaningful for surface air quality under the conditions associated with the Maryland 2011 campaign, which included generally deep, convective boundary layers, the least wind shear of all four deployments, and few geographical influences on local meteorology, with exception of bay breezes. Hierarchical clustering analysis of the in situ O3 and NO2 profiles indicate that the degree of vertical mixing (defined by temperature lapse rate) associated with each cluster exerted an important influence on the shapes of the median cluster profiles for O3, as well as impacted the column vs. surface correlations for many clusters for both O3 and NO2. However, comparisons to the CMAQ model suggest that, among other errors, vertical mixing is overestimated, causing too great a column-surface connection within the model. Finally, the WRF-Chem model, a meteorology model with coupled chemistry, is used to further investigate the impact of vertical mixing on the O3 and NO2 column-surface connection, for an ozone pollution event that occurred on July 26-29, 2011. Five PBL schemes were tested, with no one scheme producing a clear, consistent “best” comparison with the observations for PBLH and pollutant profiles; however, despite improvements, the ACM2 scheme continues to overestimate vertical mixing.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: The microflora hypothesis may be the underlying explanation for the growth of inflammatory disease. In addition to many known affecting factors, knowing the gut microbiota of healthy newborns can help to understand the gut immunity and modulate it. Objectives: This study examined the microbiota of healthy newborns from urban regions. Patients and Methods: We enrolled 128 full-term newborns, born at Seoul St. Mary and St. Paul hospital from January 2009 to February 2010. All 143 samples of feces were cultivated in six culture plates to determine the amounts of total bacteria, anaerobes, gram-positive bacteria, coliforms, lactobacilli, and bifidobacteria. The samples were evaluated with a bivariate correlation between coliforms and lactobacilli. Terminal restriction fragment length polymorphism (T-RFLP) analysis with HhaI and MspI and a clustering analysis were performed for determination of diversity. Results: Bacteria were cultured in 61.5% of feces in the following order: anaerobes, gram-positive bacteria, lactobacilli, coliform, and bifidobacteria. The growth of total bacteria and lactobacilli increased in feces defecated after 24 hours of birth (P < 0.001, P = 0.008) and anaerobes decreased (P = 0.003). A negative correlation between the growth of lactobacilli and coliforms was found (r = -463, P < 0.001). Conclusions: This study confirms that bacterial colonization of healthy newborns born in cities is non-sterile, but has early diversification and inter-individuality.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The main objective of this study is to apply recently developed methods of physical-statistic to time series analysis, particularly in electrical induction s profiles of oil wells data, to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA method in order to know if we can or not use this technique to characterize spatially the fields. After obtain the DFA values for all wells, we applied clustering analysis. To do these tests we used the non-hierarchical method called K-means. Usually based on the Euclidean distance, the K-means consists in dividing the elements of a data matrix N in k groups, so that the similarities among elements belonging to different groups are the smallest possible. In order to test if a dataset generated by the K-means method or randomly generated datasets form spatial patterns, we created the parameter Ω (index of neighborhood). High values of Ω reveals more aggregated data and low values of Ω show scattered data or data without spatial correlation. Thus we concluded that data from the DFA of 54 wells are grouped and can be used to characterize spatial fields. Applying contour level technique we confirm the results obtained by the K-means, confirming that DFA is effective to perform spatial analysis

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Migraine is a painful disorder for which the etiology remains obscure. Diagnosis is largely based on International Headache Society criteria. However, no feature occurs in all patients who meet these criteria, and no single symptom is required for diagnosis. Consequently, this definition may not accurately reflect the phenotypic heterogeneity or genetic basis of the disorder. Such phenotypic uncertainty is typical for complex genetic disorders and has encouraged interest in multivariate statistical methods for classifying disease phenotypes. We applied three popular statistical phenotyping methods—latent class analysis, grade of membership and grade of membership “fuzzy” clustering (Fanny)—to migraine symptom data, and compared heritability and genome-wide linkage results obtained using each approach. Our results demonstrate that different methodologies produce different clustering structures and non-negligible differences in subsequent analyses. We therefore urge caution in the use of any single approach and suggest that multiple phenotyping methods be used.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We introduce a framework for population analysis of white matter tracts based on diffusion-weighted images of the brain. The framework enables extraction of fibers from high angular resolution diffusion images (HARDI); clustering of the fibers based partly on prior knowledge from an atlas; representation of the fiber bundles compactly using a path following points of highest density (maximum density path; MDP); and registration of these paths together using geodesic curve matching to find local correspondences across a population. We demonstrate our method on 4-Tesla HARDI scans from 565 young adults to compute localized statistics across 50 white matter tracts based on fractional anisotropy (FA). Experimental results show increased sensitivity in the determination of genetic influences on principal fiber tracts compared to the tract-based spatial statistics (TBSS) method. Our results show that the MDP representation reveals important parts of the white matter structure and considerably reduces the dimensionality over comparable fiber matching approaches.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new clustering technique, based on the concept of immediato neighbourhood, with a novel capability to self-learn the number of clusters expected in the unsupervized environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovations and computational economy. Test implementation of the scheme using C-l flight line training sample data in a simulated unsupervized mode has brought out the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervized learning capabilities to derive unified learning systems capable of operating in both supervized and unsupervized environments. This makes the technique an attractive proposition in the context of remotely sensed earth resources data analysis wherein it is essential to have such a unified learning system capability.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations, sampled by molecular dynamics simulations performed on different ligand bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment induced re-distribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods on simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at higher time scales, and to problems related to protein folding for any protein or protein-protein/RNA/DNA complex of interest with a known structure.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Clustering techniques are used in regional flood frequency analysis (RFFA) to partition watersheds into natural groups or regions with similar hydrologic responses. The linear Kohonen's self‐organizing feature map (SOFM) has been applied as a clustering technique for RFFA in several recent studies. However, it is seldom possible to interpret clusters from the output of an SOFM, irrespective of its size and dimensionality. In this study, we demonstrate that SOFMs may, however, serve as a useful precursor to clustering algorithms. We present a two‐level. SOFM‐based clustering approach to form regions for FFA. In the first level, the SOFM is used to form a two‐dimensional feature map. In the second level, the output nodes of SOFM are clustered using Fuzzy c‐means algorithm to form regions. The optimal number of regions is based on fuzzy cluster validation measures. Effectiveness of the proposed approach in forming homogeneous regions for FFA is illustrated through application to data from watersheds in Indiana, USA. Results show that the performance of the proposed approach to form regions is better than that based on classical SOFM.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Regionalization approaches are widely used in water resources engineering to identify hydrologically homogeneous groups of watersheds that are referred to as regions. Pooled information from sites (depicting watersheds) in a region forms the basis to estimate quantiles associated with hydrological extreme events at ungauged/sparsely gauged sites in the region. Conventional regionalization approaches can be effective when watersheds (data points) corresponding to different regions can be separated using straight lines or linear planes in the space of watershed related attributes. In this paper, a kernel-based Fuzzy c-means (KFCM) clustering approach is presented for use in situations where such linear separation of regions cannot be accomplished. The approach uses kernel-based functions to map the data points from the attribute space to a higher-dimensional space where they can be separated into regions by linear planes. A procedure to determine optimal number of regions with the KFCM approach is suggested. Further, formulations to estimate flood quantiles at ungauged sites with the approach are developed. Effectiveness of the approach is demonstrated through Monte-Carlo simulation experiments and a case study on watersheds in United States. Comparison of results with those based on conventional Fuzzy c-means clustering, Region-of-influence approach and a prior study indicate that KFCM approach outperforms the other approaches in forming regions that are closer to being statistically homogeneous and in estimating flood quantiles at ungauged sites. Key Points Kernel-based regionalization approach is presented for flood frequency analysis Kernel procedure to estimate flood quantiles at ungauged sites is developed A set of fuzzy regions is delineated in Ohio, USA

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we study the classification of spatiotemporal pattern of one-dimensional cellular automata (CA) whereas the classification comprises CA rules including their initial conditions. We propose an exploratory analysis method based on the normalized compression distance (NCD) of spatiotemporal patterns which is used as dissimilarity measure for a hierarchical clustering. Our approach is different with respect to the following points. First, the classification of spatiotemporal pattern is comparative because the NCD evaluates explicitly the difference of compressibility among two objects, e.g., strings corresponding to spatiotemporal patterns. This is in contrast to all other measures applied so far in a similar context because they are essentially univariate. Second, Kolmogorov complexity, which underlies the NCD, was used in the classification of CA with respect to their spatiotemporal pattern. Third, our method is semiautomatic allowing us to investigate hundreds or thousands of CA rules or initial conditions simultaneously to gain insights into their organizational structure. Our numerical results are not only plausible confirming previous classification attempts but also shed light on the intricate influence of random initial conditions on the classification results.