535 resultados para Clustering Analysis
Resumo:
Imaging genetics aims to discover how variants in the human genome influence brain measures derived from images. Genome-wide association scans (GWAS) can screen the genome for common differences in our DNA that relate to brain measures. In small samples, GWAS has low power as individual gene effects are weak and one must also correct for multiple comparisons across the genome and the image. Here we extend recent work on genetic clustering of images, to analyze surface-based models of anatomy using GWAS. We performed spherical harmonic analysis of hippocampal surfaces, automatically extracted from brain MRI scans of 1254 subjects. We clustered hippocampal surface regions with common genetic influences by examining genetic correlations (r(g)) between the normalized deformation values at all pairs of surface points. Using genetic correlations to cluster surface measures, we were able to boost effect sizes for genetic associations, compared to clustering with traditional phenotypic correlations using Pearson's r.
Resumo:
Automatic labeling of white matter fibres in diffusion-weighted brain MRI is vital for comparing brain integrity and connectivity across populations, but is challenging. Whole brain tractography generates a vast set of fibres throughout the brain, but it is hard to cluster them into anatomically meaningful tracts, due to wide individual variations in the trajectory and shape of white matter pathways. We propose a novel automatic tract labeling algorithm that fuses information from tractography and multiple hand-labeled fibre tract atlases. As streamline tractography can generate a large number of false positive fibres, we developed a top-down approach to extract tracts consistent with known anatomy, based on a distance metric to multiple hand-labeled atlases. Clustering results from different atlases were fused, using a multi-stage fusion scheme. Our "label fusion" method reliably extracted the major tracts from 105-gradient HARDI scans of 100 young normal adults. © 2012 Springer-Verlag.
Resumo:
Human expert analyses are commonly used in bioacoustic studies and can potentially limit the reproducibility of these results. In this paper, a machine learning method is presented to statistically classify avian vocalizations. Automated approaches were applied to isolate bird songs from long field recordings, assess song similarities, and classify songs into distinct variants. Because no positive controls were available to assess the true classification of variants, multiple replicates of automatic classification of song variants were analyzed to investigate clustering uncertainty. The automatic classifications were more similar to the expert classifications than expected by chance. Application of these methods demonstrated the presence of discrete song variants in an island population of the New Zealand hihi (Notiomystis cincta). The geographic patterns of song variation were then revealed by integrating over classification replicates. Because this automated approach considers variation in song variant classification, it reduces potential human bias and facilitates the reproducibility of the results.
Resumo:
Environmental acoustic recordings can be used to perform avian species richness surveys, whereby a trained ornithologist can observe the species present by listening to the recording. This could be made more efficient by using computational methods for iteratively selecting the richest parts of a long recording for the human observer to listen to, a process known as “smart sampling”. This allows scaling up to much larger ecological datasets. In this paper we explore computational approaches based on information and diversity of selected samples. We propose to use an event detection algorithm to estimate the amount of information present in each sample. We further propose to cluster the detected events for a better estimate of this amount of information. Additionally, we present a time dispersal approach to estimating diversity between iteratively selected samples. Combinations of approaches were evaluated on seven 24-hour recordings that have been manually labeled by bird watchers. The results show that on average all the methods we have explored would allow annotators to observe more new species in fewer minutes compared to a baseline of random sampling at dawn.
Resumo:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Resumo:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Providentially in high dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named as loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
Resumo:
The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.
Resumo:
Images from cell biology experiments often indicate the presence of cell clustering, which can provide insight into the mechanisms driving the collective cell behaviour. Pair-correlation functions provide quantitative information about the presence, or absence, of clustering in a spatial distribution of cells. This is because the pair-correlation function describes the ratio of the abundance of pairs of cells, separated by a particular distance, relative to a randomly distributed reference population. Pair-correlation functions are often presented as a kernel density estimate where the frequency of pairs of objects are grouped using a particular bandwidth (or bin width), Δ>0. The choice of bandwidth has a dramatic impact: choosing Δ too large produces a pair-correlation function that contains insufficient information, whereas choosing Δ too small produces a pair-correlation signal dominated by fluctuations. Presently, there is little guidance available regarding how to make an objective choice of Δ. We present a new technique to choose Δ by analysing the power spectrum of the discrete Fourier transform of the pair-correlation function. Using synthetic simulation data, we confirm that our approach allows us to objectively choose Δ such that the appropriately binned pair-correlation function captures known features in uniform and clustered synthetic images. We also apply our technique to images from two different cell biology assays. The first assay corresponds to an approximately uniform distribution of cells, while the second assay involves a time series of images of a cell population which forms aggregates over time. The appropriately binned pair-correlation function allows us to make quantitative inferences about the average aggregate size, as well as quantifying how the average aggregate size changes with time.
Resumo:
The family of location and scale mixtures of Gaussians has the ability to generate a number of flexible distributional forms. The family nests as particular cases several important asymmetric distributions like the Generalized Hyperbolic distribution. The Generalized Hyperbolic distribution in turn nests many other well known distributions such as the Normal Inverse Gaussian. In a multivariate setting, an extension of the standard location and scale mixture concept is proposed into a so called multiple scaled framework which has the advantage of allowing different tail and skewness behaviours in each dimension with arbitrary correlation between dimensions. Estimation of the parameters is provided via an EM algorithm and extended to cover the case of mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape.
Resumo:
Raman spectroscopy of formamide-intercalated kaolinites treated using controlled-rate thermal analysis technology (CRTA), allowing the separation of adsorbed formamide from intercalated formamide in formamide-intercalated kaolinites, is reported. The Raman spectra of the CRTA-treated formamide-intercalated kaolinites are significantly different from those of the intercalated kaolinites, which display a combination of both intercalated and adsorbed formamide. An intense band is observed at 3629 cm-1, attributed to the inner surface hydroxyls hydrogen bonded to the formamide. Broad bands are observed at 3600 and 3639 cm-1, assigned to the inner surface hydroxyls, which are hydrogen bonded to the adsorbed water molecules. The hydroxyl-stretching band of the inner hydroxyl is observed at 3621 cm-1 in the Raman spectra of the CRTA-treated formamide-intercalated kaolinites. The results of thermal analysis show that the amount of intercalated formamide between the kaolinite layers is independent of the presence of water. Significant differences are observed in the CO stretching region between the adsorbed and intercalated formamide.
Resumo:
Diffusion equations that use time fractional derivatives are attractive because they describe a wealth of problems involving non-Markovian Random walks. The time fractional diffusion equation (TFDE) is obtained from the standard diffusion equation by replacing the first-order time derivative with a fractional derivative of order α ∈ (0, 1). Developing numerical methods for solving fractional partial differential equations is a new research field and the theoretical analysis of the numerical methods associated with them is not fully developed. In this paper an explicit conservative difference approximation (ECDA) for TFDE is proposed. We give a detailed analysis for this ECDA and generate discrete models of random walk suitable for simulating random variables whose spatial probability density evolves in time according to this fractional diffusion equation. The stability and convergence of the ECDA for TFDE in a bounded domain are discussed. Finally, some numerical examples are presented to show the application of the present technique.
Resumo:
The time for conducting Preventive Maintenance (PM) on an asset is often determined using a predefined alarm limit based on trends of a hazard function. In this paper, the authors propose using both hazard and reliability functions to improve the accuracy of the prediction particularly when the failure characteristic of the asset whole life is modelled using different failure distributions for the different stages of the life of the asset. The proposed method is validated using simulations and case studies.