917 results for model-based clustering


Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
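
The idea of coding ambiguous bases with IUPAC symbols can be illustrated with a minimal Python sketch. It assumes that per-base probabilities for A, C, G and T are already available (for example from a clustering step) and applies a simple cumulative-probability rule. The function name `call_base` and the 0.95 threshold are hypothetical choices for illustration, not the rule used by the authors or their R package.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(probs, threshold=0.95):
    """Return the IUPAC symbol for the smallest set of most probable
    bases whose combined probability reaches `threshold`.

    `probs` maps each of "A", "C", "G", "T" to a probability.
    The threshold rule is an assumption made for this sketch.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for base, p in ranked:
        chosen.append(base)
        total += p
        if total >= threshold:
            break
    # If the threshold is never reached, all four bases are chosen ("N").
    return IUPAC[frozenset(chosen)]
```

With this rule, a confident call stays a single letter, while a position split mainly between A and G is reported as the purine code R instead of being discarded.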

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present the first approach to the genetic diversity and structure of the Balearic toad (Bufo balearicus Boettger, 1880) for the island of Menorca. Forty-one individuals from 21 localities were analyzed for ten microsatellite loci. We used geo-referenced individual multilocus genotypes and a model-based clustering method for the inference of the number of populations and of the spatial location of genetic discontinuities between those populations. Only six of the microsatellites analyzed were polymorphic. We revealed a northwestern area inhabited by a single population with several well-connected localities and another set of populations in the southeast that includes a few unconnected small units with genetically significant differences among them as well as with the individuals from the northwest of the island. The observed fragmentation may be explained by shifts from agricultural to tourism practices that have been taking place on the island of Menorca since the 1960s. The abandonment of rural activities in favor of urbanization and concomitant service areas has mostly affected the southeast of the island and is currently threatening the overall geographic connectivity between the different farming areas of the island that are inhabited by the Balearic toad.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The objective of this work was to assess the genetic diversity and population structure of wheat genotypes, to detect significant and stable genetic associations, as well as to evaluate the efficiency of statistical models to identify chromosome regions responsible for the expression of spike-related traits. Eight important spike characteristics were measured during five growing seasons in Serbia. A set of 30 microsatellite markers positioned near important agronomic loci was used to evaluate genetic diversity, resulting in a total of 349 alleles. The marker-trait associations were analyzed using the general linear and mixed linear models. The results obtained for number of allelic variants per locus (11.5), average polymorphic information content value (0.68), and average gene diversity (0.722) showed that the exceptional level of polymorphism in the genotypes is the main requirement for association studies. The population structure estimated by model-based clustering distributed the genotypes into six subpopulations according to log probability of data. Significant and stable associations were detected on chromosomes 1B, 2A, 2B, 2D, and 6D, which explained from 4.7 to 40.7% of total phenotypic variations. The general linear model identified a significantly larger number of marker-trait associations (192) than the mixed linear model (76). The mixed linear model identified nine markers associated with six traits.
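
The two per-locus diversity statistics quoted above, gene diversity and polymorphic information content (PIC), can be computed from allele frequencies as in the following Python sketch. It uses the commonly cited textbook definitions; the abstract does not say which estimator or software the authors used, so the formulas and the function names `gene_diversity` and `pic` are assumptions made for illustration.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content at one locus:
    1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Example: a locus with two equally frequent alleles.
two_allele_locus = [0.5, 0.5]
print(gene_diversity(two_allele_locus), pic(two_allele_locus))
```

PIC is never larger than gene diversity for the same locus, which is consistent with the averages reported in the abstract (0.68 against 0.722).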

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Abstract Background: The amount and structure of genetic diversity in dessert apple germplasm conserved at a European level is mostly unknown, since all diversity studies conducted in Europe until now have been performed on regional or national collections. Here, we applied a common set of 16 SSR markers to genotype more than 2,400 accessions across 14 collections representing three broad European geographic regions (North+East, West and South) with the aim of analyzing the extent, distribution and structure of variation in the apple genetic resources in Europe. Results: A Bayesian model-based clustering approach showed that diversity was organized in three groups, although these were only moderately differentiated (FST=0.031). A nested Bayesian clustering approach allowed identification of subgroups which revealed internal patterns of substructure within the groups, allowing a finer delineation of the variation into eight subgroups (FST=0.044). The first level of stratification revealed an asymmetric division of the germplasm among the three groups, and a clear association was found with the geographical regions of origin of the cultivars. The substructure revealed clear partitioning of genetic groups among countries, but also interesting associations between subgroups and breeding purposes of recent cultivars or particular usage such as cider production. Additional parentage analyses allowed us to identify both putative parents of more than 40 old and/or local cultivars, giving interesting insights into the pedigree of some emblematic cultivars. Conclusions: The variation found at group and sub-group levels may reflect a combination of historical processes of migration/selection and adaptive factors to diverse agricultural environments that, together with genetic drift, have resulted in extensive genetic variation but limited population structure.
The European dessert apple germplasm represents an important source of genetic diversity with a strong historical and patrimonial value. The present work thus constitutes a decisive step in the field of conservation genetics. Moreover, the obtained data can be used for defining a European apple core collection useful for further identification of genomic regions associated with commercially important horticultural traits in apple through genome-wide association studies.
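
The FST values quoted in the abstract measure how much of the total genetic diversity lies between groups. As a rough illustration, the Python sketch below computes the classical heterozygosity-based form, FST = (HT - HS) / HT, for one locus with equally weighted groups. The abstract does not state which estimator was used, so this formula, the equal weighting, and the function names are assumptions.

```python
def expected_het(freqs):
    """Expected heterozygosity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(group_freqs):
    """Single-locus FST = (HT - HS) / HT with equally weighted groups.

    `group_freqs` holds one allele-frequency list per group, with
    alleles in the same order in every list. The locus must be
    polymorphic overall, otherwise HT is zero.
    """
    n = len(group_freqs)
    # HS: mean expected heterozygosity within groups.
    h_s = sum(expected_het(f) for f in group_freqs) / n
    # HT: expected heterozygosity of the pooled (mean) frequencies.
    pooled = [sum(col) / n for col in zip(*group_freqs)]
    h_t = expected_het(pooled)
    return (h_t - h_s) / h_t


# Two groups whose allele frequencies differ only slightly.
print(fst([[0.6, 0.4], [0.4, 0.6]]))
```

In this toy case the two groups differ by 0.2 in allele frequency and FST is about 0.04, the same order as the values reported for the apple groups and subgroups.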

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Raccoons are the reservoir for the raccoon rabies virus variant in the United States. To combat this threat, oral rabies vaccination (ORV) programs are conducted in many eastern states. To aid in these efforts, the genetic structure of raccoons (Procyon lotor) was assessed in southwestern Pennsylvania to determine if select geographic features (i.e., ridges and valleys) serve as corridors or hindrances to raccoon gene flow (e.g., movement) and, therefore, rabies virus trafficking in this physiographic region. Raccoon DNA samples (n = 185) were collected from one ridge site and two adjacent valleys in southwestern Pennsylvania (Westmoreland, Cambria, Fayette, and Somerset counties). Raccoon genetic structure within and among these study sites was characterized at nine microsatellite loci. Results indicated that there was little population subdivision among any sites sampled. Furthermore, analyses using a model-based clustering approach indicated one essentially panmictic population was present among all the raccoons sampled over a reasonably broad geographic area (e.g., sites up to 36 km apart). However, a signature of isolation by distance was detected, suggesting that widths of ORV zones are critical for success. Combined, these data indicate that geographic features within this landscape influence raccoon gene flow only to a limited extent, suggesting that ridges of this physiographic system will not provide substantial long-term natural barriers to rabies virus trafficking. These results may be of value for future ORV efforts in Pennsylvania and other eastern states with similar landscapes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Optimal currency area theory suggests that business cycle comovement is a sufficient condition for monetary union, particularly if there are low levels of labour mobility between potential members of the monetary union. Previous studies of comovement of business cycle variables (mainly authored by Artis and Zhang in the late 1990s) found that there was a core of member states in the EU that could be grouped together as having similar business cycle comovements, but these studies always used Germany as the country against which to compare. In this study, the analysis of Artis and Zhang is extended and updated by correlating against both German and euro area macroeconomic aggregates and by using more recent techniques in cluster analysis, namely model-based clustering techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Genetic diversity and population structure were investigated across the core range of Tasmanian devils (Sarcophilus laniarius; Dasyuridae), a wide-ranging marsupial carnivore restricted to the island of Tasmania. Heterozygosity (0.386-0.467) and allelic diversity (2.7-3.3) were low in all subpopulations and allelic size ranges were small and almost continuous, consistent with a founder effect. Island effects and repeated periods of low population density may also have contributed to the low variation. Within continuous habitat, gene flow appears extensive up to 50 km (high assignment rates to source or close neighbour populations; nonsignificant values of pairwise F-ST), in agreement with movement data. At larger scales (150-250 km), gene flow is reduced (significant pairwise F-ST) but there is no evidence for isolation by distance. The most substantial genetic structuring was observed for comparisons spanning unsuitable habitat, implying limited dispersal of devils between the well-connected, eastern populations and a smaller northwestern population. The genetic distinctiveness of the northwestern population was reflected in all analyses: unique alleles; multivariate analyses of gene frequency (multidimensional scaling, minimum spanning tree, nearest neighbour); high self-assignment (95%); two distinct populations for Tasmania were detected in isolation by distance and in Bayesian model-based clustering analyses. Marsupial carnivores appear to have stronger population subdivisions than their placental counterparts.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions; each of these is then transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

PURPOSE: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. METHODS: The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. RESULTS: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. CONCLUSION: These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research.
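
Mixture analysis of a single variable such as age of onset can be sketched in Python as follows: fit Gaussian mixtures with different numbers of components by expectation-maximization (EM) and compare them with an information criterion. This is a generic illustration and not the study's procedure. The use of BIC, the quantile-based initialization, the simulated data, and the names `fit_mixture` and `bic` are all assumptions.

```python
import math
import random


def fit_mixture(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (log-likelihood, component means). The log-likelihood is
    the one evaluated in the E-step of the final iteration.
    """
    xs = sorted(xs)
    n = len(xs)
    # Start the means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = sum((x - centre) ** 2 for x in xs) / n
    variances = [spread] * k
    weights = [1.0 / k] * k
    loglik = 0.0
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp, loglik = [], 0.0
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v))
                / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapse
    return loglik, means


def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (means, variances, weights)."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


if __name__ == "__main__":
    # Simulated data only: two hypothetical onset groups.
    random.seed(0)
    ages = ([random.gauss(18, 3) for _ in range(150)]
            + [random.gauss(40, 5) for _ in range(150)])
    for k in (1, 2, 3):
        ll, mu = fit_mixture(ages, k)
        print(k, round(bic(ll, k, len(ages)), 1), [round(m, 1) for m in mu])
```

The number of components with the lowest BIC is taken as the number of subgroups. Running the same comparison on residuals from a regression adjustment instead of raw values mirrors the kind of analysis the abstract describes, where adjusting for birth cohort changed the number of subgroups found.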

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data-groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean kids.
Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)