116 resultados para height partition clustering
Resumo:
SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.
Resumo:
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Resumo:
Most information in linkage analysis for quantitative traits comes from pairs of relatives that are phenotypically most discordant or concordant. Confounding this, within-family outliers from non-genetic causes may create false positives and negatives. We investigated the influence of within-family outliers empirically, using one of the largest genome-wide linkage scans for height. The subjects were drawn from Australian twin cohorts consisting of 8447 individuals in 2861 families, providing a total of 5815 possible pairs of siblings in sibships. A variance component linkage analysis was performed, either including or excluding the within-family outliers. Using the entire dataset, the largest LOD scores were on chromosome 15q (LOD 2.3) and 11q (1.5). Excluding within-family outliers increased the LOD score for most regions, but the LOD score on chromosome 15 decreased from 2.3 to 1.2, suggesting that the outliers may create false negatives and false positives, although rare alleles of large effect may also be an explanation. Several regions suggestive of linkage to height were found after removing the outliers, including 1q23.1 (2.0), 3q22.1 (1.9) and 5q32 (2.3). We conclude that the investigation of the effect of within-family outliers, which is usually neglected, should be a standard quality control measure in linkage analysis for complex traits and may reduce the noise for the search of common variants of modest effect size as well as help identify rare variants of large effect and clinical significance. We suggest that the effect of within-family outliers deserves further investigation via theoretical and simulation studies.
Resumo:
Objective To investigate the epidemic characteristics of human cutaneous anthrax (CA) in China, detect the spatiotemporal clusters at the county level for preemptive public health interventions, and evaluate the differences in the epidemiological characteristics within and outside clusters. Methods CA cases reported during 2005–2012 from the national surveillance system were evaluated at the county level using space-time scan statistic. Comparative analysis of the epidemic characteristics within and outside identified clusters was performed using using the χ2 test or Kruskal-Wallis test. Results The group of 30–39 years had the highest incidence of CA, and the fatality rate increased with age, with persons ≥70 years showing a fatality rate of 4.04%. Seasonality analysis showed that most of CA cases occurred between May/June and September/October of each year. The primary spatiotemporal cluster contained 19 counties from June 2006 to May 2010, and it was mainly located straddling the borders of Sichuan, Gansu, and Qinghai provinces. In these high-risk areas, CA cases were predominantly found among younger, local, males, shepherds, who were living on agriculture and stockbreeding and characterized with high morbidity, low mortality and a shorter period from illness onset to diagnosis. Conclusion CA was geographically and persistently clustered in the Southwestern China during 2005–2012, with notable differences in the epidemic characteristics within and outside spatiotemporal clusters; this demonstrates the necessity for CA interventions such as enhanced surveillance, health education, mandatory and standard decontamination or disinfection procedures to be geographically targeted to the areas identified in this study.
Resumo:
The family of location and scale mixtures of Gaussians has the ability to generate a number of flexible distributional forms. The family nests as particular cases several important asymmetric distributions like the Generalized Hyperbolic distribution. The Generalized Hyperbolic distribution in turn nests many other well known distributions such as the Normal Inverse Gaussian. In a multivariate setting, an extension of the standard location and scale mixture concept is proposed into a so called multiple scaled framework which has the advantage of allowing different tail and skewness behaviours in each dimension with arbitrary correlation between dimensions. Estimation of the parameters is provided via an EM algorithm and extended to cover the case of mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape.
Resumo:
We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
Resumo:
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations including: lighting and background changes, occlusions, changes in expression and make up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus; and incorporate this into a novel two stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues; after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
Resumo:
The maximum height of a siphon is generally assumed to be dependent on barometric pressure—about 10 m at sea level. This limit arises because the pressure in a siphon above the upper reservoir level is below the ambient pressure, and when the height of a siphon approaches 10 m, the pressure at the crown of the siphon falls below the vapour pressure of water causing water to boil breaking the column. After breaking, the columns on either side are supported by differential pressure between ambient and the low-pressure region at the top of the siphon. Here we report an experiment of a siphon operating at sea level at a height of 15 m, well above 10 m. Prior degassing of the water prevented cavitation. This experiment provides conclusive evidence that siphons operate through gravity and molecular cohesion.
Resumo:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
Resumo:
Background Epidemiological studies suggest a potential role for obesity and determinants of adult stature in prostate cancer risk and mortality, but the relationships described in the literature are complex. To address uncertainty over the causal nature of previous observational findings, we investigated associations of height- and adiposity-related genetic variants with prostate cancer risk and mortality. Methods We conducted a case–control study based on 20,848 prostate cancers and 20,214 controls of European ancestry from 22 studies in the PRACTICAL consortium. We constructed genetic risk scores that summed each man’s number of height and BMI increasing alleles across multiple single nucleotide polymorphisms robustly associated with each phenotype from published genome-wide association studies. Results The genetic risk scores explained 6.31 and 1.46 % of the variability in height and BMI, respectively. There was only weak evidence that genetic variants previously associated with increased BMI were associated with a lower prostate cancer risk (odds ratio per standard deviation increase in BMI genetic score 0.98; 95 % CI 0.96, 1.00; p = 0.07). Genetic variants associated with increased height were not associated with prostate cancer incidence (OR 0.99; 95 % CI 0.97, 1.01; p = 0.23), but were associated with an increase (OR 1.13; 95 % CI 1.08, 1.20) in prostate cancer mortality among low-grade disease (p heterogeneity, low vs. high grade <0.001). Genetic variants associated with increased BMI were associated with an increase (OR 1.08; 95 % CI 1.03, 1.14) in all-cause mortality among men with low-grade disease (p heterogeneity = 0.03). Conclusions We found little evidence of a substantial effect of genetically elevated height or BMI on prostate cancer risk, suggesting that previously reported observational associations may reflect common environmental determinants of height or BMI and prostate cancer risk. Genetically elevated height and BMI were associated with increased mortality (prostate cancer-specific and all-cause, respectively) in men with low-grade disease, a potentially informative but novel finding that requires replication.
Resumo:
Agricultural pests are responsible for millions of dollars in crop losses and management costs every year. In order to implement optimal site-specific treatments and reduce control costs, new methods to accurately monitor and assess pest damage need to be investigated. In this paper we explore the combination of unmanned aerial vehicles (UAV), remote sensing and machine learning techniques as a promising technology to address this challenge. The deployment of UAVs as a sensor platform is a rapidly growing field of study for biosecurity and precision agriculture applications. In this experiment, a data collection campaign is performed over a sorghum crop severely damaged by white grubs (Coleoptera: Scarabaeidae). The larvae of these scarab beetles feed on the roots of plants, which in turn impairs root exploration of the soil profile. In the field, crop health status could be classified according to three levels: bare soil where plants were decimated, transition zones of reduced plant density and healthy canopy areas. In this study, we describe the UAV platform deployed to collect high-resolution RGB imagery as well as the image processing pipeline implemented to create an orthoimage. An unsupervised machine learning approach is formulated in order to create a meaningful partition of the image into each of the crop levels. The aim of the approach is to simplify the image analysis step by minimizing user input requirements and avoiding the manual data labeling necessary in supervised learning approaches. The implemented algorithm is based on the K-means clustering algorithm. In order to control high-frequency components present in the feature space, a neighbourhood-oriented parameter is introduced by applying Gaussian convolution kernels prior to K-means. The outcome of this approach is a soft K-means algorithm similar to the EM algorithm for Gaussian mixture models. The results show the algorithm delivers decision boundaries that consistently classify the field into three clusters, one for each crop health level. The methodology presented in this paper represents a venue for further research towards automated crop damage assessments and biosecurity surveillance.