970 results for DATASETS
Abstract:
BACKGROUND Endometriosis is a heritable common gynaecological condition influenced by multiple genetic and environmental factors. Genome-wide association studies (GWASs) have proved successful in identifying common genetic variants of moderate effects for various complex diseases. To date, eight GWAS and replication studies from multiple populations have been published on endometriosis. In this review, we investigate the consistency and heterogeneity of the results across all the studies and their implications for an improved understanding of the aetiology of the condition. METHODS Meta-analyses were conducted on four GWASs and four replication studies including a total of 11 506 cases and 32 678 controls, and on the subset of studies that investigated associations for revised American Fertility Society (rAFS) Stage III/IV including 2859 cases. The datasets included 9039 cases and 27 343 controls of European (Australia, Belgium, Italy, UK, USA) and 2467 cases and 5335 controls of Japanese ancestry. Fixed-effects and Han and Eskin random-effects models, and heterogeneity statistics (Cochran's Q test), were used to assess the evidence for the nine reported genome-wide significant loci across datasets and populations. RESULTS Meta-analysis showed that seven out of nine loci had consistent directions of effect across studies and populations, and six out of nine remained genome-wide significant (P < 5 × 10⁻⁸), including rs12700667 on 7p15.2 (P = 1.6 × 10⁻⁹), rs7521902 near WNT4 (P = 1.8 × 10⁻¹⁵), rs10859871 near VEZT (P = 4.7 × 10⁻¹⁵), rs1537377 near CDKN2B-AS1 (P = 1.5 × 10⁻⁸), rs7739264 near ID4 (P = 6.2 × 10⁻¹⁰) and rs13394619 in GREB1 (P = 4.5 × 10⁻⁸). In addition to these six loci, two showed borderline genome-wide significant associations with Stage III/IV endometriosis: rs1250248 in FN1 (P = 8 × 10⁻⁸) and rs4141819 on 2p14 (P = 9.2 × 10⁻⁸).
Two independent intergenic loci, rs4141819 and rs6734792 on chromosome 2, showed significant evidence of heterogeneity across datasets (P < 0.005). Eight of the nine loci had stronger effect sizes among Stage III/IV cases, implying that they are likely to be implicated in the development of moderate to severe, or ovarian, disease. While three of the nine loci were intergenic, the remainder were in or near genes with known functions of biological relevance to endometriosis, ranging from roles in developmental pathways to cellular growth/carcinogenesis. CONCLUSIONS Our meta-analysis shows remarkable consistency in endometriosis GWAS results across studies, with little evidence of population-based heterogeneity. The results also show that the phenotypic classifications used in GWAS to date have been limited. The stronger associations with Stage III/IV disease observed for most loci emphasize the importance of including detailed sub-phenotype information in future studies. Functional studies in relevant tissues are needed to understand the effect of the variants on downstream biological pathways.
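The fixed-effects model and Cochran's Q mentioned above can be illustrated with a minimal sketch of standard inverse-variance meta-analysis. It assumes per-study effect sizes (e.g. log odds ratios) and standard errors are already available; the function name is hypothetical, the Han and Eskin random-effects model is not implemented here, and this is not presented as the authors' pipeline.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q.

    betas: per-study effect sizes (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, two-sided P, Cochran's Q, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal P-value
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q, len(betas) - 1
```

Q is then compared with a chi-square distribution on k - 1 degrees of freedom (k studies); a small P for Q, such as the P < 0.005 reported for two loci, indicates heterogeneity across datasets.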
Abstract:
Marker ordering during linkage map construction is a critical component of QTL mapping research. In recent years, high-throughput genotyping methods have become widely used, and these methods may generate hundreds of markers for a single mapping population. This poses problems for linkage analysis software because the number of possible marker orders increases exponentially as the number of markers increases. In this paper, we tested the accuracy of linkage analyses on simulated recombinant inbred line data using the commonly used Map Manager QTX (Manly et al. 2001: Mammalian Genome 12, 930-932) software and RECORD (Van Os et al. 2005: Theoretical and Applied Genetics 112, 30-40). Accuracy was measured by calculating two scores: % correct marker positions, and a novel, weighted rank-based score derived from the sum of absolute values of true minus observed marker ranks divided by the total number of markers. The accuracy of maps generated using Map Manager QTX was considerably lower than that of maps generated using RECORD. Differences in linkage maps were often observed when marker ordering was performed several times using the same dataset. To test the effect of reducing marker numbers on the stability of marker order, we pruned marker datasets, focusing on regions consisting of tightly linked clusters of markers, which included redundant markers. Marker pruning improved the accuracy and stability of linkage maps because it produced a single unambiguous marker order that was consistent across replicate analyses. Marker pruning was also applied to a real barley mapping population, and QTL analysis was performed using the different map versions produced by the different programs. While some QTLs were identified with both map versions, there were large differences in QTL mapping results. Differences included maximum LOD and R² values at QTL peaks and map positions, thus highlighting the importance of marker order for QTL mapping.
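The two accuracy scores described above can be sketched as follows. This follows the abstract's wording literally: the exact weighting behind the "weighted" rank-based score is not given, so the sketch implements only the plain sum of absolute rank differences divided by the number of markers. Function names are hypothetical.

```python
def percent_correct_positions(true_order, observed_order):
    """Percentage of markers placed at the same position in both orders."""
    matches = sum(t == o for t, o in zip(true_order, observed_order))
    return 100.0 * matches / len(true_order)


def rank_score(true_order, observed_order):
    """Sum of |true rank - observed rank| over all markers, divided by marker count.

    0.0 means the observed order is identical to the true order.
    """
    observed_rank = {marker: i for i, marker in enumerate(observed_order)}
    total = sum(abs(i - observed_rank[marker]) for i, marker in enumerate(true_order))
    return total / len(true_order)


true_order = ["m1", "m2", "m3", "m4"]
observed = ["m1", "m3", "m2", "m4"]        # two adjacent markers swapped
print(percent_correct_positions(true_order, observed))  # 50.0
print(rank_score(true_order, observed))                 # 0.5
```

Because a linkage group read in reverse is equivalent, one might also score against the reversed observed order and keep the better value; that step is an assumption, not something the abstract states.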
Abstract:
Birds represent the most diverse extant tetrapod clade, with ca. 10,000 extant species, and the timing of the crown avian radiation remains hotly debated. The fossil record supports a primarily Cenozoic radiation of crown birds, whereas molecular divergence dating analyses generally imply that this radiation was well underway during the Cretaceous. Furthermore, substantial differences have been noted between published divergence estimates. These have been variously attributed to clock model, calibration regime, and gene type. One underappreciated phenomenon is that disparity between fossil ages and molecular dates tends to be proportionally greater for shallower nodes in the avian Tree of Life. Here, we explore potential drivers of disparity in avian divergence dates through a set of analyses applying various calibration strategies and coding methods to a mitochondrial genome dataset and an 18-gene nuclear dataset, both sampled across 72 taxa. Our analyses support the occurrence of two deep divergences (i.e., the Palaeognathae/Neognathae split and the Galloanserae/Neoaves split) well within the Cretaceous, followed by a rapid radiation of Neoaves near the K-Pg boundary. However, 95% highest posterior density intervals for most basal divergences in Neoaves cross the boundary, and we emphasize that, barring unreasonably strict prior distributions, distinguishing between a rapid Early Paleocene radiation and a Late Cretaceous radiation may be beyond the resolving power of currently favored divergence dating methods. In contrast to recent observations for placental mammals, constraining all divergences within Neoaves to occur in the Cenozoic does not result in unreasonably high inferred substitution rates. 
Comparisons of nuclear DNA (nDNA) versus mitochondrial DNA (mtDNA) datasets and NT- versus RY-coded mitochondrial data reveal patterns of disparity that are consistent with substitution model misspecifications that result in tree compression/tree extension artifacts, which may explain some discordance between previous divergence estimates based on different sequence types. Comparisons of fully calibrated and nominally calibrated trees support a correlation between body mass and apparent dating error. Overall, our results are consistent with (but do not require) a Paleogene radiation for most major clades of crown birds.
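RY coding, mentioned above for the mitochondrial data, recodes each nucleotide as a purine (R: A, G) or a pyrimidine (Y: C, T), so transitions (A↔G, C↔T) disappear and only transversions remain. The minimal sketch below shows the recoding and its effect on a simple uncorrected p-distance; the function names and the toy sequences are hypothetical illustrations, not data from the study.

```python
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(seq):
    """Recode nucleotides as purines (R) or pyrimidines (Y).

    Gaps and other symbols (e.g. N) are passed through unchanged.
    """
    return "".join(RY_CODE.get(base, base) for base in seq.upper())


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


s1, s2 = "AAGCT", "GACCT"              # one transition (A/G), one transversion (G/C)
print(p_distance(s1, s2))              # 0.4 under NT coding
print(p_distance(ry_code(s1), ry_code(s2)))  # 0.2: the transition is no longer visible
```

Discarding fast-saturating transitions in this way is one reason NT- and RY-coded analyses of the same data can yield different branch lengths and, in turn, different divergence estimates.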
Abstract:
Daily rainfall datasets of 10 years (1998-2007) from the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) version 6 and the India Meteorological Department (IMD) gridded rain gauge product have been compared over the Indian landmass, at both large and small spatial scales. On the larger spatial scale, the pattern correlation between the two datasets on daily scales during individual years of the study period ranges from 0.4 to 0.7. The correlation improved significantly (~0.9) when the study was confined to specific wet and dry spells, each of about 5-8 days. Wavelet analysis of intraseasonal oscillations (ISO) of the southwest monsoon rainfall shows that the two major modes (30-50 days and 10-20 days) contribute ~30-40% and ~5-10%, respectively, in the various years. Analysis of inter-annual variability shows that the satellite data underestimate seasonal rainfall by ~110 mm during the southwest monsoon and overestimate it by ~150 mm during the northeast monsoon season. At fine spatio-temporal scales, viz., a 1° × 1° grid, TMPA data do not correspond to ground truth. We propose here a new analysis procedure to assess the minimum spatial scale at which the two datasets are compatible with each other. This is done by studying the contribution to total seasonal rainfall from different rainfall rate windows (at 1 mm intervals) on different spatial scales (at daily time scale). The compatibility spatial scale is seen to be beyond the 5° × 5° averaging scale over the Indian landmass. This will help to decide the usability of TMPA products, if averaged at appropriate spatial scales, for specific process studies, e.g., cloud scale, mesoscale or synoptic scale.
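The comparison steps above (pattern correlation between two gridded fields, spatial averaging to coarser cells, and the share of total rainfall falling in 1 mm rate windows) can be sketched in plain Python. This is a minimal illustration under stated assumptions: fields are lists of rows on the same grid, daily rates are in mm/day, and the criterion the authors use to decide that two contribution curves are "compatible" is not reproduced. All function names are hypothetical.

```python
import math
from collections import defaultdict


def pattern_correlation(field_a, field_b):
    """Pearson correlation between two equally shaped 2-D fields (lists of rows)."""
    a = [v for row in field_a for v in row]
    b = [v for row in field_b for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def block_average(field, k):
    """Average a 2-D field over non-overlapping k x k blocks (e.g. k=5 turns 1° cells into 5° cells).

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    nrows, ncols = len(field) // k, len(field[0]) // k
    return [[sum(field[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(ncols)]
            for i in range(nrows)]


def contribution_by_rate(daily_values, width=1.0):
    """Fraction of total rainfall contributed by each rate window [n*width, (n+1)*width)."""
    totals = defaultdict(float)
    for r in daily_values:
        if r > 0:
            totals[int(r // width)] += r
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {n: v / grand for n, v in sorted(totals.items())}
```

In this setting one would compute `contribution_by_rate` for TMPA and IMD at each averaging scale and look for the smallest scale at which the two curves agree.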
Abstract:
This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. © 2014 Acoustical Society of America
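The abstract does not say which temporal alignment algorithm the registration uses. Dynamic time warping (DTW) is one common way to align two recordings of the same stimulus made at different times and rates, so the sketch below shows DTW on two 1-D trajectories (for instance, a hypothetical tongue-tip height signal from each modality, already resampled to comparable units). It covers only the temporal part; spatial registration of EMA sensors onto MR images is not shown.

```python
def dtw_path(x, y):
    """Dynamic time warping between two 1-D sequences.

    Returns (total alignment cost, list of (index_in_x, index_in_y) pairs).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in x only
                                 cost[i][j - 1],      # step in y only
                                 cost[i - 1][j - 1])  # step in both
    # Backtrack from the end, preferring diagonal moves on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([0, 1, 2], [0, 1, 1, 2])
print(total, path)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The resulting index pairs map each sample of one recording to a sample of the other, which is the kind of correspondence needed to attach high-rate EMA samples to rtMRI frames.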
Abstract:
Supporting presentation slides for the Janet network end-to-end performance initiative.
Abstract:
More and more users aim to take advantage of the existing Linked Open Data environment by formulating a query over one dataset and then processing the same query over other datasets, one after another, to obtain a broader set of answers. However, the heterogeneity of the vocabularies used in the datasets, together with the scarcity of alignments among them, makes this querying task difficult. Considering this scenario, we present a proposal that allows on-demand translation of queries formulated over an original dataset into queries expressed in the vocabulary of a target dataset. Our approach relieves users of the need to know the vocabulary used in the target datasets, and it also handles situations where alignments do not exist or are not suitable for the formulated query. Consequently, to favour the possibility of obtaining answers, a semantically equivalent translation is not always guaranteed. The core component of our proposal is a query rewriting model that considers a set of transformation rules devised from a pragmatic point of view. The feasibility of our scheme has been validated with queries from well-known benchmarks and SPARQL endpoint logs, as the results obtained confirm.
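The idea of rewriting a query into a target vocabulary, falling back to a non-equivalent translation when no alignment exists, can be sketched on a SPARQL basic graph pattern. The sketch is hypothetical and is not the authors' rule set: triple patterns are plain tuples of prefixed names, the `src:`/`tgt:` prefixes and the alignment table are invented for illustration, and the only fallback rule shown relaxes an unaligned term to a fresh variable.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def rewrite_bgp(patterns: List[Triple], alignments: Dict[str, str]) -> Tuple[List[Triple], bool]:
    """Rewrite the terms of a basic graph pattern into a target vocabulary.

    Aligned terms are replaced by their target term. Unaligned terms are relaxed
    to fresh variables (a hypothetical fallback rule), and the translation is then
    flagged as not semantically equivalent.
    """
    rewritten: List[Triple] = []
    exact = True
    fresh = 0
    for triple in patterns:
        new_terms = []
        for term in triple:
            if term.startswith("?") or term.startswith('"'):
                new_terms.append(term)              # variables and literals are kept
            elif term in alignments:
                new_terms.append(alignments[term])  # use the available alignment
            else:
                fresh += 1
                new_terms.append(f"?relaxed{fresh}")
                exact = False
        rewritten.append((new_terms[0], new_terms[1], new_terms[2]))
    return rewritten, exact


query = [("?film", "src:director", "?d"), ("?d", "src:birthPlace", '"Rome"')]
print(rewrite_bgp(query, {"src:director": "tgt:directedBy"}))
# ([('?film', 'tgt:directedBy', '?d'), ('?d', '?relaxed1', '"Rome"')], False)
```

Relaxing a predicate to a variable broadens the answer set at the cost of precision, which mirrors the trade-off the abstract describes between obtaining answers and guaranteeing semantic equivalence.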
Abstract:
MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons with other unsupervised data integration techniques, as well as with non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
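The "parameters that describe the agreement among the datasets" couple the mixture allocations of the same gene across datasets. In the MDI formulation as we understand it, the prior for a gene's allocations is proportional to the product of the mixture weights times a factor (1 + φ_kl) for every pair of datasets k, l in which the gene is assigned to the same component label. The minimal sketch below computes the resulting conditional prior over components in one dataset given the gene's allocations elsewhere; it omits the data likelihood and the inference of π and φ, and the function name is hypothetical.

```python
def mdi_conditional_prior(c_others, pi_k, phis):
    """Conditional prior over components j in dataset k for one gene.

    c_others: the gene's component labels in the other datasets
    pi_k:     mixture weights of dataset k (one per component)
    phis:     agreement parameters phi_kl, one per other dataset (phi >= 0)

    Weight for component j is pi_k[j] * prod over l of (1 + phi_kl if c_l == j else 1).
    """
    weights = []
    for j, pi in enumerate(pi_k):
        w = pi
        for c_l, phi in zip(c_others, phis):
            if c_l == j:
                w *= 1.0 + phi          # reward agreement with dataset l
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


# Gene sits in component 0 of another dataset; strong agreement (phi = 3) pulls it towards 0.
print(mdi_conditional_prior([0], [0.5, 0.5], [3.0]))  # [0.8, 0.2]
```

With φ = 0 the datasets are modelled independently; larger φ values encode stronger shared clustering structure, which is how MDI can capture structural similarity between datasets.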
Abstract:
Long-term biological time-series in the oceans are relatively rare. Using the two longest of these, we show how the information value of such ecological time-series, in terms of their potential policy value, increases through space and time. We also explore the co-evolution of these oceanic biological time-series with changing marine management drivers. Lessons learnt from reviewing these sequences of observations provide valuable context for the continuation of existing time-series and perspective for the initiation of new time-series in response to rapid global change. Concluding sections call for a more integrated approach to marine observation systems and highlight the future role of ocean observations in adaptive marine management.
Abstract:
The use of in situ measurements is essential in the validation and evaluation of the algorithms that provide coastal water quality data products from ocean colour satellite remote sensing. Over the past decade, various types of ocean colour algorithms have been developed to deal with the optical complexity of coastal waters. Yet a comprehensive intercomparison is lacking, owing to the limited availability of quality-checked in situ databases. The CoastColour Round Robin (CCRR) project, funded by the European Space Agency (ESA), was designed to bring together three reference data sets and to use them to test algorithms and assess their accuracy for retrieving water quality parameters. This paper provides a detailed description of these reference data sets, which include the Medium Resolution Imaging Spectrometer (MERIS) level 2 match-ups, in situ reflectance measurements, and synthetic data generated by a radiative transfer model (HydroLight). These data sets, representing mainly coastal waters, are available from doi:10.1594/PANGAEA.841950. The data sets mainly consist of 6484 marine reflectances (either multispectral or hyperspectral) associated with various geometrical (sensor viewing and solar angles) and sky conditions and water constituents: total suspended matter (TSM) and chlorophyll a (CHL) concentrations, and the absorption of coloured dissolved organic matter (CDOM). Inherent optical properties are also provided in the simulated data sets (5000 simulations) and from 3054 match-up locations. The distributions of reflectance at selected MERIS bands and band ratios, CHL and TSM as a function of reflectance, from the three data sets are compared. Match-up and in situ sites where deviations occur are identified.
The distributions of the three reflectance data sets are also compared to the simulated and in situ reflectances used previously by the International Ocean Colour Coordinating Group (IOCCG, 2006) for algorithm testing, showing a clear extension of the CCRR data which covers more turbid waters.
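Identifying match-up sites where satellite and in situ reflectances deviate can be sketched with simple paired statistics. This is an illustrative sketch only: the abstract does not state which metrics or thresholds CCRR used, so the bias/RMSE/relative-difference choice, the 30% threshold and the function names are all assumptions.

```python
import math


def matchup_stats(in_situ, satellite):
    """Bias, RMSE and absolute relative differences for paired reflectances at one band."""
    diffs = [s - i for i, s in zip(in_situ, satellite)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    rel = [abs(d) / i for d, i in zip(diffs, in_situ)]
    return bias, rmse, rel


def flag_deviations(site_ids, rel_diffs, threshold=0.3):
    """Return the sites whose relative difference exceeds the (hypothetical) threshold."""
    return [site for site, r in zip(site_ids, rel_diffs) if r > threshold]


bias, rmse, rel = matchup_stats([0.01, 0.02, 0.04], [0.011, 0.02, 0.06])
print(flag_deviations(["a", "b", "c"], rel))  # ['c']
```

In practice such statistics would be computed per MERIS band (and for band ratios), so that sites deviating at specific wavelengths can be singled out.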