935 results for complex data
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions across the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting associations with common variants but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, shifting the paradigm of association analysis from the current locus-by-locus analysis to collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a genome segment or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study. This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
Abstract:
When choosing among models to describe categorical data, the need to consider interactions makes selection more difficult. With just four variables, considering all interactions, there are 166 different hierarchical models and many more non-hierarchical models. Two procedures have been developed for categorical data which produce the "best" subset or subsets of each model size, where size refers to the number of effects in the model. Both procedures are patterned after the Leaps and Bounds approach used by Furnival and Wilson for continuous data and do not generally require fitting all models. For hierarchical models, likelihood ratio statistics ($G^2$) are computed using iterative proportional fitting, and "best" is determined by comparing, among models with the same number of effects, $\Pr(\chi^2_k \ge G^2_{ij})$, where $k$ is the degrees of freedom for the $i$th model of size $j$. To fit non-hierarchical as well as hierarchical models, a weighted least squares procedure has been developed. The procedures are applied to published occupational data relating to the occurrence of byssinosis, and the results are compared to previously published analyses of the same data. The procedures are also applied to published data on symptoms in psychiatric patients and again compared to previously published analyses. These procedures will make categorical data analysis more accessible to researchers who are not statisticians. They should also encourage more complex exploratory analyses of epidemiologic data and contribute to the development of new hypotheses for study.
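The likelihood ratio step described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible case, the independence model for a two-way table, and uses hypothetical counts; the function names are illustrative and are not taken from the dissertation, whose subset-selection procedures are not reproduced here.

```python
import math

def ipf_independence(table, sweeps=10):
    """Fit the independence model to a two-way table by iterative proportional fitting."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    fitted = [[1.0] * n_cols for _ in range(n_rows)]  # start from a flat table
    for _ in range(sweeps):
        # Rescale each row so the fitted row margins match the observed ones.
        for i in range(n_rows):
            scale = row_totals[i] / sum(fitted[i])
            fitted[i] = [value * scale for value in fitted[i]]
        # Rescale each column so the fitted column margins match the observed ones.
        for j in range(n_cols):
            scale = col_totals[j] / sum(fitted[i][j] for i in range(n_rows))
            for i in range(n_rows):
                fitted[i][j] *= scale
    return fitted

def g_squared(observed, fitted):
    """Likelihood ratio statistic G^2 = 2 * sum(obs * log(obs / fitted))."""
    total = 0.0
    for obs_row, fit_row in zip(observed, fitted):
        for obs, fit in zip(obs_row, fit_row):
            if obs > 0:  # empty cells contribute nothing to the sum
                total += obs * math.log(obs / fit)
    return 2.0 * total

observed = [[30, 10], [20, 40]]  # hypothetical counts, not data from the study
fitted = ipf_independence(observed)
g2 = g_squared(observed, fitted)
# Upper tail of the chi-square distribution; this closed form holds for 1 df only.
p_value = math.erfc(math.sqrt(g2 / 2.0))
print(fitted, round(g2, 2), p_value)
```

For larger hierarchical models the same margin-matching loop runs over each fitted marginal table in turn, and the tail probability needs a general chi-square routine rather than the one-degree-of-freedom shortcut used here.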
Abstract:
These Data Management Plans are more comprehensive and complex than in the past. Libraries around the nation are trying to put together tools to help researchers write plans that conform to the new requirements. This session will look at some of these tools.
Abstract:
Genome-wide association studies (GWAS) have successfully identified several genetic loci associated with inherited predisposition to primary biliary cirrhosis (PBC), the most common autoimmune disease of the liver. Pathway-based tests constitute a novel paradigm for GWAS analysis. By evaluating genetic variation across a biological pathway (gene set), these tests have the potential to determine the collective impact of variants with subtle effects that are individually too weak to be detected in traditional single-variant GWAS analysis. To identify biological pathways associated with the risk of developing PBC, GWAS of PBC from Italy (449 cases and 940 controls) and Canada (530 cases and 398 controls) were analyzed independently. The linear combination test (LCT), a recently developed pathway-level statistical method, was used for this analysis. For additional validation, pathways that were replicated at the P<0.05 level of significance in both GWAS on LCT analysis were also tested for association with PBC in each dataset using two complementary GWAS pathway approaches: a modification of the gene set enrichment analysis algorithm (i-GSEA4GWAS) and Fisher's exact test for pathway enrichment ratios. Twenty-five pathways were associated with PBC risk on LCT analysis in the Italian dataset at P<0.05, of which eight had an FDR<0.25. The top pathway in the Italian dataset was the TNF/stress related signaling pathway (p=7.38×10⁻⁴, FDR=0.18). Twenty-six pathways were associated with PBC at the P<0.05 level using the LCT in the Canadian dataset, with the regulation and function of ChREBP in liver pathway (p=5.68×10⁻⁴, FDR=0.285) emerging as the most significant. Two pathways, phosphatidylinositol signaling system (Italian: p=0.016, FDR=0.436; Canadian: p=0.034, FDR=0.693) and hedgehog signaling (Italian: p=0.044, FDR=0.636; Canadian: p=0.041, FDR=0.693), were replicated at LCT P<0.05 in both datasets. 
Statistically significant association of both pathways with PBC genetic susceptibility was confirmed in the Italian dataset on i-GSEA4GWAS. Results for the phosphatidylinositol signaling system were also significant in both datasets on applying Fisher's exact test for pathway enrichment ratios. This study identified a combination of known and novel pathway-level associations with PBC risk. If functionally validated, the findings may yield fresh insights into the etiology of this complex autoimmune disease, with possible preventive and therapeutic applications.
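As a rough illustration of the Fisher's exact test mentioned in this abstract, the sketch below computes a one-sided enrichment p-value from the hypergeometric distribution. The abstract does not say how the 2×2 table was built, so the framing (genes in a pathway versus genes flagged as significant) and all counts are assumptions, and `enrichment_p_value` is a hypothetical helper name.

```python
from math import comb

def enrichment_p_value(overlap, pathway_size, hits, total_genes):
    """One-sided Fisher exact p-value: P(X >= overlap) under the hypergeometric null.

    overlap      -- flagged genes that fall inside the pathway
    pathway_size -- genes in the pathway
    hits         -- flagged genes overall
    total_genes  -- genes in the background set
    """
    largest = min(pathway_size, hits)
    denominator = comb(total_genes, hits)
    tail = sum(
        comb(pathway_size, k) * comb(total_genes - pathway_size, hits - k)
        for k in range(overlap, largest + 1)
    )
    return tail / denominator

# Hypothetical counts: 20 background genes, 5 in the pathway, 4 flagged, 3 of those in the pathway.
p = enrichment_p_value(overlap=3, pathway_size=5, hits=4, total_genes=20)
print(p)
```

The one-sided tail is used because enrichment asks only whether the overlap is larger than chance would give; a two-sided test would also count depletion.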
Abstract:
Objective: The primary objective of this project was to describe the efficacy of the Levonorgestrel Intrauterine Device (LIUD) for treatment of Complex Atypical Hyperplasia (CAH) and Grade 1 Endometrioid Endometrial Cancer (G1EEC) in terms of the rate of Complete Response (CR) and Partial Response (PR) after 6 months of therapy. We also assessed whether any clinical or pathologic features were associated with response to the LIUD.
Methods: This study was a retrospective case series designed to report the response rate of patients with CAH or G1EEC treated with LIUD therapy. In addition, the study has a laboratory component to assess molecular predictors of response to LIUD therapy. Retrospective data already collected from patients diagnosed with CAH or G1EEC and treated with LIUD therapy at MD Anderson Cancer Center (MDACC) were used. Patients from all ethnic and racial groups were included. A Complete Response (CR) was defined in patients diagnosed with CAH if the pathology report at 6 months demonstrated either no evidence of hyperplasia or no atypia in the setting of simple or complex hyperplasia. Partial Response (PR) was recorded if disease was downgraded from G1EEC to CAH only. No Response (NR) was recorded if the pathology report demonstrated no change (Stable Disease, SD) or progression to cancer (Progressive Disease, PD). We calculated the proportion of patients with complete response to LIUD therapy with a 95% confidence interval. We compared the response rates (CR/PR vs NR) by obesity status (obese if BMI > 40 kg/m² vs non-obese if BMI ≤ 40 kg/m²) as well as by other clinical and pathologic factors, such as age, uterine size (median size), and presence of exogenous progesterone effect.
Results: There were 39 patients diagnosed with either CAH or G1EEC and treated with the LIUD. Of these, 12 did not have pathology results from a biopsy at the 6-month time point. Of the 27 evaluable patients, 17 were diagnosed with CAH and 10 with G1EEC. The overall response rate (RR) was 78% (95% CI = 62-94%) at 6 months: 18 patients had CR (4 in G1EEC; 14 in CAH), 3 had PR (3 in G1EEC), 3 had SD (1 in CAH; 2 in G1EEC), and 3 had PD (2 in CAH; 1 in G1EEC). After stratification by histology, RR at 6 months was 82.35% (14/17; 95% CI = 67.4-97.3%) in CAH and 70% (7/10; 95% CI = 41-98.4%) in G1EEC. There was no difference between response (R) and no response (NR) based on BMI (p=0.56). We observed a trend toward an association between age and response (p=0.1). There was no association between uterine size and response to therapy (p=0.17). We recorded a strong association between exogenous progesterone effect and response.
Conclusion: LIUD therapy for the treatment of CAH and G1EEC may be effective and safe. Presence of exogenous progesterone effect may predict the response to LIUD therapy at earlier time points. Further studies with larger sample sizes are needed to explore the relationship between response and other clinical and pathologic factors.
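The response rates and intervals reported above can be checked with a short sketch. The abstract does not state which interval method was used; the normal-approximation (Wald) interval below is an assumption, and `wald_ci` is an illustrative helper name. The counts are those given in the Results.

```python
import math

def wald_ci(successes, n, z=1.959964):
    """Proportion with a normal-approximation (Wald) confidence interval.

    z defaults to the two-sided 95% normal quantile. The interval is
    clipped to [0, 1] because the approximation can overshoot.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Counts taken from the Results: 21 of 27 responders overall, 7 of 10 in G1EEC.
overall = wald_ci(21, 27)
g1eec = wald_ci(7, 10)
for label, (rate, low, high) in (("overall", overall), ("G1EEC", g1eec)):
    print(f"{label}: {rate:.1%} ({low:.1%} to {high:.1%})")
```

With samples this small, the Wald interval is known to be unreliable near 0 or 1, so a Wilson or exact (Clopper-Pearson) interval is often preferred in practice.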
Abstract:
My dissertation focuses on developing methods for detecting gene-gene and gene-environment interactions and imprinting effects in human complex diseases and quantitative traits. It includes three sections: (1) generalizing the coding technique of the Natural and Orthogonal Interactions (NOIA) model, originally developed for gene-gene (GxG) interaction, and extending it to reduced models; (2) developing a novel statistical approach that allows for modeling gene-environment (GxE) interactions influencing disease risk; and (3) developing a statistical approach for modeling genetic variants displaying parent-of-origin effects (POEs), such as imprinting. In the past decade, genetic researchers have identified a large number of causal variants for human genetic diseases and traits by single-locus analysis, and interaction has now become a hot topic in the effort to uncover the complex network of multiple genes and environmental exposures contributing to an outcome. Epistasis, also known as gene-gene interaction, is the departure from additivity of the genetic effects of several genes on a trait, which means that the same alleles of one gene can display different genetic effects under different genetic backgrounds. In this study, we propose to implement the NOIA model for association studies, including interaction, of human complex traits and diseases. We compare the performance of the new statistical models we developed with that of the usual functional model through both a simulation study and real data analysis. Both revealed higher power of the NOIA GxG interaction model for detecting main genetic effects and interaction effects. Through application to a melanoma dataset, we confirmed the previously identified significant regions for melanoma risk at 15q13.1, 16q24.3 and 9p21.3. We also identified potential interactions with these significant regions that contribute to melanoma risk. 
Based on the NOIA model, we developed a novel statistical approach that allows us to model the effects of a genetic factor and a binary environmental exposure that jointly influence disease risk. Both simulation and real data analyses revealed higher power of the NOIA model for detecting main genetic effects and interaction effects for both quantitative and binary traits. We also found that estimates of the parameters from logistic regression for binary traits are no longer statistically uncorrelated under the alternative model when there is an association. Applying our novel approach to a lung cancer dataset, we confirmed four SNPs in the 5p15 and 15q25 regions to be significantly associated with lung cancer risk in the Caucasian population: rs2736100, rs402710, rs16969968 and rs8034191. We also validated that rs16969968 and rs8034191 in the 15q25 region interact significantly with smoking in the Caucasian population. Our approach identified a potential interaction of SNP rs2256543 in 6p21 with smoking in contributing to lung cancer risk. Genetic imprinting is the best-known cause of the parent-of-origin effect (POE), whereby a gene is differentially expressed depending on the parental origin of the same alleles. Genetic imprinting affects several human disorders, including diabetes, breast cancer, alcoholism, and obesity. This phenomenon has been shown to be important for normal embryonic development in mammals. Traditional association approaches ignore this important genetic phenomenon. In this study, we propose a NOIA framework for a single-locus association study that estimates both main allelic effects and POEs. We develop statistical (Stat-POE) and functional (Func-POE) models, and demonstrate conditions for orthogonality of the Stat-POE model. We conducted simulations for both quantitative and qualitative traits to evaluate the performance of the statistical and functional models with different levels of POEs. 
Our results showed that the newly proposed Stat-POE model, which ensures orthogonality of variance components when Hardy-Weinberg equilibrium (HWE) holds or when the minor and major allele frequencies are equal, had greater power for detecting the main allelic additive effect than the Func-POE model, which codes according to allelic substitutions, for both quantitative and qualitative traits. The power for detecting the POE was the same for the Stat-POE and Func-POE models under HWE for quantitative traits.
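The contrast between statistical and functional coding drawn in this abstract can be illustrated for the simpler case of additive and dominance effects at one biallelic locus. This is a sketch under stated assumptions: the frequency-weighted codes below are the ones usually given for the NOIA statistical formulation, the functional codes are the conventional (-1, 0, 1) and (0, 1, 0), the genotype frequencies are hypothetical, and none of this reproduces the dissertation's Stat-POE or Func-POE models.

```python
def statistical_coding(p_aa, p_ab, p_bb):
    """Frequency-weighted additive and dominance codes for genotypes aa, ab, bb.

    Assumed form of the NOIA statistical coding; it needs no Hardy-Weinberg assumption.
    """
    mean_count = p_ab + 2.0 * p_bb  # mean number of b alleles
    additive = [0.0 - mean_count, 1.0 - mean_count, 2.0 - mean_count]
    scale = p_aa + p_bb - (p_aa - p_bb) ** 2
    dominance = [
        -2.0 * p_ab * p_bb / scale,
        4.0 * p_aa * p_bb / scale,
        -2.0 * p_aa * p_ab / scale,
    ]
    return additive, dominance

def weighted_covariance(freqs, x, y):
    """Covariance of two genotype codes under the given genotype frequencies."""
    mean_x = sum(f * a for f, a in zip(freqs, x))
    mean_y = sum(f * b for f, b in zip(freqs, y))
    return sum(f * (a - mean_x) * (b - mean_y) for f, a, b in zip(freqs, x, y))

freqs = (0.5, 0.3, 0.2)  # hypothetical genotype frequencies, deliberately not in HWE
stat_additive, stat_dominance = statistical_coding(*freqs)
func_additive, func_dominance = [-1.0, 0.0, 1.0], [0.0, 1.0, 0.0]

cov_stat = weighted_covariance(freqs, stat_additive, stat_dominance)
cov_func = weighted_covariance(freqs, func_additive, func_dominance)
print(cov_stat, cov_func)
```

Under the statistical coding the two columns are uncorrelated for these frequencies, while the functional columns are not, which is the sense in which an orthogonal coding keeps effect estimates from contaminating one another.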
Abstract:
The Greenland ice sheet is accepted as a key factor controlling the Quaternary glacial scenario. However, the origin and mechanisms of major Arctic glaciation starting at 3.15 Ma and culminating at 2.74 Ma are still controversial. For this phase of intense cooling, Ravelo et al. proposed a complex gradual forcing mechanism. In contrast, our new submillennial-scale paleoceanographic records from the Pliocene North Atlantic suggest a far more precise timing and forcing for the initiation of northern hemisphere glaciation (NHG), since it was linked to a 2-3 °C surface water warming during warm stages from 2.95 to 2.82 Ma. These records support previous models claiming that the final closure of the Panama Isthmus (3.0 to ~2.5 Ma) induced an increased poleward salt and heat transport. The associated strengthening of North Atlantic Thermohaline Circulation and, in turn, an intensified moisture supply to northern high latitudes resulted in the build-up of NHG, finally culminating in the great, irreversible climate crash at marine isotope stage G6 (2.74 Ma). In summary, there was a two-step threshold mechanism that marked the onset of NHG, with glacial-to-interglacial cycles quasi-persistent until today.
Abstract:
Summary: The stratigraphy of the Shackleton Range established by Stephenson (1966) and Clarkson (1972) was revised on the basis of results of the German Expedition GEISHA 1987/88. The "Turnpike Bluff Group" does not form a stratigraphic unit. The stratigraphic correlation of its formations is still a matter of discussion. The following four formations are presumed to belong to different units: The Stephenson Bastion Formation and Wyeth Heights Formation are probably of Late Precambrian age. The Late Precambrian Watts Needle Formation, which lies unconformably on the Read Group, is an independent unit which has to be separated from the "Turnpike Bluff Group". The Mount Wegener Formation has been thrust over the Watts Needle Formation. Early Cambrian fossils (Oldhamia sp., Epiphyton sp., Botomaella (?) sp. and echinoderms) were found in the Mount Wegener Formation in the Read Mountains. The Middle Cambrian trilobite shales on Mount Provender, which form the Haskard Highlands Formation, are possibly in faulted contact with the basement complex (Pioneers and Stratton Groups). They are overlain by the Blaiklock Glacier Group, for which an Ordovician age is indicated by trilobite tracks and trails, low inclination of the paleomagnetic field and the similarity to the basal units of the Table Mountain Quartzite in South Africa. The Watts Needle Formation represents epicontinental shelf sediments, the Mount Wegener Formation was deposited in a (continental) back-arc environment, and the Blaiklock Glacier Group is a typical molasse sediment of the Ross Orogen.
Abstract:
Three complementary imaging techniques were used to describe a complex rosette-shaped microboring that penetrates the shells of brachiopods from the Ordovician-Silurian shallow marine limestones of Anticosti Island, Canada. Pyrodendrina cupra n. igen. and isp. is among the oldest dendrinid microborings and consists of shallow and deep penetrating canals that radiate from a central polygonal chamber. The affinity of the tracemaker is unknown, but a foraminiferal origin, as proposed for some dendrinid borings, is rejected. Combining microCT with traditional stereomicroscopy and SEM helped distinguish and quantify fine morphological features while maintaining contextual information of the microboring within the shell substrate. Different imaging techniques inherently bias the description of microborings. These biases must be accounted for as new methods in ichnotaxonomy are integrated with past research based on different methods.
Abstract:
The distinctly cyclic sediments recovered during ODP Leg 154 played an important role in constructing the astronomical time scale and associated astro(bio)chronology for the Miocene, and in deciphering ocean-climate history. The accuracy of the timescale critically depends on the reliability of the shipboard splice used for the tuning and on the tuning itself. New high-resolution colour- and magnetic susceptibility core scanning data supplemented with limited XRF-data allow improvement of the stratigraphy. The revised composite record results in an improved astronomical age model for ODP Site 926 between 5 and 14.4 Ma. The new age model is confirmed by results of complex amplitude demodulation of the precession and obliquity related cycle patterns. Different values for tidal dissipation are applied to improve the fit between the sedimentary cycle patterns and the astronomical solution. Due to the improved stratigraphy and tuning, supported by the results of amplitude demodulation, the revised time scale yields more reliable age estimates for planktic foraminiferal and calcareous nannofossil events. The results of this study highlight the importance of stratigraphy for timescale construction.
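Complex amplitude demodulation, mentioned in this abstract as a check on the age model, can be sketched in a few lines. This is a generic illustration on a synthetic series, not the authors' workflow: the cycle frequency, series length, and moving-average window are all hypothetical, and `complex_demodulate` is an illustrative name.

```python
import cmath
import math

def complex_demodulate(series, freq, window):
    """Estimate the amplitude envelope of the cycle at `freq` (cycles per sample).

    The series is multiplied by a complex exponential to shift the target
    cycle to zero frequency, then smoothed with a moving average that acts
    as a crude low-pass filter. Twice the modulus of the smoothed value is
    the amplitude estimate for each window position.
    """
    shifted = [x * cmath.exp(-2j * math.pi * freq * t) for t, x in enumerate(series)]
    amplitudes = []
    for start in range(len(shifted) - window + 1):
        mean = sum(shifted[start:start + window]) / window
        amplitudes.append(2.0 * abs(mean))
    return amplitudes

# Synthetic test series: a constant-amplitude cosine with a 20-sample period.
freq = 0.05
signal = [3.0 * math.cos(2.0 * math.pi * freq * t) for t in range(200)]
# A window of one full period averages out the double-frequency term left by the shift.
amps = complex_demodulate(signal, freq, window=20)
print(min(amps), max(amps))
```

In an astronomically tuned record the recovered envelope of, say, the precession band would then be compared with the amplitude modulation predicted by the astronomical solution; a proper low-pass filter would normally replace the moving average used here.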
Abstract:
Using an extensive network of occurrence records for 293 plant species collected over the past 40 years across a climatically diverse geographic section of western North America, we find that plant species distributions were just as likely to shift upward (i.e., towards higher elevations) as downward (i.e., towards lower elevations), despite consistent warming across the study area. Although there was no clear directional response to climate warming across the entire study area, there was significant region-to-region variation in responses (i.e., from as many as 73% to as few as 32% of species shifting upward or downward). To understand the factors that might be controlling region-specific distributional shifts, we explored the relationship between the direction of change in distribution limits and the nature of recent climate change. We found that the direction of distribution limit shifts was explained by an interaction between the rate of change in local summer temperatures and seasonal precipitation. Specifically, species shifted upward at their upper elevational limit when snowfall declined at slower rates and minimum temperatures increased. By contrast, species shifted upward at their lower elevational limit when maximum temperatures increased or both temperature and precipitation decreased. Our results suggest that future species' elevational distribution shifts will be complex, depending on the interaction between seasonal temperature and precipitation change.