6 resultados para Analysis and statistical methods

em Digital Commons - Michigan Tech


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex human diseases are a major challenge for biological research. The goal of my research is to develop effective methods for biostatistics in order to create more opportunities for the prevention and cure of human diseases. This dissertation proposes statistical technologies that have the ability of being adapted to sequencing data in family-based designs, and that account for joint effects as well as gene-gene and gene-environment interactions in the GWA studies. The framework includes statistical methods for rare and common variant association studies. Although next-generation DNA sequencing technologies have made rare variant association studies feasible, the development of powerful statistical methods for rare variant association studies is still underway. Chapter 2 demonstrates two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. The results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning [2009]. In Chapter 3, I extended the previously proposed test for Testing the effect of an Optimally Weighted combination of variants (TOW) [Sha et al., 2012] for unrelated individuals to TOW &ndash F, TOW for Family &ndash based design. Simulation results show that TOW &ndash F can control for population stratification in wide range of population structures including spatially structured populations, is robust to the directions of effect of causal variants, and is relatively robust to percentage of neutral variants. In GWA studies, this dissertation consists of a two &ndash locus joint effect analysis and a two-stage approach accounting for gene &ndash gene and gene &ndash environment interaction. Chapter 4 proposes a novel two &ndash stage approach, which is promising to identify joint effects, especially for monotonic models. The proposed approach outperforms a single &ndash marker method and a regular two &ndash stage analysis based on the two &ndash locus genotypic test. In Chapter 5, I proposed a gene &ndash based two &ndash stage approach to identify gene &ndash gene and gene &ndash environment interactions in GWA studies which can include rare variants. The two &ndash stage approach is applied to the GAW 17 dataset to identify the interaction between KDR gene and smoking status.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The developmental processes and functions of an organism are controlled by the genes and the proteins that are derived from these genes. The identification of key genes and the reconstruction of gene networks can provide a model to help us understand the regulatory mechanisms for the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes, constructed their regulatory networks, and also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. On the basis of these data sets, this dissertation has two major parts, together forming six chapters. The first part deals with developing association methods for rare variants using genotype data (chapter 4 and 5). The second part deals with developing and/or evaluating statistical methods to identify genes and TFs involved in biological processes, and construction of their regulatory networks using gene expression data (chapter 2, 3, and 6). For the first part, I have developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative as well as qualitative traits. Simulation results showed that the proposed method has improved power over the existing weighted sum method (WS) in most settings. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling the population stratification by adjusting the data for ancestry using principal components. This method was applied to GAW 17 data and was able to find several disease risk genes. For the second part, I have worked on three problems. First problem involved evaluation of eight gene association methods. A very comprehensive comparison of these methods with further analysis clearly demonstrates the distinct and common performance of these eight gene association methods. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and reconstruct their hierarchical regulatory networks. This algorithm has produced very significant results and it is the first report to produce such hierarchical networks for these pathways. The third problem dealt with developing another algorithm called the top-down graphical Gaussian model that identifies the network governed by a specific TF. The network produced by the algorithm is proven to be of very high accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Northern hardwood management was assessed throughout the state of Michigan using data collected on recently harvested stands in 2010 and 2011. Methods of forensic estimation of diameter at breast height were compared and an ideal, localized equation form was selected for use in reconstructing pre-harvest stand structures. Comparisons showed differences in predictive ability among available equation forms which led to substantial financial differences when used to estimate the value of removed timber. Management on all stands was then compared among state, private, and corporate landowners. Comparisons of harvest intensities against a liberal interpretation of a well-established management guideline showed that approximately one third of harvests were conducted in a manner which may imply that the guideline was followed. One third showed higher levels of removals than recommended, and one third of harvests were less intensive than recommended. Multiple management guidelines and postulated objectives were then synthesized into a novel system of harvest taxonomy, against which all harvests were compared. This further comparison showed approximately the same proportions of harvests, while distinguishing sanitation cuts and the future productive potential of harvests cut more intensely than suggested by guidelines. Stand structures are commonly represented using diameter distributions. Parametric and nonparametric techniques for describing diameter distributions were employed on pre-harvest and post-harvest data. A common polynomial regression procedure was found to be highly sensitive to the method of histogram construction which provides the data points for the regression. The discriminative ability of kernel density estimation was substantially different from that of the polynomial regression technique.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the development of genotyping and next-generation sequencing technologies, multi-marker testing in genome-wide association study and rare variant association study became active research areas in statistical genetics. This dissertation contains three methodologies for association study by exploring different genetic data features and demonstrates how to use those methods to test genetic association hypothesis. The methods can be categorized into in three scenarios: 1) multi-marker testing for strong Linkage Disequilibrium regions, 2) multi-marker testing for family-based association studies, 3) multi-marker testing for rare variant association study. I also discussed the advantage of using these methods and demonstrated its power by simulation studies and applications to real genetic data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Three-dimensional flow visualization plays an essential role in many areas of science and engineering, such as aero- and hydro-dynamical systems which dominate various physical and natural phenomena. For popular methods such as the streamline visualization to be effective, they should capture the underlying flow features while facilitating user observation and understanding of the flow field in a clear manner. My research mainly focuses on the analysis and visualization of flow fields using various techniques, e.g. information-theoretic techniques and graph-based representations. Since the streamline visualization is a popular technique in flow field visualization, how to select good streamlines to capture flow patterns and how to pick good viewpoints to observe flow fields become critical. We treat streamline selection and viewpoint selection as symmetric problems and solve them simultaneously using the dual information channel [81]. To the best of my knowledge, this is the first attempt in flow visualization to combine these two selection problems in a unified approach. This work selects streamline in a view-independent manner and the selected streamlines will not change for all viewpoints. My another work [56] uses an information-theoretic approach to evaluate the importance of each streamline under various sample viewpoints and presents a solution for view-dependent streamline selection that guarantees coherent streamline update when the view changes gradually. When projecting 3D streamlines to 2D images for viewing, occlusion and clutter become inevitable. To address this challenge, we design FlowGraph [57, 58], a novel compound graph representation that organizes field line clusters and spatiotemporal regions hierarchically for occlusion-free and controllable visual exploration. We enable observation and exploration of the relationships among field line clusters, spatiotemporal regions and their interconnection in the transformed space. Most viewpoint selection methods only consider the external viewpoints outside of the flow field. This will not convey a clear observation when the flow field is clutter on the boundary side. Therefore, we propose a new way to explore flow fields by selecting several internal viewpoints around the flow features inside of the flow field and then generating a B-Spline curve path traversing these viewpoints to provide users with closeup views of the flow field for detailed observation of hidden or occluded internal flow features [54]. This work is also extended to deal with unsteady flow fields. Besides flow field visualization, some other topics relevant to visualization also attract my attention. In iGraph [31], we leverage a distributed system along with a tiled display wall to provide users with high-resolution visual analytics of big image and text collections in real time. Developing pedagogical visualization tools forms my other research focus. Since most cryptography algorithms use sophisticated mathematics, it is difficult for beginners to understand both what the algorithm does and how the algorithm does that. Therefore, we develop a set of visualization tools to provide users with an intuitive way to learn and understand these algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Crosswell data set contains a range of angles limited only by the geometry of the source and receiver configuration, the separation of the boreholes and the depth to the target. However, the wide angles reflections present in crosswell imaging result in amplitude-versus-angle (AVA) features not usually observed in surface data. These features include reflections from angles that are near critical and beyond critical for many of the interfaces; some of these reflections are visible only for a small range of angles, presumably near their critical angle. High-resolution crosswell seismic surveys were conducted over a Silurian (Niagaran) reef at two fields in northern Michigan, Springdale and Coldspring. The Springdale wells extended to much greater depths than the reef, and imaging was conducted from above and from beneath the reef. Combining the results from images obtained from above with those from beneath provides additional information, by exhibiting ranges of angles that are different for the two images, especially for reflectors at shallow depths, and second, by providing additional constraints on the solutions for Zoeppritz equations. Inversion of seismic data for impedance has become a standard part of the workflow for quantitative reservoir characterization. Inversion of crosswell data using either deterministic or geostatistical methods can lead to poor results with phase change beyond the critical angle, however, the simultaneous pre-stack inversion of partial angle stacks may be best conducted with restrictions to angles less than critical. Deterministic inversion is designed to yield only a single model of elastic properties (best-fit), while the geostatistical inversion produces multiple models (realizations) of elastic properties, lithology and reservoir properties. Geostatistical inversion produces results with far more detail than deterministic inversion. The magnitude of difference in details between both types of inversion becomes increasingly pronounced for thinner reservoirs, particularly those beyond the vertical resolution of the seismic. For any interface imaged from above and from beneath, the results AVA characters must result from identical contrasts in elastic properties in the two sets of images, albeit in reverse order. An inversion approach to handle both datasets simultaneously, at pre-critical angles, is demonstrated in this work. The main exploration problem for carbonate reefs is determining the porosity distribution. Images of elastic properties, obtained from deterministic and geostatistical simultaneous inversion of a high-resolution crosswell seismic survey were used to obtain the internal structure and reservoir properties (porosity) of Niagaran Michigan reef. The images obtained are the best of any Niagaran pinnacle reef to date.