4 results for Large Data Sets
in Digital Commons - Michigan Tech
Abstract:
Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data from poplar roots under nitrogen starvation and water deprivation. We found that a low nitrogen concentration led first to increased root elongation, followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures through which we identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (within the same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological processes that changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. Testing showed that one of them is a transcription factor ranked high in the hierarchy that affects root growth under nitrogen starvation. Analyzing gene expression data is tedious and time-consuming. To avoid doing the analysis manually, we set out to automate a computational pipeline, which can now identify DEGs and perform protein domain analysis in a single run. It is implemented as Perl and R scripts.
Abstract:
A camera maps 3-dimensional (3D) world space to a 2-dimensional (2D) image space. In the process it loses the depth information, i.e., the distance from the camera focal point to the imaged objects. It is impossible to recover this information from a single image. However, by using two or more images from different viewing angles this information can be recovered, which in turn can be used to obtain the pose (position and orientation) of the camera. Using this pose, a 3D reconstruction of imaged objects in the world can be computed. Numerous algorithms have been proposed and implemented to solve this problem; these algorithms are commonly called Structure from Motion (SfM). State-of-the-art SfM techniques have been shown to give promising results. However, unlike a Global Positioning System (GPS) or an Inertial Measurement Unit (IMU), which directly give the position and orientation respectively, a camera system estimates them only after running SfM as described above. This makes the pose obtained from a camera highly sensitive to the images captured and to other effects, such as low lighting conditions, poor focus or improper viewing angles. In some applications, for example, an Unmanned Aerial Vehicle (UAV) inspecting a bridge or a robot mapping an environment using Simultaneous Localization and Mapping (SLAM), it is often difficult to capture images under ideal conditions. This report examines the use of SfM methods in such applications and the role of combining multiple sensors, viz., sensor fusion, to achieve more accurate and usable position and reconstruction information. This project investigates the role of sensor fusion in accurately estimating the pose of a camera for the application of 3D reconstruction of a scene. The first set of experiments is conducted in a motion capture room. These results are taken as ground truth in order to evaluate the strengths and weaknesses of each sensor and to map their coordinate systems.
Then a number of scenarios are targeted where SfM fails. The pose estimates obtained from SfM are replaced by those obtained from other sensors and the 3D reconstruction is completed. Quantitative and qualitative comparisons are made between the 3D reconstruction obtained by using only a camera and that obtained by using the camera along with a LIDAR and/or an IMU. Additionally, the project addresses the performance issues encountered when handling large data sets of high-resolution images by implementing the system on the Superior high-performance computing cluster at Michigan Technological University.
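The abstract's point that depth is lost in one image but recoverable from two views can be illustrated with the simplest case, a rectified stereo pair, where depth follows from Z = f·B/d (focal length times baseline over disparity). This is only a minimal sketch under that assumption, not the SfM pipeline used in the report; the function name and the numeric values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen by a rectified stereo pair.

    Assumes two identical pinhole cameras with parallel optical axes,
    separated by `baseline_m` along the image x-axis (an idealisation).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: 800 px focal length, 0.5 m baseline.
near = depth_from_disparity(800.0, 0.5, 40.0)  # larger disparity, closer point
far = depth_from_disparity(800.0, 0.5, 8.0)    # smaller disparity, farther point
```

Because depth is inversely proportional to disparity, a fixed matching error of a pixel or so matters far more for distant points, which is one reason camera-only pose and depth estimates degrade under poor imaging conditions.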
Abstract:
This dissertation has three separate parts: the first part deals with general pedigree association testing incorporating continuous covariates; the second part deals with association tests under population stratification using conditional likelihood tests; the third part deals with genome-wide association studies based on the real rheumatoid arthritis (RA) disease data sets from Genetic Analysis Workshop 16 (GAW16) Problem 1. Many statistical tests have been developed to test linkage and association for family data, using either case-control status or phenotype covariates separately. Such univariate analyses might not use all the information available from family members in practical studies. Moreover, complex human diseases do not have a clear inheritance pattern; genes may interact or act independently. In part I, the newly proposed approach, MPDT, focuses on using both case-control information and phenotype covariates. This approach can be applied to detect multiple marker effects. Based on two popular existing statistics in family studies, one for case-control traits and one for quantitative traits, the new approach can be used with simple family structures as well as general pedigree structures. The combined statistic is calculated from the two statistics; a permutation procedure is applied to assess the p-value, with a Bonferroni adjustment for multiple markers. We use simulation studies to evaluate the type I error rates and the power of the proposed approach. Our results show that the combined test using both case-control information and phenotype covariates not only has the correct type I error rates but is also more powerful than other existing methods. For multiple marker interactions, our proposed method is also very powerful.
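The permutation step with a Bonferroni adjustment mentioned above can be sketched in general terms as follows. This is not the MPDT statistic: the absolute difference in group means is used here purely as a stand-in test statistic, and the function names, the number of permutations, and the sample values are all hypothetical.

```python
import random


def permutation_p_value(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    The mean difference is a placeholder statistic (an assumption),
    not the combined statistic described in the abstract.
    """
    rng = random.Random(seed)
    n_cases = len(cases)
    pooled = list(cases) + list(controls)

    def mean_diff(values):
        a, b = values[:n_cases], values[n_cases:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel individuals at random
        if mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_value, n_markers):
    """Bonferroni-adjusted p-value for n_markers tests, capped at 1."""
    return min(1.0, p_value * n_markers)


# Hypothetical trait values for one marker out of 10 tested.
cases = [10.0 + i for i in range(8)]
controls = [float(i) for i in range(8)]
p_raw = permutation_p_value(cases, controls)
p_adj = bonferroni(p_raw, n_markers=10)
```

The smallest p-value this estimator can return is 1/(n_perm + 1), so the number of permutations has to grow with the number of markers for an adjusted p-value to remain able to reach significance.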
Selective genotyping is an economical strategy for detecting and mapping quantitative trait loci in the genetic dissection of complex disease. When the samples arise from different ethnic groups or an admixed population, all existing selective genotyping methods may produce spurious associations due to different ancestry distributions. The problem can be more serious when the sample size is large, which is generally required to obtain sufficient power to detect modest genetic effects for most complex traits. In part II, I describe a useful strategy for selective genotyping when population stratification is present. Our procedure uses a principal-component-based approach to eliminate any effect of population stratification. The paper evaluates the performance of our procedure using both simulated data from an earlier study and the HapMap data sets, in a variety of population admixture models generated from empirical data. The rheumatoid arthritis data set of Problem 1 in the Genetic Analysis Workshop 16 (GAW16) contains one binary trait and two continuous traits: RA status, AntiCCP and IgM. To allow multiple traits, we propose a set of SNP-level F statistics, based on the concept of multiple correlation, to measure the genetic association between multiple trait values and SNP-specific genotypic scores, and we obtain their null distributions. We then perform six genome-wide association analyses using the novel one- and two-stage approaches, which are based on single, double and triple traits. Combining all six analyses, we validate the SNPs that have been identified in the literature as responsible for rheumatoid arthritis and detect additional disease susceptibility SNPs for future follow-up studies. Every chromosome except 13 and 18 is found to harbour genetic regions conferring susceptibility to rheumatoid arthritis or related diseases, i.e., lupus erythematosus. This topic is discussed in part III.
Abstract:
One of the original ocean-bottom time-lapse seismic studies was performed at the Teal South oil field in the Gulf of Mexico during the late 1990s. This work reexamines some aspects of previous work using modern analysis techniques to provide improved quantitative interpretations. Using three-dimensional volume visualization of legacy data and the two phases of post-production time-lapse data, I provide additional insight into the fluid migration pathways and the pressure communication between different reservoirs separated by faults. This work supports a conclusion from previous studies that production from one reservoir caused regional pressure decline that in turn resulted in liberation of gas from multiple surrounding unproduced reservoirs. I also provide an explanation for unusual time-lapse changes in amplitude-versus-offset (AVO) data related to the compaction of the producing reservoir which, in turn, changed an isotropic medium to an anisotropic medium. In the first part of this work, I examine regional changes in seismic response due to the production of oil and gas from one reservoir. The previous studies primarily used two post-production ocean-bottom surveys (Phase I and Phase II), and not the legacy streamer data, due to the unavailability of legacy prestack data and very different acquisition parameters. In order to incorporate the legacy data in the present study, all three poststack data sets were cross-equalized and examined using instantaneous amplitude and energy volumes. This approach appears quite effective and helps to suppress changes unrelated to production while emphasizing those large-amplitude changes that are related to production in this noisy (by current standards) suite of data. I examine the multiple data sets first by using the instantaneous amplitude and energy attributes, and then also examine specific apparent time-lapse changes through direct comparisons of seismic traces.
In so doing, I identify time-delays that, when corrected for, indicate water encroachment at the base of the producing reservoir. I also identify specific sites of leakage from various unproduced reservoirs, the result of regional pressure blowdown as explained in previous studies; those earlier studies, however, were unable to identify direct evidence of fluid movement. Of particular interest is the identification of one site where oil apparently leaked from one reservoir into a “new” reservoir that did not originally contain oil, but was ideally suited as a trap for fluids leaking from the neighboring spill-point. With continued pressure drop, oil in the new reservoir increased as more oil entered into the reservoir and expanded, liberating gas from solution. Because of the limited volume available for oil and gas in that temporary trap, oil and gas also escaped from it into the surrounding formation. I also note that some of the reservoirs demonstrate time-lapse changes only in the “gas cap” and not in the oil zone, even though gas must be coming out of solution everywhere in the reservoir. This is explained by interplay between pore-fluid modulus reduction by gas saturation decrease and dry-frame modulus increase by frame stiffening. In the second part of this work, I examine various rock-physics models in an attempt to quantitatively account for frame-stiffening that results from reduced pore-fluid pressure in the producing reservoir, searching for a model that would predict the unusual AVO features observed in the time-lapse prestack and stacked data at Teal South. While several rock-physics models are successful at predicting the time-lapse response for initial production, most fail to match the observations for continued production between Phase I and Phase II. 
Because the reservoir was initially overpressured and unconsolidated, reservoir compaction was likely significant, and was probably accomplished largely by uniaxial strain in the vertical direction; this implies that an anisotropic model may be required. Using Walton’s model for anisotropic unconsolidated sand, I successfully model the time-lapse changes for all phases of production. This observation may be of interest for application to other unconsolidated overpressured reservoirs under production.
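The instantaneous amplitude attribute mentioned in this abstract is conventionally the magnitude of a trace's analytic signal (its envelope). The sketch below computes it for a single short trace with a plain discrete Fourier transform. It is illustrative only and not the workflow used in the study; the function name and the synthetic trace are hypothetical, and taking instantaneous energy as the squared envelope is an assumption.

```python
import cmath
import math


def instantaneous_amplitude(trace):
    """Envelope |analytic signal| of a real, non-empty trace via a plain DFT."""
    n = len(trace)
    # Forward DFT (O(n^2); adequate for a short illustrative trace).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trace))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    # Inverse DFT gives the analytic signal; its modulus is the envelope.
    analytic = [
        sum(w * s * cmath.exp(2j * math.pi * k * t / n)
            for k, (w, s) in enumerate(zip(weights, spectrum))) / n
        for t in range(n)
    ]
    return [abs(z) for z in analytic]


# Synthetic trace (hypothetical): a cosine of amplitude 3 with 4 full cycles.
n_samples = 64
trace = [3.0 * math.cos(2 * math.pi * 4 * t / n_samples) for t in range(n_samples)]
envelope = instantaneous_amplitude(trace)
energy = [a * a for a in envelope]  # assumed definition: squared envelope
```

Unlike the raw trace, the envelope does not oscillate through zero, which is one reason envelope-based attributes are convenient for comparing surveys whose wavelets differ slightly in phase.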