2 results for Population-size
in Digital Commons - Michigan Tech
Abstract:
This dissertation has three separate parts: the first deals with general pedigree association testing incorporating continuous covariates; the second deals with association tests under population stratification using conditional likelihood tests; and the third deals with genome-wide association studies based on the real rheumatoid arthritis (RA) data sets from Genetic Analysis Workshop 16 (GAW16) Problem 1. Many statistical tests have been developed to test for linkage and association in family data, using either case-control status or phenotype covariates, but separately. Such univariate analyses may not use all the information available from family members in practical studies. Moreover, complex human diseases do not have a clear inheritance pattern; genes may interact or act independently. In Part I, the newly proposed approach, MPDT, uses both the case-control information and the phenotype covariates, and it can be applied to detect multiple marker effects. Built on two popular existing statistics in family studies, one for case-control traits and one for quantitative traits, the new approach can be used for data sets with simple family structures as well as for general pedigrees. The combined statistic is calculated from these two statistics, and a permutation procedure is used to assess the p-value, with a Bonferroni adjustment for multiple markers. We use simulation studies to evaluate the type I error rates and power of the proposed approach. Our results show that the combined test using both case-control information and phenotype covariates not only has correct type I error rates but is also more powerful than other existing methods. For multiple marker interactions, the proposed method is also very powerful.
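The abstract does not give MPDT's formulas, so the following Python sketch only illustrates the general pattern it describes: combine a case-control statistic and a quantitative-trait statistic into one score, obtain a permutation p-value for each marker, then apply a Bonferroni adjustment across markers. The component statistics (scaled Pearson correlations), the sum-of-squares combination, all function names, and the toy data are hypothetical. The sketch also assumes unrelated individuals; family-based permutation schemes for pedigrees are not shown.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def combined_stat(geno, status, trait):
    """Hypothetical combined statistic: squared score-type statistic for
    case-control status plus squared statistic for a quantitative trait."""
    n = len(geno)
    t_cc = math.sqrt(n) * pearson(geno, status)
    t_qt = math.sqrt(n) * pearson(geno, trait)
    return t_cc ** 2 + t_qt ** 2

def permutation_pvalue(geno, status, trait, n_perm=999, seed=0):
    """Shuffle genotypes across individuals, keeping each person's status
    and trait together, and count permuted statistics >= the observed one."""
    rng = random.Random(seed)
    observed = combined_stat(geno, status, trait)
    shuffled = list(geno)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if combined_stat(shuffled, status, trait) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0

def bonferroni(pvalues):
    """Bonferroni adjustment across markers, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Simulated toy data (not from the dissertation): one causal marker, one null marker.
rng = random.Random(42)
n = 200
causal = [rng.choice([0, 1, 2]) for _ in range(n)]
null_marker = [rng.choice([0, 1, 2]) for _ in range(n)]
status = [1 if g + rng.gauss(0, 1) > 1.0 else 0 for g in causal]
trait = [0.5 * g + rng.gauss(0, 1) for g in causal]

raw = [permutation_pvalue(m, status, trait) for m in (causal, null_marker)]
adjusted = bonferroni(raw)
```

Shuffling whole genotype vectors keeps each person's status and trait paired, so the permutation null accounts for the correlation between the two phenotypes without having to model it explicitly.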
Selective genotyping is an economical strategy for detecting and mapping quantitative trait loci in the genetic dissection of complex diseases. When samples come from different ethnic groups or an admixed population, all existing selective genotyping methods may produce spurious associations because of differing ancestry distributions. The problem can be more serious when the sample size is large, which is generally required to obtain sufficient power to detect the modest genetic effects of most complex traits. In Part II, I describe a useful strategy for selective genotyping in the presence of population stratification. Our procedure uses a principal-component-based approach to eliminate the effects of population stratification. We evaluate its performance using both simulated data sets from an earlier study and HapMap data sets, under a variety of population admixture models generated from empirical data.
Part III addresses the rheumatoid arthritis data set of Problem 1 in GAW16, which contains one binary trait and two continuous traits: RA status, AntiCCP, and IgM. To accommodate multiple traits, we propose a set of SNP-level F statistics, based on the concept of multiple correlation, that measure the genetic association between multiple trait values and SNP-specific genotypic scores, and we obtain their null distributions. We then perform six genome-wide association analyses using novel one- and two-stage approaches based on single, double, and triple traits. Combining all six analyses, we validate the SNPs previously identified in the literature as responsible for rheumatoid arthritis and detect additional disease-susceptibility SNPs for future follow-up studies. Every chromosome except 13 and 18 was found to harbour susceptibility regions for rheumatoid arthritis or related diseases, e.g., lupus erythematosus.
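A minimal sketch of one standard way to compute a multiple-correlation F statistic, assuming the SNP's additive genotype score (0/1/2) is regressed on k trait columns plus an intercept. With multiple correlation R² and n individuals, F = (R²/k) / ((1 − R²)/(n − k − 1)); under the usual normal-theory assumptions this is approximately F(k, n − k − 1) distributed, and `scipy.stats.f.sf` could turn it into a p-value. The null distributions derived in the dissertation may differ. The trait names mirror GAW16 Problem 1, but the data are simulated and the function name is hypothetical.

```python
import itertools
import numpy as np

def multiple_correlation_f(genotype, traits):
    """Hypothetical helper: R^2 and F statistic for the multiple correlation
    between a SNP's genotype scores (n,) and k trait columns (n, k)."""
    y = np.asarray(genotype, dtype=float)
    X = np.asarray(traits, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept + trait columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    df1, df2 = k, n - k - 1
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return r2, f_stat, (df1, df2)

# Simulated data only: one SNP and three traits named after the GAW16 traits.
rng = np.random.default_rng(16)
n = 500
snp = rng.integers(0, 3, size=n).astype(float)  # additive genotype score 0/1/2
ra_status = (snp + rng.normal(size=n) > 1.5).astype(float)
anti_ccp = 0.4 * snp + rng.normal(size=n)
igm = rng.normal(size=n)
traits = {"RA": ra_status, "AntiCCP": anti_ccp, "IgM": igm}

# Single-, double-, and triple-trait statistics for this SNP.
results = {}
for k in (1, 2, 3):
    for names in itertools.combinations(traits, k):
        cols = np.column_stack([traits[t] for t in names])
        results[names] = multiple_correlation_f(snp, cols)
```

For a single trait, R² reduces to the squared Pearson correlation between genotype score and trait, so single-trait analyses are a special case of the same statistic.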
Abstract:
Fossil fuel consumption in the transportation sector of the United States (U.S.) has caused a range of societal issues, including health-related air pollution, climate change, dependence on imported oil, and other oil-related national security concerns. Biofuel production from lignocellulosic biomass such as wood, forest residues, and agricultural residues has the potential to replace a substantial portion of total fossil fuel consumption. This research focuses on locating biofuel facilities and designing the biofuel supply chain to minimize overall cost. To this end, an integrated methodology was proposed that combines GIS technology with simulation and optimization modeling. As a precursor to the simulation and optimization modeling, the GIS-based methodology was used to preselect potential facility locations for biofuel production from forest biomass; the resulting candidate sites served as inputs to both models. Candidate locations were selected based on a set of evaluation criteria, including county boundaries, a railroad transportation network, a state/federal road transportation network, the dispersion of water bodies (rivers, lakes, etc.), the dispersion of cities and villages, a population census, biomass production, and no co-location with co-fired power plants. The simulation and optimization models were built around key supply activities, including biomass harvesting/forwarding, transportation, and storage. On-site storage was built to cover the spring breakup period, when road restrictions were in place and truck transportation on certain roads was limited.
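The candidate-site screening can be pictured as a conjunction of filters over GIS-derived attributes. The sketch below is purely illustrative: the `Site` fields, every threshold, the one-site-per-county rule standing in for the county-boundary criterion, and the example sites are assumptions, not values from the study.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """Hypothetical attributes a GIS overlay might attach to a candidate site."""
    name: str
    county: str
    rail_km: float            # distance to the nearest railroad
    road_km: float            # distance to the nearest state/federal road
    water_km: float           # distance to the nearest river or lake
    town_km: float            # distance to the nearest city or village
    county_pop: int           # census population of the county
    biomass_t: float          # forest biomass within haul radius, dry tonnes/yr
    cofired_colocated: bool   # co-located with a co-fired power plant?

# Illustrative thresholds only (assumed, not from the dissertation).
MAX_RAIL_KM, MAX_ROAD_KM, MAX_WATER_KM, MAX_TOWN_KM = 5.0, 2.0, 10.0, 15.0
MIN_COUNTY_POP, MIN_BIOMASS_T = 5_000, 500_000.0

def passes_screen(s: Site) -> bool:
    """True only if the site satisfies every criterion."""
    return (s.rail_km <= MAX_RAIL_KM and s.road_km <= MAX_ROAD_KM
            and s.water_km <= MAX_WATER_KM and s.town_km <= MAX_TOWN_KM
            and s.county_pop >= MIN_COUNTY_POP and s.biomass_t >= MIN_BIOMASS_T
            and not s.cofired_colocated)

def candidate_sites(sites):
    """Keep passing sites, at most one (most biomass) per county, best first."""
    best = {}
    for s in sites:
        if passes_screen(s) and (s.county not in best
                                 or s.biomass_t > best[s.county].biomass_t):
            best[s.county] = s
    return sorted(best.values(), key=lambda s: s.biomass_t, reverse=True)

# Made-up example sites.
sites = [
    Site("A", "County 1", 1.0, 0.5, 3.0, 8.0, 36_000, 700_000.0, False),
    Site("B", "County 1", 2.0, 1.0, 4.0, 5.0, 36_000, 650_000.0, False),
    Site("C", "County 2", 0.5, 0.2, 2.0, 4.0, 8_000, 900_000.0, True),
    Site("D", "County 3", 3.0, 1.5, 6.0, 10.0, 66_000, 800_000.0, False),
    Site("E", "County 4", 12.0, 0.3, 1.0, 3.0, 20_000, 950_000.0, False),
]
chosen = candidate_sites(sites)
```

Here C is excluded by the co-location rule, E by its distance to rail, and B loses to A within County 1. Expressing each criterion as its own threshold makes it easy to relax one at a time and see how the candidate list changes.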
Both models were evaluated using multiple performance indicators, including cost (the delivered feedstock cost and the inventory holding cost), energy consumption, and GHG emissions. The impacts of energy consumption and GHG emissions were expressed in monetary terms for consistency with cost. Compared with the optimization model, the simulation model gives a more dynamic view of a 20-year operation by considering the impacts of building inventory at the biorefinery to address the limited availability of biomass feedstock during the spring breakup period. The number of trucks required per day was estimated, and the inventory level was tracked year-round. Through the exchange of information across procedures (harvesting, transportation, and biomass feedstock processing), a smooth flow of biomass from harvesting areas to a biofuel facility was achieved. The optimization model was developed to locate multiple biofuel facilities simultaneously. Facility size is bounded below by 30 MGY and above by 50 MGY (million gallons per year). The optimization model is a static application built with the Mathematical Programming Language (MPL); it supports sensitivity analysis, in which inputs are changed to evaluate different scenarios. Annual biofuel demand and biomass availability were found to affect the optimal facility locations and sizes.
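The inventory logic behind the spring breakup can be illustrated with a one-year, day-by-day loop: deliveries stop during breakup, so the feedstock consumed then must be stocked beforehand, and daily truck counts follow from deliveries divided by truck payload. All parameters (daily demand, breakup start and length, pre-build window, payload, holding cost) and the even pre-build policy are hypothetical; the dissertation's simulation covers 20 years and more supply activities than this sketch.

```python
import math

def simulate_year(daily_demand, breakup_start, breakup_days, prebuild_days,
                  truck_load, holding_cost, days=365):
    """Hypothetical one-year sketch. Returns (end-of-day inventory in tonnes,
    trucks per day, total holding cost = holding_cost * tonne-days held)."""
    if breakup_start < prebuild_days:
        raise ValueError("pre-build window must start on or after day 0")
    # Spread the stock needed for breakup evenly over the pre-build window.
    extra = daily_demand * breakup_days / prebuild_days
    inventory, inv_log, trucks_log = 0.0, [], []
    for day in range(days):
        if breakup_start <= day < breakup_start + breakup_days:
            delivered = 0.0  # road restrictions: no truck deliveries
        elif breakup_start - prebuild_days <= day < breakup_start:
            delivered = daily_demand + extra  # build inventory ahead of breakup
        else:
            delivered = daily_demand  # normal operation: deliver what is consumed
        inventory += delivered - daily_demand  # biorefinery consumes daily_demand
        if inventory < -1e-9:
            raise RuntimeError(f"stockout on day {day}")
        inv_log.append(inventory)
        trucks_log.append(math.ceil(delivered / truck_load))
    return inv_log, trucks_log, holding_cost * sum(inv_log)

# Illustrative inputs: 1,000 t/day, breakup on days 90-131, 84-day pre-build,
# 25 t per truckload, holding cost 0.01 per tonne-day.
inv, trucks, cost = simulate_year(daily_demand=1000.0, breakup_start=90,
                                  breakup_days=42, prebuild_days=84,
                                  truck_load=25.0, holding_cost=0.01)
```

With these inputs, inventory peaks at 42,000 t on the last day before breakup, daily truck counts rise from 40 to 60 during the pre-build window, and stock returns to zero at the end of breakup.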