104 results for statistik
Abstract:
Background: qtl.outbred is an extendible interface in the statistical environment R for combining quantitative trait loci (QTL) mapping tools. It is built as an umbrella package that enables outbred genotype probabilities to be calculated and/or imported into the software package R/qtl. Findings: Using qtl.outbred, the genotype probabilities from outbred line cross data can be calculated by interfacing with a new and efficient algorithm developed for analyzing arbitrarily large datasets (included in the package) or imported from other sources such as the web-based tool GridQTL. Conclusion: qtl.outbred will improve the speed of calculating probabilities and the ability to analyse large future datasets. This package enables the user to analyse outbred line cross data accurately, but with effort similar to that required for inbred line cross data.
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correct the bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Abstract:
Random effect models have been widely applied in many fields of research. However, models with uncertain design matrices for random effects have been little investigated. In some applications with such problems, an expectation method has been used for simplicity. This method does not include the extra information about uncertainty in the design matrix. A closed-form solution for this problem is generally difficult to attain. We therefore propose a two-step algorithm for estimating the parameters, especially the variance components in the model. The implementation is based on Monte Carlo approximation and a Newton-Raphson-based EM algorithm. As an example, a simulated genetics dataset was analyzed. The results showed that the proportion of the total variance explained by the random effects was accurately estimated, whereas it was severely underestimated by the expectation method. By introducing heuristic search and optimization methods, the algorithm can possibly be developed to infer the 'model-based' best design matrix and the corresponding best estimates.
Abstract:
Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike’s information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires, each with 100 offspring.
Abstract:
MAPfastR is a software package developed to analyze QTL data from inbred and outbred line-crosses. The package includes a number of modules for fast and accurate QTL analyses. It has been developed in the R language for fast and comprehensive analyses of large datasets. MAPfastR is freely available at: http://www.computationalgenetics.se/?page_id=7.
Abstract:
BACKGROUND: Canalization is defined as the stability of a genotype against minor variations in both environment and genetics. Genetic variation in degree of canalization causes heterogeneity of within-family variance. The aims of this study are twofold: (1) quantify genetic heterogeneity of (within-family) residual variance in Atlantic salmon and (2) test whether the observed heterogeneity of (within-family) residual variance can be explained by simple scaling effects. RESULTS: Analysis of body weight in Atlantic salmon using a double hierarchical generalized linear model (DHGLM) revealed substantial heterogeneity of within-family variance. The 95% prediction interval for within-family variance ranged from ~0.4 to 1.2 kg², implying that the within-family variance of the most extreme high families is expected to be approximately three times larger than that of the extreme low families. For cross-sectional data, DHGLM with an animal mean sub-model resulted in severe bias, while a corresponding sire-dam model was appropriate. Heterogeneity of variance was not sensitive to Box-Cox transformations of phenotypes, which implies that heterogeneity of variance exists beyond what would be expected from simple scaling effects. CONCLUSIONS: Substantial heterogeneity of within-family variance was found for body weight in Atlantic salmon. A tendency towards higher variance with higher means (scaling effects) was observed, but heterogeneity of within-family variance existed beyond what could be explained by simple scaling effects. For cross-sectional data, using the animal mean sub-model in the DHGLM resulted in biased estimates of variance components, which differed substantially both from a standard linear mean animal model and a sire-dam DHGLM model. Although genetic differences in canalization were observed, selection for increased canalization is difficult, because there is limited individual information for the variance sub-model, especially when based on cross-sectional data.
Furthermore, potential macro-environmental changes (diet, climatic region, etc.) may make genetic heterogeneity of variance a less stable trait over time and space.
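The Box-Cox check described in the abstract above can be illustrated with a minimal sketch. It uses the standard definition of the transformation; the family values are hypothetical and are not taken from the study. With a power parameter of zero (the log transform), two families whose spread is proportional to their mean end up with the same spread, which is the kind of pure scaling effect the authors test against.

```python
import math

def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y with power lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)  # limiting case of (y**lam - 1) / lam
    return (y ** lam - 1) / lam

# Hypothetical body weights (kg): family B is family A scaled by 2,
# so its raw spread is twice as large purely because of scaling.
family_a = [1.0, 2.0]
family_b = [2.0, 4.0]

raw_spread_a = max(family_a) - min(family_a)
raw_spread_b = max(family_b) - min(family_b)

# After the log transform (lam = 0) the two spreads coincide.
log_spread_a = box_cox(family_a[1], 0) - box_cox(family_a[0], 0)
log_spread_b = box_cox(family_b[1], 0) - box_cox(family_b[0], 0)
```

If heterogeneity of variance persists after such a transformation, as the abstract reports, it cannot be attributed to scaling alone.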
Abstract:
GPS technology is now embedded in portable, low-cost electronic devices to track the movements of mobile objects. This development has greatly impacted the transportation field by creating a novel and rich source of traffic data on the road network. Although GPS devices hold significant promise for overcoming problems such as underreporting, respondent fatigue, inaccuracies and other human errors in data collection, the technology is still so new that it raises many issues for potential users. These issues tend to revolve around the following areas: reliability, data processing and the related application. This thesis aims to study GPS tracking from the methodological, technical and practical aspects. It first evaluates the reliability of GPS-based traffic data using data from an experiment containing three different traffic modes (car, bike and bus) traveling along the road network. It then outlines the general procedure for processing GPS tracking data and discusses related issues that are uncovered by using real-world GPS tracking data of 316 cars. Thirdly, it investigates the influence of road network density on finding optimal locations for enhancing travel efficiency and decreasing travel cost. The results show that the geographical positioning is reliable. Velocity is slightly underestimated, whereas altitude measurements are unreliable. Post-processing techniques with auxiliary information are found to be necessary and important for resolving the inaccuracy of GPS data. The density of the road network influences the finding of optimal locations. The influence stabilizes at a certain level and does not deteriorate when the node density is higher.
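One common step when processing GPS tracking data is deriving speed from consecutive position fixes. The sketch below is an illustration only and is not the procedure used in the thesis: it assumes fixes arrive as hypothetical (timestamp, latitude, longitude) tuples and uses the haversine great-circle distance with a mean Earth radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speeds_kmh(fixes):
    """Speeds between consecutive fixes.

    fixes: list of (timestamp_seconds, lat, lon), assumed sorted by time.
    """
    out = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        out.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)
    return out

# Hypothetical track: one degree of longitude along the equator in one hour,
# followed by a duplicate timestamp that is skipped.
track = [(0, 0.0, 0.0), (3600, 0.0, 1.0), (3600, 0.0, 1.0)]
track_speeds = speeds_kmh(track)
```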
Abstract:
This thesis contributes to the heuristic optimization of the p-median problem and Swedish population redistribution. The p-median model is the most representative model in location analysis. When facilities are located to a population geographically distributed in Q demand points, the p-median model systematically considers all the demand points such that each demand point will have an effect on the decision of the location. However, a series of questions arise. How do we measure the distances? Does the number of facilities to be located have a strong impact on the result? What scale of the network is suitable? How good is our solution? We have scrutinized many such issues. We are interested in these questions because the solutions involve many uncertainties. We cannot guarantee our solution is good enough for making decisions. The technique of heuristic optimization is formulated in the thesis. Swedish population redistribution is examined by a spatio-temporal covariance model. A descriptive analysis is not always enough to describe the moving effects from the neighbouring population. A correlation or a covariance analysis shows the tendencies more explicitly. Similarly, an optimization technique for the parameter estimation is required and is carried out within the framework of statistical modeling.
Abstract:
Planning policies in several European countries have aimed at hindering the expansion of out-of-town shopping centers. One argument for this is concern about the increase in transport and a resulting increase in environmental externalities such as CO2 emissions. This concern is weakly founded in science, as few studies have attempted to measure the CO2 emissions of shopping trips as a function of the location of the shopping centers. In this paper we conduct a counter-factual analysis comparing downtown, edge-of-town and out-of-town shopping. In this comparison we use GPS to track 250 consumers over a time-span of two months in a Swedish region. The GPS data are entered into Oguchi’s formula to obtain shopping trip-specific CO2 emissions. We find that consumers’ out-of-town shopping would generate an excess of 60 per cent in CO2 emissions, whereas downtown and edge-of-town shopping centers are comparable.
Abstract:
Most previous studies have focused on entire trips in a geographic region, while only a few have addressed trips induced by a city landmark. This paper therefore explores trips induced by a shopping center and their CO2 emissions from a time-space perspective, as well as their use in relocation planning. This is done by means of a case study in the city of Borlänge in mid-Sweden, where trips to the city’s largest shopping mall, located in its center, are examined. We use GPS tracking data of car trips that end and start at the shopping center. Thereafter, (1) we analyze the traffic emission patterns from a time-space perspective, where temporal patterns reveal hourly traffic emission dynamics and spatial patterns uncover a heterogeneous distribution of traffic emissions across spatial areas and individual street segments. Further, (2) this study reports that most of the observed trips follow an optimal route in terms of CO2 emissions. In this respect, (3) we evaluate how well placed the current shopping center is through a comparison with two competing locations. We conclude that the two suggested locations, which are close to the current shopping center, do not show a significant improvement in terms of CO2 emissions.
Abstract:
The p-median model is used to locate P facilities to serve a geographically distributed population. Conventionally, it is assumed that the population always travels to the nearest facility. Drezner and Drezner (2006, 2007) provide three arguments for why this assumption might be incorrect, and they introduce the extended gravity p-median model to relax the assumption. We favour the gravity p-median model, but we note that in an applied setting, Drezner and Drezner’s arguments are incomplete. In this communication, we point to the existence of a fourth compelling argument for the gravity p-median model.
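The conventional assumption described in the abstract above, that each demand point is served by its nearest facility, can be made concrete with a minimal sketch of the standard p-median objective. The data, the function names and the brute-force search are illustrative assumptions and are not taken from the paper; the gravity variant, which relaxes the nearest-facility rule, is not implemented here.

```python
from itertools import combinations

def p_median_cost(facilities, demand_points, weights, dist):
    """Weighted sum of each demand point's distance to its nearest facility."""
    return sum(
        w * min(dist(d, f) for f in facilities)
        for d, w in zip(demand_points, weights)
    )

def brute_force_p_median(candidates, demand_points, weights, p, dist):
    """Try every set of p candidate sites; only feasible for tiny instances."""
    return min(
        combinations(candidates, p),
        key=lambda chosen: p_median_cost(chosen, demand_points, weights, dist),
    )

def line_distance(a, b):
    return abs(a - b)

# Hypothetical data: five demand points on a line, each also a candidate site.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
weights = [1, 2, 1, 3, 1]

best = brute_force_p_median(points, points, weights, 2, line_distance)
best_cost = p_median_cost(best, points, weights, line_distance)
```

Exhaustive search grows combinatorially with the number of candidate sites, which is why realistic instances rely on heuristics.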
Abstract:
Location models are used for planning the location of multiple service centers in order to serve a geographically distributed population. A cornerstone of such models is the measure of distance between the service center and a set of demand points, viz., the location of the population (customers, pupils, patients and so on). Theoretical as well as empirical evidence supports the current practice of using the Euclidean distance in metropolitan areas. In this paper, we argue and provide empirical evidence that such a measure is misleading once the location models are applied to rural areas with heterogeneous transport networks. This paper stems from the problem of finding an optimal allocation of a pre-specified number of hospitals in a large Swedish region with a low population density. We conclude that the Euclidean and the network distances based on a homogeneous network (equal travel costs in the whole network) give approximately the same optimums. However, network distances calculated from a heterogeneous network (different travel costs in different parts of the network) give widely different optimums when the number of hospitals increases. In terms of accessibility, we find that the recent closure of hospitals and the suboptimal location of the remaining ones have increased the average travel distance by 75% for the population. Finally, aggregating the population misplaces the hospitals by on average 10 km.
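The contrast between Euclidean and network distance in the abstract above can be sketched on a toy example. Everything here is hypothetical (the node names, coordinates and travel costs); the network distance is computed with Dijkstra's algorithm, which is a standard choice and not necessarily the one used in the paper.

```python
import heapq
import math

def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def network_distances(graph, source):
    """Dijkstra's algorithm: cheapest travel cost from source to every node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist

# Hypothetical layout: one village and two hospital sites.
coords = {"village": (0, 0), "north": (0, 3), "east": (4, 0)}
# Heterogeneous network: a slow road to the north, a fast road to the east.
graph = {
    "village": [("north", 9.0), ("east", 4.0)],
    "north": [("village", 9.0)],
    "east": [("village", 4.0)],
}

hospitals = ["north", "east"]
nearest_euclid = min(
    hospitals, key=lambda h: euclidean(coords["village"], coords[h])
)
travel = network_distances(graph, "village")
nearest_network = min(hospitals, key=lambda h: travel[h])
```

In this toy case the two measures disagree about which hospital is closest, which is the kind of discrepancy that can shift an optimum when travel costs vary across the network.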
Abstract:
In this paper, the p-median model is used to find the location of retail stores that minimizes CO2 emissions from consumer travel. The optimal location is then compared with the existing retail location, and the excess CO2 emissions relative to the optimal solution are calculated. The results show that by using the environmentally optimal location, CO2 emissions from consumer travel could be reduced by approximately 25 percent.
Abstract:
The p-median problem is often used to locate p service centers by minimizing their distances to a geographically distributed demand (n). The optimal locations are sensitive to geographical context such as the road network and demand points, especially when they are asymmetrically distributed in the plane. Most studies focus on evaluating the performance of the p-median model when p and n vary. To our knowledge, the problem is not well studied when the road network is altered, especially when it is applied in a real-world context. The aim of this study is to analyze how the optimal location solutions vary, using the p-median model, when the density of the road network is altered. The investigation is conducted by means of a case study in a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points), Dalecarlia. To locate 5 to 50 service centers we use the national transport administration's official road network (NVDB). The road network consists of 1.5 million nodes. To find the optimal location we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000. To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is a limited improvement in the optimal solutions when the number of nodes in the road network increases and p is low. When p is high, the improvements are larger. The results also show that the choice of the best network depends on p. The larger p is, the denser the network needs to be.
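A minimal sketch of simulated annealing for the p-median problem is given below. It is an illustration under stated assumptions, not the study's implementation: it uses a fixed geometric cooling schedule instead of the adaptive temperature tuning mentioned in the abstract, and the instance, parameter values and function names are hypothetical.

```python
import math
import random

def total_cost(facilities, demand, dist):
    """p-median objective: weighted distance to the nearest open facility."""
    return sum(w * min(dist(x, f) for f in facilities) for x, w in demand)

def anneal_p_median(candidates, demand, p, dist, steps=2000,
                    start_temp=10.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = rng.sample(candidates, p)
    current_cost = total_cost(current, demand, dist)
    best, best_cost = list(current), current_cost
    temp = start_temp
    for _ in range(steps):
        # Neighbour move: swap one open facility for a closed candidate.
        closed = [c for c in candidates if c not in current]
        if not closed:
            break
        proposal = list(current)
        proposal[rng.randrange(p)] = rng.choice(closed)
        proposal_cost = total_cost(proposal, demand, dist)
        delta = proposal_cost - current_cost
        # Accept improvements always, worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # fixed geometric cooling (assumption, not adaptive)
    return best, best_cost

def line_distance(a, b):
    return abs(a - b)

# Hypothetical instance: weighted demand points on a line, all candidate sites.
demand = [(0.0, 1), (1.0, 2), (2.0, 1), (10.0, 3), (11.0, 1)]
sites = [x for x, _ in demand]
solution, cost = anneal_p_median(sites, demand, 2, line_distance)
```

Accepting some worse moves early on lets the search escape local optima; as the temperature falls, the procedure behaves more and more like a greedy swap heuristic.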
Abstract:
This study covers a period when society changed from a pre-industrial agricultural society to a post-industrial service-producing society. Parallel with this social transformation, major population changes took place. In this study, we analyse how local population changes are affected by neighbouring populations. To do so we use the last 200 years of local population change that redistributed population in Sweden. We use the literature to identify several different processes and spatial dependencies in the redistribution between a parish and its surrounding parishes. The analysis is based on a unique unchanged historical parish division, and we use an index of local spatial correlation to describe different kinds of spatial dependencies that have influenced the redistribution of the population. To control for inherent time dependencies, we introduce a non-separable spatio-temporal correlation model into the analysis of population redistribution. In this way, several different spatial dependencies can be observed simultaneously over time. The main conclusions are that while local population changes were highly dependent on the neighbouring populations in the 19th century, in the late 20th century this spatial dependence had already become insignificant when two parishes were separated by 5 kilometres. Another conclusion is that the time dependency in the population change is higher when the population redistribution is weak, as it currently is and as it was during the 19th century until the start of the industrial revolution.