10 results for rainfall-runoff empirical statistical model
in Helda - Digital Repository of University of Helsinki
Abstract:
In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data: conditional and full likelihood approaches for a disease with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of the general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is to say, these subjects never get the disease. A statistical model for a variable age at onset with long-term survivors is proposed for studies of familial aggregation, using a latent variable approach, as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation of the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work contains five studies with different statistical models, they all concern data obtained from nationwide registries of T1D and the genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects and birth cohort information of the Finnish population. 
Finally, substantial familial variation in the susceptibility to T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling to explore risk factors for complex diseases.
Abstract:
Various reasons, such as ethical issues in maintaining blood resources, growing costs, and strict requirements for safe blood, have increased the pressure for efficient use of resources in blood banking. The competence of blood establishments can be characterized by their ability to predict the volume of blood collection to be able to provide cellular blood components in a timely manner as dictated by hospital demand. The stochastically varying clinical need for platelets (PLTs) sets a specific challenge for balancing supply with requests. Labour has been proven a primary cost-driver and should be managed efficiently. International comparisons of blood banking could recognize inefficiencies and allow reallocation of resources. Seventeen blood centres from 10 countries in continental Europe, Great Britain, and Scandinavia participated in this study. The centres were national institutes (5), parts of the local Red Cross organisation (5), or integrated into university hospitals (7). This study focused on the departments of blood component preparation of the centres. The data were obtained retrospectively by computerized questionnaires completed via Internet for the years 2000-2002. The data were used in four original articles (numbered I through IV) that form the basis of this thesis. Non-parametric data envelopment analysis (DEA, II-IV) was applied to evaluate and compare the relative efficiency of blood component preparation. Several models were created using different input and output combinations. The focus of comparisons was on the technical efficiency (II-III) and the labour efficiency (I, IV). An empirical cost model was tested to evaluate the cost efficiency (IV). Purchasing power parities (PPP, IV) were used to adjust the costs of the working hours and to make the costs comparable among countries. The total annual number of whole blood (WB) collections varied from 8,880 to 290,352 in the centres (I). 
Significant variation was also observed in the annual volume of produced red blood cells (RBCs) and PLTs. The annual number of PLTs produced by any method varied from 2,788 to 104,622 units. In 2002, 73% of all PLTs were produced by the buffy coat (BC) method, 23% by aphaeresis and 4% by the platelet-rich plasma (PRP) method. The annual discard rate of PLTs varied from 3.9% to 31%. The mean discard rate (13%) remained in the same range throughout the study period and demonstrated similar levels and variation in 2003-2004 according to a specific follow-up question (14%, range 3.8%-24%). The annual PLT discard rates were, to some extent, associated with production volumes. The mean RBC discard rate was 4.5% (range 0.2%-7.7%). Technical efficiency showed marked variation (median 60%, range 41%-100%) among the centres (II). Compared to the efficient departments, the inefficient departments used excess labour resources (and probably production equipment) to produce RBCs and PLTs. Technical efficiency tended to be higher when the (theoretical) proportion of lost WB collections (total RBC+PLT loss) from all collections was low (III). The labour efficiency varied remarkably, from 25% to 100% (median 47%), when working hours were the only input (IV). Using the estimated total costs as the input (cost efficiency) revealed an even greater variation (13%-100%) and an overall lower efficiency level compared to labour only as the input. In cost efficiency only, the savings potential (observed inefficiency) was more than 50% in 10 departments, whereas labour and cost savings potentials were both more than 50% in six departments. The association between department size and efficiency (scale efficiency) could not be verified statistically in the small sample. In conclusion, international evaluation of the technical efficiency in component preparation departments revealed remarkable variation. 
A suboptimal combination of manpower and production output levels was the major cause of inefficiency, and the efficiency did not directly relate to production volume. Evaluation of the reasons for discarding components may offer a novel approach to study efficiency. DEA was proven applicable in analyses including various factors as inputs and outputs. This study suggests that analytical models can be developed to serve as indicators of technical efficiency and promote improvements in the management of limited resources. The work also demonstrates the importance of integrating efficiency analysis into international comparisons of blood banking.
Abstract:
Microarrays are high-throughput biological assays that allow the screening of thousands of genes for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. A large number of steps and errors associated with each step make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving gene signal and further utilizing this improved signal for higher level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment leads to signal with low noise, as the effect of unwanted variations is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of genes. Third, a novel image segmentation approach that segregates the fluorescent signal from the undesired noise is developed using an additional dye, SYBR green RNA II. This technique helped in identifying signal only with respect to the hybridized DNA, so that signal corresponding to dust, scratches, spilled dye, and other noise is avoided. Fourth, an integrated statistical model is developed, where signal correction, systematic array effects, dye effects, and differential expression are modelled jointly as opposed to a sequential application of several methods of analysis. 
The methods described here have been tested only for cDNA microarrays, but can also, with some modifications, be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
Abstract:
The Baltic Sea is a geologically young, large brackish water basin, and few of the species living there have fully adapted to its special conditions. Many of the species live on the edge of their distribution range in terms of one or more environmental variables such as salinity or temperature. Environmental fluctuations are known to cause fluctuations in population abundance, and this effect is especially strong near the edges of the distribution range, where even small changes in an environmental variable can be critical to the success of a species. This thesis examines which environmental factors are the most important in relation to the success of various commercially exploited fish species in the northern Baltic Sea. It also examines the uncertainties related to the current and potential status of fish stocks as well as to their relationship with their environment. The aim is to quantify the uncertainties related to fisheries and environmental management, to find potential management strategies that can be used to reduce uncertainty in management results and to develop methodology related to uncertainty estimation in natural resources management. Bayesian statistical methods are utilized due to their ability to treat uncertainty explicitly in all parts of the statistical model. The results show that uncertainty about important parameters of even the most intensively studied fish species such as salmon (Salmo salar L.) and Baltic herring (Clupea harengus membras L.) is large. On the other hand, management approaches that reduce uncertainty can be found. These include utilising information about ecological similarity of fish stocks and species, and using management variables that are directly related to stock parameters that can be measured easily and without extrapolations or assumptions.
Abstract:
Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high velocity means that molecules can be destroyed and removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements done with the system are described. In a second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or measured only a few times.
Abstract:
In Helsinki's Evangelical Lutheran congregations, the share of residents in each congregation's geographical area who are members of that church varies from 62.4 per cent in Paavali to 80.7 per cent in Munkkiniemi. The boundaries of the congregations are about to be redrawn to level out the differences between the congregations. In this thesis, the reasons for the differences between Helsinki's districts were studied more closely. The data consisted of statistical information gathered from the Population Information System of Finland. It included information by age groups about the population register keeper, marital status, native tongue, level of education and gender at the end of 2005. Additional data was gathered from the Helsinki Region Statistics web service. It included information about the dwellings, level of income and main activities of the inhabitants in the districts. The main method was stepwise linear regression. Minor methods were cross-tabulation and correlation matrices. The result of the study was a statistical model that explains 72.2 per cent of the variation of the shares in the congregations. The dependent variable was the share of people who are members of the Evangelical Lutheran church in the districts. The independent variables were the share of people having a native tongue other than Finnish or Swedish, the share of rented apartments, the share of apartments with four rooms and a kitchen, the share of detached houses in the districts and the shares of women and of people with no income in the districts. The independent variables present in the model depict the number of foreigners, dwellings, gender and the level of income of the population. High shares of foreigners, people with no income and rented apartments explain a low share of members of the Evangelical Lutheran church. 
Conversely, a high share of members of the Evangelical Lutheran church in a district is explained by large apartments, detached houses and the number of women living there.
Abstract:
Population dynamics are generally viewed as the result of intrinsic (purely density dependent) and extrinsic (environmental) processes. Both components, and potential interactions between those two, have to be modelled in order to understand and predict dynamics of natural populations; a topic that is of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II-IV constitute empirical studies that statistically model environmental effects on population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine, another important food resource for the species, do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis), and points out effects of rainfall and habitat quality on population growth. 
Because the skylark time series and some of the environmental variables included show strong positive autocorrelation, the statistical significances are calculated using a Monte Carlo method, where random autocorrelated time series are generated. Chapter V is a simulation-based study, showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty, if the environmental variables are autocorrelated. It is concluded that the use of state-space models is an effective way to reach more accurate results. In summary, there are several biological assumptions and methodological issues that can affect the inferential outcome when estimating environmental effects from time series data, and that therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density dependent regulation, modelling potential observation error, and when needed, accounting for spatial and/or temporal autocorrelation.
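The Monte Carlo significance test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order autoregressive (AR(1)) surrogate model and the Pearson correlation as the test statistic; neither choice is stated in the abstract, and all function names are hypothetical.

```python
import math
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_ar1(n, phi, rng):
    """Simulate a stationary AR(1) series with unit innovation variance."""
    # Draw the first value from the stationary distribution.
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def monte_carlo_p_value(pop, env, n_sim=2000, seed=0):
    """Two-sided Monte Carlo p-value for the correlation between pop and env.

    Assumption (not from the abstract): the null distribution is built from
    random AR(1) series matching the lag-1 autocorrelation of env.
    """
    rng = random.Random(seed)
    observed = abs(pearson(pop, env))
    # Clamp the estimate so the surrogate process stays stationary.
    phi = max(-0.99, min(0.99, lag1_autocorr(env)))
    exceed = 0
    for _ in range(n_sim):
        surrogate = simulate_ar1(len(env), phi, rng)
        if abs(pearson(pop, surrogate)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

Because the surrogate series carry the same autocorrelation as the environmental variable, the resulting p-value is typically larger than one computed under an independence assumption, which is the point of such a test.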
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity. 
A common feature of T-RFLP and FAME data is zero-inflation, which indicates that the observation of a zero value is much more frequent than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational need. Our method that detects homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial effort towards discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
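To make the notion of zero-inflation concrete, the following is a minimal sketch of a zero-inflated Poisson probability mass function. It illustrates the general concept only; it is not one of the two strategies developed in the thesis, and the names `zip_pmf` and `pi_zero` are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: a mixture of a point mass at zero and a Poisson.

    pi_zero is the probability of a structural zero; with probability
    1 - pi_zero the count comes from an ordinary Poisson(lam) distribution.
    """
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base
```

With `pi_zero = 0` the function reduces to the ordinary Poisson distribution, while any `pi_zero > 0` raises the probability of a zero above what a Poisson with the same rate would give.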
Abstract:
This thesis studies binary time series models and their applications in empirical macroeconomics and finance. In addition to previously suggested models, new dynamic extensions are proposed to the static probit model commonly used in the previous literature. In particular, we are interested in probit models with an autoregressive model structure. In Chapter 2, the main objective is to compare the predictive performance of the static and dynamic probit models in forecasting the U.S. and German business cycle recession periods. Financial variables, such as interest rates and stock market returns, are used as predictive variables. The empirical results suggest that the recession periods are predictable and that dynamic probit models, especially models with the autoregressive structure, outperform the static model. Chapter 3 proposes a Lagrange Multiplier (LM) test for the usefulness of the autoregressive structure of the probit model. The finite sample properties of the LM test are considered with simulation experiments. Results indicate that the two alternative LM test statistics have reasonable size and power in large samples. In small samples, a parametric bootstrap method is suggested to obtain approximately correct size. In Chapter 4, the predictive power of dynamic probit models in predicting the direction of stock market returns is examined. The novel idea is to use the recession forecast (see Chapter 2) as a predictor of the stock return sign. The evidence suggests that the signs of the U.S. excess stock returns over the risk-free return are predictable both in and out of sample. The new "error correction" probit model yields the best forecasts and it also outperforms other predictive models, such as ARMAX models, in terms of statistical and economic goodness-of-fit measures. Chapter 5 generalizes the analysis of univariate models considered in Chapters 2-4 to the case of a bivariate model. 
A new bivariate autoregressive probit model is applied to predict the current state of the U.S. business cycle and growth rate cycle periods. Evidence of predictability of both cycle indicators is obtained and the bivariate model is found to outperform the univariate models in terms of predictive power.
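As an illustration of what a univariate probit model with an autoregressive structure can look like, here is a minimal sketch. The recursion pi_t = omega + alpha * pi_{t-1} + beta * x_{t-1} with p_t = Phi(pi_t) is an assumed specification chosen for illustration; the abstract does not give the exact form used in the thesis, and the function and parameter names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def autoregressive_probit_probs(x_lagged, omega, alpha, beta, pi0=0.0):
    """Event probabilities from an assumed autoregressive probit recursion.

    x_lagged[t] is the predictor observed one period before period t.
    The linear index pi_t depends on its own previous value (alpha) and
    on the lagged predictor (beta); alpha = 0 gives the static probit.
    """
    probs = []
    pi_prev = pi0
    for x_lag in x_lagged:
        pi_t = omega + alpha * pi_prev + beta * x_lag
        probs.append(normal_cdf(pi_t))
        pi_prev = pi_t
    return probs
```

Setting `alpha = 0` removes the dependence on the previous index value, so the static and dynamic variants can be compared on the same predictor series.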