15 resultados para Statistical-models

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The objective of this study was to gain an understanding of the effects of population heterogeneity, missing data, and causal relationships on parameter estimates from statistical models when analyzing change in medication use. From a public health perspective, two timely topics were addressed: the use and effects of statins in populations in primary prevention of cardiovascular disease and polypharmacy in older population. Growth mixture models were applied to characterize the accumulation of cardiovascular and diabetes medications among apparently healthy population of statin initiators. The causal effect of statin adherence on the incidence of acute cardiovascular events was estimated using marginal structural models in comparison with discrete-time hazards models. The impact of missing data on the growth estimates of evolution of polypharmacy was examined comparing statistical models under different assumptions for missing data mechanism. The data came from Finnish administrative registers and from the population-based Geriatric Multidisciplinary Strategy for the Good Care of the Elderly study conducted in Kuopio, Finland, during 2004–07. Five distinct patterns of accumulating medications emerged among the population of apparently healthy statin initiators during two years after statin initiation. Proper accounting for time-varying dependencies between adherence to statins and confounders using marginal structural models produced comparable estimation results with those from a discrete-time hazards model. Missing data mechanism was shown to be a key component when estimating the evolution of polypharmacy among older persons. In conclusion, population heterogeneity, missing data and causal relationships are important aspects in longitudinal studies that associate with the study question and should be critically assessed when performing statistical analyses. Analyses should be supplemented with sensitivity analyses towards model assumptions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Statistical analyses of measurements that can be described by statistical models are of essence in astronomy and in scientific inquiry in general. The sensitivity of such analyses, modelling approaches, and the consequent predictions, is sometimes highly dependent on the exact techniques applied, and improvements therein can result in significantly better understanding of the observed system of interest. Particularly, optimising the sensitivity of statistical techniques in detecting the faint signatures of low-mass planets orbiting the nearby stars is, together with improvements in instrumentation, essential in estimating the properties of the population of such planets, and in the race to detect Earth-analogs, i.e. planets that could support liquid water and, perhaps, life on their surfaces. We review the developments in Bayesian statistical techniques applicable to detections planets orbiting nearby stars and astronomical data analysis problems in general. We also discuss these techniques and demonstrate their usefulness by using various examples and detailed descriptions of the respective mathematics involved. We demonstrate the practical aspects of Bayesian statistical techniques by describing several algorithms and numerical techniques, as well as theoretical constructions, in the estimation of model parameters and in hypothesis testing. We also apply these algorithms to Doppler measurements of nearby stars to show how they can be used in practice to obtain as much information from the noisy data as possible. Bayesian statistical techniques are powerful tools in analysing and interpreting noisy data and should be preferred in practice whenever computational limitations are not too restrictive.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The identifiability of the parameters of a heat exchanger model without phase change was studied in this Master’s thesis using synthetically made data. A fast, two-step Markov chain Monte Carlo method (MCMC) was tested with a couple of case studies and a heat exchanger model. The two-step MCMC-method worked well and decreased the computation time compared to the traditional MCMC-method. The effect of measurement accuracy of certain control variables to the identifiability of parameters was also studied. The accuracy used did not seem to have a remarkable effect to the identifiability of parameters. The use of the posterior distribution of parameters in different heat exchanger geometries was studied. It would be computationally most efficient to use the same posterior distribution among different geometries in the optimisation of heat exchanger networks. According to the results, this was possible in the case when the frontal surface areas were the same among different geometries. In the other cases the same posterior distribution can be used for optimisation too, but that will give a wider predictive distribution as a result. For condensing surface heat exchangers the numerical stability of the simulation model was studied. As a result, a stable algorithm was developed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Yksi keskeisimmistä tehtävistä matemaattisten mallien tilastollisessa analyysissä on mallien tuntemattomien parametrien estimointi. Tässä diplomityössä ollaan kiinnostuneita tuntemattomien parametrien jakaumista ja niiden muodostamiseen sopivista numeerisista menetelmistä, etenkin tapauksissa, joissa malli on epälineaarinen parametrien suhteen. Erilaisten numeeristen menetelmien osalta pääpaino on Markovin ketju Monte Carlo -menetelmissä (MCMC). Nämä laskentaintensiiviset menetelmät ovat viime aikoina kasvattaneet suosiotaan lähinnä kasvaneen laskentatehon vuoksi. Sekä Markovin ketjujen että Monte Carlo -simuloinnin teoriaa on esitelty työssä siinä määrin, että menetelmien toimivuus saadaan perusteltua. Viime aikoina kehitetyistä menetelmistä tarkastellaan etenkin adaptiivisia MCMC menetelmiä. Työn lähestymistapa on käytännönläheinen ja erilaisia MCMC -menetelmien toteutukseen liittyviä asioita korostetaan. Työn empiirisessä osuudessa tarkastellaan viiden esimerkkimallin tuntemattomien parametrien jakaumaa käyttäen hyväksi teoriaosassa esitettyjä menetelmiä. Mallit kuvaavat kemiallisia reaktioita ja kuvataan tavallisina differentiaaliyhtälöryhminä. Mallit on kerätty kemisteiltä Lappeenrannan teknillisestä yliopistosta ja Åbo Akademista, Turusta.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is generally accepted that between 70 and 80% of manufacturing costs can be attributed to design. Nevertheless, it is difficult for the designer to estimate manufacturing costs accurately, especially when alternative constructions are compared at the conceptual design phase, because of the lack of cost information and appropriate tools. In general, previous reports concerning optimisation of a welded structure have used the mass of the product as the basis for the cost comparison. However, it can easily be shown using a simple example that the use of product mass as the sole manufacturing cost estimator is unsatisfactory. This study describes a method of formulating welding time models for cost calculation, and presents the results of the models for particular sections, based on typical costs in Finland. This was achieved by collecting information concerning welded products from different companies. The data included 71 different welded assemblies taken from the mechanical engineering and construction industries. The welded assemblies contained in total 1 589 welded parts, 4 257 separate welds, and a total welded length of 3 188 metres. The data were modelled for statistical calculations, and models of welding time were derived by using linear regression analysis. Themodels were tested by using appropriate statistical methods, and were found to be accurate. General welding time models have been developed, valid for welding in Finland, as well as specific, more accurate models for particular companies. The models are presented in such a form that they can be used easily by a designer, enabling the cost calculation to be automated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work presents new, efficient Markov chain Monte Carlo (MCMC) simulation methods for statistical analysis in various modelling applications. When using MCMC methods, the model is simulated repeatedly to explore the probability distribution describing the uncertainties in model parameters and predictions. In adaptive MCMC methods based on the Metropolis-Hastings algorithm, the proposal distribution needed by the algorithm learns from the target distribution as the simulation proceeds. Adaptive MCMC methods have been subject of intensive research lately, as they open a way for essentially easier use of the methodology. The lack of user-friendly computer programs has been a main obstacle for wider acceptance of the methods. This work provides two new adaptive MCMC methods: DRAM and AARJ. The DRAM method has been built especially to work in high dimensional and non-linear problems. The AARJ method is an extension to DRAM for model selection problems, where the mathematical formulation of the model is uncertain and we want simultaneously to fit several different models to the same observations. The methods were developed while keeping in mind the needs of modelling applications typical in environmental sciences. The development work has been pursued while working with several application projects. The applications presented in this work are: a winter time oxygen concentration model for Lake Tuusulanjärvi and adaptive control of the aerator; a nutrition model for Lake Pyhäjärvi and lake management planning; validation of the algorithms of the GOMOS ozone remote sensing instrument on board the Envisat satellite of European Space Agency and the study of the effects of aerosol model selection on the GOMOS algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In a very volatile industry of high technology it is of utmost importance to accurately forecast customers’ demand. However, statistical forecasting of sales, especially in heavily competitive electronics product business, has always been a challenging task due to very high variation in demand and very short product life cycles of products. The purpose of this thesis is to validate if statistical methods can be applied to forecasting sales of short life cycle electronics products and provide a feasible framework for implementing statistical forecasting in the environment of the case company. Two different approaches have been developed for forecasting on short and medium term and long term horizons. Both models are based on decomposition models, but differ in interpretation of the model residuals. For long term horizons residuals are assumed to represent white noise, whereas for short and medium term forecasting horizon residuals are modeled using statistical forecasting methods. Implementation of both approaches is performed in Matlab. Modeling results have shown that different markets exhibit different demand patterns and therefore different analytical approaches are appropriate for modeling demand in these markets. Moreover, the outcomes of modeling imply that statistical forecasting can not be handled separately from judgmental forecasting, but should be perceived only as a basis for judgmental forecasting activities. Based on modeling results recommendations for further deployment of statistical methods in sales forecasting of the case company are developed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and subsequently the quality of such methods can be critically dependent on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments.Therefore, there is a need for an objective approach for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered, those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The results of the predicted alignment quality score highly correlated with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained. To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The optimal design of a heat exchanger system is based on given model parameters together with given standard ranges for machine design variables. The goals set for minimizing the Life Cycle Cost (LCC) function which represents the price of the saved energy, for maximizing the momentary heat recovery output with given constraints satisfied and taking into account the uncertainty in the models were successfully done. Nondominated Sorting Genetic Algorithm II (NSGA-II) for the design optimization of a system is presented and implemented inMatlab environment. Markov ChainMonte Carlo (MCMC) methods are also used to take into account the uncertainty in themodels. Results show that the price of saved energy can be optimized. A wet heat exchanger is found to be more efficient and beneficial than a dry heat exchanger even though its construction is expensive (160 EUR/m2) compared to the construction of a dry heat exchanger (50 EUR/m2). It has been found that the longer lifetime weights higher CAPEX and lower OPEX and vice versa, and the effect of the uncertainty in the models has been identified in a simplified case of minimizing the area of a dry heat exchanger.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this work is to compare two families of mathematical models for their respective capability to capture the statistical properties of real electricity spot market time series. The first model family is ARMA-GARCH models and the second model family is mean-reverting Ornstein-Uhlenbeck models. These two models have been applied to two price series of Nordic Nord Pool spot market for electricity namely to the System prices and to the DenmarkW prices. The parameters of both models were calibrated from the real time series. After carrying out simulation with optimal models from both families we conclude that neither ARMA-GARCH models, nor conventional mean-reverting Ornstein-Uhlenbeck models, even when calibrated optimally with real electricity spot market price or return series, capture the statistical characteristics of the real series. But in the case of less spiky behavior (System prices), the mean-reverting Ornstein-Uhlenbeck model could be seen to partially succeeded in this task.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Systems biology is a new, emerging and rapidly developing, multidisciplinary research field that aims to study biochemical and biological systems from a holistic perspective, with the goal of providing a comprehensive, system- level understanding of cellular behaviour. In this way, it addresses one of the greatest challenges faced by contemporary biology, which is to compre- hend the function of complex biological systems. Systems biology combines various methods that originate from scientific disciplines such as molecu- lar biology, chemistry, engineering sciences, mathematics, computer science and systems theory. Systems biology, unlike “traditional” biology, focuses on high-level concepts such as: network, component, robustness, efficiency, control, regulation, hierarchical design, synchronization, concurrency, and many others. The very terminology of systems biology is “foreign” to “tra- ditional” biology, marks its drastic shift in the research paradigm and it indicates close linkage of systems biology to computer science. One of the basic tools utilized in systems biology is the mathematical modelling of life processes tightly linked to experimental practice. The stud- ies contained in this thesis revolve around a number of challenges commonly encountered in the computational modelling in systems biology. The re- search comprises of the development and application of a broad range of methods originating in the fields of computer science and mathematics for construction and analysis of computational models in systems biology. In particular, the performed research is setup in the context of two biolog- ical phenomena chosen as modelling case studies: 1) the eukaryotic heat shock response and 2) the in vitro self-assembly of intermediate filaments, one of the main constituents of the cytoskeleton. The range of presented approaches spans from heuristic, through numerical and statistical to ana- lytical methods applied in the effort to formally describe and analyse the two biological processes. We notice however, that although applied to cer- tain case studies, the presented methods are not limited to them and can be utilized in the analysis of other biological mechanisms as well as com- plex systems in general. The full range of developed and applied modelling techniques as well as model analysis methodologies constitutes a rich mod- elling framework. Moreover, the presentation of the developed methods, their application to the two case studies and the discussions concerning their potentials and limitations point to the difficulties and challenges one encounters in computational modelling of biological systems. The problems of model identifiability, model comparison, model refinement, model inte- gration and extension, choice of the proper modelling framework and level of abstraction, or the choice of the proper scope of the model run through this thesis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The condensation rate has to be high in the safety pressure suppression pool systems of Boiling Water Reactors (BWR) in order to fulfill their safety function. The phenomena due to such a high direct contact condensation (DCC) rate turn out to be very challenging to be analysed either with experiments or numerical simulations. In this thesis, the suppression pool experiments carried out in the POOLEX facility of Lappeenranta University of Technology were simulated. Two different condensation modes were modelled by using the 2-phase CFD codes NEPTUNE CFD and TransAT. The DCC models applied were the typical ones to be used for separated flows in channels, and their applicability to the rapidly condensing flow in the condensation pool context had not been tested earlier. A low Reynolds number case was the first to be simulated. The POOLEX experiment STB-31 was operated near the conditions between the ’quasi-steady oscillatory interface condensation’ mode and the ’condensation within the blowdown pipe’ mode. The condensation models of Lakehal et al. and Coste & Lavi´eville predicted the condensation rate quite accurately, while the other tested ones overestimated it. It was possible to get the direct phase change solution to settle near to the measured values, but a very high resolution of calculation grid was needed. Secondly, a high Reynolds number case corresponding to the ’chugging’ mode was simulated. The POOLEX experiment STB-28 was chosen, because various standard and highspeed video samples of bubbles were recorded during it. In order to extract numerical information from the video material, a pattern recognition procedure was programmed. The bubble size distributions and the frequencies of chugging were calculated with this procedure. With the statistical data of the bubble sizes and temporal data of the bubble/jet appearance, it was possible to compare the condensation rates between the experiment and the CFD simulations. In the chugging simulations, a spherically curvilinear calculation grid at the blowdown pipe exit improved the convergence and decreased the required cell count. The compressible flow solver with complete steam-tables was beneficial for the numerical success of the simulations. The Hughes-Duffey model and, to some extent, the Coste & Lavi´eville model produced realistic chugging behavior. The initial level of the steam/water interface was an important factor to determine the initiation of the chugging. If the interface was initialized with a water level high enough inside the blowdown pipe, the vigorous penetration of a water plug into the pool created a turbulent wake which invoked the chugging that was self-sustaining. A 3D simulation with a suitable DCC model produced qualitatively very realistic shapes of the chugging bubbles and jets. The comparative FFT analysis of the bubble size data and the pool bottom pressure data gave useful information to distinguish the eigenmodes of chugging, bubbling, and pool structure oscillations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cells of epithelial origin, e.g. from breast and prostate cancers, effectively differentiate into complex multicellular structures when cultured in three-dimensions (3D) instead of conventional two-dimensional (2D) adherent surfaces. The spectrum of different organotypic morphologies is highly dependent on the culture environment that can be either non-adherent or scaffold-based. When embedded in physiological extracellular matrices (ECMs), such as laminin-rich basement membrane extracts, normal epithelial cells differentiate into acinar spheroids reminiscent of glandular ductal structures. Transformed cancer cells, in contrast, typically fail to undergo acinar morphogenic patterns, forming poorly differentiated or invasive multicellular structures. The 3D cancer spheroids are widely accepted to better recapitulate various tumorigenic processes and drug responses. So far, however, 3D models have been employed predominantly in the Academia, whereas the pharmaceutical industry has yet to adopt a more widely and routine use. This is mainly due to poor characterisation of cell models, lack of standardised workflows and high throughput cell culture platforms, and the availability of proper readout and quantification tools. In this thesis, a complete workflow has been established entailing well-characterised 3D cell culture models for prostate cancer, a standardised 3D cell culture routine based on high-throughput-ready platform, automated image acquisition with concomitant morphometric image analysis, and data visualisation, in order to enable large-scale high-content screens. Our integrated suite of software and statistical analysis tools were optimised and validated using a comprehensive panel of prostate cancer cell lines and 3D models. The tools quantify multiple key cancer-relevant morphological features, ranging from cancer cell invasion through multicellular differentiation to growth, and detect dynamic changes both in morphology and function, such as cell death and apoptosis, in response to experimental perturbations including RNA interference and small molecule inhibitors. Our panel of cell lines included many non-transformed and most currently available classic prostate cancer cell lines, which were characterised for their morphogenetic properties in 3D laminin-rich ECM. The phenotypes and gene expression profiles were evaluated concerning their relevance for pre-clinical drug discovery, disease modelling and basic research. In addition, a spontaneous model for invasive transformation was discovered, displaying a highdegree of epithelial plasticity. This plasticity is mediated by an abundant bioactive serum lipid, lysophosphatidic acid (LPA), and its receptor LPAR1. The invasive transformation was caused by abrupt cytoskeletal rearrangement through impaired G protein alpha 12/13 and RhoA/ROCK, and mediated by upregulated adenylyl cyclase/cyclic AMP (cAMP)/protein kinase A, and Rac/ PAK pathways. The spontaneous invasion model tangibly exemplifies the biological relevance of organotypic cell culture models. Overall, this thesis work underlines the power of novel morphometric screening tools in drug discovery.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Surveybased event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias caused by them in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include, whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility to use combined longitudinal survey register data. The Finnish subset of European Community Household Panel (FI ECHP) survey for waves 1–5 were linked at person-level with longitudinal register data. Unemployment spells were used as study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey register data can be used to analyse and compare the non-response and attrition processes, test the missingness mechanism type and estimate the size of bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions, and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon classification error in reported spell outcomes, was also found in the data. Neither the Missing At Random (MAR) assumption about non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Both measurement errors in spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last wave weights displayed the largest bias. Using all the available data, including the spells by attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazard model estimators. The study discusses implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this research, the effectiveness of Naive Bayes and Gaussian Mixture Models classifiers on segmenting exudates in retinal images is studied and the results are evaluated with metrics commonly used in medical imaging. Also, a color variation analysis of retinal images is carried out to find how effectively can retinal images be segmented using only the color information of the pixels.