972 results for Models, statistical
Abstract:
In the highly volatile high-technology industry, accurately forecasting customer demand is of utmost importance. However, statistical sales forecasting, especially in the heavily competitive electronics business, has always been challenging due to very high variation in demand and very short product life cycles. The purpose of this thesis is to validate whether statistical methods can be applied to forecasting sales of short life cycle electronics products and to provide a feasible framework for implementing statistical forecasting in the environment of the case company. Two different approaches have been developed, one for short and medium term horizons and one for long term horizons. Both are based on decomposition models but differ in the interpretation of the model residuals. For long term horizons the residuals are assumed to represent white noise, whereas for short and medium term horizons the residuals are themselves modeled using statistical forecasting methods. Both approaches are implemented in Matlab. The modeling results show that different markets exhibit different demand patterns, and therefore different analytical approaches are appropriate for modeling demand in these markets. Moreover, the outcomes imply that statistical forecasting cannot be handled separately from judgmental forecasting, but should be perceived only as a basis for judgmental forecasting activities. Based on the modeling results, recommendations for further deployment of statistical methods in the sales forecasting of the case company are developed.
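A minimal sketch of the decomposition-plus-residual idea described above, on synthetic monthly data (the thesis implements its models in Matlab; this illustration uses Python/statsmodels, and all numbers are invented):

```python
# Illustrative decomposition-based forecasting: remove trend and seasonality,
# then either treat the residual as white noise (long term) or model it,
# e.g. with an ARMA process (short/medium term). Data are synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(120)
sales = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 120)
series = pd.Series(sales, index=pd.date_range("2010-01", periods=120, freq="MS"))

decomp = seasonal_decompose(series, model="additive", period=12)
resid = decomp.resid.dropna()

# Long-term approach: residuals assumed white noise -> forecast = trend + season.
# Short/medium-term approach: residuals carry structure -> model them, here ARMA(1,1).
resid_model = ARIMA(resid, order=(1, 0, 1)).fit()
resid_forecast = resid_model.forecast(steps=6)  # added back on top of trend + season
```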
Abstract:
Due to the rise of criminal, civil and administrative judicial situations involving people lacking valid identity documents, age estimation of living persons has become an important operational procedure for numerous forensic and medicolegal services worldwide. The chronological age of a given person is generally estimated from the observed degree of maturity of selected physical attributes by means of statistical methods. However, their application in the forensic framework suffers from some conceptual and practical drawbacks, as recently claimed in the specialised literature. The aim of this paper is therefore to offer an alternative solution for overcoming these limits by reiterating the utility of a probabilistic Bayesian approach to age estimation. This approach allows one to deal transparently with the uncertainty surrounding the age estimation process and to produce all the relevant information in the form of a posterior probability distribution for the chronological age of the person under investigation. Furthermore, this probability distribution can also be used to evaluate, in a coherent way, the possibility that the examined individual is younger or older than a given legal age threshold of particular legal interest. The main novelty introduced by this work is the development of a probabilistic graphical model, i.e. a Bayesian network, for dealing with the problem at hand. The use of this kind of probabilistic tool can significantly facilitate the application of the proposed methodology: examples are presented based on data related to the ossification status of the medial clavicular epiphysis. The reliability and the advantages of this probabilistic tool are presented and discussed.
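As a toy illustration of the Bayesian reasoning described above (all numbers invented; the uniform prior and logistic likelihood are assumptions, not the paper's actual model):

```python
# Toy discrete Bayesian age estimation: prior over candidate ages times a
# likelihood P(observed ossification stage | age) gives a posterior over age,
# from which P(age >= legal threshold) follows directly.
import numpy as np

ages = np.arange(10, 31)                    # candidate chronological ages
prior = np.full(ages.size, 1 / ages.size)   # uniform prior (an assumption)

# Hypothetical likelihood of observing a fully fused medial clavicular
# epiphysis at each age (logistic shape, purely illustrative):
likelihood = 1 / (1 + np.exp(-(ages - 22)))

posterior = prior * likelihood
posterior /= posterior.sum()                 # normalise

p_adult = posterior[ages >= 18].sum()        # P(older than threshold | finding)
print(f"P(age >= 18 | fused epiphysis) = {p_adult:.3f}")
```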
Abstract:
Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and consequently the quality of such methods can depend critically on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach to evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. Comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased while the same level of accuracy was maintained. To conclude, we have shown that conservation can be successfully used in a statistical model for alignment quality assessment and in the estimation of emission probabilities in profile hidden Markov models.
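For orientation, one common entropy-based way to compute a positional conservation score is sketched below; the thesis develops its own statistical measures, so this stands in only as an illustration of the concept:

```python
# Shannon-entropy-based conservation score for one alignment column:
# 1.0 for a fully conserved column, approaching 0.0 for a random one.
import math
from collections import Counter

def conservation_score(column: str, alphabet_size: int = 20) -> float:
    counts = Counter(c for c in column if c != "-")   # ignore gap characters
    n = sum(counts.values())
    if n == 0:
        return 0.0
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

alignment = ["MKV", "MKI", "MRV"]   # toy alignment, one string per sequence
columns = ["".join(seq[i] for seq in alignment) for i in range(3)]
print([round(conservation_score(c), 2) for c in columns])  # first column: 1.0
```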
Abstract:
The optimal design of a heat exchanger system is based on given model parameters together with given standard ranges for machine design variables. The goals of minimizing the Life Cycle Cost (LCC) function, which represents the price of the saved energy, and of maximizing the momentary heat recovery output with the given constraints satisfied, while taking into account the uncertainty in the models, were successfully met. The Nondominated Sorting Genetic Algorithm II (NSGA-II) for the design optimization of a system is presented and implemented in the Matlab environment. Markov Chain Monte Carlo (MCMC) methods are also used to take into account the uncertainty in the models. Results show that the price of saved energy can be optimized. A wet heat exchanger is found to be more efficient and beneficial than a dry heat exchanger, even though its construction is expensive (160 EUR/m²) compared with that of a dry heat exchanger (50 EUR/m²). It was found that a longer lifetime favors higher CAPEX and lower OPEX, and vice versa; the effect of model uncertainty was identified in a simplified case of minimizing the area of a dry heat exchanger.
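A minimal random-walk Metropolis sketch of how MCMC can quantify parameter uncertainty in such a model; the model, data and prior below are invented placeholders, not those of the thesis:

```python
# Random-walk Metropolis over one heat-recovery model parameter theta:
# Gaussian likelihood around hypothetical measurements, flat positive prior.
import numpy as np

rng = np.random.default_rng(1)
obs_output = np.array([4.8, 5.1, 4.9])           # hypothetical measured outputs

def model(theta):                                 # placeholder model response
    return theta * np.ones_like(obs_output)

def log_post(theta, sigma=0.2):
    if theta <= 0:
        return -np.inf                            # flat prior on theta > 0
    return -0.5 * np.sum((obs_output - model(theta)) ** 2) / sigma**2

theta, samples = 4.0, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.1)             # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                              # accept
    samples.append(theta)

print(np.percentile(samples[1000:], [2.5, 50, 97.5]))  # posterior interval
```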
Abstract:
A statistical mixture-design technique was used to study the effects of different solvents and their mixtures on the yield, total polyphenol content, and antioxidant capacity of crude extracts from the bark of Schinus terebinthifolius Raddi (Anacardiaceae). The experimental results and their response-surface models showed that ternary mixtures with equal portions of all three solvents (water, ethanol and acetone) were better than the binary mixtures at generating crude extracts with the highest yield (22.04 ± 0.48%), total polyphenol content (29.39 ± 0.39%), and antioxidant capacity (6.38 ± 0.21). An analytical method was developed and validated for the determination of total polyphenols in the extracts. Optimal conditions for the various parameters of this analytical method, namely the time for the chromophoric reaction to stabilize, the wavelength of the absorption maximum to be monitored, the reference standard, and the concentration of sodium carbonate, were determined to be 5 min, 780 nm, pyrogallol, and 14.06% w/v, respectively. UV-Vis spectrophotometric monitoring of the reaction under these conditions proved the method to be linear, specific, precise, accurate, reproducible, robust, and easy to perform.
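The kind of response-surface model implied by a three-solvent mixture design can be sketched as a Scheffé-type quadratic fit; the data points below are invented, and only the model form follows the methodology described above:

```python
# Scheffé quadratic mixture model fitted by least squares:
# y = b1*x1 + b2*x2 + b3*x3 + b12*x1*x2 + b13*x1*x3 + b23*x2*x3,
# where x1, x2, x3 are solvent proportions summing to 1.
import numpy as np

X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [.5, .5, 0], [.5, 0, .5], [0, .5, .5], [1/3, 1/3, 1/3]])
y = np.array([12.1, 15.3, 14.2, 18.0, 17.1, 18.8, 22.0])   # hypothetical yields (%)

x1, x2, x3 = X.T
A = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["b1", "b2", "b3", "b12", "b13", "b23"], coef.round(2))))
```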
Abstract:
The aim of this work is to compare two families of mathematical models with respect to their capability to capture the statistical properties of real electricity spot market time series. The first family is ARMA-GARCH models and the second is mean-reverting Ornstein-Uhlenbeck models. Both models were applied to two price series of the Nordic Nord Pool spot market for electricity, namely the System prices and the DenmarkW prices, and the parameters of both models were calibrated from the real time series. After carrying out simulations with the optimal models from both families, we conclude that neither ARMA-GARCH models nor conventional mean-reverting Ornstein-Uhlenbeck models, even when calibrated optimally with real electricity spot market price or return series, capture the statistical characteristics of the real series. In the case of less spiky behavior (the System prices), however, the mean-reverting Ornstein-Uhlenbeck model can be seen to have partially succeeded in this task.
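A sketch of one standard way to calibrate a mean-reverting Ornstein-Uhlenbeck model from a price series, exploiting the fact that its exact discretisation is an AR(1) process (simulated data; the thesis's actual calibration procedure may differ):

```python
# Calibrate dX = kappa*(mu - X)dt + sigma*dW from a regularly sampled series:
# X[t+1] = a + b*X[t] + eps with b = exp(-kappa*dt), a = mu*(1 - b),
# Var(eps) = sigma^2 * (1 - b^2) / (2*kappa).
import numpy as np

rng = np.random.default_rng(2)
dt, n, kappa, mu, sigma = 1 / 365, 2000, 8.0, 30.0, 5.0
x = np.empty(n)
x[0] = mu
for t in range(n - 1):                       # simulate a "spot price" path
    x[t + 1] = x[t] + kappa * (mu - x[t]) * dt + sigma * np.sqrt(dt) * rng.normal()

b, a = np.polyfit(x[:-1], x[1:], 1)          # OLS fit of X[t+1] on X[t]
resid = x[1:] - (a + b * x[:-1])
kappa_hat = -np.log(b) / dt
mu_hat = a / (1 - b)
sigma_hat = resid.std() * np.sqrt(2 * kappa_hat / (1 - b**2))
print(kappa_hat, mu_hat, sigma_hat)          # should be near 8.0, 30.0, 5.0
```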
Abstract:
Asian rust of soybean [Glycine max (L.) Merrill] is one of the most important fungal diseases of this crop worldwide. The recent introduction of Phakopsora pachyrhizi Syd. & P. Syd. in the Americas represents a major threat to soybean production in the main growing regions, and significant losses have already been reported. P. pachyrhizi is extremely aggressive under favorable weather conditions, causing rapid plant defoliation. Epidemiological studies, under both controlled and natural environmental conditions, have been conducted for several decades with the aim of elucidating factors that affect the disease cycle, as a basis for disease modeling. The recent spread of Asian soybean rust to major production regions of the world has prompted new development, testing and application of mathematical models to assess the risk of and predict the disease. These efforts have included the integration of new data, epidemiological knowledge, statistical methods, and advances in computer simulation to develop models and systems with different spatial and temporal scales, objectives and audiences. In this review, we present a comprehensive discussion of the models and systems that have been tested to predict and assess the risk of Asian soybean rust. Limitations, uncertainties and challenges for modelers are also discussed.
Abstract:
Systems biology is a new, emerging and rapidly developing multidisciplinary research field that aims to study biochemical and biological systems from a holistic perspective, with the goal of providing a comprehensive, system-level understanding of cellular behaviour. In this way, it addresses one of the greatest challenges faced by contemporary biology, which is to comprehend the function of complex biological systems. Systems biology combines various methods that originate from scientific disciplines such as molecular biology, chemistry, engineering sciences, mathematics, computer science and systems theory. Systems biology, unlike “traditional” biology, focuses on high-level concepts such as network, component, robustness, efficiency, control, regulation, hierarchical design, synchronization, concurrency, and many others. The very terminology of systems biology is “foreign” to “traditional” biology; it marks a drastic shift in the research paradigm and indicates the close linkage of systems biology to computer science. One of the basic tools utilized in systems biology is the mathematical modelling of life processes, tightly linked to experimental practice. The studies contained in this thesis revolve around a number of challenges commonly encountered in computational modelling in systems biology. The research comprises the development and application of a broad range of methods originating in the fields of computer science and mathematics for the construction and analysis of computational models in systems biology. In particular, the research is set up in the context of two biological phenomena chosen as modelling case studies: 1) the eukaryotic heat shock response and 2) the in vitro self-assembly of intermediate filaments, one of the main constituents of the cytoskeleton. The range of presented approaches spans from heuristic, through numerical and statistical, to analytical methods applied in the effort to formally describe and analyse the two biological processes. We note, however, that although applied to particular case studies, the presented methods are not limited to them and can be utilized in the analysis of other biological mechanisms as well as complex systems in general. The full range of developed and applied modelling techniques, together with the model analysis methodologies, constitutes a rich modelling framework. Moreover, the presentation of the developed methods, their application to the two case studies and the discussion of their potential and limitations point to the difficulties and challenges one encounters in the computational modelling of biological systems. The problems of model identifiability, model comparison, model refinement, model integration and extension, the choice of the proper modelling framework and level of abstraction, and the choice of the proper scope of the model run through this thesis.
Abstract:
The condensation rate has to be high in the safety pressure suppression pool systems of Boiling Water Reactors (BWR) in order for them to fulfill their safety function. The phenomena associated with such a high direct contact condensation (DCC) rate are very challenging to analyse, whether by experiments or by numerical simulations. In this thesis, the suppression pool experiments carried out in the POOLEX facility of Lappeenranta University of Technology were simulated. Two different condensation modes were modelled using the two-phase CFD codes NEPTUNE CFD and TransAT. The DCC models applied were those typically used for separated flows in channels; their applicability to rapidly condensing flow in the condensation pool context had not been tested earlier. A low Reynolds number case was the first to be simulated. The POOLEX experiment STB-31 was operated near the conditions between the 'quasi-steady oscillatory interface condensation' mode and the 'condensation within the blowdown pipe' mode. The condensation models of Lakehal et al. and Coste & Laviéville predicted the condensation rate quite accurately, while the other tested models overestimated it. It was possible to get the direct phase change solution to settle near the measured values, but a very fine calculation grid was needed. Secondly, a high Reynolds number case corresponding to the 'chugging' mode was simulated. The POOLEX experiment STB-28 was chosen because various standard and high-speed video samples of bubbles were recorded during it. In order to extract numerical information from the video material, a pattern recognition procedure was programmed. The bubble size distributions and the frequencies of chugging were calculated with this procedure. With the statistical data on bubble sizes and the temporal data on bubble/jet appearance, it was possible to compare the condensation rates between the experiment and the CFD simulations. In the chugging simulations, a spherically curvilinear calculation grid at the blowdown pipe exit improved convergence and decreased the required cell count. The compressible flow solver with complete steam tables was beneficial for the numerical success of the simulations. The Hughes-Duffey model and, to some extent, the Coste & Laviéville model produced realistic chugging behavior. The initial level of the steam/water interface was an important factor in determining the initiation of chugging. If the interface was initialized with a water level high enough inside the blowdown pipe, the vigorous penetration of a water plug into the pool created a turbulent wake which invoked self-sustaining chugging. A 3D simulation with a suitable DCC model produced qualitatively very realistic shapes of the chugging bubbles and jets. A comparative FFT analysis of the bubble size data and the pool bottom pressure data gave useful information for distinguishing the eigenmodes of chugging, bubbling, and pool structure oscillations.
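The comparative FFT analysis mentioned at the end can be sketched as follows, with synthetic stand-ins for the bubble size and pool bottom pressure signals (the sampling rate and frequencies are assumptions, not values from the experiments):

```python
# Compare amplitude spectra of two signals to pick out shared frequencies:
# a peak common to both points to a chugging eigenmode, while a peak present
# only in the pressure signal suggests a pool-structure oscillation.
import numpy as np

rng = np.random.default_rng(5)
fs = 100.0                                   # sampling rate in Hz (assumed)
t = np.arange(0, 20, 1 / fs)
bubble = np.sin(2 * np.pi * 1.5 * t) + 0.3 * rng.normal(size=t.size)
pressure = 0.8 * np.sin(2 * np.pi * 1.5 * t + 0.7) + np.sin(2 * np.pi * 11 * t)

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
spec_b = np.abs(np.fft.rfft(bubble))
spec_p = np.abs(np.fft.rfft(pressure))

# Skip the DC bin when locating spectral peaks.
print("bubble peak:   %.2f Hz" % freqs[spec_b[1:].argmax() + 1])
print("pressure peak: %.2f Hz" % freqs[spec_p[1:].argmax() + 1])
```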
Abstract:
Cells of epithelial origin, e.g. from breast and prostate cancers, effectively differentiate into complex multicellular structures when cultured in three dimensions (3D) instead of on conventional two-dimensional (2D) adherent surfaces. The spectrum of different organotypic morphologies is highly dependent on the culture environment, which can be either non-adherent or scaffold-based. When embedded in physiological extracellular matrices (ECMs), such as laminin-rich basement membrane extracts, normal epithelial cells differentiate into acinar spheroids reminiscent of glandular ductal structures. Transformed cancer cells, in contrast, typically fail to undergo acinar morphogenic patterns, forming poorly differentiated or invasive multicellular structures. 3D cancer spheroids are widely accepted to better recapitulate various tumorigenic processes and drug responses. So far, however, 3D models have been employed predominantly in academia, and the pharmaceutical industry has yet to adopt them for wide and routine use. This is mainly due to poor characterisation of cell models and the lack of standardised workflows, high-throughput cell culture platforms, and proper readout and quantification tools. In this thesis, a complete workflow has been established, entailing well-characterised 3D cell culture models for prostate cancer, a standardised 3D cell culture routine based on a high-throughput-ready platform, automated image acquisition with concomitant morphometric image analysis, and data visualisation, in order to enable large-scale high-content screens. Our integrated suite of software and statistical analysis tools was optimised and validated using a comprehensive panel of prostate cancer cell lines and 3D models. The tools quantify multiple key cancer-relevant morphological features, ranging from cancer cell invasion through multicellular differentiation to growth, and detect dynamic changes both in morphology and in function, such as cell death and apoptosis, in response to experimental perturbations including RNA interference and small molecule inhibitors. Our panel of cell lines included many non-transformed and most currently available classic prostate cancer cell lines, which were characterised for their morphogenetic properties in 3D laminin-rich ECM. The phenotypes and gene expression profiles were evaluated for their relevance to pre-clinical drug discovery, disease modelling and basic research. In addition, a spontaneous model of invasive transformation was discovered, displaying a high degree of epithelial plasticity. This plasticity is mediated by an abundant bioactive serum lipid, lysophosphatidic acid (LPA), and its receptor LPAR1. The invasive transformation was caused by abrupt cytoskeletal rearrangement through impaired G protein alpha 12/13 and RhoA/ROCK signalling, and was mediated by upregulated adenylyl cyclase/cyclic AMP (cAMP)/protein kinase A and Rac/PAK pathways. The spontaneous invasion model tangibly exemplifies the biological relevance of organotypic cell culture models. Overall, this thesis work underlines the power of novel morphometric screening tools in drug discovery.
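As a small illustration of morphometric feature extraction of the kind this pipeline automates, the sketch below measures the area and roundness of a synthetic spheroid mask with scikit-image (the thesis's actual feature set is far richer):

```python
# Extract simple morphometric features from a labelled binary mask.
import numpy as np
from skimage.measure import label, regionprops

mask = np.zeros((64, 64), dtype=bool)
yy, xx = np.ogrid[:64, :64]
mask[(yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2] = True   # one round "spheroid"

for region in regionprops(label(mask)):
    roundness = 4 * np.pi * region.area / region.perimeter ** 2  # 1.0 = circle
    print(f"area={region.area}, roundness={roundness:.2f}")
```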
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis, including survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey-register data: the Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1–5 was linked at the person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey-register data can be used to analyse and compare the non-response and attrition processes, test the type of missingness mechanism and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data. Neither the Missing At Random (MAR) assumption about the non-response and attrition mechanisms nor the classical assumptions about measurement errors turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to bias estimates from event history models, with low measurement accuracy affecting the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest, weighted by the last-wave weights, displayed the largest bias. Using all the available data, including the spells of attriters up to the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazards model estimators. The study discusses the implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers developing methods to correct for non-sampling biases in event history data.
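A deliberately simplified sketch of the IPCW idea evaluated in the simulation study: estimate each subject's probability of remaining uncensored, weight by its inverse, and plug the weights into a Kaplan-Meier estimator (simulated data, a single covariate and time-fixed weights; real applications use time-varying censoring models multiplied into the survey design weights):

```python
# Inverse Probability of Censoring Weighting with a weighted Kaplan-Meier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)                          # covariate driving censoring
event_time = rng.exponential(10, n)
censor_time = rng.exponential(scale=np.exp(2 + 0.8 * x))
time = np.minimum(event_time, censor_time)
observed = event_time <= censor_time

# Model P(censored | x); weight each observation by 1 / P(uncensored | x).
cens_model = LogisticRegression().fit(x[:, None], ~observed)
p_uncensored = cens_model.predict_proba(x[:, None])[:, 0]
ipcw = 1.0 / np.clip(p_uncensored, 0.05, None)  # truncate extreme weights

# Weighted Kaplan-Meier: replace subject counts by sums of weights.
order = np.argsort(time)
d_sorted, w_sorted = observed[order], ipcw[order]
at_risk = np.cumsum(w_sorted[::-1])[::-1]
surv = np.cumprod(1 - (w_sorted * d_sorted) / at_risk)
print(f"weighted S(t) at median follow-up: {surv[len(surv) // 2]:.3f}")
```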
Abstract:
In this research, the effectiveness of Naive Bayes and Gaussian Mixture Model classifiers in segmenting exudates in retinal images is studied, and the results are evaluated with metrics commonly used in medical imaging. In addition, a color variation analysis of retinal images is carried out to determine how effectively retinal images can be segmented using only the color information of the pixels.
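The Gaussian Mixture Model idea can be sketched on synthetic pixel colors as follows (the two-component setup and the color values are assumptions for illustration only):

```python
# Model pixel colours with a two-component Gaussian mixture (background vs.
# bright lesions) and label each pixel by its most probable component.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
background = rng.normal([120, 60, 40], 10, size=(5000, 3))   # reddish fundus
exudate = rng.normal([230, 220, 150], 10, size=(200, 3))     # bright yellowish
pixels = np.vstack([background, exudate])

gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
labels = gmm.predict(pixels)
# The minority component is taken as the exudate class in this toy example.
exudate_label = np.argmin(np.bincount(labels))
print(f"pixels flagged as exudate: {(labels == exudate_label).sum()}")
```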
Abstract:
The aim of this work was to make tofu from the soybean cultivar BRS 267 under different processing conditions in order to evaluate the influence of each treatment on product quality. A fractional factorial 2^(5-1) design was used, in which the independent variables (thermal treatment, coagulant concentration, coagulation time, curd cutting, and draining time) were tested at two different levels. The response variables studied were the hardness, yield, total solids, and protein content of the tofu. Polynomial models were generated for each response. To obtain tofu with desirable characteristics (hardness ~4 N, yield 306 g tofu/100 g soybeans, 12 g protein/100 g tofu and 22 g solids/100 g tofu), the following processing conditions were selected: heating until boiling plus 10 minutes in a water bath, 2% w/w CaSO4 dihydrate, 10 minutes of coagulation, curd cutting, and 30 minutes of draining.
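For concreteness, a 2^(5-1) design of the kind used here can be constructed from a full 2^4 design plus a defining relation; the generator E = ABCD below is a standard resolution V choice and an assumption, since the paper does not state its generator:

```python
# Build a 16-run, 5-factor fractional factorial design: the levels of the
# fifth factor are generated from the product of the first four (E = ABCD).
import itertools
import numpy as np

base = np.array(list(itertools.product([-1, 1], repeat=4)))  # A, B, C, D
E = base.prod(axis=1, keepdims=True)                          # generator E = ABCD
design = np.hstack([base, E])                                 # 16 runs x 5 factors

factors = ["thermal treatment", "coagulant conc.", "coagulation time",
           "curd cutting", "draining time"]
print(len(design), "runs; first run:", dict(zip(factors, design[0])))
```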
Abstract:
New density functionals representing the exchange and correlation energies (per electron), based on the electron gas model, are employed to calculate interaction potentials of the noble gas systems X2 and XY, where X (and Y) are He, Ne, Ar and Kr, and of the hydrogen atom-rare gas systems H-X. The exchange energy density functional is that recommended by Handler, and the correlation energy density functional is a rational function involving two parameters which were optimized to reproduce the correlation energy of the He atom. Application of the two-parameter function to the other rare gas atoms shows that it is "universal", i.e., accurate for the systems considered. The potentials obtained in this work compare well with recent experimental results and are a significant improvement over those from competing statistical models.
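The abstract does not give the functional form; as a representative example of this class, the classic Wigner correlation functional is a two-parameter rational function of the density (shown for orientation only, not necessarily the form used in the paper):

```latex
% A classic two-parameter rational form for the correlation energy per
% electron (illustrative; a and b would here be fitted to the He atom's
% correlation energy, and r_s is the Wigner-Seitz radius):
\varepsilon_c(r_s) = -\frac{a}{r_s + b},
\qquad r_s = \left(\frac{3}{4\pi\rho}\right)^{1/3}
```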
Abstract:
A feature-based fitness function is applied in a genetic programming system to synthesize stochastic gene regulatory network models whose behaviour is defined by a time course of protein expression levels. Typically, when targeting time series data, the fitness function is based on a sum of errors over the values of the fluctuating signal. While this approach is successful in many instances, its performance can deteriorate in the presence of noise. This thesis explores a fitness measure determined from a set of statistical features characterizing the time series' sequence of values, rather than the actual values themselves. Through a series of experiments involving symbolic regression with added noise and gene regulatory network models based on the stochastic π-calculus, it is shown to successfully target oscillating and non-oscillating signals. This practical and versatile fitness function offers an alternative approach, worthy of consideration for use in algorithms that evaluate noisy or stochastic behaviour.
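A minimal sketch of such a feature-based fitness, comparing candidate and target series through summary statistics rather than pointwise errors (the feature set below is illustrative, not the thesis's exact choice):

```python
# Fitness = distance between feature vectors (mean, variance, lag-1
# autocorrelation, dominant frequency bin); smaller is better, and the
# measure is robust to pointwise noise in the signal.
import numpy as np

def features(x):
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = (xc**2).sum()
    acf1 = (xc[:-1] * xc[1:]).sum() / denom if denom > 0 else 0.0
    dom_freq = np.abs(np.fft.rfft(xc))[1:].argmax() + 1   # dominant frequency bin
    return np.array([x.mean(), x.var(), acf1, dom_freq])

def fitness(candidate, target):
    return float(np.linalg.norm(features(candidate) - features(target)))

t = np.linspace(0, 10, 200)
target = np.sin(2 * np.pi * t)
noisy_match = target + 0.2 * np.random.randn(t.size)
print(fitness(noisy_match, target))       # small: same oscillatory character
print(fitness(np.ones_like(t), target))   # large: non-oscillating candidate
```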