927 resultados para LINEAR-REGRESSION MODELS
Resumo:
Time domain laser reflectance spectroscopy (TDRS) was applied for the first time to evaluate internal fruit quality. This technique, known in medicine-related knowledge areas, has not been used before in agricultural or food research. It allows the simultaneous non-destructive measuring of two optical characteristics of the tissues: light scattering and absorption. Models to measure firmness, sugar & acid contents in kiwifruit, tomato, apple, peach, nectarine and other fruits were built using sequential statistical techniques: principal component analysis, multiple stepwise linear regression, clustering and discriminant analysis. Consistent correlations were established between the two parameters measured with TDRS, i.e. absorption & transport scattering coefficients, with chemical constituents (sugars and acids) and firmness, respectively. Classification models were built to sort fruits into three quality grades, according to their firmness, soluble solids and acidity.
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
Specific cutting energy (SE) has been widely used to assess the rock cuttability for mechanical excavation purposes. Some prediction models were developed for SE through correlating rock properties with SE values. However, some of the textural and compositional rock parameters i.e. texture coefficient and feldspar, mafic, and felsic mineral contents were not considered. The present study is to investigate the effects of previously ignored rock parameters along with engineering rock properties on SE. Mineralogical and petrographic analyses, rock mechanics, and linear rock cutting tests were performed on sandstone samples taken from sites around Ankara, Turkey. Relationships between SE and rock properties were evaluated using bivariate correlation and linear regression analyses. The tests and subsequent analyses revealed that the texture coefficient and feldspar content of sandstones affected rock cuttability, evidenced by significant correlations between these parameters and SE at a 90% confidence level. Felsic and mafic mineral contents of sandstones did not exhibit any statistically significant correlation against SE. Cementation coefficient, effective porosity, and pore volume had good correlations against SE. Poisson's ratio, Brazilian tensile strength, Shore scleroscope hardness, Schmidt hammer hardness, dry density, and point load strength index showed very strong linear correlations against SE at confidence levels of 95% and above, all of which were also found suitable to be used in predicting SE individually, depending on the results of regression analysis, ANOVA, Student's t-tests, and R2 values. Poisson's ratio exhibited the highest correlation with SE and seemed to be the most reliable SE prediction tool in sandstones.
Resumo:
Specific cutting energy (SE) has been widely used to assess the rock cuttability for mechanical excavation purposes. Some prediction models were developed for SE through correlating rock properties with SE values. However, some of the textural and compositional rock parameters i.e. texture coefficient and feldspar, mafic, and felsic mineral contents were not considered. The present study is to investigate the effects of previously ignored rock parameters along with engineering rock properties on SE. Mineralogical and petrographic analyses, rock mechanics, and linear rock cutting tests were performed on sandstone samples taken from sites around Ankara, Turkey. Relationships between SE and rock properties were evaluated using bivariate correlation and linear regression analyses. The tests and subsequent analyses revealed that the texture coefficient and feldspar content of sandstones affected rock cuttability, evidenced by significant correlations between these parameters and SE at a 90% confidence level. Felsic and mafic mineral contents of sandstones did not exhibit any statistically significant correlation against SE. Cementation coefficient, effective porosity, and pore volume had good correlations against SE. Poisson's ratio, Brazilian tensile strength, Shore scleroscope hardness, Schmidt hammer hardness, dry density, and point load strength index showed very strong linear correlations against SE at confidence levels of 95% and above, all of which were also found suitable to be used in predicting SE individually, depending on the results of regression analysis, ANOVA, Student's t-tests, and R-2 values. Poisson's ratio exhibited the highest correlation with SE and seemed to be the most reliable SE prediction tool in sandstones.
Resumo:
This paper presents a forecasting technique for forward energy prices, one day ahead. This technique combines a wavelet transform and forecasting models such as multi- layer perceptron, linear regression or GARCH. These techniques are applied to real data from the UK gas markets to evaluate their performance. The results show that the forecasting accuracy is improved significantly by using the wavelet transform. The methodology can be also applied to forecasting market clearing prices and electricity/gas loads.
Resumo:
This thesis is a study of three techniques to improve performance of some standard fore-casting models, application to the energy demand and prices. We focus on forecasting demand and price one-day ahead. First, the wavelet transform was used as a pre-processing procedure with two approaches: multicomponent-forecasts and direct-forecasts. We have empirically compared these approaches and found that the former consistently outperformed the latter. Second, adaptive models were introduced to continuously update model parameters in the testing period by combining ?lters with standard forecasting methods. Among these adaptive models, the adaptive LR-GARCH model was proposed for the fi?rst time in the thesis. Third, with regard to noise distributions of the dependent variables in the forecasting models, we used either Gaussian or Student-t distributions. This thesis proposed a novel algorithm to infer parameters of Student-t noise models. The method is an extension of earlier work for models that are linear in parameters to the non-linear multilayer perceptron. Therefore, the proposed method broadens the range of models that can use a Student-t noise distribution. Because these techniques cannot stand alone, they must be combined with prediction models to improve their performance. We combined these techniques with some standard forecasting models: multilayer perceptron, radial basis functions, linear regression, and linear regression with GARCH. These techniques and forecasting models were applied to two datasets from the UK energy markets: daily electricity demand (which is stationary) and gas forward prices (non-stationary). The results showed that these techniques provided good improvement to prediction performance.
Resumo:
This paper presents some forecasting techniques for energy demand and price prediction, one day ahead. These techniques combine wavelet transform (WT) with fixed and adaptive machine learning/time series models (multi-layer perceptron (MLP), radial basis functions, linear regression, or GARCH). To create an adaptive model, we use an extended Kalman filter or particle filter to update the parameters continuously on the test set. The adaptive GARCH model is a new contribution, broadening the applicability of GARCH methods. We empirically compared two approaches of combining the WT with prediction models: multicomponent forecasts and direct forecasts. These techniques are applied to large sets of real data (both stationary and non-stationary) from the UK energy markets, so as to provide comparative results that are statistically stronger than those previously reported. The results showed that the forecasting accuracy is significantly improved by using the WT and adaptive models. The best models on the electricity demand/gas price forecast are the adaptive MLP/GARCH with the multicomponent forecast; their MSEs are 0.02314 and 0.15384 respectively.
Resumo:
2010 Mathematics Subject Classification: 68T50,62H30,62J05.
Resumo:
This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
Resumo:
The nation's freeway systems are becoming increasingly congested. A major contribution to traffic congestion on freeways is due to traffic incidents. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. ^ This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. The multiple linear regression analysis and decision tree based method were applied to develop the offline models, and the rule-based method and a tree algorithm called M5P were used to develop the online models. ^ The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications. ^
Resumo:
Hydrophobicity as measured by Log P is an important molecular property related to toxicity and carcinogenicity. With increasing public health concerns for the effects of Disinfection By-Products (DBPs), there are considerable benefits in developing Quantitative Structure and Activity Relationship (QSAR) models capable of accurately predicting Log P. In this research, Log P values of 173 DBP compounds in 6 functional classes were used to develop QSAR models, by applying 3 molecular descriptors, namely, Energy of the Lowest Unoccupied Molecular Orbital (ELUMO), Number of Chlorine (NCl) and Number of Carbon (NC) by Multiple Linear Regression (MLR) analysis. The QSAR models developed were validated based on the Organization for Economic Co-operation and Development (OECD) principles. The model Applicability Domain (AD) and mechanistic interpretation were explored. Considering the very complex nature of DBPs, the established QSAR models performed very well with respect to goodness-of-fit, robustness and predictability. The predicted values of Log P of DBPs by the QSAR models were found to be significant with a correlation coefficient R2 from 81% to 98%. The Leverage Approach by Williams Plot was applied to detect and remove outliers, consequently increasing R 2 by approximately 2% to 13% for different DBP classes. The developed QSAR models were statistically validated for their predictive power by the Leave-One-Out (LOO) and Leave-Many-Out (LMO) cross validation methods. Finally, Monte Carlo simulation was used to assess the variations and inherent uncertainties in the QSAR models of Log P and determine the most influential parameters in connection with Log P prediction. The developed QSAR models in this dissertation will have a broad applicability domain because the research data set covered six out of eight common DBP classes, including halogenated alkane, halogenated alkene, halogenated aromatic, halogenated aldehyde, halogenated ketone, and halogenated carboxylic acid, which have been brought to the attention of regulatory agencies in recent years. Furthermore, the QSAR models are suitable to be used for prediction of similar DBP compounds within the same applicability domain. The selection and integration of various methodologies developed in this research may also benefit future research in similar fields.
Resumo:
Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: (1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as energy of the lowest unoccupied molecular orbital (E LUMO) via QSAR modelling and analysis; (2) to validate the models by using internal and external cross-validation techniques; (3) to quantify the model uncertainties through Taylor and Monte Carlo Simulation. One of the very important ways to predict molecular properties such as ELUMO is using QSAR analysis. In this study, number of chlorine (NCl ) and number of carbon (NC) as well as energy of the highest occupied molecular orbital (EHOMO) are used as molecular descriptors. There are typically three approaches used in QSAR model development: (1) Linear or Multi-linear Regression (MLR); (2) Partial Least Squares (PLS); and (3) Principle Component Regression (PCR). In QSAR analysis, a very critical step is model validation after QSAR models are established and before applying them to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P.pyosphoreum) based on experimental toxicity data from the literature. The results show that: (1) QSAR models to predict molecular property built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; (2) The Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models, however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; (3) E LUMO are shown to correlate highly with the NCl for several classes of DBPs; and (4) According to uncertainty analysis using Taylor method, the uncertainty of QSAR models is contributed mostly from NCl for all DBP classes.
Resumo:
Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: 1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; 2) to validate the models by using internal and external cross-validation techniques; 3) to quantify the model uncertainties through Taylor and Monte Carlo Simulation. One of the very important ways to predict molecular properties such as ELUMO is using QSAR analysis. In this study, number of chlorine (NCl) and number of carbon (NC) as well as energy of the highest occupied molecular orbital (EHOMO) are used as molecular descriptors. There are typically three approaches used in QSAR model development: 1) Linear or Multi-linear Regression (MLR); 2) Partial Least Squares (PLS); and 3) Principle Component Regression (PCR). In QSAR analysis, a very critical step is model validation after QSAR models are established and before applying them to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P.pyosphoreum) based on experimental toxicity data from the literature. The results show that: 1) QSAR models to predict molecular property built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; 2) The Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models, however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; 3) ELUMO are shown to correlate highly with the NCl for several classes of DBPs; and 4) According to uncertainty analysis using Taylor method, the uncertainty of QSAR models is contributed mostly from NCl for all DBP classes.
Resumo:
This work presents a computational, called MOMENTS, code developed to be used in process control to determine a characteristic transfer function to industrial units when radiotracer techniques were been applied to study the unit´s performance. The methodology is based on the measuring the residence time distribution function (RTD) and calculate the first and second temporal moments of the tracer data obtained by two scintillators detectors NaI positioned to register a complete tracer movement inside the unit. Non linear regression technique has been used to fit various mathematical models and a statistical test was used to select the best result to the transfer function. Using the code MOMENTS, twelve different models can be used to fit a curve and calculate technical parameters to the unit.