862 resultados para stochastic regression, consistency
Resumo:
In our study we use a kernel based classification technique, Support Vector Machine Regression for predicting the Melting Point of Drug – like compounds in terms of Topological Descriptors, Topological Charge Indices, Connectivity Indices and 2D Auto Correlations. The Machine Learning model was designed, trained and tested using a dataset of 100 compounds and it was found that an SVMReg model with RBF Kernel could predict the Melting Point with a mean absolute error 15.5854 and Root Mean Squared Error 19.7576
Resumo:
Lower partial moments plays an important role in the analysis of risks and in income/poverty studies. In the present paper, we further investigate its importance in stochastic modeling and prove some characterization theorems arising out of it. We also identify its relationships with other important applied models such as weighted and equilibrium models. Finally, some applications of lower partial moments in poverty studies are also examined
Resumo:
The classical methods of analysing time series by Box-Jenkins approach assume that the observed series uctuates around changing levels with constant variance. That is, the time series is assumed to be of homoscedastic nature. However, the nancial time series exhibits the presence of heteroscedasticity in the sense that, it possesses non-constant conditional variance given the past observations. So, the analysis of nancial time series, requires the modelling of such variances, which may depend on some time dependent factors or its own past values. This lead to introduction of several classes of models to study the behaviour of nancial time series. See Taylor (1986), Tsay (2005), Rachev et al. (2007). The class of models, used to describe the evolution of conditional variances is referred to as stochastic volatility modelsThe stochastic models available to analyse the conditional variances, are based on either normal or log-normal distributions. One of the objectives of the present study is to explore the possibility of employing some non-Gaussian distributions to model the volatility sequences and then study the behaviour of the resulting return series. This lead us to work on the related problem of statistical inference, which is the main contribution of the thesis
Resumo:
Es ist bekannt, dass die Dichte eines gelösten Stoffes die Richtung und die Stärke seiner Bewegung im Untergrund entscheidend bestimmen kann. Eine Vielzahl von Untersuchungen hat gezeigt, dass die Verteilung der Durchlässigkeiten eines porösen Mediums diese Dichteffekte verstärken oder abmindern kann. Wie sich dieser gekoppelte Effekt auf die Vermischung zweier Fluide auswirkt, wurde in dieser Arbeit untersucht und dabei das experimentelle sowohl mit dem numerischen als auch mit dem analytischen Modell gekoppelt. Die auf der Störungstheorie basierende stochastische Theorie der macrodispersion wurde in dieser Arbeit für den Fall der transversalen Makodispersion. Für den Fall einer stabilen Schichtung wurde in einem Modelltank (10m x 1.2m x 0.1m) der Universität Kassel eine Serie sorgfältig kontrollierter zweidimensionaler Experimente an einem stochastisch heterogenen Modellaquifer durchgeführt. Es wurden Versuchsreihen mit variierenden Konzentrationsdifferenzen (250 ppm bis 100 000 ppm) und Strömungsgeschwindigkeiten (u = 1 m/ d bis 8 m/d) an drei verschieden anisotrop gepackten porösen Medien mit variierender Varianzen und Korrelationen der lognormal verteilten Permeabilitäten durchgeführt. Die stationäre räumliche Konzentrationsausbreitung der sich ausbreitenden Salzwasserfahne wurde anhand der Leitfähigkeit gemessen und aus der Höhendifferenz des 84- und 16-prozentigen relativen Konzentrationsdurchgang die Dispersion berechnet. Parallel dazu wurde ein numerisches Modell mit dem dichteabhängigen Finite-Elemente-Strömungs- und Transport-Programm SUTRA aufgestellt. Mit dem kalibrierten numerischen Modell wurden Prognosen für mögliche Transportszenarien, Sensitivitätsanalysen und stochastische Simulationen nach der Monte-Carlo-Methode durchgeführt. Die Einstellung der Strömungsgeschwindigkeit erfolgte - sowohl im experimentellen als auch im numerischen Modell - über konstante Druckränder an den Ein- und Auslauftanks. Dabei zeigte sich eine starke Sensitivität der räumlichen Konzentrationsausbreitung hinsichtlich lokaler Druckvariationen. Die Untersuchungen ergaben, dass sich die Konzentrationsfahne mit steigendem Abstand von der Einströmkante wellenförmig einem effektiven Wert annähert, aus dem die Makrodispersivität ermittelt werden kann. Dabei zeigten sich sichtbare nichtergodische Effekte, d.h. starke Abweichungen in den zweiten räumlichen Momenten der Konzentrationsverteilung der deterministischen Experimente von den Erwartungswerten aus der stochastischen Theorie. Die transversale Makrodispersivität stieg proportional zur Varianz und Korrelation der lognormalen Permeabilitätsverteilung und umgekehrt proportional zur Strömungsgeschwindigkeit und Dichtedifferenz zweier Fluide. Aus dem von Welty et al. [2003] mittels Störungstheorie entwickelten dichteabhängigen Makrodispersionstensor konnte in dieser Arbeit die stochastische Formel für die transversale Makrodispersion weiter entwickelt und - sowohl experimentell als auch numerisch - verifiziert werden.
Resumo:
Cubicle should provide good resting comfort as well as clean udders. Dairy cows in cubicle houses often face a restrictive environment with regard to resting behaviour, whereas cleanliness may still be impaired. This study aimed to determine reliable behavioural measures regarding resting comfort applicable in on-farm welfare assessments. Furthermore, relationships between cubicle design, cow sizes, management factors and udder cleanliness (namely teats and teat tips) were investigated. Altogether 15 resting measures were examined in terms of feasibility, inter-observer reliability (IOR) and consistency of results per farm over time. They were recorded during three farm visits on farms in Germany and Austria with cubicle, deep litter and tie stall systems. Seven measures occurred to infrequently to allow reliable recording within a limited observation time. IOR was generally acceptable to excellent except for 'collisions during lying down', which only showed good IOR after improvement of the definition. Only three measures were acceptably repeatable over time: 'duration of lying down', 'percentage of collisions during lying down' and 'percentage of cows lying partly or completely outside lying area'. These measures were evaluated as suitable animal based welfare measures regarding resting behaviour in the framework of an on-farm welfare assessment protocol. The second part of the thesis comprises a cross-sectional study on resting comfort and cow cleanliness including 23 Holstein Friesian dairy herds with very low within-farm variation in cubicle measures. Height at withers, shoulder width and diagonal body length were measured in 79-100 % of the cows (herd size 30 to115 cows). Based on the 25 % largest animals, compliance with recommendations for cubicle measures was calculated. Cleanliness of different body parts, the udder, teats and teat tips was assessed for each cow in the herd prior to morning milking. No significant correlation was found between udder soiling and teat or teat tip soiling on herd level. The final model of a stepwise regression regarding the percentage of dirty teats per farm explained 58.5 % the variance and contained four factors. Teat dipping after milking which might be associated with an overall clean and accurate management style, deep bedded cubicles, increasing cubicle maintenance times and decreasing compliance concerning total cubicle length predicted lower teat soiling. The final model concerning teat tip soiling explained 46.0 % of the variance and contained three factors. Increasing litter height in the rear part of the cubicle and increased alley soiling which is difficult to explain, predicted for less soiled teat tips, whereas increasing compliance concerning resting length was associated with higher percentages of dirty teat tips. The dependent variable ‘duration of lying down’ was analysed using again stepwise regression. The final model explained 54.8 % of the total variance. Lying down duration was significantly shorter in deep bedded cubicles. Further explanatory though not significant factors in the model were neck-rail height, deep bedding or comfort mattresses versus concrete floor or rubber mats and clearance height of side partitions. In the attempt to create a more comprehensive lying down measure, another analysis was carried out with percentage of ‘impaired lying down’ (i.e. events exceeding 6.3 seconds, with collisions or being interrupted) as dependent variable. The explanatory value of this final model was 41.3 %. An increase in partition length, in compliance concerning cubicle width and the presence of straw within bedding predicted a lower proportion of impaired lying down. The effect of partition length is difficult to interpret, but partition length and height were positively correlated on the study farms, possibly leading to a bigger zone of clear space for pelvis freedom. No associations could be found between impaired lying down and teat or teat tip soiling. Altogether, in agreement with earlier studies it was found that cubicle dimensions in practice are often inadequate with regard to the body dimensions of the cows, leading to high proportions of impaired lying down behaviour, whereas teat cleanliness is still unsatisfactory. Connections between cleanliness and cow comfort are far from simplistic. Especially the relationship between cubicle characteristics and lying down behaviour apparently is very complex, so that it is difficult to identify single influential factors that are valid for all farm situations. However, based on the results of the present study the use of deep bedded cubicles can be recommended as well as improved management with special regard to cubicle and litter maintenance in order to achieve both better resting comfort and teat cleanliness.
Resumo:
The research of this thesis dissertation covers developments and applications of short-and long-term climate predictions. The short-term prediction emphasizes monthly and seasonal climate, i.e. forecasting from up to the next month over a season to up to a year or so. The long-term predictions pertain to the analysis of inter-annual- and decadal climate variations over the whole 21st century. These two climate prediction methods are validated and applied in the study area, namely, Khlong Yai (KY) water basin located in the eastern seaboard of Thailand which is a major industrial zone of the country and which has been suffering from severe drought and water shortage in recent years. Since water resources are essential for the further industrial development in this region, a thorough analysis of the potential climate change with its subsequent impact on the water supply in the area is at the heart of this thesis research. The short-term forecast of the next-season climate, such as temperatures and rainfall, offers a potential general guideline for water management and reservoir operation. To that avail, statistical models based on autoregressive techniques, i.e., AR-, ARIMA- and ARIMAex-, which includes additional external regressors, and multiple linear regression- (MLR) models, are developed and applied in the study region. Teleconnections between ocean states and the local climate are investigated and used as extra external predictors in the ARIMAex- and the MLR-model and shown to enhance the accuracy of the short-term predictions significantly. However, as the ocean state – local climate teleconnective relationships provide only a one- to four-month ahead lead time, the ocean state indices can support only a one-season-ahead forecast. Hence, GCM- climate predictors are also suggested as an additional predictor-set for a more reliable and somewhat longer short-term forecast. For the preparation of “pre-warning” information for up-coming possible future climate change with potential adverse hydrological impacts in the study region, the long-term climate prediction methodology is applied. The latter is based on the downscaling of climate predictions from several single- and multi-domain GCMs, using the two well-known downscaling methods SDSM and LARS-WG and a newly developed MLR-downscaling technique that allows the incorporation of a multitude of monthly or daily climate predictors from one- or several (multi-domain) parent GCMs. The numerous downscaling experiments indicate that the MLR- method is more accurate than SDSM and LARS-WG in predicting the recent past 20th-century (1971-2000) long-term monthly climate in the region. The MLR-model is, consequently, then employed to downscale 21st-century GCM- climate predictions under SRES-scenarios A1B, A2 and B1. However, since the hydrological watershed model requires daily-scale climate input data, a new stochastic daily climate generator is developed to rescale monthly observed or predicted climate series to daily series, while adhering to the statistical and geospatial distributional attributes of observed (past) daily climate series in the calibration phase. Employing this daily climate generator, 30 realizations of future daily climate series from downscaled monthly GCM-climate predictor sets are produced and used as input in the SWAT- distributed watershed model, to simulate future streamflow and other hydrological water budget components in the study region in a multi-realization manner. In addition to a general examination of the future changes of the hydrological regime in the KY-basin, potential future changes of the water budgets of three main reservoirs in the basin are analysed, as these are a major source of water supply in the study region. The results of the long-term 21st-century downscaled climate predictions provide evidence that, compared with the past 20th-reference period, the future climate in the study area will be more extreme, particularly, for SRES A1B. Thus, the temperatures will be higher and exhibit larger fluctuations. Although the future intensity of the rainfall is nearly constant, its spatial distribution across the region is partially changing. There is further evidence that the sequential rainfall occurrence will be decreased, so that short periods of high intensities will be followed by longer dry spells. This change in the sequential rainfall pattern will also lead to seasonal reductions of the streamflow and seasonal changes (decreases) of the water storage in the reservoirs. In any case, these predicted future climate changes with their hydrological impacts should encourage water planner and policy makers to develop adaptation strategies to properly handle the future water supply in this area, following the guidelines suggested in this study.
Resumo:
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
Resumo:
We study the relation between support vector machines (SVMs) for regression (SVMR) and SVM for classification (SVMC). We show that for a given SVMC solution there exists a SVMR solution which is equivalent for a certain choice of the parameters. In particular our result is that for $epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter C_c are equal to (1-epsilon)^{- 1} times the optimal hyperplane and threshold for SVMR with regularization parameter C_r = (1-epsilon)C_c. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.
Resumo:
Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik"s $epsilon$- insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
Resumo:
This paper presents a computation of the $V_gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the RV_gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_epsilon$ or general $L_p$ loss functions. This paper presenta a novel proof of this result also for the case that a bias is added to the functions in the RKHS.
Resumo:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression methods, specially allowing for serial correlation, make them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range Generalised Linear Models as well as Generalised Additive Models. In addition, recently critical points in using statistical software for GAM were stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants
Resumo:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variable are components we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram facilitating the interpretation of the analysis in terms of components. An exemple with time-budgets illustrates the method and the graphical features
Resumo:
In CoDaWork’05, we presented an application of discriminant function analysis (DFA) to 4 different compositional datasets and modelled the first canonical variable using a segmented regression model solely based on an observation about the scatter plots. In this paper, multiple linear regressions are applied to different datasets to confirm the validity of our proposed model. In addition to dating the unknown tephras by calibration as discussed previously, another method of mapping the unknown tephras into samples of the reference set or missing samples in between consecutive reference samples is proposed. The application of these methodologies is demonstrated with both simulated and real datasets. This new proposed methodology provides an alternative, more acceptable approach for geologists as their focus is on mapping the unknown tephra with relevant eruptive events rather than estimating the age of unknown tephra. Kew words: Tephrochronology; Segmented regression
Resumo:
Based on Rijt-Plooij and Plooij’s (1992) research on emergence of regression periods in the first two years of life, the presence of such periods in a group of 18 babies (10 boys and 8 girls, aged between 3 weeks and 14 months) from a Catalonian population was analyzed. The measurements were a questionnaire filled in by the infants’ mothers, a semi-structured weekly tape-recorded interview, and observations in their homes. The procedure and the instruments used in the project follow those proposed by Rijt-Plooij and Plooij. Our results confirm the existence of the regression periods in the first year of children’s life. Inter-coder agreement for trained coders was 78.2% and within-coder agreement was 90.1 %. In the discussion, the possible meaning and relevance of regression periods in order to understand development from a psychobiological and social framework is commented upon
Resumo:
Often practical performance of analytical redundancy for fault detection and diagnosis is decreased by uncertainties prevailing not only in the system model, but also in the measurements. In this paper, the problem of fault detection is stated as a constraint satisfaction problem over continuous domains with a big number of variables and constraints. This problem can be solved using modal interval analysis and consistency techniques. Consistency techniques are then shown to be particularly efficient to check the consistency of the analytical redundancy relations (ARRs), dealing with uncertain measurements and parameters. Through the work presented in this paper, it can be observed that consistency techniques can be used to increase the performance of a robust fault detection tool, which is based on interval arithmetic. The proposed method is illustrated using a nonlinear dynamic model of a hydraulic system