925 results for MIXED LINEAR-MODELS


Relevance: 80.00%

Abstract:

This thesis addresses data assimilation, which typically refers to the estimation of the state of a physical system given a model and observations, and its application to short-term precipitation forecasting. A general introduction to data assimilation is given, both from a deterministic and a stochastic point of view. Data assimilation algorithms are reviewed, first in the static case (when no dynamics are involved), then in the dynamic case. A double experiment on two non-linear models, the Lorenz 63 and the Lorenz 96 models, is run, and the comparative performance of the methods is discussed in terms of quality of the assimilation, robustness in the non-linear regime, and computational time. Following the general review and analysis, data assimilation is discussed in the particular context of very short-term rainfall forecasting (nowcasting) using radar images. An extended Bayesian precipitation nowcasting model is introduced. The model is stochastic in nature and relies on the spatial decomposition of the rainfall field into rain "cells". Radar observations are assimilated using a variational Bayesian method in which the true posterior distribution of the parameters is approximated by a more tractable distribution. The motion of the cells is captured by a 2D Gaussian process. The model is tested on two precipitation events, the first dominated by convective showers, the second by precipitation fronts. Several deterministic and probabilistic validation methods are applied, and the model is shown to retain reasonable prediction skill at lead times of up to 3 hours. Extensions to the model are discussed.
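For readers unfamiliar with the Lorenz 63 testbed mentioned above, the sketch below integrates the classical system and generates noisy observations of the kind such twin assimilation experiments use. It is purely illustrative (standard chaotic parameter values, arbitrary observation interval and noise level), not the thesis code.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classical Lorenz 63 system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Generate a "truth" trajectory and noisy observations to assimilate.
dt, n_steps = 0.01, 1000
truth = np.empty((n_steps, 3))
truth[0] = np.array([1.0, 1.0, 1.0])
for k in range(1, n_steps):
    truth[k] = rk4_step(lorenz63, truth[k - 1], dt)
obs = truth[::20] + np.random.default_rng(0).normal(scale=1.0, size=truth[::20].shape)
```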

Relevance: 80.00%

Abstract:

In this paper, we discuss some practical implications for implementing adaptable network algorithms applied to non-stationary time series problems. Two real world data sets, containing electricity load demands and foreign exchange market prices, are used to test several different methods, ranging from linear models with fixed parameters, to non-linear models which adapt both parameters and model order on-line. Training with the extended Kalman filter, we demonstrate that the dynamic model-order increment procedure of the resource allocating RBF network (RAN) is highly sensitive to the parameters of the novelty criterion. We investigate the use of system noise for increasing the plasticity of the Kalman filter training algorithm, and discuss the consequences for on-line model order selection. The results of our experiments show that there are advantages to be gained in tracking real world non-stationary data through the use of more complex adaptive models.
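As context for the model-order discussion above, the resource allocating network (RAN) grows a new hidden unit only when an input is both novel (far from every existing centre) and poorly predicted. The sketch below shows that novelty test in isolation; the thresholds eps and e_min, and the toy data, are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def ran_should_grow(x, centres, error, eps=0.5, e_min=0.1):
    """RAN-style novelty test: grow the network only if the input is
    far from every existing centre AND the prediction error is large."""
    if len(centres) == 0:
        return True
    nearest = min(np.linalg.norm(x - c) for c in centres)
    return nearest > eps and abs(error) > e_min

# Toy usage: a stream of 1-D inputs, trivially "predicted" as 0.0.
centres, targets = [], []
for x, y in [(0.0, 0.1), (0.05, 0.12), (2.0, 1.5)]:
    x = np.atleast_1d(x)
    error = y - 0.0  # placeholder prediction from the current network
    if ran_should_grow(x, centres, error):
        centres.append(x)   # new unit centred on the novel input
        targets.append(y)
```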

Relevance: 80.00%

Abstract:

This paper presents an effective decision-making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system are obtained from an experimental, fully operational pipeline distribution system. The system is equipped with data logging for three variables, namely inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed so that multiple operational conditions of the distribution system, including multiple pressures and flows, can be obtained. We then show statistically that the pressure and flow variables can be used as signatures of a leak under the designed multi-operational conditions. It is then shown that detecting leakages by training and testing the proposed multi-model decision system with prior data clustering, under multi-operational conditions, produces better recognition rates than training based on a single-model approach. The decision system is then equipped with estimates of confidence limits, and a method is proposed for using these confidence limits to obtain more robust leakage recognition results.
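The general "cluster the operating conditions first, then fit one model per cluster" idea described above can be sketched as follows. The simulated data, the choice of k-means, and the use of logistic regression as the per-cluster generalized linear model are illustrative assumptions, not the authors' experimental pipeline or model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical features: inlet pressure, outlet pressure, outlet flow.
X = rng.normal(size=(600, 3))
y = (rng.random(600) < 0.3).astype(int)  # 1 = leak, 0 = no leak (toy labels)

# Step 1: cluster the operating conditions.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = clusterer.labels_

# Step 2: fit one classifier per cluster (the "multi-model" part).
models = {}
for k in np.unique(labels):
    idx = labels == k
    models[k] = LogisticRegression().fit(X[idx], y[idx])

# Step 3: route a new sample to its cluster's model.
x_new = rng.normal(size=(1, 3))
k_new = clusterer.predict(x_new)[0]
leak_prob = models[k_new].predict_proba(x_new)[0, 1]
```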

Relevance: 80.00%

Abstract:

Fermentation processes, as objects of modelling and high-quality control, are characterized by interdependence and time variation of the process variables, which lead to non-linear models with a very complex structure. This is why conventional optimization methods cannot deliver a satisfactory solution. As an alternative, genetic algorithms, as a stochastic global optimization method, can be applied to overcome these limitations. Genetic algorithms offer robustness and a good chance of reaching a global minimum, which makes them suitable and workable for parameter identification of fermentation models. Different types of genetic algorithms, namely simple, modified and multi-population ones, have been applied and compared for the estimation of the parameters of a nonlinear dynamic model of fed-batch cultivation of S. cerevisiae.
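A minimal real-coded genetic algorithm for least-squares parameter identification is sketched below, using a deliberately simple placeholder model y = a(1 - exp(-b t)); the cultivation model, parameter bounds, and GA settings in the paper are of course different.

```python
import numpy as np

def fitness(params, t, y_obs):
    """Sum-of-squares misfit for a placeholder model y = a * (1 - exp(-b * t))."""
    a, b = params
    y_sim = a * (1.0 - np.exp(-b * t))
    return np.sum((y_obs - y_sim) ** 2)

def simple_ga(t, y_obs, bounds, pop_size=40, generations=100, mutation_scale=0.1, seed=1):
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(generations):
        costs = np.array([fitness(p, t, y_obs) for p in pop])
        parents = pop[np.argsort(costs)[: pop_size // 2]]           # truncation selection
        kids = []
        while len(kids) < pop_size - len(parents):
            i, j = rng.integers(len(parents), size=2)
            alpha = rng.random()
            child = alpha * parents[i] + (1 - alpha) * parents[j]   # blend crossover
            child += rng.normal(scale=mutation_scale * (hi - lo))   # Gaussian mutation
            kids.append(np.clip(child, lo, hi))
        pop = np.vstack([parents, kids])
    costs = np.array([fitness(p, t, y_obs) for p in pop])
    return pop[np.argmin(costs)]

t = np.linspace(0, 10, 50)
y_obs = 2.0 * (1.0 - np.exp(-0.5 * t)) + np.random.default_rng(2).normal(0, 0.05, t.size)
best = simple_ga(t, y_obs, bounds=np.array([[0.1, 5.0], [0.01, 2.0]]))
```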

Relevance: 80.00%

Abstract:

2002 Mathematics Subject Classification: 62M10.

Relevance: 80.00%

Abstract:

2000 Mathematics Subject Classification: 62P10, 62J12.

Relevance: 80.00%

Abstract:

We build the conditional least squares estimator of the parameter θ based on the observation of a single trajectory of {Zk, Ck}k, and give conditions ensuring its strong consistency. The particular case of general linear models, corresponding to a decomposition of θ into two components, and among them regenerative processes, is studied in more detail. In this framework, we also prove the consistency of the estimator of the component of θ that belongs to an asymptotically negligible part of the model, and the asymptotic law of the estimator may also be calculated.

Relevance: 80.00%

Abstract:

This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
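A hedged sketch of the workflow described above (Poisson GLM, a crude overdispersion check, then a negative binomial alternative), using simulated counts rather than the authors' suicide data; the statsmodels calls are standard, while the data-generating mechanism and the dispersion parameter alpha are arbitrary choices for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
# Simulate overdispersed counts via a negative binomial mechanism.
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# Rough overdispersion diagnostic: Pearson chi-square / residual df >> 1.
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid

negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(dispersion, poisson_fit.params, negbin_fit.params)
```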

Relevance: 80.00%

Abstract:

This study explores factors related to prompt difficulty in Automated Essay Scoring. The sample was composed of 6,924 students. For each student there were 1-4 essays, across 20 different writing prompts, for a total of 20,243 essays. The E-rater® v.2 essay scoring engine, developed by the Educational Testing Service, was used to score the essays. The scoring engine employs a statistical model that incorporates 10 predictors associated with writing characteristics, of which 8 were used. A Rasch partial credit analysis was applied to the scores to determine the difficulty levels of the prompts. In addition, the scores were used as outcomes in a series of hierarchical linear models (HLM) in which students and prompts constituted the cross-classification levels. This methodology was used to explore the partitioning of the essay score variance. The results indicated significant differences in prompt difficulty levels due to genre: descriptive prompts, as a group, were found to be more difficult than persuasive prompts. In addition, the essay score variance was partitioned between students and prompts. The amount of essay score variance that lies between prompts was found to be relatively small (4 to 7 percent). When essay-level, student-level, and prompt-level predictors were included, the model was able to explain almost all of the variance that lies between prompts. Since most high-stakes writing assessments use only 1-2 prompts per student, the essay score variance that lies between prompts represents an undesirable or "noise" variation. Identifying factors associated with this "noise" variance may prove to be important for prompt writing and for constructing Automated Essay Scoring mechanisms that weight prompt difficulty when assigning essay scores.
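The cross-classified variance partition described above is conventionally written as below; the notation is a generic formulation consistent with the description (essay i written by student j on prompt k), not necessarily the study's exact parameterization.

```latex
% Unconditional cross-classified model and the share of variance lying between prompts:
y_{i(jk)} = \gamma_0 + u_j + v_k + e_{i(jk)}, \qquad
u_j \sim N(0,\sigma_u^2),\quad v_k \sim N(0,\sigma_v^2),\quad e_{i(jk)} \sim N(0,\sigma_e^2),
\qquad
\rho_{\text{prompt}} = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2} \approx 0.04\text{--}0.07 .
```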

Relevance: 80.00%

Abstract:

Background: Diabetes and diabetes-related complications are major causes of morbidity and mortality in the United States. Depressive symptoms and perceived stress have been identified as possible risk factors for beta cell dysfunction and diabetes. The purpose of this study was to assess associations of depressive symptoms and perceived stress with beta cell function in African Americans and Haitian Americans with and without type 2 diabetes. Participants and Methods: Informed consent and data were available for 462 participants (231 African Americans and 231 Haitian Americans) in this cross-sectional study. A demographic questionnaire developed by the Principal Investigator was used to collect information on age, gender, smoking, and ethnicity. Diabetes status was determined by self-report and confirmed by fasting blood glucose. Anthropometrics (weight, height, and waist circumference) and vital signs (blood pressure) were taken. Blood samples were drawn after 8-10 hours of overnight fasting to measure the lipid panel, fasting plasma glucose, and serum insulin concentrations. The homeostatic model assessment, version 2 (HOMA2) computer model was used to calculate beta cell function. Depression was assessed using the Beck Depression Inventory-II (BDI-II), and stress levels were assessed using the Perceived Stress Scale (PSS). Results: Moderate to severe depressive symptoms were more likely in persons with diabetes (p = 0.030). There were no differences in perceived stress across ethnicity or diabetes status (p = 0.283). General linear models for participants with and without type 2 diabetes, using beta cell function as the dependent variable, showed no association with depressive symptoms or perceived stress; however, Haitian Americans had significantly lower beta cell function than African Americans, both with and without diabetes, after adjusting for age, gender, waist circumference, and smoking. Further research is needed to compare these risk factors in other racial/ethnic groups.

Relevance: 80.00%

Abstract:

The purpose of this study was to analyze the behavior of sell-side analysts and to propose a classification of analysts that considers the performance of their price forecasts and recommendations (sell-hold-buy) in the Brazilian stock market. The first step was to analyze the analysts' consensus, to understand the importance of this collective intervention in the market; the second was to analyze the analysts individually, to understand how their analyses improve over time; the third was to review the main ranking methods used in markets; finally, we propose a form of classification that reflects the aspects discussed. To investigate the hypotheses proposed in the study, linear panel models were used to capture effects over time. The data on price forecasts and recommendations, individual and consensus, for the period 2005-2013 were obtained from Bloomberg®. The main results were: (i) consensus recommendations outperform individual analyses; (ii) the association between the number of analysts issuing recommendations and improved accuracy suggests that this number may be related to the strength of the consensus and hence to its accuracy; (iii) the anchoring effect in analysts' consensus revisions biases their predictions, overvaluing the assets; (iv) analysts need greater caution in times of economic turbulence and should also watch foreign markets such as the USA, since these may cause shifts in bias between optimism and pessimism; (v) shifts in bias, such as increased pessimism, can cause an excessive increase in the number of buy recommendations, in which case analysts should be more cautious in their analyses, mainly regarding consistency between the recommendation and the expected price; (vi) an analyst's experience with the asset and with the asset's economic sector contributes to the improvement of forecasts, whereas overall experience showed the opposite effect; (vii) the optimism associated with overall experience shows, over time, a behavior similar to overconfidence, which could reduce accuracy; (viii) the conflicting effect of general experience on accuracy and observed returns suggests that, over time, the analyst develops something akin to an endowment bias towards the assets, which would lead to conflicts between recommendations and forecasts; (ix) although focusing on fewer sectors contributes to accuracy, the same does not occur with focusing on fewer assets, so analysts may enjoy economies of scale when they cover more assets within the same sector; and finally, (x) it was possible to develop a proposed classification of analysts that considers both the returns and the consistency of the predictions, called the Analysis Coefficient. This ranking produced better results in terms of return relative to standard deviation.
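The panel specification mentioned above (linear models for panel data to capture effects over time) can be illustrated with a fixed-effects regression on a toy analyst-year panel; the variables, the simulated data, and the dummy-variable implementation are assumptions for illustration, not the study's actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for a in [f"analyst_{i}" for i in range(20)]:
    skill = rng.normal()
    for t in range(2005, 2014):
        experience = t - 2005 + int(rng.integers(0, 3))
        # Hypothetical outcome: forecast accuracy (higher is better).
        accuracy = 0.05 * experience + 0.3 * skill + rng.normal(scale=0.5)
        rows.append({"analyst": a, "year": t, "experience": experience, "accuracy": accuracy})
panel = pd.DataFrame(rows)

# Fixed-effects (within) specification via analyst and year dummies.
fe_model = smf.ols("accuracy ~ experience + C(analyst) + C(year)", data=panel).fit()
print(fe_model.params["experience"])
```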

Relevance: 80.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even with the huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
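For concreteness, the two notions of dimension reduction contrasted above can be written side by side; these are the standard latent class (PARAFAC) representation and a generic log-linear expansion, given only as background notation, not the chapter's new collapsed Tucker class.

```latex
% Latent class / PARAFAC representation of the joint pmf of p categorical variables:
p(y_1, \dots, y_p) = \sum_{h=1}^{k} \nu_h \prod_{j=1}^{p} \lambda^{(j)}_{h y_j},
\qquad \nu_h \ge 0, \; \sum_{h=1}^{k} \nu_h = 1.
% Log-linear representation, where dimensionality reduction comes from most
% interaction terms \theta_S being zero (sparsity):
\log p(y_1, \dots, y_p) = \sum_{S \subseteq \{1,\dots,p\}} \theta_S(y_S).
```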

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and give a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and in other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
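To make the class of samplers discussed above concrete, here is a minimal Albert-Chib truncated normal data augmentation sampler for probit regression on simulated rare-event data. The prior, sample size, and true coefficients are illustrative assumptions; the point is only to show the two-block Gibbs structure whose mixing the chapter analyzes.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n, p = 2000, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-2.5, 0.5])          # rare-event regime: few successes
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Albert-Chib Gibbs sampler for probit regression with a N(0, 100 I) prior on beta.
prior_prec = np.eye(p) / 100.0
V = np.linalg.inv(prior_prec + X.T @ X)    # posterior covariance of beta given z
L = np.linalg.cholesky(V)
beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1. Sample z_i ~ N(x_i' beta, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0).
    mean = X @ beta
    lower = np.where(y == 1, -mean, -np.inf)
    upper = np.where(y == 1, np.inf, -mean)
    z = mean + truncnorm.rvs(lower, upper, size=n, random_state=rng)
    # 2. Sample beta | z from its Gaussian full conditional.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    draws.append(beta.copy())
draws = np.asarray(draws)   # inspect autocorrelation of draws[:, 0] to see slow mixing
```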

Relevance: 80.00%

Abstract:

Numerous works have been conducted on modelling basic compliant elements such as wire beams, and closed-form analytical models of most basic compliant elements are well developed. However, the modelling of complex compliant mechanisms is still challenging. This paper proposes a constraint-force-based (CFB) modelling approach for compliant mechanisms, with a particular emphasis on complex compliant mechanisms. The proposed CFB modelling approach can be regarded as an improved free-body-diagram (FBD) based modelling approach, and as a development of the screw-theory-based design approach. A compliant mechanism can be decomposed into rigid stages and compliant modules. A compliant module offers elastic forces due to its deformation; such elastic forces are regarded as variable constraint forces in the CFB modelling approach. Additionally, the CFB modelling approach treats external forces applied to a compliant mechanism as constant constraint forces. If a compliant mechanism is at static equilibrium, all of its rigid stages are also at static equilibrium under the influence of the variable and constant constraint forces. Therefore, the constraint-force equilibrium equations for all the rigid stages can be obtained, and the analytical model of the compliant mechanism can be derived from these equations. The CFB modelling approach can model a compliant mechanism both linearly and non-linearly, can obtain displacements at any points of the rigid stages, and allows external forces to be exerted at any positions of the rigid stages. Compared with the FBD-based modelling approach, the CFB modelling approach does not need to identify the possible deformed configuration of a complex compliant mechanism in order to obtain the geometric compatibility conditions and the force equilibrium equations. Additionally, the mathematical expressions in the CFB approach have an easily understood physical meaning. Using the CFB modelling approach, the variable constraint forces of three compliant modules, a wire beam, a four-beam compliant module and an eight-beam compliant module, are derived in this paper. Based on these variable constraint forces, the linear and non-linear models of a decoupled XYZ compliant parallel mechanism are derived and verified by FEA simulations and experimental tests.
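Schematically, the constraint-force equilibrium described above amounts to a force and moment balance on each rigid stage between the variable (elastic) constraint forces of the attached compliant modules and the constant (external) constraint forces; the symbols below are illustrative notation, not the paper's.

```latex
% For rigid stage i, with u the stage displacements and r the force application positions:
\sum_{j} \mathbf{f}^{\mathrm{var}}_{ij}(\mathbf{u}) + \sum_{k} \mathbf{F}^{\mathrm{ext}}_{ik} = \mathbf{0},
\qquad
\sum_{j} \mathbf{r}_{ij} \times \mathbf{f}^{\mathrm{var}}_{ij}(\mathbf{u}) + \sum_{k} \mathbf{r}_{ik} \times \mathbf{F}^{\mathrm{ext}}_{ik} = \mathbf{0}.
```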

Relevance: 80.00%

Abstract:

Trees and shrubs in tropical Africa use the C3 cycle as a carbon fixation pathway during photosynthesis, while grasses and sedges mostly use the C4 cycle. Leaf-wax lipids from sedimentary archives such as the long-chain n-alkanes (e.g., n-C27 to n-C33) inherit carbon isotope ratios that are representative of the carbon fixation pathway. Therefore, n-alkane δ13C values are often used to reconstruct past C3/C4 composition of vegetation, assuming that the relative proportions of C3 and C4 leaf waxes reflect the relative proportions of C3 and C4 plants. We have compared the δ13C values of n-alkanes from modern C3 and C4 plants with previously published values from recent lake sediments and provide a framework for estimating the fractional contribution (areal-based) of C3 vegetation cover (fC3) represented by these sedimentary archives. Samples were collected in Cameroon, across a latitudinal transect that accommodates a wide range of climate zones and vegetation types, as reflected in the progressive northward replacement of C3-dominated rain forest by C4-dominated savanna. The C3 plants analysed were characterised by substantially higher abundances of n-C29 alkanes and by substantially lower abundances of n-C33 alkanes than the C4 plants. Furthermore, the sedimentary δ13C values of n-C29 and n-C31 alkanes from recent lake sediments in Cameroon (−37.4‰ to −26.5‰) were generally within the range of δ13C values for C3 plants, even at sites where C4 plants dominated the catchment vegetation. In such cases simple linear mixing models fail to accurately reconstruct the relative proportions of C3 and C4 vegetation cover when using the δ13C values of sedimentary n-alkanes, overestimating the proportion of C3 vegetation, likely as a consequence of the differences in plant wax production, preservation, transport, and/or deposition between C3 and C4 plants. We therefore tested a set of non-linear binary mixing models using δ13C values from both C3 and C4 vegetation as end-members. The non-linear models included a sigmoid function (sine-squared) that describes small variations in the fC3 values as the minimum and maximum δ13C values are approached, and a hyperbolic function that takes into account the differences between C3 and C4 plants discussed above. Model fitting and the estimation of uncertainties were completed using a Monte Carlo algorithm and can be improved by future data addition. Models that provided the best fit to the observed δ13C values of sedimentary n-alkanes were either hyperbolic functions or a combination of hyperbolic and sine-squared functions. Such non-linear models may be used to convert δ13C measurements on sedimentary n-alkanes directly into reconstructions of C3 vegetation cover.
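For reference, the simple linear two-end-member mixing model that the non-linear alternatives generalize is the standard relation below; it implicitly assumes equal wax contribution per unit of C3 and C4 cover, which is precisely the assumption the study finds problematic.

```latex
\delta^{13}\mathrm{C}_{\mathrm{sed}} = f_{C3}\,\delta^{13}\mathrm{C}_{C3} + (1 - f_{C3})\,\delta^{13}\mathrm{C}_{C4}
\quad\Longrightarrow\quad
f_{C3} = \frac{\delta^{13}\mathrm{C}_{\mathrm{sed}} - \delta^{13}\mathrm{C}_{C4}}{\delta^{13}\mathrm{C}_{C3} - \delta^{13}\mathrm{C}_{C4}}.
```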

Relevance: 80.00%

Abstract:

The aim was to compare the one-year effect of two dietary interventions with a Mediterranean diet (MeDiet) on dietary glycemic load (GL) and glycemic index (GI) in the PREDIMED trial. Methods. Participants were older subjects at high risk for cardiovascular disease. This analysis included 2866 nondiabetic subjects. Diet was assessed with a validated 137-item food frequency questionnaire (FFQ). The GI of each FFQ item was assigned by a 5-step methodology using the International Tables of GI and GL Values. Generalized linear models were fitted to assess the relationship between the intervention group and dietary GL and GI at one year of follow-up, using the control group as reference.
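The quantities compared above are conventionally computed from FFQ data as follows; these are the standard definitions of item-level and dietary GL/GI, shown for background rather than as the paper's exact 5-step assignment procedure.

```latex
% Glycemic load of FFQ item i, and dietary totals per participant:
GL_i = \frac{GI_i \times \mathrm{carb}_i\ (\mathrm{g/day})}{100}, \qquad
\mathrm{dietary\ } GL = \sum_i GL_i, \qquad
\mathrm{dietary\ } GI = \frac{\sum_i GI_i \times \mathrm{carb}_i}{\sum_i \mathrm{carb}_i},
```

where carb_i denotes the available carbohydrate contributed by item i.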