961 results for Linear Models in Temporal Series
Abstract:
Despite the widespread popularity of linear models for correlated outcomes (e.g., linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. The resulting "rotated" residuals are used to construct an empirical cumulative distribution function and pointwise standard errors. The theoretical framework, including conditions and asymptotic properties, involves technical details that are motivated by Lange and Ryan (1989), Pierce (1982), and Randles (1982). Our method appears to work well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series). Our methods can produce satisfactory results even for models that do not satisfy all of the technical conditions stated in our theory.
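A minimal NumPy sketch of the rotation step, assuming the marginal covariance matrix V has already been estimated (here it is simply treated as known). If V^{-1} = L L^T with L lower triangular, then L^T r has identity covariance for the true errors, which is why the transpose of the Cholesky factor is applied. The pointwise standard errors below are the naive binomial ones, sqrt(F(1 - F)/n), not the paper's asymptotic expressions; the function names are hypothetical.

```python
import numpy as np

def rotated_residuals(y, X, V):
    """Rotate marginal residuals so they are approximately uncorrelated with unit variance."""
    V_inv = np.linalg.inv(V)
    # GLS estimate of the fixed effects, treating V as the estimated marginal covariance
    beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    r = y - X @ beta                 # marginal residual vector
    L = np.linalg.cholesky(V_inv)    # V_inv = L @ L.T with L lower triangular
    return L.T @ r                   # Cov(L.T @ e) = L.T @ V @ L = I for the true errors e

def ecdf_with_se(z, grid):
    """Empirical CDF of the rotated residuals on a grid, with binomial pointwise SEs."""
    z = np.asarray(z)
    F = np.array([np.mean(z <= g) for g in grid])
    se = np.sqrt(F * (1.0 - F) / len(z))
    return F, se

# Example: a single AR(1) series (every observation correlated) with a linear trend
rng = np.random.default_rng(1)
n, rho = 200, 0.7
t = np.arange(n)
V = rho ** np.abs(t[:, None] - t[None, :]) / (1 - rho ** 2)  # AR(1) covariance, unit innovations
X = np.column_stack([np.ones(n), t])
y = X @ np.array([1.0, 0.05]) + np.linalg.cholesky(V) @ rng.standard_normal(n)
z = rotated_residuals(y, X, V)
F, se = ecdf_with_se(z, np.linspace(-3, 3, 13))
```

Plotting F with bands of about ±2·se against the standard normal CDF gives the kind of graphical fit display the abstract describes.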
Abstract:
Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments in efficient algorithms for implementing these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are illustrated with two examples, and their validity for practical sample sizes is demonstrated via a simulation study.
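To make "linear regression with censored failure times" concrete, here is a sketch of one generic, well-known approach: weighted least squares of log failure time on covariates, with each observation weighted by its Kaplan-Meier jump (censored observations get weight zero). This is a swapped-in illustration, not the partial linear procedure proposed in the paper; it assumes no tied times, and the function names are hypothetical.

```python
import numpy as np

def km_weights(time, event):
    """Kaplan-Meier jump at each observation (zero for censored ones); assumes no tied times."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = len(time)
    order = np.argsort(time, kind="stable")
    w = np.zeros(n)
    surv = 1.0                                  # KM survival just before the current time
    for rank, idx in enumerate(order):
        at_risk = n - rank
        w[idx] = surv * event[idx] / at_risk    # jump of the KM curve at this time
        surv *= 1.0 - event[idx] / at_risk
    return w

def km_weighted_least_squares(log_time, event, X):
    """Weighted least squares of log failure time on covariates with KM weights."""
    w = km_weights(log_time, event)             # log is monotone, so the ordering is unchanged
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], log_time * sw, rcond=None)
    return beta
```

Without censoring every weight equals 1/n and the estimator reduces to ordinary least squares.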
Abstract:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms such as iteratively weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because their collapsibility property allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
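The Poisson collapsibility property can be checked directly: summing counts over records that share a covariate pattern, and using log(number of records) as an offset, leaves the maximum likelihood estimate unchanged, because both fits have the same score equations. The sketch below fits a plain log-link Poisson GLM by IRLS on the full and the collapsed data; it does not implement the paper's Gauss-Seidel sub-model scheme, and the function names and simulated data are hypothetical.

```python
import numpy as np

def poisson_irls(X, y, offset=None, max_iter=100, tol=1e-10):
    """Log-link Poisson GLM fitted by iteratively reweighted least squares."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu     # working response (log link)
        XtW = X.T * mu                        # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def collapse(X, y):
    """Sum counts over identical covariate rows; log(records per cell) becomes the offset."""
    cells, inverse = np.unique(X, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_records = np.bincount(inverse)
    totals = np.bincount(inverse, weights=y)
    return cells, totals, np.log(n_records)

rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.integers(0, 2, n)]).astype(float)
y = rng.poisson(np.exp(X @ np.array([0.2, 0.5, -0.3])))
beta_full = poisson_irls(X, y)
cells, totals, offset = collapse(X, y)
beta_collapsed = poisson_irls(cells, totals, offset)   # 4 rows instead of 20000
```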
Abstract:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and easier to fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, because comparing diseased with non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time-varying covariates and comparisons between segments of the night. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
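The likelihood equivalence can be checked numerically. With constant hazard λ_j on interval j, the survival log-likelihood is Σ_j (D_j log λ_j - λ_j T_j), where D_j counts events and T_j is total time at risk in the interval; the Poisson log-likelihood for counts D_j with mean λ_j T_j (that is, a log(T_j) offset) differs from it only by Σ_j (D_j log T_j - log D_j!), which does not depend on λ, so both are maximized at λ_j = D_j / T_j. The sketch below is a non-Bayesian, single-transition illustration with hypothetical function names.

```python
import numpy as np
from math import lgamma

def events_and_exposure(times, events, cuts):
    """Events D_j and time at risk T_j for intervals (cuts[j], cuts[j+1]]."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    lo, hi = np.asarray(cuts[:-1], float), np.asarray(cuts[1:], float)
    # time each subject spends inside each interval, summed over subjects
    T = np.clip(np.minimum(times[:, None], hi) - lo, 0.0, None).sum(axis=0)
    # index of the interval containing each observed event time
    j = np.searchsorted(hi, times[events], side="left")
    D = np.bincount(j, minlength=len(hi)).astype(float)
    return D, T

def piecewise_exp_loglik(lam, D, T):
    """Survival log-likelihood with constant hazard lam[j] on interval j."""
    return float(np.sum(D * np.log(lam) - lam * T))

def poisson_offset_loglik(lam, D, T):
    """Poisson log-likelihood for counts D with mean lam * T (log(T) offset)."""
    mu = lam * T
    return float(np.sum(D * np.log(mu) - mu) - sum(lgamma(d + 1) for d in D))

D, T = events_and_exposure([1.0, 2.5, 3.0, 0.5], [1, 0, 1, 1], [0.0, 2.0, 4.0])
lam_hat = D / T   # piecewise-constant hazard estimates
```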
Abstract:
Life expectancy has consistently increased over the last 150 years due to improvements in nutrition, medicine, and public health. Several studies found that in many developed countries life expectancy continued to rise following a nearly linear trend, contrary to the common belief that the rate of improvement would decelerate and could be described by an S-shaped curve. Using samples of countries that exhibited a wide range of economic development levels, we explored the change in life expectancy over time by employing both nonlinear and linear models. We then examined whether estimates differed significantly between linear models fitted by ordinary and generalized least squares, the latter assuming an autocorrelated error structure. When data did not have a sigmoidal shape, nonlinear growth models sometimes failed to provide meaningful parameter estimates. The existence of an inflection point and asymptotes in the growth models made them inflexible when fitted to life expectancy data. In linear models, there was no significant difference in the life expectancy growth rate and future estimates between ordinary least squares (OLS) and generalized least squares (GLS). However, the generalized least squares model was more robust because the data involved time-series variables and the residuals were positively correlated.
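A sketch of the OLS versus GLS comparison, assuming AR(1) errors and using iterated Prais-Winsten feasible GLS as one common way to fit such a model (the abstract does not say which GLS variant was used). The data are simulated, not the study's. As the abstract reports, the two slope estimates are usually close; the practical difference lies in the error structure, and hence in the standard errors.

```python
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def prais_winsten_gls(X, y, n_iter=20):
    """Feasible GLS for a linear trend with AR(1) errors (iterated Prais-Winsten)."""
    beta = ols(X, y)
    rho = 0.0
    for _ in range(n_iter):
        e = y - X @ beta
        rho = np.clip(np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2), -0.99, 0.99)
        s = np.sqrt(1.0 - rho ** 2)
        # first row rescaled, remaining rows quasi-differenced: removes the AR(1) correlation
        X_star = np.vstack([s * X[:1], X[1:] - rho * X[:-1]])
        y_star = np.concatenate([s * y[:1], y[1:] - rho * y[:-1]])
        beta = ols(X_star, y_star)
    return beta, float(rho)

# Simulated series: life expectancy rising 0.3 years per year with AR(1) errors (rho = 0.6)
rng = np.random.default_rng(42)
n = 100
year = np.arange(n, dtype=float)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + 0.5 * rng.standard_normal()
y = 50.0 + 0.3 * year + e
X = np.column_stack([np.ones(n), year])
beta_ols = ols(X, y)
beta_gls, rho_hat = prais_winsten_gls(X, y)
```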
Abstract:
Diabetes encompasses a series of metabolic diseases characterized by abnormally high blood glucose concentrations. In the case of type 1 diabetes (T1D), this situation is caused by a total absence of endogenous insulin secretion, which prevents most tissues from using glucose. In these circumstances, exogenous insulin is necessary to keep the patient alive, although caution is always needed to avoid acute drops in glycaemia below safe levels. In addition to insulin administration, meal intake and physical activity are fundamental factors influencing glucose homoeostasis. Consequently, successful management of T1D should incorporate these two physiological phenomena, based on an appropriate identification and modelling of these events and their corresponding effects on the glucose-insulin balance. In particular, artificial pancreas systems (designed to perform automated control of the patient's glycaemia levels) may benefit from the integration of this type of information. The first part of this PhD thesis covers the characterization of the acute effect of physical activity on glucose profiles. With this aim, a systematic literature review and meta-analyses are conducted to determine the responses to various exercise modalities in patients with T1D, assessed via rate-of-change magnitudes that quantify temporal variations in glycaemia.
On the other hand, reliable identification of physical activity periods is an essential prerequisite for feeding artificial pancreas systems with information about exercise in ambulatory, free-living conditions. For this reason, the second part of this thesis focuses on the proposal and evaluation of an automatic system devised to recognize physical activity, classifying its intensity level (light, moderate or vigorous) and, for vigorous periods, also identifying the exercise modality (aerobic, mixed or resistance), since both aspects have a distinctive influence on the predominant metabolic pathway involved in fuelling exercise and, therefore, on the glycaemic responses in T1D. Various combinations of machine learning and pattern recognition techniques are applied to the fusion of multimodal signal sources, namely accelerometry and heart rate measurements, which describe both the mechanical aspects of movement and the physiological response of the cardiovascular system to exercise. An additional temporal filtering module is incorporated after recognition in order to exploit the considerable temporal coherence (i.e., redundancy) present in the data, which stems from the fact that, in practice, physical activity trends tend to remain stable over time instead of fluctuating rapidly and repeatedly. The third block of this PhD thesis addresses meal intake in the context of T1D. In particular, a number of compartmental models are proposed and compared in terms of their ability to describe mathematically the remote effect of exogenous plasma insulin concentrations on the disposal rates of meal-attributable glucose, an aspect that had not yet been incorporated into the prevailing T1D patient models in the literature. Data were acquired in an experiment conducted at the Institute of Metabolic Science (University of Cambridge, UK) on 16 young patients.
A variable-target glucose clamp replicated their individual glucose profiles, as observed during a preliminary visit after ingesting either a high- or a low-glycaemic-load evening meal. The six mechanistic models evaluated here comprised: a) two-compartment submodels for glucose tracer masses, b) a single-compartment submodel for insulin's remote effect, c) two types of activation for this remote effect (either linear or with a 'cut-off' point), and d) different forms of initial conditions.
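The temporal filtering idea (activity labels rarely change from one analysis window to the next) can be illustrated with a simple centred majority-vote filter over the classifier's per-window labels. This filter is an assumption chosen for illustration; the thesis abstract does not specify the filtering method, and the window half-width of 2 is arbitrary.

```python
from collections import Counter

def majority_filter(labels, half_width=2):
    """Smooth a sequence of per-window activity labels with a centred majority vote."""
    n = len(labels)
    smoothed = []
    for i in range(n):
        window = labels[max(0, i - half_width): i + half_width + 1]
        counts = Counter(window)
        top = max(counts.values())
        if counts[labels[i]] == top:
            # keep the current label when it ties for the majority
            smoothed.append(labels[i])
        else:
            # otherwise take the first label in the window that reaches the majority count
            smoothed.append(next(lab for lab in window if counts[lab] == top))
    return smoothed

# An isolated misclassified window is removed, while a sustained vigorous bout is kept
raw = ["light"] * 5 + ["vigorous"] + ["light"] * 5 + ["vigorous"] * 6
smoothed = majority_filter(raw)
```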
Abstract:
This paper considers the problem of extracting the relationships between two time series in a non-linear, non-stationary environment with Hidden Markov Models (HMMs). We describe an algorithm that is capable of identifying associations between variables. The method is applied to both synthetic and real data. We show that HMMs are capable of modelling the oil drilling process and that they outperform existing methods.
Abstract:
The deficiencies of stationary models applied to financial time series are well documented. A special form of non-stationarity, where the underlying generator switches between (approximately) stationary regimes, seems particularly appropriate for financial markets. We use dynamic switching (modelled by a hidden Markov model) combined with a linear dynamical system in a hybrid switching state space model (SSSM), and discuss the practical details of training such models with a variational EM algorithm due to Ghahramani and Hinton (1998). The performance of the SSSM is evaluated on several financial data sets, and it is shown to improve on a number of existing benchmark methods.
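To make the generative structure concrete, the sketch below simulates a simplified switching linear dynamical system: a hidden Markov regime selects the coefficient and noise scale of a scalar linear-Gaussian state, which is observed with noise. This is a simplified variant for illustration (the Ghahramani and Hinton SSSM runs several state-space models in parallel and lets the switch choose which one generates the output), and training by variational EM is not shown. All parameter values are hypothetical.

```python
import numpy as np

def simulate_switching_lds(T, P, a, q, r, rng):
    """Hidden Markov regime s_t selects the AR coefficient a[s] and noise scale q[s]
    of a scalar latent state x_t, which is observed with Gaussian noise (scale r)."""
    P = np.asarray(P, dtype=float)
    s = np.zeros(T, dtype=int)
    x = np.zeros(T)
    y = np.zeros(T)
    y[0] = x[0] + r * rng.standard_normal()
    for t in range(1, T):
        s[t] = rng.choice(len(P), p=P[s[t - 1]])                     # Markov regime switch
        x[t] = a[s[t]] * x[t - 1] + q[s[t]] * rng.standard_normal()  # regime-specific dynamics
        y[t] = x[t] + r * rng.standard_normal()                      # noisy observation
    return s, x, y

rng = np.random.default_rng(7)
P = [[0.98, 0.02], [0.05, 0.95]]   # hypothetical calm (0) and volatile (1) regimes
s, x, y = simulate_switching_lds(1000, P, a=[0.9, 0.5], q=[0.1, 1.0], r=0.05, rng=rng)
```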
Abstract:
In this paper, the exchange rate forecasting performance of neural network models is evaluated against the random walk, autoregressive moving average and generalised autoregressive conditional heteroskedasticity models. There are no guidelines for choosing the parameters of neural network models, and therefore the parameters are chosen according to what the researcher considers best. Such an approach, however, implies that the risk of making bad decisions is extremely high, which could explain why, in many studies, neural network models do not consistently perform better than their time series counterparts. In this paper, through extensive experimentation, the level of subjectivity in building neural network models is considerably reduced, thereby giving them a better chance of performing well. The results show that, in general, neural network models perform better than the traditionally used time series models in forecasting exchange rates.
Abstract:
In this paper, the exchange rate forecasting performance of neural network models is evaluated against the random walk and a range of time series models. There are no guidelines for choosing the parameters of neural network models, and therefore the parameters are chosen according to what the researcher considers best. Such an approach, however, implies that the risk of making bad decisions is extremely high, which could explain why, in many studies, neural network models do not consistently perform better than their time series counterparts. In this paper, through extensive experimentation, the level of subjectivity in building neural network models is considerably reduced, thereby giving them a better chance of performing well. Our results show that, in general, neural network models perform better than traditionally used time series models in forecasting exchange rates.
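A common way to report such comparisons is the RMSE of a model's one-step-ahead forecasts relative to the random walk benchmark, whose forecast for t+1 is simply the value at t; a ratio below 1 means the model beats the random walk. The sketch assumes one-step forecasts aligned with series[1:]; the neural network itself is not shown, and any model's forecasts can be plugged in. Function names are hypothetical.

```python
import numpy as np

def rmse(actual, forecast):
    actual, forecast = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def random_walk_forecasts(series):
    """One-step-ahead naive forecasts: the prediction for t+1 is the value at t."""
    return np.asarray(series, dtype=float)[:-1]   # aligned with targets series[1:]

def relative_rmse(series, model_forecasts):
    """RMSE of a model's one-step forecasts of series[1:] divided by the random walk RMSE."""
    series = np.asarray(series, dtype=float)
    targets = series[1:]
    return rmse(targets, model_forecasts) / rmse(targets, random_walk_forecasts(series))

rates = [1.00, 2.00, 3.00, 5.00]                  # toy exchange-rate series
ratio = relative_rmse(rates, [2.5, 3.5, 4.5])     # hypothetical model forecasts
```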
Abstract:
In non-linear random effects models, some attention has very recently been devoted to the analysis of suitable transformations of the response variables, either separately from (Taylor 1996) or jointly with (Oberg and Davidian 2000) the transformations of the covariates, and, as far as we know, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effects model when a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and drawbacks associated with this data-driven kind of modeling. Difficulties in the interpretation of regression parameters, and therefore in understanding the influence of covariates, as well as problems related to loss of efficiency of estimates and overfitting, are discussed. A case study on radiotherapy usage in breast cancer treatment is discussed.
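As an example of a parameterized link family, the Aranda-Ordaz (1981) asymmetric family maps the linear predictor η to a probability π = 1 - (1 + λe^η)^(-1/λ) for λ > 0. Setting λ = 1 recovers the logistic link, and the limit λ → 0 gives the complementary log-log link, so estimating λ lets the data choose the link. The sketch only evaluates the inverse link (random effects and estimation are not shown); log1p is used for numerical stability at small λ.

```python
import numpy as np

def aranda_ordaz_inverse_link(eta, lam):
    """pi = 1 - (1 + lam * exp(eta)) ** (-1 / lam) for lam > 0; lam = 0 is the cloglog limit."""
    eta = np.asarray(eta, dtype=float)
    if lam == 0:
        return 1.0 - np.exp(-np.exp(eta))              # complementary log-log
    return 1.0 - np.exp(-np.log1p(lam * np.exp(eta)) / lam)

eta = np.linspace(-3.0, 2.0, 11)
probs = {lam: aranda_ordaz_inverse_link(eta, lam) for lam in (0.0, 0.5, 1.0)}
```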
Abstract:
The high cost of maize in Kenya is driven mainly by East African regional commodity demand and agricultural drought. The production of maize, a common staple food in Kenya, is greatly affected by agricultural drought. However, calculations of drought risk and its impact on maize production in Kenya are limited by the scarcity of reliable rainfall data. The objective of this study was to apply a novel hyperspectral remote sensing method to model temporal fluctuations of maize production and prices in five markets in Kenya. SPOT-VEGETATION NDVI time series were corrected for seasonal effects by computing standardized NDVI anomalies. The residual maize price time series was then related to the NDVI seasonal anomalies using a multiple linear regression approach. The results show a moderately strong positive relationship (0.67) between the residual price series and global maize prices. Maize prices were high during drought periods (i.e., negative NDVI anomalies) and low during wet seasons (i.e., positive NDVI anomalies). This study concludes that NDVI is a good index for monitoring the evolution of maize prices and for food security emergency planning in Kenya. To obtain a stronger correlation between the wholesale maize price and the global maize price, future research could consider adding other price-driving factors to the regression models.
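Standardized NDVI anomalies remove the seasonal cycle by centring and scaling each observation against all years for the same seasonal period: z = (NDVI - mean_p) / sd_p. The sketch assumes 36 dekadal periods per year and uses synthetic NDVI and price data (not the study's), followed by an ordinary least squares fit of residual price on the anomaly. Function names are hypothetical.

```python
import numpy as np

def standardized_anomalies(values, period):
    """(x - mean_p) / sd_p, with mean_p and sd_p computed over all years for seasonal period p."""
    values = np.asarray(values, dtype=float)
    period = np.asarray(period)
    out = np.empty_like(values)
    for p in np.unique(period):
        m = period == p
        out[m] = (values[m] - values[m].mean()) / values[m].std(ddof=1)
    return out

def fit_linear(y, *predictors):
    """OLS coefficients [intercept, slopes...] of y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
period = np.tile(np.arange(36), 8)          # hypothetical: 36 dekads per year, 8 years
ndvi = 0.4 + 0.15 * np.sin(2 * np.pi * period / 36) + 0.03 * rng.standard_normal(period.size)
anomaly = standardized_anomalies(ndvi, period)
residual_price = -5.0 * anomaly + rng.standard_normal(period.size)   # synthetic prices
intercept, slope = fit_linear(residual_price, anomaly)
```

A negative slope corresponds to the pattern reported in the abstract: higher prices during drought (negative anomalies).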
Abstract:
Objectives: Air-pollution exposure has been associated with increased cardiovascular hospital admissions and mortality in time-series studies. We evaluated the relation between air pollutants and emergency room (ER) visits due to cardiac arrhythmia in a cardiology hospital. Methods: In a time-series study, we evaluated the association between ER visits due to cardiac arrhythmia and daily variations in SO2, CO, NO2, O3 and PM10, from January 1998 to August 1999. Arrhythmia cases were modelled using generalised linear Poisson regression models, controlling for seasonality (short-term and long-term trends) and weather. Results: Interquartile range increases in CO (1.5 ppm), NO2 (49.5 µg/m³) and PM10 (22.2 µg/m³) on the concurrent day were associated with increases of 12.3% (95% CI: 7.6% to 17.2%), 10.4% (95% CI: 5.2% to 15.9%) and 6.7% (95% CI: 1.2% to 12.4%) in arrhythmia ER visits, respectively. PM10, CO and NO2 effects were dose-dependent, and the gaseous pollutants had thresholds. Only the CO effect persisted in models with more than one pollutant. Conclusions: Our results show that air pollutant effects on arrhythmia are predominantly acute, starting at concentrations below air quality standards, and the association with CO and NO2 suggests a relevant role for pollution caused by cars.
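In a log-link Poisson model, a Δ-unit increase in a pollutant multiplies the expected count by exp(βΔ), so the reported effect is 100·(exp(βΔ) - 1) percent, with the confidence interval obtained by transforming β ± 1.96·SE in the same way. The sketch below shows this conversion; the coefficient is back-calculated from the reported 12.3% per 1.5 ppm CO (β = ln(1.123)/1.5, about 0.077 per ppm), and the standard error is a hypothetical value, not one from the paper.

```python
import math

def percent_increase(beta, delta, se=None, z=1.96):
    """Percent change in expected count for a delta-unit exposure increase (log-link Poisson)."""
    point = 100.0 * (math.exp(beta * delta) - 1.0)
    if se is None:
        return point
    lo = 100.0 * (math.exp((beta - z * se) * delta) - 1.0)
    hi = 100.0 * (math.exp((beta + z * se) * delta) - 1.0)
    return point, lo, hi

beta_co = math.log(1.123) / 1.5            # per ppm, back-calculated from the reported 12.3%
effect = percent_increase(beta_co, 1.5)   # interquartile range of CO = 1.5 ppm
ci = percent_increase(beta_co, 1.5, se=0.01)   # hypothetical standard error
```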
Abstract:
The search for more realistic modeling of financial time series reveals several stylized facts of real markets. In this work we focus on the multifractal properties found in price and index signals. Although the usual minority game (MG) models do not exhibit multifractality, we study here one of their variants that does. We show that the nonsynchronous MG models in the nonergodic phase are multifractal and, in this sense, together with other stylized facts, constitute a better modeling tool. Using the structure function (SF) approach, we detected the stationary and scaling ranges of the time series generated by the MG model and, from the linear (non-linear) behavior of the SF, we identified the fractal (multifractal) regimes. Finally, using the wavelet transform modulus maxima (WTMM) technique, we obtained the multifractal spectrum width for different dynamical regimes. (C) 2009 Elsevier Ltd. All rights reserved.
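The structure function approach computes S_q(τ) = ⟨|x(t+τ) - x(t)|^q⟩ and fits S_q(τ) ∝ τ^ζ(q) over a scaling range of lags. If ζ(q) is linear in q the signal is (mono)fractal; a concave, non-linear ζ(q) indicates multifractality. The sketch below uses a Gaussian random walk as a monofractal reference, for which ζ(q) = q/2; the minority game model itself is not implemented, and the lag range is an assumption.

```python
import numpy as np

def structure_function(x, lags, qs):
    """S_q(tau) = mean of |x(t + tau) - x(t)|**q, one row per q and one column per lag."""
    x = np.asarray(x, dtype=float)
    return np.array([[np.mean(np.abs(x[tau:] - x[:-tau]) ** q) for tau in lags] for q in qs])

def scaling_exponents(x, lags, qs):
    """zeta(q): least-squares slope of log S_q(tau) against log tau over the chosen lags."""
    S = structure_function(x, lags, qs)
    log_lags = np.log(np.asarray(lags, dtype=float))
    return np.array([np.polyfit(log_lags, np.log(row), 1)[0] for row in S])

# Monofractal reference: a Gaussian random walk has zeta(q) = q / 2, i.e. linear in q
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2 ** 16))
lags = [1, 2, 4, 8, 16, 32, 64]
zeta = scaling_exponents(walk, lags, [1, 2, 3])
```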
Abstract:
The objective of this work was to carry out a descriptive analysis of the monthly precipitation at rainfall stations in Rio de Janeiro State, Brazil, using measures of position and dispersion and graphical analyses, to verify the presence of seasonality and trend in these data, and to study the application of time series models. Descriptive statistics were used to characterize the general behavior of the series at three selected stations with consistent historical records. Analysis of variance in randomized blocks and multiple linear regression models, with years and months as predictor variables, revealed the presence of seasonality, which allowed inference about the occurrence of repetitive natural phenomena over time, and the absence of trend in the data. Multiple linear regression was then applied to remove the seasonality from these time series: the differences between the original data and the estimates of the fitted model were computed, and the randomized-block analysis of variance was repeated on these regression residuals. From the results it was possible to conclude that monthly rainfall presents seasonality but no trend, that multiple regression was effective in removing the seasonality, and that rainfall can be studied by means of time series.
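A sketch of the seasonality-removal step: regress monthly rainfall on an intercept, a linear year term and month indicator variables, then take observed minus fitted values as the deseasonalized residuals. Treating year as a linear term (rather than as a factor) is an assumption, and the rainfall data are synthetic. Because the intercept and month indicators are in the model, the residuals of every month average to zero, which is why a repeated analysis of variance on them no longer detects a month effect.

```python
import numpy as np

def seasonal_regression_residuals(rain, year, month):
    """Fit rain ~ intercept + linear year trend + month indicators by least squares
    and return the residuals (observed minus fitted values)."""
    rain = np.asarray(rain, dtype=float)
    year = np.asarray(year, dtype=float)
    month = np.asarray(month)
    months = np.unique(month)
    # one indicator column per month except the first (reference category)
    dummies = np.column_stack([(month == m).astype(float) for m in months[1:]])
    X = np.column_stack([np.ones(len(rain)), year - year.mean(), dummies])
    beta, *_ = np.linalg.lstsq(X, rain, rcond=None)
    return rain - X @ beta

rng = np.random.default_rng(3)
year = np.repeat(np.arange(1980, 2000), 12)
month = np.tile(np.arange(1, 13), 20)
rain = 150 + 60 * np.cos(2 * np.pi * (month - 1) / 12) + 20 * rng.standard_normal(year.size)
resid = seasonal_regression_residuals(rain, year, month)
```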