978 results for Vector Auto Regression
Abstract:
Climate change impact assessment studies involve downscaling large-scale atmospheric predictor variables (LSAPVs) simulated by general circulation models (GCMs) to site-scale meteorological variables. This article presents a least-squares support vector machine (LS-SVM)-based methodology for multi-site downscaling of maximum and minimum daily temperature series. The methodology involves (1) delineation of sites in the study area into clusters based on the correlation structure of the predictands, (2) downscaling LSAPVs to monthly time series of predictands at a representative site identified in each of the clusters, (3) translation of the downscaled information in each cluster from the representative site to the other sites using LS-SVM inter-site regression relationships, and (4) disaggregation of the information at each site from monthly to daily time scale using a k-nearest neighbour disaggregation methodology. The effectiveness of the methodology is demonstrated by application to data from four sites in the catchment of the Beas river basin, India. Simulations of the Canadian coupled global climate model (CGCM3.1/T63) for four IPCC SRES scenarios, namely A1B, A2, B1 and COMMIT, were downscaled to future projections of the predictands in the study area. Comparison of the results with those from the recently proposed multivariate multiple linear regression (MMLR) based downscaling method and the multi-site multivariate statistical downscaling (MMSD) method indicates that the proposed method is promising and can be considered a feasible choice in statistical downscaling studies. The method performed better in downscaling daily minimum temperature than in downscaling daily maximum temperature. Results indicate an increase in annual average maximum and minimum temperatures at all the sites for the A1B, A2 and B1 scenarios. The projected increment is highest for the A2 scenario, followed by the A1B, B1 and COMMIT scenarios.
Projections, in general, indicated an increase in mean monthly maximum and minimum temperatures during January to February and October to December.
Abstract:
Models of river flow time series are essential for the efficient management of a river basin. They help policy makers develop efficient water utilization strategies that maximize the utility of scarce water resources. Time series analysis has been used extensively for modeling river flow data, and machine learning techniques such as support vector regression and neural network models are gaining popularity. In this paper we compare the performance of these techniques by applying them to long-term time-series data on the inflows into the Krishnaraja Sagar reservoir (KRS) from three tributaries of the river Cauvery. Flow data covering a period of 30 years from three observation points in the upper Cauvery river sub-basin are analyzed to estimate their contribution to KRS. Specifically, the ANN model uses a multi-layer feed-forward network trained with a back-propagation algorithm, and support vector regression is used with the epsilon-insensitive loss function. Auto-regressive moving average models are also applied to the same data. The techniques are compared using performance metrics such as root mean squared error (RMSE), correlation, normalized root mean squared error (NRMSE) and Nash-Sutcliffe Efficiency (NSE).
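The comparison metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the NRMSE is normalized by the range of the observations (an assumption, since other normalizations such as the mean are also common), and "correlation" is taken to be Pearson's coefficient.

```python
import math

def rmse(obs, sim):
    """Root mean squared error between observed and simulated flows."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nrmse(obs, sim):
    """RMSE divided by the range of the observations (assumed normalization)."""
    return rmse(obs, sim) / (max(obs) - min(obs))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance sum."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated flows."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)

# Made-up illustrative values, not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "NRMSE": nrmse(observed, simulated),
    "NSE": nse(observed, simulated),
    "r": pearson_r(observed, simulated),
}
```

An NSE of 1 corresponds to a perfect match, while an NSE of 0 means the model is no better than predicting the observed mean.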
Abstract:
This paper proposes a new hierarchical learning structure, namely holistic triple learning (HTL), for extending the binary support vector machine (SVM) to multi-classification problems. For an N-class problem, an HTL constructs a decision tree up to a depth of [...]. A leaf node of the decision tree may hold a holistic triple learning unit whose generalisation abilities have been assessed and approved, while each of the remaining nodes accommodates a standard binary SVM classifier. The holistic triple classifier is a regression model trained on three classes, whose training algorithm originates from a recently proposed implementation technique, namely the least-squares support vector machine (LS-SVM). A major novelty of the holistic triple classifier is the reduced number of support vectors in the solution. For the resultant HTL-SVM, an upper bound on the generalisation error can be obtained. The time complexity of training the HTL-SVM is analysed and shown to be comparable to that of training the one-versus-one (1-vs-1) SVM, particularly on small-scale datasets. Empirical studies show that the proposed HTL-SVM achieves competitive classification accuracy with fewer support vectors than the popular 1-vs-1 alternative.
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet-revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional auto-regressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection seen with the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
Abstract:
ABSTRACT - Objectives: Every year about 1.3 million people die worldwide in road traffic accidents. More than 20 million more suffer minor or serious injuries in road traffic accidents that result in temporary or permanent disability. Road traffic accidents are therefore considered a serious public health problem, with high costs for societies, affecting the health of populations and the economy of each country. This study aimed to describe and characterize drivers of light vehicles resident in mainland Portugal, covering sociodemographic characteristics, driving experience, and questions on attitudes, opinions and behaviours. It also sought to analyse the association between self-reported opinions, attitudes and behaviours and the occurrence of a road traffic accident in the previous three years, in order to build a final model predicting the risk of having a road traffic accident. Method: An analytical cross-sectional observational study was carried out, based on a questionnaire translated into Portuguese and originating in the European project SARTRE 4. The target population was all drivers of light vehicles holding a driving licence and resident in mainland Portugal, with a sample of the same size as that defined in the European SARTRE 4 study (600 drivers of light vehicles). From the 52 existing questions, principal component analysis (PCA) was used to select potentially independent and complementary variables for the opinion, attitude and behaviour components. In addition to the usual descriptive measures, binary logistic regression was used to analyse associations and to obtain a model for estimating the probability of having a road accident as a function of the selected variables on self-reported opinions, attitudes and behaviours.
Results: Of the 612 drivers surveyed, 62.7% (383) reported not having had any road traffic accident in the previous three years, while 37.3% (228) reported having been involved in at least one road traffic accident with material damage or injuries in the same period. In general, the typical driver who reported having had an accident in the previous three years is a man over 65 years of age, with first-cycle basic education, widowed and childless, not employed, and living in an urban area. Drivers living in a suburban area had a risk 5.368 times higher of having a road traffic accident than drivers living in a rural area (95% CI: 2.344-12.297; p<0.001). Drivers who had been subjected to an alcohol check only once in the previous three years while driving had a risk 3.009 times higher of having a road traffic accident than drivers who had never been checked by the police (95% CI: 1.949-4.647; p<0.001). Drivers who reported very frequently stopping to sleep when they feel tired while driving are 81% less likely to have a road traffic accident than drivers who never do so (95% CI: 0.058-0.620; p=0.006). Drivers who, when tired, rarely drink a coffee or energy drink have a risk 4.829 times higher of having a road traffic accident than drivers who reported always doing so (95% CI: 1.807-12.903; p=0.002). Conclusions: The results for behavioural factors are in line with most of the risk factors for road traffic accidents reported in the literature. Even so, new associations were identified between the risk of having an accident and self-reported opinions and attitudes, which may be explored further through studies with larger populations.
This work reinforces the urgent need for new intervention strategies, mainly in the behavioural component, targeted at risk groups, while maintaining the existing ones.
Abstract:
ABSTRACT - Introduction: Oral health is an essential component of the general health and well-being of individuals. Oral health problems are known to affect predominantly people of lower socioeconomic status, which shows the influence of the social determinants of health on the oral health of populations. The objectives of this study are to characterize daily oral hygiene routines, frequency of oral health appointments, self-assessment of oral health status and perception of pain in the oral cavity among 12-year-old children in Portugal, and to analyse the association between these and sociodemographic factors. Methods: An observational, cross-sectional, analytical study was carried out, covering 1309 young people and based on information collected in the III Estudo Nacional de Prevalência de Doenças Orais (ENPDO, the third national study of the prevalence of oral diseases). In addition to the usual descriptive statistics, inferential statistics were based predominantly on binary logistic regression models. Results: Of the participants, 70.6% (n=924) brush "twice or more a day", with an association with all sociodemographic variables. In the multivariate analysis, male gender (OR=2.088; 95% CI: 1.574-2.770, relative to female gender), living in a predominantly rural or moderately urban area (OR=1.800; 95% CI: 2.587; OR=1.516; 95% CI: 1.093-2.103, relative to predominantly urban areas), the mother's education being basic education (OR=2.112; 95% CI: 1.408-3.168, relative to higher education) and the father being unemployed (OR=1.938; 95% CI: 1.280-2.934, relative to being employed) were the variables with the greatest impact on the adoption of potentially inadequate brushing behaviours (p<0.05). Most respondents (94.2%; n=1247) had already attended an oral health appointment, 74.5% (n=860) in the previous 12 months; 95.5% (n=1250) are satisfied with their oral health and 44.5% (n=578) report having had some kind of pain in the oral cavity in the previous 12 months.
Conclusion: The results are consistent with the literature in terms of associated factors. The oral health of 12-year-olds in Portugal, in the various contexts analysed here, can therefore be considered satisfactory. The only relevant exception is pain, with alarming values, although of a more subjective nature. The influence of sociodemographic factors suggests that future approaches to oral health promotion should take the determinants of health into account when designing strategies at both the individual and the community level.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming to provide the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from the data, providing the optimum mixture of short-scale and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means of modelling local anomalies that may typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns, which is a possible limitation for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity from measurements taken in the region of Briansk following the Chernobyl accident.
Abstract:
This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.
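The augmentation step described in this abstract, adding leads and lags of the first differences of the regressors, can be sketched for a single equation with one regressor. This is only an illustration of how the augmented regressor rows are assembled; it does not implement the feasible generalized least squares step or the long-run covariance estimate used in the paper, and the function names and the lead/lag order `p` are hypothetical.

```python
def first_differences(x):
    """Return [x[1]-x[0], x[2]-x[1], ...]; element t-1 is the difference at time t."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def build_augmented_rows(y, x, p):
    """Assemble rows [1, x_t, dx_{t-p}, ..., dx_t, ..., dx_{t+p}] with targets y_t.

    Only time points with a full window of p lags and p leads of the
    first difference are kept, so len(x) - 2*p - 1 rows are returned.
    """
    n = len(x)
    dx = first_differences(x)
    targets, rows = [], []
    for t in range(p + 1, n - p):
        # dx[s - 1] is the first difference of x at time s.
        window = [dx[s - 1] for s in range(t - p, t + p + 1)]
        rows.append([1.0, x[t]] + window)
        targets.append(y[t])
    return targets, rows

# Made-up illustrative series, not data from the paper.
x_series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
y_series = [0.5, 1.1, 2.2, 3.4, 5.6, 8.1]
targets, rows = build_augmented_rows(y_series, x_series, p=1)
```

The coefficient on `x_t` in a regression on these rows plays the role of the cointegrating coefficient; the lead and lag terms are included to absorb correlation between the regressor innovations and the error.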
Abstract:
Introduction: A majority of Canadians lead a sedentary lifestyle, which is an important risk factor for various health problems. Recently, public health interventions have targeted active transportation to increase physical activity. Objective: The objective of this study is to quantify the direction and size of the association between the health status reported by Montreal adults and their use of utilitarian walking and cycling. Method: The sample comprises 4503 residents of the Island of Montreal, aged 18 and over, who responded to a telephone survey on physical activity and active transportation. Multiple logistic regression analyses were used to examine the association between self-reported health status and cycling (N=4386) and between self-reported health status and utilitarian walking (N=4350). Results: People who perceive their health as good or as fair/poor are less likely to practise utilitarian walking (OR = 0.740; p < 0.05 and OR = 0.552; p < 0.01) than those reporting excellent health, whereas this association is not significant for utilitarian cycling in our study. Conclusion: Although not all the results are statistically significant, the probability of using active transportation appears lower among adults reporting poorer health than among adults reporting excellent health.
Abstract:
Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event, and correlated lifetimes, when an individual is followed for the occurrence of two or more types of events or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models. The well-known Cox proportional hazards model and its variations, which use the marginal hazard functions and are employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2, we introduced a bivariate proportional hazards model using the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations for studying the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used for the estimation of the parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a concept superior to the hazard function. Motivated by this, in Chapter 4, we considered a new semi-parametric model, the bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap times of recurrent events.
The counting process approach is used for the inference procedures for the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of the models developed in the previous chapters were studied. The proposed models were applied to various real-life situations.
Abstract:
We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik's feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exists a dual representation of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique, and this operation is equivalent to sparsifying a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit De-Noising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.
Abstract:
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples -- in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.
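As an illustration of the regression problem mentioned in this abstract, a regularization network with a squared loss reduces to solving a linear system in the kernel matrix. The sketch below assumes a one-dimensional input, a Gaussian kernel, and the system (K + lam * I) c = y (some formulations scale lam by the number of samples); the function names and parameter values are hypothetical and not taken from the paper.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_regularization_network(xs, ys, sigma, lam):
    """Return coefficients c solving (K + lam * I) c = y."""
    n = len(xs)
    k = [[gaussian_kernel(xi, xj, sigma) for xj in xs] for xi in xs]
    for i in range(n):
        k[i][i] += lam
    return solve_linear(k, ys)

def predict(x, xs, coeffs, sigma):
    """Evaluate f(x) = sum_i c_i K(x, x_i)."""
    return sum(c * gaussian_kernel(x, xi, sigma) for c, xi in zip(coeffs, xs))

# Made-up sparse data for illustration.
train_x = [0.0, 1.0, 2.0]
train_y = [0.0, 1.0, 4.0]
coeffs = fit_regularization_network(train_x, train_y, sigma=1.0, lam=0.1)
estimate = predict(1.5, train_x, coeffs, sigma=1.0)
```

With lam set to 0 the fitted function interpolates the training points; larger values trade fit for smoothness. Support vector regression keeps the same kernel expansion but replaces the squared loss, which generally leaves many coefficients at zero.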
Abstract:
This paper presents a computation of the $V_\gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $\epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the $V_\gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_\epsilon$ or general $L_p$ loss functions. This paper presents a novel proof of this result also for the case that a bias is added to the functions in the RKHS.
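For reference, the two families of loss functions named in this abstract can be written out directly. This is a minimal sketch using the standard definitions, with the $\epsilon$-insensitive loss taken as max(0, |y - f(x)| - epsilon) and the $L_p$ loss as |y - f(x)|^p; the function names are hypothetical.

```python
def eps_insensitive_loss(y, f, eps):
    """Zero inside the tube |y - f| <= eps, linear outside it."""
    return max(0.0, abs(y - f) - eps)

def lp_loss(y, f, p):
    """General L_p loss |y - f|^p (p = 2 gives the squared loss)."""
    return abs(y - f) ** p

# Errors smaller than eps are not penalized at all.
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside_tube = eps_insensitive_loss(2.0, 1.0, eps=0.25)
squared = lp_loss(3.0, 1.0, p=2)
```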
Abstract:
Objective: The objective of this study was to determine the relationship between leisure-time physical activity (PA) and self-perceived health status in Colombia. Methods: From the data of a complex sample, 14601 records were obtained for subjects aged 18 to 64 years in Colombia. Logistic regression models were fitted for self-perceived health. Results: The prevalence of leisure-time PA was 5.8% in women and 13% in men (p < 0.001), and self-reported health showed that 27.7% of women and 19.7% of men rate their health as fair or poor (p < 0.001). Older age groups, lower educational level, affiliation to the social security system and rural area of residence were found to be associated with poor self-reported health. Women with low levels of leisure-time PA had an OR of 1.92 (95% CI 1.19-3.10) of perceiving their health as poor compared with women with high leisure-time PA. The same evidence was not found in men. Discussion: The influence of a vigorous level of leisure-time PA on self-perceived health status among women is one of the main findings. These results can guide public policies aimed at promoting PA and guaranteeing the population's access to education and affiliation to a health system.
Abstract:
This paper presents an efficient construction algorithm for obtaining sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability. Computational efficiency of the density construction is ensured using an orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. A local regularization method is incorporated naturally into the density construction process to further enforce sparsity. An additional advantage of the proposed algorithm is that it is fully automatic and the user is not required to specify any criterion to terminate the density construction procedure. This is in contrast to an existing state-of-the-art kernel density estimation method using the support vector machine (SVM), where the user is required to specify some critical algorithm parameter. Several examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with accuracy comparable to that of the full-sample optimized Parzen window density estimate. Our experimental results also demonstrate that the proposed algorithm compares favorably with the SVM method, in terms of both test accuracy and sparsity, for constructing kernel density estimates.
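The full-sample Parzen window estimate used as the accuracy baseline in this abstract can be sketched as an equally weighted mixture of kernels, one per sample; a sparse estimate has the same form but with most weights equal to zero. The sketch assumes one-dimensional data and a Gaussian kernel with a fixed width `h`, and it does not implement the orthogonal forward regression or leave-one-out selection described by the authors. All names and values are hypothetical.

```python
import math

def gaussian_kernel(x, center, h):
    """Gaussian density with mean `center` and standard deviation `h`."""
    z = (x - center) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def kernel_density(x, centers, weights, h):
    """Weighted kernel mixture; weights should be non-negative and sum to 1."""
    return sum(w * gaussian_kernel(x, c, h) for c, w in zip(centers, weights))

def parzen_density(x, samples, h):
    """Parzen window estimate: one kernel per sample, each with weight 1/N."""
    n = len(samples)
    return kernel_density(x, samples, [1.0 / n] * n, h)

# Made-up samples for illustration.
samples = [-1.2, -0.7, 0.1, 0.4, 1.9]
full_estimate = parzen_density(0.0, samples, h=0.5)
# A sparse estimate of the same form keeps only a few centers.
sparse_estimate = kernel_density(0.0, [-0.9, 0.3], [0.4, 0.6], h=0.5)
```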