991 results for vector auto regression


Relevance:

40.00%

Publisher:

Abstract:

Background: The residue-wise contact order (RWCO) describes the sequence separations between each residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurate prediction of RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could provide deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. Using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we predicted the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, incorporating global features such as molecular weight and amino acid composition further improved prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, incorporating the secondary structure predicted by PSIPRED significantly improved prediction performance and yielded the best accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to other existing methods. Conclusion: The SVR method shows prediction performance competitive with, or at least comparable to, previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of SVR in this study reinforces the view that support vector regression is a powerful tool for extracting protein sequence-structure relationships and for estimating protein structural profiles from amino acid sequences.
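
The abstract above scores predictions with the Pearson correlation coefficient (CC) and the root mean square error (RMSE). The following minimal sketch shows how these two metrics are computed from per-residue predicted and observed RWCO values. The function names and numbers are illustrative only and are not taken from the study.

```python
import math

def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mp = sum(pred) / n
    mo = sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    n = len(pred)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)

# Hypothetical per-residue RWCO values (not data from the study).
observed = [0.2, 1.5, -0.3, 0.8, 2.1]
predicted = [0.4, 1.1, 0.0, 0.9, 1.7]
print(round(pearson_cc(predicted, observed), 3), round(rmse(predicted, observed), 3))
```

The two metrics measure different things. CC rewards getting the shape of the profile right, while RMSE penalizes absolute deviations. This is why the study reports both.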

Relevance:

30.00%

Publisher:

Abstract:

Universidade Estadual de Campinas. Faculdade de Educação Física

Relevance:

30.00%

Publisher:

Abstract:

Every year, autochthonous cases of Plasmodium vivax malaria occur in low-endemicity areas of Vale do Ribeira in the south-eastern part of the Atlantic Forest, state of São Paulo, where Anopheles cruzii and Anopheles bellator are considered the primary vectors. However, other species in the subgenus Nyssorhynchus of Anopheles (e.g., Anopheles marajoara) are abundant and may participate in the dynamics of malarial transmission in that region. The objectives of the present study were to assess the spatial distribution of An. cruzii, An. bellator and An. marajoara and to associate the presence of these species with malaria cases in the municipalities of the Vale do Ribeira. Potential habitat suitability modelling was applied both to determine the spatial distribution of An. cruzii, An. bellator and An. marajoara and to estimate the density of each species. Poisson regression was used to associate malaria cases with estimated vector densities. The results showed that An. cruzii was correlated with the forested slopes of the Serra do Mar, An. bellator with the coastal plain and An. marajoara with deforested areas. Moreover, both An. marajoara and An. cruzii were positively associated with malaria cases. Given that An. marajoara has been shown to be a primary vector of human Plasmodium in rural areas of the state of Amapá, more attention should be given to this species in the deforested areas of the Atlantic Forest, where it might act as a secondary vector.
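
The Poisson regression step can be sketched as below. This is a minimal illustration under stated assumptions: a single covariate (an estimated vector density per municipality) and no exposure offset. The study's actual covariates, offsets and software are not given in the abstract. The function `poisson_regression` and the data are hypothetical. The fit maximizes the Poisson log-likelihood with Newton-Raphson on the log link.

```python
import math

def poisson_regression(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson (maximum likelihood).

    Hypothetical helper: one covariate, log link, no offset.
    """
    n = len(x)
    b0 = math.log(sum(y) / n)  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information: X^T diag(mu) X
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical vector densities and malaria case counts per municipality.
density = [0.1, 0.4, 0.8, 1.2, 2.0]
cases = [0, 1, 1, 3, 6]
b0, b1 = poisson_regression(density, cases)
print(b0, b1, math.exp(b1))
```

Under this model, `math.exp(b1)` is the multiplicative change in the expected number of cases per unit increase in vector density. A positive `b1` therefore corresponds to a positive association of the kind reported for An. marajoara and An. cruzii.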

Relevance:

30.00%

Publisher:

Abstract:

Dissertation submitted in fulfilment of the requirements for the degree of Master in Political Science and International Relations

Relevance:

30.00%

Publisher:

Abstract:

ABSTRACT - Objectives: Every year, about 1.3 million people die worldwide as a result of road traffic accidents. In addition, more than 20 million people suffer minor or serious injuries in road traffic accidents that result in temporary or permanent disability. Road traffic accidents are therefore considered a serious public health problem, with high costs to societies, affecting the health of populations and the economy of each country. This study aimed to describe and characterize light-vehicle drivers residing in mainland Portugal, covering sociodemographic characteristics, driving experience, and questions on attitudes, opinions and behaviors. It also sought to analyze the association between self-reported opinions, attitudes and behaviors and the occurrence of a road traffic accident in the previous three years, in order to build a final model predicting the risk of being involved in a road traffic accident. Method: An analytical, observational, cross-sectional study was carried out, based on a questionnaire translated into Portuguese and originating from the European SARTRE 4 project. The target population was all light-vehicle drivers holding a driving licence and residing in mainland Portugal, with a sample of the same size as that defined in the European SARTRE 4 study (600 light-vehicle drivers). From the 52 questions, principal component analysis (PCA) was used to select potentially independent and complementary variables for the opinion, attitude and behavior components. In addition to the usual descriptive measures, binary logistic regression was used to analyze associations and to obtain a model estimating the probability of a road accident as a function of the selected variables on self-reported opinions, attitudes and behaviors.
Results: Of the 612 drivers surveyed, 62.7% (383) reported not having had any road traffic accident in the previous three years, while 37.3% (228) reported having been involved in at least one road accident with property damage or injuries in the same period. In general, the typical driver who reported an accident in the previous three years was a man over 65 years of age, with only the first cycle of basic education, widowed and childless, not employed, and living in an urban area. Drivers living in a suburban area had 5.368 times higher odds of a road traffic accident than drivers living in a rural area (95% CI: 2.344-12.297; p<0.001). Drivers who had been checked for alcohol only once while driving in the previous three years had 3.009 times higher odds of a road traffic accident than drivers who had never been checked by the police (95% CI: 1.949-4.647; p<0.001). Drivers who reported very frequently stopping to sleep when they feel tired while driving had 81% lower odds of a road traffic accident than drivers who never do so (95% CI: 0.058-0.620; p=0.006). Drivers who, when tired, rarely drink a coffee or energy drink had 4.829 times higher odds of a road traffic accident than drivers who reported always doing so (95% CI: 1.807-12.903; p=0.002). Conclusions: The results for behavioral factors agree with most of the risk factors for road traffic accidents reported in the literature. Nevertheless, new associations were identified between accident risk and self-reported opinions and attitudes, which could be explored further in studies with larger populations.
This work reinforces the urgent need for new intervention strategies, mainly in the behavioral component and aimed at risk groups, while maintaining existing ones.
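
The effect sizes above come from binary logistic regression. There, a coefficient β gives an odds ratio OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). This sketch assumes the reported intervals are Wald intervals, which the abstract does not state. It shows both directions of the calculation. The helper names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal

def odds_ratio_ci(beta, se, z=Z95):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_ci(lower, upper, z=Z95):
    """Recover (beta, se) from a reported Wald CI for an odds ratio."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Suburban vs. rural residence, reported 95% CI 2.344-12.297.
beta, se = coef_from_ci(2.344, 12.297)
print(round(math.exp(beta), 3), round(se, 3))
```

As a consistency check, each reported OR is approximately the geometric mean of its CI bounds. For the "stops to sleep" item, sqrt(0.058 × 0.620) ≈ 0.19, and 1 - 0.19 = 0.81 is the "81% lower odds" figure.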

Relevance:

30.00%

Publisher:

Abstract:

ABSTRACT - Introduction: Oral health is an essential component of general health and individual well-being. Oral health problems are known to affect predominantly people of lower socioeconomic status, highlighting the influence of the social determinants of health on the oral health of populations. The objectives of this study are to characterize daily oral hygiene routines, frequency of oral health appointments, self-assessed oral health status and perceived pain in the oral cavity among 12-year-old children in Portugal, and to analyze their association with sociodemographic factors. Methods: An observational, cross-sectional, analytical study was carried out, covering 1309 young people and based on information collected in the III National Study of the Prevalence of Oral Diseases (ENPDO). In addition to the usual descriptive statistics, inferential statistics relied mainly on binary logistic regression models. Results: Of the participants, 70.6% (n=924) brush their teeth “twice or more a day”, a behavior associated with all sociodemographic variables. In the multivariate analysis, male sex (OR = 2.088; 95% CI: 1.574-2.770, relative to female), living in a predominantly rural or moderately urban area (OR = 1.800; 95% CI: 2.587; OR = 1.516; 95% CI: 1.093-2.103, relative to predominantly urban areas), the mother having only basic education (OR = 2.112; 95% CI: 1.408-3.168, relative to higher education) and the father being unemployed (OR = 1.938; 95% CI: 1.280-2.934, relative to employed) were the variables with the greatest impact on the adoption of potentially inadequate brushing behaviors (p<0.05). Most respondents (94.2%; n=1247) had already attended an oral health appointment, and 74.5% (n=860) had done so in the previous 12 months; 95.5% (n=1250) were satisfied with their oral health, and 44.5% (n=578) reported some type of pain in the oral cavity in the previous 12 months.
Conclusion: The results agree with the literature regarding associated factors. Accordingly, oral health among 12-year-olds in Portugal, in the various contexts analyzed here, can be considered satisfactory. The only relevant exception is pain, with alarming values, although of a more subjective nature. The influence of sociodemographic factors suggests that future approaches to oral health promotion should take the determinants of health into account when designing strategies at both the individual and the community level.

Relevance:

30.00%

Publisher:

Abstract:

The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming to provide the best possible generalization and predictive abilities rather than concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows environmental data to be modelled simultaneously at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means of modelling local anomalies that typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns, which is a possible limitation for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to mapping Cs137 activity from measurements taken in the Briansk region following the Chernobyl accident.
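
The abstract does not give the model's equations. One common way to let a kernel method see several spatial scales at once is a weighted sum of a short-range and a long-range Gaussian kernel. The sketch below assumes that construction and is not the paper's formulation. In the paper the mixture is learned from data; here the bandwidths and weights (`short_bw`, `long_bw`, `w_short`, `w_long`) are fixed, hypothetical parameters.

```python
import math

def rbf(a, b, bandwidth):
    """Gaussian (RBF) kernel between two points given as coordinate tuples."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))

def multiscale_kernel(a, b, short_bw, long_bw, w_short, w_long):
    """Weighted sum of a short-range and a long-range RBF kernel.

    A non-negative combination of valid kernels is itself a valid kernel,
    so the result can be used inside any kernel method, including SVR.
    """
    return w_short * rbf(a, b, short_bw) + w_long * rbf(a, b, long_bw)

def gram_matrix(points, **params):
    """Kernel (Gram) matrix for a list of points under the multi-scale kernel."""
    return [[multiscale_kernel(p, q, **params) for q in points] for p in points]

# Hypothetical sampling locations (e.g. km coordinates).
sites = [(0.0, 0.0), (0.5, 0.2), (10.0, 3.0)]
G = gram_matrix(sites, short_bw=1.0, long_bw=20.0, w_short=0.4, w_long=0.6)
for row in G:
    print([round(v, 3) for v in row])
```

Nearby sites are linked through both terms. Distant sites remain correlated only through the long-range term, which is how short-scale anomalies and large-scale trends can coexist in one model.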

Relevance:

30.00%

Publisher:

Abstract:

This paper develops methods for Stochastic Search Variable Selection (currently popular with regression and Vector Autoregressive models) for Vector Error Correction models where there are many possible restrictions on the cointegration space. We show how this allows the researcher to begin with a single unrestricted model and either do model selection or model averaging in an automatic and computationally efficient manner. We apply our methods to a large UK macroeconomic model.
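
In the standard SSVS setup, each restricted parameter β gets a two-component normal prior. A narrow "spike" N(0, τ0²) applies when the restriction holds, and a wide "slab" N(0, τ1²) applies when it does not. A Gibbs sampler then draws an inclusion indicator γ given the current β. The sketch below shows only that conditional step, computed in log space for stability. It does not implement the paper's treatment of restrictions on the cointegration space, and all names are hypothetical.

```python
import math
import random

def log_normal_pdf(x, sd):
    """Log density of N(0, sd^2) at x."""
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def inclusion_probability(beta, tau0, tau1, prior_p):
    """P(gamma = 1 | beta) under the spike-and-slab mixture prior.

    gamma = 1: beta ~ N(0, tau1^2) (slab); gamma = 0: beta ~ N(0, tau0^2) (spike).
    prior_p is the prior probability of gamma = 1 and must lie in (0, 1).
    """
    log_slab = math.log(prior_p) + log_normal_pdf(beta, tau1)
    log_spike = math.log(1.0 - prior_p) + log_normal_pdf(beta, tau0)
    d = log_slab - log_spike  # posterior log-odds of inclusion
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def draw_gamma(beta, tau0, tau1, prior_p, rng):
    """One Gibbs draw of the inclusion indicator given beta."""
    return 1 if rng.random() < inclusion_probability(beta, tau0, tau1, prior_p) else 0

rng = random.Random(42)
print(inclusion_probability(0.02, 0.01, 1.0, 0.5), draw_gamma(0.8, 0.01, 1.0, 0.5, rng))
```

Averaging the sampled γ values over the chain gives posterior inclusion probabilities. These support either picking a single model (model selection) or weighting models by posterior probability (model averaging).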

Relevance:

30.00%

Publisher:

Abstract:

Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate them using simulation studies. Our comparative analysis uses methods including generalized least squares, spatial filters, wavelet-revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some that violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection seen with the methods above. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence, extrapolating these findings to other situations should be done cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in the future.
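
Generalized least squares, one of the better-performing methods above, estimates β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y, where V is the spatial covariance of the errors. The sketch below assumes an exponential covariance sill·exp(-d/range) with known parameters. In practice these parameters are estimated, for example by maximum likelihood. The sketch fits an intercept and one slope by whitening with the Cholesky factor of V. All names are hypothetical.

```python
import math

def exp_cov(coords, sill, range_):
    """Exponential covariance matrix: sill * exp(-distance / range)."""
    return [[sill * math.exp(-math.dist(p, q) / range_) for q in coords] for p in coords]

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def gls_line(x, y, coords, sill=1.0, range_=1.0):
    """GLS estimate of (b0, b1) in y = b0 + b1*x + e with Cov(e) = exp_cov(...)."""
    L = cholesky(exp_cov(coords, sill, range_))
    # Premultiplying every column by L^{-1} turns GLS into ordinary least squares.
    c = forward_sub(L, [1.0] * len(x))  # whitened intercept column
    xw = forward_sub(L, x)
    yw = forward_sub(L, y)
    s_cc = sum(v * v for v in c)
    s_cx = sum(u * v for u, v in zip(c, xw))
    s_xx = sum(v * v for v in xw)
    s_cy = sum(u * v for u, v in zip(c, yw))
    s_xy = sum(u * v for u, v in zip(xw, yw))
    det = s_cc * s_xx - s_cx ** 2
    b0 = (s_xx * s_cy - s_cx * s_xy) / det
    b1 = (s_cc * s_xy - s_cx * s_cy) / det
    return b0, b1

# Hypothetical sites, covariate and response.
sites = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
cov_x = [0.5, 1.2, 2.0, 3.1, 4.0]
resp = [2.1, 3.3, 5.2, 7.0, 9.1]
print(gls_line(cov_x, resp, sites, sill=1.0, range_=1.5))
```

With V equal to the identity matrix, this reduces to ordinary least squares. The gap between the two estimates is one way to see how much the spatial dependence matters for a given data set.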

Relevance:

30.00%

Publisher:

Abstract:

Due to advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increase every day and call for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves, among other purposes, extended climatological analyses, the assimilation of data into numerical weather prediction models, the preparation of inputs to hydrological models, and real-time monitoring and short-term weather forecasting. In this thesis, a new framework for spatial estimation is proposed that takes advantage of a class of algorithms emerging from statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping meteorological variables in complex orography. With the advent of high-resolution digital elevation models, the field of spatial prediction reached new horizons. In fact, by exploiting image processing tools along with physical heuristics, a very large number of terrain features describing the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for mapping meteorological variables because they control a considerable part of the spatial variability of meteorological fields in complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.
Kernel-based methods are employed to learn the nonlinear statistical dependence linking the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is, the wind speed measured at weather stations or the occurrence of orographic rainfall patterns extracted from sequences of radar images. Compared with low-dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables that are multidimensional in nature and rarely show spatial autocorrelation in the original space, which complicates the use of classical geostatistics. The thesis explores several challenges. First, model complexity is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Second, a multiple kernel extension of SVM is considered to select the multiscale features that explain most of the spatial variability of wind speed. Third, SVM target detection methods are implemented to describe the orographic conditions that cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performance and confidence intervals characterizing the uncertainty of predictions. The resulting maps of average wind speed find applications in renewable resource assessment and open a route to reducing the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned on the orography.

Relevance:

30.00%

Publisher:

Abstract:

Spatial data analysis, mapping and visualization are of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on empirical data (measurements). A number of state-of-the-art methods can be used for this task: deterministic interpolation; geostatistical methods, namely the family of kriging estimators (Deutsch and Journel, 1997); machine learning algorithms such as artificial neural networks (ANN) of different architectures; hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996); etc. All of these methods can be used to solve the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, often of unknown nature. That is one of the reasons why deterministic models can be inconsistent: they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as a realization of some spatial random process. To obtain an estimate with kriging, one has to model the spatial structure of the data: the spatial correlation function or (semi-)variogram. This task can be complicated if there are not enough measurements, and the variogram is sensitive to outliers and extremes. ANN is a powerful tool, but it also suffers from a number of drawbacks. ANNs of a special type, multilayer perceptrons, are often used as a detrending tool in hybrid (ANN+geostatistics) models (Kanevski and Maignan, 2004). Therefore, developing and adapting a method that is nonlinear, robust to noise in measurements, able to deal with small empirical datasets and backed by a solid mathematical foundation is of great importance. The present paper deals with such a model, based on Statistical Learning Theory (SLT): Support Vector Regression.
SLT is a general mathematical framework devoted to the problem of estimating dependencies from empirical data (Hastie et al., 2004; Vapnik, 1998). SLT models for classification, Support Vector Machines (SVM), have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al., 2002). The properties of SVM for regression, Support Vector Regression (SVR), are less studied. The first results of applying SVR to the spatial mapping of physical quantities were obtained by the authors for mapping medium porosity (Kanevski et al., 1999) and for mapping radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of the SVR model for spatial data analysis and mapping. A detailed description of SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996), and the basic equations for nonlinear modeling are given in section 2. Section 3 discusses the application of SVR to spatial data mapping in a real case study: soil pollution by the Cs137 radionuclide. Section 4 discusses the properties of the model applied to noisy data or data with outliers.
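
Much of SVR's robustness to noisy measurements comes from Vapnik's ε-insensitive loss. Residuals inside a tube of half-width ε cost nothing, and larger residuals grow only linearly, so a single outlier pulls the fit less than it would under squared error. The following minimal sketch shows the loss itself, not a full SVR solver. The function names and numbers are illustrative.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Vapnik's epsilon-insensitive loss: residuals within +/- epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def empirical_risk(ys, preds, epsilon):
    """Average epsilon-insensitive loss over a sample."""
    return sum(eps_insensitive_loss(y, p, epsilon) for y, p in zip(ys, preds)) / len(ys)

# Hypothetical measurements with one outlier (last value).
measured = [1.0, 1.2, 0.9, 5.0]
fitted = [1.05, 1.1, 1.0, 1.1]
for y, p in zip(measured, fitted):
    print(y, p, eps_insensitive_loss(y, p, 0.2), (y - p) ** 2)
```

In an SVR, ε sets how much measurement noise is treated as acceptable, and the regularization constant trades flatness of the fitted function against the total loss outside the tube.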

Relevance:

30.00%

Publisher:

Abstract:

The quantitative structure-property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled using multivariate linear regression (MLR) and an artificial neural network (ANN). Leave-one-out cross-validation and external validation were carried out to assess the predictive performance of the developed models. For the MLR method, the prediction root mean square relative error (RMSRE) was 1.77 for leave-one-out cross-validation and 1.23 for external validation. For the ANN method, the corresponding RMSRE values were 1.65 and 1.16. A quantitative relationship between the MDEV index and the Tb of PCDD/Fs was demonstrated, and both MLR and ANN are practical for modeling it. The developed MLR and ANN models can be used to predict the Tb of PCDD/Fs, and the Tb of each PCDD/F was predicted with them.
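
The validation scheme above can be sketched as follows: leave-one-out cross-validation scored by the root mean square relative error. This sketch makes two simplifications, both of them assumptions. First, it uses a single descriptor with a straight-line fit, whereas the MDEV index is a vector and the study used multivariate regression. Second, it expresses RMSRE in percent, since the abstract does not state its units. All names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def loo_predictions(x, y):
    """Predict each compound from a model fitted to all the others."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    return preds

def rmsre_percent(obs, pred):
    """Root mean square relative error, in percent (assumed convention)."""
    return 100.0 * math.sqrt(sum(((p - o) / o) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Hypothetical descriptor values and boiling points (K).
desc = [1.0, 1.8, 2.5, 3.1, 4.0, 4.6]
tb = [580.0, 610.0, 640.0, 655.0, 690.0, 712.0]
print(round(rmsre_percent(tb, loo_predictions(desc, tb)), 3))
```

Leave-one-out keeps every compound out of its own fit, so the resulting error estimates predictive rather than fitting performance. The separate external validation set checks the same thing on compounds never used for model building.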

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this study is to examine attributes that have explanatory power for the probability of default or serious overdue in secured auto loans. Another goal is to identify differences between defaulted loans and loans that had payment difficulties but survived without defaulting. The 19 independent variables used in this study reflect information available at the time of the credit decision. These variables were tested with logistic regression and a backward elimination procedure. The data include 8931 auto loans from a Finnish finance company: 1118 of the contracts were taken by company customers and 7813 by private customers. Of these loans, 130 defaulted and 584 had serious payment problems but did not default. Loan maturities ranged from one month to 60 months, and the loans ended during 2011. The LTV (loan-to-value) variable was ranked as the most significant explanatory variable because of its strong positive relationship with the probability of payment difficulties. Another important explanatory variable was the credit rating, which had a negative relationship with payment problems. Maturity and car age also performed well, both having a positive relationship with the probability of payment problems. When default and serious overdue situations were compared, the most significant differences were found in the roles of the LTV, Maturity and Gender variables.

Relevance:

30.00%

Publisher:

Abstract:

This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.
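
The "leads and lags" augmentation described above can be illustrated for a single equation. Each level regressor x_t is accompanied by the first differences Δx_{t-k}, ..., Δx_{t+k}, and observations without a full window are dropped. This sketch only builds that augmented design; regressing y on it by OLS gives the familiar single-equation dynamic OLS. It does not reproduce the paper's system estimator (feasible GLS across seemingly unrelated equations using the long-run covariance matrix). The helper `dols_design` and the data are hypothetical.

```python
def dols_design(x, y, k):
    """Rows [1, x_t, dx_{t-k}, ..., dx_{t+k}] and targets y_t for a leads-and-lags regression.

    dx_t = x_t - x_{t-1}; observations lacking a full window of k leads and k lags are dropped.
    """
    dx = [None] + [x[t] - x[t - 1] for t in range(1, len(x))]  # dx[0] is undefined
    rows, targets = [], []
    for t in range(1 + k, len(x) - k):
        window = [dx[t + j] for j in range(-k, k + 1)]
        rows.append([1.0, x[t]] + window)
        targets.append(y[t])
    return rows, targets

# Hypothetical integrated regressor (a random-walk-like path) and response.
x = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0]
y = [0.5, 2.4, 6.6, 12.3, 20.5, 30.4, 42.6]
rows, targets = dols_design(x, y, k=1)
for r, target in zip(rows, targets):
    print(r, target)
```

The coefficient on x_t in such a regression estimates the cointegrating coefficient. The difference terms absorb the correlation between the regressor innovations and the equation error, which is what restores a mixed normal limit for inference.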