14 resultados para regression analyst

em Universidad Politécnica de Madrid


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds lasso in an iterative procedure that alternatively computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenarios

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Linear regression is a technique widely used in digital signal processing. It consists on finding the linear function that better fits a given set of samples. This paper proposes different hardware architectures for the implementation of the linear regression method on FPGAs, specially targeting area restrictive systems. It saves area at the cost of constraining the lengths of the input signal to some fixed values. We have implemented the proposed scheme in an Automatic Modulation Classifier, meeting the hard real-time constraints this kind of systems have.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a methodology for reducing a straight line fitting regression problem to a Least Squares minimization one. This is accomplished through the definition of a measure on the data space that takes into account directional dependences of errors, and the use of polar descriptors for straight lines. This strategy improves the robustness by avoiding singularities and non-describable lines. The methodology is powerful enough to deal with non-normal bivariate heteroscedastic data error models, but can also supersede classical regression methods by making some particular assumptions. An implementation of the methodology for the normal bivariate case is developed and evaluated.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from Magnetoencelophalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aplicación de simulación de Monte Carlo y técnicas de Análisis de la Varianza (ANOVA) a la comparación de modelos estocásticos dinámicos para accidentes de tráfico.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fractal and multifractal are concepts that have grown increasingly popular in recent years in the soil analysis, along with the development of fractal models. One of the common steps is to calculate the slope of a linear fit commonly using least squares method. This shouldn?t be a special problem, however, in many situations using experimental data the researcher has to select the range of scales at which is going to work neglecting the rest of points to achieve the best linearity that in this type of analysis is necessary. Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. In this method we don?t have to assume that the outlier point is simply an extreme observation drawn from the tail of a normal distribution not compromising the validity of the regression results. In this work we have evaluated the capacity of robust regression to select the points in the experimental data used trying to avoid subjective choices. Based on this analysis we have developed a new work methodology that implies two basic steps: ? Evaluation of the improvement of linear fitting when consecutive points are eliminated based on R pvalue. In this way we consider the implications of reducing the number of points. ? Evaluation of the significance of slope difference between fitting with the two extremes points and fitted with the available points. We compare the results applying this methodology and the common used least squares one. The data selected for these comparisons are coming from experimental soil roughness transect and simulated based on middle point displacement method adding tendencies and noise. The results are discussed indicating the advantages and disadvantages of each methodology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. In addition to recording TOD, the cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also identified for use as the independent variables in the regression analysis. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parame- ters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a model of Bayesian network for continuous variables, where densities and conditional densities are estimated with B-spline MoPs. We use a novel approach to directly obtain conditional densities estimation using B-spline properties. In particular we implement naive Bayes and wrapper variables selection. Finally we apply our techniques to the problem of predicting neurons morphological variables from electrophysiological ones.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Los accidentes del tráfico son un fenómeno social muy relevantes y una de las principales causas de mortalidad en los países desarrollados. Para entender este fenómeno complejo se aplican modelos econométricos sofisticados tanto en la literatura académica como por las administraciones públicas. Esta tesis está dedicada al análisis de modelos macroscópicos para los accidentes del tráfico en España. El objetivo de esta tesis se puede dividir en dos bloques: a. Obtener una mejor comprensión del fenómeno de accidentes de trafico mediante la aplicación y comparación de dos modelos macroscópicos utilizados frecuentemente en este área: DRAG y UCM, con la aplicación a los accidentes con implicación de furgonetas en España durante el período 2000-2009. Los análisis se llevaron a cabo con enfoque frecuencista y mediante los programas TRIO, SAS y TRAMO/SEATS. b. La aplicación de modelos y la selección de las variables más relevantes, son temas actuales de investigación y en esta tesis se ha desarrollado y aplicado una metodología que pretende mejorar, mediante herramientas teóricas y prácticas, el entendimiento de selección y comparación de los modelos macroscópicos. Se han desarrollado metodologías tanto para selección como para comparación de modelos. La metodología de selección de modelos se ha aplicado a los accidentes mortales ocurridos en la red viaria en el período 2000-2011, y la propuesta metodológica de comparación de modelos macroscópicos se ha aplicado a la frecuencia y la severidad de los accidentes con implicación de furgonetas en el período 2000-2009. Como resultado de los desarrollos anteriores se resaltan las siguientes contribuciones: a. Profundización de los modelos a través de interpretación de las variables respuesta y poder de predicción de los modelos. El conocimiento sobre el comportamiento de los accidentes con implicación de furgonetas se ha ampliado en este proceso. bl. Desarrollo de una metodología para selección de variables relevantes para la explicación de la ocurrencia de accidentes de tráfico. Teniendo en cuenta los resultados de a) la propuesta metodológica se basa en los modelos DRAG, cuyos parámetros se han estimado con enfoque bayesiano y se han aplicado a los datos de accidentes mortales entre los años 2000-2011 en España. Esta metodología novedosa y original se ha comparado con modelos de regresión dinámica (DR), que son los modelos más comunes para el trabajo con procesos estocásticos. Los resultados son comparables, y con la nueva propuesta se realiza una aportación metodológica que optimiza el proceso de selección de modelos, con escaso coste computacional. b2. En la tesis se ha diseñado una metodología de comparación teórica entre los modelos competidores mediante la aplicación conjunta de simulación Monte Cario, diseño de experimentos y análisis de la varianza ANOVA. Los modelos competidores tienen diferentes estructuras, que afectan a la estimación de efectos de las variables explicativas. Teniendo en cuenta el estudio desarrollado en bl) este desarrollo tiene el propósito de determinar como interpretar la componente de tendencia estocástica que un modelo UCM modela explícitamente, a través de un modelo DRAG, que no tiene un método específico para modelar este elemento. Los resultados de este estudio son importantes para ver si la serie necesita ser diferenciada antes de modelar. b3. Se han desarrollado nuevos algoritmos para realizar los ejercicios metodológicos, implementados en diferentes programas como R, WinBUGS, y MATLAB. El cumplimiento de los objetivos de la tesis a través de los desarrollos antes enunciados se remarcan en las siguientes conclusiones: 1. El fenómeno de accidentes del tráfico se ha analizado mediante dos modelos macroscópicos. Los efectos de los factores de influencia son diferentes dependiendo de la metodología aplicada. Los resultados de predicción son similares aunque con ligera superioridad de la metodología DRAG. 2. La metodología para selección de variables y modelos proporciona resultados prácticos en cuanto a la explicación de los accidentes de tráfico. La predicción y la interpretación también se han mejorado mediante esta nueva metodología. 3. Se ha implementado una metodología para profundizar en el conocimiento de la relación entre las estimaciones de los efectos de dos modelos competidores como DRAG y UCM. Un aspecto muy importante en este tema es la interpretación de la tendencia mediante dos modelos diferentes de la que se ha obtenido información muy útil para los investigadores en el campo del modelado. Los resultados han proporcionado una ampliación satisfactoria del conocimiento en torno al proceso de modelado y comprensión de los accidentes con implicación de furgonetas y accidentes mortales totales en España. ABSTRACT Road accidents are a very relevant social phenomenon and one of the main causes of death in industrialized countries. Sophisticated econometric models are applied in academic work and by the administrations for a better understanding of this very complex phenomenon. This thesis is thus devoted to the analysis of macro models for road accidents with application to the Spanish case. The objectives of the thesis may be divided in two blocks: a. To achieve a better understanding of the road accident phenomenon by means of the application and comparison of two of the most frequently used macro modelings: DRAG (demand for road use, accidents and their gravity) and UCM (unobserved components model); the application was made to van involved accident data in Spain in the period 2000-2009. The analysis has been carried out within the frequentist framework and using available state of the art software, TRIO, SAS and TRAMO/SEATS. b. Concern on the application of the models and on the relevant input variables to be included in the model has driven the research to try to improve, by theoretical and practical means, the understanding on methodological choice and model selection procedures. The theoretical developments have been applied to fatal accidents during the period 2000-2011 and van-involved road accidents in 2000-2009. This has resulted in the following contributions: a. Insight on the models has been gained through interpretation of the effect of the input variables on the response and prediction accuracy of both models. The behavior of van-involved road accidents has been explained during this process. b1. Development of an input variable selection procedure, which is crucial for an efficient choice of the inputs. Following the results of a) the procedure uses the DRAG-like model. The estimation is carried out within the Bayesian framework. The procedure has been applied for the total road accident data in Spain in the period 2000-2011. The results of the model selection procedure are compared and validated through a dynamic regression model given that the original data has a stochastic trend. b2. A methodology for theoretical comparison between the two models through Monte Carlo simulation, computer experiment design and ANOVA. The models have a different structure and this affects the estimation of the effects of the input variables. The comparison is thus carried out in terms of the effect of the input variables on the response, which is in general different, and should be related. Considering the results of the study carried out in b1) this study tries to find out how a stochastic time trend will be captured in DRAG model, since there is no specific trend component in DRAG. Given the results of b1) the findings of this study are crucial in order to see if the estimation of data with stochastic component through DRAG will be valid or whether the data need a certain adjustment (typically differencing) prior to the estimation. The model comparison methodology was applied to the UCM and DRAG models, considering that, as mentioned above, the UCM has a specific trend term while DRAG does not. b3. New algorithms were developed for carrying out the methodological exercises. For this purpose different softwares, R, WinBUGs and MATLAB were used. These objectives and contributions have been resulted in the following findings: 1. The road accident phenomenon has been analyzed by means of two macro models: The effects of the influential input variables may be estimated through the models, but it has been observed that the estimates vary from one model to the other, although prediction accuracy is similar, with a slight superiority of the DRAG methodology. 2. The variable selection methodology provides very practical results, as far as the explanation of road accidents is concerned. Prediction accuracy and interpretability have been improved by means of a more efficient input variable and model selection procedure. 3. Insight has been gained on the relationship between the estimates of the effects using the two models. A very relevant issue here is the role of trend in both models, relevant recommendations for the analyst have resulted from here. The results have provided a very satisfactory insight into both modeling aspects and the understanding of both van-involved and total fatal accidents behavior in Spain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work proposes an automatic methodology for modeling complex systems. Our methodology is based on the combination of Grammatical Evolution and classical regression to obtain an optimal set of features that take part of a linear and convex model. This technique provides both Feature Engineering and Symbolic Regression in order to infer accurate models with no effort or designer's expertise requirements. As advanced Cloud services are becoming mainstream, the contribution of data centers in the overall power consumption of modern cities is growing dramatically. These facilities consume from 10 to 100 times more power per square foot than typical office buildings. Modeling the power consumption for these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling is a complex challenge for high-end servers not yet satisfied by analytical approaches. For this case study, our methodology minimizes error in power prediction. This work has been tested using real Cloud applications resulting on an average error in power estimation of 3.98%. Our work improves the possibilities of deriving Cloud energy efficient policies in Cloud data centers being applicable to other computing environments with similar characteristics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Predicting failures in a distributed system based on previous events through logistic regression is a standard approach in literature. This technique is not reliable, though, in two situations: in the prediction of rare events, which do not appear in enough proportion for the algorithm to capture, and in environments where there are too many variables, as logistic regression tends to overfit on this situations; while manually selecting a subset of variables to create the model is error- prone. On this paper, we solve an industrial research case that presented this situation with a combination of elastic net logistic regression, a method that allows us to automatically select useful variables, a process of cross-validation on top of it and the application of a rare events prediction technique to reduce computation time. This process provides two layers of cross- validation that automatically obtain the optimal model complexity and the optimal mode l parameters values, while ensuring even rare events will be correctly predicted with a low amount of training instances. We tested this method against real industrial data, obtaining a total of 60 out of 80 possible models with a 90% average model accuracy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Social behavior is mainly based on swarm colonies, in which each individual shares its knowledge about the environment with other individuals to get optimal solutions. Such co-operative model differs from competitive models in the way that individuals die and are born by combining information of alive ones. This paper presents the particle swarm optimization with differential evolution algorithm in order to train a neural network instead the classic back propagation algorithm. The performance of a neural network for particular problems is critically dependant on the choice of the processing elements, the net architecture and the learning algorithm. This work is focused in the development of methods for the evolutionary design of artificial neural networks. This paper focuses in optimizing the topology and structure of connectivity for these networks

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The initial step in most facial age estimation systems consists of accurately aligning a model to the output of a face detector (e.g. an Active Appearance Model). This fitting process is very expensive in terms of computational resources and prone to get stuck in local minima. This makes it impractical for analysing faces in resource limited computing devices. In this paper we build a face age regressor that is able to work directly on faces cropped using a state-of-the-art face detector. Our procedure uses K nearest neighbours (K-NN) regression with a metric based on a properly tuned Fisher Linear Discriminant Analysis (LDA) projection matrix. On FG-NET we achieve a state-of-the-art Mean Absolute Error (MAE) of 5.72 years with manually aligned faces. Using face images cropped by a face detector we get a MAE of 6.87 years in the same database. Moreover, most of the algorithms presented in the literature have been evaluated on single database experiments and therefore, they report optimistically biased results. In our cross-database experiments we get a MAE of roughly 12 years, which would be the expected performance in a real world application.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of multi-output regression. This paper provides a survey on state-of-the-art multi-output regression methods, that are categorized as problem transformation and algorithm adaptation methods. In addition, we present the mostly used performance evaluation measures, publicly available data sets for multi-output regression real-world problems, as well as open-source software frameworks.