12 results for VARIABLE SELECTION

at Universidad Politécnica de Madrid


Relevance:

100.00%

Abstract:

This paper addresses the question of maximizing classifier accuracy when classifying task-related mental activity from magnetoencephalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal improves classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.

Relevance:

70.00%

Abstract:

This research proposes a generic methodology for dimensionality reduction of time-frequency representations, applied to the classification of different types of biosignals. The methodology deals directly with the highly redundant and irrelevant data contained in these representations by combining a first stage of irrelevant data removal through variable selection with a second stage of redundancy reduction using methods based on linear transformations. The study addresses two techniques that provided similar performance: the first is based on the selection of a set of the most relevant time-frequency points, whereas the second selects the most relevant frequency bands. The first technique needs fewer components, leading to a lower-dimensional feature space, but the second better captures the time-varying dynamics of the signal and therefore provides more stable performance. To evaluate the generalization capabilities of the proposed methodology, it has been applied to two types of biosignals with different kinds of non-stationary behavior: electroencephalographic and phonocardiographic biosignals. Even though these two databases contain samples with different degrees of complexity and a wide variety of characterizing patterns, the results demonstrate good accuracy for the detection of pathologies, over 98%. The results open the possibility of extrapolating the methodology to the study of other biosignals.
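The two-stage idea in this abstract (drop irrelevant variables first, then reduce redundancy with a linear transformation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the Fisher-style relevance score, the use of a single principal direction found by power iteration as the linear transformation, the function names, and the synthetic data.

```python
import random
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Relevance of one variable: squared gap between the two class means
    divided by the summed within-class variances (labels are 0 or 1)."""
    a = [x for x, y in zip(column, labels) if y == 0]
    b = [x for x, y in zip(column, labels) if y == 1]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def select_relevant(rows, labels, k):
    """Stage 1 (variable selection): indices of the k highest-scoring variables."""
    n_vars = len(rows[0])
    scores = [fisher_score([r[j] for r in rows], labels) for j in range(n_vars)]
    best = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(best)


def leading_direction(rows, iters=200):
    """Stage 2 (linear transformation): first principal direction of the
    centered data, found by power iteration on X^T X."""
    d = len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v


def two_stage_reduce(rows, labels, k):
    """Keep k relevant variables, then project them onto one component."""
    keep = select_relevant(rows, labels, k)
    selected = [[r[j] for j in keep] for r in rows]
    mu, v = leading_direction(selected)
    features = [sum((r[j] - mu[j]) * v[j] for j in range(len(v))) for r in selected]
    return keep, features


# Synthetic data (hypothetical): variables 0 and 1 are informative and
# redundant with each other; variables 2 to 5 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(100)]
rows = []
for y in labels:
    x0 = 3.0 * y + rng.gauss(0, 0.5)
    x1 = x0 + rng.gauss(0, 0.1)
    rows.append([x0, x1] + [rng.gauss(0, 1) for _ in range(4)])

keep, features = two_stage_reduce(rows, labels, k=2)
print("selected variables:", keep)
```

In this toy setting the selection stage discards the noise variables and the projection merges the two redundant variables into a single feature; a real time-frequency pipeline would use many more variables and components.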

Relevance:

70.00%

Abstract:

Using the Bayesian approach as the model selection criterion, the main purpose of this study is to establish a practical road accident model that provides better interpretation and prediction performance. For this purpose we use a structural explanatory model with an autoregressive error term. Model estimation is carried out through Bayesian inference, and the best model is selected based on goodness-of-fit measures. To cross-validate the model estimation, a further prediction analysis was carried out. The number of fatal accidents in Spain during 2000-2011 was used as the road safety measure. The results of the variable selection process show that the factors explaining fatal road accidents are mainly exposure, economic factors, and surveillance and legislative measures. The model selection shows that the impact of economic factors on fatal accidents during the period under study was higher than that of surveillance and legislative measures.

Relevance:

60.00%

Abstract:

Background: Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western world. Despite the progress made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. Methods: A genomic study of human colorectal cancer was carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks were built by means of Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive pre-processing step to ensure data quality (missing value imputation, probe quality, data smoothing and intraclass variability filtering), the final dataset comprised a total of 8,104 probes. Next, a supervised classification approach and data analysis were carried out to obtain the most relevant genes. Two of these genes are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values between 0.955 and 0.997).

Relevance:

60.00%

Abstract:

The main purpose of a gene interaction network is to map the relationships between genes that remain out of sight when a genomic study is tackled. DNA microarrays allow the expression of thousands of genes to be measured at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process by removing false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy, so the set of interactions and genes involved can vary from sparse to dense. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
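The role of bootstrap resampling in stabilizing variable selection can be sketched in a few lines. This is only an illustration under stated assumptions: a simple mean-gap filter stands in for the Bayesian classifiers of the paper, and the function names, the 0.9 consensus threshold and the synthetic data are all hypothetical. The threshold plays a role loosely analogous to the depth level mentioned in the abstract: lowering it admits more variables.

```python
import random
from collections import Counter
from statistics import mean


def relevance(column, labels):
    """Absolute gap between the two class means (a stand-in filter score)."""
    a = [x for x, y in zip(column, labels) if y == 0]
    b = [x for x, y in zip(column, labels) if y == 1]
    if not a or not b:  # a resample may, in principle, miss one class
        return 0.0
    return abs(mean(a) - mean(b))


def top_k(rows, labels, k):
    """Indices of the k variables with the highest relevance."""
    n_vars = len(rows[0])
    scores = [relevance([r[j] for r in rows], labels) for j in range(n_vars)]
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]


def bootstrap_consensus(rows, labels, k, n_boot, rng):
    """Fraction of bootstrap resamples in which each variable is selected."""
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        counts.update(top_k([rows[i] for i in idx], [labels[i] for i in idx], k))
    return {j: counts[j] / n_boot for j in range(len(rows[0]))}


# Synthetic data (hypothetical): variables 0 and 3 depend on the class.
rng = random.Random(1)
labels = [i % 2 for i in range(60)]
rows = []
for y in labels:
    row = [rng.gauss(0, 1) for _ in range(6)]
    row[0] += 3.0 * y
    row[3] += 3.0 * y
    rows.append(row)

freq = bootstrap_consensus(rows, labels, k=2, n_boot=200, rng=rng)
stable = [j for j, f in freq.items() if f >= 0.9]
print("consensus variables:", stable)
```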

Relevance:

60.00%

Abstract:

Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs), which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. ℓ1-regularization is a type of this technique with the appealing variable selection property, which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of the EDA scales logarithmically with the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization, and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired by multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships.
An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on ℓ1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
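For readers unfamiliar with the Pareto dominance relation mentioned above, the following sketch shows the standard (crisp) definition for minimization and how it yields a non-dominated set. It deliberately does not implement the interval-based dominance variant introduced in the thesis, whose definition is not given in this abstract; the function names and the candidate points are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b when every objective is minimized:
    a is no worse in all objectives and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective candidates (both objectives minimized).
candidates = [(1, 5), (2, 2), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
print(front)  # [(1, 5), (2, 2), (4, 1)]
```

Here (3, 4) and (5, 5) are both dominated by (2, 2), so only three trade-off solutions remain.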

Relevance:

60.00%

Abstract:

Electronic commerce has grown strongly in recent years, favoured especially by rising Internet penetration rates worldwide. However, not all countries are evolving in the same way: the spectrum ranges from the pioneering nations in the development of information and communication technologies, which have a high percentage of Internet users and online buyers, to fast-adopting latecomers which, despite lower access penetration, show a high rate of Internet users who buy online. Between the two extremes are countries such as Spain which, although it reached a considerable Internet penetration rate years ago, has not achieved a good rate of conversion of Internet users into buyers. Although electronic commerce has seen significant increases in recent years, its growth rates remain below those of countries with similar socio-economic characteristics. To understand the reasons that affect the adoption of electronic commerce by buyers, scientific research on the phenomenon has used different theoretical approaches. Prominent among them is the use of adoption models, which come from the literature on the adoption of information systems in organizational settings. These models rely on buyers' perceptions to determine which factors best predict purchase intention and, consequently, users' actual purchasing behaviour. Although research applying adoption models to electronic commerce has proliferated in recent years, almost all of these studies try to validate their hypotheses by analysing consumer samples treated as a single set, from which general conclusions are drawn.
However, since the origins of marketing, and especially from the second half of the 19th century onwards, it has been recognized that there are differences in consumer behaviour, which may be due to demographic, sociological or psychological characteristics. These differences translate into different needs, which can only be met with an offer adapted by sellers. Moreover, because electronic commerce has particular characteristics that distinguish it from traditional commerce (especially the lack of physical contact between buyer and product), the differences in adoption for each consumer are compounded by differences arising from the type of product purchased which, although they had been considered in the physical channel, take on special relevance in electronic commerce. In view of all this, the present work aims to study the determinants of purchase intention and actual purchase behaviour in electronic commerce by the Spanish end consumer, taking into account the type of segment to which the buyer belongs and the type of product considered. To this end, the work contains eight parts, comprising four theoretical blocks and three empirical blocks, plus the conclusions. These blocks give rise to the following eight chapters, in order of appearance: introduction, state of electronic commerce, technology adoption models, segmentation in electronic commerce, preliminary design of the empirical work, research design, analysis of results, and conclusions. The introductory chapter justifies the relevance of the research and sets out the objectives, the methodology and the phases followed in developing the work.
The justification is complemented by the second chapter, which has two main elements: first, the concept of electronic commerce is defined and a brief retrospective is given from its origins to the current situation in a global context; second, the analysis studies the evolution of electronic commerce in Spain, showing its development and present situation through its main indicators. This part not only provides the context of the research but also makes it possible to check the relevance of the sample used in this study against the Spanish profile with respect to electronic commerce. Chapters three (technology adoption models) and four (segmentation in electronic commerce) lay the theoretical foundations needed for the study. Chapter three gives a general review of the literature on technology adoption models and, in particular, of the adoption models used in electronic commerce. The review results in the construction of an adapted model based on the UTAUT (Unified Theory of Acceptance and Use of Technology) and UTAUT2 models, combined with two factors specific to electronic commerce adoption: perceived risk and perceived trust. Chapter four, in turn, reviews the customer and product segmentation methodologies used in the literature. This review yields a broad set of variables, from which nine classification variables are finally chosen as suitable both for their fit to the electronic commerce context and for their fit to the characteristics of the sample used to validate the model.
The nine variables are grouped into three sets: socio-demographic variables (gender, age, education level, income level, household size and marital status), purchasing behaviour variables (experience of buying online and frequency of buying online) and psychographic variables (motivations for buying online). The second part of chapter four reviews the criteria used in the literature to classify products in the context of electronic commerce. This review yields fifteen groups of variables that can take a total of thirty-four values, which leads to a large number of possible combinations. However, although these variables have been used in the context of electronic commerce, their influence on purchase intention or actual online purchase behaviour has not been verified in every case; for this reason, and in order to define a robust and manageable classification of product types, chapter five validates the product classification variables through a preliminary experiment with 207 samples. By selecting only those objective variables that do not depend on the consumer's personal interpretation and that define groups differing significantly in consumers' purchase intention and behaviour, a two-variable model is obtained which, in combination, gives rise to four product types: digital good, non-digital good, digital service and non-digital service.
With the adoption model and the consumer and product segmentation criteria defined, the sixth chapter develops the complete research model, formed by a set of hypotheses drawn from the literature review of the previous chapters; these hypotheses state the expected influences of the segmentation variables on the relationships of the adoption model. This model gives the research a social and fundamentally exploratory character, since in many cases no prior empirical evidence was found to support hypotheses about the influence of certain segmentation variables. Chapter six also describes the measurement instrument used in the research, consisting of 125 questions and their corresponding measurement scales, as well as the representative sample used to validate the model, made up of 817 people who are Spanish or resident in Spain. Chapter seven is the core of the empirical analysis and has two fundamental elements. First, it describes the statistical techniques applied to the data which, given the complexity of the analysis, fall into three main groups: Partial least squares (PLS): a multivariate statistical analysis tool with predictive capability, used to determine the structural relationships of the proposed models. Multigroup analysis: a set of techniques for comparing the results obtained with PLS between two or more groups derived from one or more segmentation variables. Here, five comparison methods are used, which also makes it possible to compare the performance of each method.
Determination of segments not identified a priori: for some of the segmentation variables there is no classification criterion defined a priori; instead, it is obtained by applying statistical classification techniques. Here, two main techniques are used: principal component analysis (given the large number of variables used for classification) and cluster analysis, which combines a hierarchical technique that computes the optimal number of segments with a staged technique that classifies more efficiently but requires the number of clusters to be known in advance. Applying these statistical techniques to the models that result from the different segmentation criteria, for both customers and products, leads to the analysis of a total of 128 electronic commerce adoption models and 65 multigroup comparisons, whose results and main considerations are set out throughout the chapter. To conclude, chapter eight presents the conclusions of the work in four distinct parts. First, it examines how far the objectives set at the start of the research have been achieved; next, it develops the main contributions of the work from the methodological, theoretical and practical points of view; third, it goes deeper into the conclusions derived from the empirical study, which are classified by the segmentation criteria used and combine confirmatory and exploratory results; finally, the work gathers the main limitations of the research, both theoretical and empirical, together with those aspects that could not be addressed within this study, or that arise from the results obtained, and are presented as future lines of research.
ABSTRACT Favoured by an increase in Internet penetration rates across the globe, electronic commerce has experienced rapid growth over the last few years. Nevertheless, adoption of electronic commerce has differed from one country to another. On the one hand, it has been observed that countries leading e-commerce adoption have a large percentage of Internet users as well as of online purchasers; on the other hand, other markets, despite having a low percentage of Internet users, show a high percentage of online buyers. Halfway between those two ends of the spectrum, we find countries such as Spain which, despite having moderately high Internet penetration rates and socio-economic characteristics similar to those of some of the leading countries, have failed to turn Internet users into active online buyers. Several theoretical approaches have been taken in an attempt to define the factors that influence the use of electronic commerce systems by customers. One of the better-known frameworks to characterize adoption factors is acceptance modelling theory, which is derived from research on information systems adoption in organizational environments. These models are based on individual perceptions of which factors determine purchase intention, as a means to explain users' actual purchasing behaviour. Even though research on electronic commerce adoption models has increased in volume and scope over the last years, the majority of studies validate their hypotheses by using a single sample of consumers from which they obtain general conclusions. Nevertheless, since the birth of marketing, and more specifically from the second half of the 19th century, differences in consumer behaviour owing to demographic, sociological and psychological characteristics have also been taken into account. Such differences generally translate into different needs that can only be satisfied when sellers adapt their offer to their target market.
Electronic commerce has a number of features that make it different from traditional commerce; the best example of this is the lack of physical contact between customers and products, and between customers and vendors. In addition, some differences that depend on the type of product may also play an important role in electronic commerce. In view of all the above, the present research aims to address the study of the main factors influencing purchase intention and actual purchase behaviour in electronic commerce by Spanish end-consumers, taking into consideration both the customer group to which they belong and the type of product being purchased. In order to achieve this goal, this thesis is structured in eight chapters: four theoretical sections, three empirical blocks and a final section summarizing the conclusions derived from the research. The chapters are arranged in sequence as follows: introduction, current state of electronic commerce, technology adoption models, electronic commerce segmentation, preliminary design of the empirical work, research design, data analysis and results, and conclusions. The introductory chapter offers a detailed justification of the relevance of this study in the context of e-commerce adoption research; it also sets out the objectives, methodology and research stages. The second chapter further expands and complements the introductory chapter, focusing on two elements: the concept of electronic commerce and its evolution from a general point of view, and the evolution of electronic commerce in Spain and its main indicators of adoption. This section is intended to allow the reader to understand the research context, and also to serve as a basis to justify the relevance and representativeness of the sample used in this study. Chapters three (technology acceptance models) and four (segmentation in electronic commerce) set the theoretical foundations for the study.
Chapter 3 presents a thorough literature review of technology adoption modelling, focusing on previous studies on electronic commerce acceptance. As a result of the literature review, the research framework is built upon a model based on UTAUT (Unified Theory of Acceptance and Use of Technology) and its evolution, UTAUT2, including two specific electronic commerce adoption factors: perceived risk and perceived trust. Chapter 4 deals with client and product segmentation methodologies used by experts. From the literature review, a wide range of classification variables is studied, and a shortlist of nine classification variables has been selected for inclusion in the research. The criteria for variable selection were their adequacy to electronic commerce characteristics, as well as their adequacy to the sample characteristics. The nine variables have been classified in three groups: socio-demographic (gender, age, education level, income, family size and relationship status), behavioural (experience in electronic commerce and frequency of purchase) and psychographic (online purchase motivations) variables. The second half of chapter 4 is devoted to a review of the product classification criteria in electronic commerce. The review has led to the identification of a final set of fifteen groups of variables, whose combination offered a total of thirty-four possible outputs. However, due to the lack of empirical evidence in the context of electronic commerce, further investigation of the validity of this set of product classifications was deemed necessary. For this reason, chapter 5 proposes an empirical study to test the different product classification variables with 207 samples. A selection of product classifications including only those variables that are objective, able to identify distinct groups and not dependent on consumers' point of view led to a final classification of products which consisted of two groups of variables for the final empirical study.
The combination of these two groups gave rise to four types of products: digital and non-digital goods, and digital and non-digital services. Chapter six characterizes the research –social, exploratory research– and presents the final research model and research hypotheses. The exploratory nature of the research becomes patent in instances where no prior empirical evidence on the influence of certain segmentation variables was found. Chapter six also includes the description of the measurement instrument used in the research, consisting of a total of 125 questions –and the measurement scales associated with each of them– as well as the description of the sample used for model validation (consisting of 817 Spanish residents). Chapter 7 is the core of the empirical analysis performed to validate the research model, and it is divided into two separate parts: description of the statistical techniques used for data analysis, and actual data analysis and results. The first part is structured in three different blocks: Partial Least Squares (PLS) method: a multivariate statistical method used to determine the structural relationships of models and their predictive validity. Multi-group analysis: a set of techniques that allow comparing the outcomes of PLS analysis between two or more groups, by using one or more segmentation variables. More specifically, five comparison methods were used, which additionally gives the opportunity to assess the efficiency of each method. Determination of a priori undefined segments: in some cases, classification criteria did not necessarily exist for some segmentation variables, such as customer motivations. In these cases, the application of statistical classification techniques is required. For this study, two main classification techniques were used sequentially: principal component factor analysis –in order to reduce the number of variables– and cluster analysis.
The application of the statistical methods to the models derived from the inclusion of the various segmentation criteria –for both clients and products–, led to the analysis of 128 different electronic commerce adoption models and 65 multi group comparisons. Finally, chapter 8 summarizes the conclusions from the research, divided into four parts: first, an assessment of the degree of achievement of the different research objectives is offered; then, methodological, theoretical and practical implications of the research are drawn; this is followed by a discussion on the results from the empirical study –based on the segmentation criteria for the research–; fourth, and last, the main limitations of the research –both empirical and theoretical– as well as future avenues of research are detailed.

Relevance:

60.00%

Abstract:

Traffic accidents are a highly relevant social phenomenon and one of the main causes of death in developed countries. To understand this complex phenomenon, sophisticated econometric models are applied both in the academic literature and by public administrations. This thesis is devoted to the analysis of macroscopic models for traffic accidents in Spain. Its objective can be divided into two blocks: a. To gain a better understanding of the traffic accident phenomenon by applying and comparing two macroscopic models frequently used in this area, DRAG and UCM, with application to accidents involving vans in Spain during the period 2000-2009. The analyses were carried out with a frequentist approach using the TRIO, SAS and TRAMO/SEATS programs. b. The application of models and the selection of the most relevant variables are current research topics, and this thesis develops and applies a methodology that aims to improve, through theoretical and practical tools, the understanding of the selection and comparison of macroscopic models. Methodologies have been developed for both model selection and model comparison. The model selection methodology has been applied to fatal accidents on the road network in the period 2000-2011, and the proposed methodology for comparing macroscopic models has been applied to the frequency and severity of accidents involving vans in the period 2000-2009. As a result of these developments, the following contributions stand out: a. A deeper understanding of the models through the interpretation of the response variables and the predictive power of the models. Knowledge of the behaviour of accidents involving vans has been extended in this process. b1.
Development of a methodology for selecting the variables relevant to explaining the occurrence of traffic accidents. Taking into account the results of a), the proposed methodology is based on DRAG models, whose parameters have been estimated with a Bayesian approach and applied to fatal accident data for the years 2000-2011 in Spain. This novel and original methodology has been compared with dynamic regression (DR) models, which are the most common models for working with stochastic processes. The results are comparable, and the new proposal makes a methodological contribution that optimizes the model selection process at low computational cost. b2. The thesis designs a methodology for theoretical comparison between the competing models through the joint application of Monte Carlo simulation, design of experiments and analysis of variance (ANOVA). The competing models have different structures, which affect the estimation of the effects of the explanatory variables. Taking into account the study developed in b1), this development aims to determine how to interpret the stochastic trend component, which a UCM models explicitly, through a DRAG model, which has no specific method for modeling this element. The results of this study are important for determining whether the series needs to be differenced before modeling. b3. New algorithms have been developed to carry out the methodological exercises, implemented in different programs such as R, WinBUGS and MATLAB. The fulfillment of the thesis objectives through the developments described above is reflected in the following conclusions: 1. The traffic accident phenomenon has been analyzed using two macroscopic models. The effects of the influencing factors differ depending on the methodology applied.
The prediction results are similar, although with a slight superiority of the DRAG methodology. 2. The methodology for variable and model selection provides practical results regarding the explanation of traffic accidents. Prediction and interpretation have also been improved by this new methodology. 3. A methodology has been implemented to deepen knowledge of the relationship between the effect estimates of two competing models such as DRAG and UCM. A very important aspect here is the interpretation of the trend by means of two different models, which has yielded very useful information for researchers in the field of modeling. The results have satisfactorily extended knowledge of the modeling process and the understanding of accidents involving vans and of total fatal accidents in Spain. ABSTRACT Road accidents are a very relevant social phenomenon and one of the main causes of death in industrialized countries. Sophisticated econometric models are applied in academic work and by the administrations for a better understanding of this very complex phenomenon. This thesis is thus devoted to the analysis of macro models for road accidents with application to the Spanish case. The objectives of the thesis may be divided into two blocks: a. To achieve a better understanding of the road accident phenomenon by means of the application and comparison of two of the most frequently used macro models: DRAG (demand for road use, accidents and their gravity) and UCM (unobserved components model); the application was made to van-involved accident data in Spain in the period 2000-2009. The analysis has been carried out within the frequentist framework and using available state-of-the-art software: TRIO, SAS and TRAMO/SEATS. b.
Concern about the application of the models and about the relevant input variables to be included in the model has driven the research to try to improve, by theoretical and practical means, the understanding of methodological choice and model selection procedures. The theoretical developments have been applied to fatal accidents during the period 2000-2011 and van-involved road accidents in 2000-2009. This has resulted in the following contributions: a. Insight into the models has been gained through interpretation of the effect of the input variables on the response and the prediction accuracy of both models. The behavior of van-involved road accidents has been explained during this process. b1. Development of an input variable selection procedure, which is crucial for an efficient choice of the inputs. Following the results of a), the procedure uses the DRAG-like model. The estimation is carried out within the Bayesian framework. The procedure has been applied to the total road accident data in Spain in the period 2000-2011. The results of the model selection procedure are compared and validated through a dynamic regression model, given that the original data have a stochastic trend. b2. A methodology for theoretical comparison between the two models through Monte Carlo simulation, computer experiment design and ANOVA. The models have a different structure and this affects the estimation of the effects of the input variables. The comparison is thus carried out in terms of the effect of the input variables on the response, which is in general different between the models, and the two sets of estimates should be related. Considering the results of the study carried out in b1), this study tries to find out how a stochastic time trend will be captured in the DRAG model, since there is no specific trend component in DRAG.
Given the results of b1), the findings of this study are crucial in order to see whether the estimation of data with a stochastic component through DRAG will be valid or whether the data need a certain adjustment (typically differencing) prior to the estimation. The model comparison methodology was applied to the UCM and DRAG models, considering that, as mentioned above, the UCM has a specific trend term while DRAG does not. b3. New algorithms were developed for carrying out the methodological exercises. For this purpose different software packages (R, WinBUGS and MATLAB) were used. These objectives and contributions have resulted in the following findings: 1. The road accident phenomenon has been analyzed by means of two macro models. The effects of the influential input variables may be estimated through the models, but it has been observed that the estimates vary from one model to the other, although prediction accuracy is similar, with a slight superiority of the DRAG methodology. 2. The variable selection methodology provides very practical results, as far as the explanation of road accidents is concerned. Prediction accuracy and interpretability have been improved by means of a more efficient input variable and model selection procedure. 3. Insight has been gained into the relationship between the estimates of the effects using the two models. A very relevant issue here is the role of trend in both models, from which relevant recommendations for the analyst have resulted. The results have provided a very satisfactory insight into both modeling aspects and the understanding of the behavior of both van-involved and total fatal accidents in Spain.
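
The differencing adjustment mentioned in b2 can be illustrated with a small sketch. This is not the thesis code: the function names `random_walk` and `difference` and the simulated series are assumptions made purely for illustration. It shows that first differences of a pure stochastic trend (a random walk) recover the underlying shocks, which is why differencing is the typical adjustment before fitting a model with no explicit trend term.

```python
import random


def random_walk(n, seed=0):
    """Simulate a pure stochastic trend: level_t = level_{t-1} + shock_t."""
    rng = random.Random(seed)
    level, path, shocks = 0.0, [], []
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        level += shock
        shocks.append(shock)
        path.append(level)
    return path, shocks


def difference(series):
    """First difference: d_t = y_t - y_{t-1} (one observation is lost)."""
    return [b - a for a, b in zip(series, series[1:])]


path, shocks = random_walk(50)
diffed = difference(path)
# The differenced series matches the shocks from the second period onward.
print(max(abs(d - e) for d, e in zip(diffed, shocks[1:])))
```

Real accident series would also carry regressors and seasonality, so this toy example only isolates the trend-removal step.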

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Road accidents are a very relevant issue in many countries and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. Within the Bayesian framework, TIM and 3IM (two-input and three-input model) procedures are proposed for selecting the set of explanatory variables and the response transformation parameter. The procedure also uses the DIC and pseudo-R2 goodness-of-fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autoregressive (AR) structure for the response. An initial set of 22 explanatory variables is identified. The effects of these factors on the fatal accident frequency in Spain during 2000-2012 are estimated. The dependent variable is constructed considering the stochastic trend component.
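
The abstract does not spell out the transformation, so the sketch below assumes the standard Box-Cox form, (x^lambda - 1) / lambda for lambda != 0 and log(x) for lambda = 0. The function names `box_cox` and `inv_box_cox` are illustrative, not taken from the paper.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse transform, mapping back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# lam = 1 is a plain shift, lam = 0.5 a square-root-like transform,
# and lam = 0 the logarithm.
for lam in (1.0, 0.5, 0.0):
    print(lam, box_cox(4.0, lam))
```

In a model of the kind described, each explanatory variable would be transformed with its own lambda before entering the regression; how those lambdas are estimated is not covered by this sketch.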

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a mechanism to generate virtual buildings considering designer constraints and guidelines. This mechanism is implemented as a pipeline of different Variable Neighborhood Search (VNS) optimization processes in which several subproblems are tackled: (1) room locations, (2) connectivity graph, and (3) element placement. The core VNS algorithm includes some variants to improve its performance, such as constraint handling and biased operator selection. The optimization process uses a toolkit of construction primitives implemented as "smart objects" providing basic elements such as rooms, doors, staircases and other connectors. The paper also shows experimental results of the application of different designer constraints to a wide range of buildings, from small houses to a large castle with several underground levels.
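
For readers unfamiliar with VNS, the following is a minimal sketch of the basic scheme (shake in neighborhood k, run a local search, move and reset k on improvement, otherwise widen k). It is a generic textbook-style outline on a toy integer problem, not the paper's pipeline: the toy cost function, the `make_toy_problem` helper and all parameter names are assumptions, and the constraint handling and biased operator selection mentioned above are not modeled.

```python
import random


def vns(initial, cost, shake, local_search, k_max=3, max_iters=50, seed=0):
    """Basic VNS loop over neighborhoods k = 1..k_max."""
    rng = random.Random(seed)
    best, best_cost = initial, cost(initial)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            candidate = local_search(shake(best, k, rng))
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                k = 1  # improvement: return to the smallest neighborhood
            else:
                k += 1  # no improvement: try a larger neighborhood
    return best, best_cost


def make_toy_problem(target):
    """Hypothetical toy problem: match an integer vector to `target`."""
    def cost(x):
        return sum((a - b) ** 2 for a, b in zip(x, target))

    def shake(x, k, rng):
        # Perturb k randomly chosen coordinates.
        y = list(x)
        for i in rng.sample(range(len(y)), min(k, len(y))):
            y[i] += rng.choice([-2, -1, 1, 2])
        return tuple(y)

    def local_search(x):
        # Coordinate descent with unit steps until no step improves.
        x = list(x)
        improved = True
        while improved:
            improved = False
            for i in range(len(x)):
                for step in (-1, 1):
                    y = list(x)
                    y[i] += step
                    if cost(y) < cost(x):
                        x, improved = y, True
        return tuple(x)

    return cost, shake, local_search


cost, shake, local_search = make_toy_problem(target=(3, -2, 5))
best, best_cost = vns((0, 0, 0), cost, shake, local_search)
print(best, best_cost)
```

A pipeline like the one described would chain several such searches, each with its own solution encoding, cost and neighborhood operators for room locations, connectivity and element placement.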