19 resultados para Variable-selection Problems
em Universidad Politécnica de Madrid
Resumo:
This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from Magnetoencelophalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.
Resumo:
This research proposes a generic methodology for dimensionality reduction upon time-frequency representations applied to the classification of different types of biosignals. The methodology directly deals with the highly redundant and irrelevant data contained in these representations, combining a first stage of irrelevant data removal by variable selection, with a second stage of redundancy reduction using methods based on linear transformations. The study addresses two techniques that provided a similar performance: the first one is based on the selection of a set of the most relevant time?frequency points, whereas the second one selects the most relevant frequency bands. The first methodology needs a lower quantity of components, leading to a lower feature space; but the second improves the capture of the time-varying dynamics of the signal, and therefore provides a more stable performance. In order to evaluate the generalization capabilities of the methodology proposed it has been applied to two types of biosignals with different kinds of non-stationary behaviors: electroencephalographic and phonocardiographic biosignals. Even when these two databases contain samples with different degrees of complexity and a wide variety of characterizing patterns, the results demonstrate a good accuracy for the detection of pathologies, over 98%.The results open the possibility to extrapolate the methodology to the study of other biosignals.
Resumo:
Probabilistic modeling is the de�ning characteristic of estimation of distribution algorithms (EDAs) which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. `1-regularization is a type of this technique with the appealing variable selection property which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from di�erent aspects when used for optimization in a high-dimensional setting, where the population size of EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve signi�cantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random �eld model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, speci�cally models inspired from multi-dimensional Bayesian network classi�ers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objectivevariable and objective-objective relationships. An extensive experimental study shows the e�ectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely �-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on `1-regularization for multi-objective feature subset selection in classi�cation, where six di�erent measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two di�erent Bayesian classi�ers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
Resumo:
Using the Bayesian approach as the model selection criteria, the main purpose in this study is to establish a practical road accident model that can provide a better interpretation and prediction performance. For this purpose we are using a structural explanatory model with autoregressive error term. The model estimation is carried out through Bayesian inference and the best model is selected based on the goodness of fit measures. To cross validate the model estimation further prediction analysis were done. As the road safety measures the number of fatal accidents in Spain, during 2000-2011 were employed. The results of the variable selection process show that the factors explaining fatal road accidents are mainly exposure, economic factors, and surveillance and legislative measures. The model selection shows that the impact of economic factors on fatal accidents during the period under study has been higher compared to surveillance and legislative measures.
Resumo:
Background Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western World. Despite progresses made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in the western countries. Methods A genomic study of human colorectal cancer has been carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling were built. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results After an exhaustive process of pre-processing to ensure data quality--lost values imputation, probes quality, data smoothing and intraclass variability filtering--the final dataset comprised a total of 8, 104 probes. Next, a supervised classification approach and data analysis was carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values in the range of 0.997 and 0.955)
Resumo:
The main purpose of a gene interaction network is to map the relationships of the genes that are out of sight when a genomic study is tackled. DNA microarrays allow the measure of gene expression of thousands of genes at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process removing the false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy so the set of interactions and genes involved can vary from a sparse to a dense set. Experimental results show how these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypothesis for future studies
Resumo:
Background:Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western World. Despite progresses made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in the western countries. Methods: A genomic study of human colorectal cancer has been carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling were built. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive process of pre-processing to ensure data quality--lost values imputation, probes quality, data smoothing and intraclass variability filtering--the final dataset comprised a total of 8, 104 probes. Next, a supervised classification approach and data analysis was carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values in the range of 0.997 and 0.955).
Resumo:
El comercio electrónico ha experimentado un fuerte crecimiento en los últimos años, favorecido especialmente por el aumento de las tasas de penetración de Internet en todo el mundo. Sin embargo, no todos los países están evolucionando de la misma manera, con un espectro que va desde las naciones pioneras en desarrollo de tecnologías de la información y comunicaciones, que cuentan con una elevado porcentaje de internautas y de compradores online, hasta las rezagadas de rápida adopción en las que, pese a contar con una menor penetración de acceso, presentan una alta tasa de internautas compradores. Entre ambos extremos se encuentran países como España que, aunque alcanzó hace años una tasa considerable de penetración de usuarios de Internet, no ha conseguido una buena tasa de transformación de internautas en compradores. Pese a que el comercio electrónico ha experimentado importantes aumentos en los últimos años, sus tasas de crecimiento siguen estando por debajo de países con características socio-económicas similares. Para intentar conocer las razones que afectan a la adopción del comercio por parte de los compradores, la investigación científica del fenómeno ha empleado diferentes enfoques teóricos. De entre todos ellos ha destacado el uso de los modelos de adopción, proveniente de la literatura de adopción de sistemas de información en entornos organizativos. Estos modelos se basan en las percepciones de los compradores para determinar qué factores pueden predecir mejor la intención de compra y, en consecuencia, la conducta real de compra de los usuarios. Pese a que en los últimos años han proliferado los trabajos de investigación que aplican los modelos de adopción al comercio electrónico, casi todos tratan de validar sus hipótesis mediante el análisis de muestras de consumidores tratadas como un único conjunto, y del que se obtienen conclusiones generales. Sin embargo, desde el origen del marketing, y en especial a partir de la segunda mitad del siglo XIX, se considera que existen diferencias en el comportamiento de los consumidores, que pueden ser debidas a características demográficas, sociológicas o psicológicas. Estas diferencias se traducen en necesidades distintas, que sólo podrán ser satisfechas con una oferta adaptada por parte de los vendedores. Además, por contar el comercio electrónico con unas características particulares que lo diferencian del comercio tradicional –especialmente por la falta de contacto físico entre el comprador y el producto– a las diferencias en la adopción para cada consumidor se le añaden las diferencias derivadas del tipo de producto adquirido, que si bien habían sido consideradas en el canal físico, en el comercio electrónico cobran especial relevancia. A la vista de todo ello, el presente trabajo pretende abordar el estudio de los factores determinantes de la intención de compra y la conducta real de compra en comercio electrónico por parte del consumidor final español, teniendo en cuenta el tipo de segmento al que pertenezca dicho comprador y el tipo de producto considerado. Para ello, el trabajo contiene ocho apartados entre los que se encuentran cuatro bloques teóricos y tres bloques empíricos, además de las conclusiones. Estos bloques dan lugar a los siguientes ocho capítulos por orden de aparición en el trabajo: introducción, situación del comercio electrónico, modelos de adopción de tecnología, segmentación en comercio electrónico, diseño previo del trabajo empírico, diseño de la investigación, análisis de los resultados y conclusiones. El capítulo introductorio justifica la relevancia de la investigación, además de fijar los objetivos, la metodología y las fases seguidas para el desarrollo del trabajo. La justificación se complementa con el segundo capítulo, que cuenta con dos elementos principales: en primer lugar se define el concepto de comercio electrónico y se hace una breve retrospectiva desde sus orígenes hasta la situación actual en un contexto global; en segundo lugar, el análisis estudia la evolución del comercio electrónico en España, mostrando su desarrollo y situación presente a partir de sus principales indicadores. Este apartado no sólo permite conocer el contexto de la investigación, sino que además permite contrastar la relevancia de la muestra utilizada en el presente estudio con el perfil español respecto al comercio electrónico. Los capítulos tercero –modelos de adopción de tecnologías– y cuarto –segmentación en comercio electrónico– sientan las bases teóricas necesarias para abordar el estudio. En el capítulo tres se hace una revisión general de la literatura de modelos de adopción de tecnología y, en particular, de los modelos de adopción empleados en el ámbito del comercio electrónico. El resultado de dicha revisión deriva en la construcción de un modelo adaptado basado en los modelos UTAUT (Unified Theory of Acceptance and Use of Technology, Teoría unificada de la aceptación y el uso de la tecnología) y UTAUT2, combinado con dos factores específicos de adopción del comercio electrónico: el riesgo percibido y la confianza percibida. Por su parte, en el capítulo cuatro se revisan las metodologías de segmentación de clientes y productos empleadas en la literatura. De dicha revisión se obtienen un amplio conjunto de variables de las que finalmente se escogen nueve variables de clasificación que se consideran adecuadas tanto por su adaptación al contexto del comercio electrónico como por su adecuación a las características de la muestra empleada para validar el modelo. Las nueve variables se agrupan en tres conjuntos: variables de tipo socio-demográfico –género, edad, nivel de estudios, nivel de ingresos, tamaño de la unidad familiar y estado civil–, de comportamiento de compra – experiencia de compra por Internet y frecuencia de compra por Internet– y de tipo psicográfico –motivaciones de compra por Internet. La segunda parte del capítulo cuatro se dedica a la revisión de los criterios empleados en la literatura para la clasificación de los productos en el contexto del comercio electrónico. De dicha revisión se obtienen quince grupos de variables que pueden tomar un total de treinta y cuatro valores, lo que deriva en un elevado número de combinaciones posibles. Sin embargo, pese a haber sido utilizados en el contexto del comercio electrónico, no en todos los casos se ha comprobado la influencia de dichas variables respecto a la intención de compra o la conducta real de compra por Internet; por este motivo, y con el objetivo de definir una clasificación robusta y abordable de tipos de productos, en el capitulo cinco se lleva a cabo una validación de las variables de clasificación de productos mediante un experimento previo con 207 muestras. Seleccionando sólo aquellas variables objetivas que no dependan de la interpretación personal del consumidores y que determinen grupos significativamente distintos respecto a la intención y conducta de compra de los consumidores, se obtiene un modelo de dos variables que combinadas dan lugar a cuatro tipos de productos: bien digital, bien no digital, servicio digital y servicio no digital. Definidos el modelo de adopción y los criterios de segmentación de consumidores y productos, en el sexto capítulo se desarrolla el modelo completo de investigación formado por un conjunto de hipótesis obtenidas de la revisión de la literatura de los capítulos anteriores, en las que se definen las hipótesis de investigación con respecto a las influencias esperadas de las variables de segmentación sobre las relaciones del modelo de adopción. Este modelo confiere a la investigación un carácter social y de tipo fundamentalmente exploratorio, en el que en muchos casos ni siquiera se han encontrado evidencias empíricas previas que permitan el enunciado de hipótesis sobre la influencia de determinadas variables de segmentación. El capítulo seis contiene además la descripción del instrumento de medida empleado en la investigación, conformado por un total de 125 preguntas y sus correspondientes escalas de medida, así como la descripción de la muestra representativa empleada en la validación del modelo, compuesta por un grupo de 817 personas españolas o residentes en España. El capítulo siete constituye el núcleo del análisis empírico del trabajo de investigación, que se compone de dos elementos fundamentales. Primeramente se describen las técnicas estadísticas aplicadas para el estudio de los datos que, dada la complejidad del análisis, se dividen en tres grupos fundamentales: Método de mínimos cuadrados parciales (PLS, Partial Least Squares): herramienta estadística de análisis multivariante con capacidad de análisis predictivo que se emplea en la determinación de las relaciones estructurales de los modelos propuestos. Análisis multigrupo: conjunto de técnicas que permiten comparar los resultados obtenidos con el método PLS entre dos o más grupos derivados del uso de una o más variables de segmentación. En este caso se emplean cinco métodos de comparación, lo que permite asimismo comparar los rendimientos de cada uno de los métodos. Determinación de segmentos no identificados a priori: en el caso de algunas de las variables de segmentación no existe un criterio de clasificación definido a priori, sino que se obtiene a partir de la aplicación de técnicas estadísticas de clasificación. En este caso se emplean dos técnicas fundamentales: análisis de componentes principales –dado el elevado número de variables empleadas para la clasificación– y análisis clúster –del que se combina una técnica jerárquica que calcula el número óptimo de segmentos, con una técnica por etapas que es más eficiente en la clasificación, pero exige conocer el número de clústeres a priori. La aplicación de dichas técnicas estadísticas sobre los modelos resultantes de considerar los distintos criterios de segmentación, tanto de clientes como de productos, da lugar al análisis de un total de 128 modelos de adopción de comercio electrónico y 65 comparaciones multigrupo, cuyos resultados y principales consideraciones son elaboradas a lo largo del capítulo. Para concluir, el capítulo ocho recoge las conclusiones del trabajo divididas en cuatro partes diferenciadas. En primer lugar se examina el grado de alcance de los objetivos planteados al inicio de la investigación; después se desarrollan las principales contribuciones que este trabajo aporta tanto desde el punto de vista metodológico, como desde los punto de vista teórico y práctico; en tercer lugar, se profundiza en las conclusiones derivadas del estudio empírico, que se clasifican según los criterios de segmentación empleados, y que combinan resultados confirmatorios y exploratorios; por último, el trabajo recopila las principales limitaciones de la investigación, tanto de carácter teórico como empírico, así como aquellos aspectos que no habiendo podido plantearse dentro del contexto de este estudio, o como consecuencia de los resultados alcanzados, se presentan como líneas futuras de investigación. ABSTRACT Favoured by an increase of Internet penetration rates across the globe, electronic commerce has experienced a rapid growth over the last few years. Nevertheless, adoption of electronic commerce has differed from one country to another. On one hand, it has been observed that countries leading e-commerce adoption have a large percentage of Internet users as well as of online purchasers; on the other hand, other markets, despite having a low percentage of Internet users, show a high percentage of online buyers. Halfway between those two ends of the spectrum, we find countries such as Spain which, despite having moderately high Internet penetration rates and similar socio-economic characteristics as some of the leading countries, have failed to turn Internet users into active online buyers. Several theoretical approaches have been taken in an attempt to define the factors that influence the use of electronic commerce systems by customers. One of the betterknown frameworks to characterize adoption factors is the acceptance modelling theory, which is derived from the information systems adoption in organizational environments. These models are based on individual perceptions on which factors determine purchase intention, as a mean to explain users’ actual purchasing behaviour. Even though research on electronic commerce adoption models has increased in terms of volume and scope over the last years, the majority of studies validate their hypothesis by using a single sample of consumers from which they obtain general conclusions. Nevertheless, since the birth of marketing, and more specifically from the second half of the 19th century, differences in consumer behaviour owing to demographic, sociologic and psychological characteristics have also been taken into account. And such differences are generally translated into different needs that can only be satisfied when sellers adapt their offer to their target market. Electronic commerce has a number of features that makes it different when compared to traditional commerce; the best example of this is the lack of physical contact between customers and products, and between customers and vendors. Other than that, some differences that depend on the type of product may also play an important role in electronic commerce. From all the above, the present research aims to address the study of the main factors influencing purchase intention and actual purchase behaviour in electronic commerce by Spanish end-consumers, taking into consideration both the customer group to which they belong and the type of product being purchased. In order to achieve this goal, this Thesis is structured in eight chapters: four theoretical sections, three empirical blocks and a final section summarizing the conclusions derived from the research. The chapters are arranged in sequence as follows: introduction, current state of electronic commerce, technology adoption models, electronic commerce segmentation, preliminary design of the empirical work, research design, data analysis and results, and conclusions. The introductory chapter offers a detailed justification of the relevance of this study in the context of e-commerce adoption research; it also sets out the objectives, methodology and research stages. The second chapter further expands and complements the introductory chapter, focusing on two elements: the concept of electronic commerce and its evolution from a general point of view, and the evolution of electronic commerce in Spain and main indicators of adoption. This section is intended to allow the reader to understand the research context, and also to serve as a basis to justify the relevance and representativeness of the sample used in this study. Chapters three (technology acceptance models) and four (segmentation in electronic commerce) set the theoretical foundations for the study. Chapter 3 presents a thorough literature review of technology adoption modelling, focusing on previous studies on electronic commerce acceptance. As a result of the literature review, the research framework is built upon a model based on UTAUT (Unified Theory of Acceptance and Use of Technology) and its evolution, UTAUT2, including two specific electronic commerce adoption factors: perceived risk and perceived trust. Chapter 4 deals with client and product segmentation methodologies used by experts. From the literature review, a wide range of classification variables is studied, and a shortlist of nine classification variables has been selected for inclusion in the research. The criteria for variable selection were their adequacy to electronic commerce characteristics, as well as adequacy to the sample characteristics. The nine variables have been classified in three groups: socio-demographic (gender, age, education level, income, family size and relationship status), behavioural (experience in electronic commerce and frequency of purchase) and psychographic (online purchase motivations) variables. The second half of chapter 4 is devoted to a review of the product classification criteria in electronic commerce. The review has led to the identification of a final set of fifteen groups of variables, whose combination offered a total of thirty-four possible outputs. However, due to the lack of empirical evidence in the context of electronic commerce, further investigation on the validity of this set of product classifications was deemed necessary. For this reason, chapter 5 proposes an empirical study to test the different product classification variables with 207 samples. A selection of product classifications including only those variables that are objective, able to identify distinct groups and not dependent on consumers’ point of view, led to a final classification of products which consisted on two groups of variables for the final empirical study. The combination of these two groups gave rise to four types of products: digital and non-digital goods, and digital and non-digital services. Chapter six characterizes the research –social, exploratory research– and presents the final research model and research hypotheses. The exploratory nature of the research becomes patent in instances where no prior empirical evidence on the influence of certain segmentation variables was found. Chapter six also includes the description of the measurement instrument used in the research, consisting of a total of 125 questions –and the measurement scales associated to each of them– as well as the description of the sample used for model validation (consisting of 817 Spanish residents). Chapter 7 is the core of the empirical analysis performed to validate the research model, and it is divided into two separate parts: description of the statistical techniques used for data analysis, and actual data analysis and results. The first part is structured in three different blocks: Partial Least Squares Method (PLS): the multi-variable analysis is a statistical method used to determine structural relationships of models and their predictive validity; Multi-group analysis: a set of techniques that allow comparing the outcomes of PLS analysis between two or more groups, by using one or more segmentation variables. More specifically, five comparison methods were used, which additionally gives the opportunity to assess the efficiency of each method. Determination of a priori undefined segments: in some cases, classification criteria did not necessarily exist for some segmentation variables, such as customer motivations. In these cases, the application of statistical classification techniques is required. For this study, two main classification techniques were used sequentially: principal component factor analysis –in order to reduce the number of variables– and cluster analysis. The application of the statistical methods to the models derived from the inclusion of the various segmentation criteria –for both clients and products–, led to the analysis of 128 different electronic commerce adoption models and 65 multi group comparisons. Finally, chapter 8 summarizes the conclusions from the research, divided into four parts: first, an assessment of the degree of achievement of the different research objectives is offered; then, methodological, theoretical and practical implications of the research are drawn; this is followed by a discussion on the results from the empirical study –based on the segmentation criteria for the research–; fourth, and last, the main limitations of the research –both empirical and theoretical– as well as future avenues of research are detailed.
Resumo:
This paper describes an approach to solve the inverse kinematics problem of humanoid robots whose construction shows a small but non negligible offset at the hip which prevents any purely analytical solution to be developed. Knowing that a purely numerical solution is not feasible due to variable efficiency problems, the proposed one first neglects the offset presence in order to obtain an approximate “solution” by means of an analytical algorithm based on screw theory, and then uses it as the initial condition of a numerical refining procedure based on the Levenberg‐Marquardt algorithm. In this way, few iterations are needed for any specified attitude, making it possible to implement the algorithm for real‐time applications. As a way to show the algorithm’s implementation, one case of study is considered throughout the paper, represented by the SILO2 humanoid robot.
Resumo:
Los accidentes del tráfico son un fenómeno social muy relevantes y una de las principales causas de mortalidad en los países desarrollados. Para entender este fenómeno complejo se aplican modelos econométricos sofisticados tanto en la literatura académica como por las administraciones públicas. Esta tesis está dedicada al análisis de modelos macroscópicos para los accidentes del tráfico en España. El objetivo de esta tesis se puede dividir en dos bloques: a. Obtener una mejor comprensión del fenómeno de accidentes de trafico mediante la aplicación y comparación de dos modelos macroscópicos utilizados frecuentemente en este área: DRAG y UCM, con la aplicación a los accidentes con implicación de furgonetas en España durante el período 2000-2009. Los análisis se llevaron a cabo con enfoque frecuencista y mediante los programas TRIO, SAS y TRAMO/SEATS. b. La aplicación de modelos y la selección de las variables más relevantes, son temas actuales de investigación y en esta tesis se ha desarrollado y aplicado una metodología que pretende mejorar, mediante herramientas teóricas y prácticas, el entendimiento de selección y comparación de los modelos macroscópicos. Se han desarrollado metodologías tanto para selección como para comparación de modelos. La metodología de selección de modelos se ha aplicado a los accidentes mortales ocurridos en la red viaria en el período 2000-2011, y la propuesta metodológica de comparación de modelos macroscópicos se ha aplicado a la frecuencia y la severidad de los accidentes con implicación de furgonetas en el período 2000-2009. Como resultado de los desarrollos anteriores se resaltan las siguientes contribuciones: a. Profundización de los modelos a través de interpretación de las variables respuesta y poder de predicción de los modelos. El conocimiento sobre el comportamiento de los accidentes con implicación de furgonetas se ha ampliado en este proceso. bl. Desarrollo de una metodología para selección de variables relevantes para la explicación de la ocurrencia de accidentes de tráfico. Teniendo en cuenta los resultados de a) la propuesta metodológica se basa en los modelos DRAG, cuyos parámetros se han estimado con enfoque bayesiano y se han aplicado a los datos de accidentes mortales entre los años 2000-2011 en España. Esta metodología novedosa y original se ha comparado con modelos de regresión dinámica (DR), que son los modelos más comunes para el trabajo con procesos estocásticos. Los resultados son comparables, y con la nueva propuesta se realiza una aportación metodológica que optimiza el proceso de selección de modelos, con escaso coste computacional. b2. En la tesis se ha diseñado una metodología de comparación teórica entre los modelos competidores mediante la aplicación conjunta de simulación Monte Cario, diseño de experimentos y análisis de la varianza ANOVA. Los modelos competidores tienen diferentes estructuras, que afectan a la estimación de efectos de las variables explicativas. Teniendo en cuenta el estudio desarrollado en bl) este desarrollo tiene el propósito de determinar como interpretar la componente de tendencia estocástica que un modelo UCM modela explícitamente, a través de un modelo DRAG, que no tiene un método específico para modelar este elemento. Los resultados de este estudio son importantes para ver si la serie necesita ser diferenciada antes de modelar. b3. Se han desarrollado nuevos algoritmos para realizar los ejercicios metodológicos, implementados en diferentes programas como R, WinBUGS, y MATLAB. El cumplimiento de los objetivos de la tesis a través de los desarrollos antes enunciados se remarcan en las siguientes conclusiones: 1. El fenómeno de accidentes del tráfico se ha analizado mediante dos modelos macroscópicos. Los efectos de los factores de influencia son diferentes dependiendo de la metodología aplicada. Los resultados de predicción son similares aunque con ligera superioridad de la metodología DRAG. 2. La metodología para selección de variables y modelos proporciona resultados prácticos en cuanto a la explicación de los accidentes de tráfico. La predicción y la interpretación también se han mejorado mediante esta nueva metodología. 3. Se ha implementado una metodología para profundizar en el conocimiento de la relación entre las estimaciones de los efectos de dos modelos competidores como DRAG y UCM. Un aspecto muy importante en este tema es la interpretación de la tendencia mediante dos modelos diferentes de la que se ha obtenido información muy útil para los investigadores en el campo del modelado. Los resultados han proporcionado una ampliación satisfactoria del conocimiento en torno al proceso de modelado y comprensión de los accidentes con implicación de furgonetas y accidentes mortales totales en España. ABSTRACT Road accidents are a very relevant social phenomenon and one of the main causes of death in industrialized countries. Sophisticated econometric models are applied in academic work and by the administrations for a better understanding of this very complex phenomenon. This thesis is thus devoted to the analysis of macro models for road accidents with application to the Spanish case. The objectives of the thesis may be divided in two blocks: a. To achieve a better understanding of the road accident phenomenon by means of the application and comparison of two of the most frequently used macro modelings: DRAG (demand for road use, accidents and their gravity) and UCM (unobserved components model); the application was made to van involved accident data in Spain in the period 2000-2009. The analysis has been carried out within the frequentist framework and using available state of the art software, TRIO, SAS and TRAMO/SEATS. b. Concern on the application of the models and on the relevant input variables to be included in the model has driven the research to try to improve, by theoretical and practical means, the understanding on methodological choice and model selection procedures. The theoretical developments have been applied to fatal accidents during the period 2000-2011 and van-involved road accidents in 2000-2009. This has resulted in the following contributions: a. Insight on the models has been gained through interpretation of the effect of the input variables on the response and prediction accuracy of both models. The behavior of van-involved road accidents has been explained during this process. b1. Development of an input variable selection procedure, which is crucial for an efficient choice of the inputs. Following the results of a) the procedure uses the DRAG-like model. The estimation is carried out within the Bayesian framework. The procedure has been applied for the total road accident data in Spain in the period 2000-2011. The results of the model selection procedure are compared and validated through a dynamic regression model given that the original data has a stochastic trend. b2. A methodology for theoretical comparison between the two models through Monte Carlo simulation, computer experiment design and ANOVA. The models have a different structure and this affects the estimation of the effects of the input variables. The comparison is thus carried out in terms of the effect of the input variables on the response, which is in general different, and should be related. Considering the results of the study carried out in b1) this study tries to find out how a stochastic time trend will be captured in DRAG model, since there is no specific trend component in DRAG. Given the results of b1) the findings of this study are crucial in order to see if the estimation of data with stochastic component through DRAG will be valid or whether the data need a certain adjustment (typically differencing) prior to the estimation. The model comparison methodology was applied to the UCM and DRAG models, considering that, as mentioned above, the UCM has a specific trend term while DRAG does not. b3. New algorithms were developed for carrying out the methodological exercises. For this purpose different softwares, R, WinBUGs and MATLAB were used. These objectives and contributions have been resulted in the following findings: 1. The road accident phenomenon has been analyzed by means of two macro models: The effects of the influential input variables may be estimated through the models, but it has been observed that the estimates vary from one model to the other, although prediction accuracy is similar, with a slight superiority of the DRAG methodology. 2. The variable selection methodology provides very practical results, as far as the explanation of road accidents is concerned. Prediction accuracy and interpretability have been improved by means of a more efficient input variable and model selection procedure. 3. Insight has been gained on the relationship between the estimates of the effects using the two models. A very relevant issue here is the role of trend in both models, relevant recommendations for the analyst have resulted from here. The results have provided a very satisfactory insight into both modeling aspects and the understanding of both van-involved and total fatal accidents behavior in Spain.
Resumo:
Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Up to now, all the algorithms, except for GA, have been first applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
Resumo:
Geologic storage of carbon dioxide (CO2) has been proposed as a viable means for reducing anthropogenic CO2 emissions. Once injection begins, a program for measurement, monitoring, and verification (MMV) of CO2 distribution is required in order to: a) research key features, effects and processes needed for risk assessment; b) manage the injection process; c) delineate and identify leakage risk and surface escape; d) provide early warnings of failure near the reservoir; and f) verify storage for accounting and crediting. The selection of the methodology of monitoring (characterization of site and control and verification in the post-injection phase) is influenced by economic and technological variables. Multiple Criteria Decision Making (MCDM) refers to a methodology developed for making decisions in the presence of multiple criteria. MCDM as a discipline has only a relatively short history of 40 years, and it has been closely related to advancements on computer technology. Evaluation methods and multicriteria decisions include the selection of a set of feasible alternatives, the simultaneous optimization of several objective functions, and a decision-making process and evaluation procedures that must be rational and consistent. The application of a mathematical model of decision-making will help to find the best solution, establishing the mechanisms to facilitate the management of information generated by number of disciplines of knowledge. Those problems in which decision alternatives are finite are called Discrete Multicriteria Decision problems. Such problems are most common in reality and this case scenario will be applied in solving the problem of site selection for storing CO2. Discrete MCDM is used to assess and decide on issues that by nature or design support a finite number of alternative solutions. Recently, Multicriteria Decision Analysis has been applied to hierarchy policy incentives for CCS, to assess the role of CCS, and to select potential areas which could be suitable to store. For those reasons, MCDM have been considered in the monitoring phase of CO2 storage, in order to select suitable technologies which could be techno-economical viable. In this paper, we identify techniques of gas measurements in subsurface which are currently applying in the phase of characterization (pre-injection); MCDM will help decision-makers to hierarchy the most suitable technique which fit the purpose to monitor the specific physic-chemical parameter.
Resumo:
En los últimos años la externalización de TI ha ganado mucha importancia en el mercado y, por ejemplo, el mercado externalización de servicios de TI sigue creciendo cada año. Ahora más que nunca, las organizaciones son cada vez más los compradores de las capacidades necesarias mediante la obtención de productos y servicios de los proveedores, desarrollando cada vez menos estas capacidades dentro de la empresa. La selección de proveedores de TI es un problema de decisión complejo. Los gerentes que enfrentan una decisión sobre la selección de proveedores de TI tienen dificultades en la elaboración de lo que hay que pensar, además en sus discursos. También de acuerdo con un estudio del SEI (Software Engineering Institute) [40], del 20 al 25 por ciento de los grandes proyectos de adquisición de TI fracasan en dos años y el 50 por ciento fracasan dentro de cinco años. La mala gestión, la mala definición de requisitos, la falta de evaluaciones exhaustivas, que pueden ser utilizadas para llegar a los mejores candidatos para la contratación externa, la selección de proveedores y los procesos de contratación inadecuados, la insuficiencia de procedimientos de selección tecnológicos, y los cambios de requisitos no controlados son factores que contribuyen al fracaso del proyecto. La mayoría de los fracasos podrían evitarse si el cliente aprendiese a comprender los problemas de decisión, hacer un mejor análisis de decisiones, y el buen juicio. El objetivo principal de este trabajo es el desarrollo de un modelo de decisión para la selección de proveedores de TI que tratará de reducir la cantidad de fracasos observados en las relaciones entre el cliente y el proveedor. La mayor parte de estos fracasos son causados por una mala selección, por parte del cliente, del proveedor. Además de estos problemas mostrados anteriormente, la motivación para crear este trabajo es la inexistencia de cualquier modelo de decisión basado en un multi modelo (mezcla de modelos adquisición y métodos de decisión) para el problema de la selección de proveedores de TI. En el caso de estudio, nueve empresas españolas fueron analizadas de acuerdo con el modelo de decisión para la selección de proveedores de TI desarrollado en este trabajo. Dos softwares se utilizaron en este estudio de caso: Expert Choice, y D-Sight. ABSTRACT In the past few years IT outsourcing has gained a lot of importance in the market and, for example, the IT services outsourcing market is still growing every year. Now more than ever, organizations are increasingly becoming acquirers of needed capabilities by obtaining products and services from suppliers and developing less and less of these capabilities in-house. IT supplier selection is a complex and opaque decision problem. Managers facing a decision about IT supplier selection have difficulty in framing what needs to be thought about further in their discourses. Also according to a study from SEI (Software Engineering Institute) [40], 20 to 25 percent of large information technology (IT) acquisition projects fail within two years and 50 percent fail within five years. Mismanagement, poor requirements definition, lack of comprehensive evaluations, which can be used to come up with the best candidates for outsourcing, inadequate supplier selection and contracting processes, insufficient technology selection procedures, and uncontrolled requirements changes are factors that contribute to project failure. The majority of project failures could be avoided if the acquirer learns how to understand the decision problems, make better decision analysis, and good judgment. The main objective of this work is the development of a decision model for IT supplier selection that will try to decrease the amount of failures seen in the relationships between the client-supplier. Most of these failures are caused by a not well selection of the supplier. Besides these problems showed above, the motivation to create this work is the inexistence of any decision model based on multi model (mixture of acquisition models and decision methods) for the problem of IT supplier selection. In the case study, nine different Spanish companies were analyzed based on the IT supplier selection decision model developed in this work. Two software products were used in this case study, Expert Choice and D-Sight.
Resumo:
La influencia de la aerodinámica en el diseño de los trenes de alta velocidad, unida a la necesidad de resolver nuevos problemas surgidos con el aumento de la velocidad de circulación y la reducción de peso del vehículo, hace evidente el interés de plantear un estudio de optimización que aborde tales puntos. En este contexto, se presenta en esta tesis la optimización aerodinámica del testero de un tren de alta velocidad, llevada a cabo mediante el uso de métodos de optimización avanzados. Entre estos métodos, se ha elegido aquí a los algoritmos genéticos y al método adjunto como las herramientas para llevar a cabo dicha optimización. La base conceptual, las características y la implementación de los mismos se detalla a lo largo de la tesis, permitiendo entender los motivos de su elección, y las consecuencias, en términos de ventajas y desventajas que cada uno de ellos implican. El uso de los algorimos genéticos implica a su vez la necesidad de una parametrización geométrica de los candidatos a óptimo y la generación de un modelo aproximado que complementa al método de optimización. Estos puntos se describen de modo particular en el primer bloque de la tesis, enfocada a la metodología seguida en este estudio. El segundo bloque se centra en la aplicación de los métodos a fin de optimizar el comportamiento aerodinámico del tren en distintos escenarios. Estos escenarios engloban los casos más comunes y también algunos de los más exigentes a los que hace frente un tren de alta velocidad: circulación en campo abierto con viento frontal o viento lateral, y entrada en túnel. Considerando el caso de viento frontal en campo abierto, los dos métodos han sido aplicados, permitiendo una comparación de las diferentes metodologías, así como el coste computacional asociado a cada uno, y la minimización de la resistencia aerodinámica conseguida en esa optimización. La posibilidad de evitar parametrizar la geometría y, por tanto, reducir el coste computacional del proceso de optimización es la característica más significativa de los métodos adjuntos, mientras que en el caso de los algoritmos genéticos se destaca la simplicidad y capacidad de encontrar un óptimo global en un espacio de diseño multi-modal o de resolver problemas multi-objetivo. El caso de viento lateral en campo abierto considera nuevamente los dos métoxi dos de optimización anteriores. La parametrización se ha simplificado en este estudio, lo que notablemente reduce el coste numérico de todo el estudio de optimización, a la vez que aún recoge las características geométricas más relevantes en un tren de alta velocidad. Este análisis ha permitido identificar y cuantificar la influencia de cada uno de los parámetros geométricos incluídos en la parametrización, y se ha observado que el diseño de la arista superior a barlovento es fundamental, siendo su influencia mayor que la longitud del testero o que la sección frontal del mismo. Finalmente, se ha considerado un escenario más a fin de validar estos métodos y su capacidad de encontrar un óptimo global. La entrada de un tren de alta velocidad en un túnel es uno de los casos más exigentes para un tren por el pico de sobrepresión generado, el cual afecta a la confortabilidad del pasajero, así como a la estabilidad del vehículo y al entorno próximo a la salida del túnel. Además de este problema, otro objetivo a minimizar es la resistencia aerodinámica, notablemente superior al caso de campo abierto. Este problema se resuelve usando algoritmos genéticos. Dicho método permite obtener un frente de Pareto donde se incluyen el conjunto de óptimos que minimizan ambos objetivos. ABSTRACT Aerodynamic design of trains influences several aspects of high-speed trains performance in a very significant level. In this situation, considering also that new aerodynamic problems have arisen due to the increase of the cruise speed and lightness of the vehicle, it is evident the necessity of proposing an optimization study concerning the train aerodynamics. Thus, the aerodynamic optimization of the nose shape of a high-speed train is presented in this thesis. This optimization is based on advanced optimization methods. Among these methods, genetic algorithms and the adjoint method have been selected. A theoretical description of their bases, the characteristics and the implementation of each method is detailed in this thesis. This introduction permits understanding the causes of their selection, and the advantages and drawbacks of their application. The genetic algorithms requirethe geometrical parameterization of any optimal candidate and the generation of a metamodel or surrogate model that complete the optimization process. These points are addressed with a special attention in the first block of the thesis, focused on the methodology considered in this study. The second block is referred to the use of these methods with the purpose of optimizing the aerodynamic performance of a high-speed train in several scenarios. These scenarios englobe the most representative operating conditions of high-speed trains, and also some of the most exigent train aerodynamic problems: front wind and cross-wind situations in open air, and the entrance of a high-speed train in a tunnel. The genetic algorithms and the adjoint method have been applied in the minimization of the aerodynamic drag on the train with front wind in open air. The comparison of these methods allows to evaluate the methdology and computational cost of each one, as well as the resulting minimization of the aerodynamic drag. Simplicity and robustness, the straightforward realization of a multi-objective optimization, and the capability of searching a global optimum are the main attributes of genetic algorithm. However, the requirement of geometrically parameterize any optimal candidate is a significant drawback that is avoided with the use of the adjoint method. This independence of the number of design variables leads to a relevant reduction of the pre-processing and computational cost. Considering the cross-wind stability, both methods are used again for the minimization of the side force. In this case, a simplification of the geometric parameterization of the train nose is adopted, what dramatically reduces the computational cost of the optimization process. Nevertheless, some of the most important geometrical characteristics are still described with this simplified parameterization. This analysis identifies and quantifies the influence of each design variable on the side force on the train. It is observed that the A-pillar roundness is the most demanding design parameter, with a more important effect than the nose length or the train cross-section area. Finally, a third scenario is considered for the validation of these methods in the aerodynamic optimization of a high-speed train. The entrance of a train in a tunnel is one of the most exigent train aerodynamic problems. The aerodynamic consequences of high-speed trains running in a tunnel are basically resumed in two correlated phenomena, the generation of pressure waves and an increase in aerodynamic drag. This multi-objective optimization problem is solved with genetic algorithms. The result is a Pareto front where a set of optimal solutions that minimize both objectives.
Resumo:
Road accidents are a very relevant issue in many countries and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. The selection of explanatory variables and response transformation parameter within the Bayesian framework for the selection of the set of explanatory variables a TIM and 3IM (two input and three input models) procedures are proposed. The procedure also uses the DIC and pseudo -R2 goodness of fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autorgressive (AR) structure for the response. The initial set of 22 explanatory variables are identified. The effects of these factors on the fatal accident frequency in Spain, during 2000-2012, are estimated. The dependent variable is constructed considering the stochastic trend component.