921 resultados para Bayesian classifier
Resumo:
El objetivo principal de esta tesis doctoral es profundizar en el análisis y diseño de un sistema inteligente para la predicción y control del acabado superficial en un proceso de fresado a alta velocidad, basado fundamentalmente en clasificadores Bayesianos, con el prop´osito de desarrollar una metodolog´ıa que facilite el diseño de este tipo de sistemas. El sistema, cuyo propósito es posibilitar la predicción y control de la rugosidad superficial, se compone de un modelo aprendido a partir de datos experimentales con redes Bayesianas, que ayudar´a a comprender los procesos dinámicos involucrados en el mecanizado y las interacciones entre las variables relevantes. Dado que las redes neuronales artificiales son modelos ampliamente utilizados en procesos de corte de materiales, también se incluye un modelo para fresado usándolas, donde se introdujo la geometría y la dureza del material como variables novedosas hasta ahora no estudiadas en este contexto. Por lo tanto, una importante contribución en esta tesis son estos dos modelos para la predicción de la rugosidad superficial, que se comparan con respecto a diferentes aspectos: la influencia de las nuevas variables, los indicadores de evaluación del desempeño, interpretabilidad. Uno de los principales problemas en la modelización con clasificadores Bayesianos es la comprensión de las enormes tablas de probabilidad a posteriori producidas. Introducimos un m´etodo de explicación que genera un conjunto de reglas obtenidas de árboles de decisión. Estos árboles son inducidos a partir de un conjunto de datos simulados generados de las probabilidades a posteriori de la variable clase, calculadas con la red Bayesiana aprendida a partir de un conjunto de datos de entrenamiento. Por último, contribuimos en el campo multiobjetivo en el caso de que algunos de los objetivos no se puedan cuantificar en números reales, sino como funciones en intervalo de valores. Esto ocurre a menudo en aplicaciones de aprendizaje automático, especialmente las basadas en clasificación supervisada. En concreto, se extienden las ideas de dominancia y frontera de Pareto a esta situación. Su aplicación a los estudios de predicción de la rugosidad superficial en el caso de maximizar al mismo tiempo la sensibilidad y la especificidad del clasificador inducido de la red Bayesiana, y no solo maximizar la tasa de clasificación correcta. Los intervalos de estos dos objetivos provienen de un m´etodo de estimación honesta de ambos objetivos, como e.g. validación cruzada en k rodajas o bootstrap.---ABSTRACT---The main objective of this PhD Thesis is to go more deeply into the analysis and design of an intelligent system for surface roughness prediction and control in the end-milling machining process, based fundamentally on Bayesian network classifiers, with the aim of developing a methodology that makes easier the design of this type of systems. The system, whose purpose is to make possible the surface roughness prediction and control, consists of a model learnt from experimental data with the aid of Bayesian networks, that will help to understand the dynamic processes involved in the machining and the interactions among the relevant variables. Since artificial neural networks are models widely used in material cutting proceses, we include also an end-milling model using them, where the geometry and hardness of the piecework are introduced as novel variables not studied so far within this context. Thus, an important contribution in this thesis is these two models for surface roughness prediction, that are then compared with respecto to different aspects: influence of the new variables, performance evaluation metrics, interpretability. One of the main problems with Bayesian classifier-based modelling is the understanding of the enormous posterior probabilitiy tables produced. We introduce an explanation method that generates a set of rules obtained from decision trees. Such trees are induced from a simulated data set generated from the posterior probabilities of the class variable, calculated with the Bayesian network learned from a training data set. Finally, we contribute in the multi-objective field in the case that some of the objectives cannot be quantified as real numbers but as interval-valued functions. This often occurs in machine learning applications, especially those based on supervised classification. Specifically, the dominance and Pareto front ideas are extended to this setting. Its application to the surface roughness prediction studies the case of maximizing simultaneously the sensitivity and specificity of the induced Bayesian network classifier, rather than only maximizing the correct classification rate. Intervals in these two objectives come from a honest estimation method of both objectives, like e.g. k-fold cross-validation or bootstrap.
Resumo:
Sin duda, el rostro humano ofrece mucha más información de la que pensamos. La cara transmite sin nuestro consentimiento señales no verbales, a partir de las interacciones faciales, que dejan al descubierto nuestro estado afectivo, actividad cognitiva, personalidad y enfermedades. Estudios recientes [OFT14, TODMS15] demuestran que muchas de nuestras decisiones sociales e interpersonales derivan de un previo análisis facial de la cara que nos permite establecer si esa persona es confiable, trabajadora, inteligente, etc. Esta interpretación, propensa a errores, deriva de la capacidad innata de los seres humanas de encontrar estas señales e interpretarlas. Esta capacidad es motivo de estudio, con un especial interés en desarrollar métodos que tengan la habilidad de calcular de manera automática estas señales o atributos asociados a la cara. Así, el interés por la estimación de atributos faciales ha crecido rápidamente en los últimos años por las diversas aplicaciones en que estos métodos pueden ser utilizados: marketing dirigido, sistemas de seguridad, interacción hombre-máquina, etc. Sin embargo, éstos están lejos de ser perfectos y robustos en cualquier dominio de problemas. La principal dificultad encontrada es causada por la alta variabilidad intra-clase debida a los cambios en la condición de la imagen: cambios de iluminación, oclusiones, expresiones faciales, edad, género, etnia, etc.; encontradas frecuentemente en imágenes adquiridas en entornos no controlados. Este de trabajo de investigación estudia técnicas de análisis de imágenes para estimar atributos faciales como el género, la edad y la postura, empleando métodos lineales y explotando las dependencias estadísticas entre estos atributos. Adicionalmente, nuestra propuesta se centrará en la construcción de estimadores que tengan una fuerte relación entre rendimiento y coste computacional. Con respecto a éste último punto, estudiamos un conjunto de estrategias para la clasificación de género y las comparamos con una propuesta basada en un clasificador Bayesiano y una adecuada extracción de características. Analizamos en profundidad el motivo de porqué las técnicas lineales no han logrado resultados competitivos hasta la fecha y mostramos cómo obtener rendimientos similares a las mejores técnicas no-lineales. Se propone un segundo algoritmo para la estimación de edad, basado en un regresor K-NN y una adecuada selección de características tal como se propuso para la clasificación de género. A partir de los experimentos desarrollados, observamos que el rendimiento de los clasificadores se reduce significativamente si los ´estos han sido entrenados y probados sobre diferentes bases de datos. Hemos encontrado que una de las causas es la existencia de dependencias entre atributos faciales que no han sido consideradas en la construcción de los clasificadores. Nuestro resultados demuestran que la variabilidad intra-clase puede ser reducida cuando se consideran las dependencias estadísticas entre los atributos faciales de el género, la edad y la pose; mejorando el rendimiento de nuestros clasificadores de atributos faciales con un coste computacional pequeño. Abstract Surely the human face provides much more information than we think. The face provides without our consent nonverbal cues from facial interactions that reveal our emotional state, cognitive activity, personality and disease. Recent studies [OFT14, TODMS15] show that many of our social and interpersonal decisions derive from a previous facial analysis that allows us to establish whether that person is trustworthy, hardworking, intelligent, etc. This error-prone interpretation derives from the innate ability of human beings to find and interpret these signals. This capability is being studied, with a special interest in developing methods that have the ability to automatically calculate these signs or attributes associated with the face. Thus, the interest in the estimation of facial attributes has grown rapidly in recent years by the various applications in which these methods can be used: targeted marketing, security systems, human-computer interaction, etc. However, these are far from being perfect and robust in any domain of problems. The main difficulty encountered is caused by the high intra-class variability due to changes in the condition of the image: lighting changes, occlusions, facial expressions, age, gender, ethnicity, etc.; often found in images acquired in uncontrolled environments. This research work studies image analysis techniques to estimate facial attributes such as gender, age and pose, using linear methods, and exploiting the statistical dependencies between these attributes. In addition, our proposal will focus on the construction of classifiers that have a good balance between performance and computational cost. We studied a set of strategies for gender classification and we compare them with a proposal based on a Bayesian classifier and a suitable feature extraction based on Linear Discriminant Analysis. We study in depth why linear techniques have failed to provide competitive results to date and show how to obtain similar performances to the best non-linear techniques. A second algorithm is proposed for estimating age, which is based on a K-NN regressor and proper selection of features such as those proposed for the classification of gender. From our experiments we note that performance estimates are significantly reduced if they have been trained and tested on different databases. We have found that one of the causes is the existence of dependencies between facial features that have not been considered in the construction of classifiers. Our results demonstrate that intra-class variability can be reduced when considering the statistical dependencies between facial attributes gender, age and pose, thus improving the performance of our classifiers with a reduced computational cost.
Resumo:
Peer-reviewed
Resumo:
Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work of using genetic algorithms whose learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this target, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.
Resumo:
Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work of using genetic algorithms whose learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this target, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.
Resumo:
The growing population in cities increases the energy demand and affects the environment by increasing carbon emissions. Information and communications technology solutions which enable energy optimization are needed to address this growing energy demand in cities and to reduce carbon emissions. District heating systems optimize the energy production by reusing waste energy with combined heat and power plants. Forecasting the heat load demand in residential buildings assists in optimizing energy production and consumption in a district heating system. However, the presence of a large number of factors such as weather forecast, district heating operational parameters and user behavioural parameters, make heat load forecasting a challenging task. This thesis proposes a probabilistic machine learning model using a Naive Bayes classifier, to forecast the hourly heat load demand for three residential buildings in the city of Skellefteå, Sweden over a period of winter and spring seasons. The district heating data collected from the sensors equipped at the residential buildings in Skellefteå, is utilized to build the Bayesian network to forecast the heat load demand for horizons of 1, 2, 3, 6 and 24 hours. The proposed model is validated by using four cases to study the influence of various parameters on the heat load forecast by carrying out trace driven analysis in Weka and GeNIe. Results show that current heat load consumption and outdoor temperature forecast are the two parameters with most influence on the heat load forecast. The proposed model achieves average accuracies of 81.23 % and 76.74 % for a forecast horizon of 1 hour in the three buildings for winter and spring seasons respectively. The model also achieves an average accuracy of 77.97 % for three buildings across both seasons for the forecast horizon of 1 hour by utilizing only 10 % of the training data. The results indicate that even a simple model like Naive Bayes classifier can forecast the heat load demand by utilizing less training data.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Resumo:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Resumo:
tWe develop an orthogonal forward selection (OFS) approach to construct radial basis function (RBF)network classifiers for two-class problems. Our approach integrates several concepts in probabilisticmodelling, including cross validation, mutual information and Bayesian hyperparameter fitting. At eachstage of the OFS procedure, one model term is selected by maximising the leave-one-out mutual infor-mation (LOOMI) between the classifier’s predicted class labels and the true class labels. We derive theformula of LOOMI within the OFS framework so that the LOOMI can be evaluated efficiently for modelterm selection. Furthermore, a Bayesian procedure of hyperparameter fitting is also integrated into theeach stage of the OFS to infer the l2-norm based local regularisation parameter from the data. Since eachforward stage is effectively fitting of a one-variable model, this task is very fast. The classifier construc-tion procedure is automatically terminated without the need of using additional stopping criterion toyield very sparse RBF classifiers with excellent classification generalisation performance, which is par-ticular useful for the noisy data sets with highly overlapping class distribution. A number of benchmarkexamples are employed to demonstrate the effectiveness of our proposed approach.
Resumo:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Resumo:
This paper addresses biometric identification using large databases, in particular, iris databases. In such applications, it is critical to have low response time, while maintaining an acceptable recognition rate. Thus, the trade-off between speed and accuracy must be evaluated for processing and recognition parts of an identification system. In this paper, a graph-based framework for pattern recognition, called Optimum-Path Forest (OPF), is utilized as a classifier in a pre-developed iris recognition system. The aim of this paper is to verify the effectiveness of OPF in the field of iris recognition, and its performance for various scale iris databases. The existing Gauss-Laguerre Wavelet based coding scheme is used for iris encoding. The performance of the OPF and two other - Hamming and Bayesian - classifiers, is compared using small, medium, and large-scale databases. Such a comparison shows that the OPF has faster response for large-scale databases, thus performing better than the more accurate, but slower, classifiers.
Resumo:
Neuronal morphology is a key feature in the study of brain circuits, as it is highly related to information processing and functional identification. Neuronal morphology affects the process of integration of inputs from other neurons and determines the neurons which receive the output of the neurons. Different parts of the neurons can operate semi-independently according to the spatial location of the synaptic connections. As a result, there is considerable interest in the analysis of the microanatomy of nervous cells since it constitutes an excellent tool for better understanding cortical function. However, the morphologies, molecular features and electrophysiological properties of neuronal cells are extremely variable. Except for some special cases, this variability makes it hard to find a set of features that unambiguously define a neuronal type. In addition, there are distinct types of neurons in particular regions of the brain. This morphological variability makes the analysis and modeling of neuronal morphology a challenge. Uncertainty is a key feature in many complex real-world problems. Probability theory provides a framework for modeling and reasoning with uncertainty. Probabilistic graphical models combine statistical theory and graph theory to provide a tool for managing domains with uncertainty. In particular, we focus on Bayesian networks, the most commonly used probabilistic graphical model. In this dissertation, we design new methods for learning Bayesian networks and apply them to the problem of modeling and analyzing morphological data from neurons. The morphology of a neuron can be quantified using a number of measurements, e.g., the length of the dendrites and the axon, the number of bifurcations, the direction of the dendrites and the axon, etc. These measurements can be modeled as discrete or continuous data. The continuous data can be linear (e.g., the length or the width of a dendrite) or directional (e.g., the direction of the axon). These data may follow complex probability distributions and may not fit any known parametric distribution. Modeling this kind of problems using hybrid Bayesian networks with discrete, linear and directional variables poses a number of challenges regarding learning from data, inference, etc. In this dissertation, we propose a method for modeling and simulating basal dendritic trees from pyramidal neurons using Bayesian networks to capture the interactions between the variables in the problem domain. A complete set of variables is measured from the dendrites, and a learning algorithm is applied to find the structure and estimate the parameters of the probability distributions included in the Bayesian networks. Then, a simulation algorithm is used to build the virtual dendrites by sampling values from the Bayesian networks, and a thorough evaluation is performed to show the model’s ability to generate realistic dendrites. In this first approach, the variables are discretized so that discrete Bayesian networks can be learned and simulated. Then, we address the problem of learning hybrid Bayesian networks with different kinds of variables. Mixtures of polynomials have been proposed as a way of representing probability densities in hybrid Bayesian networks. We present a method for learning mixtures of polynomials approximations of one-dimensional, multidimensional and conditional probability densities from data. The method is based on basis spline interpolation, where a density is approximated as a linear combination of basis splines. The proposed algorithms are evaluated using artificial datasets. We also use the proposed methods as a non-parametric density estimation technique in Bayesian network classifiers. Next, we address the problem of including directional data in Bayesian networks. These data have some special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. In particular, we extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables given the class follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are empirically evaluated over real datasets. We also study the problem of interneuron classification. An extensive group of experts is asked to classify a set of neurons according to their most prominent anatomical features. A web application is developed to retrieve the experts’ classifications. We compute agreement measures to analyze the consensus between the experts when classifying the neurons. Using Bayesian networks and clustering algorithms on the resulting data, we investigate the suitability of the anatomical terms and neuron types commonly used in the literature. Additionally, we apply supervised learning approaches to automatically classify interneurons using the values of their morphological measurements. Then, a methodology for building a model which captures the opinions of all the experts is presented. First, one Bayesian network is learned for each expert, and we propose an algorithm for clustering Bayesian networks corresponding to experts with similar behaviors. Then, a Bayesian network which represents the opinions of each group of experts is induced. Finally, a consensus Bayesian multinet which models the opinions of the whole group of experts is built. A thorough analysis of the consensus model identifies different behaviors between the experts when classifying the interneurons in the experiment. A set of characterizing morphological traits for the neuronal types can be defined by performing inference in the Bayesian multinet. These findings are used to validate the model and to gain some insights into neuron morphology. Finally, we study a classification problem where the true class label of the training instances is not known. Instead, a set of class labels is available for each instance. This is inspired by the neuron classification problem, where a group of experts is asked to individually provide a class label for each instance. We propose a novel approach for learning Bayesian networks using count vectors which represent the number of experts who selected each class label for each instance. These Bayesian networks are evaluated using artificial datasets from supervised learning problems. Resumen La morfología neuronal es una característica clave en el estudio de los circuitos cerebrales, ya que está altamente relacionada con el procesado de información y con los roles funcionales. La morfología neuronal afecta al proceso de integración de las señales de entrada y determina las neuronas que reciben las salidas de otras neuronas. Las diferentes partes de la neurona pueden operar de forma semi-independiente de acuerdo a la localización espacial de las conexiones sinápticas. Por tanto, existe un interés considerable en el análisis de la microanatomía de las células nerviosas, ya que constituye una excelente herramienta para comprender mejor el funcionamiento de la corteza cerebral. Sin embargo, las propiedades morfológicas, moleculares y electrofisiológicas de las células neuronales son extremadamente variables. Excepto en algunos casos especiales, esta variabilidad morfológica dificulta la definición de un conjunto de características que distingan claramente un tipo neuronal. Además, existen diferentes tipos de neuronas en regiones particulares del cerebro. La variabilidad neuronal hace que el análisis y el modelado de la morfología neuronal sean un importante reto científico. La incertidumbre es una propiedad clave en muchos problemas reales. La teoría de la probabilidad proporciona un marco para modelar y razonar bajo incertidumbre. Los modelos gráficos probabilísticos combinan la teoría estadística y la teoría de grafos con el objetivo de proporcionar una herramienta con la que trabajar bajo incertidumbre. En particular, nos centraremos en las redes bayesianas, el modelo más utilizado dentro de los modelos gráficos probabilísticos. En esta tesis hemos diseñado nuevos métodos para aprender redes bayesianas, inspirados por y aplicados al problema del modelado y análisis de datos morfológicos de neuronas. La morfología de una neurona puede ser cuantificada usando una serie de medidas, por ejemplo, la longitud de las dendritas y el axón, el número de bifurcaciones, la dirección de las dendritas y el axón, etc. Estas medidas pueden ser modeladas como datos continuos o discretos. A su vez, los datos continuos pueden ser lineales (por ejemplo, la longitud o la anchura de una dendrita) o direccionales (por ejemplo, la dirección del axón). Estos datos pueden llegar a seguir distribuciones de probabilidad muy complejas y pueden no ajustarse a ninguna distribución paramétrica conocida. El modelado de este tipo de problemas con redes bayesianas híbridas incluyendo variables discretas, lineales y direccionales presenta una serie de retos en relación al aprendizaje a partir de datos, la inferencia, etc. En esta tesis se propone un método para modelar y simular árboles dendríticos basales de neuronas piramidales usando redes bayesianas para capturar las interacciones entre las variables del problema. Para ello, se mide un amplio conjunto de variables de las dendritas y se aplica un algoritmo de aprendizaje con el que se aprende la estructura y se estiman los parámetros de las distribuciones de probabilidad que constituyen las redes bayesianas. Después, se usa un algoritmo de simulación para construir dendritas virtuales mediante el muestreo de valores de las redes bayesianas. Finalmente, se lleva a cabo una profunda evaluaci ón para verificar la capacidad del modelo a la hora de generar dendritas realistas. En esta primera aproximación, las variables fueron discretizadas para poder aprender y muestrear las redes bayesianas. A continuación, se aborda el problema del aprendizaje de redes bayesianas con diferentes tipos de variables. Las mixturas de polinomios constituyen un método para representar densidades de probabilidad en redes bayesianas híbridas. Presentamos un método para aprender aproximaciones de densidades unidimensionales, multidimensionales y condicionales a partir de datos utilizando mixturas de polinomios. El método se basa en interpolación con splines, que aproxima una densidad como una combinación lineal de splines. Los algoritmos propuestos se evalúan utilizando bases de datos artificiales. Además, las mixturas de polinomios son utilizadas como un método no paramétrico de estimación de densidades para clasificadores basados en redes bayesianas. Después, se estudia el problema de incluir información direccional en redes bayesianas. Este tipo de datos presenta una serie de características especiales que impiden el uso de las técnicas estadísticas clásicas. Por ello, para manejar este tipo de información se deben usar estadísticos y distribuciones de probabilidad específicos, como la distribución univariante von Mises y la distribución multivariante von Mises–Fisher. En concreto, en esta tesis extendemos el clasificador naive Bayes al caso en el que las distribuciones de probabilidad condicionada de las variables predictoras dada la clase siguen alguna de estas distribuciones. Se estudia el caso base, en el que sólo se utilizan variables direccionales, y el caso híbrido, en el que variables discretas, lineales y direccionales aparecen mezcladas. También se estudian los clasificadores desde un punto de vista teórico, derivando sus funciones de decisión y las superficies de decisión asociadas. El comportamiento de los clasificadores se ilustra utilizando bases de datos artificiales. Además, los clasificadores son evaluados empíricamente utilizando bases de datos reales. También se estudia el problema de la clasificación de interneuronas. Desarrollamos una aplicación web que permite a un grupo de expertos clasificar un conjunto de neuronas de acuerdo a sus características morfológicas más destacadas. Se utilizan medidas de concordancia para analizar el consenso entre los expertos a la hora de clasificar las neuronas. Se investiga la idoneidad de los términos anatómicos y de los tipos neuronales utilizados frecuentemente en la literatura a través del análisis de redes bayesianas y la aplicación de algoritmos de clustering. Además, se aplican técnicas de aprendizaje supervisado con el objetivo de clasificar de forma automática las interneuronas a partir de sus valores morfológicos. A continuación, se presenta una metodología para construir un modelo que captura las opiniones de todos los expertos. Primero, se genera una red bayesiana para cada experto y se propone un algoritmo para agrupar las redes bayesianas que se corresponden con expertos con comportamientos similares. Después, se induce una red bayesiana que modela la opinión de cada grupo de expertos. Por último, se construye una multired bayesiana que modela las opiniones del conjunto completo de expertos. El análisis del modelo consensuado permite identificar diferentes comportamientos entre los expertos a la hora de clasificar las neuronas. Además, permite extraer un conjunto de características morfológicas relevantes para cada uno de los tipos neuronales mediante inferencia con la multired bayesiana. Estos descubrimientos se utilizan para validar el modelo y constituyen información relevante acerca de la morfología neuronal. Por último, se estudia un problema de clasificación en el que la etiqueta de clase de los datos de entrenamiento es incierta. En cambio, disponemos de un conjunto de etiquetas para cada instancia. Este problema está inspirado en el problema de la clasificación de neuronas, en el que un grupo de expertos proporciona una etiqueta de clase para cada instancia de manera individual. Se propone un método para aprender redes bayesianas utilizando vectores de cuentas, que representan el número de expertos que seleccionan cada etiqueta de clase para cada instancia. Estas redes bayesianas se evalúan utilizando bases de datos artificiales de problemas de aprendizaje supervisado.
Resumo:
Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
Resumo:
Bayesian network classifiers are a powerful machine learning tool. In order to evaluate the expressive power of these models, we compute families of polynomials that sign-represent decision functions induced by Bayesian network classifiers. We prove that those families are linear combinations of products of Lagrange basis polynomials. In absence of V-structures in the predictor sub-graph, we are also able to prove that this family of polynomials does in- deed characterize the specific classifier considered. We then use this representation to bound the number of decision functions representable by Bayesian network classifiers with a given structure and we compare these bounds to the ones obtained using Vapnik-Chervonenkis dimension.
Resumo:
Bayesian network classifiers are a powerful machine learning tool. In order to evaluate the expressive power of these models, we compute families of polynomials that sign-represent decision functions induced by Bayesian network classifiers. We prove that those families are linear combinations of products of Lagrange basis polynomials. In absence of V-structures in the predictor sub-graph, we are also able to prove that this family of polynomials does in- deed characterize the specific classifier considered. We then use this representation to bound the number of decision functions representable by Bayesian network classifiers with a given structure and we compare these bounds to the ones obtained using Vapnik-Chervonenkis dimension.