1000 resultados para Bayes networks
Resumo:
Esta dissertação incide sobre o estudo e análise de uma solução para a criação de um sistema de recomendação para uma comunidade de consumidores de media e no consequente desenvolvimento da mesma cujo âmbito inicial engloba consumidores de jogos, filmes e/ou séries, com o intuito de lhes proporcionar a oportunidade de partilharem experiências, bem como manterem um registo das mesmas. Com a informação adquirida, o sistema reúne condições para proceder a sugestões direcionadas a cada membro da comunidade. O sistema atualiza a sua informação mediante as ações e os dados fornecidos pelos membros, bem como pelo seu feedback às sugestões. Esta aprendizagem ao longo do tempo permite que as sugestões do sistema evoluam juntamente com a mudança de preferência dos membros ou se autocorrijam. O sistema toma iniciativa de sugerir mediante determinadas ações, mas também pode ser invocada uma sugestão diretamente pelo utilizador, na medida em que este não precisa de esperar por sugestões, podendo pedir ao sistema que as forneça num determinado momento. Nos testes realizados foi possível apurar que o sistema de recomendação desenvolvido forneceu sugestões adequadas a cada utilizador específico, tomando em linha de conta as suas ações prévias. Para além deste facto, o sistema não forneceu qualquer sugestão quando o histórico destas tinha provado incomodar o utilizador.
Resumo:
Esta dissertação incide sobre o estudo e análise de uma solução para a criação de um sistema de recomendação para uma comunidade de consumidores de media e no consequente desenvolvimento da mesma cujo âmbito inicial engloba consumidores de jogos, filmes e/ou séries, com o intuito de lhes proporcionar a oportunidade de partilharem experiências, bem como manterem um registo das mesmas. Com a informação adquirida, o sistema reúne condições para proceder a sugestões direccionadas a cada membro da comunidade. O sistema actualiza a sua informação mediante as acções e os dados fornecidos pelos membros, bem como pelo seu feedback às sugestões. Esta aprendizagem ao longo do tempo permite que as sugestões do sistema evoluam juntamente com a mudança de preferência dos membros ou se autocorrijam. O sistema toma iniciativa de sugerir mediante determinadas acções, mas também pode ser invocada uma sugestão directamente pelo utilizador, na medida em que este não precisa de esperar por sugestões, podendo pedir ao sistema que as forneça num determinado momento. Nos testes realizados foi possível apurar que o sistema de recomendação desenvolvido forneceu sugestões adequadas a cada utilizador específico, tomando em linha de conta as suas acções prévias. Para além deste facto, o sistema não forneceu qualquer sugestão quando o histórico destas tinha provado incomodar o utilizador.
Resumo:
This paper aims to identify the communication goal(s) of a user's information-seeking query out of a finite set of within-domain goals in natural language queries. It proposes using Tree-Augmented Naive Bayes networks (TANs) for goal detection. The problem is formulated as N binary decisions, and each is performed by a TAN. Comparative study has been carried out to compare the performance with Naive Bayes, fully-connected TANs, and multi-layer neural networks. Experimental results show that TANs consistently give better results when tested on the ATIS and DARPA Communicator corpora.
Resumo:
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed-crop competitiveness using as input categorical variables for the total density of weeds and corresponding proportions of narrow and broad-leaved weeds. The inferred categorical variable values for the weed-crop competitiveness along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage are then used as input for a second Bayesian network classifier to infer categorical variables values for the risk of infestation. Weed biomass and yield loss data samples are used to learn the probability relationship among the nodes of the first and second Bayesian classifiers in a supervised fashion, respectively. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focused on the knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
This paper discusses the analysis of cases in which the inclusion or exclusion of a particular suspect, as a possible contributor to a DNA mixture, depends on the value of a variable (the number of contributors) that cannot be determined with certainty. It offers alternative ways to deal with such cases, including sensitivity analysis and object-oriented Bayesian networks, that separate uncertainty about the inclusion of the suspect from uncertainty about other variables. The paper presents a case study in which the value of DNA evidence varies radically depending on the number of contributors to a DNA mixture: if there are two contributors, the suspect is excluded; if there are three or more, the suspect is included; but the number of contributors cannot be determined with certainty. It shows how an object-oriented Bayesian network can accommodate and integrate varying perspectives on the unknown variable and how it can reduce the potential for bias by directing attention to relevant considerations and distinguishing different sources of uncertainty. It also discusses the challenge of presenting such evidence to lay audiences.
Resumo:
Internet access by wireless networks has grown considerably in recent years. However, these networks are vulnerable to security problems, especially those related to denial of service attacks. Intrusion Detection Systems(IDS)are widely used to improve network security, but comparison among the several existing approaches is not a trivial task. This paper proposes building a datasetfor evaluating IDS in wireless environments. The data were captured in a real, operating network. We conducted tests using traditional IDS and achieved great results, which showed the effectiveness of our proposed approach.
Resumo:
Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
Resumo:
We present a model of Bayesian network for continuous variables, where densities and conditional densities are estimated with B-spline MoPs. We use a novel approach to directly obtain conditional densities estimation using B-spline properties. In particular we implement naive Bayes and wrapper variables selection. Finally we apply our techniques to the problem of predicting neurons morphological variables from electrophysiological ones.
Resumo:
A novel approach, based on statistical mechanics, to analyze typical performance of optimum code-division multiple-access (CDMA) multiuser detectors is reviewed. A `black-box' view ot the basic CDMA channel is introduced, based on which the CDMA multiuser detection problem is regarded as a `learning-from-examples' problem of the `binary linear perceptron' in the neural network literature. Adopting Bayes framework, analysis of the performance of the optimum CDMA multiuser detectors is reduced to evaluation of the average of the cumulant generating function of a relevant posterior distribution. The evaluation of the average cumulant generating function is done, based on formal analogy with a similar calculation appearing in the spin glass theory in statistical mechanics, by making use of the replica method, a method developed in the spin glass theory.
Resumo:
Human land use tends to decrease the diversity of native plant species and facilitate the invasion and establishment of exotic ones. Such changes in land use and plant community composition usually have negative impacts on the assemblages of native herbivorous insects. Highly specialized herbivores are expected to be especially sensitive to land use intensification and the presence of exotic plant species because they are neither capable of consuming alternative plant species of the native flora nor exotic plant species. Therefore, higher levels of land use intensity might reduce the proportion of highly specialized herbivores, which ultimately would lead to changes in the specialization of interactions in plant-herbivore networks. This study investigates the community-wide effects of land use intensity on the degree of specialization of 72 plant-herbivore networks, including effects mediated by the increase in the proportion of exotic plant species. Contrary to our expectation, the net effect of land use intensity on network specialization was positive. However, this positive effect of land use intensity was partially canceled by an opposite effect of the proportion of exotic plant species on network specialization. When we analyzed networks composed exclusively of endophagous herbivores separately from those composed exclusively of exophagous herbivores, we found that only endophages showed a consistent change in network specialization at higher land use levels. Altogether, these results indicate that land use intensity is an important ecological driver of network specialization, by way of reducing the local host range of herbivore guilds with highly specialized feeding habits. However, because the effect of land use intensity is offset by an opposite effect owing to the proportion of exotic host species, the net effect of land use in a given herbivore assemblage will likely depend on the extent of the replacement of native host species with exotic ones.
Resumo:
Though introduced recently, complex networks research has grown steadily because of its potential to represent, characterize and model a wide range of intricate natural systems and phenomena. Because of the intrinsic complexity and systemic organization of life, complex networks provide a specially promising framework for systems biology investigation. The current article is an up-to-date review of the major developments related to the application of complex networks in biology, with special attention focused on the more recent literature. The main concepts and models of complex networks are presented and illustrated in an accessible fashion. Three main types of networks are covered: transcriptional regulatory networks, protein-protein interaction networks and metabolic networks. The key role of complex networks for systems biology is extensively illustrated by several of the papers reviewed.
Resumo:
PURPOSE: The main goal of this study was to develop and compare two different techniques for classification of specific types of corneal shapes when Zernike coefficients are used as inputs. A feed-forward artificial Neural Network (NN) and discriminant analysis (DA) techniques were used. METHODS: The inputs both for the NN and DA were the first 15 standard Zernike coefficients for 80 previously classified corneal elevation data files from an Eyesys System 2000 Videokeratograph (VK), installed at the Departamento de Oftalmologia of the Escola Paulista de Medicina, São Paulo. The NN had 5 output neurons which were associated with 5 typical corneal shapes: keratoconus, with-the-rule astigmatism, against-the-rule astigmatism, "regular" or "normal" shape and post-PRK. RESULTS: The NN and DA responses were statistically analyzed in terms of precision ([true positive+true negative]/total number of cases). Mean overall results for all cases for the NN and DA techniques were, respectively, 94% and 84.8%. CONCLUSION: Although we used a relatively small database, results obtained in the present study indicate that Zernike polynomials as descriptors of corneal shape may be a reliable parameter as input data for diagnostic automation of VK maps, using either NN or DA.
Resumo:
Fifty Bursa of Fabricius (BF) were examined by conventional optical microscopy and digital images were acquired and processed using Matlab® 6.5 software. The Artificial Neuronal Network (ANN) was generated using Neuroshell® Classifier software and the optical and digital data were compared. The ANN was able to make a comparable classification of digital and optical scores. The use of ANN was able to classify correctly the majority of the follicles, reaching sensibility and specificity of 89% and 96%, respectively. When the follicles were scored and grouped in a binary fashion the sensibility increased to 90% and obtained the maximum value for the specificity of 92%. These results demonstrate that the use of digital image analysis and ANN is a useful tool for the pathological classification of the BF lymphoid depletion. In addition it provides objective results that allow measuring the dimension of the error in the diagnosis and classification therefore making comparison between databases feasible.
Resumo:
This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A. under the receiver operating characteristics (ROC) curve. The A, result for the committee machine is compared with the A, results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier Tests are carried out using the student's t-distribution. The committee machine classifier outperforms the MLP SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A, values of the four methods are, in order 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.
Resumo:
Synchronization plays an important role in telecommunication systems, integrated circuits, and automation systems. Formerly, the masterslave synchronization strategy was used in the great majority of cases due to its reliability and simplicity. Recently, with the wireless networks development, and with the increase of the operation frequency of integrated circuits, the decentralized clock distribution strategies are gaining importance. Consequently, fully connected clock distribution systems with nodes composed of phase-locked loops (PLLs) appear as a convenient engineering solution. In this work, the stability of the synchronous state of these networks is studied in two relevant situations: when the node filters are first-order lag-lead low-pass or when the node filters are second-order low-pass. For first-order filters, the synchronous state of the network shows to be stable for any number of nodes. For second-order filter, there is a superior limit for the number of nodes, depending on the PLL parameters. Copyright (C) 2009 Atila Madureira Bueno et al.