11 results for Dirichlet-multinomial

at Universidad Politécnica de Madrid


Relevance:

10.00% 10.00%

Abstract:

Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. It consists of determining the Markov blanket around each class variable using the HITON algorithm and then specifying the directionality over the MBC subgraphs. Our approach is applied to the problem of predicting the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson’s patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, on the Yeast data set, and on a real-world Parkinson’s disease data set containing 488 patients. The experimental study, which includes comparisons with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables.
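As a reminder of what a Markov blanket is, the following minimal sketch reads it off a known directed acyclic graph: the parents, the children, and the other parents of the children of a node. It does not implement HITON, which estimates the blanket from data; the function name and the toy graph (class variables C1, C2 and features X1 to X3) are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents.
    The blanket is: parents, children, and the children's other parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]} - {node}
    return set(parents.get(node, set())) | children | spouses


# Hypothetical toy structure: X2 -> C1, C1 -> C2, C1 -> X1, C2 -> X1, X2 -> X3.
toy_dag = {
    "C1": {"X2"},
    "C2": {"C1"},
    "X1": {"C1", "C2"},
    "X2": set(),
    "X3": {"X2"},
}

blanket_c1 = markov_blanket(toy_dag, "C1")
```

In this toy graph X3 is not in the blanket of C1, because it is connected to C1 only through X2.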

Relevance:

10.00% 10.00%

Abstract:

The implementation of a charging policy for heavy goods vehicles in European Union (EU) member countries has been imposed to reflect the costs of construction and maintenance of infrastructure as well as externalities such as congestion, accidents and environmental impact. In this context, EU countries approved the Eurovignette directive (1999/62/EC) and its amending directive (2006/38/EC), which established a legal framework to regulate the system of tolls. Even if that regulation seeks to increase the efficiency of freight, it will trigger direct and indirect effects on Spain’s regional economies by increasing transport costs. This paper presents the development of a multiregional Input-Output (MRIO) methodology with elastic trade coefficients to predict interregional trade, using transport attributes integrated in multinomial logit models. This method is highly useful for carrying out an ex-ante evaluation of transport policies because it captures the cost sensitivity of road freight transport and determines the regional distributive and substitution economic effects in countries like Spain, whose socio-demographic and economic attributes differ from region to region. It will thus be possible to determine cost-effective strategies under different policy scenarios. The MRIO model is then used to determine the impact on the employment rate of imposing a charge in the Madrid-Sevilla corridor in Spain. This methodology is important for measuring that impact, since the employment rate is one of the main macroeconomic indicators of Spain’s regional and national economic situation. Previous research (DESTINO) using an MRIO method estimated the employment impacts of a road pricing policy across Spanish regions, considering a fuel tax charge (€/liter) over the entire shortest-cost path network for freight transport. It found that the variation in employment is expected to be substantial for some regions and negligible for others.
For example, that Spanish case study showed reductions in regional employment of between 16.1% (Rioja) and 1.4% (Madrid region). This range of variation seems to be related either to the intensity of freight transport in each region or to the dependency of regions on transport-intensive economic sectors. In fact, regions with freight-transport-intensive sectors will lose more jobs, while regions with a predominantly service economy undergo a fairly insignificant loss of employment. This paper focuses on evaluating a freight transport vehicle-kilometer charge (€/km) in a non-tolled motorway corridor (A-4) between Madrid and Sevilla (517 km). The consequences of implementing the road pricing policy show that the employment reductions are not as high as those reported in the previous research, because this corridor does not affect the whole freight transport system of Spain.
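The idea of trade coefficients that respond to transport cost through a multinomial logit can be illustrated with a minimal sketch. The abstract does not give the model specification, so the function name, the cost sensitivity `beta`, and the cost figures below are all hypothetical.

```python
import math


def logit_shares(costs, beta):
    """Multinomial logit shares for supplying regions given transport costs.

    share_i = exp(-beta * c_i) / sum_j exp(-beta * c_j)
    `beta` is a hypothetical cost-sensitivity parameter.
    """
    # Shift by the minimum cost before exponentiating for numerical stability.
    c_min = min(costs.values())
    weights = {o: math.exp(-beta * (c - c_min)) for o, c in costs.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


# Made-up transport costs from three supplying regions to one destination.
base_costs = {"A": 100.0, "B": 120.0, "C": 150.0}
# A distance-based charge raises the cost of the route from region A only.
charged_costs = dict(base_costs, A=115.0)

shares_before = logit_shares(base_costs, beta=0.05)
shares_after = logit_shares(charged_costs, beta=0.05)
```

With these made-up numbers, the charge lowers region A's share of the destination's purchases and raises the shares of B and C, which is the substitution effect the abstract refers to.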

Relevance:

10.00% 10.00%

Abstract:

Twitter lists organise Twitter users into multiple, often overlapping, sets. We believe that these lists capture some form of emergent semantics, which may be useful to characterise. In this paper we describe an approach for such characterisation, which consists of deriving semantic relations between lists and users by analysing the co-occurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on WordNet and to existing Linked Data sets. Results show that keyword co-occurrence based on the members of the lists produces more synonyms and results that correlate more closely with WordNet similarity measures.
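The vector space part of the approach can be sketched as follows, assuming each keyword is represented by the users who belong to lists whose names contain it, and that similarity is the cosine between those vectors. The data, the whitespace tokenisation and the function names are hypothetical; the Latent Dirichlet Allocation step is not shown.

```python
import math
from collections import Counter, defaultdict


def keyword_vectors(list_memberships):
    """Build one sparse vector per keyword, indexed by list members.

    `list_memberships` maps a list name (whitespace-separated keywords)
    to the users who are members of that list.
    """
    vectors = defaultdict(Counter)
    for name, members in list_memberships.items():
        for keyword in name.lower().split():
            for user in members:
                vectors[keyword][user] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up lists and members.
lists = {
    "tech news": ["alice", "bob"],
    "technology blogs": ["alice", "bob", "carol"],
    "cooking recipes": ["dave"],
}
vecs = keyword_vectors(lists)
sim_related = cosine(vecs["tech"], vecs["technology"])
sim_unrelated = cosine(vecs["tech"], vecs["cooking"])
```

Keywords that label lists with overlapping members score high, while keywords whose lists share no members score zero.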

Relevance:

10.00% 10.00%

Abstract:

This paper describes a novel approach to phonotactic LID in which, instead of using soft-counts based on phoneme lattices, we use posteriograms to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adopted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft-counts (Cavg on 30s: 3.15% vs 3.43%) and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: acoustic 2.4% vs 1.25%). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need for pruning techniques when creating the lattices.
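One simple way to obtain n-gram counts from a posteriogram is to accumulate, for each pair of consecutive frames, the product of the phoneme posteriors. The sketch below does this for bigrams. Treating consecutive frames as independent is an assumption made here for illustration, and the abstract does not state how the authors compute their counts; the function name and the toy posteriogram are hypothetical.

```python
from collections import defaultdict


def expected_bigram_counts(posteriogram):
    """Accumulate expected bigram counts from frame-level posteriors.

    `posteriogram` is a list of frames; each frame maps a phoneme label
    to its posterior probability (the values of a frame sum to 1).
    """
    counts = defaultdict(float)
    for frame, next_frame in zip(posteriogram, posteriogram[1:]):
        for a, p_a in frame.items():
            for b, p_b in next_frame.items():
                counts[(a, b)] += p_a * p_b
    return dict(counts)


# Made-up posteriogram with three frames over the phonemes "a" and "t".
toy_posteriogram = [
    {"a": 0.75, "t": 0.25},
    {"a": 0.5, "t": 0.5},
    {"a": 0.25, "t": 0.75},
]
bigram_counts = expected_bigram_counts(toy_posteriogram)
```

Because every frame sums to one, the counts over all bigrams add up to the number of frame transitions, so the result is a dense, fractional count vector rather than a sparse integer one.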

Relevance:

10.00% 10.00%

Abstract:

The Law for the Promotion and Development of Biofuels, adopted in Mexico in 2007, allows for the production of bioethanol and biodiesel. This production may conflict with food production and with natural ecosystems. This thesis develops a microeconometric model that can serve as a basis for anticipating such conflicts and for designing agricultural policy measures that enhance the compatibility of biofuel production with food production and the conservation of natural ecosystems.
Using a sample of farms in three states of Mexico (Hidalgo, Querétaro and Tamaulipas) and a mixed multinomial logit model, we estimate the elasticity of the area devoted to food crops with respect to changes in the economic margins of energy crops. We find that this elasticity is significant, and we show that its estimate is useful for anticipating changes in the area under food crops and forests. The impact of various scenarios for crop gross margins on farmers' decisions is assessed, and the model is shown to be useful for detecting long-term trends of change in the crop area, including forests.

Relevance:

10.00% 10.00%

Abstract:

Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from the available training data, and second, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; it can also be categorized as stationary or streaming depending on the characteristics of the data and the underlying rate of change. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we essentially used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data.
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as in the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged.
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective at detecting different kinds of drift from partially labeled data streams and also achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. The experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
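The drift monitoring described for the adaptive methods can be sketched with a basic Page-Hinkley test applied to a stream of scores. This is a generic textbook form of the test, configured to signal a drop in the mean (a falling average log-likelihood); the class name, the parameter values and the synthetic score stream are hypothetical and are not taken from the thesis.

```python
class PageHinkley:
    """Page-Hinkley test that signals a drop in the mean of a score."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_max = 0.0          # running maximum M_t of m_t

    def update(self, x):
        """Feed one observation; return True if a drop is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        return self.cum_max - self.cum > self.threshold


# Synthetic average log-likelihood scores: stable at -1.0, then a drop to -3.0.
scores = [-1.0] * 50 + [-3.0] * 50
detector = PageHinkley()
first_alarm = next(
    (i for i, score in enumerate(scores) if detector.update(score)), None
)
```

On this synthetic stream no alarm is raised during the stable phase, and the first alarm comes within a few observations of the drop at index 50. After an alarm, a locally adaptive method would update only the affected part of the model, whereas a globally adaptive one would relearn it.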

Relevance:

10.00% 10.00%

Abstract:

A general theory that describes the B.I.E. linear approximation in potential and elasticity problems is developed. A method to treat the Dirichlet condition at sharp vertices is presented. Although the study is developed for linear elements, its extension to higher-order interpolation is straightforward. Finally, a new direct assembly procedure for the global system of equations to be solved is shown.

Relevance:

10.00% 10.00%

Abstract:

This project deals with the numerical simulation of a dynamic phenomenon: the behavior of a wave transmitted along an elastic string of a musical instrument whose ends are anchored. The physical phenomenon is modeled by a hyperbolic partial differential equation in spatial and temporal variables, accompanied by Dirichlet boundary conditions at the ends and by initial conditions that start the process. Algorithms were then developed for the numerical method used (central and forward finite differences), and the approximate problem was programmed and analyzed for consistency, stability and convergence, yielding results in line with the analytical solution of the mathematical problem. Programming and output of results were carried out with Visual Studio 8.0, and the object programming with Visual Basic .NET.
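The kind of scheme described above can be sketched for the wave equation u_tt = c^2 u_xx on a string with fixed ends, using central differences in space and time and a Taylor-based first step for the initial velocity. The project itself was written in Visual Basic .NET; this Python version, its function name, and its parameter values are illustrative assumptions, not the project's code.

```python
import math


def simulate_string(n_points=51, length=1.0, c=1.0, courant=0.5, t_end=1.0):
    """Explicit finite-difference solution of u_tt = c^2 u_xx, u(0)=u(L)=0.

    Initial displacement is the first mode sin(pi x / L); initial velocity is 0.
    The scheme is stable for a Courant number c*dt/dx <= 1.
    """
    dx = length / (n_points - 1)
    dt = courant * dx / c
    steps = round(t_end / dt)
    r2 = courant ** 2
    x = [i * dx for i in range(n_points)]

    prev = [math.sin(math.pi * xi / length) for xi in x]
    prev[0] = prev[-1] = 0.0  # Dirichlet conditions at both ends

    # First time step from a Taylor expansion with zero initial velocity.
    curr = prev[:]
    for i in range(1, n_points - 1):
        curr[i] = prev[i] + 0.5 * r2 * (prev[i + 1] - 2 * prev[i] + prev[i - 1])

    # Remaining steps: central differences in both time and space.
    for _ in range(steps - 1):
        nxt = [0.0] * n_points  # end points stay at zero (Dirichlet)
        for i in range(1, n_points - 1):
            nxt[i] = (2 * curr[i] - prev[i]
                      + r2 * (curr[i + 1] - 2 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt

    return x, curr, steps * dt


x, u, t = simulate_string()
# Analytical solution for this initial condition: sin(pi x / L) cos(pi c t / L).
max_error = max(
    abs(ui - math.sin(math.pi * xi) * math.cos(math.pi * t))
    for xi, ui in zip(x, u)
)
```

Comparing against the analytical solution, as done with `max_error`, is the same kind of check the abstract reports for the project's results.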

Relevance:

10.00% 10.00%

Abstract:

This paper describes our participation in the RepLab 2014 reputation dimensions scenario. Our idea was to evaluate the best strategy for combining a machine learning classifier with a rule-based algorithm based on logical expressions of terms. Results show that our baseline experiment, using just Naive Bayes Multinomial with a term vector model representation of the tweet text, is ranked second among the runs from all participants in terms of accuracy.
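A multinomial Naive Bayes classifier over a term-count representation can be sketched in a few lines. This is a generic version with add-one smoothing and whitespace tokenisation, not the system submitted to RepLab; the class name, the toy texts and the labels are made up.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes on term counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (1.0 = Laplace smoothing)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Terms never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log term likelihoods.
            score = math.log(n_label / n_docs)
            for t in tokens:
                score += math.log((self.term_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training texts with hypothetical labels.
train_texts = [
    "great service and friendly staff",
    "terrible service and rude staff",
    "friendly and great product",
    "rude support terrible product",
]
train_labels = ["positive", "negative", "positive", "negative"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("great friendly staff")
```

Working in log space avoids numerical underflow when a text contains many terms.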

Relevance:

10.00% 10.00%

Abstract:

This paper describes our participation in the PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and the age of a given user based on his/her texts, which could become part of the solution portfolio of the company. We were interested not in finding the best possible classifier, the one that achieves the highest accuracy, but in finding the optimum balance between performance and throughput, using the simplest strategy and the one least dependent on external systems. Results show that our software, using Naive Bayes Multinomial with a term vector model representation of the text, is ranked quite well among the participants in terms of accuracy.

Relevance:

10.00% 10.00%

Abstract:

This dissertation is concerned with the formulation, analysis and implementation of structure-preserving time integration methods for the solution of the initial(-boundary) value problems describing the dynamics of smooth dissipative systems, either finite- or infinite-dimensional. Such systems are understood as those involving thermo-mechanical coupling and/or internal dissipative effects modeled by internal state variables, considered smooth in the sense that their evolution follows continuous laws. The dynamics of such systems are ruled by the laws of thermodynamics and by symmetries, which constitute the structure meant to be preserved in the numerical setting. To that end, dissipative systems are geometrically described by metriplectic structures, which clearly identify the reversible and irreversible parts of their dynamical evolution. In particular, the framework known by the acronym GENERIC is used to reveal the systems' dissipative structure in the same way as the Hamiltonian structure is revealed for conservative systems. On that basis, energy-preserving, entropy-producing and momentum-preserving (EEM) second-order accurate methods are formulated using the discrete derivative operator that enabled the formulation of Energy-Momentum methods, which ensure the preservation of the Hamiltonian and symmetries for conservative systems. Following these guidelines, two kinds of EEM methods are formulated, using either entropy or temperature as the thermodynamic state variable, a choice with important implications that are discussed throughout the dissertation. Notably, the temperature-based formulation accommodates Dirichlet boundary conditions naturally. Finally, the EEM methods are validated and shown to exhibit enhanced numerical stability and robustness compared to standard methods.