Biblioteca Digital

In many application domains, data can be naturally represented as graphs. When an analytical solution to a given problem is infeasible, machine learning techniques can be a viable alternative. Classical machine learning techniques are defined for data represented in vector form. Recently, some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to vector form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computational cost and classification performance on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.

Assembling a consistent set of sentences in relational probabilistic logic with stochastic independence

Relevance: 100.00%

Abstract:

We examine the representation of judgements of stochastic independence in probabilistic logics. We focus on a relational logic where (i) judgements of stochastic independence are encoded by directed acyclic graphs, and (ii) probabilistic assessments are flexible in the sense that they are not required to specify a single probability measure. We discuss issues of knowledge representation and inference that arise from our particular combination of graphs, stochastic independence, logical formulas and probabilistic assessments. (C) 2007 Elsevier B.V. All rights reserved.

Approximate algorithms for credal networks with binary variables

Relevance: 100.00%

Abstract:

This paper presents a family of algorithms for approximate inference in credal networks (that is, models based on directed acyclic graphs and set-valued probabilities) that contain only binary variables. Such networks can represent incomplete or vague beliefs, lack of data, and disagreements among experts; they can also encode models based on belief functions and possibilistic measures. All algorithms for approximate inference in this paper rely on exact inferences in credal networks based on polytrees with binary variables, as these inferences have polynomial complexity. We are inspired by approximate algorithms for Bayesian networks; thus the Loopy 2U algorithm resembles Loopy Belief Propagation, while the Iterated Partial Evaluation and Structured Variational 2U algorithms are, respectively, based on Localized Partial Evaluation and variational techniques. (C) 2007 Elsevier Inc. All rights reserved.

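Subjective measures and epidemiology: methodological problems related to the use of psychometric techniques (original title: Mesures subjectives et épidémiologie : problèmes méthodologiques liés à l'utilisation des techniques psychométriques)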

Relevance: 100.00%

Abstract:

The use of subjective measures in epidemiology has intensified recently, notably with the increasingly asserted desire to integrate subjects' perception of their own health into the study of diseases and the evaluation of interventions. Psychometrics comprises the statistical methods used to construct questionnaires and to analyse the data they produce. The aim of this thesis was to explore several methodological problems raised by the use of psychometric techniques in epidemiology. Three empirical studies are presented, concerning (1) the validation phase of the instrument: the objective was to develop, using simulated data, a sample-size calculation tool for scale validation in psychiatry; (2) the mathematical properties of the resulting measure: the objective was to compare the performance of the minimal clinically important difference of a questionnaire computed on cohort data, either within classical test theory (CTT) or within item response theory (IRT); (3) its use in a longitudinal design: the objective was to compare, using simulated data, the performance of a statistical method for analysing the longitudinal evolution of a subjective phenomenon measured with CTT or IRT, in particular when some of the items available for measurement differed at each time point. Finally, directed acyclic graphs were used to discuss, in light of the results of these three studies, the notion of information bias when subjective measures are used in epidemiology.

A feasible theory of truth over combinatory algebra

Relevance: 100.00%

Abstract:

We define an applicative theory of truth TPT which proves totality exactly for the polynomial time computable functions. TPT has natural and simple axioms since nearly all its truth axioms are standard for truth theories over an applicative framework. The only exception is the axiom dealing with the word predicate. The truth predicate can only reflect elementhood in the words for terms that have smaller length than a given word. This makes it possible to achieve the very low proof-theoretic strength. Truth induction can be allowed without any constraints. For these reasons the system TPT has the high expressive power one expects from truth theories. It allows embeddings of feasible systems of explicit mathematics and bounded arithmetic. The proof that the theory TPT is feasible is not easy. It is not possible to apply a standard realisation approach. For this reason we develop a new realisation approach whose realisation functions work on directed acyclic graphs. In this way, we can express and manipulate realisation information more efficiently.

Learning an L1-regularized Gaussian Bayesian Network in the Equivalence Class Space

Relevance: 100.00%

Abstract:

Learning the structure of a graphical model from data is a common task in a wide range of practical applications. In this paper, we focus on Gaussian Bayesian networks, i.e., on continuous data and directed acyclic graphs with a joint probability density of all variables given by a Gaussian. We propose to work in an equivalence class search space, specifically using the k-greedy equivalence search algorithm. This, combined with regularization techniques to guide the structure search, can learn sparse networks close to the one that generated the data. We provide results on some synthetic networks and on modeling the gene network of the two biological pathways regulating the biosynthesis of isoprenoids in the plant Arabidopsis thaliana.
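
For orientation, a Gaussian Bayesian network over a directed acyclic graph can be written in the standard linear-Gaussian form below. The symbols (\mu_i, \beta_{ij}, \sigma_i^2, \lambda) are notation assumed here for illustration and do not come from the abstract. The penalized score in the second line is only one common way to express an L1 penalty on the edge weights; it is an assumption, not necessarily the exact objective used in the paper.

```latex
% Joint density factorizes over the DAG: each variable is Gaussian given its parents Pa(i).
p(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n}
  \mathcal{N}\!\Big(x_i \;\Big|\; \mu_i + \sum_{j \in \mathrm{Pa}(i)} \beta_{ij}\,(x_j - \mu_j),\; \sigma_i^2\Big)

% Assumed illustrative score: log-likelihood of the data D minus an L1 penalty on edge weights.
\mathrm{score}(G,\theta) \;=\; \log p(D \mid G,\theta) \;-\; \lambda \sum_{i=1}^{n} \sum_{j \in \mathrm{Pa}(i)} \lvert \beta_{ij} \rvert
```

Under this form, driving a coefficient \beta_{ij} to zero is equivalent to removing the edge from j to i, which is why an L1 penalty favours sparse networks.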

Bayesian nonparametric latent variable models

Relevance: 100.00%

Abstract:

One of the important problems in machine learning is determining the complexity of the model to learn. Too much complexity leads to overfitting, which amounts to finding structures that do not really exist in the data, whereas too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity takes the form of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning hidden-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications.
Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. The evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.

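Culture and leadership in the export-led growth strategy of the Japanese development model (1960-1980) (original title: La cultura y el liderazgo en la estrategia de crecimiento por exportaciones del modelo de desarrollo japonés (1960-1980))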

Relevance: 100.00%

Abstract:

A country's development has usually been explained from a traditional perspective in terms of its economic growth, taking into account macroeconomic indicators such as GDP, inflation and unemployment. Little attention has been paid to the importance of human capital and the leadership process for a country's development. For this reason, this case study seeks to understand the success of Japan's export-led growth strategy between 1960 and 1980 with these aspects in mind. It thus seeks to argue that the incorporation of a transformational-transactional type of leadership and elements of Japanese culture such as Confucianism and Buddhism gave a non-economistic perspective to the success of the development model as part of the business-state-university triad. This is done through a qualitative analysis with a focus on international political economy and on leadership, the latter studied from the disciplines of management, sociology and psychology.

Parallel algorithms for maximal cliques in circle graphs and unrestricted depth search

Relevance: 100.00%

Abstract:

We present parallel algorithms on the BSP/CGM model, with p processors, to count and generate all the maximal cliques of a circle graph with n vertices and m edges. To count the number of all the maximal cliques, without actually generating them, our algorithm requires O(log p) communication rounds with O(nm/p) local computation time. We also present an algorithm to generate the first maximal clique in O(log p) communication rounds with O(nm/p) local computation, and to generate each one of the subsequent maximal cliques this algorithm requires O(log p) communication rounds with O(m/p) local computation. The maximal cliques generation algorithm is based on generating all maximal paths in a directed acyclic graph, and we present an algorithm for this problem that uses O(log p) communication rounds with O(m/p) local computation for each maximal path. We also show that the presented algorithms can be extended to the CREW PRAM model.
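
The subproblem named in this abstract, generating all maximal paths in a directed acyclic graph, can be illustrated with a minimal sequential sketch. This is not the BSP/CGM parallel algorithm of the paper; it is a plain depth-first enumeration, and the function name `maximal_paths` and the edge-list input format are assumptions made here for illustration. It relies on the fact that in a DAG a path is maximal exactly when it starts at a vertex with no incoming edge and ends at a vertex with no outgoing edge.

```python
from collections import defaultdict


def maximal_paths(edges, vertices=None):
    """Yield every maximal (source-to-sink) path of a DAG.

    edges    -- iterable of (u, v) pairs, each a directed edge u -> v
    vertices -- optional iterable of extra vertices (e.g. isolated ones)
    The input is assumed to be acyclic; cycles are not detected.
    """
    successors = defaultdict(list)
    has_predecessor = set()
    nodes = set(vertices or ())
    for u, v in edges:
        successors[u].append(v)
        has_predecessor.add(v)
        nodes.update((u, v))

    # A maximal path must start at a vertex with no incoming edge.
    sources = sorted(n for n in nodes if n not in has_predecessor)

    def extend(path):
        nexts = successors.get(path[-1], [])
        if not nexts:
            # No outgoing edge: the path cannot be extended, so it is maximal.
            yield list(path)
            return
        for nxt in nexts:
            path.append(nxt)
            yield from extend(path)
            path.pop()

    for source in sources:
        yield from extend([source])


if __name__ == "__main__":
    diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    for p in maximal_paths(diamond):
        print(" -> ".join(p))
```

The number of maximal paths can be exponential in the number of vertices, so the sketch uses a generator and produces one path at a time, in the same spirit as the per-path cost stated in the abstract.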

Directed Abelian algebras and their application to stochastic models

Relevance: 90.00%

Abstract:

With each directed acyclic graph (this includes some D-dimensional lattices) one can associate some Abelian algebras that we call directed Abelian algebras (DAAs). On each site of the graph one attaches a generator of the algebra. These algebras depend on several parameters and are semisimple. Using any DAA, one can define a family of Hamiltonians which give the continuous time evolution of a stochastic process. The calculation of the spectra and ground-state wave functions (stationary state probability distributions) is an easy algebraic exercise. If one considers D-dimensional lattices and chooses Hamiltonians linear in the generators, in finite-size scaling the Hamiltonian spectrum is gapless with a critical dynamic exponent z=D. One possible application of the DAA is to sandpile models. In the paper we present this application, considering one- and two-dimensional lattices. In the one-dimensional case, when the DAA conserves the number of particles, the avalanches belong to the random walker universality class (critical exponent sigma(tau)=3/2). We study the local density of particles inside large avalanches, showing a depletion of particles at the source of the avalanche and an enrichment at its end. In two dimensions we did extensive Monte-Carlo simulations and found sigma(tau)=1.780 +/- 0.005.

938 results for Directed acyclic graphs
