964 resultados para graph-based regularization
Resumo:
The advancement of science and technology makes it clear that no single perspective is any longer sufficient to describe the true nature of any phenomenon. That is why the interdisciplinary research is gaining more attention overtime. An excellent example of this type of research is natural computing which stands on the borderline between biology and computer science. The contribution of research done in natural computing is twofold: on one hand, it sheds light into how nature works and how it processes information and, on the other hand, it provides some guidelines on how to design bio-inspired technologies. The first direction in this thesis focuses on a nature-inspired process called gene assembly in ciliates. The second one studies reaction systems, as a modeling framework with its rationale built upon the biochemical interactions happening within a cell. The process of gene assembly in ciliates has attracted a lot of attention as a research topic in the past 15 years. Two main modelling frameworks have been initially proposed in the end of 1990s to capture ciliates’ gene assembly process, namely the intermolecular model and the intramolecular model. They were followed by other model proposals such as templatebased assembly and DNA rearrangement pathways recombination models. In this thesis we are interested in a variation of the intramolecular model called simple gene assembly model, which focuses on the simplest possible folds in the assembly process. We propose a new framework called directed overlap-inclusion (DOI) graphs to overcome the limitations that previously introduced models faced in capturing all the combinatorial details of the simple gene assembly process. We investigate a number of combinatorial properties of these graphs, including a necessary property in terms of forbidden induced subgraphs. We also introduce DOI graph-based rewriting rules that capture all the operations of the simple gene assembly model and prove that they are equivalent to the string-based formalization of the model. Reaction systems (RS) is another nature-inspired modeling framework that is studied in this thesis. Reaction systems’ rationale is based upon two main regulation mechanisms, facilitation and inhibition, which control the interactions between biochemical reactions. Reaction systems is a complementary modeling framework to traditional quantitative frameworks, focusing on explicit cause-effect relationships between reactions. The explicit formulation of facilitation and inhibition mechanisms behind reactions, as well as the focus on interactions between reactions (rather than dynamics of concentrations) makes their applicability potentially wide and useful beyond biological case studies. In this thesis, we construct a reaction system model corresponding to the heat shock response mechanism based on a novel concept of dominance graph that captures the competition on resources in the ODE model. We also introduce for RS various concepts inspired by biology, e.g., mass conservation, steady state, periodicity, etc., to do model checking of the reaction systems based models. We prove that the complexity of the decision problems related to these properties varies from P to NP- and coNP-complete to PSPACE-complete. We further focus on the mass conservation relation in an RS and introduce the conservation dependency graph to capture the relation between the species and also propose an algorithm to list the conserved sets of a given reaction system.
Resumo:
The Baltic Sea is a unique environment that contains unique genetic populations. In order to study these populations on a genetic level basic molecular research is needed. The aim of this thesis was to provide a basic genetic resource for population genomic studies by de novo assembling a transcriptome for the Baltic Sea isopod Idotea balthica. RNA was extracted from a whole single adult male isopod and sequenced using Illumina (125bp PE) RNA-Seq. The reads were preprocessed using FASTQC for quality control, TRIMMOMATIC for trimming, and RCORRECTOR for error correction. The preprocessed reads were then assembled with TRINITY, a de Bruijn graph-based assembler, using different k-mer sizes. The different assemblies were combined and clustered using CD-HIT. The assemblies were evaluated using TRANSRATE for quality and filtering, BUSCO for completeness, and TRANSDECODER for annotation potential. The 25-mer assembly was annotated using PANNZER (protein annotation with z-score) and BLASTX. The 25-mer assembly represents the best first draft assembly since it contains the most information. However, this assembly shows high levels of polymorphism, which currently cannot be differentiated as paralogs or allelic variants. Furthermore, this assembly is incomplete, which could be improved by sampling additional developmental stages.
Resumo:
Récemment, nous avons pu observer un intérêt grandissant pour l'application de l'analogie formelle à l'analyse morphologique. L'intérêt premier de ce concept repose sur ses parallèles avec le processus mental impliqué dans la création de nouveaux termes basée sur les relations morphologiques préexistantes de la langue. Toutefois, l'utilisation de ce concept reste tout de même marginale due notamment à son coût de calcul élevé.Dans ce document, nous présenterons le système à base de graphe Moranapho fondé sur l'analogie formelle. Nous démontrerons par notre participation au Morpho Challenge 2009 (Kurimo:10) et nos expériences subséquentes, que la qualité des analyses obtenues par ce système rivalise avec l'état de l'art. Nous analyserons aussi l'influence de certaines de ses composantes sur la qualité des analyses morphologiques produites. Nous appuierons les conclusions tirées de nos analyses sur des théories bien établies dans le domaine de la linguistique. Ceci nous permet donc de fournir certaines prédictions sur les succès et les échecs de notre système, lorsqu'appliqué à d'autres langues que celles testées au cours de nos expériences.
Resumo:
Malgré des progrès constants en termes de capacité de calcul, mémoire et quantité de données disponibles, les algorithmes d'apprentissage machine doivent se montrer efficaces dans l'utilisation de ces ressources. La minimisation des coûts est évidemment un facteur important, mais une autre motivation est la recherche de mécanismes d'apprentissage capables de reproduire le comportement d'êtres intelligents. Cette thèse aborde le problème de l'efficacité à travers plusieurs articles traitant d'algorithmes d'apprentissage variés : ce problème est vu non seulement du point de vue de l'efficacité computationnelle (temps de calcul et mémoire utilisés), mais aussi de celui de l'efficacité statistique (nombre d'exemples requis pour accomplir une tâche donnée). Une première contribution apportée par cette thèse est la mise en lumière d'inefficacités statistiques dans des algorithmes existants. Nous montrons ainsi que les arbres de décision généralisent mal pour certains types de tâches (chapitre 3), de même que les algorithmes classiques d'apprentissage semi-supervisé à base de graphe (chapitre 5), chacun étant affecté par une forme particulière de la malédiction de la dimensionalité. Pour une certaine classe de réseaux de neurones, appelés réseaux sommes-produits, nous montrons qu'il peut être exponentiellement moins efficace de représenter certaines fonctions par des réseaux à une seule couche cachée, comparé à des réseaux profonds (chapitre 4). Nos analyses permettent de mieux comprendre certains problèmes intrinsèques liés à ces algorithmes, et d'orienter la recherche dans des directions qui pourraient permettre de les résoudre. Nous identifions également des inefficacités computationnelles dans les algorithmes d'apprentissage semi-supervisé à base de graphe (chapitre 5), et dans l'apprentissage de mélanges de Gaussiennes en présence de valeurs manquantes (chapitre 6). Dans les deux cas, nous proposons de nouveaux algorithmes capables de traiter des ensembles de données significativement plus grands. Les deux derniers chapitres traitent de l'efficacité computationnelle sous un angle différent. Dans le chapitre 7, nous analysons de manière théorique un algorithme existant pour l'apprentissage efficace dans les machines de Boltzmann restreintes (la divergence contrastive), afin de mieux comprendre les raisons qui expliquent le succès de cet algorithme. Finalement, dans le chapitre 8 nous présentons une application de l'apprentissage machine dans le domaine des jeux vidéo, pour laquelle le problème de l'efficacité computationnelle est relié à des considérations d'ingénierie logicielle et matérielle, souvent ignorées en recherche mais ô combien importantes en pratique.
Resumo:
The assessment of routing protocols for mobile wireless networks is a difficult task, because of the networks` dynamic behavior and the absence of benchmarks. However, some of these networks, such as intermittent wireless sensors networks, periodic or cyclic networks, and some delay tolerant networks (DTNs), have more predictable dynamics, as the temporal variations in the network topology can be considered as deterministic, which may make them easier to study. Recently, a graph theoretic model-the evolving graphs-was proposed to help capture the dynamic behavior of such networks, in view of the construction of least cost routing and other algorithms. The algorithms and insights obtained through this model are theoretically very efficient and intriguing. However, there is no study about the use of such theoretical results into practical situations. Therefore, the objective of our work is to analyze the applicability of the evolving graph theory in the construction of efficient routing protocols in realistic scenarios. In this paper, we use the NS2 network simulator to first implement an evolving graph based routing protocol, and then to use it as a benchmark when comparing the four major ad hoc routing protocols (AODV, DSR, OLSR and DSDV). Interestingly, our experiments show that evolving graphs have the potential to be an effective and powerful tool in the development and analysis of algorithms for dynamic networks, with predictable dynamics at least. In order to make this model widely applicable, however, some practical issues still have to be addressed and incorporated into the model, like adaptive algorithms. We also discuss such issues in this paper, as a result of our experience.
Resumo:
The authors take a broad view that ultimately Grid- or Web-services must be located via personalised, semantic-rich discovery processes. They argue that such processes must rely on the storage of arbitrary metadata about services that originates from both service providers and service users. Examples of such metadata are reliability metrics, quality of service data, or semantic service description markup. This paper presents UDDI-MT, an extension to the standard UDDI service directory approach that supports the storage of such metadata via a tunnelling technique that ties the metadata store to the original UDDI directory. They also discuss the use of a rich, graph-based RDF query language for syntactic queries on this data. Finally, they analyse the performance of each of these contributions in our implementation.
Resumo:
We take a broad view that ultimately Grid- or Web-services must be located via personalised, semantic-rich discovery processes. We argue that such processes must rely on the storage of arbitrary metadata about services that originates from both service providers and service users. Examples of such metadata are reliability metrics, quality of service data, or semantic service description markup. This paper presents UDDI-MT, an extension to the standard UDDI service directory approach that supports the storage of such metadata via a tunnelling technique that ties the metadata store to the original UDDI directory. We also discuss the use of a rich, graph-based RDF query language for syntactic queries on this data. Finally, we analyse the performance of each of these contributions in our implementation.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
This paper addresses biometric identification using large databases, in particular, iris databases. In such applications, it is critical to have low response time, while maintaining an acceptable recognition rate. Thus, the trade-off between speed and accuracy must be evaluated for processing and recognition parts of an identification system. In this paper, a graph-based framework for pattern recognition, called Optimum-Path Forest (OPF), is utilized as a classifier in a pre-developed iris recognition system. The aim of this paper is to verify the effectiveness of OPF in the field of iris recognition, and its performance for various scale iris databases. The existing Gauss-Laguerre Wavelet based coding scheme is used for iris encoding. The performance of the OPF and two other - Hamming and Bayesian - classifiers, is compared using small, medium, and large-scale databases. Such a comparison shows that the OPF has faster response for large-scale databases, thus performing better than the more accurate, but slower, classifiers.
Resumo:
Concept drift is a problem of increasing importance in machine learning and data mining. Data sets under analysis are no longer only static databases, but also data streams in which concepts and data distributions may not be stable over time. However, most learning algorithms produced so far are based on the assumption that data comes from a fixed distribution, so they are not suitable to handle concept drifts. Moreover, some concept drifts applications requires fast response, which means an algorithm must always be (re) trained with the latest available data. But the process of labeling data is usually expensive and/or time consuming when compared to unlabeled data acquisition, thus only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are also based on the assumption that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenge in machine learning. Recently, a particle competition and cooperation approach was used to realize graph-based semi-supervised learning from static data. In this paper, we extend that approach to handle data streams and concept drift. The result is a passive algorithm using a single classifier, which naturally adapts to concept changes, without any explicit drift detection mechanism. Its built-in mechanisms provide a natural way of learning from new data, gradually forgetting older knowledge as older labeled data items became less influent on the classification of newer data items. Some computer simulation are presented, showing the effectiveness of the proposed method.
Resumo:
Majority of biometric researchers focus on the accuracy of matching using biometrics databases, including iris databases, while the scalability and speed issues have been neglected. In the applications such as identification in airports and borders, it is critical for the identification system to have low-time response. In this paper, a graph-based framework for pattern recognition, called Optimum-Path Forest (OPF), is utilized as a classifier in a pre-developed iris recognition system. The aim of this paper is to verify the effectiveness of OPF in the field of iris recognition, and its performance for various scale iris databases. This paper investigates several classifiers, which are widely used in iris recognition papers, and the response time along with accuracy. The existing Gauss-Laguerre Wavelet based iris coding scheme, which shows perfect discrimination with rotary Hamming distance classifier, is used for iris coding. The performance of classifiers is compared using small, medium, and large scale databases. Such comparison shows that OPF has faster response for large scale database, thus performing better than more accurate but slower Bayesian classifier.
Resumo:
The increase of computing power of the microcomputers has stimulated the building of direct manipulation interfaces that allow graphical representation of Linear Programming (LP) models. This work discusses the components of such a graphical interface as the basis for a system to assist users in the process of formulating LP problems. In essence, this work proposes a methodology which considers the modelling task as divided into three stages which are specification of the Data Model, the Conceptual Model and the LP Model. The necessity for using Artificial Intelligence techniques in the problem conceptualisation and to help the model formulation task is illustrated.