988 resultados para Inference Technique
Resumo:
This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
Resumo:
Efficient new Bayesian inference technique is employed for studying critical properties of the Ising linear perceptron and for signal detection in code division multiple access (CDMA). The approach is based on a recently introduced message passing technique for densely connected systems. Here we study both critical and non-critical regimes. Results obtained in the non-critical regime give rise to a highly efficient signal detection algorithm in the context of CDMA; while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite size effects are also studied. © 2006 Elsevier B.V. All rights reserved.
Resumo:
Recommender systems are one of the recent inventions to deal with ever growing information overload in relation to the selection of goods and services in a global economy. Collaborative Filtering (CF) is one of the most popular techniques in recommender systems. The CF recommends items to a target user based on the preferences of a set of similar users known as the neighbours, generated from a database made up of the preferences of past users. With sufficient background information of item ratings, its performance is promising enough but research shows that it performs very poorly in a cold start situation where there is not enough previous rating data. As an alternative to ratings, trust between the users could be used to choose the neighbour for recommendation making. Better recommendations can be achieved using an inferred trust network which mimics the real world "friend of a friend" recommendations. To extend the boundaries of the neighbour, an effective trust inference technique is required. This thesis proposes a trust interference technique called Directed Series Parallel Graph (DSPG) which performs better than other popular trust inference algorithms such as TidalTrust and MoleTrust. Another problem is that reliable explicit trust data is not always available. In real life, people trust "word of mouth" recommendations made by people with similar interests. This is often assumed in the recommender system. By conducting a survey, we can confirm that interest similarity has a positive relationship with trust and this can be used to generate a trust network for recommendation. In this research, we also propose a new method called SimTrust for developing trust networks based on user's interest similarity in the absence of explicit trust data. To identify the interest similarity, we use user's personalised tagging information. However, we are interested in what resources the user chooses to tag, rather than the text of the tag applied. The commonalities of the resources being tagged by the users can be used to form the neighbours used in the automated recommender system. Our experimental results show that our proposed tag-similarity based method outperforms the traditional collaborative filtering approach which usually uses rating data.
Resumo:
We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMM) are obtained for each instrument. We propose a method to use the above models in a factorial framework, termed as Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution of each instrument in the mixture observation sequence. The dependencies are decoupled using variational inference technique. We show that the joint time evolution of all instruments' states can be captured using F-GM-HMM. We compare performance of proposed method with that of Student's-t mixture model (tMM) and GM-HMM in an existing latent variable framework. Experiments on two to five polyphony with 8 instrument models trained on the RWC dataset, tested on RWC and TRIOS datasets show that F-GM-HMM gives an advantage over the other considered models in segments containing co-occurring instruments.
Resumo:
We study the behavior of granular materials at three length scales. At the smallest length scale, the grain-scale, we study inter-particle forces and "force chains". Inter-particle forces are the natural building blocks of constitutive laws for granular materials. Force chains are a key signature of the heterogeneity of granular systems. Despite their fundamental importance for calibrating grain-scale numerical models and elucidating constitutive laws, inter-particle forces have not been fully quantified in natural granular materials. We present a numerical force inference technique for determining inter-particle forces from experimental data and apply the technique to two-dimensional and three-dimensional systems under quasi-static and dynamic load. These experiments validate the technique and provide insight into the quasi-static and dynamic behavior of granular materials.
At a larger length scale, the mesoscale, we study the emergent frictional behavior of a collection of grains. Properties of granular materials at this intermediate scale are crucial inputs for macro-scale continuum models. We derive friction laws for granular materials at the mesoscale by applying averaging techniques to grain-scale quantities. These laws portray the nature of steady-state frictional strength as a competition between steady-state dilation and grain-scale dissipation rates. The laws also directly link the rate of dilation to the non-steady-state frictional strength.
At the macro-scale, we investigate continuum modeling techniques capable of simulating the distinct solid-like, liquid-like, and gas-like behaviors exhibited by granular materials in a single computational domain. We propose a Smoothed Particle Hydrodynamics (SPH) approach for granular materials with a viscoplastic constitutive law. The constitutive law uses a rate-dependent and dilation-dependent friction law. We provide a theoretical basis for a dilation-dependent friction law using similar analysis to that performed at the mesoscale. We provide several qualitative and quantitative validations of the technique and discuss ongoing work aiming to couple the granular flow with gas and fluid flows.
Resumo:
In many real world situations, we make decisions in the presence of multiple, often conflicting and non-commensurate objectives. The process of optimizing systematically and simultaneously over a set of objective functions is known as multi-objective optimization. In multi-objective optimization, we have a (possibly exponentially large) set of decisions and each decision has a set of alternatives. Each alternative depends on the state of the world, and is evaluated with respect to a number of criteria. In this thesis, we consider the decision making problems in two scenarios. In the first scenario, the current state of the world, under which the decisions are to be made, is known in advance. In the second scenario, the current state of the world is unknown at the time of making decisions. For decision making under certainty, we consider the framework of multiobjective constraint optimization and focus on extending the algorithms to solve these models to the case where there are additional trade-offs. We focus especially on branch-and-bound algorithms that use a mini-buckets algorithm for generating the upper bound at each node of the search tree (in the context of maximizing values of objectives). Since the size of the guiding upper bound sets can become very large during the search, we introduce efficient methods for reducing these sets, yet still maintaining the upper bound property. We define a formalism for imprecise trade-offs, which allows the decision maker during the elicitation stage, to specify a preference for one multi-objective utility vector over another, and use such preferences to infer other preferences. The induced preference relation then is used to eliminate the dominated utility vectors during the computation. For testing the dominance between multi-objective utility vectors, we present three different approaches. The first is based on a linear programming approach, the second is by use of distance-based algorithm (which uses a measure of the distance between a point and a convex cone); the third approach makes use of a matrix multiplication, which results in much faster dominance checks with respect to the preference relation induced by the trade-offs. Furthermore, we show that our trade-offs approach, which is based on a preference inference technique, can also be given an alternative semantics based on the well known Multi-Attribute Utility Theory. Our comprehensive experimental results on common multi-objective constraint optimization benchmarks demonstrate that the proposed enhancements allow the algorithms to scale up to much larger problems than before. For decision making problems under uncertainty, we describe multi-objective influence diagrams, based on a set of p objectives, where utility values are vectors in Rp, and are typically only partially ordered. These can be solved by a variable elimination algorithm, leading to a set of maximal values of expected utility. If the Pareto ordering is used this set can often be prohibitively large. We consider approximate representations of the Pareto set based on ϵ-coverings, allowing much larger problems to be solved. In addition, we define a method for incorporating user trade-offs, which also greatly improves the efficiency.
Resumo:
The assimilation of discrete higher fidelity data points with model predictions can be used to achieve a reduction in the uncertainty of the model input parameters which generate accurate predictions. The problem investigated here involves the prediction of limit-cycle oscillations using a High-Dimensional Harmonic Balance method (HDHB). The efficiency of the HDHB method is exploited to enable calibration of structural input parameters using a Bayesian inference technique. Markov-chain Monte Carlo is employed to sample the posterior distributions. Parameter estimation is carried out on both a pitch/plunge aerofoil and Goland wing configuration. In both cases significant refinement was achieved in the distribution of possible structural parameters allowing better predictions of their
true deterministic values.
Resumo:
Les données provenant de l'échantillonnage fin d'un processus continu (champ aléatoire) peuvent être représentées sous forme d'images. Un test statistique permettant de détecter une différence entre deux images peut être vu comme un ensemble de tests où chaque pixel est comparé au pixel correspondant de l'autre image. On utilise alors une méthode de contrôle de l'erreur de type I au niveau de l'ensemble de tests, comme la correction de Bonferroni ou le contrôle du taux de faux-positifs (FDR). Des méthodes d'analyse de données ont été développées en imagerie médicale, principalement par Keith Worsley, utilisant la géométrie des champs aléatoires afin de construire un test statistique global sur une image entière. Il s'agit d'utiliser l'espérance de la caractéristique d'Euler de l'ensemble d'excursion du champ aléatoire sous-jacent à l'échantillon au-delà d'un seuil donné, pour déterminer la probabilité que le champ aléatoire dépasse ce même seuil sous l'hypothèse nulle (inférence topologique). Nous exposons quelques notions portant sur les champs aléatoires, en particulier l'isotropie (la fonction de covariance entre deux points du champ dépend seulement de la distance qui les sépare). Nous discutons de deux méthodes pour l'analyse des champs anisotropes. La première consiste à déformer le champ puis à utiliser les volumes intrinsèques et les compacités de la caractéristique d'Euler. La seconde utilise plutôt les courbures de Lipschitz-Killing. Nous faisons ensuite une étude de niveau et de puissance de l'inférence topologique en comparaison avec la correction de Bonferroni. Finalement, nous utilisons l'inférence topologique pour décrire l'évolution du changement climatique sur le territoire du Québec entre 1991 et 2100, en utilisant des données de température simulées et publiées par l'Équipe Simulations climatiques d'Ouranos selon le modèle régional canadien du climat.
Resumo:
A partir de uma adaptação da metodologia de Osler e Chang (1995), este trabalho avalia, empiricamente, a lucratividade de estratégias de investimento baseadas na identificação do padrão gráfico de Análise Técnica Ombro-Cabeça-Ombro no mercado de ações brasileiro. Para isso, foram definidas diversas estratégias de investimento condicionais à identificação de padrões Ombro-Cabeça- Ombro (em suas formas padrão e invertida), por um algoritmo computadorizado, em séries diárias de preços de 47 ações no período de janeiro de 1994 a agosto de 2006. Para testar o poder de previsão de cada estratégia, foram construídos intervalos de confiança, a partir da técnica Bootstrap de inferência amostral, consistentes com a hipótese nula de que, baseado apenas em dados históricos, não é possível criar estratégias com retornos positivos. Mais especificamente, os retornos médios obtidos por cada estratégia nas séries de preços das ações, foram comparados àqueles obtidos pelas mesmas estratégias aplicadas a 1.000 séries de preços artificiais - para cada ação - geradas segundo dois modelos de preços de ações largamente utilizados: Random Walk e E-GARCH. De forma geral, os resultados encontrados mostram que é possível criar estratégias condicionais à realização dos padrões Ombro- Cabeça-Ombro com retornos positivos, indicando que esses padrões conseguem capturar nas séries históricas de preços de ações sinais a respeito da sua movimentação futura de preços, que não são explicados nem por um Random Walk e nem por um E-GARCH. No entanto, se levados em consideração os efeitos das taxas e dos custos de transação, dependendo das suas magnitudes, essas conclusões somente se mantêm para o padrão na sua forma invertida
Resumo:
El objetivo de esta Tesis es crear un Modelo de Diseño Orientado a Marcos que, intermedio entre el Mundo Externo y el Modelo Interno del Mundo que supone el sistema ímplementado, disminuya la pérdida de conocimiento que se produce al formalizar la realidad en Bases de Conocimientos. El modelo disminuye la pérdida de conocimiento al formalizar Bases de Conocimiento, acercando el formalismo de Marcos al Mundo Externo, porque: 1. Crea una base teórica que uniformiza el concepto de Marco en el plano de la Formalización, estableciendo un conjunto de restricciones sintácticas y semánticas que impedirán, al Ingeniero del Conocimiento (IC) cuando formaliza, definir elementos no permitidos o el uso indebido de ellos. 2. Se incrementa la expresividad del formalismo al asociar a cada una de las propiedades de un marco clase un parámetro adicional que simboliza la representatividad de la propiedad en el concepto. Este parámetro, y las técnicas de inferencia que trabajan con él, permitirán al IC introducir en el Modelo Formalizado conocimiento que antes no introducía al construir la base de conocimientos y que, sin embargo, sí existía en la realidad. 3. Se propone una técnica de equiparación que trabaja con el conocimiento incierto presente en el dominio. Esta técnica de equiparación, utiliza la representatividad de las propiedades en los marcos clase y el grado de certeza de las propiedades de las entidades para calcular el valor de equiparación y, así, determinar en qué medida los marcos clase seleccionados son consistentes con la descripción de la situación actual dada por una entidad. 4. Proporciona nuevas técnicas de inferencia basadas en la transferencia de propiedades y modifica las ya existentes. Las transferencias de propiedades realizadas sobre relaciones "ad hoc" definidas por el IC al construir el sistema, es una nueva técnica de inferencia independiente y complementaria a la transferencia de propiedades llamada tradicionalmente Herencia (cesión de propiedades entre padres e hijos). A esta nueva técnica, se le ha llamado Donación, es decir, cesión de propiedades entre marcos sin parentesco. Como aportación práctica, se ha construido un entorno de construcción de Sistemas Basados en el Conocimiento formalizados en Marcos, donde se han introducido todos los nuevos conceptos del Modelo Teórico de la Tesis. Se trata de una cierta anidación. Es decir, son marcos que permiten formalizar cualquier SBC en marcos. El entorno permitirá al IC formalizar bases de conocimientos automáticamente y éste podrá validar el conocimiento del dominio en la fase de formalización en lugar de tener que esperar a que la BC esté implementada. Todo ello lleva a describir el Modelo de Diseño Orientado a Marcos como un puente que aproxima y comunica el Mundo Externo con el Modelo Interno asociado a la realidad e implementado en una computadora, disminuyendo así las diversas pérdidas de conocimiento que si bien no ocurren simultáneamente al construir Sistemas Basados en el Conocimiento, sí coexisten en él.---ABSTRACT---The goal of this thesis is to créate a Frame-Orlented Deslgn Model that, bridging the Outside World and the implemented system's Internal Model of the World, reduces the amount of knowledge lost when reality is formalized in Knowledge Bases (KB). The model diminishes the loss of knowledge when formalizing a KB and brings the Frame-formalized Model closer to the Outside World because: 1. It creates a theory that standardizes the concept of trame at the formalization level to establish a set of syntactic and semantic constraints that will prevent the Knowledge Engineer (KE) from defining forbidden elements or their undue use in the formalization process. 2. The formalism's expressiveness is increased by associating an additional parameter to each of the properties of a class frame to symbolize the representativeness of the concept property. This parameter and the related inference techniques will allow the KE to enter knowledge into the Formalized Model that actually existed but that was not used previously when building the KB. 3. The proposed technique involves matching and works with uncertain knowledge present in the domain. This matching technique takes the representativeness of the properties in the class frame and the degree of certainty of the properties of the entities to calcúlate the matching valué and thus determine to what extent the class frames selected are consistent with the description of the present situation given by an entity. 4. It offers new inference techniques based on property transfer and alters existing ones. Property transfer on ad hoc relations defined by the KE when building a system is a new inference technique independent of and complementary to property transfer traditionally termed Inheritance (transfer of properties between parents and children). This new technique has been callad Donation (transfer of properties between trames without relationships). 5. It improves control of the procedural knowledge defined in the trames by introducing OO concepta. A frame-formalized KBS building environment has been constructed, incorporating all the new concepts of the theoretical model set out in the thesis. There is some embedding, that is, they are trames that provide for any KBS to be formalizad in trames. The environment will enable the KE to formaliza KB automatically, and he will be able to valídate the domain knowledge in the formalization stage instead of havíng to wait until the KB has been implemented. This is a description of the Frame-oriented Design Model, a bridge that brings closer and communicates the Outside World with the Interna! Model associated to reality and implemented on a computar, thus reducing the different losses in knowledge that, though they do not occur simultaneosly when building a Knowledge-based System, coexist within it.
Resumo:
Supply chain formation (SCF) is the process of determining the set of participants and exchange relationships within a network with the goal of setting up a supply chain that meets some predefined social objective. Many proposed solutions for the SCF problem rely on centralized computation, which presents a single point of failure and can also lead to problems with scalability. Decentralized techniques that aid supply chain emergence offer a more robust and scalable approach by allowing participants to deliberate between themselves about the structure of the optimal supply chain. Current decentralized supply chain emergence mechanisms are only able to deal with simplistic scenarios in which goods are produced and traded in single units only and without taking into account production capacities or input-output ratios other than 1:1. In this paper, we demonstrate the performance of a graphical inference technique, max-sum loopy belief propagation (LBP), in a complex multiunit unit supply chain emergence scenario which models additional constraints such as production capacities and input-to-output ratios. We also provide results demonstrating the performance of LBP in dynamic environments, where the properties and composition of participants are altered as the algorithm is running. Our results suggest that max-sum LBP produces consistently strong solutions on a variety of network structures in a multiunit problem scenario, and that performance tends not to be affected by on-the-fly changes to the properties or composition of participants.
Resumo:
A class of intelligent systems located on anthropocentric objects that provide a crew with recommendations on the anthropocentric object's rational behavior in typical situations of operation is considered. We refer to this class of intelligent systems as onboard real-time advisory expert systems. Here, we present a formal model of the object domain, procedures for obtaining knowledge about the object domain, and a semantic structure of basic functional units of the onboard real-time advisory expert systems of typical situations. The stages of the development and improvement of knowledge bases for onboard real-time advisory expert systems of typical situations that are important in practice are considered.
Resumo:
We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference. ABC methods are useful for posterior inference in the presence of an intractable likelihood function. In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning. This methodological development was motivated by an application involving data on macroparasite population evolution modelled by a trivariate stochastic process for which there is no tractable likelihood function. The auxiliary model here is based on a beta–binomial distribution. The main objective of the analysis is to determine which parameters of the stochastic model are estimable from the observed data on mature parasite worms.
Resumo:
Prediction of variable bit rate compressed video traffic is critical to dynamic allocation of resources in a network. In this paper, we propose a technique for preprocessing the dataset used for training a video traffic predictor. The technique involves identifying the noisy instances in the data using a fuzzy inference system. We focus on three prediction techniques, namely, linear regression, neural network and support vector regression and analyze their performance on H.264 video traces. Our experimental results reveal that data preprocessing greatly improves the performance of linear regression and neural network, but is not effective on support vector regression.
Resumo:
Inference of molecular function of proteins is the fundamental task in the quest for understanding cellular processes. The task is getting increasingly difficult with thousands of new proteins discovered each day. The difficulty arises primarily due to lack of high-throughput experimental technique for assessing protein molecular function, a lacunae that computational approaches are trying hard to fill. The latter too faces a major bottleneck in absence of clear evidence based on evolutionary information. Here we propose a de novo approach to annotate protein molecular function through structural dynamics match for a pair of segments from two dissimilar proteins, which may share even <10% sequence identity. To screen these matches, corresponding 1 mu s coarse-grained (CG) molecular dynamics trajectories were used to compute normalized root-mean-square-fluctuation graphs and select mobile segments, which were, thereafter, matched for all pairs using unweighted three-dimensional autocorrelation vectors. Our in-house custom-built forcefield (FF), extensively validated against dynamics information obtained from experimental nuclear magnetic resonance data, was specifically used to generate the CG dynamics trajectories. The test for correspondence of dynamics-signature of protein segments and function revealed 87% true positive rate and 93.5% true negative rate, on a dataset of 60 experimentally validated proteins, including moonlighting proteins and those with novel functional motifs. A random test against 315 unique fold/function proteins for a negative test gave >99% true recall. A blind prediction on a novel protein appears consistent with additional evidences retrieved therein. This is the first proof-of-principle of generalized use of structural dynamics for inferring protein molecular function leveraging our custom-made CG FF, useful to all. (C) 2014 Wiley Periodicals, Inc.