935 resultados para Random Walk Models
Resumo:
We tested the prediction that, if hoverflies are Batesian mimics, this may extend to behavioral mimicry such that their numerical abundance at each hour of the day (the daily activity pattern) is related to the numbers of their hymenopteran models. After accounting for site, season, microclimatic responses and for general hoverfly abundance at three sites in north-west England, the residual numbers of mimics were significantly correlated positively with their models 9 times out of 17, while 16 out of 17 relationships were positive, itself a highly significant non-random pattern. Several eristaline flies showed significant relationships with honeybees even though some of them mimic wasps or bumblebees, perhaps reflecting an ancestral resemblance to honeybees. There was no evidence that good and poor mimics differed in their daily activity pattern relationships with models. However, the common mimics showed significant activity pattern relationships with their models, but the rarer mimics did not. We conclude that many hoverflies show behavioral mimicry of their hymenopteran models.
Resumo:
Duchenne muscular dystrophy (DMD) is a neuromuscular disease caused by mutations in the dystrophin gene. DMD is clinically characterized by severe, progressive and irreversible loss of muscle function, in which most patients lose the ability to walk by their early teens and die by their early 20’s. Impaired intracellular calcium (Ca2+) regulation and activation of cell degradation pathways have been proposed as key contributors to DMD disease progression. This dissertation research consists of three studies investigating the role of intracellular Ca2+ in skeletal muscle dysfunction in different mouse models of DMD. Study one evaluated the role of Ca2+-activated enzymes (proteases) that activate protein degradation in excitation-contraction (E-C) coupling failure following repeated contractions in mdx and dystrophin-utrophin null (mdx/utr-/-) mice. Single muscle fibers from mdx/utr-/- mice had greater E-C coupling failure following repeated contractions compared to fibers from mdx mice. Moreover, protease inhibition during these contractions was sufficient to attenuate E-C coupling failure in muscle fibers from both mdx and mdx/utr-/- mice. Study two evaluated the effects of overexpressing the Ca2+ buffering protein sarcoplasmic/endoplasmic reticulum Ca2+-ATPase 1 (SERCA1) in skeletal muscles from mdx and mdx/utr-/- mice. Overall, SERCA1 overexpression decreased muscle damage and protected the muscle from contraction-induced injury in mdx and mdx/utr-/- mice. In study three, the cellular mechanisms underlying the beneficial effects of SERCA1 overexpression in mdx and mdx/utr-/- mice were investigated. SERCA1 overexpression attenuated calpain activation in mdx muscle only, while partially attenuating the degradation of the calpain target desmin in mdx/utr-/- mice. Additionally, SERCA1 overexpression decreased the SERCA-inhibitory protein sarcolipin in mdx muscle but did not alter levels of Ca2+ regulatory proteins (parvalbumin and calsequestrin) in either dystrophic model. Lastly, SERCA1 overexpression blunted the increase in endoplasmic reticulum stress markers Grp78/BiP in mdx mice and C/EBP homologous protein (CHOP) in mdx and mdx/utr-/- mice. Overall, findings from the studies presented in this dissertation provide new insight into the role of Ca2+ in muscle dysfunction and damage in different dystrophic mouse models. Further, these findings support the overall strategy for improving intracellular Ca2+ control for the development of novel therapies for DMD.
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Decoherence models for discrete-time quantum walks and their application to neutral atom experiments
Resumo:
We discuss decoherence in discrete-time quantum walks in terms of a phenomenological model that distinguishes spin and spatial decoherence. We identify the dominating mechanisms that affect quantum-walk experiments realized with neutral atoms walking in an optical lattice. From the measured spatial distributions, we determine with good precision the amount of decoherence per step, which provides a quantitative indication of the quality of our quantum walks. In particular, we find that spin decoherence is the main mechanism responsible for the loss of coherence in our experiment. We also find that the sole observation of ballistic-instead of diffusive-expansion in position space is not a good indicator of the range of coherent delocalization. We provide further physical insight by distinguishing the effects of short- and long-time spin dephasing mechanisms. We introduce the concept of coherence length in the discrete-time quantum walk, which quantifies the range of spatial coherences. Unexpectedly, we find that quasi-stationary dephasing does not modify the local properties of the quantum walk, but instead affects spatial coherences. For a visual representation of decoherence phenomena in phase space, we have developed a formalism based on a discrete analogue of the Wigner function. We show that the effects of spin and spatial decoherence differ dramatically in momentum space.
Resumo:
The power-law size distributions obtained experimentally for neuronal avalanches are an important evidence of criticality in the brain. This evidence is supported by the fact that a critical branching process exhibits the same exponent t~3=2. Models at criticality have been employed to mimic avalanche propagation and explain the statistics observed experimentally. However, a crucial aspect of neuronal recordings has been almost completely neglected in the models: undersampling. While in a typical multielectrode array hundreds of neurons are recorded, in the same area of neuronal tissue tens of thousands of neurons can be found. Here we investigate the consequences of undersampling in models with three different topologies (two-dimensional, small-world and random network) and three different dynamical regimes (subcritical, critical and supercritical). We found that undersampling modifies avalanche size distributions, extinguishing the power laws observed in critical systems. Distributions from subcritical systems are also modified, but the shape of the undersampled distributions is more similar to that of a fully sampled system. Undersampled supercritical systems can recover the general characteristics of the fully sampled version, provided that enough neurons are measured. Undersampling in two-dimensional and small-world networks leads to similar effects, while the random network is insensitive to sampling density due to the lack of a well-defined neighborhood. We conjecture that neuronal avalanches recorded from local field potentials avoid undersampling effects due to the nature of this signal, but the same does not hold for spike avalanches. We conclude that undersampled branching-process-like models in these topologies fail to reproduce the statistics of spike avalanches.
Resumo:
In this dissertation, we apply mathematical programming techniques (i.e., integer programming and polyhedral combinatorics) to develop exact approaches for influence maximization on social networks. We study four combinatorial optimization problems that deal with maximizing influence at minimum cost over a social network. To our knowl- edge, all previous work to date involving influence maximization problems has focused on heuristics and approximation. We start with the following viral marketing problem that has attracted a significant amount of interest from the computer science literature. Given a social network, find a target set of customers to seed with a product. Then, a cascade will be caused by these initial adopters and other people start to adopt this product due to the influence they re- ceive from earlier adopters. The idea is to find the minimum cost that results in the entire network adopting the product. We first study a problem called the Weighted Target Set Selection (WTSS) Prob- lem. In the WTSS problem, the diffusion can take place over as many time periods as needed and a free product is given out to the individuals in the target set. Restricting the number of time periods that the diffusion takes place over to be one, we obtain a problem called the Positive Influence Dominating Set (PIDS) problem. Next, incorporating partial incentives, we consider a problem called the Least Cost Influence Problem (LCIP). The fourth problem studied is the One Time Period Least Cost Influence Problem (1TPLCIP) which is identical to the LCIP except that we restrict the number of time periods that the diffusion takes place over to be one. We apply a common research paradigm to each of these four problems. First, we work on special graphs: trees and cycles. Based on the insights we obtain from special graphs, we develop efficient methods for general graphs. On trees, first, we propose a polynomial time algorithm. More importantly, we present a tight and compact extended formulation. We also project the extended formulation onto the space of the natural vari- ables that gives the polytope on trees. Next, building upon the result for trees---we derive the polytope on cycles for the WTSS problem; as well as a polynomial time algorithm on cycles. This leads to our contribution on general graphs. For the WTSS problem and the LCIP, using the observation that the influence propagation network must be a directed acyclic graph (DAG), the strong formulation for trees can be embedded into a formulation on general graphs. We use this to design and implement a branch-and-cut approach for the WTSS problem and the LCIP. In our computational study, we are able to obtain high quality solutions for random graph instances with up to 10,000 nodes and 20,000 edges (40,000 arcs) within a reasonable amount of time.
Resumo:
This paper is concerned with a stochastic SIR (susceptible-infective-removed) model for the spread of an epidemic amongst a population of individuals, with a random network of social contacts, that is also partitioned into households. The behaviour of the model as the population size tends to infinity in an appropriate fashion is investigated. A threshold parameter which determines whether or not an epidemic with few initial infectives can become established and lead to a major outbreak is obtained, as are the probability that a major outbreak occurs and the expected proportion of the population that are ultimately infected by such an outbreak, together with methods for calculating these quantities. Monte Carlo simulations demonstrate that these asymptotic quantities accurately reflect the behaviour of finite populations, even for only moderately sized finite populations. The model is compared and contrasted with related models previously studied in the literature. The effects of the amount of clustering present in the overall population structure and the infectious period distribution on the outcomes of the model are also explored.
Resumo:
This thesis is concerned with change point analysis for time series, i.e. with detection of structural breaks in time-ordered, random data. This long-standing research field regained popularity over the last few years and is still undergoing, as statistical analysis in general, a transformation to high-dimensional problems. We focus on the fundamental »change in the mean« problem and provide extensions of the classical non-parametric Darling-Erdős-type cumulative sum (CUSUM) testing and estimation theory within highdimensional Hilbert space settings. In the first part we contribute to (long run) principal component based testing methods for Hilbert space valued time series under a rather broad (abrupt, epidemic, gradual, multiple) change setting and under dependence. For the dependence structure we consider either traditional m-dependence assumptions or more recently developed m-approximability conditions which cover, e.g., MA, AR and ARCH models. We derive Gumbel and Brownian bridge type approximations of the distribution of the test statistic under the null hypothesis of no change and consistency conditions under the alternative. A new formulation of the test statistic using projections on subspaces allows us to simplify the standard proof techniques and to weaken common assumptions on the covariance structure. Furthermore, we propose to adjust the principal components by an implicit estimation of a (possible) change direction. This approach adds flexibility to projection based methods, weakens typical technical conditions and provides better consistency properties under the alternative. In the second part we contribute to estimation methods for common changes in the means of panels of Hilbert space valued time series. We analyze weighted CUSUM estimates within a recently proposed »high-dimensional low sample size (HDLSS)« framework, where the sample size is fixed but the number of panels increases. We derive sharp conditions on »pointwise asymptotic accuracy« or »uniform asymptotic accuracy« of those estimates in terms of the weighting function. Particularly, we prove that a covariance-based correction of Darling-Erdős-type CUSUM estimates is required to guarantee uniform asymptotic accuracy under moderate dependence conditions within panels and that these conditions are fulfilled, e.g., by any MA(1) time series. As a counterexample we show that for AR(1) time series, close to the non-stationary case, the dependence is too strong and uniform asymptotic accuracy cannot be ensured. Finally, we conduct simulations to demonstrate that our results are practically applicable and that our methodological suggestions are advantageous.
Resumo:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we also test a number of different versions of the SABRE method, compare conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior; the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residue and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method; block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, leading it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experiment investigation.
Resumo:
Le tecniche di Machine Learning sono molto utili in quanto consento di massimizzare l’utilizzo delle informazioni in tempo reale. Il metodo Random Forests può essere annoverato tra le tecniche di Machine Learning più recenti e performanti. Sfruttando le caratteristiche e le potenzialità di questo metodo, la presente tesi di dottorato affronta due casi di studio differenti; grazie ai quali è stato possibile elaborare due differenti modelli previsionali. Il primo caso di studio si è incentrato sui principali fiumi della regione Emilia-Romagna, caratterizzati da tempi di risposta molto brevi. La scelta di questi fiumi non è stata casuale: negli ultimi anni, infatti, in detti bacini si sono verificati diversi eventi di piena, in gran parte di tipo “flash flood”. Il secondo caso di studio riguarda le sezioni principali del fiume Po, dove il tempo di propagazione dell’onda di piena è maggiore rispetto ai corsi d’acqua del primo caso di studio analizzato. Partendo da una grande quantità di dati, il primo passo è stato selezionare e definire i dati in ingresso in funzione degli obiettivi da raggiungere, per entrambi i casi studio. Per l’elaborazione del modello relativo ai fiumi dell’Emilia-Romagna, sono stati presi in considerazione esclusivamente i dati osservati; a differenza del bacino del fiume Po in cui ai dati osservati sono stati affiancati anche i dati di previsione provenienti dalla catena modellistica Mike11 NAM/HD. Sfruttando una delle principali caratteristiche del metodo Random Forests, è stata stimata una probabilità di accadimento: questo aspetto è fondamentale sia nella fase tecnica che in fase decisionale per qualsiasi attività di intervento di protezione civile. L'elaborazione dei dati e i dati sviluppati sono stati effettuati in ambiente R. Al termine della fase di validazione, gli incoraggianti risultati ottenuti hanno permesso di inserire il modello sviluppato nel primo caso studio all’interno dell’architettura operativa di FEWS.
Resumo:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given avsample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need of split-merge reversible jump moves typically associated with poor performance, we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes, and study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, extracting the characteristic traits common to all the data-groups, and propose a novel vector autoregressive model to study of growth curves of Singaporean kids. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.
Resumo:
The main topic of this thesis is confounding in linear regression models. It arises when a relationship between an observed process, the covariate, and an outcome process, the response, is influenced by an unmeasured process, the confounder, associated with both. Consequently, the estimators for the regression coefficients of the measured covariates might be severely biased, less efficient and characterized by misleading interpretations. Confounding is an issue when the primary target of the work is the estimation of the regression parameters. The central point of the dissertation is the evaluation of the sampling properties of parameter estimators. This work aims to extend the spatial confounding framework to general structured settings and to understand the behaviour of confounding as a function of the data generating process structure parameters in several scenarios focusing on the joint covariate-confounder structure. In line with the spatial statistics literature, our purpose is to quantify the sampling properties of the regression coefficient estimators and, in turn, to identify the most prominent quantities depending on the generative mechanism impacting confounding. Once the sampling properties of the estimator conditionally on the covariate process are derived as ratios of dependent quadratic forms in Gaussian random variables, we provide an analytic expression of the marginal sampling properties of the estimator using Carlson’s R function. Additionally, we propose a representative quantity for the magnitude of confounding as a proxy of the bias, its first-order Laplace approximation. To conclude, we work under several frameworks considering spatial and temporal data with specific assumptions regarding the covariance and cross-covariance functions used to generate the processes involved. This study allows us to claim that the variability of the confounder-covariate interaction and of the covariate plays the most relevant role in determining the principal marker of the magnitude of confounding.
Resumo:
La presenti tesi ha come obiettivo lo studio di due algoritmi per il rilevamento di anomalie all' interno di grafi random. Per entrambi gli algoritmi sono stati creati dei modelli generativi di grafi dinamici in modo da eseguire dei test sintetici. La tesi si compone in una parte iniziale teorica e di una seconda parte sperimentale. Il secondo capitolo introduce la teoria dei grafi. Il terzo capitolo presenta il problema del rilevamento di comunità. Il quarto capitolo introduce possibili definizioni del concetto di anomalie dinamiche e il problema del loro rilevamento. Il quinto capitolo propone l' introduzione di un punteggio di outlierness associato ad ogni nodo sulla base del confronto tra la sua dinamica e quella della comunità a cui appartiene. L' ultimo capitolo si incentra sul problema della ricerca di una descrizione della rete in termini di gruppi o ruoli sulla base della quale incentrare la ricerca delle anomalie dinamiche.
Resumo:
Prosopis rubriflora and Prosopis ruscifolia are important species in the Chaquenian regions of Brazil. Because of the restriction and frequency of their physiognomy, they are excellent models for conservation genetics studies. The use of microsatellite markers (Simple Sequence Repeats, SSRs) has become increasingly important in recent years and has proven to be a powerful tool for both ecological and molecular studies. In this study, we present the development and characterization of 10 new markers for P. rubriflora and 13 new markers for P. ruscifolia. The genotyping was performed using 40 P. rubriflora samples and 48 P. ruscifolia samples from the Chaquenian remnants in Brazil. The polymorphism information content (PIC) of the P. rubriflora markers ranged from 0.073 to 0.791, and no null alleles or deviation from Hardy-Weinberg equilibrium (HW) were detected. The PIC values for the P. ruscifolia markers ranged from 0.289 to 0.883, but a departure from HW and null alleles were detected for certain loci; however, this departure may have resulted from anthropic activities, such as the presence of livestock, which is very common in the remnant areas. In this study, we describe novel SSR polymorphic markers that may be helpful in future genetic studies of P. rubriflora and P. ruscifolia.
Resumo:
Resource specialisation, although a fundamental component of ecological theory, is employed in disparate ways. Most definitions derive from simple counts of resource species. We build on recent advances in ecophylogenetics and null model analysis to propose a concept of specialisation that comprises affinities among resources as well as their co-occurrence with consumers. In the distance-based specialisation index (DSI), specialisation is measured as relatedness (phylogenetic or otherwise) of resources, scaled by the null expectation of random use of locally available resources. Thus, specialists use significantly clustered sets of resources, whereas generalists use over-dispersed resources. Intermediate species are classed as indiscriminate consumers. The effectiveness of this approach was assessed with differentially restricted null models, applied to a data set of 168 herbivorous insect species and their hosts. Incorporation of plant relatedness and relative abundance greatly improved specialisation measures compared to taxon counts or simpler null models, which overestimate the fraction of specialists, a problem compounded by insufficient sampling effort. This framework disambiguates the concept of specialisation with an explicit measure applicable to any mode of affinity among resource classes, and is also linked to ecological and evolutionary processes. This will enable a more rigorous deployment of ecological specialisation in empirical and theoretical studies.