957 results for Pair distributions
Abstract:
Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree (i.e., a density model that assumes that there are only pairwise dependencies between variables and that the graph of these dependencies is a spanning tree). The original algorithm is quadratic in the dimension of the domain and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speedups of up to 2-3 orders of magnitude in the experiments.
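For reference, the baseline Chow-Liu procedure that acCL accelerates can be sketched as: estimate the mutual information of every pair of discrete variables, then keep a maximum-weight spanning tree over those weights. This is a minimal sketch of the original quadratic algorithm only, not of acCL's sparse-data tricks; the function names `mutual_information` and `chow_liu_tree` are hypothetical, and rows are assumed to be equal-length tuples of discrete values.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between discrete columns i and j."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b])) for (a, b), c in pij.items())


def chow_liu_tree(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual information (Kruskal)."""
    d = len(data[0])
    # All d*(d-1)/2 pairs are scored and sorted: this is the quadratic cost acCL avoids.
    weighted = sorted(
        ((mutual_information(data, i, j), i, j) for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest for cycle detection

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

With three binary columns where the first two are always equal, the edge (0, 1) carries the largest mutual information and is always kept.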
Abstract:
Essery, R. L. H. & Pomeroy, J. W. (2004). Vegetation and topographic control of wind-blown snow distributions in distributed and aggregated simulations. Journal of Hydrometeorology, 5, 735-744.
Abstract:
Thomas, L., Ratcliffe, M., and Robertson, A. 2003. Code warriors and code-a-phobes: a study in attitude and pair programming. SIGCSE Bull. 35, 1 (Jan. 2003), 363-367.
Abstract:
Brian Huntley, Rhys E. Green, Yvonne C. Collingham, Jane K. Hill, Stephen G. Willis, Patrick J. Bartlein, Wolfgang Cramer, Ward J. M. Hagemeijer and Christopher J. Thomas (2004). The performance of models relating species geographical distributions to climate is independent of trophic level. Ecology Letters, 7(5), 417-426. Sponsorship: NERC (awards: GR9/3016, GR9/04270, GR3/12542, NER/F/S/2000/00166) / RSPB RAE2008
Abstract:
Recent studies have noted that vertex degree in the autonomous system (AS) graph exhibits a highly variable distribution [15, 22]. The most prominent explanatory model for this phenomenon is the Barabási-Albert (B-A) model [5, 2]. A central feature of the B-A model is preferential connectivity—meaning that the likelihood a new node in a growing graph will connect to an existing node is proportional to the existing node’s degree. In this paper we ask whether a more general explanation than the B-A model, and absent the assumption of preferential connectivity, is consistent with empirical data. We are motivated by two observations: first, AS degree and AS size are highly correlated [11]; and second, highly variable AS size can arise simply through exponential growth. We construct a model incorporating exponential growth in the size of the Internet, and in the number of ASes. We then show via analysis that such a model yields a size distribution exhibiting a power-law tail. In such a model, if an AS’s link formation is roughly proportional to its size, then AS degree will also show high variability. We instantiate such a model with empirically derived estimates of growth rates and show that the resulting degree distribution is in good agreement with that of real AS graphs.
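Why exponential growth alone can produce a power-law tail can be seen in a deliberately simplified, deterministic caricature; these assumptions are ours, not the paper's exact model. Suppose new ASes appear at a rate proportional to $e^{\beta t}$ and an AS of age $a$ has size $S = e^{\alpha a}$, with $\alpha, \beta > 0$. Birth times then have density proportional to $e^{\beta t_b}$ up to the observation time $T$, so the age $A = T - t_b$ is (treating the history as long) exponentially distributed with rate $\beta$:

```latex
\begin{aligned}
f_A(a) &= \beta e^{-\beta a}, \qquad a \ge 0, \\
\Pr(S > s) &= \Pr\!\left(A > \tfrac{\ln s}{\alpha}\right)
            = e^{-(\beta/\alpha)\ln s}
            = s^{-\beta/\alpha}, \qquad s \ge 1 .
\end{aligned}
```

Under these assumptions size has a Pareto tail with exponent $\beta/\alpha$, and if an AS's link count is roughly proportional to its size, degree inherits the same tail without any preferential-attachment rule.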
Abstract:
Fast forward error correction codes are becoming an important component in bulk content delivery. They fit in naturally with multicast scenarios as a way to deal with losses and are now seeing use in peer-to-peer networks as a basis for distributing load. In particular, new irregular sparse parity check codes have been developed with provable average linear time performance, a significant improvement over previous codes. In this paper, we present a new heuristic for generating codes with similar performance based on observing a server with an oracle for client state. This heuristic is easy to implement and provides further intuition into the need for an irregular heavy tailed distribution.
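Sparse parity-check erasure codes of this kind are typically decoded by "peeling": each received symbol is the XOR of a small set of source symbols, and any received symbol with exactly one still-unknown neighbour reveals that neighbour. The sketch below shows only that generic decoder, not the paper's oracle-based heuristic; `peel_decode` is a hypothetical name, and a linear-time implementation would keep a queue of degree-one symbols instead of rescanning.

```python
def peel_decode(k, packets):
    """Recover k source symbols from (neighbour_set, xor_value) packets by peeling.

    Returns a list with None for symbols that could not be recovered.
    """
    source = [None] * k
    # Mutable working copies: remaining unknown neighbours and the running XOR value.
    pending = [[set(nbrs), value] for nbrs, value in packets]
    progress = True
    while progress:
        progress = False
        for entry in pending:
            nbrs, value = entry
            # XOR out neighbours that have already been recovered.
            for s in [s for s in nbrs if source[s] is not None]:
                value ^= source[s]
                nbrs.discard(s)
            entry[1] = value
            if len(nbrs) == 1:  # degree one: the value *is* the remaining symbol
                source[nbrs.pop()] = value
                progress = True
    return source


# Usage: three source symbols recovered from three sparse XOR packets.
recovered = peel_decode(3, [({0}, 5), ({0, 1}, 5 ^ 9), ({1, 2}, 9 ^ 12)])
```

Decoding stalls when no degree-one symbol remains, which is why the degree distribution of the code matters so much.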
Abstract:
End-to-end differentiation between wireless and congestion loss can equip TCP control so it operates effectively in a hybrid wired/wireless environment. Our approach integrates two techniques: packet loss pairs (PLP) and Hidden Markov Modeling (HMM). A packet loss pair is formed by two back-to-back packets, where one packet is lost while the second packet is successfully received. The purpose is for the second packet to carry the state of the network path, namely the round trip time (RTT), at the time the other packet is lost. Under realistic conditions, PLP provides strong differentiation between congestion and wireless types of loss based on distinguishable RTT distributions. An HMM is then trained so observed RTTs can be mapped to model states that represent either congestion loss or wireless loss. Extensive simulations confirm the accuracy of our HMM-based technique in classifying the cause of a packet loss. We also show the superiority of our technique over the Vegas predictor, which was recently found to perform best and which exemplifies other existing loss labeling techniques.
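Mapping observed RTTs to hidden loss causes is a decoding problem; a common choice is the Viterbi algorithm. The sketch below is a minimal two-state example under our own assumptions: RTTs are already discretized to "high"/"low", and all probabilities are made-up illustrative values, not parameters trained in the paper.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for obs under a discrete HMM (log-space Viterbi)."""
    # best[t][s] = (log-prob of the best path ending in s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        row = {}
        for s in states:
            lp, arg = max((prev[r][0] + math.log(trans_p[r][s]), r) for r in states)
            row[s] = (lp + math.log(emit_p[s][o]), arg)
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical parameters: congestion losses tend to coincide with high RTTs.
STATES = ("congestion", "wireless")
START = {"congestion": 0.5, "wireless": 0.5}
TRANS = {"congestion": {"congestion": 0.8, "wireless": 0.2},
         "wireless": {"congestion": 0.2, "wireless": 0.8}}
EMIT = {"congestion": {"high": 0.9, "low": 0.1},
        "wireless": {"high": 0.2, "low": 0.8}}

labels = viterbi(["high", "high", "low", "low"], STATES, START, TRANS, EMIT)
```

Working in log space avoids underflow on long RTT traces; in practice the parameters would come from training (e.g. Baum-Welch) rather than being fixed by hand.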
Abstract:
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second-order Markov model is used to predict evolution of the skin color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and based on predictions of the Markov model. The evolution of the skin color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and re-sampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. Quantitative evaluation of the method was conducted on labeled ground-truth video sequences taken from popular movies.
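A second-order Markov model predicts each frame's parameter from the previous two. As an illustration under our own simplifying assumptions (one scalar parameter at a time, such as a hue translation, with additive Gaussian noise), maximum-likelihood fitting reduces to least squares on the model x[t] = a1·x[t-1] + a2·x[t-2] + noise. The paper's joint model over translation, scaling and rotation is not reproduced; `fit_ar2` and `predict_next` are hypothetical names.

```python
def fit_ar2(xs):
    """Least-squares (Gaussian ML) estimate of a1, a2 in x[t] = a1*x[t-1] + a2*x[t-2] + noise."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(xs)):
        u, v, y = xs[t - 1], xs[t - 2], xs[t]
        # Accumulate the 2x2 normal equations [[s11, s12], [s12, s22]] @ [a1, a2] = [b1, b2].
        s11 += u * u
        s12 += u * v
        s22 += v * v
        b1 += u * y
        b2 += v * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("degenerate history; need more varied samples")
    # Solve the normal equations by Cramer's rule.
    a1 = (b1 * s22 - s12 * b2) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def predict_next(xs, a1, a2):
    """One-step prediction of the parameter for the next frame."""
    return a1 * xs[-1] + a2 * xs[-2]
```

The constant-velocity predictor x[t] = 2·x[t-1] - x[t-2] is the special case a1 = 2, a2 = -1; refitting on a sliding window is one way to let the model parameters themselves evolve over time.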
Abstract:
The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However, the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveal both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Geant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.
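Entropy summarizes how concentrated or dispersed a feature is within one time bin: traffic concentrated on a few values lowers it, while traffic spread over many values (for example, many destination ports) raises it. A minimal sketch of the per-bin sample entropy in bits; the function name and the port lists are hypothetical illustrations, not data from the paper.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Sample entropy (bits) of one packet feature, e.g. destination ports, in a time bin."""
    counts = Counter(values)
    n = len(values)
    # Sum of p * log2(1/p) over the observed feature values.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical bins: ordinary traffic to a few services vs. a sweep over many ports.
normal_bin = [80] * 50 + [443] * 40 + [53] * 10
scan_bin = list(range(1000, 1100))
```

Here `sample_entropy(scan_bin)` exceeds `sample_entropy(normal_bin)`; tracking such values per feature (source/destination address and port) gives the vectors that can then be clustered.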
Abstract:
This thesis investigates the optimisation of Coarse-Fine (CF) spectrum sensing architectures under a distribution of SNRs for Dynamic Spectrum Access (DSA). Three different detector architectures are investigated: the Coarse-Sorting Fine Detector (CSFD), the Coarse-Deciding Fine Detector (CDFD) and the Hybrid Coarse-Fine Detector (HCFD). To date, the majority of the work on coarse-fine spectrum sensing for cognitive radio has focused on a single value for the SNR. This approach overlooks the key advantage that CF sensing has to offer, namely that high powered signals can be easily detected without extra signal processing. By considering a range of SNR values, the detector can be optimised more effectively and greater performance gains realised. This work considers the optimisation of CF spectrum sensing schemes where the security and performance are treated separately. Instead of optimising system performance at a single, constant, low SNR value, the system is instead optimised for the average operating conditions. Security is still provided, in that the safety specifications are met at low SNR values. By decoupling the security and performance, the system's average performance increases whilst maintaining the protection of licensed users from harmful interference. The different architectures considered in this thesis are investigated in theory, simulation and physical implementation to provide a complete overview of the performance of each system. This thesis provides a method for estimating SNR distributions which is quick, accurate and relatively low cost. The CSFD is modelled and the characteristic equations are found for the CDFD scheme. The HCFD is introduced and optimisation schemes for all three architectures are proposed.
Finally, using the Implementing Radio In Software (IRIS) test-bed to confirm simulation results, CF spectrum sensing is shown to be significantly quicker than naive methods, whilst still meeting the required interference probability rates and not requiring substantial receiver complexity increases.
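The idea of optimising for a distribution of SNRs rather than a single worst case can be illustrated with a plain energy detector. The sketch below is not any of the thesis's CSFD, CDFD or HCFD schemes; it assumes N real samples of unit-variance noise, a zero-mean Gaussian signal at linear SNR γ, and the central-limit (Gaussian) approximation of the energy statistic, so that the statistic has mean N(1+γ) and variance 2N(1+γ)². The SNR distribution passed in is a hypothetical discrete pmf.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Q-function values


def threshold(n, pfa):
    """Energy threshold for n unit-variance noise samples at false-alarm rate pfa (CLT approx.)."""
    return n + STD_NORMAL.inv_cdf(1 - pfa) * sqrt(2 * n)


def detection_prob(n, pfa, snr):
    """Detection probability for a zero-mean Gaussian signal at linear SNR `snr` (CLT approx.)."""
    lam = threshold(n, pfa)
    mean = n * (1 + snr)
    std = sqrt(2 * n) * (1 + snr)
    return 1 - STD_NORMAL.cdf((lam - mean) / std)


def average_detection_prob(n, pfa, snr_pmf):
    """Average detection probability over a discrete SNR distribution {linear_snr: probability}."""
    return sum(p * detection_prob(n, pfa, snr) for snr, p in snr_pmf.items())
```

Even a short window (small `n`, i.e. a coarse look) detects strong signals with high probability, so only the low-SNR part of the distribution needs the longer fine stage; the CLT approximation is crude for very small `n`.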
Abstract:
A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.
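The separability that EM restores can be seen in the standard EM algorithm for a finite Gaussian mixture: after the E-step, the expected complete-data log-likelihood is a sum of one term per component, so the M-step updates each component's parameters on its own. This is a minimal sketch of ordinary EM for a one-dimensional mixture, not the authors' sequential estimator; the function names and starting values are assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gaussian_mixture(xs, mus, sigmas, weights, iters=100):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameter lists."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    k = len(mus)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: the objective separates by component, so each one is a weighted MLE.
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = sum(rj)
            mus[j] = sum(r * x for r, x in zip(rj, xs)) / nj
            sigmas[j] = math.sqrt(sum(r * (x - mus[j]) ** 2 for r, x in zip(rj, xs)) / nj)
            weights[j] = nj / len(xs)
    return mus, sigmas, weights


# Usage on synthetic data from two well-separated components.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(6, 1) for _ in range(300)]
fitted = em_gaussian_mixture(data, [1.0, 5.0], [1.0, 1.0], [0.5, 0.5], iters=50)
```

In a model where each component has several blocks of parameters, the same per-component separation is what allows those blocks to be updated in stages within each M-step.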
Abstract:
This article examines the behavior of equity trading volume and volatility for the individual firms composing the Standard & Poor's 100 composite index. Using multivariate spectral methods, we find that fractionally integrated processes best describe the long-run temporal dependencies in both series. Consistent with a stylized mixture-of-distributions hypothesis model in which the aggregate "news"-arrival process possesses long-memory characteristics, the long-run hyperbolic decay rates appear to be common across each volume-volatility pair.
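A fractionally integrated process applies (1 - L)^d to the series, where L is the lag operator and d may be non-integer. The expansion coefficients follow the recursion w[0] = 1, w[k] = w[k-1]·(k - 1 - d)/k, and for 0 < d < 1 they decay hyperbolically rather than exponentially, which is what "long memory" refers to. A minimal sketch; `frac_diff_weights` is a hypothetical name and no estimation of d is attempted.

```python
def frac_diff_weights(d, n):
    """First n coefficients of the lag polynomial (1 - L)**d."""
    w = [1.0]
    for k in range(1, n):
        # Ratio of consecutive binomial terms binom(d, k) * (-1)**k.
        w.append(w[-1] * (k - 1 - d) / k)
    return w


# d = 1 gives ordinary first differencing; d = 0.4 gives slowly decaying weights.
first_diff = frac_diff_weights(1.0, 4)
long_memory = frac_diff_weights(0.4, 101)
```

Integer d truncates the expansion after d + 1 terms, while fractional d keeps non-negligible weights far into the past (for d = 0.4 the 100th weight is still of order 1e-4).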
Abstract:
The long-term soil carbon dynamics may be approximated by networks of linear compartments, permitting theoretical analysis of transit time (i.e., the total time spent by a molecule in the system) and age (the time elapsed since the molecule entered the system) distributions. We compute and compare these distributions for different network configurations, ranging from the simple individual compartment, to series and parallel linear compartments, feedback systems, and models assuming a continuous distribution of decay constants. We also derive the transit time and age distributions of some complex, widely used soil carbon models (the compartmental models CENTURY and Rothamsted, and the continuous-quality Q-Model), and discuss them in the context of long-term carbon sequestration in soils. We show how complex models including feedback loops and slow compartments have distributions with heavier tails than simpler models. Power law tails emerge when using continuous-quality models, indicating long retention times for an important fraction of soil carbon. The responsiveness of the soil system to changes in decay constants due to altered climatic conditions or plant species composition is found to be stronger when all compartments respond equally to the environmental change, and when the slower compartments are more sensitive than the faster ones or lose more carbon through microbial respiration. Copyright 2009 by the American Geophysical Union.
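For first-order (linear) compartments the transit-time densities of the simplest configurations have closed forms: a single compartment with decay rate k gives k·e^(-kt); two compartments in series give the convolution k1·k2·(e^(-k1·t) - e^(-k2·t))/(k2 - k1), with mean 1/k1 + 1/k2; parallel compartments receiving input fractions p_i give the mixture Σ p_i·k_i·e^(-k_i·t), with mean Σ p_i/k_i. The sketch below checks those means numerically; the rates are hypothetical values in arbitrary time units, not parameters of CENTURY, Rothamsted or the Q-Model.

```python
import math


def single_pdf(t, k):
    """Transit-time density of one first-order compartment with decay rate k."""
    return k * math.exp(-k * t)


def series_pdf(t, k1, k2):
    """Transit-time density through two first-order compartments in series (k1 != k2)."""
    return k1 * k2 * (math.exp(-k1 * t) - math.exp(-k2 * t)) / (k2 - k1)


def parallel_pdf(t, fractions, rates):
    """Input split among parallel compartments; fractions must sum to 1."""
    return sum(p * single_pdf(t, k) for p, k in zip(fractions, rates))


def mean_by_quadrature(pdf, t_max, steps=200_000):
    """Midpoint-rule estimate of the mean transit time, truncated at t_max."""
    dt = t_max / steps
    return sum((i + 0.5) * dt * pdf((i + 0.5) * dt) * dt for i in range(steps))


# Hypothetical rates: a fast pool plus a small, very slow pool in parallel.
series_mean = mean_by_quadrature(lambda t: series_pdf(t, 1.0, 0.5), 100.0)
parallel_mean = mean_by_quadrature(lambda t: parallel_pdf(t, [0.9, 0.1], [1.0, 0.01]), 3000.0)
```

The parallel case shows the heavy-tail effect: sending only 10% of the input to a slow pool raises the mean transit time from about 1 to about 10.9 time units.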
Abstract:
The paper investigates stochastic processes forced by independent and identically distributed jumps occurring according to a Poisson process. The impact of different distributions of the jump amplitudes is analyzed for processes with linear drift. Exact expressions of the probability density functions are derived when jump amplitudes are distributed as exponential, gamma, and mixture of exponential distributions for both natural and reflecting boundary conditions. The mean level-crossing properties are studied in relation to the different jump amplitudes. As an example of application of the previous theoretical derivations, the role of different rainfall-depth distributions on an existing stochastic soil water balance model is analyzed. It is shown how the shape of the distribution of daily rainfall depths becomes more influential on the soil moisture probability distribution as the rainfall frequency decreases, as predicted by future climatic scenarios. © 2010 The American Physical Society.
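A quick way to sanity-check such densities is simulation. The sketch assumes our reading of "linear drift" as a loss proportional to the state, dx/dt = -k·x, with jumps at Poisson rate λ and exponential amplitudes of mean γ on the natural boundary x ≥ 0. For that case the stationary density is the gamma density with shape λ/k and scale γ, so the mean is λγ/k and the variance λγ²/k. Because Poisson arrivals see time averages (PASTA), the levels just before each jump are samples from the stationary law. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_pre_jump_levels(rate, mean_jump, k, n_jumps, seed=0):
    """Simulate dx/dt = -k*x plus exponential jumps at Poisson times; return pre-jump levels."""
    rng = random.Random(seed)
    x = 0.0
    levels = []
    for _ in range(n_jumps):
        wait = rng.expovariate(rate)            # time until the next Poisson arrival
        x *= math.exp(-k * wait)                # exact linear decay between jumps
        levels.append(x)                        # by PASTA, a stationary sample
        x += rng.expovariate(1.0 / mean_jump)   # exponential jump amplitude
    return levels


def stationary_pdf(x, rate, mean_jump, k):
    """Gamma(shape=rate/k, scale=mean_jump) density for exponential jumps and linear decay."""
    shape = rate / k
    return x ** (shape - 1) * math.exp(-x / mean_jump) / (math.gamma(shape) * mean_jump ** shape)


# Hypothetical parameters: rate 2, mean jump 0.5, decay 1 -> mean 1.0, variance 0.5.
levels = simulate_pre_jump_levels(2.0, 0.5, 1.0, 200_000, seed=1)[1000:]  # drop burn-in
```

Replacing `rng.expovariate(1.0 / mean_jump)` with another amplitude sampler is an easy way to see how the jump distribution reshapes the stationary density.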
Abstract:
The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
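The building block behind such models is the pair HMM, which generates two aligned sequences at once. The sketch below is the textbook three-state version (Match, Insert-in-x, Insert-in-y) and its forward algorithm for P(x, y) summed over all alignments; it has no phylogeny and no binding-site gain/loss states, which are the paper's extensions. The function name, parameter names and default values (gap-open δ, gap-extend ε, end τ, match-identity probability) are hypothetical.

```python
def pair_hmm_forward(x, y, delta=0.1, eps=0.3, tau=0.05, p_same=0.9):
    """P(x, y) under a 3-state pair HMM for DNA, summed over all alignments (forward algorithm)."""
    n, m = len(x), len(y)

    def p_pair(a, b):
        # Joint emission in the Match state: identical bases share p_same, the 12 mismatches the rest.
        return p_same / 4 if a == b else (1 - p_same) / 12

    q = 0.25  # background emission for a base aligned to a gap
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # the begin state behaves like Match
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f_m[i][j] = p_pair(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * f_m[i - 1][j - 1]
                    + (1 - eps - tau) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:  # x[i-1] aligned to a gap
                f_x[i][j] = q * (delta * f_m[i - 1][j] + eps * f_x[i - 1][j])
            if j > 0:  # y[j-1] aligned to a gap
                f_y[i][j] = q * (delta * f_m[i][j - 1] + eps * f_y[i][j - 1])
    return tau * (f_m[n][m] + f_x[n][m] + f_y[n][m])
```

For long sequences the recursion should be run in log space or with scaling to avoid underflow; swapping the forward sums for maxima gives the Viterbi alignment.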