913 results for Random walk model
Abstract:
To improve percolation modelling in soils, the geometrical properties of the pore space must be understood; these include porosity, particle and pore size distribution, and pore connectivity. A study was conducted on a soil at different bulk densities, based on 3D grey images acquired by X-ray computed tomography. The objective was to analyze the effect of aspects of pore network geometry on percolation and to discuss the influence of the grey threshold applied to the images. A model based on random walk algorithms was applied to the images, combining five bulk densities with up to six threshold values per density. This allowed for a dynamical perspective of soil structure in relation to water transport through the inclusion of percolation speed in the analyses. To evaluate connectivity separately and isolate the effect of the grey threshold, a critical porosity value of 35% was selected for every density. This value was the smallest at which total-percolation walks appeared for all the images of the same porosity, and it may represent a percolation situation that is comparable among bulk densities. This criterion avoided an arbitrary choice of grey thresholds. In addition, a random matrix simulation at 35% porosity was used alongside the real images to test whether pore connectivity is a consequence of a non-random soil structure.
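As a rough illustration of the kind of procedure this abstract describes, the sketch below thresholds a 3D grey image, builds a random matrix of a chosen porosity, and sends a random walker from the top plane towards the bottom plane. The abstract does not specify the study's actual algorithm, so every name here (`threshold_image`, `random_matrix`, `percolating_walk`), the six-neighbour step rule, and the convention that darker voxels are pores are assumptions made for the example.

```python
import random

def threshold_image(grey, threshold):
    """Binarize a 3D grey image; voxels darker than the threshold count as pore (assumed convention)."""
    return [[[v < threshold for v in row] for row in plane] for plane in grey]

def porosity(pore):
    """Fraction of voxels classified as pore."""
    voxels = [v for plane in pore for row in plane for v in row]
    return sum(voxels) / len(voxels)

def random_matrix(shape, phi, rng):
    """Random pore/solid grid with porosity phi and no spatial structure."""
    nz, ny, nx = shape
    n = nz * ny * nx
    n_pore = round(phi * n)
    flags = [True] * n_pore + [False] * (n - n_pore)
    rng.shuffle(flags)
    it = iter(flags)
    return [[[next(it) for _ in range(nx)] for _ in range(ny)] for _ in range(nz)]

def percolating_walk(pore, start, max_steps, rng):
    """Random walk over pore voxels starting at `start` = (z, y, x).

    Returns the number of steps taken to reach the bottom plane,
    or None if the walker does not get there within `max_steps`.
    """
    nz, ny, nx = len(pore), len(pore[0]), len(pore[0][0])
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    z, y, x = start
    if not pore[z][y][x]:
        return None
    for step in range(1, max_steps + 1):
        dz, dy, dx = rng.choice(moves)
        z2, y2, x2 = z + dz, y + dy, x + dx
        inside = 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx
        if inside and pore[z2][y2][x2]:
            z, y, x = z2, y2, x2  # move only into pore voxels
        if z == nz - 1:
            return step
    return None
```

Under these assumptions, a crude percolation speed could be taken as the image depth divided by the returned step count, and the same walk can be run on `random_matrix(shape, 0.35, rng)` to compare a structureless medium with a thresholded image of equal porosity.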
Abstract:
Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs) which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. ℓ1-regularization is a type of this technique with the appealing variable selection property which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired from multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships. 
An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on ℓ1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
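The variable selection property of ℓ1-regularization mentioned in this abstract can be seen in the soft-thresholding operator, which is the proximal operator of the ℓ1 penalty. The sketch below is not the thesis's estimator; `soft_threshold` and `l1_shrink` are hypothetical names, and the input values are made up for illustration.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def l1_shrink(weights, lam):
    """Apply soft-thresholding to every coefficient of a model."""
    return [soft_threshold(w, lam) for w in weights]

# Small coefficients are removed outright, which is what makes the model sparse.
shrunk = l1_shrink([1.0, -0.125, 0.5, 0.0625], 0.25)
```

Coefficients whose magnitude is below the penalty level are set to exactly zero rather than merely reduced, so the corresponding dependencies drop out of the estimated model.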
Abstract:
We propose distributed algorithms for sampling networks based on a new class of random walks that we call Centrifugal Random Walks (CRW). A CRW is a random walk that starts at a source and always moves away from it. We propose CRW algorithms for connected networks with arbitrary probability distributions, and for grids and networks with regular concentric connectivity with distance-based distributions. All CRW sampling algorithms select a node with the exact probability distribution, do not need warm-up, and end in a number of hops bounded by the network diameter.
Abstract:
Sampling a network with a given probability distribution has been identified as a useful operation. In this paper we propose distributed algorithms for sampling networks, so that nodes are selected by a special node, called the source, with a given probability distribution. All these algorithms are based on a new class of random walks that we call Random Centrifugal Walks (RCW). An RCW is a random walk that starts at the source and always moves away from it. First, an algorithm to sample any connected network using RCW is proposed. The algorithm assumes that each node has a weight, so that the sampling process must select a node with a probability proportional to its weight. This algorithm requires a preprocessing phase before the sampling of nodes. In particular, a minimum diameter spanning tree (MDST) is created in the network, and then node weights are efficiently aggregated using the tree. The good news is that the preprocessing is done only once, regardless of the number of sources and the number of samples taken from the network. After that, every sample is taken with an RCW whose length is bounded by the network diameter. Second, RCW algorithms that do not require preprocessing are proposed for grids and networks with regular concentric connectivity, for the case when the probability of selecting a node is a function of its distance to the source. The key features of the RCW algorithms (unlike previous Markovian approaches) are that (1) they do not need to warm up (stabilize), (2) the sampling always finishes in a number of hops bounded by the network diameter, and (3) they select a node with the exact probability distribution.
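The weighted-tree idea can be sketched as follows. This is a simplified, centralized illustration and not the paper's distributed algorithm: it assumes the spanning tree is already given and rooted at the source, that weights are positive integers, and all function names are hypothetical. At each node the walk stops with probability equal to the node's weight divided by the aggregated weight of its subtree; otherwise it moves to a child chosen in proportion to that child's subtree weight.

```python
import random
from fractions import Fraction

def subtree_weights(children, weight, root):
    """Preprocessing: total[v] = weight of v plus the weights of all its descendants."""
    total = {}
    def visit(v):
        total[v] = weight[v] + sum(visit(c) for c in children.get(v, []))
        return total[v]
    visit(root)
    return total

def centrifugal_walk(children, weight, total, source, rng):
    """Walk away from the source; stop at v with probability weight[v] / total[v]."""
    v = source
    while True:
        r = rng.random() * total[v]
        if r < weight[v]:
            return v
        r -= weight[v]
        for c in children.get(v, []):
            if r < total[c]:
                v = c  # descend into the chosen subtree
                break
            r -= total[c]
        else:
            return v  # leaf, or floating-point round-off: stop here

def stop_probabilities(children, weight, total, source):
    """Exact probability that the walk ends at each node (integer weights assumed)."""
    probs = {}
    stack = [(source, Fraction(1))]
    while stack:
        v, reach = stack.pop()
        probs[v] = reach * Fraction(weight[v], total[v])
        for c in children.get(v, []):
            stack.append((c, reach * Fraction(total[c], total[v])))
    return probs

children = {"s": ["a", "b"], "a": ["c"]}
weight = {"s": 1, "a": 2, "b": 3, "c": 4}
total = subtree_weights(children, weight, "s")
probs = stop_probabilities(children, weight, total, "s")
sample = centrifugal_walk(children, weight, total, "s", random.Random(0))
```

In this toy tree the stopping probabilities work out to weight divided by the total weight for every node, and each walk only ever moves to a child, so its length cannot exceed the depth of the tree.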
Abstract:
This thesis describes and applies, in a novel way, the multivariate exponential smoothing technique to the short-term, day-ahead forecasting of hourly electricity prices, a problem that is being studied intensively in the recent statistics and economics literature. It begins by demonstrating some interesting properties of multivariate exponential smoothing that drastically reduce the number of parameters needed to characterize the time series and, at the same time, allow a dynamic factor analysis of the hourly electricity price series. In particular, this high-dimensional multivariate process (of dimension 24) is estimated by decomposing it into a small number of independent univariate exponential smoothing processes, each characterized by a single smoothing parameter that varies between zero (white noise process) and one (random walk). To this end, the state-space formulation is used to estimate the model, since it connects this sequence of more efficient univariate models to the multivariate model. In a novel way, the relations between the two models are obtained from a simple algebraic treatment, without applying the Kalman filter. This makes it possible to analyze and expose the ultimate reasons behind the dynamics of electricity prices. The practical side of the methodology is shown by applying it to certain electricity spot markets, such as Omel, Powernext and Nord Pool. In these markets the behavior of hourly prices is characterized, and forecasts are produced and compared with those of other forecasting techniques.
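The two limiting cases of the smoothing parameter can be checked with a univariate sketch. It does not reproduce the thesis's multivariate decomposition or its state-space estimation; `ses_forecasts`, the initial level, and the price values are all hypothetical.

```python
def ses_forecasts(series, alpha, initial_level):
    """One-step-ahead forecasts from simple exponential smoothing.

    The level is updated as level + alpha * (y - level); the forecast for the
    next observation is the current level.
    """
    level = initial_level
    forecasts = []
    for y in series:
        forecasts.append(level)  # forecast issued before y is observed
        level = level + alpha * (y - level)
    return forecasts, level

prices = [40.0, 42.0, 41.0, 45.0]  # made-up hourly prices

# alpha = 1: each forecast is the previous observation (random walk behaviour).
rw_forecasts, rw_level = ses_forecasts(prices, 1.0, 40.0)

# alpha = 0: the level never moves, so observations are treated as noise around it.
wn_forecasts, wn_level = ses_forecasts(prices, 0.0, 40.0)
```

Intermediate values of the parameter weight recent observations more heavily the closer the parameter is to one.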
Abstract:
The question of whether proteins originate from random sequences of amino acids is addressed. A statistical analysis is performed in terms of blocked and random walk values formed by binary hydrophobic assignments of the amino acids along the protein chains. Theoretical expectations of these variables from random distributions of hydrophobicities are compared with those obtained from functional proteins. The results, which are based upon proteins in the SWISS-PROT database, convincingly show that the amino acid sequences in proteins differ from what is expected from random sequences in a statistically significant way. By performing Fourier transforms on the random walks, one obtains additional evidence for nonrandomness of the distributions. We have also analyzed results from a synthetic model containing only two amino acid types, hydrophobic and hydrophilic. With reasonable criteria on good folding properties in terms of thermodynamic and kinetic behavior, sequences that fold well are isolated. Performing the same statistical analysis on the sequences that fold well indicates similar deviations from randomness as for the functional proteins. The deviations from randomness can be interpreted as originating from anticorrelations in terms of an Ising spin model for the hydrophobicities. Our results, which differ from some previous investigations using other methods, might have an impact on how permissive the protein folding process is with respect to sequence specificity: only sequences with nonrandom hydrophobicity distributions fold well. Other distributions give rise to energy landscapes with poor folding properties and hence did not survive evolution.
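A hydrophobicity random walk of the kind this abstract mentions can be sketched as below. The abstract does not give the paper's exact statistics, so the binary residue assignment in `HYDROPHOBIC`, the span statistic `walk_range`, and the shuffling baseline are assumptions chosen only to show how anticorrelated sequences differ from shuffled ones.

```python
import random

HYDROPHOBIC = set("AVLIMFWC")  # assumed binary assignment, for illustration only

def hydrophobic_walk(sequence):
    """Cumulative sum of +1 (hydrophobic) and -1 (hydrophilic) steps along the chain."""
    position, walk = 0, []
    for residue in sequence:
        position += 1 if residue in HYDROPHOBIC else -1
        walk.append(position)
    return walk

def walk_range(walk):
    """Span of the walk, counting the origin as visited."""
    return max(0, max(walk)) - min(0, min(walk))

def shuffled_ranges(sequence, n_shuffles, rng):
    """Spans of walks built from random permutations of the same residues."""
    residues = list(sequence)
    ranges = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        ranges.append(walk_range(hydrophobic_walk(residues)))
    return ranges

alternating = "AK" * 10          # strongly anticorrelated toy sequence
blocky = "A" * 3 + "K" * 3       # strongly correlated toy sequence
baseline = shuffled_ranges(alternating, 50, random.Random(1))
```

An alternating sequence keeps the walk close to the origin, while a blocky one drifts away from it; comparing either with the shuffled baseline indicates in which direction a sequence departs from randomness.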
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
A quantum random walk on the integers exhibits pseudo memory effects, in that its probability distribution after N steps is determined by reshuffling the first N distributions that arise in a classical random walk with the same initial distribution. In a classical walk, entropy increase can be regarded as a consequence of the majorization ordering of successive distributions. The Lorenz curves of successive distributions for a symmetric quantum walk reveal no majorization ordering in general. Nevertheless, entropy can increase, and computer experiments show that it does so on average. Varying the stages at which the quantum coin system is traced out leads to new quantum walks, including a symmetric walk for which majorization ordering is valid but the spreading rate exceeds that of the usual symmetric quantum walk.
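For readers who want to experiment, the sketch below simulates a symmetric quantum walk and a classical walk so that their entropies and spreading can be compared. It assumes a Hadamard coin and the initial coin state (|L> + i|R>)/sqrt(2), which the abstract does not specify, and it does not reproduce the reshuffling or Lorenz-curve analysis; all function names are hypothetical.

```python
import math

def hadamard_walk(steps):
    """Position distribution of a Hadamard walk on the integers after `steps` steps."""
    s = 1 / math.sqrt(2)
    left = {0: complex(s, 0)}    # amplitude of the coin state that moves left
    right = {0: complex(0, s)}   # amplitude of the coin state that moves right
    for _ in range(steps):
        new_left, new_right = {}, {}
        for x in set(left) | set(right):
            a, b = left.get(x, 0), right.get(x, 0)
            new_left[x - 1] = s * (a + b)   # Hadamard coin, then shift left
            new_right[x + 1] = s * (a - b)  # Hadamard coin, then shift right
        left, right = new_left, new_right
    positions = sorted(set(left) | set(right))
    return {x: abs(left.get(x, 0)) ** 2 + abs(right.get(x, 0)) ** 2 for x in positions}

def classical_walk(steps):
    """Position distribution of a symmetric classical walk started at 0."""
    return {2 * k - steps: math.comb(steps, k) / 2 ** steps for k in range(steps + 1)}

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def variance(dist):
    """Variance of the position under the distribution."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

quantum = hadamard_walk(50)
classical = classical_walk(50)
```

Calling `entropy` on `hadamard_walk(n)` for successive `n` gives a direct way to look at how the entropy of the quantum walk evolves from step to step, and `variance` shows the faster spreading of the quantum walk relative to the classical one.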
Abstract:
The focus of the present work is the well-known feature of the probability density function (PDF) transport equations in turbulent flows: the inverse parabolicity of the equations. While it is quite common in fluid mechanics to interpret equations with direct (forward-time) parabolicity as diffusive (or as a combination of diffusion, convection and reaction), the possibility of a similar interpretation for equations with inverse parabolicity is not clear. According to Einstein's point of view, a diffusion process is associated with the random walk of some physical or imaginary particles, which can be modelled by a Markov diffusion process. In the present paper it is shown that the Markov diffusion process directly associated with the PDF equation represents a reasonable model for dealing with the PDFs of scalars but it significantly underestimates the diffusion rate required to simulate turbulent dispersion when the velocity components are considered.
Abstract:
Niche apportionment models have only been applied once to parasite communities. Only the random assortment model (RA), which indicates that species abundances are independent from each other and that interspecific competition is unimportant, provided a good fit to 3 out of 6 parasite communities investigated. The generality of this result needs to be validated, however. In this study we apply 5 niche apportionment models to the parasite communities of 14 fish species from the Great Barrier Reef. We determined which model fitted the data when using either numerical abundance or biomass as an estimate of parasite abundance, and whether the fit of niche apportionment models depends on how the parasite community is defined (e.g. ectoparasites, endoparasites, or all parasites considered together). The RA model provided a good fit for the whole community of parasites in 7 fish species when using biovolume (as a surrogate of biomass) as a measure of species abundance. The RA model also fitted observed data when ecto- and endoparasites were considered separately, using abundance or biovolume, but less frequently. Variation in fish sizes among species was not associated with the probability of a model fitting the data. Total numerical abundance and biovolume of parasites were not related across host species, suggesting that they capture different aspects of abundance. Biovolume is not only a better measurement to use with niche-orientated models, it should also be the preferred descriptor to analyse parasite community structure in other contexts. Most of the biological assumptions behind the RA model, i.e. randomness in apportioning niche space, lack of interspecific competition, independence of abundance among different species, and species with variable niches in changeable environments, are in accordance with some previous findings on parasite communities. 
Thus, parasite communities may generally be unsaturated with species, with empty niches, and interspecific interactions may generally be unimportant in determining parasite community structure.
Abstract:
Movements of wide-ranging top predators can now be studied effectively using satellite and archival telemetry. However, the motivations underlying movements remain difficult to determine because trajectories are seldom related to key biological gradients, such as changing prey distributions. Here, we use a dynamic prey landscape of zooplankton biomass in the north-east Atlantic Ocean to examine active habitat selection in the plankton-feeding basking shark Cetorhinus maximus. The relative success of shark searches across this landscape was examined by comparing prey biomass encountered by sharks with encounters by random-walk simulations of ‘model’ sharks. Movements of transmitter-tagged sharks monitored for 964 days (16,754 km estimated minimum distance) were concentrated on the European continental shelf in areas characterized by high seasonal productivity and complex prey distributions. We show that movements by adult and sub-adult sharks yielded consistently higher prey encounter rates than 90% of random-walk simulations. Behavioural patterns were consistent with basking sharks using search tactics structured across multiple scales to exploit the richest prey areas available in preferred habitats. Simple behavioural rules based on learned responses to previously encountered prey distributions may explain the high performance. This study highlights how dynamic prey landscapes enable active habitat selection in large predators to be investigated from a trophic perspective, an approach that may inform conservation by identifying critical habitat of vulnerable species.
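The comparison against random-walk 'model' sharks amounts to a null-model test, which can be sketched as follows. The study's actual simulation settings are not given in the abstract; the static grid of prey values, the four-neighbour lattice walk, and every function name below are assumptions made for the example (the study itself used a dynamic prey landscape).

```python
import random

def random_walk_track(n_rows, n_cols, start, n_steps, rng):
    """Lattice random walk; a step that would leave the grid keeps the walker at the edge."""
    r, c = start
    track = [(r, c)]
    for _ in range(n_steps):
        dr, dc = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        r = min(max(r + dr, 0), n_rows - 1)
        c = min(max(c + dc, 0), n_cols - 1)
        track.append((r, c))
    return track

def mean_encounter(prey, track):
    """Average prey value over the cells visited by a track."""
    return sum(prey[r][c] for r, c in track) / len(track)

def fraction_outperformed(prey, observed_track, n_sims, rng):
    """Share of simulated walkers whose mean prey encounter is below the observed one."""
    observed = mean_encounter(prey, observed_track)
    n_steps = len(observed_track) - 1
    wins = 0
    for _ in range(n_sims):
        sim = random_walk_track(len(prey), len(prey[0]), observed_track[0], n_steps, rng)
        if observed > mean_encounter(prey, sim):
            wins += 1
    return wins / n_sims
```

A returned value above 0.9 would correspond to an observed track that beats 90% of the simulated walkers released from the same starting cell for the same number of steps.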
Abstract:
For analysing financial time series, two main opposing viewpoints exist: either capital markets are completely stochastic and prices therefore follow a random walk, or they are deterministic and consequently predictable. For each of these views a great variety of tools exists with which one can try to confirm the hypotheses. Unfortunately, these methods are not well suited to data characterised in part by both paradigms. This thesis investigates these two approaches in order to model the behaviour of financial time series. In the deterministic framework, methods are used to characterise the dimensionality of embedded financial data. The stochastic approach includes an estimation of the unconditional and conditional return distributions using parametric, non-parametric and semi-parametric density estimation techniques. Finally, it is shown how elements from these two approaches can be combined to achieve a more realistic model for financial time series.
Abstract:
In Statnote 9, we described a one-way analysis of variance (ANOVA) ‘random effects’ model in which the objective was to estimate the degree of variation of a particular measurement and to compare different sources of variation in space and time. The illustrative scenario involved the role of computer keyboards in a University communal computer laboratory as a possible source of microbial contamination of the hands. The study estimated the aerobic colony count of ten selected keyboards with samples taken from two keys per keyboard determined at 9am and 5pm. This type of design is often referred to as a ‘nested’ or ‘hierarchical’ design and the ANOVA estimated the degree of variation: (1) between keyboards, (2) between keys within a keyboard, and (3) between sample times within a key. An alternative to this design is a 'fixed effects' model in which the objective is not to measure sources of variation per se but to estimate differences between specific groups or treatments, which are regarded as 'fixed' or discrete effects. This statnote describes two scenarios utilizing this type of analysis: (1) measuring the degree of bacterial contamination on 2p coins collected from three types of business property, viz., a butcher’s shop, a sandwich shop, and a newsagent and (2) the effectiveness of drugs in the treatment of a fungal eye infection.
Abstract:
This paper compares the UK/US exchange rate forecasting performance of linear and nonlinear models based on monetary fundamentals with that of a random walk (RW) model. Structural breaks are identified and taken into account. The exchange rate forecasting framework is also used for assessing the relative merits of the official Simple Sum and the weighted Divisia measures of money. Overall, there are four main findings. First, the majority of the models with fundamentals are able to beat the RW model in forecasting the UK/US exchange rate. Second, the most accurate forecasts of the UK/US exchange rate are obtained with a nonlinear model. Third, taking into account structural breaks reveals that the Divisia aggregate performs better than its Simple Sum counterpart. Finally, Divisia-based models provide more accurate forecasts than Simple Sum-based models provided they are constructed within a nonlinear framework.
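Beating the RW model is commonly assessed by comparing forecast errors with those of a no-change forecast. The sketch below assumes a driftless random walk and root mean squared error as the accuracy measure; the abstract does not state which measure the paper uses, and the function names and numbers are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def random_walk_forecasts(rates):
    """No-change forecast: the prediction for the next period is the current rate."""
    return rates[:-1]

def rmse_ratio(rates, model_forecasts):
    """RMSE of a model relative to the random walk over rates[1:]; below 1 means the model wins."""
    actual = rates[1:]
    return rmse(actual, model_forecasts) / rmse(actual, random_walk_forecasts(rates))

rates = [1.0, 2.0, 4.0]               # made-up exchange rates
perfect = rmse_ratio(rates, [2.0, 4.0])
no_better = rmse_ratio(rates, random_walk_forecasts(rates))
```

A ratio below one indicates that the candidate model forecasts more accurately than the random walk over the evaluation window, and a ratio of one means it does no better.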
Abstract:
Mathematics Subject Classification: 26A33, 45K05, 60J60, 60G50, 65N06, 80-99.