15 resultados para Deterministic imputation
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
Resumo:
In this paper, we present a study on a deterministic partially self-avoiding walk (tourist walk), which provides a novel method for texture feature extraction. The method is able to explore an image on all scales simultaneously. Experiments were conducted using different dynamics concerning the tourist walk. A new strategy, based on histograms. to extract information from its joint probability distribution is presented. The promising results are discussed and compared to the best-known methods for texture description reported in the literature. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Texture is one of the most important visual attributes for image analysis. It has been widely used in image analysis and pattern recognition. A partially self-avoiding deterministic walk has recently been proposed as an approach for texture analysis with promising results. This approach uses walkers (called tourists) to exploit the gray scale image contexts in several levels. Here, we present an approach to generate graphs out of the trajectories produced by the tourist walks. The generated graphs embody important characteristics related to tourist transitivity in the image. Computed from these graphs, the statistical position (degree mean) and dispersion (entropy of two vertices with the same degree) measures are used as texture descriptors. A comparison with traditional texture analysis methods is performed to illustrate the high performance of this novel approach. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
We introduce jump processes in R(k), called density-profile processes, to model biological signaling networks. Our modeling setup describes the macroscopic evolution of a finite-size spin-flip model with k types of spins with arbitrary number of internal states interacting through a non-reversible stochastic dynamics. We are mostly interested on the multi-dimensional empirical-magnetization vector in the thermodynamic limit, and prove that, within arbitrary finite time-intervals, its path converges almost surely to a deterministic trajectory determined by a first-order (non-linear) differential equation with explicit bounds on the distance between the stochastic and deterministic trajectories. As parameters of the spin-flip dynamics change, the associated dynamical system may go through bifurcations, associated to phase transitions in the statistical mechanical setting. We present a simple example of spin-flip stochastic model, associated to a synthetic biology model known as repressilator, which leads to a dynamical system with Hopf and pitchfork bifurcations. Depending on the parameter values, the magnetization random path can either converge to a unique stable fixed point, converge to one of a pair of stable fixed points, or asymptotically evolve close to a deterministic orbit in Rk. We also discuss a simple signaling pathway related to cancer research, called p53 module.
Resumo:
A particle filter method is presented for the discrete-time filtering problem with nonlinear ItA ` stochastic ordinary differential equations (SODE) with additive noise supposed to be analytically integrable as a function of the underlying vector-Wiener process and time. The Diffusion Kernel Filter is arrived at by a parametrization of small noise-driven state fluctuations within branches of prediction and a local use of this parametrization in the Bootstrap Filter. The method applies for small noise and short prediction steps. With explicit numerical integrators, the operations count in the Diffusion Kernel Filter is shown to be smaller than in the Bootstrap Filter whenever the initial state for the prediction step has sufficiently few moments. The established parametrization is a dual-formula for the analysis of sensitivity to gaussian-initial perturbations and the analysis of sensitivity to noise-perturbations, in deterministic models, showing in particular how the stability of a deterministic dynamics is modeled by noise on short times and how the diffusion matrix of an SODE should be modeled (i.e. defined) for a gaussian-initial deterministic problem to be cast into an SODE problem. From it, a novel definition of prediction may be proposed that coincides with the deterministic path within the branch of prediction whose information entropy at the end of the prediction step is closest to the average information entropy over all branches. Tests are made with the Lorenz-63 equations, showing good results both for the filter and the definition of prediction.
Resumo:
This paper presents a GIS-based multicriteria flood risk assessment and mapping approach applied to coastal drainage basins where hydrological data are not available. It involves risk to different types of possible processes: coastal inundation (storm surge), river, estuarine and flash flood, either at urban or natural areas, and fords. Based on the causes of these processes, several environmental indicators were taken to build-up the risk assessment. Geoindicators include geological-geomorphologic proprieties of Quaternary sedimentary units, water table, drainage basin morphometry, coastal dynamics, beach morphodynamics and microclimatic characteristics. Bioindicators involve coastal plain and low slope native vegetation categories and two alteration states. Anthropogenic indicators encompass land use categories properties such as: type, occupation density, urban structure type and occupation consolidation degree. The selected indicators were stored within an expert Geoenvironmental Information System developed for the State of Sao Paulo Coastal Zone (SIIGAL), which attributes were mathematically classified through deterministic approaches, in order to estimate natural susceptibilities (Sn), human-induced susceptibilities (Sa), return period of rain events (Ri), potential damages (Dp) and the risk classification (R), according to the equation R=(Sn.Sa.Ri).Dp. Thematic maps were automatically processed within the SIIGAL, in which automata cells (""geoenvironmental management units"") aggregating geological-geomorphologic and land use/native vegetation categories were the units of classification. The method has been applied to the Northern Littoral of the State of Sao Paulo (Brazil) in 32 small drainage basins, demonstrating to be very useful for coastal zone public politics, civil defense programs and flood management.
Resumo:
Conventional procedures employed in the modeling of viscoelastic properties of polymer rely on the determination of the polymer`s discrete relaxation spectrum from experimentally obtained data. In the past decades, several analytical regression techniques have been proposed to determine an explicit equation which describes the measured spectra. With a diverse approach, the procedure herein introduced constitutes a simulation-based computational optimization technique based on non-deterministic search method arisen from the field of evolutionary computation. Instead of comparing numerical results, this purpose of this paper is to highlight some Subtle differences between both strategies and focus on what properties of the exploited technique emerge as new possibilities for the field, In oder to illustrate this, essayed cases show how the employed technique can outperform conventional approaches in terms of fitting quality. Moreover, in some instances, it produces equivalent results With much fewer fitting parameters, which is convenient for computational simulation applications. I-lie problem formulation and the rationale of the highlighted method are herein discussed and constitute the main intended contribution. (C) 2009 Wiley Periodicals, Inc. J Appl Polym Sci 113: 122-135, 2009
Resumo:
In this paper, we consider a classical problem of complete test generation for deterministic finite-state machines (FSMs) in a more general setting. The first generalization is that the number of states in implementation FSMs can even be smaller than that of the specification FSM. Previous work deals only with the case when the implementation FSMs are allowed to have the same number of states as the specification FSM. This generalization provides more options to the test designer: when traditional methods trigger a test explosion for large specification machines, tests with a lower, but yet guaranteed, fault coverage can still be generated. The second generalization is that tests can be generated starting with a user-defined test suite, by incrementally extending it until the desired fault coverage is achieved. Solving the generalized test derivation problem, we formulate sufficient conditions for test suite completeness weaker than the existing ones and use them to elaborate an algorithm that can be used both for extending user-defined test suites to achieve the desired fault coverage and for test generation. We present the experimental results that indicate that the proposed algorithm allows obtaining a trade-off between the length and fault coverage of test suites.
Resumo:
We have investigated plasma turbulence at the edge of a tokamak plasma using data from electrostatic potential fluctuations measured in the Brazilian tokamak TCABR. Recurrence quantification analysis has been used to provide diagnostics of the deterministic content of the series. We have focused our analysis on the radial dependence of potential fluctuations and their characterization by recurrence-based diagnostics. Our main result is that the deterministic content of the experimental signals is most pronounced at the external part of the plasma column just before the plasma radius. Since the chaoticity of the signals follows the same trend, we have concluded that the electrostatic plasma turbulence at the tokamak plasma edge can be partially explained by means of a deterministic nonlinear system. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
We show that the Hausdorff dimension of the spectral measure of a class of deterministic, i.e. nonrandom, block-Jacobi matrices may be determined with any degree of precision, improving a result of Zlatos [Andrej Zlatos,. Sparse potentials with fractional Hausdorff dimension, J. Funct. Anal. 207 (2004) 216-252]. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
Recently, the deterministic tourist walk has emerged as a novel approach for texture analysis. This method employs a traveler visiting image pixels using a deterministic walk rule. Resulting trajectories provide clues about pixel interaction in the image that can be used for image classification and identification tasks. This paper proposes a new walk rule for the tourist which is based on contrast direction of a neighborhood. The yielded results using this approach are comparable with those from traditional texture analysis methods in the classification of a set of Brodatz textures and their rotated versions, thus confirming the potential of the method as a feasible texture analysis methodology. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The issue of how children learn the meaning of words is fundamental to developmental psychology. The recent attempts to develop or evolve efficient communication protocols among interacting robots or Virtual agents have brought that issue to a central place in more applied research fields, such as computational linguistics and neural networks, as well. An attractive approach to learning an object-word mapping is the so-called cross-situational learning. This learning scenario is based on the intuitive notion that a learner can determine the meaning of a word by finding something in common across all observed uses of that word. Here we show how the deterministic Neural Modeling Fields (NMF) categorization mechanism can be used by the learner as an efficient algorithm to infer the correct object-word mapping. To achieve that we first reduce the original on-line learning problem to a batch learning problem where the inputs to the NMF mechanism are all possible object-word associations that Could be inferred from the cross-situational learning scenario. Since many of those associations are incorrect, they are considered as clutter or noise and discarded automatically by a clutter detector model included in our NMF implementation. With these two key ingredients - batch learning and clutter detection - the NMF mechanism was capable to infer perfectly the correct object-word mapping. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
The issue of smoothing in kriging has been addressed either by estimation or simulation. The solution via estimation calls for postprocessing kriging estimates in order to correct the smoothing effect. Stochastic simulation provides equiprobable images presenting no smoothing and reproducing the covariance model. Consequently, these images reproduce both the sample histogram and the sample semivariogram. However, there is still a problem, which is the lack of local accuracy of simulated images. In this paper, a postprocessing algorithm for correcting the smoothing effect of ordinary kriging estimates is compared with sequential Gaussian simulation realizations. Based on samples drawn from exhaustive data sets, the postprocessing algorithm is shown to be superior to any individual simulation realization yet, at the expense of providing one deterministic estimate of the random function.
Resumo:
The assessment of routing protocols for mobile wireless networks is a difficult task, because of the networks` dynamic behavior and the absence of benchmarks. However, some of these networks, such as intermittent wireless sensors networks, periodic or cyclic networks, and some delay tolerant networks (DTNs), have more predictable dynamics, as the temporal variations in the network topology can be considered as deterministic, which may make them easier to study. Recently, a graph theoretic model-the evolving graphs-was proposed to help capture the dynamic behavior of such networks, in view of the construction of least cost routing and other algorithms. The algorithms and insights obtained through this model are theoretically very efficient and intriguing. However, there is no study about the use of such theoretical results into practical situations. Therefore, the objective of our work is to analyze the applicability of the evolving graph theory in the construction of efficient routing protocols in realistic scenarios. In this paper, we use the NS2 network simulator to first implement an evolving graph based routing protocol, and then to use it as a benchmark when comparing the four major ad hoc routing protocols (AODV, DSR, OLSR and DSDV). Interestingly, our experiments show that evolving graphs have the potential to be an effective and powerful tool in the development and analysis of algorithms for dynamic networks, with predictable dynamics at least. In order to make this model widely applicable, however, some practical issues still have to be addressed and incorporated into the model, like adaptive algorithms. We also discuss such issues in this paper, as a result of our experience.
Resumo:
We study the asymptotic properties of the number of open paths of length n in an oriented rho-percolation model. We show that this number is e(n alpha(rho)(1+o(1))) as n ->infinity. The exponent alpha is deterministic, it can be expressed in terms of the free energy of a polymer model, and it can be explicitly computed in some range of the parameters. Moreover, in a restricted range of the parameters, we even show that the number of such paths is n(-1/2)We (n alpha(rho))(1+o(1)) for some nondegenerate random variable W. We build on connections with the model of directed polymers in random environment, and we use techniques and results developed in this context.