902 results for Deterministic imputation
Abstract:
The main goal of this thesis is to discuss the determination of homological invariants of polynomial ideals. To this end, we consider different coordinate systems and analyze their significance for the computation of certain invariants. In particular, we provide an algorithm that transforms any ideal into strongly stable position if char k = 0. With a slight modification, this algorithm can also be used to achieve a stable or quasi-stable position. If the field has positive characteristic, the Borel-fixed position is the most our method can achieve. Further, we present some applications of Pommaret bases, focusing on how to read off invariants directly from such a basis. In the second half of this dissertation we take a closer look at another homological invariant, namely the (absolute) reduction number. It is a known fact that the reduction number can be read off immediately from the basis of the generic initial ideal. However, we show that it is not possible to formulate an algorithm, based on analyzing only the leading ideal, that transforms an ideal into a position from which this invariant can be read off directly. So in general we cannot read off the reduction number from a Pommaret basis. This result motivates a deeper investigation of which properties a coordinate system must possess so that the reduction number can be determined easily, i.e. by analyzing the leading ideal. This approach leads to the introduction of generalized versions of the stable positions mentioned above, such as the weakly D-stable and weakly D-minimal stable positions. The latter represents a coordinate system that allows one to determine the reduction number without any further computations. Finally, we introduce the notion of β-maximal position, which has many interesting algebraic properties. In particular, in combination with the weakly D-stable position it is sufficient for the weakly D-minimal stable position, and it is thus connected to the reduction number.
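For orientation, the strongly stable condition mentioned above can be stated in one line; this is the standard formulation (variable-ordering conventions differ between authors):

```latex
% A monomial ideal I in k[x_1, ..., x_n] is strongly stable if
\[
  m \in I,\quad x_j \mid m,\quad i < j
  \;\Longrightarrow\; \frac{x_i\, m}{x_j} \in I .
\]
% In characteristic 0 this is exactly the Borel-fixed condition; the
% (merely) stable position requires it only for j = max{ l : x_l | m }.
```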
Abstract:
Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales), where each subscale is made up of a number of Likert-type items and the aim of the analysis is to estimate participants' scores on the corresponding latent traits. We propose a new approach to deal with missing responses in such a situation that is based on (1) multiple imputation of non-responses and (2) simultaneous rotation of the imputed datasets. We applied the approach to a real dataset in which missing responses were artificially introduced following a real pattern of non-responses, and to a simulation study based on artificial datasets. The results show that our approach (specifically, Hot-Deck multiple imputation followed by Consensus Promin rotation) was able to successfully compute factor score estimates even for participants with missing data.
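A minimal sketch of the hot-deck idea underlying such an approach: each incomplete respondent borrows values from a randomly chosen "donor" with a similar profile on the jointly observed items, and repeating this M times yields M completed datasets. The similarity rule and the choice k = 5 below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def hot_deck_impute(X, rng):
    """One hot-deck completion: fill each incomplete row from a donor row
    that is complete and close on the recipient's observed items."""
    X = X.astype(float).copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    for i in np.where(~complete)[0]:
        miss = np.isnan(X[i])
        # squared distance on the items this respondent did answer
        d = ((donors[:, ~miss] - X[i, ~miss]) ** 2).sum(axis=1)
        k = min(5, len(donors))                   # arbitrary donor-pool size
        donor = donors[rng.choice(np.argsort(d)[:k])]
        X[i, miss] = donor[miss]                  # copy the donor's answers
    return X

rng = np.random.default_rng(0)
X = np.array([[1, 2, 3], [2, np.nan, 4], [1, 3, np.nan], [2, 2, 4.0]])
imputations = [hot_deck_impute(X, rng) for _ in range(5)]  # M = 5 datasets
```

Each completed dataset would then be factor-analyzed, with the rotations aligned across datasets, which is the role Consensus Promin plays in the paper.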
Abstract:
Fault tolerance allows a system to remain operational to some degree when some of its components fail. One of the most common fault tolerance mechanisms consists of logging the system state periodically and recovering the system to a consistent state in the event of a failure. This paper describes a general logging-based fault tolerance mechanism that can be layered over deterministic systems. Our proposal describes how a logging mechanism can recover the underlying system to a consistent state even if an action or set of actions was interrupted mid-way due to a server crash. We also propose different methods of storing the logging information and describe how to deploy a fault-tolerant master-slave cluster for information replication. We adapt our model to a previously proposed framework that provides common relational features, such as transactions with atomicity, consistency, isolation, and durability (ACID) properties, to NoSQL database management systems.
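A minimal sketch of the general idea (not the paper's specific protocol): append each action to a durable log before applying it; on restart, replay the log to rebuild a consistent state, discarding a trailing half-written entry left by a mid-write crash. Determinism is what makes replay sufficient, since the same log always reproduces the same state.

```python
import json, os

LOG = "actions.log"   # hypothetical log file name

def apply_action(state, action):
    # the underlying system must be deterministic: same log -> same state
    state[action["key"]] = action["value"]

def log_and_apply(state, action):
    with open(LOG, "a") as f:
        f.write(json.dumps(action) + "\n")
        f.flush()
        os.fsync(f.fileno())       # entry is durable before it takes effect
    apply_action(state, action)

def recover():
    state = {}
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                try:
                    apply_action(state, json.loads(line))
                except json.JSONDecodeError:
                    break          # torn final entry from a server crash
    return state
```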
Abstract:
We summarise the properties and the fundamental mathematical results associated with basic models which describe coagulation and fragmentation processes in a deterministic manner and in which cluster size is a discrete quantity (an integer multiple of some basic unit size). In particular, we discuss Smoluchowski's equation for aggregation, the Becker-Döring model of simultaneous aggregation and fragmentation, and more general models involving coagulation and fragmentation.
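For reference, the discrete Smoluchowski coagulation equation for the concentration c_j(t) of j-clusters, and the Becker-Döring system in its standard flux form (a_{j,k} are coagulation rates; a_j, b_j are monomer attachment and detachment rates):

```latex
\[
  \frac{dc_j}{dt}
  = \frac{1}{2}\sum_{k=1}^{j-1} a_{j-k,k}\, c_{j-k}\, c_k
  - \sum_{k=1}^{\infty} a_{j,k}\, c_j\, c_k , \qquad j \ge 1 .
\]
% Becker-Doring: clusters grow or shrink only by gaining/losing monomers
\[
  \frac{dc_j}{dt} = J_{j-1} - J_j \;\; (j \ge 2), \qquad
  J_j = a_j c_1 c_j - b_{j+1} c_{j+1}, \qquad
  \frac{dc_1}{dt} = -J_1 - \sum_{j=1}^{\infty} J_j .
\]
```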
Abstract:
The Dendritic Cell Algorithm is an immune-inspired algorithm originally based on the function of natural dendritic cells. The original instantiation of the algorithm is highly stochastic. While the algorithm performs well when applied to large real-time datasets, it is difficult to analyse due to the number of random elements. In this paper a deterministic version of the algorithm is proposed, implemented, and tested using a port scan dataset to provide a controllable system. This version has a controllable number of parameters, which are experimented with in this paper. In addition, we examine the effects of time windows and of varying the number of cells, both of which are shown to influence the algorithm. Finally, a novel metric for assessing the algorithm's output is introduced and proves to be more sensitive than the metric used with the original Dendritic Cell Algorithm.
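A toy sketch of the deterministic signal-fusion step at the heart of such a variant (the weights and thresholds here are illustrative placeholders, not the published dDCA coefficients): each cell accumulates exposure from danger and safe signals until its migration threshold is reached, then casts an anomalous or normal vote.

```python
def run_cells(signals, thresholds, w_danger=2.0, w_safe=-2.0):
    """signals: list of (danger, safe) pairs per time step.
    thresholds: one migration threshold per cell (the controllable
    parameter population). Returns the fraction of anomalous votes."""
    votes = []
    for thr in thresholds:
        csm = k = 0.0
        for danger, safe in signals:
            csm += danger + safe                    # cumulative exposure
            k += w_danger * danger + w_safe * safe  # anomaly evidence
            if csm >= thr:                          # cell migrates: vote, reset
                votes.append(1 if k > 0 else 0)
                csm = k = 0.0
    return sum(votes) / len(votes) if votes else 0.0

print(run_cells([(1.0, 0.2), (0.8, 0.1), (0.1, 1.0)], thresholds=[1.0, 2.0, 3.0]))
```

With no random elements, the same signal stream and thresholds always produce the same votes, which is exactly what makes the deterministic version analysable.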
Abstract:
When it comes to information sets in real life, pieces of the whole set are often unavailable. This problem can have various origins and therefore exhibits different patterns. In the literature, this problem is known as Missing Data. It can be handled in various ways: by discarding incomplete observations, by estimating what the missing values originally were, or simply by ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interactions exist between Missing Data, Imputation Methods, and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where "discrete" means that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the results produced by a classifier. Also, in some of the cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time constraining the specifications of the previous problem to a special kind of dataset, the multivariate time series. We designed new imputation techniques for this particular domain and combined them with some of the established strategies tested in the previous chapter of this thesis. The time series were likewise subjected to processes involving missing data and imputation, in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem. The databases that characterize this problem contain genuinely missing values of their own, which provides a real-world benchmark for testing the algorithms developed in this thesis.
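Two baseline time-series imputation strategies of the kind such comparisons typically include, shown here as a minimal sketch (illustrative baselines, not the thesis's new methods): last observation carried forward and linear interpolation between the nearest observed neighbours.

```python
import numpy as np

def locf(x):
    """Last observation carried forward (assumes the first value is observed)."""
    x = x.copy()
    for i in range(1, len(x)):
        if np.isnan(x[i]):
            x[i] = x[i - 1]
    return x

def linear_interp(x):
    """Linear interpolation between the nearest observed neighbours."""
    x = x.copy()
    obs = ~np.isnan(x)
    x[~obs] = np.interp(np.where(~obs)[0], np.where(obs)[0], x[obs])
    return x

x = np.array([1.0, np.nan, np.nan, 4.0, 5.0])
print(locf(x))           # [1. 1. 1. 4. 5.]
print(linear_interp(x))  # [1. 2. 3. 4. 5.]
```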
Abstract:
Spatio-temporal modelling is an area of increasing importance, in which models and methods have often been developed to deal with specific applications. In this study, a spatio-temporal model was used to estimate daily rainfall data. Rainfall records from several weather stations, obtained from the Agritempo system for two climatically homogeneous zones, were used. Rainfall values obtained for two fixed dates (January 1 and May 1, 2012) using the spatio-temporal model were compared with the geostatistical techniques of ordinary kriging and ordinary cokriging, with altitude as an auxiliary variable. The spatio-temporal model produced estimates of daily precipitation that were more than 17% better than those of kriging and cokriging in the first zone and more than 18% better in the second zone. The spatio-temporal model proved to be a versatile technique, adapting to different seasons and dates.
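For readers unfamiliar with the baseline, ordinary kriging interpolates a field from point observations via a fitted variogram. A minimal sketch with the pykrige package follows; the station coordinates, rainfall values, and the spherical variogram are made-up illustrative choices, not the study's configuration.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# hypothetical station longitudes, latitudes, and daily rainfall (mm)
lon = np.array([-47.1, -46.8, -47.4, -46.5])
lat = np.array([-22.9, -22.6, -22.3, -22.8])
rain = np.array([12.0, 3.5, 0.0, 8.2])

ok = OrdinaryKriging(lon, lat, rain, variogram_model="spherical")
grid_lon = np.linspace(-47.5, -46.4, 50)
grid_lat = np.linspace(-23.0, -22.2, 50)
z, ss = ok.execute("grid", grid_lon, grid_lat)  # estimates and kriging variance
```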
Abstract:
Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on, and threats to, forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and, as a consequence, spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k-Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods, in order to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service were integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for part of the Lake States and species-specific site index maps for the entire Lake States region. Targeting small-area application of state-of-the-art remote sensing, LiDAR (light detection and ranging) data were integrated with field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing-volume map in a cost-effective way. The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or to estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.
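One common way to realize RF-kNN imputation, sketched under assumptions (implementations such as R's yaImpute differ in detail): similarity between a target pixel and a reference plot is the fraction of trees in which they land in the same leaf, and the target's attribute is imputed from its k most similar plots.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_knn_impute(X_ref, y_ref, X_target, k=5):
    """Impute y for targets from the k reference plots that most often
    share a random-forest leaf with them (RF proximity as similarity)."""
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_ref, y_ref)
    leaves_ref = rf.apply(X_ref)       # (n_ref, n_trees) leaf indices
    leaves_tgt = rf.apply(X_target)    # (n_target, n_trees)
    out = np.empty(len(X_target))
    for i, lt in enumerate(leaves_tgt):
        prox = (leaves_ref == lt).mean(axis=1)  # leaf co-occupancy per plot
        nn = np.argsort(prox)[-k:]              # k most similar plots
        out[i] = y_ref[nn].mean()               # simple mean of neighbours
    return out

rng = np.random.default_rng(1)
X_ref = rng.normal(size=(100, 4))               # synthetic spectral predictors
y_ref = X_ref @ [1, 2, 0, -1] + rng.normal(size=100)   # synthetic biomass
print(rf_knn_impute(X_ref, y_ref, rng.normal(size=(3, 4))))
```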
Abstract:
This paper presents two techniques for evaluating soil mechanical resistance to penetration as an auxiliary method to support decision-making in subsoiling operations. The decision is based on the volume of soil mobilized as a function of the critical soil resistance to penetration considered in each case. The first method, probabilistic, uses statistical techniques to define the volume of soil to be mobilized. The other method, deterministic, determines the percentage of soil to be mobilized and its spatial distribution. In both cases, curves are plotted of the percentage of experimental data with soil mechanical resistance to penetration equal to or larger than the established critical level, and of the volume of soil to be mobilized as a function of the critical level. The deterministic method plots additionally show the spatial distribution of the data with resistance to penetration equal to or larger than the critical level. The comparison between the mobilized-soil curves obtained as a function of the critical level using both methods showed that they can be considered equivalent. The deterministic method has the advantage of showing the spatial distribution of the critical points.
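In the deterministic case, the curve reduces to an exceedance fraction over the measurement grid; a minimal sketch with made-up penetration-resistance readings:

```python
import numpy as np

def mobilized_fraction(resistance, critical_levels):
    """Fraction of sampled points whose penetration resistance meets or
    exceeds each critical level, i.e. the share of soil to be mobilized."""
    r = np.asarray(resistance).ravel()
    return [(r >= c).mean() for c in critical_levels]

# hypothetical resistance grid (MPa) and candidate critical levels
grid = np.random.default_rng(2).uniform(0.5, 4.0, size=(20, 20))
print(mobilized_fraction(grid, critical_levels=[1.5, 2.0, 2.5]))
```

The grid positions of the points where `r >= c` holds give the spatial distribution that the deterministic method maps.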
Abstract:
The aim of this paper is to discuss some rhythmic differences between European and Brazilian Portuguese and their relationship to pretonic vowel reduction phenomena. After the basic facts of EP and BP are presented, we show that the issue cannot be discussed without taking secondary stress placement into account, and we proceed to present the algorithm-based approach to secondary stress in Portuguese that is representative of Metrical Phonology analyses. After showing that this deterministic approach cannot adequately explain the variable position of secondary stress in both varieties in words with an even number of pretonic syllables, we argue for the interpretation of secondary stress, and therefore for the construction of rhythmic units, at the PF interface, as suggested in Chomsky's Minimalist Program. We also propose, inspired by constraint hierarchies as proposed in Optimality Theory, that such interpretation must take into account two different constraint rankings for EP and BP. These different rankings would ultimately explain the rhythmic differences between the two varieties, as well as the different behavior of pretonic vowels with respect to reduction processes.
Abstract:
Deterministic linkage of AIDS mortality databases has shown problems caused by flaws in the files. The objectives of this study were therefore to evaluate the performance of deterministic linkage between the AIDS death databases of the Mortality Information Improvement Program of the Municipality of São Paulo (PRO-AIM) and of the SEADE Foundation for the years 2000 to 2004, and to estimate the coverage of each database. The merge routine of a software package was used to link the databases. The first stage paired records automatically; in the second stage, each database was checked to locate new pairs. Total deaths were estimated as the sum of paired and unpaired cases in order to calculate the coverage of the databases. The first stage of the linkage identified 91.6% of the pairs; the second stage added 457 pairs. The total number of deaths was estimated at 5,855, with coverage of 97.1% for PRO-AIM and 96% for SEADE. Deterministic linkage covered most of the cases. The PRO-AIM database provided the highest coverage, the largest amount of complete information, and the best geographic localization of cases.
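Deterministic linkage amounts to an exact join on a key built from identifying fields; a minimal pandas sketch follows, with hypothetical field names and records, since the study's actual linkage key is not specified here.

```python
import pandas as pd

# hypothetical extracts of the two mortality databases
pro_aim = pd.DataFrame({"name": ["ANA", "JOSE"],
                        "dob": ["1960-01-02", "1955-07-09"],
                        "death_date": ["2002-03-01", "2003-11-20"]})
seade = pd.DataFrame({"name": ["ANA", "MARIA"],
                      "dob": ["1960-01-02", "1971-05-30"],
                      "death_date": ["2002-03-01", "2001-08-15"]})

key = ["name", "dob", "death_date"]          # illustrative linkage key
linked = pro_aim.merge(seade, on=key, how="outer", indicator=True)
pairs = (linked["_merge"] == "both").sum()   # records found in both files
total = len(linked)                          # paired + unpaired = estimated deaths
print(pairs, total, pro_aim.shape[0] / total, seade.shape[0] / total)
```

The last two ratios correspond to the per-database coverage estimates computed in the study.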
Abstract:
Plasma edge turbulence in the Tokamak Chauffage Alfvén Brésilien (TCABR) [R. M. O. Galvao et al., Plasma Phys. Contr. Fusion 43, 1181 (2001)] is investigated for multifractal properties of the fluctuating floating electrostatic potential measured by Langmuir probes. The multifractality in this signal is characterized by the full multifractal spectra determined by applying the wavelet transform modulus maxima method. In this work, the dependence of the multifractal spectrum on the radial position is presented. The degree of multifractality inside the plasma increases with the radial position, reaching a maximum near the plasma edge and becoming almost constant in the scrape-off layer. Comparisons of these results with those obtained for random test time series with the same Hurst exponents and data length statistically confirm the reported multifractal behavior. Moreover, the persistence of these signals, characterized by their Hurst exponent, presents a radial profile similar to that of the deterministic component estimated from analysis based on dynamical recurrences. (C) 2008 American Institute of Physics.
Abstract:
Consider N sites randomly and uniformly distributed in a d-dimensional hypercube. A walker explores this disordered medium by going to the nearest site that has not been visited in the last μ (memory) steps. The walker trajectory is composed of a transient part and a periodic part (cycle). For one-dimensional systems, the walker may or may not explore all the available space, giving rise to a crossover between localized and extended regimes at the critical memory μ₁ = log₂ N. The deterministic rule can be softened to consider more realistic situations by including a stochastic parameter T (temperature). In this case, the walker's movement is driven by a probability density function parameterized by T and a cost function. The cost function increases with the distance between two sites and favors hops to closer sites. As the temperature increases, the walker can escape from cycles that are reminiscent of the deterministic nature of the walk and extend the exploration. Here, we report an analytical model and numerical studies of the influence of the temperature and the critical memory on the exploration of one-dimensional disordered systems.
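A minimal sketch of the deterministic rule described above (the stochastic, temperature-driven variant would replace the argmin with sampling from a cost-based distribution):

```python
import numpy as np

def tourist_walk(sites, start, mu, n_steps):
    """Deterministic walk: always move to the nearest site not visited
    in the last mu steps. sites: (N, d) array of site positions."""
    path, current = [start], start
    for _ in range(n_steps):
        recent = set(path[-mu:]) if mu > 0 else set()
        d = np.linalg.norm(sites - sites[current], axis=1)
        d[current] = np.inf               # cannot stay in place
        for j in recent:
            d[j] = np.inf                 # taboo: forbidden for mu steps
        current = int(np.argmin(d))
        path.append(current)
    return path

rng = np.random.default_rng(3)
sites = rng.random((50, 1))               # N = 50 sites on a line (d = 1)
print(tourist_walk(sites, start=0, mu=3, n_steps=20))
```

Running this for increasing μ exhibits the trapping cycles and, past the critical memory, the extended exploration discussed in the abstract.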
Abstract:
We show a function that fits well the probability density of return times between two consecutive visits of a chaotic trajectory to finite-size regions in phase space. It deviates from exponential statistics by a small power-law term, a term that represents the deterministic manifestation of the dynamics. We also show how one can quickly and easily estimate the Kolmogorov-Sinai entropy and the short-term correlation function by observing highly probable returns. Our analyses are performed numerically in the Hénon map and experimentally in a Chua's circuit. Finally, we discuss how our approach can be used to treat data coming from experimental complex systems and for technological applications. (C) 2009 American Institute of Physics. [doi: 10.1063/1.3263943]
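Collecting return times to a small phase-space region is straightforward; a minimal sketch for the Hénon map at the classical parameters (the box below is an arbitrary choice, not the one used in the paper):

```python
import numpy as np

def henon_return_times(n=200_000, a=1.4, b=0.3, box=(0.0, 0.2, 0.0, 0.1)):
    """Return times between consecutive visits of the Henon orbit
    x' = 1 - a*x^2 + y, y' = b*x to a small box in phase space."""
    x, y = 0.1, 0.1
    x0, x1, y0, y1 = box
    times, last = [], None
    for t in range(n):
        x, y = 1 - a * x * x + y, b * x
        if x0 <= x <= x1 and y0 <= y <= y1:
            if last is not None:
                times.append(t - last)    # time since the previous visit
            last = t
    return np.array(times)

tau = henon_return_times()
print(len(tau), tau.mean() if len(tau) else None)
```

A histogram of `tau` is the empirical density that the paper's fitting function, exponential plus a small power-law correction, is meant to describe.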
Abstract:
Noise is an intrinsic feature of population dynamics and plays a crucial role in oscillations called phase-forgetting quasi-cycles by converting damped oscillations into sustained ones. This function of noise becomes evident when one considers Langevin equations whose deterministic part yields only damped oscillations. We formulate here a consistent and systematic approach to population dynamics, leading to a Fokker-Planck equation and the associated Langevin equations in accordance with this conceptual framework, founded on stochastic lattice-gas models that describe spatially structured predator-prey systems. Langevin equations in the population densities and the predator-prey pair density are derived in two stages. First, a birth-and-death stochastic process in the space of prey, predator, and predator-prey pair numbers is obtained by a contraction method that reduces the degrees of freedom. Second, a van Kampen expansion in the inverse of the system size is performed to obtain the Fokker-Planck equation. We also study the time correlation function, whose asymptotic behavior is used to characterize the transition from the cyclic coexistence of species to ordinary coexistence.
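The second stage rests on the standard van Kampen system-size expansion, whose ansatz splits each density into a deterministic part and Gaussian fluctuations (standard form, with Ω the system size and n_i the number of individuals of type i):

```latex
\[
  \frac{n_i}{\Omega} = \phi_i(t) + \Omega^{-1/2}\, \xi_i .
\]
% Expanding the master equation in powers of Omega^{-1/2} yields, at
% leading order, the deterministic rate equations for phi_i and, at the
% next order, a linear Fokker-Planck equation for the fluctuations xi_i,
% whose damped oscillations the noise sustains as quasi-cycles.
```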