898 results for Deterministic imputation
Abstract:
1. Ecologists are debating the relative role of deterministic and stochastic determinants of community structure. Although the high diversity and strong spatial structure of soil animal assemblages could provide ecologists with an ideal ecological scenario, surprisingly little information is available on these assemblages.
2. We studied species-rich soil oribatid mite assemblages from a Mediterranean beech forest and a grassland. We applied multivariate regression approaches and analysed spatial autocorrelation at multiple spatial scales using Moran's eigenvectors. Results were used to partition community variance in terms of the amount of variation uniquely accounted for by environmental correlates (e.g. organic matter) and geographical position. Estimated neutral diversity and immigration parameters were also applied to a soil animal group for the first time to simulate patterns of community dissimilarity expected under neutrality, thereby testing neutral predictions.
3. After accounting for spatial autocorrelation, the correlation between community structure and key environmental parameters disappeared: about 40% of community variation consisted of spatial patterns independent of measured environmental variables such as organic matter. Environmentally independent spatial patterns encompassed the entire range of scales accounted for by the sampling design (from tens of cm to 100 m). This spatial variation could be due to either unmeasured but spatially structured variables or stochastic drift mediated by dispersal. Observed levels of community dissimilarity were significantly different from those predicted by neutral models.
4. Oribatid mite assemblages are dominated by processes involving both deterministic and stochastic components and operating at multiple scales. Spatial patterns independent of the measured environmental variables are a prominent feature of the targeted assemblages, but patterns of community dissimilarity do not match neutral predictions. This suggests that niche-mediated competition, environmental filtering, or both contribute to the core structure of the community. This study indicates new lines of investigation for understanding the mechanisms that determine the signature of the deterministic component of animal community assembly.
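For readers unfamiliar with the variation-partitioning step described above, the sketch below shows the standard R²-difference decomposition on purely hypothetical matrices (`Y` for the site-by-species table, `E` for environmental correlates, `S` for Moran's eigenvector maps); none of the numbers come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60                                              # sampling points (hypothetical)
Y = rng.poisson(2.0, size=(n, 15)).astype(float)    # community table (sites x species)
E = rng.normal(size=(n, 3))                         # environmental variables (e.g. organic matter)
S = rng.normal(size=(n, 8))                         # spatial predictors (Moran's eigenvector maps)

def r2(X, Y):
    """Multivariate R^2 of Y regressed on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    fitted = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
    ss_res = ((Y - fitted) ** 2).sum()
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

r2_env  = r2(E, Y)                        # environment alone
r2_spa  = r2(S, Y)                        # space alone
r2_full = r2(np.column_stack([E, S]), Y)  # both predictor sets together

unique_env   = r2_full - r2_spa           # environment independent of space
unique_space = r2_full - r2_env           # space independent of environment
shared       = r2_env + r2_spa - r2_full  # spatially structured environment
print(unique_env, shared, unique_space, 1 - r2_full)
```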
Abstract:
Extreme arid regions in the world's major deserts are typified by quartz pavement terrain. Cryptic hypolithic communities colonize the ventral surface of quartz rocks, and this habitat is characterized by a relative lack of environmental and trophic complexity. Combined with readily identifiable major environmental stressors, this provides a tractable model system for determining the relative roles of stochastic and deterministic drivers in community assembly. By analyzing an original, worldwide data set of 16S rRNA gene-defined bacterial communities from the most extreme deserts on Earth, we show that functional assemblages within the communities were subject to different assembly influences. Null models applied to the photosynthetic assemblage revealed that stochastic processes exerted the greatest effect on the assemblage, although the level of community dissimilarity varied between continents in a manner not always consistent with neutral models. The heterotrophic assemblages displayed signatures of niche processes across four continents, whereas in other cases they conformed to neutral predictions. Importantly, for continents where neutrality was either rejected or accepted, assembly drivers differed between the two functional groups. This study demonstrates that multi-trophic microbial systems may not be fully described by a single set of niche or neutral assembly rules and that stochasticity is likely a major determinant of such systems, with significant variation in the influence of these determinants on a global scale.
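A toy illustration of the kind of null-model comparison mentioned above: the observed mean community dissimilarity is tested against a distribution obtained by randomizing a site-by-taxon table. The data and the randomization scheme (independent shuffling of each taxon across sites) are invented for illustration and are not those used in the study.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
counts = rng.poisson(1.5, size=(20, 30)).astype(float)   # sites x taxa (hypothetical)

def mean_braycurtis(table):
    return pdist(table, metric="braycurtis").mean()

observed = mean_braycurtis(counts)

# Null model: shuffle each taxon's abundances across sites independently,
# preserving taxon abundance distributions but breaking site structure.
null = np.empty(999)
for k in range(999):
    shuffled = np.apply_along_axis(rng.permutation, 0, counts)
    null[k] = mean_braycurtis(shuffled)

p = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"observed = {observed:.3f}, null mean = {null.mean():.3f}, p = {p:.3f}")
```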
Abstract:
Ubiquitous parallel computing aims to make parallel programming accessible to a wide variety of programming areas using deterministic and scale-free programming models built on a task abstraction. However, it remains hard to reconcile these attributes with pipeline parallelism, where the number of pipeline stages is typically hard-coded in the program and defines the degree of parallelism.
This paper introduces hyperqueues, a programming abstraction that enables the construction of deterministic and scale-free pipeline parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are organized around private local views, hyperqueues require shared concurrent views on the underlying data structure. We define the semantics of hyperqueues and describe their implementation in a work-stealing scheduler. We demonstrate scalable performance on pipeline-parallel PARSEC benchmarks and find that hyperqueues provide comparable or up to 30% better performance than POSIX threads and Intel's Threading Building Blocks. The latter are highly tuned to the number of available processing cores, while programs using hyperqueues are scale-free.
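For contrast with the scale-free abstraction described above, here is a conventional pipeline in which the number of stages, and hence the degree of parallelism, is hard-coded; this is a plain Python `queue.Queue`/`threading` sketch of the limitation the paper targets, not an implementation of hyperqueues.

```python
import queue
import threading

SENTINEL = object()

def stage(fn, q_in, q_out):
    """Run fn on each item from q_in and forward results to q_out."""
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_out.put(SENTINEL)
            break
        q_out.put(fn(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
# Two worker stages: the pipeline depth is fixed here, which is exactly the
# scalability limitation the hyperqueue abstraction is meant to remove.
threading.Thread(target=stage, args=(lambda x: x * x, q0, q1)).start()
threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)).start()

for i in range(10):
    q0.put(i)
q0.put(SENTINEL)

results = []
while (item := q2.get()) is not SENTINEL:
    results.append(item)
print(results)   # deterministic output order: one worker per stage, FIFO queues
```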
Abstract:
Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. In this case, a common way of handling the missingness is to discard from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missingness allows it, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing complete-data methods to be applied afterwards. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. We study the problem of missing data treatment in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding results similar to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
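The impute-then-stratify pipeline described above can be sketched with generic stand-ins: scikit-learn's `IterativeImputer` replaces the Bayesian network learned by structural EM, and a plain regression tree (ignoring censoring) stands in for the survival tree. All data and parameters are invented; this only illustrates the overall workflow, not the authors' method.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n = 300
X = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "biomarker": rng.normal(1.0, 0.3, n),
    "stage": rng.integers(1, 4, n).astype(float),
})
time_to_event = np.exp(3 - 0.02 * X["age"] - 0.5 * X["stage"] + rng.normal(0, 0.3, n))

# Introduce item nonresponse in one covariate.
X.loc[rng.random(n) < 0.25, "biomarker"] = np.nan

# Model-based imputation of the incomplete covariates...
X_completed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(X),
                           columns=X.columns)

# ...followed by a tree fitted on the completed data (censoring ignored here).
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=30, random_state=0)
tree.fit(X_completed, time_to_event)
print(tree.tree_.node_count, "nodes in the prognostic stratification tree")
```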
Abstract:
Efficient electrocatalysts for many heterogeneous catalytic processes in energy conversion and storage systems must possess the necessary surface active sites. Here we identify, from X-ray photoelectron spectroscopy and density functional theory calculations, that controlling charge density redistribution via the atomic-scale incorporation of heteroatoms is paramount for introducing surface active sites. We deterministically insert nitrogen atoms into the bulk material to preferentially expose active sites, turning the inactive material into an efficient electrocatalyst. The excellent electrocatalytic activity of N-In2O3 nanocrystals leads to higher performance of dye-sensitized solar cells (DSCs) than that of DSCs fabricated with Pt. This strategy provides a rational route for transforming abundant materials into highly efficient electrocatalysts. More importantly, the discovery that the transparent conductive oxide (TCO) commonly used in DSCs can be turned into a counter electrode material means that, in addition to decreasing cost, the device structure and processing techniques of DSCs can be simplified in the future.
Abstract:
Since July 2014, the Office for National Statistics has committed to a predominantly online 2021 UK Census. Item-level imputation will play an important role in adjusting the 2021 Census database. Research indicates that the internet may yield cleaner data than paper-based capture and may attract people with particular characteristics. Here, we provide preliminary results from research directed at understanding how we might manage these features in a 2021 UK Census imputation strategy. Our findings suggest that a donor-based imputation method may need to include response mode as a matching variable in the underlying imputation model.
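A minimal sketch of donor-based (hot-deck) imputation that includes response mode among the matching variables, as the findings suggest; the data frame, variables, and missingness mechanism are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "response_mode": rng.choice(["online", "paper"], n, p=[0.7, 0.3]),
    "age_group": rng.choice(["16-34", "35-64", "65+"], n),
    "hours_worked": rng.normal(35, 10, n).round(),
})
df.loc[rng.random(n) < 0.1, "hours_worked"] = np.nan   # simulated item nonresponse

def hotdeck(df, target, match_vars):
    """Fill missing target values with a randomly drawn donor value
    from respondents sharing the same matching-variable combination."""
    out = df.copy()
    for _, idx in out.groupby(match_vars, sort=False).groups.items():
        cell = out.loc[idx, target]
        donors = cell.dropna()
        missing = cell.index[cell.isna()]
        if len(donors) and len(missing):
            out.loc[missing, target] = rng.choice(donors.to_numpy(), size=len(missing))
    return out

# Response mode is included among the matching variables.
imputed = hotdeck(df, "hours_worked", ["age_group", "response_mode"])
print(imputed["hours_worked"].isna().sum(), "values still missing")
```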
Abstract:
Electricity market players operating in a liberalized environment require access to an adequate decision support tool, allowing them to consider all the business opportunities and take strategic decisions. Ancillary services represent a good negotiation opportunity that must be considered by market players. For this, the decision support tool must include ancillary services market simulation. This paper proposes two different methods (Linear Programming and Genetic Algorithm approaches) for ancillary services dispatch. The methodologies are implemented in MASCEM, a multi-agent based electricity market simulator. A test case based on California Independent System Operator (CAISO) data concerning the dispatch of Regulation Down, Regulation Up, Spinning Reserve and Non-Spinning Reserve services is included in this paper.
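A minimal sketch of the linear-programming formulation of ancillary services dispatch (the Genetic Algorithm variant and the MASCEM integration are not shown): awarded capacity per bidder and service minimizes procurement cost subject to service requirements and offer limits. Prices, offers, and requirements are invented, not CAISO data.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 3 bidders x 4 services
# (Regulation Down, Regulation Up, Spinning Reserve, Non-Spinning Reserve).
price = np.array([[4.0, 6.0, 8.0, 3.0],     # price per MW offered by bidder 1
                  [5.0, 5.5, 7.0, 3.5],     # bidder 2
                  [3.5, 7.0, 9.0, 2.5]])    # bidder 3
offer = np.array([[30., 20., 25., 40.],     # MW capacity offered per service
                  [25., 30., 20., 30.],
                  [40., 15., 30., 50.]])
requirement = np.array([50., 45., 40., 60.])  # MW to procure per service

n_bid, n_srv = price.shape
c = price.ravel()                              # minimise total procurement cost

# Equality constraints: awarded MW per service must meet the requirement.
A_eq = np.zeros((n_srv, n_bid * n_srv))
for s in range(n_srv):
    A_eq[s, s::n_srv] = 1.0
b_eq = requirement

bounds = [(0.0, offer.ravel()[k]) for k in range(n_bid * n_srv)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
award = res.x.reshape(n_bid, n_srv)
print("total cost:", round(res.fun, 2))
print("awards (MW):\n", award.round(1))
```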
Abstract:
Attrition in longitudinal studies can lead to biased results. The study is motivated by the unexpected observation that alcohol consumption decreased despite increased availability, which may be due to sample attrition of heavy drinkers. Several imputation methods have been proposed, but they have rarely been compared in longitudinal studies of alcohol consumption. The imputation of consumption-level measurements is computationally particularly challenging because alcohol consumption is a semi-continuous variable (dichotomous drinking status and continuous volume among drinkers) and the data in the continuous part are non-normal. Data come from a longitudinal study in Denmark with four waves (2003-2006) and 1771 individuals at baseline. Five techniques for missing data are compared: last value carried forward (LVCF) was used as a single imputation method, and Hotdeck, Heckman modelling, multivariate imputation by chained equations (MICE), and a Bayesian approach as multiple imputation methods. Predictive mean matching was used to account for non-normality, whereby instead of imputing regression estimates, "real" observed values from similar cases are imputed. The methods were also compared by means of a simulated dataset. The simulation showed that the Bayesian approach yielded the least biased estimates. The finding of no increase in consumption levels despite higher availability remained unaltered. Copyright (C) 2011 John Wiley & Sons, Ltd.
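A minimal single-imputation sketch of predictive mean matching as described above: a regression is fitted on observed cases, and each missing value receives an observed value from a donor with a similar predicted mean. The data are simulated, and the multiple-imputation machinery (MICE, the Bayesian approach) is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)                              # covariate (e.g. baseline drinking)
y = np.exp(0.5 + 0.8 * x + rng.normal(0, 0.6, n))   # skewed consumption volume
y[rng.random(n) < 0.3] = np.nan                     # attrition / missing follow-up

obs = ~np.isnan(y)
X_design = np.column_stack([np.ones(n), x])

# 1. Fit the regression on observed cases and predict for everyone.
beta = np.linalg.lstsq(X_design[obs], y[obs], rcond=None)[0]
pred = X_design @ beta

# 2. For each missing case, find the k observed cases with the closest
#    predicted means and impute one of their *observed* values at random.
k = 5
obs_idx = np.flatnonzero(obs)
y_imp = y.copy()
for i in np.flatnonzero(~obs):
    nearest = obs_idx[np.argsort(np.abs(pred[obs_idx] - pred[i]))[:k]]
    y_imp[i] = y[rng.choice(nearest)]

print("imputed mean:", round(y_imp.mean(), 3),
      " observed-only mean:", round(np.nanmean(y), 3))
```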
Abstract:
Smoking is a leading global cause of disease and mortality. We established the Oxford-GlaxoSmithKline study (Ox-GSK) to perform a genome-wide meta-analysis of SNP association with smoking-related behavioral traits. Our final data set included 41,150 individuals drawn from 20 disease, population and control cohorts. Our analysis confirmed an effect on smoking quantity at a locus on 15q25 (P = 9.45 × 10^-19) that includes CHRNA5, CHRNA3 and CHRNB4, three genes encoding neuronal nicotinic acetylcholine receptor subunits. We used data from the 1000 Genomes project to investigate the region using imputation, which allowed for analysis of virtually all common SNPs in the region and offered a fivefold increase in marker density over HapMap2 (ref. 2) as an imputation reference panel. Our fine-mapping approach identified a SNP showing the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.
Abstract:
One of the most important problems in the theory of cellular automata (CA) is determining the proportion of cells in a specific state after a given number of time iterations. We approach this problem using patterns in preimage sets, that is, the sets of blocks which iterate to the desired output. This allows us to construct a response curve: the proportion of cells in state 1 after n iterations as a function of the initial proportion. We derive response curve formulae for many two-dimensional deterministic CA rules with L-neighbourhood. For all remaining rules, we find experimental response curves. We also use preimage sets to classify surjective rules. In the last part of the thesis, we consider a special class of one-dimensional probabilistic CA rules. We find response surface formulae for these rules and experimental response surfaces for all remaining rules.
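An experimental response curve of the kind described above can be estimated by simulation: start from random configurations with initial density p0, iterate the rule, and record the final density of 1s. The rule below is an arbitrary outer-totalistic example on the von Neumann neighbourhood, chosen only for illustration; it is not one of the rules studied in the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)

def step(grid):
    """One step of an arbitrary, illustrative 2D rule: a cell becomes 1
    iff at least two of its four von Neumann neighbours are 1."""
    neigh = (np.roll(grid, 1, 0) + np.roll(grid, -1, 0) +
             np.roll(grid, 1, 1) + np.roll(grid, -1, 1))
    return (neigh >= 2).astype(np.uint8)

def response_point(p0, size=200, iters=10, reps=5):
    """Estimate the density of 1s after `iters` steps, starting from
    independent Bernoulli(p0) cells on a torus (periodic boundaries)."""
    densities = []
    for _ in range(reps):
        grid = (rng.random((size, size)) < p0).astype(np.uint8)
        for _ in range(iters):
            grid = step(grid)
        densities.append(grid.mean())
    return float(np.mean(densities))

p0_values = np.linspace(0.0, 1.0, 21)
curve = [response_point(p0) for p0 in p0_values]   # experimental response curve
for p0, p_n in zip(p0_values, curve):
    print(f"{p0:.2f} -> {p_n:.3f}")
```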
Abstract:
Imputation is often used in surveys to treat item nonresponse. It is well known that treating imputed values as if they were observed values leads to substantial underestimation of the variance of point estimators. To remedy this problem, several variance estimation methods have been proposed in the literature, including adapted resampling methods such as the Bootstrap and the Jackknife. We define the concept of double robustness for point and variance estimation under both the nonresponse-model and the imputation-model approaches. We focus on Jackknife variance estimation, which is frequently used in practice. We study the properties of several Jackknife variance estimators under deterministic and random regression imputation. We first consider the case of simple random sampling; the cases of stratified sampling and sampling with unequal probabilities are also studied. A simulation study compares several Jackknife variance estimation methods in terms of bias and relative stability when the sampling fraction is not negligible. Finally, we establish the asymptotic normality of imputed estimators under deterministic and random regression imputation.
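A minimal sketch of delete-one Jackknife variance estimation under deterministic regression imputation, with re-imputation inside each replicate; the data are simulated under simple random sampling, and the exact adjusted-Jackknife estimators studied in the thesis are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 10, n)                       # auxiliary variable, fully observed
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, n)       # study variable
respond = rng.random(n) < 0.7                   # ~30% item nonresponse
y_obs = np.where(respond, y, np.nan)

def reg_impute_mean(x, y_obs, respond):
    """Deterministic regression imputation: replace each missing y by its
    fitted value from the respondent regression, then take the mean."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(X[respond], y_obs[respond], rcond=None)[0]
    y_comp = np.where(respond, y_obs, X @ beta)
    return y_comp.mean()

theta_hat = reg_impute_mean(x, y_obs, respond)

# Delete-one Jackknife with re-imputation: the regression is refitted and the
# missing values re-imputed within every replicate, so the imputation step's
# contribution to the variance is propagated.
theta_j = np.array([
    reg_impute_mean(np.delete(x, j), np.delete(y_obs, j), np.delete(respond, j))
    for j in range(n)
])
v_jack = (n - 1) / n * np.sum((theta_j - theta_j.mean()) ** 2)
print(f"point estimate = {theta_hat:.3f}, jackknife variance = {v_jack:.5f}")
```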
Abstract:
The software packages used are Splus and R.