993 resultados para compositional models
Resumo:
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural- language text. Our approach treats unknown regression functions non- parametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g. smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state- of-the-art extrapolation performance evaluated over 13 real time series data sets from various domains.
Resumo:
J. Keppens and Q. Shen. Granularity and disaggregation in compositional modelling with applications to ecological systems. Applied Intelligence, 25(3):269-292, 2006.
Resumo:
J. Keppens and Q. Shen. Compositional model repositories via dynamic constraint satisfaction with order-of-magnitude preferences. Journal of Artificial Intelligence Research, 21:499-550, 2004.
Resumo:
J. Keppens and Q. Shen. Causality enabled compositional modelling of Bayesian networks. Proceedings of the 18th International Workshop on Qualitative Reasoning, pages 33-40, 2004.
Resumo:
In research areas involving mathematical rigor, there are numerous benefits to adopting a formal representation of models and arguments: reusability, automatic evaluation of examples, and verification of consistency and correctness. However, accessibility has not been a priority in the design of formal verification tools that can provide these benefits. In earlier work [30] we attempt to address this broad problem by proposing several specific design criteria organized around the notion of a natural context: the sphere of awareness a working human user maintains of the relevant constructs, arguments, experiences, and background materials necessary to accomplish the task at hand. In this report we evaluate our proposed design criteria by utilizing within the context of novel research a formal reasoning system that is designed according to these criteria. In particular, we consider how the design and capabilities of the formal reasoning system that we employ influence, aid, or hinder our ability to accomplish a formal reasoning task – the assembly of a machine-verifiable proof pertaining to the NetSketch formalism. NetSketch is a tool for the specification of constrained-flow applications and the certification of desirable safety properties imposed thereon. NetSketch is conceived to assist system integrators in two types of activities: modeling and design. It provides capabilities for compositional analysis based on a strongly-typed domain-specific language (DSL) for describing and reasoning about constrained-flow networks and invariants that need to be enforced thereupon. In a companion paper [13] we overview NetSketch, highlight its salient features, and illustrate how it could be used in actual applications. In this paper, we define using a machine-readable syntax major parts of the formal system underlying the operation of NetSketch, along with its semantics and a corresponding notion of validity. We then provide a proof of soundness for the formalism that can be partially verified using a lightweight formal reasoning system that simulates natural contexts. A traditional presentation of these definitions and arguments can be found in the full report on the NetSketch formalism [12].
Resumo:
Community structure depends on both deterministic and stochastic processes. However, patterns of community dissimilarity (e.g. difference in species composition) are difficult to interpret in terms of the relative roles of these processes. Local communities can be more dissimilar (divergence) than, less dissimilar (convergence) than, or as dissimilar as a hypothetical control based on either null or neutral models. However, several mechanisms may result in the same pattern, or act concurrently to generate a pattern, and much research has recently been focusing on unravelling these mechanisms and their relative contributions. Using a simulation approach, we addressed the effect of a complex but realistic spatial structure in the distribution of the niche axis and we analysed patterns of species co-occurrence and beta diversity as measured by dissimilarity indices (e.g. Jaccard index) using either expectations under a null model or neutral dynamics (i.e., based on switching off the niche effect). The strength of niche processes, dispersal, and environmental noise strongly interacted so that niche-driven dynamics may result in local communities that either diverge or converge depending on the combination of these factors. Thus, a fundamental result is that, in real systems, interacting processes of community assembly can be disentangled only by measuring traits such as niche breadth and dispersal. The ability to detect the signal of the niche was also dependent on the spatial resolution of the sampling strategy, which must account for the multiple scale spatial patterns in the niche axis. Notably, some of the patterns we observed correspond to patterns of community dissimilarities previously observed in the field and suggest mechanistic explanations for them or the data required to solve them. Our framework offers a synthesis of the patterns of community dissimilarity produced by the interaction of deterministic and stochastic determinants of community assembly in a spatially explicit and complex context.
Resumo:
In spite of the controversy that they have generated, neutral models provide ecologists with powerful tools for creating dynamic predictions about beta-diversity in ecological communities. Ecologists can achieve an understanding of the assembly rules operating in nature by noting when and how these predictions are met or not met. This is particularly valuable for those groups of organisms that are challenging to study under natural conditions (e.g., bacteria and fungi). Here, we focused on arbuscular mycorrhizal fungal (AMF) communities and performed an extensive literature search that allowed us to synthesize the information in 19 data sets with the minimal requisites for creating a null hypothesis in terms of community dissimilarity expected under neutral dynamics. In order to achieve this task, we calculated the first estimates of neutral parameters for several AMF communities from different ecosystems. Communities were shown either to be consistent with neutrality or to diverge or converge with respect to the levels of compositional dissimilarity expected under neutrality. These data support the hypothesis that divergence occurs in systems where the effect of limited dispersal is overwhelmed by anthropogenic disturbance or extreme biological and environmental heterogeneity, whereas communities converge when systems have the potential for niche divergence within a relatively homogeneous set of environmental conditions. Regarding the study cases that were consistent with neutrality, the sampling designs employed may have covered relatively homogeneous environments in which the effects of dispersal limitation overwhelmed minor differences among AMF taxa that would lead to environmental filtering. Using neutral models we showed for the first time for a soil microbial group the conditions under which different assembly processes may determine different patterns of beta-diversity. Our synthesis is an important step showing how the application of general ecological theories to a model microbial taxon has the potential to shed light on the assembly and ecological dynamics of communities.
Resumo:
Vector Space Models (VSMs) of Semantics are useful tools for exploring the semantics of single words, and the composition of words to make phrasal meaning. While many methods can estimate the meaning (i.e. vector) of a phrase, few do so in an interpretable way. We introduce a new method (CNNSE) that allows word and phrase vectors to adapt to the notion of composition. Our method learns a VSM that is both tailored to support a chosen semantic composition operation, and whose resulting features have an intuitive interpretation. Interpretability allows for the exploration of phrasal semantics, which we leverage to analyze performance on a behavioral task.
Resumo:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies, ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. From consideration of such examples it seems sensible to build up a model in two stages, the first determining where the zeros will occur and the second how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data
Resumo:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small. While this analysis is based on around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)
Resumo:
First discussion on compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, scientific literature is again full of statistical treatments of this type of data by using traditional methodologies. This is particularly true in environmental geochemistry where besides the problem of the closure, the spatial structure (dependence) of the data have to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as LQGLFDWRUV of given environmental conditions. The investigation of the log-constrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. Case study on running and ground waters from Chiavenna Valley (Northern Italy) by using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations will be illustrated
Resumo:
A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instance in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints which are not allowed for by the usual statistical techniques available for analysing multivariate time series. One general approach to analyzing compositional time series consists in the application of an initial transform to break the positive and unit sum constraints, followed by the analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts
Resumo:
A joint distribution of two discrete random variables with finite support can be displayed as a two way table of probabilities adding to one. Assume that this table has n rows and m columns and all probabilities are non-null. This kind of table can be seen as an element in the simplex of n · m parts. In this context, the marginals are identified as compositional amalgams, conditionals (rows or columns) as subcompositions. Also, simplicial perturbation appears as Bayes theorem. However, the Euclidean elements of the Aitchison geometry of the simplex can also be translated into the table of probabilities: subspaces, orthogonal projections, distances. Two important questions are addressed: a) given a table of probabilities, which is the nearest independent table to the initial one? b) which is the largest orthogonal projection of a row onto a column? or, equivalently, which is the information in a row explained by a column, thus explaining the interaction? To answer these questions three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a columnwise geometric marginal, (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row and column)-wise geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models. Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table
Resumo:
The composition of the labour force is an important economic factor for a country. Often the changes in proportions of different groups are of interest. I this paper we study a monthly compositional time series from the Swedish Labour Force Survey from 1994 to 2005. Three models are studied: the ILR-transformed series, the ILR-transformation of the compositional differenced series of order 1, and the ILRtransformation of the compositional differenced series of order 12. For each of the three models a VAR-model is fitted based on the data 1994-2003. We predict the time series 15 steps ahead and calculate 95 % prediction regions. The predictions of the three models are compared with actual values using MAD and MSE and the prediction regions are compared graphically in a ternary time series plot. We conclude that the first, and simplest, model possesses the best predictive power of the three models
Resumo:
Evolution of compositions in time, space, temperature or other covariates is frequent in practice. For instance, the radioactive decomposition of a sample changes its composition with time. Some of the involved isotopes decompose into other isotopes of the sample, thus producing a transfer of mass from some components to other ones, but preserving the total mass present in the system. This evolution is traditionally modelled as a system of ordinary di erential equations of the mass of each component. However, this kind of evolution can be decomposed into a compositional change, expressed in terms of simplicial derivatives, and a mass evolution (constant in this example). A rst result is that the simplicial system of di erential equations is non-linear, despite of some subcompositions behaving linearly. The goal is to study the characteristics of such simplicial systems of di erential equa- tions such as linearity and stability. This is performed extracting the compositional dif ferential equations from the mass equations. Then, simplicial derivatives are expressed in coordinates of the simplex, thus reducing the problem to the standard theory of systems of di erential equations, including stability. The characterisation of stability of these non-linear systems relays on the linearisation of the system of di erential equations at the stationary point, if any. The eigenvelues of the linearised matrix and the associated behaviour of the orbits are the main tools. For a three component system, these orbits can be plotted both in coordinates of the simplex or in a ternary diagram. A characterisation of processes with transfer of mass in closed systems in terms of stability is thus concluded. Two examples are presented for illustration, one of them is a radioactive decay