856 resultados para C33 - Models with Panel Data
Resumo:
This work belongs to the field of computational high-energy physics (HEP). The key methods used in this thesis work to meet the challenges raised by the Large Hadron Collider (LHC) era experiments are object-orientation with software engineering, Monte Carlo simulation, the computer technology of clusters, and artificial neural networks. The first aspect discussed is the development of hadronic cascade models, used for the accurate simulation of medium-energy hadron-nucleus reactions, up to 10 GeV. These models are typically needed in hadronic calorimeter studies and in the estimation of radiation backgrounds. Various applications outside HEP include the medical field (such as hadron treatment simulations), space science (satellite shielding), and nuclear physics (spallation studies). Validation results are presented for several significant improvements released in Geant4 simulation tool, and the significance of the new models for computing in the Large Hadron Collider era is estimated. In particular, we estimate the ability of the Bertini cascade to simulate Compact Muon Solenoid (CMS) hadron calorimeter HCAL. LHC test beam activity has a tightly coupled cycle of simulation-to-data analysis. Typically, a Geant4 computer experiment is used to understand test beam measurements. Thus an another aspect of this thesis is a description of studies related to developing new CMS H2 test beam data analysis tools and performing data analysis on the basis of CMS Monte Carlo events. These events have been simulated in detail using Geant4 physics models, full CMS detector description, and event reconstruction. Using the ROOT data analysis framework we have developed an offline ANN-based approach to tag b-jets associated with heavy neutral Higgs particles, and we show that this kind of NN methodology can be successfully used to separate the Higgs signal from the background in the CMS experiment.
Resumo:
1. Resilience-based approaches are increasingly being called upon to inform ecosystem management, particularly in arid and semi-arid regions. This requires management frameworks that can assess ecosystem dynamics, both within and between alternative states, at relevant time scales. 2. We analysed long-term vegetation records from two representative sites in the North American sagebrush-steppe ecosystem, spanning nine decades, to determine if empirical patterns were consistent with resilience theory, and to determine if cheatgrass Bromus tectorum invasion led to thresholds as currently envisioned by expert-based state-and-transition models (STM). These data span the entire history of cheatgrass invasion at these sites and provide a unique opportunity to assess the impacts of biotic invasion on ecosystem resilience. 3. We used univariate and multivariate statistical tools to identify unique plant communities and document the magnitude, frequency and directionality of community transitions through time. Community transitions were characterized by 37-47% dissimilarity in species composition, they were not evenly distributed through time, their frequency was not correlated with precipitation, and they could not be readily attributed to fire or grazing. Instead, at both sites, the majority of community transitions occurred within an 8-10year period of increasing cheatgrass density, became infrequent after cheatgrass density peaked, and thereafter transition frequency declined. 4. Greater cheatgrass density, replacement of native species and indication of asymmetry in community transitions suggest that thresholds may have been exceeded in response to cheatgrass invasion at one site (more arid), but not at the other site (less arid). Asymmetry in the direction of community transitions also identified communities that were at-risk' of cheatgrass invasion, as well as potential restoration pathways for recovery of pre-invasion states. 5. Synthesis and applications. These results illustrate the complexities associated with threshold identification, and indicate that criteria describing the frequency, magnitude, directionality and temporal scale of community transitions may provide greater insight into resilience theory and its application for ecosystem management. These criteria are likely to vary across biogeographic regions that are susceptible to cheatgrass invasion, and necessitate more in-depth assessments of thresholds and alternative states, than currently available.
Resumo:
Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures.
Resumo:
The stress release model, a stochastic version of the elastic rebound theory, is applied to the large events from four synthetic earthquake catalogs generated by models with various levels of disorder in distribution of fault zone strength (Ben-Zion, 1996) They include models with uniform properties (U), a Parkfield-type asperity (A), fractal brittle properties (F), and multi-size-scale heterogeneities (M). The results show that the degree of regularity or predictability in the assumed fault properties, based on both the Akaike information criterion and simulations, follows the order U, F, A, and M, which is in good agreement with that obtained by pattern recognition techniques applied to the full set of synthetic data. Data simulated from the best fitting stress release models reproduce, both visually and in distributional terms, the main features of the original catalogs. The differences in character and the quality of prediction between the four cases are shown to be dependent on two main aspects: the parameter controlling the sensitivity to departures from the mean stress level and the frequency-magnitude distribution, which differs substantially between the four cases. In particular, it is shown that the predictability of the data is strongly affected by the form of frequency-magnitude distribution, being greatly reduced if a pure Gutenburg-Richter form is assumed to hold out to high magnitudes.
Resumo:
The learning of probability distributions from data is a ubiquitous problem in the fields of Statistics and Artificial Intelligence. During the last decades several learning algorithms have been proposed to learn probability distributions based on decomposable models due to their advantageous theoretical properties. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, k, which controls the complexity of the model. Unfortunately, the problem of learning a maximum likelihood decomposable model given a maximum clique size is NP-hard for k > 2. In this work, we propose a family of algorithms which approximates this problem with a computational complexity of O(k · n^2 log n) in the worst case, where n is the number of implied random variables. The structures of the decomposable models that solve the maximum likelihood problem are called maximal k-order decomposable graphs. Our proposals, called fractal trees, construct a sequence of maximal i-order decomposable graphs, for i = 2, ..., k, in k − 1 steps. At each step, the algorithms follow a divide-and-conquer strategy based on the particular features of this type of structures. Additionally, we propose a prune-and-graft procedure which transforms a maximal k-order decomposable graph into another one, increasing its likelihood. We have implemented two particular fractal tree algorithms called parallel fractal tree and sequential fractal tree. These algorithms can be considered a natural extension of Chow and Liu’s algorithm, from k = 2 to arbitrary values of k. Both algorithms have been compared against other efficient approaches in artificial and real domains, and they have shown a competitive behavior to deal with the maximum likelihood problem. Due to their low computational complexity they are especially recommended to deal with high dimensional domains.
Resumo:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.
Resumo:
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed where observed data $\mathbf{Y}$ is modeled as a linear superposition, $\mathbf{G}$, of a potentially infinite number of hidden factors, $\mathbf{X}$. The Indian Buffet Process (IBP) is used as a prior on $\mathbf{G}$ to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated data sets based on a known sparse connectivity matrix for E. Coli, and on three biological data sets of increasing complexity.
Resumo:
Vibration and acoustic analysis at higher frequencies faces two challenges: computing the response without using an excessive number of degrees of freedom, and quantifying its uncertainty due to small spatial variations in geometry, material properties and boundary conditions. Efficient models make use of the observation that when the response of a decoupled vibro-acoustic subsystem is sufficiently sensitive to uncertainty in such spatial variations, the local statistics of its natural frequencies and mode shapes saturate to universal probability distributions. This holds irrespective of the causes that underly these spatial variations and thus leads to a nonparametric description of uncertainty. This work deals with the identification of uncertain parameters in such models by using experimental data. One of the difficulties is that both experimental errors and modeling errors, due to the nonparametric uncertainty that is inherent to the model type, are present. This is tackled by employing a Bayesian inference strategy. The prior probability distribution of the uncertain parameters is constructed using the maximum entropy principle. The likelihood function that is subsequently computed takes the experimental information, the experimental errors and the modeling errors into account. The posterior probability distribution, which is computed with the Markov Chain Monte Carlo method, provides a full uncertainty quantification of the identified parameters, and indicates how well their uncertainty is reduced, with respect to the prior information, by the experimental data. © 2013 Taylor & Francis Group, London.
Resumo:
The accuracy of two satellite models of marine primary (PP) and new production (NP) were assessed against 14C and 15N uptake measurements taken during six research cruises in the northern North Atlantic. The wavelength resolving model (WRM) was more accurate than the Vertical General Production Model (VGPM) for computation of both PP and NP. Mean monthly satellite maps of PP and NP for both models were generated from 1997 to 2010 using SeaWiFS data for the Irminger basin and North Atlantic. Intra- and inter-annual variability of the two models was compared in six hydrographic zones. Both models exhibited similar spatio-temporal patterns: PP and NP increased from April to June and decreased by August. Higher values were associated with the East Greenland Current (EGC), Iceland Basin (ICB) and the Reykjanes Ridge (RKR) and lower values occurred in the Central Irminger Current (CIC), North Irminger Current (NIC) and Southern Irminger Current (SIC). The annual PP and NP over the SeaWiFS record was 258 and 82 gC m-2 yr-1 respectively for the VGPM and 190 and 41 gC m-2 yr-1 for the WRM. Average annual cumulative sum in the anomalies of NP for the VGPM were positively correlated with the North Atlantic Oscillation (NAO) in the EGC, CIC and SIC and negatively correlated with the multivariate ENSO index (MEI) in the ICB. By contrast, cumulative sum of the anomalies of NP for the WRM were significantly correlated with NAO only in the EGC and CIC. NP from both VGPM and WRM exhibited significant negative correlations with Arctic Oscillation (AO) in all hydrographic zones. The differences in estimates of PP and NP in these hydrographic zones arise principally from the parameterisation of the euphotic depth and the SST dependence of photo-physiological term in the VGPM, which has a greater sensitivity to variations in temperature than the WRM. In waters of 0 to 5C PP using the VGPM was 43% higher than WRM, from 5 to 10C the VGPM was 29% higher and from 10 to 15C the VGPM was 27% higher.
Resumo:
Ecosystems consist of complex dynamic interactions among species and the environment, the understanding of which has implications for predicting the environmental response to changes in climate and biodiversity. However, with the recent adoption of more explorative tools, like Bayesian networks, in predictive ecology, few assumptions can be made about the data and complex, spatially varying interactions can be recovered from collected field data. In this study, we compare Bayesian network modelling approaches accounting for latent effects to reveal species dynamics for 7 geographically and temporally varied areas within the North Sea. We also apply structure learning techniques to identify functional relationships such as prey–predator between trophic groups of species that vary across space and time. We examine if the use of a general hidden variable can reflect overall changes in the trophic dynamics of each spatial system and whether the inclusion of a specific hidden variable can model unmeasured group of species. The general hidden variable appears to capture changes in the variance of different groups of species biomass. Models that include both general and specific hidden variables resulted in identifying similarity with the underlying food web dynamics and modelling spatial unmeasured effect. We predict the biomass of the trophic groups and find that predictive accuracy varies with the models' features and across the different spatial areas thus proposing a model that allows for spatial autocorrelation and two hidden variables. Our proposed model was able to produce novel insights on this ecosystem's dynamics and ecological interactions mainly because we account for the heterogeneous nature of the driving factors within each area and their changes over time. Our findings demonstrate that accounting for additional sources of variation, by combining structure learning from data and experts' knowledge in the model architecture, has the potential for gaining deeper insights into the structure and stability of ecosystems. Finally, we were able to discover meaningful functional networks that were spatially and temporally differentiated with the particular mechanisms varying from trophic associations through interactions with climate and commercial fisheries.
Willingness to Pay for Rural Landscape Improvements: Combining Mixed Logit and Random-Effects Models
Resumo:
This paper reports the findings from a discrete-choice experiment designed to estimate the economic benefits associated with rural landscape improvements in Ireland. Using a mixed logit model, the panel nature of the dataset is exploited to retrieve willingness-to-pay values for every individual in the sample. This departs from customary approaches in which the willingness-to-pay estimates are normally expressed as measures of central tendency of an a priori distribution. Random-effects models for panel data are subsequently used to identify the determinants of the individual-specific willingness-to-pay estimates. In comparison with the standard methods used to incorporate individual-specific variables into the analysis of discrete-choice experiments, the analytical approach outlined in this paper is shown to add considerable explanatory power to the welfare estimates.
Resumo:
We present results for a suite of 14 three-dimensional, high-resolution hydrodynamical simulations of delayed-detonation models of Type Ia supernova (SN Ia) explosions. This model suite comprises the first set of three-dimensional SN Ia simulations with detailed isotopic yield information. As such, it may serve as a data base for Chandrasekhar-mass delayed-detonation model nucleosynthetic yields and for deriving synthetic observables such as spectra and light curves. We employ aphysically motivated, stochastic model based on turbulent velocity fluctuations and fuel density to calculate in situ the deflagration-to-detonation transition probabilities. To obtain different strengths of the deflagration phase and thereby different degrees of pre-expansion, we have chosen a sequence of initial models with 1, 3, 5, 10, 20, 40, 100, 150, 200, 300 and 1600 (two different realizations) ignition kernels in a hydrostatic white dwarf with a central density of 2.9 × 10 g cm, as well as one high central density (5.5 × 10 g cm) and one low central density (1.0 × 10 g cm) rendition of the 100 ignition kernel configuration. For each simulation, we determined detailed nucleosynthetic yields by postprocessing10 tracer particles with a 384 nuclide reaction network. All delayed-detonation models result in explosions unbinding thewhite dwarf, producing a range of 56Ni masses from 0.32 to 1.11M. As a general trend, the models predict that the stableneutron-rich iron-group isotopes are not found at the lowest velocities, but rather at intermediate velocities (~3000×10 000 km s) in a shell surrounding a Ni-rich core. The models further predict relatively low-velocity oxygen and carbon, with typical minimum velocities around 4000 and 10 000 km s, respectively. © 2012 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society.
Resumo:
Hidden Markov models (HMMs) are widely used models for sequential data. As with other probabilistic graphical models, they require the specification of precise probability values, which can be too restrictive for some domains, especially when data are scarce or costly to acquire. We present a generalized version of HMMs, whose quantification can be done by sets of, instead of single, probability distributions. Our models have the ability to suspend judgment when there is not enough statistical evidence, and can serve as a sensitivity analysis tool for standard non-stationary HMMs. Efficient inference algorithms are developed to address standard HMM usage such as the computation of likelihoods and most probable explanations. Experiments with real data show that the use of imprecise probabilities leads to more reliable inferences without compromising efficiency.
Resumo:
Successful innovation depends on knowledge – technological, strategic and market related. In this paper we explore the role and interaction of firms’ existing knowledge stocks and current knowledge flows in shaping innovation success. The paper contributes to our understanding of the determinants of firms’ innovation outputs and provides new information on the relationship between knowledge stocks, as measured by patents, and innovation output indicators. Our analysis uses innovation panel data relating to plants’ internal knowledge creation, external knowledge search and innovation outputs. Firm-level patent data is matched with this plant-level innovation panel data to provide a measure of firms’ knowledge stock. Two substantive conclusions follow. First, existing knowledge stocks have weak negative rather than positive impacts on firms’ innovation outputs, reflecting potential core-rigidities or negative path dependencies rather than the accumulation of competitive advantages. Second, knowledge flows derived from internal investment and external search dominate the effect of existing knowledge stocks on innovation performance. Both results emphasize the importance of firms’ knowledge search strategies. Our results also re-emphasize the potential issues which arise when using patents as a measure of innovation.
Resumo:
Hidden Markov models (HMMs) are widely used probabilistic models of sequential data. As with other probabilistic models, they require the specification of local conditional probability distributions, whose assessment can be too difficult and error-prone, especially when data are scarce or costly to acquire. The imprecise HMM (iHMM) generalizes HMMs by allowing the quantification to be done by sets of, instead of single, probability distributions. iHMMs have the ability to suspend judgment when there is not enough statistical evidence, and can serve as a sensitivity analysis tool for standard non-stationary HMMs. In this paper, we consider iHMMs under the strong independence interpretation, for which we develop efficient inference algorithms to address standard HMM usage such as the computation of likelihoods and most probable explanations, as well as performing filtering and predictive inference. Experiments with real data show that iHMMs produce more reliable inferences without compromising the computational efficiency.