948 results for inference problem
Abstract:
Differences-in-Differences (DID) is one of the most widely used identification strategies in applied economics. However, how to draw inferences in DID models when there are few treated groups remains an open question. We show that the usual inference methods used in DID models might not perform well when there are few treated groups and errors are heteroskedastic. In particular, we show that when there is variation in the number of observations per group, inference methods designed to work when there are few treated groups tend to (under-) over-reject the null hypothesis when the treated groups are (large) small relative to the control groups. This happens because larger groups tend to have lower variance, generating heteroskedasticity in the group x time aggregate DID model. We provide evidence from Monte Carlo simulations and from placebo DID regressions with the American Community Survey (ACS) and the Current Population Survey (CPS) datasets to show that this problem is relevant even in datasets with large numbers of observations per group. We then derive an alternative inference method that provides accurate hypothesis testing in situations where there are few treated groups (or even just one) and many control groups in the presence of heteroskedasticity. Our method assumes that we know how the heteroskedasticity is generated, which is the case when it is generated by variation in the number of observations per group. With many pre-treatment periods, we show that this assumption can be relaxed. Instead, we provide an alternative application of our method that relies on assumptions about stationarity and convergence of the moments of the time series. Finally, we consider two recent alternatives to DID when there are many pre-treatment groups. We extend our inference method to linear factor models when there are few treated groups. 
We also propose a permutation test for the synthetic control estimator that provided a better heteroskedasticity correction in our simulations than the test suggested by Abadie et al. (2010).
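The source of the heteroskedasticity described above can be illustrated with a small simulation. This is a minimal sketch, assuming individual-level errors are i.i.d. normal within each group (an illustrative assumption, not the inference method proposed in the paper); the function name `group_mean_variance` and all parameter values are hypothetical. Under that assumption, the variance of a group x time average over M observations is sigma^2 / M, so smaller groups produce noisier aggregates.

```python
import random
import statistics

def group_mean_variance(group_size, n_draws=4000, sigma=1.0, seed=0):
    """Monte Carlo variance of the mean of `group_size` i.i.d. N(0, sigma^2) errors."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        errors = [rng.gauss(0.0, sigma) for _ in range(group_size)]
        means.append(statistics.fmean(errors))
    return statistics.pvariance(means)

# Hypothetical group sizes: a small treated group and a large control group.
small_group_var = group_mean_variance(group_size=10)
large_group_var = group_mean_variance(group_size=100)
print(small_group_var, large_group_var)
```

A test that treats the two aggregates as equally variable would be miscalibrated in exactly the direction the abstract describes: too many rejections when the treated group is the small one.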
Abstract:
This paper describes a novel approach for mapping lightning processes using fuzzy logic. The core task is to identify the uncertain information involved in the lightning process and to model it on mathematical principles. The lightning process involves several nonlinear features that current mathematical tools cannot readily model. The estimation was carried out using a fuzzy system based on Sugeno's architecture. Simulation results confirm that the proposed approach can be used efficiently for this type of problem.
Abstract:
This paper presents a new methodology for the adjustment of fuzzy inference systems. A novel approach, which uses unconstrained optimization techniques, is developed to adjust the free parameters of the fuzzy inference system, such as the intrinsic parameters of the membership functions and the weights of the inference rules. This methodology is interesting not only for the results obtained through computer simulations, but also for its generality with respect to the kind of fuzzy inference system used. The methodology therefore extends both to the Mamdani architecture and to that suggested by Takagi-Sugeno. The proposed methodology is validated through time series estimation, specifically the estimation of the Mackey-Glass chaotic time series.
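The free parameters mentioned above can be made concrete with a small example. This is a minimal sketch of zero-order Sugeno inference with Gaussian membership functions, assuming one input and constant rule consequents; the function names and the two example rules are hypothetical and are not taken from the paper. The centers, widths, rule weights, and consequents are the quantities an optimizer would adjust.

```python
import math

def gaussian_membership(x, center, width):
    """Degree to which x belongs to a fuzzy set with a Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def sugeno_output(x, rules):
    """Zero-order Sugeno inference: weighted average of constant rule consequents.

    Each rule is a tuple (center, width, rule_weight, consequent).
    """
    activations = [
        rule_weight * gaussian_membership(x, center, width)
        for center, width, rule_weight, _ in rules
    ]
    total = sum(activations)
    if total == 0.0:
        raise ValueError("no rule is active for this input")
    return sum(a * rule[3] for a, rule in zip(activations, rules)) / total

# Two hypothetical rules: "x is near 0 -> output 0" and "x is near 1 -> output 10".
rules = [(0.0, 0.5, 1.0, 0.0), (1.0, 0.5, 1.0, 10.0)]
midpoint_output = sugeno_output(0.5, rules)
```

Because the output is a smooth function of every parameter, gradient-based or other unconstrained optimizers can be applied to it directly.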
Abstract:
This paper presents a new methodology for the adjustment of fuzzy inference systems, which uses a technique based on the error back-propagation method. The free parameters of the fuzzy inference system, such as the intrinsic parameters of the membership functions and the weights of the inference rules, are automatically adjusted. This methodology is interesting not only for the results obtained through computer simulations, but also for its generality with respect to the kind of fuzzy inference system used. The methodology therefore extends both to the Mamdani architecture and to that suggested by Takagi-Sugeno. The proposed methodology is validated through time series estimation and a mathematical modeling problem. More specifically, the Mackey-Glass chaotic time series is used for the validation. © Springer-Verlag Berlin Heidelberg 2007.
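For readers unfamiliar with the benchmark, the Mackey-Glass series is generated by the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t). The following is a minimal sketch using a unit-step Euler discretization; the parameter values (tau = 17, beta = 0.2, gamma = 0.1, n = 10) are the ones commonly used in the literature and are an assumption here, since the abstract does not state them.

```python
def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2):
    """Generate n_steps values of the Mackey-Glass series with a unit Euler step."""
    # Constant initial history x(-tau), ..., x(0) = x0 (an assumption).
    history = [x0] * (tau + 1)
    for _ in range(n_steps):
        delayed = history[-(tau + 1)]  # x(t - tau)
        current = history[-1]          # x(t)
        growth = beta * delayed / (1.0 + delayed ** power)
        history.append(current + growth - gamma * current)
    return history[tau + 1:]

series = mackey_glass(500)
```

A typical estimation task is to predict a future value of the series from a few lagged values, which gives the input-output pairs used to adjust the fuzzy system.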
Abstract:
Neural networks are dynamic systems consisting of highly interconnected and parallel nonlinear processing elements that are shown to be extremely effective in computation. This paper presents an architecture of recurrent neural networks for solving the N-Queens problem. More specifically, a modified Hopfield network is developed and its internal parameters are explicitly computed using the valid-subspace technique. These parameters guarantee the convergence of the network to the equilibrium points, which represent a solution of the considered problem. The network is shown to be completely stable and globally convergent to the solutions of the N-Queens problem. A fuzzy logic controller is also incorporated in the network to minimize convergence time. Simulation results are presented to validate the proposed approach.
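The idea that equilibrium points correspond to solutions can be illustrated with a simple penalty function over binary neuron states. This is a minimal sketch, assuming an N x N grid of 0/1 states in which 1 marks a queen; the function `nqueens_energy` is hypothetical and is not the energy function derived with the valid-subspace technique in the paper. It is zero exactly when N queens are placed with no two attacking each other.

```python
def nqueens_energy(board):
    """Count attacking queen pairs plus a penalty for not having exactly N queens."""
    n = len(board)
    queens = [(r, c) for r in range(n) for c in range(n) if board[r][c] == 1]
    conflicts = 0
    for i in range(len(queens)):
        for j in range(i + 1, len(queens)):
            (r1, c1), (r2, c2) = queens[i], queens[j]
            same_line = r1 == r2 or c1 == c2
            same_diagonal = abs(r1 - r2) == abs(c1 - c2)
            if same_line or same_diagonal:
                conflicts += 1
    return conflicts + (len(queens) - n) ** 2

# A valid 4-queens placement and an invalid one (all queens on the main diagonal).
solution = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
diagonal = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
```

A network whose dynamics decrease such a function can only settle at states where it cannot be reduced further, which is why the choice of parameters that rule out spurious equilibria matters.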
Abstract:
An important goal of Zebu breeding programs is to improve reproductive performance. A major problem in the genetic improvement of reproductive traits is that recording the time for an animal to reach sexual maturity is costly. Another issue is that accurate estimates of breeding values are obtained only a long time after the young bulls have gone through selection. An alternative to overcome these problems is to use traits that are indicators of the reproductive efficiency of the herd and are easier to measure, such as age at first calving. Another problem is that heifers that have conceived once may fail to conceive in the next breeding season, which increases production costs. Thus, increasing heifer rebreeding rates should improve the economic efficiency of the herd. Response to selection for these traits tends to be slow, since they have a low heritability and phenotypic information is provided only later in the life of the animal. Genome-wide association studies (GWAS) are useful to investigate the genetic mechanisms that underlie these traits by identifying the genes and metabolic pathways involved. Data from 1853 females belonging to the Agricultural Jacarezinho LTDA were used. Genotyping was performed using the BovineHD BeadChip (777 962 single nucleotide polymorphisms (SNPs)) according to the protocol of Illumina - Infinium Assay II® Multi-Sample HiScan with the unit SQ™ System. After quality control, 305 348 SNPs were used for GWAS. Forty-two and 19 SNPs had a Bayes factor greater than 150 for heifer rebreeding and age at first calving, respectively. All significant SNPs for age at first calving were significant for heifer rebreeding. These 42 SNPs were next to or within 35 genes that were distributed over 18 chromosomes and comprised 27 protein-encoding genes, six pseudogenes and two miscellaneous noncoding RNAs.
The use of the Bayes factor to determine the significance of SNPs allowed us to identify two sets of 42 and 19 significant SNPs for heifer rebreeding and age at first calving, which explain 11.35 % and 6.42 % of their phenotypic variance, respectively. These SNPs provide relevant information to help elucidate which genes affect these traits.
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of the network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how several components of this network interact with each other and keep their functions stable. However, in general there are not sufficient data to accurately recover the GNs from their expression levels, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several kinds of biological information in inference methods has increased significantly in recent years, in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming the known connections through the included biological data, and the possibility of discovering new relationships between genes when the expression data are observed. Although several works in data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain from using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs for a eukaryotic organism (P. falciparum).
Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about a less specific relationship such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.
Abstract:
This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time. A social network is conceived as being a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation. The first paper deals with a family of statistical models for social networks called exponential random graphs that takes various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings up-dating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step. In paper number two a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions. 
An intrinsic problem is that the model is not identified, meaning that there are combinations of values of the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions to achieve identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications. Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood-based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
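The Metropolis-Hastings updating step referred to in this abstract can be sketched generically. This is a minimal random-walk sampler for a one-dimensional target known only up to a constant of proportionality, assuming a symmetric Gaussian proposal; the function names and the standard normal target are hypothetical choices for illustration. It does not implement the importance sampling scheme the thesis uses to compute the acceptance probability for exponential random graphs.

```python
import math
import random

def metropolis_hastings(log_unnormalised_density, n_samples, step=2.0, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    current = start
    current_logp = log_unnormalised_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_unnormalised_density(proposal)
        # The proposal is symmetric, so only the ratio of target densities matters
        # and the unknown normalising constant cancels.
        log_accept = proposal_logp - current_logp
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples

# Hypothetical target: a standard normal, specified without its normalising constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

In the exponential random graph setting the difficulty is that the normalising constant depends on the parameters being sampled, so it no longer cancels in the ratio; that is the quantity the importance sampling step approximates.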
Abstract:
In this treatise we consider finite systems of branching particles where the particles move independently of each other according to d-dimensional diffusions. Particles are killed at a position-dependent rate, leaving at their death position a random number of descendants according to a position-dependent reproduction law. In addition, particles immigrate at a constant rate (one immigrant per immigration time). A process with the above properties is called a branching diffusion with immigration (BDI). In the first part we present the model in detail and discuss the properties of the BDI under our basic assumptions. In the second part we consider the problem of reconstructing the trajectory of a BDI from discrete observations. We observe positions of the particles at discrete times; in particular we assume that we have no information about the pedigree of the particles. A natural question arises if we want to apply statistical procedures to the discrete observations: how can we find couples of particle positions which belong to the same particle? We give an easy-to-implement 'reconstruction scheme' which allows us to redraw or 'reconstruct' parts of the trajectory of the BDI with high accuracy. Moreover, asymptotically the whole path can be reconstructed. Further, we present simulations which show that our partial reconstruction rule is tractable in practice. In the third part we study how the partial reconstruction rule fits into statistical applications. As an extensive example we present a nonparametric estimator for the diffusion coefficient of a BDI where the particles move according to one-dimensional diffusions. This estimator is based on the Nadaraya-Watson estimator for the diffusion coefficient of one-dimensional diffusions and it uses the partial reconstruction rule developed in the second part above.
We are able to prove a rate of convergence of this estimator and finally we present simulations which show that the estimator works well even if we leave our set of assumptions.
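The building block named above, a Nadaraya-Watson estimator for the diffusion coefficient of a one-dimensional diffusion, can be sketched for a single observed path. This is a minimal sketch, assuming a Gaussian kernel and equally spaced observations; it estimates the squared diffusion coefficient as a kernel-weighted average of squared increments divided by the time step. The Ornstein-Uhlenbeck test path, the function names, and all parameter values are hypothetical, and the branching, immigration, and reconstruction steps of the treatise are not modelled.

```python
import math
import random

def simulate_ou_path(n_steps, dt, sigma, seed=0):
    """Euler scheme for dX = -X dt + sigma dW, started at 0 (illustrative test data)."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        x = path[-1]
        path.append(x - x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

def nw_squared_diffusion(path, dt, x, bandwidth):
    """Nadaraya-Watson estimate of sigma^2 at position x from one discretely observed path."""
    numerator = 0.0
    denominator = 0.0
    for current, following in zip(path, path[1:]):
        weight = math.exp(-0.5 * ((current - x) / bandwidth) ** 2)
        numerator += weight * (following - current) ** 2
        denominator += weight
    return numerator / (dt * denominator)

path = simulate_ou_path(n_steps=20000, dt=0.01, sigma=0.5)
estimate = nw_squared_diffusion(path, dt=0.01, x=0.0, bandwidth=0.2)
```

The estimator needs consecutive positions of the same particle, which is why the reconstruction of particle pairs from unlabelled observations is a prerequisite in the branching setting.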
Abstract:
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
Abstract:
The compelling quality of the Global Change simulation study (Altemeyer, 2003), in which high RWA (right-wing authoritarianism)/high SDO (social dominance orientation) individuals produced poor outcomes for the planet, rests on the inference that the link between high RWA/SDO scores and disaster in the simulation can be generalized to real environmental and social situations. However, we argue that studies of the Person × Situation interaction are biased to overestimate the role of the individual variability. When variables are operationalized, strongly normative items are excluded because they are skewed and kurtotic. This occurs both in the measurement of predictor constructs, such as RWA, and in the outcome constructs, such as prejudice and war. Analyses of normal linear statistics highlight personality variables such as RWA, which produce variance, and overlook the role of norms, which produce invariance. Where both normative and personality forces are operating, as in intergroup contexts, the linear analysis generates statistics for the sample that disproportionately reflect the behavior of the deviant, antinormative minority and direct attention away from the baseline, normative position. The implications of these findings for the link between high RWA and disaster are discussed.
Abstract:
An improved inference method for densely connected systems is presented. The approach is based on passing condensed messages between variables, representing macroscopic averages of microscopic messages. We extend previous work that showed promising results in cases where the solution space is contiguous to cases where fragmentation occurs. We apply the method to the signal detection problem of Code Division Multiple Access (CDMA) for demonstrating its potential. A highly efficient practical algorithm is also derived on the basis of insight gained from the analysis. © EDP Sciences.
Abstract:
This work introduces a new variational Bayes data assimilation method for the stochastic estimation of precipitation dynamics using radar observations for short-term probabilistic forecasting (nowcasting). A previously developed spatial rainfall model, based on the decomposition of the observed precipitation field using a basis function expansion, captures the precipitation intensity from radar images as a set of ‘rain cells’. The prior distributions for the basis function parameters are carefully chosen to have a conjugate structure for the precipitation field model. This allows a novel variational Bayes method to be applied to estimate the posterior distributions in closed form by solving an optimisation problem, in a spirit similar to 3D VAR analysis, but seeking approximations to the posterior distribution rather than simply the most probable state. A hierarchical Kalman filter is used to estimate the advection field based on the assimilated precipitation fields at two times. The model is applied to tracking precipitation dynamics in a realistic setting, using UK Met Office radar data from both a summer convective event and a winter frontal event. The performance of the model is assessed both traditionally and using probabilistic measures of fit based on ROC curves. The model is shown to provide very good assimilation characteristics and promising forecast skill. Improvements to the forecasting scheme are discussed.
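The Kalman filter mentioned above rests on a simple Gaussian update, which can be shown in one dimension. This is a minimal sketch, assuming a scalar state observed directly with Gaussian noise; the function name `kalman_update` and the numbers in the example are hypothetical, and the hierarchical structure used for the advection field in the paper is not reproduced.

```python
def kalman_update(prior_mean, prior_var, observation, obs_var):
    """Combine a Gaussian prior with a noisy direct observation of a scalar state."""
    gain = prior_var / (prior_var + obs_var)  # weight given to the observation
    posterior_mean = prior_mean + gain * (observation - prior_mean)
    posterior_var = (1.0 - gain) * prior_var
    return posterior_mean, posterior_var

# Equal prior and observation variances: the posterior mean lies halfway between them.
posterior_mean, posterior_var = kalman_update(
    prior_mean=0.0, prior_var=1.0, observation=2.0, obs_var=1.0
)
```

The posterior variance is always smaller than the prior variance, which is how each new radar image tightens the estimate.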
Abstract:
Inference and optimization of real-value edge variables in sparse graphs are studied using the Bethe approximation and replica method of statistical physics. Equilibrium states of general energy functions involving a large set of real edge variables that interact at the network nodes are obtained in various cases. When applied to the representative problem of network resource allocation, efficient distributed algorithms are also devised. Scaling properties with respect to the network connectivity and the resource availability are found, and links to probabilistic Bayesian approximation methods are established. Different cost measures are considered and algorithmic solutions in the various cases are devised and examined numerically. Simulation results are in full agreement with the theory. © 2007 The American Physical Society.