833 resultados para Multi-model inference
Resumo:
Complex networks can arise naturally and spontaneously from all things that act as a part of a larger system. From the patterns of socialization between people to the way biological systems organize themselves, complex networks are ubiquitous, but are currently poorly understood. A number of algorithms, designed by humans, have been proposed to describe the organizational behaviour of real-world networks. Consequently, breakthroughs in genetics, medicine, epidemiology, neuroscience, telecommunications and the social sciences have recently resulted. The algorithms, called graph models, represent significant human effort. Deriving accurate graph models is non-trivial, time-intensive, challenging and may only yield useful results for very specific phenomena. An automated approach can greatly reduce the human effort required and if effective, provide a valuable tool for understanding the large decentralized systems of interrelated things around us. To the best of the author's knowledge this thesis proposes the first method for the automatic inference of graph models for complex networks with varied properties, with and without community structure. Furthermore, to the best of the author's knowledge it is the first application of genetic programming for the automatic inference of graph models. The system and methodology was tested against benchmark data, and was shown to be capable of reproducing close approximations to well-known algorithms designed by humans. Furthermore, when used to infer a model for real biological data the resulting model was more representative than models currently used in the literature.
Object-Oriented Genetic Programming for the Automatic Inference of Graph Models for Complex Networks
Resumo:
Complex networks are systems of entities that are interconnected through meaningful relationships. The result of the relations between entities forms a structure that has a statistical complexity that is not formed by random chance. In the study of complex networks, many graph models have been proposed to model the behaviours observed. However, constructing graph models manually is tedious and problematic. Many of the models proposed in the literature have been cited as having inaccuracies with respect to the complex networks they represent. However, recently, an approach that automates the inference of graph models was proposed by Bailey [10] The proposed methodology employs genetic programming (GP) to produce graph models that approximate various properties of an exemplary graph of a targeted complex network. However, there is a great deal already known about complex networks, in general, and often specific knowledge is held about the network being modelled. The knowledge, albeit incomplete, is important in constructing a graph model. However it is difficult to incorporate such knowledge using existing GP techniques. Thus, this thesis proposes a novel GP system which can incorporate incomplete expert knowledge that assists in the evolution of a graph model. Inspired by existing graph models, an abstract graph model was developed to serve as an embryo for inferring graph models of some complex networks. The GP system and abstract model were used to reproduce well-known graph models. The results indicated that the system was able to evolve models that produced networks that had structural similarities to the networks generated by the respective target models.
Resumo:
The academic study of place has been generally defined by two distinct and highly refined discourses within outdoor recreation research: place attachment and sense of place. Place attachment generally describes the intensity of the place relationship, whereas sense of place approaches place from a more holistic and intimate orientation. This study bridges these two methodological and theoretical separate areas of place research together by re-conceptualizing the way in which place relationships are viewed within outdoor recreation research. The Psychological Continuum Model is used to extend the language of place attachment to incorporate more of the philosophy of sense of place while attending to the empirical strength and utility of place attachment. This extension results in the term place allegiance being coined to depict the strong and profound relationships outdoor recreationists build with their places of outdoor recreation. Using a concurrent mixed methods research design, this study explored place allegiance via an online survey (n = 437) and thirteen in-depth qualitative interviews with outdoor recreationists. Results indicate that place allegiance can be measured through a multi-dimensional model of place allegiance that incorporates behaviours, importance, resistance, knowledge and symbolic value. In addition, place allegiance was found to be related to an individual's influence on life course and his/her willingness to exhibit preservation and protection tendencies. Place allegiance plays an important role in acknowledging the importance of authentic place relationships in an effort to confront placelessness. Wilderness recreation is an important avenue for outdoor recreationists to build strong place relationships.
Resumo:
In the scope of the current thesis we review and analyse networks that are formed by nodes with several attributes. We suppose that different layers of communities are embedded in such networks, besides each of the layers is connected with nodes' attributes. For example, examine one of a variety of online social networks: an user participates in a plurality of different groups/communities – schoolfellows, colleagues, clients, etc. We introduce a detection algorithm for the above-mentioned communities. Normally the result of the detection is the community supplemented just by the most dominant attribute, disregarding others. We propose an algorithm that bypasses dominant communities and detects communities which are formed by other nodes' attributes. We also review formation models of the attributed networks and present a Human Communication Network (HCN) model. We introduce a High School Texting Network (HSTN) and examine our methods for that network.
Resumo:
In this paper, we develop finite-sample inference procedures for stationary and nonstationary autoregressive (AR) models. The method is based on special properties of Markov processes and a split-sample technique. The results on Markovian processes (intercalary independence and truncation) only require the existence of conditional densities. They are proved for possibly nonstationary and/or non-Gaussian multivariate Markov processes. In the context of a linear regression model with AR(1) errors, we show how these results can be used to simplify the distributional properties of the model by conditioning a subset of the data on the remaining observations. This transformation leads to a new model which has the form of a two-sided autoregression to which standard classical linear regression inference techniques can be applied. We show how to derive tests and confidence sets for the mean and/or autoregressive parameters of the model. We also develop a test on the order of an autoregression. We show that a combination of subsample-based inferences can improve the performance of the procedure. An application to U.S. domestic investment data illustrates the method.
Resumo:
We propose finite sample tests and confidence sets for models with unobserved and generated regressors as well as various models estimated by instrumental variables methods. The validity of the procedures is unaffected by the presence of identification problems or \"weak instruments\", so no detection of such problems is required. We study two distinct approaches for various models considered by Pagan (1984). The first one is an instrument substitution method which generalizes an approach proposed by Anderson and Rubin (1949) and Fuller (1987) for different (although related) problems, while the second one is based on splitting the sample. The instrument substitution method uses the instruments directly, instead of generated regressors, in order to test hypotheses about the \"structural parameters\" of interest and build confidence sets. The second approach relies on \"generated regressors\", which allows a gain in degrees of freedom, and a sample split technique. For inference about general possibly nonlinear transformations of model parameters, projection techniques are proposed. A distributional theory is obtained under the assumptions of Gaussian errors and strictly exogenous regressors. We show that the various tests and confidence sets proposed are (locally) \"asymptotically valid\" under much weaker assumptions. The properties of the tests proposed are examined in simulation experiments. In general, they outperform the usual asymptotic inference methods in terms of both reliability and power. Finally, the techniques suggested are applied to a model of Tobin’s q and to a model of academic performance.
Resumo:
It is well known that standard asymptotic theory is not valid or is extremely unreliable in models with identification problems or weak instruments [Dufour (1997, Econometrica), Staiger and Stock (1997, Econometrica), Wang and Zivot (1998, Econometrica), Stock and Wright (2000, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. One possible way out consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows one to build exact tests and confidence sets only for the full vector of the coefficients of the endogenous explanatory variables in a structural equation, which in general does not allow for individual coefficients. This problem may in principle be overcome by using projection techniques [Dufour (1997, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. AR-types are emphasized because they are robust to both weak instruments and instrument exclusion. However, these techniques can be implemented only by using costly numerical techniques. In this paper, we provide a complete analytic solution to the problem of building projection-based confidence sets from Anderson-Rubin-type confidence sets. The latter involves the geometric properties of “quadrics” and can be viewed as an extension of usual confidence intervals and ellipsoids. Only least squares techniques are required for building the confidence intervals. We also study by simulation how “conservative” projection-based confidence sets are. Finally, we illustrate the methods proposed by applying them to three different examples: the relationship between trade and growth in a cross-section of countries, returns to education, and a study of production functions in the U.S. economy.
Resumo:
We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions —a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria assessing alternative procedures.
Resumo:
We propose methods for testing hypotheses of non-causality at various horizons, as defined in Dufour and Renault (1998, Econometrica). We study in detail the case of VAR models and we propose linear methods based on running vector autoregressions at different horizons. While the hypotheses considered are nonlinear, the proposed methods only require linear regression techniques as well as standard Gaussian asymptotic distributional theory. Bootstrap procedures are also considered. For the case of integrated processes, we propose extended regression methods that avoid nonstandard asymptotics. The methods are applied to a VAR model of the U.S. economy.
Resumo:
The technique of Monte Carlo (MC) tests [Dwass (1957), Barnard (1963)] provides an attractive method of building exact tests from statistics whose finite sample distribution is intractable but can be simulated (provided it does not involve nuisance parameters). We extend this method in two ways: first, by allowing for MC tests based on exchangeable possibly discrete test statistics; second, by generalizing the method to statistics whose null distributions involve nuisance parameters (maximized MC tests, MMC). Simplified asymptotically justified versions of the MMC method are also proposed and it is shown that they provide a simple way of improving standard asymptotics and dealing with nonstandard asymptotics (e.g., unit root asymptotics). Parametric bootstrap tests may be interpreted as a simplified version of the MMC method (without the general validity properties of the latter).
Resumo:
Statistical tests in vector autoregressive (VAR) models are typically based on large-sample approximations, involving the use of asymptotic distributions or bootstrap techniques. After documenting that such methods can be very misleading even with fairly large samples, especially when the number of lags or the number of equations is not small, we propose a general simulation-based technique that allows one to control completely the level of tests in parametric VAR models. In particular, we show that maximized Monte Carlo tests [Dufour (2002)] can provide provably exact tests for such models, whether they are stationary or integrated. Applications to order selection and causality testing are considered as special cases. The technique developed is applied to quarterly and monthly VAR models of the U.S. economy, comprising income, money, interest rates and prices, over the period 1965-1996.
Resumo:
This paper constructs and estimates a sticky-price, Dynamic Stochastic General Equilibrium model with heterogenous production sectors. Sectors differ in price stickiness, capital-adjustment costs and production technology, and use output from each other as material and investment inputs following an Input-Output Matrix and Capital Flow Table that represent the U.S. economy. By relaxing the standard assumption of symmetry, this model allows different sectoral dynamics in response to monetary policy shocks. The model is estimated by Simulated Method of Moments using sectoral and aggregate U.S. time series. Results indicate 1) substantial heterogeneity in price stickiness across sectors, with quantitatively larger differences between services and goods than previously found in micro studies that focus on final goods alone, 2) a strong sensitivity to monetary policy shocks on the part of construction and durable manufacturing, and 3) similar quantitative predictions at the aggregate level by the multi-sector model and a standard model that assumes symmetry across sectors.
Resumo:
Cette thèse porte sur le rôle de l’espace dans l’organisation et dans la dynamique des communautés écologiques multi-espèces. Deux carences peuvent être identifiées dans les études théoriques actuelles portant sur la dimension spatiale des communautés écologiques : l’insuffisance de modèles multi-espèces représentant la dimension spatiale explicitement, et le manque d’attention portée aux interactions positives, tel le mutualisme, en dépit de la reconnaissance de leur ubiquité dans les systèmes écologiques. Cette thèse explore cette problématique propre à l’écologie des communautés, en utilisant une approche théorique s’inspirant de la théorie des systèmes complexes et de la mécanique statistique. Selon cette approche, les communautés d’espèces sont considérées comme des systèmes complexes dont les propriétés globales émergent des interactions locales entre les organismes qui les composent, et des interactions locales entre ces organismes et leur environnement. Le premier objectif de cette thèse est de développer un modèle de métacommunauté multi-espèces, explicitement spatial, orienté à l’échelle des individus et basé sur un réseau d’interactions interspécifiques générales comprenant à la fois des interactions d’exploitation, de compétition et de mutualisme. Dans ce modèle, les communautés locales sont formées par un processus d’assemblage des espèces à partir d’un réservoir régional. La croissance des populations est restreinte par une capacité limite et leur dynamique évolue suivant des mécanismes simples de reproduction et de dispersion des individus. Ces mécanismes sont dépendants des conditions biotiques et abiotiques des communautés locales et leur effet varie en fonction des espèces, du temps et de l’espace. Dans un deuxième temps, cette thèse a pour objectif de déterminer l’impact d’une connectivité spatiale croissante sur la dynamique spatiotemporelle et sur les propriétés structurelles et fonctionnelles de cette métacommunauté. Plus précisément, nous évaluons différentes propriétés des communautés en fonction du niveau de dispersion des espèces : i) la similarité dans la composition des communautés locales et ses patrons de corrélations spatiales; ii) la biodiversité locale et régionale, et la distribution locale de l’abondance des espèces; iii) la biomasse, la productivité et la stabilité dynamique aux échelles locale et régionale; et iv) la structure locale des interactions entre les espèces. Ces propriétés sont examinées selon deux schémas spatiaux. D’abord nous employons un environnement homogène et ensuite nous employons un environnement hétérogène où la capacité limite des communautés locales évoluent suivant un gradient. De façon générale, nos résultats révèlent que les communautés écologiques spatialement distribuées sont extrêmement sensibles aux modes et aux niveaux de dispersion des organismes. Leur dynamique spatiotemporelle et leurs propriétés structurelles et fonctionnelles peuvent subir des changements profonds sous forme de transitions significatives suivant une faible variation du niveau de dispersion. Ces changements apparaissent aussi par l’émergence de patrons spatiotemporels dans la distribution spatiale des populations qui sont typiques des transitions de phases observées généralement dans les systèmes physiques. La dynamique de la métacommunauté présente deux régimes. Dans le premier régime, correspondant aux niveaux faibles de dispersion des espèces, la dynamique d’assemblage favorise l’émergence de communautés stables, peu diverses et formées d’espèces abondantes et fortement mutualistes. La métacommunauté possède une forte diversité régionale puisque les communautés locales sont faiblement connectées et que leur composition demeure ainsi distincte. Par ailleurs dans le second régime, correspondant aux niveaux élevés de dispersion, la diversité régionale diminue au profit d’une augmentation de la diversité locale. Les communautés locales sont plus productives mais leur stabilité dynamique est réduite suite à la migration importante d’individus. Ce régime est aussi caractérisé par des assemblages incluant une plus grande diversité d’interactions interspécifiques. Ces résultats suggèrent qu’une augmentation du niveau de dispersion des organismes permet de coupler les communautés locales entre elles ce qui accroît la coexistence locale et favorise la formation de communautés écologiques plus riches et plus complexes. Finalement, notre étude suggère que le mutualisme est fondamentale à l’organisation et au maintient des communautés écologiques. Les espèces mutualistes dominent dans les habitats caractérisés par une capacité limite restreinte et servent d’ingénieurs écologiques en facilitant l’établissement de compétiteurs, prédateurs et opportunistes qui bénéficient de leur présence.
Resumo:
Bien que les champignons soient régulièrement utilisés comme modèle d'étude des systèmes eucaryotes, leurs relations phylogénétiques soulèvent encore des questions controversées. Parmi celles-ci, la classification des zygomycètes reste inconsistante. Ils sont potentiellement paraphylétiques, i.e. regroupent de lignées fongiques non directement affiliées. La position phylogénétique du genre Schizosaccharomyces est aussi controversée: appartient-il aux Taphrinomycotina (précédemment connus comme archiascomycetes) comme prédit par l'analyse de gènes nucléaires, ou est-il plutôt relié aux Saccharomycotina (levures bourgeonnantes) tel que le suggère la phylogénie mitochondriale? Une autre question concerne la position phylogénétique des nucléariides, un groupe d'eucaryotes amiboïdes que l'on suppose étroitement relié aux champignons. Des analyses multi-gènes réalisées antérieurement n'ont pu conclure, étant donné le choix d'un nombre réduit de taxons et l'utilisation de six gènes nucléaires seulement. Nous avons abordé ces questions par le biais d'inférences phylogénétiques et tests statistiques appliqués à des assemblages de données phylogénomiques nucléaires et mitochondriales. D'après nos résultats, les zygomycètes sont paraphylétiques (Chapitre 2) bien que le signal phylogénétique issu du jeu de données mitochondriales disponibles est insuffisant pour résoudre l'ordre de cet embranchement avec une confiance statistique significative. Dans le Chapitre 3, nous montrons à l'aide d'un jeu de données nucléaires important (plus de cent protéines) et avec supports statistiques concluants, que le genre Schizosaccharomyces appartient aux Taphrinomycotina. De plus, nous démontrons que le regroupement conflictuel des Schizosaccharomyces avec les Saccharomycotina, venant des données mitochondriales, est le résultat d'un type d'erreur phylogénétique connu: l'attraction des longues branches (ALB), un artéfact menant au regroupement d'espèces dont le taux d'évolution rapide n'est pas représentatif de leur véritable position dans l'arbre phylogénétique. Dans le Chapitre 4, en utilisant encore un important jeu de données nucléaires, nous démontrons avec support statistique significatif que les nucleariides constituent le groupe lié de plus près aux champignons. Nous confirmons aussi la paraphylie des zygomycètes traditionnels tel que suggéré précédemment, avec support statistique significatif, bien que ne pouvant placer tous les membres du groupe avec confiance. Nos résultats remettent en cause des aspects d'une récente reclassification taxonomique des zygomycètes et de leurs voisins, les chytridiomycètes. Contrer ou minimiser les artéfacts phylogénétiques telle l'attraction des longues branches (ALB) constitue une question récurrente majeure. Dans ce sens, nous avons développé une nouvelle méthode (Chapitre 5) qui identifie et élimine dans une séquence les sites présentant une grande variation du taux d'évolution (sites fortement hétérotaches - sites HH); ces sites sont connus comme contribuant significativement au phénomène d'ALB. Notre méthode est basée sur un test de rapport de vraisemblance (likelihood ratio test, LRT). Deux jeux de données publiés précédemment sont utilisés pour démontrer que le retrait graduel des sites HH chez les espèces à évolution accélérée (sensibles à l'ALB) augmente significativement le support pour la topologie « vraie » attendue, et ce, de façon plus efficace comparée à d'autres méthodes publiées de retrait de sites de séquences. Néanmoins, et de façon générale, la manipulation de données préalable à l'analyse est loin d’être idéale. Les développements futurs devront viser l'intégration de l'identification et la pondération des sites HH au processus d'inférence phylogénétique lui-même.
Resumo:
We study the workings of the factor analysis of high-dimensional data using artificial series generated from a large, multi-sector dynamic stochastic general equilibrium (DSGE) model. The objective is to use the DSGE model as a laboratory that allow us to shed some light on the practical benefits and limitations of using factor analysis techniques on economic data. We explain in what sense the artificial data can be thought of having a factor structure, study the theoretical and finite sample properties of the principal components estimates of the factor space, investigate the substantive reason(s) for the good performance of di¤usion index forecasts, and assess the quality of the factor analysis of highly dissagregated data. In all our exercises, we explain the precise relationship between the factors and the basic macroeconomic shocks postulated by the model.