878 resultados para Data-Information-Knowledge Chain
Resumo:
This paper investigates the role of learning by private agents and the central bank(two-sided learning) in a New Keynesian framework in which both sides of the economyhave asymmetric and imperfect knowledge about the true data generating process. Weassume that all agents employ the data that they observe (which may be distinct fordifferent sets of agents) to form beliefs about unknown aspects of the true model ofthe economy, use their beliefs to decide on actions, and revise these beliefs througha statistical learning algorithm as new information becomes available. We study theshort-run dynamics of our model and derive its policy recommendations, particularlywith respect to central bank communications. We demonstrate that two-sided learningcan generate substantial increases in volatility and persistence, and alter the behaviorof the variables in the model in a significant way. Our simulations do not convergeto a symmetric rational expectations equilibrium and we highlight one source thatinvalidates the convergence results of Marcet and Sargent (1989). Finally, we identifya novel aspect of central bank communication in models of learning: communicationcan be harmful if the central bank's model is substantially mis-specified.
Resumo:
We perform an experiment on a pure coordination game with uncertaintyabout the payoffs. Our game is closely related to models that have beenused in many macroeconomic and financial applications to solve problemsof equilibrium indeterminacy. In our experiment each subject receives anoisy signal about the true payoffs. This game has a unique strategyprofile that survives the iterative deletion of strictly dominatedstrategies (thus a unique Nash equilibrium). The equilibrium outcomecoincides, on average, with the risk-dominant equilibrium outcome ofthe underlying coordination game. The behavior of the subjects convergesto the theoretical prediction after enough experience has been gained. The data (and the comments) suggest that subjects do not apply through"a priori" reasoning the iterated deletion of dominated strategies.Instead, they adapt to the responses of other players. Thus, the lengthof the learning phase clearly varies for the different signals. We alsotest behavior in a game without uncertainty as a benchmark case. The gamewith uncertainty is inspired by the "global" games of Carlsson and VanDamme (1993).
Resumo:
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphism (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10(-9)) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10(-4)-2.2 × 10(-7). Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
Resumo:
This paper explores the possibility of using data from social bookmarking services to measure the use of information by academic researchers. Social bookmarking data can be used to augment participative methods (e.g. interviews and surveys) and other, non-participative methods (e.g. citation analysis and transaction logs) to measure the use of scholarly information. We use BibSonomy, a free resource-sharing system, as a case study. Results show that published journal articles are by far the most popular type of source bookmarked, followed by conference proceedings and books. Commercial journal publisher platforms are the most popular type of information resource bookmarked, followed by websites, records in databases and digital repositories. Usage of open access information resources is low in comparison with toll access journals. In the case of open access repositories, there is a marked preference for the use of subject-based repositories over institutional repositories. The results are consistent with those observed in related studies based on surveys and citation analysis, confirming the possible use of bookmarking data in studies of information behaviour in academic settings. The main advantages of using social bookmarking data are that is an unobtrusive approach, it captures the reading habits of researchers who are not necessarily authors, and data are readily available. The main limitation is that a significant amount of human resources is required in cleaning and standardizing the data.
Resumo:
L'utilisation efficace des systèmes géothermaux, la séquestration du CO2 pour limiter le changement climatique et la prévention de l'intrusion d'eau salée dans les aquifères costaux ne sont que quelques exemples qui démontrent notre besoin en technologies nouvelles pour suivre l'évolution des processus souterrains à partir de la surface. Un défi majeur est d'assurer la caractérisation et l'optimisation des performances de ces technologies à différentes échelles spatiales et temporelles. Les méthodes électromagnétiques (EM) d'ondes planes sont sensibles à la conductivité électrique du sous-sol et, par conséquent, à la conductivité électrique des fluides saturant la roche, à la présence de fractures connectées, à la température et aux matériaux géologiques. Ces méthodes sont régies par des équations valides sur de larges gammes de fréquences, permettant détudier de manières analogues des processus allant de quelques mètres sous la surface jusqu'à plusieurs kilomètres de profondeur. Néanmoins, ces méthodes sont soumises à une perte de résolution avec la profondeur à cause des propriétés diffusives du champ électromagnétique. Pour cette raison, l'estimation des modèles du sous-sol par ces méthodes doit prendre en compte des informations a priori afin de contraindre les modèles autant que possible et de permettre la quantification des incertitudes de ces modèles de façon appropriée. Dans la présente thèse, je développe des approches permettant la caractérisation statique et dynamique du sous-sol à l'aide d'ondes EM planes. Dans une première partie, je présente une approche déterministe permettant de réaliser des inversions répétées dans le temps (time-lapse) de données d'ondes EM planes en deux dimensions. Cette stratégie est basée sur l'incorporation dans l'algorithme d'informations a priori en fonction des changements du modèle de conductivité électrique attendus. Ceci est réalisé en intégrant une régularisation stochastique et des contraintes flexibles par rapport à la gamme des changements attendus en utilisant les multiplicateurs de Lagrange. J'utilise des normes différentes de la norme l2 pour contraindre la structure du modèle et obtenir des transitions abruptes entre les régions du model qui subissent des changements dans le temps et celles qui n'en subissent pas. Aussi, j'incorpore une stratégie afin d'éliminer les erreurs systématiques de données time-lapse. Ce travail a mis en évidence l'amélioration de la caractérisation des changements temporels par rapport aux approches classiques qui réalisent des inversions indépendantes à chaque pas de temps et comparent les modèles. Dans la seconde partie de cette thèse, j'adopte un formalisme bayésien et je teste la possibilité de quantifier les incertitudes sur les paramètres du modèle dans l'inversion d'ondes EM planes. Pour ce faire, je présente une stratégie d'inversion probabiliste basée sur des pixels à deux dimensions pour des inversions de données d'ondes EM planes et de tomographies de résistivité électrique (ERT) séparées et jointes. Je compare les incertitudes des paramètres du modèle en considérant différents types d'information a priori sur la structure du modèle et différentes fonctions de vraisemblance pour décrire les erreurs sur les données. Les résultats indiquent que la régularisation du modèle est nécessaire lorsqu'on a à faire à un large nombre de paramètres car cela permet d'accélérer la convergence des chaînes et d'obtenir des modèles plus réalistes. Cependent, ces contraintes mènent à des incertitudes d'estimations plus faibles, ce qui implique des distributions a posteriori qui ne contiennent pas le vrai modèledans les régions ou` la méthode présente une sensibilité limitée. Cette situation peut être améliorée en combinant des méthodes d'ondes EM planes avec d'autres méthodes complémentaires telles que l'ERT. De plus, je montre que le poids de régularisation des paramètres et l'écart-type des erreurs sur les données peuvent être retrouvés par une inversion probabiliste. Finalement, j'évalue la possibilité de caractériser une distribution tridimensionnelle d'un panache de traceur salin injecté dans le sous-sol en réalisant une inversion probabiliste time-lapse tridimensionnelle d'ondes EM planes. Etant donné que les inversions probabilistes sont très coûteuses en temps de calcul lorsque l'espace des paramètres présente une grande dimension, je propose une stratégie de réduction du modèle ou` les coefficients de décomposition des moments de Legendre du panache de traceur injecté ainsi que sa position sont estimés. Pour ce faire, un modèle de résistivité de base est nécessaire. Il peut être obtenu avant l'expérience time-lapse. Un test synthétique montre que la méthodologie marche bien quand le modèle de résistivité de base est caractérisé correctement. Cette méthodologie est aussi appliquée à un test de trac¸age par injection d'une solution saline et d'acides réalisé dans un système géothermal en Australie, puis comparée à une inversion time-lapse tridimensionnelle réalisée selon une approche déterministe. L'inversion probabiliste permet de mieux contraindre le panache du traceur salin gr^ace à la grande quantité d'informations a priori incluse dans l'algorithme. Néanmoins, les changements de conductivités nécessaires pour expliquer les changements observés dans les données sont plus grands que ce qu'expliquent notre connaissance actuelle des phénomenès physiques. Ce problème peut être lié à la qualité limitée du modèle de résistivité de base utilisé, indiquant ainsi que des efforts plus grands devront être fournis dans le futur pour obtenir des modèles de base de bonne qualité avant de réaliser des expériences dynamiques. Les études décrites dans cette thèse montrent que les méthodes d'ondes EM planes sont très utiles pour caractériser et suivre les variations temporelles du sous-sol sur de larges échelles. Les présentes approches améliorent l'évaluation des modèles obtenus, autant en termes d'incorporation d'informations a priori, qu'en termes de quantification d'incertitudes a posteriori. De plus, les stratégies développées peuvent être appliquées à d'autres méthodes géophysiques, et offrent une grande flexibilité pour l'incorporation d'informations additionnelles lorsqu'elles sont disponibles. -- The efficient use of geothermal systems, the sequestration of CO2 to mitigate climate change, and the prevention of seawater intrusion in coastal aquifers are only some examples that demonstrate the need for novel technologies to monitor subsurface processes from the surface. A main challenge is to assure optimal performance of such technologies at different temporal and spatial scales. Plane-wave electromagnetic (EM) methods are sensitive to subsurface electrical conductivity and consequently to fluid conductivity, fracture connectivity, temperature, and rock mineralogy. These methods have governing equations that are the same over a large range of frequencies, thus allowing to study in an analogous manner processes on scales ranging from few meters close to the surface down to several hundreds of kilometers depth. Unfortunately, they suffer from a significant resolution loss with depth due to the diffusive nature of the electromagnetic fields. Therefore, estimations of subsurface models that use these methods should incorporate a priori information to better constrain the models, and provide appropriate measures of model uncertainty. During my thesis, I have developed approaches to improve the static and dynamic characterization of the subsurface with plane-wave EM methods. In the first part of this thesis, I present a two-dimensional deterministic approach to perform time-lapse inversion of plane-wave EM data. The strategy is based on the incorporation of prior information into the inversion algorithm regarding the expected temporal changes in electrical conductivity. This is done by incorporating a flexible stochastic regularization and constraints regarding the expected ranges of the changes by using Lagrange multipliers. I use non-l2 norms to penalize the model update in order to obtain sharp transitions between regions that experience temporal changes and regions that do not. I also incorporate a time-lapse differencing strategy to remove systematic errors in the time-lapse inversion. This work presents improvements in the characterization of temporal changes with respect to the classical approach of performing separate inversions and computing differences between the models. In the second part of this thesis, I adopt a Bayesian framework and use Markov chain Monte Carlo (MCMC) simulations to quantify model parameter uncertainty in plane-wave EM inversion. For this purpose, I present a two-dimensional pixel-based probabilistic inversion strategy for separate and joint inversions of plane-wave EM and electrical resistivity tomography (ERT) data. I compare the uncertainties of the model parameters when considering different types of prior information on the model structure and different likelihood functions to describe the data errors. The results indicate that model regularization is necessary when dealing with a large number of model parameters because it helps to accelerate the convergence of the chains and leads to more realistic models. These constraints also lead to smaller uncertainty estimates, which imply posterior distributions that do not include the true underlying model in regions where the method has limited sensitivity. This situation can be improved by combining planewave EM methods with complimentary geophysical methods such as ERT. In addition, I show that an appropriate regularization weight and the standard deviation of the data errors can be retrieved by the MCMC inversion. Finally, I evaluate the possibility of characterizing the three-dimensional distribution of an injected water plume by performing three-dimensional time-lapse MCMC inversion of planewave EM data. Since MCMC inversion involves a significant computational burden in high parameter dimensions, I propose a model reduction strategy where the coefficients of a Legendre moment decomposition of the injected water plume and its location are estimated. For this purpose, a base resistivity model is needed which is obtained prior to the time-lapse experiment. A synthetic test shows that the methodology works well when the base resistivity model is correctly characterized. The methodology is also applied to an injection experiment performed in a geothermal system in Australia, and compared to a three-dimensional time-lapse inversion performed within a deterministic framework. The MCMC inversion better constrains the water plumes due to the larger amount of prior information that is included in the algorithm. The conductivity changes needed to explain the time-lapse data are much larger than what is physically possible based on present day understandings. This issue may be related to the base resistivity model used, therefore indicating that more efforts should be given to obtain high-quality base models prior to dynamic experiments. The studies described herein give clear evidence that plane-wave EM methods are useful to characterize and monitor the subsurface at a wide range of scales. The presented approaches contribute to an improved appraisal of the obtained models, both in terms of the incorporation of prior information in the algorithms and the posterior uncertainty quantification. In addition, the developed strategies can be applied to other geophysical methods, and offer great flexibility to incorporate additional information when available.
Resumo:
Time-lapse geophysical measurements are widely used to monitor the movement of water and solutes through the subsurface. Yet commonly used deterministic least squares inversions typically suffer from relatively poor mass recovery, spread overestimation, and limited ability to appropriately estimate nonlinear model uncertainty. We describe herein a novel inversion methodology designed to reconstruct the three-dimensional distribution of a tracer anomaly from geophysical data and provide consistent uncertainty estimates using Markov chain Monte Carlo simulation. Posterior sampling is made tractable by using a lower-dimensional model space related both to the Legendre moments of the plume and to predefined morphological constraints. Benchmark results using cross-hole ground-penetrating radar travel times measurements during two synthetic water tracer application experiments involving increasingly complex plume geometries show that the proposed method not only conserves mass but also provides better estimates of plume morphology and posterior model uncertainty than deterministic inversion results.
Resumo:
The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.
Resumo:
Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity of dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the factors soil formation, relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models with lower power of capturing the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.
Resumo:
Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only is the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data including Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries,with expression information,and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.
Resumo:
In this article we propose a model to explain how voters' perceptions of their ideological proximity to a party affect their propensity to vote for that party. We argue that political knowledge plays a crucial moderating role in the relationship between party proximity and voting propensity. It is necessary, however, to distinguish between institutional knowledge (information about the political system) and party knowledge (information about the parties' left-right positions). An analysis of survey data from the 2007 Swiss federal elections supports our main hypothesis that party knowledge enhances the link between party proximity and voting propensity. Institutional knowledge may have additional influence, but clear evidence for this effect was obtained only for propensities to vote for the Swiss People's Party (SVP). Overall, the impact of political knowledge was found to be substantial, even after controlling for the outstanding influence of party identification and other predictors of voting propensities
Resumo:
This paper presents a process of mining research & development abstract databases to profile current status and to project potential developments for target technologies, The process is called "technology opportunities analysis." This article steps through the process using a sample data set of abstracts from the INSPEC database on the topic o "knowledge discovery and data mining." The paper offers a set of specific indicators suitable for mining such databases to understand innovation prospects. In illustrating the uses of such indicators, it offers some insights into the status of knowledge discovery research*.
Resumo:
Geophysical methods have the potential to provide valuable information on hydrological properties in the unsaturated zone. In particular, time-lapse geophysical data, when coupled with a hydrological model and inverted stochastically, may allow for the effective estimation of subsurface hydraulic parameters and their corresponding uncertainties. In this study, we use a Bayesian Markov-chain-Monte-Carlo (MCMC) inversion approach to investigate how much information regarding vadose zone hydraulic properties can be retrieved from time-lapse crosshole GPR data collected at the Arrenaes field site in Denmark during a forced infiltration experiment.
Resumo:
Quantifying the spatial configuration of hydraulic conductivity (K) in heterogeneous geological environments is essential for accurate predictions of contaminant transport, but is difficult because of the inherent limitations in resolution and coverage associated with traditional hydrological measurements. To address this issue, we consider crosshole and surface-based electrical resistivity geophysical measurements, collected in time during a saline tracer experiment. We use a Bayesian Markov-chain-Monte-Carlo (McMC) methodology to jointly invert the dynamic resistivity data, together with borehole tracer concentration data, to generate multiple posterior realizations of K that are consistent with all available information. We do this within a coupled inversion framework, whereby the geophysical and hydrological forward models are linked through an uncertain relationship between electrical resistivity and concentration. To minimize computational expense, a facies-based subsurface parameterization is developed. The Bayesian-McMC methodology allows us to explore the potential benefits of including the geophysical data into the inverse problem by examining their effect on our ability to identify fast flowpaths in the subsurface, and their impact on hydrological prediction uncertainty. Using a complex, geostatistically generated, two-dimensional numerical example representative of a fluvial environment, we demonstrate that flow model calibration is improved and prediction error is decreased when the electrical resistivity data are included. The worth of the geophysical data is found to be greatest for long spatial correlation lengths of subsurface heterogeneity with respect to wellbore separation, where flow and transport are largely controlled by highly connected flowpaths.
Resumo:
This report provides key juvenile justice system planning data, most of which are taken from Iowa’s 2015 Juvenile Justice and Delinquency Prevention Act Three Year Plan. The data and related descriptions serve as an overview of decision making for major juvenile justice system processing points, and also assist state and local officials with policy and practice. Included in the report are school discipline data and data related to juvenile in the adult criminal justice system.