942 results for Data Driven Modeling
Abstract:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LF and WF). The two formats suggest two alternative model approaches for analyzing panel data: (i) univariate regression with varying intercept; and (ii) multivariate regression with latent variables (a particular case of structural equation model, SEM). The present paper compares the two approaches, showing in which circumstances they yield equivalent (in some cases, even numerically equal) results. We show that the univariate approach gives results equivalent to the multivariate approach when restrictions of time invariance (in the paper, the TI assumption) are imposed on the parameters of the multivariate model. It is shown that the restrictions implicit in the univariate approach can be assessed by chi-square difference testing of two nested multivariate models. In addition, common tests encountered in the econometric analysis of panel data, such as the Hausman test, are shown to have an equivalent representation as chi-square difference tests. Commonalities and differences between the univariate and multivariate approaches are illustrated using an empirical panel data set of firms' profitability as well as a simulated panel data set.
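The two layouts can be illustrated with a small sketch. The firm names, years and profitability values below are invented for illustration, and the helper names `long_to_wide` and `wide_to_long` are hypothetical; nothing here is taken from the paper itself.

```python
from collections import defaultdict

# Hypothetical panel in long format: one row per (unit, period) observation.
long_rows = [
    ("firm_a", 2001, 0.10), ("firm_a", 2002, 0.12), ("firm_a", 2003, 0.11),
    ("firm_b", 2001, 0.05), ("firm_b", 2002, 0.07), ("firm_b", 2003, 0.06),
]

def long_to_wide(rows):
    """Return (sorted periods, {unit: [one value per period]})."""
    periods = sorted({t for _, t, _ in rows})
    by_unit = defaultdict(dict)
    for unit, t, y in rows:
        by_unit[unit][t] = y
    # Missing (unit, period) cells become None in the wide layout.
    wide = {u: [vals.get(t) for t in periods] for u, vals in by_unit.items()}
    return periods, wide

def wide_to_long(periods, wide):
    """Inverse reshape: one row per non-missing cell of the wide layout."""
    return [(u, t, y)
            for u, ys in wide.items()
            for t, y in zip(periods, ys)
            if y is not None]

periods, wide = long_to_wide(long_rows)
```

In the long layout each row is one regression observation (the univariate view); in the wide layout each unit contributes one row with a column per period (the multivariate view).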
Abstract:
The present dissertation is entitled "Development and Application of Computational Methodologies in Qualitative Modeling". It encompasses the diverse projects that were undertaken during my time as a PhD student.
Instead of a systematic implementation of a framework defined a priori, this thesis should be considered as an exploration of the methods that can help us infer the blueprint of regulatory and signaling processes. This exploration was driven by concrete biological questions, rather than theoretical investigation. Even though the projects involved divergent systems (gene regulatory networks of the cell cycle, signaling networks in lung cells), as well as organisms (fission yeast, budding yeast, rat, human), our goals were complementary and coherent. The main project of the thesis is the modeling of the Septation Initiation Network (SIN) in S. pombe. Cytokinesis in fission yeast is controlled by the SIN, a protein kinase signaling network that uses the spindle pole body as a scaffold. In order to describe the qualitative behavior of the system and predict unknown mutant behaviors, we decided to adopt a Boolean modeling approach. In this thesis, we report the construction of an extended Boolean model of the SIN, comprising most SIN components and regulators as individual, experimentally testable nodes. The model uses CDK activity levels as control nodes for the simulation of SIN-related events in different stages of the cell cycle. The model was optimized using single knock-out experiments of known phenotypic effect as a training set, and was able to correctly predict a double knock-out test set. Moreover, the model has made in silico predictions that have been validated in vivo, providing new insights into the regulation and hierarchical organization of the SIN. Another cell-cycle-related project that is part of this thesis was to create a qualitative, minimal model of cyclin interplay in S. cerevisiae. Clb proteins in budding yeast present a characteristic, sequential activation and decay during the cell cycle, commonly referred to as Clb waves. This event is coordinated with the inverse activation curve of Sic1, which has an inhibitory role in the system.
To generate minimal qualitative models that can explain this phenomenon, we selected well-defined experiments and constructed all possible minimal models that, when simulated, reproduce the expected results. The models were filtered using standardized qualitative ODE simulations; only the ones reproducing the wave-like phenotype were kept. The set of minimal models can be used to suggest regulatory relations among the participating molecules, which will subsequently be tested experimentally. Finally, during my PhD I participated in the SBV Improver Challenge. The goal was to infer species-specific (human and rat) networks, using phosphoprotein, gene expression and cytokine data and a reference network provided as prior knowledge. Our solution to the challenge took third place. The approach used is explained in detail in the final chapter of the thesis.
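The idea of testing knock-outs on a Boolean model can be sketched as follows. This is a toy network with placeholder node names and invented rules, not the SIN model from the thesis; the synchronous update scheme and the helper name `simulate` are also assumptions made for illustration.

```python
def simulate(rules, state, knockouts=(), max_steps=50):
    """Synchronously update a Boolean network until it reaches a fixed point.

    A knocked-out node is clamped to False at every step.
    Returns the fixed-point state, or None if none is reached in max_steps.
    """
    state = {n: (False if n in knockouts else v) for n, v in state.items()}
    for _ in range(max_steps):
        nxt = {n: (False if n in knockouts else rule(state))
               for n, rule in rules.items()}
        if nxt == state:
            return state
        state = nxt
    return None  # e.g. the network settled into a cycle instead

# Hypothetical three-step cascade; names and logic are placeholders.
rules = {
    "signal":    lambda s: s["signal"],                       # input holds its value
    "kinase":    lambda s: s["signal"],                       # activated by the signal
    "inhibitor": lambda s: s["inhibitor"],                    # input holds its value
    "output":    lambda s: s["kinase"] and not s["inhibitor"],
}
start = {"signal": True, "kinase": False, "inhibitor": False, "output": False}

wild_type = simulate(rules, start)
kinase_ko = simulate(rules, start, knockouts={"kinase"})
```

Comparing `wild_type["output"]` with `kinase_ko["output"]` mimics, in miniature, checking a simulated single knock-out against a known phenotype; a double knock-out is the same call with two names in `knockouts`.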
Abstract:
How a stimulus or a task alters the spontaneous dynamics of the brain remains a fundamental open question in neuroscience. One of the most robust hallmarks of task/stimulus-driven brain dynamics is the decrease of variability with respect to the spontaneous level, an effect seen across multiple experimental conditions and in brain signals observed at different spatiotemporal scales. Recently, it was observed that the trial-to-trial variability and temporal variance of functional magnetic resonance imaging (fMRI) signals decrease in the task-driven activity. Here we examined the dynamics of a large-scale model of the human cortex to provide a mechanistic understanding of these observations. The model allows computing the statistics of synaptic activity in the spontaneous condition and in putative tasks determined by external inputs to a given subset of brain regions. We demonstrated that external inputs decrease the variance, increase the covariances, and decrease the autocovariance of synaptic activity as a consequence of single node and large-scale network dynamics. Altogether, these changes in network statistics imply a reduction of entropy, meaning that the spontaneous synaptic activity outlines a larger multidimensional activity space than does the task-driven activity. We tested this model's prediction on fMRI signals from healthy humans acquired during rest and task conditions and found a significant decrease of entropy in the stimulus-driven activity. Altogether, our study proposes a mechanism for increasing the information capacity of brain networks by enlarging the volume of possible activity configurations at rest and reliably settling into a confined stimulus-driven state to allow better transmission of stimulus-related information.
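As a rough illustration of why these changes in second-order statistics lower the entropy, suppose (our simplifying assumption, not stated in the abstract) that the activity of $n$ regions is jointly Gaussian with covariance matrix $\Sigma$. The differential entropy then depends on the statistics only through $\det\Sigma$, shown here with the two-region case written out:

```latex
H = \tfrac{1}{2}\ln\!\left[(2\pi e)^{n}\det\Sigma\right],
\qquad
\det\Sigma = \sigma_1^{2}\,\sigma_2^{2}\,\bigl(1-\rho^{2}\bigr)
\quad\text{for } n = 2 .
```

Smaller variances $\sigma_i^{2}$ and a stronger correlation $\rho$ both shrink $\det\Sigma$, and hence $H$. This static formula does not capture the change in autocovariance mentioned above.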
Abstract:
BACKGROUND: Variations in physical activity (PA) across nations may be driven by socioeconomic position. As national incomes increase, car ownership becomes within reach of more individuals. This report characterizes associations between car ownership and PA in African-origin populations across 5 sites at different levels of economic development and with different transportation infrastructures: US, Seychelles, Jamaica, South Africa, and Ghana. METHODS: Twenty-five hundred adults, ages 25-45, were enrolled in the study. A total of 2,101 subjects had valid accelerometer-based PA measures (reported as average daily duration of moderate to vigorous PA, MVPA) and complete socioeconomic information. Our primary exposure of interest was whether the household owned a car. We adjusted for socioeconomic position using household income and ownership of common goods. RESULTS: Overall, PA levels did not vary largely between sites, with highest levels in South Africa, lowest in the US. Across all sites, greater PA was consistently associated with male gender, fewer years of education, manual occupations, lower income, and owning fewer material goods. We found heterogeneity across sites in car ownership: after adjustment for confounders, car owners in the US had 24.3 fewer minutes of MVPA compared to non-car owners in the US (20.7 vs. 45.1 minutes/day of MVPA); in the non-US sites, car-owners had an average of 9.7 fewer minutes of MVPA than non-car owners (24.9 vs. 34.6 minutes/day of MVPA). CONCLUSIONS: PA levels are similar across all study sites except Jamaica, despite very different levels of socioeconomic development. Not owning a car in the US is associated with especially high levels of MVPA. As car ownership becomes prevalent in the developing world, strategies to promote alternative forms of active transit may become important.
Abstract:
This study was carried out to evaluate the efficiency of the Bitterlich method in growth and yield modeling of even-aged Eucalyptus stands. Twenty-five plots were set up in Eucalyptus grandis stands grown under a high bole system in the Central Western Region of Minas Gerais, Brazil. The sampling points were set up in the center of each plot. Data from four annual measurements were collected and used to fit three model types, using age, site index and basal area as independent variables. The growth models were fitted for volume and mass of trees. The efficiency of the Bitterlich method was confirmed for generating the data for growth and yield modeling.
Abstract:
This work evaluated eight hypsometric models to represent the tree height-diameter relationship, using data obtained from the scaling of 118 trees and from 25 inventory plots. The evaluation criteria were graphical analysis of residuals, mean percent deviation, precision by the chi-square test, the residual standard error between observed and estimated heights, and the Graybill F test. The identity of the hypsometric models was also verified by applying the F(H0) test to the plot data pooled with the scaling data. It was concluded that better accuracy can be obtained by using the Prodan model, with h and d1.3 measured on 10 trees per plot and pooled with the scaling measurements of even-aged forest stands.
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias caused by them in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility to use combined longitudinal survey-register data. The Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1–5 was linked at person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey-register data can be used to analyse and compare the non-response and attrition processes, test the missingness mechanism type and estimate the size of bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions, and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data.
Neither the Missing At Random (MAR) assumption about non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last wave weights displayed the largest bias. Using all the available data, including the spells by attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazard model estimators. The study discusses implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
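A weighted Kaplan-Meier estimator of the kind mentioned above can be sketched in a few lines. This is a simplified illustration, not the estimator used in the study: it takes one fixed weight per spell (for example a design weight, possibly multiplied by an IPCW factor), whereas IPCW weights are in general time-varying. The function name and the example spells are hypothetical.

```python
def weighted_kaplan_meier(times, events, weights=None):
    """Weighted Kaplan-Meier survival curve.

    times   : spell durations
    events  : 1 if the spell ended in the event, 0 if it was censored
    weights : one non-negative weight per spell (defaults to 1.0 each)
    Returns a list of (event time, estimated survival just after it).
    """
    if weights is None:
        weights = [1.0] * len(times)
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        # Weighted number of spells still at risk just before t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events occurring exactly at t.
        ended = sum(w for ti, e, w in zip(times, events, weights)
                    if e and ti == t)
        surv *= 1.0 - ended / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical unemployment spells in months; 0 marks a censored spell.
durations = [2, 3, 3, 5, 8]
ended = [1, 0, 1, 1, 0]
curve = weighted_kaplan_meier(durations, ended)
```

With all weights equal the function reduces to the ordinary Kaplan-Meier estimator; up-weighting spells that resemble those lost to dependent censoring is what shifts the curve in an IPCW-style correction.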
Abstract:
Prediction means estimating the future value of an observable quantity. A hallmark of the Bayesian paradigm is that uncertainty about unknown quantities is expressed in the form of probabilities. A Bayesian predictive model is thus a probability distribution over the possible values that an observable, but not yet observed, quantity can take. The articles included in the thesis develop methods that are applied, among other things, to the analysis of chromatographic data in criminal investigations. With the exception of the first article, all the methods build on Bayesian predictive modeling. The articles mainly consider three types of problems related to chromatographic data: quantification, pairwise matching and clustering. The first article develops a non-parametric model for the measurement error of chromatographic analyses of blood alcohol content. The second article develops a predictive inference method for comparing two samples. The method is applied in the third article to compare oil samples in order to identify the polluting source in connection with oil spills. The fourth article derives a predictive model for clustering data of mixed discrete and continuous type, which is applied, among other things, to classifying amphetamine samples with respect to production batches.
Abstract:
The theoretical part of the study focused on business process management and business process modeling. The goal was to find a new business process modeling method for an electrical accessories manufacturing enterprise, by identifying a few candidate modeling methods from which the company could choose the one best suited to its needs. The study was carried out as qualitative research, with an action study and a case study as the most important ways to collect data. The empirical part of the study presents examples of the company's processes modeled with the new modeling method, together with the process of modeling them. The new way of modeling processes especially improves the visual presentation of the processes and the understanding of how employees should work at the organizational interfaces within a process and at the interfaces between different processes. The result of the study is a new unified way to model the company's processes, which makes the process models easier to understand and create. This improved readability makes it possible to reduce the costs that arose from the unclear old process models.
Abstract:
Pain is a perceptual experience with many dimensions. These dimensions of pain are interrelated and recruit neural networks that process the corresponding information. Elucidating the functional architecture that supports the different perceptual aspects of the experience is therefore a fundamental step toward understanding the functional role of the different regions of the cerebral pain matrix in the cortical circuits that underlie the subjective experience of pain. Among the various brain regions involved in processing nociceptive information, the primary and secondary somatosensory cortices (S1 and S2) are the main regions generally associated with processing the sensory-discriminative aspect of pain. However, the functional organization within these somatosensory regions is not completely clear, and relatively few studies have directly examined the integration of information between the somatosensory regions. Thus, several questions remain concerning the hierarchical relationship between S1 and S2, as well as the functional role of the interhemispheric connections between homologous somatosensory regions. Likewise, whether processing within the somatosensory system is serial or parallel is another open question that requires further examination. The aim of the present study was to test a number of hypotheses about causality in the functional interactions between S1 and S2 while subjects received painful electric shocks. We implemented a connectivity modeling method, which uses a causal description of the system's dynamics, to study the interactions between activation sites defined by a data set from a functional imaging study.
Our paradigm consisted of three experimental sessions using electric shocks at three different intensity levels: moderately painful (level 3), slightly painful (level 2), or completely non-painful (level 1). Consequently, our paradigm allowed us to study how stimulus intensity is encoded in our network of interest, and how the connectivity of the different regions is modulated under the different stimulation conditions. Our results favor a serial mode of processing of nociceptive somatosensory information, with a predominant input from the thalamocortical pathway to S1 contralateral to the stimulation site. Our results imply that information propagates from contralateral S1 through our network of interest, composed of the bilateral S1 and S2 cortices. Our analysis indicates that the S1→S2 connection is strengthened by pain, which suggests that S2 is higher than S1 in the pain-processing hierarchy, in line with previous neurophysiological and magnetoencephalographic findings. Finally, our analysis provides evidence that somatosensory information enters the hemisphere contralateral to the stimulated side, with interhemispheric connections responsible for transferring the information to the ipsilateral hemisphere.
Abstract:
There has been much research on analyzing various forms of competing risks data. Nevertheless, there are several occasions in survival studies where the existing models and methodologies are inadequate for the analysis of competing risks data. The identifiability problem and various types of censoring induce more complications in the analysis of competing risks data than in classical survival analysis. Parametric models are not adequate for the analysis of competing risks data, since the assumptions about the underlying lifetime distributions may not hold well. Motivated by this, in the present study we develop some new inference procedures, which are completely distribution-free, for the analysis of competing risks data.
Abstract:
Genetic programming is known to provide good solutions for many problems, such as the evolution of network protocols and distributed algorithms. In such cases it is most likely a hardwired module of a design framework that assists the engineer in optimizing specific aspects of the system to be developed. It provides its results in a fixed format through an internal interface. In this paper we show how the utility of genetic programming can be increased remarkably by isolating it as a component and integrating it into the model-driven software development process. Our genetic programming framework produces XMI-encoded UML models that can easily be loaded into widely available modeling tools, which in turn possess code generation as well as additional analysis and test capabilities. We use the evolution of a distributed election algorithm as an example to illustrate how genetic programming can be combined with model-driven development. This example clearly illustrates an advantage of our approach: the generation of source code in different programming languages.
Abstract:
Land use is a crucial link between human activities and the natural environment and one of the main driving forces of global environmental change. Large parts of the terrestrial land surface are used for agriculture, forestry, settlements and infrastructure. Given the importance of land use, it is essential to understand the multitude of influential factors and resulting land use patterns. An essential methodology to study and quantify such interactions is provided by the adoption of land-use models. By the application of land-use models, it is possible to analyze the complex structure of linkages and feedbacks and also to determine the relevance of driving forces. Modeling land use and land use changes has a long tradition. In particular on the regional scale, a variety of models for different regions and research questions has been created. Modeling capabilities grow with steady advances in computer technology, which are driven on the one hand by increasing computing power and on the other hand by new methods in software development, e.g. object- and component-oriented architectures. In this thesis, SITE (Simulation of Terrestrial Environments), a novel framework for integrated regional land-use modeling, will be introduced and discussed. Particular features of SITE are the notably extended capability to integrate models and the strict separation of application and implementation. These features enable efficient development, testing and usage of integrated land-use models. On its system side, SITE provides generic data structures (grid, grid cells, attributes etc.) and takes over the responsibility for their administration. By means of a scripting language (Python) that has been extended by language features specific to land-use modeling, these data structures can be utilized and manipulated by modeling applications. The scripting language interpreter is embedded in SITE. 
The integration of sub-models can be achieved via the scripting language or by usage of a generic interface provided by SITE. Furthermore, functionalities important for land-use modeling, like model calibration, model tests and analysis support of simulation results, have been integrated into the generic framework. During the implementation of SITE, specific emphasis was laid on expandability, maintainability and usability. Along with the modeling framework, a land-use model for the analysis of the stability of tropical rainforest margins was developed in the context of the collaborative research project STORMA (SFB 552). In a research area in Central Sulawesi, Indonesia, socio-environmental impacts of land-use changes were examined. SITE was used to simulate land-use dynamics in the historical period of 1981 to 2002. Analogously, a scenario that did not consider migration in the population dynamics was analyzed. For the calculation of crop yields and trace gas emissions, the DAYCENT agro-ecosystem model was integrated. In this case study, it could be shown that land-use changes in the Indonesian research area could mainly be characterized by the expansion of agricultural areas at the expense of natural forest. For this reason, the situation had to be interpreted as unsustainable even though increased agricultural use implied economic improvements and higher farmers' incomes. Due to the importance of model calibration, it was explicitly addressed in the SITE architecture through the introduction of a specific component. The calibration functionality can be used by all SITE applications and enables largely automated model calibration. Calibration in SITE is understood as a process that finds an optimal or at least adequate solution for a set of arbitrarily selectable model parameters with respect to an objective function. In SITE, an objective function typically is a map comparison algorithm capable of comparing a simulation result to a reference map. 
Several map optimization and map comparison methodologies are available and can be combined. The STORMA land-use model was calibrated using a genetic algorithm for optimization and the figure of merit map comparison measure as objective function. The time period for the calibration ranged from 1981 to 2002. For this period, respective reference land-use maps were compiled. It could be shown that an efficient automated model calibration with SITE is possible. Nevertheless, the selection of the calibration parameters required detailed knowledge about the underlying land-use model and cannot be automated. In another case study, decreases in crop yields and resulting losses in income from coffee cultivation were analyzed and quantified under the assumption of four different deforestation scenarios. For this task, an empirical model describing the dependence of bee pollination and resulting coffee fruit set on the distance to the closest natural forest was integrated. Land-use simulations showed that, depending on the magnitude and location of ongoing forest conversion, pollination services are expected to decline continuously. This results in a reduction of coffee yields of up to 18% and a loss of net revenues per hectare of up to 14%. However, the study also showed that ecological and economic values can be preserved if patches of natural vegetation are conserved in the agricultural landscape.
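A map comparison objective of this kind can be sketched as below. The abstract does not give the formula SITE uses; this sketch assumes the commonly used definition of the figure of merit (correctly simulated change divided by the union of observed and simulated change). The function name, the flat list-of-cells map representation and the land-use labels are all hypothetical.

```python
def figure_of_merit(initial, reference, simulated):
    """Figure of merit for a simulated land-use map (assumed definition).

    Each argument is a sequence of category labels, one per grid cell:
    the map at the start, the observed map at the end, and the simulated
    map at the end of the period.
    """
    hits = misses = wrong_category = false_alarms = 0
    for start, ref, sim in zip(initial, reference, simulated):
        observed_change = ref != start
        simulated_change = sim != start
        if observed_change and simulated_change:
            if sim == ref:
                hits += 1              # change simulated correctly
            else:
                wrong_category += 1    # change simulated, wrong new category
        elif observed_change:
            misses += 1                # observed change simulated as persistence
        elif simulated_change:
            false_alarms += 1          # persistence simulated as change
    denom = hits + misses + wrong_category + false_alarms
    # Undefined when no cell changes in either map.
    return hits / denom if denom else float("nan")

# Hypothetical six-cell maps.
initial   = ["forest"] * 6
reference = ["forest", "forest", "cocoa", "cocoa",  "rice",  "forest"]
simulated = ["forest", "cocoa",  "cocoa", "forest", "cocoa", "forest"]
fom = figure_of_merit(initial, reference, simulated)
```

A calibration loop such as a genetic algorithm would treat this value as the fitness of a candidate parameter set, with 1.0 meaning that every simulated change matches the reference map.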
Abstract:
Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. The sequence of buys and sells for a particular stock, the order flow, we model as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly non-linear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially-observable environment. We demonstrate learning results for two separate real stocks.
Abstract:
This thesis is divided into two parts. The first part presents and studies telegraph processes, Poisson processes with telegraph compensator, and telegraph processes with jumps. The study presented in this first part includes the computation of the distributions of each process, their means and variances, as well as their moment generating functions, among other properties. Using these properties, the second part studies option pricing models based on telegraph processes with jumps. This part describes how to compute the risk-neutral measures, finds the no-arbitrage condition for this type of model, and finally computes the prices of European call and put options.
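To give a feel for the kind of process studied in the first part, here is a minimal simulation sketch of a two-state telegraph process with optional jumps. The parameterization (state-dependent velocities `c`, switching rates `lam` and jump sizes `h`) and the function name are assumptions chosen for illustration and need not match the thesis's notation or definitions.

```python
import random

def simulate_jump_telegraph(T, c=(1.0, -1.0), lam=(2.0, 2.0),
                            h=(0.0, 0.0), seed=None):
    """Simulate one path of a two-state telegraph process on [0, T].

    In state i the process moves with velocity c[i]; after an
    exponential holding time with rate lam[i] it jumps by h[i] and
    switches to the other state. Returns the list of (time, position)
    points at the start, at each switch, and at time T.
    """
    rng = random.Random(seed)
    t, x, state = 0.0, 0.0, 0
    path = [(t, x)]
    while True:
        tau = rng.expovariate(lam[state])   # holding time in current state
        if t + tau >= T:
            x += c[state] * (T - t)         # drift until the horizon
            path.append((T, x))
            return path
        t += tau
        x += c[state] * tau + h[state]      # drift, then jump at the switch
        path.append((t, x))
        state = 1 - state

path = simulate_jump_telegraph(5.0, seed=1)
```

With `h = (0.0, 0.0)` and velocities of equal magnitude and opposite sign this is the classical telegraph process, whose position at time `T` can never exceed `T` times the speed in absolute value; non-zero `h` adds a jump at every velocity switch.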