914 resultados para Dynamic data analysis
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação
Resumo:
This paper examines the relationship between the level of public infrastructure and the level of productivity using panel data for the Spanish provinces over the period 1984-2004, a period which is particularly relevant due to the substantial changes occurring in the Spanish economy at that time. The underlying model used for the data analysis is based on the wage equation, which is one of a handful of simultaneous equations which when satisfied correspond to the short-run equilibrium of New Economic Geography theory. This is estimated using a spatial panel model with fixed time and province effects, so that unmodelled space and time constant sources of heterogeneity are eliminated. The model assumes that productivity depends on the level of educational attainment and the public capital stock endowment of each province. The results show that although changes in productivity are positively associated with changes in public investment within the same province, there is a negative relationship between productivity changes and changes in public investment in other regions.
Resumo:
The paper investigates the role of real exchange rate misalignment on long-run growth for a set of ninety countries using time series data from 1980 to 2004. We first estimate a panel data model (using fixed and random effects) for the real exchange rate, with different model specifications, in order to produce estimates of the equilibrium real exchange rate and this is then used to construct measures of real exchange rate misalignment. We also provide an alternative set of estimates of real exchange rate misalignment using panel cointegration methods. The variables used in our real exchange rate models are: real per capita GDP; net foreign assets; terms of trade and government consumption. The results for the two-step System GMM panel growth models indicate that the coefficients for real exchange rate misalignment are positive for different model specification and samples, which means that a more depreciated (appreciated) real exchange rate helps (harms) long-run growth. The estimated coefficients are higher for developing and emerging countries.
Resumo:
This paper investigates the role of institutions in determining per capita income levels and growth. It contributes to the empirical literature by using different variables as proxies for institutions and by developing a deeper analysis of the issues arising from the use of weak and too many instruments in per capita income and growth regressions. The cross-section estimation suggests that institutions seem to matter, regardless if they are the only explanatory variable or are combined with geographical and integration variables, although most models suffer from the issue of weak instruments. The results from the growth models provides some interesting results: there is mixed evidence on the role of institutions and such evidence is more likely to be associated with law and order and investment profile; government spending is an important policy variable; collapsing the number of instruments results in fewer significant coefficients for institutions.
Resumo:
SUMMARY : Eukaryotic DNA interacts with the nuclear proteins using non-covalent ionic interactions. Proteins can recognize specific nucleotide sequences based on the sterical interactions with the DNA and these specific protein-DNA interactions are the basis for many nuclear processes, e.g. gene transcription, chromosomal replication, and recombination. New technology termed ChIP-Seq has been recently developed for the analysis of protein-DNA interactions on a whole genome scale and it is based on immunoprecipitation of chromatin and high-throughput DNA sequencing procedure. ChIP-Seq is a novel technique with a great potential to replace older techniques for mapping of protein-DNA interactions. In this thesis, we bring some new insights into the ChIP-Seq data analysis. First, we point out to some common and so far unknown artifacts of the method. Sequence tag distribution in the genome does not follow uniform distribution and we have found extreme hot-spots of tag accumulation over specific loci in the human and mouse genomes. These artifactual sequence tags accumulations will create false peaks in every ChIP-Seq dataset and we propose different filtering methods to reduce the number of false positives. Next, we propose random sampling as a powerful analytical tool in the ChIP-Seq data analysis that could be used to infer biological knowledge from the massive ChIP-Seq datasets. We created unbiased random sampling algorithm and we used this methodology to reveal some of the important biological properties of Nuclear Factor I DNA binding proteins. Finally, by analyzing the ChIP-Seq data in detail, we revealed that Nuclear Factor I transcription factors mainly act as activators of transcription, and that they are associated with specific chromatin modifications that are markers of open chromatin. We speculate that NFI factors only interact with the DNA wrapped around the nucleosome. We also found multiple loci that indicate possible chromatin barrier activity of NFI proteins, which could suggest the use of NFI binding sequences as chromatin insulators in biotechnology applications. RESUME : L'ADN des eucaryotes interagit avec les protéines nucléaires par des interactions noncovalentes ioniques. Les protéines peuvent reconnaître les séquences nucléotidiques spécifiques basées sur l'interaction stérique avec l'ADN, et des interactions spécifiques contrôlent de nombreux processus nucléaire, p.ex. transcription du gène, la réplication chromosomique, et la recombinaison. Une nouvelle technologie appelée ChIP-Seq a été récemment développée pour l'analyse des interactions protéine-ADN à l'échelle du génome entier et cette approche est basée sur l'immuno-précipitation de la chromatine et sur la procédure de séquençage de l'ADN à haut débit. La nouvelle approche ChIP-Seq a donc un fort potentiel pour remplacer les anciennes techniques de cartographie des interactions protéine-ADN. Dans cette thèse, nous apportons de nouvelles perspectives dans l'analyse des données ChIP-Seq. Tout d'abord, nous avons identifié des artefacts très communs associés à cette méthode qui étaient jusqu'à présent insoupçonnés. La distribution des séquences dans le génome ne suit pas une distribution uniforme et nous avons constaté des positions extrêmes d'accumulation de séquence à des régions spécifiques, des génomes humains et de la souris. Ces accumulations des séquences artéfactuelles créera de faux pics dans toutes les données ChIP-Seq, et nous proposons différentes méthodes de filtrage pour réduire le nombre de faux positifs. Ensuite, nous proposons un nouvel échantillonnage aléatoire comme un outil puissant d'analyse des données ChIP-Seq, ce qui pourraient augmenter l'acquisition de connaissances biologiques à partir des données ChIP-Seq. Nous avons créé un algorithme d'échantillonnage aléatoire et nous avons utilisé cette méthode pour révéler certaines des propriétés biologiques importantes de protéines liant à l'ADN nommés Facteur Nucléaire I (NFI). Enfin, en analysant en détail les données de ChIP-Seq pour la famille de facteurs de transcription nommés Facteur Nucléaire I, nous avons révélé que ces protéines agissent principalement comme des activateurs de transcription, et qu'elles sont associées à des modifications de la chromatine spécifiques qui sont des marqueurs de la chromatine ouverte. Nous pensons que lés facteurs NFI interagir uniquement avec l'ADN enroulé autour du nucléosome. Nous avons également constaté plusieurs régions génomiques qui indiquent une éventuelle activité de barrière chromatinienne des protéines NFI, ce qui pourrait suggérer l'utilisation de séquences de liaison NFI comme séquences isolatrices dans des applications de la biotechnologie.
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
The geographic information system approach has permitted integration between demographic, socio-economic and environmental data, providing correlation between information from several data banks. In the current work, occurrence of human and canine visceral leishmaniases and insect vectors (Lutzomyia longipalpis) as well as biogeographic information related to 9 areas that comprise the city of Belo Horizonte, Brazil, between April 2001 and March 2002 were correlated and georeferenced. By using this technique it was possible to define concentration loci of canine leishmaniasis in the following regions: East; Northeast; Northwest; West; and Venda Nova. However, as for human leishmaniasis, it was not possible to perform the same analysis. Data analysis has also shown that 84.2% of the human leishmaniasis cases were related with canine leishmaniasis cases. Concerning biogeographic (altitude, area of vegetation influence, hydrographic, and areas of poverty) analysis, only altitude showed to influence emergence of leishmaniasis cases. A number of 4673 canine leishmaniasis cases and 64 human leishmaniasis cases were georeferenced, of which 67.5 and 71.9%, respectively, were living between 780 and 880 m above the sea level. At these same altitudes, a large number of phlebotomine sand flies were collected. Therefore, we suggest control measures for leishmaniasis in the city of Belo Horizonte, giving priority to canine leishmaniasis foci and regions at altitudes between 780 and 880 m.
Resumo:
In this paper we look at how a web-based social software can be used to make qualitative data analysis of online peer-to-peer learning experiences. Specifically, we propose to use Cohere, a web-based social sense-making tool, to observe, track, annotate and visualize discussion group activities in online courses. We define a specific methodology for data observation and structuring, and present results of the analysis of peer interactions conducted in discussion forum in a real case study of a P2PU course. Finally we discuss how network visualization and analysis can be used to gather a better understanding of the peer-to-peer learning experience. To do so, we provide preliminary insights on the social, dialogical and conceptual connections that have been generated within one online discussion group.
Resumo:
Until now, mortality atlases have been static. Most of them describe the geographical distribution of mortality using count data aggregated over time and standardized mortality rates. However, this methodology has several limitations. Count data aggregated over time produce a bias in the estimation of death rates. Moreover, this practice difficult the study of temporal changes in geographical distribution of mortality. On the other hand, using standardized mortality hamper to check differences in mortality among groups. The Interactive Mortality Atlas in Andalusia (AIMA) is an alternative to conventional static atlases. It is a dynamic Geographical Information System that allows visualizing in web-site more than 12.000 maps and 338.00 graphics related to the spatio-temporal distribution of the main death causes in Andalusia by age and sex groups from 1981. The objective of this paper is to describe the methods used for AIMA development, to show technical specifications and to present their interactivity. The system is available from the link products in www.demap.es. AIMA is the first interactive GIS that have been developed in Spain with these characteristics. Spatio-temporal Hierarchical Bayesian Models were used for statistical data analysis. The results were integrated into web-site using a PHP environment and a dynamic cartography in Flash. Thematic maps in AIMA demonstrate that the geographical distribution of mortality is dynamic, with differences among year, age and sex groups. The information nowadays provided by AIMA and the future updating will contribute to reflect on the past, the present and the future of population health in Andalusia.
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult toachieve because the relative values of the forecast components often fail to behave ina way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It hasbeen shown that cause-specic mortality forecasts are pessimistic when compared withall-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approachof using log mortality rates and forecasts the density of deaths in the life table. Sincethese values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbingstate), they are intrinsically relative rather than absolute values across decrements aswell as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison(1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that theunit sum constraint is honoured. The structure of the best-known, single-decrementmortality-rate forecasting model, devised by Lee and Carter (1992), is expressed incompositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortalityby cause of death for Japan
Resumo:
Functional Data Analysis (FDA) deals with samples where a whole function is observedfor each individual. A particular case of FDA is when the observed functions are densityfunctions, that are also an example of infinite dimensional compositional data. In thiswork we compare several methods for dimensionality reduction for this particular typeof data: functional principal components analysis (PCA) with or without a previousdata transformation and multidimensional scaling (MDS) for diferent inter-densitiesdistances, one of them taking into account the compositional nature of density functions. The difeerent methods are applied to both artificial and real data (householdsincome distributions)
Resumo:
In this paper we examine the problem of compositional data from a different startingpoint. Chemical compositional data, as used in provenance studies on archaeologicalmaterials, will be approached from the measurement theory. The results will show, in avery intuitive way that chemical data can only be treated by using the approachdeveloped for compositional data. It will be shown that compositional data analysis is aparticular case in projective geometry, when the projective coordinates are in thepositive orthant, and they have the properties of logarithmic interval metrics. Moreover,it will be shown that this approach can be extended to a very large number ofapplications, including shape analysis. This will be exemplified with a case study inarchitecture of Early Christian churches dated back to the 5th-7th centuries AD
Resumo:
This analysis was stimulated by the real data analysis problem of householdexpenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that tryto add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spendingexcluding alcohol/tobacco similar for teetotal and non-teetotal households?In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than onecomponent, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durableswithin the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.While this analysis is based on around economic data, the ideas carry over tomany other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)
Resumo:
The statistical analysis of compositional data should be treated using logratios of parts,which are difficult to use correctly in standard statistical packages. For this reason afreeware package, named CoDaPack was created. This software implements most of thebasic statistical methods suitable for compositional data.In this paper we describe the new version of the package that now is calledCoDaPack3D. It is developed in Visual Basic for applications (associated with Excel©),Visual Basic and Open GL, and it is oriented towards users with a minimum knowledgeof computers with the aim at being simple and easy to use.This new version includes new graphical output in 2D and 3D. These outputs could bezoomed and, in 3D, rotated. Also a customization menu is included and outputs couldbe saved in jpeg format. Also this new version includes an interactive help and alldialog windows have been improved in order to facilitate its use.To use CoDaPack one has to access Excel© and introduce the data in a standardspreadsheet. These should be organized as a matrix where Excel© rows correspond tothe observations and columns to the parts. The user executes macros that returnnumerical or graphical results. There are two kinds of numerical results: new variablesand descriptive statistics, and both appear on the same sheet. Graphical output appearsin independent windows. In the present version there are 8 menus, with a total of 38submenus which, after some dialogue, directly call the corresponding macro. Thedialogues ask the user to input variables and further parameters needed, as well aswhere to put these results. The web site http://ima.udg.es/CoDaPack contains thisfreeware package and only Microsoft Excel© under Microsoft Windows© is required torun the software.Kew words: Compositional data Analysis, Software
Resumo:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L-2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.