922 resultados para Compositional data analysis-roots in geosciences
Resumo:
This thesis introduces heat demand forecasting models which are generated by using data mining algorithms. The forecast spans one full day and this forecast can be used in regulating heat consumption of buildings. For training the data mining models, two years of heat consumption data from a case building and weather measurement data from Finnish Meteorological Institute are used. The thesis utilizes Microsoft SQL Server Analysis Services data mining tools in generating the data mining models and CRISP-DM process framework to implement the research. Results show that the built models can predict heat demand at best with mean average percentage errors of 3.8% for 24-h profile and 5.9% for full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
Resumo:
The statistical analysis of compositional data is commonly used in geological studies. As is well-known, compositions should be treated using logratios of parts, which are difficult to use correctly in standard statistical packages. In this paper we describe the new features of our freeware package, named CoDaPack, which implements most of the basic statistical methods suitable for compositional data. An example using real data is presented to illustrate the use of the package
Resumo:
Most of economic literature has presented its analysis under the assumption of homogeneous capital stock. However, capital composition differs across countries. What has been the pattern of capital composition associated with World economies? We make an exploratory statistical analysis based on compositional data transformed by Aitchinson logratio transformations and we use tools for visualizing and measuring statistical estimators of association among the components. The goal is to detect distinctive patterns in the composition. As initial findings could be cited that: 1. Sectorial components behaved in a correlated way, building industries on one side and , in a less clear view, equipment industries on the other. 2. Full sample estimation shows a negative correlation between durable goods component and other buildings component and between transportation and building industries components. 3. Countries with zeros in some components are mainly low income countries at the bottom of the income category and behaved in a extreme way distorting main results observed in the full sample. 4. After removing these extreme cases, conclusions seem not very sensitive to the presence of another isolated cases
Resumo:
Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their in°uence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them, from an inter-population viewpoint. Hypothesis testing on the adequate balance-coordinates allow us to confirm intuition based hypothesis and some previous results. The main statistical goal was to test equal means of balance-coordinates for the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances
Resumo:
Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for difference in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithm. The quality of the obtained results is assessed by means of several indices
Resumo:
This article reflects on key methodological issues emerging from children and young people's involvement in data analysis processes. We outline a pragmatic framework illustrating different approaches to engaging children, using two case studies of children's experiences of participating in data analysis. The article highlights methods of engagement and important issues such as the balance of power between adults and children, training, support, ethical considerations, time and resources. We argue that involving children in data analysis processes can have several benefits, including enabling a greater understanding of children's perspectives and helping to prioritise children's agendas in policy and practice. (C) 2007 The Author(s). Journal compilation (C) 2007 National Children's Bureau.
Recent developments in genetic data analysis: what can they tell us about human demographic history?
Resumo:
Over the last decade, a number of new methods of population genetic analysis based on likelihood have been introduced. This review describes and explains the general statistical techniques that have recently been used, and discusses the underlying population genetic models. Experimental papers that use these methods to infer human demographic and phylogeographic history are reviewed. It appears that the use of likelihood has hitherto had little impact in the field of human population genetics, which is still primarily driven by more traditional approaches. However, with the current uncertainty about the effects of natural selection, population structure and ascertainment of single-nucleotide polymorphism markers, it is suggested that likelihood-based methods may have a greater impact in the future.
Resumo:
In this paper, we address issues in segmentation Of remotely sensed LIDAR (LIght Detection And Ranging) data. The LIDAR data, which were captured by airborne laser scanner, contain 2.5 dimensional (2.5D) terrain surface height information, e.g. houses, vegetation, flat field, river, basin, etc. Our aim in this paper is to segment ground (flat field)from non-ground (houses and high vegetation) in hilly urban areas. By projecting the 2.5D data onto a surface, we obtain a texture map as a grey-level image. Based on the image, Gabor wavelet filters are applied to generate Gabor wavelet features. These features are then grouped into various windows. Among these windows, a combination of their first and second order of statistics is used as a measure to determine the surface properties. The test results have shown that ground areas can successfully be segmented from LIDAR data. Most buildings and high vegetation can be detected. In addition, Gabor wavelet transform can partially remove hill or slope effects in the original data by tuning Gabor parameters.
Resumo:
Almost all research fields in geosciences use numerical models and observations and combine these using data-assimilation techniques. With ever-increasing resolution and complexity, the numerical models tend to be highly nonlinear and also observations become more complicated and their relation to the models more nonlinear. Standard data-assimilation techniques like (ensemble) Kalman filters and variational methods like 4D-Var rely on linearizations and are likely to fail in one way or another. Nonlinear data-assimilation techniques are available, but are only efficient for small-dimensional problems, hampered by the so-called ‘curse of dimensionality’. Here we present a fully nonlinear particle filter that can be applied to higher dimensional problems by exploiting the freedom of the proposal density inherent in particle filtering. The method is illustrated for the three-dimensional Lorenz model using three particles and the much more complex 40-dimensional Lorenz model using 20 particles. By also applying the method to the 1000-dimensional Lorenz model, again using only 20 particles, we demonstrate the strong scale-invariance of the method, leading to the optimistic conjecture that the method is applicable to realistic geophysical problems. Copyright c 2010 Royal Meteorological Society
Resumo:
Little research so far has been devoted to understanding the diffusion of grassroots innovation for sustainability across space. This paper explores and compares the spatial diffusion of two networks of grassroots innovations, the Transition Towns Network (TTN) and Gruppi di Acquisto Solidale (Solidarity Purchasing Groups – GAS), in Great Britain and Italy. Spatio-temporal diffusion data were mined from available datasets, and patterns of diffusion were uncovered through an exploratory data analysis. The analysis shows that GAS and TTN diffusion in Italy and Great Britain is spatially structured, and that the spatial structure has changed over time. TTN has diffused differently in Great Britain and Italy, while GAS and TTN have diffused similarly in central Italy. The uneven diffusion of these grassroots networks on the one hand challenges current narratives on the momentum of grassroots innovations, but on the other highlights important issues in the geography of grassroots innovations for sustainability, such as cross-movement transfers and collaborations, institutional thickness, and interplay of different proximities in grassroots innovation diffusion.
Resumo:
This paper presents the groundwater favorability mapping on a fractured terrain in the eastern portion of Sao Paulo State, Brazil. Remote sensing, airborne geophysical data, photogeologic interpretation, geologic and geomorphologic maps and geographic information system (GIS) techniques have been used. The results of cross-tabulation between these maps and well yield data allowed groundwater prospective parameters in a fractured-bedrock aquifer. These prospective parameters are the base for the favorability analysis whose principle is based on the knowledge-driven method. The mutticriteria analysis (weighted linear combination) was carried out to give a groundwater favorabitity map, because the prospective parameters have different weights of importance and different classes of each parameter. The groundwater favorability map was tested by cross-tabulation with new well yield data and spring occurrence. The wells with the highest values of productivity, as well as all the springs occurrence are situated in the excellent and good favorabitity mapped areas. It shows good coherence between the prospective parameters and the well yield and the importance of GIS techniques for definition of target areas for detail study and wells location. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
Housing is an important component of wealth for a typical household in many countries. The objective of this paper is to investigate the effect of real-estate price variation on welfare, trying to close a gap between the welfare literature in Brazil and that in the U.S., the U.K., and other developed countries. Our first motivation relates to the fact that real estate is probably more important here than elsewhere as a proportion of wealth, which potentially makes the impact of a price change bigger here. Our second motivation relates to the fact that real-estate prices boomed in Brazil in the last five years. Prime real estate in Rio de Janeiro and São Paulo have tripled in value in that period, and a smaller but generalized increase has been observed throughout the country. Third, we have also seen a recent consumption boom in Brazil in the last five years. Indeed, the recent rise of some of the poor to middle-income status is well documented not only for Brazil but for other emerging countries as well. Regarding consumption and real-estate prices in Brazil, one cannot imply causality from correlation, but one can do causal inference with an appropriate structural model and proper inference, or with a proper inference in a reduced-form setup. Our last motivation is related to the complete absence of studies of this kind in Brazil, which makes ours a pioneering study. We assemble a panel-data set for the determinants of non-durable consumption growth by Brazilian states, merging the techniques and ideas in Campbell and Cocco (2007) and in Case, Quigley and Shiller (2005). With appropriate controls, and panel-data methods, we investigate whether house-price variation has a positive effect on non-durable consumption. The results show a non-negligible significant impact of the change in the price of real estate on welfare consumption), although smaller then what Campbell and Cocco have found. Our findings support the view that the channel through which house prices affect consumption is a financial one.
Resumo:
There are four different hypotheses analyzed in the literature that explain deunionization, namely: the decrease in the demand for union representation by the workers; the impaet of globalization over unionization rates; teehnieal ehange and ehanges in the legal and politieal systems against unions. This paper aims to test alI ofthem. We estimate a logistie regression using panel data proeedure with 35 industries from 1973 to 1999 and eonclude that the four hypotheses ean not be rejeeted by the data. We also use a varianee analysis deeomposition to study the impaet of these variables over the drop in unionization rates. In the model with no demographic variables the results show that these economic (tested) variables can account from 10% to 12% of the drop in unionization. However, when we include demographic variables these tested variables can account from 10% to 35% in the total variation of unionization rates. In this case the four hypotheses tested can explain up to 50% ofthe total drop in unionization rates explained by the model.
Resumo:
We investigate the issue of whether there was a stable money demand function for Japan in 1990's using both aggregate and disaggregate time series data. The aggregate data appears to support the contention that there was no stable money demand function. The disaggregate data shows that there was a stable money demand function. Neither was there any indication of the presence of liquidity trapo Possible sources of discrepancy are explored and the diametrically opposite results between the aggregate and disaggregate analysis are attributed to the neglected heterogeneity among micro units. We also conduct simulation analysis to show that when heterogeneity among micro units is present. The prediction of aggregate outcomes, using aggregate data is less accurate than the prediction based on micro equations. Moreover. policy evaluation based on aggregate data can be grossly misleading.