906 results for Exploratory statistical data analysis
Abstract:
Regional, rural and urban development models emerged in the United States in the 1990s, modeling the economic factors that provide information and knowledge about how geographic and other external parameters influence the regional economy. Regional development, and rural development in particular, has followed a different path in Europe and Spain, taking as its model the EU structural programs linked to the Common Agricultural Policy (CAP). The Programme for Sustainable Rural Development recently launched by the Government of Spain (2010) does not examine in depth the economic models of this economy or their causes. This study seeks to identify patterns of behavior in the variables of the regional-rural economy, and to determine how the geographic distribution of the population conditions economic activity. For this purpose, using spatial and economic data for the regions, spatial models will be implemented that make it possible to evaluate economic behavior and to test working hypotheses about the geography and economy of the territory. Spatial analysis models will be used, including exploratory spatial data analysis and simultaneous-equation econometric models, in particular the Carlino-Mills-Boarnet model that is widely used in regional studies.
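For orientation, a sketch of the standard two-equation form of the Carlino-Mills-Boarnet framework may help; the notation below is illustrative and not taken from this study. Population and employment are assumed to adjust with a lag toward interdependent equilibrium levels:

```latex
% Equilibrium levels depend on each other and on exogenous factors X:
P^{*}_{t} = f(E_{t}, X_{P}), \qquad E^{*}_{t} = g(P_{t}, X_{E})
% Observed changes are partial adjustments toward equilibrium:
\Delta P_{t} = \lambda_{P}\,(P^{*}_{t} - P_{t-1}), \qquad
\Delta E_{t} = \lambda_{E}\,(E^{*}_{t} - E_{t-1})
```

Substituting the equilibrium expressions into the adjustment equations yields a pair of simultaneous equations in the observed changes that can be estimated jointly, typically with spatial lags of the variables added to capture neighborhood effects.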
Abstract:
In recent years there has been a large increase in the number of biomedical data sources. The emergence of new techniques for extracting genomic data, and of databases containing this information, has created the need to store it so that it can be accessed and worked with. Information produced by biomedical research is stored in databases because they allow data to be stored and managed simply and quickly, in a variety of formats such as Excel, CSV or RDF. Current research in this field relies on data analysis to find correlations that allow, for example, new treatments or more effective therapies for a given disease or ailment to be inferred. The volume of data handled is very large and disparate, which makes it necessary to develop automatic methods for integrating and homogenizing heterogeneous data. The European project p-medicine (FP7-ICT-2009-270089) aims to assist medical researchers, in this case in cancer research, by providing them with new tools for managing data and generating new knowledge from the analysis of the managed data. The ingestion of data into the p-medicine platform and its subsequent processing with the provided tools aim to enable the generation of new models to support clinical decision-making. The project includes tools for the integration of heterogeneous data, the design and management of clinical trials, the simulation and visualization of tumors, and statistical data analysis. In the field of heterogeneous data integration in particular, there is a need to add external information from public databases to the system and to relate it to the existing data through semantic integration techniques. To meet this need, a tool called Term Searcher has been created that performs this process semi-automatically. This work describes the development of the tool and the algorithms created for its operation. Term Searcher provides functionality that did not previously exist within the project for adding new data from public sources and semantically integrating it with private data.
Abstract:
Improvements over the past 30 years in statistical data, analysis, and related theory have strengthened the basis for science and technology policy by confirming the importance of technical change in national economic performance. But two important features of scientific and technological activities in the Organization for Economic Cooperation and Development countries are still not addressed adequately in mainstream economics: (i) the justification of public funding for basic research and (ii) persistent international differences in investment in research and development and related activities. In addition, one major gap is now emerging in our systems of empirical measurement—the development of software technology, especially in the service sector. There are therefore dangers of diminishing returns to the usefulness of economic research, which continues to rely completely on established theory and established statistical sources. Alternative propositions that deserve serious consideration are: (i) the economic usefulness of basic research is in the provision of (mainly tacit) skills rather than codified and applicable information; (ii) in developing and exploiting technological opportunities, institutional competencies are just as important as the incentive structures that they face; and (iii) software technology developed in traditional service sectors may now be a more important locus of technical change than software technology developed in “high-tech” manufacturing.
Abstract:
The objective of this study is to analyze the impact of Integrated Management Systems (IMS) on organizational performance from the perspective of the Triple Bottom Line (TBL), verifying whether this implementation helps the company become more sustainable. The multi-method approach used is divided into three parts. The first comprises a systematic literature review based on a bibliometric approach. The database chosen for selecting the articles in the sample was ISI Web of Knowledge (Web of Science). The analyses conducted point to gaps in the literature concerning the integration of management systems as a means for organizations to become more sustainable, thus supporting the development of a theoretical model and the research hypotheses. The partial results obtained highlight the scarcity of studies in this area, particularly studies that address the social dimension of the Triple Bottom Line. Gaps were also identified in the analysis of the impact of adopting these normative approaches on organizational performance. The second stage of the methodology consists of multiple case studies in companies from different sectors that have implemented management systems in an integrated manner. The results show that certification supports the development of sustainable actions, resulting in positive economic, environmental and social impacts. In this stage, the model and the hypotheses raised in the bibliometric approach were tested. The third stage consists of statistical analyses of secondary data extracted from the magazine Exame "Maiores e Melhores". The companies' data for 2014 were processed using MINITAB 17 software. Using Mood's median test, the samples were tested and showed statistically significant differences in company performance across sectors. In general, companies with an IMS show better economic performance than the others. With the same database, using structural equation modeling and the Smart PLS 2.0 software, a path diagram was created relating the IMS construct to performance variables (Indebtedness, Profitability, Equity, Growth and Return). The structural equation model tested showed a strong relationship between IMS and Indebtedness, Profitability, Equity and Growth. The different methodologies presented contributed to answering the hypothesis and to affirming, based on this study's sample, that an IMS leads companies to better economic, environmental and social performance (based on the TBL).
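As a minimal illustration of the Mood's median test step described above, the Python sketch below uses scipy.stats.median_test; the profitability figures are invented for the example and are not from the Exame dataset.

```python
from scipy.stats import median_test

# Hypothetical profitability figures (% return) for firms with and
# without an Integrated Management System; invented for illustration.
with_ims = [12.1, 9.8, 15.3, 11.0, 13.7, 10.5]
without_ims = [7.4, 8.9, 6.2, 10.1, 5.8, 9.0]

# Mood's median test: do the two samples share a common median?
stat, p_value, grand_median, table = median_test(with_ims, without_ims)
print(f"grand median = {grand_median:.2f}")
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")  # small p -> medians differ
```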
Abstract:
The Brazilian state of Paraná exhibits a violent geography of inequality and duality, hosting both the most developed city in the country, internationally recognized for its urban and environmental innovations, and southern Brazil's most concentrated cluster of poverty and underdevelopment. Over the past decades, the state underwent a major economic transformation, modernizing and expanding its industrial structure and shifting to the service sector with a larger participation of the knowledge economy. This study is concerned with the interplay between formal education and socioeconomic development during this process, and above all with its spatial character. It attempts to make sense of the rich literature on education and growth and/or development, discussing it through the lenses of human geography and planning. To make the analysis possible, this study created a consistent database of municipal education scores spanning 40 years, dealing with changing census methodologies and municipal boundaries. Making use of modern exploratory spatial data analysis combined with spatial regressions, the study identifies a clustered, time-persistent interplay between education and development that is stronger for low and basic levels of education. Moreover, it provides evidence not only that education is a predictor of future development, but also that analyses of this kind must take spatial autocorrelation into consideration in order to be accurate.
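Exploratory spatial data analysis of this kind typically starts from a global spatial autocorrelation statistic such as Moran's I. The sketch below implements the textbook formula in plain numpy; the weights matrix and education scores are a hypothetical toy example, not the study's data.

```python
import numpy as np

def morans_I(y, W):
    """Global Moran's I for values y under a spatial weights matrix W:
    I = (n / S0) * (z' W z) / (z' z), where z holds deviations from
    the mean and S0 is the sum of all weights."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    S0 = W.sum()
    return (len(z) / S0) * (z @ W @ z) / (z @ z)

# Toy example: 4 municipalities on a line, rook-contiguity weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
education_score = [0.9, 0.8, 0.3, 0.2]   # hypothetical attainment scores
print(f"Moran's I = {morans_I(education_score, W):.3f}")  # positive here
```

A value near +1 indicates that similar scores cluster in space, which is the pattern the study reports for education and development.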
Abstract:
This paper develops an Internet geographical information system (GIS) and spatial model application that provides socio-economic information and exploratory spatial data analysis for local government authorities (LGAs) in Queensland, Australia. The application aims to improve the means by which large quantities of data may be analysed, manipulated and displayed in order to highlight trends and patterns, as well as to provide performance benchmarking that is readily understandable and easily accessible for decision-makers. Measures of attribute similarity and spatial proximity are combined in a clustering model with a spatial autocorrelation index for exploratory spatial data analysis to support the identification of spatial patterns of change. An analysis of socio-economic changes in Queensland is presented. The results demonstrate the usefulness and potential appeal of Internet GIS applications as a tool to inform the process of regional analysis, planning and policy.
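The paper's specific clustering model is not given in the abstract, but the general idea of combining attribute similarity with spatial proximity can be sketched as follows: rescale the two distance matrices, mix them with a weight alpha, and cluster hierarchically. All names and values here are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical LGA data: (x, y) centroids and a socio-economic indicator.
rng = np.random.default_rng(4)
xy = rng.uniform(0, 10, size=(30, 2))
indicator = rng.normal(size=(30, 1))

# Combined dissimilarity: weighted sum of attribute distance and
# spatial distance, each rescaled to [0, 1]; alpha balances the two.
alpha = 0.5
d_attr = pdist(indicator)
d_space = pdist(xy)
d = alpha * d_attr / d_attr.max() + (1 - alpha) * d_space / d_space.max()

# Average-linkage hierarchical clustering into four groups.
labels = fcluster(linkage(d, method="average"), t=4, criterion="maxclust")
print(labels)
```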
Abstract:
2000 Mathematics Subject Classification: 62P10, 62J12.
Abstract:
Principal component analysis (PCA) is well recognized for dimensionality reduction, and kernel PCA (KPCA) has also been proposed in statistical data analysis. However, KPCA fails to detect the nonlinear structure of data well when outliers exist. To mitigate this problem, this paper presents a novel algorithm, named iterative robust KPCA (IRKPCA). IRKPCA deals well with outliers and can be carried out in an iterative manner, which makes it suitable for processing incremental input data. As in traditional robust PCA (RPCA), a binary field is employed to characterize the outlier process, and the optimization problem is formulated as maximizing the marginal distribution of a Gibbs distribution. In this paper, this optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process lives in a high-dimensional feature space, so the kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and as a robust form of the kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA.
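For orientation, the sketch below implements the standard (non-robust) kernel PCA that IRKPCA builds on, using an RBF kernel in plain numpy. The outlier field and the iterative stochastic-gradient optimization that distinguish IRKPCA are deliberately omitted; gamma and the data are illustrative.

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Baseline kernel PCA with an RBF kernel: build and center the
    kernel matrix, eigendecompose it, and project the training points
    onto the leading kernel principal components."""
    # RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix in feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Keep the leading eigenvectors, normalized by sqrt(eigenvalue)
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas   # projections of the training points

X = np.random.default_rng(0).normal(size=(100, 5))
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (100, 2)
```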
Abstract:
An abstract of a thesis devoted to using helix-coil models to study unfolded states.

Research on polypeptide unfolded states has received much more attention in the last decade or so than it has in the past. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well-established practice of determining the structures of folded proteins. This is largely because many core assumptions underlying folded structure determination methods are invalid for unfolded states, which has led to a dearth of knowledge concerning the nature of unfolded state conformational distributions. While many aspects of unfolded state structure are not well known, there does exist a significant body of work, stretching back half a century, focused on the structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to improving our understanding of helix-coil equilibria, which is arguably the best characterized of the various conformational equilibria that likely contribute to unfolded state conformational distributions. This thesis seeks to provide a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably know very well about protein unfolded states.

Chapter 1 gives an overview of recent and historical work on protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded state research, and the basics of helix-coil models are introduced.

Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State-of-the-art statistical techniques are employed to estimate the energies of the various physical interactions that influence helix-coil equilibria. A new Bayesian model selection approach is used to test many long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid, and the new model exhibits greatly improved predictive performance relative to its predecessor.

Chapter 3 introduces a new statistical model for interpreting amide exchange measurements. As amide exchange can serve as a probe of residue-specific properties of helix-coil ensembles, the new model provides a novel and robust way to use these measurements to characterize helix-coil ensembles experimentally and to test the position-specific predictions of helix-coil models. The statistical model is shown to substantially outperform the most commonly used method for interpreting amide exchange data, and its estimates, obtained from amide exchange measurements on an example helical peptide, show remarkable consistency with the predictions of the helix-coil model.

Chapter 4 studies helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could potentially allow helix-coil models to move into domains of use that were previously inaccessible and reserved for the other types of unfolded state models introduced in Chapter 1.
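Since the abstract leans on the basics of helix-coil models, a minimal numerical sketch may help orient readers. The Python code below implements the classical Zimm-Bragg transfer-matrix model, a simpler relative of the model developed in the thesis; the chain length and parameter values are illustrative only.

```python
import numpy as np

def zimm_bragg_Z(n_res, s, sigma):
    """Partition function of the Zimm-Bragg helix-coil model.

    States: 0 = helix (h), 1 = coil (c).  Statistical weights: s for
    helix propagation (h after h), sigma * s for helix nucleation
    (h after c), and 1 for coil.  The chain is treated as if preceded
    by a coil residue.
    """
    M = np.array([[s, 1.0],
                  [sigma * s, 1.0]])
    v = np.array([0.0, 1.0])          # start in the coil state
    for _ in range(n_res):
        v = v @ M
    return v.sum()

def helix_fraction(n_res, s, sigma, ds=1e-6):
    """Mean fractional helicity theta = (s / N) d ln Z / d s,
    evaluated with a central finite difference."""
    lnZ_hi = np.log(zimm_bragg_Z(n_res, s + ds, sigma))
    lnZ_lo = np.log(zimm_bragg_Z(n_res, s - ds, sigma))
    return s * (lnZ_hi - lnZ_lo) / (2 * ds) / n_res

# A 20-residue peptide with weak nucleation (sigma = 1e-3) shows the
# characteristic sigmoidal helix-coil transition as s increases.
for s in (0.8, 1.0, 1.2, 1.5):
    print(f"s = {s:.1f}  theta = {helix_fraction(20, s, 1e-3):.3f}")
```

The derivative identity holds because every helical residue contributes exactly one factor of s to its configuration's weight, so s d ln Z / d s equals the average number of helical residues.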
Abstract:
Agricultural land has been identified as a potential source of greenhouse gas emissions offsets through biosequestration in vegetation and soil. In the extensive grazing land of Australia, landholders may participate in the Australian Government's Emissions Reduction Fund and create offsets by reducing woody vegetation clearing and allowing native woody plant regrowth to grow. This study used bioeconomic modelling to evaluate the trade-offs between an existing central Queensland grazing operation, which has been using repeated tree clearing to maintain pasture growth, and an alternative carbon and grazing enterprise in which tree clearing is reduced and the additional carbon sequestered in trees is sold. The results showed that ceasing clearing in favour of producing offsets produces a higher net present value over 20 years than the existing cattle enterprise at carbon prices close to current (2015) market levels (~$13 t–1 CO2-e). However, modifying key variables did change the relative profitability. Sensitivity analysis evaluated the key variables that determine the relative profitability of carbon and cattle. In order of importance these were: the carbon price, the gross margin of cattle production, the severity of the tree–grass relationship, the area of regrowth retained, the age of regrowth at the start of the project, and to a lesser extent the cost of carbon project administration, compliance and monitoring. Based on the analysis, retaining regrowth to generate carbon income may be worthwhile for cattle producers in Australia, but careful consideration needs to be given to the opportunity cost of reduced cattle income.
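To make the trade-off concrete, here is a toy net-present-value comparison in Python. Every figure below (sequestration rate, gross margin, project costs, discount rate) is invented for illustration and does not come from the study; only the ~$13 t–1 CO2-e carbon price and the 20-year horizon echo the abstract.

```python
def npv(cash_flows, rate):
    """Net present value of annual cash flows (year 1 through year T)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, 1))

years, rate = 20, 0.07          # horizon from the abstract; rate assumed
carbon_price = 13.0             # $ per t CO2-e, near 2015 market levels
sequestration = 1.5             # t CO2-e per ha per year (invented)
cattle_margin = 15.0            # $ per ha per year with clearing (invented)
carbon_costs = 2.0              # $ per ha per year admin/monitoring (invented)

cattle = [cattle_margin] * years
carbon = [carbon_price * sequestration - carbon_costs] * years
print(f"NPV cattle: ${npv(cattle, rate):,.0f} per ha")
print(f"NPV carbon: ${npv(carbon, rate):,.0f} per ha")
```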
Abstract:
As climate change continues to impact socio-ecological systems, tools that assist conservation managers to understand vulnerability and target adaptations are essential. Quantitative assessments of vulnerability are rare because available frameworks are complex and lack guidance for dealing with data limitations and integrating across scales and disciplines. This paper describes a semi-quantitative method for assessing vulnerability to climate change that integrates socio-ecological factors to address management objectives and support decision-making. The method applies a framework first adopted by the Intergovernmental Panel on Climate Change and uses a structured 10-step process. The scores for each framework element are normalized and multiplied to produce a vulnerability score and then the assessed components are ranked from high to low vulnerability. Sensitivity analyses determine which indicators most influence the analysis and the resultant decision-making process so data quality for these indicators can be reviewed to increase robustness. Prioritisation of components for conservation considers other economic, social and cultural values with vulnerability rankings to target actions that reduce vulnerability to climate change by decreasing exposure or sensitivity and/or increasing adaptive capacity. This framework provides practical decision-support and has been applied to marine ecosystems and fisheries, with two case applications provided as examples: (1) food security in Pacific Island nations under climate-driven fish declines, and (2) fisheries in the Gulf of Carpentaria, northern Australia. The step-wise process outlined here is broadly applicable and can be undertaken with minimal resources using existing data, thereby having great potential to inform adaptive natural resource management in diverse locations.
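The normalize-and-multiply step of the framework can be sketched in a few lines of Python. Treating adaptive capacity as a multiplicative (1 - AC) term is one common convention and an assumption here, as are all the component names and scores.

```python
import pandas as pd

# Hypothetical exposure (E), sensitivity (S) and adaptive capacity (AC)
# scores for four assessed components; all values are illustrative.
df = pd.DataFrame(
    {"E":  [0.8, 0.4, 0.6, 0.9],
     "S":  [0.7, 0.5, 0.9, 0.3],
     "AC": [0.2, 0.8, 0.5, 0.6]},
    index=["reef fish", "prawns", "mud crab", "barramundi"])

def normalize(x):
    """Rescale a score vector to the [0, 1] interval."""
    return (x - x.min()) / (x.max() - x.min())

# Vulnerability = normalized exposure x sensitivity, discounted by
# adaptive capacity; components are then ranked from high to low.
v = normalize(df["E"]) * normalize(df["S"]) * (1 - normalize(df["AC"]))
print(v.sort_values(ascending=False))
```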
Wavelet correlation between subjects: A time-scale data-driven analysis for brain mapping using fMRI
Abstract:
Functional magnetic resonance imaging (fMRI) based on the BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach that predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). When the task is cognitively complex, or in the presence of disease, variations in the shape and/or delay of the response may reduce the reliability of the results. This paper introduces a novel exploratory method for fMRI data that attempts to discriminate neurophysiological signals induced by the stimulation protocol from artifacts and other confounding factors. The new method is based on the fusion of correlation analysis and the discrete wavelet transform, and identifies similarities in the time course of the BOLD signal across a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized pictures of human faces expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time-domain intersubject correlation analysis.
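The fusion of the discrete wavelet transform with intersubject correlation can be sketched with PyWavelets: decompose each subject's time course and correlate coefficients scale by scale. The decomposition settings and the synthetic signals below are illustrative assumptions, not the paper's data or its full method.

```python
import numpy as np
import pywt

def scalewise_intersubject_correlation(signals, wavelet="db4", level=4):
    """Mean pairwise Pearson correlation of wavelet coefficients at
    each scale, across subjects.  Index 0 holds the coarsest
    approximation, followed by detail coefficients from coarse to fine."""
    coeffs = [pywt.wavedec(s, wavelet, level=level) for s in signals]
    n = len(signals)
    out = {}
    for scale in range(level + 1):
        corrs = [np.corrcoef(coeffs[i][scale], coeffs[j][scale])[0, 1]
                 for i in range(n) for j in range(i + 1, n)]
        out[scale] = float(np.mean(corrs))
    return out

# Hypothetical data: 5 subjects, 128 time points sharing a slow component.
rng = np.random.default_rng(1)
shared = np.sin(np.linspace(0, 8 * np.pi, 128))
subjects = [shared + 0.5 * rng.normal(size=128) for _ in range(5)]
print(scalewise_intersubject_correlation(subjects))
```

Because the shared component is slow, the intersubject correlation is highest at the coarse scales, which is the kind of scale-dependent similarity the method exploits.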
Abstract:
Dissertation presented as a partial requirement for obtaining the Master's degree in Statistics and Information Management.
Abstract:
Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants relative to other mammal species. To assess the bio-accumulation levels of toxic metals and their influence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in the bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since the chemical contents of a bio-indicator are essentially compositional data, the conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them from an inter-population viewpoint. Hypothesis testing on suitable balance-coordinates allowed us to confirm intuition-based hypotheses and some previous results. The main statistical goal was to test the equality of the means of the balance-coordinates for the two populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances.
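A minimal sketch of the balance-coordinate testing described above: build one isometric log-ratio balance contrasting a "toxic" against an "essential" group of elements and compare the two populations with a Mann-Whitney test. The element grouping and the simulated concentrations are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def balance(num, den):
    """ilr balance b = sqrt(r*s/(r+s)) * ln(gmean(num) / gmean(den)),
    where r and s are the numbers of parts in each group.
    Rows are samples, columns are element concentrations."""
    r, s = num.shape[1], den.shape[1]
    g_num = np.exp(np.log(num).mean(axis=1))
    g_den = np.exp(np.log(den).mean(axis=1))
    return np.sqrt(r * s / (r + s)) * np.log(g_num / g_den)

# Simulated bone concentrations, columns ordered [Pb, Cd, Ca, Zn];
# the polluted population carries a higher toxic-metal load.
rng = np.random.default_rng(2)
polluted = rng.lognormal(mean=[2.0, 1.0, 9.0, 4.0], sigma=0.3, size=(20, 4))
control = rng.lognormal(mean=[1.0, 0.3, 9.0, 4.0], sigma=0.3, size=(20, 4))

b_pol = balance(polluted[:, :2], polluted[:, 2:])
b_ctl = balance(control[:, :2], control[:, 2:])
stat, p = mannwhitneyu(b_pol, b_ctl)
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.2e}")
```

Working on the balance rather than the raw concentrations respects the compositional nature of the data, which is the point the abstract makes about conventional analyses being misleading.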
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data but not proved to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both univariate and multivariate levels, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-windows methods. The use of the Quantité Morisita Index (QMI) was proposed as a procedure to evaluate data clustering as a function of the radon level. The existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, data partition was optimized in order to cope with the stationarity conditions of geostatistical models. Common spatial modeling methods such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset. A bottom-up method-complexity approach was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests of data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multi-Gaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for hardening data classification. Among the classification methods, probabilistic neural networks (PNN) proved better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), by contrast, performed well under balanced category conditions.
In general, it was concluded that no particular prediction or estimation method is better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to support efficient decision-making about indoor radon.
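Of the exploratory tools listed, K Nearest Neighbors is the simplest to illustrate. The scikit-learn sketch below interpolates indoor radon values at unsampled locations from simulated measurements; the coordinates, concentrations and the choice of k = 10 are illustrative assumptions, not the thesis's data or tuning.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Simulated indoor-radon measurements: (x, y) coordinates in km and
# concentrations in Bq/m3 with a mild spatial trend; illustrative only.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(500, 2))
radon = np.exp(4 + 0.02 * coords[:, 0] + rng.normal(0, 0.5, 500))

# Inverse-distance-weighted KNN interpolation; k is one of the
# neighborhood parameters that must be tuned per modeling scale.
knn = KNeighborsRegressor(n_neighbors=10, weights="distance")
knn.fit(coords, radon)
grid = np.array([[25.0, 40.0], [75.0, 60.0]])
print(knn.predict(grid))   # estimated concentrations at unsampled sites
```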