909 results for EXPLORATORY DATA ANALYSIS
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Abstract:
Functional magnetic resonance imaging (fMRI) based on the BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). When the task is cognitively complex, or in cases of disease, variations in the shape and/or delay of the response may reduce the reliability of results. A novel exploratory method using fMRI data, which attempts to discriminate neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors, is introduced in this paper. This new method is based on the fusion of correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.
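The idea of combining the discrete wavelet transform with intersubject correlation can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the use of a plain mean of pairwise Pearson correlations, and the function names (haar_details, pearson, wavelet_intersubject_correlation) are all assumptions made for illustration, and the input is taken to be one voxel's time course per subject.

```python
import math
from itertools import combinations


def haar_details(signal, levels):
    """Return Haar detail coefficients per scale, finest scale first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        # zip() silently drops an unpaired trailing sample when the length is odd
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def wavelet_intersubject_correlation(series, levels=3):
    """Mean pairwise correlation between subjects at each wavelet scale.

    series: list of time courses, one per subject (hypothetical layout).
    """
    per_subject = [haar_details(s, levels) for s in series]
    result = []
    for level in range(levels):
        corrs = [
            pearson(per_subject[i][level], per_subject[j][level])
            for i, j in combinations(range(len(series)), 2)
        ]
        result.append(sum(corrs) / len(corrs))
    return result
```

A scale at which subjects share a stimulus-locked component yields a mean correlation near 1, while a scale dominated by subject-specific noise yields a value near 0. A real analysis would also need a significance test per voxel and scale, which this sketch omits.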
Diversity and commonality in national identities: an exploratory analysis of cross-national patterns
Abstract:
Issues of boundary maintenance are implicit in all studies of national identity. By definition, national communities consist of those who are included but surrounded (literally or metaphorically) by those who are excluded. Most extant research on national identity explores criteria for national membership largely in terms of official or public definitions described, for example, in citizenship and immigration laws or in texts of popular culture. We know much less about how ordinary people in various nations reason about these issues. An analysis of cross-national (N = 23) survey data from the 1995 International Social Science Program reveals a core pattern in most of the countries studied. Respondents were asked how important various criteria were in being 'truly' a member of a particular nation. Exploratory factor analysis shows that these items cluster in terms of two underlying dimensions. Ascriptive/objectivist criteria relating to birth, religion and residence can be distinguished from civic/voluntarist criteria relating to subjective feelings of membership and belief in core institutions. In most nations the ascriptive/objectivist dimension of national identity was more prominent than the subjective civic/voluntarist dimension. Taken overall, these findings suggest an unanticipated homogeneity in the ways that citizens around the world think about national identity. To the extent that these dimensions also mirror the well-known distinction between ethnic and civic national identification, they suggest that the former remains robust despite globalization, mass migration and cultural pluralism. Throughout the world official definitions of national identification have tended to shift towards a civic model. Yet citizens remain remarkably traditional in outlook. A task for future research is to investigate the macrosociological forces that produce both commonality and difference in the core patterns we have identified.
Abstract:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data but was not proved to be the sole determinant for the spatial modeling. The statistical analysis of data, at both the univariate and multivariate levels, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving windows methods. The use of the Quantité Morisita Index (QMI) as a procedure to evaluate data clustering as a function of the radon level was proposed. The existing methods of declustering were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, data partition was optimized in order to cope with the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset.
A bottom-up approach to method complexity was adopted and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests for data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for data classification hardening. Among the classification methods, probabilistic neural networks (PNN) were shown to be better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), by contrast, performed well under balanced category conditions. In general, it was concluded that no particular prediction or estimation method is better under all conditions of scale and neighborhood definition. Simulations should be the basis, while other methods can provide complementary information to support efficient decision making on indoor radon.
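Using K Nearest Neighbors as an exploratory tool, with the neighborhood size chosen by cross-validation, can be sketched as follows. This is a generic illustration under stated assumptions, not the thesis's procedure and not the CVMF filter: the unweighted neighbor average, the leave-one-out scheme, and the names knn_predict, loo_rmse and select_k are all hypothetical.

```python
import math


def knn_predict(points, values, query, k):
    """Predict at `query` as the unweighted mean of the k nearest samples."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    nearest = order[:k]
    return sum(values[i] for i in nearest) / len(nearest)


def loo_rmse(points, values, k):
    """Leave-one-out root mean squared error of the KNN predictor."""
    squared_error = 0.0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        prediction = knn_predict(rest_points, rest_values, points[i], k)
        squared_error += (prediction - values[i]) ** 2
    return math.sqrt(squared_error / len(points))


def select_k(points, values, candidates):
    """Pick the candidate k with the smallest leave-one-out RMSE."""
    return min(candidates, key=lambda k: loo_rmse(points, values, k))
```

The selected k gives a rough, data-driven indication of the neighborhood scale at which the measurements are spatially continuous. Strongly clustered sampling would bias the leave-one-out error, which is one reason declustering matters before this step.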
Abstract:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
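The bathtub shape mentioned above can be checked numerically with a short sketch. It assumes the three-parameter modified Weibull form with survival function S(t) = exp(-a t^b e^(lam t)), which the abstract does not spell out, so the parameterization and the function names below are assumptions rather than the paper's notation.

```python
import math


def modified_weibull_survival(t, a, b, lam):
    """Survival function S(t) = exp(-a * t**b * exp(lam * t)), assumed form."""
    return math.exp(-a * t ** b * math.exp(lam * t))


def modified_weibull_hazard(t, a, b, lam):
    """Hazard h(t) = a * (b + lam*t) * t**(b-1) * exp(lam*t) for the form above."""
    return a * (b + lam * t) * t ** (b - 1) * math.exp(lam * t)


def hazard_turning_point(b, lam):
    """Time at which the hazard is minimal when 0 < b < 1 and lam > 0.

    Setting the derivative of log h(t) to zero gives (lam*t + b)**2 = b.
    """
    return (math.sqrt(b) - b) / lam
```

Under this form, 0 < b < 1 with lam > 0 gives a hazard that first decreases and then increases, with its minimum at the turning point; for b >= 1 the hazard is increasing throughout.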
Abstract:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering the discrete models of lifetime regression. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, which are denominated global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed to compare the performance of the four link functions of the regression models for grouped survival data under different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
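The jackknife method named above rests on the same leave-one-out recomputation as case-deletion diagnostics. The sketch below shows the generic delete-one jackknife for an arbitrary estimator; it is not the paper's estimation procedure for grouped survival models, and the function name jackknife and its signature are illustrative assumptions.

```python
import math


def jackknife(data, estimator):
    """Delete-one jackknife: return (bias-corrected estimate, standard error).

    data: list of observations; estimator: function mapping a list to a number.
    """
    n = len(data)
    full = estimator(data)
    # Re-estimate with each observation removed in turn (case deletion)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias_corrected = n * full - (n - 1) * mean_loo
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return bias_corrected, math.sqrt(variance)
```

The individual leave-one-out estimates are also informative on their own: an observation whose removal moves the estimate far from the full-sample value is a candidate influential case.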
Abstract:
In an investigation intended to determine training needs of flight crews, Bowers et al. (1998, this issue) report two studies showing that the patterning of communication is a better discriminator of good and poor crews than is the content of communication. Bowers et al. characterize their studies as intended to generate hypotheses for training needs and draw connections with Exploratory Sequential Data Analysis (ESDA). Although we applaud the intentions of Bowers et al., we point out some concerns with their characterization and implementation of ESDA. Our principal concern is that the Bowers et al. exploration of the data does not convincingly lead them back to a better fundamental understanding of the original phenomena they are investigating.
Abstract:
This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.
Abstract:
Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calculations. However, popular statistical software packages such as SPSS have some limitations. The R programming language is a free software environment for statistical and graphical computing. It offers many packages written by contributors from all over the world, as well as programming resources that allow it to overcome the dialog limitations of SPSS. This paper offers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no knowledge of programming, or those who are accustomed to making their calculations based on statistical dialogs, have more options when applying factor analysis to their data and hence can adopt a better approach when dealing with ordinal, Likert-type data.
Abstract:
The increasing availability of mobility data and the awareness of its importance and value have motivated many researchers to develop models and tools for analyzing movement data. This paper presents a brief survey of significant research on the modeling, processing and visualization of data about moving objects. We identify some key research fields that will provide better features for online analysis of movement data. As a result of the literature review, we suggest a generic multi-layer architecture for the development of an online analytical processing software tool, which will be used to define the future work of our team.
Abstract:
Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.), a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This can be seen in some recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or referred to in some approaches. In the particular case of geostatistical applications it is necessary, besides georeferencing all acquired data, to collect the samples on regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. The great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) do not in most cases require the assumption of a normal distribution, but they do need a proper and rigorous strategy for their use. In this work, some reflections on these methodologies are presented, in particular on the main constraints that often occur during the information collection process and on the various possibilities for linking these different techniques. Finally, illustrations of some particular applications of these statistical methods are also presented.
Abstract:
Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries, causing a serious disaster. This paper analyzes industrial accident data series from the perspective of dynamical systems. First, we process real-world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For early years, the data reveal double PL behavior, while for more recent time periods a single PL fits the experimental data better. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
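The classical entropy and Kullback–Leibler divergence mentioned above reduce to short formulas for discrete distributions. The sketch below covers only the classical (Shannon) measures, not the generalized fractional versions used in the paper; the function names and the use of natural logarithms are illustrative choices.

```python
import math


def normalize(counts):
    """Turn non-negative counts (e.g. a histogram) into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite
    and this sketch raises ZeroDivisionError.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)
```

Because the divergence is not symmetric, a matrix of pairwise values between yearly histograms would normally be symmetrized, for example by averaging D(p || q) and D(q || p), before being passed to an MDS routine. That step is an assumption about usage and is not taken from the abstract.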
Abstract:
Doctoral Program in Mathematics and Applications (Programa Doutoral em Matemática e Aplicações).
Abstract:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, and Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as happens with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, none of the available software tools addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available on CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.