906 resultados para Exploratory statistical data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most of economic literature has presented its analysis under the assumption of homogeneous capital stock. However, capital composition differs across countries. What has been the pattern of capital composition associated with World economies? We make an exploratory statistical analysis based on compositional data transformed by Aitchinson logratio transformations and we use tools for visualizing and measuring statistical estimators of association among the components. The goal is to detect distinctive patterns in the composition. As initial findings could be cited that: 1. Sectorial components behaved in a correlated way, building industries on one side and , in a less clear view, equipment industries on the other. 2. Full sample estimation shows a negative correlation between durable goods component and other buildings component and between transportation and building industries components. 3. Countries with zeros in some components are mainly low income countries at the bottom of the income category and behaved in a extreme way distorting main results observed in the full sample. 4. After removing these extreme cases, conclusions seem not very sensitive to the presence of another isolated cases

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their in°uence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them, from an inter-population viewpoint. Hypothesis testing on the adequate balance-coordinates allow us to confirm intuition based hypothesis and some previous results. The main statistical goal was to test equal means of balance-coordinates for the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A retrospective study of 1353 occupational injuries occurring at a chemical manufacturing facility in Houston, Texas from January, 1982 through May, 1988 was performed to investigate the etiology of the occupational injury process. Injury incidence rates were calculated for various sub-populations of workers to determine differences in the risk of injury for various groups. Linear modeling techniques were used to determine the association between certain collected independent variables and severity of an injury event. Finally, two sub-groups of the worker population, shiftworkers and injury recidivists, were examined. An injury recidivist as defined is any worker experiencing one or more injury per year. Overall, female shiftworkers evidenced the highest average injury incidence rate compared to all other worker groups analyzed. Although the female shiftworkers were younger and less experienced, the etiology of their increased risk of injury remains unclear, although the rigors of performing shiftwork itself or ergonomic factors are suspect. In general, females were injured more frequently than males, but they did not incur more severe injuries. For all workers, many injuries were caused by erroneous or foregone training, and risk taking behaviors. Injuries of these types are avoidable. The distribution of injuries by severity level was bimodal; either injuries were of minor or major severity with only a small number of cases falling in between. Of the variables collected, only the type of injury incurred and the worker's titlecode were statistically significantly associated with injury severity. Shiftworkers did not sustain more severe injuries than other worker groups. Injury to shiftworkers varied as a 24-hour pattern; the greatest number occurred between 1200-1230 hours, (p = 0.002) by Cosinor analysis. Recidivists made up 3.3% of the population (23 males and 10 females), yet suffered 17.8% of the injuries. Although past research suggests that injury recidivism is a random statistical event, analysis of the data by logistic regression implicates gender, area worked, age and job titlecode as being statistically significantly related to injury recidivism at this facility. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analysis quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most of economic literature has presented its analysis under the assumption of homogeneous capital stock.However, capital composition differs across countries. What has been the pattern of capital compositionassociated with World economies? We make an exploratory statistical analysis based on compositional datatransformed by Aitchinson logratio transformations and we use tools for visualizing and measuring statisticalestimators of association among the components. The goal is to detect distinctive patterns in the composition.As initial findings could be cited that:1. Sectorial components behaved in a correlated way, building industries on one side and , in a lessclear view, equipment industries on the other.2. Full sample estimation shows a negative correlation between durable goods component andother buildings component and between transportation and building industries components.3. Countries with zeros in some components are mainly low income countries at the bottom of theincome category and behaved in a extreme way distorting main results observed in the fullsample.4. After removing these extreme cases, conclusions seem not very sensitive to the presence ofanother isolated cases

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this article we introduce JULIDE, a software toolkit developed to perform the 3D reconstruction, intensity normalization, volume standardization by 3D image registration and voxel-wise statistical analysis of autoradiographs of mouse brain sections. This software tool has been developed in the open-source ITK software framework and is freely available under a GPL license. The article presents the complete image processing chain from raw data acquisition to 3D statistical group analysis. Results of the group comparison in the context of a study on spatial learning are shown as an illustration of the data that can be obtained with this tool.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reliability analysis is a well established branch of statistics that deals with the statistical study of different aspects of lifetimes of a system of components. As we pointed out earlier that major part of the theory and applications in connection with reliability analysis were discussed based on the measures in terms of distribution function. In the beginning chapters of the thesis, we have described some attractive features of quantile functions and the relevance of its use in reliability analysis. Motivated by the works of Parzen (1979), Freimer et al. (1988) and Gilchrist (2000), who indicated the scope of quantile functions in reliability analysis and as a follow up of the systematic study in this connection by Nair and Sankaran (2009), in the present work we tried to extend their ideas to develop necessary theoretical framework for lifetime data analysis. In Chapter 1, we have given the relevance and scope of the study and a brief outline of the work we have carried out. Chapter 2 of this thesis is devoted to the presentation of various concepts and their brief reviews, which were useful for the discussions in the subsequent chapters .In the introduction of Chapter 4, we have pointed out the role of ageing concepts in reliability analysis and in identifying life distributions .In Chapter 6, we have studied the first two L-moments of residual life and their relevance in various applications of reliability analysis. We have shown that the first L-moment of residual function is equivalent to the vitality function, which have been widely discussed in the literature .In Chapter 7, we have defined percentile residual life in reversed time (RPRL) and derived its relationship with reversed hazard rate (RHR). We have discussed the characterization problem of RPRL and demonstrated with an example that the RPRL for given does not determine the distribution uniquely

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1) with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛT + ψ (2) where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimation of Cov(y). Given observed clr transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely effect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), rely on a full-rank data matrix Y which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y ) = V C(Z)V T where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry. Our special interest is on comparing the results with those of Reimann et al. (2002) for the Kola project data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of inter-laboratory test comparisons to determine the performance of individual laboratories for specific tests (or for calibration) [ISO/IEC Guide 43-1, 1997. Proficiency testing by interlaboratory comparisons - Part 1: Development and operation of proficiency testing schemes] is called Proficiency Testing (PT). In this paper we propose the use of the generalized likelihood ratio test to compare the performance of the group of laboratories for specific tests relative to the assigned value and illustrate the procedure considering an actual data from the PT program in the area of volume. The proposed test extends the test criteria in use allowing to test for the consistency of the group of laboratories. Moreover, the class of elliptical distributions are considered for the obtained measurements. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper a new parametric method to deal with discrepant experimental results is developed. The method is based on the fit of a probability density function to the data. This paper also compares the characteristics of different methods used to deduce recommended values and uncertainties from a discrepant set of experimental data. The methods are applied to the (137)Cs and (90)Sr published half-lives and special emphasis is given to the deduced confidence intervals. The obtained results are analyzed considering two fundamental properties expected from an experimental result: the probability content of confidence intervals and the statistical consistency between different recommended values. The recommended values and uncertainties for the (137)Cs and (90)Sr half-lives are 10,984 (24) days and 10,523 (70) days, respectively. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction is employed for visual data analysis as a way to obtaining reduced spaces for high dimensional data or to mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation on reduced or visual spaces, they have limited capabilities for adjusting the results according to user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high dimensional data, taking into account user's input. It employs Partial Least Squares (PLS), a statistical tool to perform retrieval of latent spaces focusing on the discriminability of the data. The method employs a training set for building a highly precise model that can then be applied to a much larger data set very effectively. The reduced data set can be exhibited using various existing visualization techniques. The training data is important to code user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visualization and exploratory analysis is an important part of any data analysis and is made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exposimeters are increasingly applied in bioelectromagnetic research to determine personal radiofrequency electromagnetic field (RF-EMF) exposure. The main advantages of exposimeter measurements are their convenient handling for study participants and the large amount of personal exposure data, which can be obtained for several RF-EMF sources. However, the large proportion of measurements below the detection limit is a challenge for data analysis. With the robust ROS (regression on order statistics) method, summary statistics can be calculated by fitting an assumed distribution to the observed data. We used a preliminary sample of 109 weekly exposimeter measurements from the QUALIFEX study to compare summary statistics computed by robust ROS with a naïve approach, where values below the detection limit were replaced by the value of the detection limit. For the total RF-EMF exposure, differences between the naïve approach and the robust ROS were moderate for the 90th percentile and the arithmetic mean. However, exposure contributions from minor RF-EMF sources were considerably overestimated with the naïve approach. This results in an underestimation of the exposure range in the population, which may bias the evaluation of potential exposure-response associations. We conclude from our analyses that summary statistics of exposimeter data calculated by robust ROS are more reliable and more informative than estimates based on a naïve approach. Nevertheless, estimates of source-specific medians or even lower percentiles depend on the assumed data distribution and should be considered with caution.