106 results for Dynamic data set visualization
Abstract:
The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
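The relative variation biplot rests on a single computation: the SVD of the double-centred log-transformed data matrix. A minimal numpy sketch (the function name and the random example compositions are ours, not the paper's):

```python
import numpy as np

def relative_variation_biplot(X, alpha=1.0):
    """Biplot coordinates for compositional data (rows sum to 1):
    log-transform, double-centre, then SVD (relative variation biplot)."""
    L = np.log(X)
    # double-centre: remove row means and column means, restore grand mean
    Z = L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # first two dimensions: row (sample) and column (part) coordinates
    F = U[:, :2] * s[:2] ** alpha          # rows in principal coordinates
    G = Vt[:2].T * s[:2] ** (1 - alpha)    # columns in standard coordinates
    return F, G
```

With alpha=1 the distances between row points approximate the log-ratio distances between samples, while links between column points represent log-ratios of parts.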
Abstract:
Structural equation models (SEM) are commonly used to analyze the relationship between variables some of which may be latent, such as individual ``attitude'' to and ``behavior'' concerning specific issues. A number of difficulties arise when we want to compare a large number of groups, each with large sample size, and the manifest variables are distinctly non-normally distributed. Using a specific data set, we evaluate the appropriateness of the following alternative SEM approaches: multiple group versus MIMIC models, continuous versus ordinal variables estimation methods, and normal theory versus non-normal estimation methods. The approaches are applied to the ISSP-1993 Environmental data set, with the purpose of exploring variation in the mean level of variables of ``attitude'' to and ``behavior'' concerning environmental issues and their mutual relationship across countries. Issues of both theoretical and practical relevance arise in the course of this application.
Abstract:
It is shown how correspondence analysis may be applied to a subset of response categories from a questionnaire survey, for example the subset of undecided responses or the subset of responses for a particular category. The idea is to maintain the original relative frequencies of the categories and not re-express them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square metric assigned to the data subset are the same as those in the correspondence analysis of the whole data set. This variant of the method, called Subset Correspondence Analysis, is illustrated on data from the ISSP survey on Family and Changing Gender Roles.
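The key point of the variant described above is that the masses and chi-square metric come from the full table while only a column subset is decomposed. A minimal numpy sketch (function name ours, not from the paper):

```python
import numpy as np

def subset_ca(N, cols):
    """Subset correspondence analysis (sketch): decompose a column subset
    of the standardized residuals while keeping the masses and chi-square
    metric of the FULL table."""
    P = N / N.sum()                     # correspondence matrix of full table
    r = P.sum(axis=1)                   # row masses from the full table
    c = P.sum(axis=0)                   # column masses from the full table
    PJ = P[:, cols]                     # subset columns, NOT renormalized
    cJ = c[cols]
    # standardized residuals restricted to the subset columns
    S = (PJ - np.outer(r, cJ)) / np.sqrt(np.outer(r, cJ))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * s) / np.sqrt(r)[:, None]          # principal row coordinates
    parts = (Vt.T * s) / np.sqrt(cJ)[:, None]     # principal column coordinates
    return rows[:, :2], parts[:, :2], s ** 2      # coordinates + inertias
```

Because the subset residuals are a sub-block of the full residual matrix, the subset inertias sum to a part of the total inertia of the whole table.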
Abstract:
Background Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Among the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: first, a suitable kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any of the data sets. In particular, for each input variable or linear combination of input variables, we can represent the local direction of maximum growth, which allows us to identify the samples with higher or lower values of the variables analyzed. Conclusions The integration of different data sets and the simultaneous representation of samples and variables together give a better understanding of biological knowledge.
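The two-step scheme (per-source kernels, then combination) can be sketched with kernel PCA as the downstream task. Here the kernels are combined by a simple weighted average, which is one common choice, not necessarily the paper's rule; function names and the example data are ours:

```python
import numpy as np

def combine_kernels(kernels, weights=None):
    """Integrate several data sources by a weighted sum of their kernels."""
    weights = weights if weights is not None else [1 / len(kernels)] * len(kernels)
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_pca_scores(K, n_comp=2):
    """Kernel PCA sample scores from a precomputed kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                        # centre in feature space
    w, V = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_comp]    # keep the largest n_comp
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))
```

For two data sets measured on the same samples, `combine_kernels([X1 @ X1.T, X2 @ X2.T])` gives a joint representation whose leading scores can then be plotted.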
Abstract:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LF and WF). The two formats suggest two alternative model approaches for analyzing panel data: (i) univariate regression with varying intercept; and (ii) multivariate regression with latent variables (a particular case of structural equation model, SEM). The present paper compares the two approaches, showing in which circumstances they yield equivalent (in some cases, even numerically equal) results. We show that the univariate approach gives results equivalent to the multivariate approach when restrictions of time invariance (in the paper, the TI assumption) are imposed on the parameters of the multivariate model. It is shown that the restrictions implicit in the univariate approach can be assessed by chi-square difference testing of two nested multivariate models. In addition, common tests encountered in the econometric analysis of panel data, such as the Hausman test, are shown to have an equivalent representation as chi-square difference tests. Commonalities and differences between the univariate and multivariate approaches are illustrated using an empirical panel data set of firms' profitability as well as a simulated panel data set.
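The chi-square difference test mentioned above compares the fit statistics of two nested models (here, the time-invariant restricted model against the unrestricted one). A minimal helper, with scipy assumed for the tail probability and the function name ours:

```python
from scipy.stats import chi2

def chisq_diff_test(chisq_restricted, df_restricted, chisq_full, df_full):
    """Chi-square difference test for two nested SEMs: the difference in
    fit chi-squares is referred to a chi-square distribution with the
    difference in degrees of freedom."""
    d_chi = chisq_restricted - chisq_full   # restricted model fits worse or equal
    d_df = df_restricted - df_full          # restrictions freed
    return d_chi, d_df, chi2.sf(d_chi, d_df)
```

A small p-value indicates that the time-invariance restrictions significantly worsen the fit and should be rejected.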
Abstract:
Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However, the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) have been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method that clusters and reduces the dimensionality of input EEMs without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples and to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under the influence of different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. On the other hand, river samples collected under flash flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A remarkable strength of this methodology was that outlier samples appeared naturally integrated in the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
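As a toy illustration of the SOM step (a minimal online implementation in numpy; the grid size, learning-rate schedule and example data are our own assumptions, not the authors' settings):

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online Self-Organising Map: each sample pulls its best-matching
    unit (BMU) and, with Gaussian decay, the BMU's grid neighbours."""
    rng = np.random.default_rng(seed)
    h, w = grid
    n, d = data.shape
    weights = rng.normal(size=(h * w, d))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    steps, t = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            lr = lr0 * (1 - t / steps)                  # decaying learning rate
            sigma = sigma0 * (1 - t / steps) + 0.3      # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            nb = np.exp(-dist2 / (2 * sigma ** 2))
            weights += lr * nb[:, None] * (x - weights)
            t += 1
    return weights

def bmus(data, weights):
    """Best-matching unit index for each sample."""
    return np.argmin(((data[:, None] - weights[None]) ** 2).sum(-1), axis=1)
```

In the EEM setting, each input vector would be a flattened excitation-emission matrix, and the component planes (one weight dimension across the grid) are what the correlation analysis operates on.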
Abstract:
We present a participant study that compares biological data exploration tasks using volume renderings of laser confocal microscopy data across three environments that vary in level of immersion: a desktop, fishtank, and cave system. For the tasks, data, and visualization approach used in our study, we found that subjects qualitatively preferred and quantitatively performed better in the cave compared with the fishtank and desktop. Subjects performed real-world biological data analysis tasks that emphasized understanding spatial relationships including characterizing the general features in a volume, identifying colocated features, and reporting geometric relationships such as whether clusters of cells were coplanar. After analyzing data in each environment, subjects were asked to choose which environment they wanted to analyze additional data sets in; subjects uniformly selected the cave environment.
Abstract:
Income distribution in Spain experienced a substantial improvement towards equalisation during the second half of the seventies and the eighties, a period during which most OECD countries experienced the opposite trend. In spite of the many recent papers on the Spanish income distribution, the period covered by those studies stops in 1990. The aim of this paper is to extend the analysis to 1996 employing the same methodology and the same data set (ECPF). Our results not only corroborate the trend of decreasing inequality found by others during the second half of the eighties, but also suggest that this trend extends over the first half of the nineties. We also show that our main conclusions are robust to changes in the equivalence scale, to changes in the definition of income and to potential data contamination. Finally, we analyse some of the causes which may be driving the overall picture of income inequality using two decomposition techniques. From these analyses three variables emerge as the factors chiefly responsible for the observed improvement in the income distribution: education, household composition and socioeconomic situation of the household head.
Abstract:
This paper analyses the elasticities of demand on tolled motorways in Spain with respect to the main variables influencing it. The demand equation is estimated using a panel data set where the cross-section observations correspond to the different Spanish tolled motorway sections, and the temporal dimension ranges from the beginning of the eighties until the end of the nineties. The results show a high elasticity with respect to the economic activity level. The average elasticity with respect to petrol price is around -0.3, while toll elasticities clearly vary across motorway sections. These motorway sections are classified into four groups according to the estimated toll elasticity, with values that range from -0.21 for the most inelastic to -0.83 for the most elastic. The main factors that explain such differences are the quality of the alternative road and the length of the section. The long-term effect is about 50 per cent higher than the short-term one; however, the period of adjustment is relatively short.
Abstract:
This technical background paper describes the methods applied and data sources used in the compilation of the 1980-2003 data set for material flow accounts of the Mexican economy and presents the data set. It is organised in four parts: the first part gives an overview of the Material Flow Accounting (MFA) methodology. The second part presents the main material flows of the Mexican economy including biomass, fossil fuels, metal ores, industrial minerals and construction minerals. The aim of this part is to explain the procedures and methods followed and the data sources used, as well as to provide a brief evaluation of the quality and reliability of the information used and the accounts established. Finally, some conclusions are provided.
Abstract:
Using a newly constructed data set, we calculate quality-adjusted price indexes after estimating hedonic price regressions from 1988 to 2004 in the Spanish automobile market. The increasing competition was favoured by the removal of trade restrictions and the special plans for the renewal of the Spanish automobile fleet. We find that the increasing degree of competition during those years led to an overall drop in automobile prices of 20 percent, which implied considerable consumer gains thanks to higher market efficiency. Additionally, our results indicate that loyalty relevance and discrepancies in automobile reliability declined during those years.
Abstract:
The objective of this research is to define a theoretical and methodological framework for the study of technological change in Archaeology. The model emphasizes characterizing the compromises that shape a technology and evaluating them in terms of situational factors: technical, economic, political, social and ideological. The model has been applied to a specific case study: the production of Roman amphorae around the change of Era in the province of Tarraconensis. The technological study of the vessels was carried out using several analytical techniques: X-ray fluorescence (XRF), X-ray diffraction (XRD), optical microscopy (OM) and scanning electron microscopy (SEM). The data obtained also make it possible to establish reference groups for each amphora production centre and thus to identify the provenance of the individuals recovered at consumption centres. Since the amphorae under study are artefacts designed specifically to be stowed in a ship and to serve as transport containers, the study includes the characterization of the mechanical properties of fracture resistance and toughness. In this respect, and for the first time, Finite Element Analysis (FEA) has been applied to understand the behaviour of the different amphora designs when subjected to various use-related forces. FEA makes it possible to simulate by computer the activities in which the amphorae would have taken part during their use and to evaluate their technical behaviour. The results show a close fit between the theoretical formulations and the analytical programme implemented for this study. Regarding the case study, the results show great variability in the technological choices made by the potters of different workshops, but also over the period of operation of a single workshop. The application of the model has made it possible to propose an explanation for the design change of Roman amphorae.
Abstract:
This paper explores the effects of two main sources of innovation, intramural and external R&D, on the productivity level in a sample of 3,267 Catalonian firms. The data set used is based on the official innovation survey of Catalonia, which was part of the Spanish sample of CIS4, covering the years 2002-2004. We compare empirical results obtained by applying usual OLS and quantile regression techniques in both manufacturing and services industries. In the quantile regressions, the results suggest different patterns for the two innovation sources as we move across conditional quantiles. The elasticity of intramural R&D activities on productivity decreases as we move up to high productivity levels in both the manufacturing and services sectors, while the effect of external R&D rises in high-technology industries but is more ambiguous in low-technology and knowledge-intensive services. JEL codes: O300, C100, O140. Keywords: Innovation sources, R&D, Productivity, Quantile regression
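Quantile regression of the kind used here minimizes the pinball (check) loss at a chosen quantile tau. A compact sketch via linear programming, with scipy assumed and variable names ours (not the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau=0.5):
    """Linear quantile regression via LP (Koenker formulation):
    minimize sum(tau*u + (1-tau)*v) subject to X@b + u - v = y, u, v >= 0,
    where u and v are the positive and negative parts of the residuals."""
    n, k = X.shape
    # decision variables: b+ (k), b- (k), u (n), v (n), all non-negative
    c = np.concatenate([np.zeros(2 * k), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * k + 2 * n), method="highs")
    z = res.x
    return z[:k] - z[k:2 * k]            # b = b+ - b-
```

Fitting the same model at several values of tau (e.g. 0.1, 0.5, 0.9) is what allows the elasticities to be traced across the conditional productivity distribution.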