971 resultados para ALS data-set
Resumo:
The application of compositional data analysis through log ratio trans-formations corresponds to a multinomial logit model for the shares themselves.This model is characterized by the property of Independence of Irrelevant Alter-natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactlythis invariance of the ratio that underlies the commonly used zero replacementprocedure in compositional data analysis. In this paper we investigate using thenested logit model that does not embody IIA and an associated zero replacementprocedure and compare its performance with that of the more usual approach ofusing the multinomial logit model. Our comparisons exploit a data set that com-bines voting data by electoral division with corresponding census data for eachdivision for the 2001 Federal election in Australia
Resumo:
Developments in the statistical analysis of compositional data over the last twodecades have made possible a much deeper exploration of the nature of variability,and the possible processes associated with compositional data sets from manydisciplines. In this paper we concentrate on geochemical data sets. First we explainhow hypotheses of compositional variability may be formulated within the naturalsample space, the unit simplex, including useful hypotheses of subcompositionaldiscrimination and specific perturbational change. Then we develop through standardmethodology, such as generalised likelihood ratio tests, statistical tools to allow thesystematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require specialconstruction. We comment on the use of graphical methods in compositional dataanalysis and on the ordination of specimens. The recent development of the conceptof compositional processes is then explained together with the necessary tools for astaying- in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland.Finally we point out a number of unresolved problems in the statistical analysis ofcompositional processes
Resumo:
R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computingand graphics. The environment in which many classical and modern statistical techniques havebeen implemented, but many are supplied as packages. There are 8 standard packages and many moreare available through the cran family of Internet sites http://cran.r-project.org .We started to develop a library of functions in R to support the analysis of mixtures and our goal isa MixeR package for compositional data analysis that provides support foroperations on compositions: perturbation and power multiplication, subcomposition with or withoutresiduals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances,compositional Kullback-Leibler divergence etc.graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features:barycenter, geometric mean of the data set, the percentiles lines, marking and coloring ofsubsets of the data set, theirs geometric means, notation of individual data in the set . . .dealing with zeros and missing values in compositional data sets with R procedures for simpleand multiplicative replacement strategy,the time series analysis of compositional data.We’ll present the current status of MixeR development and illustrate its use on selected data sets
Resumo:
Seafloor imagery is a rich source of data for the study of biological and geological processes. Among several applications, still images of the ocean floor can be used to build image composites referred to as photo-mosaics. Photo-mosaics provide a wide-area visual representation of the benthos, and enable applications as diverse as geological surveys, mapping and detection of temporal changes in the morphology of biodiversity. We present an approach for creating globally aligned photo-mosaics using 3D position estimates provided by navigation sensors available in deep water surveys. Without image registration, such navigation data does not provide enough accuracy to produce useful composite images. Results from a challenging data set of the Lucky Strike vent field at the Mid Atlantic Ridge are reported
Resumo:
Empirical literature on the analysis of the efficiency of measures for reducing persistent government deficits has mainly focused on the direct explanation of deficit. By contrast, this paper aims at modeling government revenue and expenditure within a simultaneous framework and deriving the fiscal balance (surplus or deficit) equation as the difference between the two variables. This setting enables one to not only judge how relevant the explanatory variables are in explaining the fiscal balance but also understand their impact on revenue and/or expenditure. Our empirical results, obtained by using a panel data set on Swiss Cantons for the period 1980-2002, confirm the relevance of the approach followed here, by providing unambiguous evidence of a simultaneous relationship between revenue and expenditure. They also reveal strong dynamic components in revenue, expenditure, and fiscal balance. Among the significant determinants of public fiscal balance we not only find the usual business cycle elements, but also and more importantly institutional factors such as the number of administrative units, and the ease with which people can resort to political (direct democracy) instruments, such as public initiatives and referendum.
Resumo:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced
Resumo:
This paper describes the construction of series of educational attainment of the adult population insample of 21 OECD countries covering the period 1960-2010. These series are a revised version of the data set described in de la Fuente and Doménech (2002)
Resumo:
A fundamental question in developmental biology is how tissues are patterned to give rise to differentiated body structures with distinct morphologies. The Drosophila wing disc offers an accessible model to understand epithelial spatial patterning. It has been studied extensively using genetic and molecular approaches. Bristle patterns on the thorax, which arise from the medial part of the wing disc, are a classical model of pattern formation, dependent on a pre-pattern of trans-activators and –repressors. Despite of decades of molecular studies, we still only know a subset of the factors that determine the pre-pattern. We are applying a novel and interdisciplinary approach to predict regulatory interactions in this system. It is based on the description of expression patterns by simple logical relations (addition, subtraction, intersection and union) between simple shapes (graphical primitives). Similarities and relations between primitives have been shown to be predictive of regulatory relationships between the corresponding regulatory factors in other Systems, such as the Drosophila egg. Furthermore, they provide the basis for dynamical models of the bristle-patterning network, which enable us to make even more detailed predictions on gene regulation and expression dynamics. We have obtained a data-set of wing disc expression patterns which we are now processing to obtain average expression patterns for each gene. Through triangulation of the images we can transform the expression patterns into vectors which can easily be analysed by Standard clustering methods. These analyses will allow us to identify primitives and regulatory interactions. We expect to identify new regulatory interactions and to understand the basic Dynamics of the regulatory network responsible for thorax patterning. These results will provide us with a better understanding of the rules governing gene regulatory networks in general, and provide the basis for future studies of the evolution of the thorax-patterning network in particular.
Resumo:
When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using co-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix. In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
Resumo:
Do the contests with the largest prizes attract the most able contestants? Towhat extent do contestants avoid competition? In this paper, we show, theoreticallyand empirically, that the distribution of abilities plays a crucial role in determiningcontest choice. Sorting exists only when the proportion of high-ability contestantsis sufficiently small. As this proportion increases, contestants shy away from competitionand sorting decreases, such that, reverse sorting becomes a possibility. Wetest our theoretical predictions using a large panel data set containing contest choiceover three decades. We use exogenous variation in the participation of highly-ablecompetitors to provide empirical evidence for the relationship among prizes, competition,and sorting.
Resumo:
Network formation within the BRITE--EURAM program is investigated.Wedescribe the role of the hub of the network, which is defined as the setofmain contractors that account for most of the participations. We studytheeffects that the conflict of objectives within European research fundingbetween pre-competitive research vs. European cohesion has on theformationof networks and on the relationship between different partnersof the network. \\A panel data set is constructed including the second and third frameworkof theBrite--Euram program. A model of joint production of research results isusedto test for changes in the behavior of partners within the twoframeworks. \\The main findings are that participations are very concentrated, that isasmall group of institutions account for most of the participations, butgoingfrom the second to the third framework the presence of subcontractorsand singleparticipants increases substantially. This result is reinforced by the factthat main contractors receive smaller spill-ins within networks, butspill-insincrease from the second to the third framework.
Resumo:
We study the effect of providing relative performance feedback information onperformance, when individuals are rewarded according to their absolute performance. Anatural experiment that took place in a high school offers an unusual opportunity to testthis effect in a real-effort setting. For one year only, students received information thatallowed them to know whether they were performing above (below) the class average aswell as the distance from this average. We exploit a rich panel data set and find that theprovision of this information led to an increase of 5% in students grades. Moreover, theeffect was significant for the whole distribution. However, once the information wasremoved, the effect disappeared. To rule out the concern that the effect may beartificially driven by teachers within the school, we verify our results using nationallevel exams (externally graded) for the same students, and the effect remains.
Resumo:
Despite the advancement of phylogenetic methods to estimate speciation and extinction rates, their power can be limited under variable rates, in particular for clades with high extinction rates and small number of extant species. Fossil data can provide a powerful alternative source of information to investigate diversification processes. Here, we present PyRate, a computer program to estimate speciation and extinction rates and their temporal dynamics from fossil occurrence data. The rates are inferred in a Bayesian framework and are comparable to those estimated from phylogenetic trees. We describe how PyRate can be used to explore different models of diversification. In addition to the diversification rates, it provides estimates of the parameters of the preservation process (fossilization and sampling) and the times of speciation and extinction of each species in the data set. Moreover, we develop a new birth-death model to correlate the variation of speciation/extinction rates with changes of a continuous trait. Finally, we demonstrate the use of Bayes factors for model selection and show how the posterior estimates of a PyRate analysis can be used to generate calibration densities for Bayesian molecular clock analysis. PyRate is an open-source command-line Python program available at http://sourceforge.net/projects/pyrate/.
Resumo:
In spite of increasing representation of women in politics, little is known about their impact onpolicies. Comparing outcomes of parliaments with different shares of female members does not identifytheir causal impact because of possible differences in the underlying electorate. This paper usesa unique data set on voting decisions to sheds new light on gender gaps in policy making. Ouranalysis focuses on Switzerland, where all citizens can directly decide on a broad range of policiesin referendums and initiatives. We show that there are large gender gaps in the areas of health,environmental protection, defense spending and welfare policy which typically persist even conditionalon socio-economic characteristics. We also find that female policy makers have a substantial effect onthe composition of public spending, but a small effect on the overall size of government.