Digital Library

958 results for "multivariate binary data"

An R Library for Compositional Data Analysis in Archaeometry

Relevance: 30.00%

Abstract:

Compositional data naturally arise from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes very powerful statistical facilities available at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example, that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated, results from different analyses will be compared, and the utility of the various methods will be discussed.
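Log-ratios are undefined for zero parts, so zero replacement has to come before any log-ratio analysis. As a minimal sketch in Python (not the R library described above), assuming the simple multiplicative replacement strategy: each zero becomes a small value `delta`, and the non-zero parts of that artefact are shrunk so the composition still sums to one while the ratios between them are preserved. The function names and the oxide figures are hypothetical.

```python
import numpy as np


def closure(x):
    """Rescale each row of x so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def multiplicative_replacement(x, delta=1e-3):
    """Replace zeros by `delta` and shrink the non-zero parts so rows still sum to 1.

    Assumes delta * (number of zeros in a row) < 1; ratios between the
    non-zero parts of each row are left unchanged.
    """
    x = closure(x)
    is_zero = x == 0
    n_zero = is_zero.sum(axis=-1, keepdims=True)
    scale = 1.0 - delta * n_zero  # mass left over for the non-zero parts
    return np.where(is_zero, delta, x * scale)


def clr(x):
    """Centred log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


# Hypothetical oxide percentages for two artefacts; the first contains a zero.
data = np.array([[60.0, 30.0, 10.0, 0.0],
                 [55.0, 25.0, 15.0, 5.0]])
x = multiplicative_replacement(data, delta=0.005)
z = clr(x)  # log-ratios are now finite and can feed PCA or cluster analysis
```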

A compositional data analysis package for R providing multiple approaches

Relevance: 30.00%

Abstract:

“compositions” is a new R package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Portion and proportion lines, straight lines and ellipses in all geometries can then be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all the tools needed to add their own analysis routines. A complete example is included in the appendix.
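The principle of working in coordinates can be illustrated outside R. The following Python sketch does not use the compositions package; it assumes the Aitchison geometry, with perturbation and powering as the vector-space operations, and one particular (Helmert-type) choice of orthonormal ilr basis. It shows that in ilr coordinates perturbation becomes ordinary addition and powering becomes scalar multiplication, which is what allows standard multivariate tools to be applied to the coordinates. All function names are hypothetical.

```python
import numpy as np


def closure(x):
    """Rescale parts so that they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def perturb(x, y):
    """Aitchison 'addition': part-wise product followed by closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))


def power(x, a):
    """Aitchison 'scalar multiplication': part-wise power followed by closure."""
    return closure(np.asarray(x, dtype=float) ** a)


def helmert_basis(D):
    """(D-1) x D matrix with orthonormal rows, each orthogonal to (1, ..., 1)."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(x):
    """Isometric log-ratio coordinates: clr projected onto the Helmert-type basis."""
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ helmert_basis(logx.shape[-1]).T


def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between ilr coordinates."""
    return np.linalg.norm(ilr(x) - ilr(y))


x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])
# In coordinates, perturbation becomes addition and powering becomes scaling.
sum_in_coords = ilr(perturb(x, y))      # should equal ilr(x) + ilr(y)
scaled_in_coords = ilr(power(x, 2.5))   # should equal 2.5 * ilr(x)
```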

Exploration of geological variability and possible processes through the use of compositional data analysis: an example using Scottish metamorphosed

Relevance: 30.00%

Abstract:

Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and of the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then, using standard methodology such as generalised likelihood ratio tests, we develop statistical tools that allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests, but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
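One common way to obtain a singular value decomposition that stays in the simplex is to decompose the column-centred clr matrix and map the centre and the right singular vectors back to compositions with the inverse clr. The Python sketch below follows that construction; whether it matches the exact decompositions used in the paper is an assumption, and the simulated Dirichlet data merely stand in for the limestone compositions, which are not reproduced here. Function names are hypothetical.

```python
import numpy as np


def clr(x):
    """Centred log-ratio transform (rows are compositions)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def clr_inv(z):
    """Inverse clr: exponentiate and close to unit sum."""
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def compositional_svd(x):
    """SVD of the column-centred clr matrix.

    Returns the centre as a composition (the closed geometric mean), the
    singular values, the left singular vectors, and each right singular
    vector mapped back to the simplex as a composition.
    """
    z = clr(x)
    centre = z.mean(axis=0)
    u, s, vt = np.linalg.svd(z - centre, full_matrices=False)
    return clr_inv(centre), s, u, clr_inv(vt)


def rank_k_approx(x, k):
    """Rank-k approximation computed in clr space and returned as compositions."""
    z = clr(x)
    centre = z.mean(axis=0)
    u, s, vt = np.linalg.svd(z - centre, full_matrices=False)
    return clr_inv(centre + (u[:, :k] * s[:k]) @ vt[:k])


rng = np.random.default_rng(0)
x = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=20)  # 20 simulated 4-part compositions
centre, s, u, directions = compositional_svd(x)
explained = s**2 / np.sum(s**2)  # share of total clr variance per component
```

Because clr rows sum to zero, the centred clr matrix of D-part compositions has rank at most D-1, so the last singular value is numerically zero and a rank D-1 approximation reproduces the data exactly.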

Assessing the Precision of Compositional Data in a Stratified Double Stage Cluster Sample: Application to the Swiss Earnings Structure Survey

Relevance: 30.00%

Abstract:

Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we present a case study of how precision could be conveyed when the multivariate nature of the data has to be taken into account. In the official release of the Swiss Earnings Structure Survey, the total salary is broken down into several wage components. We follow Aitchison's approach to the analysis of compositional data, which is based on log-ratios of components. We first present different multivariate analyses of the compositional data, in which the wage components are broken down by economic activity class. Then we propose a number of ways to assess precision.

Multivariate ARIMA Compositional Time Series Analysis

Relevance: 30.00%

Abstract:

A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instant in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints, which are not allowed for by the usual statistical techniques for analysing multivariate time series. One general approach to analysing compositional time series consists of applying an initial transform to break the positivity and unit-sum constraints, followed by analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts.
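The transform-then-model strategy can be sketched in Python as follows. The alr, clr and ilr functions are standard definitions (the ilr uses one Helmert-type basis, which is an assumption); the modelling step is deliberately simplified to a least-squares VAR(1), a special case of a multivariate ARIMA model, rather than the full ARIMA fitting used in the study. The simulated series and all function names are illustrative.

```python
import numpy as np


def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def alr(x):
    """Additive log-ratio: log of each of the first D-1 parts over the last part."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx[..., :-1] - logx[..., -1:]


def alr_inv(y):
    y = np.asarray(y, dtype=float)
    zeros = np.zeros(y.shape[:-1] + (1,))
    return closure(np.exp(np.concatenate([y, zeros], axis=-1)))


def clr(x):
    """Centred log-ratio: D coordinates that sum to zero."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def clr_inv(z):
    return closure(np.exp(z))


def helmert_basis(D):
    """Orthonormal (D-1) x D basis of the zero-sum plane; one choice among many."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(x):
    return clr(x) @ helmert_basis(np.shape(x)[-1]).T


def ilr_inv(y):
    return clr_inv(np.asarray(y, dtype=float) @ helmert_basis(np.shape(y)[-1] + 1))


def fit_var1(Y):
    """Least-squares VAR(1) on coordinates: Y[t] = c + A @ Y[t-1] + noise."""
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return coef[0], coef[1:].T


def forecast(series, steps, transform, inverse):
    """Transform, fit a VAR(1), iterate point forecasts, back-transform."""
    Y = transform(series)
    c, A = fit_var1(Y)
    y, out = Y[-1], []
    for _ in range(steps):
        y = c + A @ y
        out.append(y)
    return inverse(np.array(out))


# Simulated 3-part compositional series driven by an AR(1) in ilr coordinates.
rng = np.random.default_rng(42)
coords = np.zeros((60, 2))
for t in range(1, 60):
    coords[t] = 0.6 * coords[t - 1] + rng.normal(scale=0.2, size=2)
series = ilr_inv(coords)

f_ilr = forecast(series, 5, ilr, ilr_inv)
f_alr = forecast(series, 5, alr, alr_inv)
```

The clr is not used for fitting here because its coordinates sum to zero and are therefore collinear. In this simplified sketch the alr and ilr pipelines give identical point forecasts, because the two coordinate systems differ by an invertible linear map and an unrestricted least-squares VAR with an intercept is equivariant under such maps; sensitivity to the choice of transform can arise from other modelling choices, for example order selection or restricted parameterizations.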

Pulmonary embolism and 3-month outcomes in 4036 patients with venous thromboembolism and chronic obstructive pulmonary disease: data from the RIETE registry.

Relevance: 30.00%

Abstract:

BACKGROUND Patients with chronic obstructive pulmonary disease (COPD) have a modified clinical presentation of venous thromboembolism (VTE) but also a worse prognosis than non-COPD patients with VTE. As this may call for therapeutic modifications, we evaluated the influence of the initial VTE presentation on the 3-month outcomes in COPD patients. METHODS COPD patients included in the ongoing worldwide RIETE Registry were studied. The rates of pulmonary embolism (PE), major bleeding and death during the first 3 months in COPD patients were compared according to their initial clinical presentation (acute PE or deep vein thrombosis (DVT)). RESULTS Of the 4036 COPD patients included, 2452 (61%; 95% CI: 59.2-62.3) initially presented with PE. PE as the first VTE recurrence occurred in 116 patients, major bleeding in 101 patients and death in 443 patients (fatal PE was the leading cause of death). Multivariate analysis confirmed that presenting with PE was associated with a higher risk of VTE recurrence as PE (OR, 2.04; 95% CI: 1.11-3.72) and a higher risk of fatal PE (OR, 7.77; 95% CI: 2.92-15.7). CONCLUSIONS COPD patients presenting with PE have an increased risk of PE recurrence and fatal PE compared with those presenting with DVT alone. More efficient therapy is needed in this subgroup of patients.

Imputation in data fusion of heterogeneous data sets: a model-based numerical experiment

Relevance: 30.00%

Abstract:

Given the very large amount of data collected every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.

Binary proposal for assessing quality of Open Access Institutional Repositories: the case of Spanish repositories

Relevance: 30.00%

Abstract:

The aim of this study is to propose a new quantitative approach to assessing the quality of Open Access university institutional repositories. The approach is tested on Spanish university repositories. The assessment method is based on a binary coding of a proposed set of features that objectively describe the repositories. The method is intended both to assess quality and to provide an almost automatic system for updating the data on these characteristics. First, a database of the 38 Spanish institutional repositories was created. The variables analysed are presented and explained, whether they are drawn from the literature or newly proposed. The characteristics analysed include the features of the software, the services of the repository, the features of the information system, Internet visibility and the licences of use. Results for Spanish universities are provided as a practical example of the assessment and to give a picture of the state of development of the open access movement in Spain.

Complicated intra-abdominal infections worldwide: the definitive data of the CIAOW Study.

Relevance: 30.00%

Abstract:

The CIAOW study (Complicated intra-abdominal infections worldwide observational study) is a multicenter observational study conducted in 68 medical institutions worldwide during a six-month study period (October 2012-March 2013). The study included patients older than 18 years undergoing surgery or interventional drainage to address complicated intra-abdominal infections (IAIs). A total of 1,898 patients with a mean age of 51.6 years (range 18-99) were enrolled in the study, of whom 777 (41%) were women and 1,121 (59%) were men. Among these patients, 1,645 (86.7%) were affected by community-acquired IAIs, while the remaining 253 (13.3%) suffered from healthcare-associated infections. Intraperitoneal specimens were collected from 1,190 (62.7%) of the enrolled patients. Generalized peritonitis affected 827 patients (43.6%), while 1,071 (56.4%) suffered from localized peritonitis or abscesses. The overall mortality rate was 10.5% (199/1,898). According to stepwise multivariate analysis (PR = 0.005 and PE = 0.001), several criteria were found to be independent predictors of mortality, including patient age (OR = 1.1; 95%CI = 1.0-1.1; p < 0.0001), the presence of small bowel perforation (OR = 2.8; 95%CI = 1.5-5.3; p < 0.0001), a delayed initial intervention (a delay exceeding 24 hours) (OR = 1.8; 95%CI = 1.5-3.7; p < 0.0001), ICU admission (OR = 5.9; 95%CI = 3.6-9.5; p < 0.0001) and patient immunosuppression (OR = 3.8; 95%CI = 2.1-6.7; p < 0.0001).

A comparison of the alr and ilr transformations for kernel density estimation of compositional data

Relevance: 30.00%

Abstract:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. Indeed, they gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex $\mathcal{S}^D$ jointly with the normal distribution on $\mathbb{R}^{D-1}$. However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the use of the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
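The ilr-based estimator can be sketched in Python: map compositions to ilr coordinates and place a Gaussian kernel with a full bandwidth matrix on each transformed observation. Scott's rule, H = n^(-2/(d+4)) times the sample covariance, is used only as a simple stand-in for the bandwidth selectors studied in the paper, and the Helmert-type ilr basis and the Dirichlet training data are assumptions. The result is a density for the ilr coordinates (equivalently, with respect to the Aitchison measure on the simplex); a density with respect to the Lebesgue measure on the simplex would need an extra Jacobian factor. Function names are hypothetical.

```python
import numpy as np


def helmert_basis(D):
    """Orthonormal (D-1) x D basis of the zero-sum plane; one choice among many."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(x):
    """Isometric log-ratio coordinates of row compositions (D parts -> D-1)."""
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ helmert_basis(logx.shape[-1]).T


def gaussian_kde_full(Y, H=None):
    """Gaussian KDE on R^d with a full bandwidth matrix H.

    If H is None, Scott's rule H = n^(-2/(d+4)) * sample covariance is used
    as a simple stand-in for a proper full-matrix bandwidth selector.
    """
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    if H is None:
        H = n ** (-2.0 / (d + 4)) * np.cov(Y, rowvar=False)
    H_inv = np.linalg.inv(H)
    norm = 1.0 / (n * np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H)))

    def density(points):
        diff = np.atleast_2d(points)[:, None, :] - Y[None, :, :]  # (m, n, d)
        q = np.einsum("mni,ij,mnj->mn", diff, H_inv, diff)  # squared H-distances
        return norm * np.exp(-0.5 * q).sum(axis=1)

    return density


rng = np.random.default_rng(1)
train = rng.dirichlet([5.0, 5.0, 5.0], size=50)  # simulated 3-part compositions
f_ilr = gaussian_kde_full(ilr(train))            # density of the ilr coordinates


def density_on_simplex(x):
    """Evaluate the ilr-coordinate density at compositions x (rows)."""
    return f_ilr(ilr(np.atleast_2d(x)))
```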

Truncated robust distance for clinical laboratory safety data monitoring and assessment.

Relevance: 30.00%

Abstract:

Laboratory safety data are routinely collected in clinical studies for safety monitoring and assessment. We have developed a truncated robust multivariate outlier detection method for identifying subjects with clinically relevant abnormal laboratory measurements. The proposed method can be applied to historical clinical data to establish a multivariate decision boundary that can then be used for future clinical trial laboratory safety data monitoring and assessment. Simulations demonstrate that the proposed method has the ability to detect relevant outliers while automatically excluding irrelevant outliers. Two examples from actual clinical studies are used to illustrate the use of this method for identifying clinically relevant outliers.

Subjective health assessments and active labor market participation of older men: evidence from a semiparametric binary choice model with nonadditive correlated individual-specific effects

Relevance: 30.00%

Abstract:

We use panel data from the U.S. Health and Retirement Study, 1992-2002, to estimate the effect of self-assessed health limitations on the active labor market participation of older men. Self-assessments of health are likely to be endogenous to labor supply due to justification bias and individual-specific heterogeneity in subjective evaluations. We address both concerns. We propose a semiparametric binary choice procedure that incorporates nonadditive correlated individual-specific effects. Our estimation strategy identifies and estimates the average partial effects of health and functioning on labor market participation. The results indicate that poor health plays a major role in labor market exit decisions.

A mixed approach for proving non-inferiority in clinical trials with binary endpoints

Relevance: 30.00%

Abstract:

When a new treatment is compared to an established one in a randomized clinical trial, it is standard practice to test statistically for non-inferiority rather than for superiority. When the endpoint is binary, one usually compares two treatments using either an odds ratio or a difference of proportions. In this paper, we propose a mixed approach which uses both concepts. One first defines the non-inferiority margin using an odds ratio, and one ultimately proves non-inferiority statistically using a difference of proportions. The mixed approach is shown to be more powerful than the conventional odds-ratio approach when the efficacy of the established treatment is known (with good precision) and high (e.g. a success rate above 56%). The gain in power may in turn lead to a substantial reduction in the sample size needed to prove non-inferiority. The mixed approach can be generalized to ordinal endpoints.
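The two steps of the mixed approach can be sketched in Python, assuming the control success rate is known: first translate the odds-ratio margin into a margin on the difference of proportions, then test that difference. The one-sided Wald z-test used here is a simple stand-in chosen for illustration and may differ from the statistic used in the paper; the margin, success rate and trial counts are made-up numbers, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def or_margin_to_difference(p_control, or_margin):
    """Translate an odds-ratio non-inferiority margin into a difference margin.

    p_control is treated as known; or_margin < 1 is the smallest acceptable
    odds ratio (test vs control).
    """
    odds = or_margin * p_control / (1.0 - p_control)
    p_test_at_margin = odds / (1.0 + odds)
    return p_control - p_test_at_margin


def noninferiority_wald(x_test, n_test, x_ctrl, n_ctrl, delta, alpha=0.025):
    """One-sided Wald z-test of H0: p_test - p_ctrl <= -delta."""
    p_t, p_c = x_test / n_test, x_ctrl / n_ctrl
    se = sqrt(p_t * (1 - p_t) / n_test + p_c * (1 - p_c) / n_ctrl)
    z = (p_t - p_c + delta) / se
    return z, z > NormalDist().inv_cdf(1 - alpha)


delta = or_margin_to_difference(p_control=0.8, or_margin=0.5)
z, non_inferior = noninferiority_wald(158, 200, 162, 200, delta)
```

With a control rate of 0.8 and an odds-ratio margin of 0.5, the implied difference margin is 0.8 - 2/3 = 2/15, roughly 0.133.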

Some last thoughts on compositional data analysis

Relevance: 30.00%

Abstract:

One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self-critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis, we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…

A simple permutation test for clusteredness

Relevance: 30.00%

Abstract:

Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision of where to cut the binary tree in order to determine the number of clusters to interpret, and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.
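The abstract does not give the test statistic, so the following Python sketch is an assumption throughout. It builds an average-linkage tree, compares each node height (root first) with the heights obtained after permuting every column independently, which keeps the marginal distributions but destroys the joint structure, and suggests cutting the tree below the run of significant nodes counted from the root. The linkage method, the cutting rule, the simulated data and the function names are all illustrative.

```python
import numpy as np


def average_linkage_heights(X):
    """Merge heights of average-linkage (UPGMA) clustering, lowest first."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    size = np.ones(n)
    active = np.ones(n, dtype=bool)
    heights = []
    for _ in range(n - 1):
        masked = np.where(active[:, None] & active[None, :], D, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        heights.append(masked[i, j])
        # Lance-Williams update for average linkage: cluster j is merged into i
        new = (size[i] * D[i] + size[j] * D[j]) / (size[i] + size[j])
        D[i, :] = new
        D[:, i] = new
        D[i, i] = np.inf
        size[i] += size[j]
        active[j] = False
    return np.array(heights)


def clusteredness_test(X, n_perm=99, alpha=0.05, seed=0):
    """Permutation p-value for each node height (root first) and a suggested cut."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = average_linkage_heights(X)[::-1]
    permuted = np.empty((n_perm, len(observed)))
    for p in range(n_perm):
        # permute each column independently: marginals kept, joint structure destroyed
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        permuted[p] = average_linkage_heights(Xp)[::-1]
    # one-sided: is the observed node higher than expected by chance?
    pvals = (1 + (permuted >= observed).sum(axis=0)) / (n_perm + 1)
    n_sig = 0
    while n_sig < len(pvals) and pvals[n_sig] < alpha:
        n_sig += 1
    return pvals, n_sig + 1  # n_sig significant top nodes -> n_sig + 1 clusters


# Two well-separated simulated groups of 15 objects each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (15, 2)), rng.normal(6.0, 0.5, (15, 2))])
pvals, n_clusters = clusteredness_test(X)
```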
