939 results for STATISTICAL DATA
Abstract:
Within the framework of a retrospective study of the incidence of hip fractures in the canton of Vaud (Switzerland), all cases of hip fracture occurring among the resident population in 1986 and treated in the hospitals of the canton were identified from among five different information sources. Relevant data were then extracted from the medical records. At least two sources of information were used to identify cases in each hospital, among them the statistics of the Swiss Hospital Association (VESKA). These statistics were available for 9 of the 18 hospitals in the canton that participated in the study. The number of cases identified from the VESKA statistics was compared to the total number of cases for each hospital. For the 9 hospitals the number of cases in the VESKA statistics was 407, whereas, after having excluded diagnoses that were actually "status after fracture" and double entries, the total for these hospitals was 392, that is 4% less than the VESKA statistics indicate. It is concluded that the VESKA statistics provide a good approximation of the actual number of cases treated in these hospitals, with a tendency to overestimate this number. In order to use these statistics for calculating incidence figures, however, it is imperative that a greater proportion of all hospitals (50% presently in the canton, 35% nationwide) participate in these statistics.
Abstract:
This paper presents reflections on statistical considerations in illicit drug profiling, and more specifically on the calculation of thresholds for determining whether seizures are linked or not. The specific case of heroin and cocaine profiling is presented, with the necessary details on the target profiling variables (major alkaloids) selected and the analytical method used. A statistical approach to comparing illicit drug seizures is also presented, introducing different scenarios that involve different data pre-treatments or transformations of variables. The main aim is to demonstrate the influence of data pre-treatment on the statistical outputs. A thorough study of the evolution of the true positive rate (TP) and the false positive rate (FP) in heroin and cocaine comparison is then proposed to investigate this specific topic and to demonstrate that there is no universal approach available and that the calculations have to be re-evaluated for each new specific application.
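To make the comparison logic concrete, here is a minimal R sketch on simulated alkaloid profiles (not the authors' data or their exact linkage metric): two pre-treatments of the same profiles are scored with a Pearson correlation, the score is thresholded, and the resulting true and false positive rates are compared.

# Minimal sketch: effect of data pre-treatment on TP/FP rates when deciding
# whether two seizures are "linked". Simulated profiles; illustrative only.
set.seed(1)
n_batch   <- 8
signature <- matrix(runif(n_batch * 6, 1, 10), nrow = n_batch)   # batch "chemical signature"
batch     <- rep(1:n_batch, each = 5)                            # 40 seizures, 5 per batch
profile   <- signature[batch, ] * matrix(rlnorm(40 * 6, sdlog = 0.1), 40, 6)

pretreat <- list(                                  # two illustrative pre-treatments
  normalised = profile / rowSums(profile),
  fourthroot = profile^(1/4)
)

evaluate <- function(x, threshold = 0.99) {
  score  <- cor(t(x))                              # pairwise correlation between seizures
  idx    <- which(upper.tri(score), arr.ind = TRUE)
  linked <- batch[idx[, 1]] == batch[idx[, 2]]     # ground truth: same production batch
  called <- score[upper.tri(score)] > threshold    # decision: "linked" if score > threshold
  c(TP = mean(called[linked]), FP = mean(called[!linked]))
}

sapply(pretreat, evaluate)                         # TP/FP rates shift with the pre-treatment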
Abstract:
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. The present paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Abstract:
We explore in depth the validity of a recently proposed scaling law for earthquake inter-event time distributions in the case of Southern California, using the waveform cross-correlation catalog of Shearer et al. Two statistical tests are used: on the one hand, the standard two-sample Kolmogorov-Smirnov test is in agreement with the scaling of the distributions. On the other hand, the one-sample Kolmogorov-Smirnov statistic, complemented with Monte Carlo simulation of the inter-event times as done by Clauset et al., supports the validity of the gamma distribution as a simple model of the scaling function appearing in the scaling law, for rescaled inter-event times above 0.01, except for the largest data set (magnitude greater than 2). A discussion of these results is provided.
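A minimal R sketch of the two tests described above, run on simulated inter-event times rather than the Shearer et al. catalog (the gamma fit uses the method of moments for brevity):

# Minimal sketch on simulated inter-event times (not the actual catalog).
set.seed(2)
t1 <- rgamma(2000, shape = 0.7, rate = 1)          # stand-in for one magnitude threshold
t2 <- rgamma(1500, shape = 0.7, rate = 1)          # stand-in for another threshold

# (1) Two-sample Kolmogorov-Smirnov test: do the rescaled distributions agree?
ks.test(t1, t2)

# (2) One-sample KS statistic against a fitted gamma, with a Monte Carlo
#     p-value in the spirit of Clauset et al.; moment-based fit for brevity.
fit_gamma <- function(x) { s <- mean(x)^2 / var(x); c(shape = s, rate = s / mean(x)) }
ks_stat   <- function(x) {
  p <- fit_gamma(x)
  unname(ks.test(x, "pgamma", shape = p["shape"], rate = p["rate"])$statistic)
}
x     <- t1[t1 > 0.01]                             # rescaled inter-event times above 0.01
p_hat <- fit_gamma(x)
d_obs <- ks_stat(x)
d_sim <- replicate(200, ks_stat(rgamma(length(x), shape = p_hat["shape"],
                                       rate = p_hat["rate"])))
mean(d_sim >= d_obs)                               # Monte Carlo p-value for the gamma model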
Abstract:
CONTEXT: Several genetic risk scores to identify asymptomatic subjects at high risk of developing type 2 diabetes mellitus (T2DM) have been proposed, but it is unclear whether they add extra information to risk scores based on clinical and biological data. OBJECTIVE: The objective of the study was to assess the extra clinical value of genetic risk scores in predicting the occurrence of T2DM. DESIGN: This was a prospective study, with a mean follow-up time of 5 yr. SETTING AND SUBJECTS: The study included 2824 nondiabetic participants (1548 women, 52 ± 10 yr). MAIN OUTCOME MEASURE: Six genetic risk scores for T2DM were tested. Four were derived from the literature and two were created combining all (n = 24) or shared (n = 9) single-nucleotide polymorphisms of the previous scores. A previously validated clinic + biological risk score for T2DM was used as reference. RESULTS: Two hundred seven participants (7.3%) developed T2DM during follow-up. On bivariate analysis, no differences were found for all but one genetic score between nondiabetic and diabetic participants. After adjusting for the validated clinic + biological risk score, none of the genetic scores improved discrimination, as assessed by changes in the area under the receiver-operating characteristic curve (range -0.4 to -0.1%), sensitivity (-2.9 to -1.0%), specificity (0.0-0.1%), and positive (-6.6 to +0.7%) and negative (-0.2 to 0.0%) predictive values. Similarly, no improvement in T2DM risk prediction was found: net reclassification index ranging from -5.3 to -1.6% and nonsignificant (P ≥ 0.49) integrated discrimination improvement. CONCLUSIONS: In this study, adding genetic information to a previously validated clinic + biological score does not seem to improve the prediction of T2DM.
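The discrimination comparison can be sketched in a few lines of R; the data below are simulated, and a rank-based AUC stands in for the full battery of reclassification measures reported in the abstract:

# Minimal sketch (simulated data, not the cohort above): does adding a genetic
# score to a clinical risk score change discrimination for incident T2DM?
set.seed(3)
n        <- 2824
clinical <- rnorm(n)                                 # validated clinic + biological score
genetic  <- rnorm(n)                                 # genetic risk score (illustrative)
p        <- plogis(-2.8 + 1.0 * clinical + 0.05 * genetic)
t2dm     <- rbinom(n, 1, p)                          # incident diabetes during follow-up

m_clin <- glm(t2dm ~ clinical,           family = binomial)
m_both <- glm(t2dm ~ clinical + genetic, family = binomial)

# Rank-based (Wilcoxon) AUC, equivalent to the area under the ROC curve.
auc <- function(score, outcome) {
  r  <- rank(score)
  n1 <- sum(outcome == 1); n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
c(clinical_only = auc(fitted(m_clin), t2dm),
  plus_genetic  = auc(fitted(m_both), t2dm))         # change in AUC is typically tiny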
Abstract:
Distribution of socio-economic features in urban space is an important source of information for land and transportation planning. The metropolization phenomenon has changed the distribution of types of professions in space and has given birth to different spatial patterns that the urban planner must know in order to plan a sustainable city. Such distributions can be discovered by statistical and learning algorithms through different methods. In this paper, an unsupervised classification method and a cluster detection method are discussed and applied to analyze the socio-economic structure of Switzerland. The unsupervised classification method, based on Ward's classification and self-organized maps, is used to classify the municipalities of the country and makes it possible to reduce high-dimensional input information in order to interpret the socio-economic landscape. The cluster detection method, the spatial scan statistic, is used in a more specific manner to detect hot spots of certain types of service activities. The method is applied to distribution services in the agglomeration of Lausanne. Results show the emergence of new centralities and can be analyzed in both transportation and social terms.
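A minimal R sketch of the unsupervised step on simulated municipality indicators (the self-organized map and the spatial scan statistic are not reproduced here; variable names are illustrative):

# Minimal sketch: Ward clustering of municipalities on standardised indicators.
set.seed(4)
n_mun <- 300
socio <- data.frame(
  services    = rnorm(n_mun),          # illustrative socio-economic indicators
  industry    = rnorm(n_mun),
  agriculture = rnorm(n_mun),
  commuters   = rnorm(n_mun)
)

d  <- dist(scale(socio))               # standardise, then Euclidean distances
hc <- hclust(d, method = "ward.D2")    # Ward's criterion
socio$class <- cutree(hc, k = 5)       # five socio-economic profiles
table(socio$class)                     # size of each class of municipalities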
Abstract:
The advent and application of high-resolution array-based comparative genome hybridization (array CGH) has led to the detection of large numbers of copy number variants (CNVs) in patients with developmental delay and/or multiple congenital anomalies as well as in healthy individuals. The notion that CNVs are also abundantly present in the normal population challenges the interpretation of the clinical significance of detected CNVs in patients. In this review we will illustrate a general clinical workflow based on our own experience that can be used in routine diagnostics for the interpretation of CNVs.
Abstract:
It is well known that dichotomizing continuous data has the effect of decreasing statistical power when the goal is to test for a statistical association between two variables. Modern researchers, however, focus not only on statistical significance but also on an estimation of the "effect size" (i.e., the strength of association between the variables) to judge whether a significant association is also clinically relevant. In this article, we are interested in the consequences of dichotomizing continuous data on the value of an effect size in some classical settings. It turns out that the conclusions will not be the same depending on whether a correlation or an odds ratio is used to summarize the strength of association between the variables: whereas the value of a correlation is typically decreased by a factor of pi/2 after each dichotomization, the value of an odds ratio is at the same time raised to the power 2. From a descriptive statistical point of view, it is thus not clear whether dichotomizing continuous data leads to a decrease or to an increase in the effect size, as illustrated using a data set to investigate the relationship between motor and intellectual functions in children and adolescents.
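The two behaviours described above are easy to reproduce by simulation; a minimal R sketch on bivariate normal data with correlation 0.5, dichotomized at the medians:

# Minimal simulation of the effect described above (bivariate normal, rho = 0.5).
set.seed(5)
n   <- 1e5
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)            # continuous variables, cor ~ 0.5

xd <- as.integer(x > median(x))                      # dichotomization at the median
yd <- as.integer(y > median(y))

c(r_continuous   = cor(x, y),                        # ~ 0.50
  r_dichotomized = cor(xd, yd),                      # ~ (2/pi) * asin(rho) = 0.33
  shrinkage      = cor(x, y) / cor(xd, yd))          # ~ 1.5 here; tends to pi/2 as rho -> 0

tab <- table(xd, yd)
(tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])    # odds ratio after double dichotomization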
Abstract:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check if the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: the compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process.
Key words: compositional data analysis, biplot, deep sea sediments
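A minimal R sketch of the transform-then-factor workflow on simulated core compositions (a clr transform is used here for brevity, whereas the paper works in ilr coordinates; component names and depth profiles are invented):

# Minimal sketch: log-ratio transform, factor extraction, scores versus depth.
set.seed(6)
depth <- seq(0, 10, by = 0.1)                        # depth in core (arbitrary units)
comp  <- abs(cbind(volcanic     = 5 + sin(depth),
                   hydrothermal = 3 + cos(depth),
                   ultrabasic   = 1 + 0.2 * depth,
                   other        = 4))
comp  <- comp / rowSums(comp)                        # closure to proportions

clr <- log(comp) - rowMeans(log(comp))               # centred log-ratio transform
pca <- prcomp(clr)                                   # compositional "factors"

# Factor scores plotted versus depth, as in the interpretation step above.
matplot(depth, pca$x[, 1:2], type = "l", lty = 1,
        xlab = "depth in core", ylab = "factor score")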
Abstract:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses of Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally, parallel to the Southalpine belt, by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems, and glacial sediments transported by Alpine valley glaciers invaded the alluvial plain.
Key words: Detrital modes; Modern sands; Provenance; Principal Component Analysis; Similarity; Canberra Distance; palaeodrainage
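A minimal R sketch of the comparison step on simulated detrital modes (category names and values are illustrative; the Canberra distance named in the keywords is available in base R):

# Minimal sketch: PCA and similarity analysis on simulated petrographic modes.
set.seed(7)
modes <- matrix(abs(rnorm(12 * 5, mean = 10)), nrow = 12,
                dimnames = list(paste0("sample", 1:12),
                                c("Q", "F", "Lv", "Lm", "HM")))   # illustrative categories
modes <- 100 * modes / rowSums(modes)               # recast as percentages

pca <- prcomp(modes, scale. = TRUE)                 # principal component analysis
summary(pca)                                        # variance explained per component

# Similarity between samples, using the Canberra distance named in the keywords.
as.matrix(dist(modes, method = "canberra"))[1:4, 1:4]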
Abstract:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
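Since the topics listed are the simplex, Aitchison's geometry and R, a minimal base-R sketch of the two defining operations (closure and perturbation) may be useful; the part names and values are illustrative:

# Minimal sketch of Aitchison's geometry in base R: closure and perturbation.
clo <- function(x) x / sum(x)                         # closure to proportions

x <- clo(c(SiO2 = 60, Al2O3 = 15, CaO = 25))          # an example 3-part composition
p <- clo(c(1.2, 1.0, 0.8))                            # a perturbation vector

perturb <- function(x, p) clo(x * p)                  # Aitchison perturbation (group operation)
perturb(x, p)                                         # still sums to 1: a point in the simplex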
Abstract:
Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed.
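The kind of analysis the library is meant to support can be sketched in plain R on simulated glass compositions (the documented library itself, and the CoDaWork'03 routines, are not reproduced here):

# Minimal sketch: log-ratio transform, PCA and cluster analysis for provenance.
set.seed(8)
group <- rep(c("siteA", "siteB"), each = 10)
oxide <- rbind(matrix(abs(rnorm(10 * 4, mean = c(70, 15, 10, 5))), 10, 4, byrow = TRUE),
               matrix(abs(rnorm(10 * 4, mean = c(60, 20, 15, 5))), 10, 4, byrow = TRUE))
colnames(oxide) <- c("SiO2", "Al2O3", "CaO", "Fe2O3")
oxide <- oxide / rowSums(oxide)                       # closed compositions

clr <- log(oxide) - rowMeans(log(oxide))              # centred log-ratio transform
plot(prcomp(clr)$x[, 1:2], col = factor(group))       # PCA on log-ratio coordinates
hc  <- hclust(dist(clr))                              # cluster analysis on the same scale
table(cutree(hc, k = 2), group)                       # recovered vs. known provenance groups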
Abstract:
"compositions" is a new R-package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, portion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add their own analysis routines. A complete example is included in the appendix.
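A minimal usage sketch, assuming the CRAN package "compositions" is installed; the calls shown (acomp, clr and the acomp plotting method) follow the interface described above as I recall it, and the data are invented:

# Minimal sketch of working in the Aitchison geometry with the package above.
library(compositions)

x <- acomp(data.frame(A = c(40, 50, 30, 20),          # an Aitchison composition (acomp class)
                      B = c(35, 30, 40, 50),
                      C = c(25, 20, 30, 30)))

mean(x)                 # compositional centre in the Aitchison geometry
clr(x)                  # centred log-ratio coordinates: "working in coordinates"
plot(x)                 # ternary diagram, one of the graphical capabilities mentioned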
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and of the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop, through standard methodology such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests, but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
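A compositional singular value decomposition of the staying-in-the-simplex kind can be sketched via the centred log-ratio transform; the data below are simulated, not the Scottish limestone set:

# Minimal sketch: compositional SVD through clr coordinates.
set.seed(9)
X <- matrix(abs(rnorm(30 * 5, mean = 10)), 30, 5)
X <- X / rowSums(X)                                  # compositions in the unit simplex

Z   <- log(X) - rowMeans(log(X))                     # clr coordinates
Z   <- scale(Z, center = TRUE, scale = FALSE)        # centre the specimens
dec <- svd(Z)                                        # compositional SVD
round(dec$d^2 / sum(dec$d^2), 3)                     # share of variability per component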