925 results for Automated data analysis
Abstract:
Psychometricians usually apply classical factor analysis to evaluate the construct validity of ordered rank scales. Nevertheless, these scales have particular characteristics that must be taken into account: total scores and rank are highly relevant
Abstract:
Isotopic data are becoming an important source of information regarding sources, evolution and mixing processes of water in hydrogeologic systems. However, it is not clear how to treat the geochemical data and the isotopic data together statistically. We propose to introduce the isotopic information as new parts, and to apply compositional data analysis to the resulting enlarged composition. Results are equivalent to downscaling the classical isotopic delta variables, because these are already relative (as needed in the compositional framework) and isotopic variations are almost always very small. This methodology is illustrated and tested with the study of the Llobregat River Basin (Barcelona, NE Spain), where it is shown that, though very small, isotopic variations complement geochemical principal components and help in the better identification of pollution sources.
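The equivalence with rescaled delta values follows from the usual definition of delta notation. The sketch below is an illustration, not taken from the abstract: the symbols $R$ (heavy-to-light isotope ratio of the sample) and $R_{\mathrm{std}}$ (the same ratio in the reference standard) are assumed names.

```latex
\delta = \left(\frac{R}{R_{\mathrm{std}}} - 1\right)\times 1000
\quad\Longrightarrow\quad
\ln R = \ln R_{\mathrm{std}} + \ln\!\left(1 + \frac{\delta}{1000}\right)
\approx \ln R_{\mathrm{std}} + \frac{\delta}{1000}
\qquad (|\delta| \ll 1000).
```

Under this approximation, the log-ratio of the two isotopic parts is a constant plus $\delta/1000$, so a log-ratio analysis of the enlarged composition sees the delta value rescaled by a fixed factor.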
Abstract:
In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA in Matlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)
Abstract:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies and ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. From consideration of such examples it seems sensible to build up a model in two stages, the first determining where the zeros will occur and the second how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data.
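The two data objects named in the abstract, an incidence matrix and a conditional compositional matrix, can be pictured with a minimal sketch. The function name `split_essential_zeros` and the toy budget figures are hypothetical, and the sketch only prepares the data; it does not fit either of the proposed models.

```python
import numpy as np

def split_essential_zeros(X):
    """Split raw amounts with essential zeros into the two stages.

    Returns (incidence, conditional):
      incidence[i, j]   is 1 where part j is present in row i, else 0;
      conditional[i, :] is row i re-closed to sum to 1 (zeros stay zero).
    Assumes every row has at least one non-zero part.
    """
    X = np.asarray(X, dtype=float)
    incidence = (X > 0).astype(int)
    conditional = X / X.sum(axis=1, keepdims=True)
    return incidence, conditional

# Hypothetical household budgets: the second part is truly absent in row 0.
budgets = [[2.0, 0.0, 2.0],
           [1.0, 1.0, 2.0]]
incidence, conditional = split_essential_zeros(budgets)
print(incidence)
print(conditional)
```

The first stage of a two-stage model would describe the rows of `incidence`, and the second stage the non-zero entries of `conditional`.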
Abstract:
The first discussion of compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on the algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, the scientific literature is again full of statistical treatments of this type of data using traditional methodologies. This is particularly true in environmental geochemistry where, besides the problem of closure, the spatial structure (dependence) of the data has to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as indicators of given environmental conditions. The investigation of the log-contrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. A case study on running and ground waters from Chiavenna Valley (Northern Italy), using Na+, K+, Ca2+, Mg2+, HCO3-, SO42- and Cl- concentrations, will be illustrated.
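One common way to obtain log-contrast values is to run a principal component analysis on centred log-ratio (clr) data. The sketch below assumes that route; the abstract does not say how the authors computed theirs. The function names are illustrative and the data are synthetic random numbers, not the Chiavenna Valley measurements.

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform of strictly positive compositions (rows)."""
    logX = np.log(np.asarray(X, dtype=float))
    return logX - logX.mean(axis=1, keepdims=True)

def simplicial_pca(X):
    """PCA on clr data; returns (scores, loadings, variances)."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)                  # centre each clr column
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U * s                           # log-contrast value per sample
    variances = s**2 / (Zc.shape[0] - 1)
    return scores, Vt, variances

# Synthetic example: 20 samples, 4 parts (placeholder data only).
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(20, 4))
scores, loadings, variances = simplicial_pca(X)

# Each informative component is a log-contrast: its coefficients sum to zero.
print(loadings[:3].sum(axis=1))
```

With D parts only D - 1 components carry variance, so the last row of `loadings` is left out of the zero-sum check.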
Abstract:
BACKGROUND: American College of Cardiology/American Heart Association guidelines for the diagnosis and management of heart failure recommend investigating exacerbating conditions such as thyroid dysfunction, but without specifying the impact of different thyroid-stimulating hormone (TSH) levels. Limited prospective data exist on the association between subclinical thyroid dysfunction and heart failure events. METHODS AND RESULTS: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of heart failure events. Individual data on 25 390 participants with 216 248 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH of 0.45 to 4.49 mIU/L, subclinical hypothyroidism as TSH of 4.5 to 19.9 mIU/L, and subclinical hyperthyroidism as TSH <0.45 mIU/L, the last two with normal free thyroxine levels. Among 25 390 participants, 2068 (8.1%) had subclinical hypothyroidism and 648 (2.6%) had subclinical hyperthyroidism. In age- and sex-adjusted analyses, risks of heart failure events were increased with both higher and lower TSH levels (P for quadratic pattern <0.01); the hazard ratio was 1.01 (95% confidence interval, 0.81-1.26) for TSH of 4.5 to 6.9 mIU/L, 1.65 (95% confidence interval, 0.84-3.23) for TSH of 7.0 to 9.9 mIU/L, 1.86 (95% confidence interval, 1.27-2.72) for TSH of 10.0 to 19.9 mIU/L (P for trend <0.01) and 1.31 (95% confidence interval, 0.88-1.95) for TSH of 0.10 to 0.44 mIU/L and 1.94 (95% confidence interval, 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. CONCLUSION: Risks of heart failure events were increased with both higher and lower TSH levels, particularly for TSH ≥10 and <0.10 mIU/L.
Abstract:
The aim of this paper is to analyse the impact of university knowledge and technology transfer activities on academic research output. Specifically, we study whether researchers with collaborative links with the private sector publish less than their peers without such links, after controlling for other sources of heterogeneity. We report findings from a longitudinal dataset on researchers from two engineering departments in the UK between 1985 and 2006. Our results indicate that researchers with industrial links publish significantly more than their peers. Academic productivity, though, is higher for low levels of industry involvement than for high levels.
Abstract:
One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
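A power transformation can act as a linking parameter because the Box-Cox form tends to the logarithm as the power goes to zero. The sketch below assumes that form, which the abstract does not specify. The function name `box_cox` and the grid of `alpha` values are illustrative.

```python
import numpy as np

def box_cox(x, alpha):
    """Box-Cox power transform; tends to log(x) as alpha goes to 0."""
    x = np.asarray(x, dtype=float)
    if alpha == 0:
        return np.log(x)
    return (x**alpha - 1.0) / alpha

x = np.array([0.5, 1.0, 2.0, 4.0])

# "Frame by frame": shrink alpha and watch the gap to log(x) close.
for alpha in (1.0, 0.5, 0.1, 0.01):
    gap = np.max(np.abs(box_cox(x, alpha) - np.log(x)))
    print(f"alpha={alpha:<5} max |box_cox - log| = {gap:.4f}")
```

At `alpha = 1` the transform is a shift of the raw data, so an analysis of the transformed values moves smoothly from an untransformed analysis towards a log-transformed one as `alpha` decreases.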
Abstract:
The structural modeling of spatial dependence, using a geostatistical approach, is an indispensable tool to determine the parameters that define this structure, which are then applied to the interpolation of values at unsampled points by kriging techniques. However, the estimation of parameters can be greatly affected by the presence of atypical observations in sampled data. The purpose of this study was to use diagnostic techniques in Gaussian spatial linear models in geostatistics to evaluate the sensitivity of maximum likelihood and restricted maximum likelihood estimators to small perturbations in these data. For this purpose, studies with simulated and experimental data were conducted. Results with simulated data showed that the diagnostic techniques were efficient in identifying the perturbation in data. The results with real data indicated that atypical values among the sampled data may have a strong influence on thematic maps, thus changing the spatial dependence structure. The application of diagnostic techniques should be part of any geostatistical analysis, to ensure a better quality of the information from thematic maps.
Abstract:
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
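A numerical check can make the embedding property concrete. The sketch below applies two standard members of the class to squared Euclidean distances and confirms, through classical multidimensional scaling, that the result is still Euclidean. The two transforms are the power transformation D^q with 0 < q <= 1 and 1 - exp(-lambda D). The data are synthetic, and the parameter values `q = 0.5` and `lam = 0.5` are arbitrary choices for illustration.

```python
import numpy as np

def squared_distances(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return (diff**2).sum(axis=-1)

def mds_eigenvalues(D):
    """Eigenvalues of the doubly centred matrix B = -1/2 H D H.

    D holds squared Euclidean distances exactly when B has no
    negative eigenvalue (up to rounding error).
    """
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * H @ D @ H)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # synthetic points in the plane
D = squared_distances(X)

q, lam = 0.5, 0.5                     # arbitrary illustrative parameters
D_power = D**q                        # power transformation
D_exp = 1.0 - np.exp(-lam * D)        # linked to the Gaussian kernel

for name, M in [("original", D), ("power", D_power), ("exponential", D_exp)]:
    ev = mds_eigenvalues(M)
    print(name, "min eigenvalue:", ev.min(), "positive:", int((ev > 1e-9).sum()))
```

The original configuration needs two dimensions, while the transformed distances generally need more, which is the sense in which the transformations embed into higher-dimensional spaces.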
Abstract:
In response to the mandate on Load and Resistance Factor Design (LRFD) implementations by the Federal Highway Administration (FHWA) on all new bridge projects initiated after October 1, 2007, the Iowa Highway Research Board (IHRB) sponsored these research projects to develop regional LRFD recommendations. The LRFD development was performed using the Iowa Department of Transportation (DOT) Pile Load Test database (PILOT). To increase the data points for LRFD development, develop LRFD recommendations for dynamic methods, and validate the results of LRFD calibration, 10 full-scale field tests on the most commonly used steel H-piles (e.g., HP 10 x 42) were conducted throughout Iowa. Detailed in situ soil investigations were carried out, push-in pressure cells were installed, and laboratory soil tests were performed. Pile responses during driving, at the end of driving (EOD), and at re-strikes were monitored using the Pile Driving Analyzer (PDA), followed by CAse Pile Wave Analysis Program (CAPWAP) analysis. The hammer blow counts were recorded for Wave Equation Analysis Program (WEAP) and dynamic formulas. Static load tests (SLTs) were performed and the pile capacities were determined based on Davisson’s criterion. The extensive experimental research studies generated important data for analytical and computational investigations. The SLT measured load-displacements were compared with the simulated results obtained using a model of the TZPILE program and using the modified borehole shear test method. Two analytical pile setup quantification methods, in terms of soil properties, were developed and validated. A new calibration procedure was developed to incorporate pile setup into LRFD.
Abstract:
This report presents the results of work zone field data analyzed on interstate highways in Missouri to determine the mean breakdown and queue-discharge flow rates as measures of capacity. Several days of traffic data collected at a work zone near Pacific, Missouri with a speed limit of 50 mph were analyzed in both the eastbound and westbound directions. As a result, a total of eleven breakdown events were identified using average speed profiles. The traffic flows prior to and after the onset of congestion were studied. Breakdown flow rates ranged from 1194 to 1404 vphpl, with an average of 1295 vphpl, and a mean queue discharge rate of 1072 vphpl was determined. Mean queue discharge, as used by the Highway Capacity Manual 2000 (HCM), in terms of pcphpl was found to be 1199, well below the HCM’s average capacity of 1600 pcphpl. This reduced capacity found at the site is attributable mainly to narrower lane width and higher percentage of heavy vehicles, around 25%, in the traffic stream. The difference found between mean breakdown flow (1295 vphpl) and queue-discharge flow (1072 vphpl) has been observed widely, and is due to reduced traffic flow once traffic breaks down and queues start to form. The Missouri DOT currently uses a spreadsheet for work zone planning applications that assumes the same values of breakdown and mean queue discharge flow rates. This study proposes that breakdown flow rates should be used to forecast the onset of congestion, whereas mean queue discharge flow rates should be used to estimate delays under congested conditions. Hence, it is recommended that the spreadsheet be refined accordingly.
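The step from vehicles (vphpl) to passenger cars (pcphpl) can be sketched with the HCM heavy-vehicle adjustment factor, f_HV = 1 / (1 + P_T (E_T - 1)). The sketch rests on assumptions not stated in the abstract: a passenger-car equivalent of E_T = 1.5 (the HCM 2000 value for level terrain), a heavy-vehicle share of exactly 25%, and no other adjustment factors. The function names are illustrative.

```python
def heavy_vehicle_factor(p_trucks, e_trucks=1.5):
    """HCM heavy-vehicle factor f_HV = 1 / (1 + P_T * (E_T - 1)).

    e_trucks = 1.5 is an assumed passenger-car equivalent (level terrain).
    """
    return 1.0 / (1.0 + p_trucks * (e_trucks - 1.0))

def to_pcphpl(flow_vphpl, p_trucks, e_trucks=1.5):
    """Convert a flow in veh/h/lane to passenger cars/h/lane."""
    return flow_vphpl / heavy_vehicle_factor(p_trucks, e_trucks)

# Mean queue-discharge rate from the abstract, with an assumed 25% trucks.
print(round(to_pcphpl(1072, 0.25)))  # 1206
```

Under these assumptions the result is about 1206 pcphpl, close to the reported 1199. The gap is consistent with the heavy-vehicle share being only "around 25%".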