919 resultados para exploratory spatial data analysis
Resumo:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By anessential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur inmany compositional situations, such as household budget patterns, time budgets,palaeontological zonation studies, ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful insuch situations. From consideration of such examples it seems sensible to build up amodel in two stages, the first determining where the zeros will occur and the secondhow the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data
Resumo:
BACKGROUND: American College of Cardiology/American Heart Association guidelines for the diagnosis and management of heart failure recommend investigating exacerbating conditions such as thyroid dysfunction, but without specifying the impact of different thyroid-stimulation hormone (TSH) levels. Limited prospective data exist on the association between subclinical thyroid dysfunction and heart failure events. METHODS AND RESULTS: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of heart failure events. Individual data on 25 390 participants with 216 248 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH of 0.45 to 4.49 mIU/L, subclinical hypothyroidism as TSH of 4.5 to 19.9 mIU/L, and subclinical hyperthyroidism as TSH <0.45 mIU/L, the last two with normal free thyroxine levels. Among 25 390 participants, 2068 (8.1%) had subclinical hypothyroidism and 648 (2.6%) had subclinical hyperthyroidism. In age- and sex-adjusted analyses, risks of heart failure events were increased with both higher and lower TSH levels (P for quadratic pattern <0.01); the hazard ratio was 1.01 (95% confidence interval, 0.81-1.26) for TSH of 4.5 to 6.9 mIU/L, 1.65 (95% confidence interval, 0.84-3.23) for TSH of 7.0 to 9.9 mIU/L, 1.86 (95% confidence interval, 1.27-2.72) for TSH of 10.0 to 19.9 mIU/L (P for trend <0.01) and 1.31 (95% confidence interval, 0.88-1.95) for TSH of 0.10 to 0.44 mIU/L and 1.94 (95% confidence interval, 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. CONCLUSION: Risks of heart failure events were increased with both higher and lower TSH levels, particularly for TSH ≥10 and <0.10 mIU/L.
Resumo:
The aim of this paper is to analyse the impact of university knowledge and technology transfer activities on academic research output. Specifically, we study whether researchers with collaborative links with the private sector publish less than their peers without such links, once controlling for other sources of heterogeneity. We report findings from a longitudinal dataset on researchers from two engineering departments in the UK between 1985 until 2006. Our results indicate that researchers with industrial links publish significantly more than their peers. Academic productivity, though, is higher for low levels of industry involvement as compared to high levels.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
One of the disadvantages of old age is that there is more past than future: this,however, may be turned into an advantage if the wealth of experience and, hopefully,wisdom gained in the past can be reflected upon and throw some light on possiblefuture trends. To an extent, then, this talk is necessarily personal, certainly nostalgic,but also self critical and inquisitive about our understanding of the discipline ofstatistics. A number of almost philosophical themes will run through the talk: searchfor appropriate modelling in relation to the real problem envisaged, emphasis onsensible balances between simplicity and complexity, the relative roles of theory andpractice, the nature of communication of inferential ideas to the statistical layman, theinter-related roles of teaching, consultation and research. A list of keywords might be:identification of sample space and its mathematical structure, choices betweentransform and stay, the role of parametric modelling, the role of a sample spacemetric, the underused hypothesis lattice, the nature of compositional change,particularly in relation to the modelling of processes. While the main theme will berelevance to compositional data analysis we shall point to substantial implications forgeneral multivariate analysis arising from experience of the development ofcompositional data analysis…
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing oneor more parameters in their definition. Methods that can be linked in this way arecorrespondence analysis, unweighted or weighted logratio analysis (the latter alsoknown as "spectral mapping"), nonsymmetric correspondence analysis, principalcomponent analysis (with and without logarithmic transformation of the data) andmultidimensional scaling. In this presentation I will show how several of thesemethods, which are frequently used in compositional data analysis, may be linkedthrough parametrizations such as power transformations, linear transformations andconvex linear combinations. Since the methods of interest here all lead to visual mapsof data, a "movie" can be made where where the linking parameter is allowed to vary insmall steps: the results are recalculated "frame by frame" and one can see the smoothchange from one method to another. Several of these "movies" will be shown, giving adeeper insight into the similarities and differences between these methods.
Resumo:
The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.
Resumo:
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
Resumo:
The present research deals with an application of artificial neural networks for multitask learning from spatial environmental data. The real case study (sediments contamination of Geneva Lake) consists of 8 pollutants. There are different relationships between these variables, from linear correlations to strong nonlinear dependencies. The main idea is to construct a subsets of pollutants which can be efficiently modeled together within the multitask framework. The proposed two-step approach is based on: 1) the criterion of nonlinear predictability of each variable ?k? by analyzing all possible models composed from the rest of the variables by using a General Regression Neural Network (GRNN) as a model; 2) a multitask learning of the best model using multilayer perceptron and spatial predictions. The results of the study are analyzed using both machine learning and geostatistical tools.
Resumo:
In response to the mandate on Load and Resistance Factor Design (LRFD) implementations by the Federal Highway Administration (FHWA) on all new bridge projects initiated after October 1, 2007, the Iowa Highway Research Board (IHRB) sponsored these research projects to develop regional LRFD recommendations. The LRFD development was performed using the Iowa Department of Transportation (DOT) Pile Load Test database (PILOT). To increase the data points for LRFD development, develop LRFD recommendations for dynamic methods, and validate the results ofLRFD calibration, 10 full-scale field tests on the most commonly used steel H-piles (e.g., HP 10 x 42) were conducted throughout Iowa. Detailed in situ soil investigations were carried out, push-in pressure cells were installed, and laboratory soil tests were performed. Pile responses during driving, at the end of driving (EOD), and at re-strikes were monitored using the Pile Driving Analyzer (PDA), following with the CAse Pile Wave Analysis Program (CAPWAP) analysis. The hammer blow counts were recorded for Wave Equation Analysis Program (WEAP) and dynamic formulas. Static load tests (SLTs) were performed and the pile capacities were determined based on the Davisson’s criteria. The extensive experimental research studies generated important data for analytical and computational investigations. The SLT measured loaddisplacements were compared with the simulated results obtained using a model of the TZPILE program and using the modified borehole shear test method. Two analytical pile setup quantification methods, in terms of soil properties, were developed and validated. A new calibration procedure was developed to incorporate pile setup into LRFD.
Resumo:
This report presents the results of work zone field data analyzed on interstate highways in Missouri to determine the mean breakdown and queue-discharge flow rates as measures of capacity. Several days of traffic data collected at a work zone near Pacific, Missouri with a speed limit of 50 mph were analyzed in both the eastbound and westbound directions. As a result, a total of eleven breakdown events were identified using average speed profiles. The traffic flows prior to and after the onset of congestion were studied. Breakdown flow rates ranged between 1194 to 1404 vphpl, with an average of 1295 vphpl, and a mean queue discharge rate of 1072 vphpl was determined. Mean queue discharge, as used by the Highway Capacity Manual 2000 (HCM), in terms of pcphpl was found to be 1199, well below the HCM’s average capacity of 1600 pcphpl. This reduced capacity found at the site is attributable mainly to narrower lane width and higher percentage of heavy vehicles, around 25%, in the traffic stream. The difference found between mean breakdown flow (1295 vphpl) and queue-discharge flow (1072 vphpl) has been observed widely, and is due to reduced traffic flow once traffic breaks down and queues start to form. The Missouri DOT currently uses a spreadsheet for work zone planning applications that assumes the same values of breakdown and mean queue discharge flow rates. This study proposes that breakdown flow rates should be used to forecast the onset of congestion, whereas mean queue discharge flow rates should be used to estimate delays under congested conditions. Hence, it is recommended that the spreadsheet be refined accordingly.
Resumo:
In response to the mandate on Load and Resistance Factor Design (LRFD) implementations by the Federal Highway Administration (FHWA) on all new bridge projects initiated after October 1, 2007, the Iowa Highway Research Board (IHRB) sponsored these research projects to develop regional LRFD recommendations. The LRFD development was performed using the Iowa Department of Transportation (DOT) Pile Load Test database (PILOT). To increase the data points for LRFD development, develop LRFD recommendations for dynamic methods, and validate the results of LRFD calibration, 10 full-scale field tests on the most commonly used steel H-piles (e.g., HP 10 x 42) were conducted throughout Iowa. Detailed in situ soil investigations were carried out, push-in pressure cells were installed, and laboratory soil tests were performed. Pile responses during driving, at the end of driving (EOD), and at re-strikes were monitored using the Pile Driving Analyzer (PDA), following with the CAse Pile Wave Analysis Program (CAPWAP) analysis. The hammer blow counts were recorded for Wave Equation Analysis Program (WEAP) and dynamic formulas. Static load tests (SLTs) were performed and the pile capacities were determined based on the Davisson’s criteria. The extensive experimental research studies generated important data for analytical and computational investigations. The SLT measured load-displacements were compared with the simulated results obtained using a model of the TZPILE program and using the modified borehole shear test method. Two analytical pile setup quantification methods, in terms of soil properties, were developed and validated. A new calibration procedure was developed to incorporate pile setup into LRFD.
Resumo:
Background: Guidelines of the Diagnosis and Management of Heart Failure (HF) recommend investigating exacerbating conditions, such as thyroid dysfunction, but without specifying impact of different TSH levels. Limited prospective data exist regarding the association between subclinical thyroid dysfunction and HF events. Methods: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of HF events. Individual data on 25,390 participants with 216,247 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH 0.45-4.49 mIU/L, subclinical hypothyroidism as TSH 4.5-19.9 mIU/L and subclinical hyperthyroidism as TSH <0.45 mIU/L, both with normal free thyroxine levels. HF events were defined as acute HF events, hospitalization or death related to HF events. Results: Among 25,390 participants, 2068 had subclinical hypothyroidism (8.1%) and 648 subclinical hyperthyroidism (2.6%). In age- and gender-adjusted analyses, risks of HF events were increased with both higher and lower TSH levels (P for quadratic pattern<0.01): hazard ratio (HR) was 1.01 (95% confidence interval [CI] 0.81-1.26) for TSH 4.5-6.9 mIU/L, 1.65 (CI 0.84-3.23) for TSH 7.0-9.9 mIU/L, 1.86 (CI 1.27-2.72) for TSH 10.0-19.9 mIUL/L (P for trend <0.01), and was 1.31 (CI 0.88-1.95) for TSH 0.10-0.44 mIU/L and 1.94 (CI 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. Conclusion: Risks of HF events were increased with both higher and lower TSH levels, particularly for TSH ≥10 mIU/L and for TSH <0.10 mIU/L. Our findings might help to interpret TSH levels in the prevention and investigation of HF.