909 resultados para Exploratory data analysis (EDA)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many multivariate methods that are apparently distinct can be linked by introducing oneor more parameters in their definition. Methods that can be linked in this way arecorrespondence analysis, unweighted or weighted logratio analysis (the latter alsoknown as "spectral mapping"), nonsymmetric correspondence analysis, principalcomponent analysis (with and without logarithmic transformation of the data) andmultidimensional scaling. In this presentation I will show how several of thesemethods, which are frequently used in compositional data analysis, may be linkedthrough parametrizations such as power transformations, linear transformations andconvex linear combinations. Since the methods of interest here all lead to visual mapsof data, a "movie" can be made where where the linking parameter is allowed to vary insmall steps: the results are recalculated "frame by frame" and one can see the smoothchange from one method to another. Several of these "movies" will be shown, giving adeeper insight into the similarities and differences between these methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The structural modeling of spatial dependence, using a geostatistical approach, is an indispensable tool to determine parameters that define this structure, applied on interpolation of values at unsampled points by kriging techniques. However, the estimation of parameters can be greatly affected by the presence of atypical observations in sampled data. The purpose of this study was to use diagnostic techniques in Gaussian spatial linear models in geostatistics to evaluate the sensitivity of maximum likelihood and restrict maximum likelihood estimators to small perturbations in these data. For this purpose, studies with simulated and experimental data were conducted. Results with simulated data showed that the diagnostic techniques were efficient to identify the perturbation in data. The results with real data indicated that atypical values among the sampled data may have a strong influence on thematic maps, thus changing the spatial dependence structure. The application of diagnostic techniques should be part of any geostatistical analysis, to ensure a better quality of the information from thematic maps.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In response to the mandate on Load and Resistance Factor Design (LRFD) implementations by the Federal Highway Administration (FHWA) on all new bridge projects initiated after October 1, 2007, the Iowa Highway Research Board (IHRB) sponsored these research projects to develop regional LRFD recommendations. The LRFD development was performed using the Iowa Department of Transportation (DOT) Pile Load Test database (PILOT). To increase the data points for LRFD development, develop LRFD recommendations for dynamic methods, and validate the results ofLRFD calibration, 10 full-scale field tests on the most commonly used steel H-piles (e.g., HP 10 x 42) were conducted throughout Iowa. Detailed in situ soil investigations were carried out, push-in pressure cells were installed, and laboratory soil tests were performed. Pile responses during driving, at the end of driving (EOD), and at re-strikes were monitored using the Pile Driving Analyzer (PDA), following with the CAse Pile Wave Analysis Program (CAPWAP) analysis. The hammer blow counts were recorded for Wave Equation Analysis Program (WEAP) and dynamic formulas. Static load tests (SLTs) were performed and the pile capacities were determined based on the Davisson’s criteria. The extensive experimental research studies generated important data for analytical and computational investigations. The SLT measured loaddisplacements were compared with the simulated results obtained using a model of the TZPILE program and using the modified borehole shear test method. Two analytical pile setup quantification methods, in terms of soil properties, were developed and validated. A new calibration procedure was developed to incorporate pile setup into LRFD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This report presents the results of work zone field data analyzed on interstate highways in Missouri to determine the mean breakdown and queue-discharge flow rates as measures of capacity. Several days of traffic data collected at a work zone near Pacific, Missouri with a speed limit of 50 mph were analyzed in both the eastbound and westbound directions. As a result, a total of eleven breakdown events were identified using average speed profiles. The traffic flows prior to and after the onset of congestion were studied. Breakdown flow rates ranged between 1194 to 1404 vphpl, with an average of 1295 vphpl, and a mean queue discharge rate of 1072 vphpl was determined. Mean queue discharge, as used by the Highway Capacity Manual 2000 (HCM), in terms of pcphpl was found to be 1199, well below the HCM’s average capacity of 1600 pcphpl. This reduced capacity found at the site is attributable mainly to narrower lane width and higher percentage of heavy vehicles, around 25%, in the traffic stream. The difference found between mean breakdown flow (1295 vphpl) and queue-discharge flow (1072 vphpl) has been observed widely, and is due to reduced traffic flow once traffic breaks down and queues start to form. The Missouri DOT currently uses a spreadsheet for work zone planning applications that assumes the same values of breakdown and mean queue discharge flow rates. This study proposes that breakdown flow rates should be used to forecast the onset of congestion, whereas mean queue discharge flow rates should be used to estimate delays under congested conditions. Hence, it is recommended that the spreadsheet be refined accordingly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In response to the mandate on Load and Resistance Factor Design (LRFD) implementations by the Federal Highway Administration (FHWA) on all new bridge projects initiated after October 1, 2007, the Iowa Highway Research Board (IHRB) sponsored these research projects to develop regional LRFD recommendations. The LRFD development was performed using the Iowa Department of Transportation (DOT) Pile Load Test database (PILOT). To increase the data points for LRFD development, develop LRFD recommendations for dynamic methods, and validate the results of LRFD calibration, 10 full-scale field tests on the most commonly used steel H-piles (e.g., HP 10 x 42) were conducted throughout Iowa. Detailed in situ soil investigations were carried out, push-in pressure cells were installed, and laboratory soil tests were performed. Pile responses during driving, at the end of driving (EOD), and at re-strikes were monitored using the Pile Driving Analyzer (PDA), following with the CAse Pile Wave Analysis Program (CAPWAP) analysis. The hammer blow counts were recorded for Wave Equation Analysis Program (WEAP) and dynamic formulas. Static load tests (SLTs) were performed and the pile capacities were determined based on the Davisson’s criteria. The extensive experimental research studies generated important data for analytical and computational investigations. The SLT measured load-displacements were compared with the simulated results obtained using a model of the TZPILE program and using the modified borehole shear test method. Two analytical pile setup quantification methods, in terms of soil properties, were developed and validated. A new calibration procedure was developed to incorporate pile setup into LRFD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Guidelines of the Diagnosis and Management of Heart Failure (HF) recommend investigating exacerbating conditions, such as thyroid dysfunction, but without specifying impact of different TSH levels. Limited prospective data exist regarding the association between subclinical thyroid dysfunction and HF events. Methods: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of HF events. Individual data on 25,390 participants with 216,247 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH 0.45-4.49 mIU/L, subclinical hypothyroidism as TSH 4.5-19.9 mIU/L and subclinical hyperthyroidism as TSH <0.45 mIU/L, both with normal free thyroxine levels. HF events were defined as acute HF events, hospitalization or death related to HF events. Results: Among 25,390 participants, 2068 had subclinical hypothyroidism (8.1%) and 648 subclinical hyperthyroidism (2.6%). In age- and gender-adjusted analyses, risks of HF events were increased with both higher and lower TSH levels (P for quadratic pattern<0.01): hazard ratio (HR) was 1.01 (95% confidence interval [CI] 0.81-1.26) for TSH 4.5-6.9 mIU/L, 1.65 (CI 0.84-3.23) for TSH 7.0-9.9 mIU/L, 1.86 (CI 1.27-2.72) for TSH 10.0-19.9 mIUL/L (P for trend <0.01), and was 1.31 (CI 0.88-1.95) for TSH 0.10-0.44 mIU/L and 1.94 (CI 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. Conclusion: Risks of HF events were increased with both higher and lower TSH levels, particularly for TSH ≥10 mIU/L and for TSH <0.10 mIU/L. Our findings might help to interpret TSH levels in the prevention and investigation of HF.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The factor structure of a back translated Spanish version (Lega, Caballo and Ellis, 2002) of the Attitudes and Beliefs Inventory (ABI) (Burgess, 1990) is analyzed in a sample of 250 university students.The Spanish version of the ABI is a 48-items self-report inventory using a 5-point Likert scale that assesses rational and irrational attitudes and beliefs. 24-items cover two dimensions of irrationality: a) areas of content (3 subscales), and b) styles of thinking (4 subscales).An Exploratory Factor Analysis (Parallel Analysis with Unweighted Least Squares method and Promin rotation) was performed with the FACTOR 9.20 software (Lorenzo-Seva and Ferrando, 2013).The results reproduced the main four styles of irrational thinking in relation with the three specific contents of irrational beliefs. However, two factors showed a complex configuration with important cross-loadings of different items in content and style. More analyses are needed to review the specific content and style of such items.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The factor structure of a back translated Spanish version (Lega, Caballo and Ellis, 2002) of the Attitudes and Beliefs Inventory (ABI) (Burgess, 1990) is analyzed in a sample of 250 university students.The Spanish version of the ABI is a 48-items self-report inventory using a 5-point Likert scale that assesses rational and irrational attitudes and beliefs. 24-items cover two dimensions of irrationality: a) areas of content (3 subscales), and b) styles of thinking (4 subscales).An Exploratory Factor Analysis (Parallel Analysis with Unweighted Least Squares method and Promin rotation) was performed with the FACTOR 9.20 software (Lorenzo-Seva and Ferrando, 2013).The results reproduced the main four styles of irrational thinking in relation with the three specific contents of irrational beliefs. However, two factors showed a complex configuration with important cross-loadings of different items in content and style. More analyses are needed to review the specific content and style of such items.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper presents some contemporary approaches to spatial environmental data analysis. The main topics are concentrated on the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, development and application of conditional stochastic simulation models. The innovative part of the paper presents integrated/hybrid model-machine learning (ML) residuals sequential simulations-MLRSS. The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends and sequential simulations of the residuals. NIL algorithms deliver non-linear solution for the spatial non-stationary problems, which are difficult for geostatistical approach. Geostatistical tools (variography) are used to characterize performance of ML algorithms, by analyzing quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide efficient assessment of uncertainty and spatial variability. Case study from the Chernobyl fallouts illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data driven and geostatistical model based approaches, can be efficiently used in decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

CONTEXT: Subclinical hypothyroidism has been associated with increased risk of coronary heart disease (CHD), particularly with thyrotropin levels of 10.0 mIU/L or greater. The measurement of thyroid antibodies helps predict the progression to overt hypothyroidism, but it is unclear whether thyroid autoimmunity independently affects CHD risk. OBJECTIVE: The objective of the study was to compare the CHD risk of subclinical hypothyroidism with and without thyroid peroxidase antibodies (TPOAbs). DATA SOURCES AND STUDY SELECTION: A MEDLINE and EMBASE search from 1950 to 2011 was conducted for prospective cohorts, reporting baseline thyroid function, antibodies, and CHD outcomes. DATA EXTRACTION: Individual data of 38 274 participants from six cohorts for CHD mortality followed up for 460 333 person-years and 33 394 participants from four cohorts for CHD events. DATA SYNTHESIS: Among 38 274 adults (median age 55 y, 63% women), 1691 (4.4%) had subclinical hypothyroidism, of whom 775 (45.8%) had positive TPOAbs. During follow-up, 1436 participants died of CHD and 3285 had CHD events. Compared with euthyroid individuals, age- and gender-adjusted risks of CHD mortality in subclinical hypothyroidism were similar among individuals with and without TPOAbs [hazard ratio (HR) 1.15, 95% confidence interval (CI) 0.87-1.53 vs HR 1.26, CI 1.01-1.58, P for interaction = .62], as were risks of CHD events (HR 1.16, CI 0.87-1.56 vs HR 1.26, CI 1.02-1.56, P for interaction = .65). Risks of CHD mortality and events increased with higher thyrotropin, but within each stratum, risks did not differ by TPOAb status. CONCLUSIONS: CHD risk associated with subclinical hypothyroidism did not differ by TPOAb status, suggesting that biomarkers of thyroid autoimmunity do not add independent prognostic information for CHD outcomes.