38 results for automated thematic analysis of textual data


Relevance: 100.00%

Abstract:

In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check if the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005) starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: The compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process. Key words: compositional data analysis, biplot, deep sea sediments
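
As a rough illustration of the ilr-transformation mentioned in this abstract, the sketch below computes centred log-ratio (clr) and isometric log-ratio (ilr) coordinates for one composition. It is not the authors' code: the pivot-balance basis is only one of many valid ilr bases and is an assumption here, as are the function names and the four-part sample values.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ilr(parts):
    """Isometric log-ratio coordinates using pivot balances (an assumed basis)."""
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        k = len(rest)                    # number of parts after part i
        mean_rest = sum(rest) / k        # log geometric mean of the remaining parts
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords

# Hypothetical 4-part composition (values invented for illustration).
sample = closure([48.0, 12.0, 30.0, 10.0])
z = ilr(sample)   # D - 1 = 3 real coordinates
```

A D-part composition yields D - 1 ilr coordinates, and these have the same Euclidean norm as the clr vector, which is why standard multivariate tools can be applied to them.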

Relevance: 100.00%

Abstract:

A version of Matheron’s discrete Gaussian model is applied to cell composition data. The examples are for map patterns of felsic metavolcanics in two different areas. Q-Q plots of the model for cell values representing proportion of 10 km x 10 km cell area underlain by this rock type are approximately linear, and the line of best fit can be used to estimate the parameters of the model. It is also shown that felsic metavolcanics in the Abitibi area of the Canadian Shield can be modeled as a fractal.

Relevance: 100.00%

Abstract:

Presentation in CODAWORK'03, session 4: Applications to archeometry

Relevance: 100.00%

Abstract:

Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their influence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them, from an inter-population viewpoint. Hypothesis testing on the adequate balance-coordinates allows us to confirm intuition-based hypotheses and some previous results. The main statistical goal was to test equal means of balance-coordinates for the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances.
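
The balance-coordinates referred to above can be sketched as follows, assuming the standard definition of a balance between two groups of parts. The grouping of metals and the concentration values are hypothetical and are not taken from the study.

```python
import math

def balance(composition, numerator, denominator):
    """Balance between two groups of parts:

    b = sqrt(r*s / (r+s)) * ln(gmean(numerator parts) / gmean(denominator parts))
    where r and s are the group sizes.
    """
    r, s = len(numerator), len(denominator)
    mean_log_num = sum(math.log(composition[k]) for k in numerator) / r
    mean_log_den = sum(math.log(composition[k]) for k in denominator) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Hypothetical concentrations for one animal (invented values, arbitrary units).
shrew = {"Zn": 180.0, "Cu": 6.0, "Pb": 2.5, "Cd": 0.4}

# Assumed grouping for illustration: essential metals against toxic metals.
b = balance(shrew, numerator=["Zn", "Cu"], denominator=["Pb", "Cd"])
```

Because a balance depends only on ratios of parts, it is unchanged when all concentrations are multiplied by a common factor. One such value per animal is what a test of equal means between two populations would take as input.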

Relevance: 100.00%

Abstract:

Morphological descriptors are practical and essential biomarkers for diagnosis and treatment selection for intracranial aneurysm management according to the current guidelines in use. Nevertheless, relatively little work has been dedicated to improving the three-dimensional quantification of aneurysmal morphology, automating the analysis, and hence reducing the inherent intra- and inter-observer variability of manual analysis. In this paper we propose a methodology for the automated isolation and morphological quantification of saccular intracranial aneurysms based on a 3D representation of the vascular anatomy.

Relevance: 100.00%

Abstract:

When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
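
To make the distinction between crisp and fuzzy coding concrete, here is a minimal sketch that codes one continuous value into three categories and then defuzzifies it. Triangular membership functions with three hinge points are an assumption (the abstract does not specify the functions used), and the hinge values are invented.

```python
def fuzzy_code(x, low, mid, high):
    """Triangular membership degrees for three categories (low, middle, high)."""
    x = min(max(x, low), high)           # clamp to the hinge range
    if x <= mid:
        m = (x - low) / (mid - low)
        return (1.0 - m, m, 0.0)
    h = (x - mid) / (high - mid)
    return (0.0, 1.0 - h, h)

def crisp_code(x, low, mid, high):
    """Indicator (dummy) coding: 1 for the category with the highest membership."""
    degrees = fuzzy_code(x, low, mid, high)
    top = degrees.index(max(degrees))
    return tuple(1 if k == top else 0 for k in range(3))

def defuzzify(degrees, low, mid, high):
    """Estimate the original value as the membership-weighted sum of the hinges."""
    return degrees[0] * low + degrees[1] * mid + degrees[2] * high

# Hypothetical hinge points (e.g. minimum, median, maximum of a temperature variable).
hinges = (5.0, 15.0, 30.0)
fuzzy = fuzzy_code(18.0, *hinges)
crisp = crisp_code(18.0, *hinges)
recovered = defuzzify(fuzzy, *hinges)
```

With triangular functions the membership degrees sum to 1 and defuzzification recovers the original value exactly inside the hinge range, whereas the crisp code keeps only the category label.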

Relevance: 100.00%

Abstract:

In spite of its relative importance in the economy of many countries and its growing interrelationships with other sectors, agriculture has traditionally been excluded from accounting standards. Nevertheless, to support its Common Agricultural Policy, for years the European Commission has been making an effort to obtain standardized information on the financial performance and condition of farms. Through the Farm Accountancy Data Network (FADN), every year data are gathered from a rotating sample of 60,000 professional farms across all member states. FADN data collection is not structured as an accounting cycle but as an extensive questionnaire. This questionnaire refers to assets, liabilities, revenues and expenses, and seems to try to obtain a "true and fair view" of the financial performance and condition of the farms it surveys. However, the definitions used in the questionnaire and the way data are aggregated often appear flawed from an accounting perspective. The objective of this paper is to contrast the accounting principles implicit in the FADN questionnaire with generally accepted accounting principles, particularly those found in the IVth Directive of the European Union, on the one hand, and those recently proposed by the International Accounting Standards Committee’s Steering Committee on Agriculture in its Draft Statement of Principles, on the other hand. There are two reasons why this is useful. First, it allows suggestions to be made on how the information provided by FADN could be brought more into line with the accepted accounting framework and become a more valuable tool for policy makers, farmers, and other stakeholders. Second, it helps assess the suitability of FADN to become the starting point for a European accounting standard on agriculture.

Relevance: 100.00%

Abstract:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
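
The effect of the row and column weights can be checked numerically. The sketch below assumes that the weights are the row and column masses of the table (as in correspondence analysis). It computes the total variance of the weighted, double-centred log table and compares it before and after merging two rows with the same profile. The function name and the small table are hypothetical.

```python
import math

def weighted_logratio_variance(table):
    """Total variance of the log table after weighted double-centring."""
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]                      # row masses
    c = [sum(table[i][j] for i in range(n_rows)) / total
         for j in range(n_cols)]                                 # column masses
    logs = [[math.log(v) for v in row] for row in table]
    # Subtract the weighted mean of each row, then of each column.
    row_centred = []
    for i in range(n_rows):
        row_mean = sum(c[j] * logs[i][j] for j in range(n_cols))
        row_centred.append([logs[i][j] - row_mean for j in range(n_cols)])
    col_means = [sum(r[i] * row_centred[i][j] for i in range(n_rows))
                 for j in range(n_cols)]
    return sum(r[i] * c[j] * (row_centred[i][j] - col_means[j]) ** 2
               for i in range(n_rows) for j in range(n_cols))

# Hypothetical table: rows 0 and 1 are proportional, so they share one profile.
original = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [5.0, 40.0, 15.0]]
merged = [[30.0, 60.0, 90.0], [5.0, 40.0, 15.0]]   # rows 0 and 1 summed

v_original = weighted_logratio_variance(original)
v_merged = weighted_logratio_variance(merged)
```

The two totals agree up to rounding, which is the distributional equivalence property in miniature. The same comparison with equal weights for all rows would generally give different totals.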

Relevance: 100.00%

Abstract:

Dual scaling of a subjects-by-objects table of dominance data (preferences, paired comparisons and successive categories data) has been contrasted with correspondence analysis, as if the two techniques were somehow different. In this note we show that dual scaling of dominance data is equivalent to the correspondence analysis of a table which is doubled with respect to subjects. We also show that the results of both methods can be recovered from a principal components analysis of the undoubled dominance table which is centred with respect to subject means.

Relevance: 100.00%

Abstract:

Monitoring thunderstorm activity is an essential part of operational weather surveillance given the potential hazards of thunderstorms, including lightning, hail, heavy rainfall, strong winds or even tornadoes. This study has two main objectives: firstly, the description of a methodology, based on radar and total lightning data, to characterise thunderstorms in real-time; secondly, the application of this methodology to 66 thunderstorms that affected Catalonia (NE Spain) in the summer of 2006. An object-oriented tracking procedure is employed, where different observation data types generate four different types of objects (radar 1-km CAPPI reflectivity composites, radar reflectivity volumetric data, cloud-to-ground lightning data and intra-cloud lightning data). In the framework proposed, these objects are the building blocks of a higher level object, the thunderstorm. The methodology is demonstrated with a dataset of thunderstorms whose main characteristics, along the complete life cycle of the convective structures (development, maturity and dissipation), are described statistically. The development and dissipation stages present similar durations in most cases examined. By contrast, the duration of the maturity phase is much more variable and related to the thunderstorm intensity, defined here in terms of lightning flash rate. Most of the activity of IC and CG flashes is registered in the maturity stage. In the development stage few CG flashes are observed (2% to 5%), while in the dissipation phase it is possible to observe a few more CG flashes (10% to 15%). Additionally, a selection of thunderstorms is used to examine general life cycle patterns, obtained from the analysis of normalized (with respect to thunderstorm total duration and maximum value of variables considered) thunderstorm parameters.
Among other findings, the study indicates that the normalized duration of the three stages of thunderstorm life cycle is similar in most thunderstorms, with the longest duration corresponding to the maturity stage (approximately 80% of the total time).

Relevance: 100.00%

Abstract:

The recently measured inclusive electron-proton cross section in the nucleon resonance region, performed with the CLAS detector at the Thomas Jefferson Laboratory, has provided new data for the nucleon structure function F2 with previously unavailable precision. In this paper we propose a description of these experimental data based on a Regge-dual model for F2. The basic inputs in the model are nonlinear complex Regge trajectories producing both isobar resonances and a smooth background. The model is tested against the experimental data, and the Q2 dependence of the moments is calculated. The fitted model for the structure function (inclusive cross section) is a limiting case of the more general scattering amplitude equally applicable to deeply virtual Compton scattering. The connection between the two is discussed.

Relevance: 100.00%

Abstract:

This paper introduces a mixture model based on the beta distribution, without preestablished means and variances, to analyze a large set of Beauty-Contest data obtained from diverse groups of experiments (Bosch-Domenech et al. 2002). This model gives a better fit of the experimental data, and more precision to the hypothesis that a large proportion of individuals follow a common pattern of reasoning, described as iterated best reply (degenerate), than mixture models based on the normal distribution. The analysis shows that the means of the distributions across the groups of experiments are pretty stable, while the proportions of choices at different levels of reasoning vary across groups.
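
For readers unfamiliar with beta mixtures, the following sketch evaluates the density of a two-component mixture on choices rescaled to the open interval (0, 1). The weights and shape parameters are invented for illustration; they are not estimates from the paper, and no fitting procedure is shown.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def mixture_pdf(x, components):
    """Weighted sum of beta densities; components are (weight, a, b) triples."""
    return sum(w * beta_pdf(x, a, b) for w, a, b in components)

# Hypothetical mixture: weights sum to 1, shape parameters chosen arbitrarily.
components = [(0.6, 2.0, 8.0), (0.4, 5.0, 5.0)]

# A choice of 33 on a 0-100 scale, rescaled to (0, 1).
density_at_33 = mixture_pdf(33 / 100, components)

# Midpoint-rule check that the mixture density integrates to about 1.
n = 2000
area = sum(mixture_pdf((k + 0.5) / n, components) for k in range(n)) / n
```

Unlike a normal distribution, each beta component is confined to the bounded range of admissible choices, which is one plausible motivation for preferring it for data of this kind.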

Relevance: 100.00%

Abstract:

A comment on the article “Local sensitivity analysis for compositional data with application to soil texture in hydrologic modelling” written by L. Loosvelt and co-authors. The present comment is centered on three specific points. The first one is related to the fact that the authors avoid the use of ilr-coordinates. The second one refers to some generalization of sensitivity analysis when input parameters are compositional. The third tries to show that the role of the Dirichlet distribution in the sensitivity analysis is irrelevant.

Relevance: 100.00%

Abstract:

DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.