1000 results for Data Mining


Relevance:

20.00%

Publisher:

Abstract:

The purpose of the Semantic Web is to achieve a fully linked Web of data, that is, a Web seen from a Linked Open Data perspective. The Semantic Web must ensure (by establishing technological standards, vocabularies, logical languages, etc.) that content published on the Web is intelligible both to human agents and to machine agents. This dissertation aims to address a delimited problem, proposing a solution within the framework of the Semantic Web and its technologies. Starting from a list of natural-language terms used on the Website of ANACOM (Autoridade Nacional de Comunicações), we propose an organization following methodologies for building ontologies and vocabularies. We drew on two methodologies: Ontology Development 101 and the Process and Methodology for Core Vocabularies. The resulting controlled vocabulary is technologically based on the knowledge organization model recommended by the W3C (World Wide Web Consortium), SKOS (Simple Knowledge Organization System). SKOS has been a W3C standard since 2009 and is used to create thesauri, classification schemes, taxonomies, glossaries and other types of controlled vocabularies. As a result of our work, we organized and encoded in SKOS roughly five hundred terms identified on the ANACOM Website. Beyond proposing the controlled vocabulary, we also reviewed the technologies and theories underpinning the Semantic Web.
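A minimal sketch of how one term and a narrower term might be encoded in SKOS Turtle. The concept names and the `ex:` namespace are hypothetical illustrations, not taken from the actual ANACOM vocabulary; plain string building is used so no RDF library is needed:

```python
# Minimal SKOS encoding sketch (hypothetical terms and URIs, not the
# actual ANACOM vocabulary). Builds a Turtle serialization by hand so
# no RDF library is required.

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex:   <http://example.org/anacom/> .\n"
)

def skos_concept(slug, pref_label, broader=None, lang="pt"):
    """Render one skos:Concept as a Turtle block."""
    lines = [f"ex:{slug} a skos:Concept ;",
             f'    skos:prefLabel "{pref_label}"@{lang}']
    if broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader ex:{broader}")
    lines[-1] += " ."
    return "\n".join(lines)

turtle = PREFIXES + "\n" + "\n\n".join([
    skos_concept("comunicacoes", "Comunicações"),
    skos_concept("espectro", "Espectro radioeléctrico",
                 broader="comunicacoes"),
])
print(turtle)
```

The `skos:broader` link is what turns a flat term list into the hierarchical controlled vocabulary the dissertation describes.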

Relevance:

20.00%

Publisher:

Abstract:

This paper presents and estimates a dynamic choice model in the attribute space considering rational consumers. In light of the evidence of several state-dependence patterns, the standard attribute-based model is extended by considering a general utility function where pure inertia and pure variety-seeking behaviors can be explained in the model as particular linear cases. The dynamics of the model are fully characterized by standard dynamic programming techniques. The model presents a stationary consumption pattern that can be inertial, where the consumer only buys one product, or a variety-seeking one, where the consumer shifts among varied products. We run some simulations to analyze the consumption paths out of the steady state. Under the hybrid utility assumption, the consumer behaves inertially among the unfamiliar brands for several periods, eventually switching to a variety-seeking behavior when the stationary levels are approached. An empirical analysis is run using scanner databases for three different product categories: fabric softener, saltine cracker, and catsup. Non-linear specifications provide the best fit of the data, as hybrid functional forms are found in all the product categories for most attributes and segments. These results reveal the statistical superiority of the non-linear structure and confirm the gradual trend to seek variety as the level of familiarity with the purchased items increases.
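The inertia versus variety-seeking distinction can be illustrated with a toy deterministic choice rule (an illustration only, not the paper's estimated model): each period the consumer picks the product maximizing a base utility plus a state-dependent term in accumulated familiarity, where the sign of the state-dependence coefficient flips the behavior:

```python
# Toy illustration of state-dependent choice (not the paper's model):
# utility = base + gamma * familiarity. gamma > 0 -> inertia (lock-in
# on one product), gamma < 0 -> variety seeking (rotation).

def simulate(gamma, base=(1.0, 0.99, 0.98), periods=12, decay=0.5):
    familiarity = [0.0] * len(base)
    path = []
    for _ in range(periods):
        utilities = [b + gamma * f for b, f in zip(base, familiarity)]
        choice = utilities.index(max(utilities))
        path.append(choice)
        # Familiarity decays geometrically and grows with purchase.
        familiarity = [decay * f + (1.0 if i == choice else 0.0)
                       for i, f in enumerate(familiarity)]
    return path

inertial = simulate(gamma=0.5)    # repeatedly buys product 0
seeking = simulate(gamma=-0.5)    # rotates among the three products
print(inertial)
print(seeking)
```

With a positive coefficient the first-chosen brand is reinforced forever; with a negative one the consumer cycles, mirroring the two stationary patterns the abstract describes.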

Relevance:

20.00%

Publisher:

Abstract:

Correspondence analysis has found extensive use in ecology, archeology, linguistics and the social sciences as a method for visualizing the patterns of association in a table of frequencies or nonnegative ratio-scale data. Inherent to the method is the expression of the data in each row or each column relative to their respective totals, and it is these sets of relative values (called profiles) that are visualized. This relativization of the data makes perfect sense when the margins of the table represent samples from sub-populations of inherently different sizes. But in some ecological applications sampling is performed on equal areas or equal volumes so that the absolute levels of the observed occurrences may be of relevance, in which case relativization may not be required. In this paper we define the correspondence analysis of the raw unrelativized data and discuss its properties, comparing this new method to regular correspondence analysis and to a related variant of non-symmetric correspondence analysis.
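The relativization step referred to here can be sketched in a few lines: each row of the table is divided by its total to give a profile, and it is these profiles, not the raw counts, that regular correspondence analysis visualizes. The table values below are invented for illustration:

```python
# Row profiles vs. raw counts (illustrative data). Regular correspondence
# analysis visualizes the profiles; the variant discussed in the abstract
# works on the raw, unrelativized table instead.

table = [
    [10, 20, 30],   # e.g. species counts at site A
    [ 1,  2,  3],   # site B: same profile, one tenth the abundance
]

def row_profiles(rows):
    return [[v / sum(row) for v in row] for row in rows]

profiles = row_profiles(table)
print(profiles)
# Both rows yield the same profile, so after relativization the large
# difference in absolute abundance between the two sites is lost --
# exactly the information the unrelativized variant aims to retain.
```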

Relevance:

20.00%

Publisher:

Abstract:

This paper estimates a translog stochastic frontier production function on a panel of 150 mixed Catalan farms over the period 1989-1993, in order to measure and explain variation in technical inefficiency scores with a one-stage approach. The model uses gross value added as the aggregate output measure. Total employment, fixed capital, current assets, specific costs and overhead costs enter the model as inputs. Stochastic frontier estimates are compared with those obtained using a linear programming method with a two-stage approach. The translog stochastic frontier specification appears to be an appropriate representation of the data; the hypothesis of technical change was rejected and the technical inefficiency effects were statistically significant. Mean technical efficiency over the period analyzed was estimated at 64.0%. Farm inefficiency levels were found to be significant at the 5% level and positively correlated with the number of economic size units.
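The translog form underlying such a frontier can be written out directly as ln y = a0 + sum_i a_i ln x_i + (1/2) sum_ij b_ij ln x_i ln x_j. The sketch below evaluates a two-input version with invented coefficients (the paper estimates a five-input model with inefficiency effects):

```python
import math

# Translog production function sketch (two inputs, invented coefficients;
# the paper's model has five inputs plus inefficiency effects):
# ln y = a0 + sum_i a_i ln x_i + 0.5 * sum_ij b_ij ln x_i ln x_j

def translog(x, a0, a, b):
    lx = [math.log(v) for v in x]
    linear = sum(ai * lxi for ai, lxi in zip(a, lx))
    quad = 0.5 * sum(b[i][j] * lx[i] * lx[j]
                     for i in range(len(lx)) for j in range(len(lx)))
    return a0 + linear + quad

a0 = 0.1
a = [0.6, 0.4]                       # first-order elasticities
b = [[0.05, -0.02], [-0.02, 0.03]]   # symmetric second-order terms

ln_y = translog([100.0, 50.0], a0, a, b)
print(math.exp(ln_y))  # predicted output level
```

Setting all `b[i][j]` to zero collapses the form to Cobb-Douglas, which is the restriction typically tested when choosing the translog specification.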

Relevance:

20.00%

Publisher:

Abstract:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
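Subcompositional coherence, as described, requires analysing only ratios. A quick check with invented compositional data shows why: the log-ratio between two parts is unaffected by dropping a third part and reclosing the subcomposition to sum to one:

```python
import math

# Subcompositional coherence check (illustrative data): the log-ratio
# between parts A and B is the same in the full composition and in the
# subcomposition {A, B} reclosed to sum to 1.

full = {"A": 0.5, "B": 0.3, "C": 0.2}

def reclose(comp, parts):
    total = sum(comp[p] for p in parts)
    return {p: comp[p] / total for p in parts}

sub = reclose(full, ["A", "B"])

lr_full = math.log(full["A"] / full["B"])
lr_sub = math.log(sub["A"] / sub["B"])
print(lr_full, lr_sub)  # identical: ratios survive subsetting
```

Any method built on such log-ratios, including the weighted log-ratio analysis of the abstract, therefore gives the same answer whether a part is present or excluded.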

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a method for the measurement of changes in health inequality and income-related health inequality over time in a population. For pure health inequality (as measured by the Gini coefficient) and income-related health inequality (as measured by the concentration index), we show how measures derived from longitudinal data can be related to cross-section Gini and concentration indices that have typically been reported in the literature to date, along with measures of health mobility inspired by the literature on income mobility. We also show how these measures of mobility can be usefully decomposed into the contributions of different covariates. We apply these methods to investigate the degree of income-related mobility in the GHQ measure of psychological well-being in the first nine waves of the British Household Panel Survey (BHPS). This reveals that dynamics increase the absolute value of the concentration index of GHQ on income by 10%.
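The concentration index referred to here can be computed from individual data as twice the covariance of health with the fractional income rank, divided by mean health. A pure-Python sketch with invented values:

```python
# Concentration index sketch (invented data): C = 2 * cov(h, r) / mean(h),
# where r is the fractional income rank. C > 0 indicates the health
# variable is concentrated among the richer; C = 0 means no income gradient.

def concentration_index(health, income):
    n = len(health)
    order = sorted(range(n), key=lambda i: income[i])
    rank = [0.0] * n
    for pos, i in enumerate(order):
        rank[i] = (pos + 0.5) / n          # fractional rank in (0, 1)
    mu_h = sum(health) / n
    mu_r = sum(rank) / n                   # always 0.5
    cov = sum((health[i] - mu_h) * (rank[i] - mu_r) for i in range(n)) / n
    return 2.0 * cov / mu_h

# Health rising with income -> positive index (about 0.267 here).
incomes = [10, 20, 30, 40, 50]
health = [1.0, 2.0, 3.0, 4.0, 5.0]
print(concentration_index(health, incomes))
```

The paper's longitudinal measures relate indices like this, computed wave by wave, to mobility over the full panel.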

Relevance:

20.00%

Publisher:

Abstract:

1. Aim: Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data.

2. Location: Europe, North America, South America.

3. Methods: The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with predefined distributions and amounts of niche overlap to evaluate several ordination and species distribution modeling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species.

4. Results: We show that niche overlap can be accurately detected with the framework when the variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographic space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and the arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting the variables on which niche overlap is calculated provides contrasting results.

5. Main conclusions: The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate for studying niche differences between species, subspecies or intraspecific lineages that differ in their geographical distributions. Alternatively, it can be used to measure the degree to which the environmental niche of a species or intraspecific lineage has changed over time.
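A standard overlap metric in frameworks of this kind is Schoener's D, computed on the two species' occurrence densities over the gridded environmental space. The densities below are invented and the kernel-smoothing step is omitted for brevity:

```python
# Niche overlap via Schoener's D (illustrative densities; in the actual
# framework these would be kernel-smoothed occurrence densities over a
# gridded environmental space): D = 1 - 0.5 * sum |p1 - p2|,
# ranging from 0 (no overlap) to 1 (identical niches).

def schoener_d(p1, p2):
    # Normalize so each density sums to 1 over the grid.
    s1, s2 = sum(p1), sum(p2)
    return 1.0 - 0.5 * sum(abs(a / s1 - b / s2) for a, b in zip(p1, p2))

dens_a = [0.0, 1.0, 3.0, 1.0, 0.0]   # species A, flattened grid cells
dens_b = [0.0, 0.0, 3.0, 1.0, 1.0]   # species B, niche shifted one cell
print(schoener_d(dens_a, dens_b))    # partial overlap
print(schoener_d(dens_a, dens_a))    # identical niches -> 1.0
```

Tests of niche conservatism then compare the observed D against a null distribution obtained by randomizing occurrences in environmental space.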

Relevance:

20.00%

Publisher:

Abstract:

Time-lapse geophysical monitoring and inversion are valuable tools in hydrogeology for monitoring changes in the subsurface due to natural and forced (tracer) dynamics. However, the resulting models may suffer from insufficient resolution, which leads to underestimated variability and poor mass recovery. Structural joint inversion using cross-gradient constraints can provide higher-resolution models compared with individual inversions, and we present the first application to time-lapse data. The results from a synthetic and a field vadose zone water tracer injection experiment show that joint 3-D time-lapse inversion of crosshole electrical resistance tomography (ERT) and ground penetrating radar (GPR) traveltime data significantly improves the imaged characteristics of the point-injected plume, such as lateral spreading and center of mass, as well as the overall consistency between models. The joint inversion method appears to work well for cases when one hydrological state variable (in this case moisture content) controls the time-lapse response of both geophysical methods. Citation: Doetsch, J., N. Linde, and A. Binley (2010), Structural joint inversion of time-lapse crosshole ERT and GPR traveltime data, Geophys. Res. Lett., 37, L24404, doi: 10.1029/2010GL045482.
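The cross-gradient constraint central to structural joint inversion penalizes misalignment of the two models' spatial gradients; in 2-D it is the z-component of the cross product of the gradients, which vanishes where the models' structural boundaries coincide. A toy finite-difference sketch (invented grids, not field data):

```python
# Cross-gradient sketch (toy 2-D grids, not field data): for two models
# m1, m2 the constraint is t = dm1/dx * dm2/dy - dm1/dy * dm2/dx
# (z-component of grad(m1) x grad(m2)). t = 0 wherever the two models'
# gradients are parallel, i.e. structural boundaries are aligned.

def cross_gradient(m1, m2, i, j):
    # Forward finite differences at cell (i, j); unit grid spacing assumed.
    d1x = m1[i][j + 1] - m1[i][j]
    d1y = m1[i + 1][j] - m1[i][j]
    d2x = m2[i][j + 1] - m2[i][j]
    d2y = m2[i + 1][j] - m2[i][j]
    return d1x * d2y - d1y * d2x

# Two models sharing the same vertical interface: gradients are parallel.
aligned_1 = [[1, 1, 5], [1, 1, 5], [1, 1, 5]]
aligned_2 = [[2, 2, 9], [2, 2, 9], [2, 2, 9]]
print(cross_gradient(aligned_1, aligned_2, 0, 1))   # 0: structures agree

# Perpendicular interfaces: the cross-gradient is non-zero.
horiz = [[1, 1, 1], [1, 1, 1], [5, 5, 5]]
print(cross_gradient(aligned_1, horiz, 1, 1))       # non-zero: penalized
```

In the joint inversion, sums of squared terms like this over all cells are driven toward zero alongside the ERT and GPR data misfits.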

Relevance:

20.00%

Publisher:

Abstract:

A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
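The fuzzy coding step can be illustrated with triangular membership functions that spread a continuous value over three categories (low, medium, high) whose memberships sum to one; the hinge values below are arbitrary:

```python
# Fuzzy (triangular) coding of a continuous variable into three categories
# (illustrative hinge points). Each value becomes a membership triple
# summing to 1; crisp coding would instead assign a single 0/1 category,
# discarding the within-category position that fuzzy MCA retains.

def fuzzy_code(x, low, mid, high):
    if x <= low:
        return (1.0, 0.0, 0.0)
    if x <= mid:
        t = (x - low) / (mid - low)
        return (1.0 - t, t, 0.0)
    if x <= high:
        t = (x - mid) / (high - mid)
        return (0.0, 1.0 - t, t)
    return (0.0, 0.0, 1.0)

# Temperature in deg C, hinges at 0 / 15 / 30 (arbitrary choices).
for temp in (-5.0, 7.5, 22.5, 35.0):
    print(temp, fuzzy_code(temp, 0.0, 15.0, 30.0))
```

Defuzzification, as used in the paper's fit measure, inverts this step by taking the membership-weighted average of the hinge values to recover an estimate of the original datum.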