66 results for data sheets


Relevance:

20.00% 20.00%

Publisher:

Abstract:

This project investigates both the discovery of predictors through clustering techniques and the state of free data mining software. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain: the data from which they are to be inferred are student qualifications from different e-learning environments. The case study not only tests clustering algorithms but also proposes additional goals.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, zero observations may well be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as "a trace too small to measure", it seems reasonable to replace it with a suitably small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the parts involved, and thus the metric properties, should be preserved, as otherwise further analysis of subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if the replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved.
As a generalization of the multiplicative replacement, the same paper introduces a substitution method for missing values in compositional data sets.
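The multiplicative replacement described in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the rule: each zero part is set to a small value delta, and each non-zero part is multiplied by (1 - sum of imputed deltas / c), where c is the closure constant. The function name `multiplicative_replacement`, the parameter names, and the example values are hypothetical and not taken from the paper.

```python
import numpy as np

def multiplicative_replacement(x, delta, total=1.0):
    """Replace zeros in a composition x (summing to `total`) by delta.

    Non-zero parts are shrunk by a common factor, so their mutual
    ratios are unchanged and the result still sums to `total`.
    `delta` may be a scalar or one value per part (an assumption here).
    """
    x = np.asarray(x, dtype=float)
    delta = np.broadcast_to(np.asarray(delta, dtype=float), x.shape)
    zero = x == 0
    imputed_mass = delta[zero].sum()          # total mass given to the zeros
    shrunk = x * (1.0 - imputed_mass / total)  # multiplicative adjustment
    return np.where(zero, delta, shrunk)

# Illustrative composition with one rounded zero (made-up values).
x = np.array([0.5, 0.3, 0.2, 0.0])
r = multiplicative_replacement(x, delta=0.01)
print(r, r.sum())
```

Because every non-zero part is scaled by the same factor, ratios such as r[0] / r[1] equal the original x[0] / x[1], which is the property behind the preserved covariance structure of zero-free subcompositions.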

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The main objective of this paper is to develop a methodology that takes into account the human factor extracted from the database used by recommender systems and that makes it possible to solve the specific problems of prediction and recommendation. In this work, we propose extracting the user's human values scale from the user database in order to improve their suitability in open environments such as recommender systems. For this purpose, the methodology is applied to the user's data after the user has interacted with the system. The methodology is illustrated with a case study.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later as NEWCODA in Matlab (Aitchison, 1997). Since then, several other authors have published extensions to this methodology: Martín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, this methodology can only be used on data sets without null values. Consequently, data sets in which zeros are present require a preliminary treatment. Recent advances in the treatment of compositional zeros have centered mainly on zeros of a structural nature and on rounded zeros. These tools do not cover the particular case of count compositional data sets with null values. In this work we deal with "count zeros" and introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and estimate the posterior probabilities. We then apply a multiplicative modification to the non-zero values. We present a case study in which this new methodology is applied.
Key words: count data, multiplicative replacement, composition, log-ratio analysis
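The two steps named in this abstract (a Dirichlet-prior posterior estimate for the zero counts, followed by a multiplicative adjustment of the non-zero parts) might look roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the prior parameters, so the uniform prior expectation 1/D and the prior strength s = D used as defaults here are assumptions, and the function name `bayes_multiplicative` and the example counts are hypothetical.

```python
import numpy as np

def bayes_multiplicative(counts, strength=None, prior=None):
    """Replace count zeros by Dirichlet posterior estimates.

    counts   : vector of non-negative counts for one sample
    strength : prior strength s (assumed default: number of parts D)
    prior    : prior expectation t, summing to 1 (assumed default: uniform)
    """
    counts = np.asarray(counts, dtype=float)
    D = counts.size
    n = counts.sum()
    s = float(D) if strength is None else float(strength)
    t = np.full(D, 1.0 / D) if prior is None else np.asarray(prior, dtype=float)

    zero = counts == 0
    # Posterior mean of each part under a Dirichlet prior with parameters s * t.
    posterior = (counts + s * t) / (n + s)
    # Observed proportions, shrunk so that the result sums to 1.
    proportions = counts / n
    adjusted = proportions * (1.0 - posterior[zero].sum())
    return np.where(zero, posterior, adjusted)

# Made-up count vector with two count zeros.
counts = np.array([12, 0, 7, 1, 0])
est = bayes_multiplicative(counts)
print(est, est.sum())
```

Only the zero parts take the posterior estimate. The non-zero parts keep their observed ratios because they are all multiplied by the same factor.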

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. Indeed, they gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
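The combination mentioned in this abstract, ilr coordinates plus a normal kernel with a full (non-diagonal) bandwidth matrix, can be sketched as below. Several choices are assumptions made only for illustration: the particular orthonormal basis built in `ilr_basis`, the bandwidth matrix taken as the sample covariance scaled by Scott's rule, and the simulated Dirichlet sample. The abstract does not say which basis or bandwidth selector the cited paper uses.

```python
import numpy as np

def ilr_basis(D):
    """One orthonormal basis (D x (D-1)) of the clr hyperplane (assumed choice)."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(X):
    """Map strictly positive compositions (rows of X) to R^(D-1)."""
    X = np.asarray(X, dtype=float)
    logX = np.log(X)
    clr = logX - logX.mean(axis=-1, keepdims=True)  # centred log-ratio
    return clr @ ilr_basis(X.shape[-1])

def kde_full_bandwidth(sample, points):
    """Gaussian kernel density estimate with a full bandwidth matrix H.

    H is the sample covariance times n**(-2/(d+4)) (Scott's rule),
    an assumption standing in for whatever selector the paper uses.
    """
    n, d = sample.shape
    H = np.atleast_2d(np.cov(sample, rowvar=False)) * n ** (-2.0 / (d + 4))
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (n * np.sqrt((2 * np.pi) ** d * np.linalg.det(H)))
    diff = points[:, None, :] - sample[None, :, :]
    maha = np.einsum("pnd,de,pne->pn", diff, Hinv, diff)  # squared Mahalanobis distances
    return norm * np.exp(-0.5 * maha).sum(axis=1)

# Simulated three-part compositions (illustrative data only).
rng = np.random.default_rng(0)
comps = rng.dirichlet([4.0, 3.0, 2.0], size=200)
Z = ilr(comps)
dens = kde_full_bandwidth(Z, Z[:5])
print(dens)
```

The estimated density here is expressed in ilr coordinates. Unlike the alr transformation, the ilr transformation is an isometry, so distances between compositions are preserved in the coordinate space where the kernel is applied.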