Biblioteca Digital

33 resultados para Data Coding.

em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain

Measures of fit in multiple correspondence analysis of crisp and fuzzy coded data

Relevância:

70.00% 70.00%

Publicador:

Resumo:

When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using co-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix. In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.

Improving a leaves automatic recognition process using PCA

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work we present a simulation of a recognition process with perimeter characterization of a simple plant leaves as a unique discriminating parameter. Data coding allowing for independence of leaves size and orientation may penalize performance recognition for some varieties. Border description sequences are then used, and Principal Component Analysis (PCA) is applied in order to study which is the best number of components for the classification task, implemented by means of a Support Vector Machine (SVM) System. Obtained results are satisfactory, and compared with [4] our system improves the recognition success, diminishing the variance at the same time.

Automatic Recognition of Leaves by Shape Detection Pre-Processing with Ica

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work we present a simulation of a recognition process with perimeter characterization of a simple plant leaves as a unique discriminating parameter. Data coding allowing for independence of leaves size and orientation may penalize performance recognition for some varieties. Border description sequences are then used to characterize the leaves. Independent Component Analysis (ICA) is then applied in order to study which is the best number of components to be considered for the classification task, implemented by means of an Artificial Neural Network (ANN). Obtained results with ICA as a pre-processing tool are satisfactory, and compared with some references our system improves the recognition success up to 80.8% depending on the number of considered independent components.

Estimation of protein coding density in a corpus of DNA sequence data

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.

Fuzzy coding in constrained ordinations

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Canonical correspondence analysis and redundancy analysis are two methods of constrained ordination regularly used in the analysis of ecological data when several response variables (for example, species abundances) are related linearly to several explanatory variables (for example, environmental variables, spatial positions of samples). In this report I demonstrate the advantages of the fuzzy coding of explanatory variables: first, nonlinear relationships can be diagnosed; second, more variance in the responses can be explained; and third, in the presence of categorical explanatory variables (for example, years, regions) the interpretation of the resulting triplot ordination is unified because all explanatory variables are measured at a categorical level.

Biplots of fuzzy coded data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.

Lossless Image Data Embedding in Plain Areas

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This letter presents a lossless data hiding scheme for digital images which uses an edge detector to locate plain areas for embedding. The proposed method takes advantage of the well-known gradient adjacent prediction utilized in image coding. In the suggested scheme, prediction errors and edge values are first computed and then, excluding the edge pixels, prediction error values are slightly modified through shifting the prediction errors to embed data. The aim of proposed scheme is to decrease the amount of modified pixels to improve transparency by keeping edge pixel values of the image. The experimental results have demonstrated that the proposed method is capable of hiding more secret data than the known techniques at the same PSNR, thus proving that using edge detector to locate plain areas for lossless data embedding can enhance the performance in terms of data embedding rate versus the PSNR of marked images with respect to original image.

Population-based incidence of myeloid malignancies: fifteen years of epidemiological data in the province of Girona, Spain

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Myeloid malignancies (MMs) are a heterogeneous group of hematologic malignancies presenting different incidence, prognosis and survival.1–3 Changing classifications (FAB 1994, WHO 2001 and WHO 2008) and few available epidemiological data complicate incidence comparisons.4,5 Taking this into account, the aims of the present study were: a) to calculate the incidence rates and trends of MMs in the Province of Girona, northeastern Spain, between 1994 and 2008 according to the WHO 2001 classification; and b) to predict the number of MMs cases in Spain during 2013. Data were extracted from the population-based Girona Cancer Registry (GCR) located in the north-east of Catalonia, Spain, and covering a population of 731,864 inhabitants (2008 census). Cases were registered according to the rules of the European Network for Cancer Registries and the Manual for Coding and Reporting Haematological Malignancies (HAEMACARE project). To ensure the complete coverage of MMs in the GCR, and especially myeloproliferative neoplasms (MPN) and myelodysplastic syndromes (MDS), a retrospective search was performed. The ICD-O-2 (1990) codes were converted into their corresponding ICD-O-3 (2000) codes, including MDS, polycythemia vera (PV) and essential thrombocythemia (ET) as malignant diseases. Results of crude rate (CR) and European standardized incidence rate (ASRE) were expressed per 100,000 inhabitants/year

Likelihood inferences with interval-censored data

Relevância:

20.00% 20.00%

Publicador:

Resumo:

En l’anàlisi de la supervivència el problema de les dades censurades en un interval es tracta, usualment,via l’estimació per màxima versemblança. Amb l’objectiu d’utilitzar una expressió simplificada de la funció de versemblança, els mètodes estàndards suposen que les condicions que produeixen la censura no afecten el temps de fallada. En aquest article formalitzem les condicions que asseguren la validesa d’aquesta versemblança simplificada. Així, precisem diferents condicions de censura no informativa i definim una condició de suma constant anàloga a la derivada en el context de censura per la dreta. També demostrem que les inferències obtingudes amb la versemblançaa simplificada són correctes quan aquestes condicions són certes. Finalment, tractem la identificabilitat de la funció distribució del temps de fallada a partir de la informació observada i estudiem la possibilitat de contrastar el compliment de la condició de suma constant.

Eliciting and fostering learners' metacognitive knowledge about language learning in self-directed learning programs: a review of data collection methods and procedures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Són molts els estudis que avui en dia incideixen en la necessitat d’oferir un suport metodològic i psicològic als aprenents que treballen de manera autònoma. L’objectiu d’aquest suport és ajudar-los a desenvolupar les destreses que necessiten per dirigir el seu aprenentatge així com una actitud positiva i una major conscienciació envers aquest aprenentatge. En definitiva, aquests dos tipus de preparació es consideren essencials per ajudar els aprenents a esdevenir més autònoms i més eficients en el seu propi aprenentatge. Malgrat això, si bé és freqüent trobar estudis que exemplifiquen aplicacions del suport metodològic dins els seus programes, principalment en la formació d’estratègies o ajudant els aprenents a desenvolupar un pla de treball, aquest no és el cas quan es tracta de la seva preparació psicològica. Amb rares excepcions, trobem estudis que documentin com s’incideix en les actituds i en les creences dels aprenents, també coneguts com a coneixement metacognitiu (CM), en programes que fomenten l’autonomia en l’aprenentatge. Els objectius d’aquest treball son dos: a) oferir una revisió d’estudis que han utilitzat diferents mitjans per incidir en el CM dels aprenents i b) descriure les febleses i avantatges dels procediments i instruments que utilitzen, tal com han estat valorats en estudis de recerca, ja que ens permetrà establir criteris objectius sobre com i quan utilitzar-los en programes que fomentin l’aprenentatge autodirigit.

What do voters know about the economy? a study of Danish data, 1990-1993

Relevância:

20.00% 20.00%

Publicador:

The Real economy and the perceived economy in popularity functions: how much do voters need to know? a study of British data, 1974-1997

Relevância:

20.00% 20.00%

Publicador:

Modeling usage of medical care services: the medical expenditure panel survey data, 1996-2000

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We explore the determinants of usage of six different types of health care services, using the Medical Expenditure Panel Survey data, years 1996-2000. We apply a number of models for univariate count data, including semiparametric, semi-nonparametric and finite mixture models. We find that the complexity of the model that is required to fit the data well depends upon the way in which the data is pooled across sexes and over time, and upon the characteristics of the usage measure. Pooling across time and sexes is almost always favored, but when more heterogeneous data is pooled it is often the case that a more complex statistical model is required.

Are one factor logarithmic volatility models useful to fit the features of financial data? An application to microsoft data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper provides empirical evidence that continuous time models with one factor of volatility, in some conditions, are able to fit the main characteristics of financial data. It also reports the importance of the feedback factor in capturing the strong volatility clustering of data, caused by a possible change in the pattern of volatility in the last part of the sample. We use the Efficient Method of Moments (EMM) by Gallant and Tauchen (1996) to estimate logarithmic models with one and two stochastic volatility factors (with and without feedback) and to select among them.

Human capital in growth regressions: how much difference does data quality make? An update and further results

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We construct estimates of educational attainment for a sample of OECD countries using previously unexploited sources. We follow a heuristic approach to obtain plausible time profiles for attainment levels by removing sharp breaks in the data that seem to reflect changes in classification criteria. We then construct indicators of the information content of our series and a number of previously available data sets and examine their performance in several growth specifications. We find a clear positive correlation between data quality and the size and significance of human capital coefficients in growth regressions. Using an extension of the classical errors in variables model, we construct a set of meta-estimates of the coefficient of years of schooling in an aggregate Cobb-Douglas production function. Our results suggest that, after correcting for measurement error bias, the value of this parameter is well above 0.50.

«
1
2
3
»