997 results for imputation hedonic method


Abstract:

Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
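The underestimation described above can be seen in a few lines. The sketch below is not the modified estimator proposed in the abstract; it only applies the standard delete-one jackknife to a mean-imputed sample, assuming equal-probability sampling and a made-up sample in which `None` marks a non-respondent.

```python
from statistics import mean

def jackknife_variance_of_mean(values):
    """Standard delete-one jackknife variance estimate for the sample mean."""
    n = len(values)
    total = sum(values)
    # Mean of the sample with each observation left out in turn.
    leave_one_out = [(total - v) / (n - 1) for v in values]
    centre = mean(leave_one_out)
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)

def mean_impute(values):
    """Replace each missing item (None) with the mean of the observed items."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical survey item with three non-respondents.
sample = [12.0, None, 15.0, 9.0, None, 14.0, 11.0, None, 10.0, 13.0]
respondents = [v for v in sample if v is not None]

# Treating imputed values as if they were observed.
naive = jackknife_variance_of_mean(mean_impute(sample))
# Using only the values that were actually observed.
respondent_only = jackknife_variance_of_mean(respondents)

print(f"naive (imputed treated as real): {naive:.4f}")
print(f"respondents only:                {respondent_only:.4f}")
```

The imputed values sit exactly at the respondent mean, so they add to the apparent sample size without adding any spread, which is why the naive figure comes out smaller.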

Abstract:

Taxonomic free sorting (TFS) is a fast, reliable and new technique in sensory science. The method extends the typical free sorting task where stimuli are grouped according to similarities, by asking respondents to combine their groups two at a time to produce a hierarchy. Previously, TFS has been used for the visual assessment of packaging whereas this study extends the range of potential uses of the technique to incorporate full sensory analysis by the target consumer, which, when combined with hedonic liking scores, was used to generate a novel preference map. Furthermore, to fully evaluate the efficacy of using the sorting method, the technique was evaluated with a healthy older adult consumer group. Participants sorted eight products into groups and described their reason at each stage as they combined those groups, producing a consumer-specific vocabulary. This vocabulary was combined with hedonic data from a separate group of older adults, to give the external preference map. Taxonomic sorting is a simple, fast and effective method for use with older adults, and its combination with liking data can yield a preference map constructed entirely from target consumer data.

Abstract:

In the context of collaborative filtering (CF), the well-known data sparsity issue leaves even like-minded users with little measured similarity, and consequently renders the k nearest neighbour rule inapplicable. In this paper, we address the data sparsity problem in neighbourhood-based CF methods by proposing an Adaptive-Maximum imputation method (AdaM). The basic idea is to identify an imputation area that maximizes the imputation benefit for recommendation purposes while minimizing the imputation error introduced. To achieve the maximum imputation benefit, the imputation area is determined from both the user and the item perspectives; to minimize the imputation error, at least one real rating is preserved for each item in the identified imputation area. A theoretical analysis is provided to prove that the proposed imputation method outperforms the conventional neighbourhood-based CF methods through more accurate neighbour identification. Experimental results on benchmark datasets show that the proposed method significantly outperforms other related state-of-the-art imputation-based methods in terms of accuracy.

Abstract:

Electronic Medical Records (EMR) are increasingly used for risk prediction. EMR analysis is complicated by missing entries, for two reasons: the “primary reason for admission” is included in the EMR while the co-morbidities (other chronic diseases) are left uncoded, and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data: unlike many other datasets, EMR data is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values, and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI; 2652 admissions). Our results show that the AUC for 3-month admission prediction is improved significantly, from 0.741 to 0.786 for the Cancer data and from 0.678 to 0.724 for the AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and hospital admissions over 3-, 6- and 12-month periods) in an integrated framework. For this model, the AUC averaged over outcomes is improved significantly, from 0.768 to 0.806 for the Cancer data and from 0.685 to 0.748 for the AMI data.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Abstract:

To help cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification, we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could be accurately imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population, with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed, and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset.

Abstract:

The aim of this paper is to analyze the determining factors for the pricing of handsets sold with service plans, using the hedonic price method. This was undertaken by building a database comprising 48 handset models, under nine different service plans, over a period of 53 weeks in 2008, which resulted in 27 different attributes and a total of nearly 300,000 data records. The results suggest that the value of the monthly subscription and the number of calling minutes are important in explaining the prices of handsets. Furthermore, both the physical volume of the handset and the number of megapixels of its camera had an effect on prices: the bigger the handset, the cheaper it is, and the more megapixels a camera phone has, the more expensive it is. Additionally, it was found that in 2008 Brazilian phone companies were subsidizing data-enabled handsets.
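As a rough illustration of the hedonic price method, the sketch below regresses log price on two of the attributes the abstract mentions (handset volume and camera megapixels) by ordinary least squares. The data, the coefficient values, and the log-linear functional form are assumptions made for the example; none of them comes from the paper.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def ols(design, response):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical handsets: (volume in cm^3, camera megapixels).
handsets = [(80.0, 2.0), (100.0, 3.2), (120.0, 1.3), (90.0, 5.0), (110.0, 2.0)]
# Made-up prices generated from assumed implicit attribute prices.
prices = [math.exp(5.0 - 0.002 * vol + 0.1 * mp) for vol, mp in handsets]

design = [[1.0, vol, mp] for vol, mp in handsets]
log_prices = [math.log(p) for p in prices]
intercept, b_volume, b_megapixels = ols(design, log_prices)

print(f"volume coefficient:     {b_volume:+.4f}")
print(f"megapixels coefficient: {b_megapixels:+.4f}")
```

In a log-linear specification each coefficient reads as the approximate proportional change in price per unit of the attribute, so a negative volume coefficient and a positive megapixel coefficient match the direction of the effects reported in the abstract.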

Abstract:

We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a linear spline model is assumed on the baseline hazard. We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerate distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicates for the unobserved covariates increases efficiency and reduces bias. We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.

Abstract:

The fuzzy min–max neural network classifier is a supervised learning method. This classifier takes the hybrid neural networks and fuzzy systems approach. All input variables in the network are required to correspond to continuously valued variables, and this can be a significant constraint in many real-world situations where there are not only quantitative but also categorical data. The usual way of dealing with this type of variable is to replace the categorical values with numerical ones and treat them as if they were continuously valued. However, this method implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides for greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data of this type of poll (the set of the respondents’ individual answers to the questions) are especially suited for evaluating the method since they include a large number of numerical and categorical attributes.

Abstract:

There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson’s fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.

Abstract:

In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables, each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case, omitting from the imputation process variables that are to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data. Copyright (C) 2004 John Wiley & Sons, Ltd.
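Multiple imputation produces one estimate per completed dataset, and these are usually combined with Rubin's rules. The sketch below shows only that pooling step; the three estimates and their variances are made-up numbers standing in for a mean body mass index estimated on three imputed datasets, not results from the study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimate from each of the m completed datasets
    variances: squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed datasets.
estimates = [26.1, 25.8, 26.4]
variances = [0.04, 0.05, 0.045]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled_estimate:.2f}")
print(f"total variance:  {total_variance:.3f}")
```

The between-imputation term is what carries the uncertainty due to the missing data; analysing a single imputed dataset as if it were complete would leave it out.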

Abstract:

Objectives: To estimate differences in self-rated health by mode of administration and to assess the value of multiple imputation to make self-rated health comparable for telephone and mail. Methods: In 1996, Survey 1 of the Australian Longitudinal Study on Women's Health was answered by mail. In 1998, 706 and 11,595 mid-age women answered Survey 2 by telephone and mail respectively. Self-rated health was measured by the physical and mental health scores of the SF-36. Mean changes in SF-36 scores between Surveys 1 and 2 were compared for telephone and mail respondents to Survey 2, before and after adjustment for socio-demographic and health characteristics. Missing values and SF-36 scores for telephone respondents at Survey 2 were imputed from SF-36 mail responses and telephone and mail responses to socio-demographic and health questions. Results: At Survey 2, self-rated health improved for telephone respondents but not mail respondents. After adjustment, mean changes in physical health and mental health scores remained higher (0.4 and 1.6 respectively) for telephone respondents compared with mail respondents (-1.2 and 0.1 respectively). Multiple imputation yielded adjusted changes in SF-36 scores that were similar for telephone and mail respondents. Conclusions and Implications: The effect of mode of administration on the change in mental health is important given that a difference of two points in SF-36 scores is accepted as clinically meaningful. Health evaluators should be aware of and adjust for the effects of mode of administration on self-rated health. Multiple imputation is one method that may be used to adjust SF-36 scores for mode of administration bias.

Abstract:

The country-product-dummy (CPD) method, originally proposed in Summers (1973), has recently been revisited in its weighted formulation to handle a variety of data related situations (Rao and Timmer, 2000, 2003; Heravi et al., 2001; Rao, 2001; Aten and Menezes, 2002; Heston and Aten, 2002; Deaton et al., 2004). The CPD method is also increasingly being used in the context of hedonic modelling instead of its original purpose of filling holes in Summers (1973). However, the CPD method is seen, among practitioners, as a black box due to its regression formulation. The main objective of the paper is to establish equivalence of purchasing power parities and international prices derived from the application of the weighted-CPD method with those arising out of the Rao-system for multilateral comparisons. A major implication of this result is that the weighted-CPD method would then be a natural method of aggregation at all levels of aggregation within the context of international comparisons.
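To make the regression formulation less of a black box, here is a minimal sketch of the unweighted CPD model, in which the log price of product j in country i is a country effect plus a product effect. It is not the weighted variant discussed in the abstract, and the countries, products, and prices are invented. The least-squares fit is obtained by alternating averages (backfitting) rather than by building dummy variables explicitly, which is an implementation choice for this example.

```python
import math

def fit_cpd(log_prices, n_countries, n_products, sweeps=500):
    """Fit ln p[i, j] = alpha[i] + beta[j] by alternating least squares.

    log_prices maps (country, product) to an observed log price; missing
    pairs are simply absent. Country 0 is the base (alpha[0] = 0).
    """
    alpha = [0.0] * n_countries
    beta = [0.0] * n_products
    for _ in range(sweeps):
        for j in range(n_products):
            obs = [y - alpha[c] for (c, p), y in log_prices.items() if p == j]
            beta[j] = sum(obs) / len(obs)
        for i in range(n_countries):
            obs = [y - beta[p] for (c, p), y in log_prices.items() if c == i]
            alpha[i] = sum(obs) / len(obs)
    base = alpha[0]
    return [a - base for a in alpha], [b + base for b in beta]

# Invented purchasing power parities and base-country product prices.
true_ppp = [1.0, 2.0, 0.5]
base_prices = [10.0, 4.0, 25.0, 7.0]
holes = {(1, 2), (2, 0)}  # price quotes that were not collected

observed = {
    (i, j): math.log(true_ppp[i] * base_prices[j])
    for i in range(3) for j in range(4) if (i, j) not in holes
}

alpha, beta = fit_cpd(observed, n_countries=3, n_products=4)
ppp = [math.exp(a) for a in alpha]
# Fill the hole for product 2 in country 1 from the fitted effects.
filled_price = math.exp(alpha[1] + beta[2])

print("estimated PPPs:", [round(v, 4) for v in ppp])
print("imputed price for country 1, product 2:", round(filled_price, 4))
```

Exponentiating the country effects gives purchasing power parities relative to the base country, and exponentiating a country effect plus a product effect fills a missing price, which is the hole-filling use the abstract attributes to Summers (1973).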