195 results for Missing values

in Deakin Research Online - Australia


Relevance:

70.00%

Abstract:

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success in dealing with missing values in data sets with homogeneous attributes (whose independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation: imputing missing data in data sets with heterogeneous attributes (whose independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications fall into this setting, no existing estimator is designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators for discrete and continuous missing target values, respectively. Then, a mixture-kernel-based iterative estimator is advocated to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against several typical algorithms, and the results demonstrate that the proposed approach outperforms these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
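The general idea of kernel-based imputation over mixed attributes can be sketched as below. A product kernel combines a Gaussian kernel on continuous attributes with a simple unordered discrete kernel (weight 1 for a match, `lam` for a mismatch); a continuous target is imputed by a kernel-weighted mean and a discrete target by a kernel-weighted vote, and the pass is repeated so that earlier imputations can serve as donors. This is an illustrative stand-in, not the paper's estimators: the kernel choice, the bandwidth `h`, the smoothing `lam`, the function names, and the assumption that only the target column has missing values are all assumptions of this sketch.

```python
import numpy as np


def mixture_kernel_weights(X_num, X_cat, i, h=1.0, lam=0.2):
    """Kernel weights of every row relative to row i (assumed kernel form)."""
    # Gaussian kernel on the continuous attributes; bandwidth h is an assumption
    d2 = np.sum((X_num - X_num[i]) ** 2, axis=1)
    k_num = np.exp(-d2 / (2.0 * h ** 2))
    # Unordered discrete kernel: factor 1 per matching attribute, lam per mismatch
    mismatches = np.sum(X_cat != X_cat[i], axis=1)
    k_cat = lam ** mismatches
    w = k_num * k_cat
    w[i] = 0.0  # a row never donates to itself
    return w


def impute_mixed(X_num, X_cat, target, is_discrete, n_iter=3, h=1.0, lam=0.2):
    """Iteratively impute NaNs in `target`; discrete classes are coded as numbers."""
    y = np.asarray(target, dtype=float).copy()
    missing = np.isnan(y)
    observed = y[~missing]
    # Initial fill: mode for a discrete target, mean for a continuous one
    if is_discrete:
        values, counts = np.unique(observed, return_counts=True)
        y[missing] = values[np.argmax(counts)]
    else:
        y[missing] = observed.mean()
    # Repeated passes: imputed values from earlier passes act as donors later
    for _ in range(n_iter):
        for i in np.flatnonzero(missing):
            w = mixture_kernel_weights(X_num, X_cat, i, h, lam)
            if w.sum() == 0.0:
                continue  # no informative neighbours; keep the current value
            if is_discrete:
                classes = np.unique(y)
                scores = [w[y == c].sum() for c in classes]
                y[i] = classes[int(np.argmax(scores))]
            else:
                y[i] = np.dot(w, y) / w.sum()
    return y


X_num = np.array([[0.0], [0.1], [5.0], [5.1], [0.05]])
X_cat = np.array([["a"], ["a"], ["b"], ["b"], ["a"]])
print(impute_mixed(X_num, X_cat, [1.0, 1.2, 10.0, 10.2, np.nan], is_discrete=False))
```

In the usage example the incomplete row sits next to the first two rows in both attribute types, so its imputed value is pulled toward their targets rather than toward the overall mean.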

Relevance:

60.00%

Abstract:

Various data classification algorithms have been developed and applied in many areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them performs consistently well across all datasets. To address this, ensemble methods have been suggested as a promising remedy. This paper proposes a novel hybrid algorithm that combines a multi-objective Genetic Algorithm (GA) with an ensemble classifier. The ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier, and a Support Vector Machine (SVM) classifier, serves as the classification committee, while the multi-objective Genetic Algorithm is employed as the feature selector, helping the ensemble classifier improve overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets and compared with each individual classifier as well as with methods based on mutual information theory, bagging, and boosting. The results suggest that the GA-Ensemble method outperforms the compared algorithms and is a useful method for classification and feature selection problems.

Relevance:

60.00%

Abstract:

Background
Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm-based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and are then combined with the minority class to form a balanced dataset.

Results
One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also demonstrate good generalization across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system.

Conclusion
The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly across datasets, the proposed hybrid system generalizes better, which alleviates the method-data dependency problem. From a biological perspective, the system flags the highly ranked samples for further investigation, which may lead to the discovery of new conditions or disease subtypes.

Relevance:

60.00%

Abstract:

Objective To develop and evaluate the effectiveness of a community behavioural intervention to prevent weight gain and improve health related behaviours in women with young children.
Design Cluster randomised controlled trial.
Setting A community setting in urban Australia. 
Participants 250 adult women with a mean age of 40.39 years (SD 4.77, range 25-51) and a mean body mass index of 27.82 kg/m² (SD 5.42, range 18-47) were recruited as clusters through 12 primary (elementary) schools.
Intervention Schools were randomly assigned to the intervention or the control group. Mothers whose schools fell in the intervention group (n=127) attended four interactive group sessions that involved simple health messages, behaviour change strategies, and group discussion, and received monthly support using mobile telephone text messages for 12 months. The control group (n=123) attended one non-interactive information session based on population dietary and physical activity guidelines.
Main outcome measures The main outcome measures were weight change and difference in weight change between the intervention group and the control group at 12 months. Secondary outcomes were changes in serum concentrations of fasting lipids and glucose, and changes in dietary behaviours, physical activity, and self management behaviours.
Results All analyses were adjusted for baseline values and the possible clustering effect. Women in the control group gained weight over the 12 month study period (0.83 kg, 95% confidence interval (CI) 0.12 to 1.54), whereas those in the intervention group lost weight (−0.20 kg, −0.90 to 0.49). The difference in weight change between the intervention group and the control group at 12 months was −1.13 kg (−2.03 to −0.24 kg; P<0.05) on the basis of observed values and −1.11 kg (−2.17 to −0.04) after multiple imputation to account for possible bias created by missing values. Secondary analyses after multiple imputation showed a difference in the intervention group compared with the control group for total cholesterol concentration (−0.35 mmol/l, −0.70 to −0.001), self management behaviours (diet score 0.18, 0.13 to 0.33; physical activity score 0.24, 0.05 to 0.43), and confidence to control weight (0.40, 0.11 to 0.69). Regular self weighing was associated with weight loss in the intervention group only (−1.98 kg, −3.75 to −0.23).
Conclusions Weight gain in women with young children could be prevented using a low intensity self management intervention delivered in a community setting. Self management of health behaviours improved with the intervention. The response rate of 12%, although comparable with that in other community studies, might limit the ability to generalise to other populations.    
Trial registration Australian New Zealand Clinical Trials Registry number ACTRN12608000110381.
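The trial above reports estimates "after multiple imputation". The abstract does not say how the imputed analyses were pooled; the conventional approach is Rubin's rules, sketched below. The function name `pool_rubin` and the numbers in the usage line are illustrative assumptions, not values from the trial.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors
    with Rubin's rules; returns (pooled estimate, total variance, SE)."""
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputed datasets")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, t, math.sqrt(t)


# Illustrative numbers only: three imputed analyses of the same effect
est, total_var, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(total_var, 4), round(se, 4))
```

The (1 + 1/m) factor inflates the between-imputation spread to account for using a finite number of imputations, which is why the pooled standard error is wider than any single analysis would suggest.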

Relevance:

60.00%

Abstract:

As each user tends to rate only a small proportion of the available items, the resulting data sparsity brings significant challenges to recommender systems research. The issue is even more severe for neighborhood-based collaborative filtering methods, because even fewer ratings are available in the neighborhood of the query item. In this paper, we aim to address data sparsity in the context of neighborhood-based collaborative filtering. Given a (user, item) query, a set of key ratings is identified, and an auto-adaptive imputation method is proposed to fill in the missing values in this set of key ratings. The proposed method can be used with any similarity metric, such as the Pearson Correlation Coefficient or cosine-based similarity, and it is theoretically guaranteed to outperform the neighborhood-based collaborative filtering approaches. Experimental results show that the proposed method can significantly improve the accuracy of recommendations for neighborhood-based collaborative filtering algorithms. © 2012 ACM.
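For context, the baseline that the paper's imputation step is designed to improve, user-based neighborhood collaborative filtering with Pearson similarity, can be sketched as follows. This is not the authors' auto-adaptive imputation: missing ratings are simply encoded as NaN and skipped, and the function names and the neighborhood size `k` are assumptions of this sketch.

```python
import numpy as np


def pearson(u, v):
    """Pearson correlation over the items both users have rated (NaN = unrated)."""
    co_rated = ~np.isnan(u) & ~np.isnan(v)
    if co_rated.sum() < 2:
        return 0.0  # too few co-rated items: treat as no evidence of similarity
    a = u[co_rated] - u[co_rated].mean()
    b = v[co_rated] - v[co_rated].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def predict(R, user, item, k=2):
    """Mean-centred user-based kNN prediction of R[user, item]."""
    mu_u = np.nanmean(R[user])
    # Candidate neighbours: other users who have rated the query item
    candidates = [
        (pearson(R[user], R[v]), v)
        for v in range(R.shape[0])
        if v != user and not np.isnan(R[v, item])
    ]
    candidates.sort(reverse=True)
    neighbours = [(s, v) for s, v in candidates[:k] if s > 0]
    if not neighbours:
        return mu_u  # fall back to the user's own mean rating
    num = sum(s * (R[v, item] - np.nanmean(R[v])) for s, v in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return mu_u + num / den


nan = np.nan
R = np.array([[5, 3, nan, 1],
              [4, nan, 4, 1],
              [1, 1, nan, 5],
              [5, 4, 5, nan]], dtype=float)
print(predict(R, user=0, item=2))
```

With sparse data the co-rated sets used by `pearson` shrink quickly, which is exactly the weakness the abstract targets by imputing the key ratings before similarities and predictions are computed.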

Relevance:

60.00%

Abstract:

Background: This study describes and compares the health-related quality of life (HRQOL) of prostate cancer patients who received either radical prostatectomy (nerve-sparing, nsRP, or non-nerve-sparing, nnsRP) or radiotherapy (external RT, brachytherapy, or both combined) for treatment of localised prostate cancer. Methods: The prospective, multicenter cohort study included 529 patients. Questionnaires included the IIEF, QLQ-C30, and PORPUS-P. Data were collected before treatment (baseline) and three, six, twelve, and twenty-four months after treatment. Differences between the groups' baseline characteristics were assessed; changes over time were analysed with generalised estimating equations (GEE). Missing values were treated with multiple imputation. Further, scores at baseline and at the end of follow-up were compared to German reference data. Results: The typical time trend was a decrease in average HRQOL three months after treatment, followed by (partial) recovery. RP patients experienced considerable impairment in sexual functioning. The covariate-adjusted GEE identified a significant, but not clinically relevant, treatment effect for diarrhoea (b = 7.0 for RT, p = 0.006) and PORPUS-P (b = 2.3 for nsRP, b = 2.2 for RT, p = 0.045) compared to the reference nnsRP. Most of the HRQOL scores were comparable to German norm values. Conclusions: Findings from previous research were reproduced in a specific patient cohort within the German health care system. In line with the principle of evidence-based medicine, this strengthens the messages regarding prostate cancer treatment and its impact on patients' health-related quality of life. After adjustment for baseline HRQOL and other covariates, RT patients reported increased symptoms of diarrhoea, and nnsRP patients reported lower prostate-specific HRQOL. RP patients experienced considerable impairment in sexual functioning. Physicians should take these differences into account when choosing the best therapy for a patient.

Relevance:

60.00%

Abstract:

The bulk of existing work on the statistical forecasting of air quality is based on either neural networks or linear regressions, both of which are subject to important drawbacks. In particular, while neural networks are complicated and prone to in-sample overfitting, linear regressions are highly dependent on the specification of the regression function. The present paper shows how combining linear regression forecasts can be used to circumvent both of these problems. The usefulness of the proposed combination approach is verified using both Monte Carlo simulation and an extensive application to air quality in Bogota, one of the largest and most polluted cities in Latin America. © 2014 Elsevier Ltd.
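The idea of combining linear-regression forecasts can be illustrated with the simplest scheme: an equal-weight average of forecasts from regressions fitted on different regressor subsets, so that no single specification has to be correct. The abstract does not specify the weighting the authors use; equal weights, the helper names, and the `specs` argument (a list of column-index lists, one per candidate specification) are assumptions of this sketch.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Apply fitted OLS coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def combined_forecast(X_train, y_train, X_new, specs):
    """Equal-weight average of forecasts from one OLS model per specification.
    `specs` is a list of column-index lists (hypothetical interface)."""
    forecasts = [
        predict_ols(fit_ols(X_train[:, cols], y_train), X_new[:, cols])
        for cols in specs
    ]
    return np.mean(forecasts, axis=0)


X_train = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]], dtype=float)
y_train = 2 + 3 * X_train[:, 0]
result = combined_forecast(X_train, y_train, np.array([[5.0, 0.0]]), specs=[[0], [0, 1]])
print(result)
```

Averaging trades a little efficiency under the true specification for robustness when the regression function is misspecified; weighted schemes (for example, weights estimated from out-of-sample errors) are a natural extension.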

Relevance:

60.00%

Abstract:

Electronic Medical Records (EMR) are increasingly used for risk prediction. EMR analysis is complicated by missing entries, for two reasons: the “primary reason for admission” is included in the EMR, but the co-morbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data: unlike many other datasets, EMR data are sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values, and we use the new representation to predict key hospital events. To fill in missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions, reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on block coordinate descent to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI; 2652 admissions). Our results show that the AUC for 3-month admission prediction improves significantly, from 0.741 to 0.786 for the Cancer data and from 0.678 to 0.724 for the AMI data. We also extend the proposed method to a supervised model that predicts multiple related risk outcomes (e.g. emergency presentations and hospital admissions over 3-, 6- and 12-month periods) in an integrated framework. For this model, the AUC averaged over outcomes improves significantly, from 0.768 to 0.806 for the Cancer data and from 0.685 to 0.748 for the AMI data.
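Filling missing entries with a product of two low-rank factors, fitted by block coordinate descent, can be sketched as alternating ridge regressions: each block (the row factors, then the column factors) is solved exactly while the other is held fixed. This sketch uses a plain L2 penalty and does not implement the sparsity-preserving product regularization the abstract describes; `als_impute`, `rank`, and `lam` are hypothetical names, and `mask` marks observed entries.

```python
import numpy as np


def als_impute(M, mask, rank=2, lam=0.1, n_iter=50, seed=0):
    """Fill entries of M where mask is False with U @ V.T fitted on observed entries."""
    rng = np.random.default_rng(seed)
    n, p = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(p, rank))
    ridge = lam * np.eye(rank)
    for _ in range(n_iter):
        # Block 1: with V fixed, each row of U is a ridge regression on its observed columns
        for i in range(n):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ M[i, obs])
        # Block 2: with U fixed, each row of V is a ridge regression on its observed rows
        for j in range(p):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ M[obs, j])
    # Keep observed values; use the low-rank reconstruction only where data are missing
    filled = np.where(mask, M, U @ V.T)
    return filled, U, V


a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 3.0, 1.0])
M_true = np.outer(a, b)                 # exactly rank 1
mask = np.ones_like(M_true, dtype=bool)
mask[0, 1] = mask[3, 2] = False         # hide two entries
M_obs = np.where(mask, M_true, np.nan)
filled, U, V = als_impute(M_obs, mask, rank=1, lam=1e-3, n_iter=200)
print(filled[0, 1], filled[3, 2])
```

Each block update is a closed-form least-squares solve, so the objective cannot increase between sweeps; this is what makes block coordinate descent attractive for large, mostly empty matrices.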

Relevance:

60.00%

Abstract:

In this paper, we tackle the incompleteness of user rating histories in the context of collaborative filtering for Top-N recommendations. Previous research ignores the fact that two rating patterns exist in the user × item rating matrix and influence each other. More importantly, their interactive influence characterizes how each pattern develops, and can therefore be exploited to improve the modelling of rating patterns, especially when the user × item rating matrix is highly incomplete due to the well-known data sparsity issue. This paper proposes a Rating Pattern Subspace to iteratively re-optimize the missing values in each user's rating history by modelling both the global and the personal rating patterns simultaneously. The basic idea is to project the user × item rating matrix onto a low-rank subspace to capture the global rating patterns. Then, the projection of each individual user onto the subspace is further optimized according to his/her own rating history and the captured global rating patterns. Finally, the optimized user projections are used to improve the modelling of the global rating patterns. Based on this subspace, we propose the RapSVD-L algorithm for Top-N recommendations. In the experiments, the proposed method is compared with state-of-the-art Top-N recommendation methods on two real datasets under various data sparsity levels. The experimental results show that RapSVD-L outperforms the compared algorithms in accuracy, not only for recommendations over all items but also for long-tail item recommendations.
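Projecting an incomplete rating matrix onto a low-rank subspace and re-estimating the missing entries iteratively can be illustrated with plain iterative truncated-SVD imputation (sometimes called hard impute). This is a generic technique, not RapSVD-L: it models only the global pattern, without the per-user re-optimization step, and the function name and defaults are assumptions of this sketch. It also assumes every column has at least one observed rating.

```python
import numpy as np


def iterative_svd_impute(R, rank=2, n_iter=100, tol=1e-6):
    """Alternate between a rank-`rank` SVD projection and refilling NaN entries."""
    missing = np.isnan(R)
    # Start from column (item) means for the missing entries
    X = np.where(missing, np.nanmean(R, axis=0), R)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # project onto the subspace
        X_new = np.where(missing, low_rank, R)            # observed ratings stay fixed
        converged = np.linalg.norm(X_new - X) < tol * max(1.0, np.linalg.norm(X))
        X = X_new
        if converged:
            break
    return X


R = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 1.0])
R[0, 1] = np.nan
R[3, 2] = np.nan
X = iterative_svd_impute(R, rank=1, n_iter=1000, tol=1e-10)
print(X[0, 1], X[3, 2])
```

Each pass can only reduce the squared error on the observed entries for the chosen rank, so the refilled values gradually settle on estimates consistent with the global low-rank structure.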

Relevance:

60.00%

Abstract:

Snapper (Pagrus auratus) is widely distributed throughout subtropical and temperate southern oceans and forms a significant recreational and commercial fishery in Queensland, Australia. Using data from government reports, media sources, popular publications and a government fisheries survey carried out in 1910, we compiled information on individual snapper fishing trips that took place before the commencement of organized fishery-wide data collection, from 1871 to 1939. In addition to extracting all available quantitative data, we translated qualitative information into bounded estimates and used multiple imputation to handle missing values, forming 287 records from which a catch rate (snapper fisher⁻¹ h⁻¹) could be derived. Uncertainty was handled through a parametric maximum likelihood framework (a transformed trivariate Gaussian), which facilitated statistical comparisons between data sources. No statistically significant differences in catch rates were found between the media sources and the government fisheries survey. Catch rates remained stable throughout the time series, averaging 3.75 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 3.42–4.09) as the fishery expanded into new grounds. In comparison, a contemporary (1993–2002) south-east Queensland charter fishery produced an average catch rate of 0.4 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 0.31–0.58). These data illustrate the productivity of a fishery during its earliest years of development and represent the earliest catch rate data globally for this species. By adopting a formalized approach to issues common to many historical records (missing data, a lack of quantitative information and reporting bias), our analysis demonstrates the potential for historical narratives to contribute to contemporary fisheries management.

Relevance:

20.00%

Abstract:

Australia's country non-daily newspapers present journalism graduates with excellent opportunities to get a foot in the door, experience a wide range of journalistic responsibilities and compile an impressive portfolio. However, tertiary journalism courses largely ignore the unique news values, issues and challenges involved in country non-daily reporting. Given that a large percentage of future journalists are likely to enter the industry at a country non-daily, journalism education's current attitude has serious implications for the profession. Yet this situation cannot be rectified until these specific news values, issues and challenges have been documented so that they can be integrated into pedagogical models. This article documents the country non-daily's news values, issues and challenges, and indicates their importance to journalism training and education.

Relevance:

20.00%

Abstract:

In February 2000, the new Victorian Labor Government announced that it was removing the western shard from Lab Architecture Studio's winning design for Melbourne's Federation Square, then under construction. The specific 'contested terrain' at the intersection of Flinders Street and Swanston Street Walk, Melbourne, allows the exploration of the politics of place construction over time, through an examination of Young and Jackson's Hotel [1861], St Paul's Cathedral [1886], Flinders Street Station [1912], the Westin Hotel [1999] and Federation Square. This site brings together architectural and social history, questions public space and identity, and looks at Melburnians' perceptions, attitudes and values. It further demonstrates that the fragmentation of the professions, and fragmentary histories, lead to the preservation of 'bits' of architecture and the destruction of the urban/landscape context, jeopardizing the identity of place.