3 resultados para imputation
em Aston University Research Archive
Resumo:
Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
Herbal medicines:physician's recommendation and clinical evaluation of St.John's Wort for depression
Resumo:
Why some physicians recommend herbal medicines while others do not is not well understood. We undertook a survey designed to identify factors, which predict recommendation of herbal medicines by physicians in Malaysia. About a third (206 out of 626) of the physicians working at the University of Malaya Medical Centre ' were interviewed face-to-face, using a structured questionnaire. Physicians were asked about their personal use of, recommendation of, perceived interest in and, usefulness and safety of herbal medicines. Using logistic regression modelling we identified personal use, general interest, interest in receiving training, race and higher level of medical training as significant predictors of recommendation. St. John's wort is one of the most widely used herbal remedies. It is also probably the most widely evaluated herbal remedy with no fewer than 57 randomised controlled trials. Evidence from the depression trials suggests that St. John's wort is more effective than placebo while its comparative efficacy to conventional antidepressants is not well established. We updated previous meta-analyses of St. John's wort, described the characteristics of the included trials, applied methods of data imputation and transformation for incomplete trial data and examined sources of heterogeneity in the design and results of those trials. Thirty randomised controlled trials, which were heterogeneous in design, were identified. Our meta-analysis showed that St. John's wort was significantly more effective than placebo [pooled RR 1.90 (1.54-2.35)] and [Pooled WMD 4.09 (2.33 to 5.84)]. However, the remedy was similar to conventional antidepressant in its efficacy [Pooled RR I. 0 I (0.93 -1.10)] and [Pooled WMD 0.18 (- 0.66 to 1.02). Subgroup analyses of the placebo-controlled trials suggested that use of different diagnostic classifications at the inclusion stage led to different estimates of effect. Similarly a significant difference in the estimates of efficacy was observed when trials were categorised according to length of follow-up. Confounding between the variables, diagnostic classification and length of trial was shown by loglinear analysis. Despite extensive study, there is still no consensus on how effective St. lohn's wort is in depression. However, most experts would agree that it has some effect. Our meta-analysis highlights the problems associated with the clinical evaluation of herbal medicines when the active ingredients are poorly defined or unknown. The problem is compounded when the target disease (e.g. depression) is also difficult to define and different instruments are available to diagnose and evaluate it.
Resumo:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods can not directly be employed when dealing with missing data and they struggle to capture global non-linear structures in the data, however they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two dimensional principal components plot. The model can deal with uncertainty, missing data and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm resulting in better fit to the data and better imputation capabilities for missing data. Additionally an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further a novel approach, based on missing data, will be introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.