108 results for Random Forests Classifier
Abstract:
Background: Ineffective risk stratification can delay diagnosis of serious disease in patients with hematuria. We applied a systems biology approach to analyze clinical, demographic and biomarker measurements (n = 29) collected from 157 hematuric patients: 80 urothelial cancer (UC) and 77 controls with confounding pathologies.
Methods: On the basis of biomarkers, we conducted agglomerative hierarchical clustering to identify patient and biomarker clusters. We then explored the relationship between the patient clusters and clinical characteristics using Chi-square analyses. We determined classification errors and areas under the receiver operating characteristic curve of Random Forest Classifiers (RFC) for patient subpopulations using the biomarker clusters to reduce the dimensionality of the data.
Results: Agglomerative clustering identified five patient clusters and seven biomarker clusters. Final diagnosis categories were non-randomly distributed across the five patient clusters. In addition, two of the patient clusters were enriched with patients with ‘low cancer-risk’ characteristics. The biomarkers which contributed to the diagnostic classifiers for these two patient clusters were similar. In contrast, three of the patient clusters were significantly enriched with patients harboring ‘high cancer-risk’ characteristics including proteinuria, aggressive pathological stage and grade, and malignant cytology. Patients in these three clusters included controls, that is, patients with other serious disease and patients with cancers other than UC. Biomarkers which contributed to the diagnostic classifiers for the largest ‘high cancer-risk’ cluster were different from those contributing to the classifiers for the ‘low cancer-risk’ clusters. Biomarkers which contributed to subpopulations that were split according to smoking status, gender and medication were different.
Conclusions: The systems biology approach applied in this study allowed the hematuric patients to cluster naturally on the basis of the heterogeneity within their biomarker data, into five distinct risk subpopulations. Our findings highlight an approach with the promise to unlock the potential of biomarkers. This will be especially valuable in the field of diagnostic bladder cancer where biomarkers are urgently required. Clinicians could interpret risk classification scores in the context of clinical parameters at the time of triage. This could reduce cystoscopies and enable priority diagnosis of aggressive diseases, leading to improved patient outcomes at reduced costs. © 2013 Emmert-Streib et al; licensee BioMed Central Ltd.
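The two evaluation measures named in the Methods, classification error and area under the ROC curve, can be illustrated with a minimal Python sketch. The labels and scores below are made-up placeholders, not data from the study, and the 0.5 decision threshold is an assumption; the AUC is computed by plain pairwise comparison of cases against controls.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count as 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def classification_error(labels, scores, threshold=0.5):
    """Fraction of patients misclassified when score >= threshold means 'case'."""
    wrong = sum((s >= threshold) != (y == 1) for y, s in zip(labels, scores))
    return wrong / len(labels)


# Hypothetical classifier scores for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
err = classification_error(labels, scores)
print(f"AUC = {auc:.3f}, classification error = {err:.3f}")
```

Computing both measures separately for each patient subpopulation would allow the kind of per-cluster comparison the abstract describes.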
Abstract:
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
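The trade-off quoted above, a missed detection rate at a fixed false positive rate, can be sketched in a few lines of Python. This is only an illustration of how such a figure is read off classifier scores: the function name, the scores and the label convention (1 = real, 0 = bogus) are assumptions, and the numbers are invented rather than taken from the survey.

```python
def missed_detection_rate_at_fpr(labels, scores, max_fpr=0.01):
    """Pick the lowest threshold whose false positive rate stays within
    max_fpr, then report the fraction of real detections rejected.

    A candidate is accepted as real when its score is strictly above the
    threshold. Labels: 1 = real transient, 0 = bogus detection.
    """
    real = [s for y, s in zip(labels, scores) if y == 1]
    bogus = sorted((s for y, s in zip(labels, scores) if y == 0), reverse=True)
    if not real or not bogus:
        raise ValueError("need both real and bogus examples")
    # Number of bogus detections we are willing to let through.
    allowed = int(max_fpr * len(bogus) + 1e-9)
    # Threshold sits at the first bogus score that must be rejected.
    threshold = bogus[allowed] if allowed < len(bogus) else float("-inf")
    missed = sum(s <= threshold for s in real)
    return threshold, missed / len(real)


# Hypothetical held-out set: 100 bogus detections and 10 real transients.
bogus_scores = [0.9, 0.7] + [0.1] * 98
real_scores = [0.95, 0.92, 0.85, 0.80, 0.75, 0.72, 0.71, 0.65, 0.60, 0.50]
labels = [0] * len(bogus_scores) + [1] * len(real_scores)
scores = bogus_scores + real_scores

threshold, mdr = missed_detection_rate_at_fpr(labels, scores, max_fpr=0.01)
print(f"threshold = {threshold}, missed detection rate = {mdr:.2f}")
```

Relabelling some of the "missed" examples, as the abstract describes for bright star variability and nuclear transients, would change the `labels` list and hence the estimated rate.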
Abstract:
Data from a large-scale contingent valuation study are used to investigate the effects of forest attributes on willingness to pay for forest recreation in Ireland. In particular, the presence of a nature reserve in the forest is found to significantly increase the visitors' willingness to pay. A random utility model is used to estimate the welfare change associated with the creation of nature reserves in all the Irish forests currently without one. The impact on visitors' economic welfare of new nature reserves approaches half a million pounds per annum, exclusive of non-recreational values. © 2000 Elsevier Science B.V. All rights reserved.
Abstract:
We present TANC, a tree-augmented naive (TAN) classifier based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between EDM and the global Imprecise Dirichlet Model using the naive credal classifier (NCC), with the aim of showing that EDM is a sensible approximation of the global IDM. TANC is able to deal with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), while avoiding an exponential increase of the computational time. By experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it provides better performance compared to previous TANs based on imprecise probabilities. Yet, TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel algorithms for learning the TAN structures, better suited for an imprecise probability classifier.
Abstract:
In this paper we present TANC, i.e., a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM), which considerably simplifies the computation of upper and lower probabilities; yet, having been only recently introduced, the quality of the provided approximation still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, by experiments we show that TANC is more reliable than the precise TAN (learned with uniform prior), and also that it provides better performance compared to a previous (Zaffalon, 2003) TAN model based on imprecise probabilities. TANC treats missing data by considering all possible completions of the training set, but avoiding an exponential increase of the computational times; finally, we present some preliminary results with missing data.
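To make "upper and lower probabilities" concrete, here is a minimal Python sketch of the standard Imprecise Dirichlet Model bounds for a single categorical variable: with count n for a class, N observations in total and hyperparameter s, the lower probability is n / (N + s) and the upper is (n + s) / (N + s). It is not the TANC or EDM algorithm from the paper. The counts, the choice s = 2 and the use of simple interval dominance to pick the returned classes are all assumptions made for illustration; credal classifiers typically use a finer dominance test.

```python
def idm_bounds(counts, s=2.0):
    """Lower and upper class probabilities under the Imprecise Dirichlet Model.

    counts: dict mapping class name to observed count.
    s: IDM hyperparameter controlling the degree of imprecision (assumed value).
    """
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s)) for c, n in counts.items()}


def non_dominated(bounds):
    """Classes not interval-dominated: a class is dropped only if some other
    class has a lower probability above this class's upper probability."""
    keep = set()
    for c, (_, upper) in bounds.items():
        if not any(lo > upper for d, (lo, _) in bounds.items() if d != c):
            keep.add(c)
    return keep


# Hypothetical class counts.
clear_case = idm_bounds({"a": 6, "b": 3, "c": 1})
close_case = idm_bounds({"a": 4, "b": 3})

print(non_dominated(clear_case))  # one class survives
print(non_dominated(close_case))  # both classes survive: the data cannot separate them
```

Returning more than one class when the intervals overlap is the conservative behaviour that distinguishes a credal classifier from a precise one.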
Abstract:
Human activity has undoubtedly had a major impact on Holocene forested ecosystems, with the concurrent expansion of plants and animals associated with cleared landscapes and pasture, also known as 'culture-steppe'. However, this anthropogenic perspective may have underestimated the contribution of autogenic disturbance (e.g. wind-throw, fire), or a mixture of autogenic and anthropogenic processes, within early Holocene forests. Entomologists have long argued that the north European primary forest was probably similar in structure to pasture woodland. This idea has received support from the conservation biologist Frans Vera, who has recently strongly argued that the role of large herbivores in maintaining open forests in the primeval landscapes of Europe has been seriously underestimated. This paper reviews this debate from a fossil invertebrate perspective and looks at several early Holocene insect assemblages. Although wood taxa are indeed important during this period, species typical of open areas and grassland and dung beetles, usually associated with the dung of grazing animals, are persistent presences in many early woodland faunas. We also suggest that fire and other natural disturbance agents appear to have played an important ecological role in some of these forests, maintaining open areas and creating open vegetation islands within these systems. More work, however, is required to ascertain the role of grazing animals, but we conclude that fossil insects have a significant contribution to make to this debate. This evidence has fundamental implications in terms of how the palaeoecological record is interpreted, particularly by environmental archaeologists and palaeoecologists who may be more interested in identifying human-environment interactions rather than the ecological processes which may be preserved within palaeoecological records.
Abstract:
This research was published in the foremost international journal in information theory and shows the interplay between complex random matrices and multiantenna information theory. Dr T. Ratnarajah is a leader in this area of research, and his work has contributed to the development of graduate curricula (course reader) at the Massachusetts Institute of Technology (MIT), USA, by Professor Alan Edelman. The course name is "The Mathematics and Applications of Random Matrices"; see http://web.mit.edu/18.338/www/projects.html
Abstract:
We suggest a theoretical scheme for the simulation of quantum random walks on a line using beam splitters, phase shifters, and photodetectors. Our model enables us to simulate a quantum random walk using the wave nature of classical light fields. Furthermore, the proposed setup allows the analysis of the effects of decoherence. The transition from a pure mean-photon-number distribution to a classical one is studied by varying the decoherence parameters.
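As a numerical companion, the following Python sketch simulates a textbook discrete-time quantum walk on a line with a Hadamard coin. It does not model the optical scheme of the paper (beam splitters, phase shifters, photodetectors) or decoherence; the coin choice, the initial coin state and the step count are assumptions. It only shows the feature that separates the quantum from the classical walk: the position spread grows linearly with the number of steps rather than with its square root.

```python
import math


def hadamard_walk(steps):
    """Position probabilities after `steps` of a Hadamard-coined quantum walk.

    Index x in the returned list corresponds to position x - steps.
    """
    size = 2 * steps + 1
    left = [0j] * size    # amplitudes with coin state "move left"
    right = [0j] * size   # amplitudes with coin state "move right"
    # Balanced initial coin state (|L> + i|R>) / sqrt(2) at the origin (assumed).
    left[steps] = 1 / math.sqrt(2)
    right[steps] = 1j / math.sqrt(2)
    h = 1 / math.sqrt(2)
    for _ in range(steps):
        new_left = [0j] * size
        new_right = [0j] * size
        for x in range(size):
            a, b = left[x], right[x]
            if a == 0 and b == 0:
                continue  # nothing to propagate from this site
            # Hadamard coin flip, then a conditional shift by one site.
            new_left[x - 1] += h * (a + b)
            new_right[x + 1] += h * (a - b)
        left, right = new_left, new_right
    return [abs(l) ** 2 + abs(r) ** 2 for l, r in zip(left, right)]


steps = 50
probs = hadamard_walk(steps)
positions = range(-steps, steps + 1)
mean = sum(p * x for p, x in zip(probs, positions))
std = math.sqrt(sum(p * (x - mean) ** 2 for p, x in zip(probs, positions)))

print(f"total probability = {sum(probs):.6f}")
print(f"quantum std = {std:.2f}, classical std = {math.sqrt(steps):.2f}")
```

A decoherence study along the lines of the abstract would add random measurement or dephasing between steps and watch the spread fall back toward the classical value.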
Abstract:
It is shown how the fractional probability density diffusion equation for the diffusion limit of one-dimensional continuous time random walks may be derived from a generalized Markovian Chapman-Kolmogorov equation. The non-Markovian behaviour is incorporated into the Markovian Chapman-Kolmogorov equation by postulating a Lévy-like distribution of waiting times as a kernel. The Chapman-Kolmogorov equation so generalized then takes on the form of a convolution integral. The dependence on the initial conditions typical of a non-Markovian process is treated by adding a time-dependent term involving the survival probability to the convolution integral. In the diffusion limit these two assumptions about the past history of the process are sufficient to reproduce anomalous diffusion and relaxation behaviour of the Cole-Cole type. The Green function in the diffusion limit is calculated using the fact that the characteristic function is the Mittag-Leffler function. Fourier inversion of the characteristic function yields the Green function in terms of a Wright function. The moments of the distribution function are evaluated from the Mittag-Leffler function using the properties of characteristic functions and a relation between the powers of the second moment and higher order even moments is derived. © 2004 Elsevier B.V. All rights reserved.
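Since the Mittag-Leffler function carries most of the results above, a minimal Python sketch of its defining power series, E_α(z) = Σ_k z^k / Γ(αk + 1), may help. This is a naive truncated series, adequate only for moderate |z|; the truncation length and the sample arguments are assumptions, and it is not the method of the paper. The two printed checks use the known special cases E_1(z) = exp(z) and E_2(−z²) = cos(z).

```python
import math


def mittag_leffler(alpha, z, terms=60):
    """Truncated power series of the one-parameter Mittag-Leffler function.

    E_alpha(z) = sum_{k>=0} z**k / Gamma(alpha * k + 1).
    Suitable for moderate |z| only; large negative z suffers from cancellation.
    """
    total = 0.0
    for k in range(terms):
        arg = alpha * k + 1
        if arg > 170:  # math.gamma overflows a float just above this
            break
        total += z ** k / math.gamma(arg)
    return total


# Special cases that reduce to elementary functions.
check_exp = mittag_leffler(1.0, -1.0)   # should equal exp(-1)
check_cos = mittag_leffler(2.0, -1.0)   # should equal cos(1)
# A fractional order between 0 and 1, as in anomalous relaxation.
slow_decay = mittag_leffler(0.5, -1.0)

print(check_exp, math.exp(-1.0))
print(check_cos, math.cos(1.0))
print(slow_decay)
```

For 0 < α < 1 and negative argument the function decays more slowly than an exponential at long times, which is the qualitative signature of the anomalous relaxation mentioned in the abstract.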
Abstract:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of the K-NN rule based on the Tabu Search (TS) heuristic. The proposed TS heuristic in combination with the K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. The experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
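The effect of feature weighting on the K-NN rule can be shown with a minimal Python sketch. It covers only the weighted distance and majority vote; the Tabu Search that would choose the weights is not implemented, and the data, the weights and k = 3 are hypothetical. A weight of zero removes a feature entirely, which is how selection and weighting fit in one vector.

```python
import math
from collections import Counter


def weighted_knn_predict(train_x, train_y, query, weights, k=3):
    """Classify `query` by majority vote among its k nearest training points,
    using a Euclidean distance in which each feature is scaled by its weight."""

    def dist(a, b):
        return math.sqrt(sum((w * (ai - bi)) ** 2 for w, ai, bi in zip(weights, a, b)))

    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: feature 0 separates the classes, feature 1 is large-scale noise.
train_x = [(0.0, 10.0), (0.1, -8.0), (0.2, 9.0), (1.0, -9.0), (1.1, 10.0), (0.9, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
query = (0.05, -9.5)

unweighted = weighted_knn_predict(train_x, train_y, query, weights=(1.0, 1.0))
weighted = weighted_knn_predict(train_x, train_y, query, weights=(1.0, 0.0))

print(unweighted)  # the noisy feature dominates the distance
print(weighted)    # the noisy feature is switched off
```

A search heuristic such as TS would repeatedly propose new weight vectors and keep those that improve accuracy on held-out data, with the prediction function above acting as the evaluation step.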