8 resultados para Random Forests Classifier
em BORIS: Bern Open Repository and Information System - Berna - Suiça
Resumo:
Abstract Radiation metabolomics employing mass spectral technologies represents a plausible means of high-throughput minimally invasive radiation biodosimetry. A simplified metabolomics protocol is described that employs ubiquitous gas chromatography-mass spectrometry and open source software including random forests machine learning algorithm to uncover latent biomarkers of 3 Gy gamma radiation in rats. Urine was collected from six male Wistar rats and six sham-irradiated controls for 7 days, 4 prior to irradiation and 3 after irradiation. Water and food consumption, urine volume, body weight, and sodium, potassium, calcium, chloride, phosphate and urea excretion showed major effects from exposure to gamma radiation. The metabolomics protocol uncovered several urinary metabolites that were significantly up-regulated (glyoxylate, threonate, thymine, uracil, p-cresol) and down-regulated (citrate, 2-oxoglutarate, adipate, pimelate, suberate, azelaate) as a result of radiation exposure. Thymine and uracil were shown to derive largely from thymidine and 2'-deoxyuridine, which are known radiation biomarkers in the mouse. The radiation metabolomic phenotype in rats appeared to derive from oxidative stress and effects on kidney function. Gas chromatography-mass spectrometry is a promising platform on which to develop the field of radiation metabolomics further and to assist in the design of instrumentation for use in detecting biological consequences of environmental radiation release.
Resumo:
To enhance understanding of the metabolic indicators of type 2 diabetes mellitus (T2DM) disease pathogenesis and progression, the urinary metabolomes of well characterized rhesus macaques (normal or spontaneously and naturally diabetic) were examined. High-resolution ultra-performance liquid chromatography coupled with the accurate mass determination of time-of-flight mass spectrometry was used to analyze spot urine samples from normal (n = 10) and T2DM (n = 11) male monkeys. The machine-learning algorithm random forests classified urine samples as either from normal or T2DM monkeys. The metabolites important for developing the classifier were further examined for their biological significance. Random forests models had a misclassification error of less than 5%. Metabolites were identified based on accurate masses (<10 ppm) and confirmed by tandem mass spectrometry of authentic compounds. Urinary compounds significantly increased (p < 0.05) in the T2DM when compared with the normal group included glycine betaine (9-fold), citric acid (2.8-fold), kynurenic acid (1.8-fold), glucose (68-fold), and pipecolic acid (6.5-fold). When compared with the conventional definition of T2DM, the metabolites were also useful in defining the T2DM condition, and the urinary elevations in glycine betaine and pipecolic acid (as well as proline) indicated defective re-absorption in the kidney proximal tubules by SLC6A20, a Na(+)-dependent transporter. The mRNA levels of SLC6A20 were significantly reduced in the kidneys of monkeys with T2DM. These observations were validated in the db/db mouse model of T2DM. This study provides convincing evidence of the power of metabolomics for identifying functional changes at many levels in the omics pipeline.
Resumo:
There has been limited analysis of the effects of hepatocellular carcinoma (HCC) on liver metabolism and circulating endogenous metabolites. Here, we report the findings of a plasma metabolomic investigation of HCC patients by ultraperformance liquid chromatography-electrospray ionization-quadrupole time-of-flight mass spectrometry (UPLC-ESI-QTOFMS), random forests machine learning algorithm, and multivariate data analysis. Control subjects included healthy individuals as well as patients with liver cirrhosis or acute myeloid leukemia. We found that HCC was associated with increased plasma levels of glycodeoxycholate, deoxycholate 3-sulfate, and bilirubin. Accurate mass measurement also indicated upregulation of biliverdin and the fetal bile acids 7α-hydroxy-3-oxochol-4-en-24-oic acid and 3-oxochol-4,6-dien-24-oic acid in HCC patients. A quantitative lipid profiling of patient plasma was also conducted by ultraperformance liquid chromatography-electrospray ionization-triple quadrupole mass spectrometry (UPLC-ESI-TQMS). By this method, we found that HCC was also associated with reduced levels of lysophosphocholines and in 4 of 20 patients with increased levels of lysophosphatidic acid [LPA(16:0)], where it correlated with plasma α-fetoprotein levels. Interestingly, when fatty acids were quantitatively profiled by gas chromatography-mass spectrometry (GC-MS), we found that lignoceric acid (24:0) and nervonic acid (24:1) were virtually absent from HCC plasma. Overall, this investigation illustrates the power of the new discovery technologies represented in the UPLC-ESI-QTOFMS platform combined with the targeted, quantitative platforms of UPLC-ESI-TQMS and GC-MS for conducting metabolomic investigations that can engender new insights into cancer pathobiology.
Resumo:
The early detection of subjects with probable Alzheimer's disease (AD) is crucial for effective appliance of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degree of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs it could be demonstrated that even tough modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forests classification a considerable sensitivity of up to 85% and a specificity of 78%, respectively for the test of even only mild AD patients has been reached, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
Resumo:
Activation of the peroxisome proliferator-activated receptor alpha (PPARalpha) is associated with increased fatty acid catabolism and is commonly targeted for the treatment of hyperlipidemia. To identify latent, endogenous biomarkers of PPARalpha activation and hence increased fatty acid beta-oxidation, healthy human volunteers were given fenofibrate orally for 2 weeks and their urine was profiled by UPLC-QTOFMS. Biomarkers identified by the machine learning algorithm random forests included significant depletion by day 14 of both pantothenic acid (>5-fold) and acetylcarnitine (>20-fold), observations that are consistent with known targets of PPARalpha including pantothenate kinase and genes encoding proteins involved in the transport and synthesis of acylcarnitines. It was also concluded that serum cholesterol (-12.7%), triglycerides (-25.6%), uric acid (-34.7%), together with urinary propylcarnitine (>10-fold), isobutyrylcarnitine (>2.5-fold), (S)-(+)-2-methylbutyrylcarnitine (5-fold), and isovalerylcarnitine (>5-fold) were all reduced by day 14. Specificity of these biomarkers as indicators of PPARalpha activation was demonstrated using the Ppara-null mouse. Urinary pantothenic acid and acylcarnitines may prove useful indicators of PPARalpha-induced fatty acid beta-oxidation in humans. This study illustrates the utility of a pharmacometabolomic approach to understand drug effects on lipid metabolism in both human populations and in inbred mouse models.
Resumo:
MRSI grids frequently show spectra with poor quality, mainly because of the high sensitivity of MRS to field inhomogeneities. These poor quality spectra are prone to quantification and/or interpretation errors that can have a significant impact on the clinical use of spectroscopic data. Therefore, quality control of the spectra should always precede their clinical use. When performed manually, quality assessment of MRSI spectra is not only a tedious and time-consuming task, but is also affected by human subjectivity. Consequently, automatic, fast and reliable methods for spectral quality assessment are of utmost interest. In this article, we present a new random forest-based method for automatic quality assessment of (1) H MRSI brain spectra, which uses a new set of MRS signal features. The random forest classifier was trained on spectra from 40 MRSI grids that were classified as acceptable or non-acceptable by two expert spectroscopists. To account for the effects of intra-rater reliability, each spectrum was rated for quality three times by each rater. The automatic method classified these spectra with an area under the curve (AUC) of 0.976. Furthermore, in the subset of spectra containing only the cases that were classified every time in the same way by the spectroscopists, an AUC of 0.998 was obtained. Feature importance for the classification was also evaluated. Frequency domain skewness and kurtosis, as well as time domain signal-to-noise ratios (SNRs) in the ranges 50-75 ms and 75-100 ms, were the most important features. Given that the method is able to assess a whole MRSI grid faster than a spectroscopist (approximately 3 s versus approximately 3 min), and without loss of accuracy (agreement between classifier trained with just one session and any of the other labelling sessions, 89.88%; agreement between any two labelling sessions, 89.03%), the authors suggest its implementation in the clinical routine. The method presented in this article was implemented in jMRUI's SpectrIm plugin. Copyright © 2016 John Wiley & Sons, Ltd.
Resumo:
Fungi are important members of soil microbial communities with a crucial role in biogeochemical processes. Although soil fungi are known to be highly diverse, little is known about factors influencing variations in their diversity and community structure among forests dominated by the same tree species but spread over different regions and under different managements. We analyzed the soil fungal diversity and community composition of managed and unmanaged European beech dominated forests located in three German regions, the Schwäbische Alb in Southwestern, the Hainich-Dün in Central and the Schorfheide Chorin in the Northeastern Germany, using internal transcribed spacer (ITS) rDNA pyrotag sequencing. Multiple sequence quality filtering followed by sequence data normalization revealed 1655 fungal operational taxonomic units. Further analysis based on 722 abundant fungal OTUs revealed the phylum Basidiomycota to be dominant (54%) and its community to comprise 71.4% of ectomycorrhizal taxa. Fungal community structure differed significantly (p≤0.001) among the three regions and was characterized by non-random fungal OTUs co-occurrence. Soil parameters, herbaceous understory vegetation, and litter cover affected fungal community structure. However, within each study region we found no difference in fungal community structure between management types. Our results also showed region specific significant correlation patterns between the dominant ectomycorrhizal fungal genera. This suggests that soil fungal communities are region-specific but nevertheless composed of functionally diverse and complementary taxa.
Resumo:
Over the last decade, a plethora of computer-aided diagnosis (CAD) systems have been proposed aiming to improve the accuracy of the physicians in the diagnosis of interstitial lung diseases (ILD). In this study, we propose a scheme for the classification of HRCT image patches with ILD abnormalities as a basic component towards the quantification of the various ILD patterns in the lung. The feature extraction method relies on local spectral analysis using a DCT-based filter bank. After convolving the image with the filter bank, q-quantiles are computed for describing the distribution of local frequencies that characterize image texture. Then, the gray-level histogram values of the original image are added forming the final feature vector. The classification of the already described patches is done by a random forest (RF) classifier. The experimental results prove the superior performance and efficiency of the proposed approach compared against the state-of-the-art.