773 results for Machine learning approaches
Abstract:
Making research relevant to development is a complex, non-linear and often unpredictable process which requires very particular skills and strategies on the part of researchers. The National Centre of Competence in Research (NCCR) North-South provides financial and technical support for researchers so that they can effectively cooperate with policy-makers and practitioners. An analysis of 10 years of experience translating research into development practice in the NCCR North-South revealed the following four strategies as particularly relevant: a) research orientation towards the needs and interests of partners; b) implementation of promising methods and approaches; c) communication and dissemination of research results; and d) careful analysis of the political context through monitoring and learning approaches. The NCCR North-South experience shows that “doing excellent research” is just one piece of the mosaic. It is equally important to join hands with non-academic partners from the very beginning of a research project, in order to develop and test new pathways for sustainable development. Capacity building – in the North and South – enables researchers to do both: to do excellent research and to make it relevant for development.
Abstract:
There has been limited analysis of the effects of hepatocellular carcinoma (HCC) on liver metabolism and circulating endogenous metabolites. Here, we report the findings of a plasma metabolomic investigation of HCC patients by ultraperformance liquid chromatography-electrospray ionization-quadrupole time-of-flight mass spectrometry (UPLC-ESI-QTOFMS), random forests machine learning algorithm, and multivariate data analysis. Control subjects included healthy individuals as well as patients with liver cirrhosis or acute myeloid leukemia. We found that HCC was associated with increased plasma levels of glycodeoxycholate, deoxycholate 3-sulfate, and bilirubin. Accurate mass measurement also indicated upregulation of biliverdin and the fetal bile acids 7α-hydroxy-3-oxochol-4-en-24-oic acid and 3-oxochol-4,6-dien-24-oic acid in HCC patients. A quantitative lipid profiling of patient plasma was also conducted by ultraperformance liquid chromatography-electrospray ionization-triple quadrupole mass spectrometry (UPLC-ESI-TQMS). By this method, we found that HCC was also associated with reduced levels of lysophosphocholines and in 4 of 20 patients with increased levels of lysophosphatidic acid [LPA(16:0)], where it correlated with plasma α-fetoprotein levels. Interestingly, when fatty acids were quantitatively profiled by gas chromatography-mass spectrometry (GC-MS), we found that lignoceric acid (24:0) and nervonic acid (24:1) were virtually absent from HCC plasma. Overall, this investigation illustrates the power of the new discovery technologies represented in the UPLC-ESI-QTOFMS platform combined with the targeted, quantitative platforms of UPLC-ESI-TQMS and GC-MS for conducting metabolomic investigations that can engender new insights into cancer pathobiology.
Abstract:
To enhance understanding of the metabolic indicators of type 2 diabetes mellitus (T2DM) disease pathogenesis and progression, the urinary metabolomes of well characterized rhesus macaques (normal or spontaneously and naturally diabetic) were examined. High-resolution ultra-performance liquid chromatography coupled with the accurate mass determination of time-of-flight mass spectrometry was used to analyze spot urine samples from normal (n = 10) and T2DM (n = 11) male monkeys. The machine-learning algorithm random forests classified urine samples as either from normal or T2DM monkeys. The metabolites important for developing the classifier were further examined for their biological significance. Random forests models had a misclassification error of less than 5%. Metabolites were identified based on accurate masses (<10 ppm) and confirmed by tandem mass spectrometry of authentic compounds. Urinary compounds significantly increased (p < 0.05) in the T2DM when compared with the normal group included glycine betaine (9-fold), citric acid (2.8-fold), kynurenic acid (1.8-fold), glucose (68-fold), and pipecolic acid (6.5-fold). When compared with the conventional definition of T2DM, the metabolites were also useful in defining the T2DM condition, and the urinary elevations in glycine betaine and pipecolic acid (as well as proline) indicated defective re-absorption in the kidney proximal tubules by SLC6A20, a Na(+)-dependent transporter. The mRNA levels of SLC6A20 were significantly reduced in the kidneys of monkeys with T2DM. These observations were validated in the db/db mouse model of T2DM. This study provides convincing evidence of the power of metabolomics for identifying functional changes at many levels in the omics pipeline.
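The accurate-mass criterion mentioned above (<10 ppm) is conventionally computed as the relative deviation of the observed m/z from the theoretical m/z, scaled by one million. The following is a minimal sketch of that check, not the authors' code; the function names and the example m/z values are hypothetical and chosen only for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical values, not taken from the study:
# a 0.0010 m/z deviation at m/z 200 is about 5 ppm and passes a 10 ppm cutoff,
# while a 0.0030 m/z deviation is about 15 ppm and does not.
close_match = within_tolerance(200.0010, 200.0000)
poor_match = within_tolerance(200.0030, 200.0000)
```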
Abstract:
This investigation uses simulation to explore the inherent tradeoffs of controlling high-speed and highly robust walking robots while minimizing energy consumption. Using a novel controller that optimizes robustness, energy economy, and speed of a simulated robot on rough terrain, the user can adjust their priorities among these three outcome measures and systematically generate a performance curve assessing the tradeoffs associated with these metrics.
Abstract:
With a virus such as Human Immunodeficiency Virus (HIV) that has infected millions of people worldwide, and with many unaware that they are infected, it becomes vital to understand how the virus works and how it functions at the molecular level. Because there currently is no vaccine and no way to eradicate the virus from an infected person, any information about how the virus interacts with its host greatly increases the chances of understanding how HIV works and brings scientists one step closer to being able to combat such a destructive virus. Thousands of HIV viruses have been sequenced and are available in many online databases for public use. Attributes that are linked to each sequence include the viral load within the host and how sick the patient is currently. Being able to predict the stage of infection for someone is a valuable resource, as it could potentially aid in treatment options and proper medication use. Our approach of analyzing region-specific amino acid composition for select genes has been able to predict patient disease state up to an accuracy of 85.4%. Moreover, we output a set of classification rules based on the sequence that may prove useful for diagnosing the expected clinical outcome of the infected patient.
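The region-specific amino acid composition described above can be pictured as a feature vector holding, for each chosen region of a protein sequence, the fraction of each of the 20 standard amino acids. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the region names, the region boundaries, and the toy sequence are hypothetical, and the abstract does not say how regions were defined or which classifier consumed the features.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Fraction of each standard amino acid in a sequence segment."""
    counts = Counter(ch for ch in segment.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def region_features(sequence, regions):
    """Build a flat feature dict from named regions.

    regions maps a region name to a (start, end) pair, 0-based and
    half-open. The boundaries are assumptions supplied by the caller.
    """
    features = {}
    for name, (start, end) in regions.items():
        fractions = composition(sequence[start:end])
        for aa, frac in zip(AMINO_ACIDS, fractions):
            features[f"{name}_{aa}"] = frac
    return features


# Toy sequence and hypothetical regions, for illustration only.
toy_sequence = "AAAACCGG"
toy_regions = {"r1": (0, 4), "r2": (4, 8)}
toy_features = region_features(toy_sequence, toy_regions)
```

A feature dictionary of this shape could then be passed to any rule-based or tree-based classifier to predict a disease-state label.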
Abstract:
The task considered in this paper is performance evaluation of region segmentation algorithms in the ground-truth-based paradigm. Given a machine segmentation and a ground-truth segmentation, performance measures are needed. We propose to consider the image segmentation problem as one of data clustering and, as a consequence, to use measures for comparing clusterings developed in statistics and machine learning. By doing so, we obtain a variety of performance measures which have not been used before in image processing. In particular, some of these measures have the highly desired property of being a metric. Experimental results are reported on both synthetic and real data to validate the measures and compare them with others.
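To make the idea of treating a segmentation as a clustering of pixels concrete, the sketch below computes the variation of information, a clustering comparison measure from the machine learning literature that is known to be a metric. The abstract does not name the measures it evaluates, so the choice of this measure is an assumption; the function name and the toy label maps are hypothetical.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A|B) + H(B|A), in nats.

    labels_a and labels_b are flat sequences of region labels,
    one entry per pixel, covering the same pixels in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("segmentations must cover the same pixels")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # n_ab / count_a[a] equals p(a, b) / p(a), and likewise for b.
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


# Toy 4-pixel label maps, for illustration only.
ground_truth = [0, 0, 1, 1]
relabeled = [5, 5, 9, 9]      # same partition, different label names
independent = [0, 1, 0, 1]    # partition unrelated to the ground truth

vi_same = variation_of_information(ground_truth, relabeled)
vi_indep = variation_of_information(ground_truth, independent)
```

Because the measure depends only on how pixels are grouped, renaming the regions leaves the distance at zero, which is the behavior one wants when comparing a machine segmentation against a ground truth.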
Abstract:
The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high-throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
Abstract:
The developmental processes and functions of an organism are controlled by the genes and the proteins that are derived from these genes. The identification of key genes and the reconstruction of gene networks can provide a model to help us understand the regulatory mechanisms for the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes, constructed their regulatory networks, and also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. On the basis of these data sets, this dissertation has two major parts, together forming six chapters. The first part deals with developing association methods for rare variants using genotype data (chapters 4 and 5). The second part deals with developing and/or evaluating statistical methods to identify genes and TFs involved in biological processes, and with constructing their regulatory networks using gene expression data (chapters 2, 3, and 6). For the first part, I have developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative and qualitative traits. Simulation results showed that the proposed method has improved power over the existing weighted sum method (WS) in most settings. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling for population stratification by adjusting the data for ancestry using principal components. This method was applied to GAW 17 data and was able to find several disease risk genes.
For the second part, I have worked on three problems. The first problem involved the evaluation of eight gene association methods. A very comprehensive comparison of these methods with further analysis clearly demonstrates the distinct and common performance of these eight gene association methods. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and reconstruct their hierarchical regulatory networks. This algorithm has produced very significant results and it is the first report to produce such hierarchical networks for these pathways. The third problem dealt with developing another algorithm called the top-down graphical Gaussian model that identifies the network governed by a specific TF. The network produced by the algorithm is proven to be of very high accuracy.
Abstract:
Important food crops like rice are constantly exposed to various stresses that can have a devastating effect on their survival and productivity. Being sessile, these highly evolved organisms have developed elaborate molecular machineries to sense a mixture of stress signals and elicit a precise response to minimize the damage. However, recent discoveries revealed that the interplay of these stress regulatory and signaling molecules is highly complex and remains largely unknown. In this work, we conducted a large-scale analysis of differential gene expression using advanced computational methods to dissect the regulation of stress response, which is at the heart of all molecular changes leading to the observed phenotypic susceptibility. One of the most important stress conditions in terms of loss of productivity is drought. We performed genomic and proteomic analysis of epigenetic and miRNA mechanisms in the regulation of drought-responsive genes in rice and found subsets of genes with striking properties. Overexpressed genesets included a higher number of epigenetic marks, miRNA targets and transcription factors which regulate drought tolerance. On the other hand, underexpressed genesets were poor in the above features but were rich in metabolic genes with multiple co-expression partners contributing majorly towards drought resistance. Identification and characterization of the patterns exhibited by differentially expressed genes hold the key to uncovering the synergistic and antagonistic components of the cross talk between stress response mechanisms. We performed a meta-analysis on drought and bacterial stresses in rice and Arabidopsis, and identified hundreds of shared genes. We found a high level of conservation of gene expression between these stresses. Weighted co-expression network analysis detected two tight clusters of genes made up of master transcription factors and signaling genes showing strikingly opposite expression status.
To comprehensively identify the shared stress-responsive genes between multiple abiotic and biotic stresses in rice, we performed meta-analyses of microarray studies from seven different abiotic and six biotic stresses separately and found more than thirteen hundred shared stress-responsive genes. Various machine learning techniques utilizing these genes classified the stresses with high accuracy into two major classes, namely abiotic and biotic stresses, and into multiple classes of individual stresses, and identified the top genes showing distinct patterns of expression. Functional enrichment and co-expression network analysis revealed the different roles of plant hormones and transcription factors in conserved and non-conserved genesets in the regulation of stress response.
Abstract:
Activation of the peroxisome proliferator-activated receptor alpha (PPARalpha) is associated with increased fatty acid catabolism and is commonly targeted for the treatment of hyperlipidemia. To identify latent, endogenous biomarkers of PPARalpha activation and hence increased fatty acid beta-oxidation, healthy human volunteers were given fenofibrate orally for 2 weeks and their urine was profiled by UPLC-QTOFMS. Biomarkers identified by the machine learning algorithm random forests included significant depletion by day 14 of both pantothenic acid (>5-fold) and acetylcarnitine (>20-fold), observations that are consistent with known targets of PPARalpha including pantothenate kinase and genes encoding proteins involved in the transport and synthesis of acylcarnitines. It was also concluded that serum cholesterol (-12.7%), triglycerides (-25.6%), uric acid (-34.7%), together with urinary propylcarnitine (>10-fold), isobutyrylcarnitine (>2.5-fold), (S)-(+)-2-methylbutyrylcarnitine (5-fold), and isovalerylcarnitine (>5-fold) were all reduced by day 14. Specificity of these biomarkers as indicators of PPARalpha activation was demonstrated using the Ppara-null mouse. Urinary pantothenic acid and acylcarnitines may prove useful indicators of PPARalpha-induced fatty acid beta-oxidation in humans. This study illustrates the utility of a pharmacometabolomic approach to understand drug effects on lipid metabolism in both human populations and in inbred mouse models.