813 results for microarray data classification
Abstract:
Alachlor has been a commonly applied herbicide and is a substance of ecotoxicological concern. The present study aims to identify molecular biomarkers in the eukaryotic model Saccharomyces cerevisiae that can be used to predict potential cytotoxic effects of alachlor, while providing new mechanistic clues with possible relevance for experimentally less accessible eukaryotes. It focuses on genome-wide expression profiling in a yeast population in response to two exposure scenarios exerting effects of slight to moderate magnitude at the phenotypic level. In particular, 100 and 264 genes, respectively, were found to be differentially expressed after a 2-h exposure of yeast cells to the lowest observed effect concentration (110 mg/L) and the 20% inhibitory concentration (200 mg/L) of alachlor, in comparison with cells not exposed to the herbicide. The datasets of alachlor-responsive genes showed functional enrichment in diverse metabolic, transmembrane transport, cell defense, and detoxification categories. In general, the modifications in transcript levels of selected candidate biomarkers, assessed by quantitative reverse transcriptase polymerase chain reaction, confirmed the microarray data and varied consistently with the growth inhibitory effects of alachlor. Approximately 16% of the proteins encoded by alachlor-differentially expressed genes were found to share significant homology with proteins from ecologically relevant eukaryotic species. The biological relevance of these results is discussed in relation to new insights into the potential adverse effects of alachlor on the health of organisms in ecosystems, particularly in worst-case situations such as accidental spills or careless storage, usage, and disposal.
Abstract:
Thesis (Master, Biology) -- Queen's University, 2016-09-29 20:09:46.997
Abstract:
In the present study we show that luxS of Bifidobacterium breve UCC2003 is involved in the production of the interspecies signaling molecule autoinducer-2 (AI-2), and that this gene is essential for gastrointestinal colonization of a murine host, while it is also involved in providing protection against Salmonella infection in Caenorhabditis elegans. We demonstrate that a B. breve luxS-insertion mutant is significantly more susceptible to iron chelators than the WT strain and that this sensitivity can be partially reverted in the presence of the AI-2 precursor DPD. Furthermore, we show that several genes of an iron starvation-induced gene cluster that encodes a presumed iron-uptake system, which are downregulated in the luxS-insertion mutant, are transcriptionally upregulated under in vivo conditions. Mutation of two genes of this cluster in B. breve UCC2003 renders the derived mutant strains sensitive to iron chelators and deficient in their ability to confer gut pathogen protection to Salmonella-infected nematodes. Since a functional luxS gene is present in all tested members of the genus Bifidobacterium, we conclude that bifidobacteria operate a LuxS-mediated system for gut colonization and pathogen protection that is correlated with iron acquisition.
Abstract:
In this study, we demonstrate that the prototype B. breve strain UCC2003 possesses specific metabolic pathways for the utilisation of lacto-N-tetraose (LNT) and lacto-N-neotetraose (LNnT), which represent the central moieties of Type I and Type II human milk oligosaccharides (HMOs), respectively. Using a combination of experimental approaches, the enzymatic machinery involved in the metabolism of LNT and LNnT was identified and characterised. Homologs of the key genetic loci involved in the utilisation of these HMO substrates were identified in B. breve, B. bifidum, B. longum subsp. infantis and B. longum subsp. longum using bioinformatic analyses, and were shown to be variably present among other members of the Bifidobacterium genus, with a distinct pattern of conservation among human-associated bifidobacterial species.
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
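The difference between cross-validation that is internal and external to gene selection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are pure noise (so any apparent accuracy is spurious), the gene score is a simple standardized difference of class means, and the rule is a nearest-centroid classifier. None of these is taken from the study itself; only the idea of repeating gene selection inside each training fold, and the choice of 10 folds, follows the abstract.

```python
import random
import statistics

def make_noise_data(n_samples=20, n_genes=500, seed=0):
    """Hypothetical pure-noise expression data: labels carry no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def select_genes(X, y, k):
    """Keep the k genes with the largest |mean difference| / overall SD."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a + b) or 1.0
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid_predict(X_train, y_train, x, genes):
    """Assign x to the class whose centroid (on the chosen genes) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [statistics.mean(r[g] for r in rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_error(X, y, k_genes, n_folds=10, external=True, seed=0):
    """n-fold CV error; external=True reselects genes within every training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Internal (biased) variant: genes are chosen once, using ALL samples.
    preselected = None if external else select_genes(X, y, k_genes)
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = select_genes(X_tr, y_tr, k_genes) if external else preselected
        for i in test_idx:
            errors += nearest_centroid_predict(X_tr, y_tr, X[i], genes) != y[i]
    return errors / len(X)

X, y = make_noise_data()
print("gene selection outside CV (biased):", cv_error(X, y, 5, external=False))
print("gene selection inside CV (external):", cv_error(X, y, 5, external=True))
```

Because the labels here are unrelated to the features, an honest error estimate should sit near chance; an estimate well below that from the internal variant would reflect selection bias rather than real predictive power.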
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Abstract:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprises a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.
Abstract:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require careful tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
Abstract:
This paper describes a methodology that was developed for the classification of Medium Voltage (MV) electricity customers. Starting from a sample of databases resulting from a monitoring campaign, Data Mining (DM) techniques are used to discover a set of typical MV consumer load profiles and, therefore, to extract knowledge regarding electric energy consumption patterns. In the first stage, several hierarchical clustering algorithms were applied and their clustering performance was compared using adequacy measures. In the second stage, a classification model was developed to allow new consumers to be classified into one of the clusters obtained in the previous stage. Finally, the interpretation of the discovered knowledge is presented and discussed.
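The two-stage idea (cluster known load profiles, then assign new consumers to one of the resulting clusters) can be sketched as follows. This is only an illustration under stated assumptions: the four-point load profiles are invented, average linkage stands in for whichever hierarchical algorithms the paper compared, and a nearest-centroid rule stands in for its classification model. The adequacy measures mentioned in the abstract are not implemented.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two load profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of member indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                d = sum(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def centroids(profiles, clusters):
    """Mean profile of each cluster, used as its typical load profile."""
    n_points = len(profiles[0])
    return [[sum(profiles[i][t] for i in c) / len(c) for t in range(n_points)]
            for c in clusters]

def classify(profile, cents):
    """Stage 2: index of the cluster whose typical profile is closest."""
    return min(range(len(cents)), key=lambda k: euclidean(profile, cents[k]))

# Hypothetical normalized load at four times of day (night, morning, afternoon, evening).
profiles = [
    [0.2, 0.9, 1.0, 0.3], [0.1, 0.8, 0.9, 0.2], [0.3, 1.0, 0.9, 0.3],  # daytime-heavy
    [0.9, 0.2, 0.1, 1.0], [1.0, 0.3, 0.2, 0.9], [0.8, 0.1, 0.2, 0.9],  # night-heavy
]
clusters = agglomerative(profiles, 2)
typical = centroids(profiles, clusters)
new_consumer = [0.2, 0.85, 0.95, 0.25]
print("clusters:", clusters)
print("new consumer assigned to cluster", classify(new_consumer, typical))
```

Keeping the clustering and the classification rule separate mirrors the staged design described above: the clusters are discovered once, and new consumers are then assigned without re-running the clustering.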
Abstract:
Chronic liver disease (CLD) is most of the time an asymptomatic, progressive, and ultimately potentially fatal disease. In this study, an automatic hierarchical procedure to stage CLD using ultrasound images, laboratory tests, and clinical records is described. The first stage of the proposed method, called clinical based classifier (CBC), discriminates healthy from pathologic conditions. When nonhealthy conditions are detected, the method refines the results into three mutually exclusive pathologies on a hierarchical basis: 1) chronic hepatitis; 2) compensated cirrhosis; and 3) decompensated cirrhosis. The features used as well as the classifiers (Bayes, Parzen, support vector machine, and k-nearest neighbor) are optimally selected for each stage. A large multimodal feature database was specifically built for this study containing 30 chronic hepatitis cases, 34 compensated cirrhosis cases, and 36 decompensated cirrhosis cases, all validated after histopathologic analysis by liver biopsy. The CBC classification scheme outperformed the nonhierarchical one-against-all scheme, achieving an overall accuracy of 98.67% for the normal detector, 87.45% for the chronic hepatitis detector, and 95.71% for the cirrhosis detector.
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, a lack of such expertise results in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, together with the completely automated classification procedure, makes the authors' proposed technique highly suitable for clinical deployment and usage.
Abstract:
In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratory data in the staging process. The classifiers are trained on a population of 97 patients at six different stages of chronic liver disease, using a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with an overall accuracy of 73.20%. The good performance of the method is a promising indicator that it can be used, in a noninvasive way, to provide reliable information about chronic liver disease staging.
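The leave-one-out cross-validation strategy mentioned above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the two-feature patients and stage labels are invented, and only a k-nearest neighbor rule is shown (the support vector machine and decision tree are omitted). The `predict` argument is a hypothetical hook through which any other classifier could be evaluated the same way.

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training samples closest to x."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(X, y, predict):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        correct += predict(X_tr, y_tr, X[i]) == y[i]
    return correct / len(X)

# Hypothetical feature vectors (e.g. one image feature, one laboratory value).
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
     [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
y = ["early", "early", "early", "late", "late", "late"]

print("LOOCV accuracy, 3-NN:", loocv_accuracy(X, y, knn_predict))
```

With n patients, leave-one-out trains n models, each tested on a single held-out patient, so the reported accuracy is simply the number of correctly staged patients divided by n.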
Abstract:
In this work the liver contour is semi-automatically segmented and quantified in order to help the identification and diagnosis of diffuse liver disease. The features extracted from the liver contour are jointly used with clinical and laboratory data in the staging process. The classification results of a support vector machine, a Bayesian and a k-nearest neighbor classifier are compared. A population of 88 patients at five different stages of diffuse liver disease and a leave-one-out cross-validation strategy are used in the classification process. The best results are obtained using the k-nearest neighbor classifier, with an overall accuracy of 80.68%. The good performance of the proposed method indicates that it can reliably improve the information available for staging diffuse liver disease.