31 resultados para microarray data classification
em University of Queensland eSpace - Australia
Resumo:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Resumo:
Fast Classification (FC) networks were inspired by a biologically plausible mechanism for short term memory where learning occurs instantaneously. Both weights and the topology for an FC network are mapped directly from the training samples by using a prescriptive training scheme. Only two presentations of the training data are required to train an FC network. Compared with iterative learning algorithms such as Back-propagation (which may require many hundreds of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks may be suitable for applications where real-time classification is needed. In this paper, the FC networks are applied for the real-time extraction of gene expressions for Chlamydia microarray data. Both the classification performance and learning time of the FC networks are compared with the Multi-Layer Proceptron (MLP) networks and support-vector-machines (SVM) in the same classification task. The FC networks are shown to have extremely fast learning time and comparable classification accuracy.
Resumo:
We have constructed cDNA microarrays for soybean (Glycine max L. Merrill), containing approximately 4,100 Unigene ESTs derived from axenic roots, to evaluate their application and utility for functional genomics of organ differentiation in legumes. We assessed microarray technology by conducting studies to evaluate the accuracy of microarray data and have found them to be both reliable and reproducible in repeat hybridisations. Several ESTs showed high levels (>50 fold) of differential expression in either root or shoot tissue of soybean. A small number of physiologically interesting, and differentially expressed sequences found by microarray analysis were verified by both quantitative real-time RT-PCR and Northern blot analysis. There was a linear correlation (r(2) = 0.99, over 5 orders of magnitude) between microarray and quantitative real-time RT-PCR data. Microarray analysis of soybean has enormous potential not only for the discovery of new genes involved in tissue differentiation and function, but also to study the expression of previously characterised genes, gene networks and gene interactions in wild-type, mutant or transgenic; plants.
Resumo:
Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.
Resumo:
1. Cluster analysis of reference sites with similar biota is the initial step in creating River Invertebrate Prediction and Classification System (RIVPACS) and similar river bioassessment models such as Australian River Assessment System (AUSRIVAS). This paper describes and tests an alternative prediction method, Assessment by Nearest Neighbour Analysis (ANNA), based on the same philosophy as RIVPACS and AUSRIVAS but without the grouping step that some people view as artificial. 2. The steps in creating ANNA models are: (i) weighting the predictor variables using a multivariate approach analogous to principal axis correlations, (ii) calculating the weighted Euclidian distance from a test site to the reference sites based on the environmental predictors, (iii) predicting the faunal composition based on the nearest reference sites and (iv) calculating an observed/expected (O/E) analogous to RIVPACS/AUSRIVAS. 3. The paper compares AUSRIVAS and ANNA models on 17 datasets representing a variety of habitats and seasons. First, it examines each model's regressions for Observed versus Expected number of taxa, including the r(2), intercept and slope. Second, the two models' assessments of 79 test sites in New Zealand are compared. Third, the models are compared on test and presumed reference sites along a known trace metal gradient. Fourth, ANNA models are evaluated for western Australia, a geographically distinct region of Australia. The comparisons demonstrate that ANNA and AUSRIVAS are generally equivalent in performance, although ANNA turns out to be potentially more robust for the O versus E regressions and is potentially more accurate on the trace metal gradient sites. 4. The ANNA method is recommended for use in bioassessment of rivers, at least for corroborating the results of the well established AUSRIVAS- and RIVPACS-type models, if not to replace them.
Resumo:
Alcohol dependence may result from neuroadaptation involving alteration of gene expression after long-term alcohol exposure. The systematic study of gene expression profiles of the human alcoholic brain was initiated using the method of polymerase chain reaction (PCR)-differential display and was followed by DNA microarray. To date, more than 100 alcohol-responsive genes have been identified from the frontal cortex, motor cortex and nucleus accumbens of the human brain. These genes have a wide range of functions in the brain and indicate diverse actions of alcohol on neuronal function. This review discusses the current information on the genetic basis of alcoholism and the induction and characterization of these alcohol-responsive genes.
Resumo:
Although many of the molecular interactions in kidney development are now well understood, the molecules involved in the specification of the metanephric mesenchyme from surrounding intermediate mesoderm and, hence, the formation of the renal progenitor population are poorly characterized. In this study, cDNA microarrays were used to identify genes enriched in the murine embryonic day 10.5 (E10.5) uninduced metanephric mesenchyme, the renal progenitor population, in comparison with more rostral derivatives of the intermediate mesoderm. Microarray data were analyzed using R statistical software to determine accurately genes differentially expressed between these populations. Microarray outliers were biologically verified, and the spatial expression pattern of these genes at E10.5 and subsequent stages of early kidney development was determined by RNA in situ hybridization. This approach identified 21 genes preferentially expressed by the E10.5 metanephric mesenchyme, including Ewing sarcoma homolog, 14-3-3 theta, retinoic acid receptor-alpha, stearoyl-CoA desaturase 2, CD24, and cadherin-11, that may be important in formation of renal progenitor cells. Cell surface proteins such as CD24 and cadherin-11 that were strongly and specifically expressed in the uninduced metanephric mesenchyme and mark the renal progenitor population may prove useful in the purification of renal progenitor cells by FACS. These findings may assist in the isolation and characterization of potential renal stem cells for use in cellular therapies for kidney disease.
Resumo:
We previously demonstrated that olfactory cultures front individuals with schizophrenia had increased cell proliferation compared to Cultures from healthy controls. The aims of this study were to (a) replicate this observation in a new group Of individuals with schizophrenia, (b) examine the specificity of these findings by including individuals with bipolar I disorder and (c) explore gene expression differences that may underlie cell cycle differences in these diseases. Compared to controls (n = 10), there was significantly more mitosis in schizophrenia patient cultures (it = 8) and significantly more cell death in the bipolar I disorder patient cultures (n=8). Microarray data showed alterations to the cell cycle and phosphatidylinositol signalling pathways in schizophrenia and bipolar I disorder, respectively. Whilst caution is required in the interpretation of the array results, the study provides evidence indicating that cell proliferation and cell death in olfactory neuroepithelial cultures is differentially altered in schizophrenia and bipolar disorder. (c) 2005 Elsevier B.V. All rights reserved.
Resumo:
We describe the creation process of the Minimum Information Specification for In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). Modeled after the existing minimum information specification for microarray data, we created a new specification for gene expression localization experiments, initially to facilitate data sharing within a consortium. After successful use within the consortium, the specification was circulated to members of the wider biomedical research community for comment and refinement. After a period of acquiring many new suggested requirements, it was necessary to enter a final phase of excluding those requirements that were deemed inappropriate as a minimum requirement for all experiments. The full specification will soon be published as a version 1.0 proposal to the community, upon which a more full discussion must take place so that the final specification may be achieved with the involvement of the whole community. This paper is part of the special issue of OMICS on data standards.
Resumo:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called. 632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
Resumo:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.