990 results for HUMAN TRANSCRIPTOME
Abstract:
Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements to study the behaviour of cells. The scientific community has produced an enormous and constantly increasing collection of gene expression data from various human cells in both healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges and to construct an integrated database of the human transcriptome as well as to demonstrate its usage. Methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded by systematic vocabularies. There was no single existing biomedical ontology or vocabulary suitable for this purpose. Thus, a new annotation terminology was developed as a part of this work. The second part was to develop mathematical methods to correct the noise and systematic differences/errors in the data caused by various array generations. Additionally, there was a need to develop suitable computational methods for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes by using the integrated gene expression data. 
A method to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database was also developed. As a result of this work, 9783 human gene expression samples measured by Affymetrix microarrays were integrated to form a unique human transcriptome resource, GeneSapiens. This makes it possible to analyse expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Application of this resource to interpret individual gene expression measurements allowed identification of tissue of origin with 92.0% accuracy among 44 healthy tissue types. Systematic analysis of transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types, and a genome-wide analysis of kinase gene co-expression networks was done. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to perform analysis for individual patient samples to pinpoint gene- and pathway-specific changes in the test sample in relation to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies indicate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets as well as to facilitate the interpretation of new molecular profiling data from individual patients.
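The idea of identifying a sample's tissue of origin against a reference database can be illustrated with a minimal sketch. The abstract does not state which classifier was used, so the nearest-profile rule based on Pearson correlation below is an assumption chosen for illustration, not the GeneSapiens method. The tissue names, the four-gene profiles and the function names `pearson` and `predict_tissue` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def predict_tissue(sample, reference):
    """Return the reference tissue whose mean profile correlates best with the sample.

    Assumption: `reference` maps a tissue name to a mean expression profile
    measured over the same genes, in the same order, as `sample`.
    """
    return max(reference, key=lambda tissue: pearson(sample, reference[tissue]))


# Toy reference: mean expression of four hypothetical genes per tissue.
reference = {
    "liver": [9.0, 2.0, 1.0, 5.0],
    "brain": [1.0, 8.0, 7.0, 5.0],
    "muscle": [2.0, 1.0, 9.0, 4.0],
}

sample = [8.5, 2.5, 1.5, 5.2]  # hypothetical test sample
predicted = predict_tissue(sample, reference)
print(predicted)
```

A real reference would cover thousands of genes and many samples per tissue, and the data would first need the cross-array normalization described in the abstract.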
Abstract:
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Abstract:
We report the results of a transcript finishing initiative, undertaken for the purpose of identifying and characterizing novel human transcripts, in which RT-PCR was used to bridge gaps between paired EST clusters, mapped against the genomic sequence. Each pair of EST clusters selected for experimental validation was designated a transcript finishing unit (TFU). A total of 489 TFUs were selected for validation, and an overall efficiency of 43.1% was achieved. We generated a total of 59,975 bp of transcribed sequences organized into 432 exons, contributing to the definition of the structure of 211 human transcripts. The structure of several transcripts reported here was confirmed during the course of this project, through the generation of their corresponding full-length cDNA sequences. Nevertheless, for 21% of the validated TFUs, a full-length cDNA sequence is not yet available in public databases, and the structure of 69.2% of these TFUs was not correctly predicted by computer programs. The TF strategy provides a significant contribution to the definition of the complete catalog of human genes and transcripts, because it appears to be particularly useful for identification of low abundance transcripts expressed in a restricted set of tissues as well as for the delineation of gene boundaries and alternatively spliced isoforms.
Abstract:
Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximately 23,500 genes, of which only approximately 1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.
Abstract:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving the access to its data. It also contributed to creation of several new tools for microarray data manipulation and establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. Data comparability, and minimisation of the systematic measurement errors characteristic of each laboratory in this large cross-laboratory integrated dataset, were ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose-built sample ontology. 
A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.
Abstract:
The androgen receptor (AR) is the dominant growth factor in prostate cancer (PCa). Therefore, understanding how ARs regulate the human transcriptome is of paramount importance. The early effects of castration on human PCa have not previously been studied. Twenty-seven patients were medically castrated with degarelix 7 d before radical prostatectomy. We used mass spectrometry, immunohistochemistry, and gene expression array (validated by reverse transcription-polymerase chain reaction) to compare resected tumour with matched, controlled, untreated PCa tissue. All patients had levels of serum androgen, with reduced levels of intraprostatic androgen at prostatectomy. We observed differential expression of known androgen-regulated genes (TMPRSS2, KLK3, CAMKK2, FKBP5). We identified 749 genes downregulated and 908 genes upregulated following castration. AR regulation of α-methylacyl-CoA racemase expression and three other genes (FAM129A, RAB27A, and KIAA0101) was confirmed. Upregulation of oestrogen receptor 1 (ESR1) expression was observed in malignant epithelia and was associated with differential expression of ESR1-regulated genes and correlated with proliferation (Ki-67 expression).
PATIENT SUMMARY: This first-in-man study defines the rapid gene expression changes taking place in prostate cancer (PCa) following castration. Expression levels of the genes that the androgen receptor regulates are predictive of treatment outcome. Upregulation of oestrogen receptor 1 is a mechanism by which PCa cells may survive despite castration.
Abstract:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Abstract:
Understanding alternative splicing is crucial to elucidate the mechanisms behind several biological phenomena, including diseases. The huge amount of expressed sequences available nowadays represents an opportunity and a challenge to catalog and display alternative splicing events (ASEs). Although several groups have faced this challenge with relative success, we still lack a computational tool that uses a simple and straightforward method to retrieve, name and present ASEs. Here we present SPLOOCE, a portal for the analysis of human splicing variants. SPLOOCE uses a method based on regular expressions for retrieval of ASEs. We propose a simple syntax that is able to capture the complexity of ASEs.
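To illustrate how a regular expression can retrieve a splicing event, here is a minimal sketch. The abstract does not give the SPLOOCE syntax, so the encoding below is a hypothetical one invented for this example: each reference exon is written as `1` if the variant includes it and `0` if it is skipped, and an exon-skipping event is then a run of `0`s flanked by `1`s. The function names and exon labels are likewise assumptions.

```python
import re


def encode(reference_exons, variant_exons):
    """Encode a variant as a string of 1s (exon present) and 0s (exon absent).

    Hypothetical encoding for illustration only; not the SPLOOCE syntax.
    """
    present = set(variant_exons)
    return "".join("1" if exon in present else "0" for exon in reference_exons)


# A run of skipped exons with an included exon on each side.
# Lookarounds keep the flanking exons out of the match, so adjacent
# events that share a flanking exon are both reported.
EXON_SKIPPING = re.compile(r"(?<=1)0+(?=1)")


def skipped_exons(reference_exons, variant_exons):
    """Return each exon-skipping event as the list of reference exons skipped."""
    code = encode(reference_exons, variant_exons)
    return [reference_exons[m.start():m.end()] for m in EXON_SKIPPING.finditer(code)]


reference = ["E1", "E2", "E3", "E4", "E5"]
variant = ["E1", "E3", "E5"]

print(encode(reference, variant))         # 10101
print(skipped_exons(reference, variant))  # [['E2'], ['E4']]
```

Other event types, such as alternative 5' or 3' splice sites or intron retention, would need a richer alphabet than this two-symbol encoding provides.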
Abstract:
The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.
The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also raises a number of methodological considerations critical to the practice of single-cell RNA-seq measurements.
The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.
The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.
The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).
The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some so far insufficiently explored research areas, the future study of which will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.
Abstract:
The reciprocal interaction between cancer cells and the tissue-specific stroma is critical for primary and metastatic tumor growth progression. Prostate cancer cells preferentially colonize bone (osteotropism), where they alter the physiological balance between osteoblast-mediated bone formation and osteoclast-mediated bone resorption, and predominantly elicit an osteoblastic response (osteoinduction). The molecular cues provided by osteoblasts for the survival and growth of bone metastatic prostate cancer cells are largely unknown. We exploited the sufficient divergence between human and mouse RNA sequences together with redefinition of highly species-specific gene arrays by computer-aided and experimental exclusion of cross-hybridizing oligonucleotide probes. This strategy allowed the dissection of the stroma (mouse) from the cancer cell (human) transcriptome in bone metastasis xenograft models of human osteoinductive prostate cancer cells (VCaP and C4-2B). As a result, we generated the osteoblastic bone metastasis-associated stroma transcriptome (OB-BMST). Subtraction of genes shared by inflammation, wound healing and desmoplastic responses, and by the tissue type-independent stroma responses to a variety of non-osteotropic and osteotropic primary cancers generated a curated gene signature ("Core" OB-BMST) putatively representing the bone marrow/bone-specific stroma response to prostate cancer-induced, osteoblastic bone metastasis. The expression pattern of three representative Core OB-BMST genes (PTN, EPHA3 and FSCN1) seems to confirm the bone specificity of this response. A robust induction of genes involved in osteogenesis and angiogenesis dominates both the OB-BMST and Core OB-BMST. This translates into an amplification of hematopoietic and, remarkably, prostate epithelial stem cell niche components that may function as a self-reinforcing bone metastatic niche providing a growth support specific for osteoinductive prostate cancer cells. 
The induction of this combinatorial stem cell niche is a novel mechanism that may also explain cancer cell osteotropism and local interference with hematopoiesis (myelophthisis). Accordingly, these stem cell niche components may represent innovative therapeutic targets and/or serum biomarkers in osteoblastic bone metastasis.
Abstract:
With the availability of a large amount of genomic data it is expected that the influence of single nucleotide variations (SNVs) in many biological phenomena will be elucidated. Here, we approached the problem of how SNVs affect alternative splicing. First, we observed that SNVs and exonic splicing regulators (ESRs) independently show a biased distribution in alternative exons. More importantly, SNVs map more frequently in ESRs located in alternative exons than in ESRs located in constitutive exons. By looking at SNVs associated with alternative exon/intron borders (by their common presence in the same cDNA molecule), we observed that a specific type of ESR, the exonic splicing silencers (ESSs), are more frequently modified by SNVs. Our results establish a clear association between genetic diversity and alternative splicing involving ESSs.
Abstract:
Adenosine deaminases acting on RNA (ADARs) catalyze the hydrolytic deamination of adenosine to inosine in double-stranded RNA (dsRNA) and thereby potentially alter the information content and structure of cellular RNAs. Notably, although the overwhelming majority of such editing events occur in transcripts derived from Alu repeat elements, the biological function of non-coding RNA editing remains uncertain. Here, we show that mutations in ADAR1 (also known as ADAR) cause the autoimmune disorder Aicardi-Goutieres syndrome (AGS). As in Adar1-null mice, the human disease state is associated with upregulation of interferon-stimulated genes, indicating a possible role for ADAR1 as a suppressor of type I interferon signaling. Considering recent insights derived from the study of other AGS-related proteins, we speculate that ADAR1 may limit the cytoplasmic accumulation of the dsRNA generated from genomic repetitive elements.
Abstract:
Large numbers of noncoding RNA transcripts (ncRNAs) are being revealed by complementary DNA cloning and genome tiling array studies in animals. The big and as yet largely unanswered question is whether these transcripts are relevant. A paper by Willingham et al. shows the way forward by developing a strategy for large-scale functional screening of ncRNAs, involving small interfering RNA knockdowns in cell-based screens, which identified a previously unidentified ncRNA repressor of the transcription factor NFAT. It appears likely that ncRNAs constitute a critical hidden layer of gene regulation in complex organisms, the understanding of which requires new approaches in functional genomics.