987 results for exon
Abstract:
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
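The store-and-update idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Exon` type and the `best_gene` function are hypothetical names, reading frame and the Gene Model are ignored, and two exons are assumed compatible whenever the first one's donor lies strictly upstream of the second one's acceptor. The two sorts cost O(n log n); only the scan itself is linear.

```python
from typing import List, NamedTuple, Optional, Tuple


class Exon(NamedTuple):
    acceptor: int   # start position of the exon
    donor: int      # end position of the exon (acceptor <= donor)
    score: float


def best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest additive score and the exon chain that achieves it."""
    if not exons:
        return 0.0, []
    n = len(exons)
    by_acceptor = sorted(range(n), key=lambda i: exons[i].acceptor)
    by_donor = sorted(range(n), key=lambda i: exons[i].donor)

    best_score = [0.0] * n                    # best gene ending at exon i
    prev: List[Optional[int]] = [None] * n    # back-pointers for traceback
    running_best: Optional[int] = None        # best gene ending upstream so far
    j = 0                                     # pointer into the donor-sorted list

    for i in by_acceptor:
        # Fold in every exon whose donor lies upstream of this acceptor.
        # Such an exon has a smaller acceptor, so its score is already final.
        while j < n and exons[by_donor[j]].donor < exons[i].acceptor:
            k = by_donor[j]
            if running_best is None or best_score[k] > best_score[running_best]:
                running_best = k
            j += 1
        best_score[i] = exons[i].score
        # Extend the stored best gene only if doing so raises the score.
        if running_best is not None and best_score[running_best] > 0:
            best_score[i] += best_score[running_best]
            prev[i] = running_best

    end = max(range(n), key=lambda i: best_score[i])
    chain: List[Exon] = []
    cur: Optional[int] = end
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    chain.reverse()
    return best_score[end], chain
```

Each exon is visited once in the acceptor-sorted scan and once in the donor-sorted scan, so no exon is ever compared against all of its predecessors, which is where the quadratic cost of the older approach comes from.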
Abstract:
Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. Results: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
Abstract:
The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.
Abstract:
Background Following the discovery that mutant KRAS is associated with resistance to anti-epidermal growth factor receptor (EGFR) antibodies, the tumours of patients with metastatic colorectal cancer are now profiled for seven KRAS mutations before receiving cetuximab or panitumumab. However, most patients with KRAS wild-type tumours still do not respond. We studied the effect of other downstream mutations on the efficacy of cetuximab in, to our knowledge, the largest cohort to date of patients with chemotherapy-refractory metastatic colorectal cancer treated with cetuximab plus chemotherapy in the pre-KRAS selection era. Methods 1022 tumour DNA samples (73 from fresh-frozen and 949 from formalin-fixed, paraffin-embedded tissue) from patients treated with cetuximab between 2001 and 2008 were gathered from 11 centres in seven European countries. 773 primary tumour samples had sufficient quality DNA and were included in mutation frequency analyses; mass spectrometry genotyping of tumour samples for KRAS, BRAF, NRAS, and PIK3CA was done centrally. We analysed objective response, progression-free survival (PFS), and overall survival in molecularly defined subgroups of the 649 chemotherapy-refractory patients treated with cetuximab plus chemotherapy. Findings 40.0% (299/747) of the tumours harboured a KRAS mutation, 14.5% (108/743) harboured a PIK3CA mutation (of which 68.5% [74/108] were located in exon 9 and 20.4% [22/108] in exon 20), 4.7% (36/761) harboured a BRAF mutation, and 2.6% (17/644) harboured an NRAS mutation. KRAS mutants did not derive benefit compared with wild types, with a response rate of 6.7% (17/253) versus 35.8% (126/352; odds ratio [OR] 0.13, 95% CI 0.07-0.22; p<0.0001), a median PFS of 12 weeks versus 24 weeks (hazard ratio [HR] 1.98, 1.66-2.36; p<0.0001), and a median overall survival of 32 weeks versus 50 weeks (1.75, 1.47-2.09; p<0.0001). 
In KRAS wild types, carriers of BRAF and NRAS mutations had a significantly lower response rate than did BRAF and NRAS wild types, with a response rate of 8.3% (2/24) in carriers of BRAF mutations versus 38.0% in BRAF wild types (124/326; OR 0.15, 95% CI 0.02-0.51; p=0.0012); and 7.7% (1/13) in carriers of NRAS mutations versus 38.1% in NRAS wild types (110/289; OR 0.14, 0.007-0.70; p=0.013). PIK3CA exon 9 mutations had no effect, whereas exon 20 mutations were associated with a worse outcome compared with wild types, with a response rate of 0.0% (0/9) versus 36.8% (121/329; OR 0.00, 0.00-0.89; p=0.029), a median PFS of 11.5 weeks versus 24 weeks (HR 2.52, 1.33-4.78; p=0.013), and a median overall survival of 34 weeks versus 51 weeks (3.29, 1.60-6.74; p=0.0057). Multivariate analysis and conditional inference trees confirmed that, if KRAS is not mutated, assessing BRAF, NRAS, and PIK3CA exon 20 mutations (in that order) gives additional information about outcome. Objective response rates in our series were 24.4% in the unselected population, 36.3% in the KRAS wild-type selected population, and 41.2% in the KRAS, BRAF, NRAS, and PIK3CA exon 20 wild-type population. Interpretation While confirming the negative effect of KRAS mutations on outcome after cetuximab, we show that BRAF, NRAS, and PIK3CA exon 20 mutations are significantly associated with a low response rate. Objective response rates could be improved by additional genotyping of BRAF, NRAS, and PIK3CA exon 20 mutations in a KRAS wild-type population.
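The odds ratios quoted in this abstract can be checked from the responder counts it reports. The sketch below computes the crude (unadjusted) odds ratio from a 2x2 table; `odds_ratio` is an illustrative helper, not something taken from the study. The abstract does not say how its confidence intervals and p-values were derived, so they are not reproduced here.

```python
def odds_ratio(resp_mut: int, n_mut: int, resp_wt: int, n_wt: int) -> float:
    """Crude odds ratio of response, mutation carriers versus wild types."""
    odds_mut = resp_mut / (n_mut - resp_mut)   # responders / non-responders
    odds_wt = resp_wt / (n_wt - resp_wt)
    return odds_mut / odds_wt


# Responder counts as reported in the abstract (mutant vs wild type).
comparisons = {
    "KRAS": (17, 253, 126, 352),
    "BRAF": (2, 24, 124, 326),
    "NRAS": (1, 13, 110, 289),
    "PIK3CA exon 20": (0, 9, 121, 329),
}

for gene, counts in comparisons.items():
    print(f"{gene}: OR = {odds_ratio(*counts):.2f}")
```

Rounded to two decimals, the crude values agree with the point estimates quoted in the abstract (0.13, 0.15, 0.14 and 0.00).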
Abstract:
Background: Alternatively spliced exons play an important role in the diversification of gene function in most metazoans and are highly regulated by conserved motifs in exons and introns. Two contradicting properties have been associated with evolutionarily conserved alternative exons: higher sequence conservation and higher rate of non-synonymous substitutions, relative to constitutive exons. In order to clarify this issue, we have performed an analysis of the evolution of alternative and constitutive exons, using a large set of protein coding exons conserved between human and mouse and taking into account the conservation of the transcript exonic structure. Further, we have also defined a measure of the variation of the arrangement of exonic splicing enhancers (ESE-conservation score) to study the evolution of splicing regulatory sequences. We have used this measure to correlate the changes in the arrangement of ESEs with the divergence of exon and intron sequences. Results: We find evidence for a relation between the lack of conservation of the exonic structure and the weakening of the sequence evolutionary constraints in alternative and constitutive exons. Exons in transcripts with non-conserved exonic structures have higher synonymous (dS) and non-synonymous (dN) substitution rates than exons in conserved structures. Moreover, alternative exons in transcripts with non-conserved exonic structure are the least constrained in sequence evolution, and at high EST-inclusion levels they are found to be very similar to constitutive exons, whereas alternative exons in transcripts with conserved exonic structure have a dS significantly lower than average at all EST-inclusion levels. We also find higher conservation in the arrangement of ESEs in constitutive exons compared to alternative ones. 
Additionally, the sequence conservation at flanking introns remains constant for constitutive exons at all ESE-conservation values, but increases for alternative exons at high ESE-conservation values. Conclusion: We conclude that most of the differences in dN observed between alternative and constitutive exons can be explained by the conservation of the transcript exonic structure. Low dS values are more characteristic of alternative exons with conserved exonic structure, but not of those with non-conserved exonic structure. Additionally, constitutive exons are characterized by a higher conservation in the arrangement of ESEs, and alternative exons with an ESE-conservation similar to that of constitutive exons are characterized by a conservation of the flanking intron sequences higher than average, indicating the presence of more intronic regulatory signals.
Abstract:
Background: The understanding of whole genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build based on the mappings of a relatively small number of full length cDNAs and ESTs to the genome as well as genome sequence derived in silico gene predictions. Results: We use Long Serial Analysis of Gene Expression (LongSAGE) in bursal lymphocytes and the DT40 cell line to verify the quality and completeness of the annotated transcripts. 53.6% of the more than 38,000 unique SAGE tags (unitags) match to full length bursal cDNAs, the Ensembl transcript build or the genome sequence. The majority of all matching unitags show single matches to the genome, but no matches to the genome derived Ensembl transcript build. Nevertheless, most of these tags map close to the 3' boundaries of annotated Ensembl transcripts. Conclusions: These results suggest that rather few genes are missing in the current Ensembl chicken transcript build, but that the 3' ends of many transcripts may not have been accurately predicted. The tags with no match in the transcript sequences can now be used to improve gene predictions, pinpoint the genomic location of entirely missed transcripts and optimize the accuracy of gene finder software.
Abstract:
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Abstract:
Retroposed genes (retrogenes) originate via the reverse transcription of mature messenger RNAs from parental source genes and are therefore usually devoid of introns. Here, we characterize a particular set of mammalian retrogenes that acquired introns upon their emergence and thus represent rare cases of intron gain in mammals. We find that although a few retrogenes evolved introns in their coding or 3' untranslated regions (UTRs), most introns originated together with untranslated exons in the 5' flanking regions of the retrogene insertion site. They emerged either de novo or through fusions with 5' UTR exons of host genes into which the retrogenes inserted. Generally, retrogenes with introns display high transcription levels and show broader spatial expression patterns than other retrogenes. Our experimental expression analyses of individual intron-containing retrogenes show that 5' UTR introns may indeed promote higher expression levels, at least in part through encoded regulatory elements. By contrast, 3' UTR introns may lead to downregulation of expression levels via nonsense-mediated decay mechanisms. Notably, the majority of retrogenes with introns in their 5' flanks depend on distant, sometimes bidirectional CpG dinucleotide-enriched promoters for their expression that may be recruited from other genes in the genomic vicinity. We thus propose a scenario where the acquisition of new 5' exon-intron structures was directly linked to the recruitment of distant promoters by these retrogenes, a process potentially facilitated by the presence of proto-splice sites in the genomic vicinity of retrogene insertion sites. Thus, the primary role and selective benefit of new 5' introns (and UTR exons) was probably initially to span the often substantial distances to potent CpG promoters driving retrogene transcription. Later in evolution, these introns then obtained additional regulatory roles in fine tuning retrogene expression levels. 
Our study provides novel insights regarding mechanisms underlying the origin of new introns, the evolutionary relevance of intron gain, and the origin of new gene promoters.
Abstract:
The diagnosis of muscular dystrophies or the assessment of the functional benefit of gene or cell therapies can be difficult, especially for poorly accessible muscles, and it often lacks a single-fiber resolution. In the present study, we evaluated whether muscle diseases can be diagnosed from small biopsies using atomic force microscopy (AFM). AFM was shown to provide a sensitive and quantitative description of the resistance of normal and dystrophic myofibers within live muscle tissues explanted from Duchenne mdx mice. The rescue of dystrophin expression by gene therapy approaches led to the functional recovery of treated dystrophic muscle fibers, as probed using AFM and by in situ whole-muscle strength measurements. Comparison of muscles treated with viral or non-viral vectors indicated that the efficacy of the gene transfer approaches could be distinguished with a single myofiber resolution. This indicated full correction of the resistance to deformation in nearly all of the muscle fibers treated with an adeno-associated viral vector that mediates exon-skipping on the dystrophin mRNA. Having shown that AFM can provide a quantitative assessment of the expression of muscle proteins and of the muscular function in animal models, we assessed myofiber resistance in the context of human muscular dystrophies and myopathies. Thus, various forms of human Becker syndrome can also be detected using AFM in blind studies of small frozen biopsies from human patients. Interestingly, it also allowed the detection of anomalies in a fraction of the muscle fibers from patients showing a muscle weakness that could not be attributed to a known molecular or genetic defect. Overall, we conclude that AFM may provide a useful method to complement current diagnosis tools of known and unknown muscular diseases, in research and in a clinical context.
Abstract:
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
Abstract:
Metachondromatosis (MC) is a rare, autosomal dominant, incompletely penetrant combined exostosis and enchondromatosis tumor syndrome. MC is clinically distinct from other multiple exostosis or multiple enchondromatosis syndromes and is unlinked to EXT1 and EXT2, the genes responsible for autosomal dominant multiple osteochondromas (MO). To identify a gene for MC, we performed linkage analysis with high-density SNP arrays in a single family, used a targeted array to capture exons and promoter sequences from the linked interval in 16 participants from 11 MC families, and sequenced the captured DNA using high-throughput parallel sequencing technologies. DNA capture and parallel sequencing identified heterozygous putative loss-of-function mutations in PTPN11 in 4 of the 11 families. Sanger sequence analysis of PTPN11 coding regions in a total of 17 MC families identified mutations in 10 of them (5 frameshift, 2 nonsense, and 3 splice-site mutations). Copy number analysis of sequencing reads from a second targeted capture that included the entire PTPN11 gene identified an additional family with a 15 kb deletion spanning exon 7 of PTPN11. Microdissected MC lesions from two patients with PTPN11 mutations demonstrated loss-of-heterozygosity for the wild-type allele. We next sequenced PTPN11 in DNA samples from 54 patients with the multiple enchondromatosis disorders Ollier disease or Maffucci syndrome, but found no coding sequence PTPN11 mutations. We conclude that heterozygous loss-of-function mutations in PTPN11 are a frequent cause of MC, that lesions in patients with MC appear to arise following a "second hit," that MC may be locus heterogeneous since 1 familial and 5 sporadically occurring cases lacked obvious disease-causing PTPN11 mutations, and that PTPN11 mutations are not a common cause of Ollier disease or Maffucci syndrome.
Abstract:
We have reported the identification of human gene MAGE-1, which directs the expression of an antigen recognized on a melanoma by autologous cytolytic T lymphocytes (CTL). We show here that CTL directed against this antigen, which was named MZ2-E, recognize a nonapeptide encoded by the third exon of gene MAGE-1. The CTL also recognize this peptide when it is presented by mouse cells transfected with an HLA-A1 gene, confirming the association of antigen MZ2-E with the HLA-A1 molecule. Other members of the MAGE gene family do not code for the same peptide, suggesting that only MAGE-1 produces the antigen recognized by the anti-MZ2-E CTL. Our results open the possibility of immunizing HLA-A1 patients whose tumor expresses MAGE-1 either with the antigenic peptide or with autologous antigen-presenting cells pulsed with the peptide.
Abstract:
Background: Imatinib has revolutionized the treatment of chronic myeloid leukemia (CML) and gastrointestinal stromal tumors (GIST). Considering the large inter-individual differences in the function of the systems involved in its disposition, exposure to imatinib can be expected to vary widely among patients. This observational study aimed at describing imatinib pharmacokinetic variability and its relationship with various biological covariates, especially plasma alpha1-acid glycoprotein (AGP), and at exploring the concentration-response relationship in patients. Methods: A population pharmacokinetic model (NONMEM) including 321 plasma samples from 59 patients was built and used to derive individual post-hoc Bayesian estimates of drug exposure (AUC; area under curve). Associations between AUC and therapeutic response or tolerability were explored by ordered logistic regression. Influence of the target genotype (i.e. KIT mutation profile) on response was also assessed in GIST patients. Results: A one-compartment model with first-order absorption appropriately described the data, with an average oral clearance of 14.3 L/h (CL) and volume of distribution of 347 L (Vd). A large inter-individual variability remained unexplained, both on CL (36%) and Vd (63%), but AGP levels proved to have a marked impact on total imatinib disposition. Moreover, both total and free AUC correlated with the occurrence and number of side effects (e.g. OR 2.9±0.6 for a 2-fold free AUC increase; p<0.001). Furthermore, in GIST patients, higher free AUC predicted a higher probability of therapeutic response (OR 1.9±0.5; p<0.05), notably in patients with tumor harboring an exon 9 mutation or wild-type KIT, known to decrease tumor sensitivity towards imatinib. Conclusion: The large pharmacokinetic variability, together with the pharmacokinetic-pharmacodynamic relationship uncovered, argues for further investigating the usefulness of individualizing imatinib prescription based on therapeutic drug monitoring (TDM). 
For this type of drug, it should ideally take into consideration either circulating AGP concentrations or free drug levels, as well as KIT genotype for GIST.
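A one-compartment model with first-order absorption, as named in this abstract, has a standard closed form for a single oral dose. The sketch below uses the population averages reported in the abstract (CL = 14.3 L/h, Vd = 347 L). The 400 mg dose, the absorption rate constant of 0.6 per hour and complete bioavailability are assumptions made only for illustration; the function names are hypothetical and this is not the study's NONMEM model.

```python
import math


def concentration(t_h: float, dose_mg: float, ka_per_h: float,
                  cl_l_per_h: float = 14.3, vd_l: float = 347.0,
                  f: float = 1.0) -> float:
    """Plasma concentration (mg/L) at time t_h after a single oral dose.

    Standard one-compartment, first-order absorption equation; requires
    ka_per_h != CL/Vd. The dose, ka and f values are caller-supplied assumptions.
    """
    ke = cl_l_per_h / vd_l                       # elimination rate constant (1/h)
    coeff = f * dose_mg * ka_per_h / (vd_l * (ka_per_h - ke))
    return coeff * (math.exp(-ke * t_h) - math.exp(-ka_per_h * t_h))


def auc_closed_form(dose_mg: float, cl_l_per_h: float = 14.3,
                    f: float = 1.0) -> float:
    """AUC from zero to infinity (mg*h/L) for a single dose: F * dose / CL."""
    return f * dose_mg / cl_l_per_h


def auc_trapezoid(dose_mg: float, ka_per_h: float,
                  t_end_h: float = 500.0, step_h: float = 0.1) -> float:
    """Numerical AUC by the trapezoid rule, as a check on the closed form."""
    n = int(round(t_end_h / step_h))
    total = 0.0
    prev = concentration(0.0, dose_mg, ka_per_h)
    for k in range(1, n + 1):
        cur = concentration(k * step_h, dose_mg, ka_per_h)
        total += 0.5 * (prev + cur) * step_h
        prev = cur
    return total
```

Under these assumptions, exposure depends only on dose, bioavailability and clearance (AUC = F x dose / CL), so the 36% unexplained variability in CL reported above carries over directly into variability in AUC.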
Abstract:
Host-pathogen interactions are a major evolutionary force promoting local adaptation. Genes of the major histocompatibility complex (MHC) represent unique candidates to investigate evolutionary processes driving local adaptation to parasite communities. The present study aimed at identifying the relative roles of neutral and adaptive processes driving the evolution of MHC class IIB (MHCIIB) genes in natural populations of European minnows (Phoxinus phoxinus). To this end, we isolated and genotyped exon 2 of two MHCIIB gene duplicates (DAB1 and DAB3) and 1665 amplified fragment length polymorphism (AFLP) markers in nine populations, and characterized local bacterial communities by 16S rDNA barcoding using 454 amplicon sequencing. Both MHCIIB loci exhibited signs of historical balancing selection. Whereas genetic differentiation exceeded that of neutral markers at both loci, the populations' genetic diversities were positively correlated with local pathogen diversities only at DAB3. Overall, our results suggest pathogen-mediated local adaptation in European minnows at both MHCIIB loci. While at DAB1 selection appears to favor different alleles among populations, this is only partially the case in DAB3, which appears to be locally adapted to pathogen communities in terms of genetic diversity. These results provide new insights into the importance of host-pathogen interactions in driving local adaptation in the European minnow, and highlight that the importance of adaptive processes driving MHCIIB gene evolution may differ among duplicates within species, presumably as a consequence of alternative selective regimes or different genomic context.
Abstract:
Mutations in the fibroblast growth factor receptor 2 (FGFR2) cause a variety of craniosynostosis syndromes. The mutational spectrum tends to be narrow with the majority of mutations occurring in either exon IIIa or IIIc or in the intronic sequence preceding exon IIIc. Mutations outside of this hotspot are uncommon and the few identified mutations have demonstrated wide clinical variability, making it difficult to establish a clear-cut genotype-phenotype correlation. To better delineate the clinical picture associated with these unusual mutations, we describe a severely affected patient with Pfeiffer syndrome and a missense mutation in the tyrosine kinase (TK) domain of FGFR2.