Digital Library

974 results for coding

Point Mutations in the Monocarboxylate Transporter SLC16A12 Lead to Juvenile and Age-Related Cataract

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Purpose: Previously we reported on a premature termination mutation in SLC16A12 that leads to dominant juvenile cataract and renal glucosuria. To assess the mutation rate and genotype-phenotype correlations of SLC16A12 in juvenile or age-related forms of cataract, we performed a mutation screen in cataract patients. Methods: Clinical data of approximately 660 patients were collected, and genomic DNA was isolated and analyzed. Exons 3 to 8 of SLC16A12, including flanking intron sequences, were PCR amplified and the DNA sequence was determined. Selected mutations were tested by cell culture assays, in silico analysis and RT-PCR. Results: We found sequence alterations at a rate of approximately 1/75 patients. None of them was found in 360 control alleles. Alterations affect splice site and regulatory region, but most mutations caused an amino acid substitution. The majority of the coding region mutations map to trans-membrane domains. One mutation is located in the 5'UTR and affects translational efficiency of SLC16A12. In addition, we identified a cataract-predisposing SNP in the non-coding region that causes allele-specific splicing of the 5'UTR region. Conclusions: Altered translational efficiency of the solute carrier SLC16A12 and its allele-specific splicing strongly support a model in which challenged homeostasis causes various forms of cataract. In addition, the pathogenic property of the sequence alterations reported here is supported by the lack of known sequence variations within the coding region of SLC16A12. Due to the relatively high mutation rate, we suggest including SLC16A12 in diagnostic cataract screening. Generally, our data recommend the assessment of regulatory sequences for diagnostic purposes.

Physiological function of PARbZip circadian clock-controlled transcription factors.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

PARbZip proteins (proline and acidic amino acid-rich basic leucine zipper) represent a subfamily of circadian transcription factors belonging to the bZip family. They are transcriptionally controlled by the circadian molecular oscillator and are suspected to accomplish output functions of the clock. In turn, PARbZip proteins control expression of genes coding for enzymes involved in metabolism, but also expression of transcription factors which control the expression of these enzymes. For example, these transcription factors control vitamin B6 metabolism, which influences neurotransmitter homeostasis in the brain, and loss of PARbZip function leads to spontaneous and sound-induced epilepsy that is frequently lethal. In liver, kidney, and small intestine, PARbZip transcription factors regulate phase I, II, and III detoxifying enzymes in addition to the constitutive androstane receptor (CAR), one of the principal sensors of xenobiotics. Indeed, knockout mice for the three PARbZip transcription factors are deficient in xenobiotic detoxification and display high morbidity, high mortality, and accelerated aging. Finally, fewer than 20% of these animals reach an age of 1 year. Accumulated evidence suggests that PARbZip transcription factors act as a relay, coupling circadian metabolism of xenobiotic and probably endobiotic substances to the core clock circuitry of local circadian oscillators.

Stick insect genomes reveal natural selection's role in parallel speciation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Natural selection can drive the repeated evolution of reproductive isolation, but the genomic basis of parallel speciation remains poorly understood. We analyzed whole-genome divergence between replicate pairs of stick insect populations that are adapted to different host plants and undergoing parallel speciation. We found thousands of modest-sized genomic regions of accentuated divergence between populations, most of which are unique to individual population pairs. We also detected parallel genomic divergence across population pairs involving an excess of coding genes with specific molecular functions. Regions of parallel genomic divergence in nature exhibited exceptional allele frequency changes between hosts in a field transplant experiment. The results advance understanding of biological diversification by providing convergent observational and experimental evidence for selection's role in driving repeatable genomic divergence.

The transcriptome of the arbuscular mycorrhizal fungus Glomus intraradices (DAOM 197198) reveals functional tradeoffs in an obligate symbiont.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The arbuscular mycorrhizal symbiosis is arguably the most ecologically important eukaryotic symbiosis, yet it is poorly understood at the molecular level. To provide novel insights into the molecular basis of symbiosis-associated traits, we report the first genome-wide analysis of the transcriptome from Glomus intraradices DAOM 197198. We generated a set of 25,906 nonredundant virtual transcripts (NRVTs) transcribed in germinated spores, extraradical mycelium and symbiotic roots using Sanger and 454 sequencing. NRVTs were used to construct an oligoarray for investigating gene expression. We identified transcripts coding for the meiotic recombination machinery, as well as meiosis-specific proteins, suggesting that the lack of a known sexual cycle in G. intraradices is not a result of major deletions of genes essential for sexual reproduction and meiosis. Induced expression of genes encoding membrane transporters and small secreted proteins in intraradical mycelium, together with the lack of expression of hydrolytic enzymes acting on plant cell wall polysaccharides, are all features of G. intraradices that are shared with ectomycorrhizal symbionts and obligate biotrophic pathogens. Our results illuminate the genetic basis of symbiosis-related traits of the most ancient lineage of plant biotrophs, advancing future research on these agriculturally and ecologically important symbionts.

Functional analysis of a conserved DNA methyltransferase in Caulobacter crescentus

Relevance:

10.00% 10.00%

Publisher:

Abstract:

CcrM is a DNA methyltransferase that methylates the adenine in GANTC motifs in the chromosome of the bacterial model Caulobacter crescentus. The loss of the CcrM homolog is lethal in C. crescentus and in several other species of Alphaproteobacteria. In this research, we used different experimental and bioinformatic approaches to determine why CcrM is so critical to the physiology of C. crescentus. We first showed that CcrM is a resident orphan DNA methyltransferase in non-Rickettsiales Alphaproteobacteria and that its gene is strictly conserved in this clade (with only one exception among the genomes sequenced so far). In C. crescentus, cells depleted in CcrM in rich medium quickly lose viability and present an elongated phenotype characteristic of an impairment in cell division. Using minimal medium instead of rich medium as selective and maintenance substrate, we could generate a ΔccrM mutant that presents a viability comparable to the wild type strain and only mild morphological defects. On the basis of a transcriptomic approach, we determined that several genes essential for cell division were downregulated in the ΔccrM strain in minimal medium. We offered decisive arguments that the efficient transcription of two of these genes, ftsZ and mipZ, coding respectively for the Z-ring forming GTPase FtsZ and an inhibitor of FtsZ polymerization needed for the correct positioning of the Z-ring at mid-cell, requires the methylation of an adenine in a conserved GANTC motif located in their core promoter region. We propose a model according to which the genome of C. crescentus encodes a transcriptional activator that requires a methylated adenine in a GANTC context to bind to DNA, and suggest that this transcriptional regulator might be the global cell-cycle regulator GcrA.
In addition, combining a classic genetic approach and in vitro evolution experiments, we showed that the mortality and cell division defects of the ΔccrM strain in rich medium are mainly due to limiting intracellular levels of the FtsZ protein. We also studied the dynamics of GANTC methylation in C. crescentus using the SMRT technology developed by Pacific Biosciences. Our findings support the commonly accepted model, according to which the methylation state of GANTC motifs varies during the cell cycle of C. crescentus: before the initiation of DNA replication, the GANTC motifs are fully-methylated (methylated on both strands); when the DNA gets replicated, the GANTC motifs become hemi-methylated (methylated on one strand only) and this occurs at different times during replication for different loci along the chromosome depending on their position relative to the origin of replication; the GANTC motifs are only remethylated after DNA replication has finished as a consequence of the massive and short-lived expression of CcrM in predivisional cells. About 30 GANTC motifs in the C. crescentus chromosome were found to be undermethylated in most of the bacterial population; these might be protected from CcrM activity by DNA binding proteins and some of them could be involved in methylation-based bistable transcriptional switches.

Genomic organization and gene expression in a chromosomal region of Leishmania major.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Little is known about the relation between genome organization and gene expression in Leishmania. Bioinformatic analysis can be used to predict genes and find homologies with known proteins. A model was proposed in which genes are organized into large clusters and transcribed from only one strand, in the form of large polycistronic primary transcripts. To verify the validity of this model, we studied gene expression at the transcriptional, post-transcriptional and translational levels in a unique locus of 34 kb located on chr27 and represented by cosmid L979. Sequence analysis revealed 115 ORFs on either DNA strand. Using computer programs developed for Leishmania genes, only nine of these ORFs, localized on the same strand, were predicted to code for proteins, some of which show homologies with known proteins. Additionally, one pseudogene was identified. We verified the biological relevance of these predictions. mRNAs from nine predicted genes and proteins from seven were detected. Nuclear run-on analyses confirmed that the top strand is transcribed by RNA polymerase II and suggested that there is no polymerase entry site. Low levels of transcription were detected in regions of the bottom strand and stable transcripts were identified for four ORFs on this strand not predicted to be protein-coding. In conclusion, the transcriptional organization of the Leishmania genome is complex, raising the possibility that computer predictions may not be comprehensive.

PRPF31 alternative splicing and expression in human retina

Relevance:

10.00% 10.00%

Publisher:

Abstract:

PURPOSE: To provide a mechanistic link between mutations in PRPF31, an essential and ubiquitously expressed gene, and retinitis pigmentosa, a disorder restricted to the eye. METHODS: We investigated the existence of retina-specific PRPF31 isoforms and the expression of this gene in human retina and other tissues, as well as in cultured human cell lines. PRPF31 transcripts were examined by RT-PCR, quantitative PCR, cloning and sequencing. RESULTS: Database searching revealed the presence of a retina-specific PRPF31 isoform in mouse. However, this isoform could not be experimentally identified in transcripts from human retina or from a human whole eye. Nevertheless, four different PRPF31 isoforms that were common to all analyzed tissues and cell lines were isolated. Three of these harbored the full-length PRPF31 coding sequence, whereas the fourth was very short and probably non-coding. The amount of PRPF31 mRNA was previously found to be lower in patients with mutations in this gene than in healthy individuals, making it likely that retinal cells are more sensitive to variation in PRPF31 expression. However, quantitative PCR experiments revealed that PRPF31 mRNA levels in human retina were comparable to those detected in other tissues. CONCLUSIONS: Our results show that the retina-restricted phenotype caused by PRPF31 mutations cannot be explained by the presence of tissue-specific isoforms, or by differential expression of PRPF31 in the retina. As a consequence, the etiology of PRPF31-associated retinitis pigmentosa likely relies on other, probably more subtle molecular mechanisms.

Global GacA-steered control of cyanide and exoprotease production in Pseudomonas fluorescens involves specific ribosome binding sites.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The conserved two-component regulatory system GacS/GacA determines the expression of extracellular products and virulence factors in a variety of Gram-negative bacteria. In the biocontrol strain CHA0 of Pseudomonas fluorescens, the response regulator GacA is essential for the synthesis of extracellular protease (AprA) and secondary metabolites including hydrogen cyanide. GacA was found to exert its control on the hydrogen cyanide biosynthetic genes (hcnABC) and on the aprA gene indirectly via a posttranscriptional mechanism. Expression of a translational hcnA'-'lacZ fusion was GacA-dependent whereas a transcriptional hcnA-lacZ fusion was not. A distinct recognition site overlapping with the ribosome binding site appears to be primordial for GacA-steered regulation. GacA-dependence could be conferred to the Escherichia coli lacZ mRNA by a 3-bp substitution in the ribosome binding site. The gene coding for the global translational repressor RsmA of P. fluorescens was cloned. RsmA overexpression mimicked partial loss of GacA function and involved the same recognition site, suggesting that RsmA is a downstream regulatory element of the GacA control cascade. Mutational inactivation of the chromosomal rsmA gene partially suppressed a gacS defect. Thus, a central, GacA-dependent switch from primary to secondary metabolism may operate at the level of translation.

Molecular changes underlying mammalian phenotypic innovation

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Mammals are characterized by specific phenotypic traits that include lactation, hair, and relatively large brains with unique structures. Individual mammalian lineages have, in turn, evolved characteristic traits that distinguish them from others. These include obvious anatomical differences but also differences related to reproduction, life span, cognitive abilities, behavior, and disease susceptibility. However, the molecular basis of the diverse mammalian phenotypes and the selective pressures that shaped their evolution remain largely unknown. In the first part of my thesis, I analyzed the genetic factors associated with the origin of a unique mammalian phenotype, lactation, and I studied the selective pressures that forged the transition from oviparity to viviparity. Using a comparative genomics approach and evolutionary simulations, I showed that the emergence of lactation, as well as the appearance of the casein gene family, significantly reduced selective pressure on the major egg-yolk proteins (the vitellogenin family). This led to a progressive loss of vitellogenins, which - in oviparous species - act as storage proteins for lipids, amino acids, phosphorus and calcium in the isolated egg. The passage to internal fertilization and placentation in therian mammals rendered vitellogenins completely dispensable, which ended in the loss of the whole gene family in this lineage. As illustrated by the vitellogenin study, changes in gene content are one possible underlying factor for the evolution of mammalian-specific phenotypes. However, more subtle genomic changes, such as mutations in protein-coding sequences, can also greatly affect the phenotypes. In particular, it was proposed that changes at the level of gene regulation could underlie many (or even most) phenotypic differences between species.
In the second part of my thesis, I participated in a major comparative study of mammalian tissue transcriptomes, with the goal of understanding how evolutionary forces affected expression patterns in the past 200 million years of mammalian evolution. I showed that, while comparisons of gene expression are in agreement with the known species phylogeny, the rate of expression evolution varies greatly among lineages. Species with low effective population size, such as monotremes and hominoids, showed significantly accelerated rates of gene expression evolution. The most likely explanation for the high rate of gene expression evolution in these lineages is the accumulation of mildly deleterious mutations in regulatory regions, due to the low efficiency of purifying selection. Thus, our observations are in agreement with the nearly neutral theory of molecular evolution. I also describe substantial differences in evolutionary rates between tissues, with brain being the most constrained (especially in primates) and testis significantly accelerated. The rate of gene expression evolution also varies significantly between chromosomes. In particular, I observed an acceleration of gene expression changes on the X chromosome, probably as a result of adaptive processes associated with the origin of therian sex chromosomes. Lastly, I identified several individual genes as well as co-regulated expression modules that have undergone lineage-specific expression changes and likely underlie various phenotypic innovations in mammals. The methods developed during my thesis, as well as the comprehensive gene content analyses and transcriptomics datasets made available by our group, will likely prove to be useful for further exploratory analyses of the diverse mammalian phenotypes.

SGP-1: prediction and validation of homologous genes based on sequence alignments

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Improving gene annotation using peptide mass spectrometry

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.

Prominent use of distal 5’ transcription start sites and discovery of a large number of additional exons in ENCODE regions

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

GeneID in "Drosophila"

Relevance:

10.00% 10.00%

Publisher:

Abstract:

GeneID is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure. In the first step, splice sites, and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. In this paper we describe how we obtained the PWMs for sites and the Markov model of coding DNA in Drosophila melanogaster. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions is currently comparable to that of other existing tools but that GeneID is likely to be more efficient in terms of speed and memory usage.
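The first two steps described above can be illustrated with a minimal sketch. It is not GeneID's code: the two-column matrix, the function names `pwm_score`, `scan_sites` and `exon_score`, and all numeric values are hypothetical, and the coding log-likelihood ratio is passed in as a number rather than computed from a Markov model.

```python
def pwm_score(pwm, site):
    """Sum the per-position scores of `site` under a position weight matrix.

    `pwm` is a list of dicts, one per position, mapping a base to its score.
    """
    return sum(column[base] for column, base in zip(pwm, site))


def scan_sites(pwm, seq):
    """Score every window of the PWM's width along `seq`.

    Returns a list of (position, score) pairs, one per window.
    """
    width = len(pwm)
    return [(i, pwm_score(pwm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


def exon_score(first_site_score, last_site_score, coding_llr):
    """Exon score: sum of the two defining site scores plus the coding
    log-likelihood ratio (supplied by the caller in this sketch)."""
    return first_site_score + last_site_score + coding_llr


# Hypothetical 2-column matrix that rewards the dinucleotide "GT".
toy_pwm = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
]

hits = scan_sites(toy_pwm, "AGTC")
best_position, best_score = max(hits, key=lambda hit: hit[1])
```

In a real predictor the matrices would hold log-odds values estimated from annotated sites, and only windows scoring above a threshold would be kept as candidate sites.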

Assembling genes from predicted exons in linear time with dynamic programming

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
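The idea of storing and updating the best gene while scanning exons by acceptor and donor position can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name `best_gene_score` is hypothetical, exons are plain `(start, end, score)` tuples, and compatibility means only "does not overlap", so reading frames and the Gene Model are ignored. After the two sorts, the scan visits each exon once in each ordering.

```python
def best_gene_score(exons):
    """Return the highest total score of a chain of non-overlapping exons.

    `exons` is a list of (start, end, score) tuples with start <= end.
    Simplifying assumption: two exons are compatible when the first one
    ends strictly before the second one starts.
    """
    n = len(exons)
    by_start = sorted(range(n), key=lambda i: exons[i][0])  # acceptor order
    by_end = sorted(range(n), key=lambda i: exons[i][1])    # donor order

    best_ending_at = [0.0] * n  # best chain score ending with each exon
    running_best = 0.0          # best chain ending before the current start
    overall_best = 0.0          # the empty gene scores 0
    j = 0                       # pointer into the donor-sorted list

    for i in by_start:
        start, _end, score = exons[i]
        # Fold in every exon that ends before this one starts. Such an exon
        # also starts earlier, so its best chain score is already known.
        while j < n and exons[by_end[j]][1] < start:
            running_best = max(running_best, best_ending_at[by_end[j]])
            j += 1
        best_ending_at[i] = running_best + score
        overall_best = max(overall_best, best_ending_at[i])

    return overall_best


# Hypothetical exon pool: (start, end, score).
toy_exons = [(1, 10, 5.0), (5, 20, 7.0), (12, 30, 4.0), (25, 40, 6.0)]
toy_best = best_gene_score(toy_exons)
```

Because `running_best` only ever grows as the donor pointer advances, no exon needs to look back over all of its predecessors, which is what removes the quadratic factor in this simplified setting.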
