954 results for Computational biology and bioinformatics


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gene duplication leads to paralogy, which complicates the de novo assembly of genotyping-by-sequencing (GBS) data. The issue of paralogous genes is exacerbated in plants, because they are particularly prone to gene duplication events. Paralogs are normally filtered from GBS data before undertaking population genomics or phylogenetic analyses. However, gene duplication plays an important role in the functional diversification of genes and it can also lead to the formation of postzygotic barriers. Using populations and closely related species of a tropical mountain shrub, we examine 1) the genomic differentiation produced by putative orthologs, and 2) the distribution of recent gene duplication among lineages and geography. We find high differentiation among populations from isolated mountain peaks and species-level differentiation within what is morphologically described as a single species. The inferred distribution of paralogs among populations is congruent with taxonomy and shows that GBS could be used to examine recent gene duplication as a source of genomic differentiation of nonmodel species.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Knowledge of the genetic structure of plant populations is necessary for understanding the dynamics of major ecological processes. It also has applications in conservation biology and risk assessment for genetically modified crops. This paper reports the genetic structure of a linear population of sea beet, Beta vulgaris ssp. maritima (the wild relative of sugar beet), on Furzey Island, Poole Harbour. The relative spatial positions of the plants were accurately mapped and the plants were scored for variation at isozyme and RFLP loci. Structure was analysed by repeated subdivision of the population to find the average size of a randomly mating group. Estimates of F_ST between randomly mating units were then made, and gave patterns consistent with the structure of the population being determined largely by founder effects. The implications of these results for the monitoring of transgene spread in wild sea beet populations are discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gene duplication and neofunctionalization are known to be important processes in the evolution of phenotypic complexity. They account for important evolutionary novelties that confer ecological adaptation, such as the major histocompatibility complex (MHC), a multigene family crucial to the vertebrate immune system. In birds, two MHC class II β (MHCIIβ) exon 3 lineages have been recently characterized, and two hypotheses for the evolutionary history of MHCIIβ lineages were proposed. These lineages could have arisen either by 1) an ancient duplication and subsequent divergence of one paralog or by 2) recent parallel duplications followed by functional convergence. Here, we compiled a data set consisting of 63 MHCIIβ exon 3 sequences from six avian orders to distinguish between these hypotheses and to understand the role of selection in the divergent evolution of the two avian MHCIIβ lineages. Based on phylogenetic reconstructions and simulations, we show that a unique duplication event preceding the major avian radiations gave rise to two ancestral MHCIIβ lineages that were each likely lost once later during avian evolution. Maximum likelihood estimation shows that following the ancestral duplication, positive selection drove a radical shift from basic to acidic amino acid composition of a protein domain facing the α-chain in the MHCII αβ-heterodimer. Structural analyses of the MHCII αβ-heterodimer highlight that three of these residues are potentially involved in direct interactions with the α-chain, suggesting that the shift following duplication may have been accompanied by coevolution of the interacting α- and β-chains. These results provide new insights into the long-term evolutionary relationships among avian MHC genes and open interesting perspectives for comparative and population genomic studies of avian MHC evolution.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Owing to its special mode of evolution and central role in the adaptive immune system, the major histocompatibility complex (MHC) has become the focus of diverse disciplines such as immunology, evolutionary ecology, and molecular evolution. MHC evolution has been studied extensively in diverse vertebrate lineages over the last few decades, and it has been suggested that birds differ from the established mammalian norm. Mammalian MHC genes evolve independently, and duplication history (i.e., orthology) can usually be traced back within lineages. In birds, this has been observed in only 3 pairs of closely related species. Here we report strong evidence for the persistence of orthology of MHC genes throughout an entire avian order. Phylogenetic reconstructions of MHC class II B genes in 14 species of owls trace back orthology over tens of millions of years in exon 3. Moreover, exon 2 sequences from several species show closer relationships than sequences within species, resembling transspecies evolution typically observed in mammals. Thus, although previous studies suggested that long-term evolutionary dynamics of the avian MHC were characterized by high rates of concerted evolution, resulting in rapid masking of orthology, our results question the generality of this conclusion. The owl MHC thus opens new perspectives for a more comprehensive understanding of avian MHC evolution.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cancer/Testis (CT) genes, normally expressed in germ line cells but also activated in a wide range of cancer types, often encode antigens that are immunogenic in cancer patients, and present potential for use as biomarkers and targets for immunotherapy. Using multiple in silico gene expression analysis technologies, including twice the number of expressed sequence tags used in previous studies, we have performed a comprehensive genome-wide survey of expression for a set of 153 previously described CT genes in normal and cancer expression libraries. We find that although they are generally highly expressed in testis, these genes exhibit heterogeneous gene expression profiles, allowing their classification into testis-restricted (39), testis/brain-restricted (14), and a testis-selective (85) group of genes that show additional expression in somatic tissues. The chromosomal distribution of these genes confirmed the previously observed dominance of X chromosome location, with CT-X genes being significantly more testis-restricted than non-X CT. Applying this core classification in a genome-wide survey we identified >30 CT candidate genes; 3 of them, PEPP-2, OTOA, and AKAP4, were confirmed as testis-restricted or testis-selective using RT-PCR, with variable expression frequencies observed in a panel of cancer cell lines. Our classification provides an objective ranking for potential CT genes, which is useful in guiding further identification and characterization of these potentially important diagnostic and therapeutic targets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The human chromosome 8p23.1 region contains a 3.8–4.5 Mb segment which can be found in different orientations (defined as genomic inversion) among individuals. The identification of single nucleotide polymorphisms (SNPs) tightly linked to the genomic orientation of a given region should be useful to indirectly evaluate the genotypes of large genomic orientations in the individuals. Results: We have identified 16 SNPs, which are in linkage disequilibrium (LD) with the 8p23.1 inversion as detected by fluorescent in situ hybridization (FISH). The variability of the 8p23.1 orientation in 150 HapMap samples was predicted using this set of SNPs and was verified by FISH in a subset of samples. Four genes (NEIL2, MSRA, CTSB and BLK) were found differentially expressed (p<0.0005) according to the orientation of the 8p23.1 region. Finally, we have found variable levels of mosaicism for the orientation of the 8p23.1 as determined by FISH. Conclusion: By means of dense SNP genotyping of the region, haplotype-based computational analyses and FISH experiments we could infer and verify the orientation status of alleles in the 8p23.1 region by detecting two short haplotype stretches at both ends of the inverted region, which are likely the relic of the chromosome in which the original inversion occurred. Moreover, an impact of 8p23.1 inversion on gene expression levels cannot be ruled out, since four genes from this region have statistically significant different expression levels depending on the inversion status. FISH results in lymphoblastoid cell lines suggest the presence of mosaicism regarding the 8p23.1 inversion.
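As a rough illustration of what it means for a SNP to be "in linkage disequilibrium" with the inversion, the sketch below computes the common r² statistic from phased haplotypes. The abstract does not state which LD measure the authors used, so r² is an assumption here, and the function name `ld_r_squared` and the 0/1 encoding are hypothetical.

```python
from typing import List, Tuple

# Each haplotype is (snp_allele, inverted), both coded 0 or 1.
# This encoding is an assumption made for the example only.
Haplotype = Tuple[int, int]


def ld_r_squared(haplotypes: List[Haplotype]) -> float:
    """Return r^2 between a biallelic SNP and inversion status."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n          # SNP allele 1 frequency
    p_b = sum(b for _, b in haplotypes) / n          # inverted-orientation frequency
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("SNP or inversion is monomorphic; r^2 is undefined")
    return d * d / denom


if __name__ == "__main__":
    # A SNP allele that always travels with the inverted orientation.
    tagging = [(1, 1)] * 4 + [(0, 0)] * 6
    print(ld_r_squared(tagging))
```

A value near 1 means the SNP allele predicts the orientation almost perfectly, which is the property needed to infer inversion status from genotype data alone.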

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
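The store-and-update idea in the abstract above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: scores are additive, reading-frame compatibility and the Gene Model are ignored, and two exons count as compatible when one ends strictly before the other begins. The names `Exon` and `best_gene` are hypothetical. The exons are sorted here for convenience, which costs O(n log n); only the scan over the two sorted orders is linear.

```python
from typing import List, NamedTuple, Optional, Tuple


class Exon(NamedTuple):
    acceptor: int   # first position of the exon
    donor: int      # last position of the exon (inclusive)
    score: float


def best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest additive score and the exon chain achieving it."""
    n = len(exons)
    if n == 0:
        return 0.0, []
    by_acceptor = sorted(range(n), key=lambda i: exons[i].acceptor)
    by_donor = sorted(range(n), key=lambda i: exons[i].donor)

    gene_score = [0.0] * n                    # best gene ending in exon i
    prev: List[Optional[int]] = [None] * n    # predecessor exon in that gene
    best_closed_score = 0.0                   # best gene ending before current acceptor
    best_closed: Optional[int] = None
    j = 0                                     # pointer into the donor-sorted order

    for i in by_acceptor:
        # Fold in every exon whose donor lies upstream of this acceptor.
        # Each exon is folded in exactly once, so the scan is linear.
        while j < n and exons[by_donor[j]].donor < exons[i].acceptor:
            k = by_donor[j]
            if gene_score[k] > best_closed_score:
                best_closed_score, best_closed = gene_score[k], k
            j += 1
        gene_score[i] = exons[i].score + best_closed_score
        prev[i] = best_closed

    # Trace back from the best-scoring terminal exon.
    cur: Optional[int] = max(range(n), key=gene_score.__getitem__)
    total = gene_score[cur]
    chain: List[Exon] = []
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    candidates = [Exon(1, 10, 5), Exon(5, 20, 8), Exon(12, 30, 4),
                  Exon(25, 40, 3), Exon(32, 50, 6)]
    print(best_gene(candidates))
```

The key point is that the inner loop never looks back over all preceding exons: a single running best is kept and updated as donor sites are passed, which replaces the quadratic search over compatible predecessors.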

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
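To make the nucleotide-level figures concrete, the sketch below computes sensitivity and specificity for a set of predicted coding intervals against a reference annotation. It assumes the convention usual in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP); the abstract does not spell out its definitions, so this is an assumption. The function names and the inclusive-coordinate interval format are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive coordinates


def coding_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of individual coding positions."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated: Iterable[Interval],
                        predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the coding-nucleotide level."""
    ref = coding_positions(annotated)
    pred = coding_positions(predicted)
    true_pos = len(ref & pred)
    sensitivity = true_pos / len(ref) if ref else 0.0    # TP / (TP + FN)
    specificity = true_pos / len(pred) if pred else 0.0  # TP / (TP + FP)
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [(1, 100), (201, 300)]
    prediction = [(11, 100), (201, 350)]
    print(nucleotide_accuracy(reference, prediction))
```

Expanding intervals into position sets is simple but memory-hungry; for whole-genome comparisons an interval-overlap computation would be preferable.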

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Intuitively, music has both predictable and unpredictable components. In this work we assess this qualitative statement in a quantitative way using common time series models fitted to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings comprising a variety of musical genres. Our findings show that music descriptor time series exhibit a certain predictability not only for short time intervals, but also for mid-term and relatively long intervals. This fact is observed independently of the descriptor, musical facet and time series model we consider. Moreover, we show that our findings are not only of theoretical relevance but can also have practical impact. To this end we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e. different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach for cover song identification that is substantially faster, allows for reduced computational storage and still maintains highly competitive accuracies when compared to state-of-the-art systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Retroposed genes (retrogenes) originate via the reverse transcription of mature messenger RNAs from parental source genes and are therefore usually devoid of introns. Here, we characterize a particular set of mammalian retrogenes that acquired introns upon their emergence and thus represent rare cases of intron gain in mammals. We find that although a few retrogenes evolved introns in their coding or 3' untranslated regions (UTRs), most introns originated together with untranslated exons in the 5' flanking regions of the retrogene insertion site. They emerged either de novo or through fusions with 5' UTR exons of host genes into which the retrogenes inserted. Generally, retrogenes with introns display high transcription levels and show broader spatial expression patterns than other retrogenes. Our experimental expression analyses of individual intron-containing retrogenes show that 5' UTR introns may indeed promote higher expression levels, at least in part through encoded regulatory elements. By contrast, 3' UTR introns may lead to downregulation of expression levels via nonsense-mediated decay mechanisms. Notably, the majority of retrogenes with introns in their 5' flanks depend on distant, sometimes bidirectional CpG dinucleotide-enriched promoters for their expression that may be recruited from other genes in the genomic vicinity. We thus propose a scenario where the acquisition of new 5' exon-intron structures was directly linked to the recruitment of distant promoters by these retrogenes, a process potentially facilitated by the presence of proto-splice sites in the genomic vicinity of retrogene insertion sites. Thus, the primary role and selective benefit of new 5' introns (and UTR exons) was probably initially to span the often substantial distances to potent CpG promoters driving retrogene transcription. Later in evolution, these introns then obtained additional regulatory roles in fine tuning retrogene expression levels. Our study provides novel insights regarding mechanisms underlying the origin of new introns, the evolutionary relevance of intron gain, and the origin of new gene promoters.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Arthroderma benhamiae is a zoophilic dermatophyte belonging to the Trichophyton mentagrophytes species complex. Here, a population of A. benhamiae wild strains from the same geographical area (Switzerland) was studied by comparing their morphology, assessing their molecular variability using internal transcribed spacer (ITS) and 28S rRNA gene sequencing, and evaluating their interfertility. Sequencing of the ITS region and of part of the 28S rRNA gene revealed the existence of two infraspecific groups with markedly different colony phenotypes: white (group I) and yellow (group II), respectively. For all strains, the results of mating type identification by PCR, using HMG (high-mobility group) and α-box genes in the mating type locus as targets, were in total accordance with the results of mating type identification by strain confrontation experiments. White-phenotype strains were of mating type + (mt+) or mating type - (mt-), whilst yellow-phenotype strains were all mt-. White and yellow strains were found to produce fertile cleistothecia after mating with A. benhamiae reference tester strains, which belonged to a third group intermediate between groups I and II. However, no interfertility was observed between yellow strains and white strains of mt+. A significant result was that white strains of mt- were able to mate and produce fertile cleistothecia with the white A. benhamiae strain CBS 112371 (mt+), the genome of which has recently been sequenced and annotated. This finding should offer new tools for investigating the biology and genetics of dermatophytes using wild-type strains.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.