Digital Library

15 results for Gene structure

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain

Structural and functional characterization of the myogenin gene in gilthead seabream (Sparus aurata)

Relevance:

60.00%

Publisher:

Abstract:

Report for the scientific sojourn at the University of Maryland Biotechnology Institute from February to August 2007. Myogenesis of skeletal muscles in vertebrates is controlled by extracellular signalling molecules together with intracellular transcription factors. Among the transcription factors, the members of the myogenic regulatory family play important roles in regulating skeletal muscle development and growth. To characterize the gene structure and expression of fish myogenin, we have isolated the myogenin genomic gene and cDNA from gilthead seabream (Sparus aurata) and analyzed the genomic structure, pattern of expression and the regulation of muscle-specific expression. Sequence analysis revealed that the seabream myogenin shares a similar gene structure with other fish myogenins, with three exons, two introns and the highly conserved bHLH domain. Expression studies demonstrated that myogenin is expressed in both slow and fast muscles as well as in muscle cells in primary culture. In situ hybridization showed that myogenin was specifically expressed in developing somites of seabream embryos. Promoter activity analysis demonstrated that the myogenin promoter could drive green fluorescent protein expression in muscle cells of zebrafish embryos, as well as in myofibers of adult zebrafish and juvenile seabream.

SGP-1: prediction and validation of homologous genes based on sequence alignments

Relevance:

60.00%

Publisher:

Abstract:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

GeneID in "Drosophila"

Relevance:

60.00%

Publisher:

Abstract:

GeneID is a program, designed with a hierarchical structure, to predict genes in anonymous genomic sequences. In the first step, splice sites, and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. In this paper we describe how the PWMs for sites and the Markov model of coding DNA were derived for Drosophila melanogaster. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions is currently comparable to that of other existing tools but that GeneID is likely to be more efficient in terms of speed and memory usage.
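The first two steps described in this abstract (scoring sites with a PWM, then scoring an exon as the sum of its site scores plus a coding log-likelihood ratio) can be illustrated with a minimal Python sketch. This is not GeneID's code: the function names, the pseudocount, the uniform background and the toy training sites are all assumptions made for illustration.

```python
import math

def build_pwm(site_seqs, background=None, pseudocount=1.0):
    """Build a log-odds PWM from aligned example sites (all of equal length).

    Returns one dict per position mapping base -> log2(p_site / p_background).
    The uniform background and the pseudocount are illustrative assumptions.
    """
    background = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for i in range(len(site_seqs[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for seq in site_seqs:
            counts[seq[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in "ACGT"})
    return pwm

def score_site(pwm, window):
    """Score one candidate site as the sum of per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, seq):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pwm)
    return [(i, score_site(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)]

def exon_score(start_site_score, end_site_score, coding_llr):
    """Exon score as described above: defining-site scores plus the coding
    log-likelihood ratio (here simply passed in as a number)."""
    return start_site_score + end_site_score + coding_llr

# Hypothetical toy training sites, not real Drosophila splice sites.
toy_pwm = build_pwm(["AGG", "AGG", "AGA"])
best_position, best_score = max(scan(toy_pwm, "TTAGGTT"), key=lambda hit: hit[1])
```

In this toy run the highest-scoring window is the one starting at index 2 (`AGG`), since it matches the most frequent base at every position of the training sites.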

Assembling genes from predicted exons in linear time with dynamic programming

Relevance:

60.00%

Publisher:

Abstract:

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
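The core idea (scan exons by increasing acceptor position while a second pointer advances through the same exons by increasing donor position, keeping a running best upstream gene) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm: reading frames and the Gene Model are ignored, exons are plain `(acceptor, donor, score)` tuples with inclusive coordinates, and the function name `best_gene` is hypothetical. The two sorts are done here for convenience; only the scan after them is linear in the number of exons.

```python
from typing import List, Tuple

Exon = Tuple[int, int, float]  # (acceptor position, donor position, score)

def best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest additive score and the chain of nonoverlapping exons
    achieving it. Frame compatibility and the Gene Model are deliberately omitted."""
    n = len(exons)
    if n == 0:
        return 0.0, []
    by_acceptor = sorted(range(n), key=lambda i: exons[i][0])
    by_donor = sorted(range(n), key=lambda i: exons[i][1])

    best_end = [0.0] * n   # score of the best gene ending at exon i
    back = [-1] * n        # previous exon in that gene, -1 if exon i is first
    running_best, running_idx = 0.0, -1  # best gene ending upstream of the current acceptor
    j = 0                  # pointer into the donor-sorted order

    for i in by_acceptor:
        acceptor = exons[i][0]
        # Absorb every exon whose donor lies strictly upstream of this acceptor.
        # Such exons have smaller acceptors, so their best_end is already final.
        while j < n and exons[by_donor[j]][1] < acceptor:
            k = by_donor[j]
            if best_end[k] > running_best:
                running_best, running_idx = best_end[k], k
            j += 1
        best_end[i] = exons[i][2] + running_best
        back[i] = running_idx

    last = max(range(n), key=lambda i: best_end[i])
    top_score = best_end[last]
    chain = []
    while last != -1:
        chain.append(exons[last])
        last = back[last]
    chain.reverse()
    return top_score, chain

# Hypothetical candidate exons for illustration.
candidates = [(1, 10, 5.0), (5, 20, 6.0), (12, 30, 4.0), (25, 40, 3.0), (32, 50, 2.0)]
score, gene = best_gene(candidates)
```

Because the running best is only ever updated, never recomputed, each exon is visited once in each order, which is what removes the quadratic comparison against all preceding exons.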

Metal dealing at the origin of the Chordata phylum: The metallothionein system and metal overload response in Amphioxus.

Relevance:

60.00%

Publisher:

Abstract:

Non-vertebrate chordates, specifically amphioxus, are considered of the utmost interest for gaining insight into the evolutionary trends, i.e. differentiation and specialization, of gene/protein systems. In this work, MTs (metallothioneins), the most important metal binding proteins, are characterized for the first time in the cephalochordate subphylum at both gene and protein level, together with the main features defining the amphioxus response to cadmium and copper overload. Two MT genes (BfMT1 and BfMT2) have been identified in a contiguous region of the genome, as well as several ARE (antioxidant response element) and MRE (metal response element) sequences located upstream of the transcribed region. Their corresponding cDNAs exhibit identical sequence in the two lancelet species (B. floridae and B. lanceolatum), BfMT2 cDNA resulting from an alternative splicing event. BfMT1 is a polyvalent metal binding peptide that coordinates any of the studied metal ions (Zn, Cd or Cu) rendering complexes stable enough to last in physiological environments, which is fully concordant with the constitutive expression of its gene, and therefore, with a metal homeostasis housekeeping role. On the contrary, BfMT2 exhibits a clear ability to coordinate Cd(II) ions, while it is absolutely unable to fold into stable Cu(I) complexes, even as mixed species. This identifies it as an essential detoxification agent, which is consequently only induced in emergency situations. The cephalochordate MTs are not directly related to vertebrate MTs, neither by gene structure, protein similarity nor metal-binding behavior of the encoded peptides. The closest relative is the echinoderm MT, which confirms proposed phylogenetic relationships between these two groups.
The current findings support the existence in most organisms of two types of MTs with respect to their metal binding preferences, devoted to different biological functions: multivalent MTs for housekeeping roles, and specialized MTs that evolve either as Cd-thioneins or Cu-thioneins, according to the ecophysiological needs of each kind of organism.

Diversity of the bacterial community in the surface soil of a pear orchard based on 16S rRNA gene analysis

Relevance:

30.00%

Publisher:

Abstract:

A cultivation-independent approach based on polymerase chain reaction (PCR)-amplified partial small subunit rRNA genes was used to characterize bacterial populations in the surface soil of a commercial pear orchard consisting of different pear cultivars during two consecutive growing seasons. Pyrus communis L. cvs Blanquilla, Conference, and Williams are among the most widely cultivated cultivars in Europe and account for the majority of pear production in Northeastern Spain. To assess the heterogeneity of the community structure in response to environmental variables and tree phenology, bacterial populations were examined using PCR-denaturing gradient gel electrophoresis (DGGE) followed by cluster analysis of the 16S ribosomal DNA profiles by means of the unweighted pair group method with arithmetic means. Similarity analysis of the band patterns failed to identify characteristic fingerprints associated with the pear cultivars. Both environmentally and biologically based principal-component analyses showed that the microbial communities changed significantly throughout the year depending on temperature and, to a lesser extent, on tree phenology and rainfall. Prominent DGGE bands were excised and sequenced to gain insight into the identities of the predominant bacterial populations. Most DGGE band sequences were related to bacterial phyla, such as Bacteroidetes, Cyanobacteria, Acidobacteria, Proteobacteria, Nitrospirae, and Gemmatimonadetes, previously associated with typical agronomic crop environments.

Fusion of the human gene for the polyubiquitination co-effector UEV-1 with Kua, a newly identified gene

Relevance:

30.00%

Publisher:

Abstract:

UEV proteins are enzymatically inactive variants of the E2 ubiquitin-conjugating enzymes that regulate noncanonical elongation of ubiquitin chains. In Saccharomyces cerevisiae, UEV is part of the RAD6-mediated error-free DNA repair pathway. In mammalian cells, UEV proteins can modulate c-FOS transcription and the G2-M transition of the cell cycle. Here we show that the UEV genes from phylogenetically distant organisms present a remarkable conservation in their exon–intron structure. We also show that the human UEV1 gene is fused with the previously unknown gene Kua. In Caenorhabditis elegans and Drosophila melanogaster, Kua and UEV are in separate loci, and are expressed as independent transcripts and proteins. In humans, Kua and UEV1 are adjacent genes, expressed either as separate transcripts encoding independent Kua and UEV1 proteins, or as a hybrid Kua–UEV transcript, encoding a two-domain protein. Kua proteins represent a novel class of conserved proteins with juxtamembrane histidine-rich motifs. Experiments with epitope-tagged proteins show that UEV1A is a nuclear protein, whereas both Kua and Kua–UEV localize to cytoplasmic structures, indicating that the Kua domain determines the cytoplasmic localization of Kua–UEV. Therefore, the addition of a Kua domain to UEV in the fused Kua–UEV protein confers new biological properties to this regulator of variant polyubiquitination. [Kua cDNAs isolated by RT-PCR and described in this paper have been deposited in the GenBank data library under accession nos. AF1155120 (H. sapiens) and AF152361 (D. melanogaster). Genomic clones containing UEV genes: S. cerevisiae, YGL087c (accession no. Z72609); S. pombe, c338 (accession no. AL023781); P. falciparum, MAL3P2 (accession no. AL034558); A. thaliana, F26F24 (accession no. AC005292); C. elegans, F39B2 (accession no. Z92834); D. melanogaster, AC014908; and H. sapiens, 1185N5 (accession no. AL034423). Accession numbers for Kua cDNAs in GenBank dbEST: M. musculus, AA7853; T. cruzi, AI612534. Other Kua-containing sequences: A. thaliana genomic clones F10M23 (accession no. AL035440), F19K23 (accession no. AC000375), and T20K9 (accession no. AC004786).]

An assessment of gene prediction accuracy in large DNA sequences

Relevance:

30.00%

Publisher:

Abstract:

One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments.
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
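The abstract refers to sensitivity and specificity without defining them. A minimal Python sketch of one common nucleotide-level convention follows; it is an assumption that the paper uses this convention, in which sensitivity is the fraction of truly coding nucleotides that are predicted as coding and specificity is the fraction of predicted coding nucleotides that are truly coding. The function names and the example coordinates are hypothetical.

```python
def coding_positions(exons):
    """Expand a list of (start, end) exons, inclusive coordinates, into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions

def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): share of annotated coding bases that were predicted.
    specificity = TP / (TP + FP): share of predicted coding bases that are annotated.
    This definition of specificity is an assumed convention, not taken from the text.
    """
    truth = coding_positions(annotated)
    guess = coding_positions(predicted)
    true_positives = len(truth & guess)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(guess) if guess else 0.0
    return sensitivity, specificity

# Hypothetical example: the true exon is fully recovered, but a second exon
# is wrongly predicted in what would be intergenic sequence.
sn, sp = nucleotide_accuracy(annotated=[(1, 100)], predicted=[(1, 100), (201, 300)])
```

In this example sensitivity stays at 1.0 while specificity falls to 0.5, the same pattern the abstract reports for an ab initio predictor once random intergenic sequence is added.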

Signatures of selection in the human olfactory receptor OR5I1 gene

Relevance:

30.00%

Publisher:

Abstract:

The human olfactory receptor repertoire is reduced in comparison to other mammals and to other non-human primates. Nonetheless, this olfactory decline opens an opportunity for evolutionary innovation and improvement. In the present study, we focus on an olfactory receptor gene, OR5I1, which had previously been shown to present an excess of amino acid replacement substitutions between humans and chimpanzees. We analyze the genetic variation in OR5I1 in a large worldwide human panel and find an excess of derived alleles segregating at relatively high frequencies in all populations. Additional evidence for selection includes departures from neutrality in allele frequency spectra tests but no unusually extended haplotype structure. Moreover, molecular structural inference suggests that one of the nonsynonymous polymorphisms defining the presumably adaptive protein form of OR5I1 may alter the functional binding properties of the olfactory receptor. These results are compatible with positive selection having modeled the pattern of variation found in the OR5I1 gene and with a relatively ancient, mild selective sweep predating the “Out of Africa” expansion of modern humans.

Death and resurrection of the human IRGM gene

Relevance:

30.00%

Publisher:

Abstract:

Immunity-related GTPases (IRG) play an important role in defense against intracellular pathogens. One member of this gene family in humans, IRGM, has been recently implicated as a risk factor for Crohn's disease. We analyzed the detailed structure of this gene family among primates and showed that most of the IRG gene cluster was deleted early in primate evolution, after the divergence of the anthropoids from prosimians (about 50 million years ago). Comparative sequence analysis of New World and Old World monkey species shows that the single-copy IRGM gene became pseudogenized as a result of an Alu retrotransposition event in the anthropoid common ancestor that disrupted the open reading frame (ORF). We find that the ORF was reestablished as a part of a polymorphic stop codon in the common ancestor of humans and great apes. Expression analysis suggests that this change occurred in conjunction with the insertion of an endogenous retrovirus, which altered the transcription initiation, splicing, and expression profile of IRGM. These data argue that the gene became pseudogenized and was then resurrected through a series of complex structural events and suggest remarkable functional plasticity where alleles experience diverse evolutionary pressures over time. Such dynamism in structure and evolution may be critical for a gene family locked in an arms race with an ever-changing repertoire of intracellular parasites.

Application of Multi-SNP Approaches Bayesian LASSO and AUC-RF to Detect Main Effects of Inflammatory-Gene Variants Associated with Bladder Cancer Risk

Relevance:

30.00%

Publisher:

Abstract:

The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest (AUC-RF), a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.

Allelic diversity and population structure in Vibrio cholerae O139 Bengal based on nucleotide sequence analysis

Relevance:

30.00%

Publisher:

Abstract:

Comparative analysis of gene fragments of six housekeeping loci, distributed around the two chromosomes of Vibrio cholerae, has been carried out for a collection of 29 V. cholerae O139 Bengal strains isolated from India during the first epidemic period (1992 to 1993). A toxigenic O1 ElTor strain from the seventh pandemic and an environmental non-O1/non-O139 strain were also included in this study. All loci studied were polymorphic, with a small number of polymorphic sites in the sequenced fragments. The genetic diversity determined for our O139 population is concordant with a previous multilocus enzyme electrophoresis study in which we analyzed the same V. cholerae O139 strains. In both studies we have found a higher genetic diversity than reported previously in other molecular studies. The results of the present work showed that O139 strains clustered in several lineages of the dendrogram generated from the matrix of allelic mismatches between the different genotypes, a finding which does not support the previously reported hypothesis that the O139 serogroup is a unique clone. The statistical analysis performed on the V. cholerae O139 isolates suggested a clonal population structure. Moreover, the application of Sawyer's test and split decomposition to detect intragenic recombination in the sequenced gene fragments did not indicate the existence of recombination in our O139 population.
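The "matrix of allelic mismatches between the different genotypes" mentioned above can be illustrated with a short Python sketch: each genotype is a tuple of allele numbers, one per locus, and the distance between two genotypes is the number of loci at which their alleles differ. The genotype names and allele numbers below are invented for illustration and are not data from the study.

```python
def allelic_mismatch_matrix(profiles):
    """Return (names, matrix), where matrix[i][j] counts the loci at which
    genotypes names[i] and names[j] carry different alleles."""
    names = sorted(profiles)
    matrix = [
        [sum(a != b for a, b in zip(profiles[x], profiles[y])) for y in names]
        for x in names
    ]
    return names, matrix

# Hypothetical allelic profiles over six loci (allele numbers are made up).
toy_profiles = {
    "GT1": (1, 1, 1, 1, 1, 1),
    "GT2": (1, 2, 1, 1, 3, 1),
    "GT3": (2, 2, 1, 4, 3, 1),
}
genotype_names, mismatches = allelic_mismatch_matrix(toy_profiles)
```

A matrix of this kind is symmetric with a zero diagonal and can be passed to a hierarchical clustering method to produce a dendrogram; the abstract does not state which clustering method was used.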

The Plesiomonas shigelloides wbO1 gene cluster and the role of O1-antigen LPS in pathogenicity

Relevance:

30.00%

Publisher:

Abstract:

The Plesiomonas shigelloides 302-73 strain (serotype O1) wb gene cluster encodes 15 proteins which are consistent with the chemical structure of the O1-antigen lipopolysaccharide (LPS) previously described for this strain. The P. shigelloides O1-antigen LPS export uses the Wzy-dependent pathway, as corresponds to heteropolysaccharide structures. By isolating two mutants lacking this O1-antigen LPS, we could establish that the presence of the O1-antigen LPS is crucial for survival in serum, mainly by conferring resistance to complement. It is also an important factor in bacterial adhesion to and invasion of some eukaryotic cells, and in the ability to form biofilms. This is the first report on the genetics of a P. shigelloides O-antigen LPS cluster (wb) not shared by Shigella, unlike that of P. shigelloides O17, the only one reported until now.

Divergent evolution and purifying selection of the flaA gene sequences in Aeromonas

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: The bacterial flagellum is the most important organelle of motility in bacteria and plays a key role in many bacterial lifestyles, including virulence. The flagellum also provides a paradigm of how hierarchical gene regulation, intricate protein-protein interactions and controlled protein secretion can result in the assembly of a complex multi-protein structure tightly orchestrated in time and space. As if to stress its importance, plants and animals produce receptors specifically dedicated to the recognition of flagella. Aside from motility, the flagellum also moonlights as an adhesin and has been adapted by humans as a tool for peptide display. Flagellar sequence variation constitutes a marker with widespread potential uses for studies of population genetics and phylogeny of bacterial species. RESULTS: We sequenced the complete flagellin gene (flaA) in 18 different species and subspecies of Aeromonas. Sequences ranged in size from 870 (A. allosaccharophila) to 921 nucleotides (A. popoffii). The multiple alignment displayed 924 sites, 66 of which presented alignment gaps. The phylogenetic tree revealed the existence of two groups of species exhibiting different FlaA flagellins (FlaA1 and FlaA2). Maximum likelihood models of codon substitution were used to analyze flaA sequences. Likelihood ratio tests suggested a low variation in selective pressure among lineages, with an omega ratio of less than 1 indicating the presence of purifying selection in almost all cases. Only one site under potential diversifying selection was identified (isoleucine in position 179). However, 17 amino acid positions were inferred as sites that are likely to be under positive selection using the branch-site model. Ancestral reconstruction revealed that these 17 amino acids were among the amino acid changes detected in the ancestral sequence.
CONCLUSION: The models applied to our set of sequences allowed us to determine the possible evolutionary pathway followed by the flaA gene in Aeromonas, suggesting that this gene has probably been evolving independently in the two groups of Aeromonas species since the divergence of a distant common ancestor after one or several episodes of positive selection. REVIEWERS: This article was reviewed by Alexey Kondrashov, John Logsdon and Olivier Tenaillon (nominated by Laurence D Hurst).

Population structure in a highly pelagic seabird, the Cory's shearwater (Calonectris diomedea): an examination of genetics, morphology and ecology

Relevance:

30.00%

Publisher:

Abstract:

Increasing evidence suggests oceanic traits may play a key role in the genetic structuring of marine organisms. Whereas genetic breaks in the open ocean are well known in fishes and marine invertebrates, the importance of marine habitat characteristics in seabirds remains less certain. We investigated the role of oceanic transitions versus population genetic processes in driving population differentiation in a highly vagile seabird, the Cory's shearwater, combining molecular, morphological and ecological data from 27 breeding colonies distributed across the Mediterranean (Calonectris diomedea diomedea) and the Atlantic (C. d. borealis). Genetic and biometric analyses showed a clear differentiation between Atlantic and Mediterranean Cory's shearwaters. Ringing-recovery data indicated high site fidelity of the species, but we found some cases of dispersal among neighbouring breeding sites (<300 km) and a few long distance movements (>1000 km) within and between each basin. In agreement with this, comparison of phenotypic and genetic data revealed both current and historical dispersal events. Within each region, we did not detect any genetic substructure among archipelagos in the Atlantic, but we found a slight genetic differentiation between western and eastern breeding colonies in the Mediterranean. Accordingly, gene flow estimates suggested substantial dispersal among colonies within basins. Overall, genetic structure of the Cory's shearwater matches main oceanographic breaks (Almería-Oran Oceanic Front and Siculo-Tunisian Strait), but spatial analyses suggest that patterns of genetic differentiation are better explained by geographic rather than oceanographic distances. In line with previous studies, genetic, phenotypic and ecological evidence supported the separation of Atlantic and Mediterranean forms, suggesting the 2 taxa should be regarded as different species.