948 results for local sequence alignment problem
Abstract:
Since the start of the Human Genome Project and its success in 2001, the genomes of a multitude of species have been sequenced. Improvements in sequencing technologies have generated exponentially growing volumes of data. The project "Bioinformatic analyses on Hadoop technology" covers the parallel computation of biological data such as DNA sequences. The study has been shaped by the nature of the problem to be solved: the alignment of genetic sequences with the MapReduce paradigm.
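The abstract does not say which alignment algorithm or Hadoop API the project used. As a minimal sketch only, the following assumes a Smith-Waterman local alignment score with a linear gap penalty and simulates the map and reduce phases in a single Python process. All function names, scoring parameters and the in-memory grouping step are hypothetical and stand in for what a real Hadoop job would do.

```python
from collections import defaultdict

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a against b (assumed scoring scheme)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def map_phase(read_id, read, references):
    """Map: emit (read_id, (ref_id, score)) for every reference sequence."""
    for ref_id, ref in references.items():
        yield read_id, (ref_id, smith_waterman_score(read, ref))

def reduce_phase(read_id, values):
    """Reduce: keep the best-scoring reference for each read."""
    return read_id, max(values, key=lambda v: v[1])

def run_job(reads, references):
    """Hypothetical in-process driver that mimics shuffle-and-group by key."""
    grouped = defaultdict(list)
    for read_id, read in reads.items():
        for key, value in map_phase(read_id, read, references):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each read is scored independently in the map phase, the work can in principle be distributed across nodes; only the per-read reduction needs the values grouped by key.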
Abstract:
Concentration gradients regulate many cell biological and developmental processes. In rod-shaped fission yeast cells, polar cortical gradients of the DYRK family kinase Pom1 couple cell length with mitotic commitment by inhibiting a mitotic inducer positioned at midcell. However, how Pom1 gradients are established is unknown. Here, we show that Tea4, which is normally deposited at cell tips by microtubules, is both necessary and, upon ectopic cortical localization, sufficient to recruit Pom1 to the cell cortex. Pom1 then moves laterally at the plasma membrane, which it binds through a basic region exhibiting direct lipid interaction. Pom1 autophosphorylates in this region to lower lipid affinity and promote membrane release. Tea4 triggers Pom1 plasma membrane association by promoting its dephosphorylation through the protein phosphatase 1 Dis2. We propose that local dephosphorylation induces Pom1 membrane association and nucleates a gradient shaped by the opposing actions of lateral diffusion and autophosphorylation-dependent membrane detachment.
Abstract:
Death receptors, such as Fas and tumor necrosis factor-related apoptosis-inducing ligand receptors, recruit Fas-associated death domain and pro-caspase-8 homodimers, which are then autoproteolytically activated. Active caspase-8 is released into the cytoplasm, where it cleaves various proteins including pro-caspase-3, resulting in apoptosis. The cellular Fas-associated death domain-like interleukin-1-beta-converting enzyme-inhibitory protein long form (FLIP(L)), a structural homologue of caspase-8 lacking caspase activity because of several mutations in the active site, is a potent inhibitor of death receptor-induced apoptosis. FLIP(L) is proposed to block caspase-8 activity by forming a proteolytically inactive heterodimer with caspase-8. In contrast, we propose that FLIP(L)-bound caspase-8 is an active protease. Upon heterocomplex formation, a limited caspase-8 autoprocessing occurs resulting in the generation of the p43/41 and the p12 subunits. This partially processed form but also the non-cleaved FLIP(L)-caspase-8 heterocomplex are proteolytically active because they both bind synthetic substrates efficiently. Moreover, FLIP(L) expression favors receptor-interacting kinase (RIP) processing within the Fas-signaling complex. We propose that FLIP(L) inhibits caspase-8 release-dependent pro-apoptotic signals, whereas the single, membrane-restricted active site of the FLIP(L)-caspase-8 heterocomplex is proteolytically active and acts on local substrates such as RIP.
Abstract:
Can rules be used to shield public resources from political interference? The Brazilian constitution and national tax code stipulate that revenue sharing transfers to municipal governments be determined by the size of counties in terms of estimated population. In this paper I document that the population estimates which went into the transfer allocation formula for the year 1991 were manipulated, resulting in significant transfer differentials over the entire 1990s. I test whether, conditional on county characteristics that might account for the manipulation, center-local party alignment, party popularity and the extent of interparty fragmentation at the county level are correlated with estimated populations in 1991. Results suggest that revenue sharing transfers were targeted at right-wing national deputies in electorally fragmented counties as well as aligned local executives.
Abstract:
The sequence profile method (Gribskov M, McLachlan AD, Eisenberg D, 1987, Proc Natl Acad Sci USA 84:4355-4358) is a powerful tool to detect distant relationships between amino acid sequences. A profile is a table of position-specific scores and gap penalties, providing a generalized description of a protein motif, which can be used for sequence alignments and database searches instead of an individual sequence. A sequence profile is derived from a multiple sequence alignment. We have found 2 ways to improve the sensitivity of sequence profiles: (1) Sequence weights: Usage of individual weights for each sequence avoids bias toward closely related sequences. These weights are automatically assigned based on the distance of the sequences using a published procedure (Sibbald PR, Argos P, 1990, J Mol Biol 216:813-818). (2) Amino acid substitution table: In addition to the alignment, the construction of a profile also needs an amino acid substitution table. We have found that in some cases a new table, the BLOSUM45 table (Henikoff S, Henikoff JG, 1992, Proc Natl Acad Sci USA 89:10915-10919), is more sensitive than the original Dayhoff table or the modified Dayhoff table used in the current implementation. Profiles derived by the improved method are more sensitive and selective in a number of cases where previous methods have failed to completely separate true members from false positives.
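The idea of a weighted profile can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the column score for residue a is taken as the weighted frequency of each observed residue b multiplied by a substitution score S(a, b); gap penalties are omitted; the sequence weights are passed in by hand instead of being derived by the Sibbald and Argos procedure; and `toy_substitution` is a hypothetical match/mismatch table, not BLOSUM45 or a Dayhoff table.

```python
from collections import defaultdict

def toy_substitution(alphabet, match=2, mismatch=-1):
    """Hypothetical substitution table: NOT BLOSUM45 or Dayhoff values."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}

def build_profile(alignment, weights, substitution):
    """Position-specific scores: profile[p][a] = sum_b f_w(b, p) * S(a, b).

    alignment    -- equal-length aligned sequences ('-' marks a gap)
    weights      -- one non-negative weight per sequence
    substitution -- dict mapping (residue, residue) to a score
    """
    alphabet = sorted({a for a, _ in substitution})
    total = sum(weights)
    profile = []
    for p in range(len(alignment[0])):
        # Weighted residue frequencies in column p (gaps are skipped).
        freq = defaultdict(float)
        for seq, w in zip(alignment, weights):
            if seq[p] != "-":
                freq[seq[p]] += w / total
        profile.append({a: sum(f * substitution[(a, b)]
                               for b, f in freq.items())
                        for a in alphabet})
    return profile
```

With the toy alignment `["AC", "AC", "GC"]`, equal weights give residue A a score of 1.0 in the first column, while down-weighting the two identical sequences to 0.25 each (and 0.5 for the third) lowers it to 0.5, which is the bias correction the abstract describes.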
Abstract:
The amino acid sequence of mouse brain beta spectrin (beta fodrin), deduced from the nucleotide sequence of complementary DNA clones, reveals that this non-erythroid beta spectrin comprises 2363 residues, with a molecular weight of 274,449 Da. Brain beta spectrin contains three structural domains and we suggest the position of several functional domains including f-actin, synapsin I, ankyrin and spectrin self association sites. Analysis of deduced amino acid sequences indicated striking homology and similar structural characteristics of brain beta spectrin repeats beta 11 and beta 12 to globins. In vitro analysis has demonstrated that heme is capable of specific attachment to brain spectrin, suggesting possible new functions in electron transfer, oxygen binding, nitric oxide binding or heme scavenging.
Abstract:
During the last 2 years, several novel genes that encode glucose transporter-like proteins have been identified and characterized. Because of their sequence similarity with GLUT1, these genes appear to belong to the family of solute carriers 2A (SLC2A, protein symbol GLUT). Sequence comparisons of all 13 family members allow the definition of characteristic sugar/polyol transporter signatures: (1) the presence of 12 membrane-spanning helices, (2) seven conserved glycine residues in the helices, (3) several basic and acidic residues at the intracellular surface of the proteins, (4) two conserved tryptophan residues, and (5) two conserved tyrosine residues. On the basis of sequence similarities and characteristic elements, the extended GLUT family can be divided into three subfamilies, namely class I (the previously known glucose transporters GLUT1-4), class II (the previously known fructose transporter GLUT5, the GLUT7, GLUT9 and GLUT11), and class III (GLUT6, 8, 10, 12, and the myo-inositol transporter HMIT1). Functional characteristics have been reported for some of the novel GLUTs. Like GLUT1-4, they exhibit a tissue/cell-specific expression (GLUT6, leukocytes, brain; GLUT8, testis, blastocysts, brain, muscle, adipocytes; GLUT9, liver, kidney; GLUT10, liver, pancreas; GLUT11, heart, skeletal muscle). GLUT6 and GLUT8 appear to be regulated by sub-cellular redistribution, because they are targeted to intra-cellular compartments by dileucine motifs in a dynamin dependent manner. Sugar transport has been reported for GLUT6, 8, and 11; HMIT1 has been shown to be a H+/myo-inositol co-transporter. Thus, the members of the extended GLUT family exhibit a surprisingly diverse substrate specificity, and the definition of sequence elements determining this substrate specificity will require a full functional characterization of all members.
Abstract:
A novel member of the tumor necrosis factor (TNF) receptor family, designated TRAMP, has been identified. The structural organization of the 393 amino acid long human TRAMP is most homologous to TNF receptor 1. TRAMP is abundantly expressed on thymocytes and lymphocytes. Its extracellular domain is composed of four cysteine-rich domains, and the cytoplasmic region contains a death domain known to signal apoptosis. Overexpression of TRAMP leads to two major responses, NF-kappaB activation and apoptosis. TRAMP-induced cell death is inhibited by an inhibitor of ICE-like proteases, but not by Bcl-2. In addition, TRAMP does not appear to interact with any of the known apoptosis-inducing ligands of the TNF family.
Abstract:
The malic enzyme (ME) gene is a target for both thyroid hormone receptors and peroxisome proliferator-activated receptors (PPAR). Within the ME promoter, two direct repeat (DR)-1-like elements, MEp and MEd, have been identified as putative PPAR response elements (PPRE). We demonstrate that only MEp and not MEd is able to bind PPAR/retinoid X receptor (RXR) heterodimers and mediate peroxisome proliferator signaling. Taking advantage of the close sequence resemblance of MEp and MEd, we have identified crucial determinants of a PPRE. Using reciprocal mutation analyses of these two elements, we show the preference for adenine as the spacing nucleotide between the two half-sites of the PPRE and demonstrate the importance of the two first bases flanking the core DR1 in 5'. This latter feature of the PPRE led us to consider the polarity of the PPAR/RXR heterodimer bound to its cognate element. We demonstrate that, in contrast to the polarity of RXR/TR and RXR/RAR bound to DR4 and DR5 elements respectively, PPAR binds to the 5' extended half-site of the response element, while RXR occupies the 3' half-site. Consistent with this polarity is our finding that formation and binding of the PPAR/RXR heterodimer requires an intact hinge T region in RXR while its integrity is not required for binding of the RXR/TR heterodimer to a DR4.
Abstract:
EMBnet is a consortium of collaborating bioinformatics groups located mainly within Europe (http://www.embnet.org). Each member country is represented by a 'node', a group responsible for the maintenance of local services for their users (e.g. education, training, software, database distribution, technical support, helpdesk). Among these services a web portal with links and access to locally developed and maintained software is essential and different for each node. Our web portal targets biomedical scientists in Switzerland and elsewhere, offering them access to a collection of important sequence analysis tools mirrored from other sites or developed locally. We describe here the Swiss EMBnet node web site (http://www.ch.embnet.org), which presents a number of original services not available anywhere else.
Abstract:
In order to contribute to the debate about southern glacial refugia used by temperate species and more northern refugia used by boreal or cold-temperate species, we examined the phylogeography of a widespread snake species (Vipera berus) inhabiting Europe up to the Arctic Circle. The analysis of the mitochondrial DNA (mtDNA) sequence variation in 1043 bp of the cytochrome b gene and in 918 bp of the noncoding control region was performed with phylogenetic approaches. Our results suggest that both the duplicated control region and cytochrome b evolve at a similar rate in this species. Phylogenetic analysis showed that V. berus is divided into three major mitochondrial lineages, probably resulting from an Italian, a Balkan and a Northern (from France to Russia) refugial area in Eastern Europe, near the Carpathian Mountains. In addition, the Northern clade presents an important substructure, suggesting two sequential colonization events in Europe. First, the continent was colonized from the three main refugial areas mentioned above during the Lower-Mid Pleistocene. Second, recolonization of most of Europe most likely originated from several refugia located outside of the Mediterranean peninsulas (Carpathian region, east of the Carpathians, France and possibly Hungary) during the Mid-Late Pleistocene, while populations within the Italian and Balkan Peninsulas fluctuated only slightly in distribution range, with larger lowland populations during glacial times and with refugial mountain populations during interglacials, as in the present time. The phylogeographical structure revealed in our study suggests complex recolonization dynamics of the European continent by V. berus, characterized by latitudinal as well as altitudinal range shifts, driven by both climatic changes and competition with related species.
Abstract:
First, we modeled the structure of an RNA family with a graph grammar in order to identify the sequences that belong to it. Several other modeling methods have been developed, such as stochastic context-free grammars, covariance models, secondary structure profiles and constraint networks. These modeling methods are based on classical secondary structure, whereas our graph grammars are based on nucleotide cyclic motifs. To exemplify our model, we used the loop E of the ribosome, which contains the Sarcin-Ricin motif, widely studied since its discovery by X-ray crystallography in the early 1990s. We built a graph grammar for the structure of the Sarcin-Ricin motif and derived all the sequences that can fold into it. The biological relevance of these sequences was confirmed by comparison with the sequences of an alignment of more than 800 bacterial ribosomal sequences. This comparison suggested alternative alignments for a few of the sequences, which we supported with secondary and tertiary structure predictions. Nucleotide cyclic motifs have been observed by members of our laboratory in RNA whose tertiary structure has been solved experimentally. A study of the sequences and tertiary structures of each cycle composing the Sarcin-Ricin structure revealed that the sequence space depends greatly on the interactions among all nucleotides that are close in three-dimensional space, that is, not only between two adjacent base pairs. The number of sequences generated by the graph grammar is smaller than that generated by methods based on classical secondary structure.
This suggests the importance of context for the relationship between sequence and structure, hence the use of a contextual graph grammar that is more expressive than context-free grammars. The graph grammars we developed consider only the tertiary structure and neglect the interactions of specific chemical groups with extra-molecular elements, such as other macromolecules or ligands. Second, to account for these interactions, we developed a model that takes into account the position of chemical groups on the surface of tertiary structures. The hypothesis is that chemical groups at positions that are conserved in predetermined active sequences, and displaced in sequences that are inactive for a given function, are more likely to be involved in interactions with factors. Continuing with the loop E example, we searched for the groups in this loop that could be involved in interactions with elongation factors. Once the groups are identified, three-dimensional modeling can be used to predict the sequences that correctly position these groups in their tertiary structures. A few models exist to address this problem, such as molecular descriptors, nucleotide adjacency matrices and thermodynamics-based models. However, all these models use an oversimplified representation of RNA structure, which limits their applicability. We applied our model to the tertiary structures of a set of variants of a sequence of a Sarcin-Ricin instance from a bacterial ribosome. Wool's team at the University of Chicago had already studied this instance experimentally by testing the viability of 12 variants. They identified 4 viable and 8 lethal variants.
We used this set of 12 sequences to train our model and determined a set of properties essential to their biological function. For each variant in the training set, we built tertiary structure models. We then measured the partial charges of the atoms exposed on the surface and encoded this information in vectors. We used principal component analysis to transform the vectors into a set of uncorrelated variables, called the principal components. Using the weighted Euclidean distance and the nearest-neighbor algorithm, we applied leave-one-out cross-validation to choose the best parameters for predicting the activity of a new sequence by matching it to these principal components. Finally, we confirmed the predictive power of the model using a new set of 8 variants whose viability was verified experimentally in our laboratory. In conclusion, graph grammars make it possible to model the relationship between the sequence and the structure of an RNA structural element, such as the loop E containing the Sarcin-Ricin motif of the ribosome. Applications range from correcting and assisting sequence alignment to designing sequences with a predetermined structure. We also developed a model to account for the specific interactions linked to a given biological function, namely those with surrounding factors. Our model is based on the conservation of the exposure of the chemical groups involved in these interactions. This model allowed us to predict the biological activity of a set of variants of the ribosomal loop E, which binds elongation factors.
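The classification step described in this abstract (principal component analysis, a weighted Euclidean distance, a nearest-neighbor rule and leave-one-out cross-validation) can be sketched with NumPy. This is an illustrative outline only: the feature vectors would in practice be the encoded surface partial charges, which are not reproduced here, and the function names, the number of components and the per-component weights are all assumptions.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the principal axes."""
    return (X - mean) @ components.T

def nearest_neighbor_label(train_Z, train_y, z, weights):
    """Label of the training point closest to z under a weighted Euclidean distance."""
    d = np.sqrt((((train_Z - z) ** 2) * weights).sum(axis=1))
    return train_y[int(np.argmin(d))]

def loocv_accuracy(X, y, n_components, weights):
    """Leave-one-out cross-validation: hold out each sample once, refit, predict."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        mean, comps = pca_fit(X[keep], n_components)
        Z = pca_transform(X[keep], mean, comps)
        z = pca_transform(X[i:i + 1], mean, comps)[0]
        hits += int(nearest_neighbor_label(Z, y[keep], z, weights) == y[i])
    return hits / len(X)
```

Parameter selection would then amount to calling `loocv_accuracy` over a grid of candidate `n_components` and `weights` values and keeping the combination with the highest accuracy. Refitting the PCA inside each fold keeps the held-out sample from influencing the projection.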
Abstract:
Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.
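The cycle of search, read selection and extension can be illustrated with a toy Python sketch. It is not GenSeed itself, which is a Perl program: here exact suffix-prefix matching stands in for both the similarity search and the assembly step, and the function names and the `min_len` and `max_cycles` parameters are hypothetical.

```python
def overlap(left, right, min_len):
    """Length of the longest suffix of left that equals a prefix of right."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0

def extend_once(contig, reads, min_len):
    """One cycle: select the read that best matches the contig and merge it."""
    best_n, best = 0, contig
    for read in reads:
        if read in contig:
            continue  # read adds no new sequence
        if contig in read:
            candidates = [(len(contig), read)]  # read spans the whole contig
        else:
            right = overlap(contig, read, min_len)  # read extends the 3' end
            left = overlap(read, contig, min_len)   # read extends the 5' end
            candidates = [(right, contig + read[right:]),
                          (left, read[:len(read) - left] + contig)]
        for n, merged in candidates:
            if n > best_n:
                best_n, best = n, merged
    return best

def seed_assembly(seed, reads, min_len=3, max_cycles=100):
    """Repeat extension cycles until the contig stops growing."""
    contig = seed
    for _ in range(max_cycles):
        extended = extend_once(contig, reads, min_len)
        if extended == contig:
            break
        contig = extended
    return contig
```

Starting from the seed `"GTACG"` and the reads `"ATGGCGTA"`, `"GCGTACGTT"`, `"ACGTTAGC"` and an unrelated `"TTTTTTTT"`, the loop grows the contig in both directions and never incorporates the unrelated read, which mirrors the target-specific behavior described above.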
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)