989 resultados para Simple Sequence Repeats
Resumo:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
Resumo:
Background: Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect of non-synonymous SNPs in the context of biological networks such as cell signalling pathways. UniProt provides curated information about the functional and phenotypic effects of sequence variation, including SNPs, as well as on mutations of protein sequences. However, no strategy has been developed to integrate this information with biological networks, with the ultimate goal of studying the impact of the functional effect of SNPs in the structure and dynamics of biological networks. Results: First, we identified the different challenges posed by the integration of the phenotypic effect of sequence variants and mutations with biological networks. Second, we developed a strategy for the combination of data extracted from public resources, such as UniProt, NCBI dbSNP, Reactome and BioModels. We generated attribute files containing phenotypic and genotypic annotations to the nodes of biological networks, which can be imported into network visualization tools such as Cytoscape. These resources allow the mapping and visualization of mutations and natural variations of human proteins and their phenotypic effect on biological networks (e.g. signalling pathways, protein-protein interaction networks, dynamic models). Finally, an example on the use of the sequence variation data in the dynamics of a network model is presented. Conclusion: In this paper we present a general strategy for the integration of pathway and sequence variation data for visualization, analysis and modelling purposes, including the study of the functional impact of protein sequence variations on the dynamics of signalling pathways. This is of particular interest when the SNP or mutation is known to be associated to disease. We expect that this approach will help in the study of the functional impact of disease-associated SNPs on the behaviour of cell signalling pathways, which ultimately will lead to a better understanding of the mechanisms underlying complex diseases.
Resumo:
Background: Recent advances on high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to built protein structures from its sequence, generating a considerable amount of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combination of both.Results: Here, we present and demonstrate a theory to split the knowledge-based potentials in scoring terms biologically meaningful and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors.Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-art methods and have been successful to identify wrongly modeled regions of many near-native conformations.
Resumo:
Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences from which interaction data was available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
Resumo:
Background: Single Nucleotide Polymorphisms, among other type of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found at databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes http://ibi.imim.es/osirisform.html. Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html webcite) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB: a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.
Resumo:
Microsatellite instability (MSI) testing in clinics is becoming increasingly widespread; therefore, there is an urgent need for methodology standardization and the availability of quality control. This study is aimed to assess the interlaboratory reproducibility of MSI testing in archive samples by using a panel of 5 recently introduced, mononucleotide repeats (MNR). The quality control involved 8 European institutions. Participants were supplied with DNA extracted from 15 archive colon carcinoma samples and from the corresponding normal tissues. Every group was asked to assess the MSI status of the samples by using the BAT25, BAT26, NR21, NR24, and NR27 mononucleotide markers. Four institutions repeated the analysis using the NCI reference panel to confirm the results obtained with the MNR markers. The overall concordance among institutions for MSI analyses at single locus level was 97.7% when using the MNR panel and 95.0% with the NCI one. The laboratories obtained a full agreement in scoring the MSI status of each patient sample, both using the mononucleotide and the NCI marker sets. With the NCI marker set, however, concordance was lowered to 85.7% when considering the MSI-Low phenotype. Concordance between the 2 panels in scoring the MSI status of each sample was complete if no discrimination was made between MSI-Stable and MSI-L, whereas it dropped to 76.7% if MSI-L was considered. In conclusion, the use of the MNR panel seems to be a robust approach that yields a very high level of reproducibility. The results obtained with the 5 MNR are diagnostically consistent with those obtained by the use of the NCI markers, except for the MSI-Low phenotype.
Resumo:
Shrews of the genus Sorex are characterized by a Holarctic distribution, and relationships among extant taxa have never been fully resolved. Phylogenies have been proposed based on morphological, karyological, and biochemical comparisons, but these analyses often produced controversial and contradictory results. Phylogenetic analyses of partial mitochondrial cytochrome b gene sequences (1011 bp) were used to examine the relationships among 27 Sorex species. The molecular data suggest that Sorex comprises two major monophyletic lineages, one restricted mostly to the New World and one with a primarily Palearctic distribution. Furthermore, several sister-species relationships are revealed by the analysis. Based on the split between the Soricinae and Crocidurinae subfamilies, we used a 95% confidence interval for both the calibration of a molecular clock and the subsequent calculation of major diversification events within the genus Sorex. Our analysis does not support an unambiguous acceleration of the molecular clock in shrews, the estimated rate being similar to other estimates of mammalian mitochondrial clocks. In addition, the data presented here indicate that estimates from the fossil record greatly underestimate divergence dates among Sorex taxa.
Resumo:
BACKGROUND: Tests for recent infections (TRIs) are important for HIV surveillance. We have shown that a patient's antibody pattern in a confirmatory line immunoassay (Inno-Lia) also yields information on time since infection. We have published algorithms which, with a certain sensitivity and specificity, distinguish between incident (< = 12 months) and older infection. In order to use these algorithms like other TRIs, i.e., based on their windows, we now determined their window periods. METHODS: We classified Inno-Lia results of 527 treatment-naïve patients with HIV-1 infection < = 12 months according to incidence by 25 algorithms. The time after which all infections were ruled older, i.e. the algorithm's window, was determined by linear regression of the proportion ruled incident in dependence of time since infection. Window-based incident infection rates (IIR) were determined utilizing the relationship 'Prevalence = Incidence x Duration' in four annual cohorts of HIV-1 notifications. Results were compared to performance-based IIR also derived from Inno-Lia results, but utilizing the relationship 'incident = true incident + false incident' and also to the IIR derived from the BED incidence assay. RESULTS: Window periods varied between 45.8 and 130.1 days and correlated well with the algorithms' diagnostic sensitivity (R(2) = 0.962; P<0.0001). Among the 25 algorithms, the mean window-based IIR among the 748 notifications of 2005/06 was 0.457 compared to 0.453 obtained for performance-based IIR with a model not correcting for selection bias. Evaluation of BED results using a window of 153 days yielded an IIR of 0.669. Window-based IIR and performance-based IIR increased by 22.4% and respectively 30.6% in 2008, while 2009 and 2010 showed a return to baseline for both methods. CONCLUSIONS: IIR estimations by window- and performance-based evaluations of Inno-Lia algorithm results were similar and can be used together to assess IIR changes between annual HIV notification cohorts.
Resumo:
Nearly full-length Circumsporozoite protein (CSP) from Plasmodium falciparum, the C-terminal fragments from both P. falciparm and P. yoelii CSP and a fragment comprising 351 amino acids of P.vivax MSPI were expressed in the slime mold Dictyostelium discoideum. Discoidin-tag expression vectors allowed both high yields of these proteins and their purification by a nearly single-step procedure. We exploited the galactose binding activity of Discoidin Ia to separate the fusion proteins by affinity chromatography on Sepharose-4B columns. Inclusion of a thrombin recognition site allowed cleavage of the Discoidin-tag from the fusion protein. Partial secretion of the protein was obtained via an ER independent pathway, whereas routing the recombinant proteins to the ER resulted in glycosylation and retention. Yields of proteins ranged from 0.08 to 3 mg l(-1) depending on the protein sequence and the purification conditions. The recognition of purified MSPI by sera from P. vivax malaria patients was used to confirm the native conformation of the protein expressed in Dictyostelium. The simple purification procedure described here, based on Sepharose-4B, should facilitate the expression and the large-scale purification of various Plasmodium polypeptides.
Resumo:
PURPOSE: To report a large deletion that encompasses more than 90% of PRPF31 gene and two other neighboring genes in their entirety in an adRP pedigree that appears to show only the typical clinical features of retinitis pigmentosa. METHODS: To identify PRPF31 mutation in a dominant RP family (ADRP2) previously linked to the RP11 locus, the 14 exons of PRPF31 were screened for mutations by direct sequencing. To investigate the possibility of a large deletion, microsatellite markers near PRPF31 gene were analyzed by non-denaturing PAGE. RESULTS: Initial screening of PRPF31 gene in the ADRP2 family did not reveal an obvious mutation. A large deletion was however suspected due to lack of heterozygosity for nearly all PRPF31 intragenic single nucleotide polymorphysm (SNPs). In order to estimate the size of the deletion, SNPs and microsatellite markers spanning and flanking PRPF31 were analyzed in the entire ADRP2 family. Haplotype analysis with the above markers suggested a deletion of approximately 30 kb that included the putative promoter region of a novel gene OSCAR, the entire genomic content of genes NDUFA3, TFPT and more than 90% of PRPF31 gene. Sequence analysis of the region flanking the potential deletion showed a high presence of Alu elements implicating Alu mediated recombination as the mechanism responsible for this event. CONCLUSIONS: This mutation provides evidence that haploinsufficiency rather than aberrant function of mutated proteins is the cause of disease in these adRP patients with mutations in PRPF31 gene.
Resumo:
Epidemiological processes leave a fingerprint in the pattern of genetic structure of virus populations. Here, we provide a new method to infer epidemiological parameters directly from viral sequence data. The method is based on phylogenetic analysis using a birth-death model (BDM) rather than the commonly used coalescent as the model for the epidemiological transmission of the pathogen. Using the BDM has the advantage that transmission and death rates are estimated independently and therefore enables for the first time the estimation of the basic reproductive number of the pathogen using only sequence data, without further assumptions like the average duration of infection. We apply the method to genetic data of the HIV-1 epidemic in Switzerland.
Resumo:
In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes, which may result either from frequent turnovers, or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, no single case of male recombination has been identified in large-scale laboratory crosses, and populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations, by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks, and no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation; the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the front wave of expansions, and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination; whether the frequency of these events indeed differs between populations remains to be clarified.