67 results for sequence based alignments
in National Center for Biotechnology Information - NCBI
Abstract:
Nucleic acid sequence-based amplification (NASBA) has proved to be an ultrasensitive method for HIV-1 diagnosis in plasma even in the primary HIV infection stage. This technique was combined with fluorescence correlation spectroscopy (FCS), which enables online detection of the HIV-1 RNA molecules amplified by NASBA. A fluorescently labeled DNA probe at nanomolar concentration was introduced into the NASBA reaction mixture and hybridized to a distinct sequence of the amplified RNA molecule. The specific hybridization and extension of this probe during the amplification reaction, resulting in an increase of its diffusion time, was monitored online by FCS. As a consequence, after having reached a critical concentration of 0.1–1 nM (threshold for unaided FCS detection), the number of amplified RNA molecules in the further course of reaction could be determined. Evaluation of the hybridization/extension kinetics allowed an estimation of the initial HIV-1 RNA concentration that was present at the beginning of amplification. The initial HIV-1 RNA number enables discrimination between positive and false-positive samples (caused, for instance, by carryover contamination); this discrimination is essential for all diagnostic methods using amplification systems (PCR as well as NASBA). Quantitation of HIV-1 RNA in plasma by combination of NASBA with FCS may also be useful in assessing the efficacy of anti-HIV agents, especially in the early infection stage when standard ELISA antibody tests often display negative results.
Abstract:
Transmission of human immunodeficiency virus 1 (HIV-1) from an infected woman to her offspring during gestation and delivery was found to be influenced by the infant's major histocompatibility complex class II DRB1 alleles. Forty-six HIV-infected infants and 63 seroreverting infants, born with passively acquired anti-HIV antibodies but not becoming detectably infected, were typed by an automated nucleotide-sequence-based technique that uses low-resolution PCR to select either the simpler Taq or the more demanding T7 sequencing chemistry. One or more DR13 alleles, including DRB1*1301, 1302, and 1303, were found in 31.7% of seroreverting infants and 15.2% of those becoming HIV-infected [OR (odds ratio) = 2.6 (95% confidence interval 1.0-6.8); P = 0.048]. This association was influenced by ethnicity, being seen more strongly among the 80 Black and Hispanic children [OR = 4.3 (1.2-16.4); P = 0.023], with the most pronounced effect among Black infants where 7 of 24 seroreverters inherited these alleles with none among 12 HIV-infected infants (Haldane OR = 12.3; P = 0.037). The previously recognized association of DR13 alleles with some situations of long-term nonprogression of HIV suggests that similar mechanisms may regulate both the occurrence of infection and disease progression after infection. Upon examining for residual associations, only the DR2 allele DRB1*1501 was associated with seroreversion in Caucasoid infants (OR = 24; P = 0.004). Among Caucasoids the DRB1*03011 allele was positively associated with the occurrence of HIV infection (P = 0.03).
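The first odds ratio reported in this abstract can be approximately reproduced from the stated percentages. The sketch below is not the authors' calculation: it assumes the carrier counts implied by rounding (20 of 63 seroreverting infants and 7 of 46 infected infants with a DR13 allele) and uses the Woolf (log-based) interval, since the abstract does not say which interval method was used. The function name `odds_ratio_woolf` is illustrative.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a, b: carriers / non-carriers in the first group
    c, d: carriers / non-carriers in the second group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts inferred from the percentages (assumption, not given in the abstract):
# 20/63 = 31.7% of seroreverters, 7/46 = 15.2% of infected infants.
or_, lo, hi = odds_ratio_woolf(20, 43, 7, 39)
print(f"OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")  # OR = 2.6 (95% CI 1.0-6.8)
```

Under these assumed counts the result agrees with the reported OR = 2.6 (1.0-6.8). The subgroup figures are not recomputed here because the abstract does not give enough counts to reconstruct their tables.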
Abstract:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.
Abstract:
Molecular, sequence-based environmental surveys of microorganisms have revealed a large degree of previously uncharacterized diversity. However, nearly all studies of the human endogenous bacterial flora have relied on cultivation and biochemical characterization of the resident organisms. We used molecular methods to characterize the breadth of bacterial diversity within the human subgingival crevice by comparing 264 small subunit rDNA sequences from 21 clone libraries created with products amplified directly from subgingival plaque, with sequences obtained from bacteria that were cultivated from the same specimen, as well as with sequences available in public databases. The majority (52.5%) of the directly amplified 16S rRNA sequences were <99% identical to sequences within public databases. In contrast, only 21.4% of the sequences recovered from cultivated bacteria showed this degree of variability. The 16S rDNA sequences recovered by direct amplification were also more deeply divergent; 13.5% of the amplified sequences were more than 5% nonidentical to any known sequence, a level of dissimilarity that is often found between members of different genera. None of the cultivated sequences exhibited this degree of sequence dissimilarity. Finally, direct amplification of 16S rDNA yielded a more diverse view of the subgingival bacterial flora than did cultivation. Our data suggest that a significant proportion of the resident human bacterial flora remain poorly characterized, even within this well studied and familiar microbial environment.
Abstract:
Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein with a probability of having an unsolved fold. The method makes extensive use of protomap, a sequence-based classification, and scop, a structure-based classification. According to protomap, the protein space encodes the relationship among proteins as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for a cluster with at least one solved protein is determined after superposition of all scop (release 1.37) folds onto protomap clusters. Distances within the protomap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical model for distances among those folds that are already known and those that have yet to be discovered. The distribution of distances for solved/unsolved proteins is significantly different. This difference makes it possible to use Bayes' rule to derive a statistical estimate that any protein has a yet undetermined fold. Proteins that score the highest probability to represent a new fold constitute the target list for structural determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new scop 1.39 records) that are disjoint from our original training set.
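The Bayes' rule step described in this abstract can be illustrated with a minimal sketch. Everything numeric here is hypothetical: the abstract does not give the fitted distance distributions or the prior, so normal densities and a prior of 0.3 are used purely as stand-ins, and `prob_new_fold` is an illustrative name rather than the authors' code.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density; a stand-in for the empirical distance distributions."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def prob_new_fold(distance, prior_new, pdf_new, pdf_known):
    """Bayes' rule: P(new fold | distance) from two class-conditional densities."""
    numerator = pdf_new(distance) * prior_new
    denominator = numerator + pdf_known(distance) * (1.0 - prior_new)
    return numerator / denominator


# Hypothetical models: clusters far from any solved fold are assumed
# more likely to carry a new fold. These parameters are invented.
pdf_known = lambda d: normal_pdf(d, mean=2.0, sd=1.0)
pdf_new = lambda d: normal_pdf(d, mean=5.0, sd=1.5)

for d in (1.0, 3.5, 6.0):
    print(d, round(prob_new_fold(d, 0.3, pdf_new, pdf_known), 3))
```

Ranking proteins by this posterior and taking the top of the list corresponds to the target-selection idea in the abstract, under the stated assumptions.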
Abstract:
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1 000 000 hits from 462 500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Abstract:
Nucleosomes, the basic structural elements of chromosomes, consist of 146 bp of DNA coiled around an octamer of histone proteins, and their presence can strongly influence gene expression. Considerations of the anisotropic flexibility of nucleotide triplets containing 3 cytosines or guanines suggested that a [5'(G/C)3 NN3']n motif might resist wrapping around a histone octamer. To test this, DNAs were constructed containing a 5'-CCGNN-3' pentanucleotide repeat with the Ns varied. Using in vitro nucleosome reconstitution and electron microscopy, a plasmid with 48 contiguous CCGNN repeats strongly excluded nucleosomes in the repeat region. Competitive reconstitution gel retardation experiments using DNA fragments containing 12, 24, or 48 CCGNN repeats showed that the propensity to exclude nucleosomes increased with the length of the repeat. Analysis showed that a 268-bp DNA containing a (CCGNN)48 block is 4.9 +/- 0.6-fold less efficient in nucleosome assembly than a similar length pUC19 fragment and approximately 78-fold less efficient than a similar length (CTG)n sequence, based on results from previous studies. Computer searches against the GenBank database for matches with a [(G/C)3NN]48 sequence revealed numerous examples that frequently were present in the control regions of "TATA-less" genes, including the human ETS-2 and human dihydrofolate reductase genes. In both cases the (G/C)3NN repeat, present in the promoter region, co-maps with loci previously shown to be nuclease hypersensitive sites.
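A search for [(G/C)3NN]n runs like the one described in this abstract can be sketched with a regular expression. This is an illustration only, not the authors' GenBank search: the function name and the toy sequence are hypothetical, and only the given strand is scanned (the reverse-complement motif NN(G/C)3 is not searched).

```python
import re


def find_gc3nn_repeats(seq, min_repeats=48):
    """Return (start, end, n_units) for runs of at least min_repeats [(G/C)3 NN] units."""
    # One unit = three G/C bases followed by any two bases
    pattern = re.compile(r"(?:[GC]{3}[ACGT]{2}){%d,}" % min_repeats)
    hits = []
    for match in pattern.finditer(seq.upper()):
        n_units = (match.end() - match.start()) // 5
        hits.append((match.start(), match.end(), n_units))
    return hits


# Toy sequence (illustrative): 12 CCGTA units flanked by AT-rich DNA
toy = "ATATAT" + "CCGTA" * 12 + "TTTAAA"
print(find_gc3nn_repeats(toy, min_repeats=12))  # [(6, 66, 12)]
```

Lowering `min_repeats` mirrors the 12-, 24-, and 48-repeat fragments used in the reconstitution experiments.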
Abstract:
The genome of the pufferfish (Fugu rubripes) (400 Mb) is approximately 7.5 times smaller than the human genome, but it has a similar gene repertoire to that of man. If regions of the two genomes exhibited conservation of gene order (i.e., were syntenic), it should be possible to reduce dramatically the effort required for identification of candidate genes in human disease loci by sequencing syntenic regions of the compact Fugu genome. We have demonstrated that three genes (dihydrolipoamide succinyltransferase, S31iii125, and S20i15), which are linked to FOS in the familial Alzheimer disease locus (AD3) on human chromosome 14, have homologues in the Fugu genome adjacent to Fugu cFOS. The relative gene order of cFOS, S31iii125, and S20i15 was the same in both genomes, but in Fugu these three genes lay within a 12.4-kb region, compared to >600 kb in the human AD3 locus. These results demonstrate the conservation of synteny between the genomes of Fugu and man and highlight the utility of this approach for sequence-based identification of genes in human disease loci.
Abstract:
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
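The segment-to-segment idea in this abstract can be illustrated for the pairwise case with a heavily simplified sketch. It is not the authors' algorithm: it uses exact-match segments and segment length as the weight (the abstract speaks of statistically significant similarity, whose scoring is not given here), and the `min_len` cutoff is a convenience of this sketch. It only shows how a consistent set of gap-free segments can be chained so that gaps are whatever is left over.

```python
def exact_segments(s, t, min_len=3):
    """Maximal gap-free matching segments (i, j, length) with s[i:i+length] == t[j:j+length]."""
    segments = []
    for i in range(len(s)):
        for j in range(len(t)):
            if s[i] != t[j]:
                continue
            # Skip positions that merely continue a run on the same diagonal
            if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
                continue
            k = 0
            while i + k < len(s) and j + k < len(t) and s[i + k] == t[j + k]:
                k += 1
            if k >= min_len:
                segments.append((i, j, k))
    return segments


def best_chain(segments):
    """Maximum-weight chain of segments that keep the same order in both sequences."""
    if not segments:
        return []
    # Sorting by end position guarantees every valid predecessor comes first
    segs = sorted(segments, key=lambda x: (x[0] + x[2], x[1] + x[2]))
    best = [seg[2] for seg in segs]   # best total weight of a chain ending here
    prev = [None] * len(segs)         # back-pointers for traceback
    for b, (i2, j2, k2) in enumerate(segs):
        for a in range(b):
            i1, j1, k1 = segs[a]
            consistent = i1 + k1 <= i2 and j1 + k1 <= j2
            if consistent and best[a] + k2 > best[b]:
                best[b] = best[a] + k2
                prev[b] = a
    end = max(range(len(segs)), key=best.__getitem__)
    chain = []
    while end is not None:
        chain.append(segs[end])
        end = prev[end]
    return chain[::-1]


# Toy sequences (illustrative): the second lacks the "TTT" stretch
s = "GATTACATTTCCGGA"
t = "GATTACACCGGA"
print(best_chain(exact_segments(s, t, min_len=4)))  # [(0, 0, 7), (10, 7, 5)]
```

In the toy example, positions 7 to 9 of `s` ("TTT") belong to no segment, so they remain unaligned without any gap penalty having been chosen.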
Abstract:
Dissecting aortic aneurysm is the hallmark of Marfan syndrome (MFS) and the result of mutations in fibrillin-1, the major constituent of elastin-associated extracellular microfibrils. It is yet to be established whether dysfunction of fibrillin-1 perturbs the ability of the elastic vessel wall to sustain hemodynamic stress by disrupting microfibrillar assembly, by impairing the homeostasis of established elastic fibers, or by a combination of both mechanisms. The pathogenic sequence responsible for the mechanical collapse of the elastic lamellae in the aortic wall is also unknown. Targeted mutation of the mouse fibrillin-1 gene has recently suggested that deficiency of fibrillin-1 reduces tissue homeostasis rather than elastic fiber formation. Here we describe another gene-targeting mutation, mgR, which shows that underexpression of fibrillin-1 similarly leads to MFS-like manifestations. Histopathological analysis of mgR/mgR specimens implicates medial calcification, the inflammatory–fibroproliferative response, and inflammation-mediated elastolysis in the natural history of dissecting aneurysm. More generally, the phenotypic severity associated with various combinations of normal and mutant fibrillin-1 alleles suggests a threshold phenomenon for the functional collapse of the vessel wall that is based on the level and the integrity of microfibrils.
Abstract:
(E)-α-Bisabolene synthase is one of two wound-inducible sesquiterpene synthases of grand fir (Abies grandis), and the olefin product of this cyclization reaction is considered to be the precursor in Abies species of todomatuic acid, juvabione, and related insect juvenile hormone mimics. A cDNA encoding (E)-α-bisabolene synthase was isolated from a wound-induced grand fir stem library by a PCR-based strategy and was functionally expressed in Escherichia coli and shown to produce (E)-α-bisabolene as the sole product from farnesyl diphosphate. The expressed synthase has a deduced size of 93.8 kDa and a pI of 5.03, exhibits other properties typical of sesquiterpene synthases, and resembles in sequence other terpenoid synthases with the exception of a large amino-terminal insertion corresponding to Pro81–Val296. Biosynthetically prepared (E)-α-[3H]bisabolene was converted to todomatuic acid in induced grand fir cells, and the time course of appearance of bisabolene synthase mRNA was shown by Northern hybridization to lag behind that of mRNAs responsible for production of induced oleoresin monoterpenes. These results suggest that induced (E)-α-bisabolene biosynthesis constitutes part of a defense response targeted to insect herbivores, and possibly fungal pathogens, that is distinct from induced oleoresin monoterpene production.
Abstract:
A technique for systematic peptide variation by a combination of rational and evolutionary approaches is presented. The design scheme consists of five consecutive steps: (i) identification of a “seed peptide” with a desired activity, (ii) generation of variants selected from a physicochemical space around the seed peptide, (iii) synthesis and testing of this biased library, (iv) modeling of a quantitative sequence-activity relationship by an artificial neural network, and (v) de novo design by a computer-based evolutionary search in sequence space using the trained neural network as the fitness function. This strategy was successfully applied to the identification of novel peptides that fully prevent the positive chronotropic effect of anti-β1-adrenoreceptor autoantibodies from the serum of patients with dilated cardiomyopathy. The seed peptide, comprising 10 residues, was derived by epitope mapping from an extracellular loop of human β1-adrenoreceptor. A set of 90 peptides was synthesized and tested to provide training data for neural network development. De novo design revealed peptides with desired activities that do not match the seed peptide sequence. These results demonstrate that computer-based evolutionary searches can generate novel peptides with substantial biological activity.
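Step (v) of this scheme, an evolutionary search in sequence space guided by a fitness function, can be sketched as follows. This is a generic elitist mutate-and-select loop, not the authors' procedure: `toy_fitness` is an arbitrary placeholder for the trained neural network, and the seed peptide, mutation operator, and parameter values are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(peptide):
    """Placeholder for the trained network: fraction of charged residues (arbitrary)."""
    return sum(aa in "DEKR" for aa in peptide) / len(peptide)


def mutate(peptide, rng):
    """Replace one randomly chosen position with a random amino acid."""
    pos = rng.randrange(len(peptide))
    return peptide[:pos] + rng.choice(AMINO_ACIDS) + peptide[pos + 1:]


def evolve(seed_peptide, fitness, generations=50, offspring=20, rng_seed=0):
    """Elitist search: keep the parent unless the best offspring is at least as fit."""
    rng = random.Random(rng_seed)
    parent = seed_peptide
    for _ in range(generations):
        children = [mutate(parent, rng) for _ in range(offspring)]
        best_child = max(children, key=fitness)
        if fitness(best_child) >= fitness(parent):
            parent = best_child
    return parent


# Hypothetical 10-residue seed peptide (not the one from the study)
seed = "GAVLIMFWPS"
designed = evolve(seed, toy_fitness)
print(designed, toy_fitness(designed))
```

Because the parent is replaced only by an offspring that scores at least as well, the fitness never decreases across generations. The quality of the designs depends entirely on how well the fitness model reflects real activity.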
Abstract:
Elucidating the genetic basis of human phenotypes is a major goal of contemporary geneticists. Logically, two fundamental and contrasting approaches are available, one that begins with a phenotype and concludes with the identification of a responsible gene or genes; the other that begins with a gene and works toward identifying one or more phenotypes resulting from allelic variation of it. This paper provides a conceptual overview of phenotype-based vs. gene-based procedures with emphasis on gene-based methods. A key feature of a gene-based approach is that laboratory effort first is devoted to developing an assay for mutations in the gene under regard; the assay then is applied to the evaluation of large numbers of unrelated individuals with a variety of phenotypes that are deemed potentially resulting from alleles at the gene. No effort is directed toward chromosomally mapping the loci responsible for the phenotypes scanned. Example is made of my laboratory’s successful use of a gene-based approach to identify genes causing hereditary diseases of the retina such as retinitis pigmentosa. Reductions in the cost and improvements in the speed of scanning individuals for DNA sequence anomalies may make a gene-based approach an efficient alternative to phenotype-based approaches to correlating genes with phenotypes.
Abstract:
Genes that are characteristic of only certain strains of a bacterial species can be of great biologic interest. Here we describe a PCR-based subtractive hybridization method for efficiently detecting such DNAs and apply it to the gastric pathogen Helicobacter pylori. Eighteen DNAs specific to a monkey-colonizing strain (J166) were obtained by subtractive hybridization against an unrelated strain whose genome has been fully sequenced (26695). Seven J166-specific clones had no DNA sequence match to the 26695 genome, and 11 other clones were mixed, with adjacent patches that did and did not match any sequences in 26695. At the protein level, seven clones had homology to putative DNA restriction-modification enzymes, and two had homology to putative metabolic enzymes. Nine others had no database match with proteins of assigned function. PCR tests of 13 unrelated H. pylori strains by using primers specific for 12 subtracted clones and complementary Southern blot hybridizations indicated that these DNAs are highly polymorphic in the H. pylori population, with each strain yielding a different pattern of gene-specific PCR amplification. The search for polymorphic DNAs, as described here, should help identify previously unknown virulence genes in pathogens and provide new insights into microbial genetic diversity and evolution.
Abstract:
A human and a mouse gene have been isolated based on homology to a recombinational repair gene from the corn smut Ustilago maydis. The new human (h) gene, termed hREC2, bears striking resemblance to several others, including hRAD51 and hLIM15. hREC2 is located on human chromosome 14 at q23–24. The overall amino acid sequence reveals characteristic elements of a RECA-like gene yet harbors an src-like phosphorylation site curiously absent from hRAD51 and hLIM15. Unlike these two relatives, hREC2 is expressed in a wide range of tissues including lung, liver, placenta, pancreas, leukocytes, colon, small intestine, brain, and heart, as well as thymus, prostate, spleen, and uterus. Of greatest interest is that hREC2 is undetectable by reverse transcription-coupled PCR in tissue culture unless the cells are treated by ionizing radiation.