930 results for Dna-sequences
Abstract:
We have successfully linked protein library screening directly with the identification of active proteins, without the need for individual purification, display technologies or physical linkage between the protein and its encoding sequence. By using 'MAX' randomization we have rapidly constructed 60 overlapping gene libraries that encode zinc finger proteins, randomized variously at the three principal DNA-contacting residues. Expression and screening of the libraries against five possible target DNA sequences generated data points covering a potential 40,000 individual interactions. Comparative analysis of the resulting data enabled direct identification of active proteins. Accuracy of this library analysis methodology was confirmed by both in vitro and in vivo analyses of identified proteins to yield novel zinc finger proteins that bind to their target sequences with high affinity, as indicated by low nanomolar apparent dissociation constants.
Abstract:
Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and model the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition by induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
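As an illustration of the parallel rewriting that the abstract above refers to, the following is a minimal sketch of a context-free stochastic L-grammar over the DNA alphabet. The rule set, the probabilities, and the names RULES, rewrite and derive are hypothetical and chosen only for illustration; they are not the rules induced in the paper, and no genetic programming or SVM fitness step is shown.

```python
import random

# Hypothetical stochastic context-free L-grammar over the DNA alphabet.
# Each symbol maps to (replacement, probability) pairs. These rules are
# invented for illustration and are not the rules derived in the paper.
RULES = {
    "A": [("AT", 0.5), ("A", 0.5)],
    "T": [("TA", 0.3), ("T", 0.7)],
    "C": [("CG", 0.4), ("C", 0.6)],
    "G": [("G", 1.0)],
}


def rewrite(sequence, rules, rng):
    """One parallel derivation step: every symbol is rewritten at once."""
    out = []
    for symbol in sequence:
        replacements, weights = zip(*rules[symbol])
        out.append(rng.choices(replacements, weights=weights, k=1)[0])
    return "".join(out)


def derive(axiom, rules, steps, seed=0):
    """Apply `steps` parallel rewriting steps starting from `axiom`."""
    rng = random.Random(seed)
    sequence = axiom
    for _ in range(steps):
        sequence = rewrite(sequence, rules, rng)
    return sequence


if __name__ == "__main__":
    print(derive("ACGT", RULES, steps=3))
```

Unlike a sequential grammar, every symbol of the current string is replaced in the same step, which is the property that makes L-grammars suited to modelling growth.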
Abstract:
An algorithm for detecting fractal-like structures in DNA sequences has been developed and implemented. Fractality is interpreted as self-similarity based on the property of symmetry or complementary symmetry. Local fractals are of interest for their ability to accumulate multiple palindrome-hairpin structures with potential regulatory functions. Real cases of fractality have been identified in various genomes, from viruses to humans. The possibility of using fractal-like structures as markers that discriminate between closely related classes of sequences is considered.
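The complementary symmetry mentioned above can be illustrated with a short sketch that searches for windows equal to their own reverse complement. This is only an assumed reading of "complementary symmetry"; it is not the authors' fractal-detection algorithm, and the function names are hypothetical.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_palindrome(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


def find_complementary_palindromes(seq, min_len=4, max_len=12):
    """Return (start, window) pairs for windows with complementary symmetry.

    Only even lengths are scanned (min_len should be even), because no base
    is its own complement and odd-length windows can never match.
    """
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if is_complementary_palindrome(window):
                hits.append((start, window))
    return hits


if __name__ == "__main__":
    for start, window in find_complementary_palindromes("TTGAATTCAA"):
        print(start, window)
```

In the toy sequence used here the hits are nested inside one another, which gives a simple picture of how a symmetric region can contain smaller symmetric regions.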
Abstract:
In the first part of this study, human immunodeficiency virus type 1 (HIV-1) proviral DNA sequences derived from 201 clones of the C2-V3 env region and the first exon of the tat gene were obtained from six HIV-1-infected heterosexual couples. These molecular data were used to confirm the epidemiological relationships. The ability of the molecular data to draw such conclusions was also tested with multiple phylogenetic analyses. The tat region was much more useful in establishing epidemiological relationships than the commonly used C2-V3 region.
Subsequently, using nucleotide sequences from the first exon of the tat gene, we tested the hypothesis that a Florida dentist (a common source) infected five of his patients in the course of dental procedures, against the null hypothesis that the dentist and each individual of the dental group independently acquired the virus within the local community. Multiple phylogenetic analyses demonstrated that the sequences of the five patients were significantly more related to each other than to sequences of the controls. Our results using tat sequences, combined with envelope sequence data, strongly support a common phylogenetic epidemiological relationship among these five patients.
A third study is presented, which deals with the effects of genomic variations in drug resistance. HIV-1 reverse transcriptase (RT) mutations were detected in DNA from peripheral blood mononuclear cells from 11 of 12 HIV-infected children after 11-20 months of zidovudine monotherapy. The codon 41/215 mutant combination was associated with a general decline in health status. Patients developing the codon 70 mutation tended to have a better health status.
Abstract:
Phylogenetic analyses were performed on six genera and 46 species of the Neotropical palm tribe Geonomeae. The analyses were based on two low-copy nuclear DNA sequences from the genes encoding phosphoribulokinase and RNA polymerase II. The basal node of the tribe was polytomous. Pholidostachys formed a monophyletic group. The currently accepted genera Calyptronoma and Calyptrogyne formed a well-supported clade, with Calyptronoma resolved as paraphyletic to Calyptrogyne. Geonoma formed a strongly supported monophyletic group consisting of two main clades.
An evaluation of the genetic distinctness between Geonoma macrostachys varieties at a local and regional scale using inter-simple sequence repeat (ISSR) markers was performed. Clustering, ordination, and AMOVA suggested a lack of genetic distinctness between varieties at the regional level. A hierarchical AMOVA revealed that the genetic diversity mainly lies among the four localities sampled. A significant genetic differentiation between sympatric varieties occurred in one locality only. The current taxonomy of G. macrostachys, which recognizes only one species, was therefore supported.
The preferred habitat of sympatric G. macrostachys varieties with respect to edaphic, topographic, and light factors in three Peruvian lowland forests was studied. The two varieties were mostly encountered in different physiographically defined habitats, with variety acaulis occurring more often in floodplain forest and variety macrostachys in the tierra firme. Comparison of means tests revealed that nine to eleven of the 16 environmental variables were significantly different between varieties. Edaphic factors, mainly soil texture and K content, were better contributors than light conditions to distinguish the habitats occupied by the two varieties in all three study sites. It is concluded that habitat differentiation plays a role in the coexistence of these closely related taxa.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source.
Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.
The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples.
The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.
The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.
The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
The community structure of sediment bacteria in the Everglades freshwater marsh, fringing mangrove forest, and Florida Bay seagrass meadows was described based on polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) patterns of 16S rRNA gene fragments and by sequencing analysis of DGGE bands. The DGGE patterns were correlated with the environmental variables by means of canonical correspondence analysis. There was no significant trend in the Shannon–Wiener index among the sediment samples along the salinity gradient. However, cluster analysis based on DGGE patterns revealed that the bacterial community structure differed according to sites. Not only were these salinity/vegetation regions distinct, but the sediment bacterial communities were consistently different along the gradient from freshwater marsh, mangrove forest, eastern-central Florida Bay, and western Florida Bay. Actinobacteria- and Bacteroidetes/Chlorobi-like DNA sequences were amplified throughout all sampling sites. More Chloroflexi and members of candidate division WS3 were found in freshwater marsh and mangrove forest sites than in seagrass sites. The appearance of candidate division OP8-like DNA sequences in mangrove sites distinguished these communities from those of freshwater marsh. The seagrass sites were characterized by reduced presence of bands belonging to Chloroflexi with increased presence of those bands related to Cyanobacteria, γ-Proteobacteria, Spirochetes, and Planctomycetes. This included the sulfate-reducing bacteria, which are prevalent in marine environments. Clearly, bacterial communities in the sediment were different along the gradient, which can be explained mainly by the differences in salinity and total phosphorus.
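The diversity index mentioned above is conventionally computed as H' = -sum(p_i * ln p_i), where p_i is the relative abundance of the i-th category. The sketch below assumes that each DGGE band is treated as one category and that its intensity stands in for abundance; the abstract does not state how the index was computed, and the function name is hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over non-zero abundances.

    `abundances` is a list of non-negative numbers, for example band
    intensities of one DGGE lane (an assumption made for this sketch).
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("at least one abundance must be positive")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    # Four equally intense bands give the maximum for four categories, ln(4).
    print(shannon_index([1, 1, 1, 1]))
    # A single dominant band lowers the index.
    print(shannon_index([97, 1, 1, 1]))
```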
Abstract:
The mammalian high mobility group protein AT-hook 2 (HMGA2) is a small transcription factor involved in cell development and oncogenesis. It contains three "AT-hook" DNA binding domains, which specifically recognize the minor groove of AT-rich DNA sequences. It also has an acidic C-terminal motif. Previous studies showed that HMGA2 mediates all its biological effects through interactions with AT-rich DNA sequences in the promoter regions. In this dissertation, I used a variety of biochemical and biophysical methods to examine the physical properties of HMGA2 and to further investigate HMGA2's interactions with AT-rich DNA sequences. The following are three avenues pursued in this study: (1) Taking advantage of the asymmetrical charge distribution of HMGA2, I have developed a rapid procedure to purify HMGA2 in the milligram range. Preparation of large amounts of HMGA2 makes biophysical studies possible; (2) Since HMGA2 binds to different AT-rich sequences in the promoter regions, I used a combination of isothermal titration calorimetry (ITC) and DNA UV melting experiments to characterize interactions of HMGA2 with poly(dA-dT)2 and poly(dA)poly(dT). My results demonstrated that (i) each HMGA2 molecule binds to 15 AT bp; (ii) HMGA2 binds to both AT DNAs with very high affinity. However, the binding reaction of HMGA2 to poly(dA-dT)2 is enthalpy-driven and the binding reaction of HMGA2 with poly(dA)poly(dT) is entropy-driven; (iii) the binding reactions depend strongly on salt concentration; (3) Previous studies showed that HMGA2 may have sequence specificity. In this study, I used a PCR-based SELEX procedure to examine the DNA binding specificity of HMGA2. Two consensus sequences for HMGA2 have been identified: 5'-ATATTCGCGAWWATT-3' and 5'-ATATTGCGCAWWATT-3', where W represents A or T. These consensus sequences have a unique feature: the first five base pairs are AT-rich, the middle four to five base pairs are GC-rich, and the last five to six base pairs are AT-rich.
All three segments are critical for high affinity binding. Replacing either one of the AT-rich sequences with a non-AT-rich sequence causes at least a 100-fold decrease in the binding affinity. Intriguingly, if the GC-segment is substituted by an AT-rich segment, the binding affinity of HMGA2 is reduced approximately 5-fold. Identification of the consensus sequences for HMGA2 represents an important step towards finding its binding sites within the genome.
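As a sketch of how the two consensus sequences reported above might be used to scan a sequence, the code below expands the IUPAC code W (A or T) into a regular expression and reports exact matches. Exact matching on one strand is an assumption made for brevity; the abstract does not describe a scanning procedure, and the names find_sites and consensus_to_regex are hypothetical.

```python
import re

# IUPAC codes needed for the two consensus sequences (W means A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}

# Consensus sequences quoted in the abstract, written 5' to 3'.
CONSENSUS = ["ATATTCGCGAWWATT", "ATATTGCGCAWWATT"]


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


PATTERNS = [consensus_to_regex(c) for c in CONSENSUS]


def find_sites(sequence):
    """Return sorted (start, match) pairs for exact consensus matches.

    Only the given strand is scanned, and matches of the same pattern
    do not overlap (a limitation of re.finditer).
    """
    hits = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sequence):
            hits.append((match.start(), match.group()))
    return sorted(hits)


if __name__ == "__main__":
    print(find_sites("GGATATTCGCGATAATTCC"))
```

A position weight matrix would tolerate mismatches better than exact matching, but it would need the underlying SELEX counts, which the abstract does not give.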
Abstract:
DNA-binding and RNA-binding proteins are usually considered ‘undruggable’, partly due to the lack of an efficient method to identify inhibitors from existing small molecule repositories. Here we report a rapid and sensitive high-throughput screening approach to identify compounds targeting protein–nucleic acid interactions based on protein–DNA or protein–RNA interaction enzyme-linked immunosorbent assays (PDI-ELISA or PRI-ELISA). We validated the PDI-ELISA method using the mammalian high-mobility-group protein AT-hook 2 (HMGA2) as the protein of interest and netropsin as the inhibitor of HMGA2–DNA interactions. With this method we successfully identified several inhibitors and an activator of HMGA2–DNA interactions from a collection of 29 DNA-binding compounds. Guided by this screening exercise, we showed that netropsin, the specific inhibitor of HMGA2–DNA interactions, strongly inhibited the differentiation of mouse pre-adipocyte 3T3-L1 cells into adipocytes, most likely by preventing the binding of HMGA2 to its target DNA sequences. This method should be broadly applicable to identifying compounds or proteins that modulate many DNA-binding or RNA-binding proteins.
Abstract:
Some patients with cancer never develop metastasis, and their host response might provide cues for innovative treatment strategies. We previously reported an association between autoantibodies against complement factor H (CFH) and early-stage lung cancer. CFH prevents complement-mediated cytotoxicity (CDC) by inhibiting formation of cell-lytic membrane attack complexes on self-surfaces. In an effort to translate these findings into a biologic therapy for cancer, we isolated and expressed DNA sequences encoding high-affinity human CFH antibodies directly from single, sorted B cells obtained from patients with the antibody. The co-crystal structure of a CFH antibody-target complex shows a conformational change in the target relative to the native structure. This recombinant CFH antibody causes complement activation and release of anaphylatoxins, promotes CDC of tumor cell lines, and inhibits tumor growth in vivo. The isolation of anti-tumor antibodies derived from single human B cells represents an alternative paradigm in antibody drug discovery.
Abstract:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the promoter DNA-bound transcription factors (TFs) to the downstream gene’s mean transcript level or transcript production rates over time. However, transcript production is dynamic in response to changes of TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at the same genomic location. Additionally, not only TFs, but also some other elements regulate transcription. Within the core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for the TSS, and RNAPII initiating transcription. Moreover, it is proposed that downstream from the TSS, nucleosomes resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework in the yeast S. cerevisiae for two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that can predict dynamic transcript production rates from DNA sequences alone: with the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r2 = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, including the best team, as well as a model combining the best team’s k-mer based sequence features with another paper’s biologically mechanistic features, in terms of all scoring metrics.
Moreover, our framework shows its capability of identifying generalizable features by interpreting the highly predictive models, and thereby provides support for associated hypothesized mechanisms about transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region are more than the TATA box and nucleosome-free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in the yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing +1 and -1 nucleosomes’ regulatory roles in transcription.
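The abstract above mentions k-mer based sequence features and reports a Pearson correlation coefficient alongside a coefficient of determination. The sketch below shows one plausible way to turn a promoter sequence into a k-mer count vector and to score predictions with a Pearson correlation. The feature layout, the value of k, and the function names are assumptions for illustration and are not taken from the work described. The reported numbers are at least consistent with r2 being the square of the correlation, since 0.751 squared is about 0.564.

```python
import math
from itertools import product


def kmer_counts(sequence, k=2):
    """Count every k-mer over ACGT, returned in lexicographic k-mer order.

    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    features = kmer_counts("TATAAAGC", k=2)
    print(len(features), sum(features))
    # Hypothetical observed and predicted rates, for illustration only.
    print(pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

A sparse linear model could then be fitted on such count vectors, for example with an L1 penalty, although the abstract does not say which regularisation was used.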
Abstract:
The chemical compounds synthesised and secreted from the dermal glands of amphibians have diverse bioactivities that play key roles in the hosts' innate immune system and in causing diverse pharmacological effects in predators that may ingest the defensive skin secretions. As new biotechnological methods have developed, increasing numbers of novel peptides with novel activities have been discovered from this source of natural compounds. In this study, a number of defensive skin secretion peptide sequences were obtained from the European edible frog, P. kl. esculentus, using a 'shotgun' cloning technique developed previously within our laboratory. Some of these sequences have been previously reported but had either been obtained from other species or been isolated using different methods. Two new skin peptides are described here for the first time. Esculentin-2c and Brevinin-2Tbe belong to the Esculentin-2 and Brevinin-2 families, respectively, and both are very similar to their respective analogues but with a few amino acid differences. Further, [Asn-3, Lys-6, Phe-13] 3-14-bombesin, isolated previously from the skin of the marsh frog, Rana ridibunda, was identified here in the skin of P. kl. esculentus. Studies such as this can provide a rapid elucidation of peptide and corresponding DNA sequences from unstudied species of frogs and can rapidly provide a basis for related scientific studies, such as those on systematics or the evolution of a large, diverse gene family, and for use by biomedical researchers as a source of potential novel drug leads or pharmacological agents.
Abstract:
Gold nanoparticles functionalized with thiolated oligonucleotides (Au-nanoprobes) have been used in a range of applications for the detection of bioanalytes of interest, from ions to proteins and DNA targets. These detection strategies are based on the unique optical properties of gold nanoparticles, in particular, the intense color that is subject to modulation by modification of the medium dielectric. Au-nanoprobes have been applied for the detection and characterization of specific DNA sequences of interest, namely pathogens and disease biomarkers. Nevertheless, despite its relevance, only a few reports exist on the detection of RNA targets. Among these strategies, the colorimetric detection of DNA has been proven to work for several different targets in controlled samples, but demonstration in real clinical bioanalysis has been elusive. Here, we used a colorimetric method based on Au-nanoprobes for the direct detection of the e14a2 BCR-ABL fusion transcript in myeloid leukemia patient samples without the need for retro-transcription. Au-nanoprobes directly assessed total RNA from 38 clinical samples, and results were validated against reverse transcription-nested polymerase chain reaction (RT-nested PCR) and reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The colorimetric Au-nanoprobe assay is a simple yet reliable strategy to scrutinize myeloid leukemia patients at diagnosis and evaluate progression, with obvious advantages in terms of time and cost, particularly in low- to medium-income countries where molecular screening is not routinely feasible.
Graphical abstract: Gold nanoprobe for colorimetric detection of BCR-ABL1 fusion transcripts originating from the Philadelphia chromosome.
Abstract:
Genes, which encode the biological functions of living beings, form the basic molecular unit of heredity. To explain the diversity of species observed today, it is essential to understand how genes evolve. To do so, one must reconstruct the past by inferring their phylogeny, that is, a gene tree representing the kinship relationships among the coding regions of living organisms. Classical phylogenetic inference methods were developed mainly to build species trees and rely only on DNA sequences. Genes, however, are rich in information, and reconstruction methods that exploit their specific properties are only beginning to appear. In particular, the history of a gene family in terms of duplications and losses, obtained by reconciling a gene tree with a species tree, can help detect weaknesses within a tree and improve it. In this thesis, reconciliation is applied to the construction and correction of gene trees from three different angles: 1) We address the problem of resolving a non-binary gene tree. In particular, we present a linear-time algorithm that resolves a polytomy based on reconciliation. 2) We propose a new approach to gene tree correction using orthology and paralogy relations. Polynomial-time algorithms are presented for the following problems: correcting a gene tree so that it contains a given set of orthologs, and validating a set of partial orthology and paralogy relations. 3) We show how reconciliation can be used to "combine" several gene trees. More precisely, we study the problem of choosing a gene supertree according to its reconciliation cost.
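To make the notion of reconciliation above more concrete, here is a minimal sketch of the standard lowest-common-ancestor (LCA) mapping, which maps each gene-tree node to a species-tree clade and counts a duplication whenever a node maps to the same clade as one of its children. Trees are written as nested tuples with string leaves. This is a textbook illustration under those assumptions, not one of the thesis's algorithms: it does not count losses, does not resolve polytomies, and is not linear-time. All names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Return every clade of the tree as a frozenset of leaf names."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(all_clades(child))
    return found


def lca(species_clades, species):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_clades, gene_to_species):
    """Return (clade this node maps to, duplications in its subtree)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]]), 0
    child_maps, duplications = [], 0
    for child in gene_tree:
        mapped, count = count_duplications(child, species_clades, gene_to_species)
        child_maps.append(mapped)
        duplications += count
    here = lca(species_clades, frozenset().union(*child_maps))
    # A node mapping to the same clade as a child is inferred as a duplication.
    if here in child_maps:
        duplications += 1
    return here, duplications


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
    genes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
    print(count_duplications(gene_tree, all_clades(species_tree), genes)[1])
```

In the toy example, the two (A, B) gene copies are joined by a node that maps to the same species clade as both of its children, so one duplication is inferred before the split of A and B.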