6 results for DNA Sequences
in Digital Commons at Florida International University
Abstract:
In the first part of this study, human immunodeficiency virus type 1 (HIV-1) proviral DNA sequences derived from 201 clones of the C2-V3 env region and the first exon of the tat gene were obtained from six HIV-1-infected heterosexual couples. These molecular data were used to confirm the epidemiological relationships. The ability of the molecular data to support such conclusions was also tested with multiple phylogenetic analyses. The tat region was much more useful in establishing epidemiological relationships than the commonly used C2-V3 region.
Subsequently, using nucleotide sequences from the first exon of the tat gene, we tested the hypothesis that a Florida dentist (a common source) infected five of his patients in the course of dental procedures, against the null hypothesis that the dentist and each individual of the dental group independently acquired the virus within the local community. Multiple phylogenetic analyses demonstrated that the sequences of the five patients were significantly more closely related to each other than to sequences of the controls. Our results using tat sequences, combined with envelope sequence data, strongly support a common epidemiological relationship among these five patients.
A third study deals with the effects of genomic variation on drug resistance. HIV-1 reverse transcriptase (RT) mutations were detected in DNA from peripheral blood mononuclear cells from 11 of 12 HIV-infected children after 11-20 months of zidovudine monotherapy. The codon 41/215 mutant combination was associated with a general decline in health status. Patients developing the codon 70 mutation tended to have a better health status.
Abstract:
Phylogenetic analyses were performed on six genera and 46 species of the Neotropical palm tribe Geonomeae. The analyses were based on two low-copy nuclear DNA sequences from the genes encoding phosphoribulokinase and RNA polymerase II. The basal node of the tribe was polytomous. Pholidostachys formed a monophyletic group. The currently accepted genera Calyptronoma and Calyptrogyne formed a well-supported clade, with Calyptronoma resolved as paraphyletic to Calyptrogyne. Geonoma formed a strongly supported monophyletic group consisting of two main clades.
The genetic distinctness of Geonoma macrostachys varieties was evaluated at local and regional scales using inter-simple sequence repeat (ISSR) markers. Clustering, ordination, and AMOVA suggested a lack of genetic distinctness between varieties at the regional level. A hierarchical AMOVA revealed that the genetic diversity mainly lies among the four localities sampled. Significant genetic differentiation between sympatric varieties occurred in one locality only. The current taxonomy of G. macrostachys, which recognizes only one species, was therefore supported.
The preferred habitat of sympatric G. macrostachys varieties with respect to edaphic, topographic, and light factors was studied in three Peruvian lowland forests. The two varieties were mostly encountered in different physiographically defined habitats, with variety acaulis occurring more often in floodplain forest and variety macrostachys in tierra firme. Comparison-of-means tests revealed that nine to eleven of the 16 environmental variables differed significantly between varieties. Edaphic factors, mainly soil texture and K content, distinguished the habitats occupied by the two varieties better than light conditions did in all three study sites. It is concluded that habitat differentiation plays a role in the coexistence of these closely related taxa.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source.
Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.
The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.
The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.
The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, which focuses on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.
The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
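The idea of steering a K-means-style clustering with constraints can be illustrated with a small sketch. The code below is not the GCC algorithm, whose details the abstract does not give; it is a generic COP-KMeans-style stand-in that assumes the constraints take the form of must-link and cannot-link pairs of point indices. The function names, the naive initialization from the first `k` points, and the toy data are all assumptions made for illustration.

```python
from math import dist


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if putting point i into `cluster` breaks a constraint
    involving a point that has already been assigned in this pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points, k, must_link=(), cannot_link=(), n_iter=20):
    """COP-KMeans-style sketch: each point goes to the nearest centroid
    that does not violate a pairwise constraint."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic init (assumption)
    labels = [None] * len(points)
    for _ in range(n_iter):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try centroids from nearest to farthest.
            order = sorted(range(k), key=lambda c: dist(p, centroids[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
free_labels, _ = constrained_kmeans(points, k=2)
cl_labels, _ = constrained_kmeans(points, k=2, cannot_link=[(0, 1)])
ml_labels, _ = constrained_kmeans(points, k=2, must_link=[(0, 2)])
```

Without constraints the toy points split by proximity; adding a cannot-link or must-link pair forces the assignment to respect that pair even when distance alone would suggest otherwise.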
Abstract:
The community structure of sediment bacteria in the Everglades freshwater marsh, fringing mangrove forest, and Florida Bay seagrass meadows was described based on polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) patterns of 16S rRNA gene fragments and by sequencing analysis of DGGE bands. The DGGE patterns were correlated with the environmental variables by means of canonical correspondence analysis. There was no significant trend in the Shannon–Wiener index among the sediment samples along the salinity gradient. However, cluster analysis based on DGGE patterns revealed that the bacterial community structure differed according to site. Not only were these salinity/vegetation regions distinct, but the sediment bacterial communities also differed consistently along the gradient from freshwater marsh through mangrove forest and eastern-central Florida Bay to western Florida Bay. Actinobacteria- and Bacteroidetes/Chlorobi-like DNA sequences were amplified at all sampling sites. More Chloroflexi and members of candidate division WS3 were found in freshwater marsh and mangrove forest sites than in seagrass sites. The appearance of candidate division OP8-like DNA sequences in mangrove sites distinguished these communities from those of the freshwater marsh. The seagrass sites were characterized by a reduced presence of bands belonging to Chloroflexi and an increased presence of bands related to Cyanobacteria, γ-Proteobacteria, Spirochetes, and Planctomycetes. These included the sulfate-reducing bacteria, which are prevalent in marine environments. Clearly, bacterial communities in the sediment differed along the gradient, which can be explained mainly by the differences in salinity and total phosphorus.
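For readers unfamiliar with the diversity measure mentioned above, the Shannon–Wiener index is H' = -Σ p_i ln p_i, where p_i is the relative abundance of the i-th taxon. The sketch below assumes, as is common in DGGE work, that relative band intensities within a lane stand in for abundances; the abstract does not say how the index was actually computed, and the function name and sample values are illustrative only.

```python
import math
from collections.abc import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i) over nonzero p_i."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # zero-abundance entries contribute nothing
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical band intensities for one DGGE lane (made-up numbers).
even_lane = [1.0, 1.0, 1.0, 1.0]
single_band_lane = [5.0]
h_even = shannon_index(even_lane)
h_single = shannon_index(single_band_lane)
```

A lane with four equally intense bands gives H' = ln 4, the maximum for four bands, while a lane with a single band gives H' = 0.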
Abstract:
The mammalian high mobility group protein AT-hook 2 (HMGA2) is a small transcription factor involved in cell development and oncogenesis. It contains three "AT-hook" DNA binding domains, which specifically recognize the minor groove of AT-rich DNA sequences. It also has an acidic C-terminal motif. Previous studies showed that HMGA2 mediates all its biological effects through interactions with AT-rich DNA sequences in the promoter regions. In this dissertation, I used a variety of biochemical and biophysical methods to examine the physical properties of HMGA2 and to further investigate HMGA2's interactions with AT-rich DNA sequences. Three avenues were pursued in this study: (1) Taking advantage of the asymmetrical charge distribution of HMGA2, I developed a rapid procedure to purify HMGA2 in the milligram range. Preparation of large amounts of HMGA2 makes biophysical studies possible. (2) Since HMGA2 binds to different AT-rich sequences in the promoter regions, I used a combination of isothermal titration calorimetry (ITC) and DNA UV melting experiments to characterize the interactions of HMGA2 with poly(dA-dT)₂ and poly(dA)poly(dT). My results demonstrated that (i) each HMGA2 molecule binds to 15 AT bp; (ii) HMGA2 binds to both AT DNAs with very high affinity, although the binding reaction of HMGA2 to poly(dA-dT)₂ is enthalpy-driven whereas the binding reaction of HMGA2 to poly(dA)poly(dT) is entropy-driven; and (iii) the binding reactions are strongly dependent on salt concentration. (3) Previous studies showed that HMGA2 may have sequence specificity. In this study, I used a PCR-based SELEX procedure to examine the DNA binding specificity of HMGA2. Two consensus sequences for HMGA2 were identified: 5'-ATATTCGCGAWWATT-3' and 5'-ATATTGCGCAWWATT-3', where W represents A or T. These consensus sequences have a unique feature: the first five base pairs are AT-rich, the middle four to five base pairs are GC-rich, and the last five to six base pairs are AT-rich. All three segments are critical for high-affinity binding. Replacing either one of the AT-rich sequences with a non-AT-rich sequence causes at least a 100-fold decrease in binding affinity. Intriguingly, if the GC segment is substituted by an AT-rich segment, the binding affinity of HMGA2 is reduced approximately 5-fold. Identification of the consensus sequences for HMGA2 represents an important step towards finding its binding sites within the genome.
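As a rough illustration of how the two consensus sequences could be used to look for candidate sites, the sketch below expands the IUPAC code W (A or T) into a regular expression and scans a DNA string for exact matches on the given strand. This is a simplification and not a method described in the abstract: a realistic search would also scan the reverse complement and would tolerate mismatches, for example with a position weight matrix. The function names and the test sequences are hypothetical.

```python
import re

# Subset of the IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# The two consensus sequences reported in the abstract (5' to 3').
CONSENSUS = ("ATATTCGCGAWWATT", "ATATTGCGCAWWATT")


def consensus_to_pattern(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus)


def find_sites(sequence: str, consensus_list=CONSENSUS):
    """Return sorted (start, matched_text) pairs for exact consensus matches
    on the given strand; a lookahead lets overlapping matches be reported."""
    seq = sequence.upper()
    hits = []
    for consensus in consensus_list:
        pattern = re.compile(f"(?=({consensus_to_pattern(consensus)}))")
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.group(1)))
    return sorted(hits)


hits_first = find_sites("GGGATATTCGCGAATATTCCC")
hits_second = find_sites("atattgcgcattatt")
hits_none = find_sites("ATATTCGCGACGATT")  # WW positions hold CG, so no match
```

Each hit reports the 0-based start position and the matched 15-mer, which makes it easy to map candidate sites back onto a longer promoter sequence.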
Abstract:
DNA-binding and RNA-binding proteins are usually considered ‘undruggable’, partly due to the lack of an efficient method to identify inhibitors from existing small-molecule repositories. Here we report a rapid and sensitive high-throughput screening approach to identify compounds targeting protein–nucleic acid interactions based on protein–DNA or protein–RNA interaction enzyme-linked immunosorbent assays (PDI-ELISA or PRI-ELISA). We validated the PDI-ELISA method using the mammalian high-mobility-group protein AT-hook 2 (HMGA2) as the protein of interest and netropsin as the inhibitor of HMGA2–DNA interactions. With this method we successfully identified several inhibitors and an activator of HMGA2–DNA interactions from a collection of 29 DNA-binding compounds. Guided by this screening exercise, we showed that netropsin, the specific inhibitor of HMGA2–DNA interactions, strongly inhibited the differentiation of mouse pre-adipocyte 3T3-L1 cells into adipocytes, most likely by preventing the binding of HMGA2 to its target DNA sequences. This method should be broadly applicable to identifying compounds or proteins that modulate many DNA-binding or RNA-binding proteins.