56 results for Sequence Analysis
at Duke University
Abstract:
BACKGROUND: Since mature erythrocytes are terminally differentiated cells without nuclei and organelles, it is commonly thought that they do not contain nucleic acids. In this study, we have re-examined this issue by analyzing the transcriptome of a purified population of human mature erythrocytes from individuals with normal hemoglobin (HbAA) and homozygous sickle cell disease (HbSS). METHODS AND FINDINGS: Using a combination of microarray analysis, real-time RT-PCR and Northern blots, we found that mature erythrocytes, while lacking ribosomal and large-sized RNAs, contain abundant and diverse microRNAs. MicroRNA expression of erythrocytes was different from that of reticulocytes and leukocytes, and contributed the majority of the microRNA expression in whole blood. When we used microRNA microarrays to analyze erythrocytes from HbAA and HbSS individuals, we noted a dramatic difference in their microRNA expression pattern. We found that miR-320 played an important role in the down-regulation of its target gene, CD71, during reticulocyte terminal differentiation. Further investigation revealed that poor expression of miR-320 in HbSS cells was associated with their defective downregulation of CD71 during terminal differentiation. CONCLUSIONS: In summary, we have discovered significant microRNA expression in human mature erythrocytes, which is dramatically altered in HbSS erythrocytes and is associated with their defect in terminal differentiation. Thus, the global analysis of microRNA expression in circulating erythrocytes can provide mechanistic insights into the disease phenotypes of erythrocyte diseases.
Abstract:
New applications of genetic data to questions of historical biogeography have revolutionized our understanding of how organisms have come to occupy their present distributions. Phylogenetic methods in combination with divergence time estimation can reveal biogeographical centres of origin, differentiate between hypotheses of vicariance and dispersal, and reveal the directionality of dispersal events. Despite their power, however, phylogenetic methods can sometimes yield patterns that are compatible with multiple, equally well-supported biogeographical hypotheses. In such cases, additional approaches must be integrated to differentiate among conflicting dispersal hypotheses. Here, we use a synthetic approach that draws upon the analytical strengths of coalescent and population genetic methods to augment phylogenetic analyses in order to assess the biogeographical history of Madagascar's Triaenops bats (Chiroptera: Hipposideridae). Phylogenetic analyses of mitochondrial DNA sequence data for Malagasy and east African Triaenops reveal a pattern that equally supports two competing hypotheses. While the phylogeny cannot determine whether Africa or Madagascar was the centre of origin for the species investigated, it serves as the essential backbone for the application of coalescent and population genetic methods. From the application of these methods, we conclude that a hypothesis of two independent but unidirectional dispersal events from Africa to Madagascar is best supported by the data.
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Abstract:
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
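The sequence cleavage bias mentioned above can be illustrated with a small sketch. It assumes one simple definition of bias: the frequency of each k-mer centred on observed cut sites in deproteinized DNA, divided by that k-mer's background frequency in the same sequence. The function names, the default k-mer length, and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cleavage_bias(genome, cut_sites, k=6):
    """Observed/background frequency ratio for k-mers around cut sites.

    genome    -- reference sequence as a string (assumed single strand)
    cut_sites -- 0-based cut positions from a naked-DNA DNase-seq run
    k         -- k-mer length (6 is an illustrative choice)
    """
    half = k // 2
    observed = Counter()
    for pos in cut_sites:
        start = pos - half
        # Skip cuts too close to either end to yield a full k-mer.
        if start < 0 or start + k > len(genome):
            continue
        observed[genome[start:start + k]] += 1

    background = kmer_counts(genome, k)
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    return {
        kmer: (observed[kmer] / n_obs) / (background[kmer] / n_bg)
        for kmer in observed
    }


if __name__ == "__main__":
    toy_genome = "ACGTACGTAC"
    print(cleavage_bias(toy_genome, cut_sites=[1, 5], k=2))
```

A ratio above 1 means the k-mer is cut more often than its abundance alone would predict; a footprint model could divide observed cut counts by such ratios before scoring, though the abstract does not specify how the correction was applied.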
Abstract:
BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) is a globally prevalent cause of diarrhea. Though usually self-limited, it can be severe and debilitating. Little is known about the host transcriptional response to infection. We report the first gene expression analysis of the human host response to experimental challenge with ETEC. METHODS: We challenged 30 healthy adults with an unattenuated ETEC strain, and collected serial blood samples shortly after inoculation and daily for 8 days. We performed gene expression analysis on whole peripheral blood RNA samples from subjects in whom severe symptoms developed (n = 6) and a subset of those who remained asymptomatic (n = 6) despite shedding. RESULTS: Compared with baseline, symptomatic subjects demonstrated significantly different expression of 406 genes highlighting increased immune response and decreased protein synthesis. Compared with asymptomatic subjects, symptomatic subjects differentially expressed 254 genes primarily associated with immune response. This comparison also revealed 29 genes differentially expressed between groups at baseline, suggesting innate resilience to infection. Drug repositioning analysis identified several drug classes with potential utility in augmenting immune response or mitigating symptoms. CONCLUSIONS: There are statistically significant and biologically plausible differences in host gene expression induced by ETEC infection. Differential baseline expression of some genes may indicate resilience to infection.
Abstract:
Thermoplastic materials such as cyclic-olefin copolymers (COC) provide a versatile and cost-effective alternative to the traditional glass or silicon substrate for rapid prototyping and industrial scale fabrication of microdevices. To extend the utility of COC as an effective microarray substrate, we developed a new method that enabled for the first time in situ synthesis of DNA oligonucleotide microarrays on the COC substrate. To achieve high-quality DNA synthesis, a SiO(2) thin film array was prepatterned on the inert and hydrophobic COC surface using an RF sputtering technique. The subsequent in situ DNA synthesis was confined to the surface of the prepatterned hydrophilic SiO(2) thin film features by precision delivery of the phosphoramidite chemistry using an inkjet DNA synthesizer. The in situ SiO(2)-COC DNA microarray demonstrated superior quality and stability in hybridization assays and thermal cycling reactions. Furthermore, we demonstrate that pools of high-quality mixed-oligos could be cleaved off the SiO(2)-COC microarrays and used directly for construction of DNA origami nanostructures. It is believed that this method will not only enable synthesis of high-quality and low-cost COC DNA microarrays but also provide a basis for further development of integrated microfluidics microarrays for a broad range of bioanalytical and biofabrication applications.
Abstract:
The BUZ/Znf-UBP domain is a protein module found in the cytoplasmic deacetylase HDAC6, E3 ubiquitin ligase BRAP2/IMP, and a subfamily of ubiquitin-specific proteases. Although several BUZ domains have been shown to bind ubiquitin with high affinity by recognizing its C-terminal sequence (RLRGG-COOH), it is currently unknown whether the interaction is sequence-specific or whether the BUZ domains are capable of binding to proteins other than ubiquitin. In this work, the BUZ domains of HDAC6 and Ubp-M were subjected to screening against a one-bead-one-compound (OBOC) peptide library that exhibited random peptide sequences with free C-termini. Sequence analysis of the selected binding peptides as well as alanine scanning studies revealed that the BUZ domains require a C-terminal Gly-Gly motif for binding. At the more N-terminal positions, the two BUZ domains have distinct sequence specificities, allowing them to bind to different peptides and/or proteins. A database search of the human proteome on the basis of the BUZ domain specificities identified 11 and 24 potential partner proteins for Ubp-M and HDAC6 BUZ domains, respectively. Peptides corresponding to the C-terminal sequences of four of the predicted binding partners (FBXO11, histone H4, PTOV1, and FAT10) were synthesized and tested for binding to the BUZ domains by fluorescence polarization. All four peptides bound to the HDAC6 BUZ domain with low micromolar K(D) values and less tightly to the Ubp-M BUZ domain. Finally, in vitro pull-down assays showed that the Ubp-M BUZ domain was capable of binding to the histone H3-histone H4 tetramer protein complex. Our results suggest that BUZ domains are sequence-specific protein-binding modules, with each BUZ domain potentially binding to a different subset of proteins.
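The database search described above can be pictured as a scan of protein sequences for a C-terminal motif. The sketch below only checks for the C-terminal Gly-Gly requirement reported in the abstract; the function name, the regular expression, and the toy sequences are hypothetical, and the domain-specific preferences at the more N-terminal positions are not modelled.

```python
import re


def c_terminal_matches(proteins, pattern=r"GG$"):
    """Return the names of proteins whose sequence ends with the motif.

    proteins -- mapping of protein name to one-letter amino acid sequence
    pattern  -- regular expression anchored at the C-terminus
    """
    rx = re.compile(pattern)
    return sorted(name for name, seq in proteins.items() if rx.search(seq))


if __name__ == "__main__":
    # Toy sequences for illustration only; "ubiquitin_tail" ends with the
    # RLRGG C-terminal sequence quoted in the abstract.
    toy_proteome = {
        "ubiquitin_tail": "LVLRLRGG",
        "toy_a": "MKTAYGGA",  # Gly-Gly present but not at the C-terminus
        "toy_b": "MSEQRGG",
    }
    print(c_terminal_matches(toy_proteome))
```

A stricter pattern could be passed in to encode residues preferred upstream of the Gly-Gly motif, once those preferences are known for a given BUZ domain.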
Abstract:
Light is a universal signal perceived by organisms, including fungi, in which light regulates common and unique biological processes depending on the species. Previous research has established that conserved proteins, originally called White collar 1 and 2 from the ascomycete Neurospora crassa, regulate UV/blue light sensing. Homologous proteins function in distant relatives of N. crassa, including the basidiomycetes and zygomycetes, which diverged as long as a billion years ago. Here we conducted microarray experiments on the basidiomycete fungus Cryptococcus neoformans to identify light-regulated genes. Surprisingly, only a single gene was induced by light above the commonly used twofold threshold. This gene, HEM15, is predicted to encode a ferrochelatase that catalyses the final step in haem biosynthesis from highly photoreactive porphyrins. The C. neoformans gene complements a Saccharomyces cerevisiae hem15Delta strain and is essential for viability, and the Hem15 protein localizes to mitochondria, three lines of evidence that the gene encodes ferrochelatase. Regulation of HEM15 by light suggests a mechanism by which bwc1/bwc2 mutants are photosensitive and exhibit reduced virulence. We show that ferrochelatase is also light-regulated in a white collar-dependent fashion in N. crassa and the zygomycete Phycomyces blakesleeanus, indicating that ferrochelatase is an ancient target of photoregulation in the fungal kingdom.
Abstract:
The array of human immunodeficiency virus (HIV) subtypes encountered in East London, an area long associated with migration, is unusually heterogeneous, reflecting the diverse geographical origins of the population. In this study it was shown that viral subtypes or clades infecting a sample of HIV type 1 (HIV-1)-positive individuals in East London reflect the global pandemic. The authors studied the humoral response in 210 treatment-naïve chronically HIV-1-infected (>1 year) adult subjects against a panel of 12 viruses from six different clades. Plasmas from individuals infected with clade C, but also plasmas from clade A, and to a lesser degree clade CRF02_AG and CRF01_AE, were significantly more potent at neutralizing the tested viruses compared with plasmas from individuals infected with clade B. The difference in humoral robustness between clade C- and B-infected patients was confirmed in titration studies with an extended panel of clade B and C viruses. These results support the approach to develop an HIV-1 vaccine that includes clade C or A envelope protein (Env) immunogens for the induction of a potent neutralizing humoral response.
Abstract:
BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which revealed a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.
Abstract:
BACKGROUND: In a time-course microarray experiment, the expression level for each gene is observed across a number of time-points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data. RESULTS: In this paper, we propose a robust testing method for identifying genes whose expression time profiles depend on a factor. Furthermore, we propose a multiple testing procedure to adjust for multiplicity. CONCLUSIONS: Through an extensive simulation study, we illustrate the performance of our method. Finally, we report the results from applying our method to a case study and discuss potential extensions.
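The abstract does not say which multiple testing procedure the authors propose. As a generic stand-in, the sketch below shows the widely used Benjamini-Hochberg adjustment, which converts per-gene p-values into values that control the false discovery rate. It is an illustration of what "adjusting for multiplicity" involves, not the paper's procedure; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Stand-in for a multiplicity adjustment; not the procedure proposed
    in the abstract above.
    """
    m = len(pvalues)
    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    per_gene_p = [0.01, 0.04, 0.03, 0.005]  # toy p-values, one per gene
    print(benjamini_hochberg(per_gene_p))
```

Genes whose adjusted value falls below a chosen level, such as 0.05, would be reported as having factor-dependent trajectories.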
Abstract:
BACKGROUND: MicroRNAs (miRNAs) are small non-coding RNAs that post-transcriptionally regulate gene expression in a variety of organisms, including insects, vertebrates, and plants. miRNAs play important roles in cell development and differentiation as well as in the cellular response to stress and infection. To date, there are limited reports of miRNA identification in mosquitoes, insects that act as essential vectors for the transmission of many human pathogens, including flaviviruses. West Nile virus (WNV) and dengue virus, members of the Flaviviridae family, are primarily transmitted by Aedes and Culex mosquitoes. Using high-throughput deep sequencing, we examined the miRNA repertoire in Ae. albopictus cells and Cx. quinquefasciatus mosquitoes. RESULTS: We identified a total of 65 miRNAs in the Ae. albopictus C7/10 cell line and 77 miRNAs in Cx. quinquefasciatus mosquitoes, the majority of which are conserved in other insects such as Drosophila melanogaster and Anopheles gambiae. The most highly expressed miRNA in both mosquito species was miR-184, a miRNA conserved from insects to vertebrates. Several previously reported Anopheles miRNAs, including miR-1890 and miR-1891, were also found in Culex and Aedes, and appear to be restricted to mosquitoes. We identified seven novel miRNAs, arising from nine different precursors, in C7/10 cells and Cx. quinquefasciatus mosquitoes, two of which have predicted orthologs in An. gambiae. Several of these novel miRNAs reside within a ~350 nt long cluster present in both Aedes and Culex. miRNA expression was confirmed by primer extension analysis. To determine whether flavivirus infection affects miRNA expression, we infected female Culex mosquitoes with WNV. Two miRNAs, miR-92 and miR-989, showed significant changes in expression levels following WNV infection. CONCLUSIONS: Aedes and Culex mosquitoes are important flavivirus vectors. Recent advances in both mosquito genomics and high-throughput sequencing technologies enabled us to interrogate the miRNA profile in these two species. Here, we provide evidence for over 60 conserved and seven novel mosquito miRNAs, expanding upon our current understanding of insect miRNAs. Undoubtedly, some of the miRNAs identified will have roles not only in mosquito development, but also in mediating viral infection in the mosquito host.
Abstract:
BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
Abstract:
BACKGROUND: Mutations in the TP53 gene are extremely common and occur very early in the progression of serous ovarian cancers. Gene expression patterns that relate to mutational status may provide insight into the etiology and biology of the disease. METHODS: The TP53 coding region was sequenced in 89 frozen serous ovarian cancers, 40 early stage (I/II) and 49 advanced stage (III/IV). Affymetrix U133A expression data was used to define gene expression patterns by mutation, type of mutation, and cancer stage. RESULTS: Missense or chain terminating (null) mutations in TP53 were found in 59/89 (66%) ovarian cancers. Early stage cancers had a significantly higher rate of null mutations than late stage disease (38% vs. 8%, p < 0.03). In advanced stage cases, mutations were more prevalent in short term survivors than long term survivors (81% vs. 30%, p = 0.0004). Gene expression patterns had a robust ability to predict TP53 status within training data. By using early versus late stage disease for out of sample predictions, the signature derived from early stage cancers could accurately (86%) predict mutation status of late stage cancers. CONCLUSIONS: This represents the first attempt to define a genomic signature of TP53 mutation in ovarian cancer. Patterns of gene expression characteristic of TP53 mutation could be discerned and included several genes that are known p53 targets or have been described in the context of expression signatures of TP53 mutation in breast cancer.
Abstract:
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
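The comparison described above reduces to set arithmetic once each variant is represented by a key such as (chromosome, position, reference allele, alternate allele). The sketch below assumes the whole-genome calls are treated as the reference standard, which is a simplification; the function name and the toy variants are hypothetical.

```python
def compare_calls(wgs_variants, rnaseq_variants):
    """Compare RNA-Seq variant calls against whole-genome calls.

    Returns (sensitivity, false_discovery_fraction), treating the
    whole-genome calls as truth (an assumption of this sketch).
    """
    wgs = set(wgs_variants)
    rna = set(rnaseq_variants)
    shared = wgs & rna
    # Fraction of whole-genome variants recovered by RNA-Seq.
    sensitivity = len(shared) / len(wgs) if wgs else float("nan")
    # Fraction of RNA-Seq calls absent from the whole-genome calls.
    false_discovery = len(rna - wgs) / len(rna) if rna else float("nan")
    return sensitivity, false_discovery


if __name__ == "__main__":
    # Toy variants for illustration only.
    wgs_calls = [
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
        ("chr2", 50, "G", "A"),
        ("chr2", 75, "T", "C"),
        ("chr3", 10, "A", "T"),
    ]
    rna_calls = [
        ("chr1", 100, "A", "G"),
        ("chr2", 50, "G", "A"),
        ("chr4", 5, "C", "G"),
    ]
    print(compare_calls(wgs_calls, rna_calls))
```

Restricting both inputs to variants in well-expressed genes before calling the function would give the kind of expression-stratified figure the abstract reports.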