979 results for Genomic Regions


Relevance: 30.00%

Abstract:

Chromosomal alterations in leukemia have been shown to have prognostic and predictive significance and are also important minimal residual disease (MRD) markers in the follow-up of leukemia patients. Although specific oncogenes and tumor suppressors have been discovered in some of the chromosomal alterations, the role and target genes of many alterations in leukemia remain unknown. In addition, a number of leukemia patients have a normal karyotype by standard cytogenetics, but have variability in clinical course and are often molecularly heterogeneous. Cytogenetic methods traditionally used in leukemia analysis and diagnostics, namely G-banding, various fluorescence in situ hybridization (FISH) techniques, and chromosomal comparative genomic hybridization (cCGH), have greatly increased knowledge about the leukemia genome, but have limitations in resolution or in genomic coverage. In the last decade, the development of microarray comparative genomic hybridization (array-CGH, aCGH) for DNA copy number analysis and the SNP microarray (SNP-array) method for simultaneous copy number and loss of heterozygosity (LOH) analysis has enabled investigation of chromosomal and gene alterations genome-wide with high resolution and high throughput. In these studies, genetic alterations were analyzed in acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL). The aim was to screen and characterize genomic alterations that could play a role in leukemia pathogenesis by using aCGH and SNP-arrays. One of the most important goals was to screen for cryptic alterations in karyotypically normal leukemia patients. In addition, chromosomal changes were evaluated to narrow the target regions, to find new markers, and to obtain tumor suppressor and oncogene candidates.
The work presented here shows the capability of aCGH to detect submicroscopic copy number alterations in leukemia, with information about breakpoints and genes involved in the alterations, and that genome-wide microarray analyses with aCGH and SNP-array are advantageous methods in the research and diagnosis of leukemia. The most important findings were the cryptic changes detected with aCGH in karyotypically normal AML and CLL, characterization of amplified genes in 11q marker chromosomes, detection of deletion-based mechanisms of MLL-ARHGEF12 fusion gene formation, and detection of LOH without copy number alteration in karyotypically normal AML. These alterations harbor candidate oncogenes and tumor suppressors for further studies.

Relevance: 30.00%

Abstract:

Helicobacter pylori infection is a risk factor for gastric cancer, which is a major health issue worldwide. Gastric cancer has a poor prognosis due to the unnoticeable progression of the disease, and surgery is the only available treatment. Therefore, gastric cancer patients would greatly benefit from the identification of biomarker genes that would improve diagnostic and prognostic prediction and provide targets for molecular therapies. DNA copy number amplifications are the hallmarks of cancers in various anatomical locations. Mechanisms of amplification predict that DNA double-strand breaks occur at the margins of the amplified region. The first objective of this thesis was to identify the genes that were differentially expressed in H. pylori infection as well as the transcription factors and signal transduction pathways that were associated with the gene expression changes. The second objective was to identify putative biomarker genes in gastric cancer with correlated expression and copy number, and the last objective was to characterize cancers based on DNA copy number amplifications. DNA microarrays, an in vitro model and real-time polymerase chain reaction were used to measure gene expression changes in H. pylori-infected AGS cells. In order to identify the transcription factors and signal transduction pathways that were activated after H. pylori infection, gene expression profiling data from the H. pylori experiments and a bioinformatics approach accompanied by experimental validation were used. Genome-wide expression and copy number microarray analysis of clinical gastric cancer samples and immunohistochemistry on tissue microarray were used to identify putative gastric cancer genes. Data mining and machine learning techniques were applied to study amplifications in a cross-section of cancers.
FOS and various stress response genes were regulated by H. pylori infection. H. pylori-regulated genes were enriched in the chromosomal regions that are frequently changed in gastric cancer, suggesting that molecular pathways of gastric cancer and premalignant H. pylori infection that induces gastritis are interconnected. Sixteen transcription factors were identified as being associated with H. pylori infection-induced changes in gene expression. NF-κB transcription factor and p50 and p65 subunits were verified using electrophoretic mobility shift assays. ERBB2 and other genes located in 17q12-q21 were found to be up-regulated in association with copy number amplification in gastric cancer. Cancers with similar cell type and origin clustered together based on the genomic localization of the amplifications. Cancer genes and large genes were co-localized with amplified regions, and fragile sites, telomeres, centromeres and light chromosome bands were enriched at the amplification boundaries. H. pylori-activated transcription factors and signal transduction pathways function in cellular mechanisms that might be capable of promoting carcinogenesis of the stomach. Intestinal and diffuse type gastric cancers showed distinct molecular genetic profiles. Integration of gene expression and copy number microarray data allowed the identification of genes that might be involved in gastric carcinogenesis and have clinical relevance. Gene amplifications were demonstrated to be non-random genomic instabilities. Cell lineage, properties of precursor stem cells, tissue microenvironment and genomic map localization of specific oncogenes define the site specificity of DNA amplifications, whereas labile genomic features define the structures of amplicons. These conclusions suggest that the definition of genomic changes in cancer is based on the interplay between the cancer cell and the tumor microenvironment.

Relevance: 30.00%

Abstract:

Meleagrid herpesvirus 1 (MeHV-1 or turkey herpesvirus) has been widely used as a vaccine in commercial poultry. Initially, these vaccine applications were for the prevention of Marek’s disease resulting from Gallid herpesvirus 2 infections, while more recently MeHV-1 has been used as recombinant vector for other poultry infections. The construction of herpesvirus infectious clones that permit propagation and manipulation of the viral genome in bacterial hosts has advanced the studies of herpesviral genetics. The current study reports the construction of five MeHV-1 infectious clones. The in vitro properties of viruses recovered from these clones were indistinguishable from the parental MeHV-1. In contrast, the rescued MeHV-1 viruses were significantly attenuated when used in vivo. Complete sequencing of the infectious clones identified the absence of two regions of the MeHV-1 genome compared to the MeHV-1 reference sequence. These analyses determined the rescued viruses have seven genes, UL43, UL44, UL45, UL56, HVT071, sorf3 and US2 either partially or completely deleted. In addition, single nucleotide polymorphisms were identified in all clones compared with the MeHV-1 reference sequence. As a consequence of one of the polymorphisms identified in the UL13 gene, four of the rescued viruses were predicted to encode a serine/threonine protein kinase lacking two of three domains required for activity. Thus four of the recovered viruses have a total of eight missing or defective genes. The implications of these findings in the context of herpesvirus biology and infectious clone construction are discussed.

Relevance: 30.00%

Abstract:

Gastric cancer is the fourth most common cancer and the second most common cause of cancer-related death worldwide. Due to lack of early symptoms, gastric cancer is characterized by late stage diagnosis and unsatisfactory options for curative treatment. Several genomic alterations have been identified in gastric cancer, but the major factors contributing to initiation and progression of gastric cancer remain poorly known. Gene copy number alterations play a key role in the development of gastric cancer, and a change in gene copy number is one of the fundamental mechanisms for a cancer cell to control the expression of potential oncogenes and tumor suppressor genes. This thesis aims at clarifying the complex genomic alterations of gastric cancer to identify novel molecular biomarkers for diagnostic purposes as well as for targeted treatment. To highlight genes of potential biological and clinical relevance, we carried out a systematic microarray-based survey of gene expression and copy number levels in primary gastric tumors and gastric cancer cell lines. Results were validated using immunohistochemistry, real-time qRT-PCR, and affinity capture-based transcript (TRAC) assay. Altogether 192 clinical gastric tissue samples and 7 gastric cancer cell lines were included in this study. Multiple chromosomal regions with recurrent copy number alterations were detected. The most frequent chromosomal alterations included gains at 7q, 8q, 17q, 19q, and 20q and losses at 9p, 18q, and 21q. Distinctive patterns of copy number alterations were detected for different histological subtypes (intestinal and diffuse) and for cancers located in different parts of the stomach. The impact of copy number alterations on gene expression was significant, as 6-10% of genes located in the regions of gains and losses also showed concomitant alterations in their expression. 
By combining the information from the DNA- and RNA-level analyses many novel gastric cancer-related genes, such as ALPK2, ENAH, HHIPL2, and OSMR, were identified. Independent genome-wide gene expression analysis of Finnish and Japanese gastric tumors revealed an additional set of genes that was differentially expressed in cancerous gastric tissues compared with normal tissue. Overexpression of one of these genes, CXCL1, was associated with an improved survival of gastric cancer. Thus, using an integrative microarray analysis, several novel genes were identified that may be critically important for gastric carcinogenesis. Further studies of these genes may lead to novel biomarkers for gastric cancer diagnosis and targeted therapy.

Relevance: 30.00%

Abstract:

Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternate virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are not known to possess any specific virulence factor or set of factors that is obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predictably up to 25% of the genomes) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the foreign DNA facilitates rapid adaptation of the pathogen to changing conditions and hence extends the spectrum of sites that can be infected.
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predictably involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 of the DNA probes homologous with known virulence factors and 36 of the DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used to differentiate highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains. When the VGA was applied to a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinal virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to cluster into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains supported the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with a particular medical history, including diabetes mellitus or catheterization, or from elderly patients.
In addition, fecal E. coli strains with VPs characteristic for ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.

Relevance: 30.00%

Abstract:

Multiple sclerosis (MS) is an immune-mediated demyelinating disorder of the central nervous system (CNS) affecting 0.1-0.2% of the population of Northern European descent. MS is considered to be a multifactorial disease in which both environment and genetics play a role in pathogenesis. Despite several decades of intense research, the etiological and pathogenic mechanisms underlying MS remain largely unknown and no curative treatment exists. The genetic architecture underlying MS is complex, with multiple genes involved. The strongest and the best characterized predisposing genetic factors for MS are located, as in other immune-mediated diseases, in the major histocompatibility complex (MHC) on chromosome 6. In humans, the MHC is called human leukocyte antigen (HLA). Alleles of the HLA locus have been found to associate strongly with MS and remained for many years the only consistently replicable genetic associations. However, recently other genes located outside the MHC region have been proposed as strong candidates for susceptibility to MS in several studies. In this thesis, a new genetic locus located on chromosome 7q32, interferon regulatory factor 5 (IRF5), was identified as contributing to MS susceptibility. In particular, we found that common variation of the gene was associated with the disease in three different populations: Spanish, Swedish and Finnish. We also suggested a possible functional role for one of the risk alleles with impact on the expression of the IRF5 locus. Previous studies have pointed out a possible role played by chromosome 2q33 in the susceptibility to MS and other autoimmune disorders. The work described here also investigated the involvement of this chromosomal region in MS predisposition. After the detection of genetic association with 2q33 (article-1), we extended our analysis through fine-scale single nucleotide polymorphism (SNP) mapping to define further the contribution of this genomic area to disease pathogenesis (article-4).
We found a trend (p=0.04) for association to MS with an intronic SNP located in the inducible T-cell co-stimulator (ICOS) gene, an important player in the co-stimulatory pathway of the immune system. Expression analysis of ICOS revealed a novel, previously uncharacterized, alternatively spliced isoform, lacking the extracellular domain that is needed for ligand binding. The stability of the newly identified transcript variant and its subcellular localization were analyzed. These studies indicated that the novel isoform is stable and shows different subcellular localization as compared to full-length ICOS. The novel isoform might have a regulatory function, but further studies are required to elucidate its function. Chromosome 19q13 has been previously suggested as one of the genomic areas involved in MS predisposition. In several populations, suggestive linkage signals between MS predisposition and 19q13 have been obtained. Here, we analysed the role of allelic variation in 19q13 by family-based association analysis in 782 MS families collected from Finland. In this dataset, we were not able to detect any statistically significant associations, although several previously suggested markers were included in the analysis. Replication of the previous findings on the basis of linkage disequilibrium between marker allele and disease/risk allele appears notoriously difficult because of limitations such as allelic heterogeneity. Re-sequencing based approaches may be required for elucidating the role of chromosome 19q13 in MS. This thesis has resulted in the identification of a new MS susceptibility locus (IRF5) previously associated with other inflammatory or autoimmune disorders, such as SLE. IRF5 is one of the mediators of the biological function of interferons. In addition to providing new insight into the possible pathogenetic pathway of the disease, this finding suggests that there might be common mechanisms between different immune-mediated disorders.
Furthermore, the work presented here has uncovered a novel isoform of ICOS, which may play a role in regulatory mechanisms of ICOS, an important mediator of lymphocyte activation. Further work is required to uncover its functions and possible involvement of the ICOS locus in MS susceptibility.

Relevance: 30.00%

Abstract:

The nucleotide sequence of cosmid B1790, carrying the Rif-Str regions of the Mycobacterium leprae chromosome, has been determined. Twelve open reading frames were identified in the 36,716 bp sequence, representing 40% of the coding capacity. Five ribosomal proteins, two elongation factors and the β and β' subunits of RNA polymerase have been characterized and two novel genes were found. One of these encodes a member of the so-called ABC family of ATP-binding proteins while the other appears to encode an enzyme involved in repairing genomic lesions caused by free radicals. This finding may well be significant as M. leprae, an intracellular pathogen, lives within macrophages.

Relevance: 30.00%

Abstract:

Genetic alterations such as point mutations, insertions, deletions, inversions and translocations are frequently found in cancers. Chromosomal translocations are among the most common genomic aberrations and are associated with nearly all types of cancers, especially leukemia and lymphoma. Recent studies have shown the role of non-B DNA structures in the generation of translocations. In the present study, using various bioinformatic tools, we show the propensity for formation of different types of altered DNA structures near translocation breakpoint regions. In particular, we find a close association between the occurrence of G-quadruplex forming motifs and fragile regions in almost 70% of genes involved in rearrangements in lymphoid cancers. However, such an analysis did not provide any evidence for the occurrence of G-quadruplexes in the close vicinity of translocation breakpoint regions in nonlymphoid cancers. Overall, this study will help in the identification of novel non-B DNA targets that may be responsible for the generation of chromosomal translocations in cancer. (C) 2012 Elsevier Inc. All rights reserved.
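
The abstract does not name the bioinformatic tools used. As a minimal sketch, assuming the widely used putative-quadruplex pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides (an assumption, not necessarily the authors' criterion), a scan for G-quadruplex forming motifs near a breakpoint might look like this. The function name and the example sequence are hypothetical.

```python
import re

# Assumed motif definition: four G-runs (>= 3 G each) separated by loops of 1-7 nt.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Hypothetical test sequence containing telomere-like repeats.
example = "ATTAGGGTTAGGGTTAGGGTTAGGGATCA"
hits = find_g4_motifs(example)
print(hits)
```

Only the strand passed in is scanned; a C-rich pattern (or the reverse complement) would be needed to cover motifs on the opposite strand.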

Relevance: 30.00%

Abstract:

Background: CpG islands, which are clusters of CpG dinucleotides in GC-rich regions, are considered gene markers and represent an important feature of mammalian genomes. Previous studies of CpG islands have largely been on specific loci or within one geno

Relevance: 30.00%

Abstract:

The recent release of the domestic dog genome provides us with an ideal opportunity to investigate dog-specific genomic features. In this study, we performed a systematic analysis of CpG islands (CGIs), which are often considered gene markers, in the dog

Relevance: 30.00%

Abstract:

The repetitive sequence CR1 from genomic DNA of the common carp (Cyprinus carpio) was DIG-labeled and hybridized in situ to chromosomes of the red common carp (Cyprinus carpio L. Xingguo red var.). The repetitive sequence CR1 was found to be mainly localized at the centromeric regions of the chromosomes of the red common carp. The application of the chromosomal in situ hybridization technique to fish and the relationship between the distribution of the CR1 repetitive sequence and its function are discussed.

Relevance: 30.00%

Abstract:

This work presents the nucleotide sequence of the core histone gene cluster from the scallop Chlamys farreri. The tandemly repeated unit of 5671 bp, containing one copy of each of the four core histone genes H4, H2B, H2A and H3, was amplified and identified by homology cloning and genomic DNA walking. All the histone genes in the cluster had structures in their 3' flanking regions that relate to the evolution of histone gene expression patterns throughout the cell cycle, including two different termination signals: the hairpin structure and at least one AATAAA polyadenylation signal. In their 5' regions, transcription initiation sites with the conserved sequence 5'-PyATTCPu-3', known as the CAP site, were present in all genes except H2B, generally 37-45 bp upstream of the start codon. Canonical TATA and CAAT boxes were identified only in certain histone genes. In the promoters of the H2B and H2A genes, there was a 5'-GATCC-3' element, which has been found to be essential to start transcription at the appropriate site. After this element, in the promoter of H2B, there was another sequence, 5'-GGATCGAAACGTTC-3', which was similar to the consensus sequence 5'-GGAATAAACGTATTC-3' corresponding to the H2B-specific promoter element. Enhancer sequences (5'-TGATATATG-3') were identified in the H4 and H3 genes, matching perfectly the consensus sequence defined for histone genes. There were several slightly more complex repetitive DNA sequences in the intergenic regions. The presence of the series of conserved and reiterated sequences was consistent with the view that the mollusc histone gene cluster arose by duplication of an ancestral precursor histone gene; under the birth-and-death evolution model, strong purifying selection has kept the histone cluster less variable and more conserved in function. Meanwhile, H2A and H2B were shown to be potentially good markers for phylogenetic analysis.
All the results will contribute to the characterization of repeating histone gene families in molluscs.
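
The conserved elements quoted in this abstract can be located with simple pattern matching. The following is an illustrative sketch only: the motif strings are taken from the abstract (Py = C or T, Pu = A or G), while the function name and the toy sequence are hypothetical and are not scallop data.

```python
import re

# Motifs quoted in the abstract, written as regular expressions.
MOTIFS = {
    "CAP site (PyATTCPu)": r"[CT]ATTC[AG]",
    "GATCC element": r"GATCC",
    "polyadenylation signal": r"AATAAA",
}


def scan_motifs(seq):
    """Map each motif name to the 0-based start positions of its occurrences."""
    seq = seq.upper()
    # A zero-width lookahead is used so that overlapping occurrences are reported too.
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }


toy = "GGGATCCTTTCATTCAGGAATAAACC"  # hypothetical sequence for illustration
positions = scan_motifs(toy)
print(positions)
```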

Relevance: 30.00%

Abstract:

From primates to bees, social status regulates reproduction. In the cichlid fish Astatotilapia (Haplochromis) burtoni, subordinate males have reduced fertility and must become dominant to reproduce. This increase in sexual capacity is orchestrated by neurons in the preoptic area, which enlarge in response to dominance and increase expression of gonadotropin-releasing hormone 1 (GnRH1), a peptide critical for reproduction. Using a novel behavioral paradigm, we show for the first time that subordinate males can become dominant within minutes of an opportunity to do so, displaying dramatic changes in body coloration and behavior. We also found that social opportunity induced expression of the immediate-early gene egr-1 in the anterior preoptic area, peaking in regions with high densities of GnRH1 neurons, and not in brain regions that express the related peptides GnRH2 and GnRH3. This genomic response did not occur in stable subordinate or stable dominant males even though stable dominants, like ascending males, displayed dominance behaviors. Moreover, egr-1 in the optic tectum and the cerebellum was similarly induced in all experimental groups, showing that egr-1 induction in the anterior preoptic area of ascending males was specific to this brain region. Because egr-1 codes for a transcription factor important in neural plasticity, induction of egr-1 in the anterior preoptic area by social opportunity could be an early trigger in the molecular cascade that culminates in enhanced fertility and other long-term physiological changes associated with dominance.

Relevance: 30.00%

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
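
To make the idea of a correlation-based objective explored by Markov chain Monte Carlo concrete, here is a toy sketch. It is not the dissertation's model: the single Gaussian footprint stands in for the thermodynamic model, and the function names, proposal widths, temperature and step count are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def predicted_occupancy(center, width, length):
    """Toy stand-in for a thermodynamic model: one Gaussian-shaped footprint."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(length)]


def metropolis_maximize(observed, steps=2000, temperature=0.01, seed=0):
    """Metropolis search over (center, width); returns the best (score, center, width)."""
    rng = random.Random(seed)
    length = len(observed)
    center, width = length / 2, 10.0
    current = pearson(predicted_occupancy(center, width, length), observed)
    best = (current, center, width)
    for _ in range(steps):
        new_center = center + rng.gauss(0, 2.0)
        new_width = width + rng.gauss(0, 1.0)
        if new_width < 1.0:
            continue  # outside the allowed range: reject the proposal
        proposal = pearson(predicted_occupancy(new_center, new_width, length), observed)
        # Accept improvements always, worse moves with probability exp(delta / T).
        if proposal >= current or rng.random() < math.exp((proposal - current) / temperature):
            center, width, current = new_center, new_width, proposal
            if current > best[0]:
                best = (current, center, width)
    return best


# Synthetic "observed" coverage generated from known parameters.
observed = predicted_occupancy(30.0, 5.0, 100)
score, center, width = metropolis_maximize(observed)
print(round(score, 3), round(center, 1), round(width, 1))
```

A low temperature makes the sampler behave almost like a hill climber; raising it lets the chain escape local optima at the cost of slower convergence.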

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
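
A heavily simplified sketch of a Bayes-factor-style nucleosome score follows. It assumes Poisson-distributed cut counts, a 147-bp footprint, and fixed (not integrated-out) parameters, so the score reduces to a log likelihood ratio between a "nucleosome" profile with a quadratic and roughly 10-bp oscillatory shape and a flat background. The shape parameters and function names are hypothetical and are not taken from the dissertation.

```python
import math

NUC_LEN = 147  # assumed nucleosome footprint in bp


def nucleosome_rates(mean_rate, curvature=3.0, amplitude=0.3, period=10.0):
    """Hypothetical per-position cut rates: quadratic dip at the dyad times an oscillation."""
    half = (NUC_LEN - 1) / 2
    shape = []
    for i in range(NUC_LEN):
        x = (i - half) / half  # -1 at one edge, 0 at the dyad, +1 at the other edge
        quadratic = 1.0 + curvature * x * x
        oscillation = 1.0 + amplitude * math.cos(2 * math.pi * (i - half) / period)
        shape.append(quadratic * oscillation)
    # Rescale so the expected total number of cuts equals that of the flat background.
    scale = mean_rate * NUC_LEN / sum(shape)
    return [s * scale for s in shape]


def poisson_loglik(counts, rates):
    """Log-likelihood of independent Poisson counts with the given rates."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1) for k, r in zip(counts, rates))


def log_bayes_factor(counts):
    """Log ratio of nucleosome-model to flat-background likelihood for one window."""
    if len(counts) != NUC_LEN:
        raise ValueError("window must span exactly NUC_LEN positions")
    mean_rate = max(sum(counts) / NUC_LEN, 1e-6)
    nucleosome = nucleosome_rates(mean_rate)
    background = [mean_rate] * NUC_LEN
    return poisson_loglik(counts, nucleosome) - poisson_loglik(counts, background)


# Synthetic windows: one shaped like the assumed nucleosome profile, one flat.
nuc_like = [round(r) for r in nucleosome_rates(5.0)]
flat = [5] * NUC_LEN
print(log_bayes_factor(nuc_like) > 0, log_bayes_factor(flat) < 0)
```

Sliding such a window along the genome and keeping local maxima of the score is one plausible way to call positions; a full Bayes factor would additionally average each likelihood over a prior on the model parameters.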

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.