28 results for Complete Dna-sequence
in DigitalCommons@The Texas Medical Center
Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies.
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed. The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta)$. Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model. However, this ML method requires a huge amount of computational time and is useful only for fewer than 6 sequences. Therefore, I developed a fast method for estimating $\alpha$, which is easy to implement and requires no knowledge of the tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle as many as 30 sequences. Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances.
Computer simulation showed that the SR method is better than a simpler method when the sequence length $L > 1{,}000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method. The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case.
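To illustrate why the gamma shape parameter $\alpha$ matters for distance estimation, the sketch below compares the Jukes-Cantor distance with its standard gamma-corrected form, $d = \tfrac{3}{4}\alpha\left[(1 - \tfrac{4}{3}p)^{-1/\alpha} - 1\right]$, where $p$ is the observed proportion of differing sites. This is a textbook one-parameter case chosen for brevity; it is not the SR/SRV estimator described in the abstract, and the function names are illustrative assumptions.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance, assuming every site evolves at the same rate."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    p = 0.3  # hypothetical observed proportion of differences
    for alpha in (0.5, 1.0, 2.0, 1e6):
        print(alpha, round(jc_gamma_distance(p, alpha), 4))
    print("equal rates:", round(jc_distance(p), 4))
```

Small values of `alpha` (strong rate heterogeneity) give a larger corrected distance for the same `p`, and the gamma distance approaches the equal-rates distance as `alpha` grows, which is why ignoring rate variation tends to underestimate distances.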
(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions, as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable to either distance data or character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and the modified Farris method at recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when a constant evolutionary rate is assumed, whereas it gives a tree similar to those obtained by the maximum parsimony and neighbor-joining methods when varying evolutionary rates are assumed.
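As a rough illustration of how neighbor joining chooses which pair of taxa to join first, the sketch below computes the Q criterion, $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$, and picks the pair that minimizes it. This is the later Q-matrix formulation (due to Studier and Keppler), not necessarily the formulation used in the dissertation, and the five-taxon distance matrix is made up for the example.

```python
def q_matrix(d):
    """Return the neighbor-joining Q matrix for a symmetric distance matrix d."""
    n = len(d)
    totals = [sum(row) for row in d]  # sum of distances from each taxon
    return [
        [0 if i == j else (n - 2) * d[i][j] - totals[i] - totals[j]
         for j in range(n)]
        for i in range(n)
    ]


def pick_neighbors(d):
    """Return the index pair (i, j), i < j, with the smallest Q value."""
    q = q_matrix(d)
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda ij: q[ij[0]][ij[1]])


# Hypothetical distances among five taxa a..e (symmetric, zero diagonal).
EXAMPLE = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(pick_neighbors(EXAMPLE))
```

Note that taxa d and e have the smallest raw distance (3), yet a and b are joined first because the criterion also accounts for how far each taxon is from all the others. UPGMA, which joins the closest pair directly, would start with d and e.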
The LIM domain-binding protein Ldb1 is an essential cofactor of LIM-homeodomain (LIM-HD) and LIM-only (LMO) proteins in development. The stoichiometry of Ldb1, LIM-HD, and LMO proteins is tightly controlled in the cell and is likely a critical determinant of their biological actions. Single-stranded DNA-binding proteins (SSBPs) were recently shown to interact with Ldb1 and are also important in developmental programs. We establish here that two mammalian SSBPs, SSBP2 and SSBP3, contribute to an erythroid DNA-binding complex that contains the transcription factors Tal1 and GATA-1, the LIM domain protein Lmo2, and Ldb1 and binds a bipartite E-box-GATA DNA sequence motif. In addition, SSBP2 was found to augment transcription of the Protein 4.2 (P4.2) gene, a direct target of the E-box-GATA-binding complex, in an Ldb1-dependent manner and to increase endogenous Ldb1 and Lmo2 protein levels, E-box-GATA DNA-binding activity, and P4.2 and beta-globin expression in erythroid progenitors. Finally, SSBP2 was demonstrated to inhibit Ldb1 and Lmo2 interaction with the E3 ubiquitin ligase RLIM, prevent RLIM-mediated Ldb1 ubiquitination, and protect Ldb1 and Lmo2 from proteasomal degradation. These results define a novel biochemical function for SSBPs in regulating the abundance of LIM domain and LIM domain-binding proteins.
Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID) mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; also, these clones continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained "template-independent" sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice) is due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses.
Aniridia (AN) is a congenital, panocular disorder of the eye characterized by the complete or partial absence of the iris. The disease occurs in both sporadic and familial forms; the familial form is inherited as an autosomal dominant trait with high penetrance. The objective of this study was to isolate and characterize the genes involved in AN and in the mouse mutation Small eye (Sey), and thereby to gain a better understanding of the molecular basis of the two disorders. Using a positional cloning strategy, I have cloned from the AN locus in human chromosomal band 11p13 a cDNA that is deleted in two patients with AN. The deletions in these patients overlap by about 70 kb and encompass the 3$'$ end of the cDNA. This cDNA detects a 2.7 kb mRNA encoded by a transcription unit estimated to span approximately 50 kb of genomic DNA. The message is specifically expressed in all tissues affected in all forms of AN, namely within the presumptive iris, lens, neuroretina, the superficial layers of the cornea, the olfactory bulbs, and the cerebellum. Sequence analysis of the AN cDNA revealed a number of motifs characteristic of certain transcription factors. Chief among these are the paired domain, the homeodomain, and a carboxy-terminal domain rich in serine, threonine and proline residues. The overall structure shows high homology to the Drosophila segmentation gene paired and members of the murine Pax family of developmental control genes. Utilizing a conserved human genomic DNA sequence as probe, I was able to isolate an embryonic murine cDNA which is over 92% homologous in nucleotide sequence and virtually identical at the amino acid level to the human AN cDNA. The expression pattern of the murine gene is the same as that in man, supporting the conclusion that it probably corresponds to the Sey gene.
Its specific expression in the neuroectodermal component of the eye, in glioblastomas, but not in the neural crest-derived PC12 pheochromocytoma cell line, suggests that a defect in neuroectodermal rather than mesodermal development might be the common etiological factor underlying AN and Sey.
Complete NotI, SfiI, XbaI and BlnI cleavage maps of Escherichia coli K-12 strain MG1655 were constructed. Techniques used included: CHEF pulsed-field gel electrophoresis; transposon mutagenesis; fragment hybridization to the ordered $\lambda$ library of Kohara et al.; fragment and cosmid hybridization to Southern blots; correlation of fragments and cleavage sites with EcoMap, a sequence-modified version of the genomic restriction map of Kohara et al.; and correlation of cleavage sites with DNA sequence databases. In all, 105 restriction sites were mapped and correlated with the EcoMap coordinate system. NotI, SfiI, XbaI and BlnI restriction patterns of five commonly used E. coli K-12 strains were compared to those of MG1655. The variability between strains, some of which are separated by numerous steps of mutagenic treatment, is readily detectable by pulsed-field gel electrophoresis. A model is presented to account for the differences between the strains on the basis of simple insertions, deletions, and in one case an inversion. Insertions and deletions ranged in size from 1 kb to 86 kb. Several of the larger features have previously been characterized, and some of the smaller rearrangements can potentially account for previously reported genetic features of these strains. Some aspects of the frequency and distribution of NotI, SfiI, XbaI and BlnI cleavage sites were analyzed using a method based on Markov chain theory. Overlaps of Dam and Dcm methylase sites with XbaI and SfiI cleavage sites were examined. The one XbaI-Dam overlap in the database is in accord with the expected frequency of this overlap. Certain types of SfiI-Dcm overlaps are overrepresented. Of the four subtypes of SfiI-Dcm overlap, only one has a partial inhibitory effect on the activity of SfiI. Recognition sites for all four enzymes are rarer than expected based on oligonucleotide frequency data, with this effect being much stronger for XbaI and BlnI than for NotI and SfiI.
The NotI and SfiI sites are rare mainly due to apparent negative selection against GGCC (both) and CGGCCG (NotI). The XbaI and BlnI sites are rare mainly due to effects of the VSP repair system on certain di-, tri-, and tetranucleotides, most notably CTAG. Models are proposed to explain several of the anomalies of oligonucleotide distribution in E. coli, and the biological significance of the systems that produce these anomalies is discussed.
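One common way to ask whether an oligonucleotide such as CTAG is rarer than expected is to compare its observed count with the count predicted by a Markov chain fitted to shorter words: for a word $w_1 \dots w_k$, the expected count is $N(w_1 \dots w_{k-1}) \cdot N(w_2 \dots w_k) / N(w_2 \dots w_{k-1})$. The sketch below implements that estimator. The abstract does not specify which Markov-based method was used, so this is an assumption made for illustration, and the toy sequence is invented.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    last_start = len(seq) - len(word)
    return sum(1 for i in range(last_start + 1) if seq.startswith(word, i))


def markov_expected(seq, word):
    """Expected count of word under a Markov chain of order len(word) - 2."""
    if len(word) < 3:
        raise ValueError("word must have at least three letters")
    prefix, suffix, core = word[:-1], word[1:], word[1:-1]
    core_count = count_overlapping(seq, core)
    if core_count == 0:
        return 0.0
    return (count_overlapping(seq, prefix)
            * count_overlapping(seq, suffix) / core_count)


if __name__ == "__main__":
    toy = "CTAGCTATTAGTA"  # invented sequence, far too short for real inference
    observed = count_overlapping(toy, "CTAG")
    expected = markov_expected(toy, "CTAG")
    print(observed, expected)
```

On a real genome, an observed-to-expected ratio well below 1 for a word like CTAG would be the kind of signal the abstract describes. Assessing significance would need a variance estimate, which this sketch omits.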
The focus of this thesis lies in the development of a sensitive method for the analysis of protein primary structure which can be easily used to confirm the DNA sequence of a protein's gene and determine the modifications which are made after translation. This technique involves the use of dipeptidyl aminopeptidase (DAP) and dipeptidyl carboxypeptidase (DCP) to hydrolyze the protein and the mass spectrometric analysis of the dipeptide products. Dipeptidyl carboxypeptidase was purified from human lung tissue and characterized with respect to its proteolytic activity. The results showed that the enzyme has a relatively unrestricted specificity, making it useful for the analysis of the C-termini of proteins. Most of the dipeptide products were identified using gas chromatography/mass spectrometry (GC/MS). In order to analyze the peptides not hydrolyzed by DCP and DAP, as well as the dipeptides not identified by GC/MS, a FAB ion source was installed on a quadrupole mass spectrometer and its performance evaluated with a variety of compounds. Using these techniques, the sequences of the N-terminal and C-terminal regions and seven fragments of bacteriophage P22 tail protein have been verified. All of the dipeptides identified in these analyses were in the same DNA reading frame, thus ruling out the possibility of a single base being inserted or deleted from the DNA sequence. The verification of small sequences throughout the protein sequence also indicates that no large portions of the protein have been removed after translation.
Academic and industrial research in the late 1990s has brought about an explosion of DNA sequence data. Automated expert systems are being created to help biologists extract patterns, trends and links from this ever-deepening ocean of information. Two such systems, aimed at retrieving and subsequently utilizing phylogenetically relevant information, have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question, either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method, whether distance-based, ML, or maximum parsimony (MP), should be chosen for any particular data set. A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen.
However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expense and in terms of reconstruction accuracy). Chapter III of this dissertation details a phylogenetic reconstruction expert system that automatically selects the most suitable method. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method.
DNA sequence variation is currently a major source of data for studying human origins, evolution, and demographic history, and for detecting linkage association of complex diseases. In this dissertation, I investigated DNA variation in worldwide populations from two ∼10 kb autosomal regions on 22q11.2 (noncoding) and 1q24 (introns). A total of 75 variant sites were found among 128 human sequences in the 22q11.2 region, yielding an estimate of 0.088% for nucleotide diversity (π), and a total of 52 variant sites were found among 122 human sequences in the 1q24 region with an estimated π value of 0.057%. The data from these two regions and a 10 kb noncoding region on Xq13.3 all show a strong excess of low-frequency variants in comparison to that expected from an equilibrium population, indicating a relatively recent population expansion. The effective population sizes estimated from the three regions were 11,000, 12,700, and 8,600, respectively, which are close to the commonly used value of 10,000. In each of the two autosomal regions, the age of the most recent common ancestor (MRCA) was estimated to be older than 1 million years among all the sequences and ∼600,000 years among non-African sequences, providing first evidence from autosomal noncoding or intronic regions for a genetic history of humans much more ancient than the emergence of modern humans. The ancient genetic history of humans indicates no severe bottleneck during the evolution of humans in the last half million years; otherwise, much of the ancient genetic history would have been lost during a severe bottleneck. This study strongly suggests that both the “out of Africa” and the multiregional models are too simple for explaining the evolution of modern humans. A compilation of genome-wide data revealed that nucleotide diversity is highest in autosomal regions, intermediate in X-linked regions, and lowest in Y-linked regions. 
The data suggest the existence of background selection or selective sweep on Y-linked loci. In general, the nucleotide diversity in humans is low compared to that in chimpanzee and Drosophila populations.
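For readers unfamiliar with the statistic, nucleotide diversity (π) is the average number of differences per site between two sequences drawn from the sample. The sketch below computes π and the number of variant (segregating) sites from a small aligned sample. The four ten-base sequences are invented for illustration and have nothing to do with the 22q11.2 or 1q24 data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


def segregating_sites(seqs):
    """Number of alignment columns containing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


# Invented toy alignment: two identical sequences plus two singletons.
SAMPLE = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGAACGTAC",
]

if __name__ == "__main__":
    print(nucleotide_diversity(SAMPLE), segregating_sites(SAMPLE))
```

In the toy sample both variants are singletons, the kind of low-frequency variant whose excess the abstract interprets as a sign of population expansion. Such variants add to the count of segregating sites but contribute relatively little to π.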
Repressor element 1 (RE1)-silencing transcription factor (REST)/neuron-restrictive silencer factor (NRSF) can repress several terminal neuronal differentiation genes by binding to a specific DNA sequence (RE1/neuron-restrictive silencer element [NRSE]) present in their regulatory regions. REST-VP16 binds to the same RE1/NRSE, but activates these REST/NRSF target genes. However, it is unclear whether REST-VP16 expression is sufficient to cause formation of functional neurons either from neural stem cells or from heterologous stem cells. Here we show that the expression of REST-VP16 in myoblasts grown under muscle differentiation conditions blocked entry into the muscle differentiation pathway, countered endogenous REST/NRSF-dependent repression, activated the REST/NRSF target genes, and, surprisingly, activated other neuronal differentiation genes and converted the myoblasts to a physiologically active neuronal phenotype. Furthermore, in vitro differentiated neurons produced by REST-VP16-expressing myoblasts, when injected into mouse brain, survived, incorporated into the normal brain, and did not form tumors. This is the first instance in which myoblasts were converted to a neuronal phenotype. Our results suggest that direct activation of REST/NRSF target genes with a single transgene, REST-VP16, is sufficient to activate other terminal neuronal differentiation genes and to override the muscle differentiation pathways, and they suggest that this approach provides an efficient way of triggering neuronal differentiation in myoblasts and possibly other stem cells.
Musculoskeletal infections are infections of the bone and surrounding tissues. They are currently diagnosed based on culture analysis, which is the gold standard for pathogen identification. However, these clinical laboratory methods are frequently inadequate for the identification of the causative agents, because a large percentage (25-50%) of confirmed musculoskeletal infections are false negatives in which no pathogen is identified in culture. My data supports these results. The goal of this project was to use PCR amplification of a portion of the 16S rRNA gene to test an alternative approach for the identification of these pathogens and to assess the diversity of the bacteria involved. The advantages of this alternative method are that it should increase sample sensitivity and the speed of detection. In addition, bacteria that are non-culturable or in low abundance can be detected using this molecular technique. However, a complication of this approach is that the majority of musculoskeletal infections are polymicrobial, which prohibits direct identification from the infected tissue by DNA sequencing of the initial 16S rDNA amplification products. One way to solve this problem is to use denaturing gradient gel electrophoresis (DGGE) to separate the PCR products before DNA sequencing. Denaturing gradient gel electrophoresis (DGGE) separates DNA molecules based on their melting point, which is determined by their DNA sequence. This analytical technique allows a mixture of PCR products of the same length that electrophoreses through agarose gels as one band, to be separated into different bands and then used for DNA sequence analysis. In this way, the DGGE allows for the identification of individual bacterial species in polymicrobial-infected tissue, which is critical for improving clinical outcomes. By combining the 16S rDNA amplification and the DGGE techniques together, an alternative approach for identification has been used. 
The 16S rRNA gene PCR-DGGE method includes several critical steps: DNA extraction from tissue biopsies, amplification of the bacterial DNA, PCR product separation by DGGE, amplification of the gel-extracted DNA, and DNA sequencing and analysis. Each step of the method was optimized to increase its sensitivity and for rapid detection of the bacteria present in human tissue samples. The limit of detection for the DNA extraction from tissue was at least 20 Staphylococcus aureus cells and the limit of detection for PCR was at least 0.05 pg of template DNA. The conditions for DGGE electrophoresis were optimized by using a double gradient of acrylamide (6-10%) and denaturant (30-70%), which increased the separation between distinct PCR products. The use of GelRed (Biotium) improved the DNA visualization in the DGGE gel. To recover the DNA from the DGGE gels, the gel slices were excised and shredded in a bead beater, and the DNA was allowed to diffuse into sterile water overnight. The use of primers containing specific linkers allowed the entire amplified PCR product to be sequenced and then analyzed. The optimized 16S rRNA gene PCR-DGGE method was used to analyze 50 tissue biopsy samples chosen randomly from our collection. The results were compared to those of the Memorial Hermann Hospital Clinical Microbiology Laboratory for the same samples. The molecular method was congruent for 10 of the 17 (59%) culture-negative tissue samples. In 7 of the 17 (41%) culture-negative samples, the molecular method identified a bacterium. The molecular method was congruent with the culture identification for 7 of the 33 (21%) culture-positive tissue samples. However, in 8 of the 33 (24%), the molecular method identified more organisms than culture. In 13 of the 15 (87%) polymicrobial cultured tissue samples, the molecular method identified at least one organism that was also identified by culture techniques.
Overall, the DGGE analysis of 16S rDNA is an effective method to identify bacteria not identified by culture analysis.
The shuttle vector plasmid pZ189 was used to find the kinds of mutations that are induced by herpes simplex virus type-1 (HSV-1). In cells infected by HSV-1, the frequency of mutation in the supF gene, the mutagenesis marker, was increased two- to seven-fold over background, reaching 0.14-0.45%. No increase was induced by infection with vaccinia virus under the same conditions. Mutagenesis was an early event, showing a four-fold increase in mutation frequency at only two hours after infection, and peaking at a seven-fold increase at four hours after infection. DNA sequencing and gel electrophoresis analysis were performed on 105 HSV-1 induced mutants and 65 spontaneous mutants and provided the following information: (1) A change in plasmid size was seen in 54% of HSV-1 related mutants, compared with only 37% of spontaneous mutants. (2) Among point mutations, the predominant type was G:C to A:T transition, which accounted for 51% of point mutations in mutants isolated from cells infected with HSV-1, and 32% of point mutations in spontaneous mutants. (3) Deletions of DNA were seen in HSV-1 related mutants at a frequency of 40%, compared with 29% in spontaneous mutants. The HSV-1 related deletions were about half the length of those in spontaneous mutants, and three contained short filler sequences. (4) Fifteen (15%) of the HSV-1 induced mutants showed altered restriction patterns on agarose gel electrophoresis; these were due to rearrangements of plasmid DNA and/or to insertion of sequences derived from chromosomal DNA (seven plasmids). No insertions of DNA from HSV-1 were detected. Among spontaneous mutants, only 5 (7.7%) were rearrangements and none had inserted chromosomal DNA. (5) DNA sequence analysis of the seven plasmids with inserted chromosomal DNA revealed that four had integrated repetitive DNA sequences and the other three carried sequences that could not be identified in the GenBank database.
Three of the repetitive DNA sequences were $\alpha$ satellite, Alu, and KpnI family sequences; the other was identified as a tRNA-like component. The observed mutations have implications for the mechanism of malignant transformation of cells by HSV-1.
There have been numerous reports over the past several years on the ability of vitamin A analogs (retinoids) to modulate cell proliferation, malignant transformation, morphogenesis, and differentiation in a wide variety of cell types and organisms. Two families of nuclear retinoid-inducible, trans-acting, transcription-enhancing receptors that bear strong DNA sequence homology to thyroid and steroid hormone receptors have recently been discovered. The retinoic acid receptors (RARs) and retinoid X receptors (RXRs) each have at least three types designated $\alpha,$ $\beta,$ and $\gamma,$ which are encoded by separate genes and expressed in a tissue and cell type-specific manner. We have been interested in the mechanism by which retinoids inhibit tumor cell proliferation and induce differentiation. As a model system we have employed several murine melanoma cell lines (S91-C2, K1735P, and B16-F1), which are sensitive to the growth-inhibitory and differentiation-inducing effects of RA, as well as a RA-resistant subclone of one of the cell lines (S91-C154), in order to study the role of the nuclear RARs in these effects. The initial phase of this project consisted of the characterization of the expression pattern of the three known RAR and RXR types in the murine melanoma cell lines in order to determine whether any differences exist which may elucidate a role for any of the receptors in RA-induced growth inhibition and differentiation. The novel finding was made that the RAR-$\beta$ gene is rapidly induced from undetectable levels by RA treatment at the mRNA and protein level, and that the induction of RAR-$\beta$ by other biologically active retinoids correlated with their ability to inhibit the growth of the highly RA-sensitive S91-C2 cell line. This suggests a role for RAR-$\beta$ in the growth inhibiting effect of retinoids. 
The second phase of this project involves the stable expression of RAR-$\beta$ in the S91-C2 cells and the RAR-$\beta$ receptor-null cell line, K1735P. These studies have indicated an inverse correlation between RAR-$\beta$ expression and proliferation rate.
Expression of the differentiated skeletal muscle phenotype is a process that appears to occur in at least two stages. First, pluripotent stem cells become committed to the myogenic lineage. Although undifferentiated and capable of continued proliferation, determined myoblasts are restricted to a single developmental fate. Upon receiving the appropriate environmental signals, these determined myoblasts withdraw from the cell cycle, fuse to form multi-nucleated myotubes, and begin to express a battery of muscle-specific gene products that make up the functional and contractile apparatus of the muscle. This project is aimed at the identification and characterization of factors that control the determination and differentiation of myogenic cells. We have cloned a cDNA, called myogenin, that plays an important role in these processes. Myogenin is expressed exclusively in skeletal muscle in vivo and myogenic cell lines in vitro. Its expression is sharply upregulated during differentiation. When constitutively expressed in fibroblasts, myogenin converts these cells to the myogenic lineage. Transfected cells behave as myogenic tissue culture cells with respect to the genes they express and the way they respond to environmental cues, and they are capable of fusing to form multinucleated myotubes. Sequence analysis showed that this cDNA has homology to a family of transcription factors in a region of 72 amino acids known as the basic helix-loop-helix motif. This domain appears to mediate binding to a DNA sequence element known as an E-box (CANNTG) essential for the activity of the enhancers of many muscle-specific genes. Analysis of myogenin in tissue culture cells showed that its expression is responsive to many of the environmental cues, such as the presence of growth factors and oncogenes, that modulate myogenesis.
In an attempt to identify the cis- and trans-elements that control myogenin expression and thereby understand what factors are responsible for the establishment of the myogenic lineage, we have cloned the myogenin gene. After analysis of the gene structure, we constructed a series of reporter constructs from the 5$'$ upstream sequence of the myogenin gene to determine which cis-acting sequences might be important in myogenin regulation. We found that 184 nucleotides of the 5$'$ sequence were sufficient to direct high-level muscle-specific expression of the reporter gene. Two sequence elements present in the 184-nucleotide fragment, an E-box and a MEF-2 site, have been shown previously to be important in muscle-specific transcription. Mutagenesis of these sites revealed that both sites are necessary for full activity of the myogenin promoter, and suggests that a complex hierarchy of transcription factors controls myogenic differentiation.
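The E-box consensus CANNTG mentioned above is simple enough to locate with a pattern scan. The sketch below is a minimal example of searching a promoter fragment for candidate E-boxes. The input sequence is invented, not the myogenin upstream sequence, and the function name is an illustrative assumption.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
EBOX_PATTERN = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based position, matched site) for each CANNTG in seq."""
    return [(m.start(), m.group(1)) for m in EBOX_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    fragment = "ttCAGCTGaaCACGTGcc"  # invented example sequence
    for position, site in find_eboxes(fragment):
        print(position, site)
```

Because the reverse complement of CANNTG is again CANNTG, scanning one strand finds the sites on both. A match is only a candidate: whether a given site is functional has to be tested experimentally, for example by the mutagenesis described in the abstract.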