13 results for Nucleotide sequence
in DigitalCommons@The Texas Medical Center
Abstract:
We describe the characterization of the herpes simplex virus type 2 (HSV-2) gene encoding infected cell protein 32 (ICP32) and virion protein 19c (VP19c). We also demonstrate that the HSV-1 UL38/ORF.553 open reading frame (ORF), which has been shown to specify a viral protein essential for capsid formation (B. Pertuiset, M. Boccara, J. Cebrian, N. Berthelot, S. Chousterman, F. Puvian-Dutilleul, J. Sisman, and P. Sheldrick, J. Virol. 63: 2169-2179, 1989), must encode the cognate HSV type 1 (HSV-1) ICP32/VP19c protein. The region of the HSV-2 genome deduced to contain the gene specifying ICP32/VP19c was isolated and subcloned, and the nucleotide sequence of 2,158 base pairs of HSV-2 DNA mapping immediately upstream of the gene encoding the large subunit of the viral ribonucleotide reductase was determined. This region of the HSV-2 genome contains a large ORF capable of encoding two related 50,538- and 49,472-molecular-weight polypeptides. Direct evidence that this ORF encodes HSV-2 ICP32/VP19c was provided by immunoblotting experiments that utilized antisera directed against synthetic oligopeptides corresponding to internal portions of the predicted polypeptides encoded by the HSV-2 ORF or antisera directed against a TrpE/HSV-2 ORF fusion protein. The type-common immunoreactivity of the two antisera and comparison of the primary amino acid sequences of the predicted products of the HSV-2 ORF and the equivalent genomic region of HSV-1 provided evidence that the HSV-1 UL38 ORF encodes the HSV-1 ICP32/VP19c. Analysis of the expression of the HSV-1 and HSV-2 ICP32/VP19c cognate proteins indicated that there may be differences in their modes of synthesis. Comparison of the predicted structure of the HSV-2 ICP32/VP19c protein with the structures of related proteins encoded by other herpes viruses suggested that the internal capsid architecture of the herpes family of viruses varies substantially.
Abstract:
The expression of the chicken fast skeletal myosin alkali light chain (MLC) 3f is subject to complex patterns of control by developmental and physiologic signals. Regulation of MLC3f gene expression is thought to be exerted primarily at the transcriptional level. The purpose of this dissertation was to identify cis-acting elements in the 5' flanking region of the chicken MLC3f gene that are important for transcriptional regulation. The results show that the 5' flanking region of the MLC3f gene contains multiple cis-acting elements. The nucleotide sequences of these elements are highly conserved between species and are also found in the 5' flanking regions of many muscle protein genes. The first regulatory region is located between -185 and -150 bp from the transcription start site and contains an AT-rich element. Linker scanner analyses revealed that this element has a positive effect on transcription from the MLC3f promoter. Furthermore, when linked to a heterologous viral promoter, it can enhance reporter gene expression in a muscle-specific manner, independent of distance or orientation.
The second regulatory region is located between -96 and -64 from the transcription start site. Sequences downstream of -96 have the capacity to drive muscle-specific reporter gene expression, although the region between -96 and -64 has no intrinsic enhancer-like activity. Linker scanner analyses identified a GC-rich motif that is required for efficient transcription from the MLC3f promoter. Mutations in this region result in a diminished capacity to drive reporter gene expression, and this loss is correlated with disruption of the ability to bind sequence-specific transcription factors. These sequence-specific DNA-binding proteins were detected in both muscle and non-muscle extracts. The results suggest that the mere presence or absence of transcription factors cannot be solely responsible for regulation of MLC3f expression and that tissue-specific expression may arise from complex interactions of muscle-specific, as well as more ubiquitous, transcription factors with multiple regulatory elements on the gene.
Abstract:
The plasmid-encoded, constitutively produced β-lactamase gene from Enterococcus faecalis strain HH22 was genetically characterized. A restriction endonuclease map of the 5.1 kb EcoRI fragment encoding the enterococcal β-lactamase was prepared and compared with the restriction map of a cloned staphylococcal β-lactamase gene (from the naturally occurring staphylococcal β-lactamase plasmid pI258). Comparison and hybridization studies showed that there were identical restriction sites in the region of the β-lactamase structural gene but not in the region surrounding this gene. In addition, the enterococcal β-lactamase plasmid did not encode resistance to mercury or cadmium, which is encoded by the small, transducible staphylococcal β-lactamase plasmids. The nucleotide sequence of the enterococcal gene was shown to be identical to the published sequences of three of four staphylococcal type A β-lactamase genes; more differences were seen with the genes for staphylococcal type C and D enzymes. One hundred forty nucleotides upstream of the β-lactamase start codon were also determined for the inducible staphylococcal β-lactamase gene on pI258; this sequence was identical to that of the constitutively expressed enterococcal gene, indicating that the changes resulting in constitutive expression are not due to changes in the promoter or operator region. Moreover, complementation studies indicated that production of the enterococcal enzyme could be repressed. The genes for the enterococcal β-lactamase and an inducible staphylococcal β-lactamase were each cloned into a shuttle vector and then transformed into enterococcal and staphylococcal recipients. The major difference between the two host backgrounds was that more enzyme was produced by the staphylococcal host, regardless of the source of the gene, but no qualitative difference was seen between the two genera. A difference in the level of resistance to ampicillin between the two backgrounds was also seen with the cloned enzymes in MIC and time-kill studies. The location of the enzyme was found to be host dependent, since each cloned gene generated extracellular (free) enzyme in the staphylococcus and cell-bound enzyme in the enterococcus. Based on the identity of the enterococcal β-lactamase and several staphylococcal β-lactamases, these data suggest recent spread of β-lactamase to enterococci and also suggest loss of a functional repressor.
Abstract:
Aniridia (AN) is a congenital, panocular disorder of the eye characterized by the complete or partial absence of the iris. The disease occurs in both sporadic and familial forms; the familial form is inherited as an autosomal dominant trait with high penetrance. The objective of this study was to isolate and characterize the genes involved in AN and Sey, and thereby to gain a better understanding of the molecular basis of the two disorders.
Using a positional cloning strategy, I have cloned from the AN locus in human chromosomal band 11p13 a cDNA that is deleted in two patients with AN. The deletions in these patients overlap by about 70 kb and encompass the 3' end of the cDNA. This cDNA detects a 2.7 kb mRNA encoded by a transcription unit estimated to span approximately 50 kb of genomic DNA. The message is specifically expressed in all tissues affected in all forms of AN, namely the presumptive iris, lens, neuroretina, the superficial layers of the cornea, the olfactory bulbs, and the cerebellum. Sequence analysis of the AN cDNA revealed a number of motifs characteristic of certain transcription factors. Chief among these are the paired domain, the homeodomain, and a carboxy-terminal domain rich in serine, threonine and proline residues. The overall structure shows high homology to the Drosophila segmentation gene paired and to members of the murine Pax family of developmental control genes.
Utilizing a conserved human genomic DNA sequence as a probe, I was able to isolate an embryonic murine cDNA which is over 92% homologous in nucleotide sequence and virtually identical at the amino acid level to the human AN cDNA. The expression pattern of the murine gene is the same as that in man, supporting the conclusion that it probably corresponds to the Sey gene. Its specific expression in the neuroectodermal component of the eye and in glioblastomas, but not in the neural crest-derived PC12 pheochromocytoma cell line, suggests that a defect in neuroectodermal rather than mesodermal development might be the common etiological factor underlying AN and Sey.
Abstract:
Heparanase, an endo-β-D-glucuronidase, has been associated with melanoma metastasis. Polyclonal antibodies directed against the murine N-terminal heparanase peptide detected an M_r ~97,000 protein upon SDS-polyacrylamide gel electrophoresis of mouse melanoma and human melanoma cell lysates. In an indirect immunocytochemical study, metastatic human A375-SM and mouse B16-BL6 melanoma cells were stained with the anti-heparanase antibodies. Heparanase antigen was localized in the cytoplasm of permeabilized melanoma cells as well as at the cell surface of unpermeabilized cells. Immunohistochemical staining of frozen sections from syngeneic mouse organs containing micrometastases of B16-BL6 melanoma demonstrated heparanase localized in metastatic melanoma cells, but not in adjacent normal tissues. Similar studies using frozen sections of malignant melanomas resected from patients indicated that heparanase is localized in invading melanoma cells, but not in adjacent connective tissues.
Monoclonal antibodies directed against murine heparanase were developed and characterized. Monoclonal antibody 10E5, an IgM, precipitated heparanase and inhibited its enzymatic activity. A 2.6 kb cDNA was isolated from a human melanoma λgt11 cDNA library using the monoclonal antibody 10E5. Heparan sulfate cleavage activity was detected in lysogen lysates from E. coli Y1089 infected with the λgt11 cDNA, and this activity was inhibited in the presence of a 10-fold excess of heparin, a potent inhibitor of heparanase. The nucleotide sequence of the cDNA was determined, and no significant homology was found with currently known gene sequences. On Northern blots, the cDNA hybridized to a 3.2-3.4 kb mRNA in human A375 melanoma, WI-38 fibroblast, and THP-1 leukemia cells.
Heparanase expression was examined using Western and Northern blots. In comparison to human A375-P melanoma cells, the quantity of the M_r 97,000 protein recognized by the polyclonal anti-heparanase antibodies doubled in the metastatic variant A375-SM cells, and the quantity of the 3.2-3.4 kb mRNA doubled in A375MetMix, a metastatic variant similar to A375-SM cells. In B16 murine melanoma cells, the intensity of the M_r 97,000 protein increased more than twofold compared with B16-F1 cells. The extent of the increase in protein and mRNA levels is comparable to the change in heparanase activity observed in those cells.
In summary, the studies suggest that (a) the N-terminus of the heparanase molecule is antigenically related in mouse and human; (b) heparanase antigens are localized at the cell surface and in the cytoplasm of metastatic human and mouse melanoma cells; (c) heparanase antigens are localized in invasive and metastatic murine and human melanomas in vivo, but not in adjacent normal tissues; (d) the heparanase molecule appears to be differentially expressed at the transcriptional as well as the translational level; and (e) the size of human heparanase mRNA is 3.2-3.4 kb.
Abstract:
The initial step in replication of the coronavirus mouse hepatitis virus (MHV) is the synthesis of negative strand RNA from a positive strand genomic RNA template. Our approach to studying MHV RNA replication is to identify the cis-acting signals for RNA synthesis and the protein(s) that recognize these signals at the 3' end of MHV genomic RNA. To determine whether host cellular and/or virus-specific proteins interact with the 3' end of the coronavirus genome, an RNase T1 protection/gel mobility shift electrophoresis assay was used to examine cytoplasmic extracts from either mock- or MHV-JHM-infected 17Cl-1 murine cells for the ability to form complexes with defined regions of the genomic RNA. Using a series of RNA probes with deletions and mutations in this region, a conserved 11-nucleotide sequence, UGAAUGAAGUU, at nucleotide positions 36 to 26 from the 3' end of genomic RNA was identified as responsible for the specific binding of host proteins. As assayed by UV-induced covalent cross-linking, the RNA probe containing the 11-nucleotide sequence bound approximately four host cellular proteins: a highly labeled 120 kDa protein and three minor species of 103, 81 and 55 kDa. Mutation of the 11-nucleotide motif strongly inhibited cellular protein binding, decreased the amount of the 103 and 81 kDa proteins in the complex to undetectable levels, and strongly reduced the binding of the 120 kDa protein. Less extensive mutations within this 11-nucleotide motif resulted in variable decreases in RNA-protein complex formation, depending on the probe tested. The RNA-protein complexes observed with cytoplasmic extracts from MHV-JHM-infected cells in both RNase protection/gel mobility shift and UV cross-linking assays were indistinguishable from those observed with extracts from uninfected cells.
To investigate the possible role of this 3' protein binding element in viral RNA replication in vivo, defective interfering (DI) RNA molecules with complete or partial mutations of the 11-nucleotide conserved sequence were transcribed in vitro, transfected into host 17Cl-1 cells in the presence of helper virus MHV-JHM, and analyzed by agarose gel electrophoresis, competitive RT-PCR and direct sequencing of the RT-PCR products. Both negative strand synthesis and positive strand replication of DI RNA were affected by mutations that disrupt RNA-protein complex formation, even though the 11 mutated nucleotides were converted to wild type sequence, presumably by recombination with helper virus. Kinetic analysis indicated that recombination between DI RNA and helper virus occurred 5.5 to 7.5 hours post infection, when replication of positive strand DI RNA was barely observed. Replication of positive strand DI RNAs carrying partial mutations within the 11-nucleotide motif was dependent upon recombination events after transfection. Replication was strongly inhibited when reversion to wild type sequence did not occur and, after recombination, reached levels similar to those of wild type DI RNA. A DI RNA with a mutation upstream of the protein binding motif replicated as efficiently as wild type without undergoing recombination. Thus the conserved 11-nucleotide host protein binding motif appears to play an important role in viral RNA replication.
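To illustrate how a motif's coordinates can be expressed relative to the 3' end, as in "positions 36 to 26" above, the following minimal sketch locates UGAAUGAAGUU in an RNA string and counts positions from the 3' terminus (position 1 being the final nucleotide). The function name and the toy sequence are hypothetical; the actual MHV genome sequence is not reproduced here, and the abstract does not state whether its numbering includes a poly(A) tail, so this only demonstrates the arithmetic.

```python
MOTIF = "UGAAUGAAGUU"  # 11-nucleotide motif quoted in the abstract


def motif_positions_from_3prime(rna, motif=MOTIF):
    """Return (start, end) for each motif occurrence, counted from the 3' end.

    Position 1 is the final nucleotide of `rna`. `start` is the 5'-most
    nucleotide of the motif, so start > end.
    """
    hits = []
    idx = rna.find(motif)
    while idx != -1:
        start_from_3p = len(rna) - idx                    # 5'-most base of the motif
        end_from_3p = len(rna) - (idx + len(motif)) + 1   # 3'-most base of the motif
        hits.append((start_from_3p, end_from_3p))
        idx = rna.find(motif, idx + 1)
    return hits


# Hypothetical toy sequence: 40 nt of filler, the motif, then 25 nt of filler.
toy_rna = "ACGU" * 10 + MOTIF + "C" * 25
print(motif_positions_from_3prime(toy_rna))  # prints [(36, 26)]
```

With 25 nucleotides downstream of the motif, its 3'-most base sits at position 26 and its 5'-most base at position 36, which matches the form of the coordinates given in the abstract.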
Abstract:
The goal of the present work was to identify and characterize gene sequences that are preferentially expressed in CML in an effort to better understand the molecular basis of the disease. As high abundance mRNAs generally encode proteins that are phenotypically characteristic of cells, positive-negative screening of a CML cDNA library was used to identify cDNA clones containing sequences preferentially transcribed in CML. One cDNA sequence that fulfilled this criterion, C-A3, has been characterized in some detail. It represents a small mRNA (~496 nucleotides) that is highly abundant (~2% of the poly(A)+ RNA) in cells from the chronic phase of CML. In situ hybridization to whole cells indicates that the principal leukocytes expressing C-A3 sequences are eosinophils, basophils and immature myelocytes. Surprisingly, CML patients with high numbers of myeloblasts do not have an abundance of C-A3 transcripts, although transcript levels remain elevated in patients with lymphoblasts. In AML, high transcript levels are found only sporadically, and occasionally transcripts of different sizes can be detected. Sequences from the 3' end of the C-A3 message are present in 2-5 copies per haploid genome. The 3' end of C-A3 localizes to bands 8q21.1 and 8q23 by in situ chromosomal hybridization. This is a region that is often involved in hematopoietic malignancies. Restriction digests of human genomic DNA show a correlation between the presence of a 2.3 kb HindIII fragment and certain types of leukemia. All of the leukemic DNAs tested had this fragment. In comparison, only one of five normal DNAs had a band of this size. Analysis of the nucleotide sequence indicates that C-A3 probably encodes a small, hydrophobic peptide which may be part of a larger protein.
Abstract:
The feasibility of establishing continuously proliferating, growth factor-dependent human B lymphocytes was investigated. Normal B lymphocytes prepared from peripheral venous blood were stimulated with a variety of known polyclonal B cell activators in the continuous presence of various cytokine preparations. Continuously proliferating growth factor-dependent B cell populations were obtained from cultures activated with either insoluble anti-IgM (μ-chain specific), soluble anti-IgM, heat-killed Staphylococcus aureus Cowan I (SAC), or dextran sulphate (DxS), in the continuous presence of exogenously added growth factor preparations containing either IL-1, IL-2 and BCGF, or BCGF alone. Although growth factor-dependent B cell lines were obtained via all three methods of activation, the combination of mode of activation and growth factor preparation proved to be critical. B cell lines could not be established with anti-μ activation in the presence of only BCGF; however, B cell lines were successfully obtained with SAC or DxS activation from cultures continuously replenished with only BCGF. These cultured B lymphocyte populations were routinely maintained in logarithmic-phase growth in the presence of exogenously added growth factor and exhibited a population doubling time of approximately 36 hours. They were shown to specifically absorb BCGF, suggesting the presence of membrane receptors for it. These cultured B cells have also been used to develop a microassay for an M_r 12,000-14,000 B cell growth factor activity that is accurate, sensitive, and precise. The sensitivity of this bioassay, which exceeds that of the conventional peripheral blood B cell assay, has aided in the purification to homogeneity of natural product extracellular BCGF (EC-BCGF) and in the determination of the nucleotide sequence of a gene coding for a protein exhibiting BCGF activity. Additionally, these B cell lines specifically absorb, and proliferate in the presence of, an affinity-purified M_r 60,000 trypsin-sensitive intracellular protein derived from freshly isolated human T lymphocytes, providing evidence for a putative intracellular precursor of EC-BCGF or a novel high molecular weight BCGF species.
Abstract:
The nar operon, which encodes the nitrate reductase in Escherichia coli, can be induced under anaerobic conditions to a low level without nitrate and to a maximum level with nitrate. The anaerobic formation of nitrate reductase is dependent upon the fnr gene product, while the narL gene product is required for further induction by nitrate. The sequence was determined across the entire promoter and regulatory region of the nar operon. The translational start site of the first structural gene of the nar operon, narG, was established by identifying the nucleotide sequence encoding the first 20 N-terminal amino acid residues of the alpha subunit of nitrate reductase. The transcriptional start site and the level of the transcript were determined by S1 mapping. One major transcript was identified, which was initiated 50 base pairs (bp) upstream from the translational start site of the first structural gene. The synthesis of the transcript was repressed aerobically, fully induced by nitrate anaerobically, and greatly reduced in an Fnr- mutant. Deletions were created in the 5' nar regulatory sequence with either an intact nar operon or a nar::lacZ fusion. The expression of the plasmids with deletions was determined in a strain with wild type fnr and narL loci, an Fnr- mutant strain and a NarL- mutant strain. These experiments demonstrated that the 5' limit of the nar operon lies at about -210 bp from the transcription start site. The region required for anaerobic induction by the fnr gene product is located around -60 bp. Two putative narL recognition sites were identified, one around -200 and another immediately adjacent to the fnr recognition region. Deletion of the sequences around -200 rendered the remaining narL complex repressive and thus decreased the expression of the nar operon, suggesting that the two potential narL sites interact with each other over a significant length of DNA.
Abstract:
In this study, we present a trilocus sequence typing (TLST) scheme based on intragenic regions of two antigenic genes, ace and salA (encoding a collagen/laminin adhesin and a cell wall-associated antigen, respectively), and a gene associated with antibiotic resistance, lsa (encoding a putative ABC transporter), for subspecies differentiation of Enterococcus faecalis. Each of the alleles was analyzed using 50 E. faecalis isolates representing 42 diverse multilocus sequence types (ST(M); based on seven housekeeping genes) and four groups of clonally linked (by pulsed-field gel electrophoresis [PFGE]) isolates. The allelic profiles and/or concatenated sequences of the three genes agreed with multilocus sequence typing (MLST) results for typing of 49 of the 50 isolates; in addition to the one exception, two isolates were found to have identical TLST types but were single-locus variants (differing by a single nucleotide) by MLST and were therefore also classified as clonally related by MLST. TLST was also comparable to PFGE for establishing short-term epidemiological relationships, typing all isolates classified as clonally related by PFGE with the same type. TLST was then applied to representative isolates (of each PFGE subtype and isolation year) of a collection of 48 hospital isolates and demonstrated the same relationships between isolates of an outbreak strain as those found by MLST and PFGE. In conclusion, the TLST scheme described here was shown to be successful for investigating short-term epidemiology in a hospital setting and may provide an alternative to MLST for discriminating isolates.
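The typing logic described above, in which each isolate receives an allelic profile across ace, salA and lsa and isolates with identical profiles share a type, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the toy sequences and the convention of numbering alleles and types in order of first appearance are all assumptions.

```python
LOCI = ("ace", "salA", "lsa")  # the three loci named in the abstract


def assign_types(isolates):
    """Map each isolate name to (allelic profile, type number).

    `isolates` maps a name to a dict of locus -> sequence string.
    Allele and type numbers are assigned in order of first appearance,
    an arbitrary convention chosen for this sketch.
    """
    allele_ids = {locus: {} for locus in LOCI}
    type_ids = {}
    result = {}
    for name, seqs in isolates.items():
        profile = tuple(
            allele_ids[locus].setdefault(seqs[locus], len(allele_ids[locus]) + 1)
            for locus in LOCI
        )
        type_id = type_ids.setdefault(profile, len(type_ids) + 1)
        result[name] = (profile, type_id)
    return result


# Hypothetical toy data: iso1 and iso2 are identical; iso3 differs at ace only.
toy_isolates = {
    "iso1": {"ace": "ACGT", "salA": "GGCA", "lsa": "TTAG"},
    "iso2": {"ace": "ACGT", "salA": "GGCA", "lsa": "TTAG"},
    "iso3": {"ace": "ACGA", "salA": "GGCA", "lsa": "TTAG"},
}
for name, (profile, type_id) in assign_types(toy_isolates).items():
    print(name, profile, type_id)
```

In this toy input the first two isolates share a type, while the third, a single-locus variant, receives a new one.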
Abstract:
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.
The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model, which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution (α) and/or the proportion of invariable sites (θ). Computer simulation showed that (1) under the gamma model, α can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model.
However, this ML method requires a huge amount of computational time and is useful only for fewer than 6 sequences. Therefore, I developed a fast method for estimating α, which is easy to implement and requires no knowledge of the tree. A computer program was developed for estimating α and evolutionary distances, which can handle as many as 30 sequences.
Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method is better than a simpler method when the sequence length L > 1,000 bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.
The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed for the nonstationary case.
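To show how the gamma shape parameter α enters a distance estimate, the sketch below uses the Jukes-Cantor model with gamma-distributed rates, a much simpler model than the SR and SRV models of the dissertation and substituted here purely for illustration. The function names are hypothetical; `p` is the observed proportion of differing sites between two aligned sequences.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance assuming equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates among sites.

    `alpha` is the gamma shape parameter; smaller values mean stronger
    rate heterogeneity and a larger correction.
    """
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.10  # hypothetical observed proportion of differences
print(jc_distance(p))
print(jc_gamma_distance(p, alpha=1.0))
print(jc_gamma_distance(p, alpha=1e6))  # approaches the equal-rates value
```

Ignoring rate variation when it is present underestimates the distance, which is why the gamma-corrected value exceeds the equal-rates value for the same `p`.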
Abstract:
(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions, as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable to either distance data or character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and the modified Farris method at recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when a constant evolutionary rate is assumed, whereas it gives a tree similar to those obtained by the maximum parsimony method and the neighbor-joining method when a varying evolutionary rate is assumed.
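As an illustration of the neighbor-joining idea for distance data, the sketch below performs only the first joining step: it selects the pair of taxa to join and computes their branch lengths to the new internal node. It uses the widely taught Q-criterion formulation, which is an assumption here and may differ in presentation from the formulation in the dissertation. The function name and the five-taxon distance matrix are hypothetical.

```python
from itertools import combinations


def nj_first_join(dist):
    """Return (i, j, limb_i, limb_j) for the first pair joined.

    `dist` is a symmetric matrix (list of lists) with zero diagonal and
    at least three taxa. The pair minimising
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    is joined; ties go to the first pair encountered.
    """
    n = len(dist)
    totals = [sum(row) for row in dist]
    best = None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - totals[i] - totals[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    _, i, j = best
    # Branch lengths from taxa i and j to the new internal node.
    limb_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return i, j, limb_i, limb_j


# Hypothetical distances among five taxa (indices 0..4).
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(toy_dist))  # prints (0, 1, 2.0, 3.0)
```

Note that taxa 3 and 4 have the smallest raw distance, yet taxa 0 and 1 are joined first. Unlike UPGMA, the criterion accounts for each taxon's total distance to all others, so unequal evolutionary rates are tolerated.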
Abstract:
Theoretical and empirical studies were conducted on the pattern of nucleotide and amino acid substitution in evolution, taking into account the effects of mutation at the nucleotide level and purifying selection at the amino acid level. A theoretical model for predicting the evolutionary change in electrophoretic mobility of a protein was also developed by using information on the pattern of amino acid substitution. The specific problems studied and the main results obtained are as follows: (1) Estimation of the pattern of nucleotide substitution in DNA nuclear genomes. The patterns of point mutation and nucleotide substitution among the four different nucleotides are inferred from the evolutionary changes of pseudogenes and functional genes, respectively. Both patterns are non-random, the rate of change varying considerably with nucleotide pair, and in both cases transitions occur somewhat more frequently than transversions. In protein evolution, substitution occurs more often between amino acids with similar physico-chemical properties than between dissimilar amino acids. (2) Estimation of the pattern of nucleotide substitution in RNA genomes. The majority of mutations in retroviruses accumulate at the reverse transcription stage. Selection at the amino acid level is very weak, and almost non-existent between synonymous codons. The pattern of mutation is very different from that in DNA genomes. Nevertheless, the pattern of purifying selection at the amino acid level is similar to that in DNA genomes, although selection intensity is much weaker. (3) Evaluation of the determinants of molecular evolutionary rates in protein-coding genes. Based on rates of nucleotide substitution for mammalian genes, the rate of amino acid substitution of a protein is determined by its amino acid composition. The content of glycine is shown to correlate strongly and negatively with the rate of substitution. Empirical formulae, called indices of mutability, are developed in order to predict the rate of molecular evolution of a protein from data on its amino acid sequence. (4) Studies on the evolutionary patterns of electrophoretic mobility of proteins. A theoretical model was constructed that predicts the electric charge of a protein at any given pH and its isoelectric point from data on its primary and quaternary structures. Using this model, the evolutionary change in electrophoretic mobilities of different proteins and the expected amount of electrophoretically hidden genetic variation were studied. In the absence of selection for the pI value, proteins will on average evolve toward a mildly basic pI. (Abstract shortened with permission of author.)
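To make the idea of predicting charge and isoelectric point from sequence concrete, here is a minimal sketch based on the Henderson-Hasselbalch relation applied to ionisable groups. It is not the model from the dissertation: it uses primary structure only and ignores quaternary structure. The pKa values are approximate textbook-style assumptions (published tables differ), and the function names are hypothetical.

```python
# Approximate pKa values (assumed for this sketch; published tables differ).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def net_charge(seq, ph):
    """Estimated net charge of a single polypeptide chain at the given pH."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_term" else seq.count(group)
        charge += n / (1.0 + 10.0 ** (ph - pka))   # protonated (positive) fraction
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_term" else seq.count(group)
        charge -= n / (1.0 + 10.0 ** (pka - ph))   # deprotonated (negative) fraction
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection.

    Works because the estimated net charge decreases monotonically with pH.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for peptide in ("KKKK", "DDDD", "ACDEFGHIKLMNPQRSTVWY"):  # hypothetical sequences
    print(peptide, round(isoelectric_point(peptide), 2))
```

A lysine-rich peptide yields a basic pI and an aspartate-rich peptide an acidic one. A fuller model along the lines described in the abstract would also need to account for how subunit association changes the set of exposed ionisable groups.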