992 results for Motif analysis
Abstract:
The 3′ UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3′ UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3′ UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments integrating information about conservation, GC content and transition/transversion ratios and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and number of types of segment identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3′ UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3′ UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.
Abstract:
The tonic is a fundamental concept in Indian art music. It is the base pitch, which an artist chooses in order to construct the melodies during a rāg(a) rendition, and all accompanying instruments are tuned using the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rāg recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts such as the presence/absence of additional metadata, the quality of audio data, the duration of audio data, music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average), and are robust across the aforementioned contexts compared to the approaches based on expert knowledge. In addition, we also show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.
Abstract:
We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) the dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
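As an illustration only, an Ising prior over variable-inclusion indicators on a graph is commonly written in the form below. The symbols used here (the indicator vector \gamma, the sparsity parameter a, the smoothness parameter b, and the edge set E of the covariate graph) are assumptions introduced for this sketch; the abstract does not state the exact parameterisation the authors use.

```latex
p(\gamma) \;\propto\; \exp\!\Big( a \sum_{i=1}^{p} \gamma_i \;+\; b \sum_{(i,j) \in E} \gamma_i \gamma_j \Big),
\qquad \gamma_i \in \{0, 1\}
```

In this form, a positive b gives more prior mass to models in which covariates that are neighbours on the graph are selected together, while a controls the overall prior model size.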
Abstract:
Background: Bone fractures and loss represent significant costs for the public health system and often affect the patients' quality of life; understanding the molecular basis of bone regeneration is therefore essential. Cytokines, such as IL-6, IL-10 and TNFα, secreted by inflammatory cells at the lesion site at the very beginning of the repair process, act as chemotactic factors for mesenchymal stem cells, which proliferate and differentiate into osteoblasts through the autocrine and paracrine action of bone morphogenetic proteins (BMPs), mainly BMP-2. Although it is known that BMP-2 binds to ActRI/BMPR and activates the SMAD 1/5/8 downstream effectors, little is known about the intracellular mechanisms participating in osteoblastic differentiation. We assessed differences in the phosphorylation status of different cellular proteins upon BMP-2 osteogenic induction of isolated murine skin mesenchymal stem cells using Triplex Stable Isotope Dimethyl Labeling coupled with LC/MS. Results: From 150 μg of starting material, 2,264 proteins were identified and quantified at five different time points, 235 of which were differentially phosphorylated. Kinase motif analysis showed that several substrates display phosphorylation sites for Casein Kinase, p38, CDK and JNK. Gene ontology analysis showed an increase in biological processes related to signaling and differentiation at early time points after BMP2 induction. Moreover, proteins involved in cytoskeleton rearrangement and in the Wnt and Ras pathways were found to be differentially phosphorylated at all time points studied. Conclusions: Taken together, these data provide new insights into the intracellular substrates that are phosphorylated early on during BMP2-driven osteoblastic differentiation of skin-derived mesenchymal stem cells.
Abstract:
Oligodeoxynucleotides (ODNs) containing unmethylated CpG motifs in certain contexts are known to be immunostimulatory in vertebrate systems. CpG ODNs with immune effects have been identified for many fish species but, to our knowledge, not for turbot. In this study, a turbot-effective CpG ODN, ODN 205, was identified and a plasmid, pCN5, was constructed which contains the CpG motif of ODN 205. When administered into turbot via intraperitoneal (i.p.) injection, both ODN 205 and pCN5 could (i) inhibit bacterial dissemination in blood in a dose- and time-dependent manner, and (ii) protect against lethal bacterial challenge. Immunological analyses showed that in vitro treatment with ODN 205 stimulated peripheral blood leukocyte proliferation, while i.p. injection with ODN 205 enhanced the respiratory burst activity, chemiluminescence response, and acid phosphatase activity of turbot head kidney macrophages. pCN5 treatment induced immune responses similar to those induced by ODN 205 treatment, except that pCN5 could also enhance serum bactericidal activity in a calcium-independent manner. To examine whether ODN 205 and pCN5 had any effect on specific immunity, ODN 205 and pCN5 were co-administered into turbot with a Vibrio harveyi subunit vaccine, DegQ. The results showed that pCN5, but not ODN 205, significantly increased the immunoprotective efficacy of DegQ and enhanced the production of specific serum antibodies in the vaccinated fish. Further analysis indicated that vaccination with DegQ in the presence of pCN5 upregulated the expression of the genes encoding MHC class II alpha, IgM, Mx, and IL-8 receptor. Taken together, these results demonstrate that ODN 205 and pCN5 can stimulate the immune system of turbot and induce protection against bacterial challenge. In addition, pCN5 possesses adjuvant properties and can potentiate vaccine-induced specific immunity. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
We have suggested previously that both the negatively and positively charged residues of the highly conserved Glu/Asp-Arg-Tyr (E/DRY) motif play an important role in the activation process of the alpha(1b)-adrenergic receptor (AR). In this study, R143 of the E/DRY sequence in the alpha(1b)-AR was mutated into several amino acids (Lys, His, Glu, Asp, Ala, Asn, and Ile). The charge-conserving mutation of R143 into lysine not only preserved the maximal agonist-induced response of the alpha(1b)-AR, but it also conferred a high degree of constitutive activity to the receptor. Both basal and agonist-induced phosphorylation levels were significantly increased for the R143K mutant compared with those of the wild-type receptor. Other substitutions of R143 resulted in receptor mutants with either a small increase in constitutive activity (R143H and R143D), impairment (R143H, R143D), or complete loss of receptor-mediated response (R143E, R143A, R143N, R143I). The R143E mutant displayed a small, but significant increase in basal phosphorylation despite being severely impaired in receptor-mediated response. Interestingly, all the arginine mutants displayed increased affinity for agonist binding compared with the wild-type alpha(1b)-AR. A correlation was found between the extent of the affinity shift and the intrinsic activity of the agonists. The analysis of the receptor mutants using the allosteric ternary complex model, in conjunction with the results of molecular dynamics simulations on the receptor models, supports the hypothesis that mutations of R143 can drive the isomerization of the alpha(1b)-AR into different states, highlighting the crucial role of this residue in the activation process of the receptor.
Abstract:
Decorin, a dermatan/chondroitin sulfate proteoglycan, is ubiquitously distributed in the extracellular matrix (ECM) of mammals. Decorin belongs to the small leucine rich proteoglycan (SLRP) family, a proteoglycan family characterized by a core protein dominated by Leucine Rich Repeat motifs. The decorin core protein appears to mediate the binding of decorin to ECM molecules, such as collagens and fibronectin. It is believed that the interactions of decorin with these ECM molecules contribute to the regulation of ECM assembly, cell adhesions, and cell proliferation. These basic biological processes play critical roles during embryonic development and wound healing and are altered in pathological conditions such as fibrosis and tumorigenesis. In this dissertation, we discover that decorin core protein can bind to Zn2+ ions with high affinity. Zinc is an essential trace element in mammals. Zn2+ ions play a catalytic role in the activation of many enzymes and a structural role in the stabilization of protein conformation. By examining purified recombinant decorin and its core protein fragments for Zn2+ binding activity using Zn2+-chelating column chromatography and Zn2+-equilibrium dialysis approaches, we have located the Zn2+ binding domain to the N-terminal sequence of the decorin core protein. The decorin N-terminal domain appears to contain two Zn2+ binding sites with similar high binding affinity. The sequence of the decorin N-terminal domain does not resemble any other reported zinc-binding motifs and, therefore, represents a novel Zn2+ binding motif. By investigating the influence of Zn2+ ions on decorin binding interactions, we found a novel Zn2+ dependent interaction with fibrinogen, the major plasma protein in blood clots. Furthermore, a recombinant peptide (MD4) consisting of a 41 amino acid sequence of mouse decorin N-terminal domain can prolong thrombin induced fibrinogen/fibrin clot formation. 
This suggests that in the presence of Zn2+ the decorin N-terminal domain has an anticoagulation activity. The changed Zn2+-binding activities of the truncated MD4 peptides and of mutant peptides generated by site-directed mutagenesis revealed that the functional MD4 peptide might contain both a structural zinc-binding site in the cysteine cluster region and a catalytic zinc site that could be created by the flanking sequences of the cysteine cluster region. A model of a loop-like structure for MD4 peptide is proposed.
Abstract:
Complex networks have been studied extensively due to their relevance to many real-world systems such as the world-wide web, the internet, biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties, and dynamics. Three well-known properties of complex networks are scale-free degree distribution, small-world effect and self-similarity. The search for additional meaningful properties and the relationships among these properties is an active area of current research. This thesis investigates a newer aspect of complex networks, namely their multifractality, which is an extension of the concept of self-similarity. The first part of the thesis aims to confirm that the study of properties of complex networks can be expanded to a wider field including more complex weighted networks. Those real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use the protein-protein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity from the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network because the shortest distance between nodes is larger in the skeleton, hence for a fixed box-size more boxes will be needed to cover the skeleton. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana. 
By using the random sequential box-covering algorithm, we calculate the fractal dimensions for both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in generated weighted PPI networks. This implication will be useful for our treatment of the networks in the third part of the thesis. The second part of the thesis aims to explore the multifractal behavior of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierpinski gasket are homogeneous since these fractals consist of a geometrical figure which repeats on an ever-reduced scale. Fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets, implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks and random networks, and of one kind of real network, namely PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while the multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interaction networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interaction networks. 
This multifractal analysis then provides a potentially useful tool for gene clustering and identification. The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent works indicate that complex network theory can be a powerful tool to analyse time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks of time series: we define nodes by vectors of a certain length in the time series, and weight of edges between any two nodes by the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence larger Hurst exponent, tend to have smaller fractal dimension, hence smoother sample paths. We then construct networks via the technique of horizontal visibility graph (HVG), which has been widely used recently. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions as well as those for binomial cascades and five bacterial genomes. The results confirm the monoscaling of fractional Brownian motion and the multifractality of the rest. 
As an additional application, we discuss the resilience of networks constructed from time series via two different approaches: visibility graph and horizontal visibility graph. Our finding is that the degree distribution of VG networks of fractional Brownian motions is scale-free (i.e., having a power law) meaning that one needs to destroy a large percentage of nodes before the network collapses into isolated parts; while for HVG networks of fractional Brownian motions, the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
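The two time-series-to-network constructions described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names `horizontal_visibility_edges` and `distance_weighted_network`, the window-length parameter `m`, and the use of overlapping sliding windows are all assumptions made here. The horizontal visibility rule used is the standard one: two points are linked when every value strictly between them is lower than both.

```python
import math


def horizontal_visibility_edges(series):
    """Return the set of HVG edges (i, j) with i < j.

    Points i and j are linked when every value strictly between them
    is smaller than min(series[i], series[j]).
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of values strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing further to the right can be seen from i
    return edges


def distance_weighted_network(series, m):
    """Weighted network whose nodes are length-m windows of the series.

    Assumption: windows overlap (stride 1). The weight of edge (i, j)
    is the Euclidean distance between window i and window j.
    """
    windows = [series[i:i + m] for i in range(len(series) - m + 1)]
    weights = {}
    for i in range(len(windows)):
        for j in range(i + 1, len(windows)):
            weights[(i, j)] = math.dist(windows[i], windows[j])
    return weights
```

For example, `horizontal_visibility_edges([3, 1, 2, 4])` links every adjacent pair and also (0, 2) and (0, 3), but not (1, 3), because the value 2 at index 2 is not below the value 1 at index 1.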
Abstract:
Geminivirus infectivity is thought to depend on interactions between the virus replication-associated proteins Rep or RepA and host retinoblastoma-related proteins (pRBR), which control cell-cycle progression. It was determined that the substitution of two amino acids in the Maize streak virus (MSV) RepA pRBR-interaction motif (LLCNE to LLCLK) abolished detectable RepA-pRBR interaction in yeast without abolishing infectivity in maize. Although the mutant virus was infectious in maize, it induced less severe symptoms than the wild-type virus. Sequence analysis of progeny viral DNA isolated from infected maize enabled detection of a high-frequency single-nucleotide reversion of C(601)A in the 3 nt mutated sequence of the Rep gene. Although it did not restore RepA-pRBR interaction in yeast, sequence-specific PCR showed that, in five out of eight plants, the C(601)A reversion appeared by day 10 post-inoculation. In all plants, the C(601)A revertant eventually completely replaced the original mutant population, indicating a high selection pressure for the single-nucleotide reversion. Apart from potentially revealing an alternative or possibly additional function for the stretch of DNA that encodes the apparently non-essential pRBR-interaction motif of MSV Rep, the consistent emergence and eventual dominance of the C(601)A revertant population might provide a useful tool for investigating aspects of MSV biology, such as replication, mutation and evolution rates, and complex population phenomena, such as competition between quasispecies and population turnover. © 2005 SGM.
De Novo Transcriptome Sequence Assembly and Analysis of RNA Silencing Genes of Nicotiana benthamiana
Abstract:
Background: Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient transgene expression, generation of mobile gene silencing signals, and hyper-susceptibility to viral infection. Methodology/Results: RNA-seq libraries from 9 different tissues were deep sequenced and assembled, de novo, into a representation of the transcriptome. The assembly, of 16 GB of sequence, yielded 237,340 contigs, clustering into 119,014 transcripts (unigenes). Between 80 and 85% of reads from all tissues could be mapped back to the full transcriptome. Approximately 63% of the unigenes exhibited a match to the Solgenomics tomato predicted proteins database. Approximately 94% of the Solgenomics N. benthamiana unigene set (16,024 sequences) matched our unigene set (119,014 sequences). Using homology searches we identified 31 homologues that are involved in RNAi-associated pathways in Arabidopsis thaliana, and show that they possess the domains characteristic of these proteins. Of these genes, the RNA dependent RNA polymerase gene, Rdr1, is transcribed but has a 72 nt insertion in exon1 that would cause premature termination of translation. Dicer-like 3 (DCL3) appears to lack both the DEAD helicase motif and second dsRNA binding motif, and DCL2 and AGO4b have unexpectedly high levels of transcription. Conclusions: The assembled and annotated representation of the transcriptome and list of RNAi-associated sequences are accessible at www.benthgenome.com alongside a draft genome assembly. These genomic resources will be very useful for further study of the developmental, metabolic and defense pathways of N. benthamiana and in understanding the mechanisms behind the features which have made it such a well-used model plant. © 2013 Nakasugi et al.
Abstract:
Background Transcription factors (TFs) co-ordinately regulate target genes that are dispersed throughout the genome. This co-ordinate regulation is achieved, in part, through the interaction of transcription factors with conserved cis-regulatory motifs that are in close proximity to the target genes. While much is known about the families of transcription factors that regulate gene expression in plants, there are few well characterised cis-regulatory motifs. In Arabidopsis, over-expression of the MYB transcription factor PAP1 (PRODUCTION OF ANTHOCYANIN PIGMENT 1) leads to transgenic plants with elevated anthocyanin levels due to the co-ordinated up-regulation of genes in the anthocyanin biosynthetic pathway. In addition to the anthocyanin biosynthetic genes, there are a number of un-associated genes that also change in expression level. This may be a direct or indirect consequence of the over-expression of PAP1. Results Oligo array analysis of PAP1 over-expression Arabidopsis plants identified genes co-ordinately up-regulated in response to the elevated expression of this transcription factor. Transient assays on the promoter regions of 33 of these up-regulated genes identified eight promoter fragments that were transactivated by PAP1. Bioinformatic analysis on these promoters revealed a common cis-regulatory motif that we showed is required for PAP1 dependent transactivation. Conclusion Co-ordinated gene regulation by individual transcription factors is a complex collection of both direct and indirect effects. Transient transactivation assays provide a rapid method to identify direct target genes from indirect target genes. Bioinformatic analysis of the promoters of these direct target genes is able to locate motifs that are common to this sub-set of promoters, which is impossible to identify with the larger set of direct and indirect target genes. 
While this type of analysis does not prove a direct interaction between protein and DNA, it does provide a tool to characterise cis-regulatory sequences that are necessary for transcription activation in a complex list of co-ordinately regulated genes.
Abstract:
Recent research has identified marine molluscs as an excellent source of omega-3 long-chain polyunsaturated fatty acids (lcPUFAs), based on their potential for endogenous synthesis of lcPUFAs. In this study we generated a representative list of fatty acyl desaturase (Fad) and elongation of very long-chain fatty acid (Elovl) genes from major orders of Phylum Mollusca, through the interrogation of transcriptome and genome sequences, and various publicly available databases. We have identified novel and uncharacterised Fad and Elovl sequences in the following species: Anadara trapezia, Nerita albicilla, Nerita melanotragus, Crassostrea gigas, Lottia gigantea, Aplysia californica, Loligo pealeii and Chlamys farreri. Based on alignments of translated protein sequences of Fad and Elovl genes, the haeme binding motif and histidine boxes of Fad proteins, and the histidine box and seventeen important amino acids in Elovl proteins, were highly conserved. Phylogenetic analysis of aligned reference sequences was used to reconstruct the evolutionary relationships for Fad and Elovl genes separately. Multiple, well resolved clades for both the Fad and Elovl sequences were observed, suggesting that repeated rounds of gene duplication best explain the distribution of Fad and Elovl proteins across the major orders of molluscs. For Elovl sequences, one clade contained the functionally characterised Elovl5 proteins, while another clade contained proteins hypothesised to have Elovl4 function. Additional well resolved clades consisted only of uncharacterised Elovl sequences. One clade from the Fad phylogeny contained only uncharacterised proteins, while the other clade contained functionally characterised delta-5 desaturase proteins. The discovery of an uncharacterised Fad clade is particularly interesting as these divergent proteins may have novel functions. 
Overall, this paper presents a number of novel Fad and Elovl genes suggesting that many mollusc groups possess most of the required enzymes for the synthesis of lcPUFAs.
Abstract:
Many studies have shown that we can gain additional information on time series by investigating their accompanying complex networks. In this work, we investigate the fundamental topological and fractal properties of recurrence networks constructed from fractional Brownian motions (FBMs). First, our results indicate that the constructed recurrence networks have exponential degree distributions; the average degree exponent 〈λ〉 increases first and then decreases with the increase of Hurst index H of the associated FBMs; the relationship between H and 〈λ〉 can be represented by a cubic polynomial function. We next focus on the motif rank distribution of recurrence networks, so that we can better understand networks at the local structure level. We find the interesting superfamily phenomenon, i.e., the recurrence networks with the same motif rank pattern being grouped into two superfamilies. Last, we numerically analyze the fractal and multifractal properties of recurrence networks. We find that the average fractal dimension 〈dB〉 of recurrence networks decreases with the Hurst index H of the associated FBMs, and their dependence approximately satisfies the linear formula 〈dB〉≈2-H, which means that the fractal dimension of the associated recurrence network is close to that of the graph of the FBM. Moreover, our numerical results of multifractal analysis show that the multifractality exists in these recurrence networks, and the multifractality of these networks becomes stronger at first and then weaker when the Hurst index of the associated time series becomes larger from 0.4 to 0.95. In particular, the recurrence network with the Hurst index H=0.5 possesses the strongest multifractality. In addition, the dependence relationships of the average information dimension 〈D(1)〉 and the average correlation dimension 〈D(2)〉 on the Hurst index H can also be fitted well with linear functions. 
Our results strongly suggest that the recurrence network inherits the basic characteristic and the fractal nature of the associated FBM series.
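The abstract does not say how its recurrence networks are built. One common construction, shown here only as a hedged sketch, embeds the series into overlapping state vectors and links two vectors when they lie closer than a threshold. The function name `recurrence_network` and the parameters `dim` (embedding dimension) and `eps` (distance threshold) are assumptions introduced for this example.

```python
import math


def recurrence_network(series, dim, eps):
    """Build an epsilon-recurrence network as an adjacency dictionary.

    Assumptions: state vectors are overlapping windows of length `dim`
    (delay 1), and two distinct vectors are linked when their Euclidean
    distance is strictly below `eps`.
    """
    vectors = [series[i:i + dim] for i in range(len(series) - dim + 1)]
    adjacency = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) < eps:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def degree_sequence(adjacency):
    """Node degrees in node order, the input to a degree-distribution fit."""
    return [len(adjacency[node]) for node in sorted(adjacency)]
```

The degree sequence returned by `degree_sequence` is the quantity whose distribution the abstract reports as exponential; fitting that distribution is not shown here.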
Abstract:
Mycobacterium leprae recA harbors an in-frame insertion sequence that encodes an intein homing endonuclease (PI-MleI). Most inteins (intein endonucleases) possess two conserved LAGLIDADG (DOD) motifs at their active center. A common feature of LAGLIDADG-type homing endonucleases is that they recognize and cleave the same or very similar DNA sequences. However, PI-MleI is distinct from other members of the family of LAGLIDADG-type HEases in its modular structure, with functionally separable domains for DNA-binding and cleavage, each with distinct sequence preferences. Sequence alignment analyses of PI-MleI revealed three putative LAGLIDADG motifs; however, there is conflicting bioinformatics data in regard to their identity and specific location within the intein polypeptide. To resolve this conflict and to determine the active-site residues essential for DNA target site recognition and double-stranded DNA cleavage, we performed site-directed mutagenesis of presumptive catalytic residues in the LAGLIDADG motifs. Analysis of target DNA recognition and kinetic parameters of the wild-type PI-MleI and its variants disclosed that the two amino acid residues, Asp(122) (in Block C) and Asp(193) (in functional Block E), are crucial to the double-stranded DNA endonuclease activity, whereas Asp(218) (in pseudo-Block E) is not. However, despite the reduced catalytic activity, the PI-MleI variants, like the wild-type PI-MleI, generated a footprint of the same length around the insertion site. The D122T variant showed significantly reduced catalytic activity, and the D122A and D193A mutations, although they failed to affect the DNA-binding affinities, abolished the double-stranded DNA cleavage activity. On the other hand, the D122C variant showed approximately twofold higher double-stranded DNA cleavage activity compared with the wild-type PI-MleI. 
These results provide compelling evidence that Asp(122) and Asp(193) in DOD motif I and II, respectively, are bona fide active-site residues essential for DNA cleavage activity. The implications of these results are discussed in this report.
Abstract:
A novel multiple turn conformation has been observed for a segment GPGRAFY in the crystal structure of a complex of HIV-1 gp120 V3 loop peptide with the Fab fragment of a neutralizing antibody [Ghiara et al. (1994) Science 264, 82-85]. A structural motif has been defined for the peptide segment, employing idealized backbone conformations characterized by ranges of virtual C-alpha torsion angles and bond angles. A search of 122 high-resolution protein crystal structures has permitted identification of 24 examples of similar structural motifs. Two major conformational families have been identified, which differ primarily in the conformation at residue 3. The observed conformation at residue 3 in family 1 is left-handed helical (alpha(L)) and that in family 2 is right-handed helical (alpha(R)). Of the 10 examples in family 1, 9 examples have Gly residues at position 3. Of the 12 examples in family 2, 7 examples have Asn/Asp at position 3. Computer modeling of the V3 loop tip sequence using the two backbone conformational families as starting points leads to minimum-energy conformations in which antigenically important side-chains occupy similar spatial arrangements. This stereochemical analysis of the V3 loop tip sequence suggests a rational basis for the design of synthetic analog peptides for use as viral antagonists or synthetic antigens. (C) Munksgaard 1995.