148 results for Computational modelling by homology
at Université de Lausanne, Switzerland
Abstract:
Crystallographic data on T-Cell Receptor - peptide - major histocompatibility complex class I (TCRpMHC) interactions have revealed extremely diverse TCR binding modes triggering antigen recognition. Understanding the molecular basis that governs TCR orientation over pMHC remains a considerable challenge. We present a simplified rigid approach applied to all available non-redundant TCRpMHC crystal structures. The CHARMM force field, in combination with the FACTS implicit solvation model, is used to study the role of long-distance interactions between the TCR and pMHC. We demonstrate that the sum of the Coulomb interactions and the electrostatic solvation energies is sufficient to identify two orientations corresponding to energetic minima at 0° and 180° from the native orientation. Interestingly, these results are robust to small structural variations of the TCR, such as changes induced by Molecular Dynamics simulations, suggesting that shape complementarity is not required to obtain a reliable signal. Accurate energy minima are also identified when unbound TCR crystal structures are evaluated against pMHC. Furthermore, we decompose the electrostatic energy into residue contributions to estimate their role in the overall orientation. Results show that most of the driving force leading to the formation of the complex is defined by CDR1,2/MHC interactions. This long-distance contribution appears to be independent of the binding process itself, since it is reliably identified without considering either short-range energy terms or CDR induced fit upon binding. Finally, we present an attempt to predict the TCR/pMHC binding mode for a TCR structure obtained by homology modeling. The simplicity of the approach and the absence of any fitted parameters also make it easily applicable to other types of macromolecular protein complexes.
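The idea of scanning an energy term over rigid rotations can be illustrated with a toy calculation. The sketch below is not the CHARMM/FACTS protocol of the abstract: it uses bare Coulomb interactions between a handful of hypothetical point charges, omits the solvation term entirely, and all names (`rotate_z`, `coulomb_energy`, `scan`, the `receptor_*` and `ligand_*` toy data) are assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), the conventional value in these units.
COULOMB_K = 332.0636


def rotate_z(points, angle_deg):
    """Rotate a list of (x, y, z) points about the z axis by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def coulomb_energy(charges_a, coords_a, charges_b, coords_b):
    """Sum of pairwise Coulomb energies between two rigid sets of point charges."""
    energy = 0.0
    for qa, pa in zip(charges_a, coords_a):
        for qb, pb in zip(charges_b, coords_b):
            energy += COULOMB_K * qa * qb / math.dist(pa, pb)
    return energy


def scan(charges_a, coords_a, charges_b, coords_b, step_deg=10):
    """Rotate body A about z in fixed steps and record the energy at each angle."""
    return [
        (angle, coulomb_energy(charges_a, rotate_z(coords_a, angle),
                               charges_b, coords_b))
        for angle in range(0, 360, step_deg)
    ]


# Hypothetical toy system: two positive charges held 10 Angstrom above two
# negative charges. It is symmetric by construction, so 0 and 180 degrees
# are equivalent orientations.
receptor_q = [1.0, 1.0]
receptor_xyz = [(3.0, 0.0, 10.0), (-3.0, 0.0, 10.0)]
ligand_q = [-1.0, -1.0]
ligand_xyz = [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0)]

profile = dict(scan(receptor_q, receptor_xyz, ligand_q, ligand_xyz))
best_angle = min(profile, key=profile.get)
```

In this toy system the two equivalent minima are a consequence of the chosen symmetry, not a reproduction of the published result; a real calculation would use the full atomic charges and add the solvation contribution.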
Abstract:
Transcription Activator-Like Effector Nucleases (TALENs) are potential tools for precise genome engineering of laboratory animals. We report the first targeted genomic integration in the rat using TALENs by homology-derived recombination (HDR). We assembled TALENs and designed a linear donor insert targeting a pA476T mutation in the rat Glucocorticoid Receptor (Nr3c1), namely GR(dim), that prevents receptor homodimerization in the mouse. TALEN mRNA and the linear double-stranded donor were microinjected into rat one-cell embryos. Overall, we observed targeted genomic modifications in 17% of the offspring, indicating high TALEN cutting efficiency in rat zygotes.
Abstract:
The Na,K-ATPase is a major ion-motive ATPase of the P-type family responsible for many aspects of cellular homeostasis. To determine the structure of the pathway for cations across the transmembrane portion of the Na,K-ATPase, we mutated 24 residues of the fourth transmembrane segment into cysteine and studied their function and accessibility by exposure to the sulfhydryl reagent 2-aminoethyl-methanethiosulfonate. Accessibility was also examined after treatment with palytoxin, which transforms the Na,K-pump into a cation channel. Of the 24 tested cysteine mutants, seven had no or a much reduced transport function. In particular cysteine mutants of the highly conserved "PEG" motif had a strongly reduced activity. However, most of the non-functional mutants could still be transformed by palytoxin as well as all of the functional mutants. Accessibility, determined as a 2-aminoethyl-methanethiosulfonate-induced reduction of the transport activity or as inhibition of the membrane conductance after palytoxin treatment, was observed for the following positions: Phe(323), Ile(322), Gly(326), Ala(330), Pro(333), Glu(334), and Gly(335). In accordance with a structural model of the Na,K-ATPase obtained by homology modeling with the two published structures of sarcoplasmic and endoplasmic reticulum calcium ATPase (Protein Data Bank codes 1EUL and 1IWO), the results suggest the presence of a cation pathway along the side of the fourth transmembrane segment that faces the space between transmembrane segments 5 and 6. The phenylalanine residue in position 323 has a critical position at the outer mouth of the cation pathway. The residues thought to form the cation binding site II ((333)PEGL) are also part of the accessible wall of the cation pathway opened by palytoxin through the Na,K-pump.
Abstract:
Na,K-ATPase, the main active transport system for monovalent cations in animal cells, is responsible for maintaining Na(+) and K(+) gradients across the plasma membrane. During its transport cycle it binds three cytoplasmic Na(+) ions and releases them on the extracellular side of the membrane, and then binds two extracellular K(+) ions and releases them into the cytoplasm. The fourth, fifth, and sixth transmembrane helices of the alpha subunit of Na,K-ATPase are known to be involved in Na(+) and K(+) binding sites, but the gating mechanisms that control the access of these ions to their binding sites are not yet fully understood. We have focused on the second extracellular loop linking transmembrane segments 3 and 4 and attempted to determine its role in gating. We replaced 13 residues of this loop in the rat alpha1 subunit, from E314 to G326, by cysteine, and then studied the function of these mutants using electrophysiological techniques. We analyzed the results using a structural model obtained by homology with SERCA, and ab initio calculations for the second extracellular loop. Four mutants were markedly modified by the sulfhydryl reagent MTSET, and we investigated them in detail. The substituted cysteines were more readily accessible to MTSET in the E1 conformation for the Y315C, W317C, and I322C mutants. Mutations or derivatization of the substituted cysteines in the second extracellular loop resulted in major increases in the apparent affinity for extracellular K(+), and this was associated with a reduction in the maximum activity. The changes produced by the E314C mutation were reversed by MTSET treatment. In the W317C and I322C mutants, MTSET also induced a moderate shift of the E1/E2 equilibrium towards the E1(Na) conformation under Na/Na exchange conditions. These findings indicate that the second extracellular loop must be functionally linked to the gating mechanism that controls the access of K(+) to its binding site.
Abstract:
Bioactive small molecules, such as drugs or metabolites, bind to proteins or other macro-molecular targets to modulate their activity, which in turn results in the observed phenotypic effects. For this reason, mapping the targets of bioactive small molecules is a key step toward unraveling the molecular mechanisms underlying their bioactivity and predicting potential side effects or cross-reactivity. Recently, large datasets of protein-small molecule interactions have become available, providing a unique source of information for the development of knowledge-based approaches to computationally identify new targets for uncharacterized molecules or secondary targets for known molecules. Here, we introduce SwissTargetPrediction, a web server to accurately predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures with known ligands. Predictions can be carried out in five different organisms, and mapping predictions by homology within and between different species is enabled for close paralogs and orthologs. SwissTargetPrediction is accessible free of charge and without login requirement at http://www.swisstargetprediction.ch.
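Similarity-based target prediction of this kind can be sketched in a few lines. The example below is an illustration only and does not reproduce SwissTargetPrediction's scoring: it uses a single 2D-style measure (the Tanimoto coefficient on sets of feature identifiers), ignores 3D similarity, and the function names, target names, and fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of feature ids."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_targets(query_fp, ligands_by_target):
    """Score each target by the most similar known ligand, then sort descending."""
    scores = {
        target: max((tanimoto(query_fp, fp) for fp in fps), default=0.0)
        for target, fps in ligands_by_target.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical known ligands, each reduced to a small set of feature ids.
known_ligands = {
    "target_A": [{1, 2, 3, 4}, {2, 3, 9}],
    "target_B": [{7, 8, 9}],
}
query = {1, 2, 3, 5}

ranking = rank_targets(query, known_ligands)
```

Taking the maximum over known ligands is one simple design choice; a production method would typically combine several similarity measures and calibrate the scores against known interactions.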
Abstract:
High-molecular-weight (HMW) penicillin-binding proteins (PBPs) are divided into class A and class B PBPs, which are bifunctional transpeptidases/transglycosylases and monofunctional transpeptidases, respectively. We determined the sequences of the HMW PBP genes of Streptococcus gordonii, a gingivo-dental commensal related to Streptococcus pneumoniae. Five HMW PBPs were identified, including three class A (PBPs 1A, 1B, and 2A) and two class B (PBPs 2B and 2X) PBPs, by homology with those of S. pneumoniae and by radiolabeling with [3H]penicillin. Single and double deletions of each of them were achieved by allelic replacement. All could be deleted, except for PBP 2X, which was essential. Morphological alterations occurred after deletion of PBP 1A (lozenge shape), PBP 2A (separation defect and chaining), and PBP 2B (aberrant septation and premature lysis) but not PBP 1B. The muropeptide cross-link patterns remained similar in all strains, indicating that the cross-linking activity of one missing PBP could be replaced by others. However, PBP 1A mutants presented shorter glycan chains (by 30%) and a relative decrease (25%) in one monomer stem peptide. Growth rate and viability under aeration, hyperosmolarity, and penicillin exposure were affected primarily in PBP 2B-deleted mutants. In contrast, chain-forming PBP 2A-deleted mutants withstood aeration better, probably because they formed clusters that impaired oxygen diffusion. Double deletions could be generated with any PBP combination and resulted in more-altered mutants. Thus, single deletion of four of the five HMW genes had a detectable effect on the bacterial morphology and/or physiology, and only PBP 1B seemed redundant a priori.
Abstract:
β-Oxidation is the universal pathway that allows living organisms to degrade fatty acids, leading to lipid homeostasis and to the recovery of carbon and energy from the fatty acid molecules. This pathway is centred on four core enzymatic activities sufficient to degrade saturated fatty acids. Additional auxiliary enzymes of β-oxidation are necessary for the complete degradation of a larger array of molecules encompassing the unsaturated fatty acids. The main pathways of the β-oxidation of fatty acids have been investigated extensively, and the auxiliary enzymes are well known in mammals and yeast. Comparison of the established β-oxidation systems suggests that the activities required for the full degradation of unsaturated fatty acids are present regardless of the organism and rely on common active site templates. The precise identity of the plant enzymes was unknown. By homology searches in the genome of the model plant Arabidopsis thaliana, I identified genes encoding proteins that could be orthologous to the yeast or animal auxiliary enzymes Δ3,Δ2-enoyl-CoA isomerase, Δ3,5,Δ2,4-dienoyl-CoA isomerase, and type 2 enoyl-CoA hydratase. I established that these genes are expressed in Arabidopsis and that their expression can be correlated with the expression of core β-oxidation genes. Through the observation of chimeric fluorescent protein fusions, I demonstrated that the identified proteins are localized in the peroxisome, the only organelle where β-oxidation occurs in plants. Enzymatic assays were performed with the partially purified enzymes to demonstrate that the identified enzymes can catalyze the same in vitro reactions as their non-plant orthologs. The in vivo activities of the plant enzymes were demonstrated by heterologous complementation of the corresponding yeast Saccharomyces cerevisiae deletion mutants. The complementation was visualized using artificial polyhydroxyalkanoate (PHA) production in yeast peroxisomes. The recombinant strains, expressing a Pseudomonas aeruginosa PHA synthase modified for peroxisomal localization, produce this polymer, which serves as a trap for the 3-hydroxyacyl-CoA intermediates of β-oxidation and reflects qualitatively and quantitatively the array of molecules that are processed through β-oxidation. This complementation demonstrated the involvement of the plant Δ3,Δ2-enoyl-CoA isomerases and Δ3,5,Δ2,4-dienoyl-CoA isomerase in the degradation of fatty acids unsaturated at odd-numbered positions. The presence of a monofunctional type 2 enoyl-CoA hydratase is novel in eukaryotes. Downregulation of the corresponding gene in an Arabidopsis line modified to produce PHA in the peroxisome demonstrated that this enzyme participates in vivo in the conversion of the intermediate 3R-hydroxyacyl-CoA, generated by the metabolism of fatty acids with a cis (Z)-unsaturated bond on an even-numbered carbon, to the 2E-enoyl-CoA for further degradation through the core β-oxidation cycle.
Abstract:
Combinatorial optimization involves finding an optimal solution in a finite set of options; many everyday life problems are of this kind. However, the number of options grows exponentially with the size of the problem, such that an exhaustive search for the best solution is practically infeasible beyond a certain problem size. When efficient algorithms are not available, a practical approach to obtain an approximate solution to the problem at hand, is to start with an educated guess and gradually refine it until we have a good-enough solution. Roughly speaking, this is how local search heuristics work. These stochastic algorithms navigate the problem search space by iteratively turning the current solution into new candidate solutions, guiding the search towards better solutions. The search performance, therefore, depends on structural aspects of the search space, which in turn depend on the move operator being used to modify solutions. A common way to characterize the search space of a problem is through the study of its fitness landscape, a mathematical object comprising the space of all possible solutions, their value with respect to the optimization objective, and a relationship of neighborhood defined by the move operator. The landscape metaphor is used to explain the search dynamics as a sort of potential function. The concept is indeed similar to that of potential energy surfaces in physical chemistry. Borrowing ideas from that field, we propose to extend to combinatorial landscapes the notion of the inherent network formed by energy minima in energy landscapes. In our case, energy minima are the local optima of the combinatorial problem, and we explore several definitions for the network edges. At first, we perform an exhaustive sampling of local optima basins of attraction, and define weighted transitions between basins by accounting for all the possible ways of crossing the basins frontier via one random move. 
Then, we reduce the computational burden by only counting the chances of escaping a given basin via random kick moves that start at the local optimum. Finally, we approximate network edges from the search trajectory of simple search heuristics, mining the frequency and inter-arrival time with which the heuristic visits local optima. Through these methodologies, we build a weighted directed graph that provides a synthetic view of the whole landscape, and that we can characterize using the tools of complex networks science. We argue that the network characterization can advance our understanding of the structural and dynamical properties of hard combinatorial landscapes. We apply our approach to prototypical problems such as the Quadratic Assignment Problem, the NK model of rugged landscapes, and the Permutation Flow-shop Scheduling Problem. We show that some network metrics can differentiate problem classes, correlate with problem non-linearity, and predict problem hardness as measured from the performances of trajectory-based local search heuristics.
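The second construction described above (escape edges estimated from kick moves that start at each local optimum) can be sketched on a tiny landscape. This is a minimal illustration under stated assumptions, not the authors' implementation: the landscape is a fully random fitness table over 8-bit strings, the move operator is a single bit flip, the kick flips exactly two bits and is enumerated exhaustively rather than sampled, and all function names are hypothetical.

```python
import itertools
import random
from collections import Counter


def neighbors(solution):
    """All solutions reachable by flipping exactly one bit."""
    return [solution[:i] + (1 - solution[i],) + solution[i + 1:]
            for i in range(len(solution))]


def hill_climb(solution, fitness):
    """Best-improvement local search; returns the local optimum that is reached."""
    while True:
        best = max(neighbors(solution), key=fitness.__getitem__)
        if fitness[best] <= fitness[solution]:
            return solution
        solution = best


def local_optima_network(n_bits, fitness, kick=2):
    """Map every solution to its basin, then build escape edges between optima."""
    basin_of = {s: hill_climb(s, fitness)
                for s in itertools.product((0, 1), repeat=n_bits)}
    optima = set(basin_of.values())
    edges = {}
    for opt in optima:
        counts = Counter()
        # Enumerate every kick of `kick` simultaneous bit flips from the optimum.
        for positions in itertools.combinations(range(n_bits), kick):
            kicked = tuple(1 - bit if i in positions else bit
                           for i, bit in enumerate(opt))
            counts[basin_of[kicked]] += 1
        total = sum(counts.values())
        # Edge weight: fraction of kicks that land in each destination basin
        # (self-loops are kept when a kick falls back into the same basin).
        edges[opt] = {dst: c / total for dst, c in counts.items()}
    return optima, basin_of, edges


# Hypothetical toy landscape: an independent random fitness per solution.
rng = random.Random(0)
N_BITS = 8
fitness = {s: rng.random() for s in itertools.product((0, 1), repeat=N_BITS)}
optima, basin_of, edges = local_optima_network(N_BITS, fitness)
```

The resulting `edges` dictionary is a weighted directed graph over local optima, which is the kind of object that network metrics can then be computed on. Exhaustive enumeration is only feasible for very small instances; larger problems require sampling.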
Abstract:
Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study first involved homology modeling of the hCES2 protein based on the model of hCES1, since the two proteins share a high degree of homology (approximately 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock4.0 engine that can perform efficient and easy virtual screening analyses of large molecular databases by exploiting multi-core architectures. Useful statistical models (e.g., r² = 0.91 for substrates in their unprotonated state) were calculated by correlating experimental pKm values with the distance between the carbon atom of the substrate's ester group and the hydroxy function of Ser228. Additional parameters in the equations accounted for hydrophobic and electrostatic interactions between substrates and contributing residues. The negatively charged residues in the hCES2 cavity explained the preference of the enzyme for neutral substrates and, more generally, suggested that ligands which interact too strongly through ionic bonds (e.g., ACE inhibitors) cannot be good CES2 substrates because they are trapped in the cavity in unproductive modes and behave as inhibitors. Finally, the effects of protonation on substrate recognition and the contrasting behavior of substrates and products were investigated by MD simulations of some CES2 complexes.
Abstract:
MOTIVATION: The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relations between ontologies describing different species. RESULTS: We present a new algorithm, and its implementation in the software Homolonto, to create new relationships between anatomical ontologies, based on the homology concept. Homolonto uses a supervised ontology alignment approach. Several alignments can be merged, forming homology groups. We also present an algorithm to generate relationships between these homology groups. This has been used to build a multi-species ontology, for the database of gene expression evolution Bgee. AVAILABILITY: download section of the Bgee website http://bgee.unil.ch/
Abstract:
The resistance of mosquitoes to chemical insecticides is threatening vector control programmes worldwide. Cytochrome P450 monooxygenases (CYPs) are known to play a major role in insecticide resistance, allowing resistant insects to metabolize insecticides at a higher rate. Among them, members of the mosquito CYP6Z subfamily, like Aedes aegypti CYP6Z8 and its Anopheles gambiae orthologue CYP6Z2, have been frequently associated with pyrethroid resistance. However, their role in the pyrethroid degradation pathway remains unclear. In the present study, we created a genetically modified yeast strain overexpressing Ae. aegypti cytochrome P450 reductase and CYP6Z8, thereby producing the first mosquito P450-CPR (NADPH-cytochrome P450-reductase) complex in a yeast recombinant system. The results of the present study show that: (i) CYP6Z8 metabolizes PBAlc (3-phenoxybenzoic alcohol) and PBAld (3-phenoxybenzaldehyde), common pyrethroid metabolites produced by carboxylesterases, producing PBA (3-phenoxybenzoic acid); (ii) CYP6Z8 transcription is induced by PBAlc, PBAld and PBA; (iii) An. gambiae CYP6Z2 metabolizes PBAlc and PBAld in the same way; (iv) PBA is the major metabolite produced in vivo and is excreted without further modification; and (v) in silico modelling of substrate-enzyme interactions supports a similar role of other mosquito CYP6Zs in pyrethroid degradation. By playing a pivotal role in the degradation of pyrethroid insecticides, mosquito CYP6Zs thus represent good targets for mosquito-resistance management strategies.
Abstract:
BACKGROUND: Qualitative frameworks, especially those based on the logical discrete formalism, are increasingly used to model regulatory and signalling networks. A major advantage of these frameworks is that they do not require precise quantitative data, and that they are well-suited for studies of large networks. While numerous groups have developed specific computational tools that provide original methods to analyse qualitative models, a standard format to exchange qualitative models has been missing. RESULTS: We present the Systems Biology Markup Language (SBML) Qualitative Models Package ("qual"), an extension of the SBML Level 3 standard designed for computer representation of qualitative models of biological networks. We demonstrate the interoperability of models via SBML qual through the analysis of a specific signalling network by three independent software tools. Furthermore, the collective effort to define the SBML qual format paved the way for the development of LogicalModel, an open-source model library, which will facilitate the adoption of the format as well as the collaborative development of algorithms to analyse qualitative models. CONCLUSIONS: SBML qual allows the exchange of qualitative models among a number of complementary software tools. SBML qual has the potential to promote collaborative work on the development of novel computational approaches, as well as on the specification and the analysis of comprehensive qualitative models of regulatory and signalling networks.
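To make the notion of a qualitative (logical) model concrete, the sketch below encodes a small Boolean network and updates it synchronously. It does not read or write SBML qual and uses no SBML library; the four-node network, its rules, and the function names are hypothetical and chosen only to show the kind of model the format is meant to exchange.

```python
from itertools import product

# Hypothetical logical rules: each node's next value depends on the current state.
RULES = {
    "ligand": lambda s: s["ligand"],        # input node, keeps its value
    "receptor": lambda s: s["ligand"],      # activated by the ligand
    "inhibitor": lambda s: s["target"],     # induced by the target (negative feedback)
    "target": lambda s: s["receptor"] and not s["inhibitor"],
}


def step(state):
    """Synchronous update: every node is recomputed from the same current state."""
    return {name: bool(rule(state)) for name, rule in RULES.items()}


def fixed_points():
    """Enumerate all states and keep those that the update leaves unchanged."""
    names = list(RULES)
    found = []
    for values in product((False, True), repeat=len(names)):
        state = dict(zip(names, values))
        if step(state) == state:
            found.append(state)
    return found


def trajectory(state, n_steps):
    """Successive states visited from an initial state."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


stable = fixed_points()
start = {"ligand": True, "receptor": False, "inhibitor": False, "target": False}
path = trajectory(start, 5)
```

With the ligand absent the only stable state has every node off; with the ligand present the negative feedback produces a cycle instead of a stable state. Other tools may use asynchronous updating, which can give different dynamics for the same rules.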
Abstract:
Depth-averaged velocities and unit discharges within a 30 km reach of one of the world's largest rivers, the Rio Parana, Argentina, were simulated using three hydrodynamic models with different process representations: a reduced complexity (RC) model that neglects most of the physics governing fluid flow, a two-dimensional model based on the shallow water equations, and a three-dimensional model based on the Reynolds-averaged Navier-Stokes equations. Flow characteristics simulated using all three models were compared with data obtained by acoustic Doppler current profiler surveys at four cross sections within the study reach. This analysis demonstrates that, surprisingly, the performance of the RC model is generally equal to, and in some instances better than, that of the physics-based models in terms of the statistical agreement between simulated and measured flow properties. In addition, in contrast to previous applications of RC models, the present study demonstrates that the RC model can successfully predict measured flow velocities. The strong performance of the RC model reflects, in part, the simplicity of the depth-averaged mean flow patterns within the study reach and the dominant role of channel-scale topographic features in controlling the flow dynamics. Moreover, the very low water surface slopes that typify large sand-bed rivers enable flow depths to be estimated reliably in the RC model using a simple fixed-lid planar water surface approximation. This approach overcomes a major problem encountered in the application of RC models in environments characterised by shallow flows and steep bed gradients. The RC model is four orders of magnitude faster than the physics-based models when performing steady-state hydrodynamic calculations. However, the iterative nature of the RC model calculations implies a reduction in computational efficiency relative to some other RC models. 
A further implication of this is that, if used to simulate channel morphodynamics, the present RC model may offer only a marginal advantage in terms of computational efficiency over approaches based on the shallow water equations. These observations illustrate the trade-off between model realism and efficiency that is a key consideration in RC modelling. Moreover, this outcome highlights a need to rethink the use of RC morphodynamic models in fluvial geomorphology and to move away from existing grid-based approaches, such as the popular cellular automata (CA) models, that remain essentially reductionist in nature. In the case of the world's largest sand-bed rivers, this might be achieved by implementing the RC model outlined here as one element within a hierarchical modelling framework that would enable computationally efficient simulation of the morphodynamics of large rivers over millennial time scales. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
The M-Coffee server is a web server that makes it possible to compute multiple sequence alignments (MSAs) by running several MSA methods and combining their output into one single model. This allows users to run all their methods of choice simultaneously without having to arbitrarily choose one of them. The MSA is delivered along with a local estimation of its consistency with the individual MSAs it was derived from. The computation of the consensus multiple alignment is carried out using a special mode of the T-Coffee package [Notredame, Higgins and Heringa (T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000; 302: 205-217); Wallace, O'Sullivan, Higgins and Notredame (M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006; 34: 1692-1699)]. Given a set of sequences (DNA or proteins) in FASTA format, M-Coffee delivers a multiple alignment in the most common formats. M-Coffee is a free, open-source package distributed under a GPL license and is available either as a standalone package or as a web service from www.tcoffee.org.
Abstract:
One of the most important issues in molecular biology is to understand the regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins called transcription factors, which bind to short (5 to 20 base pairs), degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors are laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcription factor binding site analysis in addressing quantitative modelling, where probabilistic models are built to represent the binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database (HTPSELEX) was created, holding high-quality binding sequences for two eukaryotic families of transcription factors, namely CTF/NF1 and LEF1/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, which allows the generation of a large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiment we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation (EM) algorithm is one of the most frequently used methods to estimate probabilistic models that represent the sequence specificity of transcription factors. 
We present computer simulations to estimate the precision of EM-estimated models as a function of data set parameters (such as the length of the initial sequences, the number of initial sequences, and the percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to the length of the training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, in which a statistical framework based on hidden Markov models was developed to represent the sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities were in good correlation.
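A probabilistic model of binding specificity can be illustrated with a position weight matrix (PWM), which is considerably simpler than the hidden Markov model framework described in the thesis and assumes independent positions. The sketch below builds log-odds scores from a few already aligned sites and scans a sequence for the best-scoring window. The sites, the test sequence, the pseudocount, the uniform background, and the function names are all assumptions for illustration, not data from HTPSELEX.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length aligned sites, one dict per position."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for a window of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start index) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned binding sites and a sequence to scan.
toy_sites = ["TTGGCA", "TTGGCT", "TTGGCA", "CTGGCA"]
toy_pwm = build_pwm(toy_sites)
top_score, top_start = best_hit(toy_pwm, "ACGTTTGGCAACG")
```

Because each position contributes independently to the score, a PWM cannot capture the non-independent base preferences reported above; that limitation is one motivation for richer models such as hidden Markov models.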