979 results for Bioinformatics Analysis


Relevance: 30.00%

Abstract:

Genome sequence varies in numerous ways among individuals, although the gross architecture is fixed for all humans. Retrotransposons create one of the most abundant classes of structural variants in the human genome and are divided into many families, with certain members of some families, e.g., L1, Alu, SVA, and HERV-K, remaining active for transposition. Along with other types of genomic variants, retrotransposon-derived variants contribute to the whole spectrum of genome variants in humans. With the advancement of sequencing techniques, many human genomes are being sequenced at the individual level, fueling comparative research on these variants among individuals. In this thesis, the evolution and functional impact of structural variations are examined, focusing primarily on retrotransposons in the context of human evolution. The thesis comprises three studies, presented in three data chapters. First, the recent evolution of all human-specific AluYb members, representing the second most active subfamily of Alus, was tracked to identify their source/master copy using a novel approach. All human-specific AluYb elements from the reference genome were extracted and aligned with one another to construct clusters of similar copies, and each cluster was analyzed to infer the evolutionary relationships among its members. The approach identified one major driver copy of all human-specific Yb8 elements and the source copy of the Yb9 lineage. Three new subfamilies within the AluYb family (Yb8a1, Yb10 and Yb11) were also identified, with Yb11 being the youngest and most polymorphic. Second, an attempt to establish a relationship between transposable elements (TEs) and tandem repeats (TRs) was made at a genome-wide scale for the first time. Upon sequence comparison, positional cross-checking and other relevant analyses, it was observed that over 20% of all TRs are derived from TEs. 
This result established the first connection between these two types of repetitive elements and extends our appreciation of the impact of TEs on genomes. Furthermore, only 6% of these TE-derived TRs follow the already postulated initiation and expansion mechanisms, suggesting that the others are likely to follow a yet-unidentified mechanism. Third, by combining multiple computational approaches involving all types of genetic variation published so far, including transposable elements, the first whole-genome sequence of the most recent common ancestor of all modern human populations, which diverged into different populations around 125,000-100,000 years ago, was constructed. The study shows that the current reference genome sequence is 8.89 million base pairs larger than our common ancestor’s genome, a difference contributed by a whole spectrum of genetic mechanisms. The use of this ancestral reference genome to facilitate the analysis of personal genomes was demonstrated using an example genome, along with more insightful recent evolutionary analyses involving the Neanderthal genome. The three data chapters presented in this thesis conclude that tandem repeats and transposable elements are not two entirely distinct classes of elements, as over 20% of TRs are actually derived from TEs. Certain subfamilies of TEs are themselves still evolving, generating newer subfamilies. The evolutionary analyses of all TEs along with other genomic variants helped to construct the genome sequence of the most recent common ancestor of all modern human populations, which provides a better alternative to the human reference genome and can be a useful resource for the study of personal genomics, population genetics, and human and primate evolution.

Relevance: 30.00%

Abstract:

The important role played by the mitochondrion in the eukaryotic cell has long been recognized. However, the exact composition of mitochondria, as well as the biological processes that take place within them, remain largely unknown. Two main factors explain why the study of mitochondria progresses so slowly: the inefficiency of methods for identifying mitochondrial proteins and the lack of precision in the annotation of these proteins. Consequently, we developed a new computational tool, YimLoc, which successfully predicts mitochondrial proteins from genomic sequences. This tool integrates several existing predictors, and its performance exceeds that of the predictors considered individually. We analyzed about 60 fungal genomes with YimLoc in order to resolve the controversy concerning the localization of beta-oxidation in these organisms. Contrary to what was generally accepted, our results show that most groups of Fungi possess mitochondrial beta-oxidation. This work also highlights the diversity of beta-oxidation processes in fungi, in correlation with their use of fatty acids as a source of energy and carbon. In addition, we studied the key component of the mitochondrial beta-oxidation pathway, acyl-CoA dehydrogenase (ACAD), in 250 species covering the three domains of life, by combining the prediction of subcellular localization with classification into subfamilies and phylogenetic inference. Our study suggests that ACAD genes belong to an ancient family that adopted innovative evolutionary strategies to generate a broad set of enzymes capable of using most fatty acids and amino acids. 
Finally, to enable the prediction of mitochondrial proteins from data other than genomic sequences, we developed the software TESTLoc, which uses Expressed Sequence Tags (ESTs) as input. The performance of TESTLoc is significantly better than that of any other known prediction tool. In addition to providing two new tools for predicting subcellular localization from different types of data, our work demonstrates how combining subcellular localization prediction with other in silico analysis methods improves knowledge of mitochondrial proteins. Moreover, this work proposes clear hypotheses that are easy to verify experimentally, which offers great potential for advancing our knowledge of mitochondrial metabolism.

Relevance: 30.00%

Abstract:

Viewed naively, evolution is a succession of duplication events and gradual mutations in the genome that lead to changes in the functions and interactions of the proteome. The family of Ras-like guanosine triphosphate hydrolases (GTPases) is a good working model for understanding this fundamental phenomenon, because this protein family contains a limited number of members that differ in functionality and interactions. Overall, we wish to understand how single mutations in GTPases affect cell morphology, as well as the extent of their impact on asynchronous populations. My master's work aims to classify, in a meaningful way, different phenotypes of the yeast Saccharomyces cerevisiae through the analysis of several morphological criteria of strains expressing mutated and native GTPases. Our approach, based on microscopy and bioinformatic analysis of DIC (differential interference contrast microscopy) images, distinguishes the phenotypes specific to native cells and to mutants. This method enabled automated detection and characterization of the mutant phenotypes associated with overexpression of constitutively active GTPases. The constitutively active GTPase mutants Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L and Rsr1 G12V were successfully analyzed. Specifically, implementing different partitioning algorithms makes it possible to analyze data that combine morphological measurements of native and mutant populations. Our results show that the Fuzzy C-Means algorithm partitions native and mutant cells effectively, with the different cell types classified according to several cell shape factors obtained from the DIC images. 
This analysis shows that the Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L and Rsr1 G12V mutations induce amorphous, elongated, round and large phenotypes, respectively, which are represented by distinct shape-factor vectors. These distinctions are observed in different proportions (mutant morphology / native morphology) in the mutant populations. The development of new automated methods for morphological analysis of native and mutant cells proves extremely useful for studying the GTPase family, as well as the specific residues that dictate their functions and interaction network. We can now consider producing GTPase mutants that switch their function by targeting divergent residues. The functional substitution is then detected at the morphological level using our new quantitative strategy. This type of analysis can also be transposed to other protein families and contribute significantly to the field of evolutionary biology.
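To make the partitioning step in the abstract above more concrete, here is a minimal sketch of the generic Fuzzy C-Means iteration in plain Python. It is not the thesis's implementation: the function names, the fuzzifier m = 2, the fixed iteration count, the initial centers and the two-dimensional "shape factor" values are all assumptions chosen for illustration.

```python
import math


def memberships_for(point, centers, m=2.0):
    """Fuzzy membership of one point in every cluster (each row sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def update_centers(points, memberships, n_clusters, m=2.0):
    """New centers: means of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for k in range(n_clusters):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = [list(c) for c in initial_centers]
    for _ in range(n_iter):
        memberships = [memberships_for(p, centers, m) for p in points]
        centers = update_centers(points, memberships, len(centers), m)
    memberships = [memberships_for(p, centers, m) for p in points]
    return centers, memberships


# Hypothetical shape-factor vectors (e.g. elongation, roundness) for six cells.
cells = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(cells, initial_centers=[(0.0, 0.0), (6.0, 6.0)])
```

Unlike a hard partition, each cell receives a graded membership in every cluster, which is one way a mixed population (mutant morphology / native morphology) could be described in proportions.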

Relevance: 30.00%

Abstract:

In recent years, food fraud has increased significantly, ranging from false label claims to the use of additives and fillers to increase profitability. In 2013, horse and pig DNA were detected in beef products sold by several retailers. Mass spectrometry has become the workhorse of protein research, and the detection of marker proteins could serve for both animal species and tissue authentication. Meat species authentication was performed using a well-defined proteogenomic annotation, carefully chosen surrogate tryptic peptides and analysis on a hybrid quadrupole-Orbitrap mass spectrometer. Selected mammalian meat samples were homogenized, and proteins were extracted and digested with trypsin. The samples were analyzed using a high-resolution mass spectrometer. Chromatography was achieved using a 30-minute linear gradient on a BioBasic C8 100 × 1 mm column at a flow rate of 75 µL/min. The mass spectrometer was operated in full-scan, high-resolution, accurate-mass mode. MS/MS spectra were collected for selected proteotypic peptides. Muscle proteins were methodically analyzed in silico in order to generate tryptic peptide mass lists and theoretical MS/MS spectra. Following a comprehensive bottom-up proteomic analysis, we were able to detect and identify a proteotypic myoglobin tryptic peptide [120-134] for each species, with observed m/z values within 1.3 ppm of the theoretical values. Moreover, proteotypic peptides from myosin-1, myosin-2 and -hemoglobin were also identified. This targeted method allowed comprehensive meat speciation down to 1% (w/w) of undesired product.
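The mass-accuracy criterion quoted in the abstract above can be illustrated with a short sketch of the standard parts-per-million error calculation. The function names and the m/z values are hypothetical; only the 1.3 ppm figure comes from the abstract.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 1.3) -> bool:
    """True when the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical values: a theoretical m/z of 800.0000 observed at 800.0008
# is off by about 1 ppm, which is inside a 1.3 ppm window.
example_error = ppm_error(800.0008, 800.0000)
example_ok = within_tolerance(800.0008, 800.0000)
```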

Relevance: 30.00%

Abstract:

Histones are highly conserved nuclear proteins in eukaryotic cells. They organize and compact DNA in the form of nucleosomes, which are the basic subunits of chromatin. Histones can be altered by numerous post-translational modifications (PTMs) such as acetylation, methylation and phosphorylation. These modifications play an essential role in DNA replication, transcription and chromatin assembly. The abundance of these modifications can vary significantly during the development of diseases, including several types of cancer. For example, the total loss of trimethylation on H4K20 as well as acetylation on H4K16 are tumor markers specific to certain types of cancer in humans. Consequently, studying these modifications and the events that determine the dynamics of their changes in abundance is important for better understanding cellular and molecular functions during disease development. In general, histone modifications are studied by biochemical approaches such as Western blotting or chromatin immunoprecipitation (ChIP). However, these approaches have several drawbacks, such as the lack of specificity or availability of antibodies, their cost, and the difficulty of producing and validating them. Over recent decades, mass spectrometry (MS) has proven to be a powerful method for the characterization and quantification of histone modifications. MS offers many advantages over traditional techniques. Among others, it allows reproducible and specific analyses and facilitates the study of a broad spectrum of PTMs in a single analysis. 
In this thesis, we present the development and application of new analytical tools for the identification and quantification of histone PTMs. First, a method was developed to measure site-specific changes in histone acetylation. This method combines the analysis of intact histones with peptide sequencing methods to determine acetylation changes following the in vitro reaction catalyzed by the yeast histone acetyltransferase (HAT) Rtt109 in the presence of its chaperones (Asf1 or Vps75). Second, we developed a method for analyzing isomeric histone peptides. This method combines high-resolution LC-MS/MS with a new software tool called Iso-PeptidAce, which deconvolutes mixed spectra of isomeric peptides. We evaluated Iso-PeptidAce with a mixture of synthetic isomeric peptides. We also validated the performance of this approach with histones isolated from human erythroleukemic cells (K562) treated with clinically used histone deacetylase inhibitors (HDACi), and with histones from Saccharomyces cerevisiae bound to the chromatin assembly factor (CAF-1) and purified by affinity chromatography. Finally, using the method presented above, we carried out an in-depth analysis of the specificity of several HATs and HDACs in Schizosaccharomyces pombe. We determined the acetylation levels of histones purified from control cells or from mutant strains lacking a HAT or HDAC. Our analysis allowed us to validate several known targets of HATs and HDACs and to identify new ones. Our data also made it possible to define the role of the different HATs and HDACs in maintaining the balance of histone acetylation. 
Overall, we anticipate that the methods described in this thesis will help resolve some of the challenges encountered in the study of chromatin. In addition, these data provide new knowledge for the design of genetic and biochemical studies using S. pombe.

Relevance: 30.00%

Abstract:

Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. Data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix. Rows in the matrix represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes, such as the similarity of their behavior, the nature of their interactions, their respective contributions to the same pathways and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research. In the medical domain they aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. Data mining techniques are essential for identifying patterns in gene expression data. Clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering was introduced. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix. 
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns of the submatrix need not be contiguous in the gene expression data matrix. Biclusters are not disjoint. Computing biclusters is costly because all combinations of rows and columns must be considered in order to find all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions, respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms use a measure called the mean squared residue to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold. 
All these algorithms begin the search from tightly co-regulated submatrices called seeds. These seeds are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. The constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are applied to the Yeast and Lymphoma datasets. All these algorithms identify biologically relevant and statistically significant biclusters, which are validated against the Gene Ontology database. All these algorithms are compared with some other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. Some of the algorithms developed in this work identify, in both the Yeast and Lymphoma datasets, biclusters with very high row variance, higher than that obtained by any other algorithm using the mean squared residue. Such biclusters, which show significant changes in expression level, are highly relevant biologically.
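The mean squared residue (MSR) that the abstract above uses as its search criterion can be sketched as follows, assuming the commonly used definition in which each entry's residue is the entry minus its row mean, minus its column mean, plus the overall mean of the submatrix. The function name and the toy matrices are illustrative and are not taken from the thesis.

```python
def mean_squared_residue(matrix, rows, cols):
    """MSR of the bicluster formed by the given row and column indices.

    Assumed definition: residue(i, j) = a_ij - rowmean_i - colmean_j + mean,
    and the MSR is the average of the squared residues over the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix: rows are genes, columns are conditions.
# The first three rows differ only by a constant shift, so restricted to
# those rows the bicluster is perfectly coherent and its MSR is ~0.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],
]
coherent_msr = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
checkerboard_msr = mean_squared_residue([[1.0, 0.0], [0.0, 1.0]], [0, 1], [0, 1])
```

Note that the row and column index lists need not be contiguous, matching the abstract's remark about submatrices. A low MSR alone can also be reached by flat, uninformative submatrices, which is presumably why the abstract additionally highlights high row variance.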

Relevance: 30.00%

Abstract:

Motivation: Intrinsic protein disorder is functionally implicated in numerous biological roles and is, therefore, ubiquitous in proteins from all three kingdoms of life. Determining the disordered regions in proteins presents a challenge for experimental methods and so recently there has been much focus on the development of improved predictive methods. In this article, a novel technique for disorder prediction, called DISOclust, is described, which is based on the analysis of multiple protein fold recognition models. The DISOclust method is rigorously benchmarked against the top five methods from the CASP7 experiment. In addition, the optimal consensus of the tested methods is determined and the added value from each method is quantified. Results: The DISOclust method is shown to add the most value to a simple consensus of methods, even in the absence of target sequence homology to known structures. A simple consensus of methods that includes DISOclust can significantly outperform all of the previous individual methods tested.

Relevance: 30.00%

Abstract:

ANeCA is a fully automated implementation of Nested Clade Phylogeographic Analysis. The method was originally developed by Templeton and colleagues, and has been used to infer, from the pattern of gene sequence polymorphisms in a geographically structured population, the historical demographic processes that have shaped its evolution. Until now it has been necessary to perform large parts of the procedure manually. We provide a program that takes data in Nexus sequential format and directly outputs a set of inferences. The software also includes TCS v1.18 and GeoDis v2.2 as part of the automation.

Relevance: 30.00%

Abstract:

Objectives: The aim of this study was to determine and compare the proteomes of three triclosan-resistant mutants of Salmonella enterica serovar Typhimurium in order to identify proteins involved in triclosan resistance. Methods: The proteomes of three distinct but isogenic triclosan-resistant mutants were determined using two-dimensional liquid chromatography mass separation. Bioinformatics was then used to identify and quantify tryptic peptides in order to determine protein expression. Results: Proteomic analysis of the triclosan-resistant mutants identified a common set of proteins involved in production of pyruvate or fatty acid with differential expression in all mutants, but also demonstrated specific patterns of expression associated with each phenotype. Conclusions: These data show that triclosan resistance can occur via distinct pathways in Salmonella, and demonstrate a novel triclosan resistance network that is likely to have relevance to other pathogenic bacteria subject to triclosan exposure and may provide new targets for development of antimicrobial agents.

Relevance: 30.00%

Abstract:

Background: Affymetrix GeneChip arrays are widely used for transcriptomic studies in a diverse range of species. Each gene is represented on a GeneChip array by a probe-set, consisting of up to 16 probe-pairs. Signal intensities across probe-pairs within a probe-set vary in part due to different physical hybridisation characteristics of individual probes with their target labelled transcripts. We have previously developed a technique to study the transcriptomes of heterologous species based on hybridising genomic DNA (gDNA) to a GeneChip array designed for a different species, and subsequently using only those probes with good homology. Results: Here we have investigated the effects of hybridising homologous species gDNA to study the transcriptomes of species for which the arrays have been designed. Genomic DNA from Arabidopsis thaliana and rice (Oryza sativa) were hybridised to the Affymetrix Arabidopsis ATH1 and Rice Genome GeneChip arrays respectively. Probe selection based on gDNA hybridisation intensity increased the number of genes identified as significantly differentially expressed in two published studies of Arabidopsis development, and optimised the analysis of technical replicates obtained from pooled samples of RNA from rice. Conclusion: This mixed physical and bioinformatics approach can be used to optimise estimates of gene expression when using GeneChip arrays.
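One way to picture probe selection based on gDNA hybridisation intensity is a simple threshold filter, sketched below. This is an assumption about the general shape of such a filter, not the authors' published procedure: the function name, the probe and probe-set identifiers, the intensity values and the threshold are all hypothetical.

```python
def select_probes(gdna_intensity, probe_to_set, threshold):
    """Keep probes whose gDNA hybridisation intensity reaches the threshold.

    Returns a dict mapping each probe-set to its retained probes; probe-sets
    with no probe at or above the threshold are dropped entirely.
    """
    retained = {}
    for probe, intensity in gdna_intensity.items():
        if intensity >= threshold:
            retained.setdefault(probe_to_set[probe], []).append(probe)
    return retained


# Hypothetical gDNA intensities for five probes belonging to two probe-sets.
gdna_intensity = {"p1": 820.0, "p2": 95.0, "p3": 410.0, "p4": 60.0, "p5": 30.0}
probe_to_set = {"p1": "setA", "p2": "setA", "p3": "setA", "p4": "setB", "p5": "setB"}

selected = select_probes(gdna_intensity, probe_to_set, threshold=100.0)
```

Expression estimates would then be computed from the retained probes only; raising the threshold trades probe-set coverage for more uniformly hybridising probes.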

Relevance: 30.00%

Abstract:

Alzheimer's disease (AD) is the most common type of dementia among the elderly, with devastating consequences for patients, their relatives, and caregivers. More than 300 genetic polymorphisms have been associated with AD, demonstrating that this condition is polygenic and has a complex pattern of inheritance. This paper reports and compares the results of AD genetics studies, in case-control and familial analyses, performed in Brazil since our first publication 10 years ago. They include the following genes/markers: apolipoprotein E (APOE), 5-hydroxytryptamine transporter length polymorphic region (5-HTTLPR), brain-derived neurotrophic factor (BDNF), monoamine oxidase A (MAO-A), and two simple-sequence tandem repeat polymorphisms (DXS1047 and D10S1423). Previously unpublished data on the interleukin-1 alpha (IL-1 alpha) and interleukin-1 beta (IL-1 beta) genes are reported here briefly. Results from other Brazilian studies with AD patients are also reported in this short review. Four local families studied with various markers on chromosomes 21, 19, 14, and 1 are briefly reported for the first time. The importance of studying DNA samples from Brazil is highlighted because of the uniqueness of its population, which presents both intense ethnic miscegenation, mainly on the east coast, and clusters with high inbreeding rates in rural areas of the countryside. We discuss the current stage of extending these studies using high-throughput methods of large-scale genotyping, such as single nucleotide polymorphism microarrays, together with bioinformatics tools that allow the analysis of such a large number of genetic variables with different levels of penetrance. There is still a long way between the huge amount of data gathered so far and its actual application toward a full understanding of AD, but the final goal is to develop precise tools for diagnosis and prognosis, creating new strategies for better treatments based on genetic profile.

Relevance: 30.00%

Abstract:

Superoxide dismutases (SODs) are a crucial class of enzymes in the combat against intracellular free radical damage. They eliminate superoxide radicals by converting them into hydrogen peroxide and oxygen. In spite of their very different life cycles and infection strategies, the human parasites Plasmodium falciparum, Trypanosoma cruzi and Trypanosoma brucei are known to be sensitive to oxidative stress. Thus the parasite Fe-SODs have become attractive targets for novel drug development. Here we report the crystal structures of FeSODs from the trypanosomes T. brucei at 2.0 angstrom and T. cruzi at 1.9 angstrom resolution, and that from P. falciparum at a higher resolution (2.0 angstrom) than that previously reported. The homodimeric enzymes are compared to the related human MnSOD, with particular attention to structural aspects that are relevant for drug design. Although the structures possess a very similar overall fold, differences between the enzymes could be identified at the entrance to the channel that leads to the active site. These lead to a slightly broader and more positively charged cavity in the parasite enzymes. Furthermore, a statistical coupling analysis (SCA) of the whole Fe/MnSOD family reveals different patterns of residue coupling for Mn and Fe SODs, as well as for the dimeric and tetrameric states. In both cases, the statistically coupled residues lie adjacent to the conserved core surrounding the metal center and may be expected to be responsible for its fine tuning, leading to metal ion specificity.

Relevance: 30.00%

Abstract:

Tuberculosis (TB) is one of the most common infectious diseases known to man and is responsible for millions of human deaths worldwide. The increasing incidence of TB in developing countries, the proliferation of multidrug-resistant strains, and the absence of resources for treatment have highlighted the need to develop new drugs against TB. The shikimate pathway leads to the biosynthesis of chorismate, a precursor of aromatic amino acids. This pathway is absent from mammals and has been shown to be essential for the survival of Mycobacterium tuberculosis, the causative agent of TB. Accordingly, enzymes of the aromatic amino acid biosynthesis pathway represent promising targets for structure-based drug design. The first reaction in phenylalanine biosynthesis involves the conversion of chorismate to prephenate, catalyzed by chorismate mutase. The second reaction is catalyzed by prephenate dehydratase (PDT) and involves decarboxylation and dehydration of prephenate to form phenylpyruvate, the precursor of phenylalanine. Here, we describe the use of different techniques to infer the structure of M. tuberculosis PDT (MtbPDT) in solution. Small-angle X-ray scattering and ultracentrifugation analysis showed that the oligomeric state of the protein is a tetramer and that MtbPDT is a flat, disk-shaped protein. Bioinformatics tools were used to infer the structure of MtbPDT. A molecular model for MtbPDT is presented, and molecular dynamics simulations indicate that MtbPDT is stable. Experimental and molecular modeling results were in agreement and provide evidence for a tetrameric state of MtbPDT in solution.

Relevance: 30.00%

Abstract:

The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained from the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. Cluster stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions, with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations using sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, as well as sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.