856 resultados para Data Driven Clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Crystallographic data about T-Cell Receptor - peptide - major histocompatibility complex class I (TCRpMHC) interaction have revealed extremely diverse TCR binding modes triggering antigen recognition. Understanding the molecular basis that governs TCR orientation over pMHC is still a considerable challenge. We present a simplified rigid approach applied on all non-redundant TCRpMHC crystal structures available. The CHARMM force field in combination with the FACTS implicit solvation model is used to study the role of long-distance interactions between the TCR and pMHC. We demonstrate that the sum of the coulomb interactions and the electrostatic solvation energies is sufficient to identify two orientations corresponding to energetic minima at 0° and 180° from the native orientation. Interestingly, these results are shown to be robust upon small structural variations of the TCR such as changes induced by Molecular Dynamics simulations, suggesting that shape complementarity is not required to obtain a reliable signal. Accurate energy minima are also identified by confronting unbound TCR crystal structures to pMHC. Furthermore, we decompose the electrostatic energy into residue contributions to estimate their role in the overall orientation. Results show that most of the driving force leading to the formation of the complex is defined by CDR1,2/MHC interactions. This long-distance contribution appears to be independent from the binding process itself, since it is reliably identified without considering neither short-range energy terms nor CDR induced fit upon binding. Ultimately, we present an attempt to predict the TCR/pMHC binding mode for a TCR structure obtained by homology modeling. The simplicity of the approach and the absence of any fitted parameters make it also easily applicable to other types of macromolecular protein complexes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Acquiring lexical information is a complex problem, typically approached by relying on a number of contexts to contribute information for classification. One of the first issues to address in this domain is the determination of such contexts. The work presented here proposes the use of automatically obtained FORMAL role descriptors as features used to draw nouns from the same lexical semantic class together in an unsupervised clustering task. We have dealt with three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The results obtained show that it is possible to discriminate between elements from different lexical semantic classes using only FORMAL role information, hence validating our initial hypothesis. Also, iterating our method accurately accounts for fine-grained distinctions within lexical classes, namely distinctions involving ambiguous expressions. Moreover, a filtering and bootstrapping strategy employed in extracting FORMAL role descriptors proved to minimize effects of sparse data and noise in our task.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In conditions of T lymphopenia, interleukin (IL) 7 levels rise and, via T cell receptor for antigen-self-major histocompatibility complex (MHC) interaction, induce residual naive T cells to proliferate. This pattern of lymphopenia-induced "homeostatic" proliferation is typically quite slow and causes a gradual increase in total T cell numbers and differentiation into cells with features of memory cells. In contrast, we describe a novel form of homeostatic proliferation that occurs when naive T cells encounter raised levels of IL-2 and IL-15 in vivo. In this situation, CD8(+) T cells undergo massive expansion and rapid differentiation into effector cells, thus closely resembling the T cell response to foreign antigens. However, the responses induced by IL-2/IL-15 are not seen in MHC-deficient hosts, implying that the responses are driven by self-ligands. Hence, homeostatic proliferation of naive T cells can be either slow or fast, with the quality of the response to self being dictated by the particular cytokine (IL-7 vs. IL-2/IL-15) concerned. The relevance of the data to the gradual transition of naive T cells into memory-phenotype (MP) cells with age is discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We use CEX repeated cross-section data on consumption and income, to evaluate the nature of increased income inequality in the 1980s and 90s. We decompose unexpected changes in family income into transitory and permanent, and idiosyncratic and aggregate components, and estimate the contribution of each component to total inequality. The model we use is a linearized incomplete markets model, enriched to incorporate risk-sharing while maintaining tractability. Our estimates suggest that taking risk sharing into account is important for the model fit; that the increase in inequality in the 1980s was mainly permanent; and that inequality is driven almost entirely by idiosyncratic income risk. In addition we find no evidence for cyclical behavior of consumption risk, casting doubt on Constantinides and Duffie s (1995) explanation for the equity premium puzzle.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work proposes an original contribution to the understanding of shermen spatial behavior, based on the behavioral ecology and movement ecology paradigms. Through the analysis of Vessel Monitoring System (VMS) data, we characterized the spatial behavior of Peruvian anchovy shermen at di erent scales: (1) the behavioral modes within shing trips (i.e., searching, shing and cruising); (2) the behavioral patterns among shing trips; (3) the behavioral patterns by shing season conditioned by ecosystem scenarios; and (4) the computation of maps of anchovy presence proxy from the spatial patterns of behavioral mode positions. At the rst scale considered, we compared several Markovian (hidden Markov and semi-Markov models) and discriminative models (random forests, support vector machines and arti cial neural networks) for inferring the behavioral modes associated with VMS tracks. The models were trained under a supervised setting and validated using tracks for which behavioral modes were known (from on-board observers records). Hidden semi-Markov models performed better, and were retained for inferring the behavioral modes on the entire VMS dataset. At the second scale considered, each shing trip was characterized by several features, including the time spent within each behavioral mode. Using a clustering analysis, shing trip patterns were classi ed into groups associated to management zones, eet segments and skippers' personalities. At the third scale considered, we analyzed how ecological conditions shaped shermen behavior. By means of co-inertia analyses, we found signi cant associations between shermen, anchovy and environmental spatial dynamics, and shermen behavioral responses were characterized according to contrasted environmental scenarios. At the fourth scale considered, we investigated whether the spatial behavior of shermen re ected to some extent the spatial distribution of anchovy. Finally, this work provides a wider view of shermen behavior: shermen are not only economic agents, but they are also foragers, constrained by ecosystem variability. To conclude, we discuss how these ndings may be of importance for sheries management, collective behavior analyses and end-to-end models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cannabis use is highly prevalent among people with schizophrenia, and coupled with impaired cognition, is thought to heighten the risk of illness onset. However, while heavy cannabis use has been associated with cognitive deficits in long-term users, studies among patients with schizophrenia have been contradictory. This article consists of 2 studies. In Study I, a meta-analysis of 10 studies comprising 572 patients with established schizophrenia (with and without comorbid cannabis use) was conducted. Patients with a history of cannabis use were found to have superior neuropsychological functioning. This finding was largely driven by studies that included patients with a lifetime history of cannabis use rather than current or recent use. In Study II, we examined the neuropsychological performance of 85 patients with first-episode psychosis (FEP) and 43 healthy nonusing controls. Relative to controls, FEP patients with a history of cannabis use (FEP + CANN; n = 59) displayed only selective neuropsychological impairments while those without a history (FEP - CANN; n = 26) displayed generalized deficits. When directly compared, FEP + CANN patients performed better on tests of visual memory, working memory, and executive functioning. Patients with early onset cannabis use had less neuropsychological impairment than patients with later onset use. Together, these findings suggest that patients with schizophrenia or FEP with a history of cannabis use have superior neuropsychological functioning compared with nonusing patients. This association between better cognitive performance and cannabis use in schizophrenia may be driven by a subgroup of "neurocognitively less impaired" patients, who only developed psychosis after a relatively early initiation into cannabis use.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: A relative inability to capture a sufficiently large patient population in any one geographic location has traditionally limited research into rare diseases. METHODS AND RESULTS: Clinicians interested in the rare disease lymphangioleiomyomatosis (LAM) have worked with the LAM Treatment Alliance, the MIT Media Lab, and Clozure Associates to cooperate in the design of a state-of-the-art data coordination platform that can be used for clinical trials and other research focused on the global LAM patient population. This platform is a component of a set of web-based resources, including a patient self-report data portal, aimed at accelerating research in rare diseases in a rigorous fashion. CONCLUSIONS: Collaboration between clinicians, researchers, advocacy groups, and patients can create essential community resource infrastructure to accelerate rare disease research. The International LAM Registry is an example of such an effort. 82.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider systems described by nonlinear stochastic differential equations with multiplicative noise. We study the relaxation time of the steady-state correlation function as a function of noise parameters. We consider the white- and nonwhite-noise case for a prototype model for which numerical data are available. We discuss the validity of analytical approximation schemes. For the white-noise case we discuss the results of a projector-operator technique. This discussion allows us to give a generalization of the method to the non-white-noise case. Within this generalization, we account for the growth of the relaxation time as a function of the correlation time of the noise. This behavior is traced back to the existence of a non-Markovian term in the equation for the correlation function.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La présente étude est à la fois une évaluation du processus de la mise en oeuvre et des impacts de la police de proximité dans les cinq plus grandes zones urbaines de Suisse - Bâle, Berne, Genève, Lausanne et Zurich. La police de proximité (community policing) est à la fois une philosophie et une stratégie organisationnelle qui favorise un partenariat renouvelé entre la police et les communautés locales dans le but de résoudre les problèmes relatifs à la sécurité et à l'ordre public. L'évaluation de processus a analysé des données relatives aux réformes internes de la police qui ont été obtenues par l'intermédiaire d'entretiens semi-structurés avec des administrateurs clés des cinq départements de police, ainsi que dans des documents écrits de la police et d'autres sources publiques. L'évaluation des impacts, quant à elle, s'est basée sur des variables contextuelles telles que des statistiques policières et des données de recensement, ainsi que sur des indicateurs d'impacts construit à partir des données du Swiss Crime Survey (SCS) relatives au sentiment d'insécurité, à la perception du désordre public et à la satisfaction de la population à l'égard de la police. Le SCS est un sondage régulier qui a permis d'interroger des habitants des cinq grandes zones urbaines à plusieurs reprises depuis le milieu des années 1980. L'évaluation de processus a abouti à un « Calendrier des activités » visant à créer des données de panel permettant de mesurer les progrès réalisés dans la mise en oeuvre de la police de proximité à l'aide d'une grille d'évaluation à six dimensions à des intervalles de cinq ans entre 1990 et 2010. L'évaluation des impacts, effectuée ex post facto, a utilisé un concept de recherche non-expérimental (observational design) dans le but d'analyser les impacts de différents modèles de police de proximité dans des zones comparables à travers les cinq villes étudiées. Les quartiers urbains, délimités par zone de code postal, ont ainsi été regroupés par l'intermédiaire d'une typologie réalisée à l'aide d'algorithmes d'apprentissage automatique (machine learning). Des algorithmes supervisés et non supervisés ont été utilisés sur les données à haute dimensionnalité relatives à la criminalité, à la structure socio-économique et démographique et au cadre bâti dans le but de regrouper les quartiers urbains les plus similaires dans des clusters. D'abord, les cartes auto-organisatrices (self-organizing maps) ont été utilisées dans le but de réduire la variance intra-cluster des variables contextuelles et de maximiser simultanément la variance inter-cluster des réponses au sondage. Ensuite, l'algorithme des forêts d'arbres décisionnels (random forests) a permis à la fois d'évaluer la pertinence de la typologie de quartier élaborée et de sélectionner les variables contextuelles clés afin de construire un modèle parcimonieux faisant un minimum d'erreurs de classification. Enfin, pour l'analyse des impacts, la méthode des appariements des coefficients de propension (propensity score matching) a été utilisée pour équilibrer les échantillons prétest-posttest en termes d'âge, de sexe et de niveau d'éducation des répondants au sein de chaque type de quartier ainsi identifié dans chacune des villes, avant d'effectuer un test statistique de la différence observée dans les indicateurs d'impacts. De plus, tous les résultats statistiquement significatifs ont été soumis à une analyse de sensibilité (sensitivity analysis) afin d'évaluer leur robustesse face à un biais potentiel dû à des covariables non observées. L'étude relève qu'au cours des quinze dernières années, les cinq services de police ont entamé des réformes majeures de leur organisation ainsi que de leurs stratégies opérationnelles et qu'ils ont noué des partenariats stratégiques afin de mettre en oeuvre la police de proximité. La typologie de quartier développée a abouti à une réduction de la variance intra-cluster des variables contextuelles et permet d'expliquer une partie significative de la variance inter-cluster des indicateurs d'impacts avant la mise en oeuvre du traitement. Ceci semble suggérer que les méthodes de géocomputation aident à équilibrer les covariables observées et donc à réduire les menaces relatives à la validité interne d'un concept de recherche non-expérimental. Enfin, l'analyse des impacts a révélé que le sentiment d'insécurité a diminué de manière significative pendant la période 2000-2005 dans les quartiers se trouvant à l'intérieur et autour des centres-villes de Berne et de Zurich. Ces améliorations sont assez robustes face à des biais dus à des covariables inobservées et covarient dans le temps et l'espace avec la mise en oeuvre de la police de proximité. L'hypothèse alternative envisageant que les diminutions observées dans le sentiment d'insécurité soient, partiellement, un résultat des interventions policières de proximité semble donc être aussi plausible que l'hypothèse nulle considérant l'absence absolue d'effet. Ceci, même si le concept de recherche non-expérimental mis en oeuvre ne peut pas complètement exclure la sélection et la régression à la moyenne comme explications alternatives. The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources. The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The proportion of population living in or around cites is more important than ever. Urban sprawl and car dependence have taken over the pedestrian-friendly compact city. Environmental problems like air pollution, land waste or noise, and health problems are the result of this still continuing process. The urban planners have to find solutions to these complex problems, and at the same time insure the economic performance of the city and its surroundings. At the same time, an increasing quantity of socio-economic and environmental data is acquired. In order to get a better understanding of the processes and phenomena taking place in the complex urban environment, these data should be analysed. Numerous methods for modelling and simulating such a system exist and are still under development and can be exploited by the urban geographers for improving our understanding of the urban metabolism. Modern and innovative visualisation techniques help in communicating the results of such models and simulations. This thesis covers several methods for analysis, modelling, simulation and visualisation of problems related to urban geography. The analysis of high dimensional socio-economic data using artificial neural network techniques, especially self-organising maps, is showed using two examples at different scales. The problem of spatiotemporal modelling and data representation is treated and some possible solutions are shown. The simulation of urban dynamics and more specifically the traffic due to commuting to work is illustrated using multi-agent micro-simulation techniques. A section on visualisation methods presents cartograms for transforming the geographic space into a feature space, and the distance circle map, a centre-based map representation particularly useful for urban agglomerations. Some issues on the importance of scale in urban analysis and clustering of urban phenomena are exposed. A new approach on how to define urban areas at different scales is developed, and the link with percolation theory established. Fractal statistics, especially the lacunarity measure, and scale laws are used for characterising urban clusters. In a last section, the population evolution is modelled using a model close to the well-established gravity model. The work covers quite a wide range of methods useful in urban geography. Methods should still be developed further and at the same time find their way into the daily work and decision process of urban planners. La part de personnes vivant dans une région urbaine est plus élevé que jamais et continue à croître. L'étalement urbain et la dépendance automobile ont supplanté la ville compacte adaptée aux piétons. La pollution de l'air, le gaspillage du sol, le bruit, et des problèmes de santé pour les habitants en sont la conséquence. Les urbanistes doivent trouver, ensemble avec toute la société, des solutions à ces problèmes complexes. En même temps, il faut assurer la performance économique de la ville et de sa région. Actuellement, une quantité grandissante de données socio-économiques et environnementales est récoltée. Pour mieux comprendre les processus et phénomènes du système complexe "ville", ces données doivent être traitées et analysées. Des nombreuses méthodes pour modéliser et simuler un tel système existent et sont continuellement en développement. Elles peuvent être exploitées par le géographe urbain pour améliorer sa connaissance du métabolisme urbain. Des techniques modernes et innovatrices de visualisation aident dans la communication des résultats de tels modèles et simulations. Cette thèse décrit plusieurs méthodes permettant d'analyser, de modéliser, de simuler et de visualiser des phénomènes urbains. L'analyse de données socio-économiques à très haute dimension à l'aide de réseaux de neurones artificiels, notamment des cartes auto-organisatrices, est montré à travers deux exemples aux échelles différentes. Le problème de modélisation spatio-temporelle et de représentation des données est discuté et quelques ébauches de solutions esquissées. La simulation de la dynamique urbaine, et plus spécifiquement du trafic automobile engendré par les pendulaires est illustrée à l'aide d'une simulation multi-agents. Une section sur les méthodes de visualisation montre des cartes en anamorphoses permettant de transformer l'espace géographique en espace fonctionnel. Un autre type de carte, les cartes circulaires, est présenté. Ce type de carte est particulièrement utile pour les agglomérations urbaines. Quelques questions liées à l'importance de l'échelle dans l'analyse urbaine sont également discutées. Une nouvelle approche pour définir des clusters urbains à des échelles différentes est développée, et le lien avec la théorie de la percolation est établi. Des statistiques fractales, notamment la lacunarité, sont utilisées pour caractériser ces clusters urbains. L'évolution de la population est modélisée à l'aide d'un modèle proche du modèle gravitaire bien connu. Le travail couvre une large panoplie de méthodes utiles en géographie urbaine. Toutefois, il est toujours nécessaire de développer plus loin ces méthodes et en même temps, elles doivent trouver leur chemin dans la vie quotidienne des urbanistes et planificateurs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An equation for mean first-passage times of non-Markovian processes driven by colored noise is derived through an appropriate backward integro-differential equation. The equation is solved in a Bourret-like approximation. In a weak-noise bistable situation, non-Markovian effects are taken into account by an effective diffusion coefficient. In this situation, our results compare satisfactorily with other approaches and experimental data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ObjectiveCandidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations to quantitative traits of NAFLD-related phenotypes.Research Design and MethodsBy integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression a protein-protein interaction network was constructed and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs) which captured all common variation in these genes were genotyped in 10,196 Danes, and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS).Results273 genes were included in the protein-protein interaction analysis and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominal statistical significant associations (P<0.05) to quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations.ConclusionsUsing a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure.Results: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae.Conclusion: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.