962 results for Clustering a large document collection
Abstract:
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. The input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with their respective volumes, extensions and center positions. For this purpose we have developed a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications, which require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
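The input/output contract described in this abstract (genome coordinates with counts in, peaks with volume, extension and center out) can be illustrated with a small sketch. This is not MADAP's algorithm, which the abstract does not detail. It is a simple gap-based grouping, and the function names `cluster_1d` and `_summarize` and the parameter `max_gap` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple


def cluster_1d(signal: List[Tuple[int, float]], max_gap: int) -> List[Dict[str, float]]:
    """Group (position, count) pairs into peaks.

    Assumption (not from the abstract): a new peak starts whenever two
    consecutive positions are more than `max_gap` apart.
    """
    peaks: List[Dict[str, float]] = []
    current: List[Tuple[int, float]] = []
    # Ignore positions without signal, then walk the profile left to right.
    for pos, count in sorted(p for p in signal if p[1] > 0):
        if current and pos - current[-1][0] > max_gap:
            peaks.append(_summarize(current))
            current = []
        current.append((pos, count))
    if current:
        peaks.append(_summarize(current))
    return peaks


def _summarize(points: List[Tuple[int, float]]) -> Dict[str, float]:
    """Report a peak's extension (start, end), volume and center."""
    volume = sum(count for _, count in points)
    # Center is the count-weighted mean position of the peak.
    center = sum(pos * count for pos, count in points) / volume
    return {"start": points[0][0], "end": points[-1][0], "volume": volume, "center": center}
```

Calling `cluster_1d(profile, max_gap=10)` on a list of `(position, count)` tuples returns one dictionary per peak. A real tool would also need to handle noise and overlapping peaks, which this sketch does not attempt.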
Abstract:
Large animal models are an important resource for the understanding of human disease and for evaluating the applicability of new therapies to human patients. For many diseases, such as cone dystrophy, research effort is hampered by the lack of such models. Lentiviral transgenesis is a methodology broadly applicable to animals from many different species. When combined with the expression of a dominant mutant protein, this technology offers an attractive approach to generating new large animal models in a heterogeneous background. We adopted this strategy to mimic the phenotype diversity encountered in humans and to generate a cohort of pigs for cone dystrophy by expressing a dominant mutant allele of the guanylate cyclase 2D (GUCY2D) gene. Sixty percent of the piglets were transgenic, with mutant GUCY2D mRNA detected in the retina of all animals tested. Functional impairment of vision was observed among the transgenic pigs at 3 months of age, with a follow-up at 1 year indicating a subsequent slower progression of the phenotype. Abnormal retina morphology, notably among the cone photoreceptor cell population, was observed exclusively amongst the transgenic animals. Of particular note, these transgenic animals were characterized by a range in the severity of the phenotype, reflecting the human clinical situation. We demonstrate that a transgenic approach using lentiviral vectors offers a powerful tool for large animal model development. Not only is the efficiency of transgenesis higher than with conventional transgenic methodology, but this technique also produces a heterogeneous cohort of transgenic animals that mimics the genetic variation encountered in human patients.
Abstract:
Investigations of traffic accidents are highly complex. They draw on a large number of specialties from very different domains. While some of these domains are already well exploited, others remain incomplete, and gaps are still observed in practice that must be remedied. This thesis, entitled "The exploitation of traces in traffic accidents", arises from an interdisciplinary reflection spanning multiple aspects of forensic science. Its main objective is to demonstrate the benefits of a synergy between microtraces and the study of accident dynamics. To give the work a strongly operational dimension, all of the steps undertaken were oriented toward optimising the activity of first responders at the scene. After an introductory part on the research project, which covers the theoretical aspects of reconstructing an accident scene, the reader is presented with five practical chapters, arranged "from the general to the particular". The first stage of this practical part concerns the morphology of traces: sequences of examinations are proposed to improve the interpretation of contacts between the vehicles and obstacles involved in an accident. The mechanisms by which paint traces are transferred are then studied, and a series of laboratory tests is carried out on automobile body parts. Various parameters are tested to understand their impact on the fragility of a paint system. Next, a list of treated cases (crash tests and real cases) is presented, which provides useful information on the handling of a case and confirms the results obtained. A compendium of traces follows, drawn from the practical experience acquired and intended to guide the search for and collection of traces at the scene. Finally, the question of an "accident" database allowing optimal management of the collected traces is addressed.
Abstract:
This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and is applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent, making the technique applicable to very high-resolution images. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
Abstract:
BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Abstract:
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, the data format is often the first obstacle. The lack of standardized ways of exploring different data layouts means that the problem has to be solved from scratch each time. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling large numbers of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; an implementation in C++ with boost [1], SQLite [2] and Qt [3] that does not require installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes using mixed text/number data easy; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It does not need any prerequisites to run, as all of the libraries are included in the distribution package. 
I test it against existing database solutions using a battery of benchmarks and discuss the results.
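To make the idea of schema-free SQL access to CSV data concrete, here is a minimal sketch using Python's standard `csv` and `sqlite3` modules. It is not the system described in the abstract: unlike the "no copy" approach, this sketch copies every row into an in-memory SQLite table, and it would not scale to very wide files. The names `load_csv` and `parse_value` are hypothetical. Column names are taken from the header row (assumed not to contain double quotes) and columns are declared without types, so each value keeps its own type.

```python
import csv
import io
import sqlite3


def parse_value(text):
    """Store a cell as int or float when it parses as one, else as text."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def load_csv(conn, table, csv_text):
    """Copy CSV text into a new SQLite table; no schema is specified by hand."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Untyped columns: SQLite then keeps the per-value type of each cell.
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = (tuple(parse_value(cell) for cell in row) for row in reader if row)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
```

After `load_csv(conn, "data", text)`, ordinary SQL works on the table. Because a column can mix numbers and text, numeric filters should exclude text cells explicitly, for example `WHERE typeof(score) != 'text' AND score > 1.5`, since SQLite orders text values after numeric ones.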
Abstract:
BACKGROUND: In myasthenia gravis, antibody-mediated blockade of acetylcholine receptors at the neuromuscular junction abolishes the naturally occurring 'safety factor' of synaptic transmission. Acetylcholinesterase inhibitors provide temporary symptomatic treatment of muscle weakness, but there is controversy about their long-term efficacy, dosage and side effects. OBJECTIVES: To evaluate the efficacy of acetylcholinesterase inhibitors in all forms of myasthenia gravis. SEARCH STRATEGY: We searched The Cochrane Neuromuscular Disease Group Specialized Register (5 October 2009), The Cochrane Central Register of Controlled Trials (CENTRAL) (The Cochrane Library Issue 3, 2009), MEDLINE (January 1966 to September 2009) and EMBASE (January 1980 to September 2009) for randomised controlled trials and quasi-randomised controlled trials regarding usage of acetylcholinesterase inhibitors in myasthenia gravis. Two authors scanned the articles for any study eligible for inclusion. We also contacted the authors and known experts in the field to identify additional published or unpublished data. SELECTION CRITERIA: Types of studies: all randomised or quasi-randomised trials. Types of participants: all myasthenia gravis patients diagnosed by an internationally accepted definition. Types of interventions: treatment with any form of acetylcholinesterase inhibitor. Types of outcome measures: Primary outcome measure: improvement in the presenting symptoms within 1 to 14 days of the start of treatment. Secondary outcome measures: (1) Improvement in the presenting symptoms more than 14 days after the start of treatment. (2) Change in impairment measured by a recognised and preferably validated scale, such as the quantitative myasthenia gravis score, within 1 to 14 days and more than 14 days after the start of treatment. (3) Myasthenia Gravis Association of America post-intervention status more than 14 days after start of treatment. (4) Adverse events: muscarinic side effects. 
DATA COLLECTION AND ANALYSIS: One author (MMM) extracted the data, which were checked by a second author. We contacted study authors for extra information and collected data on adverse effects from the trials. MAIN RESULTS: We did not find any large randomised or quasi-randomised trials of acetylcholinesterase inhibitors in generalised myasthenia gravis. One cross-over randomised trial using intranasal neostigmine in a total of 10 subjects was only available as an abstract. AUTHORS' CONCLUSIONS: Except for one small and inconclusive trial of intranasal neostigmine, no randomised controlled trial has been conducted on the use of acetylcholinesterase inhibitors in myasthenia gravis. Response to acetylcholinesterase inhibitors in observational studies is so clear that a randomised controlled trial depriving participants in the placebo arm of treatment would be difficult to justify.
Abstract:
Thy-1 is a membrane glycoprotein suggested to stabilize or inhibit growth of neuronal processes. However, its precise function has remained obscure, because its endogenous ligand is unknown. We previously showed that Thy-1 binds directly to α(V)β(3) integrin in trans, eliciting responses in astrocytes. Nonetheless, whether α(V)β(3) integrin might also serve as a Thy-1 ligand triggering a neuronal response has not been explored. Thus, utilizing primary neurons and the neuron-derived cell line CAD, Thy-1-mediated effects of α(V)β(3) integrin on growth and retraction of neuronal processes were tested. In astrocyte-neuron co-cultures, endogenous α(V)β(3) integrin restricted neurite outgrowth. Likewise, α(V)β(3)-Fc was sufficient to suppress neurite extension in Thy-1(+), but not in Thy-1(-) CAD cells. In differentiating primary neurons exposed to α(V)β(3)-Fc, fewer and shorter dendrites were detected. This effect was abolished by cleavage of Thy-1 from the neuronal surface using phosphoinositide-specific phospholipase C (PI-PLC). Moreover, α(V)β(3)-Fc also induced retraction of already extended Thy-1(+)-axon-like neurites in differentiated CAD cells as well as of axonal terminals in differentiated primary neurons. Axonal retraction occurred when redistribution and clustering of Thy-1 molecules in the plasma membrane was induced by α(V)β(3) integrin. Binding of α(V)β(3)-Fc was detected in Thy-1 clusters during axon retraction of primary neurons. Moreover, α(V)β(3)-Fc-induced Thy-1 clustering correlated in time and space with redistribution and inactivation of Src kinase. Thus, our data indicate that α(V)β(3) integrin is a ligand for Thy-1 that upon binding not only restricts the growth of neurites, but also induces retraction of already existing processes by inducing Thy-1 clustering. We propose that these events participate in bi-directional astrocyte-neuron communication relevant to axonal repair after neuronal damage.
Abstract:
Human cooperation is typically coordinated by institutions, which determine the outcome structure of the social interactions individuals engage in. Explaining the Neolithic transition from small- to large-scale societies involves understanding how these institutions co-evolve with demography. We study this using a demographically explicit model of institution formation in a patch-structured population. Each patch supports both social and asocial niches. Social individuals create an institution, at a cost to themselves, by negotiating how much of the costly public good provided by cooperators is invested into sanctioning defectors. The remainder of their public good is invested in technology that increases carrying capacity, such as irrigation systems. We show that social individuals can invade a population of asocials, and form institutions that support high levels of cooperation. We then demonstrate conditions where the co-evolution of cooperation, institutions, and demographic carrying capacity creates a transition from small- to large-scale social groups.
Abstract:
The success of combination antiretroviral therapy is limited by the evolutionary escape dynamics of HIV-1. We used Isotonic Conjunctive Bayesian Networks (I-CBNs), a class of probabilistic graphical models, to describe this process. We employed partial order constraints among viral resistance mutations, which give rise to a limited set of mutational pathways, and we modeled phenotypic drug resistance as monotonically increasing along any escape pathway. Using this model, the individualized genetic barrier (IGB) to each drug is derived as the probability of the virus not acquiring additional mutations that confer resistance. Drug-specific IGBs were combined to obtain the IGB to an entire regimen, which quantifies the virus' genetic potential for developing drug resistance under combination therapy. The IGB was tested as a predictor of therapeutic outcome using between 2,185 and 2,631 treatment change episodes of subtype B infected patients from the Swiss HIV Cohort Study Database, a large observational cohort. Using logistic regression, significant univariate predictors included most of the 18 drugs and single-drug IGBs, the IGB to the entire regimen, the expert rules-based genotypic susceptibility score (GSS), several individual mutations, and the peak viral load before treatment change. In the multivariate analysis, the only genotype-derived variables that remained significantly associated with virological success were GSS and, with 10-fold stronger association, IGB to regimen. When predicting suppression of viral load below 400 cps/ml, IGB outperformed GSS and also improved GSS-containing predictors significantly, but the difference was not significant for suppression below 50 cps/ml. Thus, the IGB to regimen is a novel data-derived predictor of treatment outcome that has potential to improve the interpretation of genotypic drug resistance tests.
Abstract:
To distinguish dysfunctional gait, clinicians require reference values of gait parameters for each population. This study provides normative values for widely used parameters in more than 1400 able-bodied adults over the age of 65. We also measured the foot clearance parameters (i.e., height of the foot above ground during swing phase), which are crucial to understanding the complex relationship between gait and falls as well as obstacle negotiation strategies. We used a shoe-worn inertial sensor on each foot and previously validated algorithms to extract the gait parameters during 20 m walking trials in a corridor at a self-selected pace. We investigated the difference in gait parameters between male and female participants, taking into account the effects of age and height. In addition, we examined the relationship between the clearance parameters and gait speed. The sample size and breadth of gait parameters provided in this study offer a unique reference resource for researchers.
Abstract:
This document presents the results of a state-of-practice survey of transportation agencies that are installing intelligent transportation sensors (ITS) and other devices along with their environmental sensing stations (ESS), also referred to as roadway weather information system (RWIS) assets.
Abstract:
Syncope is a frequent clinical symptom, but its origin remains undetermined in up to 60% of patients admitted to an emergency department. The development of specialised syncope clinics has considerably changed the evaluation of patients with unexplained syncope by directing them toward non-invasive investigation strategies, such as the tilt test, carotid sinus massage and the hyperventilation test. However, there are few data in the literature on the actual diagnostic performance of these functional tests. Our research analyses the data of the first 939 patients referred to the outpatient syncope clinic of the CHUV for investigation of syncope of undetermined origin. The objective of this thesis is 1) to evaluate the diagnostic performance of the standardised management algorithm and of the individual tests performed in our clinic, and 2) to determine the clinical characteristics shared by patients with a final diagnosis of arrhythmic or vasovagal syncope. This thesis demonstrates that a standardised management algorithm based on non-invasive tests identifies two-thirds of the causes of syncope initially of undetermined origin. Furthermore, it shows that benign aetiologies, such as vasovagal or psychogenic syncope, account for half of the causes of syncope, whereas cardiac arrhythmias remain infrequent. Finally, it demonstrates that the absence of prodromal symptoms, particularly in elderly patients with a functional limitation or a prolonged P-wave duration on the electrocardiogram, suggests syncope of arrhythmic origin. This thesis will help to optimise our standardised management algorithm for syncope of undetermined origin and opens new research perspectives in the development of models based on clinical factors for predicting the main causes of syncope.