23 results for Knowledge discovery in databases
at Université de Lausanne, Switzerland
Abstract:
We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering, our approach solves a single optimization problem rather than taking an ad-hoc two-stage optimization approach, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an "out-of-sample" problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.
Abstract:
Peptide toxins synthesized by venomous animals have been extensively studied in the last decades. To be useful to the scientific community, this knowledge has been stored, annotated and made easy to retrieve by several databases. The aim of this article is to present what type of information users can access from each database. ArachnoServer and ConoServer focus on spider toxins and cone snail toxins, respectively. UniProtKB, a generalist protein knowledgebase, has an animal toxin-dedicated annotation program that includes toxins from all venomous animals. Finally, the ATDB metadatabase compiles data and annotations from other databases and provides toxin ontology.
Abstract:
Anti-doping authorities have high expectations of the athlete steroidal passport (ASP) for detecting the misuse of anabolic-androgenic steroids. However, it is still limited to the monitoring of known, well-established compounds and might greatly benefit from the discovery of new relevant biomarker candidates. In this context, steroidomics opens the way to the untargeted simultaneous evaluation of a high number of compounds. Analytical platforms combining the performance of ultra-high pressure liquid chromatography (UHPLC) with the high mass-resolving power of quadrupole time-of-flight (QTOF) mass spectrometers are particularly well suited to this purpose. An untargeted steroidomic approach was proposed to analyse urine samples from a clinical trial for the discovery of relevant biomarkers of oral testosterone undecanoate intake. Automatic peak detection was performed, and a filter of reference steroid metabolite mass-to-charge ratio (m/z) values was applied to the raw data to ensure the selection of a subset of steroid-related features. Chemometric tools were applied for the filtering and the analysis of UHPLC-QTOF-MS(E) data. Time kinetics could be assessed with N-way projections to latent structures discriminant analysis (N-PLS-DA), and a detection window was confirmed. Orthogonal projections to latent structures discriminant analysis (O-PLS-DA) classification models were evaluated in a second step to assess the predictive power of both known metabolites and unknown compounds. A shared and unique structure plot (SUS-plot) analysis was performed to select the most promising unknown candidates, and receiver operating characteristic (ROC) curves were computed to assess specificity criteria applied in routine doping control. This approach underlined the relevance of monitoring both glucuronide and sulphate steroid conjugates and including them in the athlete's passport, while promising biomarkers were also highlighted.
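The ROC step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' chemometric workflow: the scores and labels are hypothetical, and the function names `roc_points`, `auc` and `sensitivity_at_specificity` are introduced here for illustration only. It shows how sweeping a decision threshold over classifier scores yields the curve, and how a specificity floor (as used in routine control) selects an operating point.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    sweeping the decision threshold from the highest score down."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_at_specificity(points, min_specificity):
    """Best sensitivity among operating points meeting the specificity floor."""
    return max(tpr for fpr, tpr in points if 1 - fpr >= min_specificity)


# Hypothetical model scores; 1 = sample collected after intake, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(auc(curve), sensitivity_at_specificity(curve, 1.0))
```

Requiring a specificity of 1.0 corresponds to tolerating no false positives among the controls, which is the strictest possible criterion; looser floors trade specificity for sensitivity.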
Abstract:
The introduction of next-generation sequencing technologies is set to revolutionize modern medicine. These new tools have already contributed to the discovery of new genes and cellular pathways involved in the pathology of rare and common genetic diseases. However, the enormous quantity of data these systems generate, together with the complexity of the bioinformatic analyses required, creates a bottleneck in solving the most difficult cases. The aim of this thesis was to identify the genetic causes of two hereditary diseases using these new sequencing techniques coupled with gene enrichment technologies. To this end, we developed our own pipeline for aligning sequence fragments (reads). Once genes had been identified, we carried out a functional analysis to elucidate their role in the disease. First, we studied and identified mutations involved in a recessive form of retinitis pigmentosa, which is to date the most frequent hereditary retinal degeneration. In particular, we found that missense mutations in the FAM161A gene were the cause of the retinitis pigmentosa previously associated with the RP28 locus. In addition, we showed that this gene functions at the photoreceptor cilium, extending the broad spectrum of hereditary retinal ciliopathies. Second, we explored the possibility that a recurrent fever syndrome relatively frequent in pediatrics, called PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis), might have a genetic origin. Since the etiology of this disease is unclear, we attempted to identify the genetic spectrum of PFAPA patients.
Since we could not uncover a single new mutated gene responsible for the disease in all screened individuals, a more complex genetic model, implicating several genes in the pathology, appears to apply in affected patients. These genes are thought to be involved in inflammation-related processes in particular, which would extend the impact of these studies to other autoinflammatory diseases.
Abstract:
The number of agents that are potentially effective in the adjuvant treatment of locally advanced resectable colon cancer is increasing. Consequently, it is important to ascertain which subgroups of patients will benefit from a specific treatment. Despite more than two decades of research into the molecular genetics of colon cancer, there is a lack of prognostic and predictive molecular biomarkers with proven utility in this setting. A secondary objective of the Pan European Trials in Adjuvant Colon Cancer-3 trial, which compared irinotecan in combination with 5-fluorouracil and leucovorin in the postoperative treatment of stage III and stage II colon cancer patients, was to undertake a translational research study to assess a panel of putative prognostic and predictive markers in a large colon cancer patient cohort. The Cancer and Leukemia Group B 89803 trial, in a similar design, also investigated the use of prognostic and predictive biomarkers in this setting. In this article, the authors, who are coinvestigators from these trials and performed similar investigations of biomarker discovery in the adjuvant treatment of colon cancer, review the current status of biomarker research in this field, drawing on their experiences and considering future strategies for biomarker discovery in the postgenomic era.
Abstract:
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
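The abstract above does not say which multiple-testing correction was used. As a hedged illustration only, the sketch below assumes a Bonferroni correction, in which the significance level is divided by the number of tests. The helper names and the count of 1,000 candidate SNPs are hypothetical; the one-million-test figure is the common convention behind the genome-wide threshold of 5 × 10^-8.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if p_value clears the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05
GENOME_WIDE_TESTS = 1_000_000   # conventional assumption, gives 5e-8
CANDIDATE_TESTS = 1_000         # hypothetical size of a knowledge-driven subset

# P-value reported in the abstract for the FBXL20 SNP.
print(is_significant(5.6e-9, ALPHA, GENOME_WIDE_TESTS))

# A P-value that fails genome-wide can still pass in a smaller candidate set,
# because fewer tests means a less stringent corrected threshold.
print(is_significant(2.2e-7, ALPHA, GENOME_WIDE_TESTS),
      is_significant(2.2e-7, ALPHA, CANDIDATE_TESTS))
```

This is the arithmetic reason a prior-knowledge filter can surface associations that an unrestricted scan misses: restricting the tested set lowers the correction burden, at the cost of requiring independent replication to guard against a biased selection.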
Abstract:
Among the various determinants of treatment response, the achievement of sufficient blood levels is essential for curing malaria. To improve our current understanding of the pharmacokinetics, efficacy and toxicity of antimalarial drugs, we have developed a liquid chromatography-tandem mass spectrometry (LC-MS/MS) method requiring 200 µl of plasma for the simultaneous determination of 14 antimalarial drugs and their metabolites, which are the components of the current first-line combination treatments for malaria (artemether, artesunate, dihydroartemisinin, amodiaquine, N-desethyl-amodiaquine, lumefantrine, desbutyl-lumefantrine, piperaquine, pyronaridine, mefloquine, chloroquine, quinine, pyrimethamine and sulfadoxine). Plasma is purified by a combination of protein precipitation, evaporation and reconstitution in methanol/ammonium formate 20 mM (pH 4.0) 1:1. Reverse-phase chromatographic separation of antimalarial drugs is obtained using a gradient elution of 20 mM ammonium formate and acetonitrile, both containing 0.5% formic acid, followed by rinsing and re-equilibration to the initial solvent composition up to 21 min. Analyte quantification, using matrix-matched calibration samples, is performed by electrospray ionization-triple quadrupole mass spectrometry with selected reaction monitoring detection in the positive mode. The method was validated according to FDA recommendations, including assessment of extraction yield, matrix effect variability, overall process efficiency, standard addition experiments, and the short- and long-term stability of antimalarials in plasma. The reactivity of endoperoxide-containing antimalarials in the presence of hemolysis was tested both in vitro and on samples from malaria patients. With this method, the signal intensity of artemisinin decreased by about 20% in the presence of 0.2% hemolysed red blood cells in plasma, whereas its derivatives were essentially not affected. 
The method is precise (inter-day CV%: 3.1-12.6%) and sensitive (lower limits of quantification 0.15-3.0 and 0.75-5 ng/ml for basic/neutral antimalarials and artemisinin derivatives, respectively). This is the first broad-range LC-MS/MS assay covering the antimalarials currently in use. It is an improvement over previous methods in terms of convenience (a single extraction procedure for 14 major antimalarials and metabolites, significantly reducing the analytical time), sensitivity, selectivity and throughput. While its main limitation is the investment cost of the equipment, plasma samples can be collected in the field and kept at 4 °C for up to 48 h before storage at -80 °C. It is suited to detecting the presence of drug in subjects for screening purposes and to quantifying drug exposure after treatment. It may contribute to filling the current knowledge gaps in the pharmacokinetic/pharmacodynamic relationships of antimalarials and to better defining the therapeutic dose ranges in different patient populations.
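The inter-day CV% quoted above is a coefficient of variation: the standard deviation of replicate measurements divided by their mean, expressed as a percentage. The following minimal sketch shows the arithmetic; the function name `cv_percent` and the replicate values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV% is undefined")
    return 100 * stdev(values) / m


# Hypothetical concentrations (ng/ml) of one quality-control sample
# measured on three different days.
inter_day = [98.0, 100.0, 102.0]
print(cv_percent(inter_day))
```

A lower CV% means the assay returns more consistent values for the same sample across days, which is what the reported 3.1-12.6% range summarizes across analytes.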
Abstract:
The discovery in mammalian cells of hundreds of small RNA molecules, called microRNAs, with the potential to modulate the expression of the majority of the protein-coding genes has revolutionized many areas of biomedical research, including the diabetes field. MicroRNAs function as translational repressors and are emerging as key regulators of most, if not all, physiological processes. Moreover, alterations in the level or function of microRNAs are associated with an increasing number of diseases. Here, we describe the mechanisms governing the biogenesis and activities of microRNAs. We present evidence for the involvement of microRNAs in diabetes mellitus, by outlining the contribution of these small RNA molecules in the control of pancreatic beta-cell functions and by reviewing recent studies reporting changes in microRNA expression in tissues isolated from diabetes animal models. MicroRNAs hold great potential as therapeutic targets. We describe the strategies developed for the delivery of molecules mimicking or blocking the function of these tiny regulators of gene expression in living animals. In addition, because changes in serum microRNA profiles have been shown to occur in association with different human diseases, we also discuss the potential use of microRNAs as blood biomarkers for prevention and management of diabetes.
Abstract:
Oxalate catabolism, which can have both medical and environmental implications, is performed by phylogenetically diverse bacteria. The formyl-CoA-transferase gene (frc) was chosen as a molecular marker of the oxalotrophic function. Degenerate primers were deduced from an alignment of frc gene sequences available in databases. The specificity of the primers was tested on a variety of frc-containing and frc-lacking bacteria. The frc primers were then used to develop PCR-DGGE and real-time SYBR Green PCR assays in soils containing various amounts of oxalate. Some PCR products from pure cultures and from soil samples were cloned and sequenced. The data were used to generate a phylogenetic tree showing that environmental PCR products belonged to the target physiological group. The extent of diversity visualised on DGGE patterns was higher for soil samples containing carbonate resulting from oxalate catabolism. Moreover, the number of frc gene copies in the investigated soils ranged from 1.64 × 10^7 to 1.75 × 10^8 per g of dry soil under oxalogenic trees (representing 0.5 to 1.2% of total 16S rRNA gene copies), whereas the number of frc gene copies in the reference soil was 6.4 × 10^6 (or 0.2% of 16S rRNA gene copies). This indicates that oxalotrophic bacteria are numerous and widespread in soils and that a relationship exists between the presence of the oxalogenic trees Milicia excelsa and Afzelia africana and the relative abundance of oxalotrophic guilds in the total bacterial communities. This is obviously related to the accomplishment of the oxalate-carbonate pathway, which explains the alkalinization and calcium carbonate accumulation occurring below these trees in an otherwise acidic soil. The molecular tools developed in this study will allow in-depth understanding of the functional implication of these bacteria in carbonate accumulation as a way of sequestering atmospheric CO2.
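The percentages in the abstract above express frc gene copies relative to total 16S rRNA gene copies. The sketch below shows that ratio and its inverse; the function names are hypothetical. The back-calculated 16S total is an illustration derived from the rounded figures quoted for the reference soil, not a value reported by the authors.

```python
def relative_abundance_percent(target_copies, total_copies):
    """Target gene copies as a percentage of total marker gene copies."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return 100 * target_copies / total_copies


def implied_total_copies(target_copies, percent_of_total):
    """Total copies implied by a target count and its reported percentage."""
    if percent_of_total <= 0:
        raise ValueError("percent_of_total must be positive")
    return target_copies / (percent_of_total / 100)


# Reference soil figures from the abstract: 6.4e6 frc copies, about 0.2%.
frc_reference = 6.4e6
total_16s = implied_total_copies(frc_reference, 0.2)  # illustrative estimate
print(total_16s, relative_abundance_percent(frc_reference, total_16s))
```

Because the published percentages are rounded, such a back-calculation only gives an order of magnitude for the total bacterial gene count.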