871 results for CLASS DISCOVERY
Abstract:
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide. It is a highly heterogeneous disease that must be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing, using different class discovery methods. Forty-two paired breast tumor samples were sequenced on an Illumina Genome Analyzer, and the data were processed with TopHat2 and htseq-count. As reported previously, breast cancer can be grouped into five main groups, each with a distinctive expression profile: a basal epithelial-like group, a HER2 group, a normal breast-like group and two luminal groups. When the breast tumor samples were classified with the PAM50 method, the most common subtype was Luminal B, which was significantly associated with high expression of ESR1 and ERBB2. The Luminal A subtype showed significantly high expression of ESR1 and SLC39A6, whereas the HER2 subtype showed high expression of the ERBB2 and CCNE1 genes and low expression of luminal epithelial genes. The basal-like and normal-like subtypes were associated with low expression of ESR1, PgR and HER2 and with significantly high expression of cytokeratins 5 and 17. Our results were consistent with TCGA breast cancer data and with published studies on breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters and, moreover, identify marker genes for each subtype that could guide better therapy for patients with breast cancer.
Abstract:
Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging because the datasets involved are typically characterized by a very small sample size (on the order of a few tens of tissue samples) and by a very large feature space, as the number of genes tends to be in the high thousands. We propose a model-independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet has a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm to five different datasets. The first consists of time course data from four well-studied hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well-studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM, PCA, GS) and with supervised methods (SNR, RMB, RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods and in some cases even outperforms supervised approaches.
Abstract:
Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms often contains a large number of partitions, making it difficult for an expert to analyze all of them manually. To deal with this problem, we present two selection strategies, based on the corrected Rand index, that choose a subset of solutions. To test them, they are applied to the sets of solutions produced by MOCK and MOCLE for several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed selection strategies are very effective. They can significantly reduce the number of solutions while preserving the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
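The abstract above does not spell out its two selection strategies, so the following is only a minimal sketch of the general idea, under stated assumptions: the corrected (adjusted) Rand index measures agreement between two partitions, and a hypothetical greedy filter keeps a partition only if it is not a near-duplicate of one already kept. The function names and the `threshold` parameter are illustrative and do not come from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Corrected-for-chance Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("partitions must label the same number of items")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (together_both - expected) / (maximum - expected)


def select_diverse(partitions, threshold=0.9):
    """Hypothetical greedy filter: drop partitions too similar to a kept one."""
    kept = []
    for candidate in partitions:
        if all(adjusted_rand_index(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]  # same grouping as p1, labels swapped
p3 = [0, 1, 0, 1]  # a genuinely different grouping
reduced = select_diverse([p1, p2, p3])
print(len(reduced), "of 3 partitions kept")
```

Because the index ignores label names, `p2` is treated as a duplicate of `p1` and removed, while `p3` survives. A real strategy would also need a rule for which member of a similar group to keep, which this sketch sidesteps by keeping the first one seen.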
Abstract:
The enzyme dihydroorotate dehydrogenase (DHODH) has been suggested as a promising target for the design of trypanocidal agents. We report here the discovery of novel inhibitors of Trypanosoma cruzi DHODH identified by a combination of virtual screening and isothermal titration calorimetry (ITC) methods. Monitoring of the enzymatic reaction in the presence of selected ligands, together with structural information obtained from X-ray crystallography, has allowed the identification and validation of a novel site of interaction (S2 site). This has provided important structural insights for the rational design of T. cruzi and Leishmania major DHODH inhibitors. The most potent compound (1) in the investigated series inhibits the TcDHODH enzyme with a K_i(app) value of 19.28 μM and has a ligand efficiency of 0.54 kcal mol^-1 per non-H atom. The compounds described in this work are promising hits for further development. (C) 2010 Elsevier Masson SAS. All rights reserved.
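To make the ligand efficiency figure in the abstract above concrete, here is a small sketch of the usual definition: the binding free energy divided by the number of non-hydrogen atoms. It rests on assumptions that the abstract does not state: the apparent K_i is treated as a dissociation constant, the temperature is taken as 298.15 K, and the heavy-atom count of 12 is a hypothetical value chosen for illustration, not a reported property of compound 1.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def binding_free_energy(ki_molar, temperature_k=298.15):
    """Delta G = RT ln(K_i), in kcal/mol (negative for K_i below 1 M)."""
    if ki_molar <= 0:
        raise ValueError("K_i must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(ki_molar)


def ligand_efficiency(ki_molar, heavy_atoms, temperature_k=298.15):
    """Binding free energy per non-hydrogen atom, reported as a positive number."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -binding_free_energy(ki_molar, temperature_k) / heavy_atoms


ki = 19.28e-6  # 19.28 micromolar, from the abstract
delta_g = binding_free_energy(ki)
# heavy_atoms=12 is an assumed, illustrative value (not given in the source).
le = ligand_efficiency(ki, heavy_atoms=12)
print(f"Delta G = {delta_g:.2f} kcal/mol, LE = {le:.2f} kcal/mol per heavy atom")
```

Under these assumptions a micromolar inhibitor only reaches an efficiency near 0.54 if it is small, on the order of a dozen heavy atoms, which is consistent with the abstract describing the compounds as hits rather than optimized leads.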
Abstract:
We extend the standard price discovery analysis to estimate the information share of dual-class shares across domestic and foreign markets. By examining both common and preferred shares, we aim to extract information not only about the fundamental value of the firm, but also about the dual-class premium. In particular, our interest lies in the price discovery mechanism regulating the prices of common and preferred shares on the BM&FBovespa as well as the prices of their ADR counterparts on the NYSE and on the Arca platform. However, in the presence of contemporaneous correlation between the innovations, the standard information share measure depends heavily on the ordering we attribute to prices in the system. To remain agnostic about which share class and market lead, one could for instance compute some weighted average information share across all possible orderings. This is extremely inconvenient given that we are dealing with 2 share prices in Brazil, 4 share prices in the US, plus the exchange rate (and hence over 5,000 permutations!). We thus develop a novel methodology to carry out price discovery analyses that does not impose any ex-ante assumption about which share class or trading platform conveys more information about shocks in the fundamental price. As such, our procedure yields a single measure of information share, which is invariant to the ordering of the variables in the system. Simulations of a simple market microstructure model show that our information share estimator works well in practice. We then employ transactions data to study price discovery in two dual-class Brazilian stocks and their ADRs. We uncover two interesting findings. First, the foreign market is at least as informative as the home market. Second, shocks in the dual-class premium have a permanent effect in normal times, but a transitory one in periods of financial distress. We argue that the latter is consistent with the expropriation of preferred shareholders as a class.
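The ordering problem described in the abstract above can be illustrated with a short sketch. It assumes that the "standard information share measure" is the usual Cholesky-based one, in which each price's share is its squared contribution to the variance of the common permanent shock. The weights `psi` and the covariance matrix `omega` are made-up toy values, and the code shows only the ordering dependence, not the authors' order-invariant method.

```python
import math
from itertools import permutations


def cholesky(matrix):
    """Lower-triangular factor F of a positive-definite matrix (F F^T = matrix)."""
    n = len(matrix)
    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(f[i][k] * f[j][k] for k in range(j))
            if i == j:
                f[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                f[i][j] = (matrix[i][j] - s) / f[j][j]
    return f


def information_shares(psi, omega, order):
    """Cholesky-based information shares for one ordering of the variables."""
    p = [psi[i] for i in order]
    o = [[omega[i][j] for j in order] for i in order]
    f = cholesky(o)
    n = len(order)
    contributions = [sum(p[i] * f[i][j] for i in range(n)) ** 2 for j in range(n)]
    total = sum(contributions)  # equals psi * omega * psi^T
    shares = [0.0] * n
    for position, variable in enumerate(order):
        shares[variable] = contributions[position] / total
    return shares


# Toy inputs (assumptions): equal long-run weights, innovations correlated at 0.8.
psi = [0.5, 0.5]
omega = [[1.0, 0.8], [0.8, 1.0]]
first = information_shares(psi, omega, (0, 1))
second = information_shares(psi, omega, (1, 0))
average = [sum(information_shares(psi, omega, o)[i] for o in permutations(range(2))) / 2
           for i in range(2)]
print(first, second, average)
print(math.factorial(7), "orderings for the seven-variable system")
```

With correlated innovations, whichever price is ordered first absorbs the shared variance, so the two orderings attribute very different shares to the same price. Averaging over orderings removes the arbitrariness but requires one factorization per permutation, which is what becomes unwieldy with seven variables.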
Abstract:
We have evaluated two synthetic epothilone analogues lacking the 12,13-epoxide functionality, 12,13-desoxyepothilone B (dEpoB) and 12,13-desoxyepothilone F (dEpoF). The concentrations required for 50% growth inhibition (IC50) for a variety of anticancer agents were measured in CCRF-CEM/VBL1000 cells (2,048-fold resistance to vinblastine). For dEpoB, dEpoF, aza-EpoB, and paclitaxel, the IC50 values were 0.029, 0.092, 2.99, and 5.17 μM, respectively. These values represent 4-, 33.5-, 1,423- and 3,133-fold resistance, respectively, when compared with the corresponding IC50 in the parent [non-multidrug-resistant (non-MDR)] CCRF-CEM cells. We then produced MDR human lung carcinoma A549 cells by continuous exposure of the tumor cells to sublethal concentrations of dEpoB (1.8 yr), vinblastine (1.2 yr), and paclitaxel (1.8 yr). This continued exposure led to the development of 2.1-, 4,848-, and 2,553-fold resistance to each drug, respectively. The therapeutic effects of dEpoB and paclitaxel were also compared in vivo in a mouse model by using various tumor xenografts. dEpoB was much more effective than paclitaxel in reducing tumor sizes in all MDR tumors tested. Analysis of dEpoF, an analog possessing greater aqueous solubility than dEpoB, showed curative effects similar to those of dEpoB against K562, CCRF-CEM, and MX-1 xenografts. These results indicate that dEpoB and dEpoF are efficacious antitumor agents with both a broad chemotherapeutic spectrum and wide safety margins.
Abstract:
We present Herschel PACS 100 and 160 μm observations of the solar-type stars α Men, HD 88230 and HD 210277, which form part of the FGK stars sample of the Herschel open time key programme (OTKP) DUNES (DUst around NEarby Stars). Our observations show small infrared excesses at 160 μm for all three stars. HD 210277 also shows a small excess at 100 μm, while the 100 μm fluxes of α Men and HD 88230 agree with the stellar photospheric predictions. We attribute these infrared excesses to a new class of cold, faint debris discs. Both α Men and HD 88230 are spatially resolved in the PACS 160 μm images, while HD 210277 is point-like at that wavelength. The projected linear sizes of the extended emission lie in the range from ~115 to ≤ 250 AU. The estimated black body temperatures from the 100 and 160 μm fluxes are ≲22 K, and the fractional luminosity of the cold dust is L_dust/L_⋆ ~ 10^-6, close to the luminosity of the solar-system’s Kuiper belt. These debris discs are the coldest and faintest discs discovered so far around mature stars, so they cannot easily be explained by invoking “classical” debris disc models.
Abstract:
Peptides that induce and recall T-cell responses are called T-cell epitopes. T-cell epitopes may be useful in a subunit vaccine against malaria. Computer models that simulate peptide binding to MHC are useful for selecting candidate T-cell epitopes since they minimize the number of experiments required for their identification. We applied a combination of computational and immunological strategies to select candidate T-cell epitopes. A total of 86 experimental binding assays were performed in three rounds of identification of HLA-A11 binding peptides from six pre-erythrocytic malaria antigens. Thirty-six peptides were experimentally confirmed as binders. We show that cyclical refinement of the artificial neural network (ANN) models significantly improves the efficiency of identifying potential T-cell epitopes. (C) 2001 by Elsevier Science Inc.
Abstract:
Herpesviruses, such as murine and human cytomegalovirus (MCMV and HCMV), can establish a persistent infection within the host and have diverse mechanisms of protection from host immune defences(1). Several herpesvirus genes that are homologous to host immune modulators have been identified, and are implicated in viral evasion of the host immune response(2,3). The discovery of a viral major histocompatibility complex (MHC) class I homologue, encoded by HCMV(4), led to speculation that it might function as an immune modulator and disrupt presentation of peptides by MHC class I to cytotoxic T cells(5). However, there is no evidence concerning the biological significance of this gene during viral infection. Recent analysis of the MCMV genome has also demonstrated the presence of an MHC class I homologue(6). Here we show that a recombinant MCMV, in which the gene encoding the class I homologue has been disrupted, has severely restricted replication during the acute stage of infection compared with wild-type MCMV. We demonstrate by in vivo depletion studies that natural killer (NK) cells are responsible for the attenuated phenotype of the mutant. Thus the viral MHC class I homologue contributes to immune evasion through interference with NK cell-mediated clearance.
Abstract:
Catalase is an important virulence factor for survival in macrophages and other phagocytic cells. In Chlamydiaceae, no catalase had been described so far. With the sequencing and annotation of the full genomes of Chlamydia-related bacteria, the presence of different catalase-encoding genes has been documented. However, their distribution in the Chlamydiales order and the functionality of these catalases remain unknown. Phylogeny of chlamydial catalases was inferred using MrBayes, maximum likelihood, and maximum parsimony algorithms, allowing the description of three clade 3 and two clade 2 catalases. Only monofunctional catalases were found (no catalase-peroxidase or Mn-catalase). All presented a conserved catalytic domain and tertiary structure. Enzymatic activity of cloned chlamydial catalases was assessed by measuring hydrogen peroxide degradation. The catalases are enzymatically active with different efficiencies. The catalase of Parachlamydia acanthamoebae is the least efficient of all (its catalytic activity was 2 logs lower than that of Pseudomonas aeruginosa). Based on the phylogenetic analysis, we hypothesize that an ancestral class 2 catalase probably was present in the common ancestor of all current Chlamydiales but was retained only in Criblamydia sequanensis and Neochlamydia hartmannellae. The catalases of class 3, present in Estrella lausannensis and Parachlamydia acanthamoebae, probably were acquired by lateral gene transfer from Rhizobiales, whereas for Waddlia chondrophila they likely originated from Legionellales or Actinomycetales. The acquisition of catalases on several occasions in the Chlamydiales suggests the importance of this enzyme for the bacteria in their host environment.
Abstract:
Cancer omics data are being generated at an exponential rate and linked to clinical variables, and important findings can be extracted with bioinformatics approaches and then validated experimentally. Many of these findings relate to a specific class of non-coding RNA molecules called microRNAs (miRNAs), which are post-transcriptional regulators of mRNA expression. The research field is quite heterogeneous: bioinformaticians, clinicians, statisticians and biologists, as well as data miners and engineers, collaborate to curate stored data and to act on new impulses coming from the output of the latest Next Generation Sequencing technologies. Here we review the main research findings on miRNAs from the first 10 years of colon cancer research, with an emphasis on possible uses in clinical practice. This review intends to provide a road map through the jungle of publications on miRNAs in colorectal cancer, focusing on data availability and new ways to generate biologically relevant information from these huge amounts of data.
Abstract:
Molecular shape has long been known to be an important property for the process of molecular recognition. Previous studies postulated the existence of a drug-like shape space that could be used to artificially bias the composition of screening libraries, with the aim of increasing the chance of success in hit identification. In this work, it was analysed to what extent this assumption holds true. Normalized Principal Moments of Inertia Ratios (NPRs) were used to describe the molecular shape of small molecules. It was investigated whether active molecules of diverse targets are located in preferred subspaces of the NPR shape space. The results showed significantly stronger clustering than could be expected by chance, with parts of the space unlikely to be occupied by active compounds. Furthermore, a strong enrichment of elongated, rather flat shapes was observed, while globular compounds were highly underrepresented. This was confirmed for a wide range of small-molecule datasets of different origins. Active compounds exhibited a high overlap in their shape distributions across different targets, making a purely shape-based discrimination very difficult. An additional perspective was provided by comparing the shapes of protein binding pockets with those of their respective ligands. Although binding sites were more globular than their ligands, their shapes exhibited a similarly skewed distribution in shape space: spherical shapes were highly underrepresented. This was different for unoccupied binding pockets of smaller size, which on the contrary were found to have a more globular shape. The relation between shape complementarity and exhibited bioactivity was analysed; a moderate correlation between bioactivity and parameters including pocket coverage, distance in shape space, and others could be identified, which reflects the importance of shape complementarity. 
However, this also suggests that other aspects are relevant for molecular recognition. A subsequent analysis assessed if and how shape and volume information retrieved from pockets or their respective reference ligands could be used as a pre-filter in a virtual screening approach. In lead optimization, compounds need to be optimized with respect to a variety of parameters. Here, the availability of past success stories is very valuable, as they can guide medicinal chemists in planning their analogue syntheses. However, although such knowledge is of tremendous interest for the public domain, so far only large corporations have had the ability to mine historical knowledge in their proprietary databases. With the aim of providing such information, the SwissBioisostere database was developed and released during this thesis. This database contains information on 21,293,355 performed substructural exchanges, corresponding to 5,586,462 unique replacements that have been measured in 35,039 assays against 1,948 molecular targets representing 30 target classes, and on their impact on bioactivity. A user-friendly interface was developed that provides facile access to these data and is accessible at http://www.swissbioisostere.ch. The ChEMBL database was used as the primary source of bioactivity information. Matched molecular pairs were identified in the extracted and cleaned data. Success-based scores were developed and integrated into the database to allow re-ranking of proposed replacements by their past outcomes. It was analysed to what degree these scores correlate with the chemical similarity of the underlying fragments. An unexpectedly weak relationship was detected and further investigated. 
Use cases of this database were envisioned, and functionalities were implemented accordingly: replacement outcomes can be aggregated at the assay level, and it was shown that aggregation at the target or target class level could also be performed, but should be accompanied by a careful case-by-case assessment. It was furthermore observed that replacement success depends on the activity of the starting compound A within a matched molecular pair A-B. With increasing potency, the probability of losing bioactivity through any substructural exchange was significantly higher than for low-affinity binders. The potential existence of a publication bias could be refuted. Furthermore, frequently used medicinal chemistry strategies for structure-activity relationship exploration were analysed using the acquired data. Finally, data originating from pharmaceutical companies were compared with those reported in the literature. It could be seen that industrial medicinal chemistry can access replacement information not available in the public domain. In contrast, a large number of replacements often performed within companies could also be identified in literature data. Preferences for particular replacements differed between these two sources. The value of combining different endpoints in an evaluation of molecular replacements was investigated. The studies performed furthermore highlighted that there seems to exist no universal substructural replacement that always retains bioactivity irrespective of the biological environment. A generalization of bioisosteric replacements therefore does not seem possible. - The three-dimensional shape of molecules has long been recognized as an important property for the process of molecular recognition. Previous studies postulated that drugs preferentially occupy a subset of the shape space of molecules. 
This subset could be used to bias the composition of screening libraries, with the aim of increasing the chances of identifying hits. The analysis and validation of this assertion is the subject of this first part. Normalized Principal Moments of Inertia Ratios (NPRs) were used to describe the three-dimensional shape of small drug-like molecules. It was investigated whether molecules active on different targets co-localize in preferred subspaces of shape space. The results show clusters of molecules that are incompatible with a random distribution, with certain parts of the space unlikely to be occupied by active compounds. In addition, a strong enrichment in elongated and rather flat shapes was observed, whereas globular compounds were strongly underrepresented. This was confirmed for a wide range of molecule collections of different origins. The shape distributions of molecules active on different targets overlap to a large extent, which makes discrimination based on shape alone very difficult. An additional perspective was added by comparing the shapes of ligands with those of their binding sites (pockets) in their respective proteins. Although pockets are more globular than their ligands, their shapes were observed to have a distribution in shape space with the same kind of asymmetry as that observed for the ligands: spherical shapes are strongly underrepresented. A different result was obtained for smaller pockets crystallized without a ligand: these had a more globular shape. 
The relation between shape complementarity and bioactivity was also analysed; a moderate correlation between bioactivity and parameters such as pocket filling, distance in shape space, and others could be identified. This reflects the importance of shape complementarity, but also the involvement of other factors. A subsequent analysis assessed if and how the shape and volume of a pocket or of its reference ligands could be used as a pre-filter in a virtual screening approach. During lead optimization, many parameters must be optimized simultaneously. In this context, the availability of examples of successful optimizations is valuable, because they can guide medicinal chemists in planning their analogue syntheses. However, although this is of great interest to researchers in the public domain, until now only large pharmaceutical companies have had the capacity to exploit such knowledge within their internal databases. With the aim of remedying this limitation, the SwissBioisostere database was developed and released into the public domain during this thesis. This database contains information on 21,293,355 observed substructural exchanges, corresponding to 5,586,462 unique replacements measured in 35,039 assays against 1,948 targets representing 30 families, as well as on their impact on bioactivity. An interface was developed to allow easy access to these data, accessible at http://www.swissbioisostere.ch. The ChEMBL database was used as the source of bioactivity data. A modified version of the algorithm of Hussain and Rea was implemented to identify Matched Molecular Pairs (MMPs) in the previously prepared data. 
Success scores were developed and integrated into the database to allow proposed replacements to be re-ranked according to their previously observed outcomes. The correlation between these scores and the chemical similarity of the corresponding fragments was studied. A weaker correlation than expected was detected and analysed. Different use cases of this database were envisioned, and the corresponding functionalities implemented: replacement outcomes are aggregated at the level of each assay, and it was shown that aggregation could also be performed at the level of the target or target class, subject to a case-by-case analysis. It was furthermore found that the success of a replacement depends on the activity of compound A within a pair A-B. It was shown that the probability of losing bioactivity following any molecular replacement is higher among the most active molecules than among molecules of lower activity. The potential existence of a bias linked to the process of publication in articles could be refuted. In addition, frequent medicinal chemistry strategies for exploring structure-activity relationships were analysed using the acquired data. Finally, data from pharmaceutical companies were compared with those reported in the literature. It was found that medicinal chemists in industry can access replacements that are not available in the public domain. On the other hand, a large number of replacements frequently observed in industry data could also be identified in literature data. Preferences for certain particular replacements differ between these two sources. The value of evaluating molecular replacements simultaneously according to several parameters (e.g. bioactivity and metabolic stability) was also studied. 
The studies performed highlighted that there seems to exist no universal substructural replacement that always preserves bioactivity whatever the biological context. A generalization of bioisosteric replacements therefore does not seem possible.
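The shape descriptor named in the abstract above can be sketched as follows. With the principal moments of inertia sorted as I1 <= I2 <= I3, the two normalized ratios are NPR1 = I1/I3 and NPR2 = I2/I3, which place rods near (0, 1), discs near (0.5, 0.5) and spheres near (1, 1). This is a minimal illustration under assumptions: unit atomic masses by default, toy coordinates instead of real conformers, and a closed-form eigenvalue routine for symmetric 3x3 matrices in place of a numerical library. It is not the thesis's actual pipeline.

```python
import math


def inertia_tensor(coords, masses):
    """Inertia tensor about the centre of mass for point masses in 3D."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - com[k] for k in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order (trigonometric form)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        return sorted(a[i][i] for i in range(3))  # already diagonal
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1) / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    largest = q + 2 * p * math.cos(phi)
    smallest = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return sorted([smallest, 3 * q - largest - smallest, largest])


def npr(coords, masses=None):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) for a set of 3D points."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: unweighted atoms
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    if i3 <= 0:
        raise ValueError("need at least two distinct points")
    return i1 / i3, i2 / i3


rod = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
tilted_rod = [(-1, -1, 0), (0, 0, 0), (1, 1, 0)]
disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sphere = disc + [(0, 0, 1), (0, 0, -1)]
for name, shape in [("rod", rod), ("disc", disc), ("sphere", sphere)]:
    print(name, npr(shape))
```

The tilted rod exercises the non-diagonal branch of the eigenvalue routine and should land at the same corner as the axis-aligned rod, since the ratios do not depend on orientation.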
Abstract:
MHC class II (MHCII) genes are transactivated by the NOD-like receptor (NLR) family member CIITA, which is recruited to SXY enhancers of MHCII promoters via a DNA-binding "enhanceosome" complex. NLRC5, another NLR protein, was recently found to control transcription of MHC class I (MHCI) genes. However, detailed understanding of NLRC5's target gene specificity and mechanism of action remained lacking. We performed ChIP-sequencing experiments to gain comprehensive information on NLRC5-regulated genes. In addition to classical MHCI genes, we exclusively identified novel targets encoding non-classical MHCI molecules having important functions in immunity and tolerance. ChIP-sequencing performed with Rfx5(-/-) cells, which lack the pivotal enhanceosome factor RFX5, demonstrated its strict requirement for NLRC5 recruitment. Accordingly, Rfx5-knockout mice phenocopy Nlrc5 deficiency with respect to defective MHCI expression. Analysis of B cell lines lacking RFX5, RFXAP, or RFXANK further corroborated the importance of the enhanceosome for MHCI expression. Although recruited by common DNA-binding factors, CIITA and NLRC5 exhibit non-redundant functions, shown here using double-deficient Nlrc5(-/-)CIIta(-/-) mice. These paradoxical findings were resolved by using a "de novo" motif-discovery approach showing that the SXY consensus sequence occupied by NLRC5 in vivo diverges significantly from that occupied by CIITA. These sequence differences were sufficient to determine preferential occupation and transactivation by NLRC5 or CIITA, respectively, and the S box was found to be the essential feature conferring NLRC5 specificity. These results broaden our knowledge on the transcriptional activities of NLRC5 and CIITA, revealing their dependence on shared enhanceosome factors but their recruitment to distinct enhancer motifs in vivo. 
Furthermore, we demonstrated the selectivity of NLRC5 for genes encoding MHCI or related proteins, rendering it an attractive target for therapeutic intervention. NLRC5 and CIITA thus emerge as paradigms for a novel class of transcriptional regulators dedicated to transactivating extremely few, phylogenetically related genes.
Abstract:
This report outlines the discovery, design and development of new compounds, and the structure-activity relationships for this drug category. Updated approaches to the planned synthesis of promising new ACE inhibitors are also explored.
Abstract:
In recent years, there have been major developments in the understanding of the cell cycle. It is now known that normal cellular proliferation is tightly regulated by the activation and deactivation of a series of proteins that constitute the cell cycle machinery. The expression and activity of components of the cell cycle can be altered during the development of a variety of diseases where aberrant proliferation contributes to the pathology of the illness. Apart from yielding a new source of untapped therapeutic targets, it is likely that manipulating the activity of such proteins in diseased states will provide an important route for treating proliferative disorders, and the opportunity to develop a novel class of future medicines.