957 results for degenerate test set
Abstract:
This work evaluates the efficiency of economical levels of theory for the prediction of (3)J(HH) spin-spin coupling constants, to be used when robust electronic structure methods are prohibitive. To that purpose, DFT methods such as mPW1PW91, B3LYP and PBEPBE were used to obtain coupling constants for a test set whose coupling constants are well known. Satisfactory results were obtained in most cases, with the mPW1PW91/6-31G(d,p)//B3LYP/6-31G(d,p) combination leading the set. In a second step, B3LYP was replaced by the semiempirical methods PM6 and RM1 in the geometry optimizations. Coupling constants calculated with these latter structures were at least as good as those obtained by pure DFT methods. This is a promising result, because some of the main objectives of computational chemistry - low computational cost and time, allied to high performance and precision - were attained together. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Aldolase has emerged as a promising molecular target for the treatment of human African trypanosomiasis. In recent years, owing to the increasing number of patients infected with Trypanosoma brucei, there has been an urgent need for new drugs to treat this neglected disease. In the present study, two-dimensional fragment-based quantitative structure-activity relationship (QSAR) models were generated for a series of inhibitors of aldolase. Through the application of leave-one-out and leave-many-out cross-validation procedures, significant correlation coefficients were obtained (r² = 0.98 and q² = 0.77), indicating the statistical internal and external consistency of the models. The best model was employed to predict pKi values for a series of test set compounds, and the predicted values were in good agreement with the experimental results, showing the predictive power of the model for untested compounds. Moreover, structure-based molecular modeling studies were performed to investigate the binding mode of the inhibitors in the active site of the parasitic target enzyme. The structural and QSAR results provided useful molecular information for the design of new aldolase inhibitors within this structural class.
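Several abstracts in this result list quote r² and q² statistics. As a reminder (standard definitions, not taken from any of the cited papers), with ŷ_{i/i} denoting the prediction for compound i by a model fitted with compound i left out:

```latex
r^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{i/i}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Thus q² is the leave-one-out cross-validated analogue of r² and is the more conservative of the two measures.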
Abstract:
Selective modulation of liver X receptor beta (LXR beta) has been recognized as an important approach to prevent or reverse the atherosclerotic process. In the present work, we have developed robust conformation-independent fragment-based quantitative structure-activity and structure-selectivity relationship models for a series of quinolines and cinnolines as potent modulators of the two LXR subtypes. The generated models were then used to predict the potency of an external test set, and the predicted values were in good agreement with the experimental results, indicating the potential of the models for untested compounds. The final 2D molecular recognition patterns obtained were integrated with 3D structure-based molecular modeling studies to provide useful insights into the chemical and structural determinants of increased LXR beta binding affinity and selectivity. (C) 2011 Elsevier Inc. All rights reserved.
Abstract:
Blood-brain barrier (BBB) permeation is an essential property for drugs that act in the central nervous system (CNS) for the treatment of human diseases such as epilepsy, depression, Alzheimer's disease, Parkinson's disease, and schizophrenia. In the present work, quantitative structure-property relationship (QSPR) studies were conducted for the development and validation of in silico models for the prediction of BBB permeation. The data set used has substantial chemical diversity and a relatively wide distribution of property values. The generated QSPR models showed good statistical parameters and were successfully employed for the prediction of a test set containing 48 compounds. The predictive models presented herein are useful in the identification, selection and design of new drug candidates with improved pharmacokinetic properties.
Abstract:
Quantitative structure-activity relationship (QSAR) models for the percentage of inhibition of STa-stimulated (Escherichia coli) cGMP accumulation in T84 cells are calculated by the Monte Carlo method. This endpoint represents a measure of a substance's biological activity against diarrhea. The statistical quality of the developed models is quite good. The approach is tested using three random splits of the data into training and test sets. The statistical characteristics of the three splits are as follows: (1) n = 20, r² = 0.7208, q² = 0.6583, s = 16.9, F = 46 (training set); n = 11, r² = 0.8986, s = 14.6 (test set); (2) n = 19, r² = 0.6689, q² = 0.5683, s = 17.6, F = 34 (training set); n = 12, r² = 0.8998, s = 12.1 (test set); and (3) n = 20, r² = 0.7141, q² = 0.6525, s = 14.7, F = 45 (training set); n = 11, r² = 0.8858, s = 19.5 (test set). Based on the models proposed here, hypothetical compounds that may be useful agents against diarrhea are suggested.
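As an illustration of the random-split protocol described above, a minimal sketch (hypothetical data and a generic linear regressor stand in for the Monte Carlo descriptor optimization, which is not reproduced here):

```python
# Minimal sketch of evaluating a QSAR model over random training/test splits.
# The descriptors, activities, and regressor are placeholders, not the paper's
# Monte Carlo-optimized model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(31, 5))                   # 31 hypothetical compounds, 5 descriptors
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=31)  # synthetic activity

for split in range(3):                         # three random splits, as in the abstract
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=11, random_state=split)
    model = LinearRegression().fit(X_tr, y_tr)
    pred_tr, pred_te = model.predict(X_tr), model.predict(X_te)
    s_te = np.sqrt(np.mean((y_te - pred_te) ** 2))  # RMSE, standing in for the reported s
    print(f"split {split + 1}: train r2={r2_score(y_tr, pred_tr):.3f}, "
          f"test r2={r2_score(y_te, pred_te):.3f}, s={s_te:.2f}")
```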
Abstract:
Human African trypanosomiasis, also known as sleeping sickness, is a major cause of death in Africa, and there are no safe and effective treatments available. The enzyme aldolase from Trypanosoma brucei is an attractive, validated target for drug development. A series of alkyl-glycolamido and alkyl-monoglycolate derivatives was studied employing a combination of drug design approaches. Three-dimensional quantitative structure-activity relationship (3D QSAR) models were generated using comparative molecular field analysis (CoMFA). Significant results were obtained for the best QSAR model (r² = 0.95, non-cross-validated correlation coefficient, and q² = 0.80, cross-validated correlation coefficient), indicating its predictive ability for untested compounds. The model was then used to predict values of the dependent variable (pKi) for an external test set, and the predicted values were in good agreement with the experimental results. The integration of 3D QSAR, molecular docking and molecular dynamics simulations provided further insight into the structural basis for selective inhibition of the target enzyme.
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the groups of individuals. Even though methods to analyze these data are now well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to encounter a clinician's question for which no compelling statistical method offers an answer. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array, through quality controls, to the preprocessing steps used in the data analyses in the rest of the dissertation. Chapter 2 provides a critical review of standard analysis methods, stressing the main problems they present. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, the experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to that of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4]. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences can play a crucial role; in some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option in any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability for each G2 sample of being a member of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been surmised in the literature, but no measure of significance had been given before.
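A compact sketch of the MultiSAM resampling scheme described above (a Welch t-test stands in for the SAM statistic, which is more involved; all data are synthetic):

```python
# Sketch of the MultiSAM resampling scheme: the less populated class (LPC) is
# compared against repeated equal-size random samplings of the more populated
# class (MPC); each probe's score is its recurrence among the per-iteration
# lists of differentially expressed probes.
import numpy as np
from scipy.stats import ttest_ind

def multisam_scores(lpc, mpc, n_iter=1000, alpha=0.01, seed=0):
    """lpc: (n_lpc, n_probes) expression matrix; mpc: (n_mpc, n_probes)."""
    rng = np.random.default_rng(seed)
    n_lpc, n_probes = lpc.shape
    scores = np.zeros(n_probes, dtype=int)
    for _ in range(n_iter):
        idx = rng.choice(mpc.shape[0], size=n_lpc, replace=False)
        _, p = ttest_ind(lpc, mpc[idx], axis=0, equal_var=False)
        scores += (p < alpha).astype(int)  # probe recurs in this iteration's list
    return scores                           # 0..n_iter, as in the text

# Hypothetical data: 8 LPC arrays vs. 60 MPC arrays, 500 probes.
rng = np.random.default_rng(1)
lpc = rng.normal(size=(8, 500))
mpc = rng.normal(size=(60, 500))
lpc[:, :10] += 1.5                          # 10 truly changed probes
scores = multisam_scores(lpc, mpc, n_iter=200)
print("probes with score > 0.3 * iterations:", np.sum(scores > 60))
```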
Abstract:
Coupled-cluster (CC) theory is one of the most successful methods in present-day quantum chemistry for the accurate description of molecules. The results presented in this work show that, beyond energy calculations, a range of properties such as structural parameters, vibrational frequencies, and rotational-vibrational parameters of small and medium-sized molecules can be predicted reliably and precisely. In the first part of the work, the spin-adapted coupled-cluster ansatz (SA-CC) is introduced as a new route to improving the description of open-shell systems. Here, the CC spin equations are solved in addition in order to determine the unknown wave-function parameters. This procedure guarantees that the resulting wave function is a spin eigenfunction. The implementation of the spin-adapted CC ansatz with single and double excitations (CCSD) for high-spin triplet systems is explained in detail. In the second part, CC additivity schemes are presented that rest on the assumption that electron-correlation and basis-set effects are additive. The established procedure of computing the various contributions to the energy separately, with basis sets matched to the computational cost, and then summing them is here carried over to gradients and force constants. To describe bond lengths and harmonic vibrational frequencies with experimental accuracy, inner-shell correlation effects as well as triple and quadruple excitations in the cluster operator of the wave function must be taken into account. Basis-set convergence is additionally accelerated with extrapolation methods. The bond lengths of 17 small molecules built from atoms of the first long period can thus be predicted quantitatively with an accuracy of a few hundredths of a picometer. For the vibrational frequencies of these molecules, the CC additivity scheme based on the computed force constants shows a mean absolute error of 3.5 cm⁻¹ and a standard deviation of 2.2 cm⁻¹ relative to experimental results. In addition, computed spectroscopic data for several larger molecules are presented in support of experimental investigations. The studies of the isomerization of the dihalogen sulfanes XSSX (X = F, Cl) presented in this work, and the calculation of structural and rotational-vibrational parameters for the molecules CHCl2F and CHClF2, show that the perturbative CCSD(T) approximation already provides qualitatively good predictions of experimental results. Furthermore, discrepancies between experimental and computed bond distances for the molecules boron hydride and carbenylium are resolved by taking into account the electronic contribution to the moment of inertia.
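The additivity ansatz described above can be summarized in the usual composite-scheme notation (a generic form under our reading of the abstract, not necessarily the exact partitioning used in the thesis). For a property f (energy, gradient, or force constant):

```latex
f \approx f^{\text{CCSD(T)}}_{\text{CBS}}
        + \Delta f_{\text{core}}
        + \Delta f_{\text{T}}
        + \Delta f_{\text{Q}},
\qquad
E^{\text{corr}}_{X} \approx E^{\text{corr}}_{\text{CBS}} + A\,X^{-3}
```

Each correction Δf is evaluated as the difference between a higher- and a lower-level calculation in a smaller basis, and the second relation is the standard X⁻³ extrapolation of the correlation contribution used to accelerate basis-set convergence.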
Abstract:
The thesis is aimed at a preliminary testing phase of an algorithm that, starting from acoustic data, is able to classify the fish species present in single- and multi-species hauls. The data were acquired in the coastal strip of southern Sicily during research campaigns carried out between 2002 and 2011 by the IAMC – CNR of Capo Granitola. The values of the environmental and biotic variables were recorded by acoustic methodology, together with the composition of the fish schools caught in experimental hauls: anchovies, sardines, horse mackerel, other pelagic species, and demersal fish. The methodology proposed for classifying the acoustic signals arises from the fusion of fuzzy logic and Bayes' theorem, giving rise to a modeling approach consisting of a naïve Bayes classifier operating in a fuzzy environment. Specifically, the classifier was trained using a learning sample of the percentages of the fish categories mentioned above, together with the data of some of the acoustic, biotic and abiotic observations recorded by the echosurvey on the same schools. The classifier was validated on the test set, i.e. on the data that had not been chosen for the training phase. Finally, for each haul, scatter/correlation plots of the fish groups against the simulated percentages were drawn. As a measure of agreement, the R² regression values between the real percentages and those computed by the fuzzy naïve Bayes classifier were considered. These values, being very high (0.9134-0.99667), validated the result of the classifier, which accurately discriminated the echo traces coming from the schools. The applicability of the classifier must nevertheless be tested and verified beyond the limits imposed by a thesis project; in particular, the test phase should cover different species, boundary environmental conditions different from those encountered, and the use of smaller learning samples.
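A minimal sketch of the Bayesian half of the scheme (a plain Gaussian naïve Bayes on hypothetical acoustic features; the fuzzy-logic layer of the thesis is not reproduced): per-haul species percentages are simulated by averaging posterior class probabilities over the haul's echo traces, then regressed against the true percentages.

```python
# Sketch: a naive Bayes classifier over hypothetical acoustic features of echo
# traces. The simulated species composition of a haul is the mean posterior
# probability over its traces; agreement with the true composition is measured
# by R^2, as in the thesis. The fuzzy-logic component is omitted.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
species = ["anchovy", "sardine", "horse mackerel"]
# Hypothetical training data: 300 echo traces, 4 acoustic/environmental features.
X_train = np.vstack([rng.normal(loc=k, size=(100, 4)) for k in range(3)])
y_train = np.repeat(np.arange(3), 100)
clf = GaussianNB().fit(X_train, y_train)

# A test haul with a known 50/30/20 composition.
true_pct = np.array([0.5, 0.3, 0.2])
counts = (true_pct * 200).astype(int)
X_haul = np.vstack([rng.normal(loc=k, size=(n, 4)) for k, n in enumerate(counts)])
sim_pct = clf.predict_proba(X_haul).mean(axis=0)  # simulated composition
print(dict(zip(species, sim_pct.round(3))), "R2:", r2_score(true_pct, sim_pct))
```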
Abstract:
The topics of this work comprise both methodological developments within the framework of the second-order ab initio methods CC2 and ADC(2) and applications of these developments to current research questions. The methodological extensions mainly concern transition moments between excited states; their implementation now makes the calculation of transient absorption spectra possible. The applications predominantly address the field of organic semiconductors and their photo-electronic properties, in which the hitherto little-explored triplet excimers play a central role.

The transition moments between excited states were implemented in the TURBOMOLE program package. This made it possible to calculate transition moments between states of the same multiplicity (i.e. both singlet-singlet and triplet-triplet transitions) and of different multiplicity (i.e. singlet-triplet transitions). As an extension, the calculation of spin-orbit matrix elements (SOMEs) was implemented through an interface to the ORCA program. Furthermore, this implementation also allows transitions in open-shell systems to be computed. To keep memory requirements and computation time as low as possible, the resolution-of-the-identity (RI) approximation was used. It reduces the memory requirement from O(N⁴) to O(N³), since the quantities that scale as O(N⁴) (e.g. the T2 amplitudes) can be computed very efficiently from RI intermediates and therefore need not be stored. This makes calculations feasible for medium-sized molecules (about 20-50 atoms) with an adequate basis set.

The accuracy of the transition moments between excited states was tested for a test set of small molecules as well as for selected larger organic molecules. It turned out that the error of the RI approximation is very small. The prediction of transient spectra with CC2 or ADC(2) nevertheless poses a problem, since these methods give only a very inadequate description of states that are generated mainly by double excitations with respect to the reference determinant. This is relevant for excited-state spectra, because transitions to such states can be energetically accessible and allowed. An example is discussed in this work on the basis of a singlet-singlet spectrum. For transitions between triplet states this is less problematic, since the energetically lowest double excitations are closed-shell and therefore do not occur for triplets.

Of particular interest for this work is the formation of excimers in the excited triplet state. These can arise from strong interactions between the π-electron systems of large organic molecules, such as those used as organic semiconductors in organic light-emitting diodes, and they can significantly influence the photo-electronic properties of these substances. Within this dissertation, two such systems were therefore investigated: [3.3](4,4')biphenylophane and the naphthalene dimer. For this purpose, the transient excitation spectra from the first excited triplet state were calculated, and the results were used for the interpretation of the experimental spectra.
Owing to the good agreement between the calculated and the experimental spectra, it could be shown that a coplanar arrangement of the two monomers leads to a strong coupling between locally excited and charge-transfer states. This coupling results in a significant energetic lowering of the first excited state and in a very small distance between the monomer units, with the excited state delocalized over both monomers. The strong coupling occurs at intermolecular distances ≤4 Å, which corresponds to a typical distance in organic semiconductors. In this regime, Förster-Dexter theory cannot be used for calculating such systems, since it is valid only in the limit of weak coupling.
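The resolution-of-the-identity approximation mentioned in this abstract has, in its standard textbook form (not specific to this particular implementation), the factorization

```latex
(pq\,|\,rs) \;\approx\; \sum_{P,Q} (pq\,|\,P)\,[\mathbf{V}^{-1}]_{PQ}\,(Q\,|\,rs),
\qquad V_{PQ} = (P\,|\,Q)
```

over an auxiliary basis {P}: the O(N⁴) two-electron integral tensor is assembled on the fly from three-index O(N³) quantities, which is the source of the storage reduction described above.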
Abstract:
Analyzing and modeling relationships between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects in chemical datasets is a challenging task for scientific researchers in the field of cheminformatics. Therefore, (Q)SAR model validation is essential to ensure future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities for approving the use of such models in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model is still under discussion. In this work, we empirically compare k-fold cross-validation with external test set validation. The introduced workflow allows the built and validated models to be applied to large amounts of unseen data, and the performance of the different validation approaches to be compared. Our experimental results indicate that cross-validation produces (Q)SAR models with higher predictivity than external test set validation and reduces the variance of the results. Statistical validation is important to evaluate the performance of (Q)SAR models, but does not support the user in better understanding the properties of the model or the underlying correlations. We present the 3D molecular viewer CheS-Mapper (Chemical Space Mapper), which arranges compounds in 3D space such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, like structural fragments as well as quantitative chemical descriptors. Comprehensive functionalities including clustering, alignment of compounds according to their 3D structure, and feature highlighting aid the chemist in better understanding patterns and regularities and in relating the observations to established scientific knowledge. Even though visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow for the investigation of model validation results are still lacking. We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. New functionalities in CheS-Mapper 2.0 facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. Our approach reveals whether the endpoint is modeled too specifically or too generically, and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
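A minimal sketch of the validation comparison described above (generic synthetic data and a generic classifier, not the workflow's actual datasets):

```python
# Sketch: compare k-fold cross-validation against a single external test set
# split on the same data, as in the validation comparison described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
clf = RandomForestClassifier(random_state=0)

cv_scores = cross_val_score(clf, X, y, cv=10)            # 10-fold cross-validation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
ext_score = clf.fit(X_tr, y_tr).score(X_te, y_te)        # external test set

print(f"10-fold CV: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"external test set: {ext_score:.3f}")
```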
Abstract:
The first part of this three-part review on the relevance of laboratory testing of composites and adhesives deals with approval requirements for composite materials. We compare the in vivo and in vitro literature data and discuss the relevance of in vitro analyses. The standardized ISO protocols are presented, with a focus on the evaluation of physical parameters. These tests all have a standardized protocol that describes the entire test set-up. The tests analyse flexural strength, depth of cure, susceptibility to ambient light, color stability, water sorption and solubility, and radiopacity. Some tests have a clinical correlation. A high flexural strength, for instance, decreases the risk of fractures of the marginal ridge in posterior restorations and incisal edge build-ups of restored anterior teeth. Other tests do not have a clinical correlation or the threshold values are too low, which results in an approval of materials that show inferior clinical properties (e.g., radiopacity). It is advantageous to know the test set-ups and the ideal threshold values to correctly interpret the material data. Overall, however, laboratory assessment alone cannot ensure the clinical success of a product.
Abstract:
In clinical diagnostics, it is of utmost importance to correctly identify the source of a metastatic tumor, especially if no apparent primary tumor is present. Tissue-based proteomics might allow correct tumor classification. To this end, we performed MALDI imaging to generate proteomic signatures for different tumors. These signatures were used to classify common cancer types. First, a cohort comprising tissue samples from six adenocarcinoma entities located at different organ sites (esophagus, breast, colon, liver, stomach, thyroid gland, n = 171) was classified using two algorithms on a training and a test set. On the test set, Support Vector Machine and Random Forest yielded overall accuracies of 82.74% and 81.18%, respectively. Then, colon cancer liver metastasis samples (n = 19) were introduced into the classification. The liver metastasis samples could be discriminated with high accuracy from primary tumors of colon cancer and hepatocellular carcinoma. Additionally, colon cancer liver metastasis samples could be successfully classified by using colon cancer primary tumor samples for the training of the classifier. These findings demonstrate that MALDI imaging-derived proteomic classifiers can discriminate between different tumor types at different organ sites and within the same site.
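A sketch of the classification step described above (synthetic per-spectrum feature vectors stand in for the MALDI imaging signatures; the class counts and feature dimension are hypothetical):

```python
# Sketch: SVM and Random Forest trained on synthetic per-spectrum feature
# vectors and scored by overall accuracy on a held-out test set, mirroring the
# two-algorithm comparison in the MALDI imaging study above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 171 samples, 6 tumor entities, 80 mass-spectral features (all hypothetical).
X, y = make_classification(n_samples=171, n_features=80, n_informative=30,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
for name, clf in [("SVM", SVC(kernel="linear")),
                  ("Random Forest", RandomForestClassifier(random_state=0))]:
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: overall accuracy {acc:.2%}")
```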
Abstract:
The execution of a project requires resources that are generally scarce. Classical approaches to resource allocation assume that the usage of these resources by an individual project activity is constant during the execution of that activity; in practice, however, the project manager may vary resource usage over time within prescribed bounds. This variation gives rise to the project scheduling problem, which consists in allocating the scarce resources to the project activities over time such that the project duration is minimized, the total number of resource units allocated equals the prescribed work content of each activity, and various work-content-related constraints are met. We formulate this problem for the first time as a mixed-integer linear program. Our computational results for a standard test set from the literature indicate that this model outperforms the state-of-the-art solution methods for this problem.
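As a toy illustration of this kind of mixed-integer formulation (a deliberately simplified sketch using the PuLP modeler: precedence relations and the paper's other work-content-related constraints are omitted, and all data are hypothetical):

```python
# Toy MILP for scheduling with flexible resource profiles: each activity must
# receive its full work content, per-period usage is bounded while active, a
# shared capacity limits total usage, activities run without preemption, and
# the makespan is minimized. A simplified sketch, not the paper's model.
import pulp

T = 8                                    # planning horizon (periods)
R = 6                                    # shared resource capacity per period
acts = {"A": dict(W=10, lo=1, hi=4),     # work content and per-period bounds
        "B": dict(W=8,  lo=2, hi=4)}

m = pulp.LpProblem("flexible_profiles", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (acts, range(T)), cat="Binary")   # active in t?
r = pulp.LpVariable.dicts("r", (acts, range(T)), lowBound=0)     # usage in t
s = pulp.LpVariable.dicts("s", (acts, range(T)), cat="Binary")   # start marker
C = pulp.LpVariable("makespan", lowBound=0)
m += C                                                           # objective

for j, d in acts.items():
    m += pulp.lpSum(r[j][t] for t in range(T)) == d["W"]         # work content
    m += pulp.lpSum(s[j][t] for t in range(T)) == 1              # single start
    for t in range(T):
        m += r[j][t] >= d["lo"] * x[j][t]                        # usage bounds
        m += r[j][t] <= d["hi"] * x[j][t]
        m += s[j][t] >= x[j][t] - (x[j][t - 1] if t else 0)      # no preemption
        m += C >= (t + 1) * x[j][t]                              # makespan bound
for t in range(T):
    m += pulp.lpSum(r[j][t] for j in acts) <= R                  # capacity

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("makespan:", pulp.value(C))
```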
Abstract:
BACKGROUND Retinal optical coherence tomography (OCT) permits quantification of retinal layer atrophy relevant to the assessment of neurodegeneration in multiple sclerosis (MS). Measurement artefacts may limit the use of OCT in MS research. OBJECTIVE An expert task force convened with the aim of providing guidance on the use of validated quality control (QC) criteria for the use of OCT in MS research and clinical trials. METHODS A prospective multi-centre (n = 13) study. Peripapillary ring scan QC rating of an OCT training set (n = 50) was followed by a test set (n = 50). Inter-rater agreement was calculated using kappa statistics. Results were discussed at a round table after the assessment had taken place. RESULTS The inter-rater QC agreement was substantial (kappa = 0.7). Disagreement was highest for judging signal strength (kappa = 0.40). Future steps to resolve these issues were discussed. CONCLUSION Substantial agreement for QC assessment was achieved with the aid of the OSCAR-IB criteria. The task force has developed a website for free online training and QC certification. The criteria may prove useful for future research and trials in MS using OCT as a secondary outcome measure in a multi-centre setting.
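For reference, the agreement statistic reported above is Cohen's kappa; a minimal sketch of its computation on hypothetical pass/fail QC ratings:

```python
# Sketch of the inter-rater agreement statistic used above: Cohen's kappa
# between two raters' pass/fail QC ratings (ratings shown are hypothetical).
from sklearn.metrics import cohen_kappa_score

rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohen_kappa_score(rater_1, rater_2)  # 1 = perfect, 0 = chance level
print(f"kappa = {kappa:.2f}")  # 0.47 here; 0.61-0.80 is conventionally "substantial"
```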