971 resultados para Fast Computation Algorithm
Resumo:
The prediction of binding modes (BMs) occurring between a small molecule and a target protein of biological interest has become of great importance for drug development. The overwhelming diversity of needs leaves room for docking approaches addressing specific problems. Nowadays, the universe of docking software ranges from fast and user friendly programs to algorithmically flexible and accurate approaches. EADock2 is an example of the latter. Its multiobjective scoring function was designed around the CHARMM22 force field and the FACTS solvation model. However, the major drawback of such a software design lies in its computational cost. EADock dihedral space sampling (DSS) is built on the most efficient features of EADock2, namely its hybrid sampling engine and multiobjective scoring function. Its performance is equivalent to that of EADock2 for drug-like ligands, while the CPU time required has been reduced by several orders of magnitude. This huge improvement was achieved through a combination of several innovative features including an automatic bias of the sampling toward putative binding sites, and a very efficient tree-based DSS algorithm. When the top-scoring prediction is considered, 57% of BMs of a test set of 251 complexes were reproduced within 2 Å RMSD to the crystal structure. Up to 70% were reproduced when considering the five top scoring predictions. The success rate is lower in cross-docking assays but remains comparable with that of the latest version of AutoDock that accounts for the protein flexibility. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011.
Resumo:
Descriptors based on Molecular Interaction Fields (MIF) are highly suitable for drug discovery, but their size (thousands of variables) often limits their application in practice. Here we describe a simple and fast computational method that extracts from a MIF a handful of highly informative points (hot spots) which summarize the most relevant information. The method was specifically developed for drug discovery, is fast, and does not require human supervision, being suitable for its application on very large series of compounds. The quality of the results has been tested by running the method on the ligand structure of a large number of ligand-receptor complexes and then comparing the position of the selected hot spots with actual atoms of the receptor. As an additional test, the hot spots obtained with the novel method were used to obtain GRIND-like molecular descriptors which were compared with the original GRIND. In both cases the results show that the novel method is highly suitable for describing ligand-receptor interactions and compares favorably with other state-of-the-art methods.
Resumo:
The generalization of simple correspondence analysis, for two categorical variables, to multiple correspondence analysis where they may be three or more variables, is not straighforward, both from a mathematical and computational point of view. In this paper we detail the exact computational steps involved in performing a multiple correspondence analysis, including the special aspects of adjusting the principal inertias to correct the percentages of inertia, supplementary points and subset analysis. Furthermore, we give the algorithm for joint correspondence analysis where the cross-tabulations of all unique pairs of variables are analysed jointly. The code in the R language for every step of the computations is given, as well as the results of each computation.
Resumo:
The in situ hybridization Allen Mouse Brain Atlas was mined for proteases expressed in the somatosensory cerebral cortex. Among the 480 genes coding for protease/peptidases, only four were found enriched in cortical interneurons: Reln coding for reelin; Adamts8 and Adamts15 belonging to the class of metzincin proteases involved in reshaping the perineuronal net (PNN) and Mme encoding for Neprilysin, the enzyme degrading amyloid β-peptides. The pattern of expression of metalloproteases (MPs) was analyzed by single-cell reverse transcriptase multiplex PCR after patch clamp and was compared with the expression of 10 canonical interneurons markers and 12 additional genes from the Allen Atlas. Clustering of these genes by K-means algorithm displays five distinct clusters. Among these five clusters, two fast-spiking interneuron clusters expressing the calcium-binding protein Pvalb were identified, one co-expressing Pvalb with Sst (PV-Sst) and another co-expressing Pvalb with three metallopeptidases Adamts8, Adamts15 and Mme (PV-MP). By using Wisteria floribunda agglutinin, a specific marker for PNN, PV-MP interneurons were found surrounded by PNN, whereas the ones expressing Sst, PV-Sst, were not.
Resumo:
An algorithm for computing correlation filters based on synthetic discriminant functions that can be displayed on current spatial light modulators is presented. The procedure is nondivergent, computationally feasible, and capable of producing multiple solutions, thus overcoming some of the pitfalls of previous methods.
Resumo:
This paper proposes a very fast method for blindly approximating a nonlinear mapping which transforms a sum of random variables. The estimation is surprisingly good even when the basic assumption is not satisfied.We use the method for providing a good initialization for inverting post-nonlinear mixtures and Wiener systems. Experiments show that the algorithm speed is strongly improved and the asymptotic performance is preserved with a very low extra computational cost.
Resumo:
Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the amount of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent from the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease. Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, containing each four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data. Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interactions problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
Resumo:
When dealing with nonlinear blind processing algorithms (deconvolution or post-nonlinear source separation), complex mathematical estimations must be done giving as a result very slow algorithms. This is the case, for example, in speech processing, spike signals deconvolution or microarray data analysis. In this paper, we propose a simple method to reduce computational time for the inversion of Wiener systems or the separation of post-nonlinear mixtures, by using a linear approximation in a minimum mutual information algorithm. Simulation results demonstrate that linear spline interpolation is fast and accurate, obtaining very good results (similar to those obtained without approximation) while computational time is dramatically decreased. On the other hand, cubic spline interpolation also obtains similar good results, but due to its intrinsic complexity, the global algorithm is much more slow and hence not useful for our purpose.
Resumo:
Résumé: Le développement rapide de nouvelles technologies comme l'imagerie médicale a permis l'expansion des études sur les fonctions cérébrales. Le rôle principal des études fonctionnelles cérébrales est de comparer l'activation neuronale entre différents individus. Dans ce contexte, la variabilité anatomique de la taille et de la forme du cerveau pose un problème majeur. Les méthodes actuelles permettent les comparaisons interindividuelles par la normalisation des cerveaux en utilisant un cerveau standard. Les cerveaux standards les plus utilisés actuellement sont le cerveau de Talairach et le cerveau de l'Institut Neurologique de Montréal (MNI) (SPM99). Les méthodes de recalage qui utilisent le cerveau de Talairach, ou celui de MNI, ne sont pas suffisamment précises pour superposer les parties plus variables d'un cortex cérébral (p.ex., le néocortex ou la zone perisylvienne), ainsi que les régions qui ont une asymétrie très importante entre les deux hémisphères. Le but de ce projet est d'évaluer une nouvelle technique de traitement d'images basée sur le recalage non-rigide et utilisant les repères anatomiques. Tout d'abord, nous devons identifier et extraire les structures anatomiques (les repères anatomiques) dans le cerveau à déformer et celui de référence. La correspondance entre ces deux jeux de repères nous permet de déterminer en 3D la déformation appropriée. Pour les repères anatomiques, nous utilisons six points de contrôle qui sont situés : un sur le gyrus de Heschl, un sur la zone motrice de la main et le dernier sur la fissure sylvienne, bilatéralement. Evaluation de notre programme de recalage est accomplie sur les images d'IRM et d'IRMf de neuf sujets parmi dix-huit qui ont participés dans une étude précédente de Maeder et al. Le résultat sur les images anatomiques, IRM, montre le déplacement des repères anatomiques du cerveau à déformer à la position des repères anatomiques de cerveau de référence. La distance du cerveau à déformer par rapport au cerveau de référence diminue après le recalage. Le recalage des images fonctionnelles, IRMf, ne montre pas de variation significative. Le petit nombre de repères, six points de contrôle, n'est pas suffisant pour produire les modifications des cartes statistiques. Cette thèse ouvre la voie à une nouvelle technique de recalage du cortex cérébral dont la direction principale est le recalage de plusieurs points représentant un sillon cérébral. Abstract : The fast development of new technologies such as digital medical imaging brought to the expansion of brain functional studies. One of the methodolgical key issue in brain functional studies is to compare neuronal activation between individuals. In this context, the great variability of brain size and shape is a major problem. Current methods allow inter-individual comparisions by means of normalisation of subjects' brains in relation to a standard brain. A largerly used standard brains are the proportional grid of Talairach and Tournoux and the Montreal Neurological Insititute standard brain (SPM99). However, there is a lack of more precise methods for the superposition of more variable portions of the cerebral cortex (e.g, neocrotex and perisyvlian zone) and in brain regions highly asymmetric between the two cerebral hemipsheres (e.g. planum termporale). The aim of this thesis is to evaluate a new image processing technique based on non-linear model-based registration. Contrary to the intensity-based, model-based registration uses spatial and not intensitiy information to fit one image to another. We extract identifiable anatomical features (point landmarks) in both deforming and target images and by their correspondence we determine the appropriate deformation in 3D. As landmarks, we use six control points that are situated: one on the Heschl'y Gyrus, one on the motor hand area, and one on the sylvian fissure, bilaterally. The evaluation of this model-based approach is performed on MRI and fMRI images of nine of eighteen subjects participating in the Maeder et al. study. Results on anatomical, i.e. MRI, images, show the mouvement of the deforming brain control points to the location of the reference brain control points. The distance of the deforming brain to the reference brain is smallest after the registration compared to the distance before the registration. Registration of functional images, i.e fMRI, doesn't show a significant variation. The small number of registration landmarks, i.e. six, is obvious not sufficient to produce significant modification on the fMRI statistical maps. This thesis opens the way to a new computation technique for cortex registration in which the main directions will be improvement of the registation algorithm, using not only one point as landmark, but many points, representing one particular sulcus.
Resumo:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of difference aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify theirsuitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application- specific information is data about the workload extractedfrom an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the workunits are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.
Resumo:
In two previous papers [J. Differential Equations, 228 (2006), pp. 530 579; Discrete Contin. Dyn. Syst. Ser. B, 6 (2006), pp. 1261 1300] we have developed fast algorithms for the computations of invariant tori in quasi‐periodic systems and developed theorems that assess their accuracy. In this paper, we study the results of implementing these algorithms and study their performance in actual implementations. More importantly, we note that, due to the speed of the algorithms and the theoretical developments about their reliability, we can compute with confidence invariant objects close to the breakdown of their hyperbolicity properties. This allows us to identify a mechanism of loss of hyperbolicity and measure some of its quantitative regularities. We find that some systems lose hyperbolicity because the stable and unstable bundles approach each other but the Lyapunov multipliers remain away from 1. We find empirically that, close to the breakdown, the distances between the invariant bundles and the Lyapunov multipliers which are natural measures of hyperbolicity depend on the parameters, with power laws with universal exponents. We also observe that, even if the rigorous justifications in [J. Differential Equations, 228 (2006), pp. 530-579] are developed only for hyperbolic tori, the algorithms work also for elliptic tori in Hamiltonian systems. We can continue these tori and also compute some bifurcations at resonance which may lead to the existence of hyperbolic tori with nonorientable bundles. We compute manifolds tangent to nonorientable bundles.
Resumo:
Although fetal anatomy can be adequately viewed in new multi-slice MR images, many critical limitations remain for quantitative data analysis. To this end, several research groups have recently developed advanced image processing methods, often denoted by super-resolution (SR) techniques, to reconstruct from a set of clinical low-resolution (LR) images, a high-resolution (HR) motion-free volume. It is usually modeled as an inverse problem where the regularization term plays a central role in the reconstruction quality. Literature has been quite attracted by Total Variation energies because of their ability in edge preserving but only standard explicit steepest gradient techniques have been applied for optimization. In a preliminary work, it has been shown that novel fast convex optimization techniques could be successfully applied to design an efficient Total Variation optimization algorithm for the super-resolution problem. In this work, two major contributions are presented. Firstly, we will briefly review the Bayesian and Variational dual formulations of current state-of-the-art methods dedicated to fetal MRI reconstruction. Secondly, we present an extensive quantitative evaluation of our SR algorithm previously introduced on both simulated fetal and real clinical data (with both normal and pathological subjects). Specifically, we study the robustness of regularization terms in front of residual registration errors and we also present a novel strategy for automatically select the weight of the regularization as regards the data fidelity term. Our results show that our TV implementation is highly robust in front of motion artifacts and that it offers the best trade-off between speed and accuracy for fetal MRI recovery as in comparison with state-of-the art methods.
Resumo:
We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons it is a very good option to compute quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method in a parallel computer. In these flows we compute invariant tori of dimensions up to 5, by taking suitable sections.
Resumo:
Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.
Resumo:
The identifiability of the parameters of a heat exchanger model without phase change was studied in this Master’s thesis using synthetically made data. A fast, two-step Markov chain Monte Carlo method (MCMC) was tested with a couple of case studies and a heat exchanger model. The two-step MCMC-method worked well and decreased the computation time compared to the traditional MCMC-method. The effect of measurement accuracy of certain control variables to the identifiability of parameters was also studied. The accuracy used did not seem to have a remarkable effect to the identifiability of parameters. The use of the posterior distribution of parameters in different heat exchanger geometries was studied. It would be computationally most efficient to use the same posterior distribution among different geometries in the optimisation of heat exchanger networks. According to the results, this was possible in the case when the frontal surface areas were the same among different geometries. In the other cases the same posterior distribution can be used for optimisation too, but that will give a wider predictive distribution as a result. For condensing surface heat exchangers the numerical stability of the simulation model was studied. As a result, a stable algorithm was developed.