973 resultados para Tabu search algorithms
Resumo:
The noise power spectrum (NPS) is the reference metric for understanding the noise content in computed tomography (CT) images. To evaluate the noise properties of clinical multidetector (MDCT) scanners, local 2D and 3D NPSs were computed for different acquisition reconstruction parameters.A 64- and a 128-MDCT scanners were employed. Measurements were performed on a water phantom in axial and helical acquisition modes. CT dose index was identical for both installations. Influence of parameters such as the pitch, the reconstruction filter (soft, standard and bone) and the reconstruction algorithm (filtered-back projection (FBP), adaptive statistical iterative reconstruction (ASIR)) were investigated. Images were also reconstructed in the coronal plane using a reformat process. Then 2D and 3D NPS methods were computed.In axial acquisition mode, the 2D axial NPS showed an important magnitude variation as a function of the z-direction when measured at the phantom center. In helical mode, a directional dependency with lobular shape was observed while the magnitude of the NPS was kept constant. Important effects of the reconstruction filter, pitch and reconstruction algorithm were observed on 3D NPS results for both MDCTs. With ASIR, a reduction of the NPS magnitude and a shift of the NPS peak to the low frequency range were visible. 2D coronal NPS obtained from the reformat images was impacted by the interpolation when compared to 2D coronal NPS obtained from 3D measurements.The noise properties of volume measured in last generation MDCTs was studied using local 3D NPS metric. However, impact of the non-stationarity noise effect may need further investigations.
Resumo:
The research reported in this series of article aimed at (1) automating the search of questioned ink specimens in ink reference collections and (2) at evaluating the strength of ink evidence in a transparent and balanced manner. These aims require that ink samples are analysed in an accurate and reproducible way and that they are compared in an objective and automated way. This latter requirement is due to the large number of comparisons that are necessary in both scenarios. A research programme was designed to (a) develop a standard methodology for analysing ink samples in a reproducible way, (b) comparing automatically and objectively ink samples and (c) evaluate the proposed methodology in forensic contexts. This report focuses on the last of the three stages of the research programme. The calibration and acquisition process and the mathematical comparison algorithms were described in previous papers [C. Neumann, P. Margot, New perspectives in the use of ink evidence in forensic science-Part I: Development of a quality assurance process for forensic ink analysis by HPTLC, Forensic Sci. Int. 185 (2009) 29-37; C. Neumann, P. Margot, New perspectives in the use of ink evidence in forensic science-Part II: Development and testing of mathematical algorithms for the automatic comparison of ink samples analysed by HPTLC, Forensic Sci. Int. 185 (2009) 38-50]. In this paper, the benefits and challenges of the proposed concepts are tested in two forensic contexts: (1) ink identification and (2) ink evidential value assessment. The results show that different algorithms are better suited for different tasks. This research shows that it is possible to build digital ink libraries using the most commonly used ink analytical technique, i.e. high-performance thin layer chromatography, despite its reputation of lacking reproducibility. More importantly, it is possible to assign evidential value to ink evidence in a transparent way using a probabilistic model. It is therefore possible to move away from the traditional subjective approach, which is entirely based on experts' opinion, and which is usually not very informative. While there is room for the improvement, this report demonstrates the significant gains obtained over the traditional subjective approach for the search of ink specimens in ink databases, and the interpretation of their evidential value.
Resumo:
The state of the art to describe image quality in medical imaging is to assess the performance of an observer conducting a task of clinical interest. This can be done by using a model observer leading to a figure of merit such as the signal-to-noise ratio (SNR). Using the non-prewhitening (NPW) model observer, we objectively characterised the evolution of its figure of merit in various acquisition conditions. The NPW model observer usually requires the use of the modulation transfer function (MTF) as well as noise power spectra. However, although the computation of the MTF poses no problem when dealing with the traditional filtered back-projection (FBP) algorithm, this is not the case when using iterative reconstruction (IR) algorithms, such as adaptive statistical iterative reconstruction (ASIR) or model-based iterative reconstruction (MBIR). Given that the target transfer function (TTF) had already shown it could accurately express the system resolution even with non-linear algorithms, we decided to tune the NPW model observer, replacing the standard MTF by the TTF. It was estimated using a custom-made phantom containing cylindrical inserts surrounded by water. The contrast differences between the inserts and water were plotted for each acquisition condition. Then, mathematical transformations were performed leading to the TTF. As expected, the first results showed a dependency of the image contrast and noise levels on the TTF for both ASIR and MBIR. Moreover, FBP also proved to be dependent of the contrast and noise when using the lung kernel. Those results were then introduced in the NPW model observer. We observed an enhancement of SNR every time we switched from FBP to ASIR to MBIR. IR algorithms greatly improve image quality, especially in low-dose conditions. Based on our results, the use of MBIR could lead to further dose reduction in several clinical applications.
Resumo:
Recently, several anonymization algorithms have appeared for privacy preservation on graphs. Some of them are based on random-ization techniques and on k-anonymity concepts. We can use both of them to obtain an anonymized graph with a given k-anonymity value. In this paper we compare algorithms based on both techniques in orderto obtain an anonymized graph with a desired k-anonymity value. We want to analyze the complexity of these methods to generate anonymized graphs and the quality of the resulting graphs.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
Resumo:
3 Summary 3. 1 English The pharmaceutical industry has been facing several challenges during the last years, and the optimization of their drug discovery pipeline is believed to be the only viable solution. High-throughput techniques do participate actively to this optimization, especially when complemented by computational approaches aiming at rationalizing the enormous amount of information that they can produce. In siiico techniques, such as virtual screening or rational drug design, are now routinely used to guide drug discovery. Both heavily rely on the prediction of the molecular interaction (docking) occurring between drug-like molecules and a therapeutically relevant target. Several softwares are available to this end, but despite the very promising picture drawn in most benchmarks, they still hold several hidden weaknesses. As pointed out in several recent reviews, the docking problem is far from being solved, and there is now a need for methods able to identify binding modes with a high accuracy, which is essential to reliably compute the binding free energy of the ligand. This quantity is directly linked to its affinity and can be related to its biological activity. Accurate docking algorithms are thus critical for both the discovery and the rational optimization of new drugs. In this thesis, a new docking software aiming at this goal is presented, EADock. It uses a hybrid evolutionary algorithm with two fitness functions, in combination with a sophisticated management of the diversity. EADock is interfaced with .the CHARMM package for energy calculations and coordinate handling. A validation was carried out on 37 crystallized protein-ligand complexes featuring 11 different proteins. The search space was defined as a sphere of 15 R around the center of mass of the ligand position in the crystal structure, and conversely to other benchmarks, our algorithms was fed with optimized ligand positions up to 10 A root mean square deviation 2MSD) from the crystal structure. This validation illustrates the efficiency of our sampling heuristic, as correct binding modes, defined by a RMSD to the crystal structure lower than 2 A, were identified and ranked first for 68% of the complexes. The success rate increases to 78% when considering the five best-ranked clusters, and 92% when all clusters present in the last generation are taken into account. Most failures in this benchmark could be explained by the presence of crystal contacts in the experimental structure. EADock has been used to understand molecular interactions involved in the regulation of the Na,K ATPase, and in the activation of the nuclear hormone peroxisome proliferatoractivated receptors a (PPARa). It also helped to understand the action of common pollutants (phthalates) on PPARy, and the impact of biotransformations of the anticancer drug Imatinib (Gleevec®) on its binding mode to the Bcr-Abl tyrosine kinase. Finally, a fragment-based rational drug design approach using EADock was developed, and led to the successful design of new peptidic ligands for the a5ß1 integrin, and for the human PPARa. In both cases, the designed peptides presented activities comparable to that of well-established ligands such as the anticancer drug Cilengitide and Wy14,643, respectively. 3.2 French Les récentes difficultés de l'industrie pharmaceutique ne semblent pouvoir se résoudre que par l'optimisation de leur processus de développement de médicaments. Cette dernière implique de plus en plus. de techniques dites "haut-débit", particulièrement efficaces lorsqu'elles sont couplées aux outils informatiques permettant de gérer la masse de données produite. Désormais, les approches in silico telles que le criblage virtuel ou la conception rationnelle de nouvelles molécules sont utilisées couramment. Toutes deux reposent sur la capacité à prédire les détails de l'interaction moléculaire entre une molécule ressemblant à un principe actif (PA) et une protéine cible ayant un intérêt thérapeutique. Les comparatifs de logiciels s'attaquant à cette prédiction sont flatteurs, mais plusieurs problèmes subsistent. La littérature récente tend à remettre en cause leur fiabilité, affirmant l'émergence .d'un besoin pour des approches plus précises du mode d'interaction. Cette précision est essentielle au calcul de l'énergie libre de liaison, qui est directement liée à l'affinité du PA potentiel pour la protéine cible, et indirectement liée à son activité biologique. Une prédiction précise est d'une importance toute particulière pour la découverte et l'optimisation de nouvelles molécules actives. Cette thèse présente un nouveau logiciel, EADock, mettant en avant une telle précision. Cet algorithme évolutionnaire hybride utilise deux pressions de sélections, combinées à une gestion de la diversité sophistiquée. EADock repose sur CHARMM pour les calculs d'énergie et la gestion des coordonnées atomiques. Sa validation a été effectuée sur 37 complexes protéine-ligand cristallisés, incluant 11 protéines différentes. L'espace de recherche a été étendu à une sphère de 151 de rayon autour du centre de masse du ligand cristallisé, et contrairement aux comparatifs habituels, l'algorithme est parti de solutions optimisées présentant un RMSD jusqu'à 10 R par rapport à la structure cristalline. Cette validation a permis de mettre en évidence l'efficacité de notre heuristique de recherche car des modes d'interactions présentant un RMSD inférieur à 2 R par rapport à la structure cristalline ont été classés premier pour 68% des complexes. Lorsque les cinq meilleures solutions sont prises en compte, le taux de succès grimpe à 78%, et 92% lorsque la totalité de la dernière génération est prise en compte. La plupart des erreurs de prédiction sont imputables à la présence de contacts cristallins. Depuis, EADock a été utilisé pour comprendre les mécanismes moléculaires impliqués dans la régulation de la Na,K ATPase et dans l'activation du peroxisome proliferatoractivated receptor a (PPARa). Il a également permis de décrire l'interaction de polluants couramment rencontrés sur PPARy, ainsi que l'influence de la métabolisation de l'Imatinib (PA anticancéreux) sur la fixation à la kinase Bcr-Abl. Une approche basée sur la prédiction des interactions de fragments moléculaires avec protéine cible est également proposée. Elle a permis la découverte de nouveaux ligands peptidiques de PPARa et de l'intégrine a5ß1. Dans les deux cas, l'activité de ces nouveaux peptides est comparable à celles de ligands bien établis, comme le Wy14,643 pour le premier, et le Cilengitide (PA anticancéreux) pour la seconde.
Resumo:
We consider the numerical treatment of the optical flow problem by evaluating the performance of the trust region method versus the line search method. To the best of our knowledge, the trust region method is studied here for the first time for variational optical flow computation. Four different optical flow models are used to test the performance of the proposed algorithm combining linear and nonlinear data terms with quadratic and TV regularization. We show that trust region often performs better than line search; especially in the presence of non-linearity and non-convexity in the model.
Resumo:
This work focuses on the prediction of the two main nitrogenous variables that describe the water quality at the effluent of a Wastewater Treatment Plant. We have developed two kind of Neural Networks architectures based on considering only one output or, in the other hand, the usual five effluent variables that define the water quality: suspended solids, biochemical organic matter, chemical organic matter, total nitrogen and total Kjedhal nitrogen. Two learning techniques based on a classical adaptative gradient and a Kalman filter have been implemented. In order to try to improve generalization and performance we have selected variables by means genetic algorithms and fuzzy systems. The training, testing and validation sets show that the final networks are able to learn enough well the simulated available data specially for the total nitrogen
Resumo:
As modern molecular biology moves towards the analysis of biological systems as opposed to their individual components, the need for appropriate mathematical and computational techniques for understanding the dynamics and structure of such systems is becoming more pressing. For example, the modeling of biochemical systems using ordinary differential equations (ODEs) based on high-throughput, time-dense profiles is becoming more common-place, which is necessitating the development of improved techniques to estimate model parameters from such data. Due to the high dimensionality of this estimation problem, straight-forward optimization strategies rarely produce correct parameter values, and hence current methods tend to utilize genetic/evolutionary algorithms to perform non-linear parameter fitting. Here, we describe a completely deterministic approach, which is based on interval analysis. This allows us to examine entire sets of parameters, and thus to exhaust the global search within a finite number of steps. In particular, we show how our method may be applied to a generic class of ODEs used for modeling biochemical systems called Generalized Mass Action Models (GMAs). In addition, we show that for GMAs our method is amenable to the technique in interval arithmetic called constraint propagation, which allows great improvement of its efficiency. To illustrate the applicability of our method we apply it to some networks of biochemical reactions appearing in the literature, showing in particular that, in addition to estimating system parameters in the absence of noise, our method may also be used to recover the topology of these networks.
Resumo:
Inference of Markov random field images segmentation models is usually performed using iterative methods which adapt the well-known expectation-maximization (EM) algorithm for independent mixture models. However, some of these adaptations are ad hoc and may turn out numerically unstable. In this paper, we review three EM-like variants for Markov random field segmentation and compare their convergence properties both at the theoretical and practical levels. We specifically advocate a numerical scheme involving asynchronous voxel updating, for which general convergence results can be established. Our experiments on brain tissue classification in magnetic resonance images provide evidence that this algorithm may achieve significantly faster convergence than its competitors while yielding at least as good segmentation results.
Resumo:
The study reports a set of forty proteinogenic histidine-containing dipeptides as potential carbonyl quenchers. The peptides were chosen to cover as exhaustively as possible the accessible chemical space, and their quenching activities toward 4-hydroxy-2-nonenal (HNE) and pyridoxal were evaluated by HPLC analyses. The peptides were capped at the C-terminus as methyl esters or amides to favor their resistance to proteolysis and diastereoisomeric pairs were considered to reveal the influence of configuration on quenching. On average, the examined dipeptides are less active than the parent compound carnosine (βAla + His) thus emphasizing the unfavorable effect of the shortening of the βAla residue as confirmed by the control dipeptide Gly-His. Nevertheless, some peptides show promising activities toward HNE combined with a remarkable selectivity. The results emphasize the beneficial role of aromatic and positively charged residues, while negatively charged and H-bonding side chains show a detrimental effect on quenching. As a trend, ester derivatives are slightly more active than amides while heterochiral peptides are more active than their homochiral diastereoisomer. Overall, the results reveal that quenching activity strongly depends on conformational effects and vicinal residues (as evidenced by the reported QSAR analysis), offering insightful clues for the design of improved carbonyl quenchers and to rationalize the specific reactivity of histidine residues within proteins.
Resumo:
Combinatorial optimization involves finding an optimal solution in a finite set of options; many everyday life problems are of this kind. However, the number of options grows exponentially with the size of the problem, such that an exhaustive search for the best solution is practically infeasible beyond a certain problem size. When efficient algorithms are not available, a practical approach to obtain an approximate solution to the problem at hand, is to start with an educated guess and gradually refine it until we have a good-enough solution. Roughly speaking, this is how local search heuristics work. These stochastic algorithms navigate the problem search space by iteratively turning the current solution into new candidate solutions, guiding the search towards better solutions. The search performance, therefore, depends on structural aspects of the search space, which in turn depend on the move operator being used to modify solutions. A common way to characterize the search space of a problem is through the study of its fitness landscape, a mathematical object comprising the space of all possible solutions, their value with respect to the optimization objective, and a relationship of neighborhood defined by the move operator. The landscape metaphor is used to explain the search dynamics as a sort of potential function. The concept is indeed similar to that of potential energy surfaces in physical chemistry. Borrowing ideas from that field, we propose to extend to combinatorial landscapes the notion of the inherent network formed by energy minima in energy landscapes. In our case, energy minima are the local optima of the combinatorial problem, and we explore several definitions for the network edges. At first, we perform an exhaustive sampling of local optima basins of attraction, and define weighted transitions between basins by accounting for all the possible ways of crossing the basins frontier via one random move. Then, we reduce the computational burden by only counting the chances of escaping a given basin via random kick moves that start at the local optimum. Finally, we approximate network edges from the search trajectory of simple search heuristics, mining the frequency and inter-arrival time with which the heuristic visits local optima. Through these methodologies, we build a weighted directed graph that provides a synthetic view of the whole landscape, and that we can characterize using the tools of complex networks science. We argue that the network characterization can advance our understanding of the structural and dynamical properties of hard combinatorial landscapes. We apply our approach to prototypical problems such as the Quadratic Assignment Problem, the NK model of rugged landscapes, and the Permutation Flow-shop Scheduling Problem. We show that some network metrics can differentiate problem classes, correlate with problem non-linearity, and predict problem hardness as measured from the performances of trajectory-based local search heuristics.
Resumo:
Networks are evolving toward a ubiquitous model in which heterogeneousdevices are interconnected. Cryptographic algorithms are required for developing securitysolutions that protect network activity. However, the computational and energy limitationsof network devices jeopardize the actual implementation of such mechanisms. In thispaper, we perform a wide analysis on the expenses of launching symmetric and asymmetriccryptographic algorithms, hash chain functions, elliptic curves cryptography and pairingbased cryptography on personal agendas, and compare them with the costs of basic operatingsystem functions. Results show that although cryptographic power costs are high and suchoperations shall be restricted in time, they are not the main limiting factor of the autonomyof a device.