909 resultados para EM algorithms
Resumo:
Biological data are inherently interconnected: protein sequences are connected to their annotations, the annotations are structured into ontologies, and so on. While protein-protein interactions are already represented by graphs, in this work I am presenting how a graph structure can be used to enrich the annotation of protein sequences thanks to algorithms that analyze the graph topology. We also describe a novel solution to restrict the data generation needed for building such a graph, thanks to constraints on the data and dynamic programming. The proposed algorithm ideally improves the generation time by a factor of 5. The graph representation is then exploited to build a comprehensive database, thanks to the rising technology of graph databases. While graph databases are widely used for other kind of data, from Twitter tweets to recommendation systems, their application to bioinformatics is new. A graph database is proposed, with a structure that can be easily expanded and queried.
Resumo:
Combinatorial Optimization is becoming ever more crucial, in these days. From natural sciences to economics, passing through urban centers administration and personnel management, methodologies and algorithms with a strong theoretical background and a consolidated real-word effectiveness is more and more requested, in order to find, quickly, good solutions to complex strategical problems. Resource optimization is, nowadays, a fundamental ground for building the basements of successful projects. From the theoretical point of view, Combinatorial Optimization rests on stable and strong foundations, that allow researchers to face ever more challenging problems. However, from the application point of view, it seems that the rate of theoretical developments cannot cope with that enjoyed by modern hardware technologies, especially with reference to the one of processors industry. In this work we propose new parallel algorithms, designed for exploiting the new parallel architectures available on the market. We found that, exposing the inherent parallelism of some resolution techniques (like Dynamic Programming), the computational benefits are remarkable, lowering the execution times by more than an order of magnitude, and allowing to address instances with dimensions not possible before. We approached four Combinatorial Optimization’s notable problems: Packing Problem, Vehicle Routing Problem, Single Source Shortest Path Problem and a Network Design problem. For each of these problems we propose a collection of effective parallel solution algorithms, either for solving the full problem (Guillotine Cuts and SSSPP) or for enhancing a fundamental part of the solution method (VRP and ND). We endorse our claim by presenting computational results for all problems, either on standard benchmarks from the literature or, when possible, on data from real-world applications, where speed-ups of one order of magnitude are usually attained, not uncommonly scaling up to 40 X factors.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.
Resumo:
Intelligent Transport Systems (ITS) consists in the application of ICT to transport to offer new and improved services to the mobility of people and freights. While using ITS, travellers produce large quantities of data that can be collected and analysed to study their behaviour and to provide information to decision makers and planners. The thesis proposes innovative deployments of classification algorithms for Intelligent Transport System with the aim to support the decisions on traffic rerouting, bus transport demand and behaviour of two wheelers vehicles. The first part of this work provides an overview and a classification of a selection of clustering algorithms that can be implemented for the analysis of ITS data. The first contribution of this thesis is an innovative use of the agglomerative hierarchical clustering algorithm to classify similar travels in terms of their origin and destination, together with the proposal for a methodology to analyse drivers’ route choice behaviour using GPS coordinates and optimal alternatives. The clusters of repetitive travels made by a sample of drivers are then analysed to compare observed route choices to the modelled alternatives. The results of the analysis show that drivers select routes that are more reliable but that are more expensive in terms of travel time. Successively, different types of users of a service that provides information on the real time arrivals of bus at stop are classified using Support Vector Machines. The results shows that the results of the classification of different types of bus transport users can be used to update or complement the census on bus transport flows. Finally, the problem of the classification of accidents made by two wheelers vehicles is presented together with possible future application of clustering methodologies aimed at identifying and classifying the different types of accidents.
Resumo:
Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and the application of practical software tools and efficient algorithms from the field of computer science, aiming at the goal of enabling atmospheric scientists to analyse and to gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation. In this thesis, three practical tools are presented. Two of these tools are software systems (Insight and IWAL) for different types of processing and interactive visualization of data, the third tool is an efficient algorithm for data segmentation implemented as part of Insight.Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining at runtime data from different sources, a variety of different data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support led to additional applications of the software, from which two examples are presented: the usage of Insight as a WMS (web map service) server, and the automatic production of a sequence of images for the visualization of cyclone simulations. The core application of Insight is the provision of the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. Data segmentation usually leads to a significant reduction of the size of the considered data. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and the results of the segmentation of upper-tropospheric jet streams and cyclones as full 3D objects. Finally, IWAL is presented, which is a web application for providing an easy interactive access to meteorological data visualizations, primarily aimed at students. As a web application, the needs to retrieve all input data sets and to install and handle complex visualization tools on a local machine are avoided. The main challenge in the provision of customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. Besides the implementational details, benchmarks and the results of a user survey are presented.
Resumo:
Magnetic Resonance Spectroscopy (MRS) is an advanced clinical and research application which guarantees a specific biochemical and metabolic characterization of tissues by the detection and quantification of key metabolites for diagnosis and disease staging. The "Associazione Italiana di Fisica Medica (AIFM)" has promoted the activity of the "Interconfronto di spettroscopia in RM" working group. The purpose of the study is to compare and analyze results obtained by perfoming MRS on scanners of different manufacturing in order to compile a robust protocol for spectroscopic examinations in clinical routines. This thesis takes part into this project by using the GE Signa HDxt 1.5 T at the Pavillion no. 11 of the S.Orsola-Malpighi hospital in Bologna. The spectral analyses have been performed with the jMRUI package, which includes a wide range of preprocessing and quantification algorithms for signal analysis in the time domain. After the quality assurance on the scanner with standard and innovative methods, both spectra with and without suppression of the water peak have been acquired on the GE test phantom. The comparison of the ratios of the metabolite amplitudes over Creatine computed by the workstation software, which works on the frequencies, and jMRUI shows good agreement, suggesting that quantifications in both domains may lead to consistent results. The characterization of an in-house phantom provided by the working group has achieved its goal of assessing the solution content and the metabolite concentrations with good accuracy. The goodness of the experimental procedure and data analysis has been demonstrated by the correct estimation of the T2 of water, the observed biexponential relaxation curve of Creatine and the correct TE value at which the modulation by J coupling causes the Lactate doublet to be inverted in the spectrum. The work of this thesis has demonstrated that it is possible to perform measurements and establish protocols for data analysis, based on the physical principles of NMR, which are able to provide robust values for the spectral parameters of clinical use.
Resumo:
Die vorliegende Arbeit behandelt die Entwicklung und Verbesserung von linear skalierenden Algorithmen für Elektronenstruktur basierte Molekulardynamik. Molekulardynamik ist eine Methode zur Computersimulation des komplexen Zusammenspiels zwischen Atomen und Molekülen bei endlicher Temperatur. Ein entscheidender Vorteil dieser Methode ist ihre hohe Genauigkeit und Vorhersagekraft. Allerdings verhindert der Rechenaufwand, welcher grundsätzlich kubisch mit der Anzahl der Atome skaliert, die Anwendung auf große Systeme und lange Zeitskalen. Ausgehend von einem neuen Formalismus, basierend auf dem großkanonischen Potential und einer Faktorisierung der Dichtematrix, wird die Diagonalisierung der entsprechenden Hamiltonmatrix vermieden. Dieser nutzt aus, dass die Hamilton- und die Dichtematrix aufgrund von Lokalisierung dünn besetzt sind. Das reduziert den Rechenaufwand so, dass er linear mit der Systemgröße skaliert. Um seine Effizienz zu demonstrieren, wird der daraus entstehende Algorithmus auf ein System mit flüssigem Methan angewandt, das extremem Druck (etwa 100 GPa) und extremer Temperatur (2000 - 8000 K) ausgesetzt ist. In der Simulation dissoziiert Methan bei Temperaturen oberhalb von 4000 K. Die Bildung von sp²-gebundenem polymerischen Kohlenstoff wird beobachtet. Die Simulationen liefern keinen Hinweis auf die Entstehung von Diamant und wirken sich daher auf die bisherigen Planetenmodelle von Neptun und Uranus aus. Da das Umgehen der Diagonalisierung der Hamiltonmatrix die Inversion von Matrizen mit sich bringt, wird zusätzlich das Problem behandelt, eine (inverse) p-te Wurzel einer gegebenen Matrix zu berechnen. Dies resultiert in einer neuen Formel für symmetrisch positiv definite Matrizen. Sie verallgemeinert die Newton-Schulz Iteration, Altmans Formel für beschränkte und nicht singuläre Operatoren und Newtons Methode zur Berechnung von Nullstellen von Funktionen. Der Nachweis wird erbracht, dass die Konvergenzordnung immer mindestens quadratisch ist und adaptives Anpassen eines Parameters q in allen Fällen zu besseren Ergebnissen führt.
Resumo:
In this thesis we present techniques that can be used to speed up the calculation of perturbative matrix elements for observables with many legs ($n = 3, 4, 5, 6, 7, ldots$). We investigate several ways to achieve this, including the use of Monte Carlo methods, the leading-color approximation, numerically less precise but faster operations, and SSE-vectorization. An important idea is the use of enquote{random polarizations} for which we derive subtraction terms for the real corrections in next-to-leading order calculations. We present the effectiveness of all these methods in the context of electron-positron scattering to $n$ jets, $n$ ranging from two to seven.
Resumo:
In vielen Industriezweigen, zum Beispiel in der Automobilindustrie, werden Digitale Versuchsmodelle (Digital MockUps) eingesetzt, um die Konstruktion und die Funktion eines Produkts am virtuellen Prototypen zu überprüfen. Ein Anwendungsfall ist dabei die Überprüfung von Sicherheitsabständen einzelner Bauteile, die sogenannte Abstandsanalyse. Ingenieure ermitteln dabei für bestimmte Bauteile, ob diese in ihrer Ruhelage sowie während einer Bewegung einen vorgegeben Sicherheitsabstand zu den umgebenden Bauteilen einhalten. Unterschreiten Bauteile den Sicherheitsabstand, so muss deren Form oder Lage verändert werden. Dazu ist es wichtig, die Bereiche der Bauteile, welche den Sicherhabstand verletzen, genau zu kennen. rnrnIn dieser Arbeit präsentieren wir eine Lösung zur Echtzeitberechnung aller den Sicherheitsabstand unterschreitenden Bereiche zwischen zwei geometrischen Objekten. Die Objekte sind dabei jeweils als Menge von Primitiven (z.B. Dreiecken) gegeben. Für jeden Zeitpunkt, in dem eine Transformation auf eines der Objekte angewendet wird, berechnen wir die Menge aller den Sicherheitsabstand unterschreitenden Primitive und bezeichnen diese als die Menge aller toleranzverletzenden Primitive. Wir präsentieren in dieser Arbeit eine ganzheitliche Lösung, welche sich in die folgenden drei großen Themengebiete unterteilen lässt.rnrnIm ersten Teil dieser Arbeit untersuchen wir Algorithmen, die für zwei Dreiecke überprüfen, ob diese toleranzverletzend sind. Hierfür präsentieren wir verschiedene Ansätze für Dreiecks-Dreiecks Toleranztests und zeigen, dass spezielle Toleranztests deutlich performanter sind als bisher verwendete Abstandsberechnungen. Im Fokus unserer Arbeit steht dabei die Entwicklung eines neuartigen Toleranztests, welcher im Dualraum arbeitet. In all unseren Benchmarks zur Berechnung aller toleranzverletzenden Primitive beweist sich unser Ansatz im dualen Raum immer als der Performanteste.rnrnDer zweite Teil dieser Arbeit befasst sich mit Datenstrukturen und Algorithmen zur Echtzeitberechnung aller toleranzverletzenden Primitive zwischen zwei geometrischen Objekten. Wir entwickeln eine kombinierte Datenstruktur, die sich aus einer flachen hierarchischen Datenstruktur und mehreren Uniform Grids zusammensetzt. Um effiziente Laufzeiten zu gewährleisten ist es vor allem wichtig, den geforderten Sicherheitsabstand sinnvoll im Design der Datenstrukturen und der Anfragealgorithmen zu beachten. Wir präsentieren hierzu Lösungen, die die Menge der zu testenden Paare von Primitiven schnell bestimmen. Darüber hinaus entwickeln wir Strategien, wie Primitive als toleranzverletzend erkannt werden können, ohne einen aufwändigen Primitiv-Primitiv Toleranztest zu berechnen. In unseren Benchmarks zeigen wir, dass wir mit unseren Lösungen in der Lage sind, in Echtzeit alle toleranzverletzenden Primitive zwischen zwei komplexen geometrischen Objekten, bestehend aus jeweils vielen hunderttausend Primitiven, zu berechnen. rnrnIm dritten Teil präsentieren wir eine neuartige, speicheroptimierte Datenstruktur zur Verwaltung der Zellinhalte der zuvor verwendeten Uniform Grids. Wir bezeichnen diese Datenstruktur als Shrubs. Bisherige Ansätze zur Speicheroptimierung von Uniform Grids beziehen sich vor allem auf Hashing Methoden. Diese reduzieren aber nicht den Speicherverbrauch der Zellinhalte. In unserem Anwendungsfall haben benachbarte Zellen oft ähnliche Inhalte. Unser Ansatz ist in der Lage, den Speicherbedarf der Zellinhalte eines Uniform Grids, basierend auf den redundanten Zellinhalten, verlustlos auf ein fünftel der bisherigen Größe zu komprimieren und zur Laufzeit zu dekomprimieren.rnrnAbschießend zeigen wir, wie unsere Lösung zur Berechnung aller toleranzverletzenden Primitive Anwendung in der Praxis finden kann. Neben der reinen Abstandsanalyse zeigen wir Anwendungen für verschiedene Problemstellungen der Pfadplanung.
Resumo:
Computing the weighted geometric mean of large sparse matrices is an operation that tends to become rapidly intractable, when the size of the matrices involved grows. However, if we are not interested in the computation of the matrix function itself, but just in that of its product times a vector, the problem turns simpler and there is a chance to solve it even when the matrix mean would actually be impossible to compute. Our interest is motivated by the fact that this calculation has some practical applications, related to the preconditioning of some operators arising in domain decomposition of elliptic problems. In this thesis, we explore how such a computation can be efficiently performed. First, we exploit the properties of the weighted geometric mean and find several equivalent ways to express it through real powers of a matrix. Hence, we focus our attention on matrix powers and examine how well-known techniques can be adapted to the solution of the problem at hand. In particular, we consider two broad families of approaches for the computation of f(A) v, namely quadrature formulae and Krylov subspace methods, and generalize them to the pencil case f(A\B) v. Finally, we provide an extensive experimental evaluation of the proposed algorithms and also try to assess how convergence speed and execution time are influenced by some characteristics of the input matrices. Our results suggest that a few elements have some bearing on the performance and that, although there is no best choice in general, knowing the conditioning and the sparsity of the arguments beforehand can considerably help in choosing the best strategy to tackle the problem.
Resumo:
The problem of localizing a scatterer, which represents a tumor, in a homogeneous circular domain, which represents a breast, is addressed. A breast imaging method based on microwaves is considered. The microwave imaging involves to several techniques for detecting, localizing and characterizing tumors in breast tissues. In all such methods an electromagnetic inverse scattering problem exists. For the scattering detection method, an algorithm based on a linear procedure solution, inspired by MUltiple SIgnal Classification algorithm (MUSIC) and Time Reversal method (TR), is implemented. The algorithm returns a reconstructed image of the investigation domain in which it is detected the scatterer position. This image is called pseudospectrum. A preliminary performance analysis of the algorithm vying the working frequency is performed: the resolution and the signal-to-noise ratio of the pseudospectra are improved if a multi-frequency approach is considered. The Geometrical Mean-MUSIC algorithm (GM- MUSIC) is proposed as multi-frequency method. The performance of the GMMUSIC is tested in different real life computer simulations. The performed analysis shows that the algorithm detects the scatterer until the electrical parameters of the breast are known. This is an evident limit, since, in a real life situation, the anatomy of the breast is unknown. An improvement in GM-MUSIC is proposed: the Eye-GMMUSIC algorithm. Eye-GMMUSIC algorithm needs no a priori information on the electrical parameters of the breast. It is an optimizing algorithm based on the pattern search algorithm: it searches the breast parameters which minimize the Signal-to-Clutter Mean Ratio (SCMR) in the signal. Finally, the GM-MUSIC and the Eye-GMMUSIC algorithms are tested on a microwave breast cancer detection system consisting of an dipole antenna, a Vector Network Analyzer and a novel breast phantom built at University of Bologna. The reconstruction of the experimental data confirm the GM-MUSIC ability to localize a scatterer in a homogeneous medium.
Resumo:
Il cancro della prostata (PCa) è il tumore maligno non-cutaneo più diffuso tra gli uomini ed è il secondo tumore che miete più vittime nei paesi occidentali. La necessità di nuove tecniche non invasive per la diagnosi precoce del PCa è aumentata negli anni. 1H-MRS (proton magnetic resonance spectroscopy) e 1H-MRSI (proton magnetic resonance spectroscopy imaging) sono tecniche avanzate di spettroscopia in risonanza magnetica che permettono di individuare presenza di metaboliti come citrato, colina, creatina e in alcuni casi poliammine in uno o più voxel nel tessuto prostatico. L’abbondanza o l’assenza di uno di questi metaboliti rende possibile discriminare un tessuto sano da uno patologico. Le tecniche di spettroscopia RM sono correntemente utilizzate nella pratica clinica per cervello e fegato, con l’utilizzo di software dedicati per l’analisi degli spettri. La quantificazione di metaboliti nella prostata invece può risultare difficile a causa del basso rapporto segnale/rumore (SNR) degli spettri e del forte accoppiamento-j del citrato. Lo scopo principale di questo lavoro è di proporre un software prototipo per la quantificazione automatica di citrato, colina e creatina nella prostata. Lo sviluppo del programma e dei suoi algoritmi è stato portato avanti all’interno dell’IRST (Istituto Romagnolo per lo Studio e la cura dei Tumori) con l’aiuto dell’unità di fisica sanitaria. Il cuore del programma è un algoritmo iterativo per il fit degli spettri che fa uso di simulazioni MRS sviluppate con il pacchetto di librerie GAMMA in C++. L’accuratezza delle quantificazioni è stata testata con dei fantocci realizzati all’interno dei laboratori dell’istituto. Tutte le misure spettroscopiche sono state eseguite con il nuovo scanner Philips Ingenia 3T, una delle machine di risonanza magnetica più avanzate per applicazioni cliniche. Infine, dopo aver eseguito i test in vitro sui fantocci, sono stati acquisiti gli spettri delle prostate di alcuni volontari sani, per testare se il programma fosse in grado di lavorare in condizioni di basso SNR.
Resumo:
I Polar Codes sono la prima classe di codici a correzione d’errore di cui è stato dimostrato il raggiungimento della capacità per ogni canale simmetrico, discreto e senza memoria, grazie ad un nuovo metodo introdotto recentemente, chiamato ”Channel Polarization”. In questa tesi verranno descritti in dettaglio i principali algoritmi di codifica e decodifica. In particolare verranno confrontate le prestazioni dei simulatori sviluppati per il ”Successive Cancellation Decoder” e per il ”Successive Cancellation List Decoder” rispetto ai risultati riportati in letteratura. Al fine di migliorare la distanza minima e di conseguenza le prestazioni, utilizzeremo uno schema concatenato con il polar code come codice interno ed un CRC come codice esterno. Proporremo inoltre una nuova tecnica per analizzare la channel polarization nel caso di trasmissione su canale AWGN che risulta il modello statistico più appropriato per le comunicazioni satellitari e nelle applicazioni deep space. In aggiunta, investigheremo l’importanza di una accurata approssimazione delle funzioni di polarizzazione.
Resumo:
In the last years radar sensor networks for localization and tracking in indoor environment have generated more and more interest, especially for anti-intrusion security systems. These networks often use Ultra Wide Band (UWB) technology, which consists in sending very short (few nanoseconds) impulse signals. This approach guarantees high resolution and accuracy and also other advantages such as low price, low power consumption and narrow-band interference (jamming) robustness. In this thesis the overall data processing (done in MATLAB environment) is discussed, starting from experimental measures from sensor devices, ending with the 2D visualization of targets movements over time and focusing mainly on detection and localization algorithms. Moreover, two different scenarios and both single and multiple target tracking are analyzed.
Resumo:
Lo streaming è una tecnica per trasferire contenuti multimediali sulla rete globale, utilizzato per esempio da servizi come YouTube e Netflix; dopo una breve attesa, durante la quale un buffer di sicurezza viene riempito, l'utente può usufruire del contenuto richiesto. Cisco e Sandvine, che con cadenza regolare pubblicano bollettini sullo stato di Internet, affermano che lo streaming video ha, e avrà sempre di più, un grande impatto sulla rete globale. Il buon design delle applicazioni di streaming riveste quindi un ruolo importante, sia per la soddisfazione degli utenti che per la stabilità dell'infrastruttura. HTTP Adaptive Streaming indica una famiglia di implementazioni volta a offrire la migliore qualità video possibile (in termini di bit rate) in funzione della bontà della connessione Internet dell'utente finale: il riproduttore multimediale può cambiare in ogni momento il bit rate, scegliendolo in un insieme predefinito, adattandosi alle condizioni della rete. Per ricavare informazioni sullo stato della connettività, due famiglie di metodi sono possibili: misurare la velocità di scaricamento dei precedenti trasferimenti (approccio rate-based), oppure, come recentemente proposto da Netflix, utilizzare l'occupazione del buffer come dato principale (buffer-based). In questo lavoro analizziamo algoritmi di adattamento delle due famiglie, con l'obiettivo di confrontarli su metriche riguardanti la soddisfazione degli utenti, l'utilizzo della rete e la competizione su un collo di bottiglia. I risultati dei nostri test non definiscono un chiaro vincitore, riconoscendo comunque la bontà della nuova proposta, ma evidenziando al contrario che gli algoritmi buffer-based non sempre riescono ad allocare in modo imparziale le risorse di rete.