37 results for Machine to Machine
in AMS Tesi di Laurea - Alm@DL - Università di Bologna
Abstract:
The 1D extended Hubbard model with soft-shoulder potential has proved very difficult to study, owing to its non-solvability and to the competition between the terms of the Hamiltonian. We therefore investigated its phase diagram for filling n=2/5 and soft-shoulder potential range r=2 using Machine Learning techniques. This led to a rich phase diagram: calling U and V the parameters associated with the Hubbard potential and the soft-shoulder potential respectively, we found that for V<5 and U>3 the system is always in the Tomonaga-Luttinger Liquid phase, then becomes a Cluster Luttinger Liquid for 5
Abstract:
Estimating emissions, both during homologation and in standard driving, is one of the new challenges that the automotive industry has to face. New European and American regulations will allow ever lower Carbon Monoxide emissions and will require all vehicles to be able to monitor their own pollutant production. Since numerical models are too computationally expensive and too approximate, new solutions based on Machine Learning are replacing standard techniques. In this project we considered a real V12 Internal Combustion Engine and propose a novel approach that pushes Random Forests to generate meaningful predictions even in extreme cases (extrapolation, very high frequency peaks, noisy instrumentation, etc.). The work also proposes a data preprocessing pipeline for strongly unbalanced datasets and a reinterpretation of the regression problem as a classification problem in a logarithmic quantized domain. Results have been evaluated for two different models, representing a pure interpolation scenario (the more standard one) and an extrapolation scenario, to test the out-of-bounds robustness of the model. The metrics employed take into account different aspects that can affect the homologation procedure, so the final analysis focuses on combining all the specific performances to reach overall conclusions.
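The idea of recasting regression as classification in a logarithmic quantized domain can be illustrated with a minimal sketch. The abstract does not give the bin layout, so the value range, the number of bins and all function names below are assumptions made purely for illustration.

```python
import math

def make_log_bins(lo, hi, n_bins):
    """Return n_bins + 1 bin edges spaced evenly in log10 between lo and hi."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    return [10 ** (log_lo + i * step) for i in range(n_bins + 1)]

def to_class(value, edges):
    """Map a positive target value to the index of its logarithmic bin.

    Values outside the covered range are clamped to the first or last bin.
    """
    n_bins = len(edges) - 1
    log_lo, log_hi = math.log10(edges[0]), math.log10(edges[-1])
    pos = (math.log10(value) - log_lo) / (log_hi - log_lo) * n_bins
    return min(max(int(math.floor(pos)), 0), n_bins - 1)

def to_value(label, edges):
    """Map a class index back to the geometric centre of its bin."""
    return math.sqrt(edges[label] * edges[label + 1])

# Hypothetical range: three bins covering 1 to 1000 (arbitrary units).
edges = make_log_bins(1.0, 1000.0, 3)
labels = [to_class(v, edges) for v in (5.0, 50.0, 500.0)]
```

A classifier such as a Random Forest would then be trained on the class indices instead of the raw values, and `to_value` would turn each predicted class back into an emission estimate. Logarithmic bins give small and large values comparable relative resolution, which is one plausible reason to prefer them for strongly skewed targets.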
Abstract:
This thesis presents a study of Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. The study ranges from a deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set up machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with a focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WLCG) computing centers (Tiers), and in particular the distributed analysis system sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. A detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, makes it possible to gain valuable insight into storage occupancy over time and into different access patterns, and ultimately to extract suggested actions based on this information (e.g. targeted disk clean-up and/or data replication). In this sense, applying Machine Learning techniques makes it possible to learn from past data and to gain predictive potential for future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, and also discusses the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns at different levels of depth. Chapter 4 offers a brief introduction to basic machine learning concepts, introduces their application in CMS, and discusses the results obtained with this approach in the context of this thesis.
Abstract:
The aim of this thesis project is to automatically localize HCC tumors in the human liver and subsequently predict whether the tumor will undergo microvascular infiltration (MVI), the initial stage of metastasis development. The input data for the work were partly supplied by Sant'Orsola Hospital and partly downloaded from online medical databases. Two U-Net models have been implemented for the automatic segmentation of the livers and of the HCC malignancies within them. The segmentation models have been evaluated with the Intersection-over-Union and Dice Coefficient metrics. The outcomes obtained for the automatic liver segmentation are quite good (IOU = 0.82; DC = 0.90); the outcomes obtained for the automatic tumor segmentation (IOU = 0.35; DC = 0.46) are, instead, affected by some limitations: it can be stated that the algorithm is almost always able to detect the location of the tumor, but it tends to underestimate its dimensions. The purpose is to obtain the CT images of the HCC tumors, which are necessary for feature extraction. The 14 Haralick features calculated from the 3D-GLCM, the 120 Radiomic features and the patients' clinical information are collected to build a dataset of 153 features. The goal is then to build a model able to discriminate, based on the given features, between the tumors that will undergo MVI and those that will not. This task can be seen as a classification problem: each tumor needs to be classified as either “MVI positive” or “MVI negative”. Feature selection techniques are implemented to identify the most descriptive features for the problem at hand, and then a set of classification models is trained and compared. Among all of them, the models with the best performance (around 80-84% ± 8-15%) turn out to be the XGBoost Classifier, the SGD Classifier and the Logistic Regression models (without penalization and with Lasso, Ridge or Elastic Net penalization).
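The two segmentation metrics named above can be sketched as follows. This is a minimal illustration on flattened binary masks, not the evaluation code of the thesis; the function name and the toy masks are assumptions.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equally sized flat binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_sum = sum(1 for p in pred if p)
    truth_sum = sum(1 for t in truth if t)
    union = pred_sum + truth_sum - inter
    if union == 0:
        # Both masks are empty: treat this as a perfect match.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_sum + truth_sum)

# Toy masks: one overlapping pixel, three pixels in the union.
iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so the Dice Coefficient is never lower than the IoU. A prediction that finds the tumor but underestimates its extent loses overlap relative to the union, which lowers both scores.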
Abstract:
The dissertation starts by describing the phenomena behind the growing importance recently acquired by satellite applications. The spread of this technology has implications, such as an increase in maintenance costs, which motivates the interest in developing advanced techniques that give spacecraft greater autonomy in health monitoring. Machine learning techniques are widely employed to lay a foundation for effective fault detection systems that examine telemetry data. Telemetry consists of a considerable amount of information; therefore, the adopted algorithms must be able to handle multivariate data while coping with the limitations imposed by on-board hardware. Within the framework of outlier detection, the dissertation addresses unsupervised machine learning methods. In the unsupervised scenario, no prior knowledge of the data behavior is assumed. Specifically, two models are considered, namely Local Outlier Factor and One-Class Support Vector Machines. Their performance is compared in terms of both the achieved prediction accuracy and the corresponding computational cost. Both models are trained and tested on the same sets of time series data in a variety of settings, aimed at gaining insight into the effect of increasing dimensionality. The results support the claim that both models, combined with proper tuning of their characteristic parameters, successfully serve as outlier detectors for multivariate time series data. Nevertheless, in this specific context, Local Outlier Factor outperforms One-Class SVM, in that it proves to be more stable over a wider range of input parameter values. This property is especially valuable in unsupervised learning since it suggests that the model is prone to adapt to unforeseen patterns.
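As an illustration of the first of the two detectors, here is a minimal pure-Python sketch of the Local Outlier Factor score. It is a simplified textbook version (ties at the k-th distance are broken by input order, and all points are assumed distinct), not the implementation used in the dissertation; the function name and the toy data are assumptions.

```python
import math

def local_outlier_factor(points, k):
    """Return the LOF score of every point; scores well above 1 flag outliers.

    Simplifications: exactly k neighbours per point (ties broken by input
    order) and distinct points, so no reachability sum is zero.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Toy data: four points on a unit square plus one distant point.
scores = local_outlier_factor([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)
```

The neighbourhood size `k` is the characteristic parameter that has to be tuned here. Points inside a uniformly dense cluster score close to 1, while an isolated point scores much higher.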
Abstract:
The aim of TinyML is to bring the capability of Machine Learning to ultra-low-power devices, typically under a milliwatt, thereby breaking the traditional power barrier that prevents widely distributed machine intelligence. TinyML allows greater responsiveness and privacy by performing inference on-device and near the sensor, while avoiding the energy cost of wireless communication, which at this scale is far higher than that of computation. In addition, TinyML's efficiency enables a class of smart, battery-powered, always-on applications that can revolutionize the real-time collection and processing of data. This emerging field, which is the culmination of a lot of innovation, is ready to speed up its growth in the coming years. In this thesis, we deploy three models on a microcontroller. For each model, datasets are retrieved from an online repository and preprocessed as required. The model is then trained on a split of the preprocessed data to obtain the best possible accuracy. The trained model is then converted to C so that it can be deployed on the microcontroller. Finally, we take a step towards incorporating the model into the microcontroller by implementing and evaluating an interface that lets the user use the microcontroller's sensors. The thesis has four chapters. The first gives an introduction to TinyML. The second helps set up the TinyML environment. The third covers a major use of TinyML, Wake Word Detection. The final chapter deals with Gesture Recognition in TinyML.
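The abstract does not say how the trained model is converted to C. One common approach in TinyML workflows is to embed the serialized model as a constant byte array in a C source file, similar to the output of `xxd -i`. The sketch below assumes that approach; the function name, the array name and the sample bytes are hypothetical.

```python
def bytes_to_c_array(data, name="model_data", per_line=12):
    """Render a bytes object as C source defining a const byte array."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )

# Placeholder bytes standing in for a serialized model file.
source = bytes_to_c_array(bytes([0x1c, 0x00, 0x00, 0x00]), name="g_model")
```

The generated text can be written to a `.c` or `.cc` file and compiled into the firmware, so the model is stored alongside the program and no file system is needed on the microcontroller.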
Abstract:
The final goal of the thesis should be a real-world application in a production test data environment. This includes pre-processing the data, building models and visualizing the results. To this end, different machine learning models oriented towards outlier prediction should be investigated using a real dataset. Finally, the different outlier prediction algorithms should be compared and their performance discussed.
Abstract:
Recognizing road surface conditions using only the data collected by the smartphone of a cyclist riding a bicycle is a research area that has so far been little explored. For this thesis a dedicated application was developed which, combined with Python scripts, makes it possible to recognize different types of asphalt. The application collects the data detected by the motion sensors built into the smartphone, which records the movements while the cyclist is riding. The smartphone is fixed in a dedicated holder mounted on the bicycle's handlebar and records data from the gyroscope, accelerometer and magnetometer. The data are stored in CSV files, which are processed into a single DataSet containing all the collected data together with the features extracted by dedicated Python scripts. Each record is assigned a cluster based on the results produced by K-means, and these results are later used to train supervised algorithms. The purpose of the algorithms is to recognize the type of road surface from these data. For training, the DataSet was divided into two parts: the training set, from which the algorithms learn to classify the data, and the test set, on which the algorithms apply what they have learned to output the classification they deem appropriate. Comparing the algorithms' predictions with what the data actually represent gives a measure of the algorithm's accuracy.
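The pipeline described above (K-means cluster labels, then a supervised model evaluated on a held-out test set) can be sketched as follows. The abstract does not name the features or the supervised algorithms, so the two-dimensional toy features, the 1-nearest-neighbour classifier and all function names are assumptions chosen to keep the example short.

```python
import math

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic, evenly spread initialisation."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical per-window features, e.g. (mean, spread) of vertical acceleration.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.05, 0.1),
            (2.0, 2.1), (2.2, 1.9), (1.9, 2.0), (2.1, 2.2)]
labels, _ = kmeans(features, k=2)
train_x, train_y = features[0::2], labels[0::2]   # training set
test_x, test_y = features[1::2], labels[1::2]     # test set
predictions = [predict_1nn(train_x, train_y, x) for x in test_x]
score = accuracy(test_y, predictions)
```

Note that the accuracy obtained this way measures agreement with the cluster labels produced by K-means, so it is only as meaningful as the clustering itself.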
Abstract:
The thesis concerned the development of a mobile application that uses Augmented Reality and Machine Learning in the context of biodiversity. Specifically, an AI model was built that classifies images of flowers. The model was then integrated into Android in order to create an app that can recognize specific flower species, identify the pollinating insects attracted to them, and display them in Augmented Reality.
Abstract:
The scientific success of the LHC experiments at CERN depends heavily on the availability of computing resources that efficiently store, process, and analyse the amount of data collected every year. This is ensured by the Worldwide LHC Computing Grid infrastructure, which connects computing centres distributed all over the world through high-performance networks. The LHC has an ambitious experimental program for the coming years, which includes large investments and improvements both in the hardware of the detectors and in the software and computing systems, in order to deal with the huge increase in the event rate expected from the High Luminosity LHC (HL-LHC) phase and, consequently, with the huge amount of data that will be produced. In recent years the role of Artificial Intelligence has become relevant in the High Energy Physics (HEP) world. Machine Learning (ML) and Deep Learning algorithms have been used successfully in many areas of HEP, such as online and offline reconstruction programs, detector simulation, object reconstruction, identification and Monte Carlo generation, and they will surely be crucial in the HL-LHC phase. This thesis aims to contribute to a CMS R&D project on an ML "as a Service" solution for HEP needs (MLaaS4HEP). It consists of a data service able to run an entire ML pipeline (reading data, processing data, training ML models, serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources. The framework has been updated with new features in the data preprocessing phase, allowing more flexibility for the user. Since the MLaaS4HEP framework is experiment-agnostic, the ATLAS Higgs Boson ML challenge has been chosen as the physics use case, with the aim of testing MLaaS4HEP and the contribution made by this work.
Abstract:
The aim of this essay, which focuses on patent translation, is to compare the use of Computer-Assisted Translation (CAT) and Machine Translation (MT). During my curricular internship at a specialized translation agency called Centro Traduzioni Imolese, I was able to practice patent translation thanks to CAT tools like SDL Trados Studio, something I had never studied at university in Forlì. Nowadays, however, Machine Translation is widely used in patent translation as well, owing to the vast number of technical terms in patents and their repetitiveness: the machine can translate all repeated terms automatically and rapidly with the same word, thanks to the use of corpora and translation memories linked to the patent field. In the first chapter I will define what a patent is and introduce the concept of patent literature; afterwards, I will illustrate the differences between Computer-Assisted Translation and Machine Translation as used in patent translation. In the second chapter I will translate two portions of patent 102019000018530 via the Matecat online application, translating the first part with CAT and the second part with MT, then doing the same for the second portion selected from the patent. Finally, in the third chapter, I will analyse the two translations, comparing the results in order to discover which is the more efficient method for translating patents.
Abstract:
As a consequence of the diffusion of next generation sequencing techniques, metagenomics databases have become one of the most promising repositories of information about the features and behavior of microorganisms. One of the subjects that can be studied from those data is bacterial populations. Next generation sequencing techniques make it possible to study the bacterial population within an environment by sampling genetic material directly from it, without the need to culture a similar population in vitro and observe its behavior. As a drawback, it is quite complex to extract information from those data, and there is usually more than one way to do so; antimicrobial resistance (AMR) is no exception. In this study we will discuss how the quantified AMR, which concerns the genotype of the bacteria, can be related to the bacterial phenotype and its actual level of resistance against a specific substance. In order to obtain quantitative information about the bacterial genotype, we will evaluate the resistome from the read libraries, aligning them against the CARD database. With those data, we will test various machine learning algorithms for predicting the bacterial phenotype. The samples that we use should resemble those that could be obtained from a natural context, but are actually produced by a read library simulation tool. In this way we are able to design the populations with bacteria of known genotype, so that we can rely on a secure ground truth for training and testing our algorithms.
Abstract:
Alzheimer's disease is still incurable. In recent years the progressive increase in life expectancy has contributed to a higher incidence of this disease, especially in countries with the highest average age, including Italy. Prevention is one of the few ways to curb its development, and this text analyses the potential of some Machine Learning techniques for building diagnostic support models for Alzheimer's. After an introduction to Alzheimer's disease and to the general workings of Machine Learning, two of the most promising techniques for diagnosing neurological diseases are presented and examined in depth, namely the Support Vector Machine (SVM) and the Convolutional Neural Network (CNN), along with their results, strengths and main weaknesses. The conclusion addresses the possible future of artificial intelligence, with particular attention to healthcare, and discusses the main difficulties it faces before being commercialized, together with plausible solutions.
Abstract:
In recent times, significant research effort has focused on how deformable linear objects (DLOs) can be manipulated in real-world applications such as the assembly of wiring harnesses for the automotive and aerospace sectors. This remains an open topic because of the difficulty of accurately modelling the behaviour of these objects and of simulating a task involving their manipulation across a variety of different scenarios. These problems have led to the development of data-driven approaches in which machine learning techniques are exploited to obtain reliable solutions. However, this approach makes the solution difficult to extend, since the learning must be repeated almost from scratch whenever the scenario changes. It follows that some model-based methodology must be introduced to generalize the results and reduce the training effort accordingly. The objective of this thesis is to develop a solution for DLO manipulation to assemble a wiring harness for the automotive sector, based on the adaptation of a base trajectory set by means of reinforcement learning methods. The idea is to create trajectory planning software capable of solving the proposed task, reducing where possible the learning time, which takes place in real time, while still delivering suitable performance and reliability. The solution has been implemented on a collaborative 7-DOF Panda robot at the Laboratory of Automation and Robotics of the University of Bologna. Experimental results are reported showing that the robot is capable of optimizing the manipulation of the DLOs, gaining experience as the task is repeated, while at the same time showing a high success rate from the very beginning of the learning phase.
Abstract:
The topic of biodiversity has become increasingly important in recent decades because of the risks, due to human activities, to which the entire natural world is constantly exposed. In this context environmental education becomes ever more important for raising people's awareness and ensuring that everyone can take due care to respect and preserve nature. This project was created to explore the theme of awareness-raising through the development of a native Android application that can classify pollinating insects and that, thanks to the integration of gamification elements, can motivate users to deepen their knowledge. The thesis is divided into three chapters: the first describes the concepts of biodiversity, gamification and citizen science on which the work is based; the second covers the design phase, structuring the database and graphical interfaces and identifying the best technologies to use; finally, the third chapter presents the complete implementation of the project, describing its functionality in detail.