37 results for Depth Estimation, Deep Learning, Disparity Estimation, Computer Vision, Stereo Vision
One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems including, ambitiously, the one observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing from their experiences. This ability, which is found in various degrees in all intelligent biological beings, allows them to adapt and properly react to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone towards the creation of intelligent artificial agents. Modern Deep Learning approaches allowed researchers and industries to achieve great advancements towards the resolution of many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this current age of renewed interest in AI allowed for the creation of extremely useful applications, a concerningly limited effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is the catastrophic forgetting phenomenon. This phenomenon, which was discovered in the 90s, naturally occurs in Deep Learning architectures where classic learning paradigms are applied when learning incrementally from a stream of experiences. This dissertation revolves around the Continual Learning field, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work will focus on a comprehensive view of continual learning by considering algorithmic, benchmarking, and applicative aspects of this field. 
This dissertation will also touch on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, and the theoretical and practical aspects concerning public competitions in this field.
Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.
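The grouping step described above can be illustrated with a minimal sketch. It is not the thesis pipeline: a normalised bag-of-words vector stands in for the word2vec embedding of each message, the error messages are hypothetical, and the K-means routine is a plain textbook version with a deterministic farthest-point initialisation.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lower-case a message and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", message.lower())


def vectorize(messages):
    """Unit-length bag-of-words vectors (a stand-in for word2vec embeddings)."""
    vocab = sorted({tok for m in messages for tok in tokenize(m)})
    vectors = []
    for m in messages:
        counts = Counter(tokenize(m))
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical transfer error messages, for illustration only.
messages = [
    "Connection timed out while contacting remote host",
    "Connection timed out after 300 seconds",
    "Checksum mismatch between source and destination file",
    "Checksum mismatch detected for destination file",
]
labels = kmeans(vectorize(messages), k=2)
```

In this toy input the two timeout messages share a cluster and the two checksum messages share the other, which is the kind of grouping an operator would be shown as a suggestion.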
The Standard Model (SM) of particle physics predicts the existence of a Higgs field responsible for the generation of particles' mass. However, some aspects of this theory remain unresolved, suggesting the presence of new physics Beyond the Standard Model (BSM) with the production of new particles at an energy scale higher than the current experimental limits. Additional Higgs bosons are, in fact, predicted by theoretical extensions of the SM, including the Minimal Supersymmetric Standard Model (MSSM). In the MSSM, the Higgs sector consists of two Higgs doublets, resulting in five physical Higgs particles: two charged bosons $H^{\pm}$, two neutral scalars $h$ and $H$, and one pseudoscalar $A$. The work presented in this thesis is dedicated to the search for neutral non-Standard Model Higgs bosons decaying to two muons in the model-independent MSSM scenario. Proton-proton collision data recorded by the CMS experiment at the CERN LHC at a center-of-mass energy of 13 TeV are used, corresponding to an integrated luminosity of $35.9\ \text{fb}^{-1}$. This search is sensitive to neutral Higgs bosons produced either via the gluon fusion process or in association with a $\text{b}\bar{\text{b}}$ quark pair. The extensive use of Machine and Deep Learning techniques is a fundamental element in the discrimination between simulated signal and background events. A new network structure called parameterised Neural Network (pNN) has been implemented, replacing a whole set of neural networks, each trained at a specific mass hypothesis value, with a single neural network able to generalise well and interpolate over the entire mass range considered. The results of the pNN signal/background discrimination are used to set a model-independent 95\% confidence level expected upper limit on the production cross section times branching ratio for a generic $\phi$ boson decaying into a muon pair in the 130 to 1000 GeV range.
Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited by the challenges of applying these methods to neuroimaging data. In this study, first, a type of data leakage caused by a slice-level data split, introduced during training and validation of a 2D CNN, is surveyed and a quantitative assessment of the resulting overestimation of the model’s performance is presented. Second, interpretable, leakage-free deep learning software written in Python, with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model taking brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction of the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed, aimed at 1) classifying Alzheimer's disease and healthy subjects, 2) examining the neural correlates of the disease that causes cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to expose a biased deep learning model. Structural magnetic resonance imaging (MRI) data of 200 subjects were used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cerebral cortex atrophy were highlighted by the visualization tools.
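The slice-level leakage mentioned above arises when 2D slices from the same subject land in both the training and validation sets. A minimal sketch of a subject-level split that avoids it is given below. It is not the software described in the abstract: the function name, the subject identifiers and the 20% validation fraction are assumptions made for illustration.

```python
import random
from collections import defaultdict


def subject_level_split(slices, val_fraction=0.2, seed=0):
    """Split (subject_id, slice_index) records so that no subject
    contributes slices to both the training and the validation set."""
    by_subject = defaultdict(list)
    for subject_id, slice_index in slices:
        by_subject[subject_id].append((subject_id, slice_index))

    # Shuffle subjects, not slices: the subject is the unit of splitting.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_val = max(1, round(len(subjects) * val_fraction))
    val_subjects = set(subjects[:n_val])

    train = [rec for s in subjects if s not in val_subjects for rec in by_subject[s]]
    val = [rec for s in subjects if s in val_subjects for rec in by_subject[s]]
    return train, val


# Hypothetical dataset: 10 subjects with 5 slices each.
slices = [(f"sub-{i:02d}", k) for i in range(10) for k in range(5)]
train, val = subject_level_split(slices)
train_subjects = {s for s, _ in train}
val_subjects = {s for s, _ in val}
```

Shuffling and splitting the slice records directly would instead scatter each subject's highly correlated slices across both sets, which is what inflates validation scores.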
The term Artificial Intelligence has acquired a lot of baggage since its introduction and, in its current incarnation, is synonymous with Deep Learning (DL). The sudden availability of data and computing resources has opened the gates to myriad applications. Not all are created equal, though, and problems may arise, especially in fields not closely related to the tasks pursued by the tech companies that spearheaded DL. The perspective of practitioners seems to be changing, however. Human-Centric AI has emerged in the last few years as a new way of thinking about DL and AI applications from the ground up, with special attention to their relationship with humans. The goal is to design a system that can gracefully integrate into already established workflows, as in many real-world scenarios AI may not be good enough to completely replace humans. Often this replacement may even be unneeded or undesirable. Another important perspective comes from Andrew Ng, a DL pioneer, who recently started shifting the focus of development from “better models” towards better, and smaller, data. He named his approach Data-Centric AI. Without downplaying the importance of pushing the state of the art in DL, we must recognize that if the goal is to create a tool for humans to use, more raw performance may not translate into more utility for the final user. A Human-Centric approach is compatible with a Data-Centric one, and we find that the two overlap nicely when human expertise is used as the driving force behind data quality. This thesis documents a series of case studies where these approaches were employed, to different extents, to guide the design and implementation of intelligent systems. We found that human expertise proved crucial in improving datasets and models. The last chapter includes a slight deviation, with studies on the pandemic that still preserve the human- and data-centric perspective.
The development of Next Generation Sequencing has brought Biology into the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structural and functional annotation. My research has focused on proteins and has so far led to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints on the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performance. With DeepREx, I conducted a large-scale analysis investigating the relationship between the solvent exposure of a residue and its probability of being pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring in a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performance and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state of the art. All methods are published in international journals and are available as user-friendly web servers.
They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.
The Cherenkov Telescope Array (CTA) will be the next-generation ground-based observatory to study the universe in the very-high-energy domain. The observatory will rely on a Science Alert Generation (SAG) system to analyze the real-time data from the telescopes and generate science alerts. The SAG system will play a crucial role in the search and follow-up of transients from external alerts, enabling multi-wavelength and multi-messenger collaborations. It will maximize the potential for the detection of the rarest phenomena, such as gamma-ray bursts (GRBs), which are the science case for this study. This study presents an anomaly detection method based on deep learning for detecting gamma-ray burst events in real time. The performance of the proposed method is evaluated and compared against the standard Li&Ma technique in two use cases, serendipitous discoveries and follow-up observations, using short exposure times. The method shows promising results in detecting GRBs and is flexible enough to allow real-time search for transient events on multiple time scales. The method assumes neither background nor source models and does not require a minimum number of photon counts to perform the analysis, making it well suited for real-time analysis. Future improvements involve further tests, relaxing some of the assumptions made in this study, as well as post-trials correction of the detection significance. Moreover, the ability to detect other transient classes in different scenarios must be investigated for completeness. The system can be integrated within the SAG system of CTA and deployed on the onsite computing clusters. This would provide valuable insights into the method's performance in a real-world setting and be another valuable tool for discovering new transient events in real time.
Overall, this study makes a significant contribution to the field of astrophysics by demonstrating the effectiveness of deep learning-based anomaly detection techniques for real-time source detection in gamma-ray astronomy.
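For readers unfamiliar with the Li&Ma baseline mentioned above, the sketch below evaluates the commonly cited Li and Ma (1983) likelihood-ratio significance for an on/off counting measurement. It illustrates the standard formula only; the counts, the exposure ratio `alpha` and the function name are assumptions, and the exact configuration used in the study is not stated in the abstract.

```python
import math


def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma (1983) significance for an on/off measurement.

    n_on  : counts in the on-source region
    n_off : counts in the off-source (background) region
    alpha : ratio of on-source to off-source exposure
    """
    if n_on <= 0 or n_off <= 0 or alpha <= 0:
        raise ValueError("n_on, n_off and alpha must be positive")
    total = n_on + n_off
    term_on = n_on * math.log((1 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1 + alpha) * n_off / total)
    # Guard against tiny negative values caused by floating-point rounding.
    magnitude = math.sqrt(2 * max(term_on + term_off, 0.0))
    # Report a negative significance when the on-region shows a deficit.
    return magnitude if n_on >= alpha * n_off else -magnitude


# Hypothetical counts: background-only, mild excess, larger excess.
s_null = li_ma_significance(10, 50, 0.2)
s_mild = li_ma_significance(20, 50, 0.2)
s_strong = li_ma_significance(30, 50, 0.2)
```

Because the formula takes logarithms of the observed counts, it cannot be evaluated when either region is empty, a situation that becomes more likely at short exposure times; this is consistent with the abstract's remark that the proposed method needs no minimum number of photon counts.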
Although hysteroscopy with endometrial biopsy is the gold standard for diagnosing uterine intracavitary pathology, the hysteroscopist's experience is essential for a correct diagnosis. Deep Learning (DL), as an artificial intelligence method, could help overcome this limitation. Few studies with preliminary results are available, and there is a lack of research evaluating the performance of DL models in identifying intrauterine lesions and the possible contribution of clinical factors. Objective: To develop a DL model to identify and classify uterine endocavitary pathologies from hysteroscopic images. Methods: A single-centre, retrospective, observational cohort study was carried out on a consecutive series of hysteroscopic cases of patients with histologically confirmed uterine intracavitary pathology, performed at Policlinico S. Orsola. The hysteroscopic images were used to build a DL model for the classification and identification of intracavitary lesions with and without the aid of clinical factors (age, menopause, abnormal uterine bleeding (AUB), hormone therapy and tamoxifen). As study outcomes, we computed the diagnostic metrics of the DL model in classifying and identifying uterine intracavitary lesions with and without the aid of clinical factors. Results: We examined 1,500 images from 266 cases: 186 patients had benign focal lesions, 25 had benign diffuse lesions and 55 had preneoplastic/neoplastic lesions. For both classification and identification, the best performance was achieved with the aid of clinical factors, with an overall precision of 80.11%, recall of 80.11%, specificity of 90.06%, F1 score of 80.11% and accuracy of 86.74% for classification. For identification, we obtained an overall detection rate of 85.82%, precision of 93.12%, recall of 91.63% and F1 score of 92.37%.
Conclusions: The DL model achieved low performance in identifying and classifying uterine intracavitary lesions from hysteroscopic images. Although the best diagnostic performance was obtained with the aid of specific clinical factors, this improvement was small.
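The diagnostic metrics reported above are related by simple formulas. As a quick consistency check, the sketch below recomputes the F1 score from the stated precision and recall. The helper names are illustrative; the confusion counts behind the reported percentages are not given in the abstract, so only the F1 relation is checked.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that are detected (sensitivity)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that are correctly rejected."""
    return tn / (tn + fp)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Percentages taken from the abstract (identification and classification).
identification_f1 = f1_score(93.12, 91.63)
classification_f1 = f1_score(80.11, 80.11)
```

Rounded to two decimals, the identification value reproduces the reported 92.37%, and equal precision and recall trivially give the reported 80.11% for classification.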
Many diseases affect the thyroid gland, and carcinoma is among them. Thyroid cancer is the most common endocrine neoplasm and the second most frequent cancer in the 0-49 age group. This thesis deals with two studies I conducted during my PhD. The first concerns the development of a Deep Learning model to assist the pathologist in the screening of thyroid cytology smears. This tool, created in collaboration with Prof. Diciotti of the DEI-UNIBO "Guglielmo Marconi" Department of Electrical Energy and Information Engineering, has an important clinical implication in that it allows patients to be stratified into those who should undergo surgery and those who should not. The second concerns the application of spatial transcriptomics to well-differentiated thyroid carcinomas to better understand their invasion mechanisms and thus which genes may be involved in the proliferation of these tumors. This project was made possible through a fruitful collaboration with the Gustave Roussy Institute in Paris. Studying thyroid carcinoma in depth is essential to improve patient care, increase survival rates, and enhance the overall understanding of this prevalent cancer. It can lead to more effective prevention, early detection, and treatment strategies that benefit both patients and the healthcare system.
Deep Neural Networks (DNNs) have revolutionized a wide range of applications beyond traditional machine learning and artificial intelligence fields, e.g., computer vision, healthcare, natural language processing and others. At the same time, edge devices have become central in our society, generating an unprecedented amount of data which could be used to train data-hungry models such as DNNs. However, the potentially sensitive or confidential nature of gathered data poses privacy concerns when storing and processing them in centralized locations. To this purpose, decentralized learning decouples model training from the need of directly accessing raw data, by alternating on-device training and periodic communications. The ability of distilling knowledge from decentralized data, however, comes at the cost of facing more challenging learning settings, such as coping with heterogeneous hardware and network connectivity, statistical diversity of data, and ensuring verifiable privacy guarantees. This Thesis proposes an extensive overview of decentralized learning literature, including a novel taxonomy and a detailed description of the most relevant system-level contributions in the related literature for privacy, communication efficiency, data and system heterogeneity, and poisoning defense. Next, this Thesis presents the design of an original solution to tackle communication efficiency and system heterogeneity, and empirically evaluates it on federated settings. For communication efficiency, an original method, specifically designed for Convolutional Neural Networks, is also described and evaluated against the state-of-the-art. Furthermore, this Thesis provides an in-depth review of recently proposed methods to tackle the performance degradation introduced by data heterogeneity, followed by empirical evaluations on challenging data distributions, highlighting strengths and possible weaknesses of the considered solutions. 
Finally, this Thesis presents a novel perspective on the usage of Knowledge Distillation as a means of optimizing decentralized learning systems in settings characterized by data heterogeneity or system heterogeneity. Our vision of relevant future research directions closes the manuscript.
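The alternation of on-device training and periodic communication described above is often illustrated with federated averaging, a widely used baseline in federated settings. The sketch below shows only the server-side aggregation step and is not the original method proposed in the thesis; the flat weight lists and client sizes are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client
    by the number of local training examples it holds."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical round: two clients with 1 and 3 local examples.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
global_weights = federated_average(client_weights, client_sizes)
```

Only the parameter vectors and the example counts are communicated in this step, which is what decouples training from direct access to raw data. With heterogeneous data, however, the averaged model can degrade, which is the setting the reviewed methods target.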
The first mechanical Automaton concept was found in a Chinese text written in the 3rd century BC, while Computer Vision was born in the late 1960s. Therefore, visual perception applied to machines (i.e. Machine Vision) is a young and exciting alliance. When robots came in, the new field of Robotic Vision was born, and these terms began to be erroneously interchanged. In short, Machine Vision is an engineering domain that concerns the industrial use of vision. Robotic Vision, instead, is a research field that tries to incorporate robotics aspects into computer vision algorithms. Visual Servoing, for example, is one of the problems that cannot be solved by computer vision alone. Accordingly, a large part of this work deals with boosting popular Computer Vision techniques by exploiting robotics, e.g. the use of kinematics to localize a vision sensor mounted as the robot end-effector. The remainder of this work is dedicated to the counterpart, i.e. the use of computer vision to solve real robotic problems such as grasping objects or navigating while avoiding obstacles. A brief survey of the mapping data structures most widely used in robotics is presented, along with SkiMap, a novel sparse data structure created both for robotic mapping and as a general-purpose 3D spatial index. Several approaches to implementing Object Detection and Manipulation by exploiting the aforementioned mapping strategies are then proposed, along with a completely new Machine Teaching facility designed to simplify the training procedure of modern Deep Learning networks.
Depth represents a crucial piece of information in many practical applications, such as obstacle avoidance and environment mapping. This information can be provided either by active sensors, such as LiDARs, or by passive devices like cameras. A popular passive device is the binocular rig, which allows triangulating the depth of the scene through two synchronized and aligned cameras. However, many devices that are already available in several infrastructures are monocular passive sensors, such as most surveillance cameras. The intrinsic ambiguity of the problem makes monocular depth estimation a challenging task. Nevertheless, the recent progress of deep learning strategies is paving the way towards a new class of algorithms able to handle this complexity. This work addresses many relevant topics related to the monocular depth estimation problem. It presents networks capable of predicting accurate depth values even on embedded devices and without the need for expensive ground-truth labels at training time. Moreover, it introduces strategies to estimate the uncertainty of these models, and it shows that monocular networks can easily generate training labels for different tasks at scale. Finally, it evaluates off-the-shelf monocular depth predictors for the relevant use case of social distance monitoring, and shows how this technology overcomes the limitations of existing strategies.
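The triangulation performed by a binocular rig can be made concrete with the standard relation for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below is a minimal illustration of that relation, with hypothetical camera parameters; it is unrelated to the monocular networks presented in the thesis.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px    : horizontal pixel shift between left and right views
    focal_length_px : focal length expressed in pixels
    baseline_m      : distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 720 px focal length and 0.5 m baseline.
near = depth_from_disparity(72.0, focal_length_px=720.0, baseline_m=0.5)
far = depth_from_disparity(36.0, focal_length_px=720.0, baseline_m=0.5)
```

Halving the disparity doubles the estimated depth, so small disparity errors matter most for distant points. A single camera has no second view and therefore no disparity to measure, which is the intrinsic ambiguity of the monocular setting.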
The quality of fish products is inextricably linked to the freshness of the raw material, which depends on appropriate handling and storage conditions, especially the storage temperature after catch. The research presented in this thesis, which was largely conducted in the context of a research project funded by the Italian Ministry of Agricultural, Food and Forestry Policies (MIPAAF), concerned the evaluation of the freshness of farmed and wild fish species in relation to different storage conditions, under ice (0°C) or at refrigeration temperature (4°C). Several specimens of different species, bogue (Boops boops), red mullet (Mullus barbatus), sea bream (Sparus aurata) and sea bass (Dicentrarchus labrax), were examined during storage under the different temperature conditions adopted. The assessed control parameters were physical (texture, through the use of a dynamometer; visual quality, using a computer vision system (CVS)), chemical (through 1H-NMR footprint metabolomics) and sensory (Quality Index Method (QIM)). Microbiological determinations were also carried out on hake (Merluccius merluccius). In general, the results confirmed that the handling and storage temperature is a key factor in maintaining fish freshness. NMR spectroscopy proved able to quantify and evaluate the kinetics of unselected compounds during fish degradation, even a posteriori. This can be suitable for the development of new parameters related to quality and freshness. The development of physical methods for the evaluation of fish degradation, particularly image analysis performed by a computer vision system (CVS), is very promising. Among the CVS parameters, skin colour, presence and distribution of gill mucus, and eye shape modification showed high sensitivity for the estimation of fish quality loss as a function of the adopted storage conditions.
In particular, the eye concavity index measured on the fish eye showed a high positive correlation with the total QIM score.
In the last decades, Artificial Intelligence has witnessed multiple breakthroughs in deep learning. In particular, purely data-driven approaches have opened the door to a wide variety of successful applications thanks to the large availability of data. Nonetheless, the integration of prior knowledge is still required to compensate for specific issues like lack of generalization from limited data, fairness, robustness, and biases. In this thesis, we analyze the methodology of integrating knowledge into deep learning models in the field of Natural Language Processing (NLP). We start by remarking on the importance of knowledge integration. We highlight the possible shortcomings of these approaches and investigate the implications of integrating unstructured textual knowledge. We introduce Unstructured Knowledge Integration (UKI) as the process of integrating unstructured knowledge into machine learning models. We discuss UKI in the field of NLP, where knowledge is represented in a natural language format. We identify UKI as a complex process comprising multiple sub-processes, different knowledge types, and knowledge integration properties to guarantee. We remark on the challenges of integrating unstructured textual knowledge and draw connections with well-known research areas in NLP. We provide a unified vision of structured knowledge extraction (KE) and UKI by identifying KE as a sub-process of UKI. We investigate some challenging scenarios where structured knowledge is not a feasible prior assumption and formulate each task from the point of view of UKI. We adopt simple yet effective neural architectures and discuss the challenges of such an approach. Finally, we identify KE as a form of symbolic representation. From this perspective, we remark on the need to define sophisticated UKI processes to verify the validity of knowledge integration. To this end, we foresee frameworks capable of combining symbolic and sub-symbolic representations for learning as a solution.
The study of ancient, undeciphered scripts presents unique challenges that depend both on the nature of the problem and on the peculiarities of each writing system. In this thesis, I present two computational approaches that are tailored to two different tasks and writing systems. The first of these methods is aimed at the decipherment of the Linear A fraction signs, in order to discover their numerical values. This is achieved with a combination of constraint programming, ad-hoc metrics and paleographic considerations. The second main contribution of this thesis is the creation of an unsupervised deep learning model that uses drawings of signs from ancient writing systems to learn to distinguish different graphemes in the vector space. This system, which is based on techniques used in the field of computer vision, is adapted to the study of ancient writing systems by incorporating information about sequences in the model, mirroring what is often done in natural language processing. In order to develop this model, the Cypriot Greek Syllabary is used as a target, since this is a deciphered writing system. Finally, this unsupervised model is adapted to the undeciphered Cypro-Minoan and used to answer open questions about this script. In particular, by reconstructing multiple allographs that are not agreed upon by paleographers, it supports the idea that Cypro-Minoan is a single script and not a collection of three scripts, as has been proposed in the literature. These results on two different tasks show that computational methods can be applied to undeciphered scripts, despite the relatively small amount of available data, paving the way for further advancement in paleography using these methods.