889 resultados para Artificial intelligence -- Computer programs
Resumo:
In recent years, IoT technology has radically transformed many crucial industrial and service sectors such as healthcare. The multi-facets heterogeneity of the devices and the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, achieving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data integration and reasoning aspect in modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology that has been developed in order to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.
Resumo:
Deep Neural Networks (DNNs) have revolutionized a wide range of applications beyond traditional machine learning and artificial intelligence fields, e.g., computer vision, healthcare, natural language processing and others. At the same time, edge devices have become central in our society, generating an unprecedented amount of data which could be used to train data-hungry models such as DNNs. However, the potentially sensitive or confidential nature of gathered data poses privacy concerns when storing and processing them in centralized locations. To this purpose, decentralized learning decouples model training from the need of directly accessing raw data, by alternating on-device training and periodic communications. The ability of distilling knowledge from decentralized data, however, comes at the cost of facing more challenging learning settings, such as coping with heterogeneous hardware and network connectivity, statistical diversity of data, and ensuring verifiable privacy guarantees. This Thesis proposes an extensive overview of decentralized learning literature, including a novel taxonomy and a detailed description of the most relevant system-level contributions in the related literature for privacy, communication efficiency, data and system heterogeneity, and poisoning defense. Next, this Thesis presents the design of an original solution to tackle communication efficiency and system heterogeneity, and empirically evaluates it on federated settings. For communication efficiency, an original method, specifically designed for Convolutional Neural Networks, is also described and evaluated against the state-of-the-art. Furthermore, this Thesis provides an in-depth review of recently proposed methods to tackle the performance degradation introduced by data heterogeneity, followed by empirical evaluations on challenging data distributions, highlighting strengths and possible weaknesses of the considered solutions. Finally, this Thesis presents a novel perspective on the usage of Knowledge Distillation as a mean for optimizing decentralized learning systems in settings characterized by data heterogeneity or system heterogeneity. Our vision on relevant future research directions close the manuscript.
Resumo:
In the Era of precision medicine and big medical data sharing, it is necessary to solve the work-flow of digital radiological big data in a productive and effective way. In particular, nowadays, it is possible to extract information “hidden” in digital images, in order to create diagnostic algorithms helping clinicians to set up more personalized therapies, which are in particular targets of modern oncological medicine. Digital images generated by the patient have a “texture” structure that is not visible but encrypted; it is “hidden” because it cannot be recognized by sight alone. Thanks to artificial intelligence, pre- and post-processing software and generation of mathematical calculation algorithms, we could perform a classification based on non-visible data contained in radiological images. Being able to calculate the volume of tissue body composition could lead to creating clasterized classes of patients inserted in standard morphological reference tables, based on human anatomy distinguished by gender and age, and maybe in future also by race. Furthermore, the branch of “morpho-radiology" is a useful modality to solve problems regarding personalized therapies, which is particularly needed in the oncological field. Actually oncological therapies are no longer based on generic drugs but on target personalized therapy. The lack of gender and age therapies table could be filled thanks to morpho-radiology data analysis application.
Resumo:
The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interoperability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.
Resumo:
Riding the wave of recent groundbreaking achievements, artificial intelligence (AI) is currently the buzzword on everybody’s lips and, allowing algorithms to learn from historical data, Machine Learning (ML) emerged as its pinnacle. The multitude of algorithms, each with unique strengths and weaknesses, highlights the absence of a universal solution and poses a challenging optimization problem. In response, automated machine learning (AutoML) navigates vast search spaces within minimal time constraints. By lowering entry barriers, AutoML emerged as promising the democratization of AI, yet facing some challenges. In data-centric AI, the discipline of systematically engineering data used to build an AI system, the challenge of configuring data pipelines is rather simple. We devise a methodology for building effective data pre-processing pipelines in supervised learning as well as a data-centric AutoML solution for unsupervised learning. In human-centric AI, many current AutoML tools were not built around the user but rather around algorithmic ideas, raising ethical and social bias concerns. We contribute by deploying AutoML tools aiming at complementing, instead of replacing, human intelligence. In particular, we provide solutions for single-objective and multi-objective optimization and showcase the challenges and potential of novel interfaces featuring large language models. Finally, there are application areas that rely on numerical simulators, often related to earth observations, they tend to be particularly high-impact and address important challenges such as climate change and crop life cycles. We commit to coupling these physical simulators with (Auto)ML solutions towards a physics-aware AI. Specifically, in precision farming, we design a smart irrigation platform that: allows real-time monitoring of soil moisture, predicts future moisture values, and estimates water demand to schedule the irrigation.
Resumo:
The scientific success of the LHC experiments at CERN highly depends on the availability of computing resources which efficiently store, process, and analyse the amount of data collected every year. This is ensured by the Worldwide LHC Computing Grid infrastructure that connect computing centres distributed all over the world with high performance network. LHC has an ambitious experimental program for the coming years, which includes large investments and improvements both for the hardware of the detectors and for the software and computing systems, in order to deal with the huge increase in the event rate expected from the High Luminosity LHC (HL-LHC) phase and consequently with the huge amount of data that will be produced. Since few years the role of Artificial Intelligence has become relevant in the High Energy Physics (HEP) world. Machine Learning (ML) and Deep Learning algorithms have been successfully used in many areas of HEP, like online and offline reconstruction programs, detector simulation, object reconstruction, identification, Monte Carlo generation, and surely they will be crucial in the HL-LHC phase. This thesis aims at contributing to a CMS R&D project, regarding a ML "as a Service" solution for HEP needs (MLaaS4HEP). It consists in a data-service able to perform an entire ML pipeline (in terms of reading data, processing data, training ML models, serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources. This framework has been updated adding new features in the data preprocessing phase, allowing more flexibility to the user. Since the MLaaS4HEP framework is experiment agnostic, the ATLAS Higgs Boson ML challenge has been chosen as physics use case, with the aim to test MLaaS4HEP and the contribution done with this work.
Resumo:
Activation functions within neural networks play a crucial role in Deep Learning since they allow to learn complex and non-trivial patterns in the data. However, the ability to approximate non-linear functions is a significant limitation when implementing neural networks in a quantum computer to solve typical machine learning tasks. The main burden lies in the unitarity constraint of quantum operators, which forbids non-linearity and poses a considerable obstacle to developing such non-linear functions in a quantum setting. Nevertheless, several attempts have been made to tackle the realization of the quantum activation function in the literature. Recently, the idea of the QSplines has been proposed to approximate a non-linear activation function by implementing the quantum version of the spline functions. Yet, QSplines suffers from various drawbacks. Firstly, the final function estimation requires a post-processing step; thus, the value of the activation function is not available directly as a quantum state. Secondly, QSplines need many error-corrected qubits and a very long quantum circuits to be executed. These constraints do not allow the adoption of the QSplines on near-term quantum devices and limit their generalization capabilities. This thesis aims to overcome these limitations by leveraging hybrid quantum-classical computation. In particular, a few different methods for Variational Quantum Splines are proposed and implemented, to pave the way for the development of complete quantum activation functions and unlock the full potential of quantum neural networks in the field of quantum machine learning.
Resumo:
Application of dataset fusion techniques to an object detection task, involving the use of deep learning as convolutional neural networks, to manage to create a single RCNN architecture able to inference with good performances on two distinct datasets with different domains.
Resumo:
Gaze estimation has gained interest in recent years for being an important cue to obtain information about the internal cognitive state of humans. Regardless of whether it is the 3D gaze vector or the point of gaze (PoG), gaze estimation has been applied in various fields, such as: human robot interaction, augmented reality, medicine, aviation and automotive. In the latter field, as part of Advanced Driver-Assistance Systems (ADAS), it allows the development of cutting-edge systems capable of mitigating road accidents by monitoring driver distraction. Gaze estimation can be also used to enhance the driving experience, for instance, autonomous driving. It also can improve comfort with augmented reality components capable of being commanded by the driver's eyes. Although, several high-performance real-time inference works already exist, just a few are capable of working with only a RGB camera on computationally constrained devices, such as a microcontroller. This work aims to develop a low-cost, efficient and high-performance embedded system capable of estimating the driver's gaze using deep learning and a RGB camera. The proposed system has achieved near-SOTA performances with about 90% less memory footprint. The capabilities to generalize in unseen environments have been evaluated through a live demonstration, where high performance and near real-time inference were obtained using a webcam and a Raspberry Pi4.
Resumo:
In order to estimate depth through supervised deep learning-based stereo methods, it is necessary to have access to precise ground truth depth data. While the gathering of precise labels is commonly tackled by deploying depth sensors, this is not always a viable solution. For instance, in many applications in the biomedical domain, the choice of sensors capable of sensing depth at small distances with high precision on difficult surfaces (that present non-Lambertian properties) is very limited. It is therefore necessary to find alternative techniques to gather ground truth data without having to rely on external sensors. In this thesis, two different approaches have been tested to produce supervision data for biomedical images. The first aims to obtain input stereo image pairs and disparities through simulation in a virtual environment, while the second relies on a non-learned disparity estimation algorithm in order to produce noisy disparities, which are then filtered by means of hand-crafted confidence measures to create noisy labels for a subset of pixels. Among the two, the second approach, which is referred in literature as proxy-labeling, has shown the best results and has even outperformed the non-learned disparity estimation algorithm used for supervision.
Resumo:
The Neural Networks customized and tested in this thesis (WaldoNet, FlowNet and PatchNet) are a first exploration and approach to the Template Matching task. The possibilities of extension are therefore many and some are proposed below. During my thesis, I have analyzed the functioning of the classical algorithms and adapted with deep learning algorithms. The features extracted from both the template and the query images resemble the keypoints of the SIFT algorithm. Then, instead of similarity function or keypoints matching, WaldoNet and PatchNet use the convolutional layer to compare the features, while FlowNet uses the correlational layer. In addition, I have identified the major challenges of the Template Matching task (affine/non-affine transformations, intensity changes...) and solved them with a careful design of the dataset.
Resumo:
Uno degli obiettivi più ambizioni e interessanti dell'informatica, specialmente nel campo dell'intelligenza artificiale, consiste nel raggiungere la capacità di far ragionare un computer in modo simile a come farebbe un essere umano. I più recenti successi nell'ambito delle reti neurali profonde, specialmente nel campo dell'elaborazione del testo in linguaggio naturale, hanno incentivato lo studio di nuove tecniche per affrontare tale problema, a cominciare dal ragionamento deduttivo, la forma più semplice e lineare di ragionamento logico. La domanda fondamentale alla base di questa tesi è infatti la seguente: in che modo una rete neurale basata sull'architettura Transformer può essere impiegata per avanzare lo stato dell'arte nell'ambito del ragionamento deduttivo in linguaggio naturale? Nella prima parte di questo lavoro presento uno studio approfondito di alcune tecnologie recenti che hanno affrontato questo problema con intuizioni vincenti. Da questa analisi emerge come particolarmente efficace l'integrazione delle reti neurali con tecniche simboliche più tradizionali. Nella seconda parte propongo un focus sull'architettura ProofWriter, che ha il pregio di essere relativamente semplice e intuitiva pur presentando prestazioni in linea con quelle dei concorrenti. Questo approfondimento mette in luce la capacità dei modelli T5, con il supporto del framework HuggingFace, di produrre più risposte alternative, tra cui è poi possibile cercare esternamente quella corretta. Nella terza e ultima parte fornisco un prototipo che mostra come si può impiegare tale tecnica per arricchire i sistemi tipo ProofWriter con approcci simbolici basati su nozioni linguistiche, conoscenze specifiche sul dominio applicativo o semplice buonsenso. Ciò che ne risulta è un significativo miglioramento dell'accuratezza rispetto al ProofWriter originale, ma soprattutto la dimostrazione che è possibile sfruttare tale capacità dei modelli T5 per migliorarne le prestazioni.
Resumo:
Depth estimation from images has long been regarded as a preferable alternative compared to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted the attention of an increasingly wide audience thanks to the great amount of application domains, such as autonomous driving, robotic navigation and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed and simplicity in setup. Recent developments has been aided by the abundance of annotated stereo images, which granted to deep learning the opportunity to thrive in a research area where deep networks can reach state-of-the-art sub-pixel precision in most cases. Despite the recent findings, stereo matching still begets many open challenges, two among them being finding pixel correspondences in presence of objects that exhibits a non-Lambertian behaviour and processing high-resolution images. Recently, a novel dataset named Booster, which contains high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects, has been released. The work shown that training state-of-the-art deep neural network on such data improves the generalization capabilities of these networks also in presence of non-Lambertian surfaces. Regardless being a further step to tackle the aforementioned challenge, Booster includes a rather small number of annotated images, and thus cannot satisfy the intensive training requirements of deep learning. This thesis work aims to investigate novel view synthesis techniques to augment the Booster dataset, with ultimate goal of improving stereo matching reliability in presence of high-resolution images that displays non-Lambertian surfaces.
Resumo:
Neural scene representation and neural rendering are new computer vision techniques that enable the reconstruction and implicit representation of real 3D scenes from a set of 2D captured images, by fitting a deep neural network. The trained network can then be used to render novel views of the scene. A recent work in this field, Neural Radiance Fields (NeRF), presented a state-of-the-art approach, which uses a simple Multilayer Perceptron (MLP) to generate photo-realistic RGB images of a scene from arbitrary viewpoints. However, NeRF does not model any light interaction with the fitted scene; therefore, despite producing compelling results for the view synthesis task, it does not provide a solution for relighting. In this work, we propose a new architecture to enable relighting capabilities in NeRF-based representations and we introduce a new real-world dataset to train and evaluate such a model. Our method demonstrates the ability to perform realistic rendering of novel views under arbitrary lighting conditions.
Resumo:
The usage of Optical Character Recognition’s (OCR, systems is a widely spread technology into the world of Computer Vision and Machine Learning. It is a topic that interest many field, for example the automotive, where becomes a specialized task known as License Plate Recognition, useful for many application from the automation of toll road to intelligent payments. However, OCR systems need to be very accurate and generalizable in order to be able to extract the text of license plates under high variable conditions, from the type of camera used for acquisition to light changes. Such variables compromise the quality of digitalized real scenes causing the presence of noise and degradation of various type, which can be minimized with the application of modern approaches for image iper resolution and noise reduction. Oneclass of them is known as Generative Neural Networks, which are very strong ally for the solution of this popular problem.