783 resultados para Neural Network Models for Competing Risks Data
Resumo:
Much of the real-world dataset, including textual data, can be represented using graph structures. The use of graphs to represent textual data has many advantages, mainly related to maintaining a more significant amount of information, such as the relationships between words and their types. In recent years, many neural network architectures have been proposed to deal with tasks on graphs. Many of them consider only node features, ignoring or not giving the proper relevance to relationships between them. However, in many node classification tasks, they play a fundamental role. This thesis aims to analyze the main GNNs, evaluate their advantages and disadvantages, propose an innovative solution considered as an extension of GAT, and apply them to a case study in the biomedical field. We propose the reference GNNs, implemented with methodologies later analyzed, and then applied to a question answering system in the biomedical field as a replacement for the pre-existing GNN. We attempt to obtain better results by using models that can accept as input both node and edge features. As shown later, our proposed models can beat the original solution and define the state-of-the-art for the task under analysis.
Resumo:
The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed, one to optimize the architecture of Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing that has recently emerged, and one to optimize the data precision of tensors inside CNNs. The first NAS proposed explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer. Note that this is the first NAS that explicitly targets these networks. The second NAS proposed instead focuses on finding the most efficient data format for a target CNN, with the granularity of the layer filter. Note that applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for NN efficient execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY) -- an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can manage different levels of memory inside the MCU automatically, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generates ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers on edge efficiently. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart rate tracking and sEMG-based gesture recognition.
Assessing brain connectivity through electroencephalographic signal processing and modeling analysis
Resumo:
Brain functioning relies on the interaction of several neural populations connected through complex connectivity networks, enabling the transmission and integration of information. Recent advances in neuroimaging techniques, such as electroencephalography (EEG), have deepened our understanding of the reciprocal roles played by brain regions during cognitive processes. The underlying idea of this PhD research is that EEG-related functional connectivity (FC) changes in the brain may incorporate important neuromarkers of behavior and cognition, as well as brain disorders, even at subclinical levels. However, a complete understanding of the reliability of the wide range of existing connectivity estimation techniques is still lacking. The first part of this work addresses this limitation by employing Neural Mass Models (NMMs), which simulate EEG activity and offer a unique tool to study interconnected networks of brain regions in controlled conditions. NMMs were employed to test FC estimators like Transfer Entropy and Granger Causality in linear and nonlinear conditions. Results revealed that connectivity estimates reflect information transmission between brain regions, a quantity that can be significantly different from the connectivity strength, and that Granger causality outperforms the other estimators. A second objective of this thesis was to assess brain connectivity and network changes on EEG data reconstructed at the cortical level. Functional brain connectivity has been estimated through Granger Causality, in both temporal and spectral domains, with the following goals: a) detect task-dependent functional connectivity network changes, focusing on internal-external attention competition and fear conditioning and reversal; b) identify resting-state network alterations in a subclinical population with high autistic traits. Connectivity-based neuromarkers, compared to the canonical EEG analysis, can provide deeper insights into brain mechanisms and may drive future diagnostic methods and therapeutic interventions. However, further methodological studies are required to fully understand the accuracy and information captured by FC estimates, especially concerning nonlinear phenomena.
Resumo:
Deep Neural Networks (DNNs) have revolutionized a wide range of applications beyond traditional machine learning and artificial intelligence fields, e.g., computer vision, healthcare, natural language processing and others. At the same time, edge devices have become central in our society, generating an unprecedented amount of data which could be used to train data-hungry models such as DNNs. However, the potentially sensitive or confidential nature of gathered data poses privacy concerns when storing and processing them in centralized locations. To this purpose, decentralized learning decouples model training from the need of directly accessing raw data, by alternating on-device training and periodic communications. The ability of distilling knowledge from decentralized data, however, comes at the cost of facing more challenging learning settings, such as coping with heterogeneous hardware and network connectivity, statistical diversity of data, and ensuring verifiable privacy guarantees. This Thesis proposes an extensive overview of decentralized learning literature, including a novel taxonomy and a detailed description of the most relevant system-level contributions in the related literature for privacy, communication efficiency, data and system heterogeneity, and poisoning defense. Next, this Thesis presents the design of an original solution to tackle communication efficiency and system heterogeneity, and empirically evaluates it on federated settings. For communication efficiency, an original method, specifically designed for Convolutional Neural Networks, is also described and evaluated against the state-of-the-art. Furthermore, this Thesis provides an in-depth review of recently proposed methods to tackle the performance degradation introduced by data heterogeneity, followed by empirical evaluations on challenging data distributions, highlighting strengths and possible weaknesses of the considered solutions. Finally, this Thesis presents a novel perspective on the usage of Knowledge Distillation as a mean for optimizing decentralized learning systems in settings characterized by data heterogeneity or system heterogeneity. Our vision on relevant future research directions close the manuscript.
Resumo:
This thesis focuses on automating the time-consuming task of manually counting activated neurons in fluorescent microscopy images, which is used to study the mechanisms underlying torpor. The traditional method of manual annotation can introduce bias and delay the outcome of experiments, so the author investigates a deep-learning-based procedure to automatize this task. The author explores two of the main convolutional-neural-network (CNNs) state-of-the-art architectures: UNet and ResUnet family model, and uses a counting-by-segmentation strategy to provide a justification of the objects considered during the counting process. The author also explores a weakly-supervised learning strategy that exploits only dot annotations. The author quantifies the advantages in terms of data reduction and counting performance boost obtainable with a transfer-learning approach and, specifically, a fine-tuning procedure. The author released the dataset used for the supervised use case and all the pre-training models, and designed a web application to share both the counting process pipeline developed in this work and the models pre-trained on the dataset analyzed in this work.
Resumo:
Artificial Intelligence (AI) and Machine Learning (ML) are novel data analysis techniques providing very accurate prediction results. They are widely adopted in a variety of industries to improve efficiency and decision-making, but they are also being used to develop intelligent systems. Their success grounds upon complex mathematical models, whose decisions and rationale are usually difficult to comprehend for human users to the point of being dubbed as black-boxes. This is particularly relevant in sensitive and highly regulated domains. To mitigate and possibly solve this issue, the Explainable AI (XAI) field became prominent in recent years. XAI consists of models and techniques to enable understanding of the intricated patterns discovered by black-box models. In this thesis, we consider model-agnostic XAI techniques, which can be applied to Tabular data, with a particular focus on the Credit Scoring domain. Special attention is dedicated to the LIME framework, for which we propose several modifications to the vanilla algorithm, in particular: a pair of complementary Stability Indices that accurately measure LIME stability, and the OptiLIME policy which helps the practitioner finding the proper balance among explanations' stability and reliability. We subsequently put forward GLEAMS a model-agnostic surrogate interpretable model which requires to be trained only once, while providing both Local and Global explanations of the black-box model. GLEAMS produces feature attributions and what-if scenarios, from both dataset and model perspective. Eventually, we argue that synthetic data are an emerging trend in AI, being more and more used to train complex models instead of original data. To be able to explain the outcomes of such models, we must guarantee that synthetic data are reliable enough to be able to translate their explanations to real-world individuals. To this end we propose DAISYnt, a suite of tests to measure synthetic tabular data quality and privacy.
Resumo:
The scientific success of the LHC experiments at CERN highly depends on the availability of computing resources which efficiently store, process, and analyse the amount of data collected every year. This is ensured by the Worldwide LHC Computing Grid infrastructure that connect computing centres distributed all over the world with high performance network. LHC has an ambitious experimental program for the coming years, which includes large investments and improvements both for the hardware of the detectors and for the software and computing systems, in order to deal with the huge increase in the event rate expected from the High Luminosity LHC (HL-LHC) phase and consequently with the huge amount of data that will be produced. Since few years the role of Artificial Intelligence has become relevant in the High Energy Physics (HEP) world. Machine Learning (ML) and Deep Learning algorithms have been successfully used in many areas of HEP, like online and offline reconstruction programs, detector simulation, object reconstruction, identification, Monte Carlo generation, and surely they will be crucial in the HL-LHC phase. This thesis aims at contributing to a CMS R&D project, regarding a ML "as a Service" solution for HEP needs (MLaaS4HEP). It consists in a data-service able to perform an entire ML pipeline (in terms of reading data, processing data, training ML models, serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources. This framework has been updated adding new features in the data preprocessing phase, allowing more flexibility to the user. Since the MLaaS4HEP framework is experiment agnostic, the ATLAS Higgs Boson ML challenge has been chosen as physics use case, with the aim to test MLaaS4HEP and the contribution done with this work.
Resumo:
The amplitude of motor evoked potentials (MEPs) elicited by transcranial magnetic stimulation (TMS) of the primary motor cortex (M1) shows a large variability from trial to trial, although MEPs are evoked by the same repeated stimulus. A multitude of factors is believed to influence MEP amplitudes, such as cortical, spinal and motor excitability state. The goal of this work is to explore to which degree the variation in MEP amplitudes can be explained by the cortical state right before the stimulation. Specifically, we analyzed a dataset acquired on eleven healthy subjects comprising, for each subject, 840 single TMS pulses applied to the left M1 during acquisition of electroencephalography (EEG) and electromyography (EMG). An interpretable convolutional neural network, named SincEEGNet, was utilized to discriminate between low- and high-corticospinal excitability trials, defined according to the MEP amplitude, using in input the pre-TMS EEG. This data-driven approach enabled considering multiple brain locations and frequency bands without any a priori selection. Post-hoc interpretation techniques were adopted to enhance interpretation by identifying the more relevant EEG features for the classification. Results show that individualized classifiers successfully discriminated between low and high M1 excitability states in all participants. Outcomes of the interpretation methods suggest the importance of the electrodes situated over the TMS stimulation site, as well as the relevance of the temporal samples of the input EEG closer to the stimulation time. This novel decoding method allows causal investigation of the cortical excitability state, which may be relevant for personalizing and increasing the efficacy of therapeutic brain-state dependent brain stimulation (for example in patients affected by Parkinson’s disease).
Resumo:
Depth estimation from images has long been regarded as a preferable alternative compared to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted the attention of an increasingly wide audience thanks to the great amount of application domains, such as autonomous driving, robotic navigation and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed and simplicity in setup. Recent developments has been aided by the abundance of annotated stereo images, which granted to deep learning the opportunity to thrive in a research area where deep networks can reach state-of-the-art sub-pixel precision in most cases. Despite the recent findings, stereo matching still begets many open challenges, two among them being finding pixel correspondences in presence of objects that exhibits a non-Lambertian behaviour and processing high-resolution images. Recently, a novel dataset named Booster, which contains high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects, has been released. The work shown that training state-of-the-art deep neural network on such data improves the generalization capabilities of these networks also in presence of non-Lambertian surfaces. Regardless being a further step to tackle the aforementioned challenge, Booster includes a rather small number of annotated images, and thus cannot satisfy the intensive training requirements of deep learning. This thesis work aims to investigate novel view synthesis techniques to augment the Booster dataset, with ultimate goal of improving stereo matching reliability in presence of high-resolution images that displays non-Lambertian surfaces.
Resumo:
This thesis contributes to the ArgMining 2021 shared task on Key Point Analysis. Key Point Analysis entails extracting and calculating the prevalence of a concise list of the most prominent talking points, from an input corpus. These talking points are usually referred to as key points. Key point analysis is divided into two subtasks: Key Point Matching, which involves assigning a matching score to each key point/argument pair, and Key Point Generation, which consists of the generation of key points. The task of Key Point Matching was approached using different models: a pretrained Sentence Transformers model and a tree-constrained Graph Neural Network were tested. The best model was the fine-tuned Sentence Transformers, which achieved a mean Average Precision score of 0.75, ranking 12 compared to other participating teams. The model was then used for the subtask of Key Point Generation using the extractive method in the selection of key point candidates and the model developed for the previous subtask to evaluate them.
Resumo:
In this work, the artificial neural networks (ANN) and partial least squares (PLS) regression were applied to UV spectral data for quantitative determination of thiamin hydrochloride (VB1), riboflavin phosphate (VB2), pyridoxine hydrochloride (VB6) and nicotinamide (VPP) in pharmaceutical samples. For calibration purposes, commercial samples in 0.2 mol L-1 acetate buffer (pH 4.0) were employed as standards. The concentration ranges used in the calibration step were: 0.1 - 7.5 mg L-1 for VB1, 0.1 - 3.0 mg L-1 for VB2, 0.1 - 3.0 mg L-1 for VB6 and 0.4 - 30.0 mg L-1 for VPP. From the results it is possible to verify that both methods can be successfully applied for these determinations. The similar error values were obtained by using neural network or PLS methods. The proposed methodology is simple, rapid and can be easily used in quality control laboratories.
Resumo:
Animal welfare has been an important research topic in animal production mainly in its ways of assessment. Vocalization is found to be an interesting tool for evaluating welfare as it provides data in a non-invasive way as well as it allows easy automation of process. The present research had as objective the implementation of an algorithm based on artificial neural network that had the potential of identifying vocalization related to welfare pattern indicatives. The research was done in two parts, the first was the development of the algorithm, and the second its validation with data from the field. Previous records allowed the development of the algorithm from behaviors observed in sows housed in farrowing cages. Matlab® software was used for implementing the network. It was selected a retropropagation gradient algorithm for training the network with the following stop criteria: maximum of 5,000 interactions or error quadratic addition smaller than 0.1. Validation was done with sows and piglets housed in commercial farm. Among the usual behaviors the ones that deserved enhancement were: the feed dispute at farrowing and the eventual risk of involuntary aggression between the piglets or between those and the sow. The algorithm was able to identify through the noise intensity the inherent risk situation of piglets welfare reduction.
Resumo:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
Resumo:
This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A. under the receiver operating characteristics (ROC) curve. The A, result for the committee machine is compared with the A, results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier Tests are carried out using the student's t-distribution. The committee machine classifier outperforms the MLP SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A, values of the four methods are, in order 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.
Resumo:
This article focuses on the identification of the number of paths with different lengths between pairs of nodes in complex networks and how these paths can be used for characterization of topological properties of theoretical and real-world complex networks. This analysis revealed that the number of paths can provide a better discrimination of network models than traditional network measurements. In addition, the analysis of real-world networks suggests that the long-range connectivity tends to be limited in these networks and may be strongly related to network growth and organization.