784 results for deep learning, convolutional neural network, computer aided detection, mammography
Abstract:
Breast cancer ranks first in mortality among the cancers affecting the female population worldwide. Several clinical studies have shown that the radiologist's diagnosis can be aided and improved by Computer Aided Detection (CAD) systems. Because tumour masses vary greatly in shape and size and closely resemble the tissue that hosts them, their automated detection is an extremely difficult problem. A CAD system generally comprises two classification stages: detection, which identifies the suspicious regions of interest (ROIs) on the mammogram and thus discards the non-risk areas in advance; and the actual classification of the ROIs into masses and healthy tissue. The main purpose of this thesis is the study of new detection methodologies that can improve the performance obtained with traditional techniques. Detection is treated as a supervised learning problem and tackled with Convolutional Neural Networks (CNNs), an algorithm belonging to deep learning, a new branch of machine learning. CNNs are inspired by Hubel and Wiesel's discoveries concerning two basic types of cells identified in the cat visual cortex: simple cells (S), which respond to edge-like stimuli, and complex cells (C), which are locally invariant to the exact position of the stimulus. By analogy with the visual cortex, CNNs use a deep architecture characterized by layers that alternately perform convolution and subsampling operations on the images. CNNs, which take a two-dimensional input, are usually employed for classification problems and for the automatic recognition of images such as objects, faces and logos, or for document analysis.
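As a rough illustration of such an architecture, the following PyTorch sketch alternates convolution and subsampling layers to classify mammogram patches as mass or healthy tissue; the patch size and layer widths are illustrative assumptions, not the network studied in the thesis.

```python
import torch
import torch.nn as nn

# Minimal sketch of a CNN with alternating convolution and subsampling
# layers for binary ROI classification (mass vs. healthy tissue).
# The 64x64 patch size and layer widths are illustrative assumptions.
class RoiClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.Tanh(),   # convolution
            nn.AvgPool2d(2),                              # subsampling
            nn.Conv2d(16, 32, kernel_size=5), nn.Tanh(),
            nn.AvgPool2d(2),
        )
        self.classifier = nn.Linear(32 * 13 * 13, 2)      # mass / healthy

    def forward(self, x):                                 # x: (N, 1, 64, 64)
        h = self.features(x)
        return self.classifier(h.flatten(1))

logits = RoiClassifier()(torch.randn(8, 1, 64, 64))       # shape (8, 2)
```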
Abstract:
Convolutional Neural Networks (CNN) have become the state-of-the-art method for many large-scale visual recognition tasks. For many practical applications, CNN architectures have a restrictive requirement: a huge amount of labeled data is needed for training. The idea of generative pretraining is to obtain initial weights for the network by training it in a completely unsupervised way, and then to fine-tune the weights for the task at hand using supervised learning. In this thesis, a general introduction to Deep Neural Networks and their training algorithms is given, and these methods are applied to classification tasks on handwritten digits and natural images to develop unsupervised feature learning. The goal of this thesis is to find out whether the effect of pretraining is damped by recent practical advances in the optimization and regularization of CNNs. The experimental results show that pretraining is still a substantial regularizer, but no longer a necessary step, when training Convolutional Neural Networks with rectified activations. On handwritten digits, the proposed pretraining model achieved a classification accuracy comparable to the state-of-the-art methods.
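A minimal sketch of the pretraining idea, assuming a convolutional autoencoder as the unsupervised stage (shapes and hyperparameters are illustrative, not those used in the thesis):

```python
import torch
import torch.nn as nn

# Stage 1: unsupervised pretraining. The autoencoder reconstructs its input,
# so no labels are needed; its encoder weights become the initial weights.
encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
decoder = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(8, 1, 3, padding=1))
autoencoder = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(autoencoder.parameters())
x = torch.randn(32, 1, 28, 28)                       # unlabeled images
loss = nn.functional.mse_loss(autoencoder(x), x)     # reconstruction objective
loss.backward(); opt.step()

# Stage 2: supervised fine-tuning. The pretrained encoder is reused inside a
# classifier and all weights are updated with labeled data.
classifier = nn.Sequential(encoder, nn.Flatten(), nn.Linear(8 * 14 * 14, 10))
y = torch.randint(0, 10, (32,))                      # labels (e.g. digit classes)
ce = nn.functional.cross_entropy(classifier(x), y)
ce.backward()
```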
Abstract:
This work began with a theoretical study of the main image classification techniques known in the literature, with particular attention to the most widespread image representation models, such as the Bag of Visual Words model, and to the main Machine Learning tools. Attention then turned to the analysis of what constitutes the state of the art in image classification, namely Deep Learning. To experiment with the advantages of this set of Image Classification methodologies, Torch7 was used: an open-source numerical computing framework, scriptable through the Lua language, with broad support for state-of-the-art Deep Learning methods. The actual image classification was implemented with Torch7 because this framework, thanks also to the analysis work previously carried out by some of my colleagues, proved very effective at categorizing objects in images. The images used in the experimental tests belong to a dataset created ad hoc for the 3D vision system, with the aim of testing the system for visually impaired and blind users; it contains some of the main obstacles that a visually impaired person may encounter in everyday life. In particular, the dataset consists of potential obstacles in a hypothetical outdoor usage scenario. Having thus established that Torch7 was the right support for classification, attention focused on the possibility of exploiting Stereo Vision to increase classification accuracy. In fact, the images in the above-mentioned dataset were acquired with an FPGA-based Stereo Camera developed by the research group where this work was carried out. This made it possible to use 3D information, such as the depth level of each object in the image, to segment the objects of interest through an algorithm written in C++, excluding the rest of the scene. The last phase of the work was to test Torch7 on the image dataset, previously segmented with the segmentation algorithm just outlined, in order to recognize the type of obstacle detected by the system.
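The depth-based segmentation step can be sketched as follows (the thesis implements it in C++; this Python version, with illustrative depth thresholds, only conveys the idea):

```python
import numpy as np

# Sketch of depth-based foreground segmentation, assuming a depth map aligned
# with the RGB image, as produced by the stereo camera. The [near, far] window
# is an illustrative assumption.
def segment_by_depth(image, depth, near=0.5, far=2.0):
    """Keep pixels whose depth falls in [near, far] metres; zero out the rest."""
    mask = (depth >= near) & (depth <= far)
    segmented = image.copy()
    segmented[~mask] = 0            # suppress background before classification
    return segmented, mask

rgb = np.random.rand(120, 160, 3)                 # stand-in camera frame
depth = np.random.uniform(0.1, 5.0, (120, 160))   # stand-in depth map
obstacle, mask = segment_by_depth(rgb, depth)
```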
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network
Abstract:
Automated tissue characterization is one of the most crucial components of a computer aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN), designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2×2 kernels and LeakyReLU activations, followed by average pooling with size equal to the size of the final feature maps, and three dense layers. The last dense layer has 7 outputs, corresponding to the classes considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches, derived from 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for this specific problem. A comparative analysis proved the effectiveness of the proposed CNN against previous methods on a challenging dataset. The classification performance (~85.5%) demonstrated the potential of CNNs in analyzing lung patterns. Future work includes extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists.
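A hedged PyTorch sketch of the described architecture; the patch size and channel widths are assumptions, since the abstract does not state them:

```python
import torch
import torch.nn as nn

# Sketch of the described ILD patch classifier: five convolutional layers with
# 2x2 kernels and LeakyReLU activations, average pooling over the final feature
# maps, and three dense layers ending in 7 outputs (one per tissue class).
# The 32x32 patch size and channel widths are illustrative assumptions.
class IldCnn(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        convs = []
        channels = [1, 16, 32, 64, 64, 64]
        for cin, cout in zip(channels, channels[1:]):
            convs += [nn.Conv2d(cin, cout, kernel_size=2), nn.LeakyReLU(0.01)]
        self.convs = nn.Sequential(*convs)
        self.pool = nn.AdaptiveAvgPool2d(1)      # pool size == final map size
        self.dense = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, n_classes),            # healthy, GGO, micronodules, ...
        )

    def forward(self, x):                        # x: (N, 1, 32, 32) patches
        h = self.pool(self.convs(x)).flatten(1)
        return self.dense(h)

logits = IldCnn()(torch.randn(4, 1, 32, 32))     # shape (4, 7)
```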
Abstract:
As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved huge success in image information extraction. Traditionally, a CNN is trained by supervised learning with labeled data and used as a classifier by adding a classification layer at the end. Its capability of extracting image features is largely limited by the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of the training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first convolutional layer of the CNN, and the CNN model is then fine-tuned on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated on a multilingual image dataset, which labels Chinese, English and numeral texts separately. More than 10% gain in detection precision is observed over two CNN models.
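A minimal sketch of this style of pretraining and weight transfer, assuming a convolutional autoencoder with an L1 sparsity penalty (layer sizes and the penalty weight are illustrative):

```python
import torch
import torch.nn as nn

# Sparse convolutional autoencoder trained on unlabeled images; its encoder
# weights initialize the CNN's first convolutional layer.
enc = nn.Conv2d(1, 32, kernel_size=5, padding=2)
dec = nn.ConvTranspose2d(32, 1, kernel_size=5, padding=2)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))

x = torch.randn(16, 1, 32, 32)          # unlabeled (e.g. rendered) images
code = torch.relu(enc(x))
recon = dec(code)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()  # sparsity term
loss.backward(); opt.step()

# Transfer: the learned filters initialize the first layer of the CNN,
# which is then fine-tuned on labeled scene image patches.
cnn_first_layer = nn.Conv2d(1, 32, kernel_size=5, padding=2)
cnn_first_layer.load_state_dict(enc.state_dict())
```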
Abstract:
We introduce a 2-tier convolutional neural network model for learning distributed paragraph representations for a specific task (e.g. paragraph- or short-document-level sentiment analysis and text topic categorization). We decompose paragraph semantics into 3 cascaded constituents: word representation, sentence composition and document composition. Specifically, we learn distributed word representations with a continuous bag-of-words model from a large unstructured text corpus. Then, using these word representations as pre-trained vectors, distributed task-specific sentence representations are learned from a sentence-level corpus with task-specific labels by the first tier of our model. Using these sentence representations as distributed paragraph representation vectors, distributed paragraph representations are learned from a paragraph-level corpus by the second tier of our model. The model is evaluated on the DBpedia ontology classification dataset and the Amazon review dataset. Empirical results show the effectiveness of our proposed learning model for generating distributed paragraph representations.
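A rough sketch of the two-tier composition, assuming max-pooled 1D convolutions and illustrative dimensions (the paper's exact composition functions are not stated in the abstract):

```python
import torch
import torch.nn as nn

# Tier 1 composes pre-trained word vectors into sentence vectors; tier 2
# composes sentence vectors into a paragraph vector. All sizes are assumptions.
emb = nn.Embedding(10000, 100)                  # CBOW-pretrained word vectors
sent_conv = nn.Conv1d(100, 200, kernel_size=3, padding=1)  # words -> sentence
para_conv = nn.Conv1d(200, 300, kernel_size=3, padding=1)  # sentences -> paragraph

words = torch.randint(0, 10000, (8, 5, 20))     # 8 paragraphs, 5 sentences, 20 words
w = emb(words)                                  # (8, 5, 20, 100)
s = sent_conv(w.flatten(0, 1).transpose(1, 2)).max(dim=2).values    # (40, 200)
p = para_conv(s.view(8, 5, 200).transpose(1, 2)).max(dim=2).values  # (8, 300)
```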
Abstract:
Acoustic Emission (AE) monitoring can be used to detect the presence of damage as well as determine its location in Structural Health Monitoring (SHM) applications. Information on the time difference between the signal generated by the damage event arriving at different sensors is essential for localization, which makes the time of arrival (ToA) an important piece of information to retrieve from the AE signal. Generally, the ToA is determined using statistical methods such as the Akaike Information Criterion (AIC), which is particularly prone to errors in the presence of noise. Given that the structures of interest are surrounded by harsh environments, a way to accurately estimate the arrival time in such noisy scenarios is of particular interest. In this work, two new methods based on Machine Learning are presented to estimate the arrival times of AE signals: one based on a Convolutional Neural Network (CNN) and one based on a Capsule Neural Network (CapsNet), both Deep Learning models. The primary advantage of such models is that they do not require the user to pre-define selected features; they only require raw data, and the models establish non-linear relationships between the inputs and outputs. The performance of the models is evaluated using AE signals generated by a custom ray-tracing algorithm, propagated on an aluminium plate, and is compared to AIC. It was found that the relative estimation error on the test set was < 5% for the models, compared to around 45% for AIC. Testing then continued with an experimental setup acquiring real AE signals. Similar performance was observed: the two models not only outperform AIC by more than an order of magnitude in their average errors, but are also shown to be far more robust than AIC, which fails in the presence of noise.
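For reference, a common formulation of the AIC picker that the learned models are compared against; the synthetic burst below is an illustrative stand-in for the ray-traced AE data:

```python
import numpy as np

# Classical AIC onset picker: the arrival index minimizes
#   AIC(k) = k*log(var(x[:k])) + (N-k-1)*log(var(x[k:]))
def aic_picker(x):
    n = len(x)
    aic = np.full(n, np.inf)
    for k in range(2, n - 2):
        v1, v2 = np.var(x[:k]), np.var(x[k:])
        if v1 > 0 and v2 > 0:
            aic[k] = k * np.log(v1) + (n - k - 1) * np.log(v2)
    return int(np.argmin(aic))                 # estimated ToA (sample index)

# Synthetic AE-like trace: noise, then a decaying burst arriving at sample 500.
t = np.arange(2000)
signal = 0.05 * np.random.randn(2000)
signal[500:] += np.sin(0.3 * t[:1500]) * np.exp(-t[:1500] / 300)
print(aic_picker(signal))                      # close to 500 at this noise level
```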
Abstract:
The Neural Networks customized and tested in this thesis (WaldoNet, FlowNet and PatchNet) are a first exploration of and approach to the Template Matching task, so there are many possible extensions, some of which are proposed in the thesis. During my thesis, I analyzed how the classical algorithms work and adapted them with deep learning techniques. The features extracted from both the template and the query images resemble the keypoints of the SIFT algorithm. Then, instead of a similarity function or keypoint matching, WaldoNet and PatchNet use a convolutional layer to compare the features, while FlowNet uses a correlation layer. In addition, I identified the major challenges of the Template Matching task (affine/non-affine transformations, intensity changes...) and addressed them with a careful design of the dataset.
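The convolution-as-comparison idea can be sketched as follows; the feature tensors are random stand-ins for the networks' learned features, and all sizes are illustrative:

```python
import torch
import torch.nn.functional as F

# The template's feature map acts as a convolution kernel slid over the
# query's feature map, so cross-correlation peaks mark candidate matches.
query_feat = torch.randn(1, 64, 50, 50)        # features of the query image
template_feat = torch.randn(1, 64, 7, 7)       # features of the template

score = F.conv2d(query_feat, template_feat)    # correlation map, (1, 1, 44, 44)
best = score.flatten().argmax()
y, x = divmod(best.item(), score.shape[-1])    # top-left corner of best match
```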
Abstract:
In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much easier to acquire than pixel-level labels, and that it is natural to find a link between image-level recognition and pixel-level classification in video data, which would transfer learned recognition models from one domain to the other. To this end, this thesis proposes two domain adaptation approaches that adapt a deep convolutional neural network (CNN) image recognition model trained on labelled image data to the target domain, exploiting both the semantic evidence learned by the CNN and the intrinsic structure of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain shift from the source domain to the target domain, which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods through extensive evaluations on challenging datasets, comparing with the state-of-the-art methods.
Abstract:
Computed Tomography Angiography (CTA) images are the standard for assessing Peripheral Artery Disease (PAD). This paper presents a Computer Aided Detection (CAD) and Computer Aided Measurement (CAM) system for PAD. The CAD stage detects the arterial network using a 3D region growing method and a fast 3D morphology operation. The CAM stage aims to accurately measure the artery diameters from the detected vessel centerline, compensating for the partial volume effect using Expectation Maximization (EM) and a Markov Random Field (MRF). The system has been evaluated on phantom data and also applied to fifteen (15) CTA datasets, where the detection accuracy for stenosis was 88% and the measurement error was within 8%.
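A minimal sketch of 3D region growing on a CTA volume, with an illustrative seed point and intensity window (not the paper's parameters):

```python
import numpy as np
from collections import deque

# Grow a connected region from a seed voxel, accepting 6-neighbours whose
# intensity lies in [lo, hi] (contrast-enhanced vessels are bright on CTA).
def region_grow_3d(volume, seed, lo, hi):
    grown = np.zeros(volume.shape, dtype=bool)
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        if grown[z, y, x] or not (lo <= volume[z, y, x] <= hi):
            continue
        grown[z, y, x] = True
        for dz, dy, dx in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
            nz, ny, nx = z + dz, y + dy, x + dx
            if 0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1] \
                    and 0 <= nx < volume.shape[2] and not grown[nz, ny, nx]:
                queue.append((nz, ny, nx))
    return grown

cta = np.random.randint(0, 100, (64, 64, 64))
cta[28:36, 28:36, 28:36] = 300          # synthetic bright vessel segment
vessels = region_grow_3d(cta, seed=(32, 32, 32), lo=150, hi=400)
```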
Abstract:
Training a system to recognize handwritten words is a task that requires a large amount of data with their correct transcription. However, the creation of such a training set, including the generation of the ground truth, is tedious and costly. One way of reducing the high cost of labeled training data acquisition is to exploit unlabeled data, which can be gathered easily. Making use of both labeled and unlabeled data is known as semi-supervised learning. One of the most general versions of semi-supervised learning is self-training, where a recognizer iteratively retrains itself on its own output on new, unlabeled data. In this paper we propose to apply semi-supervised learning, and in particular self-training, to the problem of cursive, handwritten word recognition. The special focus of the paper is on retraining rules that define what data are actually being used in the retraining phase. In a series of experiments it is shown that the performance of a neural network based recognizer can be significantly improved through the use of unlabeled data and self-training if appropriate retraining rules are applied.
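A minimal sketch of self-training with one possible confidence-threshold retraining rule; the classifier, threshold and data are illustrative stand-ins, not the paper's recognizer:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Self-training loop: only unlabeled samples predicted with probability above
# a threshold are added back into the training set with their predicted labels.
X_lab, y_lab = np.random.randn(100, 20), np.random.randint(0, 10, 100)
X_unlab = np.random.randn(500, 20)

clf = MLPClassifier(max_iter=300).fit(X_lab, y_lab)
for _ in range(3):                               # self-training iterations
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.9          # the retraining rule
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]
    clf = MLPClassifier(max_iter=300).fit(X_lab, y_lab)
```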
Abstract:
The amplitude of motor evoked potentials (MEPs) elicited by transcranial magnetic stimulation (TMS) of the primary motor cortex (M1) shows a large variability from trial to trial, although MEPs are evoked by the same repeated stimulus. A multitude of factors is believed to influence MEP amplitudes, such as cortical, spinal and motor excitability state. The goal of this work is to explore to which degree the variation in MEP amplitudes can be explained by the cortical state right before the stimulation. Specifically, we analyzed a dataset acquired on eleven healthy subjects comprising, for each subject, 840 single TMS pulses applied to the left M1 during acquisition of electroencephalography (EEG) and electromyography (EMG). An interpretable convolutional neural network, named SincEEGNet, was utilized to discriminate between low- and high-corticospinal excitability trials, defined according to the MEP amplitude, using the pre-TMS EEG as input. This data-driven approach enabled considering multiple brain locations and frequency bands without any a priori selection. Post-hoc interpretation techniques were adopted to enhance interpretation by identifying the EEG features most relevant for the classification. Results show that individualized classifiers successfully discriminated between low and high M1 excitability states in all participants. Outcomes of the interpretation methods suggest the importance of the electrodes situated over the TMS stimulation site, as well as the relevance of the temporal samples of the input EEG closer to the stimulation time. This novel decoding method allows causal investigation of the cortical excitability state, which may be relevant for personalizing and increasing the efficacy of therapeutic brain-state dependent brain stimulation (for example in patients affected by Parkinson’s disease).
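A hedged sketch of the decoding setup, with a generic temporal-then-spatial CNN standing in for SincEEGNet; the median-split trial labeling and all shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Label trials low/high excitability (here via a median split on MEP amplitude,
# an assumed rule) and classify the pre-TMS EEG epoch with a small CNN.
eeg = torch.randn(840, 1, 64, 500)               # trials x 1 x channels x samples
mep = torch.rand(840)                            # MEP amplitudes (stand-in)
labels = (mep > mep.median()).long()             # 0 = low, 1 = high excitability

net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(1, 64), padding=(0, 32)),  # temporal filtering
    nn.Conv2d(8, 16, kernel_size=(64, 1), groups=8),        # spatial filtering
    nn.ELU(), nn.AdaptiveAvgPool2d((1, 16)), nn.Flatten(),
    nn.Linear(16 * 16, 2),
)
loss = nn.functional.cross_entropy(net(eeg[:32]), labels[:32])
loss.backward()
```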
Abstract:
This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways.
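A minimal sketch of such an MLF detector with a variable decision threshold; the layer sizes, input layout and threshold value are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Speed, flow and occupancy at the two dual stations form a 6-feature input;
# a threshold on the output probability trades detection rate for false alarms.
net = nn.Sequential(
    nn.Linear(6, 16), nn.Tanh(),        # hidden layer (size assumed)
    nn.Linear(16, 1), nn.Sigmoid(),     # probability of an incident
)

x = torch.randn(128, 6)                 # one row per measurement interval
p = net(x)
threshold = 0.8                         # raise to cut false alarms, lower to detect faster
alarms = p > threshold
```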
Abstract:
Magdeburg, Univ., Faculty of Computer Science, Diss., 2013