6 resultados para multi-class classification
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text and inabilities to utilize it create risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim in this doctoral dissertation is to study machine learning and clinical text in order to support health information flow.First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications.The contributions are a model of the ideal information flow,a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases,the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking.The third and fourth application are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling.These four applications are tested with Finnish intensive care patient records.The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports.The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated.The associations between performance evaluation measures and methods are addressed,and a new hold-out method is introduced.This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics,machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user-feedback.
Resumo:
Diabetes is a rapidly increasing worldwide problem which is characterised by defective metabolism of glucose that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), which is one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of the current DR screening capabilities for which the digital imaging of the eye fundus (retinal imaging), and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is statistically studied using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a certain diabetic lesion type from all other possible objects in eye fundus images by only estimating the probability density function of that certain lesion type. For the training and ground truth estimation, the algorithm combines manual annotations of several experts for which the best practices were experimentally selected. By assessing the algorithm’s performance while conducting experiments with the colour space selection, both illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is the benchmarking framework for eye fundus image analysis algorithms needed for the development of the automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on the standard receiver operating characteristics analysis and it follows the medical practice in the decision making providing protocols for image- and pixel-based evaluations. During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, DR databases and the final algorithm, are made public in the web to set the baseline results for automatic detection of diabetic retinopathy. Although deviating from the general context of the thesis, a simple and effective optic disc localisation method is presented. The optic disc localisation is discussed, since normal eye fundus structures are fundamental in the characterisation of DR.
Resumo:
The aim of this Master’s thesis is to find a method for classifying spare part criticality in the case company. Several approaches exist for criticality classification of spare parts. The practical problem in this thesis is the lack of a generic analysis method for classifying spare parts of proprietary equipment of the case company. In order to find a classification method, a literature review of various analysis methods is required. The requirements of the case company also have to be recognized. This is achieved by consulting professionals in the company. The literature review states that the analytic hierarchy process (AHP) combined with decision tree models is a common method for classifying spare parts in academic literature. Most of the literature discusses spare part criticality in stock holding perspective. This is relevant perspective also for a customer orientated original equipment manufacturer (OEM), as the case company. A decision tree model is developed for classifying spare parts. The decision tree classifies spare parts into five criticality classes according to five criteria. The criteria are: safety risk, availability risk, functional criticality, predictability of failure and probability of failure. The criticality classes describe the level of criticality from non-critical to highly critical. The method is verified for classifying spare parts of a full deposit stripping machine. The classification can be utilized as a generic model for recognizing critical spare parts of other similar equipment, according to which spare part recommendations can be created. Purchase price of an item and equipment criticality were found to have no effect on spare part criticality in this context. Decision tree is recognized as the most suitable method for classifying spare part criticality in the company.
Resumo:
The objective of this thesis is to develop and generalize further the differential evolution based data classification method. For many years, evolutionary algorithms have been successfully applied to many classification tasks. Evolution algorithms are population based, stochastic search algorithms that mimic natural selection and genetics. Differential evolution is an evolutionary algorithm that has gained popularity because of its simplicity and good observed performance. In this thesis a differential evolution classifier with pool of distances is proposed, demonstrated and initially evaluated. The differential evolution classifier is a nearest prototype vector based classifier that applies a global optimization algorithm, differential evolution, to determine the optimal values for all free parameters of the classifier model during the training phase of the classifier. The differential evolution classifier applies the individually optimized distance measure for each new data set to be classified is generalized to cover a pool of distances. Instead of optimizing a single distance measure for the given data set, the selection of the optimal distance measure from a predefined pool of alternative measures is attempted systematically and automatically. Furthermore, instead of only selecting the optimal distance measure from a set of alternatives, an attempt is made to optimize the values of the possible control parameters related with the selected distance measure. Specifically, a pool of alternative distance measures is first created and then the differential evolution algorithm is applied to select the optimal distance measure that yields the highest classification accuracy with the current data. After determining the optimal distance measures for the given data set together with their optimal parameters, all determined distance measures are aggregated to form a single total distance measure. The total distance measure is applied to the final classification decisions. The actual classification process is still based on the nearest prototype vector principle; a sample belongs to the class represented by the nearest prototype vector when measured with the optimized total distance measure. During the training process the differential evolution algorithm determines the optimal class vectors, selects optimal distance metrics, and determines the optimal values for the free parameters of each selected distance measure. The results obtained with the above method confirm that the choice of distance measure is one of the most crucial factors for obtaining higher classification accuracy. The results also demonstrate that it is possible to build a classifier that is able to select the optimal distance measure for the given data set automatically and systematically. After finding optimal distance measures together with optimal parameters from the particular distance measure results are then aggregated to form a total distance, which will be used to form the deviation between the class vectors and samples and thus classify the samples. This thesis also discusses two types of aggregation operators, namely, ordered weighted averaging (OWA) based multi-distances and generalized ordered weighted averaging (GOWA). These aggregation operators were applied in this work to the aggregation of the normalized distance values. The results demonstrate that a proper combination of aggregation operator and weight generation scheme play an important role in obtaining good classification accuracy. The main outcomes of the work are the six new generalized versions of previous method called differential evolution classifier. All these DE classifier demonstrated good results in the classification tasks.
Resumo:
The main objective of this study was todo a statistical analysis of ecological type from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses "the Relevence Vector Machine" algorithm in ecological classification betweenforestland and wetland. Further this bi-classification technique was used to do classification of many other different species of trees and produces hierarchical classification of entire subclasses given as a target class. Also, we carried out an attempt to use airborne image of same forest area. Combining it with image analysis, using different image processing operation, we tried to extract good features and later used them to perform classification of forestland and wetland.
Resumo:
The purpose of this thesis is to present a new approach to the lossy compression of multispectral images. Proposed algorithm is based on combination of quantization and clustering. Clustering was investigated for compression of the spatial dimension and the vector quantization was applied for spectral dimension compression. Presenting algo¬rithms proposes to compress multispectral images in two stages. During the first stage we define the classes' etalons, another words to each uniform areas are located inside the image the number of class is given. And if there are the pixels are not yet assigned to some of the clusters then it doing during the second; pass and assign to the closest eta¬lons. Finally a compressed image is represented with a flat index image pointing to a codebook with etalons. The decompression stage is instant too. The proposed method described in this paper has been tested on different satellite multispectral images from different resources. The numerical results and illustrative examples of the method are represented too.