26 results for Semi-supervised classification

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

90.00%

Publisher:

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines to order the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems to predict individual customer choices from vast amounts of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suited to different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to parse ranking in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently with large amounts of data. For situations where only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
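The claim that a pairwise method can scale linearly with the number of data points can be made plausible with a small sketch. It is an illustration under stated assumptions, not the algorithm from the thesis: it assumes a squared loss summed over all pairs of data points, and the function names are hypothetical. The naive all-pairs loss costs O(n^2), but it equals n times the sum of squared deviations of the residuals from their mean, which costs O(n).

```python
def pairwise_sq_loss_naive(scores, labels):
    """Sum over all pairs i<j of ((y_i - y_j) - (s_i - s_j))^2, in O(n^2)."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = (labels[i] - labels[j]) - (scores[i] - scores[j])
            total += diff * diff
    return total


def pairwise_sq_loss_linear(scores, labels):
    """Same quantity in O(n): n * sum((r_i - mean(r))^2) with r_i = y_i - s_i."""
    residuals = [y - s for s, y in zip(scores, labels)]
    n = len(residuals)
    mean = sum(residuals) / n
    return n * sum((r - mean) ** 2 for r in residuals)


# Hypothetical predicted scores and true utility values for four items.
scores = [0.2, 1.5, 0.7, 3.1]
labels = [1.0, 2.0, 0.0, 3.0]
print(pairwise_sq_loss_naive(scores, labels))
print(pairwise_sq_loss_linear(scores, labels))
```

Both functions print the same value up to rounding. The identity holds because each pairwise term depends only on the difference of two residuals.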

Relevance:

30.00%

Publisher:

Abstract:

Diabetes is a rapidly growing worldwide problem characterised by defective glucose metabolism, which causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of current DR screening capabilities, for which digital imaging of the eye fundus (retinal imaging) and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is studied statistically using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a given diabetic lesion type from all other possible objects in eye fundus images by estimating only the probability density function of that lesion type. For training and ground truth estimation, the algorithm combines the manual annotations of several experts, for which the best practices were selected experimentally. By assessing the algorithm's performance in experiments on colour space selection, illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was evaluated quantitatively. Another contribution of this work is a benchmarking framework for eye fundus image analysis algorithms, which is needed for the development of automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on standard receiver operating characteristic (ROC) analysis and follows medical practice in decision making, providing protocols for image-based and pixel-based evaluation.
During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, the DR databases, and the final algorithm are made publicly available on the web to set baseline results for the automatic detection of diabetic retinopathy. Although it deviates from the general context of the thesis, a simple and effective optic disc localisation method is also presented. Optic disc localisation is discussed because normal eye fundus structures are fundamental in the characterisation of DR.
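The one-class idea described in this abstract, modelling only the density of the target class and thresholding it, can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a single hypothetical colour feature per pixel, a 1-D Gaussian mixture fitted with plain EM, and a threshold set to the lowest density seen on the training samples. All names and data values are made up for the example.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components=2, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns a list of (weight, mean, var)."""
    n = len(data)
    srt = sorted(data)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [srt[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)
    return list(zip(weights, means, variances))


def density(x, gmm):
    return sum(w * gauss_pdf(x, m, v) for w, m, v in gmm)


def one_class_predict(x, gmm, threshold):
    """True if x is accepted as the target (lesion) class."""
    return density(x, gmm) >= threshold


# Hypothetical colour values of annotated lesion pixels (positive class only).
lesion_values = [0.10, 0.31, 0.12, 0.29, 0.11, 0.30, 0.13, 0.32]
gmm = fit_gmm_1d(lesion_values)
threshold = min(density(x, gmm) for x in lesion_values)
print(one_class_predict(0.31, gmm, threshold))
print(one_class_predict(0.70, gmm, threshold))
```

No background samples are needed for training. In practice the threshold would be chosen from an ROC analysis rather than from the training minimum used here.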

Relevance:

30.00%

Publisher:

Abstract:

The subject of this thesis is automatic sentence compression with machine learning, such that the compressed sentences remain grammatical and retain their essential meaning. There are many possible uses for compressing natural language sentences. This thesis focuses on generating television program subtitles, which are often compressed versions of the program's original script. The main part of the thesis consists of machine learning experiments on automatic sentence compression using different approaches to the problem. The machine learning methods used are linear-chain conditional random fields (CRFs) and support vector machines. We also examine which automatic text analysis methods provide useful features for the task. The data used for machine learning was supplied by Lingsoft Inc. and consists of subtitles in both compressed and uncompressed form. The models are compared to a baseline system, and the comparisons are made both automatically and with human evaluation because of the potentially subjective nature of the output. The best result is achieved with CRF sequence classification using a rich feature set. All of the text analysis methods help classification, and the most useful is morphological analysis.
The subject of the thesis is the automatic, machine-based compression of Finnish sentences such that the shortened sentences retain their essential information and remain grammatical. Compressing natural language sentences has many uses, but this thesis approaches the topic through the subtitling of television programs, which in practice involves shortening the original text to fit the television screen better. The thesis experiments with different machine learning methods for automatic text shortening and examines how well different natural language analysis methods produce information that helps these methods shorten sentences. It also examines which kind of approach produces the best result. The machine learning methods used are the support vector machine and the linear-chain CRF. The machine learning is supported by subtitles at different stages of processing, obtained from Lingsoft OY. The resulting models are compared, and their output is evaluated automatically and, because text as an output is to some extent subjective, also by human evaluation. A method taken from the literature serves as the baseline. The best result is obtained with a CRF sequence classifier using a broad feature set. All of the text analysis methods tried help classification, with morphological analysis contributing the most.
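Sentence compression with a sequence classifier is commonly framed as labelling each token KEEP or DELETE. The sketch below illustrates that framing only. The abstract does not specify the label scheme or the features, so the label names, the feature set, and the example sentence are assumptions, and the labels are written by hand where a trained CRF or SVM would predict them.

```python
def token_features(tokens, i):
    """Hypothetical surface features for token i, as a sequence model might use."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


def apply_labels(tokens, labels):
    """Build the compressed sentence from per-token KEEP/DELETE labels."""
    return [tok for tok, label in zip(tokens, labels) if label == "KEEP"]


def compression_rate(tokens, labels):
    """Fraction of tokens that survive compression."""
    return len(apply_labels(tokens, labels)) / len(tokens)


tokens = ["Well", ",", "I", "really", "think", "so"]
# Hand-written labels standing in for a model's predictions.
labels = ["DELETE", "DELETE", "KEEP", "DELETE", "KEEP", "KEEP"]

features = [token_features(tokens, i) for i in range(len(tokens))]
print(" ".join(apply_labels(tokens, labels)))
print(compression_rate(tokens, labels))
```

A morphological analyser, which the abstract reports as the most useful analysis method, would add further entries to the feature dictionary, for example lemma and part of speech.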

Relevance:

20.00%

Publisher:

Abstract:

Summary: International classification of Finnish acid sulphate soils

Relevance:

20.00%

Publisher:

Abstract:

The main objective of this study was to carry out a statistical analysis of ecological type from optical satellite data using Tipping's sparse Bayesian algorithm. The thesis uses the Relevance Vector Machine algorithm for ecological classification between forestland and wetland. This binary classification technique was further used to classify many different tree species, producing a hierarchical classification of the subclasses of a given target class. We also attempted to use an airborne image of the same forest area. Combining it with image analysis, using different image processing operations, we tried to extract good features and later used them to classify forestland and wetland.
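The way a binary classifier can be chained into a hierarchical classification can be sketched as a small decision tree of binary decisions. This is an assumption-laden illustration: the threshold rules below are hypothetical stand-ins for trained Relevance Vector Machines, and the feature names and class labels are invented for the example.

```python
def classify_hierarchical(x, node):
    """Walk a tree whose inner nodes are (binary_classifier, if_true, if_false)."""
    while not isinstance(node, str):
        decide, if_true, if_false = node
        node = if_true if decide(x) else if_false
    return node


# Hypothetical binary decisions; a trained model would replace each lambda.
is_forest = lambda x: x["nir"] > 0.5
is_conifer = lambda x: x["red"] < 0.2

hierarchy = (
    is_forest,
    (is_conifer, "forestland/conifer", "forestland/broadleaf"),
    "wetland",
)

print(classify_hierarchical({"nir": 0.8, "red": 0.1}, hierarchy))
print(classify_hierarchical({"nir": 0.3, "red": 0.4}, hierarchy))
```

Each level reuses the same two-class machinery, so adding a subclass only means adding another binary node.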

Relevance:

20.00%

Publisher:

Abstract:

Cleanroom design aims to achieve controlled and monitored air cleanliness in a classified space. Classification follows cleanroom standards and, in spaces where medicinal products are manufactured, also the classifications of the GMP regulations. The cleanroom standard ISO 14644 comprises seven parts, which cover cleanroom requirements from design to operation and testing. The GMP regulations contain nine chapters, of which Chapter 3, 'Premises and Equipment', is central to cleanroom design for pharmaceutical manufacturing. Ventilation, cleanroom structures, and building automation play the most significant roles in achieving clean air in a cleanroom. Air can be supplied to the space according to three different principles: as unidirectional, turbulent, or mixed flow, through HEPA filters that ensure a high degree of filtration of impurities. Air is extracted through perforated raised floors or through exhaust grilles in the lower part of the room, from where roughly 75-90% of it is recirculated back into the space. In the pharmaceutical industry, perforated raised floors cannot be used because of the risk of contamination. The designed conditions are maintained by building automation, and a monitoring system supervises the quality of the air in the space. All GMP-classified cleanrooms must be validated. Validation includes qualification of the technical systems and validation of the entire process. Qualification of the technical systems comprises design review (DQ), installation and commissioning inspections (IQ), operational testing (OQ), and performance testing (PQ). Qualification is one component of validation. Process validation is part of a company's quality assurance. Validation provides documented evidence that the space or process actually meets the given requirements. In this work, an example qualification plan was drawn up for the technical systems of a cleanroom.
The plan includes the installation and commissioning inspections (IQ) and the operational tests (OQ).

Relevance:

20.00%

Publisher:

Abstract:

This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, a machine vision and pattern recognition problem in which an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but this thesis uses a different technique, which leads to higher-quality image features and enables more precise localization. Instead of using interest point detection, the landmark positions are marked manually. The quality of the image features is therefore not limited by the interest point detection phase, and the learning of image features is simplified. The approach combines interest point detection and local description into a single detection phase. The computational efficiency of the descriptor is therefore important, which rules out many commonly used descriptors as too heavy. Multiresolution Gabor features have been the main descriptor in this thesis, and improving their efficiency is a significant part of the work. The actual image features are formed from descriptors by using a classifier, which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration, where there are only positive training samples and no explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications of the same underlying techniques are also presented, including object categorization and fault detection.
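A multiresolution Gabor descriptor is built from a bank of Gabor filters at several wavelengths and orientations. The sketch below shows the standard real-valued Gabor function and a small filter bank. It is only an illustration: the kernel size, the parameter values, and the function names are assumptions, and the thesis's efficient implementation is not reproduced here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a Gabor filter sampled on a size x size grid (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelengths, n_orientations, sigma):
    """One kernel per (wavelength, orientation) pair: the multiresolution bank."""
    return [
        gabor_kernel(size, lam, k * math.pi / n_orientations, sigma)
        for lam in wavelengths
        for k in range(n_orientations)
    ]


def response(patch, kernel):
    """Filter response at the patch centre: the sum of elementwise products."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Synthetic 9 x 9 patch of vertical stripes with a period of 4 pixels.
patch = [[math.cos(2.0 * math.pi * x / 4.0) for x in range(-4, 5)] for _ in range(9)]
matched = gabor_kernel(9, wavelength=4.0, theta=0.0, sigma=2.0)
rotated = gabor_kernel(9, wavelength=4.0, theta=math.pi / 2.0, sigma=2.0)
print(response(patch, matched), response(patch, rotated))
```

The filter whose wavelength and orientation match the stripes responds far more strongly than the rotated one. The vector of such responses over the whole bank is what a classifier would take as the local descriptor.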

Relevance:

20.00%

Publisher:

Abstract:

When designing a classification system, the aim is to build a system that can solve the problem domain under study as accurately as possible. In pattern recognition, the core of the recognition system is the classifier. The range of applications for classification is quite broad. A classifier is needed, for example, in pattern recognition systems, of which image processing is a good example. Accurate classification is also much needed in medicine. For example, diagnosing a patient's symptoms requires a classifier that can infer from measurement results, as accurately as possible, whether or not the patient has the symptom in question. In this dissertation, a classifier based on similarity measures was developed, and its performance was examined on, among others, medical data sets in which the classification task is to identify the nature of a patient's symptom. One advantage of the classifier presented is its simple structure, which makes it easy both to implement and to understand. Another advantage is its accuracy: the classifier can be made to classify several different problems very accurately. This matters especially in medicine, where even a small improvement in classification accuracy is very important. The dissertation studies several different measures for quantifying similarity. The measures also have several parameters, for which values suited to the particular classification problem can be sought. This tuning of the parameters to the problem domain can be carried out using, for example, evolutionary algorithms. In this work, a genetic algorithm and a differential evolution algorithm were used for this purpose. A further advantage of the classifier is its flexibility: the similarity measure is easy to swap if it does not suit the problem domain under study. Optimizing the parameters of the different measures can also improve the results considerably, as can applying different preprocessing methods before classification.
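A similarity-based classifier of the kind this abstract describes can be sketched in a few lines. The sketch rests on several assumptions that the abstract does not confirm: features are scaled to [0, 1], each class is represented by the mean of its training samples, and the similarity measure is one possible parametrised form with a single parameter p. A plain grid search stands in for the genetic and differential evolution algorithms mentioned in the abstract. The data is invented.

```python
def similarity(x, v, p=1.0):
    """Mean per-feature similarity of x and v, both with features in [0, 1]."""
    return sum((1.0 - abs(a ** p - b ** p)) ** (1.0 / p) for a, b in zip(x, v)) / len(x)


def ideal_vectors(samples, labels):
    """One representative vector per class: the mean of that class's samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, a in enumerate(x):
            acc[i] += a
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(x, ideals, p=1.0):
    """Assign x to the class whose ideal vector it is most similar to."""
    return max(ideals, key=lambda y: similarity(x, ideals[y], p))


def accuracy(samples, labels, ideals, p):
    hits = sum(classify(x, ideals, p) == y for x, y in zip(samples, labels))
    return hits / len(samples)


def tune_p(samples, labels, ideals, candidates=(0.5, 1.0, 2.0, 4.0)):
    """Grid search over p; a stand-in for an evolutionary optimiser."""
    return max(candidates, key=lambda p: accuracy(samples, labels, ideals, p))


# Hypothetical, already scaled measurements for two classes.
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
labels = ["absent", "absent", "present", "present"]

ideals = ideal_vectors(samples, labels)
p = tune_p(samples, labels, ideals)
print(classify([0.15, 0.25], ideals, p), classify([0.8, 0.8], ideals, p))
```

Swapping the measure only requires replacing the `similarity` function, which mirrors the flexibility the abstract points out.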

Relevance:

20.00%

Publisher:

Abstract:

The purpose of this study was to analyse the nursing student-patient relationship and factors associated with this relationship from the point of view of both students and patients, and to identify factors that predict the type of relationship. The ultimate goal is to improve supervised clinical practicum with a view to supporting students in their reciprocal collaborative relationships with patients, increase their preparedness to meet patients’ health needs, and thus to enhance the quality of patient care. The study was divided into two phases. In the first phase (1999-2005), a literature review concerning the student-patient relationship was conducted (n=104 articles) and semi-structured interviews carried out with nursing students (n=30) and internal medicine patients (n=30). Data analysis was by means of qualitative content analysis and Student-Patient Relationship Scales, which were specially developed for this research. In the second phase (2005-2007), the data were collected by SPR scales among nursing students (n=290) and internal medicine patients (n=242). The data were analysed statistically by SPSS 12.0 software. The results revealed three types of student-patient relationship: a mechanistic relationship focusing on the student’s learning needs; an authoritative relationship focusing on what the student assumes is in the patient’s best interest; and a facilitative relationship focusing on the common good of both student and patient. Students viewed their relationship with patients more often as facilitative and authoritative than mechanistic, while in patients’ assessments the authoritative relationship occurred most frequently and the facilitative relationship least frequently. Furthermore, students’ and patients’ views on their relationships differed significantly. A number of background factors, contextual factors and consequences of the relationship were found to be associated with the type of relationship. 
In the student data, factors that predicted the type of relationship were age, current year of study and support received in the relationship with patient. The higher the student’s age, the more likely the relationship with the patient was facilitative. Fourth year studies and the support of a person other than a supervisor were significantly associated with an authoritative relationship. Among patients, several factors were found to predict the type of nursing student-patient relationships. Significant factors associated with a facilitative relationship were university-level education, several previous hospitalizations, admission to hospital for a medical problem, experience of caring for an ill family member and patient’s positive perception of atmosphere during collaboration and of student’s personal and professional growth. In patients, positive perceptions of student’s personal and professional attributes and patient’s improved health and a greater commitment to self-care, on the other hand, were significantly associated with an authoritative relationship, whereas positive perceptions of one’s own attributes as a patient were significantly associated with a mechanistic relationship. It is recommended that further research on the student-patient relationship and related factors should focus on questions of content, methodology and education.

Relevance:

20.00%

Publisher:

Abstract:

In this thesis, the author approaches the problem of automated text classification, which is one of the basic tasks in building an Intelligent Internet Search Agent. The work discusses various approaches to solving the sub-problems of automated text classification, such as feature extraction and machine learning on text sources. The author also describes her own multiword approach to feature extraction and presents the results of testing this approach with two classifiers: one based on linear discriminant analysis, and one that combines unsupervised learning for etalon extraction with supervised learning using the common backpropagation algorithm for a multilayer perceptron.
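Multiword feature extraction can be illustrated by counting word n-grams alongside single words. The abstract does not describe the author's method in detail, so this is only a generic sketch under that assumption, with hypothetical function names and a made-up example sentence.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def multiword_features(text, max_n=2):
    """Count single words and contiguous multiword units of up to max_n tokens."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


features = multiword_features("Intelligent search agent for intelligent search")
print(features["intelligent search"], features["search agent"])
```

The resulting counts can be turned into a fixed-length vector over a chosen vocabulary and passed to any classifier, such as linear discriminant analysis or a multilayer perceptron.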