42 results for Semi-supervised segmentation
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
The learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines to order the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems to predict individual customer choices from vast amounts of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suited to different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to parse ranking in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently on large amounts of data. For situations where only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
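To make the idea of regularized least-squares preference learning concrete, here is a minimal sketch for the linear case only. It is not the thesis's algorithm: it assumes a plain linear scoring function, a squared loss over all pairwise score differences, and hypothetical names (`fit_pairwise_rls`, `rank`, `lam`). Kernels, sparse approximation and co-regularization are left out.

```python
import numpy as np


def fit_pairwise_rls(X, y, lam=1.0):
    """Fit a linear scorer w by regularized least squares on pairwise differences.

    Minimises  sum_{i<j} ((y_i - y_j) - (x_i @ w - x_j @ w))**2 + lam * ||w||**2.
    With L = n*I - 1 1^T (Laplacian of the complete graph), the pairwise sum
    equals (y - Xw)^T L (y - Xw), which gives the closed form solved below.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    s = X.sum(axis=0)  # column sums, i.e. X^T 1
    # X^T L X and X^T L y computed without forming the n-by-n matrix L,
    # so the cost is linear in n for this linear model.
    A = n * (X.T @ X) - np.outer(s, s) + lam * np.eye(d)
    b = n * (X.T @ y) - s * y.sum()
    return np.linalg.solve(A, b)


def rank(X, w):
    """Return indices of the rows of X ordered from highest to lowest score."""
    return np.argsort(-(np.asarray(X, dtype=float) @ w))
```

Only the ordering of the predicted scores matters here; adding a constant to every score leaves both the pairwise loss and the resulting ranking unchanged.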
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis of satellite data. We are interested in improving the classification of forest segments when information from two or more different satellites is combined. The experimental part is based on real satellite data from Canada. This thesis summarizes the mathematical basics of the image analysis and supervised learning methods used in the classification algorithm. Three data sets and four feature sets were investigated. The feature sets considered were 1) histograms (quantiles), 2) variance, 3) skewness and 4) kurtosis. Good overall performance was achieved when the ASTERBAND and RADARSAT2 data sets were combined.
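The four feature sets named above can be computed per segment roughly as follows. This is a sketch under assumptions, not the thesis's code: the function names, the choice of population moments, the use of quartiles as the histogram summary, and the way sensors are combined by concatenation are all illustrative.

```python
import statistics


def segment_features(pixels, n_quantiles=4):
    """Summarise the pixel values of one segment (needs at least two pixels)."""
    values = sorted(float(v) for v in pixels)
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population versions).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "quantiles": statistics.quantiles(values, n=n_quantiles, method="inclusive"),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


def combined_feature_vector(segment_by_sensor):
    """Concatenate the features of one segment across sensors (assumed scheme)."""
    vector = []
    for sensor in sorted(segment_by_sensor):
        f = segment_features(segment_by_sensor[sensor])
        vector.extend(f["quantiles"] + [f["variance"], f["skewness"], f["kurtosis"]])
    return vector
```

The concatenated vector would then be passed to whichever supervised classifier is in use; the abstract does not say which one, so none is assumed here.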
Abstract:
This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, which is a machine vision and pattern recognition problem where an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but in this thesis a different technique is used, leading to higher-quality image features that enable more precise localization. Instead of using interest point detection, the landmark positions are marked manually. Therefore, the quality of the image features is not limited by the interest point detection phase, and the learning of image features is simplified. The approach combines both interest point detection and local description into one phase for detection. Computational efficiency of the descriptor is therefore important, which rules out many of the commonly used descriptors as too heavy. Multiresolution Gabor features have been the main descriptor in this thesis, and improving their efficiency is a significant part of the work. The actual image features are formed from descriptors by using a classifier, which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration, where there are only positive training samples and no explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications of the same underlying techniques are also presented, including object categorization and fault detection.
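The one-class configuration can be illustrated with a deliberately simplified sketch. It assumes a single full-covariance Gaussian (a one-component stand-in for the Gaussian mixture model mentioned above) fitted to positive descriptor vectors only, with the acceptance threshold set from the training scores. The class name, the `keep_fraction` parameter and the small ridge term are assumptions, not details from the thesis.

```python
import numpy as np


class OneClassGaussian:
    """Accept samples whose log-density under a Gaussian fitted to positives is high."""

    def fit(self, X, keep_fraction=0.95):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # A small ridge keeps the covariance invertible (assumed safeguard).
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.precision_ = np.linalg.inv(cov)
        _, self.logdet_ = np.linalg.slogdet(cov)
        # No background class is available, so the threshold comes from the
        # positives themselves: keep the best-scoring fraction of them.
        scores = self.log_density(X)
        self.threshold_ = np.quantile(scores, 1.0 - keep_fraction)
        return self

    def log_density(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        diff = X - self.mean_
        # Squared Mahalanobis distance of every row to the mean.
        maha = np.einsum("ij,jk,ik->i", diff, self.precision_, diff)
        d = X.shape[1]
        return -0.5 * (maha + self.logdet_ + d * np.log(2.0 * np.pi))

    def accept(self, X):
        return self.log_density(X) >= self.threshold_
```

In a detection setting, `accept` would be evaluated on the descriptor computed at each candidate image location; a real mixture model would replace the single Gaussian with a weighted sum of components.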
Abstract:
The purpose of this study was to analyse the nursing student-patient relationship and factors associated with this relationship from the point of view of both students and patients, and to identify factors that predict the type of relationship. The ultimate goal is to improve supervised clinical practicum with a view to supporting students in their reciprocal collaborative relationships with patients, increase their preparedness to meet patients’ health needs, and thus to enhance the quality of patient care. The study was divided into two phases. In the first phase (1999-2005), a literature review concerning the student-patient relationship was conducted (n=104 articles) and semi-structured interviews carried out with nursing students (n=30) and internal medicine patients (n=30). Data analysis was by means of qualitative content analysis and Student-Patient Relationship Scales, which were specially developed for this research. In the second phase (2005-2007), the data were collected by SPR scales among nursing students (n=290) and internal medicine patients (n=242). The data were analysed statistically by SPSS 12.0 software. The results revealed three types of student-patient relationship: a mechanistic relationship focusing on the student’s learning needs; an authoritative relationship focusing on what the student assumes is in the patient’s best interest; and a facilitative relationship focusing on the common good of both student and patient. Students viewed their relationship with patients more often as facilitative and authoritative than mechanistic, while in patients’ assessments the authoritative relationship occurred most frequently and the facilitative relationship least frequently. Furthermore, students’ and patients’ views on their relationships differed significantly. A number of background factors, contextual factors and consequences of the relationship were found to be associated with the type of relationship. 
In the student data, the factors that predicted the type of relationship were age, current year of study and support received in the relationship with the patient. The older the student, the more likely the relationship with the patient was facilitative. Fourth-year studies and the support of a person other than a supervisor were significantly associated with an authoritative relationship. Among patients, several factors were found to predict the type of nursing student-patient relationship. Significant factors associated with a facilitative relationship were university-level education, several previous hospitalizations, admission to hospital for a medical problem, experience of caring for an ill family member, and the patient’s positive perception of the atmosphere during collaboration and of the student’s personal and professional growth. In patients, positive perceptions of the student’s personal and professional attributes, together with the patient’s improved health and a greater commitment to self-care, were significantly associated with an authoritative relationship, whereas positive perceptions of one’s own attributes as a patient were significantly associated with a mechanistic relationship. It is recommended that further research on the student-patient relationship and related factors should focus on questions of content, methodology and education.
Abstract:
Segmentation is a strategic tool that makes a company's use of resources more efficient and thereby affects all customer-related business processes. The aim of this work was to construct a segmentation model (comprising both the segmentation process and the criteria) for the business internet market. The results can, however, be interpreted and applied more broadly to the high-technology business services market. This thesis adds to our knowledge and offers a new perspective on segmentation in the high-technology business services market. The work describes the special characteristics of high technology and of business and services marketing, and how these factors affect the segmentation model. The study examined the case company's current segmentation practices through personal expert interviews. The interviews were used to form a picture of the current approaches and of their starting points, strengths and challenges. After the interviews had been analysed, a project was set up to develop segmentation. The result of the work is a segmentation model that provides a solid foundation for developing segmentation as a continuous process. The work proposes integrating segmentation into the company's customer-related business processes, which is often missing from earlier work, as well as improving the flow of information so that segmentation can be exploited more effectively. Segmentation is a strategic tool and therefore requires the support and commitment of senior management. Applied correctly, segmentation offers a business the opportunity for significant benefits such as improved customer satisfaction and profitability.
Abstract:
Market segmentation first emerged as early as the 1950s and has been one of the basic concepts of marketing ever since. Much of the research on segmentation has, however, focused on segmenting consumer markets, while the segmentation of business and industrial markets has received less attention. The aim of this study is to create a segmentation model for industrial markets from the perspective of a provider of information technology products and services. The purpose is to determine whether the case company's current customer databases enable effective segmentation, to identify suitable segmentation criteria, and to assess whether and how the databases should be developed to enable more effective segmentation. The intention is to create a single model shared by the different business units. The objectives of the different units must therefore be taken into account in order to avoid conflicts of interest. The research methodology is a case study. The study drew on secondary sources as well as primary sources such as the case company's own databases and interviews. The starting point was the research problem: can database-driven segmentation be used for profitable customer relationship management in the SME sector? The aim is to create a segmentation model that makes use of the data in the databases without compromising the conditions for effective and profitable segmentation. The theoretical part examines segmentation in general, with an emphasis on industrial market segmentation. The purpose is to give a clear picture of the different approaches to the topic and to deepen understanding of the most important theories. Analysis of the databases revealed clear deficiencies in the customer data. Basic contact details are available, but information usable for segmentation is very limited. The flow of information from resellers and wholesalers should be improved in order to obtain end-customer data. Segmentation based on the current data relies mainly on secondary information such as industry and company size.
Even this information is not available for all the companies in the database.
Abstract:
The aim of this thesis is to examine the areas involved in developing an international business strategy and to offer Kemira Agro Oy proposals for developing its business strategy. The theoretical part of the thesis analyses the components of a business strategy. In the empirical part, these components are analysed from the standpoint of Kemira Agro Oy's business strategy for China, and proposals for developing the strategy are presented. The thesis is a normative case study. It is divided into a theoretical part and an empirical part. The empirical part draws on unstructured interviews conducted in Finland and semi-structured interviews conducted in China. The study characterises China as a difficult market area that nevertheless offers great growth opportunities. The findings propose increasing marketing activities and investigating the possibility of establishing the company's own distribution channel and broadening the product range, and they stress the importance of segmentation.
Abstract:
In this thesis we study the field of opinion mining by giving a comprehensive review of the available research on the topic. Using this knowledge, we also present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications, and the different scientific approaches that have been used to solve the challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics, which provides a pre-processing step that is crucial for the accuracy of the subsequent steps of opinion mining. The second part of the thesis deals with the semantics of opinions: we describe the different ways of collecting lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions in unstructured textual data. In the part about collecting lists of opinion words, we describe manual, semi-manual and automatic ways of doing so and review the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining, we divide the task into three levels: the document, sentence and feature levels. The techniques presented at the document and sentence levels are divided into supervised and unsupervised approaches, which are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we describe the techniques available for finding the opinion targets, the polarity of the opinions about these targets, and the opinion holders.
Also at the feature level, we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of the thesis we present a case study of a sales management system that uses free-form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of the field, we propose a theoretical multilevel opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research, we suggest that automating many of the laborious market research tasks currently done by the sales force that uses this sales management system could improve its insight into its partners and thereby increase the quality of its sales services and its overall results.
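As an illustration of an unsupervised, lexicon-based approach at the sentence level, the sketch below scores a sentence by counting opinion words and flipping the sign after a negator. It is a toy example, not the MLOM system described in the thesis: the word lists are hypothetical, tokenization is a plain whitespace split, and the negation rule is the simplest possible one.

```python
# Toy opinion lexicon: hypothetical entries for illustration only.
POSITIVE = {"good", "great", "helpful", "reliable"}
NEGATIVE = {"bad", "poor", "slow", "unhelpful"}
NEGATORS = {"not", "no", "never"}


def sentence_polarity(sentence):
    """Label a sentence as 'positive', 'negative' or 'neutral'."""
    score = 0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            # The negation stays active until the next opinion word is seen.
            negate = True
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A sentence with no lexicon words is labelled neutral, which loosely corresponds to the subjectivity step; a document-level label could be obtained by aggregating the sentence labels, for example by majority vote.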
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
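For orientation, a conventional hold-out evaluation of a multi-label classifier might look like the sketch below. The abstract does not specify the dissertation's new hold-out method, so this shows only the standard technique: a single random split and a micro-averaged F1 score over label sets. The function names, the split fraction and the example labels are assumptions.

```python
import random


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle once and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def micro_f1(true_label_sets, predicted_label_sets):
    """Micro-averaged F1: pool label decisions over all documents, then score."""
    tp = fp = fn = 0
    for true, pred in zip(true_label_sets, predicted_label_sets):
        true, pred = set(true), set(pred)
        tp += len(true & pred)   # labels correctly assigned
        fp += len(pred - true)   # labels assigned but wrong
        fn += len(true - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, a model would be trained on the first list returned by `holdout_split`, asked to predict a set of codes for each held-out report, and scored with `micro_f1` against the reference codes.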