13 results for infrared spectroscopy, chemometrics, least squares support vector machines

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance: 100.00%

Abstract:

The thesis reviews the theoretical foundations of support vector machines and investigates the effect of different parameters on the classification of spectral data.
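
As an illustration of the kind of parameter study described, the sketch below screens SVM hyperparameters for spectral classification with a cross-validated grid search. It is a minimal sketch, not the thesis code: scikit-learn is assumed, and the spectra and labels are random placeholders.

```python
# A minimal sketch (not the thesis code): screening SVM hyperparameters
# for spectral classification with scikit-learn. The data are hypothetical;
# X holds one preprocessed spectrum per row, y the class labels.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 600))   # placeholder spectra (120 samples x 600 wavenumbers)
y = rng.integers(0, 2, size=120)  # placeholder class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pipeline: per-variable scaling followed by an RBF-kernel SVM.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(
    pipe,
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-4, 1e-3, 1e-2]},
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```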

Relevance: 100.00%

Abstract:

The doctoral research examines the application of infrared spectroscopy and multivariate data analysis methods to the monitoring of crystallization processes and the analysis of the crystalline product. Crystallization research worldwide is currently studying intensively the application of various measurement techniques to the continuous measurement of crystallization phenomena, both in the liquid phase and in the forming crystals. In addition, product characterization is essential for ensuring product quality. Especially in pharmaceutical manufacturing, interest in this type of research is promoted by the US Food and Drug Administration's (FDA) guidance on process analytical technologies (PAT), which broadly defines the requirements for the measurements needed in pharmaceutical manufacturing and product characterization to guarantee safe manufacturing processes. Cooling crystallization is a separation method widely used, especially in the pharmaceutical industry, for purifying solid crude products. In this method the solid raw material to be purified is dissolved in a suitable solvent at a relatively high temperature. The solubility of the substance in the solvent decreases with decreasing temperature, so when the system is cooled, the solute concentration in the process exceeds the solubility concentration. In such a supersaturated system new crystals tend to form, or existing crystals grow. Supersaturation is one of the most important factors affecting the quality of the crystalline product. In cooling crystallization, the properties of the product can be influenced by, among other things, the choice of solvent, the cooling profile and mixing. In addition, the start-up phase of the crystallization process, i.e. the moment when the first crystals form, affects the product properties. The quality of a crystalline product is defined by the average crystal size, the size and shape distributions, and the purity. In the pharmaceutical industry it is often required that the product represents a specific polymorphic form; polymorphism refers to the ability of molecules to arrange themselves in the crystal lattice in several different ways. The properties mentioned above affect the downstream processing of the product, such as filterability, grindability and tabletability. The polymorphic form also affects many usability properties of the product, such as the dissolution rate of the drug in the body. The doctoral work studied the cooling crystallization of sulfathiazole using several solvent mixtures and cooling profiles, and examined the effect of these factors on the quality attributes of the product. Infrared spectroscopy is a method widely applied in chemical research; it measures the spectral changes in the IR region caused by the vibrations of the molecules in the sample. In this study, in-process measurements were carried out with an in-situ immersion probe placed in the reactor, using attenuated total reflection (ATR) Fourier transform infrared (FTIR) spectroscopy. Powder samples were measured off-line with FTIR spectroscopy based on diffuse reflectance (DRIFT). With multivariate methods (chemometrics), spectral data comprising hundreds or even thousands of variables can be refined into qualitative or quantitative information describing the process. The doctoral work broadly examined the application of different multivariate methods to extract as versatile process information as possible from the measured spectral data.
As a result of the doctoral work, a calibration routine is proposed for measuring the solute concentration, and thereby the level of supersaturation, during the crystallization process. The development of the calibration routine covered methods for assessing data quality, data pre-processing methods, the actual calibration modelling, and model validation. This yields real-time information on the driving force of the crystallization process, which further improves the understanding and controllability of the process. The effects of the supersaturation level on the quality of the crystalline product were followed in several crystallization experiments. The work also presents a method based on multivariate statistical process monitoring with which the moment of spontaneous primary nucleation can be predicted from the measured spectral data, and the polymorphic form generated in nucleation can possibly be inferred. Using the proposed method, the formation of crystal nuclei can be anticipated and possible disturbances at the start of the crystallization process can be detected. By predicting the forming polymorph, the nucleation of an undesired polymorph can be detected and the control of the crystallization possibly adjusted to obtain the desired polymorphic form. Multivariate methods were also applied to determining batch-to-batch variation from the measured spectral data; the information obtained from this type of analysis can be utilized in the design and optimization of crystallization processes. The doctoral work also tested the suitability of IR spectroscopy and different multivariate methods for the rapid determination of the polymorphic composition of the crystalline product. Powder samples could be classified into samples containing different polymorphs using suitable multivariate classification methods. This offers a rapid method for the rough evaluation of the polymorph composition of a powder sample, i.e. which single polymorph the sample mainly consists of. The actual quantitative analysis, i.e. determining how much of each polymorph the sample contains, for example in weight percent, requires a physical calibration set covering all polymorphs, which can be difficult because of the poor availability of the pure polymorphs.
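
A minimal sketch of the calibration step described above is given below: mapping measured spectra to solute concentration with multivariate regression. The abstract does not name the regression model, so partial least squares (PLS) is assumed here as a standard chemometric choice, with synthetic placeholder data and scikit-learn.

```python
# A minimal sketch of a chemometric calibration of the kind described:
# mapping ATR-FTIR spectra to solute concentration. The thesis abstract does
# not name the regression model; PLS regression is used here as a standard
# chemometric choice, and all data below are synthetic placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
spectra = rng.normal(size=(80, 400))                            # 80 spectra x 400 wavenumbers
conc = spectra[:, 100] * 2.0 + rng.normal(scale=0.1, size=80)   # synthetic concentrations

# Pre-processing: mean-centering (the thesis stresses pre-processing and validation).
Xc = spectra - spectra.mean(axis=0)

pls = PLSRegression(n_components=5)
# Cross-validation as a simple stand-in for the model validation step.
scores = cross_val_score(pls, Xc, conc, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```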

Relevance: 100.00%

Abstract:

The subject of this thesis is the automatic compression of Finnish sentences with machine learning, such that the compressed sentences remain grammatical and retain their essential meaning. There are multiple possible uses for the compression of natural language sentences. In this thesis the focus is the generation of television program subtitles, which are often compressed versions of the original script of the program. The main part of the thesis consists of machine learning experiments for automatic sentence compression using different approaches to the problem. The machine learning methods used in this work are linear-chain conditional random fields (CRFs) and support vector machines. We also examine which automatic text analysis methods provide useful features for the task. The data used for machine learning are supplied by Lingsoft Inc. and consist of subtitles in both compressed and uncompressed form. The models are compared to a baseline system, and comparisons are made both automatically and by human evaluation, because of the potentially subjective nature of the output. The best result is achieved using a CRF sequence classifier with a rich feature set. All text analysis methods help classification, and the most useful method is morphological analysis.
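
To make the setup concrete, the sketch below frames sentence compression as binary sequence labeling: a linear-chain CRF tags each token keep or drop. It is a minimal sketch, not the thesis's or Lingsoft's system; the sklearn-crfsuite package is assumed, and the features and data are toy stand-ins for the rich feature set (e.g., morphology) used in the thesis.

```python
# A minimal sketch (not the thesis's or Lingsoft's code) of sentence
# compression as sequence labeling: each token is tagged keep/drop with a
# linear-chain CRF. Uses the sklearn-crfsuite package; the features are toy
# stand-ins for the rich (e.g., morphological) feature set of the thesis.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "is_first": i == 0,
        "is_last": i == len(sent) - 1,
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

# Toy parallel data: original sentences with keep/drop labels.
sents = [["the", "very", "big", "dog", "barked", "loudly"]]
labels = [["keep", "drop", "keep", "keep", "keep", "drop"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```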

Relevance: 100.00%

Abstract:

Continuous technological evolution is benefiting our lives to a great extent. The evolution of the Internet of things and the deployment of wireless sensor networks are making it possible to have more connectivity between people and the devices used extensively in our daily lives. Almost every area of daily life, including the health sector, transportation and agriculture, is benefiting from these technologies. There is great potential for research and refinement in the health sector, as the current system very often depends on manual evaluations conducted by clinicians. There is no automatic system for patient health monitoring and assessment, which results in incomplete and less reliable health information. The Internet of things has great potential to benefit health care applications through automated and remote assessment, monitoring and identification of diseases. Acute pain is the main reason people visit hospitals. An automatic pain detection system based on the Internet of things with wireless devices can make assessment and relief significantly more efficient. The contribution of this research work is a proposed pain assessment method based on physiological parameters. The physiological parameters chosen for this study are heart rate, electrocardiography, breathing rate and galvanic skin response. As a first step, the relation between these physiological parameters and the acute pain experienced by the test persons is evaluated. The electrocardiography data collected from the test persons are analyzed to extract interbeat intervals. This evaluation clearly demonstrates specific patterns and trends in these parameters as a consequence of pain. This parametric behavior is then used to assess and identify pain intensity by applying machine learning algorithms. Support vector machines are used for classifying these parameters under different pain intensities. The classification results, with good accuracy rates between two and three levels of pain intensity, show a clear indication of pain and the feasibility of this pain assessment method. An improved approach could build on this work by using both the physiological parameters and electromyography data of facial muscles for classification.
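
As an illustration of the classification step, the sketch below trains an SVM to separate pain-intensity levels from physiological features. It is a minimal sketch under stated assumptions: scikit-learn, synthetic placeholder features, and three hypothetical intensity levels.

```python
# A minimal sketch of the classification step described above: an SVM
# separating pain-intensity levels from physiological features. Feature
# values are synthetic placeholders; in the study the features come from
# heart rate, interbeat intervals, breathing rate and galvanic skin response.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Columns: mean HR, mean interbeat interval, breathing rate, GSR level.
X = rng.normal(size=(90, 4))
y = rng.integers(0, 3, size=90)   # three placeholder pain-intensity levels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
print(cross_val_score(clf, X, y, cv=5).mean())
```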

Relevance: 100.00%

Abstract:

Phosphorylation is among the most crucial and well-studied post-translational modifications. It is involved in multiple cellular processes, which makes phosphorylation prediction vital for understanding protein functions. However, wet-lab techniques are labour and time intensive, so computational tools are required for efficiency. This project aims to provide a novel way to predict phosphorylation sites from protein sequences by adding flexibility and the Sezerman Grouping amino acid similarity measure to previous methods, as new protein sequences are discovered at a greater rate than protein structures are determined. The predictor, NOPAY, relies on support vector machines (SVMs) for classification. The features include amino acid encoding, amino acid grouping, predicted secondary structure, predicted protein disorder, predicted protein flexibility, solvent accessibility, hydrophobicity and volume. As a result, we have managed to improve phosphorylation prediction accuracy by 3% for Homo sapiens and by 6.1% for Mus musculus. Sensitivity at 99% specificity was also increased on independent test sets, by 6% for Homo sapiens and by 5% for Mus musculus. Given enough data, future versions of the software may also be able to make predictions for other organisms.
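
The sketch below illustrates window-based phosphorylation site classification of the general kind NOPAY performs: a fixed-length window around a candidate residue is one-hot encoded and fed to an SVM. The window size, sequences and labels here are hypothetical, and the real predictor adds the structural, disorder, flexibility and grouping features listed above.

```python
# A minimal sketch of window-based phosphosite classification. A fixed-length
# window around each candidate serine/threonine/tyrosine is one-hot encoded
# and fed to an SVM. The sequences, labels and window size are hypothetical
# placeholders; NOPAY itself uses a much richer feature set.
import numpy as np
from sklearn.svm import SVC

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def encode_window(window):
    """One-hot encode a peptide window (unknown residues left as zeros)."""
    vec = np.zeros(len(window) * 20)
    for pos, aa in enumerate(window):
        if aa in IDX:
            vec[pos * 20 + IDX[aa]] = 1.0
    return vec

# Toy 9-residue windows centred on the candidate site: 1 = phosphorylated.
windows = ["AAARSPSAA", "LLKTSDEEV", "GGGYAAAGG", "KKRSSSPKK"]
y = np.array([1, 0, 0, 1])
X = np.vstack([encode_window(w) for w in windows])

clf = SVC(kernel="rbf", C=1.0, class_weight="balanced").fit(X, y)
print(clf.predict(X))
```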

Relevance: 100.00%

Abstract:

Direct torque control (DTC) has become an accepted vector control method beside current vector control. The DTC was first applied to asynchronous machines, and has later been applied also to synchronous machines. This thesis analyses the application of the DTC to permanent magnet synchronous machines (PMSM). In order to take full advantage of the DTC, the PMSM has to be properly dimensioned. Therefore the effect of the motor parameters is analysed taking the control principle into account. Based on the analysis, a parameter selection procedure is presented. The analysis and the selection procedure utilize nonlinear optimization methods. The key element of a direct torque controlled drive is the estimation of the stator flux linkage. Different estimation methods, a combination of current and voltage models and improved integration methods, are analysed. The effect of an incorrectly measured rotor angle in the current model is analysed, and an error detection and compensation method is presented. The dynamic performance of a previously presented sensorless flux estimation method is improved by enhancing the dynamic performance of the low-pass filter used and by adapting the correction of the flux linkage to torque changes. A method for the estimation of the initial angle of the rotor is presented. The method is based on measuring the inductance of the machine in several directions and fitting the measurements to a model. The model is nonlinear with respect to the rotor angle, and therefore a nonlinear least squares optimization method is needed in the procedure. A commonly used current vector control scheme is minimum current control. In the DTC, the stator flux linkage reference is usually kept constant; achieving the minimum current requires control of the reference. An on-line method that minimizes the current by controlling the stator flux linkage reference is presented, and the control of the reference above the base speed is also considered. A new flux linkage estimator is introduced for the estimation of the parameters of the machine model. In order to utilize the flux linkage estimates in off-line parameter estimation, the integration methods are improved. An adaptive correction is used in the same way as in the estimation of the controller stator flux linkage. The presented parameter estimation methods are then used in a self-commissioning scheme. The proposed methods are tested with a laboratory drive, which consists of commercial inverter hardware with modified software and several prototype PMSMs.
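
The initial-angle procedure can be illustrated with a short sketch: inductance is "measured" in several directions and a model that is nonlinear in the rotor angle is fitted with nonlinear least squares. The two-term saliency model and all numbers are illustrative assumptions, not the thesis's model.

```python
# A minimal sketch of the initial-rotor-angle idea described above: measure
# inductance in several directions and fit a model that is nonlinear in the
# rotor angle with nonlinear least squares. The saliency model and all
# numbers here are illustrative assumptions, not the thesis's model.
import numpy as np
from scipy.optimize import least_squares

def model(params, theta):
    L0, L2, theta_r = params
    # Saliency makes the inductance vary with twice the electrical angle.
    return L0 + L2 * np.cos(2.0 * (theta - theta_r))

# Synthetic "measurements" in 8 directions, true rotor angle 0.7 rad.
theta_meas = np.linspace(0, np.pi, 8, endpoint=False)
L_meas = model((5e-3, 1e-3, 0.7), theta_meas) + np.random.default_rng(3).normal(
    scale=2e-5, size=8
)

res = least_squares(
    lambda p: model(p, theta_meas) - L_meas, x0=(4e-3, 5e-4, 0.5)
)
print("estimated rotor angle [rad]:", res.x[2])
```

Because the assumed model depends on twice the angle, the fit leaves a 180-degree ambiguity that a real drive has to resolve separately, for example with a magnetic polarity test.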

Relevance: 100.00%

Abstract:

The aim of this thesis is to examine the efficiency of the Chinese stock markets and the validity of the random walk hypothesis. A further aim is to determine whether a day-of-the-week anomaly exists in the Chinese stock markets. The data consist of the daily logarithmic returns of the Shanghai Stock Exchange A-share, B-share and composite indices and of the Shenzhen Stock Exchange composite index from 21 February 1992 to 30 December 2005, as well as the daily logarithmic returns of the Shenzhen Stock Exchange A-share and B-share indices from 5 October 1992 to 30 December 2005. Four statistical methods are used: the autocorrelation test, the non-parametric runs test, the variance ratio test and the augmented Dickey-Fuller unit root test. The presence of the day-of-the-week anomaly is examined using ordinary least squares (OLS) regression. The tests are carried out both on the full sample and on three separate sub-periods. The empirical results of this thesis support earlier studies on the inefficiency of the Chinese stock markets. Apart from the results of the unit root tests, the random walk hypothesis was rejected for both Chinese stock markets on the basis of the autocorrelation, runs and variance ratio tests. The results show that on both stock exchanges the behaviour of the B-share indices has deviated considerably more from the random walk hypothesis than that of the A-share indices. Except for the B-share markets, the efficiency of both Chinese stock markets also appeared to improve after the market boom of 2001. The results also show that the day-of-the-week anomaly is present on the Shanghai Stock Exchange but not on the Shenzhen Stock Exchange over the full sample period.
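
One of the tests named above can be illustrated directly: the variance ratio statistic on log returns, which should be close to one under the random walk hypothesis. The sketch computes the plain ratio only, without the heteroskedasticity-robust standard errors of a full Lo-MacKinlay test, and uses a synthetic return series.

```python
# A minimal sketch of the variance ratio statistic on log returns. Under the
# random walk hypothesis VR(q) should be close to 1. This is the plain ratio
# only (no robust standard errors, which a full Lo-MacKinlay test would add);
# the return series is a synthetic placeholder.
import numpy as np

def variance_ratio(returns, q):
    returns = np.asarray(returns)
    # Variance of q-period overlapping sums relative to q times the
    # one-period variance.
    q_sums = np.convolve(returns, np.ones(q), mode="valid")
    return q_sums.var(ddof=1) / (q * returns.var(ddof=1))

rng = np.random.default_rng(4)
r = rng.normal(scale=0.01, size=2500)   # placeholder daily log returns
for q in (2, 5, 10):
    print(q, round(variance_ratio(r, q), 3))
```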

Relevance: 100.00%

Abstract:

The purpose of this work is to bring together the measurement problems of the pulping process and the possible measurement techniques for solving them. The main emphasis is on on-line measurement techniques. The work consists of three parts. The first part is a literature study presenting the basic measurements and control needs of a modern pulping process. It covers the whole fibre line from wood handling to bleaching, as well as the chemical recovery cycle: the evaporation plant, the recovery boiler, recausticizing and the lime kiln. In the second part, the measurement problems and possible measurement techniques are compiled into a "road map". The information was gathered by visiting three Finnish pulp mills and by interviewing experts in equipment and measurement technology. Based on the interviews there appears to be a need for a better understanding of process chemistry, which is why concentration measurements were chosen as the topic for further study. The last part presents possible measurement techniques for solving the concentration measurement problems. The chosen techniques are near-infrared spectroscopy (NIR), Fourier transform infrared spectroscopy (FTIR), on-line capillary electrophoresis (CE) and laser-induced plasma emission spectroscopy (LIPS). All of the techniques can be used on-line as process development tools. Development costs were estimated for an on-line device connected to process control. They range from zero man-years for the FTIR technique to five man-years for the CE device, depending on the maturity of the technique and its readiness for solving a particular problem. The last part of the work also assesses the techno-economic feasibility of solving one measurement problem: the measurement of washing loss. Lignin content would describe the true washing loss better than the current measurements, which are based on either sodium or COD washing loss. Lignin content can be measured with UV absorption, and the CE device could also be used for washing loss measurement, at least in the process development phase. The economic analysis is based on many simplifications and is not directly suitable to support investment decisions. A better measurement and control system could stabilize the operation of the washing plant. An investment in a stabilizing system is profitable if the actual operating point is far enough from the cost minimum, or if the operation of the washer fluctuates, i.e. the standard deviation of the washing loss is large. A measurement and control system costing €50,000 has a payback time of less than 0.5 years in unstable operation if the COD washing loss varies between 5.2 and 11.6 kg/odt with a setpoint of 8.4 kg/odt; the dilution factor then varies between 1.7 and 3.6 m³/odt with a setpoint of 2.5 m³/odt.

Relevance: 100.00%

Abstract:

Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted tools that have been difficult or laborious to program by human experts. The tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question, but their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers the development of kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval, and to more general ranking problems, than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for a regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used in algorithms efficiently.
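
The flavour of such computational shortcuts can be shown with the classic leave-one-out identity for regularized least-squares: all leave-one-out predictions follow from a single training run via the hat matrix. This is the well-known identity, shown as an illustration; the thesis's own cross-validation algorithms go beyond it.

```python
# A minimal sketch of a cross-validation shortcut for regularized
# least-squares: with the hat matrix H = K (K + lambda*I)^{-1}, the
# leave-one-out prediction for example i is
# f_i_loo = (f_i - H_ii * y_i) / (1 - H_ii).
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 1.0
K = X @ X.T                                   # linear kernel
H = K @ np.linalg.inv(K + lam * np.eye(50))   # hat matrix
f = H @ y                                     # in-sample predictions
loo = (f - np.diag(H) * y) / (1.0 - np.diag(H))

# Check against explicit leave-one-out retraining for the first example.
idx = np.arange(1, 50)
K_ = K[np.ix_(idx, idx)]
a_ = np.linalg.solve(K_ + lam * np.eye(49), y[idx])
print(loo[0], K[0, idx] @ a_)                 # the two should match
```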

Relevance: 100.00%

Abstract:

The focus of this dissertation is on the motivational influences on transfer in higher education and professional training contexts. To estimate these motivational influences, the dissertation includes seven individual studies that are structured in two parts. Part I, Dimensions, aims at identifying the dimensionality of motivation to transfer and its structural relations with training-related antecedents and outcomes. Part II, Boundary Conditions, aims at testing the predictive validity of motivation theories used in contemporary training research under different study conditions. The data in this dissertation were gathered from multi-item questionnaires, which were analyzed differently in Part I and Part II. The studies in Part I employed exploratory and confirmatory factor analysis, structural equation modeling, partial least squares (PLS) path modeling, and mediation analysis. The studies in Part II used artifact distribution meta-analysis, (nested) subgroup analysis, and weighted least squares (WLS) multiple regression. The results demonstrate that motivation to transfer can be conceptualized as a three-dimensional construct, comprising autonomous motivation to transfer, controlled motivation to transfer, and intention to transfer, given a theoretical framework informed by expectancy theory, self-determination theory, and the theory of planned behavior. The results also demonstrate that a range of boundary conditions moderates motivational influences on transfer. To test the predictive validity of expectancy theory, social cognitive theory, and the theory of goal orientations under different study settings, a total of 17 boundary conditions were meta-analyzed, including age; assessment criterion; assessment source; attendance policy; collaboration among trainees; computer support; instruction; instrument used to measure motivation; level of education; publication type; social training context; SS/SMC bias; study setting; survey modality; type of knowledge being trained; use of a control group; and work context. Together, the findings accumulated in this thesis support the basic premise that motivation is centrally important for transfer, but that motivational influences need to be understood from a more differentiated perspective than is common in the literature, in order to account for several dimensions and boundary conditions. The results of the seven individual studies are discussed in terms of their implications for theory development and their significance for training evaluation and the design of training environments. Limitations and directions for future research are also outlined.

Relevance: 100.00%

Abstract:

Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
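
The pairwise least-squares idea behind RankRLS admits a compact closed form, sketched below: with the matrix L = nI - 11^T, the pairwise loss reduces to a quadratic form and the dual solution is a = (LK + lambda*I)^{-1} L y. This follows the published RankRLS derivation in spirit, with synthetic data and simplified details.

```python
# A minimal sketch of the pairwise regularized least-squares idea behind
# RankRLS: minimize sum over pairs ((y_i - y_j) - (f_i - f_j))^2 plus a
# regularizer. With L = n*I - 11^T this has the closed-form dual solution
# a = (L K + lambda*I)^{-1} L y. Data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=40)

n = len(y)
K = X @ X.T                                  # linear kernel
L = n * np.eye(n) - np.ones((n, n))          # pairwise-loss (Laplacian) matrix
lam = 1.0
a = np.linalg.solve(L @ K + lam * np.eye(n), L @ y)

scores = K @ a
# Fraction of pairs ordered consistently with the targets (pairwise accuracy).
diff_y = np.sign(y[:, None] - y[None, :])
diff_f = np.sign(scores[:, None] - scores[None, :])
print("pairwise accuracy:", (diff_y == diff_f)[diff_y != 0].mean())
```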

Relevance: 100.00%

Abstract:

In the framework of the biorefinery concept, researchers aspire to optimize the utilization of plant materials such as agricultural wastes and wood. For most of the known processes, the first steps in the valorisation of biomass are the extraction and purification of the individual components. The raw products obtained by means of a controlled separation can subsequently be modified into biofuels or biogas for energy production, but also into value-added products such as additives and important building blocks for the chemical and material industries. Considerable efforts are undertaken to substitute oil-based starting materials, or at least to minimize their use, in the production of everyday goods. Wood is one of the raw materials that has gained great attention in recent decades, and its composition has been studied in detail. Nowadays the extraction of water-soluble hemicelluloses from wood is well known: for example, xylan can be obtained from hardwoods and O-acetyl galactoglucomannans (GGMs) from softwoods. The aim of this work was to develop water-soluble amphiphilic materials from GGM and to assess their potential use as additives. Furthermore, GGM was also applied as a crosslinker in the synthesis of functional hydrogels for the removal of toxic metal and metalloid ions from aqueous solutions. The products were obtained by several chemical approaches and analysed by nuclear magnetic resonance spectroscopy (NMR), Fourier transform infrared spectroscopy (FTIR), size exclusion chromatography (SEC), thermogravimetric analysis (TGA) and scanning electron microscopy (SEM), among other methods. Bio-based surfactants were produced using GGM and different fatty acids as starting materials. On the one hand, GGM-grafted fatty acids were prepared by esterification; on the other hand, well-defined GGM-block-fatty acid derivatives were obtained by linking amino-functional fatty acids to the reducing end of GGM. The reaction conditions for the syntheses were optimized, and the resulting amphiphilic GGM derivatives were evaluated for their ability to reduce the surface tension of water as surfactants. Furthermore, the block-structured derivatives were tested with respect to their applicability as additives for the surface modification of cellulosic materials. Besides the GGM surfactants with a bio-based hydrophilic part and a bio-based hydrophobic part, GGM block-structured derivatives with a synthetic hydrophobic tail, consisting of a polydimethylsiloxane chain, were also prepared and assessed for the hydrophobization of the surface of nanofibrillated cellulose films. In order to generate GGM block-structured derivatives containing a synthetic tail with specific physical and chemical properties and a tailored chain length, a controlled polymerization method was used. First, an initiator group was introduced at the reducing end of the GGM, and then single electron transfer living radical polymerization (SET-LRP) was performed with three different monomers in individual reactions. To accomplish the synthesis and the analysis of the products, challenges related to the solubility of the reactants had to be overcome. Overall, a synthesis route for the production of GGM block copolymers bearing different synthetic polymer chains was developed, and several derivatives were obtained.
Moreover, GGMs with different molar masses were, after modification, used as crosslinkers in the synthesis of functional hydrogels. A cationic monomer was used in the free radical polymerization, and the resulting hydrogels were successfully tested for the removal of chromium and arsenic ions from aqueous solutions. The hydrogel synthesis was tailored, and materials with specific physical properties, such as swelling rate, were obtained after purification. The results generated in this work underline the potential of bio-based products and the need for continued research so that greener chemicals can be used in the manufacturing of biorenewable and biodegradable everyday products.

Relevance: 100.00%

Abstract:

One of the most debated topics in the theory of finance has been the theory of capital structure. The seminal contributions of Modigliani and Miller (1958, 1963) gave rise to a multitude of studies and debates. Since the initial spark, the financial literature has offered two competing theories of the financing decision: the trade-off theory and the pecking order theory. The trade-off theory suggests that firms have an optimal capital structure balancing the benefits and costs of debt. The pecking order theory approaches firm capital structure from an information asymmetry perspective and assumes a hierarchy of financing, with firms using internal funds first, followed by debt and, as a last resort, equity. This thesis analyses the trade-off and pecking order theories and their predictions on panel data consisting of 78 Finnish firms listed on the OMX Helsinki stock exchange. Estimations are performed for the period 2003–2012. The data are collected from the Datastream system and consist of financial statement data. A number of capital structure determinants are identified: firm size, profitability, firm growth opportunities, risk, asset tangibility and taxes, speed of adjustment and financial deficit. A regression analysis is used to examine the effects of the firm characteristics on capital structure, with the regression models formed on the basis of the relevant theories. The general capital structure model is estimated with a fixed effects estimator. Dynamic models play an important role in several areas of corporate finance, but with the combination of fixed effects and lagged dependent variables the model estimation is more complicated. A dynamic partial adjustment model is therefore estimated using the Arellano and Bond (1991) first-differencing generalized method of moments, as well as the ordinary least squares and fixed effects estimators. The results for Finnish listed firms show support for the predicted effects of profitability, firm size and non-debt tax shields. No conclusive support for the pecking order theory is found; however, its effect cannot be fully ignored, and it is concluded that, rather than being substitutes, the trade-off and pecking order theories appear to complement each other. For the partial adjustment model, the results show that Finnish listed firms adjust towards their target capital structure at a speed of 29% a year, measured with the book debt ratio.
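
As a point of reference, the partial adjustment mechanism behind the reported 29% speed can be written in its common textbook form; the thesis's exact specification is not given in the abstract, so this is an illustration:

```latex
% Partial adjustment of the observed debt ratio D_{it} towards a target
% D^{*}_{it} determined by firm characteristics x_{it}; gamma is the speed
% of adjustment (about 0.29 per year for the book debt ratio here).
\begin{align}
  D_{it} - D_{i,t-1} &= \gamma \,\bigl(D^{*}_{it} - D_{i,t-1}\bigr) + \varepsilon_{it}, \\
  D^{*}_{it} &= \boldsymbol{\beta}^{\top} \mathbf{x}_{it}.
\end{align}
```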