47 resultados para rank estimation

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Selostus: Ayrshire-ensikoiden koelypsykohtaisen maidontuotannon perinnölliset tunnusluvut laktaation eri vaiheissa

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Selostus: Maassa olevan nitraattitypen arviointi simulointimallin avulla

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Selostus: Ravikilpailumenestysmittojen periytymisasteet ja toistumiskertoimet kilpailukohtaisten tulosten perusteella

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Yksi keskeisimmistä tehtävistä matemaattisten mallien tilastollisessa analyysissä on mallien tuntemattomien parametrien estimointi. Tässä diplomityössä ollaan kiinnostuneita tuntemattomien parametrien jakaumista ja niiden muodostamiseen sopivista numeerisista menetelmistä, etenkin tapauksissa, joissa malli on epälineaarinen parametrien suhteen. Erilaisten numeeristen menetelmien osalta pääpaino on Markovin ketju Monte Carlo -menetelmissä (MCMC). Nämä laskentaintensiiviset menetelmät ovat viime aikoina kasvattaneet suosiotaan lähinnä kasvaneen laskentatehon vuoksi. Sekä Markovin ketjujen että Monte Carlo -simuloinnin teoriaa on esitelty työssä siinä määrin, että menetelmien toimivuus saadaan perusteltua. Viime aikoina kehitetyistä menetelmistä tarkastellaan etenkin adaptiivisia MCMC menetelmiä. Työn lähestymistapa on käytännönläheinen ja erilaisia MCMC -menetelmien toteutukseen liittyviä asioita korostetaan. Työn empiirisessä osuudessa tarkastellaan viiden esimerkkimallin tuntemattomien parametrien jakaumaa käyttäen hyväksi teoriaosassa esitettyjä menetelmiä. Mallit kuvaavat kemiallisia reaktioita ja kuvataan tavallisina differentiaaliyhtälöryhminä. Mallit on kerätty kemisteiltä Lappeenrannan teknillisestä yliopistosta ja Åbo Akademista, Turusta.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We provide an incremental quantile estimator for Non-stationary Streaming Data. We propose a method for simultaneous estimation of multiple quantiles corresponding to the given probability levels from streaming data. Due to the limitations of the memory, it is not feasible to compute the quantiles by storing the data. So estimating the quantiles as the data pass by is the only possibility. This can be effective in network measurement. To provide the minimum of the mean-squared error of the estimation, we use parabolic approximation and for comparison we simulate the results for different number of runs and using both linear and parabolic approximations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tutkimus keskittyy kansainväliseen hajauttamiseen suomalaisen sijoittajan näkökulmasta. Tutkimuksen toinen tavoite on selvittää tehostavatko uudet kovarianssimatriisiestimaattorit minimivarianssiportfolion optimointiprosessia. Tavallisen otoskovarianssimatriisin lisäksi optimoinnissa käytetään kahta kutistusestimaattoria ja joustavaa monimuuttuja-GARCH(1,1)-mallia. Tutkimusaineisto koostuu Dow Jonesin toimialaindekseistä ja OMX-H:n portfolioindeksistä. Kansainvälinen hajautusstrategia on toteutettu käyttäen toimialalähestymistapaa ja portfoliota optimoidaan käyttäen kahtatoista komponenttia. Tutkimusaieisto kattaa vuodet 1996-2005 eli 120 kuukausittaista havaintoa. Muodostettujen portfolioiden suorituskykyä mitataan Sharpen indeksillä. Tutkimustulosten mukaan kansainvälisesti hajautettujen investointien ja kotimaisen portfolion riskikorjattujen tuottojen välillä ei ole tilastollisesti merkitsevää eroa. Myöskään uusien kovarianssimatriisiestimaattoreiden käytöstä ei synnytilastollisesti merkitsevää lisäarvoa verrattuna otoskovarianssimatrisiin perustuvaan portfolion optimointiin.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gas-liquid mass transfer is an important issue in the design and operation of many chemical unit operations. Despite its importance, the evaluation of gas-liquid mass transfer is not straightforward due to the complex nature of the phenomena involved. In this thesis gas-liquid mass transfer was evaluated in three different gas-liquid reactors in a traditional way by measuring the volumetric mass transfer coefficient (kLa). The studied reactors were a bubble column with a T-junction two-phase nozzle for gas dispersion, an industrial scale bubble column reactor for the oxidation of tetrahydroanthrahydroquinone and a concurrent downflow structured bed.The main drawback of this approach is that the obtained correlations give only the average volumetric mass transfer coefficient, which is dependent on average conditions. Moreover, the obtained correlations are valid only for the studied geometry and for the chemical system used in the measurements. In principle, a more fundamental approach is to estimate the interfacial area available for mass transfer from bubble size distributions obtained by solution of population balance equations. This approach has been used in this thesis by developing a population balance model for a bubble column together with phenomenological models for bubble breakage and coalescence. The parameters of the bubble breakage rate and coalescence rate models were estimated by comparing the measured and calculated bubble sizes. The coalescence models always have at least one experimental parameter. This is because the bubble coalescence depends on liquid composition in a way which is difficult to evaluate using known physical properties. The coalescence properties of some model solutions were evaluated by measuring the time that a bubble rests at the free liquid-gas interface before coalescing (the so-calledpersistence time or rest time). The measured persistence times range from 10 msup to 15 s depending on the solution. The coalescence was never found to be instantaneous. The bubble oscillates up and down at the interface at least a coupleof times before coalescence takes place. The measured persistence times were compared to coalescence times obtained by parameter fitting using measured bubble size distributions in a bubble column and a bubble column population balance model. For short persistence times, the persistence and coalescence times are in good agreement. For longer persistence times, however, the persistence times are at least an order of magnitude longer than the corresponding coalescence times from parameter fitting. This discrepancy may be attributed to the uncertainties concerning the estimation of energy dissipation rates, collision rates and mechanisms and contact times of the bubbles.