26 results for 080704 Information Retrieval and Web Search

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance: 100.00%

Abstract:

Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance in a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
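
The pairwise view of ranking described in this abstract can be illustrated with a small sketch. The abstract does not give the thesis's exact formulas, so the function names and the particular form of the pairwise least-squares loss below are assumptions chosen for illustration; the regularization term that RankRLS adds is omitted.

```python
from itertools import combinations


def pairwise_ranking_error(scores, labels):
    """Fraction of pairs with different labels that the scores order wrongly.

    Tied scores count as half an error. Illustrative helper, not taken
    from the thesis.
    """
    errors = 0.0
    pairs = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # pairs with equal labels carry no preference
        pairs += 1
        label_diff = labels[i] - labels[j]
        score_diff = scores[i] - scores[j]
        if score_diff == 0:
            errors += 0.5
        elif (label_diff > 0) != (score_diff > 0):
            errors += 1.0
    return errors / pairs


def auc(scores, labels):
    """For binary labels, AUC equals one minus the pairwise ranking error."""
    return 1.0 - pairwise_ranking_error(scores, labels)


def pairwise_least_squares_loss(scores, labels):
    """One common form of a pairwise least-squares loss (assumed here):
    the squared gap between label differences and score differences,
    summed over all pairs."""
    return sum(
        ((labels[i] - labels[j]) - (scores[i] - scores[j])) ** 2
        for i, j in combinations(range(len(scores)), 2)
    )
```

In the bipartite case only positive-negative pairs contribute to the error, which is why maximizing AUC and minimizing the pairwise ranking error coincide.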

Relevance: 100.00%

Abstract:

The Internet is the basic infrastructure of electronic mail and has long been an important source of information for academic users. It has also become a significant information source for commercial companies as they seek to keep in touch with their customers and monitor their competitors. The growth of the WWW, both in volume and in diversity, has created increasing demand for advanced information management services. Such services include clustering and classification, information discovery and filtering, and the personalization and tracking of source usage. Although the amount of scientific and commercially valuable information available on the WWW has grown considerably in recent years, searching for and finding it still relies on conventional Internet search engines. Satisfying the growing and changing needs of information retrieval has become a complex task for Internet search engines. Classification and indexing play a significant part in searching for and finding reliable and precise information. This Master's thesis presents the most common methods used in classification and indexing, together with applications and projects that use them and in which problems related to information retrieval have been addressed.

Relevance: 100.00%

Abstract:

A fast-changing environment puts pressure on firms to share large amounts of information with their customers and suppliers. The terms information integration and information sharing are essential for facilitating a smooth flow of information throughout the supply chain, and the terms are used interchangeably in the research literature. By integrating and sharing information, firms want to improve their logistics performance. Firms share information with their suppliers and customers by using traditional communication methods (telephone, fax, email, written and face-to-face contacts) and by using advanced or modern communication methods such as electronic data interchange (EDI), enterprise resource planning (ERP), web-based procurement systems, electronic trading systems and web portals. Adopting new ways of using IT is one important resource for staying competitive in a rapidly changing market (Saeed et al. 2005, 387), and an information system that provides people with the information they need to perform their work will support company performance (Boddy et al. 2005, 26). The purpose of this research has been to test and understand the relationship between information integration with key suppliers and/or customers and a firm's logistics performance, especially when information technology (IT) and information systems (IS) are used for integrating information. Quantitative and qualitative research methods have been used to perform the research. Special attention has been paid to the scope, level and direction of information integration (Van Donk & van der Vaart 2005a). In addition, the four elements of integration (Jahre & Fabbe-Costes 2008) are closely tied to the frame of reference. The elements are integration of flows, integration of processes and activities, integration of information technologies and systems, and integration of actors.
The study found that information integration has a low positive relationship to operational performance and a medium positive relationship to strategic performance. The potential performance improvements found in this study range from efficiency, delivery and quality improvements (operational) to profit, profitability or customer satisfaction improvements (strategic). The results indicate that although information integration has an impact on a firm's logistics performance, not all performance improvements have been achieved. This study also found that the use of IT and IS has a moderate positive relationship to information integration. Almost all case companies agreed that the use of IT and IS could facilitate information integration and improve their logistics performance. The case companies felt that implementing a web portal or a data bank would benefit them by enhancing their performance and increasing information integration.

Relevance: 100.00%

Abstract:

Summary: Anther culturability and associated gene markers in the progeny of crosses between cultivated oat and wild red oat

Relevance: 100.00%

Abstract:

In this study we used market settlement prices of European call options on stock index futures to extract the implied probability distribution function (PDF). The method used produces a PDF of the returns of an underlying asset at the expiration date from the implied volatility smile. With this method, the assumption of a lognormal distribution (Black-Scholes model) is tested. The market view of the asset price dynamics can then be used for various purposes (hedging, speculation). We used the so-called smoothing approach for implied PDF extraction presented by Shimko (1993). In our analysis we obtained implied volatility smiles from index futures markets (S&P 500 and DAX indices) and standardized them. The method introduced by Breeden and Litzenberger (1978) was then used for PDF extraction. The results show significant deviations from the assumption of lognormal returns for S&P 500 options, while DAX options mostly fit the lognormal distribution. A deviant subjective view of the PDF can be used to form a strategy, as discussed in the last section.
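
The Breeden and Litzenberger (1978) result mentioned in this abstract states that the risk-neutral density of the underlying at expiry is the discount-adjusted second derivative of the call price with respect to the strike. The sketch below approximates that derivative with a central finite difference. It is a minimal illustration only: the function names are hypothetical, the call prices come from a flat-volatility Black-Scholes formula on a spot price rather than from smoothed market smiles on futures, and the study's actual smoothing and standardization steps are not reproduced.

```python
import math


def bs_call(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call (flat volatility, assumed here
    only to generate example prices)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_pdf(strikes, call_prices, maturity, rate):
    """Risk-neutral density at the interior strikes of an equally spaced grid.

    Uses f(K) = exp(r*T) * d^2C/dK^2 with a central second difference.
    """
    step = strikes[1] - strikes[0]
    growth = math.exp(rate * maturity)
    return [
        growth * (call_prices[k + 1] - 2.0 * call_prices[k] + call_prices[k - 1]) / step ** 2
        for k in range(1, len(strikes) - 1)
    ]


# Example grid: strikes 20, 21, ..., 300 around a spot of 100 (made-up inputs).
strikes = [20.0 + i for i in range(281)]
prices = [bs_call(100.0, k, 0.5, 0.02, 0.2) for k in strikes]
density = implied_pdf(strikes, prices, 0.5, 0.02)
total_mass = sum(density) * (strikes[1] - strikes[0])
```

With flat-volatility input prices the recovered density is lognormal by construction; deviations from lognormality of the kind the study reports would only appear when the prices reflect a non-flat volatility smile.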

Relevance: 100.00%

Abstract:

Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which tools of this kind are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for learning algorithms of the regularized least-squares type. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
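
The idea behind fast cross-validation for regularized least squares can be shown with the textbook leave-one-out identity. The sketch below is not the thesis's algorithm, which the abstract does not describe; it assumes the simplest possible setting, a single feature with no intercept, and all function names are hypothetical. It compares the closed-form shortcut with naive retraining.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for one feature and no intercept:
    w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_fast(xs, ys, lam):
    """Leave-one-out residuals from a single fit.

    Uses the identity r_i / (1 - h_ii), where r_i is the training residual
    and h_ii = x_i^2 / (sum(x^2) + lam) is the i-th diagonal entry of the
    hat matrix.
    """
    denom = sum(x * x for x in xs) + lam
    w = ridge_weight(xs, ys, lam)
    return [(y - w * x) / (1.0 - x * x / denom) for x, y in zip(xs, ys)]


def loo_residuals_naive(xs, ys, lam):
    """Leave-one-out residuals obtained by retraining once per held-out point."""
    residuals = []
    for i in range(len(xs)):
        w_i = ridge_weight(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        residuals.append(ys[i] - w_i * xs[i])
    return residuals
```

The shortcut needs one fit instead of one fit per example, which is the kind of saving that makes cross-validation cheap for this family of methods. The regularization parameter `lam` must be positive for the denominators to stay nonzero.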

Relevance: 100.00%

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in the case of regression, or a class label, as in the case of classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations, and preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in the bioinformatics domain.
Training of kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.

Relevance: 100.00%

Abstract:

This Master's thesis explores how a global industrial corporation's after-sales service department should arrange its installed base management practices in order to maintain and utilize installed base information effectively. The case company has product-related records, such as the product's lifecycle information, service history information and information about the product's performance. Information is often collected and organized case by case; the systematic and effective use of installed base information is therefore difficult, and an overview of the installed base is missing. The goal of the thesis was to find out how the case company can improve its installed base maintenance and management practices and improve the availability and reliability of installed base information. Installed base information management practices were first examined through the literature. The empirical research was conducted through interviews and a questionnaire survey targeted at the case company's service department. The purpose of the research was to identify the challenges related to the information management practices of the case company's service department. The study also identified the installed base information needs and the potential for improvement in the availability of information. Based on the empirical findings, recommendations for improving installed base management practices and information availability were created. On the basis of the recommendations, the following actions are proposed to the case company: developing the service report, improving the change management process, ensuring the quality of product documentation in the early stages of the product life cycle, and deciding to improve installed base management practices.

Relevance: 100.00%

Abstract:

Supply chains are becoming increasingly dependent on information exchange in today's world, and any disruption can cause severe repercussions to the flow of materials in the chain. The speed, accuracy and amount of information are key factors. The aim in this thesis is to address a gap in the research by focusing on information exchange and the risks related to it in a multimodal wood supply chain operating between the Baltic States and Finland. The study involved interviewing people engaged in logistics management in the supply chain in question. The main risk the interviewees identified arose from the sea logistics system, which held a lot of different kinds of information. The threat of breakdown in the Internet connection was also found to hinder the operations significantly. A vulnerability analysis was carried out in order to identify the main actors and channels of information flow in the supply chain. The analysis revealed that the most important and therefore most vulnerable information-exchange channels were those linking the terminal superintendent, the operative managers and the mill managers. The study gives a holistic picture of the investigated supply chain. Information-exchange-related risks varied greatly. One of the most frequently mentioned was the risk of information inaccuracy, which was usually due to the fact that those in charge of the various functions did not fully understand the consequences for the entire chain.

Relevance: 100.00%

Abstract:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014