14 resultados para Ontologies (Information retrieval) -- TFC

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summary: Using WordNet in information retrieval

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This piece of work which is Identification of Research Portfolio for Development of Filtration Equipment aims at presenting a novel approach to identify promising research topics in the field of design and development of filtration equipment and processes. The projected approach consists of identifying technological problems often encountered in filtration processes. The sources of information for the problem retrieval were patent documents and scientific papers that discussed filtration equipments and processes. The problem identification method adopted in this work focussed on the semantic nature of a sentence in order to generate series of subject-action-object structures. This was achieved with software called Knowledgist. List of problems often encountered in filtration processes that have been mentioned in patent documents and scientific papers were generated. These problems were carefully studied and categorized. Suggestions were made on the various classes of these problems that need further investigation in order to propose a research portfolio. The uses and importance of other methods of information retrieval were also highlighted in this work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Web-portaalien aiheenmukaista luokittelua voidaan hyödyntää tunnistamaan käyttäjän kiinnostuksen kohteet keräämällä tilastotietoa hänen selaustottumuksistaan eri kategorioissa. Tämä diplomityö käsittelee web-sovelluksien osa-alueita, joissa kerättyä tilastotietoa voidaan hyödyntää personalisoinnissa. Yleisperiaatteet sisällön personalisoinnista, Internet-mainostamisesta ja tiedonhausta selitetään matemaattisia malleja käyttäen. Lisäksi työssä kuvaillaan yleisluontoiset ominaisuudet web-portaaleista sekä tilastotiedon keräämiseen liittyvät seikat.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summary : Fuzzy translation techniques in cross-language information retrieval between closely related languages

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Internet on elektronisen postin perusrakenne ja ollut tärkeä tiedonlähde akateemisille käyttäjille jo pitkään. Siitä on tullut merkittävä tietolähde kaupallisille yrityksille niiden pyrkiessä pitämään yhteyttä asiakkaisiinsa ja seuraamaan kilpailijoitansa. WWW:n kasvu sekä määrällisesti että sen moninaisuus on luonut kasvavan kysynnän kehittyneille tiedonhallintapalveluille. Tällaisia palveluja ovet ryhmittely ja luokittelu, tiedon löytäminen ja suodattaminen sekä lähteiden käytön personointi ja seuranta. Vaikka WWW:stä saatavan tieteellisen ja kaupallisesti arvokkaan tiedon määrä on huomattavasti kasvanut viime vuosina sen etsiminen ja löytyminen on edelleen tavanomaisen Internet hakukoneen varassa. Tietojen hakuun kohdistuvien kasvavien ja muuttuvien tarpeiden tyydyttämisestä on tullut monimutkainen tehtävä Internet hakukoneille. Luokittelu ja indeksointi ovat merkittävä osa luotettavan ja täsmällisen tiedon etsimisessä ja löytämisessä. Tämä diplomityö esittelee luokittelussa ja indeksoinnissa käytettävät yleisimmät menetelmät ja niitä käyttäviä sovelluksia ja projekteja, joissa tiedon hakuun liittyvät ongelmat on pyritty ratkaisemaan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tässä diplomityössä tarkastellaan tietojärjestelmän kehitystyötä, vaatimusmäärittelyä ja toteutustavan selvitystä monimutkaisen organisaation näkökulmasta. Työn tavoitteena on tehdä tiedonhaun opetukseen liittyvän tietojärjestelmän esitutkimus, vaatimusmäärittely ja toteutustavan arviointi. Tietojärjestelmän kehitystyötä tarkastellaan vesiputous-vaihejakomalliin kuuluvien eri vaiheiden avulla. Tietojärjestelmän vaatimusmäärittelyä tarkastellaan sen tavoitteiden, vaiheiden ja erilaisten vaatimusten kautta. Lisäksi tutkitaan kehysorganisaation vaikutusta tietojärjestelmän kehitystyöhön ja vaatimusmäärittelyyn. Työn tuloksista oli nähtävissä, että kehysorganisaatio ja sen monimutkaisuus vaikuttavat tietojärjestelmän kehitystyöhön ja sitä kautta vaatimusmäärittelyn tekemiseen monin tavoin. Nykyisten tietojärjestelmän kehitystyömallien lisäksi on jouduttu miettimään uusia keinoja siihen, miten tekniset vaatimukset yhdistetään liiketaloudellisiin ja organisatorisiin ongelmiin. Työn empiirisen osuuden tuloksena kerättiin Tiedonhaun opetus -tietojärjestelmän vaatimusmäärittelyyn tarvittavat tiedot. Lisäksi selvitettiin miten ja millä resursseilla esitetyn mukainen tietojärjestelmä olisi mahdollista toteuttaa. Erillistä vaatimusmäärittelydokumenttia ei toteutettu, koska tietojärjestelmän mahdolliseksi toteutustavaksi erottui kaksi toisistaan poikkeavaa vaihtoehtoa. Vaatimusmäärittelydokumentin tarkempi muoto hahmottuu sitten, kun tietojärjestelmän toteutustavan periaatteet ovat selvillä.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study presents an automatic, computer-aided analytical method called Comparison Structure Analysis (CSA), which can be applied to different dimensions of music. The aim of CSA is first and foremost practical: to produce dynamic and understandable representations of musical properties by evaluating the prevalence of a chosen musical data structure through a musical piece. Such a comparison structure may refer to a mathematical vector, a set, a matrix or another type of data structure and even a combination of data structures. CSA depends on an abstract systematic segmentation that allows for a statistical or mathematical survey of the data. To choose a comparison structure is to tune the apparatus to be sensitive to an exclusive set of musical properties. CSA settles somewhere between traditional music analysis and computer aided music information retrieval (MIR). Theoretically defined musical entities, such as pitch-class sets, set-classes and particular rhythm patterns are detected in compositions using pattern extraction and pattern comparison algorithms that are typical within the field of MIR. In principle, the idea of comparison structure analysis can be applied to any time-series type data and, in the music analytical context, to polyphonic as well as homophonic music. Tonal trends, set-class similarities, invertible counterpoints, voice-leading similarities, short-term modulations, rhythmic similarities and multiparametric changes in musical texture were studied. Since CSA allows for a highly accurate classification of compositions, its methods may be applicable to symbolic music information retrieval as well. The strength of CSA relies especially on the possibility to make comparisons between the observations concerning different musical parameters and to combine it with statistical and perhaps other music analytical methods. The results of CSA are dependent on the competence of the similarity measure. New similarity measures for tonal stability, rhythmic and set-class similarity measurements were proposed. The most advanced results were attained by employing the automated function generation – comparable with the so-called genetic programming – to search for an optimal model for set-class similarity measurements. However, the results of CSA seem to agree strongly, independent of the type of similarity function employed in the analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tutkimuksen tarkoituksena oli selvittää, millaista uraohjausta ammattikorkeakoulun tuutoriopettajat antavat ja millaista uraohjausta opiskelijat haluavat. Lisäksi tavoitteena oli selvittää, löytyykö opiskelijoiden koulutusalavalinnan perusteista yhteyttä uran suunnittelutaitoihin ja ohjauksen tarpeeseen, ja tunnistavatko tuutoriopettajat opiskelijoiden erilaiset uraohjauksen tarpeet. Tutkimuksen teoreettisissa rakenteissa hyödynnettiin kolmea postmodernia urateoriaa, jotka olivat Hodkinsonin ja Sparkesin (1997) uranvalinnan päätöksentekoteoria, Mitchellin, Lewinin ja Krumbolzin (1999) suunnitellun sattuman teoria ja Savickasin (2005) uran rakentamisteoria. Tutkimusympäristönä oli Satakunnan ammattikorkeakoulu. Tutkimus oli kaksivaiheinen. Ensimmäisessä vaiheessa kerättiin harkinnanvaraisesti valituilta tuutoriopettajilta (n=14) ja opintojensa eri vaiheissa olevilta opiskelijoilta (n=65) kirjoitettu aineisto. Kvalitatiivinen aineisto analysoitiin sisällönanalyysillä. Aineiston perusteella löydettiin kolmenlaisia urasuunnittelijoita: epävarmat, uteliaat ja tietoiset. Aineiston perusteella laadittiin kyselylomake tutkimuksen toisen vaiheen tiedonkeruuta varten. Tutkimuksen toisessa vaiheessa kerättiin opintojen eri vaiheissa olevilta opiskelijoilta kyselylomakekyselynä kvantitatiivinen aineisto (n=903), joka analysoitiin tilastollisin menetelmin. Koulutusalavalinnan perusteista elämäntilanne, alan mahdollisuudet, oma toive, kutsumus, aktiivinen tiedonhaku ja halu opiskella ammattikorkeakoulussa olivat yhteydessä opiskelijan hyvään urasuunnittelukykyyn. Näillä perusteilla koulutusalansa valinneita tietoisiksi luokiteltuja urasuunnittelijoita oli 72 % vastanneista. Alavalinnan perusteista sattuman, kavereiden, sukulaisten, lukion opinto-ohjauksen ja paikkakunnan perusteella koulutusalansa valinneet luokiteltiin epävarmoiksi urasuunnittelijoiksi, ja heitä oli 28 % vastanneista. Tulokset antavat ohjaajille tukea epävarman ja muita enemmän uraohjausta tarvitsevan opiskelijan tunnistamiseen ja heidän hops-prosessinsa tehostamiseen opintojen alusta asti. Lisäksi tulosten perusteella esitetään seuraavia suosituksia: tuutoriopettajille tulisi asettaa pätevyysvaatimukseksi ohjausalan opintojen suorittaminen; opiskelijoita tulisi ohjata tunnistamaan erilaisia satunnaisesti avautuvia mahdollisuuksia ja tietoisesti hyödyntämään niitä elämässään; uraohjaukseen tulisi kytkeä mukaan työelämäyhteistyö; ohjaajien tulisi tiivistää yhteistyötä toisen asteen ohjaajien kanssa, jotta opiskelijoiden koulutusalavalinnat onnistuisivat paremmin; uraohjausta tulisi antaa tulevaisuuden kvalifikaatioiden ennakoinnin ja elinikäisten oppimisvalmiuksien näkökulmasta.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Context: Web services have been gaining popularity due to the success of service oriented architecture and cloud computing. Web services offer tremendous opportunity for service developers to publish their services and applications over the boundaries of the organization or company. However, to fully exploit these opportunities it is necessary to find efficient discovery mechanism thus, Web services discovering mechanism has attracted a considerable attention in Semantic Web research, however, there have been no literature surveys that systematically map the present research result thus overall impact of these research efforts and level of maturity of their results are still unclear. This thesis aims at providing an overview of the current state of research into Web services discovering mechanism using systematic mapping. The work is based on the papers published 2004 to 2013, and attempts to elaborate various aspects of the analyzed literature including classifying them in terms of the architecture, frameworks and methods used for web services discovery mechanism. Objective: The objective if this work is to summarize the current knowledge that is available as regards to Web service discovery mechanisms as well as to systematically identify and analyze the current published research works in order to identify different approaches presented. Method: A systematic mapping study has been employed to assess the various Web Services discovery approaches presented in the literature. Systematic mapping studies are useful for categorizing and summarizing the level of maturity research area. Results: The result indicates that there are numerous approaches that are consistently being researched and published in this field. In terms of where these researches are published, conferences are major contributing publishing arena as 48% of the selected papers were conference published papers illustrating the level of maturity of the research topic. Additionally selected 52 papers are categorized into two broad segments namely functional and non-functional based approaches taking into consideration architectural aspects and information retrieval approaches, semantic matching, syntactic matching, behavior based matching as well as QOS and other constraints.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and gather all information that would be relevant for their research. As a response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researches has emerged and strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large-scale, to scan the whole publicly available biomedical literature and extract and aggregate the information found within, while automatically normalizing the variability of natural language statements. Among different tasks, biomedical event extraction has received much attention within BioNLP community recently. Biomedical event extraction constitutes the identification of biological processes and interactions described in biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction have given raise to a number of event extraction systems, several of which have been applied at a large scale (the full set of PubMed abstracts and PubMed Central Open Access full text articles), leading to creation of massive biomedical event databases, each of which containing millions of events. Sinece top-ranking event extraction systems are based on machine-learning approach and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when being faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to generation of incorrect biomolecular events which are spotted by the end-users. This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the general credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene-products. We cast the hypothesis generation problem as a supervised network topology prediction, i.e predicting new edges in the network, as well as types and directions for these edges, utilizing a set of features that can be extracted from large biomedical event networks. Routine machine learning evaluation results, as well as manual evaluation results suggest that the problem is indeed learnable. This work won the Best Paper Award in The 5th International Symposium on Languages in Biology and Medicine (LBM 2013).