924 resultados para NUDIST (Information retrieval system)
Resumo:
This thesis targets on a challenging issue that is to enhance users' experience over massive and overloaded web information. The novel pattern-based topic model proposed in this thesis can generate high-quality multi-topic user interest models technically by incorporating statistical topic modelling and pattern mining. We have successfully applied the pattern-based topic model to both fields of information filtering and information retrieval. The success of the proposed model in finding the most relevant information to users mainly comes from its precisely semantic representations to represent documents and also accurate classification of the topics at both document level and collection level.
Resumo:
Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express more freely, more informal, and more natural their need for information in everyday language.
Resumo:
The increasing amount of information that is annotated against standardised semantic resources offers opportunities to incorporate sophisticated levels of reasoning, or inference, into the retrieval process. In this position paper, we reflect on the need to incorporate semantic inference into retrieval (in particular for medical information retrieval) as well as previous attempts that have been made so far with mixed success. Medical information retrieval is a fertile ground for testing inference mechanisms to augment retrieval. The medical domain offers a plethora of carefully curated, structured, semantic resources, along with well established entity extraction and linking tools, and search topics that intuitively require a number of different inferential processes (e.g., conceptual similarity, conceptual implication, etc.). We argue that integrating semantic inference in information retrieval has the potential to uncover a large amount of information that otherwise would be inaccessible; but inference is also risky and, if not used cautiously, can harm retrieval.
Resumo:
Concept mapping involves determining relevant concepts from a free-text input, where concepts are defined in an external reference ontology. This is an important process that underpins many applications for clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast into an information retrieval (IR) problem: free-text mentions are treated as queries and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques for concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. Standard IR approaches used here are contrasted with the effectiveness of two established benchmark methods specifically developed for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.
Resumo:
Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher order relationships between words that are highly effective in natural language processing tasks involving the use of word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper, we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this aim, we use neural word embeddings within the well known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance. The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.
Resumo:
Menneinä vuosikymmeninä maatalouden työt ovat ensin koneellistuneet voimakkaasti ja sittemmin mukaan on tullut automaatio. Nykyään koneiden kokoa suurentamalla ei enää saada tuottavuutta nostettua merkittävästi, vaan työn tehostaminen täytyy tehdä olemassa olevien resurssien käyttöä tehostamalla. Tässä työssä tarkastelun kohteena on ajosilppuriketju nurmisäilörehun korjuussa. Säilörehun korjuun intensiivisyys ja koneyksiköiden runsas määrä ovat työnjohdon kannalta vaativa yhdistelmä. Työn tavoitteena oli selvittää vaatimuksia maatalouden urakoinnin tueksi kehitettävälle tiedonhallintajärjestelmälle. Tutkimusta varten haastateltiin yhteensä 12 urakoitsijaa tai yhteistyötä tekevää viljelijää. Tutkimuksen perusteella urakoitsijoilla on tarvetta tietojärjestelmille.Luonnollisesti urakoinnin laajuus ja järjestelyt vaikuttavat asiaan. Tutkimuksen perusteella keskeisimpiä vaatimuksia tiedonhallinnalle ovat: • mahdollisimman laaja, yksityiskohtainen ja automaattinen tiedon keruu tehtävästä työstä • karttapohjaisuus, kuljettajien opastus kohteisiin • asiakasrekisteri, työn tilaus sähköisesti • tarjouspyyntöpohjat, hintalaskurit • luotettavuus, tiedon säilyvyys • sovellettavuus monenlaisiin töihin • yhteensopivuus muiden järjestelmien kanssa Kehitettävän järjestelmän tulisi siis tutkimuksen perusteella sisältää seuraavia osia: helppokäyttöinen suunnittelu/asiakasrekisterityökalu, toimintoja koneiden seurantaan, opastukseen ja johtamiseen, työnaikainen tiedonkeruu sekä kerätyn tiedon käsittelytoimintoja. Kaikki käyttäjät eivät kuitenkaan tarvitse kaikkia toimintoja, joten urakoitsijan on voitava valita tarvitsemansa osat ja mahdollisesti lisätä toimintoja myöhemmin. Tiukoissa taloudellisissa ja ajallisissa raameissa toimivat urakoitsijat ovat vaativia asiakkaita, joiden käyttämän tekniikan tulee olla toimivaa ja luotettavaa. Toisaalta inhimillisiä virheitä sattuu kokeneillekin, joten hyvällä tietojärjestelmällä työstä tulee helpompaa ja tehokkaampaa.