993 results for Retrieval models


Relevance:

70.00%

Publisher:

Abstract:

Information retrieval is concerned, among other things, with answering questions such as: is a document relevant to a query? Are two queries or two documents similar? How can the similarity between two queries or documents be used to improve the estimation of relevance? To answer these questions, each document and query must be associated with a computer-interpretable representation. Once these representations are estimated, similarity can correspond, for example, to a distance or a divergence operating in the representation space. It is generally accepted that the quality of a representation directly affects the estimation error with respect to the true relevance, as judged by a human. Estimating good representations of documents and queries has long been a central problem in information retrieval. The goal of this thesis is to propose new methods for estimating the representations of documents and queries and the relevance relationship between them, and thereby modestly advance the state of the art in the field. We present four articles published in international conferences and one article published in an evaluation forum. The first two articles concern methods that build the representation space from a priori knowledge of the features that are important for the task at hand. They lead us to present a new information retrieval model that differs from existing models both theoretically and in experimental effectiveness. The last two articles mark a fundamental change in the approach to building representations. In particular, they benefit from the research interest that deep neural network learning techniques, or deep learning, have recently attracted.
These learning models automatically elicit the features that are important for the given task from a large amount of data. We focus on modelling the semantic relationships between documents and queries, as well as between two or more queries. These last articles mark the first applications of neural representation learning to information retrieval. The proposed models also yielded improved performance on standard test collections. Our work leads us to the following general conclusion: information retrieval performance could be drastically improved by relying on representation learning approaches.
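The abstract notes that, once representations are estimated, similarity can be a distance or a divergence in the representation space. The sketch below is not taken from the thesis; it shows two common choices under stated assumptions: cosine similarity between dense vectors, and the Kullback-Leibler divergence between unigram language models with additive smoothing (the smoothing constant `mu` is a hypothetical choice).

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors of equal length (higher = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def unigram_model(text, vocab, mu=0.1):
    """Unigram distribution over `vocab` with additive smoothing.

    `mu` is a hypothetical pseudo-count so that no word gets probability 0.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for distributions over the same vocabulary (lower = closer)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)
```

Ranking documents by increasing `kl_divergence(query_model, doc_model)` is the divergence view of relevance; ranking by decreasing cosine similarity of learned embeddings is the distance view that neural representation learning typically uses.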

Relevance:

70.00%

Publisher:

Abstract:

Resource selection (or query routing) is an important step in P2P IR. Although it is analogous to document retrieval, in that both choose a relevant subset of resources, resource selection methods have evolved independently of document retrieval methods. One reason for this divergence is that document retrieval targets scenarios where the underlying resources are semantically homogeneous, whereas peers manage diverse content. We observe that, in the clustered 2-tier P2P IR architecture, clustering mitigates semantic heterogeneity in the resource selection layer, and we argue that this calls for re-examining the applicability of document retrieval methods to resource selection within such a framework. This paper empirically benchmarks document retrieval models against state-of-the-art resource selection models for resource selection in the clustered P2P IR architecture, using classical IR evaluation metrics. Our benchmarking study shows that document retrieval models significantly outperform the other methods for resource selection in the clustered P2P IR architecture. This indicates that the clustered P2P IR framework can exploit advances in document retrieval to deliver corresponding improvements in resource selection, suggesting a potential convergence of the two fields for this architecture.
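A minimal sketch of the idea of applying a document retrieval model to resource selection: each resource (for example a peer cluster) is collapsed into one pseudo-document containing all of its documents, and resources are ranked as if they were documents. The abstract does not say which retrieval models were benchmarked; BM25 with the conventional defaults `k1=1.2`, `b=0.75` is used here only as a familiar example, and the function name is hypothetical.

```python
import math
from collections import Counter


def bm25_rank_resources(query, resources, k1=1.2, b=0.75):
    """Rank resources for a query by scoring each one as a single BM25 pseudo-document.

    `resources` maps a resource id to a list of document strings.
    """
    # Collapse every resource into one bag of words (its "big document").
    bags = {rid: Counter(w for doc in docs for w in doc.lower().split())
            for rid, docs in resources.items()}
    lengths = {rid: sum(bag.values()) for rid, bag in bags.items()}
    avg_len = sum(lengths.values()) / len(lengths)
    n = len(bags)

    scores = {}
    for rid, bag in bags.items():
        score = 0.0
        for term in set(query.lower().split()):
            # Document frequency is counted over resources, not individual documents.
            df = sum(1 for other in bags.values() if term in other)
            if df == 0 or term not in bag:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = bag[term]
            norm = k1 * (1 - b + b * lengths[rid] / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores[rid] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Because the clusters are already semantically coherent, the "big document" statistics behave much like those of an ordinary document, which is the intuition behind reusing document ranking functions here.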

Relevance:

70.00%

Publisher:

Abstract:

Conventional web search engines are centralised: a single entity crawls and indexes the documents selected for future retrieval and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks, such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. To remove the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine, from crawling and indexing documents to query processing, across the network of users (or peers) who use the search engine. This way of constructing an IR system poses several efficiency and effectiveness challenges that have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search. Federated search systems are a form of distributed information retrieval that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search, in that they route queries and merge results among participating peers. The query is propagated across distributed nodes to reach the peers most likely to contain relevant documents; the retrieved result lists are then merged at different points along the path from the relevant peers back to the query initiator (the customer).
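The merging step described above can be sketched as follows. This is one simple strategy, not necessarily the one used in the thesis: because raw scores from different peers are not comparable, each peer's list is min-max normalised before the lists are combined, and a document returned by several peers keeps its best normalised score. The function name and the `(doc_id, score)` list format are assumptions.

```python
def merge_result_lists(result_lists, k=10):
    """Merge per-peer ranked lists of (doc_id, score) pairs into one top-k list.

    Each list is min-max normalised to [0, 1] so that peers using different
    scoring scales can be compared; duplicates keep their highest score.
    """
    merged = {}
    for results in result_lists:
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for doc_id, score in results:
            norm = (score - lo) / span if span > 0 else 1.0
            merged[doc_id] = max(norm, merged.get(doc_id, 0.0))
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

In a P2P setting the same function could be applied repeatedly at intermediate nodes on the path back to the query initiator, since its output has the same shape as its inputs.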
However, query routing is considered one of the major challenges and a critical part of P2P-IR networks: relevant peers may be missed through low-quality peer selection during query routing, which inevitably leads to less effective retrieval results. This motivates the study and design of query routing techniques that improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise peers into semantically similar clusters, each of which is managed by a super-peer. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also use the cluster centroids at the super-peer level, which serve as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI); it mimics the conventional inverted index of a centralised corpus to organise the statistics of peers’ terms. The results show retrieval quality competitive with baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer is treated as one large document made up of its documents. The experimental evaluation shows comparable and significant results, indicating that document retrieval methods are very effective for peer selection and restoring the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are used to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results compared with state-of-the-art resource selection methods and results competitive with corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that borrow the idea, from social community networks, of collecting feedback on specific items and managing it for future decision-making.
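The abstract describes IPI only as an inverted index over peers built from cluster centroids. The class below is a hypothetical sketch of that idea, not the thesis's implementation: each peer contributes a centroid (term to weight), posting lists map terms to peers, and a query is routed to the peers with the highest tf-idf-style score. The class name, the idf formula and the `top_n` cut-off are all assumptions.

```python
import math
from collections import defaultdict


class InvertedPeerIndex:
    """Hypothetical inverted index over peers, kept at a super-peer.

    Posting lists map each term to the peers whose centroids contain it,
    analogous to posting lists over documents in a centralised index.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {peer_id: centroid weight}
        self.peers = set()

    def add_peer(self, peer_id, centroid):
        """Register a peer's centroid, given as a {term: weight} dict."""
        self.peers.add(peer_id)
        for term, weight in centroid.items():
            if weight > 0:
                self.postings[term][peer_id] = weight

    def route(self, query_terms, top_n=2):
        """Return the top_n peers by summed weight * idf over the query's posting lists."""
        scores = defaultdict(float)
        n_peers = len(self.peers)
        for term in set(query_terms):
            posting = self.postings.get(term, {})
            if not posting:
                continue
            idf = math.log(n_peers / len(posting)) + 1.0  # rarer terms discriminate more
            for peer_id, weight in posting.items():
                scores[peer_id] += weight * idf
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [peer_id for peer_id, _ in ranked[:top_n]]
```

Only peers appearing in at least one query term's posting list are scored, which is what lets an inverted organisation avoid contacting every peer.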
The system monitors users’ behaviour when they click or download documents from the final ranked list, treats it as implicit feedback, and mines this information to build a reputation-based data structure. This data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback on non-relevant documents), to examine the robustness of the reputation-based approaches. The empirical evaluation shows significant results on almost all evaluation metrics, with improvements of roughly 56% or more over baseline approaches. Thus, if one were to choose a single technique based on these results, reputation-based approaches are clearly the natural choice, and they can also be deployed on any P2P network.
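A hypothetical sketch of a reputation-based data structure built from implicit feedback. The abstract does not give the exact structure or scoring function; here we assume a table keyed by (query term, peer) that counts how often a peer's documents were shown and how often they were clicked or downloaded, and a smoothed click-through rate as the reputation score. The pseudo-count `prior` keeps a single stray click (noisy feedback) from dominating.

```python
class ReputationTable:
    """Hypothetical reputation store: (term, peer) -> implicit-feedback counts."""

    def __init__(self, prior=1.0):
        self.prior = prior   # pseudo-count; an unseen (term, peer) pair scores 0.5
        self.positive = {}   # clicks/downloads on the peer's documents
        self.shown = {}      # times the peer's documents appeared in the final list

    def record(self, query_terms, peer_id, clicked):
        """Log one impression of a peer's document, and whether the user acted on it."""
        for term in set(query_terms):
            key = (term, peer_id)
            self.shown[key] = self.shown.get(key, 0.0) + 1
            if clicked:
                self.positive[key] = self.positive.get(key, 0.0) + 1

    def score(self, query_terms, peer_id):
        """Average smoothed click-through rate of the peer over the query terms."""
        terms = set(query_terms)
        if not terms:
            return 0.0
        total = 0.0
        for term in terms:
            key = (term, peer_id)
            pos = self.positive.get(key, 0.0)
            seen = self.shown.get(key, 0.0)
            total += (pos + 0.5 * self.prior) / (seen + self.prior)
        return total / len(terms)

    def rank(self, query_terms, peer_ids):
        """Order candidate peers for routing by descending reputation."""
        return sorted(peer_ids, key=lambda p: self.score(query_terms, p), reverse=True)
```

Because the table only needs the final ranked list and user actions, it does not depend on how the overlay is organised, which fits the abstract's remark that such approaches can be deployed on any P2P network.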

Relevance:

60.00%

Publisher:

Abstract:

The information retrieval process involves subjective, imprecise and vague concepts, such as "information need", "relevance" and the very concept of "information". The main information retrieval models treat these concepts as precise, representing each of them by a single numerical value. Fuzzy logic, which handles the uncertainty of natural phenomena in a systematic and rigorous manner, is a promising alternative for solving some problems related to information retrieval. This paper presents fuzzy logic and some examples of its use in information retrieval systems (IRS).
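One classical way fuzzy logic enters IR is the fuzzy set (extended Boolean) model: instead of a term being either present or absent, a document belongs to each index term's fuzzy set with a degree in [0, 1], and Boolean operators are replaced by Zadeh's operators (min for AND, max for OR, 1 - x for NOT). The sketch below illustrates that model; it is not taken from the paper, and the nested-tuple query format is an assumption.

```python
def evaluate(query, memberships):
    """Degree in [0, 1] to which a document satisfies a fuzzy Boolean query.

    `query` is either a term string, ("and" | "or", q1, q2, ...), or ("not", q).
    `memberships` maps index terms to the document's membership degree.
    """
    if isinstance(query, str):
        return memberships.get(query, 0.0)  # absent terms have degree 0
    op, *operands = query
    values = [evaluate(q, memberships) for q in operands]
    if op == "and":
        return min(values)   # Zadeh intersection
    if op == "or":
        return max(values)   # Zadeh union
    if op == "not":
        return 1.0 - values[0]  # fuzzy complement
    raise ValueError(f"unknown operator: {op}")
```

Documents can then be ranked by their degree of satisfaction rather than split into a binary relevant/non-relevant set, which is exactly where the single-valued, crisp treatment criticised above is relaxed.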

Relevance:

60.00%

Publisher:

Abstract:

Traditional content-based filtering methods usually use text extraction and classification techniques to build user profiles as well as representations of content, i.e. item profiles. These methods have some disadvantages, e.g. a mismatch between user profile terms and item profile terms, which leads to low performance. Some of these disadvantages can be overcome by incorporating a common ontology, which allows both user and item profiles to be represented with concepts taken from the same vocabulary. We propose a new content-based method for filtering items and ranking their relevance for users, which uses a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering both the presence of shared concepts in the two profiles and the presence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between users' profiles and items' profiles, and rank-orders the relevant items according to their relevance to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, using a hierarchical ontology designed specifically for classifying news items. It can, however, be used in other domains and extended to other ontologies.
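The matching rule described above (full credit for shared concepts, partial credit for "related" concepts according to their position in the hierarchy) can be sketched as follows. This is a hypothetical illustration rather than the ePaper algorithm: we assume the ontology is given as a child-to-parent map, treat parent/child and sibling concepts as related, and use a hypothetical discount `related_weight`.

```python
def related(a, b, parent):
    """True if concepts a and b are parent/child or siblings in the hierarchy."""
    pa, pb = parent.get(a), parent.get(b)
    return pa == b or pb == a or (pa is not None and pa == pb)


def profile_similarity(user, item, parent, related_weight=0.5):
    """Similarity between weighted concept profiles ({concept: weight} dicts).

    Exact concept matches count fully; an item concept that only has a related
    concept in the user profile counts `related_weight` times (a hypothetical factor).
    """
    score = 0.0
    for concept, w_item in item.items():
        if concept in user:
            score += w_item * user[concept]
        else:
            best = max((user[c] for c in user if related(concept, c, parent)), default=0.0)
            score += related_weight * w_item * best
    total = sum(item.values())
    return score / total if total else 0.0
```

Rank-ordering items for a user is then a matter of sorting them by `profile_similarity(user, item, parent)` in descending order.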

Relevance:

60.00%

Publisher:

Abstract:

With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image-sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. The majority of these uploaded images are poorly annotated or exist in complete semantic isolation, which makes building retrieval systems difficult, since one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image-sharing websites offer manual annotation tools that allow users to “tag” their photos. However, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than 4 tags. Consequently, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), which attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, and thus ignore the “bigger picture” surrounding an image (e.g. its location, the event, the people present). Much work has therefore focused on building photo tag recommendation (PTR) methods, which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-occur in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, which often results in poor PTR accuracy: does NY refer to New York or New Year?
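The co-occurrence approach described above can be sketched as a simple recommender: count how often tags appear together on historical photos, then suggest the candidates with the highest summed conditional co-occurrence P(candidate | given tag). This is a minimal illustration under our own assumptions (function names, normalisation by the given tag's frequency), not the method of any specific cited work.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(photos):
    """Count tag frequencies and ordered tag-pair co-occurrences over historical photos."""
    freq = Counter()
    cooc = defaultdict(Counter)
    for tags in photos:
        unique = set(tags)
        freq.update(unique)
        for a, b in permutations(unique, 2):
            cooc[a][b] += 1
    return freq, cooc


def recommend(given, freq, cooc, k=3):
    """Score candidates by summed P(candidate | given tag) = cooc(given, c) / freq(given)."""
    scores = Counter()
    for tag in given:
        for cand, count in cooc.get(tag, {}).items():
            if cand not in given:
                scores[cand] += count / freq[tag]
    return [tag for tag, _ in scores.most_common(k)]
```

With history such as `[["ny", "timessquare", "night"], ["ny", "timessquare"], ["ny", "newyear", "fireworks"]]`, the tag `ny` pulls in both New York and New Year candidates, which is the ambiguity the abstract points out.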
This thesis proposes exploiting an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process to complement textual evidence in various photo tag recommendation and retrieval scenarios. In Part II, we combine text, content-based (e.g. number of faces present) and contextual (e.g. day of the week taken) signals for tag recommendation, achieving up to a 75% improvement in precision@5 compared with a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as alternatives to the (slower-moving) Flickr on which to build recommendation models, showing that similar accuracy can be achieved on these faster-moving, yet entirely textual, datasets. In Part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In Part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, and then (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we achieved over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features of images, we can predict whether an image will become “popular” with 74% accuracy, using an SVM classifier.
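For readers less familiar with the metric, precision@k is the fraction of the top k recommendations that are correct; the "75% improvement" above is most naturally read as a relative improvement over the baseline (an assumption, since the abstract does not say). The values in the usage note are illustrative only and are not the thesis's numbers.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that are in the relevant set.

    Divides by k even if fewer than k items were recommended (the usual convention).
    """
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k


def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, e.g. 0.75 for a 75% improvement."""
    return (system - baseline) / baseline
```

For example, a baseline with precision@5 of 0.20 and a system with 0.35 give `relative_improvement(0.20, 0.35)` of about 0.75.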
Finally, in Chapter 9, we employ blur detection and perceptual-hash clustering to remove noisy images from lifelogs, before combining visual and geo-temporal signals to capture a user’s “key moments” within their day. We believe that the results of this thesis are an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
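Perceptual-hash clustering can be illustrated with the average hash (aHash), one common perceptual hash; the abstract does not say which hash the thesis uses, and blur detection is not covered here. We assume each image has already been downscaled to a small grayscale grid (e.g. 8x8). Each bit records whether a pixel is brighter than the image mean, near-duplicates end up with hashes at a small Hamming distance, and a greedy pass groups them. The `max_distance` threshold is a hypothetical choice.

```python
def average_hash(pixels):
    """Average hash of a small, already downscaled grayscale image (2-D list of intensities)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cluster_by_hash(hashes, max_distance=5):
    """Greedy grouping: each image joins the first cluster whose seed hash is close enough."""
    clusters = []  # list of (seed_hash, [image indices])
    for i, h in enumerate(hashes):
        for seed, members in clusters:
            if hamming(seed, h) <= max_distance:
                members.append(i)
                break
        else:  # no sufficiently close cluster: start a new one
            clusters.append((h, [i]))
    return [members for _, members in clusters]
```

Keeping one representative per cluster is a cheap way to drop the many near-identical frames a wearable camera produces.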

Relevance:

40.00%

Publisher:

Abstract:

System identification, evolutionary automatic, data-driven model, fuzzy Takagi-Sugeno grammar, genotype interpretability, toxicity prediction

Relevance:

40.00%

Publisher:

Abstract:

The motivation for the work presented in this thesis is to retrieve profile information for the atmospheric trace constituents nitrogen dioxide (NO2) and ozone (O3) in the lower troposphere from remote sensing measurements. The remote sensing technique used, Multiple AXis Differential Optical Absorption Spectroscopy (MAX-DOAS), is a recent technique that represents a significant advance on the well-established DOAS, especially for the study of tropospheric trace constituents. NO2 is an important trace gas in the lower troposphere because it is involved in the production of tropospheric ozone; ozone and nitrogen dioxide are key factors in determining air quality, with consequences, for example, for human health and the growth of vegetation. To understand NO2 and ozone chemistry in more detail, it is necessary to know not only their concentrations at ground level but also their vertical distribution. In fact, the budget of nitrogen oxides and ozone in the atmosphere is determined both by local emissions and by non-local chemical and dynamical processes (i.e. diffusion and transport at various scales) that strongly affect their vertical and temporal distribution; a tool to resolve vertical profile information is therefore essential. Useful measurement techniques for atmospheric trace species should fulfil at least two main requirements. First, they must be sensitive enough to detect the species under consideration at their ambient concentration levels. Second, they must be specific: the measurement of a particular species must be neither positively nor negatively influenced by any other trace species simultaneously present in the probed volume of air. Air monitoring by spectroscopic techniques has proven to be a very useful tool for fulfilling these requirements, as well as offering a number of other important properties.
During the last decades, many such instruments have been developed based on the absorption properties of the constituents in various regions of the electromagnetic spectrum, ranging from the far infrared to the ultraviolet. Among them, Differential Optical Absorption Spectroscopy (DOAS) has played an important role. DOAS is an established remote sensing technique for probing atmospheric trace gases; it identifies and quantifies trace gases in the atmosphere by exploiting their molecular absorption structures at near-UV and visible wavelengths (from 0.25 μm to 0.75 μm). Passive DOAS, in particular, can detect a trace gas in terms of its concentration integrated over the atmospheric path from the sun to the receiver (the so-called slant column density). The receiver can be located on the ground or on board an aircraft or a satellite platform. Passive DOAS therefore has a flexible measurement configuration that allows multiple applications. The ability to properly interpret passive DOAS measurements of atmospheric constituents depends crucially on how well the optical path of the light collected by the system is understood, because the final product of DOAS is the concentration of a particular species integrated along the path that radiation travels in the atmosphere. This path is not known a priori and can only be evaluated with Radiative Transfer Models (RTMs). These models are used to calculate the so-called air mass factor, which quantifies the enhancement of the light path length within the absorber layers; the vertical column density of a given trace gas is then obtained by dividing the measured slant column density by the air mass factor. In the standard DOAS set-up, in which radiation is collected along the vertical direction (zenith-sky DOAS), air mass factors have been calculated using “simple” single-scattering radiative transfer models.
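The relationship between the column quantities described above can be written compactly. The second expression is the well-known plane-parallel geometric approximation for a stratospheric absorber observed in zenith-sky geometry, with θ₀ the solar zenith angle; it is included only for intuition, ignores scattering and sphericity, and breaks down near twilight, where the RTM calculations discussed in the text are required.

```latex
\mathrm{VCD} = \frac{\mathrm{SCD}}{\mathrm{AMF}},
\qquad
\mathrm{AMF}_{\mathrm{strat}} \approx \frac{1}{\cos\theta_0}
```

For example, at θ₀ = 60° the geometric approximation gives AMF ≈ 1/0.5 = 2, so the vertical column would be half the measured slant column.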
This configuration has its highest sensitivity in the stratosphere, in particular during twilight, as a result of the large enhancement of the stratospheric light path at dawn and dusk combined with a relatively short tropospheric path. To increase the instrument's sensitivity to tropospheric signals, measurements must be made with the telescope pointing towards the horizon (off-axis DOAS). In these circumstances, the light path in the lower layers can become very long, requiring radiative transfer models that include multiple scattering and the full treatment of atmospheric sphericity and refraction. In this thesis, a recent development of the well-established DOAS technique is described: Multiple AXis Differential Optical Absorption Spectroscopy (MAX-DOAS). MAX-DOAS consists of the simultaneous use of several off-axis viewing directions near the horizon. In this configuration, not only is the sensitivity to tropospheric trace gases greatly improved, but vertical profile information can also be retrieved by combining the simultaneous off-axis measurements with sophisticated RTM calculations and inversion techniques. In particular, an RTM is needed that can deal with all the processes occurring along the light path, supports all the DOAS geometries used, and treats multiple scattering events with varying phase functions. To achieve these goals, a statistical approach based on the Monte Carlo technique should be used. A Monte Carlo RTM generates an ensemble of random photon paths between the light source and the detector, and uses these paths to reconstruct a remote sensing measurement. In the present study, the Monte Carlo radiative transfer model PROMSAR (PROcessing of Multi-Scattered Atmospheric Radiation) has been developed and used to correctly interpret the slant column densities obtained from MAX-DOAS measurements.
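The core of a Monte Carlo RTM is sampling random photon free paths. The toy sketch below shows only that step, under strong simplifying assumptions: a single homogeneous, purely absorbing layer of vertical optical depth `tau`, a beam entering at cosine of zenith angle `mu`, and no scattering, sphericity or refraction (all of which a full model such as PROMSAR must handle). In this case the exact answer is Beer-Lambert transmission, exp(-tau/mu), so the estimate can be checked.

```python
import random


def mc_direct_transmission(tau, mu, n_photons=100_000, seed=0):
    """Monte Carlo estimate of direct-beam transmission through a purely absorbing layer.

    Each photon's free path, in optical-depth units, is drawn from an exponential
    distribution with mean 1; the photon escapes if that path exceeds the slant
    optical depth tau / mu. The escaped fraction estimates exp(-tau / mu).
    """
    rng = random.Random(seed)
    slant = tau / mu
    escaped = sum(1 for _ in range(n_photons) if rng.expovariate(1.0) > slant)
    return escaped / n_photons
```

With scattering added, each photon would instead take a sequence of such free paths with new directions drawn from the phase function, and the distribution of path lengths through each layer would give the air mass factors.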
To derive the vertical concentration profile of a trace gas from its slant column measurements, the air mass factor (AMF) is only one part of the quantitative retrieval process. An indispensable requirement is a robust approach for inverting the measurements to obtain the unknown concentrations once the air mass factors are known. For this purpose, the present thesis uses the Chahine relaxation method. Ground-based MAX-DOAS, combined with appropriate radiative transfer models and inversion techniques, is a promising tool for atmospheric studies in the lower troposphere and boundary layer, including the retrieval of profile information with a good degree of vertical resolution. This thesis presents an application of this powerful, comprehensive tool to the study of a preserved natural Mediterranean area (the Castel Porziano Estate, located 20 km south-west of Rome) into which pollution is transported from remote sources. Applying this tool in densely populated or industrial areas looks particularly promising and is an important subject for future studies.
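The abstract names the Chahine relaxation method without giving its form. A minimal sketch, assuming a linear forward model in which measurement i (for example one elevation angle) sees layer j through a box air mass factor `box_amf[i][j]`, so that the modelled slant column is the sum of box AMF times partial column. We also assume a square system where measurement j is the one most sensitive to layer j. Chahine's scheme then rescales each layer's partial column multiplicatively by the ratio of measured to modelled slant column, which also keeps the retrieved columns positive.

```python
def chahine_retrieval(box_amf, measured_scd, n_iter=500, x0=None):
    """Chahine-style multiplicative relaxation for layer partial columns x_j.

    Forward model (assumed linear): scd_i = sum_j box_amf[i][j] * x_j.
    Measurement j is paired with layer j (assumed square, diagonally dominant system).
    """
    n = len(measured_scd)
    x = list(x0) if x0 is not None else [1.0] * n  # positive first guess
    for _ in range(n_iter):
        modelled = [sum(box_amf[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Scale each layer by the measured/modelled ratio of its paired measurement.
        x = [x[j] * measured_scd[j] / modelled[j] for j in range(n)]
    return x
```

For instance, with `box_amf = [[2.0, 0.5], [0.5, 2.0]]` and slant columns `[3.5, 6.5]` generated from true partial columns `[1, 3]`, the iteration converges back to approximately `[1, 3]`. Real retrievals add measurement noise, more layers than measurements and a convergence criterion, which this sketch omits.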

Relevance:

40.00%

Publisher:

Relevance:

40.00%

Publisher:

Abstract:

Most existing open-source search engines use keyword-based or tf-idf-based techniques to find documents and web pages relevant to an input query. Although these methods, aided by PageRank or knowledge graphs, have proved effective in some cases, they often fail to retrieve relevant results for more complex queries that require semantic understanding. In this thesis, a self-supervised information retrieval system based on transformers is used to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, refers to understanding the query rather than simply finding word matches, and, more generally, to representing knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy for training on unlabeled data, based on creating pairs of ‘artificial’ queries and their respective positive passages. We claim that by removing the reliance on labeled data, we can use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
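The abstract does not say how the artificial queries are generated. Purely as an illustration of the general idea, the sketch below uses a well-known alternative, the Inverse Cloze Task: one sentence is removed from a passage and used as the pseudo-query, and the remaining sentences form its positive passage. This is a swapped-in technique, not necessarily the thesis's strategy, and the function name is hypothetical.

```python
import random
import re


def make_pseudo_pairs(passages, seed=0):
    """Build (artificial query, positive passage) pairs from unlabeled text.

    Inverse-Cloze-style: a randomly chosen sentence becomes the query and the
    rest of the passage becomes its positive.
    """
    rng = random.Random(seed)
    pairs = []
    for passage in passages:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
        if len(sentences) < 2:
            continue  # need at least one sentence left over as the positive passage
        i = rng.randrange(len(sentences))
        query = sentences[i]
        positive = " ".join(sentences[:i] + sentences[i + 1:])
        pairs.append((query, positive))
    return pairs
```

Pairs produced this way would typically be fed to a transformer bi-encoder trained with the other passages in the same batch as negatives, so no human relevance labels are needed.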

Relevance:

30.00%

Publisher:

Abstract:

Studies of delayed nonmatching-to-sample (DNMS) performance following lesions of the monkey cortex have revealed a critical circuit of brain regions involved in forming memories and retaining and retrieving stimulus representations. Using event-related functional magnetic resonance imaging (fMRI), we measured brain activity in 10 healthy human participants during performance of a trial-unique visual DNMS task using novel barcode stimuli. The event-related design enabled the identification of activity during the different phases of the task (encoding, retention, and retrieval). Several brain regions identified by monkey studies as being important for successful DNMS performance showed selective activity during the different phases, including the mediodorsal thalamic nucleus (encoding), ventrolateral prefrontal cortex (retention), and perirhinal cortex (retrieval). Regions showing sustained activity within trials included the ventromedial and dorsal prefrontal cortices and occipital cortex. The present study shows the utility of investigating performance on tasks derived from animal models to assist in the identification of brain regions involved in human recognition memory.