836 results for Text retrieval


Relevance: 30.00%

Abstract:

The MARS (Media Asset Retrieval System) Project is the collaborative effort of public broadcasters, libraries and schools in the Puget Sound region to create a digital online resource that provides access to content produced by public broadcasters via the public libraries.

Convergence Consortium

The Convergence Consortium is a model for community collaboration, including organizations such as public broadcasters, libraries, museums, and schools in the Puget Sound region, which assess the needs of their constituents and pool resources to develop solutions to meet those needs. Specifically, the archives of public broadcasters have been identified as significant resources for the local communities and nationally. These resources can be accessed on the broadcasters’ websites and through libraries, used by schools, and integrated with text and photographic archives from other partners.

MARS’ goal

Create an online resource that provides effective access to the content produced locally by KCTS (Seattle PBS affiliate) and KUOW (Seattle NPR affiliate). The broadcasts will be made searchable using the CPB Metadata Element Set (under development) and controlled vocabularies (to be developed). This will ensure a user-friendly search and navigation mechanism and user satisfaction. Furthermore, the resource can search the local public library’s catalog concurrently and provide the user with relevant TV material, radio material, and books on a given subject. The ultimate goal is to produce a model that can be used in cities around the country. The current phase of the project assesses the community’s need, analyzes the current operational systems, and makes recommendations for the design of the resource.

Deliverables
• Literature review of the issues surrounding the organization, description and representation of media assets
• Needs assessment report of internal and external stakeholders
• Profile of the systems in the area of managing and organizing media assets for public broadcasting nationwide

Activities
• Analysis of information seeking behavior
• Analysis of collaboration within the respective organizations
• Analysis of the scope and context of the proposed system
• Examining the availability of information resources and exchange of resources among users

Relevance: 30.00%

Abstract:

The MARS (Media Asset Retrieval System) Project is a collaboration between public broadcasters, libraries and schools in the Puget Sound region to assess the needs of their constituents and pool resources to develop solutions to meet those needs. The Project’s ultimate goal is to create a digital online resource that will provide access to content produced by public broadcasters and libraries. The MARS Project is funded by a grant from the Corporation for Public Broadcasting (CPB) Television Future Fund.

Convergence Consortium

The Convergence Consortium is a model for community collaboration, including representatives from public broadcasting, libraries and schools in the Puget Sound region. They meet regularly to consider collaborative efforts that will be mutually beneficial to their institutions and constituents. Specifically, the archives of public broadcasters have been identified as significant resources that can be accessed through libraries, used by schools, and integrated with text and photographic archives from other partners.

Using the work-centered framework, we collected data through interviews with nine engineers and observation of their searching while they performed their regular, job-related searches on the Web. The framework was used to analyze the data on two levels: 1) the activities, organizational relationships and constraints of work domains, and 2) users’ cognitive and social activities and their subjective preferences during searching.

Relevance: 30.00%

Abstract:

This thesis studies and experiments with a retrieval-augmented generative model, based on Transformers, for the task of Abstractive Summarization of long legal judgments. Automatic Text Summarization has become a very important Natural Language Processing (NLP) task, given the very large volume of data coming from the web and from databases. It also makes it possible to automate a process that is very burdensome for experts, especially in the legal sector, where documents are long and complex and therefore difficult and costly to summarize. State-of-the-art Automatic Text Summarization models are based on Deep Learning solutions, in particular on Transformers, which are the most established architecture for NLP tasks. The model proposed in this thesis is a solution for Long Document Summarization, that is, for generating summaries of long text sequences. In particular, the architecture is based on the RAG (Retrieval-Augmented Generation) model, recently introduced by the Facebook AI research team for the Question Answering task. The goal is to modify the RAG architecture to make it suitable for the Abstractive Long Document Summarization task. Specifically, the aim is to exploit and test the model’s non-parametric memory in order to enrich the representation of the input text to be summarized. To this end, several configurations of the model were tested across different types of experiments, and the generated summaries were evaluated with several automatic metrics.

Relevance: 30.00%

Abstract:

Most existing open-source search engines use keyword or tf-idf based techniques to find documents and web pages relevant to an input query. Although these methods, with the help of PageRank or knowledge graphs, have proved effective in some cases, they often fail to retrieve relevant instances for more complicated queries that require semantic understanding. In this thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, relies on an understanding of the query rather than simply finding word matches and, in general, represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy for training on unlabeled data, based on the creation of pairs of ’artificial’ queries and their respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
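
As an illustration of the keyword and tf-idf baseline that this abstract contrasts with semantic search, the following is a minimal sketch in plain Python. It is not the thesis's system: the tokenizer, the function name `tfidf_rank`, the unsmoothed idf formula, and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tfidf_rank(query, documents):
    """Return document indices sorted by descending tf-idf score for the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores.append(score)
    # sorted() is stable, so tied documents keep their original order.
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical toy corpus, invented for this example.
docs = [
    "contract law and liability",
    "cheap car insurance quotes",
    "automobile liability coverage explained",
]
ranking = tfidf_rank("car insurance", docs)
```

In this toy corpus the query "car insurance" gives the third document a score of zero even though it discusses automobile coverage, because no query word appears in it literally. That is the kind of gap a semantic search engine is meant to close.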

Relevance: 30.00%

Abstract:

This thesis aims to research, examine, and implement a Machine Learning system, specifically a Recommendation System, that optimally recommends legal documents which have already been analyzed and appropriately categorized. The system is intended to complement an existing Information Retrieval system, deployed as a web application, that allows users to search those same legal documents.

Relevance: 30.00%

Abstract:

Artificial Intelligence is reshaping the fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, it is a challenging endeavour, as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product's description, which are often cluttered, inadequate and inconsistent across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and doesn't guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed to model both the product image and description to solve these problems. However, the suggested solutions do not effectively exploit the semantic or syntactic information of these modalities, or the unique qualities and relations of clothing items. In this thesis, a novel approach is proposed to address these issues: it models and processes images and text descriptions as graphs in order to exploit the relations inside and between each modality, and employs specific techniques to extract syntactic and semantic information. The results obtained show promising performance on different tasks when compared to the present state-of-the-art deep learning architectures.

Relevance: 20.00%

Abstract:

Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies that were developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside of the field of healthcare, focusing on the lay public. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH for the problem that was put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saude. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web. (c) 2010 Elsevier Inc. All rights reserved.
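
For readers unfamiliar with the metrics reported in this abstract, the sketch below shows how sensitivity and specificity are computed for a binary classifier. The labels are made up for illustration and are not data from the study; the function name `sensitivity_specificity` is likewise an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were rejected
    return sensitivity, specificity


# Made-up labels for illustration only (1 = health-related page, 0 = other).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

With these toy labels, three of the four positive pages and three of the four negative pages are classified correctly, so both values come out to 0.75.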

Relevance: 20.00%

Abstract:

An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the so-called biologically implausible algorithms.

Relevance: 20.00%

Abstract:

While multimedia data, image data in particular, is an integral part of most websites and web documents, our quest for information so far is still restricted to text-based search. To explore the World Wide Web more effectively, especially its rich repository of truly multimedia information, we are facing a number of challenging problems. Firstly, we face the ambiguous and highly subjective nature of defining image semantics and similarity. Secondly, multimedia data could come from highly diversified sources, as a result of automatic image capturing and generation processes. Finally, multimedia information exists in decentralised sources over the Web, making it difficult to use conventional content-based image retrieval (CBIR) techniques for effective and efficient search. In this special issue, we present a collection of five papers on visual and multimedia information management and retrieval topics, addressing some aspects of these challenges. These papers have been selected from the conference proceedings (Kluwer Academic Publishers, ISBN: 1-4020-7060-8) of the Sixth IFIP 2.6 Working Conference on Visual Database Systems (VDB6), held in Brisbane, Australia, on 29–31 May 2002.

Relevance: 20.00%

Abstract:

A long-standing challenge of content-based image retrieval (CBIR) systems is the definition of a suitable distance function to measure the similarity between images in an application context which complies with the human perception of similarity. In this paper, we present a new family of distance functions, called attribute concurrence influence distances (AID), which serve to retrieve images by similarity. These distances address an important aspect of the psychophysical notion of similarity in comparisons of images: the effect of concurrent variations in the values of different image attributes. The AID functions allow for comparisons of feature vectors by choosing one of two parameterized expressions: one targeting weak attribute concurrence influence and the other for strong concurrence influence. This paper presents the mathematical definition and implementation of the AID family for a two-dimensional feature space and its extension to any dimension. The composition of the AID family with the L_p distance family is considered to propose a procedure to determine the best distance for a specific application. Experimental results involving several sets of medical images demonstrate that, taking as reference the perception of the specialist in the field (radiologist), the AID functions perform better than the general distance functions commonly used in CBIR.
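
The abstract does not give the AID formulas, so they are not reproduced here. As a point of reference only, the sketch below implements the general L_p (Minkowski) distance family that the abstract mentions alongside AID; the function name `minkowski_distance` and the sample feature vectors are assumptions for illustration.

```python
def minkowski_distance(x, y, p):
    """L_p (Minkowski) distance between two equal-length feature vectors, for p >= 1."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Hypothetical two-dimensional feature vectors for two images.
image_a = [0.0, 0.0]
image_b = [3.0, 4.0]
manhattan = minkowski_distance(image_a, image_b, 1)  # p = 1: sum of absolute differences
euclidean = minkowski_distance(image_a, image_b, 2)  # p = 2: straight-line distance
```

Each L_p distance combines per-attribute differences through a single fixed formula. The abstract's point is that AID instead lets the comparison be tuned to how strongly concurrent variations in different attributes affect perceived similarity.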

Relevance: 20.00%

Abstract:

What different forms of engagement do image and text allow the spectator/reader? We know that text and image communicate, and that all communication depends on a relationship between those who communicate. The objective of this text is therefore to understand the new possibilities available to an anthropology of the expression of knowledge that makes use of images, such as photographs and films.