955 resultados para Information Retrieval, Document Databases, Digital Libraries


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or the non-linear classifiers fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper explores how audio chord estimation could improve if information about chord boundaries or beat onsets is revealed by an oracle. Chord estimation at the frame level is compared with three simulations, each using an oracle of increasing powers. The beat and chord segments revealed by an oracle are used to compute a chord ranking at the segment level, and to compute the cumulative probability of finding the correct chord among the top ranked chords. Oracle results on two different audio datasets demonstrate the substantial potential of segment versus frame approaches for chord audio estimation. This paper also provides a comparison of the oracle results on the Beatles dataset, the standard dataset in this area, with the new Billboard Hot 100 chord dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a new method for local key and chord estimation from audio signals. This method relies primarily on principles from music theory, and does not require any training on a corpus of labelled audio files. A harmonic content of the musical piece is first extracted by computing a set of chroma vectors. A set of chord/key pairs is selected for every frame by correlation with fixed chord and key templates. An acyclic harmonic graph is constructed with these pairs as vertices, using a musical distance to weigh its edges. Finally, the sequences of chords and keys are obtained by finding the best path in the graph using dynamic programming. The proposed method allows a mutual chord and key estimation. It is evaluated on a corpus composed of Beatles songs for both the local key estimation and chord recognition tasks, as well as a larger corpus composed of songs taken from the Billboard dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EN]Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). In fact, it has many practical applications in other NLP tasks. For instance, Word Sense Disambiguation, Textual Entailment, Paraphrase detection, Machine Translation, Summarization and other related tasks such as Information Retrieval or Question Answering. In this masther thesis we study di erent approaches to compute the semantic similarity between textual items. In the framework of the european PATHS project1, we also evaluate a knowledge-base method on a dataset of cultural item descriptions. Additionaly, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work has involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bilbao, Gidor y Ricardo Gómez, «Textos antiguos vascos en Internet», en Humanidades Digitales: desafíos, logros y perspectivas de futuro, Sagrario López Poza y Nieves Pena Sueiro (editoras), Janus [en línea], Anexo 1 (2014), 111-121, publicado el 11/04/2014, consultado el 12/04/2014. URL: digital.es/anexos/contribucion.htm?id=11>.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Several research studies have been recently initiated to investigate the use of construction site images for automated infrastructure inspection, progress monitoring, etc. In these studies, it is always necessary to extract material regions (concrete or steel) from the images. Existing methods made use of material's special color/texture ranges for material information retrieval, but they do not sufficiently discuss how to find these appropriate color/texture ranges. As a result, users have to define appropriate ones by themselves, which is difficult for those who do not have enough image processing background. This paper presents a novel method of identifying concrete material regions using machine learning techniques. Under the method, each construction site image is first divided into regions through image segmentation. Then, the visual features of each region are calculated and classified with a pre-trained classifier. The output value determines whether the region is composed of concrete or not. The method was implemented using C++ and tested over hundreds of construction site images. The results were compared with the manual classification ones to indicate the method's validity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This book explores the processes for retrieval, classification, and integration of construction images in AEC/FM model based systems. The author describes a combination of techniques from the areas of image and video processing, computer vision, information retrieval, statistics and content-based image and video retrieval that have been integrated into a novel method for the retrieval of related construction site image data from components of a project model. This method has been tested on available construction site images from a variety of sources like past and current building construction and transportation projects and is able to automatically classify, store, integrate and retrieve image data files in inter-organizational systems so as to allow their usage in project management related tasks. objects. Therefore, automated methods for the integration of construction images are important for construction information management. During this research, processes for retrieval, classification, and integration of construction images in AEC/FM model based systems have been explored. Specifically, a combination of techniques from the areas of image and video processing, computer vision, information retrieval, statistics and content-based image and video retrieval have been deployed in order to develop a methodology for the retrieval of related construction site image data from components of a project model. This method has been tested on available construction site images from a variety of sources like past and current building construction and transportation projects and is able to automatically classify, store, integrate and retrieve image data files in inter-organizational systems so as to allow their usage in project management related tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Architecture, Engineering, Construction and Facilities Management (AEC/FM) industry is rapidly becoming a multidisciplinary, multinational and multi-billion dollar economy, involving large numbers of actors working concurrently at different locations and using heterogeneous software and hardware technologies. Since the beginning of the last decade, a great deal of effort has been spent within the field of construction IT in order to integrate data and information from most computer tools used to carry out engineering projects. For this purpose, a number of integration models have been developed, like web-centric systems and construction project modeling, a useful approach in representing construction projects and integrating data from various civil engineering applications. In the modern, distributed and dynamic construction environment it is important to retrieve and exchange information from different sources and in different data formats in order to improve the processes supported by these systems. Previous research demonstrated that a major hurdle in AEC/FM data integration in such systems is caused by its variety of data types and that a significant part of the data is stored in semi-structured or unstructured formats. Therefore, new integrative approaches are needed to handle non-structured data types like images and text files. This research is focused on the integration of construction site images. These images are a significant part of the construction documentation with thousands stored in site photographs logs of large scale projects. However, locating and identifying such data needed for the important decision making processes is a very hard and time-consuming task, while so far, there are no automated methods for associating them with other related objects. Therefore, automated methods for the integration of construction images are important for construction information management. During this research, processes for retrieval, classification, and integration of construction images in AEC/FM model based systems have been explored. Specifically, a combination of techniques from the areas of image and video processing, computer vision, information retrieval, statistics and content-based image and video retrieval have been deployed in order to develop a methodology for the retrieval of related construction site image data from components of a project model. This method has been tested on available construction site images from a variety of sources like past and current building construction and transportation projects and is able to automatically classify, store, integrate and retrieve image data files in inter-organizational systems so as to allow their usage in project management related tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged in some default order on the screen, typically the relevance to a query, only. While this certainly has its advantages, arguably, a more flexible and intuitive way would be to sort images into arbitrary structures such as grids, hierarchies, or spheres so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because arbitrary layout structure makes it difficult - if not impossible - to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layouting approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures and hence elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected for instance forming the tip of the hierarchy allowing to subsequently navigate through the search results in the lower levels in an intuitive way. Copyright 2010 ACM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

政府信息检索系统作为政府信息公开平台的重要组成部分,对于用户从大量信息中准确查找所需信息起到关键作用,然而现有政府信息检索系统存在两个主要问题:一是系统采用的基于关键词匹配的检索技术忽视了对于用户检索条件的语义的理解,缺乏对于文档实质内涵的准确描述;二是由于对政府信息领域知识的缺乏,用户不能很好地提出符合自己检索需求的检索条件。这两个问题导致检索结果远远不能满足用户的要求。 本体是“概念模型的明确的规范说明”,它提供明确定义的词汇表,描述概念和概念之间的关系,被当作某个领域内不同主体之间进行交流的一种语义基础。它被广泛的应用于信息检索,特别是基于知识的检索中,能显著提高检索系统的查全率和查准率。 本文提出了构建政府信息领域本体并将其应用于政府信息检索系统的方案。首先,研究了现有的领域本体构建方法;在分析了政府信息领域的特点,考察了该领域可用资源的基础上,提出了基于政务主题词表的政府信息领域本体的构建方法。该方法充分利用了《综合电子政务主题词表》中已有的主题词和关系,保证了本体概念添加的完备性和科学性,减少了对领域专家的依赖,提高了构建效率。 设计和实现了基于领域本体的政府信息检索系统。该系统以领域本体为核心,对检索条件进行了扩展,既解决了检索词同政府信息中的公文用词存在差异的问题,又进一步明确了用户的检索需求;对政府信息文档进行了语义标注,提高了检索匹配时的准确度。同时,系统将与检索条件相关的领域概念反馈给用户,便于用户了解领域知识,进一步优化检索条件,获得更全更准的检索结果。