937 results for Ontologies (Information Retrieval)


Relevance:

80.00% 80.00%

Publisher:

Abstract:

The following topics were dealt with: document analysis and recognition; multimedia document processing; character recognition; document image processing; cheque processing; form processing; music processing; document segmentation; electronic documents; character classification; handwritten character recognition; information retrieval; postal automation; font recognition; Indian language OCR; handwriting recognition; performance evaluation; graphics recognition; oriental character recognition; and word recognition

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observable from query words alone. Accurate identification of intent from query words remains a challenging problem, primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded by a lack of representative training samples. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using a tree-structured distribution, which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporate WordNet to extend the system's capabilities to queries containing words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but because of its different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves l1,∞ constraints; we solve this dual problem using the recent l1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an l∞-norm extreme of the lp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
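
The quantity this abstract targets, the number of relevant objects ranked above every irrelevant one, can be illustrated with a small sketch. This is only an evaluation helper under assumed conventions (labels are 1 for relevant and 0 for irrelevant; the function name is hypothetical); it is not the optimization algorithm described in the paper.

```python
def positives_at_top(scores, labels):
    """Count relevant items that score above the highest-scoring irrelevant item.

    Assumed convention: labels are 1 (relevant) or 0 (irrelevant).
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives:
        # With no irrelevant items, every relevant item is at the top.
        return len(positives)
    best_negative = max(negatives)
    return sum(1 for s in positives if s > best_negative)


# Hypothetical scores: the third item is the best-ranked irrelevant one.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [1, 1, 0, 1, 0]
top_count = positives_at_top(example_scores, example_labels)
```

In this toy example only the first two relevant items count, because the third relevant item is ranked below an irrelevant one.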

Relevance:

80.00% 80.00%

Publisher:

Abstract:

In this paper we propose a postprocessing technique for a spectrogram-diffusion-based harmonic/percussion decomposition algorithm. The proposed technique removes harmonic instrument leakages in the percussion-enhanced outputs of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands, followed by piecewise signal reconstruction using envelope properties, to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which, when applied to the original signal, improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that, on a database of polyphonic Indian music, the postprocessing algorithm significantly improves the harmonic versus percussion decomposition.
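
As a rough illustration of how median filtering can yield a binary percussion mask, the sketch below filters a magnitude spectrogram along time (which favors sustained, harmonic energy) and along frequency (which favors broadband, percussive energy), then marks the bins where the frequency-direction median is larger. This is a generic median-filter separation, not the postprocessing method of the paper; the window size, the `spec[frequency][time]` layout, and all function names are assumptions.

```python
import statistics


def median_filter_1d(values, half_width=1):
    """Median filter with windows clipped at the sequence edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(statistics.median(values[lo:hi]))
    return out


def percussive_mask(spec):
    """Return mask[f][t] = True where the bin looks percussive.

    spec is a magnitude spectrogram indexed as spec[frequency][time].
    """
    # Filtering each frequency row along time keeps horizontal (harmonic) ridges.
    harmonic = [median_filter_1d(row) for row in spec]
    # Filtering each time column along frequency keeps vertical (percussive) ridges.
    columns = [list(col) for col in zip(*spec)]
    percussive = [median_filter_1d(col) for col in columns]
    n_freq, n_time = len(spec), len(spec[0])
    return [[percussive[t][f] > harmonic[f][t] for t in range(n_time)]
            for f in range(n_freq)]


# Toy spectrogram: a sustained tone in row 2 and a broadband hit in column 2.
toy_spec = [[1.0 if f == 2 or t == 2 else 0.0 for t in range(5)] for f in range(5)]
toy_mask = percussive_mask(toy_spec)
```

On the toy input, the mask selects the broadband column except where it crosses the sustained tone, which is the kind of leakage-prone bin a postprocessing stage would need to handle.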

Relevance:

80.00% 80.00%

Publisher:

Abstract:

We propose an iterative algorithm to detect transient segments in audio signals. The short-time Fourier transform (STFT) is used to detect rapid local changes in the audio signal. The algorithm iterates over two steps: (a) calculate a function of the STFT and (b) build a transient signal. A dynamic thresholding scheme is used to locate the potential positions of transients in the signal. The iterative procedure ensures that genuine transients are built up while localised spectral noise is suppressed by an energy criterion. The extracted transient signal is then compared to a ground truth dataset. The algorithm performed well on two databases: on the EBU-SQAM database of monophonic sounds it achieved an F-measure of 90%, while on our database of polyphonic audio it achieved an F-measure of 91%. This technique is being used as a preprocessing step for a tempo analysis algorithm and a TSR (Transients + Sines + Residue) decomposition scheme.
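
The general idea of locating transients from rapid STFT changes with a dynamic threshold can be sketched as follows. This is a generic spectral-flux detector with a local-median threshold, swapped in for illustration; it is not the iterative procedure or the energy criterion of the paper. The frame size, hop, threshold parameters, and function names are all assumptions, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import statistics


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_flux(signal, frame_size=16, hop=8):
    """Sum of positive magnitude increases between consecutive frames."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    spectra = [magnitude_spectrum(signal[s:s + frame_size]) for s in starts]
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return flux


def detect_transients(flux, half_window=2, scale=1.5, offset=0.5):
    """Flag frames whose flux exceeds offset + scale * local median."""
    hits = []
    for i, value in enumerate(flux):
        lo = max(0, i - half_window)
        hi = min(len(flux), i + half_window + 1)
        threshold = offset + scale * statistics.median(flux[lo:hi])
        if value > threshold:
            hits.append(i)
    return hits


# Toy signal: silence with a single click at sample 40.
click = [0.0] * 64
click[40] = 1.0
detected_frames = detect_transients(spectral_flux(click))
```

The click first enters the analysis window in frame 4 (samples 32 to 47), so that frame is the only one flagged.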

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or the non-linear classifiers fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper studies the common and specific features of the most widely used personal bibliographic reference database managers: Reference Manager, EndNote, ProCite, RefWorks and EndNote Web. The aspects analyzed are data entry, authority control, global editing commands, customization of certain aspects of the databases, export of references, display of records, insertion of bibliographic citations, and automatic generation of bibliographies.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper explores how audio chord estimation could improve if information about chord boundaries or beat onsets is revealed by an oracle. Chord estimation at the frame level is compared with three simulations, each using an oracle of increasing power. The beat and chord segments revealed by an oracle are used to compute a chord ranking at the segment level, and to compute the cumulative probability of finding the correct chord among the top-ranked chords. Oracle results on two different audio datasets demonstrate the substantial potential of segment-level over frame-level approaches for audio chord estimation. This paper also compares the oracle results on the Beatles dataset, the standard dataset in this area, with those on the new Billboard Hot 100 chord dataset.
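
The cumulative probability of finding the correct chord among the top-ranked chords can be read as a top-k hit rate over segments. The sketch below computes it from per-segment rankings; the data layout, the function name, and the example chords are assumptions made for illustration, not taken from the paper.

```python
def top_k_hit_rates(rankings, truth, max_k):
    """Fraction of segments whose correct chord is within the top k, for k = 1..max_k.

    rankings[i] is the ranked chord list for segment i (best first);
    truth[i] is the annotated chord for that segment.
    """
    rates = []
    for k in range(1, max_k + 1):
        hits = sum(1 for ranked, correct in zip(rankings, truth)
                   if correct in ranked[:k])
        rates.append(hits / len(truth))
    return rates


# Hypothetical rankings for three segments.
segment_rankings = [["C", "G", "Am"], ["F", "C", "G"], ["G", "Em", "C"]]
segment_truth = ["C", "C", "D"]
hit_rates = top_k_hit_rates(segment_rankings, segment_truth, max_k=3)
```

The curve is non-decreasing in k by construction, and it plateaus below 1 when the correct chord never appears in a segment's ranking, as for the third segment here.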

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper proposes a new method for local key and chord estimation from audio signals. The method relies primarily on principles from music theory and does not require any training on a corpus of labelled audio files. The harmonic content of the musical piece is first extracted by computing a set of chroma vectors. A set of chord/key pairs is selected for every frame by correlation with fixed chord and key templates. An acyclic harmonic graph is constructed with these pairs as vertices, using a musical distance to weight its edges. Finally, the sequences of chords and keys are obtained by finding the best path in the graph using dynamic programming. The proposed method thus estimates chords and keys jointly. It is evaluated on a corpus of Beatles songs for both the local key estimation and chord recognition tasks, as well as on a larger corpus of songs taken from the Billboard dataset.
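
The pipeline described above (template matching per frame, then a best path found by dynamic programming) can be sketched in a much reduced form. The example below handles chords only, scores frames with a plain dot product against binary templates instead of a correlation, and uses a flat change penalty as a stand-in for the musical distance; all of these choices, along with the names and the three-chord vocabulary, are assumptions and not the method of the paper.

```python
# Binary chord templates over the 12 pitch classes (C = 0).
TEMPLATES = {
    "C": [1 if pc in (0, 4, 7) else 0 for pc in range(12)],
    "G": [1 if pc in (7, 11, 2) else 0 for pc in range(12)],
    "Am": [1 if pc in (9, 0, 4) else 0 for pc in range(12)],
}


def frame_scores(chroma, templates):
    """Dot product of one chroma vector with each chord template."""
    return [sum(c * w for c, w in zip(chroma, templates[name])) for name in templates]


def best_chord_path(frames, templates, change_penalty=0.5):
    """Path maximizing total template score minus a penalty per chord change."""
    names = list(templates)
    scores = [frame_scores(f, templates) for f in frames]
    best = scores[0][:]
    back = []
    for t in range(1, len(frames)):
        new_best, pointers = [], []
        for j in range(len(names)):
            candidates = [best[i] - (0.0 if i == j else change_penalty)
                          for i in range(len(names))]
            i_star = max(range(len(names)), key=candidates.__getitem__)
            new_best.append(candidates[i_star] + scores[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best final state back to the first frame.
    j = max(range(len(names)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [names[j] for j in reversed(path)]


def chroma(weights):
    """Build a 12-bin chroma vector from a {pitch_class: energy} mapping."""
    return [weights.get(pc, 0.0) for pc in range(12)]


# Hypothetical frames: clear C, a noisy frame leaning towards G, clear C.
clear_c = chroma({0: 1.0, 4: 1.0, 7: 1.0})
noisy = chroma({7: 1.0, 11: 0.4, 2: 0.4, 0: 0.3, 4: 0.3})
frames = [clear_c, noisy, clear_c]

names = list(TEMPLATES)
framewise = [names[max(range(3), key=frame_scores(f, TEMPLATES).__getitem__)]
             for f in frames]
smoothed = best_chord_path(frames, TEMPLATES)
```

Frame-by-frame matching labels the noisy middle frame as G, while the path search keeps C throughout because two chord changes cost more than the small score gain.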

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). It has many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation and Summarization, as well as in related tasks such as Information Retrieval and Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The problem motivating this study is the lack of semantics in Web search mechanisms. To address this problem, the W3 consortium has been developing technologies aimed at building a Semantic Web. Among these technologies are domain ontologies. Accordingly, the general objective of this dissertation is to discuss the possibilities of bringing semantics to searches in Web news aggregators. The specific objective is to present an application that uses semi-automatic news classification, combining search technologies from the field of information retrieval with domain ontologies. The proposed system is a Web application capable of searching for news about a specific domain in information portals. It uses the Google Maps API V1 to geolocate each news item whenever this information is available. To demonstrate the feasibility of the proposal, an example was developed based on an ontology for the domain of rainfall and its consequences. The results obtained by this new ontology-based feed are stored in a database and made available for querying via the Web. The expectation is that the proposed feed will return more relevant results than an ordinary feed. The results obtained by combining technologies sponsored by the W3 consortium (XML, RSS and ontologies) with Web page search tools were satisfactory for the intended purpose. Ontologies prove to be multi-purpose tools, and their analytical value in Web searches can be extended with computational applications suited to each case. In the example presented in this dissertation, other concepts were attached to the word "rain", namely those present in the developments it causes. This highlighted the link between the rain event and its consequences, something that was only possible by working with a delimited portion of the formal knowledge involved.
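
The idea of attaching related concepts to the word "rain" and using them to select news items can be sketched with a toy ontology. Everything in the example is hypothetical: the concept names, the dictionary representation, and the matching rule are assumptions for illustration and do not reproduce the ontology or the classifier of the dissertation.

```python
# Hypothetical domain ontology: each concept maps to concepts it can lead to.
RAIN_ONTOLOGY = {
    "rain": ["flood", "landslide", "storm"],
    "flood": ["evacuation"],
}


def expand(concept, ontology):
    """Collect the concept and everything reachable from it in the ontology."""
    seen, stack = set(), [concept]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(ontology.get(term, []))
    return seen


def matches_domain(headline, terms):
    """True if any expanded term appears as a word in the headline."""
    words = {w.strip(".,;:!?").lower() for w in headline.split()}
    return bool(words & terms)


rain_terms = expand("rain", RAIN_ONTOLOGY)
related = matches_domain("Landslide blocks highway after weekend storm", rain_terms)
unrelated = matches_domain("Central bank raises interest rates", rain_terms)
```

A plain keyword search for "rain" would miss the first headline; expanding through the ontology is what links it to the domain.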