929 results for SEMANTIC SIMILARITY


Relevance:

100.00%

Publisher:

Abstract:

This paper compares statistical techniques for paraphrase identification with a semantic technique. The statistical techniques used for comparison are word-set and word-order based methods, whereas the semantic technique is the WordNet similarity matrix method described by Stevenson and Fernando in [3].
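To make the contrast in this abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes whitespace tokenization, uses Jaccard overlap as a stand-in for a word-set method, and uses the common matrix formulation sim(a, b) = a W b^T / (|a| |b|) over binary word vectors for the semantic method. The `TOY_SIM` table is a hypothetical placeholder for word-to-word scores that would normally come from WordNet.

```python
import math

# Hypothetical word-to-word similarity scores (a real system would derive
# these from WordNet); identical words are always treated as similarity 1.0.
TOY_SIM = {("sat", "slept"): 0.5}


def tokens(sentence):
    """Lowercase whitespace tokenization into a set of distinct words."""
    return set(sentence.lower().split())


def word_set_similarity(s1, s2):
    """Statistical baseline: Jaccard overlap of the two word sets."""
    a, b = tokens(s1), tokens(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def matrix_similarity(s1, s2, word_sim):
    """Semantic variant: a W b^T / (|a| |b|) with binary word vectors."""
    a, b = tokens(s1), tokens(s2)
    if not a or not b:
        return 0.0
    total = 0.0
    for wa in a:
        for wb in b:
            if wa == wb:
                total += 1.0
            else:
                # Look the pair up in either order; unknown pairs score 0.
                total += word_sim.get((wa, wb), word_sim.get((wb, wa), 0.0))
    # The norm of a binary vector is the square root of its word count.
    return total / (math.sqrt(len(a)) * math.sqrt(len(b)))


if __name__ == "__main__":
    s1, s2 = "the cat sat", "the cat slept"
    print(word_set_similarity(s1, s2))          # 2 shared words of 4 distinct
    print(matrix_similarity(s1, s2, TOY_SIM))   # partial credit for sat/slept
```

With an empty similarity table the matrix score reduces to the cosine of the binary word vectors, which is why the two families of methods can be compared on the same sentence pairs.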

Relevance:

100.00%

Publisher:

Abstract:

Linked data offers a promising setting to encode, publish and share metadata of resources. As a matter of fact, it has already been adopted by data producers such as the European Environment Agency and the US and some EU governments, whose first ambition is to share (meta)data, making their processes more effective and transparent. Such increasing interest and involvement of data providers is surely genuine evidence of the success of the web of data, but in the longer term, frameworks supporting linked data consumers in their decision-making processes will be a compelling need. In this respect, the talk introduces SSONDE, a framework enabling detailed comparison, ranking and selection of linked data resources through the analysis of their RDF ontology-driven metadata. SSONDE implements an instance similarity especially designed to support resource selection, namely the process stakeholders engage in to choose a set of resources suitable for a given analysis purpose: (i) it deploys an asymmetric similarity assessment to emphasize information about the gains and losses stakeholders incur by adopting one resource in place of another; (ii) it relies on an explicit formalization of contexts to tailor the similarity assessment to specific user-defined selection goals. The talk aims to provide insight into SSONDE instance similarity and will briefly describe some examples of SSONDE deployment in the context of linked data consumption.

Relevance:

70.00%

Publisher:

Abstract:

Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms using the knowledge base of DBpedia, a community effort to extract structured information from Wikipedia. Several approaches to extract semantic relatedness from Wikipedia using bag-of-words vector models are already available in the literature. The research presented in this paper explores a novel approach using paths on an ontological graph extracted from DBpedia. It is based on an algorithm for finding and weighting a collection of paths connecting concept nodes. This algorithm was implemented in a tool called Shakti, which extracts relevant ontological data for a given domain from DBpedia using its SPARQL endpoint. To validate the proposed approach, Shakti was used to recommend web pages on a Portuguese social site related to alternative music, and the results of that experiment are reported in this paper.
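The idea of finding and weighting a collection of paths between concept nodes can be illustrated with a small sketch. This is not Shakti's algorithm, which the abstract does not specify. It assumes a hypothetical scheme in which each simple path contributes the product of its edge weights, discounted by a `decay` factor per extra edge, and the contributions are summed. The graph, weights and parameter names are invented for illustration.

```python
def path_relatedness(graph, source, target, max_edges=3, decay=0.5):
    """Sum the weights of all simple paths from source to target.

    graph maps each node to a dict {neighbour: edge_weight}.
    A path's weight is the product of its edge weights multiplied by
    decay ** (number_of_edges - 1), so longer paths count for less.
    """
    total = 0.0
    # Each stack entry: (current node, weight so far, edges used, visited set)
    stack = [(source, 1.0, 0, {source})]
    while stack:
        node, weight, edges, visited = stack.pop()
        if node == target and edges > 0:
            total += weight * decay ** (edges - 1)
            continue
        if edges == max_edges:
            continue
        for neighbour, edge_weight in graph.get(node, {}).items():
            if neighbour not in visited:
                stack.append(
                    (neighbour, weight * edge_weight, edges + 1, visited | {neighbour})
                )
    return total


# Hypothetical undirected concept graph (each edge is listed in both directions).
TOY_GRAPH = {
    "A": {"B": 1.0, "C": 0.5},
    "B": {"A": 1.0, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5},
}

if __name__ == "__main__":
    # The direct path A-B contributes 1.0; the path A-C-B contributes 0.25 * 0.5.
    print(path_relatedness(TOY_GRAPH, "A", "B"))
```

In a DBpedia setting the adjacency dictionary would be filled from SPARQL query results, and the edge weights would reflect the ontological property that links two resources.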

Relevance:

70.00%

Publisher:

Abstract:

Two experiments examined the extent to which erroneous recall blocks veridical recall using, as a vehicle for study, the disruptive impact of distractors that are semantically similar to a list of words presented for free recall. Instructing participants to avoid erroneous recall of to-be-ignored spoken distractors attenuated their recall but this did not influence the disruptive effect of those distractors on veridical recall (Experiment 1). Using an externalised output-editing procedure—whereby participants recalled all items that came to mind and identified those that were erroneous—the usual between-sequence semantic similarity effect on erroneous and veridical recall was replicated but the relationship between the rate of erroneous and veridical recall was weak (Experiment 2). The results suggest that forgetting is not due to veridical recall being blocked by similar events.

Relevance:

70.00%

Publisher:

Abstract:

Existing theories of semantic cognition propose models of cognitive processing occurring in a conceptual space, where ‘meaning’ is derived from the spatial relationships between concepts’ mapped locations within the space. Information visualisation is a growing area of research within the field of information retrieval, and methods for presenting database contents visually in the form of spatial data management systems (SDMSs) are being developed. This thesis combined these two areas of research to investigate the benefits of employing spatial-semantic mapping (documents represented as objects in two- and three-dimensional virtual environments are mapped proximally according to the semantic similarity of their content) as a tool for improving retrieval performance and navigational efficiency when browsing for information within such systems. Positive effects associated with the quality of document mapping were observed; retrieval performance and browsing behaviour improved when mapping was optimal. It was also shown that using a third dimension for virtual environment (VE) presentation provides enough additional information about the semantic structure of the environment to increase performance compared with two-dimensional mapping. A model that describes the relationship between retrieval performance and browsing behaviour was proposed on the basis of the findings. Individual differences were not found to have any observable influence on retrieval performance or browsing behaviour when mapping quality was good. The findings from this work have implications both for cognitive modelling of semantic information and for designing and testing information visualisation systems. These implications are discussed in the conclusions of this work.

Relevance:

70.00%

Publisher:

Abstract:

False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words that uses the Web as a corpus, analyzing the words’ local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are further combined into an improved algorithm for identifying false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian, but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.

Relevance:

70.00%

Publisher:

Abstract:

In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from previous research in that it tries to take the best of two different methodologies, i.e., semantic space models and information extraction models. It can be applied to extract close semantic relations, it limits the search space, and it is unsupervised.

Relevance:

70.00%

Publisher:

Abstract:

In recent years the technological world has grown by incorporating billions of small sensing devices that collect and share real-world information. As the number of such devices grows, it becomes increasingly difficult to manage all these new information sources. There is no uniform way to share, process and understand context information. In previous publications we discussed efficient ways to organize context information that are independent of structure and representation. However, our previous solution suffers from semantic sensitivity. In this paper we review semantic methods that can be used to minimize this issue, and propose an unsupervised semantic similarity solution that combines distributional profiles with public web services. Our solution was evaluated against the Miller-Charles dataset, achieving a correlation of 0.6.
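An evaluation of this kind is typically a correlation between a system's similarity scores and human ratings for the same word pairs. The sketch below assumes the Pearson coefficient, which the abstract does not confirm. The score lists are placeholders for illustration only, not values from the Miller-Charles dataset or from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Placeholder values only: one human rating and one system score per word pair.
    human_ratings = [3.9, 3.1, 1.2, 0.4]
    system_scores = [0.8, 0.9, 0.3, 0.1]
    print(pearson(human_ratings, system_scores))
```

A rank-based coefficient such as Spearman's is a common alternative when only the ordering of the pairs matters.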

Relevance:

60.00%

Publisher:

Abstract:

The aim of this paper is to verify the level of text comprehension (reading and translation) in Portuguese by native speakers of Spanish, and vice versa. The subjects are freshmen from different fields (300 native speakers of Portuguese and 300 of Spanish) who have never studied the other language, either as a second language (L2) or as a foreign language (FL). The results show that, in each group of subjects, there is a high level of comprehension of the foreign language, which varies from 58% to 94%, depending on the context and on the lexical/semantic similarity (or difference) between the keywords in the texts used in this research.

Relevance:

60.00%

Publisher:

Abstract:

Background: Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from a Protein-Protein Association Network (PPAN), combining several protein-protein network function-inference methods with semantic similarity calculations, the use of protein-protein interactions for functional inference in this species would become potentially more useful. Results: In this work we show that using PPAN data combined with other approximations, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures, helps to predict protein function in Mycoplasma genitalium. Conclusions: To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information theory-based GO semantic similarity in the PPAN of the Mycoplasma species. The results of an evaluation show a higher recall than previously reported methods that focused on only one organism network.

Relevance:

60.00%

Publisher:

Abstract:

The psycholinguistic approach suggests that verbal short-term retention and language depend on common mechanisms. It predicts (1) that the linguistic characteristics of verbal items (e.g., phonological, lexical, semantic) influence immediate recall and (2) that the contribution of the levels of linguistic representation depends on the recall context, with certain experimental conditions (e.g., stimulus format) favoring the use of specific codes. These predictions are evaluated in two empirical studies conducted with a brain-damaged patient who has a phonological processing impairment (I.R.) and with control participants. A first study (Article 1) tests the impact of presentation and recall modes on the effects of phonological similarity and semantic category in word lists. A second study (Article 2) evaluates the contribution of the orthographic code to verbal short-term memory (STM) by testing the effect of the orthographic neighborhood density of words on the immediate serial recall of visually presented words. Given the decisive role of the phonological code in STM and the nature of I.R.'s impairment, distinct linguistic effects were expected in her and in the controls. Depending on the recall context, larger semantic (Article 1) and orthographic (Article 2) effects were predicted in I.R., and more pronounced phonological effects were expected in the control participants. In I.R., recall is influenced by the semantic and orthographic characteristics of the words but little by their phonological characteristics, and the recall context modulates the use of different levels of linguistic representation. In the controls, a relatively more stable contribution of phonological representations is observed.
The data support a psycholinguistic approach which postulates that common mechanisms govern verbal short-term retention and language. The theoretical and clinical implications of the results are discussed with regard to current psycholinguistic models.

Relevance:

60.00%

Publisher:

Abstract:

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., alpha beta-motifs, alpha-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.

Relevance:

60.00%

Publisher:

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we describe our approach to cross-lingual linking of Indian news stories, submitted for the Cross-Lingual Indian News Story Search (CL!NSS) task at FIRE 2012. Our approach consists of two major steps: reducing the search space by using different features, and ranking the news stories according to their relatedness scores. Our approach uses Wikipedia-based Cross-Lingual Explicit Semantic Analysis (CLESA) to calculate the semantic similarity and relatedness score between two news stories in different languages. We evaluate our approach on the CL!NSS dataset, which consists of 50 news stories in English and 50K news stories in Hindi.
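The core of an explicit-semantic-analysis comparison can be sketched as follows. This is an illustration under stated assumptions, not the submitted system. Each text is mapped to a vector over shared concept identifiers, and two texts are compared by cosine similarity. The tiny `EN_INDEX` and `HI_INDEX` term-to-concept tables and the concept names are hypothetical; in a Wikipedia-based setup they would be built from article text, with concepts aligned across languages through interlanguage links.

```python
import math
from collections import defaultdict

# Hypothetical term -> {concept: weight} indexes that share concept identifiers.
EN_INDEX = {
    "election": {"C_politics": 1.0},
    "cricket": {"C_sport": 1.0},
}
HI_INDEX = {
    "चुनाव": {"C_politics": 1.0},
    "क्रिकेट": {"C_sport": 1.0},
}


def concept_vector(text, index):
    """Sum the concept weights of every known term in the text."""
    vector = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in index.get(term, {}).items():
            vector[concept] += weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_lingual_similarity(text_en, text_hi):
    """Compare an English and a Hindi text in the shared concept space."""
    return cosine(concept_vector(text_en, EN_INDEX), concept_vector(text_hi, HI_INDEX))


if __name__ == "__main__":
    print(cross_lingual_similarity("election", "चुनाव"))
    print(cross_lingual_similarity("election", "क्रिकेट"))
```

Because both languages project into the same concept space, no translation step is needed at comparison time, which is what makes ranking a large Hindi collection against each English story feasible.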