928 results for Lexical Database
Abstract:
This paper presents a preliminary analysis of the Kannada WordNet and a set of relevant computational tools. Although the design was inspired by the well-known English WordNet and, to a certain extent, by the Hindi WordNet, the distinctive features of the Kannada WordNet are graded antonyms and meronymy relationships, nominal as well as verbal compounding, complex verb constructions, and an efficient underlying database design (built to handle storage and display of Kannada Unicode characters). The Kannada WordNet would not only add to the sparse collection of machine-readable Kannada dictionaries but also give new insights into the Kannada vocabulary. It provides an adequate interface for applications in Kannada machine translation, spell checking, and semantic analysis.
Abstract:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so the addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.
The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
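The idea of suffix rules expressed as character substitutions, filtered by a lexical validity requirement, can be illustrated with a minimal sketch. The lexicon, the three rules, and the function name `derive_links` below are hypothetical and chosen only for illustration; they are not the thesis's rule set or implementation.

```python
# Hypothetical toy lexicon: a derivational link is accepted only if the
# proposed base form is attested here (the "lexical validity" check).
LEXICON = {"decide", "decision", "happy", "happiness", "create", "creation", "nation"}

# Hypothetical rules: (suffix of the derived word, replacement yielding the base, relation).
# Each rule substitutes characters rather than merely cutting the word at a boundary.
RULES = [
    ("sion", "de", "nominalisation"),  # decision -> decide
    ("iness", "y", "quality"),         # happiness -> happy
    ("ion", "e", "nominalisation"),    # creation -> create
]


def derive_links(word, lexicon=LEXICON, rules=RULES):
    """Return (derived, base, relation) links whose base is attested in the lexicon."""
    links = []
    for suffix, replacement, relation in rules:
        if word.endswith(suffix):
            base = word[: -len(suffix)] + replacement
            # Reject unattested bases such as "nate" for "nation" (overgeneration).
            if base in lexicon:
                links.append((word, base, relation))
    return links
```

In this sketch, `derive_links("nation")` yields no link because the candidate base is not in the lexicon, which is one simple way a validity check can curb overgeneration.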
Abstract:
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, whose key strategy is to build language-specific wordnets while keeping as many as possible of the semantic relations available in WN.Pr. This paper stresses an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages: it allows exploring an automatic procedure to map WN.Pr conceptual relations such as hyponymy, co-hyponymy, troponymy, meronymy, cause, and entailment onto the lexical database of the wordnet under construction. This is viable because those are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and the aforementioned automation procedure, illustrated with a sample of the automatic encoding of the hyponymy and co-hyponymy relations.
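A minimal sketch of how such a projection might work, assuming a one-to-one alignment between target-language synsets and WN.Pr synsets and considering only direct hypernyms. All identifiers (`ALIGNMENT`, `PWN_HYPERNYM`, the synset ids, and both function names) are hypothetical and do not reflect the actual WN.Br data model.

```python
# Hypothetical alignment: target-language synset id -> Princeton WordNet synset id.
ALIGNMENT = {
    "br:cachorro": "pwn:dog",
    "br:gato": "pwn:cat",
    "br:animal": "pwn:animal",
}

# Hypothetical fragment of the WN.Pr hierarchy: synset id -> direct hypernym id.
PWN_HYPERNYM = {"pwn:dog": "pwn:animal", "pwn:cat": "pwn:animal"}


def project_hyponymy(alignment, pwn_hypernym):
    """Copy each WN.Pr hyponym/hypernym pair onto the aligned target synsets."""
    pwn_to_target = {pwn: target for target, pwn in alignment.items()}
    projected = set()
    for target, pwn in alignment.items():
        parent = pwn_hypernym.get(pwn)
        # Project only when the hypernym is itself aligned to a target synset.
        if parent in pwn_to_target:
            projected.add((target, pwn_to_target[parent]))
    return projected


def co_hyponyms(hyponymy):
    """Derive co-hyponym pairs: target synsets that share a projected hypernym."""
    by_parent = {}
    for child, parent in hyponymy:
        by_parent.setdefault(parent, set()).add(child)
    pairs = set()
    for children in by_parent.values():
        ordered = sorted(children)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs
```

Because the relations hold between concepts rather than words, the sketch never inspects the lexical units inside a synset; it relies on the alignment alone.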
Abstract:
The intersection of the Social Web and the Semantic Web has put folksonomy in the spotlight for its potential to overcome the knowledge acquisition bottleneck and to provide insight into the "wisdom of the crowds". Folksonomy, which results from collaborative tagging activities, has provided insight into users' understanding of Web resources, which might be useful for searching and organizing. However, a collaborative tagging vocabulary poses challenges, since tags are freely chosen by users and may exhibit synonymy and polysemy problems. To overcome these challenges and boost the potential of folksonomy as emergent semantics, we propose to consolidate the diverse vocabulary into consolidated entities and concepts. We propose to extract a tag ontology through an ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learning the ontology based on the widely used lexical database WordNet. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers with users' tagging behavior. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that using the semantic relationships in the ontology improves the accuracy of the tag recommender.
Abstract:
Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both research and commercial areas. Recommender systems are among the most popular Web personalization systems. In recommender systems, choosing which user information to use is crucial for user profiling. In Web 2.0, one facility that can help users organize Web resources of interest is the user tagging system. Exploring user tagging behavior provides a promising way of understanding users' information needs, since tags are given directly by users. However, a free and relatively uncontrolled vocabulary means that user-defined tags lack standardization and suffer from semantic ambiguity. The relationships among tags also need to be explored, since they could provide valuable information for better understanding users. In this paper, we propose a novel approach for learning a tag ontology based on the widely used lexical database WordNet, capturing the semantics and the structural relationships of tags. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers with users' tagging behavior. To personalize further, users are clustered to generate a more accurate ontology for a particular group of users. To evaluate the usefulness of the tag ontology, we use it in a pilot tag recommendation experiment, exploiting its semantic information to improve recommendation performance. The initial result shows that the personalized information improves the accuracy of the tag recommendation.
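One simple way to combine a lexicographer-derived prior with a user's tagging behavior is a weighted score per candidate sense, sketched below. The sense inventory, the prior values, the `weight` parameter, and the function name `disambiguate` are all hypothetical; the paper's actual strategies are not specified here and may differ substantially.

```python
# Hypothetical sense inventory: for each tag, candidate senses with a prior
# (standing in for lexicographer opinion, e.g. sense ordering) and related words.
SENSES = {
    "apple": [
        {"id": "apple.fruit", "prior": 0.7, "related": {"fruit", "food", "tree"}},
        {"id": "apple.company", "prior": 0.3, "related": {"computer", "mac", "iphone"}},
    ],
}


def disambiguate(tag, user_tags, weight=0.5):
    """Pick the sense maximising weight * prior + (1 - weight) * overlap with the user's tags."""
    best_id, best_score = None, float("-inf")
    for sense in SENSES.get(tag, []):
        # Fraction of the sense's related words that the user has also used as tags.
        overlap = len(sense["related"] & set(user_tags)) / len(sense["related"])
        score = weight * sense["prior"] + (1 - weight) * overlap
        if score > best_score:
            best_id, best_score = sense["id"], score
    return best_id
```

With no evidence from the user, the prior decides; a user whose other tags include "mac" and "iphone" shifts the choice toward the second sense.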
Abstract:
FinnWordNet is a wordnet for Finnish that complies with the format of the Princeton WordNet (PWN) (Fellbaum, 1998). It was built by having human translators translate the Princeton WordNet 3.0 synsets into Finnish. It is open source and contains 117,000 synsets. The Finnish translations were inserted into the PWN structure, resulting in a bilingual lexical database. In natural language processing (NLP), wordnets have been used for infusing computers with semantic knowledge, on the assumption that humans already have a sufficient amount of this knowledge. In this paper we present a case study of using wordnets as an electronic dictionary. We tested whether native Finnish speakers benefit from using a wordnet while completing English sentence completion tasks. We found that using either an English wordnet or a bilingual English-Finnish wordnet significantly improves performance in the task. This should be taken into account when setting standards and comparing human and computer performance on these tasks.
Abstract:
This research presents a new design methodology based on the perspective of meanings. The design process is explored as the development of a structure of meanings, founded on the processes of searching for and evaluating meanings. To facilitate working with meanings, the WordNet lexical database and an existing visualization of WordNet, Visuwords, are used for the meaning search. The basic tool for the evaluation process is the WordNet::Similarity software, which measures the relatedness of meanings in the database and thus the degree of interconnection between different meanings. These search and evaluation techniques are then incorporated into our methodology of the structure of meanings to support the design process. The measures of relatedness of meanings are developed as convergence criteria for use in the evaluation processes. The methodology is further used to construct meanings in a verification of product design. The steps of the design methodology, including the search and evaluation processes involved in developing the structure of meanings, are elucidated. The choices made by the designer in terms of meanings are supported by subsequent searches and evaluations of meanings to be implemented in the designed product. In conclusion, the paper presents directions for developing and extending the proposed design methodology.
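To make the notion of measured relatedness concrete, the sketch below computes a path-length score over a toy is-a hierarchy: the inverse of the number of nodes on the shortest path between two concepts. This resembles the simplest kind of path-based measure; the hierarchy, the function names, and the choice of measure are assumptions for illustration, and the abstract does not say which measures the authors used.

```python
from collections import deque

# Hypothetical toy is-a hierarchy: child concept -> parent concept.
HYPERNYMS = {
    "chair": "seat",
    "stool": "seat",
    "seat": "furniture",
    "table": "furniture",
    "furniture": "artifact",
    "lamp": "artifact",
}


def neighbours(node):
    """Concepts one is-a edge away from node, in either direction."""
    result = set()
    if node in HYPERNYMS:
        result.add(HYPERNYMS[node])
    result.update(child for child, parent in HYPERNYMS.items() if parent == node)
    return result


def path_relatedness(a, b):
    """Return 1 / (number of nodes on the shortest path), or 0.0 if unconnected."""
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, edges = queue.popleft()
        if node == b:
            return 1.0 / (edges + 1)
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    return 0.0
```

A score of this kind could serve as a convergence criterion in the sense described above, for example by accepting a candidate meaning only when its relatedness to the design goal exceeds a chosen threshold.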
Abstract:
[ES] This manual describes how to handle graphs interactively in the 3D environment provided by the Xglore program (http://sourceforge.net/projects/xglore/). It is part of the project “Nerthusv2: Base de datos léxica en 3D del inglés antiguo” (a 3D lexical database of Old English), sponsored by the Ministerio de Ciencia e Innovación (no. FFI08-04448/FILO).
Abstract:
The main lines of work on the Semantic Web in the field of television archives are analysed and described. To this end, the Semantic Web is first examined and contextualized from a general perspective, followed by an analysis of the main initiatives working with audiovisual material: the MuNCH project, the S5T project, Semantic Television, and VideoActive.
Abstract:
This research concerns the interface between lexical semantics and syntax, and it is part of the DiCo lexical database project (an acronym for Dictionnaire de combinatoire) at the Observatoire de Linguistique Sens-Texte [OLST] of the Université de Montréal. The project stems from a desire to record, concisely and completely, within the dictionary itself, the typical syntactic behaviour of each lexical unit. To this end, we encode the co-occurrence of DiCo's nominal lexical units with their actants in a lexical government pattern table (also known as a valency pattern, argument structure, subcategorization frame, predicate-argument structure, etc.), noting among other things the surface-syntactic dependencies involved. In this thesis, we present the syntactic properties of one French nominal dependency, which we have named the adnominal attributive, in order to set out a methodology for identifying and characterizing surface-syntactic dependencies. We also give the list of governed nominal dependencies identified in the course of this work. We then describe the creation of a database of generalized government patterns of French named CARNAVAL. Finally, we discuss possible applications of our work, particularly with regard to creating a typology of French lexical government patterns.
Abstract:
In Fillmore's frame semantics, words take their meaning from the event or situational context in which they occur. FrameNet, a lexical resource for English, defines about 1000 conceptual frames, covering most possible contexts. Within a conceptual frame, a predicate calls for arguments to fill the various semantic roles associated with the frame (for example: Victim, Manner, Recipient, Speaker). We seek to annotate these semantic roles automatically, given the semantic frame and the predicate. To do so, we train a machine learning algorithm on arguments whose role is known, in order to generalize to arguments whose role is unknown. In particular, we use lexical features of semantic proximity for the words most representative of the arguments, notably by using vector representations of the words in the lexicon.
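As a rough illustration of generalizing from arguments with known roles to arguments with unknown roles using word vectors, the sketch below uses a nearest-centroid classifier with cosine similarity. This is a deliberately simple stand-in, not the learning algorithm used in the work; the three-dimensional vectors, the training pairs, and all function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors for the head words of arguments.
VECTORS = {
    "child": (0.9, 0.1, 0.0),
    "woman": (0.8, 0.2, 0.1),
    "man": (0.85, 0.15, 0.05),
    "quickly": (0.0, 0.9, 0.1),
    "brutally": (0.1, 0.8, 0.2),
}

# Hypothetical training data: head word of an argument and its known role.
TRAINING = [("child", "Victim"), ("woman", "Victim"), ("quickly", "Manner")]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def role_centroids(training):
    """Average the vectors of the training head words for each role."""
    grouped = {}
    for word, role in training:
        grouped.setdefault(role, []).append(VECTORS[word])
    return {
        role: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for role, vecs in grouped.items()
    }


def predict_role(head_word, centroids):
    """Assign the role whose centroid is most similar to the head word's vector."""
    vector = VECTORS[head_word]
    return max(centroids, key=lambda role: cosine(vector, centroids[role]))
```

Here "man" and "brutally" never appear in the training pairs, yet their vectors lie close to the centroids of Victim and Manner respectively, which is the kind of generalization that semantic proximity features are meant to support.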
Abstract:
This paper discusses particular linguistic challenges in reusing published dictionaries, conceived as structured sources of lexical information, when compiling a machine-tractable, thesaurus-like lexical database for Brazilian Portuguese. After delimiting the scope of the polysemous term thesaurus, the paper focuses on the improvement of the resulting object by a small team, in a form compatible with and inspired by WordNet guidelines. It comments on the dictionary entries, addresses selected problems found in the process of extracting the relevant lexical information from the selected dictionaries, and provides some strategies to overcome them.
Abstract:
This paper presents the overall methodology that has been used to encode both the standard language-independent conceptual-semantic relations of the Brazilian Portuguese WordNet (WordNet.Br) (hyponymy, co-hyponymy, meronymy, cause, and entailment) and the so-called cross-lingual conceptual-semantic relations between different wordnets. Accordingly, after contextualizing the project and outlining the current lexical database structure and statistics, it describes the WordNet.Br editing GUI that was designed to aid the linguist in building synsets, selecting sample sentences from corpora, writing synset concept glosses, and encoding both language-independent conceptual-semantic relations and cross-lingual conceptual-semantic relations between WordNet.Br and Princeton WordNet. © Springer-Verlag Berlin Heidelberg 2006.
Abstract:
The need to represent both semantics and common sense, and to organize them in a lexical database or knowledge base, has motivated the development of large projects such as wordnets, CYC, and Mikrokosmos. Besides these generic bases, another approach is the construction of ontologies for specific domains. One advantage of such an approach is the possibility of broader and more detailed coverage of a specific domain and its terminology. Domain ontologies are important resources in several language processing tasks, especially those related to information retrieval and extraction from textual bases. Information retrieval and even question answering systems can benefit from the domain knowledge represented in an ontology. Besides covering the terminology of the field, the ontology makes the relationships among the terms explicit. Copyright 2007 ACM.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)