2 results for Intuition.
in AMS Tesi di Laurea - Alm@DL - Università di Bologna
Esperienza di creazione di entrate lessicografiche combinatorie: metodi e dati dal progetto CombiNet
Abstract:
The present dissertation aims at simulating the construction of lexicographic layouts for an Italian combinatory dictionary based on real linguistic data, extracted from corpora using computational methods. This work is based on the assumption that the intuition of the native speaker, or of the lexicographer who manually extracts and classifies all the relevant data, is not adequate to provide sufficient information on the meaning and use of words. Therefore, a study of the real use of language is required. This is particularly true for dictionaries that collect the combinatory behaviour of words, where the task of the lexicographer is to identify the typical combinations in which a word occurs. This study is conducted in the framework of the CombiNet project, which aims at studying Italian Word Combinations and at building an online, corpus-based combinatory lexicographic resource for the Italian language. This work is divided into three chapters. Chapter 1 describes the criteria considered for the classification of word combinations according to the work of Ježek (2011). Chapter 1 also contains a brief comparison between the most important Italian combinatory dictionaries and the BBI Dictionary of Word Combinations, in order to describe how word combinations are treated in these lexicographic resources. Chapter 2 describes the main computational methods used for the extraction of word combinations from corpora, taking into account the advantages and disadvantages of the two methods. Chapter 3 mainly focuses on the practical work carried out in the framework of the CombiNet project, with reference to the tools and resources used (EXTra, LexIt and the "La Repubblica" corpus). Finally, the extracted data and the lexicographic layout of the lemmas to be included in the combinatory dictionary are discussed, namely the words "acqua" (water), "braccio" (arm) and "colpo" (blow, shot, stroke).
Abstract:
Our reality is characterized by constant progress and, to keep pace with it, people need to stay up to date on events. In a world with an abundance of news, searching for the most suitable items can be difficult, and the obstacles will only grow over time as the volume of data increases. Information Retrieval (IR), an interdisciplinary branch of computer science that deals with the management and retrieval of information, offers valuable help. An IR system searches a reference dataset for contents considered relevant to the need expressed by a query. Most existing IR systems rely solely on textual similarity to identify relevant information, treating a document as relevant when it includes one or more of the keywords in the query. The idea studied here is that this is not always sufficient, especially when large collections such as the web must be managed. Existing solutions may produce low-quality responses that do not allow users to navigate the results effectively. The intuition for overcoming these limitations was to define a new concept of relevance in order to rank the results differently. This led to Temporal PageRank, a new proposal for Web Information Retrieval that relies on a combination of several factors to increase the quality of search on the web. Temporal PageRank combines the advantages of a ranking algorithm, which favors information reported by web pages considered important within the context in which they reside, with the potential of Temporal Information Retrieval techniques, which exploit the temporal aspects of data to describe their chronological contexts. This thesis discusses the new proposal, compares its results with those achieved by the best-known solutions, and analyzes its strengths and weaknesses.