Abstract:
The central objective of research in Information Retrieval (IR) is to discover new techniques for retrieving relevant information that satisfies an Information Need. The Information Need is satisfied when relevant information can be provided to the user. Relevance is a fundamental concept in IR and has shifted over time from the popular to the personal: what was once considered relevant was information for the population as a whole, whereas relevance is now understood as information specific to each user. Hence the behavior of the system must be connected to the condition of a particular person and their social context, a need that gave rise to the interdisciplinary field of Human-Centered Computing. For a modern search engine, the information extracted for the individual user is crucial. In Personalized Search (PS), two different techniques are needed to personalize a search: contextualization (the interconnected conditions under which an activity occurs) and individualization (the characteristics that distinguish an individual). This shift of focus to the individual's need undermines the rigid linearity of the classical model, which was superseded by the "berry picking" model: search terms evolve thanks to the informational feedback received during the search activity. The development of Information Foraging theory, which drew correlations between animal foraging and human information seeking and sought to optimize their cost-benefit ratio, also contributed to this transformation. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergy between the frontiers of technological innovation and recent advances in IR. The search method developed exploits what is relevant for the user by radically changing how an Information Need is expressed: it is now expressed through the generation of the query together with its own context. The method was conceived to improve search quality by rewriting the query on the basis of contexts automatically generated from a local knowledge base. Furthermore, the goal of optimizing any IR system led to developing the method as a middleware between the user and the IR system, so the system has only two possible actions: rewriting the query and reordering the results. Comparable actions are described in the PS literature, which generally exploits information derived from the analysis of user behavior, whereas the proposed approach exploits knowledge provided by the user. The thesis goes further and devises a novel assessment procedure, following the "Cranfield paradigm", for evaluating this type of IR system. The results achieved are promising, considering both the effectiveness obtained and the innovative approach undertaken, together with the several applications inspired by the use of a local knowledge base.
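A minimal sketch of the middleware idea described above, assuming a hypothetical local knowledge base that maps user-provided concepts to context terms; the function names and the scoring heuristic are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch (not the thesis's implementation) of a middleware with
# the two actions described above: query rewriting from a local knowledge
# base, and result reordering against the KB-derived context.

LOCAL_KB = {  # hypothetical user-provided knowledge base
    "jaguar": ["wildlife", "big cat", "habitat"],
}

def rewrite_query(query: str, kb: dict) -> str:
    """Expand the query with context terms generated from the local KB."""
    terms = query.lower().split()
    context = [t for term in terms for t in kb.get(term, [])]
    return " ".join(terms + context)

def rerank(results: list, kb: dict) -> list:
    """Reorder results by overlap with the KB-derived context terms."""
    context = {t for values in kb.values() for t in values}
    def score(doc):
        words = set(doc["snippet"].lower().split())
        return len(words & context)
    return sorted(results, key=score, reverse=True)

if __name__ == "__main__":
    q = rewrite_query("jaguar", LOCAL_KB)                 # query rewriting
    hits = [{"snippet": "Jaguar car dealership prices"},
            {"snippet": "Jaguar habitat and wildlife"}]
    print(q)
    print(rerank(hits, LOCAL_KB))                         # result reordering
```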
Abstract:
Apart from the article that forms the main content, most HTML documents on the WWW contain additional content such as navigation menus, design elements or commercial banners. In several applications it is necessary to distinguish automatically between main and additional content. Content extraction and template detection are the two approaches to this task. This thesis gives an extensive overview of existing algorithms from both areas. It contributes an objective way to measure and evaluate the performance of content extraction algorithms under different aspects. These evaluation measures make it possible to draw the first objective comparison of existing extraction solutions. The newly introduced content code blurring algorithm overcomes several drawbacks of previous approaches and proves to be the best-performing content extraction algorithm at present. The third major contribution of this thesis is an analysis of methods for clustering web documents according to their underlying templates. In combination with a localised crawling process, this clustering can be used to create sets of training documents for template detection algorithms automatically. As the whole process can be automated, template detection can be performed on a single document, thereby combining the advantages of single-document and multi-document algorithms.
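A simplified sketch of the content-versus-markup idea behind such extractors, not the thesis's content code blurring implementation: tokens are marked as content or markup, the resulting vector is smoothed, and high-scoring regions are kept as main content. The regex tokenizer, window size and threshold are illustrative assumptions.

```python
# Hedged sketch of a content/markup smoothing heuristic (illustrative only).
import re

def content_code_vector(html: str):
    """Split into tag/text tokens and weight each as markup (0) or content (>0)."""
    tokens = re.findall(r"<[^>]+>|[^<]+", html)
    marks = [0 if t.startswith("<") else min(len(t.split()), 5) for t in tokens]
    return tokens, marks

def blur(values, radius=2):
    """Smooth the vector with a simple sliding-window average."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def extract_main_content(html: str, threshold=1.0) -> str:
    """Keep text tokens whose smoothed content score exceeds the threshold."""
    tokens, marks = content_code_vector(html)
    scores = blur(marks)
    kept = [t for t, s in zip(tokens, scores)
            if s >= threshold and not t.startswith("<")]
    return " ".join(t.strip() for t in kept if t.strip())

if __name__ == "__main__":
    page = "<nav><a>Home</a></nav><p>This is the long article body with many words.</p>"
    print(extract_main_content(page))  # drops the menu, keeps the article text
```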
Abstract:
The goal of this dissertation is to identify the most appropriate technologies for creating parametric editors for structured documents and to describe LIME, a parametric, language-independent markup editor. The recent evolution of XML technologies has led to an increasingly widespread use of structured documents. Nowadays they are used both for typographic purposes and for data interchange on the Internet. For this reason, more and more people deal with XML documents in their daily work. Some XML dialects, however, are not easy to understand and use, and for this reason XML editors are needed that can guide authors of XML documents throughout the markup process. In some contexts, especially in legal informatics, markup editors have been introduced: WYSIWYG software that assists the user in creating correct documents. These editors can also be used by people who do not know XML in depth but, on the other hand, they are usually based on a specific XML language. This means that considerable programming effort is required to adapt them to other XML languages or other contexts. By basing the architecture of markup editors on parameters, it is possible to design and develop software that does not depend on a specific XML language and that can be customized for use in a variety of contexts.
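A minimal sketch of the parametric idea, not LIME's actual architecture: the editor logic stays generic, and everything specific to an XML vocabulary is supplied as a declarative parameter set. The parameter structure and element names below are hypothetical.

```python
# Illustrative sketch: a vocabulary-agnostic editor core driven by parameters.

# Hypothetical parameter set for a legal-document-like vocabulary.
LEGAL_DOC_PARAMS = {
    "root": "act",
    "elements": {
        "act": {"children": ["preamble", "article"]},
        "preamble": {"children": []},
        "article": {"children": ["paragraph"]},
        "paragraph": {"children": []},
    },
}

def allowed_children(params: dict, element: str) -> list:
    """The generic editor consults the parameters instead of hard-coded rules."""
    return params["elements"].get(element, {}).get("children", [])

def validate(params: dict, element: str, children: list) -> bool:
    """Check a proposed markup step against the declared content model."""
    allowed = allowed_children(params, element)
    return all(child in allowed for child in children)

if __name__ == "__main__":
    print(allowed_children(LEGAL_DOC_PARAMS, "act"))              # ['preamble', 'article']
    print(validate(LEGAL_DOC_PARAMS, "article", ["paragraph"]))   # True
    print(validate(LEGAL_DOC_PARAMS, "preamble", ["article"]))    # False
```

Swapping in a different parameter set would adapt the same editor core to another XML vocabulary without changing its code, which is the customization the abstract describes.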
Abstract:
From the beginning of the standardisation of language in Bosnia and Herzegovina, i.e. from the acceptance of Karadzic's phonetic spelling in the mid-19th century, to the present day, when three different language standards are in force (Bosniac (Muslim), Croatian and Serbian), language in Bosnia and Herzegovina has been a subject of political conflict. Documents on language policy from this period show the degree to which domestic and foreign political factors influenced the standard language issue, beginning with the very name given to the specific norm. The material analysed (proclamations by political, cultural and other organisations, as well as the corresponding constitutional and statutory provisions on language use) shows the differing treatment of the standard language in Bosnia and Herzegovina in different historical periods. During the period of Turkish rule (until 1878) there was no real political interest in the issue. Under Austro-Hungarian rule (1878-1918) there was an attempt to use the language as a means of forming a united Bosnian nation, but this was later abandoned. During the first Yugoslavia (1918-1941) a uniform solution was imposed on Bosnia and Herzegovina, as throughout the Serbo-Croatian language area, while under the Independent State of Croatia (1941-1945) the official language of Bosnia and Herzegovina was Croatian. The period from 1945 to 1991 had two phases: the first marked by standard-language unity among Serbs, Croats, Muslims and Montenegrins (until 1965), and the second by a gradual but turbulent separation of the national languages, largely completed after 1991. The introductory study includes a detailed analysis of all the expressions used, with special reference to the present state, and accompanies the collection of documents that represents the main outcome of the research.