925 results for Semantic Web, Exploratory Search, Recommendation Systems
Abstract:
Graduate Program in Information Science - FFC
Abstract:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Nevertheless, the Katz similarity, which involves both semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. 
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
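As a rough illustration of what "topological features of the networks representing texts" can look like, the following minimal sketch builds a word co-occurrence network from a text and computes two common network metrics. The choice of metrics (mean degree and mean clustering coefficient), the adjacency window, the tokenisation, and all function names are assumptions made for this example; the abstract does not specify which metrics or preprocessing the authors used.

```python
from collections import defaultdict


def build_cooccurrence_network(text, window=1):
    """Build an undirected graph linking words that appear within `window` positions.

    Tokenisation here is deliberately naive (whitespace split, punctuation strip);
    it is an assumption of this sketch, not the method of the paper.
    """
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != word:  # ignore self-loops
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return dict(graph)


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbours[b] in graph[neighbours[a]]
    )
    return 2.0 * links / (k * (k - 1))


def topological_features(text, window=1):
    """Summarise a text by a few topological metrics of its co-occurrence network."""
    graph = build_cooccurrence_network(text, window)
    n = len(graph)
    if n == 0:
        return {"nodes": 0, "mean_degree": 0.0, "mean_clustering": 0.0}
    mean_degree = sum(len(neigh) for neigh in graph.values()) / n
    mean_clustering = sum(clustering_coefficient(graph, v) for v in graph) / n
    return {"nodes": n, "mean_degree": mean_degree, "mean_clustering": mean_clustering}


if __name__ == "__main__":
    print(topological_features("the cat sat on the mat"))
```

Feature vectors of this kind could then be fed to any standard classifier or compared across texts; a hybrid approach in the sense of the abstract would combine them with semantic features such as word-frequency profiles.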
Abstract:
Matita (which means "pencil" in Italian) is a new interactive theorem prover under development at the University of Bologna. When compared with state-of-the-art proof assistants, Matita presents both traditional and innovative aspects. The underlying calculus of the system, namely the Calculus of (Co)Inductive Constructions (CIC for short), is well known and is used as the basis of another mainstream proof assistant, Coq, with which Matita is to some extent compatible. In the same spirit as several other systems, proof authoring is conducted by the user as a goal-directed proof search, using a script for storing textual commands for the system. In the tradition of LCF, the proof language of Matita is procedural and relies on tactics and tacticals to proceed toward proof completion. The interaction paradigm offered to the user is based on the script management technique at the basis of the popularity of the Proof General generic interface for interactive theorem provers: while editing a script the user can move the execution point forward to deliver commands to the system, or back to retract (or “undo”) past commands. Matita has been developed from scratch over the past 8 years by several members of the Helm research group, of which the author of this thesis is one. Matita is now a full-fledged proof assistant with a library of about 1,000 concepts. Several innovative solutions spun off from this development effort. This thesis is about the design and implementation of some of those solutions, in particular those relevant to the topic of user interaction with theorem provers, and to which the author of this thesis was a major contributor. Joint work with other members of the research group is pointed out where needed. The main topics discussed in this thesis are briefly summarized below. Disambiguation. Most activities connected with interactive proving require the user to input mathematical formulae. 
Because mathematical notation is ambiguous, parsing formulae written the way mathematicians write them on paper is a challenging task, one neglected by several theorem provers, which usually prefer to fix an unambiguous input syntax. Exploiting features of the underlying calculus, Matita offers an efficient disambiguation engine which allows formulae to be typed in the familiar mathematical notation. Step-by-step tacticals. Tacticals are higher-order constructs used in proof scripts to combine tactics together. With tacticals, scripts can be made shorter, more readable, and more resilient to changes. Unfortunately they are de facto incompatible with state-of-the-art user interfaces based on script management. Such interfaces do not allow the execution point to be positioned inside complex tacticals, thus introducing a trade-off between the usefulness of structuring scripts and a tedious big-step execution behavior during script replaying. In Matita we break this trade-off with tinycals: an alternative to a subset of LCF tacticals which can be evaluated in a more fine-grained manner. Extensible yet meaningful notation. Proof assistant users often face the need to create new mathematical notation in order to ease the use of new concepts. The framework used in Matita for dealing with extensible notation both accounts for high-quality bidimensional rendering of formulae (with the expressivity of MathML Presentation) and provides meaningful notation, where presentational fragments are kept synchronized with the semantic representation of terms. Using our approach, interoperability with other systems can be achieved at the content level, and direct manipulation of formulae acting on their rendered forms is possible too. Publish/subscribe hints. Automation plays an important role in interactive proving as users like to delegate tedious proving sub-tasks to decision procedures or external reasoners. 
Exploiting the Web-friendliness of Matita, we experimented with a broker and a network of web services (called tutors) which can independently try to complete open sub-goals of a proof currently being authored in Matita. The user receives hints from the tutors on how to complete sub-goals and can interactively or automatically apply them to the current proof. Another innovative aspect of Matita, only marginally touched on by this thesis, is the embedded content-based search engine Whelp, which is exploited to various ends, from automatic theorem proving to avoiding duplicate work for the user. We also discuss the (potential) reusability in other systems of the widgets presented in this thesis and how we envisage the evolution of user interfaces for interactive theorem provers in the Web 2.0 era.
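To make the notion of tacticals as "higher-order constructs used in proof scripts to combine tactics" more concrete, here is a minimal sketch of LCF-style tacticals written in Python rather than in Matita's own language. Everything in it is hypothetical: the toy goal representation (the string "true" and tuples of the form ("and", a, b)), the tactics `split` and `trivial`, and the tacticals `then`, `orelse` and `repeat` are illustrative stand-ins, not Matita's tactics or its tinycals.

```python
class TacticFailure(Exception):
    """Raised when a tactic does not apply to a goal."""


# A tactic maps one goal to the list of sub-goals that remain to be proved.
def split(goal):
    """Toy tactic: reduce a conjunction ("and", a, b) to the goals a and b."""
    if isinstance(goal, tuple) and len(goal) == 3 and goal[0] == "and":
        return [goal[1], goal[2]]
    raise TacticFailure(f"split does not apply to {goal!r}")


def trivial(goal):
    """Toy tactic: close the goal "true", leaving no sub-goals."""
    if goal == "true":
        return []
    raise TacticFailure(f"trivial does not apply to {goal!r}")


# Tacticals take tactics as arguments and return a new tactic.
def then(first, second):
    """Apply `first`, then apply `second` to every sub-goal it produces."""
    def tactic(goal):
        remaining = []
        for subgoal in first(goal):
            remaining.extend(second(subgoal))
        return remaining
    return tactic


def orelse(first, second):
    """Try `first`; if it fails, fall back to `second`."""
    def tactic(goal):
        try:
            return first(goal)
        except TacticFailure:
            return second(goal)
    return tactic


def repeat(tac):
    """Apply `tac` recursively until it no longer applies."""
    def tactic(goal):
        try:
            subgoals = tac(goal)
        except TacticFailure:
            return [goal]  # leave the goal untouched
        remaining = []
        for subgoal in subgoals:
            remaining.extend(tactic(subgoal))
        return remaining
    return tactic


if __name__ == "__main__":
    goal = ("and", "true", ("and", "true", "true"))
    solve = repeat(orelse(split, trivial))
    print(solve(goal))  # an empty list means no sub-goals remain
```

Note that a composed tactic such as `repeat(orelse(split, trivial))` runs as a single opaque call: there is no way to stop halfway and inspect the intermediate sub-goals. This is the big-step behavior the abstract describes as the motivation for a more fine-grained alternative.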
Abstract:
Starting from the syntactic limits of exchanging Electronic Patient Records (EHRs), this thesis arrives at the creation of a framework that supports the exchange of semantic information. The framework is called Semantic TuCSoN and is an extension of TuCSoN (Tuple Centres Spread over the Network). Semantic TuCSoN is modelled for the eHealth context by defining the agents and the coordination policies suited to exchanging EHRs. Finally, the framework is tested to verify its performance, with a view to evaluating its further use.
Abstract:
This thesis illustrates and discusses two activities related to websites: localization and search engine optimization (SEO). The latter aims to give websites a better ranking on search engine results pages and thus make them more visible to users. Because SEO involves various interventions on websites, some of which entail manipulating HTML code, it is often regarded as a strictly IT activity. The aim of this thesis is therefore to show how translators can use their linguistic skills not only to localize websites but also to optimize them for search engines. To demonstrate the applicability of these techniques, the website of "Il Palio di San Donato" was used as a practical example; the site is managed by the Municipality of Cividale del Friuli and describes the town's historical re-enactment of the same name. The thesis consists of four chapters. The first chapter introduces the theoretical principles underlying website localization, SEO, writing for the web, and translation for the tourism sector. The second chapter describes the Palio di San Donato website, examining in particular its structure and content. The third chapter describes the localization project carried out on the site. Finally, the fourth chapter contains a brief commentary on the linguistic, cultural and technological problems encountered during the translation process, and a list of SEO strategies applied to five pages of the website, selected because they illustrate the largest possible number of SEO interventions that translators can carry out.
Abstract:
This dissertation aims to show the process of localizing a French website, that of the Parc de loisir du Lac de Maine, into Italian. In particular, it aims to demonstrate that web localization must take into account two essential factors that contribute greatly to a site's success on the Internet. The first is website usability, also called web ergonomics, which aims to make websites easier for the end user to use, so that their approach to the site is intuitive and simple. The second is search engine optimization, commonly called "SEO" after the acronym of its English name, which seeks the best techniques for optimizing a website's visibility in search results pages. By improving a web page's ranking in search engine results pages, the site has a much better chance of increasing its traffic and, therefore, its success. The first chapter introduces localization with a theoretical approach that illustrates its main characteristics; it also includes references to the birth and origin of localization. It also introduces the domain of the site to be localized, namely tourism, emphasizing the importance of the specialized language of tourism. The second chapter is devoted to search engine optimization and web ergonomics. Finally, the last chapter is devoted to the localization work on the Parc's site: it analyses the site and its optimization and ergonomics problems, and shows all the phases of the localization process, including the integration of several techniques aimed at improving ease of use for end users as well as the site's ranking in search engine results pages.
Abstract:
This thesis covers the fundamental concepts of search engine optimization (SEO), that is, optimizing websites for search engines. SEO is a multidisciplinary activity involving technical aspects of web development and principles of web marketing, with the aim of improving a site's visibility on a search engine's results pages. The thesis first analyses how search engines work, with particular reference to Google; it then examines the various “on-page” optimization techniques for a site (code, architecture, content) and the “off-page” strategies aimed at improving the site's reputation, popularity and authority.
Abstract:
Microblogging is the new Web 2.0 hype in the media. Techies, politicians, family members and many more use Twitter to keep in touch with their interest groups, their voters or their friends and relatives. We wanted to know whether Twitter can also keep us aware of our team colleagues, how this improves teamwork, and finally why Twitter is accepted and used in teams. Based on an action research study about Twitter usage in a team of seven researchers and the findings of prior literature, we attempt to extend the unified theory of technology acceptance (Venkatesh 2003) and adapt it to the specific context of microblogging in teams. Extending the performance expectancy construct, we propose two groups of factors inherent to social software that should be integrated into the UTAUT: the task characteristics of other users and the individual motivations for using social software.
Abstract:
Grigorij Kreidlin (Russia). A Comparative Study of Two Semantic Systems: Body Russian and Russian Phraseology. Mr. Kreidlin teaches in the Department of Theoretical and Applied Linguistics of the State University of Humanities in Moscow and worked on this project from August 1996 to July 1998. The classical approach to non-verbal and verbal oral communication is based on a traditional separation of body and mind. Linguists studied words and phrasemes, the products of mind activities, while gestures, facial expressions, postures and other forms of body language were left to anthropologists, psychologists, physiologists, and indeed to anyone but linguists. Only recently have linguists begun to turn their attention to gestures, and semiotic and cognitive paradigms are now appearing that raise the question of designing an integral model for the unified description of non-verbal and verbal communicative behaviour. This project attempted to elaborate lexical and semantic fragments of such a model, producing a co-ordinated semantic description of the main Russian gestures (including gestures proper, postures and facial expressions) and their natural language analogues. The concept of emblematic gestures and gestural phrasemes and of their semantic links permitted an appropriate description of the transformation of a body as a purely physical substance into a body as a carrier of essential attributes of Russian culture - the semiotic process called the culturalisation of the human body. Here the human body embodies a system of cultural values and displays them in a text within the area of phraseology and some other important language domains. The goal of this research was to develop a theory that would account for the fundamental peculiarities of the process. The model proposed is based on the unified lexicographic representation of verbal and non-verbal units in the Dictionary of Russian Gestures, which Mr. 
Kreidlin had earlier compiled in collaboration with a group of his students. The Dictionary was originally oriented only towards reflecting how the lexical competence of Russian body language is represented in the Russian mind. Now a special type of phraseological zone has been designed to reflect explicitly semantic relationships between the gestures in the entries and phrasemes and to provide the necessary information for a detailed description of these. All the definitions, rules of usage and the established correlations are written in a semantic meta-language. Several classes of Russian gestural phrasemes were identified, including those phrasemes and idioms with semantic definitions close to those of the corresponding gestures, those phraseological units that have lost touch with the related gestures (although etymologically they are derived from gestures that have gone out of use), and phrasemes and idioms which have semantic traces or reflexes inherited from the meaning of the related gestures. The basic assumptions and practical considerations underlying the work were as follows. (1) To compare meanings one has to be able to state them. To state the meaning of a gesture or a phraseological expression, one needs a formal semantic meta-language of propositional character that represents the cognitive and mental aspects of the codes. (2) The semantic contrastive analysis of any semiotic codes used in person-to-person communication also requires a single semantic meta-language, i.e. a formal semantic language of description. This language must be as linguistically and culturally independent as possible and yet must be open to interpretation through any culture and code. Another possible method of conducting comparative verbal-non-verbal semantic research is to work with different semantic meta-languages and semantic nets and to learn how to combine them, translate from one to another, etc. 
in order to reach a common basis for the subsequent comparison of units. (3) The practical work in defining phraseological units and organising the phraseological zone in the Dictionary of Russian Gestures unexpectedly showed that semantic links between gestures and gestural phrasemes are reflected not only in common semantic elements and syntactic structure of semantic propositions, but also in general and partial cognitive operations that are made over semantic definitions. (4) In comparative semantic analysis one should take into account different values and roles of inner form and image components in the semantic representation of non-verbal and verbal units. (5) For the most part, gestural phrasemes are direct semantic derivatives of gestures. The cognitive and formal techniques can be regarded as typological features for the future functional-semantic classification of gestural phrasemes: two phrasemes whose meaning can be obtained by the same cognitive or purely syntactic operations (or types of operations) over the meanings of the corresponding gestures, belong by definition to one and the same class. The nature of many cognitive operations has not been studied well so far, but the first steps towards its comprehension and description have been taken. The research identified 25 logically possible classes of relationships between a gesture and a gestural phraseme. The calculation is based on theoretically possible formal (set-theory) correlations between signifiers and signified of the non-verbal and verbal units. However, in order to examine which of them are realised in practice a complete semantic and lexicographic description of all (not only central) everyday emblems and gestural phrasemes is required and this unfortunately does not yet exist. Mr. Kreidlin suggests that the results of the comparative analysis of verbal and non-verbal units could also be used in other research areas such as the lexicography of emotions.