894 results for Query expansion, Text mining, Information retrieval, Chinese IR


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work addresses the construction of a gold-standard corpus for the automated evaluation of term extractors. These computer programs, designed to automatically extract the terms contained in a corpus, are used in various applications such as terminography, translation, information retrieval, and indexing. Their evaluation must therefore be carried out with respect to a specific application. One way to evaluate extractors is to annotate every occurrence of the terms in a corpus, which requires a protocol for identifying and delimiting terminological units. To our knowledge, no well-documented annotated corpus exists for evaluating extractors. This work aims to build such a corpus and to describe the problems that must be addressed to do so. The gold-standard corpus we propose is a fully annotated corpus built for a specific application, namely the compilation of a specialized dictionary of automotive mechanics. The corpus reflects the variety of ways in which terms are realized in context. Terms are selected according to precise criteria tied to the application, as well as to certain formal, linguistic, and conceptual properties of terms and terminological variants. To evaluate an extractor with this corpus, one simply extracts all the terminological units from the corpus and compares this list, using metrics, with the extractor's output. A custom reference list can also be created by extracting subsets of terms according to different criteria. This work enables an automatic evaluation of extractors that takes the role of the application into account. Because the evaluation is reproducible, it can be used not only to measure the quality of an extractor but also to compare different extractors and to improve extraction techniques.
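
The comparison step described in this abstract (reference term list versus extractor output) can be sketched as follows. The abstract does not name the metrics it uses, so precision, recall and F1 are assumed here as common choices; the function name, the normalisation rule and the sample terms are all hypothetical.

```python
def evaluate_extractor(reference_terms, extracted_terms):
    """Compare an extractor's output with a reference term list.

    Assumption: terms match when they are equal after lowercasing and
    collapsing whitespace. A real protocol may handle variants differently.
    """
    reference = {" ".join(t.lower().split()) for t in reference_terms}
    extracted = {" ".join(t.lower().split()) for t in extracted_terms}

    true_positives = len(reference & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference list (from an annotated corpus) and extractor output.
reference = ["brake pad", "crankshaft", "fuel injector", "timing belt"]
extracted = ["Brake pad", "crankshaft", "engine", "timing  belt", "road"]

scores = evaluate_extractor(reference, extracted)
print(scores)
```

A custom reference list, as mentioned in the abstract, would simply be a filtered subset of `reference` passed to the same function.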

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Research on music information behavior demonstrates that people rely primarily on others to discover new music. This paper reports on a qualitative study that explores in greater depth how music information circulates within the social networks of late adolescents and what roles the different people involved in the process play. In-depth interviews were conducted with 19 adolescents (15-17 years old). The analysis revealed that music opinion leaders showed eagerness to share music information, tended to seek music information on an ongoing basis, and were perceived as being more knowledgeable about music than others. The ties that connected participants to opinion leaders were predominantly strong ties, which suggests that trustworthiness is an important component of credibility. These findings could help identify new avenues for improving music recommender systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Optimized bibliographic search filters aim to make it easier to locate information in bibliographic databases, which are almost always the most abundant source of scientific evidence. They help support evidence-based decision-making. Most of the filters available in the literature are methodological filters. To reach their full potential, however, they must be combined with filters that identify studies covering a particular topic. In the field of patient safety, it has been shown that deficient information retrieval can have tragic consequences. Optimized search filters covering the field could prove very useful. The present study aims to propose optimized bibliographic search filters for the field of patient safety, to assess their validity, and to propose a guide for developing search filters. We propose optimized filters for identifying articles on patient safety in healthcare organizations in the Medline, Embase, and CINAHL databases. These filters perform very well and are specifically built for articles whose content is explicitly linked to the field of patient safety by their authors. The extent to which their use can be generalized to other contexts depends on how the boundaries of the patient safety field are defined.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis studies models of high-dimensional sequences based on recurrent neural networks (RNNs) and their application to music and speech. Although RNNs can in principle represent the long-term dependencies and complex temporal dynamics of sequences of interest such as video, audio, and natural language, they have not been used to their full potential since their introduction by Rumelhart et al. (1986a) because of the difficulty of training them efficiently by gradient descent. Recently, the successful application of Hessian-free optimization and other advanced training techniques has led to a resurgence of their use in several state-of-the-art systems. The work of this thesis is part of that development. The central idea is to exploit the flexibility of RNNs to learn a probabilistic description of sequences of symbols, i.e., high-level information associated with the observed signals, which in turn can serve as a prior to improve the accuracy of information retrieval. For example, by modeling the evolution of groups of notes in polyphonic music, of chords in a harmonic progression, of phonemes in a spoken utterance, or of individual sources in an audio mixture, we can significantly improve methods for polyphonic transcription, chord recognition, speech recognition, and audio source separation, respectively. The practical application of our models to these tasks is detailed in the last four articles presented in this thesis. In the first article, we replace the output layer of an RNN with conditional restricted Boltzmann machines to describe much richer multimodal output distributions. In the second article, we evaluate and propose advanced methods for training RNNs. In the last four articles, we examine different ways of combining our symbolic models with deep networks and non-negative matrix factorization, notably through products of experts, input/output architectures, and generative frameworks that generalize hidden Markov models. We also propose and analyze efficient inference methods for these models, such as chronological greedy search, high-dimensional beam search, pruned beam search, and gradient descent. Finally, we address the issues of label bias, teacher forcing, temporal smoothing, regularization, and pre-training.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work aims to build an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family, spoken by the people of South India. Karaka relations are one of the most important features of Indian languages. They are the semantico-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable to the broad class of case-based grammars. The efficiency of this approach is put to the test in two applications: machine translation and a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, a local word grouper, a parser for the source language, and a sentence generator for the target language. Among its contributions, this work gives an elegant and compact account of the relation between vibhakthi and karaka roles in Dravidian languages. The same basic mapping also explains simple and complex sentences in these languages, which suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free-word-order languages. Since the frames designed for meaning representation are general, they are adaptable to other languages in this group and to other applications.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The present study is an attempt to highlight the problem of typographical errors in OPACs. Errors made while typing catalogue entries, as well as while importing bibliographical records from other libraries, go unnoticed by librarians, resulting in the non-retrieval of available records and affecting the quality of OPACs. This paper follows previous research on the topic, mainly by Jeffrey Beall and Terry Ballard. The word “management” was chosen from the list of likely-to-be-misspelled words identified by previous research. The word was found to be wrongly entered in several forms in local, national and international OPACs, supporting Ballard's observation that typos occur almost everywhere. Although many corrective measures have been proposed and are in use, the study asserts that human effort is needed to get rid of the problem. The paper is also an invitation to library professionals and system designers to devise a strategy to solve the issue.
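
One way such misspelled forms might be surfaced for human review is sketched below. The abstract does not describe a detection algorithm, so the approach (flagging words within a small Levenshtein distance of the target word) is an assumption, and the function names, the distance threshold and the sample titles are hypothetical. In keeping with the study's conclusion, the output is a list of candidates for a librarian to check, not an automatic correction.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_likely_typos(titles, target="management", max_distance=2):
    """Return (word, title) pairs where word is close to, but not equal to, target."""
    hits = []
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word != target and edit_distance(word, target) <= max_distance:
                hits.append((word, title))
    return hits


# Hypothetical catalogue titles.
titles = [
    "Principles of managment",
    "Library management systems",
    "Financial mangement for libraries",
    "Knowledge managament in practice",
    "Managing change",
]

hits = find_likely_typos(titles)
for word, title in hits:
    print(f"{word!r} in {title!r}")
```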

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Formal Concept Analysis makes it possible to derive conceptual hierarchies from data tables. It is applied in various domains, e.g., data analysis, information retrieval, and knowledge discovery in databases. In order to deal with the increasing size of data tables (and to allow more complex data structures than just binary attributes), conceptual scales have been developed. They are considered as metadata which structure the data conceptually. But in large applications, the number of conceptual scales increases as well. Techniques are needed which support the user's navigation on this meta-level of conceptual scales too. In this paper, we attack this problem by extending the set of scales with hierarchically ordered higher-level scales and by introducing a visualization technique called nested scaling. We extend the two-level architecture of Formal Concept Analysis (the data table plus one level of conceptual scales) to a many-level architecture with a cascading system of conceptual scales. The approach also makes it possible to use the representation techniques of Formal Concept Analysis for the visualization of thesauri and ontologies.
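
As background for the first sentence of this abstract, the sketch below derives the formal concepts (extent, intent pairs) of a small binary data table. It illustrates only basic Formal Concept Analysis, not the conceptual scales or nested scaling the paper proposes. The function name and the toy context are hypothetical, and the intersection-based construction is one simple textbook approach, chosen for brevity and not for efficiency.

```python
def formal_concepts(context):
    """Compute all formal concepts of a binary context.

    context: dict mapping each object to the set of attributes it has.
    Returns a list of (extent, intent) pairs as frozensets, ordered from
    the smallest extent to the largest.
    """
    all_attributes = set().union(*context.values())

    # Every intent is an intersection of object rows; the empty
    # intersection is, by convention, the full attribute set.
    intents = {frozenset(all_attributes)}
    for row in context.values():
        intents |= {frozenset(row & intent) for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(obj for obj, row in context.items() if intent <= row)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Hypothetical context: documents described by index terms.
context = {
    "doc1": {"retrieval", "ranking"},
    "doc2": {"retrieval", "ontology"},
    "doc3": {"retrieval", "ranking", "ontology"},
}

concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by extent inclusion yields the conceptual hierarchy (concept lattice) that the abstract refers to.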

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into the content and structure of these systems. In this paper, we analyse the main network characteristics of two of these systems. We consider their underlying data structures (so-called folksonomies) as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag co-occurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.
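
A tag co-occurrence network of the kind mentioned in this abstract can be sketched from (user, tag, resource) assignments. The abstract does not define the edge weights, so the definition used here is an assumption: two tags co-occur once for each post, i.e. each (user, resource) pair, in which both appear. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tag_cooccurrence(assignments):
    """Build a weighted tag co-occurrence network.

    assignments: iterable of (user, tag, resource) triples, i.e. the
    hyperedges of the tri-partite folksonomy hypergraph.
    Returns a dict mapping an alphabetically ordered tag pair to the
    number of posts in which both tags occur together.
    """
    # Group tags by post: one user annotating one resource.
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)

    weights = defaultdict(int)
    for tags in posts.values():
        for tag_a, tag_b in combinations(sorted(tags), 2):
            weights[(tag_a, tag_b)] += 1
    return dict(weights)


# Hypothetical tag assignments.
assignments = [
    ("alice", "python", "r1"), ("alice", "code", "r1"),
    ("bob", "python", "r1"), ("bob", "code", "r1"), ("bob", "tutorial", "r1"),
    ("alice", "music", "r2"),
]

weights = tag_cooccurrence(assignments)
print(weights)
```

Node connectivity statistics, such as the degree or total edge weight of each tag, can then be computed directly from `weights`.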

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The new curricula adapted to the European Higher Education Area are already being implemented in our universities, which has led us to consider what changes we must make as teachers so that our students acquire the competencies, abilities, and skills needed to be competent professionals in the near future.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

These slides help students respond to the challenge: "I’ve been told not to use Google or Wikipedia to research my essay. What else is there?" The PowerPoint guides students in identifying high-quality, up-to-date and relevant resources on the web that they can reliably draw upon for their academic assignments. The slides were created by Fiona Nichols, the subject liaison librarian who supports the School of Electronics and Computer Science at the University of Southampton.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Abstract: Big data is a fashionable topic nowadays, regardless of what people mean when they use the term. But being big is just a matter of volume, although there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this would imply big data, but for the majority of problems much less data is needed. In this talk we explore the trade-offs involved and the main problems that come with big data, using the Web as a case study: scalability, redundancy, bias, noise, spam, and privacy.

Speaker Biography: Ricardo Baeza-Yates is VP of Research for Yahoo Labs, leading teams in the United States, Europe and Latin America since 2006, and based in Sunnyvale, California, since August 2014. During this time he has led the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor, and before that founder and Director of the Center for Web Research, at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley, with a second, enlarged edition in 2011 that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991, and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was an elected member of the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Abstract: A frequent assumption in Social Media is that its open nature leads to a representative view of the world. In this talk we want to consider bias occurring in the Social Web. We will consider a case study of LiquidFeedback, a direct democracy platform of the German Pirate Party, as well as models of (non-)discriminating systems. As a conclusion of this talk, we stipulate the need for Social Media systems to bias their working according to social norms and to publish the bias they introduce.

Speaker Biography: Prof Steffen Staab studied computer science and computational linguistics in Erlangen (Germany), Philadelphia (USA) and Freiburg (Germany). Afterwards he worked as a researcher at Univ. Stuttgart/Fraunhofer and Univ. Karlsruhe, before he became a professor in Koblenz (Germany). Since March 2015 he has also held a chair for Web and Computer Science at Univ. of Southampton, sharing his time between here and Koblenz. In his research career he has managed to avoid almost all the good advice that he now gives to his team members. Such advice includes focusing on research (vs. company) or concentrating on only one or two research areas (vs. considering ontologies, semantic web, social web, data engineering, text mining, peer-to-peer, multimedia, HCI, services, software modelling and programming, and some more). Still, improving how we understand and use text and data is a good common denominator for a lot of Steffen's professional activities.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There is genetic evidence of similarities and differences among autoimmune diseases (AIDs) that warrants looking at a general panorama of what has been published. Thus, our aim was to determine the main shared genes and to what extent they contribute to building clusters of AIDs. We combined a text-mining approach, used to build clusters of genetic concept profiles (GCPs) from the literature in MedLine, with knowledge of protein-protein interactions to confirm whether the genes in the GCPs encode proteins that truly interact. We found three clusters in which the genes with the highest contribution encoded proteins that showed strong and specific interactions. After projecting the AIDs on a plane, two clusters could be discerned: one comprising Sjögren’s syndrome and systemic lupus erythematosus, and another comprising autoimmune thyroid disease, type 1 diabetes, and rheumatoid arthritis. Our results support the common origin of AIDs and the role of genes involved in apoptosis such as CTLA4, FASLG, and IL10.