886 results for semi-autonomous information retrieval
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever-greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
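The abstract names a pattern-matching and rule-based approach but does not reproduce its rules. Purely as an illustrative sketch, sentence-level extraction of a modification type and site could look like the following Python fragment; the PTM_PATTERNS table and its regular expressions are hypothetical stand-ins, not the UniProtKB tool's actual rules.

```python
import re

# Hypothetical patterns; the actual UniProtKB rules are more elaborate.
PTM_PATTERNS = {
    "phosphorylation": re.compile(
        r"\bphosphorylat\w+\b.*?\b(Ser|Thr|Tyr)[-\s]?(\d+)\b", re.IGNORECASE),
    "acetylation": re.compile(
        r"\bacetylat\w+\b.*?\b(Lys)[-\s]?(\d+)\b", re.IGNORECASE),
}

def extract_ptm_mentions(sentence):
    """Return (ptm_type, residue, position) tuples found in one sentence."""
    hits = []
    for ptm_type, pattern in PTM_PATTERNS.items():
        for match in pattern.finditer(sentence):
            residue, position = match.group(1), match.group(2)
            hits.append((ptm_type, residue, int(position)))
    return hits

print(extract_ptm_mentions(
    "AKT1 is phosphorylated at Ser-473 upon growth factor stimulation."))
# [('phosphorylation', 'Ser', 473)]
```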
Abstract:
Summary: Using WordNet in information retrieval
Abstract:
Abstract: Textual autocorrelation is a broad and pervasive concept, referring to the similarity between nearby textual units: lexical repetitions along consecutive sentences, semantic association between neighbouring lexemes, persistence of discourse types (narrative, descriptive, dialogic, ...) and so on. Textual autocorrelation can also be negative, as illustrated by alternating phonological or morpho-syntactic categories, or the succession of word lengths. This contribution proposes a general Markov formalism for textual navigation, inspired by spatial statistics. The formalism can express well-known constructs in textual data analysis, such as term-document matrices, reference and hyperlink navigation, (web) information retrieval and, in particular, textual autocorrelation, as measured by Moran's I relative to the exchange matrix associated with neighbourhoods of various possible types. Four case studies (word-length alternation, lexical repulsion, part-of-speech autocorrelation and semantic autocorrelation) illustrate the theory. In particular, one observes a short-range repulsion between nouns together with a short-range attraction between verbs, both at the lexical and semantic levels.
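For reference, the statistic the abstract appeals to is the classical Moran's I from spatial statistics, here with the exchange matrix supplying the neighbourhood weights w_ij between textual units i and j (this is the standard definition; the paper's exact normalisation of the exchange matrix may differ in detail):

\[
I \;=\; \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot
\frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}
\]

where x_i is the value of the studied variable on unit i (for example, the length of word i) and \bar{x} is its mean. Positive I indicates attraction between similar neighbours; negative I indicates repulsion or alternation, as in the word-length case study.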
Abstract:
The aim of this project is to become familiar with Semantic Web technologies, to understand what an ontology is, and to learn how to model one in a domain of our choice; and to build a parser that connects to Wikipedia and/or DBpedia to populate that ontology, allowing the user to browse its concepts and study their relations.
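The abstract does not include the parser's code. As a hedged sketch of the kind of DBpedia lookup it describes, the public SPARQL endpoint can be queried with SPARQLWrapper; the query below and the resource it fetches are illustrative assumptions, not the project's own code.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Public DBpedia SPARQL endpoint.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

# Illustrative query: fetch a resource's label and abstract, which could
# then fill the corresponding individual in a local ontology.
sparql.setQuery("""
    SELECT ?label ?abstract WHERE {
      dbr:Semantic_Web rdfs:label ?label ;
                       dbo:abstract ?abstract .
      FILTER (lang(?label) = "en" && lang(?abstract) = "en")
    }
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"])
    print(row["abstract"]["value"][:120], "...")
```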
Abstract:
Software for reading and populating an ontology with information from DBpedia and Wikipedia.
Abstract:
In this final-year project (TFC) we study the evolution of the current Web towards the Semantic Web.
Abstract:
Purpose This paper aims to analyse various aspects of an academic social network: the profile of users, the reasons for its use, its perceived benefits and the use of other social media for scholarly purposes. Design/methodology/approach The authors examined the profiles of the users of an academic social network. The users were affiliated with 12 universities. The following were recorded for each user: sex, the number of documents uploaded, the number of followers, and the number of people being followed. In addition, a survey was sent to the individuals who had an email address in their profile. Findings Half of the users of the social network were academics and a third were PhD students. Social sciences scholars accounted for nearly half of all users. Academics used the service to get in touch with other scholars, disseminate research results and follow other scholars. Other widely employed social media included citation indexes, document creation, editing and sharing tools, and communication tools. Users complained about the lack of support for the utilisation of these tools. Research limitations/implications The results are based on a single case study. Originality/value This study provides new insights into the impact of social media in academic contexts by analysing the user profiles and benefits of a social network service that is specifically targeted at the academic community.
Abstract:
This piece of work, Identification of Research Portfolio for Development of Filtration Equipment, presents a novel approach to identifying promising research topics in the field of design and development of filtration equipment and processes. The proposed approach consists of identifying technological problems frequently encountered in filtration processes. The sources of information for problem retrieval were patent documents and scientific papers discussing filtration equipment and processes. The problem identification method adopted in this work focused on the semantic structure of sentences in order to generate series of subject-action-object structures; this was achieved with a software tool called Knowledgist. Lists of problems frequently encountered in filtration processes, as mentioned in patent documents and scientific papers, were generated. These problems were carefully studied and categorized, and suggestions were made on which classes of problems need further investigation in order to propose a research portfolio. The uses and importance of other methods of information retrieval are also highlighted in this work.
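Knowledgist is proprietary and its internals are not described in the abstract, so the following is only a rough sketch of subject-action-object extraction built on an open dependency parser (spaCy), substituted here for illustration.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def sao_triples(text):
    """Extract naive (subject, action, object) triples from dependency parses."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.text, token.lemma_, o.text))
    return triples

print(sao_triples("The membrane clogs the filter during continuous operation."))
# e.g. [('membrane', 'clog', 'filter')]
```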
Abstract:
Topic-based classification of web portals can be used to identify a user's interests by collecting statistics on their browsing habits across different categories. This thesis examines the areas of web applications in which the collected statistics can be exploited for personalisation. The general principles of content personalisation, Internet advertising and information retrieval are explained using mathematical models. In addition, the thesis describes the general characteristics of web portals and the issues involved in collecting statistical data.
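As a toy illustration of the kind of mathematical model the thesis alludes to (the categories and the scoring rule below are invented for this example, not taken from the thesis), a user's interest profile can be modelled as a normalised vector of page views per portal category, and candidate content scored against it by cosine similarity.

```python
import numpy as np

categories = ["news", "sports", "technology", "culture"]

def interest_profile(view_counts):
    """Normalise raw per-category view counts into an interest distribution."""
    counts = np.asarray(view_counts, dtype=float)
    return counts / counts.sum()

def score(content_vector, profile):
    """Cosine similarity between a content's category vector and the profile."""
    c = np.asarray(content_vector, dtype=float)
    return float(c @ profile / (np.linalg.norm(c) * np.linalg.norm(profile)))

profile = interest_profile([120, 5, 60, 15])  # mostly news and technology
print(score([1, 0, 1, 0], profile))           # tech/news article: high
print(score([0, 1, 0, 0], profile))           # sports article: low
```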
Abstract:
This work surveys the state of the art of the Semantic Web and its current standards, focusing on ontologies. It also describes the practical process used to design and implement an ontology for the specific domain of Twitter, in OWL format, using the Protégé application for its creation. Finally, it explains the creation (requirements capture, design and implementation) of an application capable of obtaining real data from Twitter, processing it to extract the relevant information, and storing it in the ontology created.
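The abstract does not reproduce the application's code. As a minimal hedged sketch of the storage step it describes, an OWL ontology could be populated with tweet data using rdflib; the class and property names under the EX namespace are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

EX = Namespace("http://example.org/twitter-ontology#")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# Minimal T-Box: a Tweet class and a text property.
g.add((EX.Tweet, RDF.type, OWL.Class))
g.add((EX.text, RDF.type, OWL.DatatypeProperty))
g.add((EX.text, RDFS.domain, EX.Tweet))
g.add((EX.text, RDFS.range, XSD.string))

# A-Box: one tweet instance obtained from the Twitter API (stubbed here).
tweet = EX["tweet_1"]
g.add((tweet, RDF.type, EX.Tweet))
g.add((tweet, EX.text, Literal("Hello Semantic Web!", datatype=XSD.string)))

g.serialize(destination="twitter.owl", format="xml")  # writes an OWL/RDF-XML file
```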
Abstract:
This paper presents a reflection on the need for libraries to think about how to facilitate access to the documentary sources they manage. As the number of resources available in electronic form increases, libraries need to provide a simple and usable search tool that integrates the contents of the various information management systems to which they give access. To define user expectations of the search interface, some of the features users have become accustomed to in their requests for information on the Internet are considered. The technologies that allow a discovery layer to be implemented as a search tool integrating the library's various information systems are then presented, followed by examples of working implementations that integrate various information sources into a single search engine, as models to consider when implementing a system of this kind. The purpose of it all is to present a state of the art of operational deployments as a starting point for any organisation interested in improving the access it offers to its resources.
Abstract:
Summary: Fuzzy translation techniques in cross-language information retrieval between closely related languages
Abstract:
The Internet is the basic infrastructure of electronic mail and has long been an important source of information for academic users. It has also become a significant information source for commercial companies as they seek to stay in touch with their customers and monitor their competitors. The growth of the WWW, both in volume and in diversity, has created a growing demand for advanced information management services. Such services include clustering and classification, information discovery and filtering, and the personalisation and tracking of source usage. Although the amount of scientifically and commercially valuable information available on the WWW has grown considerably in recent years, finding it still depends on conventional Internet search engines. Satisfying the growing and changing needs of information retrieval has become a complex task for Internet search engines. Classification and indexing are a significant part of finding reliable and accurate information. This thesis presents the most common methods used in classification and indexing, together with applications and projects that use them to address problems in information retrieval.
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious to program by human experts. The tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question, but their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers the development of kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval, and to more general ranking problems, than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions, and we design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used in algorithms efficiently.
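The abstract mentions regularized least-squares learning and a fast cross-validation algorithm for it. As a generic illustration (not the thesis's own algorithm), kernel regularized least-squares has a closed-form solution and a well-known identity that yields all leave-one-out predictions from a single fit.

```python
import numpy as np

def rls_fit(K, y, lam):
    """Dual coefficients of kernel regularized least-squares: a = (K + lam*I)^-1 y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def rls_loo_predictions(K, y, lam):
    """Leave-one-out predictions without refitting n times.

    With H = K (K + lam*I)^-1 (the 'hat' matrix), the standard identity gives
    yhat_loo_i = ((H y)_i - H_ii * y_i) / (1 - H_ii).
    """
    n = K.shape[0]
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    h = np.diag(H)
    return (H @ y - h * y) / (1.0 - h)

# Toy example with a linear kernel on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
K = X @ X.T

a = rls_fit(K, y, lam=1.0)
print(np.mean((K @ a - y) ** 2))                               # training error
print(np.mean((rls_loo_predictions(K, y, lam=1.0) - y) ** 2))  # LOO error estimate
```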