910 resultados para Machine-readable bibliographic data
Resumo:
Linked Data is not always published with a license. Sometimes a wrong license type is used, like a license for software, or it is not expressed in a standard, machine readable manner. Yet, Linked Data resources may be subject to intellectual property and database laws, may contain personal data subject to privacy restrictions or may even contain important trade secrets. The proper declaration of which rights are held, waived or licensed is a must for the lawful use of Linked Data at its different granularity levels, from the simple RDF statement to a dataset or a mapping. After comparing the current practice with the actual needs, six research questions are posed.
Resumo:
The application of Linked Data technology to the publication of linguistic data promises to facilitate interoperability of these data and has lead to the emergence of the so called Linguistic Linked Data Cloud (LLD) in which linguistic data is published following the Linked Data principles. Three essential issues need to be addressed for such data to be easily exploitable by language technologies: i) appropriate machine-readable licensing information is needed for each dataset, ii) minimum quality standards for Linguistic Linked Data need to be defined, and iii) appropriate vocabularies for publishing Linguistic Linked Data resources are needed. We propose the notion of Licensed Linguistic Linked Data (3LD) in which different licensing models might co-exist, from totally open to more restrictive licenses through to completely closed datasets.
Resumo:
MARC 21 (‘Machine-Readable Cataloguing’) is a US library standard established worldwide and recently translated also in Bulgarian (those parts used most by librarians in their everyday work). The Bulgarian translations are freely available on the NALIS website (http://www.nalis.bg/) under the Library Standards Section, where also an Online Multilingual Dictionary of MARC 21 Terms can be found. All these works are approved by the US Library of Congress and published on its MARC 21 website under Translations (http://www.loc.gov/marc/translations.html#bulgarian).
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
A Internet das Coisas tal como o Big Data e a análise dos dados são dos temas mais discutidos ao querermos observar ou prever as tendências do mercado para as próximas décadas, como o volume económico, financeiro e social, pelo que será relevante perceber a importância destes temas na atualidade. Nesta dissertação será descrita a origem da Internet das Coisas, a sua definição (por vezes confundida com o termo Machine to Machine, redes interligadas de máquinas controladas e monitorizadas remotamente e que possibilitam a troca de dados (Bahga e Madisetti 2014)), o seu ecossistema que envolve a tecnologia, software, dispositivos, aplicações, a infra-estrutura envolvente, e ainda os aspetos relacionados com a segurança, privacidade e modelos de negócios da Internet das Coisas. Pretende-se igualmente explicar cada um dos “Vs” associados ao Big Data: Velocidade, Volume, Variedade e Veracidade, a importância da Business Inteligence e do Data Mining, destacando-se algumas técnicas utilizadas de modo a transformar o volume dos dados em conhecimento para as empresas. Um dos objetivos deste trabalho é a análise das áreas de IoT, modelos de negócio e as implicações do Big Data e da análise de dados como elementos chave para a dinamização do negócio de uma empresa nesta área. O mercado da Internet of Things tem vindo a ganhar dimensão, fruto da Internet e da tecnologia. Devido à importância destes dois recursos e á falta de estudos em Portugal neste campo, com esta dissertação, sustentada na metodologia do “Estudo do Caso”, pretende-se dar a conhecer a experiência portuguesa no mercado da Internet das Coisas. Visa-se assim perceber quais os mecanismos utilizados para trabalhar os dados, a metodologia, sua importância, que consequências trazem para o modelo de negócio e quais as decisões tomadas com base nesses mesmos dados. Este estudo tem ainda como objetivo incentivar empresas portuguesas que estejam neste mercado ou que nele pretendam aceder, a adoptarem estratégias, mecanismos e ferramentas concretas no que diz respeito ao Big Data e análise dos dados.
Resumo:
Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.
Resumo:
In this paper, we describe an interdisciplinary project in which visualization techniques were developed for and applied to scholarly work from literary studies. The aim was to bring Christof Schöch's electronic edition of Bérardier de Bataut's Essai sur le récit (1776) to the web. This edition is based on the Text Encoding Initiative's XML-based encoding scheme (TEI P5, subset TEI-Lite). This now de facto standard applies to machine-readable texts used chiefly in the humanities and social sciences. The intention of this edition is to make the edited text freely available on the web, to allow for alternative text views (here original and modern/corrected text), to ensure reader-friendly annotation and navigation, to permit on-line collaboration in encoding and annotation as well as user comments, all in an open source, generically usable, lightweight package. These aims were attained by relying on a GPL-based, public domain CMS (Drupal) and combining it with XSL-Stylesheets and Java Script.
Resumo:
With the advent of mass digitization projects, such as the Google Book Search, a peculiar shift has occurred in the way that copyright works are dealt with. Contrary to what has so far been the case, works are turned into machine-readable data to be automatically processed for various purposes without the expression of works being displayed to the public. In the Google Book Settlement Agreement, this new kind of usage is referred to as ‘non-display uses’ of digital works. The legitimacy of these uses has not yet been tested by Courts and does not comfortably fit in the current copyright doctrine, plainly because the works are not used as works but as something else, namely as data. Since non-display uses may prove to be a very lucrative market in the near future, with the potential to affect the way people use copyright works, we examine non-display uses under the prism of copyright principles to determine the boundaries of their legitimacy. Through this examination, we provide a categorization of the activities carried out under the heading of ‘non-display uses’, we examine their lawfulness under the current copyright doctrine and approach the phenomenon from the spectrum of data protection law that could apply, by analogy, to the use of copyright works as processable data.
Resumo:
The FunFOLD2 server is a new independent server that integrates our novel protein–ligand binding site and quality assessment protocols for the prediction of protein function (FN) from sequence via structure. Our guiding principles were, first, to provide a simple unified resource to make our function prediction software easily accessible to all via a simple web interface and, second, to produce integrated output for predictions that can be easily interpreted. The server provides a clean web interface so that results can be viewed on a single page and interpreted by non-experts at a glance. The output for the prediction is an image of the top predicted tertiary structure annotated to indicate putative ligand-binding site residues. The results page also includes a list of the most likely binding site residues and the types of predicted ligands and their frequencies in similar structures. The protein–ligand interactions can also be interactively visualized in 3D using the Jmol plug-in. The raw machine readable data are provided for developers, which comply with the Critical Assessment of Techniques for Protein Structure Prediction data standards for FN predictions. The FunFOLD2 webserver is freely available to all at the following web site: http://www.reading.ac.uk/bioinf/FunFOLD/FunFOLD_form_2_0.html.
Resumo:
Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools and their adoption is growing in popularity. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically requires also the adoption of pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms able to avoid the manual invocation of preprocessing and mining tools, that yields to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in of the Konstanz Information Miner (KNIME) workbench, that automatizes the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automatizes the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflow for neuroimage data by leveraging the rich functionalities available in the KNIME workbench.
Resumo:
Pós-graduação em Ciência da Informação - FFC
Resumo:
Concept drift, which refers to non stationary learning problems over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But the process of data labeling is usually expensive and/or time consuming when compared to acquisition of unlabeled data, thus usually only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are based on assumptions that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to realize graph-based semi-supervised learning from static data. We have extend that approach to handle data streams and concept drift. The result is a passive algorithm which uses a single classifier approach, naturally adapted to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items are no longer useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 Data of network intrusion, showing its effectiveness.
Resumo:
This thesis concerns artificially intelligent natural language processing systems that are capable of learning the properties of lexical items (properties like verbal valency or inflectional class membership) autonomously while they are fulfilling their tasks for which they have been deployed in the first place. Many of these tasks require a deep analysis of language input, which can be characterized as a mapping of utterances in a given input C to a set S of linguistically motivated structures with the help of linguistic information encoded in a grammar G and a lexicon L: G + L + C → S (1) The idea that underlies intelligent lexical acquisition systems is to modify this schematic formula in such a way that the system is able to exploit the information encoded in S to create a new, improved version of the lexicon: G + L + S → L' (2) Moreover, the thesis claims that a system can only be considered intelligent if it does not just make maximum usage of the learning opportunities in C, but if it is also able to revise falsely acquired lexical knowledge. So, one of the central elements in this work is the formulation of a couple of criteria for intelligent lexical acquisition systems subsumed under one paradigm: the Learn-Alpha design rule. The thesis describes the design and quality of a prototype for such a system, whose acquisition components have been developed from scratch and built on top of one of the state-of-the-art Head-driven Phrase Structure Grammar (HPSG) processing systems. The quality of this prototype is investigated in a series of experiments, in which the system is fed with extracts of a large English corpus. While the idea of using machine-readable language input to automatically acquire lexical knowledge is not new, we are not aware of a system that fulfills Learn-Alpha and is able to deal with large corpora. To instance four major challenges of constructing such a system, it should be mentioned that a) the high number of possible structural descriptions caused by highly underspeci ed lexical entries demands for a parser with a very effective ambiguity management system, b) the automatic construction of concise lexical entries out of a bulk of observed lexical facts requires a special technique of data alignment, c) the reliability of these entries depends on the system's decision on whether it has seen 'enough' input and d) general properties of language might render some lexical features indeterminable if the system tries to acquire them with a too high precision. The cornerstone of this dissertation is the motivation and development of a general theory of automatic lexical acquisition that is applicable to every language and independent of any particular theory of grammar or lexicon. This work is divided into five chapters. The introductory chapter first contrasts three different and mutually incompatible approaches to (artificial) lexical acquisition: cue-based queries, head-lexicalized probabilistic context free grammars and learning by unification. Then the postulation of the Learn-Alpha design rule is presented. The second chapter outlines the theory that underlies Learn-Alpha and exposes all the related notions and concepts required for a proper understanding of artificial lexical acquisition. Chapter 3 develops the prototyped acquisition method, called ANALYZE-LEARN-REDUCE, a framework which implements Learn-Alpha. The fourth chapter presents the design and results of a bootstrapping experiment conducted on this prototype: lexeme detection, learning of verbal valency, categorization into nominal count/mass classes, selection of prepositions and sentential complements, among others. The thesis concludes with a review of the conclusions and motivation for further improvements as well as proposals for future research on the automatic induction of lexical features.
Resumo:
Questo progetto di tesi è lo sviluppo di un sistema distribuito di acquisizione e visualizzazione interattiva di dati. Tale sistema è utilizzato al CERN (Organizzazione Europea per la Ricerca Nucleare) al fine di raccogliere i dati relativi al funzionamento dell'LHC (Large Hadron Collider, infrastruttura ove avvengono la maggior parte degli esperimenti condotti al CERN) e renderli disponibili al pubblico in tempo reale tramite una dashboard web user-friendly. L'infrastruttura sviluppata è basata su di un prototipo progettato ed implementato al CERN nel 2013. Questo prototipo è nato perché, dato che negli ultimi anni il CERN è diventato sempre più popolare presso il grande pubblico, si è sentita la necessità di rendere disponibili in tempo reale, ad un numero sempre maggiore di utenti esterni allo staff tecnico-scientifico, i dati relativi agli esperimenti effettuati e all'andamento dell'LHC. Le problematiche da affrontare per realizzare ciò riguardano sia i produttori dei dati, ovvero i dispositivi dell'LHC, sia i consumatori degli stessi, ovvero i client che vogliono accedere ai dati. Da un lato, i dispositivi di cui vogliamo esporre i dati sono sistemi critici che non devono essere sovraccaricati di richieste, che risiedono in una rete protetta ad accesso limitato ed utilizzano protocolli di comunicazione e formati dati eterogenei. Dall'altro lato, è necessario che l'accesso ai dati da parte degli utenti possa avvenire tramite un'interfaccia web (o dashboard web) ricca, interattiva, ma contemporaneamente semplice e leggera, fruibile anche da dispositivi mobili. Il sistema da noi sviluppato apporta miglioramenti significativi rispetto alle soluzioni precedentemente proposte per affrontare i problemi suddetti. In particolare presenta un'interfaccia utente costituita da diversi widget configurabili, riuitilizzabili che permettono di esportare i dati sia presentati graficamente sia in formato "machine readable". Un'alta novità introdotta è l'architettura dell'infrastruttura da noi sviluppata. Essa, dato che è basata su Hazelcast, è un'infrastruttura distribuita modulare e scalabile orizzontalmente. È infatti possibile inserire o rimuovere agenti per interfacciarsi con i dispositivi dell'LHC e web server per interfacciarsi con gli utenti in modo del tutto trasparente al sistema. Oltre a queste nuove funzionalità e possbilità, il nostro sistema, come si può leggere nella trattazione, fornisce molteplici spunti per interessanti sviluppi futuri.
Resumo:
Lehrvideos erfreuen sich dank aktueller Entwicklungen im Bereich der Online-Lehre (Videoplattformen, MOOCs) auf der einen Seite und einer riesigen Auswahl sowie einer einfachen Produktion und Distribution auf der anderen Seite großer Beliebtheit bei der Wissensvermittlung. Trotzdem bringen Videos einen entscheidenden Nachteil mit sich, welcher in der Natur des Datenformats liegt. So sind die Suche nach konkreten Sachverhalten in einem Video sowie die semantische Aufbereitung zur automatisierten Verknüpfung mit weiteren spezifischen Inhalten mit hohem Aufwand verbunden. Daher werden die lernerfolg-orientierte Selektion von Lehrsegmenten und ihr Arrangement zur auf Lernprozesse abgestimmten Steuerung gehemmt. Beim Betrachten des Videos werden unter Umständen bereits bekannte Sachverhalte wiederholt bzw. können nur durch aufwendiges manuelles Spulen übersprungen werden. Selbiges Problem besteht auch bei der gezielten Wiederholung von Videoabschnitten. Als Lösung dieses Problems wird eine Webapplikation vorgestellt, welche die semantische Aufbereitung von Videos hin zu adaptiven Lehrinhalten ermöglicht: mittels Integration von Selbsttestaufgaben mit definierten Folgeaktionen können auf Basis des aktuellen Nutzerwissens Videoabschnitte automatisiert übersprungen oder wiederholt und externe Inhalte verlinkt werden. Der präsentierte Ansatz basiert somit auf einer Erweiterung der behavioristischen Lerntheorie der Verzweigten Lehrprogramme nach Crowder, die auf den Lernverlauf angepasste Sequenzen von Lerneinheiten beinhaltet. Gleichzeitig werden mittels regelmäßig eingeschobener Selbsttestaufgaben Motivation sowie Aufmerksamkeit des Lernenden nach Regeln der Programmierten Unterweisung nach Skinner und Verstärkungstheorie gefördert. Durch explizite Auszeichnung zusammengehöriger Abschnitte in Videos können zusätzlich die enthaltenden Informationen maschinenlesbar gestaltet werden, sodass weitere Möglichkeiten zum Auffinden und Verknüpfen von Lerninhalten geschaffen werden.