173 results for vocabularies
Abstract:
Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We define a process for documenting the workflow application, management system, and their dependencies based on 4 domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.
Abstract:
In the most widely used current biomedical vocabularies, there are usually mechanisms for composing terms from pre-existing terms. These mechanisms increase the expressive power of the terminologies that have them, but they carry a disadvantage: the same concept can be represented with different base concepts, which introduces a degree of ambiguity into these composite terms. This final degree project consists of developing a tool that recognizes terms from such complex biomedical vocabularies, that is, vocabularies whose terms are composed of other terms, as is the case with SNOMED. The resulting tool is capable of identifying the ambiguities present in the representation of these composite concepts and of representing those concepts in a homogeneous form. To favor the interoperability and accessibility of the tool, it is offered through a web interface accessible from any place or device with internet access.
Abstract:
The application of Linked Data technology to the publication of linguistic data promises to facilitate interoperability of these data and has led to the emergence of the so-called Linguistic Linked Data Cloud (LLD), in which linguistic data is published following the Linked Data principles. Three essential issues need to be addressed for such data to be easily exploitable by language technologies: i) appropriate machine-readable licensing information is needed for each dataset, ii) minimum quality standards for Linguistic Linked Data need to be defined, and iii) appropriate vocabularies for publishing Linguistic Linked Data resources are needed. We propose the notion of Licensed Linguistic Linked Data (3LD), in which different licensing models might co-exist, from totally open to more restrictive licenses through to completely closed datasets.
Abstract:
Within the European Union, member states are setting up official data catalogues as entry points to access PSI (Public Sector Information). In this context, it is important to describe the metadata of these data portals, i.e., of data catalogs, and allow for interoperability among them. To tackle these issues, the Government Linked Data Working Group developed DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report analyzes the current use of the DCAT vocabulary in several European data catalogs and proposes some recommendations to deal with an inconsistent use of the metadata across countries. The enrichment of such metadata vocabularies with multilingual descriptions, as well as an account for cultural divergences, is seen as a necessary step to guarantee interoperability and ensure wider adoption.
Abstract:
Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that make up the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for performing some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.
Abstract:
We describe a domain ontology development approach that extracts domain terms from folksonomies and enriches them with data and vocabularies from the Linked Open Data cloud. As a result, we obtain lightweight domain ontologies that combine the emergent knowledge of social tagging systems with formal knowledge from ontologies. In order to illustrate the feasibility of our approach, we have produced an ontology in the financial domain from tags available in Delicious, using DBpedia, OpenCyc and UMBEL as additional knowledge sources.
Abstract:
Problem-based learning has been successfully applied over the last three decades to a diverse range of learning environments. This educational approach consists of posing problems to learners, so they can learn about a particular domain by developing solutions to them. When applied to conceptual modeling, and particularly to Qualitative Reasoning, the solutions to problems are models that represent the behavior of a dynamic system. Therefore, the learner's task is to move from their initial model, as their first attempt to represent the system, to the target models that provide solutions to that problem, while acquiring domain knowledge in the process. In this thesis we propose KaiSem, a method for using semantic technologies and resources to scaffold the modeling process, helping learners to acquire as much domain knowledge as possible without direct supervision from the teacher. Since learners and experts create their models independently, these will have different terminologies and structures, giving rise to a highly heterogeneous pool of models. To deal with such heterogeneity, we provide a semantic grounding technique to automatically determine links between the unrestricted terminology used by learners and some online vocabularies of the Web of Data, thus facilitating the interoperability and later alignment of the models. Lastly, we provide a semantic-based feedback technique to compare the aligned models and generate feedback based on the possible discrepancies. This feedback is communicated in the form of individualized suggestions, which can be used by the learner to bring their model closer in terminology and structure to the target models.
Abstract:
This paper presents a methodology for developing an advanced communications system for the Deaf in a new domain. The methodology is a user-centred design approach consisting of four main steps: requirement analysis, parallel corpus generation, technology adaptation to the new domain, and finally, system evaluation. During the requirement analysis, both the user and technical requirements are evaluated and defined. For generating the parallel corpus, it is necessary to collect Spanish sentences in the new domain and translate them into LSE (Lengua de Signos Española: Spanish Sign Language). LSE is represented by glosses and by video recordings. This corpus is used for adapting the two main modules of the advanced communications system to the new domain: the module that translates spoken Spanish into LSE and the module that generates Spanish from LSE. The main aspects to be generated are the vocabularies for both languages (Spanish words and signs) and the knowledge for translating in both directions. Finally, the field evaluation is carried out with deaf people using the advanced communications system to interact with hearing people in several scenarios. For this evaluation, the paper proposes several objective and subjective measurements of performance. The new domain considered in this paper is dialogues at a hotel reception. Using this methodology, the system was developed in several months, obtaining very good performance: good translation rates (10% Sign Error Rate) with small processing times, allowing face-to-face dialogues.
Abstract:
The Gene Expression Database (GXD) is a community resource of gene expression information for the laboratory mouse. By combining the different types of expression data, GXD aims to provide increasingly complete information about the expression profiles of genes in different mouse strains and mutants, thus enabling valuable insights into the molecular networks that underlie normal development and disease. GXD is integrated with the Mouse Genome Database (MGD). Extensive interconnections with sequence databases and with databases from other species, and the development and use of shared controlled vocabularies extend GXD’s utility for the analysis of gene expression information. GXD is accessible through the Mouse Genome Informatics web site at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml.
Abstract:
Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.
Abstract:
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
Abstract:
li-Abī al-Qāsim Maḥmūd ibn ʻUmar al-Zamakhsharī.
Abstract:
li-Ḥasan Quwaydir al-Khalīlī ; tarjamat al-muʼallif Muḥammad Ibrāhīm Fannī.
Abstract:
1. Tarih-i Al-i Osman bin Ertuğrul (dates of Ottoman Sultans) (f. 1r) -- 2. Suret-i arzname (ff. 1v--2v) -- 3. Arabic poem, awāʼil Muḥarram 804 [August 11-20, 1401] (copied by Ḥājjī Aḥmad ibn ... al-B.f.l.ghānī) (ff. 3r-11r) -- 4. Taʻrīfāt ʻilm usūl fiqh, Shawwāl 804 [May 1402] (ff. 11v-16v) -- 5. Arabic glossary (explanations in Arabic and Persian), 804 [1402] (copied by Idrīs b. Ḥasan b. Bayram) (ff. 17r-52r) -- 6. Sharḥ al-Farāʼiḍ al-Sirājīyah / ʻAbd al-Karīm b. Muḥammad b. al-Ḥasan al-Hamadānī al-Tabrīzī, awāsiṭ Dhī al-Ḥijja 804 [July 1402] (copied by Idrīs b. Ḥasan b. Bayram) (ff. 52v-94r) -- 7. Lughat-i ḥurūf (ff. 94v-95r) -- 8. Mufradāt-i Pārsī (A list of Persian verbs) (ff. 95v-97v).