34 results for annotation


Relevance:

10.00%

Publisher:

Abstract:

In this paper the authors present an approach for the semantic annotation of RESTful services in the geospatial domain. Their approach automates some stages of the annotation process by using a combination of resources and services: a cross-domain knowledge base (DBpedia), two domain ontologies (GeoNames and the WGS84 vocabulary), and suggestion and synonym services. The approach has been successfully evaluated with a set of geospatial RESTful services obtained from ProgrammableWeb.com, where geospatial services account for a third of all the services available in the registry.
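The entry gives no implementation detail, so the following is only a minimal Python sketch of the kind of label/synonym matching such an annotator needs; the synonym table and function name are hypothetical, and only the WGS84 and GeoNames namespaces come from the abstract (the authors additionally use DBpedia and external suggestion/synonym services).

```python
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEONAMES = "http://www.geonames.org/ontology#"

# Hypothetical synonym table from parameter names to vocabulary URIs.
SYNONYMS = {
    "lat": WGS84 + "lat", "latitude": WGS84 + "lat",
    "lon": WGS84 + "long", "lng": WGS84 + "long", "longitude": WGS84 + "long",
    "alt": WGS84 + "alt", "elevation": WGS84 + "alt",
    "place": GEONAMES + "name", "city": GEONAMES + "name",
}

def annotate_parameters(param_names):
    """Return {parameter: vocabulary URI or None} for one REST service."""
    return {p: SYNONYMS.get(p.lower()) for p in param_names}

# A geocoding-style service as it might be scraped from ProgrammableWeb.com:
print(annotate_parameters(["lat", "lng", "city", "format"]))
```

Parameters with no match (such as "format" above) are the ones the suggestion and synonym services would then be asked about.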

Relevance:

10.00%

Publisher:

Abstract:

The goal of the W3C's Media Annotation Working Group (MAWG) is to promote interoperability between multimedia metadata formats on the Web. Audiovisual data is omnipresent on today's Web; however, different interaction interfaces and, especially, diverse metadata formats prevent unified search, access, and navigation. MAWG has addressed this issue by developing an interlingua ontology and an associated API. This article discusses the rationale and core concepts of the ontology and API for media resources. The specifications developed by MAWG enable interoperable, contextualized, and semantic annotation and search, independent of the source metadata format, and connect multimedia data to the Linked Data cloud. Some demonstrators of such applications are also presented in this article.
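As a rough illustration of the interlingua idea, assuming property names from the W3C Ontology for Media Resources (the mapping tables themselves are toy examples, not the MAWG specification):

```python
MA = "http://www.w3.org/ns/ma-ont#"

# Hypothetical per-format tables: source property -> pivot-ontology property.
MAPPINGS = {
    "dublin_core": {"dc:title": MA + "title", "dc:creator": MA + "creator"},
    "id3":         {"TIT2": MA + "title", "TPE1": MA + "creator"},
}

def to_pivot(fmt, record):
    """Translate one metadata record into pivot-ontology properties."""
    table = MAPPINGS[fmt]
    return {table[k]: v for k, v in record.items() if k in table}

# The same unified view is obtained from two different source formats:
print(to_pivot("dublin_core", {"dc:title": "Big Buck Bunny"}))
print(to_pivot("id3", {"TIT2": "Big Buck Bunny", "TPE1": "Blender"}))
```

Once every format is mapped onto the pivot ontology, a single search runs against the same property regardless of whether the source metadata was Dublin Core, ID3, or anything else.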

Relevance:

10.00%

Publisher:

Abstract:

Many attempts have been made to provide multilinguality to the Semantic Web by means of annotation properties in natural language (NL), such as RDFS or SKOS labels, and other lexicon-ontology models, such as lemon, but many issues must still be solved if we want a truly accessible Multilingual Semantic Web (MSW). Among the most relevant problems are the reusability of monolingual resources (ontologies, lexicons, etc.), the accessibility of multilingual resources hindered by many formats, the reliability of ontological sources, disambiguation problems, and the multilingual presentation of all this information to the end user in NL. Unless this NL presentation is achieved, the MSW will remain restricted to IT experts, and even they will face great dissatisfaction and disenchantment.
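To make the lexicon-ontology idea concrete, here is a minimal rdflib sketch of two lexical entries in different languages pointing at the same ontology concept; the property names follow the published lemon model, but treat the exact URIs as assumptions rather than as code from the paper:

```python
from rdflib import Graph, Namespace, Literal, URIRef, RDF

LEMON = Namespace("http://lemon-model.net/lemon#")
EX = Namespace("http://example.org/lexicon/")      # hypothetical

g = Graph()
concept = URIRef("http://dbpedia.org/ontology/River")

for lang, slug, written in [("en", "river", "river"), ("es", "rio", "río")]:
    entry, form, sense = EX[slug + "-entry"], EX[slug + "-form"], EX[slug + "-sense"]
    g.add((entry, RDF.type, LEMON.LexicalEntry))
    g.add((entry, LEMON.canonicalForm, form))
    g.add((form, LEMON.writtenRep, Literal(written, lang=lang)))
    g.add((entry, LEMON.sense, sense))
    g.add((sense, LEMON.reference, concept))   # both languages share the concept

print(g.serialize(format="turtle"))
```

The NL-presentation problem the abstract points to is exactly the reverse direction: going from such triples back to fluent text in the user's language.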

Relevance:

10.00%

Publisher:

Abstract:

A workflow-centric research object bundles a workflow, the provenance of the results obtained by its enactment, other digital objects that are relevant for the experiment (papers, datasets, etc.), and annotations that semantically describe all these objects. In this paper, we propose a model to specify workflow-centric research objects, and show how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange (ORE) model and the Annotation Ontology (AO). We describe the life-cycle of a research object, which resembles the life-cycle of a scientific experiment.
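A sketch of how such a bundle might be grounded with the two vocabularies named in the abstract (my reading, not the authors' code; the AO property names and all example.org URIs are assumptions):

```python
from rdflib import Graph, Namespace, Literal, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
AO = Namespace("http://purl.org/ao/")            # Annotation Ontology
EX = Namespace("http://example.org/ro1/")        # hypothetical research object

g = Graph()
g.add((EX["ro"], RDF.type, ORE.Aggregation))     # the research object itself
for part in ["workflow.t2flow", "provenance.rdf", "paper.pdf", "dataset.csv"]:
    g.add((EX["ro"], ORE.aggregates, EX[part]))  # bundled digital objects

# One semantic annotation attached to the aggregated workflow.
g.add((EX["ann1"], RDF.type, AO.Annotation))
g.add((EX["ann1"], AO.annotatesResource, EX["workflow.t2flow"]))
g.add((EX["ann1"], AO.body, Literal("Computes metabolite variation")))

print(g.serialize(format="turtle"))
```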

Relevance:

10.00%

Publisher:

Abstract:

This paper presents the main results of the eContent HARMOS project. The project has developed a web-based educational system for professional musicians. The main idea of the project consists of recording master classes taught by highly recognised maestros and annotating this multimedia material using an educational musical taxonomy and automatic annotation tools. Users of the system access a multi-criteria search engine that allows them to find and play video segments according to a combination of criteria, including instrument, teacher, composer, composition, movement, and pedagogical concept. In order to protect teachers' and students' rights, a DRM and protection system has been developed. The system is in public operation. This model preserves musical heritage, since these valuable master classes are usually not recorded, and it also provides a sustainable model for musical institutions.

Relevance:

10.00%

Publisher:

Abstract:

Rhizobium leguminosarum (Rl) is a soil alpha-proteobacterium that establishes a diazotrophic symbiosis with different legumes. Despite the importance of this symbiosis to the global nitrogen cycle, very few rhizobial genomes have been sequenced so far to provide new insights into the genetic features contributing to symbiotically relevant processes. Only three complete sequences of Rl strains have been published: Rl bv. viciae 3841, harboring six plasmids (7.75 Mb), and two Rl bv. trifolii strains (WSM1325 and WSM2304), both clover symbionts, harboring 5 and 4 plasmids, respectively (7.41 and 6.87 Mb). The genome of Rlv UPM791 was sequenced by means of 454 sequencing; Illumina and Sanger reads were used to improve the assembly, leading to 17 final contigs. This genome has an estimated size of 7.8 Mb, organized in one chromosome and five extrachromosomal replicons, including a 405 kb symbiotic plasmid. Four of these plasmids are already closed, whereas gaps remain in the smallest one (pUPM791d) due to the presence of insertion elements and repeated sequences, which complicate the assembly. The annotation was carried out with the Manatee pipeline. This new genome was analyzed with regard to symbiotic and adaptive functions in comparison with the complete genome of Rlv 3841 and those of the Rl bv. trifolii strains WSM1325 and WSM2304. While plasmids pUPM791a and b are conserved, the symbiotic plasmid pUPM791c exhibited the lowest degree of conservation compared with those of the other Rl strains.

One of the factors involved in the symbiotic process is the intercellular communication system known as quorum sensing (QS). This mechanism allows bacteria to carry out diverse biological processes in a coordinated way through the production and detection of extracellular signals that regulate the transcription of different target genes. Analysis of the Rlv UPM791 genome allowed the identification of two LuxRI-like systems mediated by N-acyl-homoserine lactones (AHLs). HPLC-MS analysis allowed the assignment of the C6-HSL, C7-HSL, and C8-HSL signals to the rhiRI system, encoded in the symbiotic plasmid, whereas the cinRI system, located on the chromosome, produces 3OH-C14:1-HSL, previously described as "bacteriocin small". A third synthase (TraI) is also encoded in the symbiotic plasmid, but its cognate regulator TraR is not functional due to a frameshift mutation. Three additional LuxR orphans were also found with no associated LuxI-type synthase. The potential effect of AHLs was studied by means of a quorum-quenching approach to interfere with the QS systems of the bacteria. This approach is based upon the introduction into the strains Rlv UPM791 and Rlv 3841 of the Bacillus subtilis gene aiiA, which constitutively expresses an AHL-degrading lactonase enzyme; this led to the virtual absence of AHLs even when AiiA-expressing cells were only a fraction of the total population. No significant effect of AiiA-mediated AHL removal on competitiveness for growth on solid surfaces was observed. For the analysis under symbiotic conditions, we set up a two-label system to identify nodules produced by two different strains on pea roots, based on the markers gusA and celB, encoding a β-glucuronidase and a thermostable β-galactosidase enzyme, respectively. The results obtained show that Rlv UPM791 outcompetes Rlv 3841 for nodule formation on pea plants, and that the presence of the AiiA plasmid does not significantly affect the relative competitiveness of the two Rlv strains. However, the low stability of the pME6863 plasmid, which encodes aiiA, did not allow a clear conclusion about the effect of the AiiA lactonase on competitiveness.

In order to further analyze the significance and regulation of the production of AHL signal molecules, mutants deficient in each of the two QS systems were constructed. A detailed analysis of the effect of these mutations on AHL production, biofilm formation, and symbiosis with pea, vetch, and lentil plants was carried out. The effect of deletions of the Rlv UPM791 rhiI and rhiR genes is more pronounced in the absence of plasmid pUPM791d, as no signal is detected in UPM791.1, which lacks this plasmid. Mutants in cinI or cinRIS show either no signals at all, or only the small ones produced by the rhiRI system, respectively, suggesting that cinR might regulate the rhiRI system. These mutations had a strong effect on symbiosis. Analysis of the rhi mutants revealed that the rhiRI system is required for normal symbiotic performance, as a drastic reduction of symbiotic fitness is observed when rhiI is deleted, and rhiR is essential for nitrogen fixation in the absence of plasmid pUPM791d. Furthermore, cinRIS mutants produced white, inefficient nodules, whereas the cinI mutant was unable to form nodules on any legume tested. The latter mutation is associated with destabilization of the symbiotic plasmid through a cinI-dependent mechanism that remains unclear. Overall, the results obtained indicate the existence of a model of QS-dependent regulation significantly different from those previously described in other R. leguminosarum strains, in which no relevant symbiotic phenotype had been observed. The regulation of AHL production in Rlv UPM791 is a complex process involving the symbiotic plasmid (pUPM791c) and the smallest plasmid (pUPM791d), with a key role for the 3-OH-C14:1-HSL signal. Finally, we searched the Rlv UPM791 genome for potential AHL transporters. These signals diffuse freely across membranes, but in the case of long-chain AHLs an active efflux system may be required, as has been described for C12-HSL in Pseudomonas aeruginosa. We identified a putative AHL transporter of the RND family, homologous to mexAB-oprM of P. aeruginosa and involved in the extrusion of long-chain AHLs. A mutant strain deficient in this transporter was generated, and TLC analysis showed the absence of 3OH-C14:1-HSL in its supernatant; this deficiency was complemented by reintroducing an intact copy of the genes via plasmid transfer. The mutation in the mexAB genes had no significant effect on the symbiotic performance of R. leguminosarum bv. viciae.

Relevance:

10.00%

Publisher:

Abstract:

Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow the imaging of developmental processes in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate Terabytes of image data, the handling and analysis of which becomes a major bottleneck in extracting the targeted information. Here, we describe the current state of the art in computational image analysis in the zebrafish system. We discuss the challenges encountered when handling high-content image data, especially with regard to data quality, annotation, and storage. We survey methods for preprocessing image data for further analysis, and describe selected examples of automated image analysis, including the tracking of cells during embryogenesis, heartbeat detection, identification of dead embryos, recognition of tissues and anatomical landmarks, and quantification of behavioral patterns of adult fish. We review recent examples for applications using such methods, such as the comprehensive analysis of cell lineages during early development, the generation of a three-dimensional brain atlas of zebrafish larvae, and high-throughput drug screens based on movement patterns. Finally, we identify future challenges for the zebrafish image analysis community, notably those concerning the compatibility of algorithms and data formats for the assembly of modular analysis pipelines.
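As a toy illustration of one analysis mentioned above (heartbeat detection), the dominant frequency of a mean-intensity trace taken over the heart region gives the beat rate; the frame rate, heart rate, and noise level below are invented for the demo and do not come from the review:

```python
import numpy as np

fps = 30.0                                         # assumed camera frame rate
t = np.arange(0, 10, 1 / fps)                      # 10 s of frames
heart_rate_hz = 2.5                                # illustrative larval rate
signal = np.sin(2 * np.pi * heart_rate_hz * t) + 0.3 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(t.size, d=1 / fps)
print(f"estimated heart rate: {60 * freqs[np.argmax(spectrum)]:.0f} bpm")
```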

Relevance:

10.00%

Publisher:

Abstract:

Over recent years, the relentless growth of biomedical data sources, driven by the development of massive data generation techniques (especially in the field of genomics) and the expansion of technologies for communicating and sharing information, has led biomedical research to rely almost exclusively on the distributed analysis of information and on the search for relationships between different data sources. This is a complex task due to the heterogeneity of the sources used (whether in format, technology, or domain modeling). Some research projects aim at homogenizing these sources so that the information can be retrieved in an integrated way, as if it came from a single database; however, no existing work fully automates this process of semantic integration. There are two main approaches to the problem of integrating heterogeneous data sources: centralized and distributed. Both approaches require translating data from one model to another. To perform this task, formalizations of the semantic relationships between the underlying models and the central model are used. These formalizations are commonly called annotations. In the context of the semantic integration of information, database annotations consist of defining relations between terms of equal meaning, to enable the automatic translation of the information. Depending on the problem at hand, these relationships hold between individual concepts or between whole sets of concepts (views). The work presented here focuses on the latter. The European project p-medicine (FP7-ICT-2009-270089) is based on the centralized approach and uses view-based annotations over databases modeled in RDF. The data extracted from the different sources are translated and integrated into a Data Warehouse. Within the p-medicine platform, the Biomedical Informatics Group (GIB) of the Universidad Politécnica de Madrid, where I carried out this work, provides a tool for generating the required annotations of RDF databases. This tool, called Ontology Annotator, offers the possibility of manually creating view-based annotations. However, although Ontology Annotator displays the data sources graphically, most users find the tool difficult to handle and spend too much time on the annotation process. Hence the need to develop a more advanced tool capable of assisting the user in the task of annotating databases in p-medicine. The goal is to automate the most complex parts of the annotation process and to present the information related to the annotations of RDF databases in a natural, understandable way. This tool has been named Ontology Annotator Assistant, and the work presented here describes its design and development, as well as some innovative algorithms created by the author for its correct operation. The tool offers functionality not previously available in any other tool in the area of automatic annotation and semantic integration of databases.
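Conceptually, a view-based annotation formalizes a correspondence between a view over the source RDF schema and the central schema; one way to picture this (a stand-in for illustration, not how Ontology Annotator actually stores annotations) is as a SPARQL CONSTRUCT mapping, executable here with rdflib:

```python
from rdflib import Graph, Namespace, Literal, RDF

SRC = Namespace("http://example.org/source#")    # hypothetical schemas
CEN = Namespace("http://example.org/central#")

source = Graph()
source.add((SRC.p1, RDF.type, SRC.Subject))
source.add((SRC.p1, SRC.familyName, Literal("Smith")))

# The annotation: source view (WHERE) mapped onto the central view (CONSTRUCT).
view_annotation = """
PREFIX src: <http://example.org/source#>
PREFIX cen: <http://example.org/central#>
CONSTRUCT { ?x a cen:Patient ; cen:surname ?n . }
WHERE     { ?x a src:Subject ; src:familyName ?n . }
"""

central = Graph()
for triple in source.query(view_annotation):     # CONSTRUCT yields triples
    central.add(triple)
print(central.serialize(format="turtle"))
```

An assistant tool can then focus on proposing candidate views, which is the hard part for non-technical users.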

Relevance:

10.00%

Publisher:

Abstract:

For a long time there has been great interest in automating all kinds of tasks in which human intervention is essential for success. This interest is even greater when the tasks are perfectly reproducible and either require extensive training or are very time-consuming. This project is aimed at finding methods to automate the annotation of medical images. Specifically, it focuses on the delineation of regions of interest (ROIs) in PET images, which are frequently used together with CT images in the field of oncology to delineate volumes affected by cancer. The intention is to help hospitals organize and structure their patients' images and relate them to the clinical notes; we call these the image annotation process and its integration with clinical-note annotation, respectively. This document describes the initial objectives, the steps taken to achieve them, and the difficulties encountered along the way. Of all the techniques in the literature, four segmentation techniques were chosen; according to the literature, two of them had been tested on real patients and the other two only on phantoms. In our case, the tests were performed on PET images of six real patients diagnosed with cancer. The results have been analyzed and are presented.
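The abstract does not name its four techniques, but a common baseline for PET delineation is a fixed relative threshold (e.g., 40% of the maximum uptake); the sketch below uses that technique on a synthetic image, so both the cutoff and the data are illustrative, not the thesis settings:

```python
import numpy as np

def roi_fixed_threshold(pet_slice, fraction=0.40):
    """Boolean ROI mask: voxels above fraction * maximum uptake."""
    return pet_slice >= fraction * pet_slice.max()

# Synthetic 2-D "uptake" slice: background noise plus one hot lesion.
rng = np.random.default_rng(0)
img = rng.normal(1.0, 0.1, (64, 64))
yy, xx = np.mgrid[:64, :64]
img += 8.0 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 30.0)

mask = roi_fixed_threshold(img)
print("ROI voxels:", int(mask.sum()))
```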

Relevance:

10.00%

Publisher:

Abstract:

The present work develops a REST service that transforms natural language sentences into RDF graphs. The generated graphs are directed graphs in which the nodes are formed from the nouns and adjectives of the sentences and the arcs are formed from the verbs. The service is used within the p-medicine project to support the following functionality. Natural language queries: the p-medicine platform currently provides a programmatic interface for issuing SPARQL queries; the developed service makes it possible to generate those queries automatically from natural language sentences. Database annotation using natural language: the p-medicine platform incorporates a tool, developed by the Biomedical Engineering Group of the Universidad Politécnica de Madrid, for annotating RDF databases. These annotations are necessary for the subsequent translation of the databases to a central schema. The annotation process requires the user to manually construct the RDF views to be annotated, which involves displaying the RDF schema graphically and having the user build the views by selecting the necessary classes and relationships. This process is often complex and too difficult for users without a technical background. The system will be incorporated to allow these views to be built using natural language.
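A minimal sketch of the described transformation (nouns and adjectives become nodes, verbs become arcs); a real service would use a proper POS tagger, so the three-word lexicon below only stands in for one to keep the example self-contained, and none of this is the actual p-medicine service code:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/nl#")         # hypothetical namespace
POS = {"patient": "NOUN", "receives": "VERB", "treatment": "NOUN"}

def sentence_to_rdf(sentence):
    g, last_node, pending_verb = Graph(), None, None
    for word in sentence.lower().strip(".").split():
        tag = POS.get(word)
        if tag == "VERB":                        # verbs label the next arc
            pending_verb = EX[word]
        elif tag == "NOUN":                      # nouns/adjectives are nodes
            if last_node is not None and pending_verb is not None:
                g.add((last_node, pending_verb, EX[word]))
                pending_verb = None
            last_node = EX[word]
    return g

print(sentence_to_rdf("Patient receives treatment.").serialize(format="turtle"))
```

The same directed triples can then be rewritten as SPARQL basic graph patterns for the natural-language-query use case.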

Relevance:

10.00%

Publisher:

Abstract:

The work began by gathering information on the different approaches to annotation over time, from manual image annotation, through annotation based on low-level image features such as color and texture, to automatic annotation. Articles on the different algorithms used for automatic image annotation were then studied. Since automatic annotation is a fairly open field, there is a great number of approaches. Taking into account the characteristics of the particular images on which the project would focus, the less suitable algorithms were discarded, whether because of a high computational cost or because they targeted a different type of image, among other reasons. Eventually, a shape-based algorithm (Active Shape Model) was found that was considered likely to work adequately. Basically, the different objects in the image are identified from a base contour generated from sample images, which is then automatically deformed to cover the desired area. Since the images used are all very similar in composition, the algorithm was expected to work well. The starting point was a MATLAB implementation of the algorithm. To begin with, a set of chest radiographs that had already been annotated was obtained. The images contained contour data for both lungs, both clavicles, and the heart. The first step was the creation of a series of MATLAB scripts to:
- read and transform the RAW images, adapting them to the size and position of the annotated contours;
- read the text files containing the contour points and convert them into MATLAB variables;
- join the transformed image with the points and save the result in a format that the algorithm implementation could read.
After obtaining the necessary files, a model was created for each organ, using a small subset of the images for training. The resulting model was tested on several of the remaining images. However, considerable variation was found depending on the image used and the organ detected: in general, the lungs were detected fairly accurately, whereas the clavicles and the heart gave more problems. To improve the method, a new model was trained using half of the available images, with which a significant improvement in the results can be seen.
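For reference, the numerical core of an Active Shape Model update is small; the sketch below shows it in Python rather than the MATLAB implementation used in the project (the tiny model is invented for the demo): candidate landmarks are projected onto the trained shape subspace and each shape parameter is clamped to plus or minus three standard deviations.

```python
import numpy as np

def asm_constrain(y, mean_shape, P, eigvals, k=3.0):
    """Return the closest plausible shape to landmark vector y."""
    b = P.T @ (y - mean_shape)                 # shape parameters of candidate
    limit = k * np.sqrt(eigvals)               # per-mode +/- k sigma bounds
    b = np.clip(b, -limit, limit)
    return mean_shape + P @ b                  # back to landmark space

# Tiny fake model: 4 landmarks (8 coordinates), 2 retained modes.
rng = np.random.default_rng(1)
mean_shape = rng.normal(size=8)
P, _ = np.linalg.qr(rng.normal(size=(8, 2)))   # orthonormal mode matrix
eigvals = np.array([4.0, 1.0])

candidate = mean_shape + rng.normal(scale=3.0, size=8)
print(asm_constrain(candidate, mean_shape, P, eigvals))
```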

Relevance:

10.00%

Publisher:

Abstract:

BACKGROUND: Clinical Trials (CTs) are essential for bridging the gap between experimental research on new drugs and their clinical application. Just like CTs for traditional drugs and biologics have helped accelerate the translation of biomedical findings into medical practice, CTs for nanodrugs and nanodevices could advance novel nanomaterials as agents for diagnosis and therapy. Although there is publicly available information about nanomedicine-related CTs, the online archiving of this information is carried out without adhering to criteria that discriminate between studies involving nanomaterials or nanotechnology-based processes (nano) and CTs that do not involve nanotechnology (non-nano). Finding out whether nanodrugs and nanodevices were involved in a study from CT summaries alone is a challenging task. At the time of writing, CTs archived in the well-known online registry ClinicalTrials.gov cannot easily be told apart as nano or non-nano, even by domain experts, owing to the lack of both a common definition of nanotechnology and of standards for reporting nanomedical experiments and results. METHODS: We propose a supervised learning approach for classifying CT summaries from ClinicalTrials.gov according to whether they fall into the nano or the non-nano categories. Our method involves several stages: i) extraction and manual annotation of CTs as nano vs. non-nano, ii) pre-processing and automatic classification, and iii) performance evaluation using several state-of-the-art classifiers under different transformations of the original dataset. RESULTS AND CONCLUSIONS: The performance of the best automated classifier closely matches that of experts (AUC over 0.95), suggesting that it is feasible to automatically detect the presence of nanotechnology products in CT summaries with a high degree of accuracy. This can significantly speed up the process of finding whether reports on ClinicalTrials.gov might be relevant to a particular nanoparticle or nanodevice, which is essential to discover any precedents for nanotoxicity events or advantages for targeted drug therapy.
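The abstract names the pipeline stages but not the exact features or classifiers, so the following is only a hedged sketch of such a pipeline using scikit-learn (TF-IDF features plus logistic regression); the four toy summaries and labels are made up, whereas the study used manually annotated summaries from ClinicalTrials.gov:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

summaries = [
    "liposomal doxorubicin nanoparticle formulation for solid tumors",
    "iron oxide nanoparticles as MRI contrast agent",
    "behavioral therapy for smoking cessation",
    "randomized trial of dietary counseling in diabetes",
]
labels = [1, 1, 0, 0]                           # 1 = nano, 0 = non-nano

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(summaries, labels)
print(clf.predict_proba(["gold nanoparticle mediated drug delivery"])[:, 1])
```

With a realistic corpus, the evaluation stage would report AUC (e.g., via sklearn.metrics.roc_auc_score) on held-out summaries, which is the figure the abstract quotes as exceeding 0.95.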

Relevance:

10.00%

Publisher:

Abstract:

Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, such as datasets, software, spreadsheets, and text. We applied this model to a case study in which we analysed human metabolite variation by workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?". Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows, and their input data. The RO model is an extendable reference model that can be used by other systems as well.
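The kind of query the abstract describes can be sketched as follows; for brevity this uses W3C PROV-O terms as a stand-in for the workflow-provenance vocabularies of the RO model, and all example.org URIs are invented:

```python
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/ro/")         # hypothetical RO contents

g = Graph()
g.add((EX.run1, RDF.type, PROV.Activity))        # one workflow run
g.add((EX.run1, PROV.used, EX["metabolites.csv"]))
g.add((EX.conclusions, PROV.wasGeneratedBy, EX.run1))

q = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?input ?output WHERE {
  ?run a prov:Activity ; prov:used ?input .
  ?output prov:wasGeneratedBy ?run .
}
"""
for row in g.query(q):                           # which data fed which result?
    print(row.input, "->", row.output)
```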

Relevance:

10.00%

Publisher:

Abstract:

This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). The approach aims at aligning, unifying, and expanding the set of sentiment lexicons available on the web in order to increase the robustness of their coverage. One problem in the automatic unification of differently scored sentiment lexicons is that there are multiple lexical entries whose classification as positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are with a value between -1 and 1, where 1 indicates that the entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated; the UnifiedMetrics procedure implements this computation for CPU and GPU, respectively. Another problem is the high processing time required to compute all the lexical entries in the unification task; the USL approach therefore assigns a subset of lexical entries to each of the 1,344 GPU cores and uses parallel processing to unify 155,802 lexical entries. The analysis conducted with the USL approach shows that the USL contains 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 10 minutes for the 95,430 lexical entries, a threefold reduction in computing time for UnifiedMetrics.
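A small numerical sketch of the unification idea: Pearson-correlate the score vectors of two source lexicons over their shared entries, then use the correlation when combining scores. The correlation-weighted average below is an assumption for illustration; the abstract does not specify the exact weighting used by UnifiedMetrics.

```python
import numpy as np

# Scores for the same six shared lexical entries in two source lexicons.
lex_a = np.array([0.9, 0.7, -0.8, -0.6, 0.1, 0.4])
lex_b = np.array([0.8, 0.6, -0.9, -0.5, 0.0, 0.5])

r = np.corrcoef(lex_a, lex_b)[0, 1]          # Pearson r in [-1, 1]
unified = (lex_a + r * lex_b) / (1 + r)      # assumed weighting scheme
print(f"r = {r:.3f}")
print("unified scores:", np.round(unified, 3))
```

The GPU parallelization then amounts to giving each core its own subset of entries on which to run this per-entry computation.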

Relevance:

10.00%

Publisher:

Abstract:

Sentiment analysis has recently gained popularity in the financial domain thanks to its capability to predict the stock market based on the wisdom of the crowds. Nevertheless, current sentiment indicators are still silos that cannot be combined to gain better insight into the mood of different communities. In this article we propose a Linked Data approach for modelling sentiment and emotions about financial entities. It aims at integrating sentiment information from different communities or providers, and complements existing initiatives such as FIBO. The approach has been validated in the semantic annotation of tweets about several stocks in the Spanish stock market, including their sentiment information.
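A sketch of what one such Linked Data sentiment annotation could look like, assuming the Marl vocabulary often used for this purpose (the namespace, tweet URI, ticker, and scores are all assumptions for illustration):

```python
from rdflib import Graph, Namespace, Literal, RDF, XSD

MARL = Namespace("http://www.gsi.dit.upm.es/ontologies/marl/ns#")
EX = Namespace("http://example.org/")            # hypothetical data

g = Graph()
op = EX["opinion/1"]
g.add((op, RDF.type, MARL.Opinion))
g.add((op, MARL.extractedFrom, EX["tweet/42"]))
g.add((op, MARL.describesObject, EX["stock/TEF"]))   # e.g. a Spanish stock
g.add((op, MARL.hasPolarity, MARL.Positive))
g.add((op, MARL.polarityValue, Literal(0.8, datatype=XSD.float)))

print(g.serialize(format="turtle"))
```

Because every provider's indicator ends up as triples in a shared vocabulary, opinions from different communities can finally be aggregated in a single query.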