584 results for Annotations
Abstract:
Natural language understanding (NLU) aims to map sentences to their semantic meaning representations. Statistical approaches to NLU normally require fully annotated training data in which each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework that trains Hidden Markov Support Vector Machines (HM-SVMs) without expensive fully annotated data. In particular, our learning approach takes as input a training set of sentences labeled with abstract semantic annotations that encode the underlying embedded structural relations, and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach was tested on the DARPA Communicator data and achieved an F-measure of 93.18%, outperforming previously proposed approaches that train the hidden vector state model or conditional random fields from unaligned data, with relative error reductions of 43.3% and 10.6%, respectively.
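The relative error reduction quoted above can be made concrete with a short calculation. The sketch below assumes that "error" means 1 minus the F-measure, which the abstract does not state explicitly; the function names are illustrative, and the baseline scores it prints are values implied by that assumption, not figures reported in the paper.

```python
def relative_error_reduction(f_baseline: float, f_new: float) -> float:
    """Relative error reduction, taking error as 1 - F-measure (an assumption)."""
    err_base = 1.0 - f_baseline
    err_new = 1.0 - f_new
    return (err_base - err_new) / err_base


def implied_baseline(f_new: float, rer: float) -> float:
    """Invert the formula: the baseline F-measure implied by a reported reduction."""
    err_new = 1.0 - f_new
    return 1.0 - err_new / (1.0 - rer)


if __name__ == "__main__":
    f_new = 0.9318  # F-measure reported in the abstract
    for rer in (0.433, 0.106):  # relative error reductions reported in the abstract
        base = implied_baseline(f_new, rer)
        print(f"reduction {rer:.1%} -> implied baseline F-measure {base:.4f}")
```

Read this way, a large relative error reduction can correspond to a gain of only a few F-measure points when the baseline is already strong.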
Abstract:
Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.
Abstract:
Because metadata that underlies semantic web applications is gathered from distributed and heterogeneous data sources, it is important to ensure its quality (i.e., reduce duplicates, spelling errors, ambiguities). However, current infrastructures that acquire and integrate semantic data have only marginally addressed the issue of metadata quality. In this paper we present our metadata acquisition infrastructure, ASDI, which pays special attention to ensuring that high quality metadata is derived. Central to the architecture of ASDI is a verification engine that relies on several semantic web tools to check the quality of the derived data. We tested our prototype in the context of building a semantic web portal for our lab, KMi. An experimental evaluation comparing the automatically extracted data against manual annotations indicates that the verification engine enhances the quality of the extracted semantic metadata.
Abstract:
Natural language understanding aims to specify a computational model that maps sentences to their semantic meaning representations. In this paper, we propose a novel framework for training statistical models without expensive fully annotated data. In particular, the input to our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied to two statistical models, conditional random fields (CRFs) and hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA Communicator data show that both CRFs and HM-SVMs outperform the baseline approach, the previously proposed hidden vector state (HVS) model, which is also trained on abstract semantic annotations. In addition, the proposed framework outperforms two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with relative error reductions in F-measure of about 25% and 15%, respectively.
Abstract:
More and more researchers have realized that ontologies will play a critical role in the development of the Semantic Web, the next generation Web in which content is not only consumable by humans, but also by software agents. The development of tools to support ontology management including creation, visualization, annotation, database storage, and retrieval is thus extremely important. We have developed ImageSpace, an image ontology creation and annotation tool that features (1) full support for the standard web ontology language DAML+OIL; (2) image ontology creation, visualization, image annotation and display in one integrated framework; (3) ontology consistency assurance; and (4) storing ontologies and annotations in relational databases. It is expected that the availability of such a tool will greatly facilitate the creation of image repositories as islands of the Semantic Web.
Abstract:
The field of Semantic Web Services (SWS) has been recognized as one of the most promising areas of emergent research within the Semantic Web initiative, exhibiting an extensive commercial potential and attracting significant attention from both industry and the research community. Currently, there exist several different frameworks and languages for formally describing a Web Service: Web Ontology Language for Services (OWL-S), Web Service Modelling Ontology (WSMO) and Semantic Annotations for the Web Services Description Language (SAWSDL) are the most important approaches. To the inexperienced user, choosing the appropriate platform for a specific SWS application may prove to be challenging, given a lack of clear separation between the ideas promoted by the associated research communities. In this paper, we systematically compare OWL-S, WSMO and SAWSDL from various standpoints, namely, that of the service requester and provider as well as the broker-based view. The comparison is meant to help users to better understand the strengths and limitations of these different approaches to formalizing SWS, and to choose the most suitable solution for a given application. Copyright © 2015 John Wiley & Sons, Ltd.
Abstract:
The field of Semantic Web Services (SWS) has been recognized as one of the most promising areas of emergent research within the Semantic Web (SW) initiative, exhibiting an extensive commercial potential, and attracting significant attention from both industry and the research community. Currently, there exist several different frameworks and languages for formally describing a Web Service: OWL-S (Web Ontology Language for Services), WSMO (Web Service Modeling Ontology) and SAWSDL (Semantic Annotations for the Web Services Description Language) are the most important approaches. To the inexperienced user, choosing the appropriate paradigm for a specific SWS application may prove to be challenging, given a lack of clear separation between the ideas promoted by the associated research communities. In this paper, we systematically compare OWL-S, WSMO and SAWSDL from various standpoints, namely that of the service requester and provider as well as the broker based view. The comparison is meant to help users to better understand the strengths and limitations of these different approaches to formalising SWS, and to choose the most suitable solution for a given use case. © 2013 IEEE.
Abstract:
Concerns that variola viruses might be used as bioweapons have renewed interest in developing new and safer smallpox vaccines. Variola virus genomes are now widely available, allowing computational characterization of the entire T-cell epitome and the use of such information to develop safe yet effective vaccines. To this end, we identified 124 proteins shared between various species of pathogenic orthopoxviruses, including variola minor and major, monkeypox, cowpox, and vaccinia viruses, and we targeted them for T-cell epitope prediction. We identified 8,106 and 8,483 unique class I and class II MHC-restricted T-cell epitopes, respectively, that are shared by all of these orthopoxviruses. Subsequently, we built an immunological resource, EPIPOX, on the predicted T-cell epitome. EPIPOX is freely available online and has been designed to facilitate reverse vaccinology. To that end, EPIPOX includes key epitope-focused protein annotations: time point of expression, presence of leader and transmembrane signals, and known location on outer membrane structures of the infective viruses. These features can be used to select specific T-cell epitopes suitable for experimental validation, restricted by single MHC alleles, by combinations thereof, or by MHC supertypes.
Abstract:
Background: During alternative splicing, the inclusion of an exon in the final mRNA molecule is determined by nuclear proteins that bind cis-regulatory sequences in a target pre-mRNA molecule. A recent study suggested that the regulatory codes of individual RNA-binding proteins may be nearly immutable between very diverse species such as mammals and insects. The model system Drosophila melanogaster therefore presents an excellent opportunity for the study of alternative splicing due to the availability of quality EST annotations in FlyBase. Methods: In this paper, we describe an in silico analysis pipeline to extract putative exonic splicing regulatory sequences from a multiple alignment of 15 species of insects. Our method, ESTs-to-ESRs (E2E), uses graph analysis of EST splicing graphs to identify mutually exclusive (ME) exons and combines phylogenetic measures, a sliding window approach along the multiple alignment and the Welch’s t statistic to extract conserved ESR motifs. Results: The most frequent 100% conserved word of length 5 bp in different insect exons was “ATGGA”. We identified 799 statistically significant “spike” hexamers, 218 motifs with either a left or right FDR corrected spike magnitude p-value < 0.05 and 83 with both left and right uncorrected p < 0.01. 11 genes were identified with highly significant motifs in one ME exon but not in the other, suggesting regulation of ME exon splicing through these highly conserved hexamers. The majority of these genes have been shown to have regulated spatiotemporal expression. 10 elements were found to match three mammalian splicing regulator databases. A putative ESR motif, GATGCAG, was identified in the ME-13b but not in the ME-13a of Drosophila N-Cadherin, a gene that has been shown to have a distinct spatiotemporal expression pattern of spliced isoforms in a recent study. 
Conclusions: Analysis of phylogenetic relationships and variability of sequence conservation as implemented in the E2E spikes method may lead to improved identification of ESRs. We found that approximately half of the putative ESRs in common between insects and mammals have a high statistical support (p < 0.01). Several Drosophila genes with spatiotemporal expression patterns were identified to contain putative ESRs located in one exon of the ME exon pairs but not in the other.
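The abstract above names Welch's t statistic as one ingredient of the E2E method but does not say how it is applied. As a minimal sketch of the statistic itself, the following compares two samples with unequal variances; the idea of comparing per-position conservation scores inside and outside a window, and the sample values used, are hypothetical illustrations rather than the authors' pipeline.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Sample variance of each group divided by its size (squared standard errors).
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical conservation scores: positions inside a window vs. flanking positions.
    window_scores = [0.95, 0.97, 0.99, 0.96, 0.98, 0.97]
    flank_scores = [0.80, 0.91, 0.75, 0.88, 0.70, 0.85, 0.93]
    t, df = welch_t(window_scores, flank_scores)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

Unlike Student's t test, this form does not pool the two variances, which suits comparisons where a short conserved window and its flanks differ in both size and spread.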
Abstract:
Leishmania infantum is the main etiologic agent of visceral leishmaniasis in the New World. The distribution pattern of leishmaniasis has changed substantially, and the disease now shows an emerging profile on the periphery of large urban centers. Leishmania infection can compromise skin, mucosa and viscera. Only 10% of infected individuals develop the disease; 90% of human infections are asymptomatic. The main factors involved in the development of the disease are the host immune response, the vector species and the parasite's genetic content. Sequencing Leishmania isolates seeks to improve understanding of the symptoms that individuals present. The aim of this study was to evaluate the genetic diversity of Leishmania strains circulating among symptomatic and asymptomatic humans and dogs from endemic areas of Rio Grande do Norte State, and to analyze sand flies from areas endemic for cutaneous and visceral disease. Genetic variability was evaluated using the markers hsp70 and ITS1, and whole genome sequencing was also carried out. The amplified hsp70 and ITS1 sequences were analyzed and assembled using the Phred/Phrap package. The dendrograms were constructed using the same methodology, with 500 bootstrap replicates, followed by inferences on the relationships between Leishmania variants. The sequences of the 20 Brazilian isolates were mapped to the L. infantum JPCM5 reference genome using the Bowtie2 program, and 36 contigs were identified. Information on the valid SNPs was used in the PCA. SNPs were visualized with Geneious 7.1 and IGV. The genome annotations were transferred to their respective chromosomes and displayed in Geneious. The matching sequences of all chromosomes were aligned using Mauve. The phylogenetic trees were calculated according to maximum likelihood and JTT models. Sand flies were analyzed by PCR to identify Leishmania infection, the blood meal source, and sand fly GAPDH.
As a result, hsp70 and ITS1 were not capable of identifying genetic variability among isolates from symptomatic and asymptomatic humans and dogs. Complete sequencing of the 20 Brazilian isolates revealed a strong similarity between the Leishmania strains circulating in Rio Grande do Norte. The isolates collected in the city of Natal from humans and dogs remained grouped in all analyses, suggesting genotypic and geographic proximity among the isolates. Samples isolated in the 1990s had higher genotypic diversity than recently isolated samples. All isolates had 36 chromosomes with variable ploidy among them; no correlation was found between the copy number of the amastin, gp63, A2 and SSG genes and the clinical forms. In general, we did not find a correlation between symptomatic and asymptomatic clinical forms and the gene content of the Brazilian Leishmania isolates. Of the sand flies collected in the upper west region, 34.28% were L. longipalpis, and the main blood meal sources were humans, dogs and chickens.
Abstract:
By Signor Sebastiano Nasolini
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.
Abstract:
Background: An early objective biomarker to predict the severity of hypoxic-ischaemic encephalopathy (HIE) and identify infants suitable for intervention remains elusive. This thesis aims to progress metabolomic markers of HIE through a pipeline of biomarker discovery and validation by employing a novel untargeted mass spectrometry metabolomic method. Methodology: Term infants with perinatal asphyxia were recruited, all having umbilical cord blood (UCB) drawn and biobanked within three hours of birth. HIE was defined by Sarnat score at 24 hours and continuous multichannel EEG. Infant neurodevelopment was assessed at 36-42 months using the Bayley Scales of Infant and Toddler Development Ed. III (BSID-III). Untargeted metabolomic analysis of UCB was performed using direct injection FT-ICR mass spectrometry (DI FT-ICR MS). Putative metabolite annotations and lipid classes were assigned and pathway analysis was performed. Results: Untargeted metabolomic analysis: Thirty enrolled infants were diagnosed with HIE, including 17 mild, 8 moderate, and 5 severe cases. Pathway analysis revealed that ΔHIE was associated with a 50% and 75% perturbation of tryptophan and pyrimidine metabolism, respectively, alongside alterations in amino acid pathways. Significant metabolite alterations were detected in six putatively identified lipid classes: fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterol lipids and prenol lipids. Outcome prediction: Metabolite model scores correlated significantly with outcome, with R=0.429 (model A) and R=0.549 (model B). Model B demonstrates the potential to predict both severe outcome (AUROC of 0.915) and intact survival (AUROC of 0.800). The effect of haemolysis: On average, 5% of polar and 1.5% of non-polar features were altered between paired haemolysed and clean samples.
However, unsupervised multivariate analysis concluded that the preanalytical variability introduced by haemolysis was negligible compared with the inherent biological inter-individual variability. Conclusion: This research has employed untargeted metabolomics to identify potential early cord blood biomarkers of HIE and has performed the technical validation of previously proposed markers.