942 results for Knowledge Discovery Database


Relevance:

30.00%

Publisher:

Abstract:

This article describes the work performed on the database of questions belonging to the different opinion polls carried out in Spain over the last 50 years. Approximately half of the questions are provided with a title, while the other half remain untitled. The work and the techniques implemented to automatically generate titles for the untitled questions are described. This process is performed on very short texts, and the generated titles are subject to strong stylistic conventions and should be fully grammatical pieces of Spanish.
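
The abstract does not detail the titling technique itself; purely as an illustration of the kind of constraint involved (very short input, conventional style), a naive extractive heuristic might look like the sketch below, where the stop-word list, truncation length and example question are all invented and are not the system described in the abstract.

    # Illustrative only: a naive extractive heuristic for drafting titles of
    # short Spanish question texts. The stop-word list, length limit and
    # example are assumptions, not the method described in the abstract.
    STOP_WORDS = {"de", "la", "el", "los", "las", "en", "y", "que", "a", "su"}

    def draft_title(question: str, max_words: int = 8) -> str:
        """Keep the leading content words of the question as a provisional title."""
        words = [w.strip("¿?.,;:").lower() for w in question.split()]
        content = [w for w in words if w and w not in STOP_WORDS]
        return " ".join(content[:max_words]).capitalize()

    print(draft_title("¿Cuál es su opinión sobre la situación económica actual de España?"))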

Relevance:

30.00%

Publisher:

Abstract:

This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers to reuse services and contents to build new web applications. However, developers face a problem of information flood when searching for appropriate services or resources to combine. To contribute to overcoming this problem, a framework for the discovery of services and resources is defined. In this framework, discovery is performed at three levels: content, service and agent. The content level involves the information available in web resources. The web follows the Representational State Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Cascading Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of Semantic Web standards such as the Resource Description Framework (RDF) makes this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred to as Screen Scraping in the literature, constitutes content discovery in the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained from them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level, feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. Discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Service discovery can also be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings from service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow specifying behaviours for crawling and executing services, resulting in the fulfilment of a high-level goal. Agent rules are plans that introspect the data and services discovered from the web, as well as the knowledge present in service and content discovery rules, in order to anticipate the contents and services to be found on specific web resources. By defining plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. The Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers; there, the framework was used for the discovery and extraction of pieces of news from the web.
Similarly, the Resulta and VulneraNET projects cover the discovery of ideas and of security knowledge on the web, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories on the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks on the web. The main contributions of the thesis are the unified discovery framework, which allows configuring agents to perform automated tasks; a scraping ontology defined for the construction of mappings for scraping web resources; a novel first-order logic rule induction algorithm for the automated construction and maintenance of these mappings from the visual information in web resources; and a common unified model for the discovery of services, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the scraping ontology, extensions of the agent model, and the construction of a base of discovery rules.
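
As an illustration of the content-level idea only (the thesis defines its own rule language and scraping ontology, which this sketch does not reproduce), a discovery rule can be pictured as a mapping from a selector over an HTML representation to a semantic property; the rule format, vocabulary and example HTML below are assumptions.

    # Minimal sketch of content-level discovery: rules map fragments of an HTML
    # representation onto (subject, property, value) triples. The rule format
    # and vocabulary are illustrative assumptions, not the thesis' ontology.
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    DISCOVERY_RULES = [
        {"selector": "h1.headline", "property": "schema:headline"},
        {"selector": "span.author", "property": "schema:author"},
    ]

    def discover_contents(html: str, resource_uri: str):
        soup = BeautifulSoup(html, "html.parser")
        triples = []
        for rule in DISCOVERY_RULES:
            for node in soup.select(rule["selector"]):
                triples.append((resource_uri, rule["property"], node.get_text(strip=True)))
        return triples

    page = "<h1 class='headline'>Budget approved</h1><span class='author'>EFE</span>"
    for triple in discover_contents(page, "http://example.org/news/1"):
        print(triple)

Service probing, as described in the abstract, would then apply analogous service discovery rules to the contents obtained from REST interactions.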

Relevance:

30.00%

Publisher:

Abstract:

Computed tomography (CT) is the reference imaging modality for the study of lung diseases and pulmonary vasculature. Lung vessel segmentation has been widely explored by the biomedical image processing community; however, differentiation of arterial from venous irrigations is still an open problem. Indeed, automatic separation of arterial and venous trees has been considered in recent years as one of the main future challenges in the field. Artery-vein (AV) segmentation would be useful in different medical scenarios and in multiple pulmonary diseases or pathological states, allowing the study of arterial and venous irrigations separately. Features such as density, geometry, topology and size of vessels could be analyzed in diseases that imply vasculature remodeling, even making possible the discovery of new specific biomarkers that remain hidden nowadays. Differentiation between arteries and veins could also enhance or improve methods for processing pulmonary structures. Nevertheless, AV segmentation has been unfeasible in clinical routine until now despite its evident usefulness. The huge complexity of pulmonary vascular trees makes a manual segmentation of both structures unfeasible in realistic time, encouraging the design of automatic or semiautomatic tools to perform the task. However, the lack of properly labeled cases seriously limits the development of AV segmentation systems, where reference standards are necessary in both algorithm training and validation stages. For that reason, the design of synthetic CT images of the lung could overcome these difficulties by providing a database of pseudorealistic cases in a constrained and controlled scenario where each part of the image (including arteries and veins) is unequivocally differentiated. In this Ph.D. thesis we address both interrelated problems. First, the design of a complete framework to automatically generate computational CT phantoms of the human lung is described. Starting from biological and image-based knowledge about the topology and relationships between structures, the system is able to generate synthetic pulmonary arteries, veins and airways using iterative growth methods, which are then merged into a final simulated lung with realistic features. These synthetic cases, together with labeled real CT datasets, have been used as a reference for the development of a fully automatic pulmonary AV segmentation/separation method. The approach comprises a vessel extraction stage using scale-space particles and a subsequent artery-vein classification of those particles using Graph-Cuts (GC), based on arterial/venous similarity scores obtained with a Machine Learning (ML) pre-classification step and on particle connectivity information. Validation of the pulmonary phantoms, based on visual examination and quantitative measurements of intensity distributions, dispersion of structures and relationships between pulmonary air and blood flow systems, shows good correspondence between real and synthetic lungs. The evaluation of the AV segmentation algorithm, based on different strategies to assess the accuracy of vessel particle classification, reveals accurate differentiation between arteries and veins in both real and synthetic cases, opening a wide range of possibilities in the clinical study of cardiopulmonary diseases and in the development of methodological approaches for the analysis of pulmonary images.
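
A hedged sketch of the classification step (not the thesis implementation): artery/vein labelling of pre-extracted vessel particles can be posed as a two-terminal minimum cut, with unary capacities taken from a machine-learning similarity score and pairwise capacities from particle connectivity; all scores and weights below are invented.

    # Illustrative sketch (not the thesis implementation): artery/vein labelling
    # as a two-terminal minimum cut. Unary capacities come from a hypothetical
    # ML score P(artery); pairwise capacities encode particle connectivity.
    import networkx as nx  # third-party

    def label_particles(artery_prob, connectivity, strength=2.0):
        """artery_prob: {particle: P(artery)}; connectivity: {(p, q): weight}."""
        G = nx.DiGraph()
        for p, prob in artery_prob.items():
            G.add_edge("A", p, capacity=prob)        # cost of calling p a vein
            G.add_edge(p, "V", capacity=1.0 - prob)  # cost of calling p an artery
        for (p, q), w in connectivity.items():
            G.add_edge(p, q, capacity=strength * w)  # smoothness between neighbours
            G.add_edge(q, p, capacity=strength * w)
        _, (artery_side, _) = nx.minimum_cut(G, "A", "V")
        return {p: ("artery" if p in artery_side else "vein") for p in artery_prob}

    probs = {"p1": 0.9, "p2": 0.8, "p3": 0.2}       # invented ML scores
    links = {("p1", "p2"): 1.0, ("p2", "p3"): 0.1}  # invented connectivity
    print(label_particles(probs, links))  # -> p1, p2 artery; p3 vein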

Relevance:

30.00%

Publisher:

Abstract:

A rapidly growing area of genome research is the generation of expressed sequence tags (ESTs) in which large numbers of randomly selected cDNA clones are partially sequenced. The collection of ESTs reflects the level and complexity of gene expression in the sampled tissue. To date, the majority of plant ESTs are from nonwoody plants such as Arabidopsis, Brassica, maize, and rice. Here, we present a large-scale production of ESTs from the wood-forming tissues of two poplars, Populus tremula L. × tremuloides Michx. and Populus trichocarpa ‘Trichobel.’ The 5,692 ESTs analyzed represented a total of 3,719 unique transcripts for the two cDNA libraries. Putative functions could be assigned to 2,245 of these transcripts that corresponded to 820 protein functions. Of specific interest to forest biotechnology are the 4% of ESTs involved in various processes of cell wall formation, such as lignin and cellulose synthesis, 5% similar to developmental regulators and members of known signal transduction pathways, and 2% involved in hormone biosynthesis. An additional 12% of the ESTs showed no significant similarity to any other DNA or protein sequences in existing databases. The absence of these sequences from public databases may indicate a specific role for these proteins in wood formation. The cDNA libraries and the accompanying database are valuable resources for forest research directed toward understanding the genetic control of wood formation and future endeavors to modify wood and fiber properties for industrial use.
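
The abstract does not describe how redundant ESTs were collapsed into unique transcripts; purely as a toy illustration of that idea (the actual study would have relied on dedicated sequence-clustering or assembly tools), a greedy clustering by shared k-mers could look like the following, with invented sequences and thresholds.

    # Toy illustration only: collapsing redundant ESTs into "unique transcript"
    # clusters by k-mer similarity. The threshold, k and sequences are invented.
    def kmers(seq, k=8):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def cluster_ests(ests, k=8, min_jaccard=0.5):
        clusters = []  # each cluster is a list of EST ids with similar k-mer sets
        for name, seq in ests.items():
            ks = kmers(seq, k)
            for cluster in clusters:
                rep = kmers(ests[cluster[0]], k)
                jaccard = len(ks & rep) / len(ks | rep)
                if jaccard >= min_jaccard:
                    cluster.append(name)
                    break
            else:
                clusters.append([name])
        return clusters

    ests = {
        "EST1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA",
        "EST2": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",  # near-duplicate of EST1
        "EST3": "TTGACCGGTTCAAGGCTCATTACGGGATCCAAGTCC",
    }
    print(cluster_ests(ests))  # -> [['EST1', 'EST2'], ['EST3']]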

Relevance:

30.00%

Publisher:

Abstract:

The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. This knowledge has subsequently been applied to the construction of artificial antibodies with prescribed specificities, and to many other studies. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. While new sequences are continually added into this database, we have undertaken the task of developing more analytical methods to study the information content of this collection of aligned sequences. New examples of analysis will be illustrated on a yearly basis. The Kabat Database and its applications are freely available at http://immuno.bme.nwu.edu.
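
A classical analysis of such aligned immunoglobulin sequences is the Wu-Kabat variability plot used to delineate CDRs; the sketch below assumes the usual definition, variability = (number of distinct residues at a position) / (frequency of the most common residue), and uses an invented toy alignment rather than Kabat Database entries.

    # Sketch of a Wu-Kabat variability profile over a set of aligned sequences.
    # Variability(pos) = (distinct residues) / (frequency of the most common
    # residue), computed column by column. The toy alignment is invented.
    from collections import Counter

    def wu_kabat_variability(aligned):
        n = len(aligned)
        profile = []
        for column in zip(*aligned):
            counts = Counter(column)
            most_common_freq = counts.most_common(1)[0][1] / n
            profile.append(len(counts) / most_common_freq)
        return profile

    alignment = ["QVQLVQSG", "QVQLVESG", "EVQLVESG", "QVKLQQSG"]
    for pos, v in enumerate(wu_kabat_variability(alignment), start=1):
        print(pos, round(v, 2))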

Relevance:

30.00%

Publisher:

Abstract:

Familial structural rearrangements of chromosomes represent a malformation risk factor that can vary over a large range, making genetic counseling difficult. However, they also represent a powerful tool for increasing knowledge of the genome, particularly through the study of breakpoints and viable imbalances of the genome. We have developed a collaborative database that now includes data on more than 4100 families, from which we have built a web site called HC Forum® (http://HCForum.imag.fr). It offers geneticists assistance in diagnosis and in genetic counseling by assessing the malformation risk with statistical models. For researchers, interactive interfaces display the distribution of chromosomal breakpoints and of the genome regions observed at birth in trisomy or in monosomy. Dedicated tools, including an interactive pedigree, allow electronic submission of data, which are shown anonymously in a discussion forum. After validation, the data are definitively registered in the database together with the email address of the sender, allowing direct location of biological material. HC Forum® thus constitutes a link between diagnostic laboratories and genome research centers, and after 1 year it already has more than 700 users from about 40 different countries.

Relevance:

30.00%

Publisher:

Abstract:

Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org).

Relevance:

30.00%

Publisher:

Abstract:

ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.
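
As a purely illustrative sketch (not the ACTIVITY data model), a sequence-activity relationship can be pictured as a position-weight scoring of a site, and adapting it between measurement systems as a simple linear re-calibration; all weights, parameters and example sites below are invented.

    # Illustrative sketch only: a toy sequence-activity relationship scoring a
    # DNA site with position weights, plus a linear re-calibration standing in
    # for adapting the relationship between measurement systems. Invented data.
    WEIGHTS = {  # position -> {base: contribution to activity}
        0: {"T": 1.0, "A": 0.4, "G": 0.1, "C": 0.1},
        1: {"A": 0.9, "T": 0.5, "G": 0.2, "C": 0.2},
        2: {"T": 0.8, "A": 0.6, "G": 0.3, "C": 0.1},
    }

    def predicted_activity(site: str) -> float:
        return sum(WEIGHTS[i][base] for i, base in enumerate(site[:len(WEIGHTS)]))

    def adapt(activity: float, slope: float = 1.8, offset: float = -0.3) -> float:
        """Map a value measured in system A onto the scale of system B."""
        return slope * activity + offset

    for site in ("TAT", "GCC"):
        a = predicted_activity(site)
        print(site, round(a, 2), round(adapt(a), 2))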

Relevance:

30.00%

Publisher:

Abstract:

Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively.
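
As an illustration of the kind of data the database exposes (the representation below is assumed, not MethDB's actual schema), methylation patterns of individual molecules can be written as G/A/T/C sequences with a marker for 5mC, from which 5mC content and per-position methylation profiles follow directly.

    # Sketch with an assumed data model: methylation patterns written as
    # G/A/T/C sequences with "m" marking 5mC, from which 5mC content and a
    # per-position methylation profile are derived. Patterns are invented.
    PATTERNS = [  # the same genomic region read from three molecules
        "ATmGGTCmGA",
        "ATCGGTCmGA",
        "ATmGGTCCGA",
    ]

    def five_mc_content(pattern: str) -> float:
        """Fraction of cytosines (C or 5mC) that are methylated."""
        cytosines = [b for b in pattern if b in "Cm"]
        return sum(b == "m" for b in cytosines) / len(cytosines)

    def methylation_profile(patterns):
        """Methylation level per position, reported only at cytosine positions."""
        profile = {}
        for pos, column in enumerate(zip(*patterns)):
            if any(b in "Cm" for b in column):
                profile[pos] = sum(b == "m" for b in column) / len(column)
        return profile

    for p in PATTERNS:
        print(p, round(five_mc_content(p), 2))
    print(methylation_profile(PATTERNS))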

Relevance:

30.00%

Publisher:

Abstract:

The present study aims to inventory and analyse the ethnobotanical knowledge about medicinal plants in the Serra de Mariola Natural Park. With respect to traditional uses, local informants reported 93 species with therapeutic uses, 27 used as food, 4 as natural dyes and 13 for handicrafts. We developed a methodology that allowed the location of individuals or vegetation communities with a specific popular use. We prepared a geographic information system (GIS) that included genus, family, scientific nomenclature and common names in Spanish and Catalan for each species. We also classified 39 medicinal uses according to the ATC (Anatomical, Therapeutic, Chemical) classification system. Labiatae (n=19), Compositae (n=9) and Leguminosae (n=6) were the families most represented among the plants used for different purposes in humans. The species with the highest cultural importance index (CI) values were Thymus vulgaris (CI=1.431), Rosmarinus officinalis (CI=1.415), Eryngium campestre (CI=1.325), Verbascum sinuatum (CI=1.106) and Sideritis angustifolia (CI=1.041). The collected plants with the most therapeutic uses were Lippia triphylla (12), Thymus vulgaris and Allium roseum (9 each) and Eryngium campestre (8). The most frequent ATC uses were G04 (urological use), D03 (treatment of wounds and ulcers) and R02 (throat diseases). These results were displayed on a geographic map where each point represented an individual of a given species, and a database was created with the corresponding therapeutic uses. This application is useful for the identification of individuals and the selection of species with specific medicinal properties. Finally, knowledge of these useful plants may be of interest for reviving the local economy and, in some cases, for promoting their cultivation.
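
The abstract does not give the formula behind the CI values; assuming the usual definition of the cultural importance index, CI(species) = (total number of use reports across informants and use categories) / (number of informants), a minimal sketch with invented survey data would be:

    # Sketch assuming the usual definition of the cultural importance index.
    # The use reports below are invented, not the survey data of the study.
    def cultural_importance(use_reports, n_informants):
        """use_reports: list of (informant, species, use_category) records."""
        totals = {}
        for _, species, _ in use_reports:
            totals[species] = totals.get(species, 0) + 1
        return {sp: round(count / n_informants, 3) for sp, count in totals.items()}

    reports = [
        (1, "Thymus vulgaris", "R02"), (1, "Thymus vulgaris", "D03"),
        (2, "Thymus vulgaris", "R02"), (2, "Rosmarinus officinalis", "G04"),
        (3, "Rosmarinus officinalis", "D03"),
    ]
    print(cultural_importance(reports, n_informants=3))
    # -> {'Thymus vulgaris': 1.0, 'Rosmarinus officinalis': 0.667}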

Relevance:

30.00%

Publisher:

Abstract:

The microbiota of multi-pond solar salterns around the world has been analyzed using a variety of culture-dependent and molecular techniques. However, studies addressing the dynamic nature of these systems are very scarce. Here we have characterized the temporal variation, over 1 year, of the microbiota of five ponds with increasing salinity (from 18% to >40%) by means of CARD-FISH and DGGE. Microbial community structure was statistically correlated with several environmental parameters, including ionic composition and meteorological factors, indicating that the microbial community was dynamic, as specific phylotypes appeared only at certain times of the year. In addition to total salinity, microbial composition was strongly influenced by temperature and specific ionic composition. Remarkably, DGGE analyses unveiled the presence of most phylotypes previously detected in hypersaline systems using metagenomics and other molecular techniques, such as the very abundant Haloquadratum and Salinibacter representatives or the recently described low-GC Actinobacteria and Nanohaloarchaeota. In addition, an uncultured group of Bacteroidetes was present along the whole range of salinity. Database searches indicated a previously unrecognized widespread distribution of this phylotype. Single-cell genome analysis of five members of this group suggested a set of metabolic characteristics that could provide competitive advantages in hypersaline environments, such as polymer degradation capabilities, the presence of retinal-binding light-activated proton pumps and arsenate reduction potential. In addition, the fairly high metagenomic fragment recruitment obtained for these single cells in both the intermediate and hypersaline ponds further confirms the DGGE data and points to the generalist lifestyle of this new Bacteroidetes group.

Relevance:

30.00%

Publisher:

Abstract:

Decision support systems (DSS) support business or organizational decision-making activities, which require access to information that is stored internally in databases or data warehouses, and externally on the Web, accessed through Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources of information make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility in query formulation, since their expressive power is limited by the design of the user interface. Natural language interfaces (NLIs) are expected to be the optimal solution; however, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by allowing references to previous questions or their answers (i.e. anaphora, such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis, such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs, namely the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high ambiguity involved in natural language resolution, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.
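
As a toy sketch of the two dialogue phenomena the abstract mentions (not the proposed ontology-based system), anaphora and ellipsis in follow-up questions can be resolved against the previous question and answer kept as context; the patterns, entity names and wording below are invented for illustration.

    # Toy sketch: resolving a pronoun and an elliptical follow-up against the
    # previous question/answer kept as dialogue context. Patterns and entities
    # are invented; the real system relies on domain ontologies.
    import re

    class DialogueContext:
        def __init__(self):
            self.last_question = ""
            self.last_entities = []

        def rewrite(self, question: str) -> str:
            q = question
            # Anaphora: replace "them"/"it" with the entities of the last answer.
            if self.last_entities and re.search(r"\b(them|it)\b", q):
                q = re.sub(r"\b(them|it)\b", ", ".join(self.last_entities), q)
            # Ellipsis: "And to X?" inherits the predicate of the last question.
            m = re.match(r"And to (.+)\?", q)
            if m and self.last_question:
                q = re.sub(r"to [\w ]+", "to " + m.group(1), self.last_question)
            return q

    ctx = DialogueContext()
    ctx.last_question = "Tell me the QTLs related to awn colour in wheat"
    ctx.last_entities = ["QTL1", "QTL2"]
    print(ctx.rewrite("What traits are affected by them?"))
    print(ctx.rewrite("And to glume colour?"))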

Relevance:

30.00%

Publisher:

Abstract:

The cores and dredges described in this report were taken on Cruise 16 of the R.R.S. "Discovery" from January until May 1967 by the National Institute of Oceanography, Wormley, United Kingdom. A total of 73 cores and dredges were recovered and are available through the British Oceanographic Data Centre for sampling and study.