965 results for Information discovery


Relevance:

30.00%

Publisher:

Abstract:

This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers to reuse services and contents to build new web applications. However, developers face a problem of information overload when searching for appropriate services or resources to combine. To help overcome this problem, a framework is defined for the discovery of services and resources. In this framework, three levels are defined for performing discovery: the content, service and agent levels. The content level involves the information available in web resources. The web follows the Representational State Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Cascading Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of Semantic Web standards such as Resource Description Framework (RDF) makes this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred to as Screen Scraping in the literature, is the content discovery according to the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained from them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level feature-oriented service methodology is proposed at this level.
This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. The discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Also, service discovery can be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow specifying behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that allow introspecting the discovered data and services from the web and the knowledge present in service and content discovery rules to anticipate the contents and services to be found on specific resources from the web. By the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. The Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web. Similarly, the Resulta and VulneraNET projects cover the discovery of ideas and security knowledge on the web, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories on the web.
The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks on the web. The main contributions of the thesis are the unified framework for discovery, which allows configuring agents to perform automated tasks. Also, a scraping ontology has been defined for the construction of mappings for scraping web resources. A novel first-order logic rule induction algorithm is defined for the automated construction and maintenance of these mappings out of the visual information in web resources. Additionally, a common unified model for the discovery of services is defined, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the Scraping Ontology, extensions of the agent model, and constructing a base of discovery rules.
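
The content-level idea in the abstract above, in which discovery rules map pieces of an HTML representation onto semantic entities, can be illustrated with a minimal sketch. It is not the thesis's rule language or scraping ontology: the rule format (an HTML class name mapped to a property name), the property names, and the `RuleScraper` and `scrape` helpers are all assumptions made for illustration, using only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical discovery rules: HTML class name -> semantic property name.
# Both the rule format and the property names are illustrative assumptions.
RULES = {"headline": "schema:headline", "byline": "schema:author"}


class RuleScraper(HTMLParser):
    """Collects the text of elements whose class matches a discovery rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.current = None  # property awaiting its text content
        self.entity = {}     # extracted property -> value pairs

    def handle_starttag(self, tag, attrs):
        # An attribute without a value is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.rules:
                self.current = self.rules[cls]

    def handle_data(self, data):
        # The first non-blank text after a matching tag becomes the value.
        if self.current and data.strip():
            self.entity[self.current] = data.strip()
            self.current = None


def scrape(html, rules=RULES):
    """Apply the rules to an HTML string and return the described entity."""
    parser = RuleScraper(rules)
    parser.feed(html)
    parser.close()
    return parser.entity


page = '<div><h1 class="headline">Title here</h1><span class="byline">Jane Doe</span></div>'
print(scrape(page))
```

Keeping the rules as data rather than code mirrors the abstract's point that mappings can be constructed and maintained separately from the agent that applies them.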

Relevance:

30.00%

Publisher:

Abstract:

In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.
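
The reconstruction step described above (spectral bins as nodes, linked when their intensities follow a disease-associated pattern) can be sketched as follows. The abstract does not state the linking criterion, so this sketch substitutes an assumed one: two bins are connected when their Pearson correlation in the disease group differs from that in the control group by more than a threshold. The function names, the threshold value, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_network(control, disease, threshold=0.5):
    """Return edges (i, j) between spectral bins.

    control, disease: lists of spectra, one list of bin intensities per subject.
    Assumed criterion: link bins whose correlation differs between the groups.
    """
    n_bins = len(control[0])
    edges = set()
    for i, j in combinations(range(n_bins), 2):
        r_control = pearson([s[i] for s in control], [s[j] for s in control])
        r_disease = pearson([s[i] for s in disease], [s[j] for s in disease])
        if abs(r_disease - r_control) > threshold:
            edges.add((i, j))
    return edges


def degrees(edges, n_bins):
    """Node degrees, a simple structural feature usable by a classifier."""
    deg = [0] * n_bins
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Toy data: three bins, four subjects per group (values are made up).
control = [[1, 1, 5], [2, 2, 6], [3, 3, 7], [4, 4, 8]]
disease = [[1, 4, 5], [2, 3, 6], [3, 2, 7], [4, 1, 8]]
edges = build_network(control, disease)
print(sorted(edges), degrees(edges, 3))
```

Structural features such as the degree vector are the kind of input the abstract suggests feeding to standard data-mining algorithms.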

Relevance:

30.00%

Publisher:

Abstract:

Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.

Relevance:

30.00%

Publisher:

Abstract:

One medium-term strategy for helping in the management of complexity is the introduction of a conceptual complexity component at the very centre of university curricula. In very few areas is the growth of complexity as evident as in the information technologies (ITs), the focus of the work presented in the current paper. We have therefore developed an integrated way of tackling the specific field of information technologies by means of an approach to complexity. This paper describes the guidelines of our research effort, placing an emphasis on informatics. Concepts of complexity based on the system metaphor have been substantially drawn upon in this exercise and are thus presented in some detail. Also described is a didactic experiment conducted by the author and designed to provide a new and integrating approach to university curricula for future professionals. The students' "discovery" of complexity is the focal point of the experiment. The findings of this effort are encouraging and call for the continuation and expansion of this experiment.

Relevance:

30.00%

Publisher:

Abstract:

Traditionally, the use of data analysis techniques has been one of the main ways of discovering knowledge hidden in large amounts of data, collected by experts in different domains. Moreover, visualization techniques have also been used to enhance and facilitate this process. However, there are serious limitations in the process of knowledge acquisition, as it is often a slow, tedious and frequently fruitless process, because human beings find it difficult to understand large datasets. Another major drawback, rarely considered by experts who analyze large datasets, is the involuntary degradation to which they subject the data during analysis tasks, prior to obtaining the final conclusions.
Degradation means that data can lose part of their original properties, and it is usually caused by improper data reduction, thereby altering their original nature and often leading to erroneous interpretations and conclusions that could have serious implications. Furthermore, this fact gains transcendental importance when the data belong to the medical or biological domain, and the lives of people depend on the final decision-making, which is sometimes conducted improperly. This is the motivation of this thesis, which proposes a new visual framework, called MedVir, that combines the power of advanced visualization techniques and data mining to try to solve these major problems in the process of discovering valid information. Thus, the main objective is to make the process of knowledge acquisition that experts face when working with large datasets in different domains easier, more understandable, more intuitive and faster. To achieve this, first, a strong reduction in the size of the data is carried out in order to make the data easier for the expert to manage, while preserving intact, as far as possible, the original properties of the data. Then, effective visualization techniques are used to represent the obtained data, allowing the expert to interact easily and intuitively with the data and to carry out different data analysis tasks, thus visually stimulating their comprehension capacity. Therefore, the underlying objective is to abstract the expert, as far as possible, from the complexity of the original data and to present them with a more understandable version, thus facilitating and accelerating the task of knowledge discovery. MedVir has been successfully applied to, among others, the field of magnetoencephalography (MEG), in a task that consists of predicting Traumatic Brain Injury (TBI) rehabilitation.
The results obtained demonstrate the effectiveness of the framework in accelerating and facilitating the process of knowledge discovery on real-world datasets.

Relevance:

30.00%

Publisher:

Abstract:

One of the challenges facing the current web is the efficient use of all the available information. The Web 2.0 phenomenon has favored the creation of contents by average users, and thus the amount of information that can be found for diverse topics has grown exponentially in recent years. Initiatives such as linked data are helping to build the Semantic Web, in which a set of standards is proposed for the exchange of data among heterogeneous systems. However, these standards are sometimes not used, and there are still plenty of websites that require naive techniques to discover their contents and services. This paper proposes an integrated framework for content and service discovery and extraction. The framework is divided into several layers where the discovery of contents and services is made in a Representational State Transfer (REST) system such as the web. It employs several web mining techniques as well as feature-oriented modeling for the discovery of cross-cutting features in web resources. The framework is used in a scenario of electronic newspapers. An intelligent agent crawls the web for related news, and uses services and visits links automatically according to its goal. This scenario illustrates how the discovery is made at different levels and how the use of semantics helps implement an agent that performs high-level tasks.

Relevance:

30.00%

Publisher:

Abstract:

Transgenic mice carrying heterologous genes directed by a 670-bp segment of the regulatory sequence from the human transferrin (TF) gene demonstrated high expression in brain. Mice carrying the chimeric 0.67kbTF-CAT gene expressed TF-CAT in neurons and glial cells of the nucleus basalis, the cerebrum, corpus callosum, cerebellum, and hippocampus. In brains from two independent TF-CAT transgenic founder lines, copy number of TF-CAT mRNA exceeded the number of mRNA transcripts encoding either mouse endogenous transferrin or mouse endogenous amyloid precursor protein. In two transgenic founder lines, the chloramphenicol acetyltransferase (CAT) protein synthesized from the TF-CAT mRNA was estimated to be 0.10-0.15% of the total soluble proteins of the brain. High expression observed in brain indicates that the 0.67kbTF promoter is a promising director of brain expression of heterologous genes. Therefore, the promoter has been used to express the three common human apolipoprotein E (apoE) alleles in transgenic mouse brains. The apoE alleles have been implicated in the expression of Alzheimer disease, and the human apoE isoforms are reported to interact with different affinities to the brain beta-amyloid and tau protein in vitro. Results of this study demonstrate high expression and production of human apoE proteins in transgenic mouse brains. The model may be used to characterize the interaction of human apoE isoforms with other brain proteins and provide information helpful in designing therapeutic strategies for Alzheimer disease.

Relevance:

30.00%

Publisher:

Abstract:

Many academic libraries are implementing discovery services as a way of giving their users a single comprehensive search option for all library resources. These tools are designed to change the research experience, yet very few studies have investigated the impact of discovery service implementation. This study examines one aspect of that impact by asking whether usage of publisher-hosted journal content changes after implementation of a discovery tool. Libraries that have begun using the four major discovery services have seen an increase in usage of this content, suggesting that for this particular type of material, discovery services have a positive impact on use. Though all discovery services significantly increased usage relative to a no discovery service control group, some had a greater impact than others, and there was extensive variation in usage change among libraries using the same service. Future phases of this study will look at other types of content.

Relevance:

30.00%

Publisher:

Abstract:

Geographic knowledge discovery (GKD) is the process of extracting information and knowledge from massive georeferenced databases. Usually the process is accomplished by two different systems, the Geographic Information Systems (GIS) and the data mining engines. However, the development of those systems is a complex task because it does not follow a systematic, integrated and standard methodology. To overcome these pitfalls, in this paper we propose a modeling framework that addresses the development of the different parts of a multilayer GKD process. The main advantages of our framework are that: (i) it reduces the design effort, (ii) it improves the quality of the systems obtained, (iii) it is independent of platforms, (iv) it facilitates the use of data mining techniques on georeferenced data, and finally, (v) it improves the communication between different users.

Relevance:

30.00%

Publisher:

Abstract:

This layer is a georeferenced raster image of the historic paper map entitled: Chart of the world on Mercators projection : exhibiting all the new discoveries to the present time, with the tracks of the most distinguished navigators since the year 1700 carefully collected from the best charts, maps, voyages, &c. extant and regulated from the accurate astronomical observations made in three voyages performed under the command of Captn. James Cook in the years 1768, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 & 80, compiled and published by A. Arrowsmith, geographer; by permission of Simon McTavish Esq[r] is correctly delineated the discoveries of Mr. McKenzie laid down from his original journal in the year 1789. It was published by A. Arrowsmith, April 1, 1790. Scale [ca. 1:20,000,000]. This layer is image 1 of 8 total images of the seven sheet source map. Covers portions of eastern Asia, Siberia, Russia, Pacific Islands, and western portions of Canada and the United States including Alaska. The image inside the map neatline is georeferenced to the surface of the earth and fit to a non-standard 'World Mercator' projection, with the central meridian at 180 degrees west. All map collar and inset information is also available as part of the raster image, including any inset maps, profiles, statistical tables, directories, text, illustrations, index maps, legends, or other information associated with the principal map. Note: The central meridian of this map is not the same as the Prime Meridian and may wrap the International Date Line or overlap itself when displayed in GIS software. This map shows features such as drainage, cities and other human settlements, territorial boundaries, shoreline features, and more. Relief shown by hachures. Depths shown by soundings. Includes routes, locations, and dates of James Cook's voyages. 
This layer is part of a selection of digitally scanned and georeferenced historic maps from the Harvard Map Collection and the Harvard University Library as part of the Open Collections Program at Harvard University project: Organizing Our World: Sponsored Exploration and Scientific Discovery in the Modern Age. Maps selected for the project correspond to various expeditions and represent a range of regions, originators, ground condition dates, scales, and purposes.

Relevance:

30.00%

Publisher:

Abstract:

This layer is a georeferenced raster image of the historic paper map entitled: Chart of the world on Mercators projection : exhibiting all the new discoveries to the present time, with the tracks of the most distinguished navigators since the year 1700 carefully collected from the best charts, maps, voyages, &c. extant and regulated from the accurate astronomical observations made in three voyages performed under the command of Captn. James Cook in the years 1768, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 & 80, compiled and published by A. Arrowsmith, geographer; by permission of Simon McTavish Esq[r] is correctly delineated the discoveries of Mr. McKenzie laid down from his original journal in the year 1789. It was published by A. Arrowsmith, April 1, 1790. Scale [ca. 1:20,000,000]. This layer is image 2 of 8 total images of the seven sheet source map. Covers portions of Europe, Northern Africa, and Asia. The image inside the map neatline is georeferenced to the surface of the earth and fit to the 'World Mercator' projection. All map collar and inset information is also available as part of the raster image, including any inset maps, profiles, statistical tables, directories, text, illustrations, index maps, legends, or other information associated with the principal map. This map shows features such as drainage, cities and other human settlements, territorial boundaries, shoreline features, and more. Relief shown by hachures. Depths shown by soundings. Includes routes, locations, and dates of James Cook's voyages. This layer is part of a selection of digitally scanned and georeferenced historic maps from the Harvard Map Collection and the Harvard University Library as part of the Open Collections Program at Harvard University project: Organizing Our World: Sponsored Exploration and Scientific Discovery in the Modern Age. 
Maps selected for the project correspond to various expeditions and represent a range of regions, originators, ground condition dates, scales, and purposes.

Relevance:

30.00%

Publisher:

Abstract:

This layer is a georeferenced raster image of the historic paper map entitled: Chart of the world on Mercators projection : exhibiting all the new discoveries to the present time, with the tracks of the most distinguished navigators since the year 1700 carefully collected from the best charts, maps, voyages, &c. extant and regulated from the accurate astronomical observations made in three voyages performed under the command of Captn. James Cook in the years 1768, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 & 80, compiled and published by A. Arrowsmith, geographer; by permission of Simon McTavish Esq[r] is correctly delineated the discoveries of Mr. McKenzie laid down from his original journal in the year 1789. It was published by A. Arrowsmith, April 1, 1790. Scale [ca. 1:20,000,000]. This layer is image 3 of 8 total images of the seven sheet source map. Covers portions of South America, the South Pacific and the South Atlantic. The image inside the map neatline is georeferenced to the surface of the earth and fit to the 'World Mercator' projection. All map collar and inset information is also available as part of the raster image, including any inset maps, profiles, statistical tables, directories, text, illustrations, index maps, legends, or other information associated with the principal map. This map shows features such as drainage, cities and other human settlements, territorial boundaries, shoreline features, and more. Relief shown by hachures. Depths shown by soundings. Includes routes, locations, and dates of James Cook's voyages. This layer is part of a selection of digitally scanned and georeferenced historic maps from the Harvard Map Collection and the Harvard University Library as part of the Open Collections Program at Harvard University project: Organizing Our World: Sponsored Exploration and Scientific Discovery in the Modern Age. 
Maps selected for the project correspond to various expeditions and represent a range of regions, originators, ground condition dates, scales, and purposes.