872 results for heterogeneous data sources
Abstract:
A data warehouse is a data repository that collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. The data is often stored as materialized views to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select a set of views that minimizes the total query response time, subject to the constraint that the total maintenance time for these materialized views stays within a given bound. This view selection problem is fundamentally different from the view selection problem under a disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated, and two efficient heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to well-studied optimization problems. As a result, an approximate solution to the known optimization problem yields a feasible solution to the original problem. (C) 2001 Elsevier Science B.V. All rights reserved.
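The trade-off described above can be illustrated with a simple greedy rule. This is a minimal sketch under stated assumptions, not either of the paper's algorithms: it assumes each candidate view has a fixed, independent query-time benefit and a positive maintenance cost, and it picks views by benefit per unit of maintenance time until the bound would be exceeded. The view names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateView:
    name: str
    benefit: float           # assumed reduction in total query response time
    maintenance_time: float  # assumed (positive) time to keep the view up to date


def select_views(candidates, maintenance_bound):
    """Greedy selection by benefit per unit of maintenance time.

    Assumes benefits are independent and additive, which real view
    selection problems generally do not satisfy.
    """
    ranked = sorted(
        candidates,
        key=lambda v: v.benefit / v.maintenance_time,
        reverse=True,
    )
    chosen, used = [], 0.0
    for view in ranked:
        # Skip any view that would push total maintenance time past the bound.
        if used + view.maintenance_time <= maintenance_bound:
            chosen.append(view)
            used += view.maintenance_time
    return chosen, used


if __name__ == "__main__":
    views = [
        CandidateView("sales_by_region", benefit=40.0, maintenance_time=10.0),
        CandidateView("sales_by_month", benefit=30.0, maintenance_time=5.0),
        CandidateView("sales_by_product", benefit=25.0, maintenance_time=20.0),
    ]
    selected, total = select_views(views, maintenance_bound=16.0)
    print([v.name for v in selected], total)
```

Real view benefits interact (materializing one view changes the benefit of others), so a rule like this always yields a feasible selection but not necessarily a good one, which is why carefully designed heuristic functions matter.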
Abstract:
This work proposes a real-time data integration solution for public transport. As the range of alternatives offered to public transport users grows, it is important that users know all the available options, based on real-time information, so that they can choose the one that best fits their needs. At the same time, public transport operators should be able to provide all the requested information with minimal effort and minimal changes to the systems they already have in place. This work uses tools that provide a homogeneous view of the various heterogeneous data sources, and this homogeneous view serves as the point of integration between all the data sources and the client applications.
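One common way to obtain such a homogeneous view is an adapter per source that maps each operator's feed onto a shared schema. The sketch below assumes two hypothetical operators with different field names and time encodings; the schema, field names, and functions are illustrative and are not taken from the work itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Departure:
    """Hypothetical common schema exposed to client applications."""
    operator: str
    line: str
    stop: str
    departure: datetime


def from_bus_feed(record):
    # Hypothetical operator A: ISO 8601 timestamps, English keys.
    return Departure("bus_operator", record["route"], record["stop_name"],
                     datetime.fromisoformat(record["time"]))


def from_metro_feed(record, service_day):
    # Hypothetical operator B: minutes after midnight, Portuguese keys.
    return Departure("metro_operator", record["linha"], record["estacao"],
                     service_day + timedelta(minutes=record["minutos"]))


def integrated_departures(stop, bus_records, metro_records, service_day):
    """Merge both feeds into one chronologically ordered view for a stop."""
    unified = [from_bus_feed(r) for r in bus_records]
    unified += [from_metro_feed(r, service_day) for r in metro_records]
    return sorted((d for d in unified if d.stop == stop),
                  key=lambda d: d.departure)
```

Adding a new operator in this design means writing one more adapter, leaving both the existing operator systems and the client applications unchanged.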
Abstract:
The quantity and variety of multimedia content currently available pose a challenge to users, since the space of sources and content to search and choose from exceeds users' time and processing capacity. Selecting information from large heterogeneous datasets according to the user's profile is a complex problem that requires specific tools. Recommender Systems arise in this context: they can suggest to the user items that match their tastes, interests or needs, i.e., their profile, using artificial intelligence methods. The main goal of this thesis is to show that it is possible to recommend multimedia content in a timely manner from the user's personal and social profile, relying exclusively on public, heterogeneous data sources. To this end, a content-based multimedia Recommender System was designed and developed, i.e., one based on the characteristics of the items, the user's personal history and preferences, and the user's social interactions. The recommended multimedia content, i.e., the items suggested to the user, comes from the British television broadcaster, the British Broadcasting Corporation (BBC), and is classified according to BBC programme categories. The user profile is built from the user's history, context, personal preferences and social activities. YouTube is used as the source of personal history, simulating the main source of this type of data, the Set-Top Box (STB). The user's history consists of the set of YouTube videos and BBC programmes the user has watched. The content of YouTube videos is classified according to YouTube's own video categories, which are then mapped to BBC programme categories.
Social information, which comes from the social networks Facebook and Twitter, is collected through the Beancounter platform. The user's social activities are filtered to extract films and series, which are in turn semantically enriched using open linked data repositories. In this case, films and series are classified by IMDb genre and then mapped to BBC programme categories. Finally, the user's context information and explicit preferences, expressed by rating the recommended items, are also taken into account. The system makes real-time recommendations based on the user's activity on Facebook and Twitter, the history of YouTube videos and BBC programmes watched, and explicit preferences. Tests were carried out with five users, and the system's average response time to create the initial set of recommendations was 30 s. Personalised recommendations are generated and updated at the user's explicit request.
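The profile-building and category-mapping steps described above can be sketched as follows. This is a simplified illustration under assumptions: the category mappings, the per-source weights, and the scoring rule (summing the profile weights of a programme's categories) are hypothetical, and the thesis's actual recommendation method may differ.

```python
from collections import Counter

# Hypothetical mappings: the thesis maps YouTube categories and IMDb genres
# to BBC programme categories, but these specific pairs are illustrative.
YOUTUBE_TO_BBC = {"Music": "Music", "Sports": "Sport", "Comedy": "Comedy"}
IMDB_TO_BBC = {"Comedy": "Comedy", "Documentary": "Factual", "Drama": "Drama"}


def build_profile(youtube_history, bbc_history, social_genres,
                  weights=(1.0, 1.0, 0.5)):
    """Accumulate a weighted BBC-category profile from three sources.

    The weights for (YouTube, BBC, social) are assumed values.
    Unmapped categories are ignored.
    """
    w_yt, w_bbc, w_social = weights
    profile = Counter()
    for category in youtube_history:
        if category in YOUTUBE_TO_BBC:
            profile[YOUTUBE_TO_BBC[category]] += w_yt
    for category in bbc_history:
        profile[category] += w_bbc
    for genre in social_genres:
        if genre in IMDB_TO_BBC:
            profile[IMDB_TO_BBC[genre]] += w_social
    return profile


def recommend(profile, catalogue, k=2):
    """Rank programmes by the summed profile weight of their categories."""
    scored = [(sum(profile[c] for c in cats), title)
              for title, cats in catalogue.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Return at most k titles, dropping programmes with no matching category.
    return [title for score, title in scored[:k] if score > 0]
```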
Abstract:
Mathematical and computational models play an essential role in understanding cellular metabolism. They are used as platforms to integrate current knowledge on a biological system and to systematically test and predict the effect of manipulations to such systems. Recent advances in genome sequencing techniques have facilitated the reconstruction of genome-scale metabolic networks for a wide variety of organisms, from microbes to human cells. These models have been successfully used in multiple biotechnological applications. Despite these advances, modeling cellular metabolism still presents many challenges. The aim of this Research Topic is not only to present and consolidate the state of the art in metabolic modeling approaches, but also to push this frontier further through the introduction of innovative solutions. The articles in this e-book address some of the main challenges in the field, including the integration of different modeling formalisms, the integration of heterogeneous data sources into metabolic models, the explicit representation of other biological processes during phenotype simulation, and standardization efforts in the representation of metabolic models and simulation results.
Abstract:
Background: The G1-to-S transition of the cell cycle in the yeast Saccharomyces cerevisiae involves an extensive transcriptional program driven by the transcription factors SBF (Swi4-Swi6) and MBF (Mbp1-Swi6). Activation of these factors ultimately depends on the G1 cyclin Cln3. Results: To determine the transcriptional targets of Cln3 and their dependence on SBF or MBF, we first used DNA microarrays to interrogate gene expression upon Cln3 overexpression in synchronized cultures of strains lacking components of SBF and/or MBF. Second, we integrated this expression dataset with other heterogeneous data sources into a single probabilistic model based on Bayesian statistics. Our analysis produced more than 200 transcription factor-target assignments, validated by ChIP assays and by functional enrichment. Our predictions show higher internal coherence and predictive power than previous classifications. Our results support a model in which SBF and MBF may be differentially activated by Cln3. Conclusions: Integration of heterogeneous genome-wide datasets is key to building accurate transcriptional networks. Through such integration, we provide a reliable transcriptional network for the G1-to-S transition in the budding yeast cell cycle. Our results suggest that improving the reliability of predictions will require feeding our models with more informative experimental data.
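The abstract does not specify the structure of its Bayesian model. As a hedged illustration of how heterogeneous evidence can be combined probabilistically, the sketch below uses naive Bayes in log-odds form, assuming each data source contributes a likelihood ratio and that the sources are conditionally independent given whether a gene is a true target. The function name and inputs are hypothetical.

```python
import math


def posterior_target_probability(prior, likelihood_ratios):
    """Combine independent evidence sources with naive Bayes in log-odds space.

    prior: prior probability (strictly between 0 and 1) that a gene is a
        target of a given transcription factor.
    likelihood_ratios: for each data source (e.g. expression change upon
        Cln3 overexpression, binding data), the ratio
        P(evidence | target) / P(evidence | non-target). Sources are assumed
        conditionally independent given the target status.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        # Each source shifts the log-odds additively under independence.
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Working in log-odds keeps each source's contribution additive, so a ratio above 1 raises the posterior and a ratio below 1 lowers it; real integrative models typically relax the independence assumption.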
Abstract:
GridRM is an open and extensible resource monitoring system, based on the Global Grid Forum's Grid Monitoring Architecture (GMA). GridRM is not intended to interact with applications; rather it is designed to monitor the resources that an application may use. This paper focuses on the dynamic driver infrastructure used by GridRM to interact with heterogeneous data sources, such as SNMP or Ganglia agents, and how it provides a homogeneous view of the underlying heterogeneous data. This paper discusses the local infrastructure and details work implementing and deploying a number of drivers.
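A driver layer of this kind can be pictured as a registry that maps each source type to a function normalising its native records. The sketch below is a hypothetical Python analogue, not GridRM's API: the field names are loosely modelled on SNMP and Ganglia metric names, but the record formats and the `register_driver` decorator are simplified assumptions.

```python
from typing import Callable, Dict, List

# A "driver" converts one native record format into a common set of fields.
Driver = Callable[[dict], dict]

_DRIVERS: Dict[str, Driver] = {}


def register_driver(protocol: str):
    """Decorator that registers a driver at runtime (hypothetical API)."""
    def wrap(func: Driver) -> Driver:
        _DRIVERS[protocol] = func
        return func
    return wrap


@register_driver("snmp")
def snmp_driver(raw: dict) -> dict:
    # Hypothetical SNMP-style payload keyed by object names.
    return {"host": raw["sysName"], "load": float(raw["laLoad1"])}


@register_driver("ganglia")
def ganglia_driver(raw: dict) -> dict:
    # Hypothetical Ganglia-style payload.
    return {"host": raw["HOST"], "load": float(raw["load_one"])}


def homogeneous_view(readings: List[tuple]) -> List[dict]:
    """Normalise (protocol, raw_record) pairs via the registered drivers."""
    view = []
    for protocol, raw in readings:
        driver = _DRIVERS.get(protocol)
        if driver is None:
            raise KeyError(f"no driver registered for {protocol!r}")
        view.append(driver(raw))
    return view
```

Registering drivers at runtime means a new kind of agent can be supported by adding one function, without changing the code that consumes the homogeneous view.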
Abstract:
In this Thesis, we have addressed the difficulties related to the development of context-aware human-machine interaction systems. This issue spans two research fields: interactive systems and contextual information sources. Traditionally, both fields have been integrated through domain-specific vertical solutions that allow interactive systems to access contextual information without having to deal with low-level procedures, but that restrict their interoperability with other applications and heterogeneous data sources. It is therefore essential to boost research on interoperable solutions that provide access to real-world information through homogeneous procedures. This issue closely matches the scenarios of "Ubiquitous Computing" and the "Internet of Things", which point toward a future in which many objects around us will be able to acquire meaningful information about the environment and communicate it to other objects and to people. Since interactive systems are able to get information from their environment through interaction with the user, they can play an important role in this scenario, as they can both consume real-world data and produce enriched information. This Thesis deals with the integration of both fields in this technological scenario. To do so, we first analyzed the most important initiatives for the definition and design of interactive systems, and the main infrastructures for providing information.
Through this study, we propose the W3C SCXML language both for designing interactive systems and for processing the data provided by different context sources. This work shows how the SCXML capabilities for combining information from different modalities can also be used to process and integrate contextual information from different heterogeneous sensor sources, and therefore to develop context-aware interaction systems. Similarly, we present the Sensor Web initiative, and its semantic extension, the Semantic Sensor Web, as an appropriate initiative for providing uniform access to, and delivery of, information for context-aware interactive systems. We then analyzed the challenges of integrating both types of initiatives: SCXML and the (Semantic) Sensor Web. As a result, we identify a number of functionalities that must be implemented to perform this integration. Using technologies that bring flexibility to the implementation process and are based on current recommendations and standards, we implemented a series of experimental developments that integrate the identified functionalities. Finally, to validate our approach, we conducted several experiments in a testing environment simulating a driving scenario. In this setting, an interactive system can access a semantic extension of a Telco platform, based on Sensor Web standards, to acquire contextual information and publish the observations that the user reports to the system. The results showed the feasibility of using the SCXML language to design context-aware interactive systems that need access to advanced sensor platforms to consume and publish information while interacting with the user.
In the same way, it was shown how the use of semantic technologies for querying and publishing sensor data can help any application reuse and share the information published in Sensor Web infrastructures, and thus contribute to realizing the future "Internet of Things" scenario.
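SCXML describes behaviour as a state chart driven by events. As a language-neutral illustration of how user input and sensor context can drive the same state machine, the sketch below uses a plain transition table in Python rather than an SCXML interpreter; the states and event names for the driving scenario are hypothetical.

```python
# Transition table: (state, event) -> next state. States and events are
# hypothetical stand-ins for a driving scenario; "user.*" events come from
# interaction, "sensor.*" events from context sources.
TRANSITIONS = {
    ("idle", "user.request_route"): "guiding",
    ("guiding", "sensor.heavy_traffic"): "rerouting",
    ("rerouting", "system.route_updated"): "guiding",
    ("guiding", "user.report_incident"): "publishing",
    ("publishing", "system.published"): "guiding",
    ("guiding", "user.stop"): "idle",
}


def run(events, state="idle"):
    """Feed events in order and return the sequence of visited states.

    Events with no matching transition leave the state unchanged, similar
    to how an SCXML state discards events it has no transition for.
    """
    trace = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        trace.append(state)
    return trace
```

The point of the sketch is that interaction events and context events are handled by one uniform mechanism, which is the property the thesis exploits when it reuses multimodal SCXML for context integration.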
Abstract:
Over recent years, the unstoppable growth of biomedical data sources, driven mainly by the development of massive data generation techniques (especially in genomics) and the spread of communication and information-sharing technologies, has led biomedical research to rely almost exclusively on the analysis of distributed information and on finding relationships between different data sources.
This is a complex task due to the heterogeneity of the sources used (whether in formats, technologies or domain modeling). Some research projects aim to homogenize these sources so that information can be retrieved in an integrated way, as if it came from a single database. However, no existing work fully automates this process of semantic integration. There are two main approaches to integrating heterogeneous data sources: centralized and distributed. Both approaches require translating data from one model to another. This task requires formalizing the semantic relationships between the underlying models and the central model. These formalizations are commonly called annotations. In the context of semantic information integration, database annotations consist of defining relationships between terms with the same meaning, so that the information can be translated automatically. Depending on the task, these relationships can be between individual concepts or between whole sets of concepts (views). This work focuses on the latter. The European project p-medicine (FP7-ICT-2009-270089) follows the centralized approach. It uses view-based annotations, and its databases are modeled in RDF. The data retrieved from the different sources is translated and integrated into a Data Warehouse. Within the p-medicine platform, the Biomedical Informatics Group (GIB) of the Polytechnic University of Madrid, where I carried out this work, provides a tool for creating the required annotations of the RDF sources. This tool, called Ontology Annotator, is used to create view-based annotations manually. However, although Ontology Annotator displays the data sources graphically, most users find it difficult to use and spend too much time on the annotation process. There is therefore a need for a more advanced tool that can assist users in annotating p-medicine databases.
The aim is to automate the most complex parts of the annotation process and to present the information about RDF database annotations in a natural, understandable way. This tool is called Ontology Annotator Assistant, and this work describes its design and development process, as well as some innovative algorithms created by the author to make it work. The tool provides functionality not previously available in any other tool in the field of automatic annotation and semantic database integration.
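View-based annotation can be illustrated by declaring that a path of properties in a source graph (a small view) corresponds to one property in the central model, and then rewriting source triples accordingly. The sketch below represents RDF triples as plain Python tuples; the `src:` and `dw:` prefixes, the annotations, and the functions are hypothetical and do not reproduce p-medicine's annotation format.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical view-based annotations: a path of source properties (a small
# "view" over the source graph) is declared equivalent to one property of
# the central model.
ANNOTATIONS: Dict[Tuple[str, ...], str] = {
    ("src:hasSample", "src:tumourSizeMm"): "dw:tumourSize",
    ("src:patientAge",): "dw:age",
}


def follow(triples: Set[Triple], start: str, path: Tuple[str, ...]) -> List[str]:
    """Return every node reachable from `start` by following `path`."""
    frontier = [start]
    for prop in path:
        frontier = [o for node in frontier for (s, p, o) in triples
                    if s == node and p == prop]
    return frontier


def translate(triples: Set[Triple], subjects: List[str]) -> Set[Triple]:
    """Rewrite source data into the central vocabulary using the annotations."""
    out = set()
    for subject in subjects:
        for path, central_prop in ANNOTATIONS.items():
            for value in follow(triples, subject, path):
                out.add((subject, central_prop, value))
    return out
```

Mapping a whole path rather than a single term is what distinguishes a view-based annotation from a concept-to-concept one: the two-step `hasSample`/`tumourSizeMm` pattern collapses into one central property.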
Abstract:
Context: Empirical Software Engineering (ESE) replication researchers need to store and manipulate experimental data for several purposes, in particular analysis and reporting. Current research needs also call for the sharing and preservation of experimental data. In previous work, we analyzed Replication Data Management (RDM) needs. A novel concept, called the Experimental Ecosystem, was proposed to address current deficiencies in RDM approaches. The Empirical Ecosystem provides replication researchers with a common framework that transparently integrates local heterogeneous data sources. A typical situation in which the Empirical Ecosystem applies is when several members of a research group, or several collaborating research groups, need to share and access each other's experimental results. However, to apply the Empirical Ecosystem concept and deliver all the promised benefits, it is necessary to analyze the software architectures and tools that can properly support it.
Abstract:
Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.
Abstract:
In this paper we propose algorithms for combining and ranking answers from distributed heterogeneous data sources in the context of a multi-ontology Question Answering task. Our proposal includes a merging algorithm that aggregates, combines and filters ontology-based search results, and three ranking algorithms that sort the final answers according to different criteria: popularity, confidence and semantic interpretation of results. An experimental evaluation on a large-scale corpus indicates that the quality of the search results improves compared with a scenario in which the merging and ranking algorithms were not applied. These collective merging and ranking methods make it possible to answer questions whose answers are distributed across ontologies, while also filtering out irrelevant answers, fusing similar answers, and eliciting the most accurate answer(s) to a question.
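A merge-then-rank step of this kind might look like the sketch below. It is a simplification under stated assumptions, not the paper's algorithms: answers are fused when their normalised labels match, low-confidence answers are filtered out, and the rest are ranked by popularity (the number of ontologies returning them) with confidence as a tie-breaker. Ontology names and the threshold are illustrative.

```python
from collections import defaultdict


def merge_and_rank(answers, min_confidence=0.3):
    """answers: iterable of (ontology, label, confidence) triples.

    Hypothetical scheme: fuse answers whose normalised labels match, drop
    low-confidence ones, then rank by popularity (number of distinct
    ontologies) and break ties by the best confidence, then by label.
    """
    fused = defaultdict(lambda: {"sources": set(), "confidence": 0.0, "label": None})
    for ontology, label, confidence in answers:
        if confidence < min_confidence:
            continue  # filter out answers judged irrelevant
        key = " ".join(label.lower().split())  # case/whitespace-insensitive
        entry = fused[key]
        entry["sources"].add(ontology)
        entry["confidence"] = max(entry["confidence"], confidence)
        entry["label"] = entry["label"] or label  # keep first-seen spelling
    ranked = sorted(fused.values(),
                    key=lambda e: (-len(e["sources"]), -e["confidence"], e["label"]))
    return [(e["label"], len(e["sources"]), e["confidence"]) for e in ranked]
```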
Abstract:
This paper presents our Semantic Web portal infrastructure, which focuses on how to enhance knowledge access in traditional Web portals by gathering and exploiting semantic metadata. Special attention is paid to three important issues that affect the performance of knowledge access: i) high quality metadata acquisition, which concerns how to ensure high quality while gathering semantic metadata from heterogeneous data sources; ii) semantic search, which addresses how to meet the information querying needs of ordinary end users who are not necessarily familiar with the problem domain or the supported query language; and iii) semantic browsing, which concerns how to help users understand and explore the problem domain.
Abstract:
Because the metadata underlying semantic web applications is gathered from distributed and heterogeneous data sources, it is important to ensure its quality (e.g., by reducing duplicates, spelling errors and ambiguities). However, current infrastructures that acquire and integrate semantic data have only marginally addressed metadata quality. In this paper we present our metadata acquisition infrastructure, ASDI, which pays special attention to deriving high-quality metadata. Central to the ASDI architecture is a verification engine that relies on several semantic web tools to check the quality of the derived data. We tested our prototype in the context of building a semantic web portal for our lab, KMi. An experimental evaluation comparing the automatically extracted data against manual annotations indicates that the verification engine improves the quality of the extracted semantic metadata.
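One quality check that a verification engine of this kind might perform is flagging near-duplicate entity names caused by spelling variants. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the normalisation rule and the 0.85 threshold are assumptions, and this is not how ASDI itself is implemented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case, turn dots into spaces and collapse whitespace (assumed rule)."""
    return " ".join(name.lower().replace(".", " ").split())


def likely_duplicates(names, threshold=0.85):
    """Flag pairs of extracted entity names that probably denote one entity."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

A string-similarity check catches only surface variants; resolving genuine ambiguities (two different people with the same name, say) needs the kind of semantic context the abstract attributes to its verification tools.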
Abstract:
Mediation techniques provide interoperability and support integrated query processing across heterogeneous databases. While such techniques facilitate data sharing among different sources, they increase data security risks, such as violations of access control rules. Protecting information through an effective access control mechanism is a basic requirement for interoperation among heterogeneous data sources.
This dissertation first identifies the challenges a mediation system faces in achieving both interoperability and security in an interconnected, collaborative computing environment: (1) context-awareness, (2) semantic heterogeneity, and (3) multiple security policy specifications. Few existing approaches address all three challenges in mediation systems. This dissertation provides a modeling and architectural solution to the problem of mediation security that addresses these challenges. A context-aware, flexible authorization framework was developed to deal with the security challenges faced by mediation systems. The framework consists of two major tasks: specifying security policies and enforcing them. First, the security policy specification provides a generic and extensible method for modeling security policies with respect to the challenges posed by the mediation system. The security policies in this study are specified as 5-tuples followed by a series of authorization constraints, which are identified from the relationships among the different security components in the mediation system. Two essential features of mediation systems, i.e., the relationships among authorization components and interoperability among heterogeneous data sources, are the focus of this investigation. Second, this dissertation supports effective access control in mediation systems while providing uniform access to heterogeneous data sources. Dynamic security constraints are handled in the authorization phase instead of the authentication phase, so the maintenance cost of the security specification can be reduced compared with related solutions.
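The abstract does not list the fields of its 5-tuples, so the sketch below assumes hypothetical ones (role, data source, action, context condition, and permit/deny sign) together with a deny-overrides rule. It illustrates the idea of checking dynamic context at authorization time, each time a request is evaluated, rather than once at authentication.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    # Hypothetical 5-tuple; the dissertation's exact fields are not given here.
    role: str
    source: str
    action: str
    condition: Callable[[Dict], bool]  # context check evaluated per request
    grant: bool                        # True = permit, False = deny


def authorize(policies: List[Policy], role: str, source: str, action: str,
              context: Dict) -> bool:
    """Deny-overrides evaluation; if no permit applies, access is denied."""
    applicable = [p for p in policies
                  if p.role == role and p.source == source
                  and p.action == action and p.condition(context)]
    if any(not p.grant for p in applicable):
        return False
    return any(p.grant for p in applicable)
```

Because the context is passed in on every call, changing a condition (for example, working hours) means editing one policy rather than re-issuing credentials.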
Abstract:
Our surrounding landscape is constantly changing, but recently the rate of change and its effects on the environment have increased considerably. In terms of impact on nature, this development has not been entirely positive; rather, it has caused a decline in valuable species, habitats and biodiversity in general. Although the problem and its high importance are recognised, plans and actions to stop this detrimental development are largely lacking. This stems partly from a lack of genuine will, but also from the difficulty of detecting many valuable landscape components, which are consequently neglected. Various digital environmental data sources can substantially support knowledge extraction, but only if all the relevant background factors are known and the data is processed in a suitable way. This dissertation concentrates on detecting ecologically valuable landscape components using geospatial data sources, and applies this knowledge to support spatial planning and management activities. In other words, the focus is on observing regionally valuable species, habitats and biotopes with GIS and remote sensing data, using suitable analysis methods. Primary emphasis is given to the hemiboreal vegetation zone and the drastic decline of its semi-natural grasslands, which were created over a long history of traditional grazing and management. However, the perspective is largely methodological, so the results can be applied in various contexts. The dissertation emphasises models based on statistical dependencies and correlations among multiple variables, which can extract desired properties from a large mass of initial data. In addition, the included papers combine several datasets from different sources and dates, with the aim of detecting a wider range of environmental characteristics and revealing their temporal dynamics.
The results of the dissertation emphasise the multidimensionality and dynamics of landscapes, which need to be understood in order to recognise their ecologically valuable components. This requires not only knowledge of how these components emerge and an understanding of the data used, but also a focus on minute details that can indicate the existence of fragmented and partly overlapping landscape targets. It also shows that most existing classifications are, as such, too generalised to provide all the required details, although they can be used at various steps along a longer processing chain. The dissertation further emphasises landscape history as an important factor that both creates and preserves ecological values and provides an essential basis for understanding present landscape characteristics. The results are significant both for preserving semi-natural grasslands and for general methodological development, supporting a science-based framework for evaluating ecological values and guiding spatial planning.