889 results for heterogeneous data sources


Relevance: 100.00%

Abstract:

Objective. To compare mortality burden estimates based on direct measurement of levels and causes in communities with indirect estimates based on combining health-facility cause-specific mortality structures with community measurement of mortality levels. Methods. Data from sentinel vital registration (SVR) with verbal autopsy (VA) were used to determine the cause-specific mortality burden at the community level in two areas of the United Republic of Tanzania. Proportional cause-specific mortality structures from health facilities were applied to counts of deaths obtained by SVR to produce modelled estimates. The burden was expressed in years of life lost. Findings. A total of 2884 deaths were recorded from health facilities and 2167 from SVR/VAs. In the perinatal and neonatal age group, cause-specific mortality rates were dominated by perinatal conditions and stillbirths in both the community and the facility data. The modelled estimates for chronic causes were very similar to those from SVR/VA. Acute febrile illnesses were coded more specifically in the facility data than in the VA data. Injuries were more prevalent in the SVR/VA data than in the facility data. Conclusion. In this setting, given improved coding practices under the International Classification of Diseases and Related Health Problems, tenth revision (ICD-10), applying facility-based cause structures to community death counts derived from SVR appears to produce reasonable estimates of the cause-specific mortality burden in those aged 5 years and older, as determined directly from VA. For the perinatal and neonatal age group, VA appears to be required. Use of this approach in a nationally representative sample of facilities may produce reliable national estimates of the cause-specific mortality burden for leading causes of death in adults.
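
As a rough illustration of the indirect method this abstract describes, the Python sketch below (with purely hypothetical counts and cause names) applies a facility-derived proportional cause structure to a total community death count obtained from SVR:

```python
# Minimal sketch of the indirect estimation described above (illustrative
# numbers only): facility-based cause proportions are applied to the total
# death count obtained from sentinel vital registration (SVR).

facility_deaths = {"malaria": 410, "tuberculosis": 150, "injuries": 60}  # hypothetical
total_facility = sum(facility_deaths.values())

# Proportional cause-specific mortality structure from the facility data
cause_proportions = {c: n / total_facility for c, n in facility_deaths.items()}

svr_total_deaths = 950  # hypothetical community death count from SVR

# Modelled community estimate: facility proportions scaled to the SVR count
modelled = {c: p * svr_total_deaths for c, p in cause_proportions.items()}
print(modelled)
```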

Relevance: 100.00%

Abstract:

Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data-serving resources, and fault tolerance. This paper describes a design pattern for handling the results of queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or the Internet), through one or more middle tiers, to a relational database residing on a remote server. The solution combines several data retrieval strategies: iterators that traverse the data set while presenting an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
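
One minimal, hypothetical reading of how these strategies compose (this is a sketch, not the Oracle implementation, and `fetch_slice` is an invented stand-in for a remote query slice): the client consumes a plain iterator while a background thread prefetches the next slice of the result set into a two-slot buffer.

```python
import threading
from queue import Queue

def fetch_slice(offset, size):
    """Stand-in for a remote query slice, e.g. an OFFSET/FETCH request."""
    rows = list(range(offset, offset + size))
    return rows if offset < 100 else []   # empty slice signals end of results

def sliced_results(slice_size=25):
    buffer = Queue(maxsize=2)             # at most two slices in flight: double-buffering
    def producer():
        offset = 0
        while True:
            rows = fetch_slice(offset, slice_size)
            buffer.put(rows)
            if not rows:
                break
            offset += slice_size
    threading.Thread(target=producer, daemon=True).start()
    while True:
        rows = buffer.get()               # blocks only if prefetch lags behind
        if not rows:
            return
        yield from rows                   # client sees a simple row iterator

for row in sliced_results():
    pass  # process each row without ever holding the full result set in memory
```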

Relevance: 100.00%

Abstract:

Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The Generative Topographic Mapping (GTM), originally developed for complete, continuous data, is probabilistic in nature and can therefore be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.
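
As a loose illustration of how such a probabilistic extension can factorise (hypothetical names, not the paper's code): the per-point log-likelihood under one mixture component uses Gaussian terms for continuous features and Bernoulli terms for discrete ones, and missing entries are simply skipped, i.e. marginalised out.

```python
import numpy as np

def log_lik(x_cont, x_bin, mu, sigma, p):
    """x_cont/x_bin may contain np.nan for missing values.
    mu, sigma: Gaussian centres/width for continuous features;
    p: Bernoulli means for binary features (all per-component)."""
    ll = 0.0
    for xi, mi in zip(x_cont, mu):                 # Gaussian terms
        if not np.isnan(xi):
            ll += -0.5 * ((xi - mi) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    for xi, pi in zip(x_bin, p):                   # Bernoulli terms
        if not np.isnan(xi):
            ll += xi * np.log(pi) + (1 - xi) * np.log(1 - pi)
    return ll

print(log_lik([1.2, np.nan], [1.0, np.nan], mu=[1.0, 0.0], sigma=0.5, p=[0.7, 0.3]))
```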

Relevance: 100.00%

Abstract:

The use of modern object-oriented methods for designing information systems (IS), both for describing the interrelations within an IS and for describing the enterprise business processes it automates, leads to the need to construct a single, complete IS from a set of local models of that system. This approach gives rise to contradictions caused by inconsistencies between the actions of individual IS developers and, more importantly, between the points of view of individual IS users. Similar contradictions also arise while an IS is in service at an enterprise, because individual business processes change constantly. It should also be noted that the overwhelming majority of ISs today are developed and maintained as sets of separate functional modules, each of which can operate as an independent IS. Integrating separate functional modules into a unified system, however, raises numerous problems of its own: for example, modules may contain functions the enterprise never uses for their intended purpose, and the information and program integration of modules from different vendors is complex. In most cases these contradictions, and the reasons causing them, are a consequence of representing the IS primarily as an equilibrium, steady-state system. In [1], the IS was instead represented as a dynamic multistable system capable of carrying out the following actions:

Relevance: 100.00%

Abstract:

Most machine-learning algorithms are designed for datasets whose features are all of a single type, whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model that handles mixed types with a probabilistic latent variable formalism. This model, called the generalised generative topographic mapping (GGTM), describes the data by type-specific distributions that are conditionally independent given the latent space. It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
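
The sketch below illustrates the general feature-saliency construction that such extensions commonly build on, not the GGTMFS equations themselves (all names are hypothetical): each feature's density is a saliency-weighted mixture of a component-specific term and a shared, component-independent background term, and the saliencies would be re-estimated in the M-step of EM.

```python
import numpy as np

def feature_density(x, rho, comp_pdf, noise_pdf):
    """Per-feature density: the salient part depends on the latent component,
    the non-salient part is a shared background distribution."""
    return rho * comp_pdf(x) + (1.0 - rho) * noise_pdf(x)

# Simple Gaussian pdf factory used for both terms in this toy example
gauss = lambda m, s: (lambda x: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi)))

# rho close to 1 means the feature is treated as informative for this component
print(feature_density(0.4, rho=0.9, comp_pdf=gauss(0.5, 0.1), noise_pdf=gauss(0.0, 2.0)))
```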

Relevance: 100.00%

Abstract:

An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database, as well as the top-level architecture of the database engine, is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared with a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. The application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by a semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.

A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and it has several advantages over existing interfaces. It is optimizable and parallelizable, supports the definition of semantic user views, and supports the interoperability of semantic databases with other data sources such as the World Wide Web and relational and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.

The analysis and high-level design of a system that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use, allowing uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files and others via a common query interface, is presented. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.
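
Purely as an illustration of the querying paradigm described above (the names and structures below are invented, not Sem-ODB's actual interface), a schema-graph query with interlinking conditionals might be represented and evaluated like this, yielding a small category-to-objects "mini-database":

```python
# Hypothetical schema-graph query: nodes are semantic categories with
# predicates (the "conditionals"), edges are relations linking them.
query = {
    "nodes": {
        "Student": lambda obj: obj.get("year") == 2024,
        "Course":  lambda obj: obj.get("credits", 0) >= 3,
    },
    "edges": [("Student", "enrolled_in", "Course")],
}

def run(query, db):
    """db: {category: [objects]}, each object a dict with 'id' and 'links'."""
    keep = {cat: [o for o in db.get(cat, []) if pred(o)]
            for cat, pred in query["nodes"].items()}
    result = {cat: [] for cat in keep}
    for src, rel, dst in query["edges"]:
        for o in keep[src]:
            targets = [t for t in keep[dst]
                       if t["id"] in o.get("links", {}).get(rel, [])]
            if targets:
                result[src].append(o)
                result[dst].extend(t for t in targets if t not in result[dst])
    return result  # itself a small category->objects database

db = {
    "Student": [{"id": 1, "year": 2024, "links": {"enrolled_in": [10]}}],
    "Course":  [{"id": 10, "credits": 4}],
}
print(run(query, db))
```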

Relevance: 100.00%

Abstract:

The research presented in this dissertation comprises several parts which jointly attain the goal of Semantic Distributed Database Management with Applications to Internet Dissemination of Environmental Data.

Part of the research into more effective and efficient data management has been pursued through enhancements to the Semantic Binary Object-Oriented database (Sem-ODB), such as more effective load-balancing techniques for the database engine, and through the use of Sem-ODB as a tool for integrating structured and unstructured heterogeneous data sources. Another part of the research in data management has pursued methods for optimizing queries in distributed databases through the intelligent use of network bandwidth; this has applications in networks that provide varying levels of quality of service or throughput.

The application of the Semantic Binary database model as a tool for relational database modeling has also been pursued. This has resulted in database applications that are used by researchers at the Everglades National Park to store environmental data and remotely sensed imagery.

The areas of research described above have contributed to the creation of TerraFly, which provides for the dissemination of geospatial data via the Internet. The TerraFly research presented herein ranges from the development of TerraFly's back-end database and interfaces, through the features presented to the public (such as autopilot scripts and on-demand data about a point), to applications of TerraFly in the areas of hazard mitigation, recreation, and aviation.

Relevance: 100.00%

Abstract:

The mediator software architecture was designed to provide data integration and retrieval in distributed, heterogeneous environments. Since the initial conceptualization of this architecture, many new technologies have emerged that can facilitate the implementation of this design. The purpose of this thesis was to show that a mediator framework supporting users of mobile devices could be implemented using common software technologies available today. In addition, the prototype was developed with a view to providing a better understanding of what a mediator is and to exposing issues that will have to be addressed in fuller, more robust designs. The prototype developed for this thesis was implemented using various technologies including Java, XML, and the Simple Object Access Protocol (SOAP), among others. SOAP was used to accomplish inter-process communication. In the end, it is expected that more data-intensive software applications will be possible in a world with ever-increasing demands for information.
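
A minimal sketch of the mediator idea, with hypothetical wrapper classes standing in for the source-specific adapters (the thesis prototype used Java and SOAP; Python is used here only for brevity): the mediator exposes one query interface and fans each query out to the wrappers, merging their already-translated results.

```python
class Wrapper:
    """Source-specific adapter; subclasses translate the mediated query."""
    def query(self, q):
        raise NotImplementedError

class SqlWrapper(Wrapper):
    def query(self, q):
        return [{"source": "sql", "match": q}]    # stand-in for a JDBC/ODBC call

class WebServiceWrapper(Wrapper):
    def query(self, q):
        return [{"source": "soap", "match": q}]   # stand-in for a SOAP request

class Mediator:
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def query(self, q):
        results = []
        for w in self.wrappers:                   # uniform access to all sources
            results.extend(w.query(q))
        return results

print(Mediator([SqlWrapper(), WebServiceWrapper()]).query("rainfall"))
```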

Relevance: 100.00%

Abstract:

Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. One of the main challenges in data integration is reconciling semantic differences among data sources. Approaches that have been used to solve this problem can be categorized as schema-based and attribute-based. Schema-based approaches use schema information to identify semantic similarity in data; furthermore, they focus on reconciling types before reconciling attributes. In contrast, attribute-based approaches use statistical and structural information about attributes to identify the semantic similarity of data in different sources. This research examines an approach to semantic reconciliation based on integrating properties expressed at different levels of abstraction or granularity, using the concept of property precedence. Property precedence reconciles the meaning of attributes by identifying similarities between attributes based on what they represent in the real world. In order to use property precedence for semantic integration, we need to identify the precedence of attributes within and across data sources. The goal of this research is to develop and evaluate a method, and accompanying algorithms, that identify precedence relations among attributes and build a property precedence graph (PPG) that can be used to support integration.
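
As a toy illustration (the attribute names are invented), a property precedence graph can be represented as adjacency sets from each more general property to the more specific attributes that refine it across sources:

```python
from collections import defaultdict

# Hypothetical precedence pairs discovered within/across sources:
# each pair reads (general property, more specific refining attribute).
precedence_pairs = [
    ("location", "city"),
    ("location", "postal_code"),
    ("name", "last_name"),
]

def build_ppg(pairs):
    """Build the property precedence graph as an adjacency mapping."""
    graph = defaultdict(set)
    for general, specific in pairs:
        graph[general].add(specific)
    return graph

ppg = build_ppg(precedence_pairs)
print(dict(ppg))  # e.g. {'location': {'city', 'postal_code'}, 'name': {'last_name'}}
```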

Relevance: 100.00%

Abstract:

This article discusses event monitoring options for heterogeneous event sources as found in today's heterogeneous distributed information systems. It follows the central assumption that a fully generic event monitoring solution cannot provide complete support for event monitoring; instead, event-source-specific semantics, such as certain event types or support for certain event monitoring techniques, have to be taken into account. Following from this, the core result of the work presented here is the extension of a configurable event monitoring (Web) service to a variety of event sources. A service approach allows us to trade genericity for the exploitation of source-specific characteristics. It thus delivers results for the areas of SOA, Web services, CEP and EDA.
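
One hedged reading of this configurability, with hypothetical names rather than the paper's actual service interface: a registry maps each event-source type to an adapter that declares which monitoring techniques and event types that source actually supports, rather than forcing one generic mechanism onto all sources.

```python
adapters = {}

def register(source_type):
    """Class decorator: record an adapter for one event-source type."""
    def deco(cls):
        adapters[source_type] = cls
        return cls
    return deco

@register("database")
class DbAdapter:
    techniques = ["triggers"]      # source-specific: events pushed via triggers
    def subscribe(self, event_type, callback): ...

@register("logfile")
class LogAdapter:
    techniques = ["polling"]       # source-specific: periodic polling only
    def subscribe(self, event_type, callback): ...

def monitor(source_type, event_type, callback):
    adapters[source_type]().subscribe(event_type, callback)

monitor("logfile", "line_appended", print)
```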

Relevance: 100.00%

Abstract:

This dissertation comprises three chapters. The first chapter motivates the use of a novel data set combining survey and administrative sources for the study of internal labor migration. By following a sample of individuals from the American Community Survey (ACS) across their employment outcomes over time according to the Longitudinal Employer-Household Dynamics (LEHD) database, I construct a measure of geographic labor mobility that allows me to exploit information about individuals prior to their move. This enables me to explore aspects of the migration decision, such as homeownership and employment status, in ways that have not previously been possible. In the second chapter, I use this data set to test the theory that falling home prices affect a worker’s propensity to take a job in a different metropolitan area from where he is currently located. Employing a within-CBSA and time estimation that compares homeowners to renters in their propensities to relocate for jobs, I find that homeowners who have experienced declines in the nominal value of their homes are approximately 12% less likely than average to take a new job in a location outside of the metropolitan area where they currently reside. This evidence is consistent with the hypothesis that housing lock-in has contributed to the decline in labor mobility of homeowners during the recent housing bust. The third chapter focuses on a sample of unemployed workers in the same data set, in order to compare the unemployment durations of those who find subsequent employment by relocating to a new metropolitan area, versus those who find employment in their original location. Using an instrumental variables strategy to address the endogeneity of the migration decision, I find that out-migrating for a new job significantly reduces the time to re-employment. These results stand in contrast to OLS estimates, which suggest that those who move have longer unemployment durations. This implies that those who migrate for jobs in the data may be particularly disadvantaged in their ability to find employment, and thus have strong short-term incentives to relocate.

Relevance: 100.00%

Abstract:

GridRM is an open and extensible resource monitoring system based on the Global Grid Forum's Grid Monitoring Architecture (GMA). GridRM is not intended to interact with applications; rather, it is designed to monitor the resources that an application may use. This paper focuses on the dynamic driver infrastructure used by GridRM to interact with heterogeneous data sources, such as SNMP or Ganglia agents, and on how it provides a homogeneous view of the underlying heterogeneous data. This paper discusses the local infrastructure and details work on implementing and deploying a number of drivers.
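
A minimal sketch of the driver idea (hypothetical class names and data, not the GridRM API): each driver wraps one agent type and maps its native output onto a common metric schema, so that callers see the same homogeneous view regardless of the underlying agent.

```python
class Driver:
    def fetch(self):
        """Return metrics in the common schema, e.g. {'cpu_load': float}."""
        raise NotImplementedError

class SnmpDriver(Driver):
    def fetch(self):
        raw = {"1.3.6.1.4.1.2021.10.1.3.1": "0.42"}   # stand-in for an SNMP GET
        return {"cpu_load": float(raw["1.3.6.1.4.1.2021.10.1.3.1"])}

class GangliaDriver(Driver):
    def fetch(self):
        raw = {"load_one": 0.42}                      # stand-in for a Ganglia XML poll
        return {"cpu_load": raw["load_one"]}

for driver in (SnmpDriver(), GangliaDriver()):
    print(driver.fetch())                             # identical schema from both agents
```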

Relevance: 100.00%

Abstract:

Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.