937 results for data integration


Relevance: 60.00%

Publisher:

Abstract:

Query processing over the Internet involving autonomous data sources is a major task in data integration. It requires estimating the costs of candidate queries in order to select the one with the minimum cost. In this context, the cost of a query is affected by three factors: network congestion, server contention state, and the complexity of the query. In this paper, we study the effects of both network congestion and server contention state on the cost of a query. We refer to these two factors together as system contention states. We present a new approach to determining the system contention states by clustering the costs of a sample query. For each system contention state, we construct two cost formulas, for unary and join queries respectively, using the multiple regression process. When a new query is submitted, its system contention state is first estimated using either the time slides method or the statistical method. The cost of the query is then calculated using the corresponding cost formulas. The estimated cost of the query is further adjusted to improve its accuracy. Our experiments show that our methods can produce quite accurate cost estimates for queries submitted to remote data sources over the Internet.
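The two-stage scheme described above — cluster the observed costs of a sample query to identify system contention states, then fit a regression cost formula per state — can be sketched as follows. This is a minimal illustration with hypothetical numbers: a naive 1-D 2-means stands in for the paper's clustering step, and a one-variable least-squares fit stands in for its multiple regression.

```python
from statistics import mean

def cluster_costs(costs, iters=20):
    """Naive 1-D 2-means: split sample query costs into two system
    contention states (a stand-in for the paper's clustering step)."""
    centers = [min(costs), max(costs)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for c in costs:
            i = min(range(len(centers)), key=lambda j: abs(c - centers[j]))
            groups[i].append(c)
        centers = [mean(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def fit_cost_formula(sizes, costs):
    """One-variable least squares, cost = a + b*size, for a single
    contention state (the paper fits multiple-regression formulas
    separately for unary and join queries)."""
    n = len(sizes)
    sx, sy = sum(sizes), sum(costs)
    sxx = sum(x * x for x in sizes)
    sxy = sum(x * y for x, y in zip(sizes, costs))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical sample-query costs (ms): two contention states emerge
centers, groups = cluster_costs([12.0, 22.0, 32.0, 55.0, 105.0, 155.0])

# Hypothetical (result size, cost) pairs observed in the low state
low_state = [(100, 12.0), (200, 22.0), (300, 32.0)]
a, b = fit_cost_formula(*zip(*low_state))  # cost ≈ a + b * size
```

A new query would first be assigned to the nearest contention state, then costed with that state's formula.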

Relevance: 60.00%

Publisher:

Abstract:

Web databases are now pervasive. Such a database can be accessed only via its query interface, usually an HTML query form. Extracting Web query interfaces, which creates a formal representation of a query form by extracting the set of query conditions in it, is a critical step in data integration across multiple Web databases. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by the tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by the various short texts used in labels, form elements and their properties. We have implemented the proposed approach, and our experimental results show that it is highly effective.
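The grouping step above relies on two signals: structural proximity in the tag tree and semantic similarity of short texts. The sketch below illustrates only the second signal, using a simple token-overlap similarity and hypothetical labels and element names; the real approach additionally weighs tag nesting.

```python
def text_similarity(a, b):
    """Jaccard overlap of word tokens in two short texts -- a crude
    stand-in for the paper's semantic-similarity measure."""
    ta = set(a.lower().split())
    tb = set(b.lower().replace("_", " ").split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def pair_labels(labels, element_names):
    """Pair each label with the most similar form-element name.
    The real approach also uses structural proximity in the form's
    tag tree, which this sketch omits."""
    return {lab: max(element_names, key=lambda e: text_similarity(lab, e))
            for lab in labels}

# Hypothetical flight-search form
labels = ["Departure city", "Arrival city", "Number of passengers"]
elements = ["departure_city", "arrival_city", "num passengers"]
pairs = pair_labels(labels, elements)
```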

Relevance: 60.00%

Publisher:

Abstract:

Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching, which typically exploits different features of schemas; relying on any single feature is not sufficient. We propose an evidential approach to combining multiple matchers using the Dempster-Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence into a combined mass function that represents the overall level of confidence, taking into account the match results of the different matchers. Our combination mechanism requires no weighting parameters, and hence no setting or tuning of them. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally, it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.
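The combination step above is Dempster's rule of combination. A minimal sketch with hypothetical mass functions from two matchers over candidate correspondences for one source attribute; the frame element ("theta") carries the mass assigned to ignorance:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions given
    as {frozenset_of_hypotheses: mass}; conflicting (disjoint) mass
    is normalised away."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical evidence from two matchers about which target
# attribute a source attribute 'author' corresponds to
theta = frozenset({"writer", "creator"})   # the whole frame: ignorance
m_name_matcher = {frozenset({"writer"}): 0.7, theta: 0.3}
m_type_matcher = {frozenset({"writer"}): 0.5,
                  frozenset({"creator"}): 0.2, theta: 0.3}
combined = dempster_combine(m_name_matcher, m_type_matcher)
```

The combined mass on each candidate then drives the top-k selection, with no per-matcher weights to tune.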

Relevance: 60.00%

Publisher:

Abstract:

The demanding pace of innovation in biomedical applications has driven the evolution of information technologies over recent decades. The challenges of efficiently managing, integrating, analysing and interpreting the data produced by the latest hardware and software technologies require a concerted effort. From gene-sequencing hardware to electronic patient records and drug research, the ability to accurately explore data from these environments is vital to the understanding of human health. This thesis covers the discussion and development of better informatics strategies to overcome these challenges, mainly in the context of service composition, including flexible data-integration techniques, such as warehousing or federation, and advanced interoperability techniques, such as web services or Linked Data. Service composition is presented as a general ideal, aimed at data integration and software interoperability. Regarding the latter, this research focused on the field of pharmacovigilance, in the context of the European EU-ADR project. The contributions to this project, a new interoperability standard and a workflow execution engine, underpin the success of the EU-ADR Web Platform, a platform for conducting advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research aimed to overcome the challenges of integrating distributed, heterogeneous data in the field of the human variome. A new solution was created, WAVe - Web Analyses of the Variome, which provides a rich collection of genetic variation data through an innovative Web interface and an advanced API.
The development of these strategies revealed two clear opportunities in biomedical software: improving the software implementation process through rapid development techniques, and improving data quality and availability through the adoption of the semantic web paradigm. The COEUS platform crosses the boundaries of integration and interoperability, providing methodologies for flexible data acquisition and translation, as well as a layer of interoperable services for semantically exploring the aggregated data. Combining rapid development techniques with the richness of a "Semantic Web in a box" perspective, COEUS is a pioneering approach, enabling the development of the next generation of biomedical applications.

Relevance: 60.00%

Publisher:

Abstract:

This dissertation addresses the use of surface and borehole geophysical methods in the characterisation of fluids, comprising several distinct approaches according to the topic under analysis. The subject proposed for this dissertation thus concerns the joint use of mechanical drilling and geophysical prospecting in the characterisation of fluids, explored through three distinct themes, namely Hydrogeology, Geothermics and the Environment. This dissertation, based on the author's professional experience, follows a reasoned and structured guiding thread towards a global view of how the integrated interpretation of mechanical boreholes and geophysical methods makes a strong contribution to improving the geological, hydrogeological or environmental knowledge of a region. The themes addressed were selected not only for their unquestionable importance, given the strategic interest they currently attract, but also for the different ways of processing and interpreting the acquired geophysical information according to its objectives, so as to highlight particularities in the respective interpretation and data integration.

Relevance: 60.00%

Publisher:

Abstract:

The present work aims to develop a hydrogeomechanical approach to the Caldas da Cavaca hydromineral system rock mass (Aguiar da Beira, NW Portugal) and to contribute to a better understanding of the hydrogeological conceptual site model. Several types of data, namely geology, hydrogeology, rock and soil geotechnics, borehole hydraulics and hydrogeomechanics, were retrieved from three rock slopes (Lagoa, Amores and Cancela). To accomplish a comprehensive analysis and rock engineering conceptualisation of the site, a multi-technical approach was used, comprising field and laboratory techniques, hydrogeotechnical mapping, hydrogeomechanical zoning, and hydrogeomechanical classification schemes and indexes. In addition, hydrogeomechanical data analysis and assessment techniques, such as the Hydro-Potential (HP)-Value technique, the JW Joint Water Reduction index and the Hydraulic Classification (HC) System, were applied to the rock slopes. The hydrogeomechanical zone HGMZ 1 of the Lagoa slope showed the highest hydraulic conductivities together with the poorest rock mass quality, followed by hydrogeomechanical zone HGMZ 2 of the Lagoa slope, with poor to fair rock mass quality and lower hydraulic parameters. The Amores slope had fair to good rock mass quality and the lowest hydraulic conductivity. The hydrogeomechanical zone HGMZ 3 of the Lagoa slope and the hydrogeomechanical zones HGMZ 1 and HGMZ 2 of the Cancela slope had fair to poor rock mass quality but were completely dry. Geographical Information Systems (GIS) mapping technologies were used in the overall hydrogeological and hydrogeomechanical data integration in order to improve the hydrogeological conceptual site model.

Relevance: 60.00%

Publisher:

Abstract:

To identify common variants influencing body mass index (BMI), we analyzed genome-wide association data from 16,876 individuals of European descent. After previously reported variants in FTO, the strongest association signal (rs17782313, P = 2.9 × 10^-6) mapped 188 kb downstream of MC4R (melanocortin-4 receptor), mutations of which are the leading cause of monogenic severe childhood-onset obesity. We confirmed the BMI association in 60,352 adults (per-allele effect = 0.05 Z-score units; P = 2.8 × 10^-15) and 5,988 children aged 7-11 (0.13 Z-score units; P = 1.5 × 10^-8). In case-control analyses (n = 10,583), the odds for severe childhood obesity reached 1.30 (P = 8.0 × 10^-11). Furthermore, we observed overtransmission of the risk allele to obese offspring in 660 families (P (pedigree disequilibrium test average; PDT-avg) = 2.4 × 10^-4). The SNP location and patterns of phenotypic associations are consistent with effects mediated through altered MC4R function. Our findings establish that common variants near MC4R influence fat mass, weight and obesity risk at the population level, and reinforce the need for large-scale data integration to identify variants influencing continuous biomedical traits.

Relevance: 60.00%

Publisher:

Abstract:

A simple, low-cost concentric capillary nebulizer (CCN) was developed and evaluated for ICP spectrometry. The CCN could be operated at sample uptake rates of 0.050-1.00 ml min^-1 and under oscillating and non-oscillating conditions. Aerosol characteristics of the CCN were studied using a laser Fraunhofer diffraction analyzer. Solvent transport efficiencies and transport rates, detection limits, and short- and long-term stabilities were evaluated for the CCN with a modified cyclonic spray chamber at different sample uptake rates. The Mg II (280.2 nm)/Mg I (285.2 nm) ratio was used for matrix effect studies. Results were compared to those with conventional nebulizers: a cross-flow nebulizer with a Scott-type spray chamber, a GemCone nebulizer with a cyclonic spray chamber, and a Meinhard TR-30-K3 concentric nebulizer with a cyclonic spray chamber. Transport efficiencies of up to 57% were obtained for the CCN. For the elements tested, short- and long-term precisions and detection limits obtained with the CCN at 0.050-0.500 ml min^-1 are similar to, or better than, those obtained on the same instrument using the conventional nebulizers (at 1.0 ml min^-1). The depressive and enhancement effects of the easily ionizable element Na, sulfuric acid, and dodecylamine surfactant on analyte signals with the CCN are similar to, or better than, those obtained with the conventional nebulizers. However, clogging of the capillary was observed when sample solutions with high dissolved solids were nebulized for more than 40 min. The effects of data acquisition and data processing on detection limits were studied using inductively coupled plasma atomic emission spectrometry. The study examined the effects of different detection limit approaches, data integration modes, regression modes, the standard concentration range and the number of standards, sample uptake rate, and integration time.
All the experiments followed the same protocols. Three detection limit approaches were examined: the IUPAC method, the residual standard deviation (RSD) method, and the signal-to-background ratio and relative standard deviation of the background (SBR-RSDB) method. The study demonstrated that the detection limit approach, the integration mode, the regression method, and the sample uptake rate can all affect detection limits. It also showed that the different approaches give different detection limits and that some methods (for example, RSD) are susceptible to the quality of the calibration curves. Multicomponent spectral fitting (MSF) gave the best results among the three integration modes: peak height, peak area, and MSF. The weighted least squares method showed the ability to obtain better-quality calibration curves. Although an effect of the number of standards on detection limits was not observed, multiple standards are recommended because they provide more reliable calibration curves. Increasing the sample uptake rate and the integration time could improve detection limits. However, an improvement in detection limits with increased integration time was not observed, because the auto integration mode was used.
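For reference, the IUPAC and SBR-RSDB approaches named above are commonly written as DL = k·s_blank/slope and DL = k·0.01·RSDB·c/SBR (RSDB in percent, c the analyte concentration of the standard, k typically 3). The sketch below uses those commonly cited forms; treat the exact constants as assumptions rather than the thesis's own definitions, and the input values as hypothetical.

```python
def dl_iupac(sd_blank, slope, k=3):
    """IUPAC-style detection limit: k times the standard deviation
    of the blank divided by the calibration slope."""
    return k * sd_blank / slope

def dl_sbr_rsdb(conc, sbr, rsdb_percent, k=3):
    """SBR-RSDB detection limit (after Boumans, as commonly cited):
    DL = k * 0.01 * RSDB * c / SBR, with RSDB in percent."""
    return k * 0.01 * rsdb_percent * conc / sbr

# Hypothetical values: blank sd 0.002 and slope 0.5 (signal per unit
# concentration); a 1.0 ppm standard giving SBR = 50 with RSDB = 1%
limit_iupac = dl_iupac(0.002, 0.5)
limit_sbr = dl_sbr_rsdb(1.0, 50.0, 1.0)
```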

Relevance: 60.00%

Publisher:

Abstract:

Many online services access a large number of autonomous data sources while needing to meet diverse user requirements. It is essential for these services to achieve semantic interoperability among the entities exchanging information. In the presence of a growing number of proprietary business processes, heterogeneous data standards, and diverse user requirements, it is critical that the services be implemented using adaptable, extensible, and scalable technology. The COntext INterchange (COIN) approach, inspired by the similar goals of the Semantic Web, provides a robust solution. In this paper, we describe how COIN can be used to implement dynamic online services in which semantic differences are reconciled on the fly. We show that COIN is flexible and scalable by comparing it with several conventional approaches. With a given ontology, the number of conversions in COIN is quadratic in the semantic aspect that has the largest number of distinctions. These semantic aspects are modeled as modifiers in a conceptual ontology; in most cases the number of conversions is linear in the number of modifiers, which is significantly smaller than in the traditional hard-wired middleware approach, where the number of conversion programs is quadratic in the number of sources and data receivers. In the example scenario in the paper, the COIN approach needs only 5 conversions to be defined, while traditional approaches require 20,000 to 100 million. COIN achieves this scalability by automatically composing all the comprehensive conversions from a small number of declaratively defined sub-conversions.
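The scalability claim above can be made concrete with a toy count. The numbers below are hypothetical, not the paper's scenario: a hard-wired approach needs one conversion program per ordered pair of sources/receivers, whereas COIN is at worst quadratic only in the single modifier with the most distinct values.

```python
def pairwise_conversions(n_parties):
    """Hard-wired middleware: one conversion program per ordered
    pair of sources/receivers."""
    return n_parties * (n_parties - 1)

def coin_worst_case(max_distinctions):
    """COIN: at worst quadratic in the single semantic aspect
    (modifier) with the most distinct values; typically linear in
    the number of modifiers."""
    return max_distinctions * (max_distinctions - 1)

# Hypothetical scale: 150 sources/receivers, largest modifier with
# 5 distinct values (illustrative numbers only)
hard_wired = pairwise_conversions(150)  # grows quadratically with parties
coin = coin_worst_case(5)               # independent of party count
```

Adding a new source to the hard-wired system adds roughly 2·n new converters; in COIN it adds none, only declarations of the source's context.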

Relevance: 60.00%

Publisher:

Abstract:

In this paper, we present a P2P-based database sharing system that provides information sharing capabilities through keyword-based search techniques. Our system requires neither a global schema nor schema mappings between different databases, and our keyword-based search algorithms are robust in the presence of frequent changes in the content and membership of peers. To facilitate data integration, we introduce a keyword join operator that combines partial answers containing different keywords into complete answers. We also present an efficient algorithm that optimizes the keyword join operations for partial answer integration. Our experimental study on both real and synthetic datasets demonstrates the effectiveness of our algorithms and the efficiency of the proposed query processing strategies.
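A keyword join, as described above, combines partial answers that agree on a join value and whose keyword coverage together spans the whole query. A minimal sketch with hypothetical partial answers, each modelled as a pair of join value and covered keywords (the paper's operator works over richer tuples):

```python
def keyword_join(partials_a, partials_b, keywords):
    """Sketch of a keyword join: two partial answers combine when
    they agree on the join value; the result is complete when their
    merged coverage equals the query's keyword set."""
    complete = []
    for va, ka in partials_a:
        for vb, kb in partials_b:
            if va == vb and (ka | kb) == set(keywords):
                complete.append((va, ka | kb))
    return complete

# Hypothetical query and partial answers keyed by a shared tuple id
query = ["smith", "database"]
from_authors = [("p42", {"smith"}), ("p77", {"smith"})]
from_titles = [("p42", {"database"}), ("p99", {"database"})]
answers = keyword_join(from_authors, from_titles, query)
```

Only "p42" appears in both partial-answer sets, so it is the single complete answer covering both keywords.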

Relevance: 60.00%

Publisher:

Abstract:

The time-course of metabolic events following response to a model hepatotoxin ethionine (800 mg/kg) was investigated over a 7 day period in rats using high-resolution ¹H NMR spectroscopic analysis of urine and multivariate statistics. Complementary information was obtained by multivariate analysis of ¹H MAS NMR spectra of intact liver and by conventional histopathology and clinical chemistry of blood plasma. ¹H MAS NMR spectra of liver showed toxin-induced lipidosis 24 h postdose consistent with the steatosis observed by histopathology, while hypertaurinuria was suggestive of liver injury. Early biochemical changes in urine included elevation of guanidinoacetate, suggesting impaired methylation reactions. Urinary increases in 5-oxoproline and glycine suggested disruption of the gamma-glutamyl cycle. Signs of ATP depletion together with impairment of the energy metabolism were given from the decreased levels in tricarboxylic acid cycle intermediates, the appearance of ketone bodies in urine, the depletion of hepatic glucose and glycogen, and also hypoglycemia. The observed increase in nicotinuric acid in urine could be an indication of an increase in NAD catabolism, a possible consequence of ATP depletion. Effects on the gut microbiota were suggested by the observed urinary reductions in the microbial metabolites 3-/4-hydroxyphenyl propionic acid, dimethylamine, and tryptamine. At later stages of toxicity, there was evidence of kidney damage, as indicated by the tubular damage observed by histopathology, supported by increased urinary excretion of lactic acid, amino acids, and glucose. These studies have given new insights into mechanisms of ethionine-induced toxicity and show the value of multisystem level data integration in the understanding of experimental models of toxicity or disease.

Relevance: 60.00%

Publisher:

Abstract:

Parasitic infections cause a myriad of responses in their mammalian hosts, at the immune as well as the metabolic level. A multiplex panel of cytokines and metabolites derived from four parasite-rodent models, namely Plasmodium berghei-mouse, Trypanosoma brucei brucei-mouse, Schistosoma mansoni-mouse, and Fasciola hepatica-rat, was statistically coanalyzed. 1H NMR spectroscopy and multivariate statistical analysis were used to characterize the urine and plasma metabolite profiles in infected and noninfected animals. Each parasite generated a unique metabolic signature in the host. Plasma cytokine concentrations were obtained using the Meso Scale Discovery multi-cytokine assay platform. Multivariate data integration methods were subsequently used to elucidate the component of the metabolic signature associated with inflammation and to determine specific metabolic correlates with parasite-induced changes in plasma cytokine levels. For example, the relative levels of acetyl glycoproteins extracted from the plasma metabolite profile of the P. berghei-infected mice were statistically correlated with IFN-γ, whereas the same cytokine was anticorrelated with glucose levels. Both the metabolic and the cytokine data showed a similar spatial distribution in principal component analysis scores plots constructed for the combined murine data, with samples from all infected animals clustering according to the parasite species, and with the protozoan infections (P. berghei and T. b. brucei) grouping separately from the helminth infection (S. mansoni). For S. mansoni, the main infection-responsive cytokines were IL-4 and IL-5, which covaried with lactate, choline, and D-3-hydroxybutyrate.
This study demonstrates that the inherently differential immune response to single and multicellular parasites not only manifests in the cytokine expression, but also consequently imprints on the metabolic signature, and calls for in-depth analysis to further explore direct links between immune features and biochemical pathways.

Relevance: 60.00%

Publisher:

Abstract:

Nitrogen flows from European watersheds to coastal marine waters

Executive summary

Nature of the problem
• Most regional watersheds in Europe constitute managed human territories importing large amounts of new reactive nitrogen.
• As a consequence, groundwater, surface freshwater and coastal seawater are undergoing severe nitrogen contamination and/or eutrophication problems.

Approaches
• A comprehensive evaluation of net anthropogenic inputs of reactive nitrogen (NANI) through atmospheric deposition, crop N fixation, fertiliser use and import of food and feed has been carried out for all European watersheds. A database on N, P and Si fluxes delivered at the basin outlets has been assembled.
• A number of modelling approaches, based on either statistical regression analysis or mechanistic description of the processes involved in nitrogen transfer and transformations, have been developed for relating N inputs to watersheds to outputs into coastal marine ecosystems.

Key findings/state of knowledge
• Throughout Europe, NANI represents 3700 kgN/km2/yr (range 0-8400 depending on the watershed), i.e. five times the background rate of natural N2 fixation.
• A mean of approximately 78% of NANI does not reach the basin outlet, but instead is stored (in soils, sediments or groundwater) or eliminated to the atmosphere as reactive N forms or as N2.
• N delivery to the European marine coastal zone totals 810 kgN/km2/yr (range 200-4000 depending on the watershed), about four times the natural background. In areas of limited availability of silica, these inputs cause harmful algal blooms.

Major uncertainties/challenges
• The exact dimension of anthropogenic N inputs to watersheds is still imperfectly known and requires continued monitoring programmes and data integration at the international level.
• The exact nature of 'retention' processes, which potentially represent a major management lever for reducing N contamination of water resources, is still poorly understood.
• Coastal marine eutrophication depends to a large degree on local morphological and hydrographic conditions as well as on estuarine processes, which are also imperfectly known.

Recommendations
• Better control and management of the nitrogen cascade at the watershed scale is required to reduce N contamination of ground- and surface water, as well as coastal eutrophication.
• In spite of the potential of these management measures, there is no choice at the European scale but to reduce the primary inputs of reactive nitrogen to watersheds, through changes in agriculture, human diet and other N flows related to human activity.

Relevance: 60.00%

Publisher:

Abstract:

The Indian monsoon is an important component of Earth's climate system, accurate forecasting of its mean rainfall being essential for regional food and water security. Accurate measurement of the rainfall is essential for various water-related applications, the evaluation of numerical models and detection and attribution of trends, but a variety of different gridded rainfall datasets are available for these purposes. In this study, six gridded rainfall datasets are compared against the India Meteorological Department (IMD) gridded rainfall dataset, chosen as the most representative of the observed system due to its high gauge density. The datasets comprise those based solely on rain gauge observations and those merging rain gauge data with satellite-derived products. Various skill scores and subjective comparisons are carried out for the Indian region during the south-west monsoon season (June to September). Relative biases and skill metrics are documented at all-India and sub-regional scales. In the gauge-based (land-only) category, Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of water resources (APHRODITE) and Global Precipitation Climatology Center (GPCC) datasets perform better relative to the others in terms of a variety of skill metrics. In the merged category, the Global Precipitation Climatology Project (GPCP) dataset is shown to perform better than the Climate Prediction Center Merged Analysis of Precipitation (CMAP) for the Indian monsoon in terms of various metrics, when compared with the IMD gridded data. Most of the datasets have difficulty in representing rainfall over orographic regions including the Western Ghats mountains, in north-east India and the Himalayan foothills. The wide range of skill scores seen among the datasets and even the change of sign of bias found in some years are causes of concern. This uncertainty between datasets is largest in north-east India. 
These results will help those studying the Indian monsoon region to select an appropriate dataset depending on their application and focus of research.
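Skill metrics of the kind used in such dataset comparisons — for example, mean bias and root-mean-square error of a candidate dataset against the IMD reference — can be sketched as follows, with hypothetical seasonal rainfall values:

```python
from math import sqrt

def bias(est, ref):
    """Mean bias of a candidate rainfall series against a reference
    series (e.g. the IMD gridded data)."""
    return sum(e - r for e, r in zip(est, ref)) / len(ref)

def rmse(est, ref):
    """Root-mean-square error, another common skill metric."""
    return sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))

# Hypothetical all-India JJAS seasonal-mean rainfall (mm/day)
imd = [7.1, 6.4, 8.0, 7.5]        # reference
candidate = [6.8, 6.1, 7.6, 7.2]  # e.g. a gauge-based product
b = bias(candidate, imd)   # negative: candidate is drier than IMD
e = rmse(candidate, imd)
```

A consistent sign of the bias across years indicates a systematic wet or dry offset, while the RMSE also captures year-to-year disagreement.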

Relevance: 60.00%

Publisher:

Abstract:

This paper introduces an ontology-based knowledge model for knowledge management. The model can facilitate knowledge discovery that provides users with insight for decision making. The users requiring such insight normally play different roles, with different requirements, in an organisation. To meet these requirements, insights are created from purposely aggregated transactional data, which involves a semantic data integration process. In this paper, we present a knowledge management system that is capable of representing knowledge requirements in a domain context and of enabling semantic data integration through ontology modelling. The knowledge domain context of the United Bible Societies is used to illustrate the features of the knowledge management capabilities.