937 results for data integration


Relevance: 60.00%

Abstract:

This paper introduces PartSS, a new partition-based filtering method for tasks performing string comparisons under edit distance constraints. PartSS improves on the state-of-the-art method NGPP by implementing a new partitioning scheme, and it also improves filtering ability by exploiting theoretical results on shifting and scaling ranges, thus accelerating the calculation of edit distance between strings. PartSS filtering has been implemented within two major tasks of data integration: similarity join and approximate membership extraction under edit distance constraints. The evaluation on an extensive range of real-world datasets demonstrates a major gain in efficiency over the NGPP and QGrams approaches.
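The general idea of partition-based filtering for an edit-distance similarity join can be sketched as follows. This is not the PartSS or NGPP algorithm, whose partitioning schemes are not described in the abstract; it is a minimal illustration of the standard pigeonhole argument: if two strings are within edit distance k and one of them is cut into k + 1 pieces, at least one piece must appear unchanged in the other. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def partitions(s, k):
    """Cut s into k + 1 consecutive pieces of near-equal length."""
    n = k + 1
    base, extra = divmod(len(s), n)
    parts, pos = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        parts.append(s[pos:pos + size])
        pos += size
    return parts


def passes_filter(s, t, k):
    """Pigeonhole filter: some piece of s must occur as a substring of t."""
    return any(p in t for p in partitions(s, k))


def similarity_join(left, right, k):
    """Return all pairs (s, t) with edit distance at most k."""
    result = []
    for s in left:
        for t in right:
            if abs(len(s) - len(t)) > k:      # cheap length filter
                continue
            if not passes_filter(s, t, k):    # partition filter
                continue
            if edit_distance(s, t) <= k:      # exact verification
                result.append((s, t))
    return result


pairs = similarity_join(["kitten", "data"], ["sitting", "date", "integration"], 1)
```

The filters only discard pairs that cannot match, so the expensive dynamic-programming verification runs on fewer candidates without losing any true result.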

Relevance: 60.00%

Abstract:

We are pleased to present the papers from the Australasian Health Informatics and Knowledge Management (HIKM) conference stream, held on 20 January 2011 in Perth as a session of the Australasian Computer Science Week (ACSW) 2011. HIKM was formerly named Health Data and Knowledge Management; the inclusion of the term health informatics is timely given the current health reform. The submissions to HIKM 2011 demonstrated that Australasian researchers lead with many research and development innovations coming to fruition. Some of these innovations can be seen here, and we believe that continued contributions to HIKM will bring them further recognition. The HIKM conference is a review of health informatics related research, development and education opportunities. The conference papers were written to communicate with other researchers and share research findings, capturing every aspect of the health informatics field, namely: conceptual models and architectures; privacy and quality of health data; health workflow management; patient journey analysis; health information retrieval, analysis and visualisation; data integration/linking; systems for integrated or coordinated care; electronic health records (EHRs) and personally controlled electronic health records (PCEHRs); health data ontologies; and standardisation in health data and clinical applications.

Relevance: 60.00%

Abstract:

The continuous growth of XML data poses a great concern in the area of XML data management. The need to process large amounts of XML data complicates many applications, such as information retrieval and data integration. One way of simplifying this problem is to break the massive amount of data into smaller groups by applying clustering techniques. However, XML clustering is an intricate task that may involve processing both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods: two that utilize the structure of XML documents and two that utilize both the structure and the content. The two structural clustering methods have different data models. One is based on a path model and the other is based on a tree model. These methods employ rigid similarity measures which aim to identify corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both structural and content information differ in how the structure and content similarities are combined. One clustering method calculates the document similarity by using a linear weighting combination strategy of structure and content similarities. The content similarity in this clustering method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model, as the tree similarity measure for the tree model does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections.
To further the research, the structural clustering method based on the tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system that translates and converts the source XML documents sequentially. Also, the schema matching process of XML transformation produces a better matching result in a shorter time.
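The linear weighting combination of structure and content similarities mentioned above can be sketched as below. The measures used here are stand-ins chosen for brevity and are assumptions, not the thesis's methods: structure similarity is the Jaccard overlap of root-to-element tag paths, and content similarity is a plain bag-of-words cosine rather than a semantic kernel. The `weight` parameter and all function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def root_paths(xml_text):
    """Collect the set of root-to-element tag paths of a document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def term_counts(xml_text):
    """Count lower-cased words in the text content of a document."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).lower().split())


def structure_similarity(a, b):
    """Jaccard overlap of tag paths (stand-in for a path-model measure)."""
    pa, pb = root_paths(a), root_paths(b)
    return len(pa & pb) / len(pa | pb)


def content_similarity(a, b):
    """Cosine of term-count vectors (stand-in for a semantic kernel)."""
    ca, cb = term_counts(a), term_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def combined_similarity(a, b, weight=0.5):
    """Linear combination: weight * structure + (1 - weight) * content."""
    return (weight * structure_similarity(a, b)
            + (1 - weight) * content_similarity(a, b))


doc_a = "<paper><title>data integration</title><body>xml clustering</body></paper>"
doc_b = "<paper><title>data integration</title></paper>"
score = combined_similarity(doc_a, doc_b, weight=0.5)
```

Setting `weight` to 1.0 reduces this to a structure-only comparison, which makes it easy to check how much the content term changes the resulting clusters.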

Relevance: 60.00%

Abstract:

The importance of passenger experience in aviation has become well understood in the last several years. It is now generally accepted that the provision of good passenger experience is not an option but a necessity from an aviation profitability perspective. In this paper, we paint a picture of the future passenger experience by consolidating a number of industry and research perspectives. Using the future passenger experience as a starting point, we explore the components needed to enable this future vision. From this bottom-up approach, we identify the need to resolve data formatting and data ownership issues. The resolution of these data integration issues is necessary to enable the seamless future travel experience that is envisioned by the aviation industry. By looking at the passenger experience from this bottom-up, data-centric perspective, we identify a potential shift in the way that future passenger terminals will be designed. Whereas the design of terminals is currently largely an architectural practice, in the near future it may become more of a virtual technology practice. This will pose a new set of challenges to designers of airport terminal environments.

Relevance: 60.00%

Abstract:

This thesis studies human gene expression space using high-throughput gene expression data from DNA microarrays. In molecular biology, high-throughput techniques allow numerical measurements of the expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting previously unusable and missing data and by improving access to its data. It also contributed to the creation of several new tools for microarray data manipulation and to the establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required the creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human-readable free-text microarray data annotations into a categorised format. Data comparability, and minimisation of the systematic measurement errors that are characteristic of each laboratory in this large cross-laboratory integrated dataset, was ensured by computing a range of microarray data quality metrics and excluding incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering, using heuristics and help from another purpose-built sample ontology.
A preface and motivation to the construction and analysis of a global map of human gene expression is given by the analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.

Relevance: 60.00%

Abstract:

Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements for studying the behaviour of cells. The scientific community has produced an enormous and constantly increasing collection of gene expression data from various human cells, from both healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges, to construct an integrated database of the human transcriptome, and to demonstrate its usage. The methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded by systematic vocabularies. There was no single existing biomedical ontology or vocabulary suitable for this purpose. Thus, a new annotation terminology was developed as a part of this work. The second part was to develop mathematical methods for correcting the noise and systematic differences/errors in the data caused by various array generations. Additionally, there was a need to develop suitable computational methods for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes by using the integrated gene expression data.
A method to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database was also developed. As a result of this work, 9783 human gene expression samples measured by Affymetrix microarrays were integrated to form a unique human transcriptome resource, GeneSapiens. This makes it possible to analyse expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Application of this resource to interpret individual gene expression measurements allowed identification of the tissue of origin with 92.0% accuracy among 44 healthy tissue types. A systematic analysis of the transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types, and a genome-wide analysis of kinase gene co-expression networks was done. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to analyse individual patient samples and pinpoint gene- and pathway-specific changes in the test sample in relation to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies indicate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets as well as to facilitate the interpretation of new molecular profiling data from individual patients.
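Identifying a sample's tissue of origin against a reference database can be illustrated with a nearest-profile comparison. The abstract does not say how the 92.0% figure was obtained, so the following is only a sketch under the assumption that a sample is assigned to the reference tissue whose mean expression profile it correlates with best. The tissue names, expression values and function names are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def predict_tissue(sample, reference):
    """Return the reference tissue whose profile correlates best with the sample."""
    return max(reference, key=lambda tissue: pearson(sample, reference[tissue]))


# Hypothetical mean expression of four genes in two reference tissues.
reference = {
    "liver": [9.1, 2.0, 1.5, 7.8],
    "brain": [1.2, 8.7, 9.0, 2.1],
}
sample = [8.5, 2.4, 1.9, 7.0]
predicted = predict_tissue(sample, reference)
```

A real reference set would hold thousands of genes per tissue, which is why the normalisation and annotation work described above has to come first: correlations across arrays are only meaningful once systematic differences between array generations have been corrected.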

Relevance: 60.00%

Abstract:

The state of PICES science - 2004 (pdf 0.7 MB)
2004 Wooster Award (pdf 0.2 MB)
Micronekton – What are they and why are they important? (pdf 0.5 MB)
Upscaling for a better understanding of climate links to ecosystems (pdf 0.1 MB)
PICES Interns (pdf 0.1 MB)
Report of the APN workshop on “Climate interactions and marine ecosystems” (pdf 0.6 MB)
Photo highlights of PICES XIII (pdf 0.3 MB)
Recent trends in waters of the subarctic NE Pacific – summer 2004 (pdf 0.1 MB)
The state of the western North Pacific in the first half of 2004 (pdf 0.3 MB)
The Bering Sea: Current status and recent events (pdf 0.1 MB)
Study Group on Fisheries Ecosystem Responses to Recent Regime Shifts completes its mandate for the provision of scientific advice (pdf 0.1 MB)
PICES Calendar (pdf 0.1 MB)
The new PICES Working Group on Ecosystem-based management (pdf 0.05 MB)
CO2 data integration activity for the North Pacific (pdf 0.2 MB)
Carbon cycle changes in the North Pacific (pdf 0.8 MB)
New and upcoming PICES publications (pdf 0.8 MB)

Relevance: 60.00%

Abstract:

Report of Opening Session (pdf 0.07 MB)
Report of Governing Council (pdf 0.2 MB)
Report of the Finance and Administration Committee (pdf 0.08 MB)
Reports of Science Board and Committees:
Science Board inter-sessional meeting (pdf 0.05 MB)
Science Board (pdf 0.1 MB)
Biological Oceanography Committee (pdf 0.1 MB)
Fishery Science Committee (pdf 0.04 MB)
Marine Environmental Quality Committee (pdf 0.04 MB)
Physical Oceanography and Climate Committee (pdf 0.04 MB)
Technical Committee on Data Exchange (pdf 0.04 MB)
Reports of Sections, Working and Study Groups:
Harmful Algal Blooms Section (pdf 0.03 MB)
Working Group 17 on Biogeochemical data integration and synthesis (pdf 0.03 MB)
Working Group 18 on Mariculture in the 21st century - The intersection between ecology, socio-economics and production (pdf 0.06 MB)
Study Group on Ecosystem-based management science and its application to the North Pacific (pdf 0.04 MB)
Reports of the Climate Change and Carrying Capacity Program:
Implementation Panel on the CCCC Program (pdf 0.04 MB)
BASS Task Team (pdf 0.04 MB)
CFAME Task Team (pdf 0.04 MB)
MODEL Task Team (pdf 0.04 MB)
MONITOR Task Team (pdf 0.03 MB)
REX Task Team (pdf 0.04 MB)
Reports of Advisory Panels:
Advisory Panel on Continuous Plankton Recorder Survey in the North Pacific (pdf 0.4 MB)
Advisory Panel on Iron Fertilization Experiment in the Subarctic Pacific Ocean (pdf 0.03 MB)
Advisory Panel on Marine Birds and Mammals (pdf 0.04 MB)
Advisory Panel on Micronekton Sampling Inter-Calibration experiment (pdf 0.04 MB)
Summary of Scientific Sessions and Workshops (pdf 0.2 MB)
Membership List (pdf 0.07 MB)
List of Participants (pdf 0.09 MB)
List of Acronyms (pdf 0.03 MB)

Relevance: 60.00%

Abstract:

Report of Opening Session (pdf 58 KB)
Report of Governing Council Meeting (pdf 244 KB)
Report of 2003 interim Governing Council meeting
Tenth Anniversary PICES Organization Review
Report of the Finance and Administration Committee (pdf 102 KB)
2002 Auditor's report to the Organization
Review of PICES Publication Program
Reports of Science Board and Committees:
Science Board/Governing Council interim meeting (pdf 81 KB)
Science Board (pdf 95 KB)
Study Group on PICES Capacity Building
Biological Oceanography Committee (pdf 65 KB)
Advisory Panel on Micronekton sampling gear intercalibration experiment
Advisory Panel on Marine birds and mammals
Fishery Science Committee (pdf 41 KB)
Working Group 16 on Climate change, shifts to fish production, and fisheries management
Marine Environmental Quality Committee (pdf 76 KB)
Working Group 15 on Ecology of Harmful Algal Blooms (HABs) in the North Pacific
Physical Oceanography and Climate Committee (pdf 70 KB)
Working Group 17 on Biogeochemical data integration and synthesis
Advisory Panel on North Pacific Data Buoy
Technical Committee on Data Exchange (pdf 32 KB)
Implementation Panel on the CCCC Program (pdf 64 KB)
Nemuro Experimental Planning Team (NEXT)
BASS Task Team (pdf 35 KB)
Advisory Panel on Iron Fertilization Experiment
MODEL Task Team (pdf 29 KB)
MONITOR Task Team (pdf 30 KB)
REX Task Team (pdf 25 KB)
Documenting Scientific Sessions (pdf 164 KB)
List of Participants (pdf 60 KB)
List of Acronyms (pdf 21 KB)

Relevance: 60.00%

Abstract:

Report of Opening Session (pdf 51 KB)
Report of Governing Council Meeting (pdf 136 KB)
Report of the Finance and Administration Committee (pdf 48 KB)
Reports of Science Board and Committees:
Science Board (pdf 71 KB)
Biological Oceanography Committee (pdf 66 KB)
Working Group 14: Effective sampling of micronekton
Marine Birds and Mammals Advisory Panel
Fishery Science Committee (pdf 36 KB)
Working Group 16: Climate change, shifts to fish production, and fisheries management
Marine Environmental Quality Committee (pdf 39 KB)
Working Group 15: Ecology of Harmful Algal Blooms (HABs) in the North Pacific
Physical Oceanography and Climate Committee (pdf 49 KB)
North Pacific Data Buoy Advisory Panel
Working Group 17: Biogeochemical data integration and synthesis
Technical Committee on Data Exchange (pdf 29 KB)
Implementation Panel on the CCCC Program (pdf 43 KB)
BASS Task Team (pdf 30 KB)
Iron Fertilization Experiment Advisory Panel
MODEL Task Team (pdf 28 KB)
MONITOR Task Team (pdf 34 KB)
Summary of Continuous Plankton Recorder activities in 2002
REX Task Team (pdf 21 KB)
Documenting Scientific Sessions (pdf 140 KB)
List of Participants (pdf 59 KB)
List of Acronyms (pdf 21 KB)

Relevance: 60.00%

Abstract:

Executive Summary [pdf, 0.01 MB]
Introduction [pdf, 0.01 MB]
Synthesis of the WOCE/JGOFS global CO2 survey data in the North Pacific [pdf, 0.3 MB]
Air-sea CO2 fluxes [pdf, 0.1 MB]
DIC, TAlk and anthropogenic CO2 distributions in the North Pacific [pdf, 3 MB]
Biogeochemical and global implications [pdf, 0.1 MB]
Recommendations for the future of carbon studies within PICES [pdf, 0.1 MB]
References [pdf, 0.1 MB]
Appendix A. Summary of PICES Working Group 13 activities (1998-2001) [pdf, 0.1 MB]
Appendix B. Results of Working Group 13 method inter-comparison studies [pdf, 0.6 MB]
Appendix C. Results of Working Group 13 data integration workshops [pdf, 0.5 MB]
(57 page document)

Relevance: 60.00%

Abstract:

Studies of the relationship between landscape and fresh water have been pursued in greater depth by the scientific community and by public policy makers, mainly to meet demands concerning the ways in which this environmental system can be altered and to identify the political and ecological implications of these changes. The more intense and diversified the use of water bodies and of the landscape in river basins becomes, the greater the need to define forms of planning, administration and ecological management of these ecosystems. A full understanding of the functioning and ecological processes of a river basin requires simultaneous knowledge of its aquatic and terrestrial systems, biodiversity, physiography, geology and conservation, in both time and space. This understanding and knowledge of the area of interest is vital for proposing legal environmental instruments such as Conservation Units (UCs). It is very important that such proposals be grounded primarily in the functioning of ecosystems and landscapes, so as to ensure greater connectivity and integration between water (fresh, brackish and salt) and land, and their multiple uses. This thesis was developed in this context, presenting and applying integrative methodologies, both in landscape ecology (EP) and in the relationship between freshwater and terrestrial environments. The main objective of this work was to develop processes for environmental planning in river basins (BHs), through the diagnosis, understanding and analysis of the functioning and dynamics of the landscape and of river and stream ecosystems, supported by the use of geotechnologies. According to the results obtained, the BHGM covers an area of 1260.36 km² and has a perimeter of 204.69 km.
It is a basin more elongated than circular (KC = 1.6144 and IC = 0.4747 km/km), which indicates lower susceptibility to flooding under normal precipitation conditions, except in events of anomalous intensity. The baseline mapping (2007) indicated that the basin had 34.86% anthropic use and 64.04% remnant forest. Potential phytophysiognomy data indicated a predominance of the classes Submontane Dense Ombrophilous Forest (40%) and Lowland Dense Ombrophilous Forest (39%). A total of 269 landscape units were established for the basin (integrating geomorphology, geology, phytophysiognomy, and land use and vegetation cover (2007)), which together with the landscape metrics data made up the thesis's integrative proposal for landscape ecology. For environmental quality, the visual assessment index (IAV), the multimetric physical-chemical-bacteriological index and the extended biotic index (IBE) were adopted. Comparison of these indices showed agreement among their results for most points sampled in the reference areas, and agreement of at least two indices for the intermediate and impacted points. Two scenarios were also proposed for the basin: one considering the conditions and compensatory measures tied to the preliminary licence of the Rio de Janeiro State petrochemical complex (COMPERJ), and another without considering these conditions. The first indicated that ecological restoration should be carried out, following the guidelines of the synthesis map, integrated for landscape restoration.
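The reported compactness coefficient can be checked from the stated area and perimeter. The abstract does not give the formula, so this sketch assumes the form commonly used in hydrology, Kc = 0.28 · P / √A, with P in km and A in km²; the function name is illustrative.

```python
import math


def compactness_coefficient(perimeter_km, area_km2):
    """Gravelius compactness coefficient Kc = 0.28 * P / sqrt(A).

    Kc is 1 for a circular basin and grows as the basin becomes
    more elongated or irregular.
    """
    return 0.28 * perimeter_km / math.sqrt(area_km2)


# Values reported in the abstract for the BHGM basin.
kc = compactness_coefficient(perimeter_km=204.69, area_km2=1260.36)
print(round(kc, 4))
```

Under this assumption the result agrees with the reported KC to four decimal places, and a value well above 1 is consistent with the description of the basin as more elongated than circular.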

Relevance: 60.00%

Abstract:

The Cenozoic deposits of the Taubaté Basin contain bituminous (oil) shale deposits belonging to the Oligocene Tremembé Formation, which for some years in the 1950s were investigated for their economic potential through industrial extraction of the oil they contain. However, given the technological and economic circumstances of the time, this industrial exploitation was considered economically unviable. This dissertation proposes an application of geophysical well logging to characterise the facies of the bituminous shales of the Tremembé Formation, with the main objective of identifying electrofacies in the electrical logs through a robust and consistent methodology. Identifying electrofacies is important to help characterise an unconventional reserve and to analyse its economic viability. This study used conventional well logs: gamma ray, resistivity, spontaneous potential and sonic. The well log data were integrated with cores and geochemical data, specifically total organic carbon (TOC), S, hydrogen index (HI) and S2 data, for a realistic characterisation of the electrofacies. The data were obtained from three rotary boreholes drilled in the Taubaté Basin, with continuous coring and wireline logging throughout the shale interval of the Tremembé Formation. From this, a specific model is obtained for each lithology, in which each peak corresponds to an electrofacies, allowing some geophysical patterns or signatures to be established for the main facies present.
As a result of this work, it was possible to correlate the electrofacies between the wells in a model section, based on the lateral similarity of the electrofacies between the stratigraphic markers represented. It was also possible to observe the continuity of two bituminous shale sequences with high TOC, S, HI and S2 values, considered the most important from an economic point of view, and to generate 2D and 3D facies models of these layers. The results obtained are quite promising, pointing to the possibility of applying this technique to other wells in the Taubaté Basin and providing relevant input for determining its sedimentary evolution.

Relevance: 60.00%

Abstract:

The goals of the workshop were to: present verified data; draft a report containing visual descriptors and information on the structure and function of BOBLME ecosystems; identify information gaps and transboundary issues; and develop the capacity of national scientists in data integration techniques.

Relevance: 60.00%

Abstract:

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
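The idea of coupling per-dataset mixture models through agreement parameters can be made concrete with a small sketch. The abstract does not state the model's equations, so the form below is an assumption for illustration: the prior weight of assigning one gene to cluster labels across datasets is the product of each dataset's mixture weight, multiplied by (1 + phi) for every pair of datasets that place the gene in the same cluster. The names `mix_weights`, `phi` and both functions are hypothetical.

```python
from itertools import combinations, product


def allocation_weight(labels, mix_weights, phi):
    """Unnormalised prior weight of one gene's cluster labels across datasets."""
    weight = 1.0
    for k, cluster in enumerate(labels):
        weight *= mix_weights[k][cluster]        # per-dataset mixture weight
    for k, l in combinations(range(len(labels)), 2):
        if labels[k] == labels[l]:
            weight *= 1.0 + phi[(k, l)]          # reward agreement between k and l
    return weight


def allocation_prior(mix_weights, phi):
    """Normalise the weights over every combination of cluster labels."""
    n_clusters = len(mix_weights[0])
    combos = list(product(range(n_clusters), repeat=len(mix_weights)))
    weights = [allocation_weight(c, mix_weights, phi) for c in combos]
    total = sum(weights)
    return {c: w / total for c, w in zip(combos, weights)}


# Two datasets, two clusters each, uniform mixture weights.
uniform = [[0.5, 0.5], [0.5, 0.5]]
independent = allocation_prior(uniform, {(0, 1): 0.0})
coupled = allocation_prior(uniform, {(0, 1): 3.0})
```

With phi equal to zero the datasets are modelled independently, while a larger phi shifts prior mass toward allocations on which the datasets agree, which is the behaviour the agreement parameters are described as capturing.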