947 results for Integrated Data Repository
Abstract:
This dissertation addresses the construction of a data warehouse for AdClick, a company operating in digital marketing. Digital marketing uses digital communication channels for the same purpose as traditional marketing: promoting goods, businesses, and services and acquiring new customers. Several digital marketing strategies serve these goals, most notably organic traffic and paid traffic. Organic traffic results from marketing actions that involve no costs for promotion or for acquiring potential customers. Paid traffic, in turn, requires investment in campaigns designed to attract new customers. The dissertation first reviews the state of the art in business intelligence and data warehousing and presents their main advantages for companies. Business intelligence systems are necessary because companies today hold large volumes of information-rich data, which can only be properly exploited using the capabilities of such systems. Accordingly, the first step in developing a business intelligence system is to concentrate all data in a single integrated system capable of supporting decision making. The data warehouse is the single system best suited to this requirement. In this dissertation, the data sources that will feed the data warehouse were surveyed, and the company's existing business processes were put into context. The data warehouse was then built: dimensions and fact tables were created, the processes for extracting data and loading it into the data warehouse were defined, and several views were created.
As for the impact of this dissertation, the partner company gains several business advantages from the implementation of the data warehouse and of the ETL processes that load all information sources. These advantages include centralized information, greater flexibility for managers in how they access information, and data processed in a way that makes it possible to extract information from it.
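To illustrate the dimension and fact-table structure mentioned in this abstract, the following minimal sketch builds a tiny star schema in SQLite, loads it with a small extract-and-load step, and exposes one view. All table names, column names, and sample rows (`dim_campaign`, `fact_clicks`, `v_channel_totals`, and the campaign data) are hypothetical and are not taken from the dissertation.

```python
import sqlite3

# Hypothetical source rows standing in for one extracted data source.
RAW_ROWS = [
    {"campaign": "spring_sale", "channel": "paid", "day": "2015-03-01", "clicks": 120, "cost": 30.0},
    {"campaign": "spring_sale", "channel": "paid", "day": "2015-03-02", "clicks": 80, "cost": 20.0},
    {"campaign": "blog_posts", "channel": "organic", "day": "2015-03-01", "clicks": 50, "cost": 0.0},
]


def build_warehouse(rows):
    """Create an in-memory star schema and load the given rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_campaign (
            campaign_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            channel     TEXT NOT NULL,
            UNIQUE (name, channel)
        );
        CREATE TABLE fact_clicks (
            campaign_id INTEGER NOT NULL REFERENCES dim_campaign (campaign_id),
            day         TEXT NOT NULL,
            clicks      INTEGER NOT NULL,
            cost        REAL NOT NULL
        );
        CREATE VIEW v_channel_totals AS
            SELECT d.channel AS channel,
                   SUM(f.clicks) AS clicks,
                   SUM(f.cost) AS cost
            FROM fact_clicks AS f
            JOIN dim_campaign AS d ON d.campaign_id = f.campaign_id
            GROUP BY d.channel;
        """
    )
    for row in rows:
        key = (row["campaign"], row["channel"])
        # Insert the dimension member only if it is not already present.
        conn.execute(
            "INSERT OR IGNORE INTO dim_campaign (name, channel) VALUES (?, ?)", key
        )
        (campaign_id,) = conn.execute(
            "SELECT campaign_id FROM dim_campaign WHERE name = ? AND channel = ?", key
        ).fetchone()
        # Each fact row references its dimension member by surrogate key.
        conn.execute(
            "INSERT INTO fact_clicks (campaign_id, day, clicks, cost) VALUES (?, ?, ?, ?)",
            (campaign_id, row["day"], row["clicks"], row["cost"]),
        )
    conn.commit()
    return conn


def channel_totals(conn):
    """Read the aggregated view, ordered by channel name."""
    return conn.execute(
        "SELECT channel, clicks, cost FROM v_channel_totals ORDER BY channel"
    ).fetchall()
```

Calling `channel_totals(build_warehouse(RAW_ROWS))` returns one aggregated row per channel. The view plays the role of the reporting layer that managers would query instead of the raw sources.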
Abstract:
Master's in Informatics Engineering - Specialization in Knowledge and Decision Technologies
Abstract:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
Abstract:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprises a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.
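One simple way to attenuate a batch effect is to center each gene's values within each batch, so that platform-specific offsets cancel out. The sketch below is illustrative only: the abstract does not name the methods actually used, and the function name, batch labels, and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples (values of one gene).

    values  -- expression values of a single gene, one per sample
    batches -- batch label of each sample, in the same order as values
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group the expression values by batch label.
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    # Compute one mean per batch, then remove it from every sample.
    batch_mean = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_mean[batch] for value, batch in zip(values, batches)]
```

For example, `center_by_batch([1.0, 3.0, 10.0, 14.0], ["agilent", "agilent", "affymetrix", "affymetrix"])` removes the offset between the two hypothetical platforms. Mean centering only corrects additive shifts; scale differences between batches would need a further normalization step.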
Abstract:
BACKGROUND. Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. RESULTS. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors, including the management and invocation protocols of the services they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. CONCLUSIONS. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
Abstract:
A large percentage of bridges in the state of Iowa are classified as structurally or functionally deficient. These bridges annually compete for a share of Iowa's limited transportation budget. To avoid an increase in the number of deficient bridges, the state of Iowa decided to implement a comprehensive Bridge Management System (BMS) and selected the Pontis BMS software as a bridge management tool. This program will be used to provide a selection of maintenance, repair, and replacement strategies for the bridge networks to achieve an efficient and possibly optimal allocation of resources. The Pontis BMS software uses a new rating system to evaluate extensive and detailed inspection data gathered for all bridge elements. To manually collect these data would be a highly time-consuming job. The objective of this work was to develop an automated, computerized methodology for an integrated database that includes the rating conditions as defined in the Pontis program. Several of the available techniques that can be used to capture inspection data were reviewed, and the most suitable method was selected. To accomplish the objectives of this work, two user-friendly programs were developed. One program is used in the field to collect inspection data following a step-by-step procedure without the need to refer to the Pontis user's manuals. The other program is used in the office to read the inspection data and prepare input files for the Pontis BMS software. These two programs require users to have very limited knowledge of computers. On-line help screens as well as options for preparing, viewing, and printing inspection reports are also available. The developed data collection software will improve and expedite the process of conducting bridge inspections and preparing the required input files for the Pontis program. In addition, it will eliminate the need for large storage areas and will simplify retrieval of inspection data. Furthermore, the approach developed herein will facilitate transferring these captured data electronically between offices within the Iowa DOT and across the state.
Abstract:
Pablo de Castro, Director of GrandIR, described the euroCRIS Group's vision of an integrated infrastructure for managing scientific information, made up of an institutional CRIS system, a publications repository, and a data and software repository, and presented the integrated infrastructure model of Trinity College Dublin (TCD) as an international case study. The TCD CRIS system (TCD Research Support System, or RSS) has been based since its first version in 2002 on the CERIF standard, a model for describing scientific activity that is becoming increasingly relevant as the basis for CRIS systems in Europe, particularly in the United Kingdom. The presentation mentioned trials to incorporate CERIF into the data model of the ePrints repository software, enabling it to support part of the information-gathering tasks performed by a CRIS, as well as the gradual extension of CERIF to areas such as research data management.
Abstract:
MetaNetX is a repository of genome-scale metabolic networks (GSMNs) and biochemical pathways from a number of major resources imported into a common namespace of chemical compounds, reactions, cellular compartments (namely MNXref), and proteins. The MetaNetX.org website (http://www.metanetx.org/) provides access to these integrated data as well as a variety of tools that allow users to import their own GSMNs, map them to the MNXref reconciliation, and manipulate, compare, analyze, simulate (using flux balance analysis) and export the resulting GSMNs. MNXref and MetaNetX are regularly updated and freely available.
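For readers unfamiliar with flux balance analysis, its standard textbook formulation is a linear program over the reaction fluxes of a network at steady state. The symbols below are the conventional ones and are not taken from the abstract, which does not say which variant MetaNetX implements: $S$ is the stoichiometric matrix, $v$ the flux vector, $c$ the objective weights (for example, selecting a biomass reaction), and $l$, $u$ the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u .
\end{aligned}
```

The constraint $S\,v = 0$ expresses that every internal metabolite is produced and consumed at equal rates, which is the steady-state assumption behind the method.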
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.
Abstract:
Problem: Dental radiographs generally display one or more findings/diagnoses, and are linked to a unique set of patient demographics, medical history and other findings not represented by the image. However, this information is not associated with radiographs in any type of meta format, and images are not searchable based on any clinical criteria (1,2). The purpose of this pilot study is to create an online, searchable data repository of dental radiographs to be used for patient care, teaching and research. [See PDF for complete abstract]
Abstract:
The oceans play a critical role in the Earth's climate, but unfortunately, the extent of this role is only partially understood. One major obstacle is the difficulty associated with making high-quality, globally distributed observations, a feat that is nearly impossible using only ships and other ocean-based platforms. The data collected by satellite-borne ocean color instruments, however, provide environmental scientists a synoptic look at the productivity and variability of the Earth's oceans and atmosphere, respectively, on high-resolution temporal and spatial scales. Three such instruments, the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) onboard ORBIMAGE's OrbView-2 satellite, and two Moderate Resolution Imaging Spectroradiometers (MODIS) onboard the National Aeronautics and Space Administration's (NASA) Terra and Aqua satellites, have been in continuous operation since September 1997, February 2000, and June 2002, respectively. To facilitate the assembly of a suitably accurate data set for climate research, members of the NASA Sensor Intercomparison and Merger for Biological and Interdisciplinary Oceanic Studies (SIMBIOS) Project and SeaWiFS Project Offices devote significant attention to the calibration and validation of these and other ocean color instruments. This article briefly presents results from the SIMBIOS and SeaWiFS Project Office's (SSPO) satellite ocean color validation activities and describes the SeaWiFS Bio-optical Archive and Storage System (SeaBASS), a state-of-the-art system for archiving, cataloging, and distributing the in situ data used in these activities.
Abstract:
Fluctuations in oxygen (δ18O) and carbon (δ13C) isotope values of benthic foraminiferal calcite from the tropical Pacific and Southern Oceans indicate rapid reversals in the dominant mode and direction of the thermohaline circulation during a 1 m.y. interval (71-70 Ma) in the Maastrichtian. At the onset of this change, benthic foraminiferal δ18O values increased and were highest in low-latitude Pacific Ocean waters, whereas benthic and planktic foraminiferal δ13C values decreased and benthic values were lowest in the Southern Ocean. Subsequently, benthic foraminiferal δ18O values in the Indo-Pacific decreased, and benthic and planktic δ13C values increased globally. These isotopic patterns suggest that cool intermediate-depth waters, derived from high-latitude regions, penetrated temporarily to the tropics. The low benthic δ13C values at the Southern Ocean sites, however, suggest that these cool waters may have been derived from high northern rather than high southern latitudes. Correlation with eustatic sea-level curves suggests that sea-level change was the most likely mechanism to change the circulation and/or source(s) of intermediate-depth waters. We thus propose that oceanic circulation during the latest Cretaceous was vigorous and that competing sources of intermediate- and deep-water formation, linked to changes in climate and sea level, may have alternated in importance.
Abstract:
Differences in regional responses to climate fluctuations are well documented on short time scales (e.g., El Niño-Southern Oscillation), but with the exception of latitudinal temperature gradients, regional patterns are seldom considered in discussions of ancient greenhouse climates. Contrary to the expectation of global warming or global cooling implicit in most treatments of climate evolution over millions of years, this paper shows that the North Atlantic warmed by as much as 6°C (1.5‰ decrease in δ18O values of planktic foraminifera) during the Maastrichtian global cooling interval. We suggest that warming was the result of the importation of heat from the South Atlantic. Decreasing North Atlantic δ18O values are also associated with increasing gradients in planktic δ13C values, suggesting increasing surface-water stratification and a correlated strengthening of the North Atlantic Polar Front. If correct, this conclusion predicts arctic cooling during the late Maastrichtian. Beyond implications for the Maastrichtian, these data demonstrate that climate does not behave as if there is a simple global thermostat, even on geologic time scales.
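The link between the quoted warming and the isotope shift can be checked with a rough calculation. It rests on two assumptions not stated in the abstract: the oxygen isotope decrease is taken as 1.5 per mil (‰), the conventional unit for isotope values, and the commonly quoted approximate sensitivity of calcite of about 0.25‰ per °C is used, with the isotopic composition of seawater held constant.

```latex
\Delta T \;\approx\; -\frac{\Delta\delta^{18}\mathrm{O}}{0.25\ \text{‰}/^{\circ}\mathrm{C}}
\;=\; -\frac{-1.5\ \text{‰}}{0.25\ \text{‰}/^{\circ}\mathrm{C}}
\;=\; 6\ ^{\circ}\mathrm{C}
```

Under those assumptions, the shift corresponds to the "as much as 6°C" of warming quoted in the abstract. Published paleotemperature equations differ slightly in slope, so the figure is an approximation.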