963 results for integrating data
Abstract:
DNA microarrays are among the most widely used technologies for measuring gene expression. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Integrating data from multiple sources aims to improve the reliability of statistical tests by increasing sample size, mitigating the data dimensionality problem (many features, few samples). The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used for gene expression measurement to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, comprising a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
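As an illustration of the simplest form of batch-effect attenuation alluded to above, here is a minimal sketch: per-gene standardization within each batch, applied after the platforms have been re-annotated to a shared feature space. This is not the authors' method (the abstract does not specify one), and all function and variable names are hypothetical.

```python
import pandas as pd

def standardize_per_batch(expr: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Z-score each gene within each batch so that platform-specific
    location/scale differences are removed before pooling.

    expr  : samples x genes expression matrix (already re-annotated to
            a shared transcript-level feature space)
    batch : per-sample batch/platform label, e.g. 'affymetrix', 'agilent'
    """
    corrected = expr.copy()
    for b in batch.unique():
        idx = batch == b
        mu = expr.loc[idx].mean(axis=0)
        sd = expr.loc[idx].std(axis=0).replace(0, 1.0)  # guard constant genes
        corrected.loc[idx] = (expr.loc[idx] - mu) / sd
    return corrected
```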
Abstract:
The recent liberalization of the German energy market has forced the energy industry to develop and install new information systems to support agents on energy trading floors in their analytical tasks. Besides the classical approach of building a data warehouse that gives insight into the time series to understand market and pricing mechanisms, it is crucial to provide a variety of external data from the web. Weather information, as well as political news or market rumors, is relevant for giving the appropriate interpretation to the variables of a volatile energy market. Starting from a multidimensional data model and a collection of buy and sell transactions, a data warehouse is built that gives analytical support to the agents. Following the idea of web farming, we harvest the web, match the external information sources to the data warehouse objects after a filtering and evaluation process, and present this qualified information on a user interface where market values are correlated with those external sources over the time axis.
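To make the last step concrete, here is a minimal sketch of correlating a market value with an external, web-harvested series over the time axis, as the described user interface does. File and column names are hypothetical; pandas is used purely for illustration.

```python
import pandas as pd

# Hypothetical inputs: hourly spot prices from the warehouse and an
# external series harvested from the web (e.g. air temperature).
prices = pd.read_csv("spot_prices.csv", parse_dates=["timestamp"],
                     index_col="timestamp")
weather = pd.read_csv("temperature.csv", parse_dates=["timestamp"],
                      index_col="timestamp")

# Align both sources on the shared time axis.
joined = prices.join(weather, how="inner")

# Overall co-movement of the market value with the external variable ...
print(joined["price"].corr(joined["temperature"]))

# ... and a rolling 7-day correlation, the kind of curve an agent would
# inspect on the user interface alongside news and market rumors.
rolling = joined["price"].rolling("7D").corr(joined["temperature"])
print(rolling.tail())
```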
Abstract:
Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, major challenges for RDM arise from integrating data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation across local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to addressing these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
The amount and quality of available biomass is a key factor for a sustainable livestock industry and for agricultural management decision making. Globally, 31.5% of land cover is grassland, while 80% of Ireland's agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state-of-the-art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate-resolution Imaging Spectroradiometer, MODIS) and radar (TerraSAR-X) time series imagery to estimate grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland, using both statistical and state-of-the-art machine learning algorithms. High-quality weather data from the on-site weather station were also used to calculate Growing Degree Days (GDD) for Grange, to determine the impact of ancillary data on biomass estimation. In situ and satellite data covering 12 years for the Moorepark and 6 years for the Grange study sites were used to predict grassland biomass using multiple linear regression and Adaptive Neuro-Fuzzy Inference System (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high-quality in situ data, can be used to retrieve grassland biomass with high performance (R² = 0.86, p < 0.05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to plant development, or phenological stage, an improvement in biomass estimation would be expected. It was observed that, using the ANFIS model, the biomass estimation accuracy increased from R² = 0.76 (p < 0.05) to R² = 0.81 (p < 0.05) and the root mean square error was reduced by 2.72%. The work on optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site, to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots, it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i.e., height, biomass), and that it is possible to detect changes due to grass growth and to grazing and mowing events when the temporal baseline is short (11 days). However, it is not possible to automatically and uniquely identify the cause of these changes based only on SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down by the wind. Overall, the work presented in this dissertation demonstrates the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high-quality ground data were used for training. At present, a major limitation for national-scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by minor modifications to the existing PastureBaseIreland database, adding the location and extent of each grassland paddock.
As far as remote sensing data requirements are concerned, MODIS is useful for large-scale evaluation, but due to its coarse resolution it is not possible to detect variations within and between fields at the farm scale. This issue will be resolved, in terms of spatial resolution, by the Sentinel-2 mission; when both satellites (Sentinel-2A and Sentinel-2B) are operational, the revisit time will drop to 5 days, which, together with Landsat-8, should provide enough cloud-free data for operational biomass estimation at a national scale. The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if enough coherent interferometric pairs are available; however, this is difficult to achieve due to temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area, even an 11-day temporal baseline is too long. Achieving better coherence requires very high resolution at the cost of spatial coverage, which limits the scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in tandem mode will minimize temporal decorrelation over vegetated areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In future, this framework can be used to build an operational decision support system for the retrieval of grassland biophysical parameters, based on data from long-term planned optical missions (e.g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, the Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.
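As a sketch of the multiple-linear-regression step described above, the following fits biomass against an NDVI composite and accumulated GDD and reports R² and RMSE, the two scores quoted in the abstract. The data here are synthetic stand-ins; the dissertation's actual predictors, coefficients and scores are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120

# Synthetic stand-ins for the real predictors and target:
ndvi = rng.uniform(0.3, 0.9, n)      # 8-day composite MODIS NDVI
gdd = rng.uniform(100.0, 900.0, n)   # accumulated growing degree days
biomass = 40 * ndvi + 0.05 * gdd + rng.normal(0.0, 5.0, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), ndvi, gdd])
beta, *_ = np.linalg.lstsq(X, biomass, rcond=None)
pred = X @ beta

# Goodness of fit: coefficient of determination and root mean square error.
ss_res = np.sum((biomass - pred) ** 2)
ss_tot = np.sum((biomass - biomass.mean()) ** 2)
print("R2   =", 1 - ss_res / ss_tot)
print("RMSE =", np.sqrt(np.mean((biomass - pred) ** 2)))
```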
Abstract:
Calcium has a pivotal role in biological functions, and serum calcium levels have been associated with numerous disorders of bone and mineral metabolism, as well as with cardiovascular mortality. Here we report results from a genome-wide association study of serum calcium, integrating data from four independent cohorts including a total of 12,865 individuals of European and Indian Asian descent. Our meta-analysis shows that serum calcium is associated with SNPs in or near the calcium-sensing receptor (CASR) gene on 3q13. The top hit, with a p-value of 6.3 × 10⁻³⁷, is rs1801725, a missense variant, explaining 1.26% of the variance in serum calcium. This SNP had the strongest association in individuals of European descent, while for individuals of Indian Asian descent the top hit was rs17251221 (p = 1.1 × 10⁻²¹), a SNP in strong linkage disequilibrium with rs1801725. The strongest locus in CASR was shown to replicate in an independent Icelandic cohort of 4,126 individuals (p = 1.02 × 10⁻⁴). This genome-wide meta-analysis shows that common CASR variants modulate serum calcium levels in the adult general population, which confirms previous results in some candidate gene studies of the CASR locus. This study highlights the key role of CASR in calcium regulation.
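As general background for the "1.26% of the variance" figure, the proportion of phenotypic variance explained by a biallelic SNP under an additive model is commonly computed as follows (a standard population-genetics formula; the notation is not taken from this paper):

\[
R^2 = \frac{2\,p\,(1-p)\,\hat{\beta}^2}{\operatorname{Var}(Y)}
\]

where \(p\) is the effect-allele frequency, \(\hat{\beta}\) the estimated per-allele effect, and \(\operatorname{Var}(Y)\) the phenotypic variance of serum calcium.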
Abstract:
The diagnostic approach to diffuse parenchymal lung disease (DPLD), and especially to the idiopathic interstitial pneumonias, has changed over the last two decades, mostly thanks to the development of high-resolution CT. Though far from replacing pathology, this additional tool has contributed to the definition of new and more precise diagnostic criteria, especially for idiopathic interstitial pneumonias, integrating data provided by the three main contributors: the lung specialist, the radiologist and the pathologist. The purpose of this article is to review the role of histopathology in the multidisciplinary approach to the diagnosis of DPLD and the idiopathic interstitial pneumonias.
Abstract:
At many institutions, program review is an underproductive exercise. Review of existing programs is often a check-the-box formality, with inconsistent criteria and little connection to institutional priorities or funding considerations. Decisions about where to concentrate resources across the portfolio can be highly politicized. This report profiles how academic planning exemplars use program review as a strategic tool, integrating data on academic quality, student demand, and resource utilization to improve the economics of challenged programs and prioritize programs for investment and expansion.
Abstract:
The Amazon has been observed mainly through the phenomenon of deforestation, using traditional remote sensing resources such as the quantification of deforested area and its subsequent annual increment, which appears to constitute an effective methodology. Supporting this reasoning, in a survey of 16,591 fines issued by IBAMA/PA between 2000 and 2008, I found that more than 85.0% of the infraction notices were related to the flora component alone; and in the jurisdiction of the IBAMA office in Santarém, western Pará, in 2008, almost 60% of the fines were issued for deforestation identified via remote sensing. It should be stressed that satellite image analysis by itself does not define elements of the land surface, and contributes little to understanding, and subsequently intervening in, the reality on the ground. In this context, 479 rural establishments in the Paragominas and Santarém regions of the state of Pará, which have distinct histories of use and occupation, were investigated and vectorized, and were qualified according to their prevalent technological trajectories, in the perspective presented by Costa, an important step toward correcting distortions in economic development by adding information to the remote sensing data. Geotechnological resources such as landscape metrics were applied, and a cell-based database was built, integrated with statistics and probabilistic optimization algorithms, associating unsupervised isodata classification (validated with kappa = 0.87, a classification considered "excellent") with the production types collected in the field, generating a final "integrated" classification (kappa = 0.78, a "very good" classification). In the Paragominas region, three types of technological trajectories were identified: the peasant trajectory T8 (dominated by temporary crops), and the peasant T3 and patronal T4 (both specialized in beef cattle). In Santarém, two trajectories emerged: the peasant T2 (strong presence of permanent crops, temporary crops and agroforestry systems) and the patronal T7 (a mutation of T4, with an increased share of temporary crops). The applied methodology was successful, spatializing the rural properties according to their types of technological trajectories and generating more representative land use classes, such as temporary crops and pasture, which in the isodata remote sensing classification are subsumed under a single "agriculture and livestock" class. This enables a more realistic view of the production activities carried out in the investigated area, realizing the generation of spatial information by integrating data from different sources and increasing the interpretive power of the pixel.
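For reference, the kappa coefficient used to validate the classifications above is the standard agreement statistic computed from the confusion matrix (a textbook definition, not specific to this work):

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

where \(p_o\) is the observed agreement between the classification and the reference data and \(p_e\) the agreement expected by chance; κ = 0.87 and κ = 0.78 thus indicate strong agreement well beyond chance.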
Abstract:
This study aimed to analyze and characterize the environmental sensitivity to oil of the Baixada Santista region in the State of São Paulo. The work was done by integrating data on the physical environment, socio-economic activities, and biological/ecological aspects, provided by the Research Group on Environmental Sensitivity to Oil Spills at the Institute of Geosciences and Exact Sciences of UNESP "Júlio de Mesquita Filho", which works in conjunction with the Program of Human Resources Training in Geosciences and Environmental Sciences Applied to Oil and Gas (PRH-05) of the National Petroleum Agency (ANP). Descriptive statistical analyses, based on dispersion and central tendency parameters, were also performed, answering questions related to the prevalence of the environmental sensitivity index (ISL), the predominance of the ISL by environment, and the predominance of the environments in the area, thus providing an overview of the sensitivity to oil of Baixada Santista's main towns. The analyses performed in this study may also help mitigate environmental and socioeconomic impacts and contribute to the development of contingency plans for Baixada Santista.
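As a minimal sketch of the descriptive summaries described above (ISL prevalence and the predominant ISL per environment), with entirely hypothetical data and column names:

```python
import pandas as pd

# Hypothetical table: one row per mapped shoreline segment, with its
# environment type and ISL (environmental sensitivity index) value.
segments = pd.DataFrame({
    "environment": ["beach", "beach", "mangrove", "rocky shore", "mangrove"],
    "isl": [3, 4, 10, 1, 10],
})

# Prevalence of each ISL value across the study area.
print(segments["isl"].value_counts(normalize=True))

# Predominant (modal) ISL per environment, the kind of summary the
# study's descriptive analysis provides.
print(segments.groupby("environment")["isl"].agg(lambda s: s.mode().iat[0]))
```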
Abstract:
Graduate Program in Information Science - FFC
Abstract:
We investigated the seasonal patterns of Amazonian forest photosynthetic activity, and the effects thereon of variations in climate and land use, by integrating data from a network of ground-based eddy flux towers in Brazil established as part of the 'Large-Scale Biosphere Atmosphere Experiment in Amazonia' project. We found that the degree of water limitation, as indicated by the seasonality of the ratio of sensible to latent heat flux (the Bowen ratio), predicts seasonal patterns of photosynthesis. In equatorial Amazonian forests (5° N–5° S), water limitation is absent, and photosynthetic fluxes (or gross ecosystem productivity, GEP) exhibit high or increasing levels of photosynthetic activity as the dry season progresses, likely a consequence of allocation to growth of new leaves. In contrast, forests along the southern flank of the Amazon, pastures converted from forest, and mixed forest-grass savanna exhibit dry-season declines in GEP, consistent with increasing degrees of water limitation. Although previous work showed that tropical ecosystem evapotranspiration (ET) is driven by incoming radiation, the GEP observations reported here surprisingly show no or negative relationships with photosynthetically active radiation (PAR). Instead, GEP fluxes largely followed the phenology of canopy photosynthetic capacity (Pc), with only the deviations from this primary pattern driven by variations in PAR. Estimates of leaf flush at three
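For reference, the Bowen ratio referred to above is defined as (standard micrometeorological definition):

\[
\beta = \frac{H}{\lambda E}
\]

where \(H\) is the sensible heat flux and \(\lambda E\) the latent heat flux; an increasing dry-season \(\beta\) indicates a growing degree of water limitation.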
Abstract:
The objective of this Final Degree Project (PFC) is to understand, simulate and build a VoIP network over a data network in a teaching environment, specifically for the subject "Redes y Servicios de Telecomunicación" in the Telecommunications Engineering Degree at the Universidad Politécnica de Madrid (UPM). Once the necessary knowledge has been acquired, a series of lab exercises is proposed so that students become familiar with the software and hardware used; the level of difficulty increases progressively until they can build a real VoIP network by themselves. In addition to the lab exercises, students must pass a multiple-choice test on the knowledge acquired at the end of each exercise. The systems chosen for deploying a VoIP network in the laboratory modules are 3CX Phone System and Asterisk-Trixbox, both of which provide graphical managers that reduce the difficulty of configuration. 3CX is a PBX that runs on Windows and is based exclusively on the SIP protocol. This makes it easy to handle for users who have only used Windows, without losing functionality found in other PBXs on other operating systems; the demo version activates all options so users can become familiar with the system. Asterisk, on the other hand, runs on all platforms, although Linux was chosen here: other platforms limit the configuration of the IP PBX, whereas on Linux Asterisk is open source and allows all kinds of configurations. It is also free software, which is an advantage when configuring new features or solving problems, since many specialists provide support and help at no cost. Voice over Internet is commonly known as VoIP (Voice over IP), because IP (Internet Protocol) is the network protocol of the Internet. As a technology, VoIP is not just another step in the growth of voice communications; it means integrating data and voice communications on a single network, and specifically on the network with the largest global coverage: the Internet. The main motivation of this Final Degree Project is for students to arrive in a working environment with the knowledge needed to deal with a technology that is so much the order of the day, given the importance these networks have, and will have in the very near future, in the world of computing and communications. It should be noted that these technological disciplines evolve by leaps and bounds and demand ever more solid knowledge.
Abstract:
Part of current biomedical research is focused on the analysis of heterogeneous data. These data may differ in origin, structure and semantics. A large quantity of data of interest to researchers is contained in public databases, which gather information from different sources and make it freely available to the community. To homogenize these public data sources with others of private origin, various tools and techniques exist that automate the integration of heterogeneous data. The Biomedical Informatics Group (GIB) [1] of the Universidad Politécnica de Madrid collaborates in the European project P-medicine [2], whose purpose is to develop an infrastructure that facilitates the evolution of current medical procedures toward personalized medicine. One of the tasks assigned to the group within P-medicine is to build tools that help users integrate data contained in heterogeneous information sources, some of which are public biomedical databases hosted on the NCBI [3] (National Center for Biotechnology Information) platform. One of the tools the group is developing to integrate data sources is Ontology Annotator. In one of its phases, the user's job is to retrieve information from a public database and manually select the relevant results. To automate this search-and-selection process there is, on the one hand, great interest in generating queries that lead to results as precise and exact as possible and, on the other, great interest in extracting relevant information from large quantities of documents, which requires systems that analyze and weigh the data characterizing them. In the artificial intelligence field of computer science, within the branch of information retrieval, there are several studies on query expansion from relevance feedback that could help solve this problem. These studies focus on techniques for reformulating or expanding the initial query using as feedback the results that were relevant to the user in a first pass, so that the new result set lies closer to what the user really wants. The goal of this final degree project is the study, implementation and experimentation of methods that automate the extraction of significant information from documents and use it to expand or reformulate queries, thereby improving the precision and ranking of the associated results. These methods will be integrated into the Ontology Annotator tool and targeted at the PubMed [4] data source.
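As an illustration of the kind of relevance-feedback query expansion studied here, the following is a minimal sketch of the classical Rocchio reformulation over term-weight (e.g., TF-IDF) vectors. Rocchio is a standard technique in this literature; it is not claimed to be the exact method implemented in Ontology Annotator.

```python
import numpy as np

def rocchio(query_vec: np.ndarray,
            relevant: np.ndarray,
            non_relevant: np.ndarray,
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> np.ndarray:
    """Classical Rocchio relevance feedback.

    query_vec    : original query in a term-weight (e.g. TF-IDF) space
    relevant     : matrix of document vectors the user marked relevant
    non_relevant : matrix of document vectors marked non-relevant
    Returns the reformulated query, moved toward the centroid of the
    relevant documents and away from the non-relevant ones.
    """
    new_q = (alpha * query_vec
             + beta * relevant.mean(axis=0)
             - gamma * non_relevant.mean(axis=0))
    return np.clip(new_q, 0.0, None)  # negative weights are usually dropped

# Toy usage over a 4-term vocabulary.
q = np.array([1.0, 0.0, 0.0, 0.0])
rel = np.array([[0.8, 0.6, 0.0, 0.0], [0.9, 0.4, 0.1, 0.0]])
nonrel = np.array([[0.0, 0.0, 0.9, 0.7]])
print(rocchio(q, rel, nonrel))
```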