937 results for Distributed data


Relevance: 30.00%

Abstract:

The future power grid will effectively utilize renewable energy resources and distributed generation to respond to energy demand while incorporating information technology and communication infrastructure for their optimum operation. This dissertation contributes to the development of real-time techniques for wide-area monitoring and secure real-time control and operation of hybrid power systems.

To handle the increased level of real-time data exchange, this dissertation develops a supervisory control and data acquisition (SCADA) system equipped with a state estimation scheme driven by the real-time data. The system was verified on a specially developed laboratory-based test bed facility, a hardware and software platform that emulates the actual scenarios of a real hybrid power system with the highest practicable fidelity to utility systems. It includes phasor measurements at hundreds of measurement points on the system. These measurements were obtained from a specially developed laboratory-based Phasor Measurement Unit (PMU), used on the interconnected system alongside existing commercial PMUs. The tested studies included a new technique for detecting partially islanded microgrids, as well as several real-time techniques for synchronization and parameter identification of hybrid systems.

Moreover, given the extensive integration of renewable energy resources through DC microgrids, this dissertation examines several practical cases for improving the interoperability of such systems. The growing number of small, dispersed generating stations, and their need to connect quickly and properly to AC grids, also motivated this work to explore the challenges that arise in synchronizing generators to the grid, and to introduce a Dynamic Brake system that improves the process of connecting distributed generators to the power grid.

Real-time operation and control require secure data communication. A research effort in this dissertation therefore developed a Trusted Sensing Base (TSB) process for data communication security. The innovative TSB approach improves the security of the power grid as a cyber-physical system; it builds on available GPS synchronization technology and provides protection against confidentiality attacks in critical power system infrastructure.
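
To make the state-estimation step concrete, here is a minimal weighted-least-squares (WLS) sketch in Python. It is a textbook stand-in, not the dissertation's actual scheme; the measurement model, measurement values and weights are invented for illustration.

```python
import numpy as np

# Minimal WLS state estimation: z = H x + e, where z are SCADA/PMU
# measurements, x the system states, and W the inverse measurement variances.
H = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0, -1.0]])          # hypothetical linear measurement model
z = np.array([1.02, 0.98, 0.05])     # hypothetical measurements
W = np.diag([100.0, 100.0, 50.0])    # weights = 1 / variance per channel

# Normal equations: x_hat = (H^T W H)^{-1} H^T W z
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
residuals = z - H @ x_hat            # large residuals flag suspect data
print(x_hat, residuals)
```

The residual vector is what a SCADA-side bad-data check would inspect before trusting the estimated state.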

Relevance: 30.00%

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval, and works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach to information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that rely on a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites: the approach provides accurate data retrieval together with the power and flexibility to handle complex cases.
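
As a rough illustration of the custom-wrapper idea, here is a minimal Python stand-in (not the thesis's scripting language): it treats one page of a site as a relation of (name, price) tuples. The URL and the HTML row pattern are hypothetical.

```python
import re
import urllib.request

# Hypothetical site whose product table we expose as a data source.
URL = "https://example.com/products"
ROW = re.compile(
    r'<td class="name">(.*?)</td>\s*<td class="price">(.*?)</td>', re.S)

def query_site(url: str) -> list[tuple[str, str]]:
    """Fetch a page and return the extracted (name, price) records."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8")
    return ROW.findall(html)

# rows = query_site(URL)   # each tuple behaves like a row from a data source
```

A real wrapper would add paging, error handling and schema mapping, but the pattern — fetch, parse, emit records — is the core of treating a Web site as a queryable source.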

Relevance: 30.00%

Abstract:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution that investigates the extents of the meta-data constructs of component schemas, shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, acting as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
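
The "global view over component schemas" idea can be sketched very simply. The following Python toy (not Sem-ODM or Semantic SQL; entity, table and column names are invented) shows a global entity mapped onto two component schemas and a global query rewritten into one SQL query per source.

```python
# A global "patient" entity mapped onto two hypothetical component schemas.
GLOBAL_VIEW = {
    "patient": [
        {"source": "hospital_a", "table": "patients",
         "cols": {"name": "full_name", "dob": "birth_date"}},
        {"source": "clinic_b", "table": "person",
         "cols": {"name": "pname", "dob": "dob"}},
    ]
}

def rewrite(entity: str, wanted: list[str]) -> list[str]:
    """Rewrite a global query into one SQL query per component source."""
    queries = []
    for m in GLOBAL_VIEW[entity]:
        cols = ", ".join(f'{m["cols"][w]} AS {w}' for w in wanted)
        queries.append(f'-- {m["source"]}\nSELECT {cols} FROM {m["table"]};')
    return queries

print("\n".join(rewrite("patient", ["name", "dob"])))
```

The hard part the thesis addresses — deciding *which* sources and columns are semantically related — is exactly what this toy takes as given in `GLOBAL_VIEW`.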

Relevance: 30.00%

Abstract:

We provide a compilation of downward fluxes (total mass, POC, PON, BSiO2, CaCO3, PIC and lithogenic/terrigenous fluxes) from over 6000 sediment trap measurements distributed across the Atlantic Ocean, from 30°N to 49°S, and covering the period 1982-2011. Data from the Mediterranean Sea are also included. Data were compiled from different sources: data repositories (BCO-DMO, PANGAEA), time-series sites (BATS, CARIACO), published scientific papers and/or personal communications from PIs. All sources are specified in the data set. Data from the World Ocean Atlas 2009 were extracted to provide each flux observation with contextual environmental data, such as temperature, salinity, oxygen (concentration, AOU and percentage saturation), nitrate, phosphate and silicate.
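
A sketch of the kind of matching involved in attaching World Ocean Atlas context to each trap observation is shown below. Column names, values and the 1-degree snapping rule are assumptions for the example; the compilation's actual matching procedure is not described here.

```python
import pandas as pd

# Hypothetical flux observations and a hypothetical slice of a 1-degree
# WOA-style climatology grid.
fluxes = pd.DataFrame({"lat": [34.7, -12.3], "lon": [-20.1, 5.6],
                       "poc_flux": [12.4, 3.1]})       # e.g. mg C m-2 d-1
woa = pd.DataFrame({"lat": [34.5, -12.5], "lon": [-20.5, 5.5],
                    "temperature": [18.2, 24.9], "nitrate": [0.8, 2.3]})

def to_grid(v: float) -> float:
    """Snap a coordinate to the centre of its 1-degree cell."""
    return int(v) + 0.5 if v >= 0 else int(v) - 0.5

fluxes["lat_g"] = fluxes["lat"].map(to_grid)
fluxes["lon_g"] = fluxes["lon"].map(to_grid)
merged = fluxes.merge(woa, left_on=["lat_g", "lon_g"],
                      right_on=["lat", "lon"], suffixes=("", "_woa"))
print(merged[["poc_flux", "temperature", "nitrate"]])
```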

Relevance: 30.00%

Abstract:

Marine phytoplankton can evolve rapidly when confronted with aspects of climate change because of their large population sizes and fast generation times. Despite this, the importance of environmental fluctuations, a key feature of climate change, has received little attention: selection experiments with marine phytoplankton are usually carried out in stable environments and use single or few representatives of a species, genus or functional group. Here we investigate whether, and by how much, environmental fluctuations contribute to changes in ecologically important phytoplankton traits such as C:N ratios and cell size, and test the variability of changes in these traits within the globally distributed species Ostreococcus. We have evolved 16 physiologically distinct lineages of Ostreococcus at stable high CO2 (1031 ± 87 µatm CO2; SH) and fluctuating high CO2 (1012 ± 244 µatm CO2; FH) for 400 generations. We find that although both fluctuation and high CO2 drive evolution, FH-evolved lineages are smaller, have reduced C:N ratios and respond more strongly to further increases in CO2 than do SH-evolved lineages. This indicates that environmental fluctuations are an important factor to consider when predicting how the characteristics of future phytoplankton populations will have an impact on biogeochemical cycles and higher trophic levels in marine food webs.

Relevance: 30.00%

Abstract:

The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4,000,000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years after its first description by Nisumaa et al. (2010). Most of the study sites from which data have been archived are still in the Northern Hemisphere, and the number of archived data sets from studies in the Southern Hemisphere and polar oceans remains relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, to help develop standard vocabularies describing the variables and to define best practices for archiving ocean acidification data.

Relevance: 30.00%

Abstract:

The DTRF2014 is a realization of the fundamental Earth-fixed coordinate system, the International Terrestrial Reference System (ITRS). It has been computed by the Deutsches Geodätisches Forschungsinstitut der Technischen Universität München (DGFI-TUM). The DTRF2014 consists of station positions and velocities of 1712 globally distributed geodetic observing stations of the observation techniques VLBI, SLR, GNSS and DORIS. Additionally, for the first time, non-tidal atmospheric and hydrological loading is considered in the solution. The DTRF2014 was released in August 2016 and incorporates observation data of the four techniques up to 2014. The observation data were processed and submitted by the corresponding technique services: IGS (International GNSS Service, http://igscb.jpl.nasa.gov), IVS (International VLBI Service, http://ivscc.gsfc.nasa.gov), ILRS (International Laser Ranging Service, http://ilrs.gsfc.nasa.gov) and IDS (International DORIS Service, http://ids-doris.org). The DTRF2014 is an independent ITRS realization, computed on the basis of the same input data as the realizations JTRF2014 (JPL, Pasadena) and ITRF2014 (IGN, Paris). The three realizations of the ITRS differ conceptually. While DTRF2014 and ITRF2014 are based on station positions at a reference epoch and velocities, the JTRF2014 is based on time series of station positions. DTRF2014 and ITRF2014 also result from different combination strategies: the ITRF2014 is based on the combination of solutions, whereas the DTRF2014 is computed by the combination of normal equations. The DTRF2014 comprises 3D coordinates and coordinate changes of 1347 GNSS, 113 VLBI, 99 SLR and 153 DORIS stations. The reference epoch is 1 January 2005, 0h UTC. The Earth Orientation Parameters (EOP), i.e. the coordinates of the terrestrial and the celestial pole, UT1-UTC and the Length of Day (LOD), were estimated simultaneously with the station coordinates. The EOP time series cover the period from 1979.7 to 2015.0. The station names are the official IERS identifiers: CDP numbers or 4-character IDs and DOMES numbers (http://itrf.ensg.ign.fr/doc_ITRF/iers_sta_list.txt). The DTRF2014 solution is available in one comprehensive SINEX file and four technique-specific SINEX files, see below. A detailed description of the solution is given on the website of DGFI-TUM (http://www.dgfi.tum.de/en/science-data-products/dtrf2014/). More information can be made available on request.
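
Because the solution is expressed as positions at a reference epoch plus velocities, using it at another epoch is a linear propagation, x(t) = x0 + v (t - t0). A minimal sketch follows; the numerical values are made up for illustration, not taken from DTRF2014.

```python
import numpy as np

# Propagate a station position from the reference epoch (2005.0) to an
# observation epoch using the linear position/velocity model of a TRF.
x0 = np.array([4075539.5, 931735.2, 4801629.4])  # position at t0 [m] (made up)
v = np.array([-0.0161, 0.0170, 0.0106])          # velocity [m/yr] (made up)
t0, t = 2005.0, 2016.5                           # epochs [decimal years]

x_t = x0 + v * (t - t0)                          # x(t) = x0 + v (t - t0)
print(x_t)
```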

Relevance: 30.00%

Abstract:

The program PanTool was developed as a toolbox, like a Swiss Army knife, for data conversion and recalculation, written to harmonize individual data collections with the standard import format used by PANGAEA. PanTool takes as input tabular files saved as plain ASCII; the user can create these files with a spreadsheet program like MS Excel or with the system text editor. PanTool is distributed as freeware for the operating systems Microsoft Windows, Apple OS X and Linux.
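
For a feel of the kind of conversion/recalculation task such a tool automates, here is a small Python sketch: it reads a tab-delimited ASCII table, derives one column from another, and writes a clean tab-delimited file for import. The file names, column labels and the rough pressure-to-depth factor are assumptions for the example, not PanTool's actual behavior.

```python
import csv

# Read a tab-delimited export, add a recalculated column (depth in metres
# from pressure in dbar, crude ~0.99 m/dbar rule), write it back out.
with open("collection.txt", newline="") as src, \
     open("import_ready.txt", "w", newline="") as dst:
    reader = csv.DictReader(src, delimiter="\t")
    fields = reader.fieldnames + ["Depth water [m]"]
    writer = csv.DictWriter(dst, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    for row in reader:
        row["Depth water [m]"] = f'{float(row["Press [dbar]"]) * 0.99:.1f}'
        writer.writerow(row)
```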

Relevance: 30.00%

Abstract:

Funding and trial registration: Scottish Government Chief Scientist Office grant CZH/3/17. ClinicalTrials.gov registration NCT01602705.

Relevance: 30.00%

Abstract:

Acknowledgements. This work was mainly funded by the EU FP7 CARBONES project (contract FP7-SPACE-2009-1-242316), with a small contribution from the GEOCARBON project (ENV.2011.4.1.1-1-283080). This work used eddy covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program; DE-FG02-04ER63917 and DE-FG02-04ER63911), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia and USCCC. We acknowledge the financial support for the eddy covariance data harmonization provided by CarboEuropeIP, FAO-GTOS-TCO, iLEAPS, the Max Planck Institute for Biogeochemistry, the National Science Foundation, the University of Tuscia, Université Laval, Environment Canada and the US Department of Energy, and the database development and technical support from the Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research eScience, Oak Ridge National Laboratory, the University of California-Berkeley and the University of Virginia. Philippe Ciais acknowledges support from the European Research Council through Synergy grant ERC-2013-SyG-610028 "IMBALANCE-P". The authors wish to thank M. Jung for providing access to the GPP MTE data, which were downloaded from the GEOCARBON data portal (https://www.bgc-jena.mpg.de/geodb/projects/Data.php). The authors are also grateful for the computing support and resources provided at LSCE and to the overall ORCHIDEE project that coordinates the development of the code (http://labex.ipsl.fr/orchidee/index.php/about-the-team).

Relevance: 30.00%

Abstract:

A novel interrogation technique for fully distributed linearly chirped fiber Bragg grating (LCFBG) strain sensors with simultaneously high temporal and spatial resolution, based on optical time-stretch frequency-domain reflectometry (OTS-FDR), is proposed and experimentally demonstrated. LCFBGs are a promising candidate for fully distributed sensors thanks to their longer grating length and broader reflection bandwidth compared to normal uniform FBGs. In the proposed system, two identical LCFBGs are employed in a Michelson interferometer setup, with one grating serving as the reference and the other as the sensing element. A broadband spectral interferogram is formed and the strain information is encoded into the wavelength-dependent free spectral range (FSR). Ultrafast interrogation is achieved based on dispersion-induced time stretch, such that the target spectral interferogram is mapped to a temporal interference waveform that can be captured in real time using a single-pixel photodetector. The distributed strain along the sensing grating can be reconstructed from the instantaneous RF frequency of the captured waveform. High spatial resolution is also obtained thanks to the high-speed data acquisition. In a proof-of-concept experiment, ultrafast real-time interrogation of fully distributed grating sensors with various strain distributions is experimentally demonstrated. An ultrarapid measurement speed of 50 MHz with a high spatial resolution of 31.5 μm over a gauge length of 25 mm and a strain resolution of 9.1 με have been achieved.
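
The reconstruction step — instantaneous RF frequency of the captured waveform mapped to strain — can be sketched with a Hilbert transform, as below. This is a generic signal-processing stand-in on a synthetic chirp, not the paper's processing chain; the sampling rate and the frequency-to-strain calibration constant K are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

fs = 50e9                                    # sampling rate [Sa/s] (assumed)
t = np.arange(4096) / fs
# Synthetic linearly chirped interference waveform standing in for the
# time-stretched interferogram captured by the single-pixel photodetector.
waveform = np.cos(2 * np.pi * (2e9 * t + 0.5 * 2e17 * t**2))

phase = np.unwrap(np.angle(hilbert(waveform)))   # analytic-signal phase
f_inst = np.gradient(phase, t) / (2 * np.pi)     # instantaneous frequency [Hz]

K = 1.0e-15                     # hypothetical calibration [strain per Hz]
strain = K * (f_inst - f_inst[0])   # distributed strain profile along grating
```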

Relevance: 30.00%

Abstract:

Background: Statin therapy reduces the risk of occlusive vascular events, but uncertainty remains about potential effects on cancer. We sought to provide a detailed assessment of any effects on cancer of lowering LDL cholesterol (LDL-C) with a statin using individual patient records from 175,000 patients in 27 large-scale statin trials. Methods and Findings: Individual records of 134,537 participants in 22 randomised trials of statin versus control (median duration 4.8 years) and 39,612 participants in 5 trials of more intensive versus less intensive statin therapy (median duration 5.1 years) were obtained. Reducing LDL-C with a statin for about 5 years had no effect on newly diagnosed cancer or on death from such cancers in either the trials of statin versus control (cancer incidence: 3755 [1.4% per year [py]] versus 3738 [1.4% py], RR 1.00 [95% CI 0.96-1.05]; cancer mortality: 1365 [0.5% py] versus 1358 [0.5% py], RR 1.00 [95% CI 0.93-1.08]) or in the trials of more versus less statin (cancer incidence: 1466 [1.6% py] versus 1472 [1.6% py], RR 1.00 [95% CI 0.93-1.07]; cancer mortality: 447 [0.5% py] versus 481 [0.5% py], RR 0.93 [95% CI 0.82-1.06]). Moreover, there was no evidence of any effect of reducing LDL-C with statin therapy on cancer incidence or mortality at any of 23 individual categories of sites, with increasing years of treatment, for any individual statin, or in any given subgroup. In particular, among individuals with low baseline LDL-C (<2 mmol/L), there was no evidence that further LDL-C reduction (from about 1.7 to 1.3 mmol/L) increased cancer risk (381 [1.6% py] versus 408 [1.7% py]; RR 0.92 [99% CI 0.76-1.10]). Conclusions: In 27 randomised trials, a median of five years of statin therapy had no effect on the incidence of, or mortality from, any type of cancer (or the aggregate of all cancer).
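
The headline rate ratio can be sanity-checked from the reported event counts. Assuming roughly equal person-years per randomized arm (so the rate ratio reduces to the event ratio) and the standard log-rate-ratio standard error for Poisson counts, a short Python check reproduces the published interval:

```python
import math

e_statin, e_control = 3755, 3738   # incident cancers, statin vs control arms

rr = e_statin / e_control                      # ratio of the ~1.4%/yr rates
se = math.sqrt(1 / e_statin + 1 / e_control)   # SE of log(RR)
lo = rr * math.exp(-1.96 * se)
hi = rr * math.exp(+1.96 * se)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")   # RR = 1.00 (0.96-1.05)
```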

Relevance: 30.00%

Abstract:

Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking one of four categories, while gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; with this assumption, a factor model produces a low-rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data and estimating population structure from genotype data.
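
As a quick illustration of the factor-model structure mentioned above: if x = Λf + ε with Gaussian factors f and diagonal noise, then cov(x) = ΛΛᵀ + Ψ, a rank-k-plus-diagonal matrix. The dimensions and values in the sketch below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 50, 3, 10_000                       # 50 variables, 3 latent factors

Lam = rng.normal(size=(p, k))                 # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # idiosyncratic noise variances

f = rng.normal(size=(n, k))                   # latent Gaussian factors
eps = rng.normal(size=(n, p)) * np.sqrt(np.diag(Psi))
X = f @ Lam.T + eps                           # observed data

implied = Lam @ Lam.T + Psi                   # low-rank + diagonal covariance
sample = np.cov(X, rowvar=False)
print(np.abs(sample - implied).max())         # small for large n
```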

Relevance: 30.00%

Abstract:

Distributed computing frameworks belong to a class of programming models that allow developers to launch workloads on large clusters of machines. Due to the dramatic increase in the volume of data gathered by ubiquitous computing devices, data analytic workloads have become a common case among distributed computing applications, making data science an entire field of computer science. We argue that a data scientist's concern lies in three main components: a dataset, a sequence of operations they wish to apply to this dataset, and some constraints related to their work (performance, QoS, budget, etc.). However, it is actually extremely difficult, without domain expertise, to perform data science: one needs to select the right amount and type of resources, pick a framework, and configure it. Also, users often run their applications in shared environments, ruled by schedulers that expect them to specify their resource needs precisely. Inherent to the distributed and concurrent nature of these frameworks, monitoring and profiling are hard, high-dimensional problems that block users from making the right configuration choices and determining the right amount of resources they need. Paradoxically, the system gathers a large amount of monitoring data at runtime, which remains unused.

In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit monitoring data to learn about workloads and to process user requests into a tailored execution context. In this work, we study different techniques that have been used to take steps toward such system awareness, and explore a new way to do so by applying machine learning techniques to recommend a specific subset of system configurations for Apache Spark applications. Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights the complexity of choosing the best configuration for a given workload.
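
For readers unfamiliar with the knobs in question, the sketch below sets the three main executor parameters of a Spark application via `SparkConf`. The values are illustrative, and `local[4]` is used only so the snippet runs without a cluster; on YARN or another manager the master comes from the environment.

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# The executor configuration triple the study is about: how many executors,
# how many cores each, and how much heap each. Values are illustrative.
conf = (
    SparkConf()
    .setAppName("executor-config-demo")
    .setMaster("local[4]")                   # local demo only
    .set("spark.executor.instances", "8")    # number of executors
    .set("spark.executor.cores", "4")        # cores per executor
    .set("spark.executor.memory", "6g")      # heap per executor
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```

Even this small triple spans a large search space per workload, which is what motivates recommending configurations from past monitoring data rather than hand-tuning.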

Relevance: 30.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or a Bayesian variable selection method, calculates the "median" feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
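
A compact sketch of the message idea, under simplifying assumptions (lasso as the per-subset selector, ordinary least squares as the per-subset refit, serial loop standing in for parallel workers):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, m = 3000, 20, 5                        # samples, features, subsets
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

subsets = np.array_split(np.arange(n), m)    # horizontal (sample) partition
# Step 1: feature selection per subset (lasso support), in parallel in message.
inclusion = np.array([(Lasso(alpha=0.1).fit(X[s], y[s]).coef_ != 0)
                      for s in subsets])
# Step 2: median feature-inclusion index across subsets.
selected = np.median(inclusion, axis=0) >= 0.5
# Step 3: refit on selected features per subset, then average the estimates.
coefs = [np.linalg.lstsq(X[s][:, selected], y[s], rcond=None)[0]
         for s in subsets]
beta_hat = np.mean(coefs, axis=0)
print(np.flatnonzero(selected), beta_hat.round(2))
```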

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
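
The decorrelation step can be sketched as premultiplying the data by (XXᵀ/n + rI)^(-1/2), after which vertical blocks of features handed to different workers are approximately uncorrelated. The snippet below is a simplified illustration with invented dimensions, not a full DECO implementation:

```python
import numpy as np
from numpy.linalg import eigh

rng = np.random.default_rng(2)
n, p, r = 200, 1000, 1.0
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p)) / np.sqrt(p)  # correlated
y = rng.normal(size=n)

# D = (X X^T / n + r I)^{-1/2} via an eigendecomposition of the n x n Gram.
vals, vecs = eigh(X @ X.T / n + r * np.eye(n))
D = vecs @ np.diag(vals ** -0.5) @ vecs.T
X_dec, y_dec = D @ X, D @ y                   # decorrelated data

blocks = np.array_split(np.arange(p), 4)      # vertical (feature) partition
# Each worker would now fit e.g. a lasso on (X_dec[:, b], y_dec) in parallel.
```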

For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework, DEME (DECO-message), by leveraging both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message, and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.