68 resultados para Collection of Network Data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The term 'big data' has recently emerged to describe a range of technological and commercial trends enabling the storage and analysis of huge amounts of customer data, such as that generated by social networks and mobile devices. Much of the commercial promise of big data is in the ability to generate valuable insights from collecting new types and volumes of data in ways that were not previously economically viable. At the same time a number of questions have been raised about the implications for individual privacy. This paper explores key perspectives underlying the emergence of big data, and considers both the opportunities and ethical challenges raised for market research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Performance modelling is a useful tool in the lifeycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is had in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. The experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow model as an example.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population modelling is increasingly recognised as a useful tool for pesticide risk assessment. For vertebrates that may ingest pesticides with their food, such as woodpigeon (Columba palumbus), population models that simulate foraging behaviour explicitly can help predicting both exposure and population-level impact. Optimal foraging theory is often assumed to explain the individual-level decisions driving distributions of individuals in the field, but it may not adequately predict spatial and temporal characteristics of woodpigeon foraging because of the woodpigeons’ excellent memory, ability to fly long distances, and distinctive flocking behaviour. Here we present an individual-based model (IBM) of the woodpigeon. We used the model to predict distributions of foraging woodpigeons that use one of six alternative foraging strategies: optimal foraging, memory-based foraging and random foraging, each with or without flocking mechanisms. We used pattern-oriented modelling to determine which of the foraging strategies is best able to reproduce observed data patterns. Data used for model evaluation were gathered during a long-term woodpigeon study conducted between 1961 and 2004 and a radiotracking study conducted in 2003 and 2004, both in the UK, and are summarised here as three complex patterns: the distributions of foraging birds between vegetation types during the year, the number of fields visited daily by individuals, and the proportion of fields revisited by them on subsequent days. The model with a memory-based foraging strategy and a flocking mechanism was the only one to reproduce these three data patterns, and the optimal foraging model produced poor matches to all of them. The random foraging strategy reproduced two of the three patterns but was not able to guarantee population persistence. We conclude that with the memory-based foraging strategy including a flocking mechanism our model is realistic enough to estimate the potential exposure of woodpigeons to pesticides. We discuss how exposure can be linked to our model, and how the model could be used for risk assessment of pesticides, for example predicting exposure and effects in heterogeneous landscapes planted seasonally with a variety of crops, while accounting for differences in land use between landscapes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fire activity has varied globally and continuously since the last glacial maximum (LGM) in response to long-term changes in global climate and shorter-term regional changes in climate, vegetation, and human land use. We have synthesized sedimentary charcoal records of biomass burning since the LGM and present global maps showing changes in fire activity for time slices during the past 21,000 years (as differences in charcoal accumulation values compared to pre-industrial). There is strong broad-scale coherence in fire activity after the LGM, but spatial heterogeneity in the signals increases thereafter. In North America, Europe and southern South America, charcoal records indicate less-than-present fire activity during the deglacial period, from 21,000 to ∼11,000 cal yr BP. In contrast, the tropical latitudes of South America and Africa show greater-than-present fire activity from ∼19,000 to ∼17,000 cal yr BP and most sites from Indochina and Australia show greater-than-present fire activity from 16,000 to ∼13,000 cal yr BP. Many sites indicate greater-than-present or near-present activity during the Holocene with the exception of eastern North America and eastern Asia from 8,000 to ∼3,000 cal yr BP, Indonesia and Australia from 11,000 to 4,000 cal yr BP, and southern South America from 6,000 to 3,000 cal yr BP where fire activity was less than present. Regional coherence in the patterns of change in fire activity was evident throughout the post-glacial period. These complex patterns can largely be explained in terms of large-scale climate controls modulated by local changes in vegetation and fuel load

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Replacement, expansion and upgrading of assets in the electricity network represents financial investment for the distribution utilities. Network Investment Deferral (NID) is a well discussed benefit of wider adoption of Distributed Generation (DG). There have been many attempts to quantify and evaluate the financial benefit for the distribution utilities. While the carbon benefits of NID are commonly mentioned, there is little attempt to quantify these impacts. This paper explores the quantitative methods previously used to evaluate financial benefits in order to discuss the carbon impacts. These carbon impacts are important for companies owning DG equipment for internal reporting and emissions reductions ambitions. Currently, a GB wide approach is taken as a means for discussing more regional and local methods to be used in future work. By investigating these principles, the paper offers a novel approach to quantifying carbon emissions from various DG technologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which as a data structure consisting of two parts: the yes-filter which is a standard Bloom filter and the no-filter which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognises as many as possible false positives but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, with the constraint being that it should recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large leading to intractable optimal solution. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed making use of a reduced ILP for the value function approximation. Numerical results show the ADP model works best comparing with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes-no Bloom filters. In a wider context of the study of lossy compression algorithms, our researchis an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses how global financial institutions are using big data analytics within their compliance operations. A lot of previous research has focused on the strategic implications of big data, but not much research has considered how such tools are entwined with regulatory breaches and investigations in financial services. Our work covers two in-depth qualitative case studies, each addressing a distinct type of analytics. The first case focuses on analytics which manage everyday compliance breaches and so are expected by managers. The second case focuses on analytics which facilitate investigation and litigation where serious unexpected breaches may have occurred. In doing so, the study focuses on the micro/data to understand how these tools are influencing operational risks and practices. The paper draws from two bodies of literature, the social studies of information systems and finance to guide our analysis and practitioner recommendations. The cases illustrate how technologies are implicated in multijurisdictional challenges and regulatory conflicts at each end of the operational risk spectrum. We find that compliance analytics are both shaping and reporting regulatory matters yet often firms may have difficulties in recruiting individuals with relevant but diverse skill sets. The cases also underscore the increasing need for financial organizations to adopt robust information governance policies and processes to ease future remediation efforts.