982 results for "data aggregation"


Relevance:

100.00%

Abstract:

The Machine-to-Machine (M2M) paradigm enables machines (sensors, actuators, robots, and smart meter readers) to communicate with each other with little or no human intervention. M2M is a key enabling technology for cyber-physical systems (CPSs). This paper explores CPS beyond the M2M concept and looks at futuristic applications. Our vision is CPS with distributed actuation and in-network processing. We describe a few particular use cases that motivate the development of M2M communication primitives tailored to large-scale CPS. So far, M2M communications have been considered in the literature only to a limited extent: the existing work is based on small-scale M2M models and centralized solutions, different sources discuss different primitives, and the few existing decentralized solutions do not scale well. There is a need to design M2M communication primitives that will scale to thousands and even trillions of M2M devices without sacrificing solution quality. The main paradigm shift is to design localized algorithms, in which CPS nodes make decisions based on local knowledge. Localized coordination and communication in networked robotics, for matching events and robots, is studied to illustrate new directions.
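A minimal sketch of the localized-matching idea, assuming each robot only knows about events within its communication radius; the coordinates, radius, and function names are illustrative and not the authors' algorithm:

# Localized event-robot matching: every robot decides from local knowledge only.
import math

def localized_matching(robots, events, radius):
    """Greedy, purely local matching: each robot claims the nearest unclaimed
    event it can hear about; no global coordinator is involved."""
    claimed = set()
    assignment = {}
    for rid, (rx, ry) in robots.items():
        # Local knowledge only: events within the communication radius.
        visible = [(math.dist((rx, ry), (ex, ey)), eid)
                   for eid, (ex, ey) in events.items()
                   if eid not in claimed and math.dist((rx, ry), (ex, ey)) <= radius]
        if visible:
            _, best = min(visible)
            assignment[rid] = best
            claimed.add(best)
    return assignment

robots = {"r1": (0.0, 0.0), "r2": (5.0, 5.0)}
events = {"e1": (1.0, 0.5), "e2": (4.0, 6.0)}
print(localized_matching(robots, events, radius=3.0))

In a real deployment the claim step would require local message exchange between neighbouring robots to resolve conflicts; the sketch only shows the local decision rule.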

Relevance:

100.00%

Abstract:

Wireless body area networks (WBANs), as a promising health-care system, can provide tremendous benefits for timely and continuous patient care and remote health monitoring. Owing to the restrictions on communication, computation, and power in WBANs, cloud-assisted WBANs, which offer more reliable, intelligent, and timely health-care services for mobile users and patients, are receiving increasing attention. However, how to aggregate health data multifunctionally and efficiently remains an open issue for the cloud server (CS). In this paper, we propose a privacy-preserving and multifunctional health data aggregation (PPM-HDA) mechanism with fault tolerance for cloud-assisted WBANs. With PPM-HDA, the CS can compute multiple statistical functions of users' health data in a privacy-preserving way to offer various services. In particular, we first propose a multifunctional health data additive aggregation scheme (MHDA+) to support additive aggregate functions, such as average and variance. We then put forward MHDA as an extension of MHDA+ to support nonadditive aggregations, such as min/max, median, percentile, and histogram. PPM-HDA can resist differential attacks, from which most existing data aggregation schemes suffer. The security analysis shows that PPM-HDA can protect users' privacy against many threats. Performance evaluations illustrate that the computational overhead of MHDA+ is significantly reduced with the assistance of the CS. Our MHDA scheme is more efficient than previously reported min/max aggregation schemes in terms of communication overhead when applications require a large plaintext space and highly accurate data.
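A minimal sketch of why average and variance are additive aggregates: both can be recovered from per-user partial sums (count, sum, sum of squares). The encryption layer that PPM-HDA adds on top of these sums is omitted; this only illustrates the arithmetic the cloud server would perform:

# Average and variance from additive partial sums only.
def additive_aggregate(values):
    n = len(values)
    s1 = sum(values)                      # sum of readings
    s2 = sum(v * v for v in values)       # sum of squared readings
    mean = s1 / n
    variance = s2 / n - mean ** 2         # population variance from the two sums
    return mean, variance

print(additive_aggregate([72, 75, 71, 80]))  # e.g. illustrative heart-rate samples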

Relevance:

80.00%

Abstract:

As a leading framework for processing and analyzing big data, MapReduce is leveraged by many enterprises to parallelize their data processing on distributed computing systems. Unfortunately, the all-to-all data forwarding from map tasks to reduce tasks in the traditional MapReduce framework generates a large amount of network traffic. The fact that the intermediate data generated by map tasks can be combined, with significant traffic reduction, in many applications motivates us to propose a data aggregation scheme for MapReduce jobs in the cloud. Specifically, we design an aggregation architecture under the existing MapReduce framework with the objective of minimizing the data traffic during the shuffle phase, in which aggregators can reside anywhere in the cloud. Experimental results also show that our proposal outperforms existing work by reducing network traffic significantly.
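A minimal sketch of the traffic-reduction principle, using a combiner-style local aggregation of map output before the shuffle; this is not the paper's aggregator-placement scheme, and the word-count job is purely illustrative:

# Pre-aggregating map output so fewer key-value pairs cross the network.
from collections import Counter

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    """Aggregate locally on the mapper node before shuffling."""
    agg = Counter()
    for key, value in pairs:
        agg[key] += value
    return agg.items()

lines = ["data aggregation in the cloud", "aggregation reduces shuffle data"]
raw = list(map_phase(lines))        # one pair per word occurrence
combined = list(combine(raw))       # one pair per distinct word
print(len(raw), "pairs before combining vs", len(combined), "after")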

Relevance:

80.00%

Abstract:

Large amounts of information can be overwhelming and costly to process, especially when transmitting data over a network. A typical modern Geographical Information System (GIS) brings all types of data together based on the geographic component of the data and provides simple point-and-click query capabilities as well as complex analysis tools. Querying a GIS, however, can be prohibitively expensive because of the large amounts of data that may need to be processed. Since the use of GIS technology has grown dramatically in the past few years, there is now more need than ever to provide users with the fastest and least expensive query capabilities, especially since an estimated 80% of data stored in corporate databases has a geographical component. However, not every application requires the same high-quality data for its processing. In this paper we address the issues of reducing the cost and response time of GIS queries by pre-aggregating data, at the expense of data accuracy and precision. We present computational issues in the generation of multi-level resolutions of spatial data and show that the problem of finding the best approximation for a given region and a real-valued function on this region, under a predictable error, is in general NP-complete.
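A minimal sketch of pre-aggregation at a coarser resolution: a fine grid is averaged into 2x2 blocks so that queries touch less data at the cost of accuracy. The grid values and block size are illustrative; the paper's bounded-error approximation problem is not solved here.

# Multi-level resolution by block-averaging a 2-D grid.
import numpy as np

def coarsen(grid, factor=2):
    """Aggregate a 2-D grid by averaging factor x factor blocks."""
    h, w = grid.shape
    h2, w2 = h // factor, w // factor
    trimmed = grid[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)   # pretend this is a raster layer
coarse = coarsen(fine)                            # pre-aggregated 2x2 resolution
print(coarse)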

Relevance:

70.00%

Abstract:

Rapid and affordable tumor molecular profiling has led to an explosion of clinical and genomic data poised to enhance the diagnosis, prognostication and treatment of cancer. A critical point has now been reached at which the analysis and storage of annotated clinical and genomic information in unconnected silos will stall the advancement of precision cancer care. Information systems must be harmonized to overcome the multiple technical and logistical barriers to data sharing. Against this backdrop, the Global Alliance for Genomics and Health (GA4GH) was established in 2013 to create a common framework that enables responsible, voluntary and secure sharing of clinical and genomic data. This Perspective from the GA4GH Clinical Working Group Cancer Task Team highlights the data-aggregation challenges faced by the field, suggests potential collaborative solutions and describes how GA4GH can catalyze a harmonized data-sharing culture.

Relevance:

70.00%

Abstract:

The increasing interconnection of information and communication systems leads to further growth in complexity and hence to a further increase in security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long ceased to offer adequate protection against intrusions into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyse information from network components and hosts in order to detect unusual behaviour and security violations automatically. While signature-based approaches can only detect already known attack patterns, anomaly-based IDS are also able to recognize new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in the optimal processing of the enormous volume of network data and in the development of an adaptive detection model that works in real time. To address these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic queuing concept to process the continuously arriving network data, continuously builds up network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on an Enhanced Growing Hierarchical Self-Organizing Map (EGHSOM), a model of normal network behaviour (NNB), and an update model. In OptiFilter, tcpdump and SNMP traps are used to continuously aggregate network packets and host events. These aggregated network packets and host events are further analysed and converted into connection vectors. To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is studied intensively and substantially extended. Several approaches are proposed and discussed in this dissertation: a classification-confidence margin threshold is defined to uncover unknown malicious connections, the stability of the growth topology is increased through novel approaches for initializing the weight vectors and strengthening the winner neurons, and a self-adaptive procedure is introduced to keep the model continuously up to date. In addition, the main task of the NNB model is the further examination of the unknown connections detected by the EGHSOM and the verification of whether they are normal. However, network traffic changes constantly because of the concept-drift phenomenon, which produces non-stationary network data in real time. This phenomenon is better controlled by the update model. The EGHSOM model can detect new anomalies effectively, and the NNB model adapts optimally to the changes in the network data. In the experimental evaluation the framework showed promising results. In the first experiment the framework was evaluated in offline mode: OptiFilter was evaluated with offline, synthetic, and realistic data, and the adaptive classifier was evaluated using 10-fold cross-validation to estimate its accuracy.
In the second experiment the framework was installed on a 1 to 10 Gbit network link and evaluated online in real time. OptiFilter successfully converted the enormous volume of network data into structured connection vectors, and the adaptive classifier classified them precisely. A comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all of them. This can be attributed to the following key points: processing of the collected network data, achieving the best performance (such as overall accuracy), detecting unknown connections, and developing a real-time intrusion detection model.
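A minimal sketch of the classification-confidence margin idea described above, assuming connection vectors and trained SOM prototypes are already available: a vector whose two closest prototypes are nearly equidistant is flagged as unknown and handed to the normal-behaviour (NNB) check. The prototypes, labels, and threshold are illustrative; the EGHSOM training itself is not reproduced.

# Confidence-margin check over distances to SOM prototypes.
import numpy as np

def classify_with_margin(vector, prototypes, labels, margin=0.2):
    dists = np.linalg.norm(prototypes - vector, axis=1)
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    if second - best < margin:
        return "unknown"          # hand over to the normal-behaviour (NNB) check
    return labels[order[0]]

prototypes = np.array([[0.1, 0.2, 0.0], [0.9, 0.8, 0.7]])  # illustrative prototypes
labels = ["normal", "attack"]
print(classify_with_margin(np.array([0.5, 0.5, 0.4]), prototypes, labels))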

Relevance:

70.00%

Abstract:

This article explores how data envelopment analysis (DEA), along with a smoothed bootstrap method, can be used in applied analysis to obtain more reliable efficiency rankings for farms. The main focus is the smoothed homogeneous bootstrap procedure introduced by Simar and Wilson (1998) to implement statistical inference for the original efficiency point estimates. Two main model specifications, constant and variable returns to scale, are investigated along with various choices regarding data aggregation. The coefficient of separation (CoS), a statistic that indicates the degree of statistical differentiation within the sample, is used to demonstrate the findings. The CoS suggests a substantive dependency of the results on the methodology and assumptions employed. Accordingly, some observations are made on how to conduct DEA in order to get more reliable efficiency rankings, depending on the purpose for which they are to be used. In addition, attention is drawn to the ability of the SLICE MODEL, implemented in GAMS, to enable researchers to overcome the computational burdens of conducting DEA (with bootstrapping).
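A minimal sketch of the smoothed-bootstrap step, assuming input-oriented DEA efficiency scores in (0, 1] have already been computed; the reflection-plus-kernel-noise resampling follows the spirit of Simar and Wilson (1998), while the DEA linear programmes and the variance correction are omitted.

# Smoothed homogeneous bootstrap over precomputed DEA efficiency scores.
import numpy as np

rng = np.random.default_rng(42)

def smoothed_bootstrap(scores, n_boot=1000, h=0.05):
    scores = np.asarray(scores, dtype=float)
    reflected = np.concatenate([scores, 2.0 - scores])          # reflect around 1
    draws = rng.choice(reflected, size=(n_boot, scores.size), replace=True)
    smoothed = draws + h * rng.standard_normal(draws.shape)     # kernel smoothing
    smoothed = np.where(smoothed > 1.0, 2.0 - smoothed, smoothed)  # fold back into (0, 1]
    return smoothed

scores = [0.72, 0.85, 0.93, 1.00, 0.64]   # illustrative farm efficiency scores
boot = smoothed_bootstrap(scores)
print(boot.mean(axis=0))                  # bootstrap means per farm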

Relevance:

70.00%

Abstract:

The exponential increase in data, computing power and the availability of readily accessible analytical software has allowed organisations around the world to leverage the benefits of integrating multiple heterogeneous data files for enterprise-level planning and decision making. Benefits of effective data integration for the health and medical research community include more trustworthy research, higher service quality, improved personnel efficiency, reduction of redundant tasks, facilitation of auditing and more timely, relevant and specific information. The costs of poor-quality processes include an elevated risk of erroneous outcomes and an erosion of confidence in the data and in the organisations using these data. To date, there is no documented set of standards for best-practice integration of heterogeneous data files for research purposes. The aim of this paper is therefore to describe a clear protocol for data file integration (the Data Integration Protocol In Ten steps; DIPIT) translatable to any field of research.

Relevance:

70.00%

Abstract:

The success of a Wireless Sensor Network (WSN) deployment strongly depends on the quality of service (QoS) it provides regarding issues such as data accuracy, data aggregation delays and network lifetime maximisation. This is especially challenging in data fusion mechanisms, where a small fraction of low-quality data in the fusion input may negatively impact the overall fusion result. In this paper, we present a fuzzy-based data fusion approach for WSN with the aim of increasing the QoS whilst reducing the energy consumption of the sensor network. The proposed approach is able to distinguish and aggregate only the true values of the collected data, thus reducing the burden of processing the entire data set at the base station (BS). It is also able to eliminate redundant data and consequently reduce energy consumption, thus increasing the network lifetime. We studied the effectiveness of the proposed data fusion approach experimentally and compared it with two baseline approaches in terms of data collection, number of transferred data packets and energy consumption. The results of the experiments show that the proposed approach achieves better results than the baseline approaches.
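A minimal sketch of fuzzy-membership filtering before fusion: readings with low membership in a "plausible" fuzzy set are discarded and the remaining values are fused with membership-weighted averaging. The triangular membership function, its parameters and the cutoff are illustrative, not the paper's rule base.

# Fuzzy filtering and weighted fusion of sensor readings.
def triangular_membership(x, low, peak, high):
    """Degree (0..1) to which a reading belongs to the 'plausible' fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

def fuzzy_fuse(readings, low=15.0, peak=25.0, high=35.0, cutoff=0.3):
    scored = [(r, triangular_membership(r, low, peak, high)) for r in readings]
    kept = [(r, m) for r, m in scored if m >= cutoff]      # discard low-quality data
    if not kept:
        return None
    return sum(r * m for r, m in kept) / sum(m for _, m in kept)  # weighted fusion

print(fuzzy_fuse([22.1, 23.4, 24.0, 58.9, 21.7]))  # the outlier 58.9 is excluded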

Relevance:

70.00%

Abstract:

In order to generate sales promotion response predictions, marketing analysts estimate demand models using either disaggregated (consumer-level) or aggregated (store-level) scanner data. Comparison of predictions from these demand models is complicated by the fact that models may accommodate different forms of consumer heterogeneity depending on the level of data aggregation. This study shows via simulation that demand models with various heterogeneity specifications do not produce more accurate sales response predictions than a homogeneous demand model applied to store-level data, with one major exception: a random coefficients model designed to capture within-store heterogeneity using store-level data produced significantly more accurate sales response predictions (as well as better fit) than the other model specifications. An empirical application to the paper towel product category provides additional insights. Supplementary material for this article is available online.
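A minimal sketch of why within-store heterogeneity matters, assuming two consumer segments with different price elasticities inside one store: a single pooled (homogeneous) elasticity fitted to the mixed store-level data recovers neither segment. The segment shares and elasticities are illustrative, not the study's simulation design.

# Store-level sales as a mixture of heterogeneous consumer segments.
import numpy as np

prices = np.linspace(0.8, 1.2, 5)
segments = [(0.6, -1.0), (0.4, -4.0)]           # (share, price elasticity)

def store_sales(p):
    # Each segment follows its own constant-elasticity demand curve.
    return sum(share * 100.0 * p ** beta for share, beta in segments)

sales = np.array([store_sales(p) for p in prices])
# A homogeneous log-log model fits one pooled elasticity to the mixed data.
pooled_beta = np.polyfit(np.log(prices), np.log(sales), 1)[0]
print("pooled elasticity:", round(pooled_beta, 2), "vs segment elasticities -1.0 / -4.0")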

Relevance:

70.00%

Abstract:

Ensemble stream modeling and data cleaning are sensor-information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure, such as the F-test, is performed at each node split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were used with a simpler tree classifier. The ensemble framework for data cleaning and the enhancements to quantify the quality of fitness of sensors (30% spatial, 10% temporal, and 90% mobility reduction) led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream-quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
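A minimal sketch of the F-measure used above to score fire-event detection, computed from illustrative confusion-matrix counts:

# F-measure: harmonic mean of precision and recall.
def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 40 correctly detected fire events, 10 false alarms, 5 missed events
print(round(f_measure(tp=40, fp=10, fn=5), 3))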

Relevance:

70.00%

Abstract:

This paper reports on the development of the Humanities Networked Infrastructure (HuNI), a service which aggregates data from thirty Australian data sources and makes them available for use by researchers across the humanities and creative arts, and more widely by the general public. We discuss the methods used by HuNI to aggregate data, as well as the conceptual framework which has shaped the design of HuNI’s Data Model around six core entity types. Two of the key functions available to users of HuNI – building collections and creating links – are discussed, together with their design rationale.
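A minimal sketch of an aggregated-entity store with researcher-created links, loosely in the spirit of the design described above; the entity types, field names and source names here are hypothetical, not HuNI's actual Data Model.

# Aggregated entities from multiple sources, plus user-asserted links between them.
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: str
    entity_type: str        # one of a small set of core types, e.g. "person", "work"
    source: str             # which contributing data source the record came from
    attributes: dict = field(default_factory=dict)

@dataclass
class Link:
    from_id: str
    to_id: str
    relation: str           # researcher-asserted relationship
    created_by: str

entities = {
    "p1": Entity("p1", "person", "source-a", {"name": "Example Person"}),
    "w1": Entity("w1", "work", "source-b", {"title": "Example Work"}),
}
links = [Link("p1", "w1", "created", created_by="researcher-42")]
print(links[0])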

Relevance:

60.00%

Abstract:

In the recent decision Association for Molecular Pathology v. Myriad Genetics [1], the US Supreme Court held that naturally occurring sequences from human genomic DNA are not patentable subject matter. Only certain complementary DNAs (cDNA), modified sequences and methods to use sequences are potentially patentable. It is likely that this distinction will hold for all DNA sequences, whether animal, plant or microbial [2]. However, it is not clear whether this means that other naturally occurring informational molecules, such as polypeptides (proteins) or polysaccharides, will also be excluded from patents. The decision underscores a pressing need for precise analysis of patents that disclose and reference genetic sequences, especially in the claims. Similarly, data sets, standards compliance and analytical tools must be improved, and in particular data sets and analytical tools must be made openly accessible, in order to provide a basis for effective decision making and policy setting to support biological innovation. Here, we present a web-based platform that allows such data aggregation, analysis and visualization in an open, shareable facility. To demonstrate the potential for the extension of this platform to global patent jurisdictions, we discuss the results of a global survey of patent offices, which shows that much progress is still needed in making these data freely available for aggregation in the first place.

Relevance:

60.00%

Abstract:

Given the drawbacks of using geo-political areas in mapping outcomes unrelated to geo-politics, a compromise is to aggregate and analyse data at the grid level. This has the advantage of allowing spatial smoothing and modelling at a biologically or physically relevant scale. This article addresses two consequent issues: the choice of the spatial smoothness prior and the scale of the grid. First, we describe several spatial smoothness priors applicable to grid data and discuss the contexts in which these priors can be employed, based on different aims. Two such aims are considered: to identify regions with clustering and to model spatial dependence in the data. Second, the choice of grid size is shown to depend largely on the spatial patterns. We present a guide to the selection of spatial scales and smoothness priors for various point patterns based on the two aims of spatial smoothing.
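One common smoothness prior for grid (areal) data of this kind is the intrinsic conditional autoregressive (ICAR) prior, sketched below as a typical example rather than as the specific priors compared in the article:

% ICAR smoothness prior for grid-cell effects \phi_i:
% each cell is shrunk towards the average of its neighbours.
\[
\phi_i \mid \phi_{-i} \sim \mathcal{N}\!\left( \frac{1}{n_i}\sum_{j \sim i} \phi_j,\; \frac{\sigma^2}{n_i} \right),
\]
% where $j \sim i$ denotes the cells adjacent to cell $i$ on the grid and
% $n_i$ is the number of such neighbours.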

Relevance:

60.00%

Abstract:

The increased availability of high-frequency data sets has led to important new insights in the understanding of financial markets. The use of high-frequency data is interesting and persuasive, since it can reveal new information that cannot be seen at lower levels of data aggregation. This dissertation explores some of the many important issues connected with the use, analysis and application of high-frequency data. These include the effects of intraday seasonality, the behaviour of time-varying volatility, the information content of various market data, and the issue of inter-market linkages, utilizing high-frequency 5-minute observations from major European and U.S. stock indices, namely the DAX30 of Germany, the CAC40 of France, the SMI of Switzerland, the FTSE100 of the UK and the S&P 500 of the U.S. The first essay in the dissertation shows that there are remarkable similarities in the intraday behaviour of conditional volatility across European equity markets. Moreover, U.S. macroeconomic news announcements have a significant cross-border effect on both European equity returns and volatilities. The second essay reports substantial intraday return and volatility linkages across the stock indices of the UK and Germany. This relationship appears virtually unchanged by the presence or absence of the U.S. stock market. However, the return correlation between the UK and German markets rises significantly following the U.S. stock market opening, which can largely be described as a contemporaneous effect. The third essay sheds light on market microstructure issues in which traders and market makers learn from watching market data, and it is this learning process that leads to price adjustments. This study concludes that trading volume plays an important role in explaining international return and volatility transmission. The examination of asymmetry reveals that positive volume changes have a larger impact on foreign stock market volatility than negative changes. The fourth and final essay documents a number of regularities in the pattern of intraday return volatility, trading volume and bid-ask spreads. This study also reports a contemporaneous and positive relationship between intraday return volatility, the bid-ask spread and unexpected trading volume. These results verify the role of trading volume and bid-ask quotes as proxies for information arrival in producing contemporaneous and subsequent intraday return volatility. Moreover, an asymmetric effect of trading volume on conditional volatility is also confirmed. Overall, this dissertation explores the role of information in explaining intraday return and volatility dynamics in international stock markets. The process through which information is incorporated into stock prices is central to all information-based models. Intraday data facilitate the investigation of how information is incorporated into security prices as a result of the trading behaviour of informed and uninformed traders. High-frequency data thus appear critical to enhancing our understanding of the intraday behaviour of various stock market variables, with important implications for market participants, regulators and academic researchers.
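A minimal sketch of the kind of intraday computation that underlies these essays: 5-minute log returns, daily realized volatility, and an average intraday (seasonal) volatility profile, computed here from a simulated price panel purely for illustration.

# Realized volatility and intraday seasonal pattern from 5-minute observations.
import numpy as np

rng = np.random.default_rng(7)
n_days, bars_per_day = 20, 102          # roughly 8.5 trading hours of 5-minute bars
prices = 100 * np.exp(np.cumsum(0.001 * rng.standard_normal((n_days, bars_per_day)), axis=1))

returns = np.diff(np.log(prices), axis=1)               # 5-minute log returns
realized_vol = np.sqrt((returns ** 2).sum(axis=1))      # daily realized volatility
intraday_seasonal = np.abs(returns).mean(axis=0)        # average |return| per 5-minute slot

print("mean daily realized vol:", realized_vol.mean().round(4))
print("first five seasonal points:", intraday_seasonal[:5].round(4))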