876 resultados para bigdata, data stream processing, dsp, apache storm, cyber security


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Companies face new challenges almost every day. In order to stay competitive, it is important that companies strive for continuous development and improvement. By describing companies through their processes it is possible to get a clear overview of the entire operation, which can contribute, to a well-established overall understanding of the company. This is a case study based on Stort AB which is a small logistics company specialized in international transportation and logistics solutions. The purpose of this study is to perform value stream mapping in order to create a more efficient production process and propose possible improvements in order to reduce processing time. After performing value stream mapping, data envelopment analysis is used to calculate how lean Stort AB is today and how lean the company can become by implementing the proposed improvements. The results show that the production process can improve efficiency by minimizing waste produced by a bad workplace layout and over-processing. The authors suggested solution is to introduce standardized processes and invest in technical instruments in order to automate the process to reduce process time. According to data envelopment analysis the business is 41 percent lean at present and may soon become 55 percent lean and finally reach an optimum 100 percent lean mode if the process is automated.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The advancement of GPS technology has made it possible to use GPS devices as orientation and navigation tools, but also as tools to track spatiotemporal information. GPS tracking data can be broadly applied in location-based services, such as spatial distribution of the economy, transportation routing and planning, traffic management and environmental control. Therefore, knowledge of how to process the data from a standard GPS device is crucial for further use. Previous studies have considered various issues of the data processing at the time. This paper, however, aims to outline a general procedure for processing GPS tracking data. The procedure is illustrated step-by-step by the processing of real-world GPS data of car movements in Borlänge in the centre of Sweden.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The generation of heterogeneous big data sources with ever increasing volumes, velocities and veracities over the he last few years has inspired the data science and research community to address the challenge of extracting knowledge form big data. Such a wealth of generated data across the board can be intelligently exploited to advance our knowledge about our environment, public health, critical infrastructure and security. In recent years we have developed generic approaches to process such big data at multiple levels for advancing decision-support. It specifically concerns data processing with semantic harmonisation, low level fusion, analytics, knowledge modelling with high level fusion and reasoning. Such approaches will be introduced and presented in context of the TRIDEC project results on critical oil and gas industry drilling operations and also the ongoing large eVacuate project on critical crowd behaviour detection in confined spaces.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The Data Processing Department of ISHC has developed coding forms to be used for the data to be entered into the program. The Highway Planning and Programming and the Design Departments are responsible for coding and submitting the necessary data forms to Data Processing for the noise prediction on the highway sections.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. The second model called Vertex Importance Query (VIQ) identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns top-k of them with a user-specified approximation terms and scoring functions. In the fourth model called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers such as the number of mapped blocks, vertices' properties in each block and so on and the most top-k probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers - irrespective of whether they are vertices or substitution, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This document does NOT address the issue of oxygen data quality control (either real-time or delayed mode). As a preliminary step towards that goal, this document seeks to ensure that all countries deploying floats equipped with oxygen sensors document the data and metadata related to these floats properly. We produced this document in response to action item 14 from the AST-10 meeting in Hangzhou (March 22-23, 2009). Action item 14: Denis Gilbert to work with Taiyo Kobayashi and Virginie Thierry to ensure DACs are processing oxygen data according to recommendations. If the recommendations contained herein are followed, we will end up with a more uniform set of oxygen data within the Argo data system, allowing users to begin analysing not only their own oxygen data, but also those of others, in the true spirit of Argo data sharing. Indications provided in this document are valid as of the date of writing this document. It is very likely that changes in sensors, calibrations and conversions equations will occur in the future. Please contact V. Thierry (vthierry@ifremer.fr) for any inconsistencies or missing information. A dedicated webpage on the Argo Data Management website (www) contains all information regarding Argo oxygen data management : current and previous version of this cookbook, oxygen sensor manuals, calibration sheet examples, examples of matlab code to process oxygen data, test data, etc..

Relevância:

40.00% 40.00%

Publicador:

Resumo:

By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as the “networks on wheels”, can greatly enhance traffic safety, traffic efficiency and driving experience for intelligent transportation system (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, impose critical challenges of high efficiency and reliability for the implementation of VANETs. This dissertation is motivated by the great application potentials of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research targets at enhancing the traffic safety and traffic efficiency, as well as developing novel commercial applications, based on VANETs, following four aspects: 1) accurate and efficient message aggregation to detect on-road safety relevant events, 2) reliable data dissemination to reliably notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications to exploit the commercial potentials of VANETs. Specifically, to enable cooperative detection of safety relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The scheme of relative position based message dissemination (RPB-MD) is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance in varying traffic density. Due to numerous vehicular sensor data available based on VANETs, the scheme of compressive sampling based data collection (CS-DC) is proposed to efficiently collect the spatial relevance data in a large scale, especially in the dense traffic. In addition, with novel and efficient solutions proposed for the application specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit the commercial potentials of VANETs, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability in in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further to the stage of massive deployment.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Hardboard processing wastewater was evaluated as a feedstock in a bio refinery co-located with the hardboard facility for the production of fuel grade ethanol. A thorough characterization was conducted on the wastewater and the composition changes of which during the process in the bio refinery were tracked. It was determined that the wastewater had a low solid content (1.4%), and hemicellulose was the main component in the solid, accounting for up to 70%. Acid pretreatment alone can hydrolyze the majority of the hemicellulose as well as oligomers, and over 50% of the monomer sugars generated were xylose. The percentage of lignin remained in the liquid increased after acid pretreatment. The characterization results showed that hardboard processing wastewater is a feasible feedstock for the production of ethanol. The optimum conditions to hydrolyze hemicellulose into fermentable sugars were evaluated with a two-stage experiment, which includes acid pretreatment and enzymatic hydrolysis. The experimental data were fitted into second order regression models and Response Surface Methodology (RSM) was employed. The results of the experiment showed that for this type of feedstock enzymatic hydrolysis is not that necessary. In order to reach a comparatively high total sugar concentration (over 45g/l) and low furfural concentration (less than 0.5g/l), the optimum conditions were reached when acid concentration was between 1.41 to 1.81%, and reaction time was 48 to 76 minutes. The two products produced from the bio refinery were compared with traditional products, petroleum gasoline and traditional potassium acetate, in the perspective of sustainability, with greenhouse gas (GHG) emission as an indicator. Three allocation methods, system expansion, mass allocation and market value allocation methods were employed in this assessment. It was determined that the life cycle GHG emissions of ethanol were -27.1, 20.8 and 16 g CO2 eq/MJ, respectively, in the three allocation methods, whereas that of petroleum gasoline is 90 g CO2 eq/MJ. The life cycle GHG emissions of potassium acetate in mass allocation and market value allocation method were 555.7 and 716.0 g CO2 eq/kg, whereas that of traditional potassium acetate is 1020 g CO2/kg.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Healthcare systems have assimilated information and communication technologies in order to improve the quality of healthcare and patient's experience at reduced costs. The increasing digitalization of people's health information raises however new threats regarding information security and privacy. Accidental or deliberate data breaches of health data may lead to societal pressures, embarrassment and discrimination. Information security and privacy are paramount to achieve high quality healthcare services, and further, to not harm individuals when providing care. With that in mind, we give special attention to the category of Mobile Health (mHealth) systems. That is, the use of mobile devices (e.g., mobile phones, sensors, PDAs) to support medical and public health. Such systems, have been particularly successful in developing countries, taking advantage of the flourishing mobile market and the need to expand the coverage of primary healthcare programs. Many mHealth initiatives, however, fail to address security and privacy issues. This, coupled with the lack of specific legislation for privacy and data protection in these countries, increases the risk of harm to individuals. The overall objective of this thesis is to enhance knowledge regarding the design of security and privacy technologies for mHealth systems. In particular, we deal with mHealth Data Collection Systems (MDCSs), which consists of mobile devices for collecting and reporting health-related data, replacing paper-based approaches for health surveys and surveillance. This thesis consists of publications contributing to mHealth security and privacy in various ways: with a comprehensive literature review about mHealth in Brazil; with the design of a security framework for MDCSs (SecourHealth); with the design of a MDCS (GeoHealth); with the design of Privacy Impact Assessment template for MDCSs; and with the study of ontology-based obfuscation and anonymisation functions for health data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In questo elaborato vengono analizzate differenti tecniche per la detection di jammer attivi e costanti in una comunicazione satellitare in uplink. Osservando un numero limitato di campioni ricevuti si vuole identificare la presenza di un jammer. A tal fine sono stati implementati i seguenti classificatori binari: support vector machine (SVM), multilayer perceptron (MLP), spectrum guarding e autoencoder. Questi algoritmi di apprendimento automatico dipendono dalle features che ricevono in ingresso, per questo motivo è stata posta particolare attenzione alla loro scelta. A tal fine, sono state confrontate le accuratezze ottenute dai detector addestrati utilizzando differenti tipologie di informazione come: i segnali grezzi nel tempo, le statistical features, le trasformate wavelet e lo spettro ciclico. I pattern prodotti dall’estrazione di queste features dai segnali satellitari possono avere dimensioni elevate, quindi, prima della detection, vengono utilizzati i seguenti algoritmi per la riduzione della dimensionalità: principal component analysis (PCA) e linear discriminant analysis (LDA). Lo scopo di tale processo non è quello di eliminare le features meno rilevanti, ma combinarle in modo da preservare al massimo l’informazione, evitando problemi di overfitting e underfitting. Le simulazioni numeriche effettuate hanno evidenziato come lo spettro ciclico sia in grado di fornire le features migliori per la detection producendo però pattern di dimensioni elevate, per questo motivo è stato necessario l’utilizzo di algoritmi di riduzione della dimensionalità. In particolare, l'algoritmo PCA è stato in grado di estrarre delle informazioni migliori rispetto a LDA, le cui accuratezze risentivano troppo del tipo di jammer utilizzato nella fase di addestramento. Infine, l’algoritmo che ha fornito le prestazioni migliori è stato il Multilayer Perceptron che ha richiesto tempi di addestramento contenuti e dei valori di accuratezza elevati.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a new food classification which assigns foodstuffs according to the extent and purpose of the industrial processing applied to them. Three main groups are defined: unprocessed or minimally processed foods (group 1), processed culinary and food industry ingredients (group 2), and ultra-processed food products (group 3). The use of this classification is illustrated by applying it to data collected in the Brazilian Household Budget Survey which was conducted in 2002/2003 through a probabilistic sample of 48,470 Brazilian households. The average daily food availability was 1,792 kcal/person being 42.5% from group 1 (mostly rice and beans and meat and milk), 37.5% from group 2 (mostly vegetable oils, sugar, and flours), and 20% from group 3 (mostly breads, biscuits, sweets, soft drinks, and sausages). The share of group 3 foods increased with income, and represented almost one third of all calories in higher income households. The impact of the replacement of group 1 foods and group 2 ingredients by group 3 products on the overall quality of the diet, eating patterns and health is discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hot tensile and creep tests were carried out on Kanthal A1 alloy in the temperature range from 600 to 800 degrees C. Each of these sets of data were analyzed separately according to their own methodologies, but an attempt was made to find a correlation between them. A new criterion proposed for converting hot tensile data to creep data, makes possible the analysis of the two kinds of results according to usual creep relations like: Norton, Monkman-Grant, Larson-Miller and others. The remarkable compatibility verified between both sets of data by this procedure strongly suggests that hot tensile data can be converted to creep data and vice-versa for Kanthal A1 alloy, as verified previously for other metallic materials.