781 results for big data storage


Relevance: 80.00%

Abstract:

In recent years, data, its management, and the tools for analyzing it have undergone a transformation. The amount of data collected from users, driven by web applications, sensors, and similar sources, has grown considerably, at a rate of roughly 40 to 60 percent per year. This gave rise to the term Big Data, which refers to datasets so large that they cannot be handled by traditional systems such as relational DBMSs running on a single machine. Indeed, once a dataset exceeds a few terabytes, one is forced to use a distributed system in which the data is partitioned across several machines. To manage Big Data, technologies have been created that harness the computational power and storage capacity of a cluster, with performance increasing in proportion to the number of machines in it. The most widely used of these systems is Hadoop, which provides distributed data storage and analysis. Thanks to data redundancy and sophisticated algorithms, Hadoop keeps working even if one or more machines in the cluster fail, transparently to the user. Several applications run on Hadoop, including MapReduce, Hive, and Apache Spark. This thesis project focuses mainly on the latter, which was created for data processing. A Spark module called Spark SQL is compared with Hive for speed and flexibility in executing queries on databases stored on Hadoop's distributed filesystem.
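
As a minimal sketch of the Spark SQL side of such a comparison, the following PySpark snippet runs a SQL query over a dataset stored on HDFS; the path, table, and column names are hypothetical illustrations, not taken from the thesis.

```python
# A minimal sketch, assuming a Parquet dataset already stored on HDFS at a
# hypothetical path; it shows the kind of Spark SQL query that can be
# compared against Hive on the same data.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sparksql-vs-hive-sketch")
         .enableHiveSupport()          # lets Spark SQL also see Hive metastore tables
         .getOrCreate())

# Register a dataset stored on HDFS as a temporary view.
df = spark.read.parquet("hdfs:///data/orders")   # hypothetical path
df.createOrReplaceTempView("orders")

# Run the same SQL that Hive would execute, but on Spark's engine.
result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""")
result.show()
```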

Relevance: 80.00%

Abstract:

This thesis analyzes the performance of, and ports, an SBI system to Cloudera's Hadoop distribution. Specifically, the data of the WebPolEU project was ported. The performance of the Impala query engine was then compared with that of ElasticSearch, which, unlike Oracle, exploits the same hardware (the cluster).
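
A minimal sketch of how such a query-engine comparison can be timed from Python, assuming the impyla library and the 8.x elasticsearch client are available; the hosts, the webpoleu index/table name, and the query are illustrative assumptions, not the thesis's actual benchmark.

```python
# Time one equivalent query on Impala and on Elasticsearch; hosts, ports,
# and the "webpoleu" name are hypothetical placeholders.
import time
from impala.dbapi import connect
from elasticsearch import Elasticsearch

def time_impala(host="impala-host", port=21050):
    conn = connect(host=host, port=port)
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute("SELECT COUNT(*) FROM webpoleu WHERE lang = 'en'")
    cur.fetchall()
    return time.perf_counter() - start

def time_elasticsearch(url="http://es-host:9200"):
    es = Elasticsearch(url)
    start = time.perf_counter()
    es.count(index="webpoleu", query={"term": {"lang": "en"}})
    return time.perf_counter() - start

print("Impala:        %.3f s" % time_impala())
print("Elasticsearch: %.3f s" % time_elasticsearch())
```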

Relevance: 80.00%

Abstract:

The thesis first introduces the concept of Big Data, describing its main characteristics, how it is used, where it comes from, and the opportunities it can bring. It then explains the reasons behind the rise of the NoSQL movement, such as the need to manage Big Data while keeping a structure that remains flexible over time. After a comparison with traditional systems, these DBMSs are classified into families, touching on the structural concepts they are based on and then explaining how they work. The document-oriented database MongoDB is then described: its structural details, underlying concepts, and goals are examined in depth, followed by an analysis of specific key features, such as insert and delete operations and how the database is queried. Thanks to the characteristics that make it highly performant, MongoDB was used as the database backend for a web application that displays a map of urban connectivity.
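
A minimal sketch of the MongoDB operations the thesis describes (insert, delete, query), using pymongo; the database, collection, and fields are hypothetical stand-ins for the urban-connectivity data.

```python
# Basic CRUD against a document store; names and values are invented.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
points = client["connectivity"]["points"]        # hypothetical db/collection

# Insert a document: schemaless, so fields can evolve over time.
points.insert_one({"loc": {"lat": 44.49, "lon": 11.34}, "signal": -67})

# Query: find all measurement points with acceptable signal strength.
for doc in points.find({"signal": {"$gte": -70}}):
    print(doc["loc"], doc["signal"])

# Delete: remove stale or unusable measurements.
points.delete_many({"signal": {"$lt": -100}})
```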

Relevance: 80.00%

Abstract:

The amount of data being generated and stored keeps growing thanks to new technologies and an ever-increasing number of users. Properly processed, this data yields information of strategic value that supports business decisions at every level, from production to marketing. Especially in recent years, numerous proprietary and open source frameworks have emerged that process such data on a cluster; among the most widely used and most active open source projects at the moment are Hadoop and Spark. The goal of this thesis is to build a model of Spark that provides a cost function which could not only be implemented inside the Spark SQL optimizer, but also be used to simulate query execution on the system. The system's behavior was therefore studied in detail, through documentation and tests, in order to build the model. The figures it produces were then compared with experimental data obtained on a cluster. With such a model it becomes possible not only to understand Spark's real behavior more deeply, but also to write more efficient applications and to design data-management systems built on these frameworks with greater precision.
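
A minimal sketch of the kind of analytic cost function such a model provides: cost as a sum of scan and shuffle terms that scale with cluster size. The structure and every constant are illustrative assumptions, not the thesis's fitted model.

```python
# Estimate stage time from data volume and cluster size; constants invented.
def query_cost(rows_in, row_size, shuffle_fraction, n_executors,
               disk_mbps=100.0, net_mbps=120.0):
    """Estimated execution time (seconds) for one scan + shuffle stage pair."""
    mb_in = rows_in * row_size / 1e6
    scan = mb_in / (disk_mbps * n_executors)          # parallel read from HDFS
    shuffled_mb = mb_in * shuffle_fraction
    shuffle = shuffled_mb / (net_mbps * n_executors)  # network exchange
    return scan + shuffle

# Simulate the same query on growing cluster sizes before renting machines.
for n in (2, 4, 8, 16):
    print(n, "executors ->", round(query_cost(1e8, 200, 0.3, n), 1), "s")
```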

Relevance: 80.00%

Abstract:

Every day, large amounts of data are generated by many different sources. This data, called Big Data, is currently the object of intense interest in the IT (Information Technology) sector. Digitalized processes, social media interactions, and the sensors and mobile systems we use daily are just a small subset of all the sources contributing to the production of this data. Many technologies have been developed to analyze and extract information from these large volumes of data, and many of them exploit distributed and parallel approaches. One of the most successful technologies for processing Big Data is Apache Hadoop. Cloud Computing, in particular solutions following the IaaS (Infrastructure as a Service) model, provides a valuable tool for provisioning resources simply and quickly; for this reason, this proposal uses OpenStack as the IaaS platform. By integrating OpenStack and Hadoop through Sahara, the potential of a cloud environment can be exploited to improve the performance of distributed and parallel processing. The aim of this work is to achieve a better distribution of the resources used in the cloud system, with load-balancing objectives. Reaching these objectives required modifications to both the Hadoop framework and the Sahara project.
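
A minimal sketch of the load-balancing idea: place new Hadoop VMs on the least-loaded compute hosts. This is a plain greedy heuristic written for illustration; it is not the actual modification the work applied to Hadoop and Sahara.

```python
# Greedy least-loaded placement; host names and loads are invented.
from heapq import heapify, heappop, heappush

def place_instances(host_load, n_instances):
    """host_load: {host: current VM count}. Returns a placement list."""
    heap = [(load, host) for host, load in host_load.items()]
    heapify(heap)
    placement = []
    for _ in range(n_instances):
        load, host = heappop(heap)        # least-loaded host first
        placement.append(host)
        heappush(heap, (load + 1, host))  # account for the new VM
    return placement

print(place_instances({"node1": 3, "node2": 1, "node3": 2}, 4))
# -> ['node2', 'node2', 'node3', 'node1'], leaving loads of 4, 3, 3
```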

Relevance: 80.00%

Abstract:

Background: Through this paper, we present the initial steps for the creation of an integrated platform for the provision of a series of eHealth tools and services to both citizens and travelers in isolated areas of the southeast Mediterranean, and on board ships travelling across it. The platform was created through an INTERREG IIIB ARCHIMED project called INTERMED.

Methods: The support of primary healthcare, home care and the continuous education of physicians are the three major issues that the proposed platform is trying to facilitate. The proposed system is based on state-of-the-art telemedicine systems and is able to provide the following healthcare services: i) telecollaboration and teleconsultation services between remotely located healthcare providers, ii) telemedicine services in emergencies, iii) home telecare services for "at risk" citizens such as the elderly and patients with chronic diseases, and iv) eLearning services for the continuous training, through seminars, of both healthcare personnel (physicians, nurses etc.) and persons supporting "at risk" citizens. These systems support data transmission over simple phone lines, internet connections, integrated services digital network/digital subscriber lines, satellite links, mobile networks (GPRS/3G), and wireless local area networks. The data corresponds, among others, to voice, vital biosignals, still medical images, video, and data used by eLearning applications. The proposed platform comprises several systems, each supporting different services, which were integrated using a common data storage and exchange scheme in order to achieve system interoperability in terms of software, language and national characteristics.

Results: The platform has been installed and evaluated in different rural and urban sites in Greece, Cyprus and Italy. The evaluation was mainly related to technical issues and user satisfaction. The selected sites are, among others, rural health centers, ambulances, homes of "at-risk" citizens, and a ferry.

Conclusions: The results proved the functionality and utilization of the platform in various rural places in Greece, Cyprus and Italy. However, further actions are needed to enable the local healthcare systems and the different population groups to become familiar with, and use in their everyday lives, mature technological solutions for the provision of healthcare services.
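
A minimal sketch of the kind of common exchange record that could underpin such an interoperability scheme; every field below is a hypothetical illustration, not the INTERMED project's actual schema.

```python
# One shared record shape for all sites and transport links; fields invented.
import json
from dataclasses import dataclass, asdict

@dataclass
class VitalSignsRecord:
    patient_id: str
    site: str            # e.g. rural health center, ambulance, ferry
    timestamp: str       # ISO 8601, so all sites agree on time format
    signal_type: str     # "ecg", "spo2", ...
    values: list
    language: str        # supports the multi-country deployment

record = VitalSignsRecord("p-001", "ferry", "2010-05-04T10:15:00Z",
                          "spo2", [97, 96, 97], "el")
payload = json.dumps(asdict(record))   # same payload over any transport link
print(payload)
```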

Relevance: 80.00%

Abstract:

SMARTDIAB is a platform designed to support the monitoring, management, and treatment of patients with type 1 diabetes mellitus (T1DM), by combining state-of-the-art approaches in the fields of database (DB) technologies, communications, simulation algorithms, and data mining. SMARTDIAB consists mainly of two units: 1) the patient unit (PU); and 2) the patient management unit (PMU), which communicate with each other for data exchange. The PMU can be accessed by the PU through the internet using devices such as PCs/laptops with direct internet access or mobile phones via a Wi-Fi/General Packet Radio Service access network. The PU consists of an insulin pump for subcutaneous insulin infusion to the patient and a continuous glucose measurement system. These devices, running a user-friendly application, gather patient-related information and transmit it to the PMU. The PMU consists of a diabetes data management system (DDMS), a decision support system (DSS) that provides risk assessment for long-term diabetes complications, and an insulin infusion advisory system (IIAS), which reside on a Web server. The DDMS can be accessed by both medical personnel and patients, with appropriate security access rights and front-end interfaces. The DDMS, apart from being used for data storage/retrieval, also provides advanced tools for the intelligent processing of the patient's data, supporting the physician in decision making regarding the patient's treatment. The IIAS is used to close the loop between the insulin pump and the continuous glucose monitoring system, by providing the pump with the appropriate insulin infusion rate in order to keep the patient's glucose levels within predefined limits. The pilot version of SMARTDIAB has already been implemented, while the platform's evaluation in a clinical environment is in progress.
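
A minimal sketch of the closed loop the IIAS establishes between the glucose sensor and the pump. The proportional rule and every constant here are purely illustrative assumptions to show the control structure; this is not SMARTDIAB's algorithm and must not be used clinically.

```python
# Map a glucose reading to a pump rate; all numbers are illustrative only.
TARGET_MGDL = 110.0
BASAL_U_PER_H = 1.0
GAIN = 0.01          # extra units/h per mg/dl above target (invented)

def infusion_rate(glucose_mgdl):
    """Map a continuous glucose reading to a pump infusion rate (U/h)."""
    correction = GAIN * (glucose_mgdl - TARGET_MGDL)
    return max(0.0, BASAL_U_PER_H + correction)   # rate cannot go negative

for reading in (90, 110, 180, 250):               # simulated CGM samples
    print(reading, "mg/dl ->", round(infusion_rate(reading), 2), "U/h")
```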

Relevance: 80.00%

Abstract:

Big Brother Watch and others have filed a complaint against the United Kingdom under the European Convention on Human Rights about a violation of Article 8, the right to privacy. It concerns the NSA affair and UK-based surveillance activities operated by secret services. The question is whether it will be declared admissible and, if so, whether the European Court of Human Rights will find a violation. This article discusses three possible challenges for these types of complaints and analyses whether the current privacy paradigm is still adequate in view of the development known as Big Data.

Relevance: 80.00%

Abstract:

Teaching is a dynamic activity. It can be very effective if its impact is constantly monitored and adjusted to the demands of changing social contexts and the needs of learners. This implies that teachers need to be aware of teaching and learning processes. Moreover, they should constantly question their didactical methods and the learning resources they provide to their students. They should reflect on whether their actions are suitable, and they should regulate their teaching, e.g., by updating learning materials based on new knowledge about learners, or by motivating learners to engage in further learning activities. In recent years, a rising interest in 'learning analytics' has been observable. This interest is motivated by the availability of massive amounts of educational data. The continuously increasing processing power, and a strong motivation for discovering new information in these pools of educational data, are also pushing further developments within the learning analytics research field. Learning analytics could be a method for reflective teaching practice that enables and guides teachers to investigate and evaluate their work in future learning scenarios. However, this potentially positive impact has not yet been sufficiently verified by learning analytics research. Another method that pursues these goals is 'action research'. Learning analytics promises to initiate action research processes because it facilitates awareness, reflection and regulation of teaching activities analogous to action research. Therefore, this thesis joins the two concepts in order to improve the design of learning analytics tools. The central research questions of this thesis are: What are the dimensions of learning analytics in relation to action research that need to be considered when designing a learning analytics tool? How does a learning analytics dashboard impact the teachers of technology-enhanced university lectures regarding 'awareness', 'reflection' and 'action'? Does it initiate action research? What are the central requirements for a learning analytics tool that pursues such effects? To answer these research questions, the project followed design-based research principles. The main contributions are: a theoretical reference model that connects action research and learning analytics, the conceptualization and implementation of a learning analytics tool, a requirements catalogue for useful and usable learning analytics design based on evaluations, a tested procedure for impact analysis, and guidelines for the introduction of learning analytics into higher education.
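
A minimal sketch of the awareness step such a dashboard supports: aggregating raw activity events from a technology-enhanced lecture into a per-week engagement view for the teacher. The event fields are hypothetical, not taken from the thesis's tool.

```python
# Aggregate lecture activity events into weekly counts; data invented.
from collections import Counter

events = [  # e.g. exported from a lecture platform's log
    {"student": "s1", "week": 1, "action": "viewed_slides"},
    {"student": "s2", "week": 1, "action": "asked_question"},
    {"student": "s1", "week": 2, "action": "viewed_slides"},
]

events_per_week = Counter()
for e in events:
    events_per_week[e["week"]] += 1

for week in sorted(events_per_week):
    print(f"week {week}: {events_per_week[week]} activity events")
```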

Relevance: 80.00%

Abstract:

Corporate logistics networks are growing very quickly and becoming ever more complex. Companies often do not know which other companies they depend on and which business-critical risks arise from those dependencies. For this reason, this article presents a concept for proactive risk management in logistics networks. The concept is based on Big Data technology and uses not only internal company data but also external data, e.g. from social media platforms or other data portals, to identify risks and to build up a model of the logistics network. This data is evaluated, and relationships subject to risk are displayed graphically to the operator. In addition, the system can show the user possible alternatives for avoiding these risks, and can thus be used for decision support.
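
A minimal sketch of the concept: model the logistics network as a directed dependency graph, attach a risk score to each relationship, and surface risky ones together with possible alternatives. The scores and supplier names are invented for illustration.

```python
# Flag risky supplier relationships and list alternatives; data invented.
import networkx as nx

g = nx.DiGraph()
# edge = "buyer depends on supplier"; risk in [0, 1] from internal/external data
g.add_edge("OEM", "SupplierA", risk=0.8)
g.add_edge("OEM", "SupplierB", risk=0.2)
g.add_edge("SupplierA", "RawMetalCo", risk=0.5)

RISK_THRESHOLD = 0.6
for buyer, supplier, data in g.edges(data=True):
    if data["risk"] >= RISK_THRESHOLD:
        # decision support: lower-risk suppliers the buyer already uses
        alternatives = [s for _, s, d in g.out_edges(buyer, data=True)
                        if d["risk"] < RISK_THRESHOLD]
        print(f"risky: {buyer} -> {supplier} ({data['risk']}); "
              f"alternatives: {alternatives}")
```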

Relevance: 80.00%

Abstract:

Simulation techniques are almost indispensable in the analysis of complex systems. Material and related information flow processes in logistics often possess such complexity. Further problems arise as the processes change over time and pose a Big Data problem as well. To cope with these issues, adaptive simulations are used more and more frequently. This paper presents a few relevant advanced simulation models and introduces a novel model structure, which unifies the modelling of geometrical relations and time processes. This way the process structure and its geometric relations can be handled in a well understandable and transparent way. The capabilities and applicability of the model are also presented via a demonstration example.
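
A minimal sketch of the unifying idea: entities carry positions, and timed events move them, so spatial structure and process timing live in one model. The scenario (a forklift moving toward a station) is an invented example, not the paper's demonstration.

```python
# Discrete events update geometric state; one structure holds both aspects.
import heapq

positions = {"forklift": (0.0, 0.0), "stationA": (5.0, 0.0)}
events = [(2.0, "forklift", (2.5, 0.0)),   # (time, entity, new position)
          (4.0, "forklift", (5.0, 0.0))]
heapq.heapify(events)

while events:
    now, entity, pos = heapq.heappop(events)
    positions[entity] = pos
    # geometric relation evaluated at event time (distance along the aisle)
    dist = abs(positions["stationA"][0] - pos[0])
    print(f"t={now}: {entity} at {pos}, {dist} m from stationA")
```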

Relevance: 80.00%

Abstract:

Source materials like fine art, over-sized, fragile maps, and delicate artifacts have traditionally been digitally converted through the use of controlled lighting and high resolution scanners and camera backs. In addition, the capture of items such as general and special collections bound monographs has recently grown, both through consortial efforts like the Internet Archive's Open Content Alliance and locally at the individual institution level. These projects, in turn, have introduced increasingly high-resolution consumer-grade digital single-lens reflex cameras, or "DSLRs", as a significant part of the general cultural heritage digital conversion workflow. Central to the authors' discussion is the fact that both camera backs and DSLRs commonly share the ability to capture native raw file formats. Because these formats include such advantages as access to an image's raw mosaic sensor data within their architecture, many institutions choose raw for initial capture due to its high bit-level and unprocessed nature. However, to date these same raw formats, so important to many at the point of capture, have yet to be considered "archival" within most published still imaging standards, if they are considered at all. Throughout many workflows raw files are deleted and thrown away after more traditionally "archival" uncompressed TIFF or JPEG 2000 files have been derived downstream from their raw source formats [1][2]. As a result, the authors examine the nature of raw anew and consider the basic questions: Should raw files be retained? What might their role be? Might they in fact form a new archival format space? Included in the discussion is a survey of assorted raw file types and their attributes. Also addressed are various sustainability issues as they pertain to archival formats, with a special emphasis on both raw's positive and negative characteristics as they apply to archival practices. Current common archival workflows versus possible raw-based ones are investigated as well. These comparisons are noted in the context of each approach's differing levels of usable captured image data, various preservation virtues, and the divergent ideas of strictly fixed renditions versus the potential for improved renditions over time. Special attention is given to the DNG raw format through a detailed inspection of a number of its various structural components and the roles that they play in the format's latest specification. Finally, an evaluation is drawn of both proprietary raw formats in general and DNG in particular as possible alternative archival formats for still imaging.
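
A minimal sketch of the derivation workflow the authors question: demosaic a raw capture into an "archival" uncompressed TIFF while, per their argument, keeping the raw source instead of deleting it. Filenames are placeholders, and the rawpy and imageio libraries are assumed to be available.

```python
# Derive a TIFF rendition from a raw file; filenames are hypothetical.
import rawpy
import imageio.v3 as iio

RAW_PATH = "capture_0001.dng"            # hypothetical DNG from a DSLR

with rawpy.imread(RAW_PATH) as raw:
    # One fixed rendition: a single interpretation of the mosaic sensor data.
    rgb = raw.postprocess(output_bps=16) # 16-bit output for preservation
iio.imwrite("capture_0001.tif", rgb)

# The raw file stays in the archive: future, improved demosaicing could
# produce better renditions than today's TIFF derivative.
```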

Relevance: 80.00%

Abstract:

This chapter presents fuzzy cognitive maps (FCM) as a vehicle for Web knowledge aggregation, representation, and reasoning. The corresponding Web KnowARR framework incorporates findings from fuzzy logic. A first emphasis is therefore on the Web KnowARR framework itself, with a stakeholder management use case as a second focal point to illustrate the framework's usefulness. This form of management helps projects achieve acceptance and assertiveness by actively involving claims on company decisions in the management process. Stakeholder maps visually (re-)present these claims. On the one hand, these maps draw on non-public content; on the other, they draw on content that is available to the public (mostly on the Web). The Semantic Web offers opportunities not only to present public content descriptively but also to show relationships. The proposed framework can serve as the basis for the public content of stakeholder maps.
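
A minimal sketch of fuzzy cognitive map reasoning: concepts are nodes, signed weights are causal links, and activations are iterated through a squashing function until they stabilize. The concepts and weights below are invented stakeholder-map examples, not the chapter's model.

```python
# Iterate the standard FCM update rule a(t+1) = f(W^T a(t)); data invented.
import numpy as np

concepts = ["media_coverage", "public_support", "project_acceptance"]
W = np.array([[0.0, 0.6, 0.0],    # media coverage boosts public support
              [0.0, 0.0, 0.7],    # public support boosts acceptance
              [0.0, -0.2, 0.0]])  # acceptance slightly dampens support

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a = np.array([0.8, 0.3, 0.1])      # initial activation of each concept
for _ in range(20):                # iterate until activations settle
    a = sigmoid(W.T @ a)

for name, value in zip(concepts, a):
    print(f"{name}: {value:.2f}")
```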

Relevance: 80.00%

Abstract:

The fuzzy analytical network process (FANP) is introduced as a potential multi-criteria decision-making (MCDM) method to improve digital marketing management endeavors. Today's information overload makes digital marketing optimization, which is needed to continuously improve one's business, increasingly difficult. The proposed FANP framework is a method for enhancing the interaction between customers and marketers (i.e., the involved stakeholders) and thus for reducing the challenges of big data. The presented implementation takes reality's fuzziness into account to manage the constant interaction and continuous development of communication between marketers and customers on the Web. Using this FANP framework, marketers are able to increasingly meet the varying requirements of their customers. To improve the understanding of the implementation, advanced visualization methods (e.g., wireframes) are used.
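
A minimal sketch of the fuzzy pairwise-comparison step at the core of FANP-style methods: criteria are rated against each other with triangular fuzzy numbers (l, m, u), and crisp priority weights are derived here via centroid defuzzification plus the geometric-mean method. The criteria and judgments are invented, and a full FANP would additionally model network dependencies through a supermatrix.

```python
# Derive crisp weights from a fuzzy judgment matrix; judgments invented.
import math

criteria = ["reach", "engagement", "conversion"]
# entry [i][j] = triangular fuzzy judgment of criterion i over criterion j
F = [[(1, 1, 1),       (1, 2, 3),       (2, 3, 4)],
     [(1/3, 1/2, 1),   (1, 1, 1),       (1, 2, 3)],
     [(1/4, 1/3, 1/2), (1/3, 1/2, 1),   (1, 1, 1)]]

def centroid(tfn):
    l, m, u = tfn
    return (l + m + u) / 3.0        # defuzzify a triangular fuzzy number

gm = [math.prod(centroid(x) for x in row) ** (1 / len(row)) for row in F]
weights = [g / sum(gm) for g in gm]

for c, w in zip(criteria, weights):
    print(f"{c}: {w:.2f}")
```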

Relevance: 80.00%

Abstract:

We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within the selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 10² to 10⁴. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.
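
A minimal sketch of the standard expected improvement criterion the method applies within its local GP models: given a GP's predictive mean and standard deviation at a candidate point, EI balances exploitation (low predicted mean) against exploration (high uncertainty). The candidate values below are illustrative, not from the paper's benchmarks.

```python
# EI for minimization: E[max(f_best - f(x), 0)] under f(x) ~ N(mu, sigma^2).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    if sigma <= 0.0:
        return 0.0
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f_best = 1.2                      # best objective value observed so far
candidates = [(1.0, 0.05), (1.5, 0.8), (1.1, 0.3)]   # (mu, sigma) from a GP
best = max(candidates, key=lambda ms: expected_improvement(*ms, f_best))
print("next evaluation at candidate with (mu, sigma) =", best)
```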