818 results for big data
Abstract:
Scholars around the world are focusing on the study of the smart-city phenomenon. Spanish bibliographic output on this topic has grown exponentially in recent years. The new smart cities are founded on new visions of urban development that integrate multiple technological solutions linked to the world of information and communication, all of them current and at the service of the city's needs. The Spanish-language literature on this topic comes from fields as different as Architecture, Engineering, Political Science and Law, or Business Studies. The purpose of smart cities is to improve the lives of their citizens through the implementation of information and communication technologies that meet the needs of their inhabitants, so researchers in the field of Communication and Information Sciences have much to say. This work analyzes a total of 120 texts and concludes that the smart-city phenomenon will be one of the central axes of multidisciplinary research in our country in the coming years.
Abstract:
Data Journalism has become one of the trends taking hold in the media. In just a few years the development and visibility of this practice have increased considerably, and numerous media outlets on the international scene now have teams and sections dedicated to Data Journalism. Likewise, there are applications, platforms, websites, and foundations outside news companies whose work can also be placed within this field. The main objective of this contribution is to provide a snapshot of the adoption of Data Journalism in Spain, both inside and outside the media. Although the discipline is still developing, an exploratory study offering an overview of its current situation in Spain seems appropriate.
Abstract:
This paper explores the relationship between the rise of “new” social movements (15-M and Occupy) and the Internet. The new social media give rise to new kinds of social movements which embed this technology from the moment of conception. The future of social movements will be characterised by movinets, which will have the effect of developing new, efficient ways of activism. The movinets, with their embedded technology and capacity to circulate ideas among different spheres of reality, have the potential to alter the dynamics of social mobilisation.
Abstract:
Expanding on the growing movement to take academic and other erudite subjugated knowledges and distill them into some graphic form, this “cartoon” is a recounting of the author’s 2014 article, “Big Data, Actionable Information, Scientific Knowledge and the Goal of Control,” Teknokultura, Vol. 11/no. 3, pp. 529-54. It is an analysis of the idea of Big Data and an argument that its power relies on its instrumentalist specificity and not its extent. Mind control research in general and optogenetics in particular are the case study. Noir seems an appropriate aesthetic for this analysis, so direct quotes from the article are illustrated by publicly available screenshots from iconic and unknown films of the 20th century. The only addition to the original article is a framing insight from the admirable activist network CrimethInc.
Abstract:
With Tweet volumes reaching 500 million a day, sampling is inevitable for any application using Twitter data. Realizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced in accordance with the sample size. Big Data applications working with sampled data need a sample large enough to be representative of the universal dataset. Previous work on the representativeness issue has focused on ensuring that the global occurrence rates of key terms can be reliably estimated from the sample. Present technology allows sample size estimation in accordance with probabilistic bounds on occurrence rates for the case of uniform random sampling. In this paper, we consider the problem of further improving sample size estimates by leveraging stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, establishing the superiority of our method over uniform random sampling. Our work provides the technical know-how for data providers to expand their portfolio to include stratified sampled datasets, while applications benefit by being able to monitor more topics/events at the same data and computing cost.
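The gain from stratification mentioned in this abstract can be illustrated with a textbook calculation. The sketch below is not the paper's estimator: it assumes a normal-approximation confidence bound, proportional allocation across strata, and made-up strata weights and term rates. The function names are hypothetical.

```python
import math


def uniform_sample_size(rate, margin, z=1.96):
    """Sample size needed to estimate an occurrence rate within +/- margin
    under uniform random sampling (normal approximation, an assumption)."""
    return math.ceil(z * z * rate * (1.0 - rate) / (margin * margin))


def overall_rate(strata):
    """Population-wide occurrence rate from (weight, rate) pairs."""
    return sum(weight * rate for weight, rate in strata)


def stratified_sample_size(strata, margin, z=1.96):
    """Sample size for the same margin under stratified sampling with
    proportional allocation; only within-stratum variance remains."""
    within_variance = sum(w * p * (1.0 - p) for w, p in strata)
    return math.ceil(z * z * within_variance / (margin * margin))


# Hypothetical strata: (share of all tweets, occurrence rate of a key term).
# For example, a large group of accounts that rarely use the term and a
# small group that uses it often. These numbers are illustrative only.
strata = [(0.9, 0.01), (0.1, 0.5)]

n_uniform = uniform_sample_size(overall_rate(strata), margin=0.01)
n_stratified = stratified_sample_size(strata, margin=0.01)
print(n_uniform, n_stratified)
```

When the term's rate differs strongly between strata, the stratified design needs fewer tweets for the same margin of error, which is the effect the abstract attributes to stratification.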
Abstract:
The astonishing development of diverse hardware platforms is twofold: on one side, the challenge of exascale performance for big data processing and management; on the other side, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is the general virtualization system developed in 2009 and first introduced in 2010, enabling a completely transparent layer between GPUs and VMs. This paper shows the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management and scheduling. Thanks to the new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations, computing clusters and distributed cloud appliances.
Abstract:
The mismatch between human capacity and the acquisition of Big Data such as Earth imagery undermines commitments to the Convention on Biological Diversity (CBD) and Aichi targets. Artificial intelligence (AI) solutions to Big Data issues are urgently needed as these could prove to be faster, more accurate, and cheaper. Reducing costs of managing protected areas in remote deep waters and in the High Seas is of great importance, and this is a realm where autonomous technology will be transformative.
Abstract:
This keynote presentation will report some of our research work and experience on the development and applications of relevant methods, models, systems and simulation techniques in support of different types and various levels of decision making for business, management and engineering. In particular, the following topics will be covered.
Modelling, multi-agent-based simulation and analysis of the allocation management of carbon dioxide emission permits in China (Nanfeng Liu & Shuliang Li)
Agent-based simulation of the dynamic evolution of enterprise carbon assets (Yin Zeng & Shuliang Li)
A framework & system for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps: a big data perspective (Jin Xu, Zheng Li, Shuliang Li & Yanyan Zhang)
Open innovation: intelligent model, social media & complex adaptive system simulation (Shuliang Li & Jim Zheng Li)
A framework, model and software prototype for modelling and simulation for deshopping behaviour and how companies respond (Shawkat Rahman & Shuliang Li)
Integrating multiple agents, simulation, knowledge bases and fuzzy logic for international marketing decision making (Shuliang Li & Jim Zheng Li)
A Web-based hybrid intelligent system for combined conventional, digital, mobile, social media and mobile marketing strategy formulation (Shuliang Li & Jim Zheng Li)
A hybrid intelligent model for Web & social media dynamics, and evolutionary and adaptive branding (Shuliang Li)
A hybrid paradigm for modelling, simulation and analysis of brand virality in social media (Shuliang Li & Jim Zheng Li)
Network configuration management: attack paradigms and architectures for computer network survivability (Tero Karvinen & Shuliang Li)
Abstract:
Fundamentals of data science and introduction to COMP6235
Abstract:
“Business Intelligence for sales monitoring: the case of Ducati Motor Holding”. The aim of this thesis is to explain what Business Intelligence is and to show the changes that took place at Ducati Motor Holding following its adoption, in terms of the creation of reports and dashboards for sales monitoring. The thesis begins with a general overview of the history and uses of Business Intelligence, touching on the main theoretical foundations: Data Warehouse, data mining, what-if analysis, multidimensional data representation, building the BI team, and so on. It continues with a focus on Big Data, directing attention to their use and usefulness in the automotive sector (understood in its broadest sense, that is, not only as the car market but also the motorcycle market), which leads naturally to the Ducati context. A brief overview of the company follows, describing its history, the commercial structure through which sales are managed, and the product range. From the fourth chapter onward the thesis gets to the heart of the matter: Business Intelligence at Ducati. It begins by describing the phases that have so far characterized the Business Analytics project (whose objective is precisely to introduce BI into the company) and then concentrates, first at a theoretical and then at a practical level, on sales reporting, that is, reporting based on sales monitoring.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications for reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.
Abstract:
With the development of electronic devices, more and more mobile clients are connected to the Internet, and they generate massive amounts of data every day. We live in an age of “Big Data”, generating data on a scale of hundreds of millions every day. By analyzing the data and making predictions, we can draw up better development plans. Unfortunately, traditional computation frameworks cannot meet this demand, which is why Hadoop was put forward. The paper first introduces the background and development status of Hadoop, compares MapReduce in Hadoop 1.0 with YARN in Hadoop 2.0, and analyzes their advantages and disadvantages. Because resource management is the core of YARN, the paper then examines the resource allocation module, including resource management, the resource allocation algorithm, the resource preemption model, and the whole resource scheduling process from requesting resources to completing allocation. It also introduces and compares the FIFO Scheduler, Capacity Scheduler, and Fair Scheduler. The main work of this paper is researching and analyzing the Dominant Resource Fair (DRF) algorithm of YARN and putting forward a maximum resource utilization algorithm based on it. The paper also suggests improvements to shortcomings in the resource preemption model. Emphasizing “fairness” during resource allocation is the core concept of the Dominant Resource Fair algorithm of YARN. Because the cluster has multiple users and multiple resource types, each user’s resource request covers multiple resources too. The DRF algorithm divides a user’s resources into the dominant resource and normal resources. For a user, the dominant resource is the one whose share is highest among all requested resources; the others are normal resources. The DRF algorithm requires the dominant resource shares of all users to be equal.
But in cases where users’ dominant resource amounts differ greatly, emphasizing “fairness” is not suitable and does not improve the resource utilization of the cluster. By analyzing these cases, this thesis puts forward a new allocation algorithm based on DRF. The new algorithm takes “fairness” into consideration, but not as its main principle; maximizing resource utilization is its main principle and goal. Comparing the results of DRF and the new DRF-based algorithm, we found that the new algorithm achieves higher resource utilization than DRF. The last part of the thesis sets up a YARN environment and uses the Scheduler Load Simulator (SLS) to simulate the cluster environment.
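The dominant-share rule described in this abstract can be sketched as a simple progressive-filling loop. This is a minimal illustration of plain DRF under stated assumptions, not YARN's scheduler code and not the thesis's utilization-maximizing variant. The cluster capacity, the per-task demands and the function names are hypothetical.

```python
def dominant_share(allocated, capacity):
    """A user's dominant share: the largest fraction of any single
    resource type that the user currently holds."""
    return max(a / c for a, c in zip(allocated, capacity))


def drf_allocate(capacity, demands):
    """Launch tasks one at a time, always serving the user whose dominant
    share is currently lowest. A user whose next task no longer fits is
    dropped from consideration (a simplification chosen for this sketch).

    capacity: total amount of each resource type, e.g. [cpus, memory_gb]
    demands:  user -> per-task demand vector in the same resource order
    Returns:  user -> number of tasks launched
    """
    n = len(capacity)
    used = [0] * n
    allocated = {user: [0] * n for user in demands}
    tasks = {user: 0 for user in demands}
    active = set(demands)

    while active:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        user = min(sorted(active),
                   key=lambda u: dominant_share(allocated[u], capacity))
        demand = demands[user]
        if all(used[i] + demand[i] <= capacity[i] for i in range(n)):
            for i in range(n):
                used[i] += demand[i]
                allocated[user][i] += demand[i]
            tasks[user] += 1
        else:
            active.remove(user)
    return tasks


# Hypothetical cluster: 9 CPUs and 18 GB of memory.
# User A's tasks are memory-heavy, user B's tasks are CPU-heavy.
result = drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]})
print(result)  # {'A': 3, 'B': 2}
```

In this run both users end with a dominant share of 2/3 (A holds 12 of 18 GB, B holds 6 of 9 CPUs), which is the equalization the abstract describes. It also leaves 4 GB of memory unused, the kind of under-utilization the thesis sets out to reduce.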
Abstract:
In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods.
I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs could be specified using both graph structure as well as activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.
Abstract:
Securing e-health applications in the context of the Internet of Things (IoT) is challenging. Indeed, resource scarcity in such environments hinders the implementation of existing standards-based protocols. Among these protocols, MIKEY (Multimedia Internet KEYing) aims at establishing security credentials between two communicating entities. However, the existing MIKEY modes fail to meet IoT specificities. In particular, the pre-shared key mode is energy efficient, but suffers from severe scalability issues. On the other hand, asymmetric modes such as the public key mode are scalable, but are highly resource consuming. To address this issue, we combine two previously proposed approaches to introduce a new hybrid MIKEY mode. Relying on a cooperative approach, a set of third parties is used to offload heavy computational operations from the constrained nodes. In doing so, the pre-shared mode is used in the constrained part of the network, while the public key mode is used in the unconstrained part of the network. Preliminary results show that our proposed mode preserves energy while maintaining its security properties.