891 resultados para Big data analytics


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An emerging consensus in cognitive science views the biological brain as a hierarchically-organized predictive processing system. This is a system in which higher-order regions are continuously attempting to predict the activity of lower-order regions at a variety of (increasingly abstract) spatial and temporal scales. The brain is thus revealed as a hierarchical prediction machine that is constantly engaged in the effort to predict the flow of information originating from the sensory surfaces. Such a view seems to afford a great deal of explanatory leverage when it comes to a broad swathe of seemingly disparate psychological phenomena (e.g., learning, memory, perception, action, emotion, planning, reason, imagination, and conscious experience). In the most positive case, the predictive processing story seems to provide our first glimpse at what a unified (computationally-tractable and neurobiological plausible) account of human psychology might look like. This obviously marks out one reason why such models should be the focus of current empirical and theoretical attention. Another reason, however, is rooted in the potential of such models to advance the current state-of-the-art in machine intelligence and machine learning. Interestingly, the vision of the brain as a hierarchical prediction machine is one that establishes contact with work that goes under the heading of 'deep learning'. Deep learning systems thus often attempt to make use of predictive processing schemes and (increasingly abstract) generative models as a means of supporting the analysis of large data sets. But are such computational systems sufficient (by themselves) to provide a route to general human-level analytic capabilities? I will argue that they are not and that closer attention to a broader range of forces and factors (many of which are not confined to the neural realm) may be required to understand what it is that gives human cognition its distinctive (and largely unique) flavour. The vision that emerges is one of 'homomimetic deep learning systems', systems that situate a hierarchically-organized predictive processing core within a larger nexus of developmental, behavioural, symbolic, technological and social influences. Relative to that vision, I suggest that we should see the Web as a form of 'cognitive ecology', one that is as much involved with the transformation of machine intelligence as it is with the progressive reshaping of our own cognitive capabilities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il presente elaborato ha come oggetto la progettazione e lo sviluppo di una soluzione Hadoop per il Calcolo di Big Data Analytics. Nell'ambito del progetto di monitoraggio dei bottle cooler, le necessità emerse dall'elaborazione di dati in continua crescita, ha richiesto lo sviluppo di una soluzione in grado di sostituire le tradizionali tecniche di ETL, non pi�ù su�fficienti per l'elaborazione di Big Data. L'obiettivo del presente elaborato consiste nel valutare e confrontare le perfomance di elaborazione ottenute, da un lato, dal flusso di ETL tradizionale, e dall'altro dalla soluzione Hadoop implementata sulla base del framework MapReduce.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we evaluate and compare two representativeand popular distributed processing engines for large scalebig data analytics, Spark and graph based engine GraphLab. Wedesign a benchmark suite including representative algorithmsand datasets to compare the performances of the computingengines, from performance aspects of running time, memory andCPU usage, network and I/O overhead. The benchmark suite istested on both local computer cluster and virtual machines oncloud. By varying the number of computers and memory weexamine the scalability of the computing engines with increasingcomputing resources (such as CPU and memory). We also runcross-evaluation of generic and graph based analytic algorithmsover graph processing and generic platforms to identify thepotential performance degradation if only one processing engineis available. It is observed that both computing engines showgood scalability with increase of computing resources. WhileGraphLab largely outperforms Spark for graph algorithms, ithas close running time performance as Spark for non-graphalgorithms. Additionally the running time with Spark for graphalgorithms over cloud virtual machines is observed to increaseby almost 100% compared to over local computer clusters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La tesi presenta uno studio della libreria grafica per web D3, sviluppata in javascript, e ne presenta una catalogazione dei grafici implementati e reperibili sul web. Lo scopo è quello di valutare la libreria e studiarne i pregi e difetti per capire se sia opportuno utilizzarla nell'ambito di un progetto Europeo. Per fare questo vengono studiati i metodi di classificazione dei grafici presenti in letteratura e viene esposto e descritto lo stato dell'arte del data visualization. Viene poi descritto il metodo di classificazione proposto dal team di progettazione e catalogata la galleria di grafici presente sul sito della libreria D3. Infine viene presentato e studiato in maniera formale un algoritmo per selezionare un grafico in base alle esigenze dell'utente.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il lavoro presentato in questo elaborato tratterà lo sviluppo di un sistema di alerting che consenta di monitorare proattivamente una o più sorgenti dati aziendali, segnalando le eventuali condizioni di irregolarità rilevate; questo verrà incluso all'interno di sistemi già esistenti dedicati all'analisi dei dati e alla pianificazione, ovvero i cosiddetti Decision Support Systems. Un sistema di supporto alle decisioni è in grado di fornire chiare informazioni per tutta la gestione dell'impresa, misurandone le performance e fornendo proiezioni sugli andamenti futuri. Questi sistemi vengono catalogati all'interno del più ampio ambito della Business Intelligence, che sottintende l'insieme di metodologie in grado di trasformare i dati di business in informazioni utili al processo decisionale. L'intero lavoro di tesi è stato svolto durante un periodo di tirocinio svolto presso Iconsulting S.p.A., IT System Integrator bolognese specializzato principalmente nello sviluppo di progetti di Business Intelligence, Enterprise Data Warehouse e Corporate Performance Management. Il software che verrà illustrato in questo elaborato è stato realizzato per essere collocato all'interno di un contesto più ampio, per rispondere ai requisiti di un cliente multinazionale leader nel settore della telefonia mobile e fissa.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The fast development of Information Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast the city's behavior, it is necessary the analysis of different kinds of data from the most varied dataset acquisition systems. The aim of this research activity in the framework of Data Science and Complex Systems Physics is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. Under this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in Italian cities' historical centers, which will worsen in the next future due to the continuous globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation means in the cities and improve multimodal mobility. We analyze the statistical properties of urban mobility of Venice, Rimini, and Bologna by using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trips reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties depending on transport means and user's kinds. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia Romagna Region and the pedestrians moving in Venice with software able to replicate in silico the demand for mobility and its dynamic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The idea behind the project is to develop a methodology for analyzing and developing techniques for the diagnosis and the prediction of the state of charge and health of lithium-ion batteries for automotive applications. For lithium-ion batteries, residual functionality is measured in terms of state of health; however, this value cannot be directly associated with a measurable value, so it must be estimated. The development of the algorithms is based on the identification of the causes of battery degradation, in order to model and predict the trend. Therefore, models have been developed that are able to predict the electrical, thermal and aging behavior. In addition to the model, it was necessary to develop algorithms capable of monitoring the state of the battery, online and offline. This was possible with the use of algorithms based on Kalman filters, which allow the estimation of the system status in real time. Through machine learning algorithms, which allow offline analysis of battery deterioration using a statistical approach, it is possible to analyze information from the entire fleet of vehicles. Both systems work in synergy in order to achieve the best performance. Validation was performed with laboratory tests on different batteries and under different conditions. The development of the model allowed to reduce the time of the experimental tests. Some specific phenomena were tested in the laboratory, and the other cases were artificially generated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses how global financial institutions are using big data analytics within their compliance operations. A lot of previous research has focused on the strategic implications of big data, but not much research has considered how such tools are entwined with regulatory breaches and investigations in financial services. Our work covers two in-depth qualitative case studies, each addressing a distinct type of analytics. The first case focuses on analytics which manage everyday compliance breaches and so are expected by managers. The second case focuses on analytics which facilitate investigation and litigation where serious unexpected breaches may have occurred. In doing so, the study focuses on the micro/data to understand how these tools are influencing operational risks and practices. The paper draws from two bodies of literature, the social studies of information systems and finance to guide our analysis and practitioner recommendations. The cases illustrate how technologies are implicated in multijurisdictional challenges and regulatory conflicts at each end of the operational risk spectrum. We find that compliance analytics are both shaping and reporting regulatory matters yet often firms may have difficulties in recruiting individuals with relevant but diverse skill sets. The cases also underscore the increasing need for financial organizations to adopt robust information governance policies and processes to ease future remediation efforts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

At the moment, the phrases “big data” and “analytics” are often being used as if they were magic incantations that will solve all an organization’s problems at a stroke. The reality is that data on its own, even with the application of analytics, will not solve any problems. The resources that analytics and big data can consume represent a significant strategic risk if applied ineffectively. Any analysis of data needs to be guided, and to lead to action. So while analytics may lead to knowledge and intelligence (in the military sense of that term), it also needs the input of knowledge and intelligence (in the human sense of that term). And somebody then has to do something new or different as a result of the new insights, or it won’t have been done to any purpose. Using an analytics example concerning accounts payable in the public sector in Canada, this paper reviews thinking from the domains of analytics, risk management and knowledge management, to show some of the pitfalls, and to present a holistic picture of how knowledge management might help tackle the challenges of big data and analytics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En aquest treball es presenta el laboratori que l'eLearn Center posa a disposició del professorat i investigadors en e-learning de la UOC per al disseny sistemàtic d'experiments sota una perspectiva de Learning Analytics, però tambémecanismes per fer seguiment i documentar tot el procés que envolta el disseny d'experiències docents de forma que sigui més senzill transferir-les a altres àmbits.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract: Big Data has been characterised as a great economic opportunity and a massive threat to privacy. Both may be correct: the same technology can indeed be used in ways that are highly beneficial and those that are ethically intolerable, maybe even simultaneously. Using examples of how Big Data might be used in education - normally referred to as "learning analytics" - the seminar will discuss possible ethical and legal frameworks for Big Data, and how these might guide the development of technologies, processes and policies that can deliver the benefits of Big Data without the nightmares. Speaker Biography: Andrew Cormack is Chief Regulatory Adviser, Jisc Technologies. He joined the company in 1999 as head of the JANET-CERT and EuroCERT incident response teams. In his current role he concentrates on the security, policy and regulatory issues around the network and services that Janet provides to its customer universities and colleges. Previously he worked for Cardiff University running web and email services, and for NERC's Shipboard Computer Group. He has degrees in Mathematics, Humanities and Law.