686 results for Impala, Hadoop, Big Data, HDFS, Social Business Intelligence, SBI, cloudera
Abstract:
At the moment, the phrases “big data” and “analytics” are often being used as if they were magic incantations that will solve all an organization’s problems at a stroke. The reality is that data on its own, even with the application of analytics, will not solve any problems. The resources that analytics and big data can consume represent a significant strategic risk if applied ineffectively. Any analysis of data needs to be guided, and to lead to action. So while analytics may lead to knowledge and intelligence (in the military sense of that term), it also needs the input of knowledge and intelligence (in the human sense of that term). And somebody then has to do something new or different as a result of the new insights, or it won’t have been done to any purpose. Using an analytics example concerning accounts payable in the public sector in Canada, this paper reviews thinking from the domains of analytics, risk management and knowledge management, to show some of the pitfalls, and to present a holistic picture of how knowledge management might help tackle the challenges of big data and analytics.
Abstract:
Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.
Abstract:
Advocates of Big Data assert that we are in the midst of an epistemological revolution, promising the displacement of the modernist methodological hegemony of causal analysis and theory generation. It is alleged that the growing ‘deluge’ of digitally generated data, and the development of computational algorithms to analyse them, has enabled new inductive ways of accessing everyday relational interactions through their ‘datafication’. This paper critically engages with these discourses of Big Data and complexity, particularly as they operate in the discipline of International Relations, where it is alleged that Big Data approaches have the potential for developing self-governing societal capacities for resilience and adaptation through the real-time reflexive awareness and management of risks and problems as they arise. The epistemological and ontological assumptions underpinning Big Data are then analysed to suggest that critical and posthumanist approaches have come of age through these discourses, enabling process-based and relational understandings to be translated into policy and governance practices. The paper thus raises some questions for the development of critical approaches to new posthuman forms of governance and knowledge production.
How the World Learned to Stop Worrying and Love Failure: Big Data, Resilience and Emergent Causality
Abstract:
In modernity, failure was the discourse of critique; today, it is increasingly the discourse of power: failure has changed its allegiances. Over the last two decades, failure has been enfolded into discourses of power, facilitating the development of new policy approaches. Foremost among governing approaches that seek to include and to govern through failure is that of resilience. This article seeks to reflect upon how the understanding of failure has been transformed in this process, particularly linking this transformation to the radical appreciation of contingency and of the limits to instrumental cause-and-effect approaches to rule. Whereas modernity was shaped by a contestation over failure as an epistemological boundary, under conditions of contingency and complexity there appears to be a new consensus on failure as an ontological necessity. This problematic ‘ontological turn’ is illustrated using examples of changing approaches to risks, especially anthropogenic understandings of environmental threats, formerly seen as ‘natural’.
Abstract:
Abstract: Decision support systems have been widely used for years in companies to gain insights from internal data and thereby make successful decisions. Lately, thanks to the increasing availability of open data, these systems are also integrating open data to enrich the decision-making process with external data. On the other hand, within an open-data scenario, decision support systems can also be useful to decide which data should be opened, not only by considering technical or legal constraints, but also other requirements, such as the "reuse potential" of data. In this talk, we focus on both issues: (i) open data for decision making, and (ii) decision making for opening data. We will first briefly comment on some research problems regarding the use of open data for decision making. Then, we will give an outline of a novel decision-making approach (based on how open data is actually being used in open-source projects hosted on GitHub) for supporting open data publication. Bio of the speaker: Jose-Norberto Mazón holds a PhD from the University of Alicante (Spain). He is head of the "Cátedra Telefónica" on Big Data and coordinator of the Computing degree at the University of Alicante. He is also a member of the WaKe research group at the University of Alicante. His research work focuses on open data management, data integration and business intelligence within "big data" scenarios, and their application to the tourism domain (smart tourism destinations). He has published his research in international journals, such as Decision Support Systems, Information Sciences, Data & Knowledge Engineering and ACM Transactions on the Web. Finally, he is involved in the open data project at the University of Alicante, including its open data portal at http://datos.ua.es
Abstract:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully.
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Abstract:
The mainstay of Big Data is prediction in that it allows practitioners, researchers, and policy analysts to predict trends based upon the analysis of large and varied sources of data. These can range from changing social and political opinions to patterns in crime and consumer behaviour. Big Data has therefore shifted the criterion of success in science from causal explanations to predictive modelling and simulation. 19th-century science sought to capture phenomena and to show their appearance through causal mechanisms, while 20th-century science attempted to save the appearance and relinquish causal explanations. Now 21st-century science in the form of Big Data is concerned with the prediction of appearances and nothing more. However, this pulls social science back in the direction of a more rule- or law-governed reality model of science and away from a consideration of the internal nature of rules in relation to various practices. In effect Big Data offers us no more than a world of surface appearance, and in doing so it makes any context-specific conceptual sensitivity disappear.
Abstract:
During the last decades, we have witnessed what is called the “information explosion”. With the advent of new technologies and new contexts, the volume, velocity and variety of data have increased exponentially, becoming what is known today as big data. Among these contexts, we emphasize telecommunications operators, which gather, using network monitoring equipment, millions of network event records: the Call Detail Records (CDRs) and the Event Detail Records (EDRs), commonly known as xDRs. These records are stored and later processed to compute network performance and quality of service metrics. With the ever-increasing number of collected xDRs, the volume that needs to be stored has increased exponentially, making the current solutions based on relational databases no longer suitable. To tackle this problem, the relational data store can be replaced by the Hadoop Distributed File System (HDFS). However, HDFS is simply a distributed file system and therefore does not support any aspect of the relational paradigm. To overcome this difficulty, this paper presents a framework that enables the current systems that insert data into relational databases to keep doing so transparently when migrating to Hadoop. As a proof of concept, the developed platform was integrated with Altaia, a performance and QoS management solution for telecommunications networks and services.
Abstract:
66 p.
Abstract:
Thanks to the advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications in various areas so that people can easily access and manage multimedia data. To meet such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis.
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Abstract:
Despite the large body of research on sentiment analysis, few works address its practical, real-world deployment and its integration with business intelligence and big data, such that sentiment analysis is embedded in an architecture (supporting the whole process from data acquisition to exploitation with BI tools) applied to crisis management. This work investigates how the worlds of analysis (of sentiment and crisis) and of technology (everything related to business intelligence, data mining and Big Data) can be brought together to create a Business Intelligence solution that encompasses data mining and sentiment analysis (based on large volumes of data) and that helps companies and/or governments with crisis management. The author studied ways of working with large volumes of data, currently known as Big Data Science, or data science applied to large volumes of data (Big Data), and combined this technology with sentiment analysis of a real situation (in this work, the impeachment process of the president of Brazil, Dilma Rousseff). This combination used business intelligence techniques to build dashboards and ETL (Extract, Transform, Load) routines for the data, as well as text mining and sentiment analysis techniques. The work was developed in several parts and with several data sources (datasets), owing to the different technology trials carried out over the course of the project. One of the project's most important datasets consists of the tweets collected between December 2015 and January 2016. The collected messages contained the word "Dilma". All tweets were collected with the Twitter Streaming API.
It is very important to understand that what is published on the social network Twitter cannot be manipulated and represents the opinion of the person or entity publishing the message. For this reason, data mining on Twitter data can be said to be very efficient and truthful. On 3 December 2015, the request to open the impeachment process against the president of Brazil, Dilma Rousseff, was accepted. The request was accepted by the president of the Chamber of Deputies, Deputy Eduardo Cunha (PMDB-RJ), which created expectations about the sentiment of the population and the future of Brazil. Data were also collected on Google searches for the word Dilma; based on these data, the aim is to arrive at an overall sentiment analysis (not based only on the collected tweets). Using only two sources (Twitter and Google searches), a great deal of data was extracted, but there are many other sources from which information can be obtained about people's opinions on a particular topic. Thus, a tool that can collect, extract and store so much data and present the information effectively, so as to help and support decision making, contributes to crisis management.
Business intelligence em sistemas de apoio à gestão de frotas: Análise de Tecnologias e metodologias
Abstract:
Business Intelligence applied to fleet management systems: Technologies and Methodologies Analysis. The subject of this master's thesis arose from the need to respond to a request for a business intelligence solution from a client of the company where I currently work as a junior analyst programmer. The project consisted of building a system for event monitoring and operations analysis, that is, an integrated fleet management system with a business intelligence module. During the project it was necessary to analyse development methodologies and to learn new languages and tools, such as C#, JasperReports, Visual Studio and Microsoft SQL Server, among others.
Abstract:
Summary: More than ever before, contemporary societies are characterised by the huge amounts of data being transferred. Authorities, companies, academia and other stakeholders refer to Big Data when discussing the importance of large and complex datasets and developing possible solutions for their use. Big Data promises to be the next frontier of innovation for institutions and individuals, yet it also offers possibilities to predict and influence human behaviour with ever-greater precision.
Abstract:
Acoustic sensing is a promising approach to scaling faunal biodiversity monitoring. Scaling the analysis of audio collected by acoustic sensors is a big data problem. Standard approaches for dealing with big acoustic data include automated recognition and crowd-based analysis. Automatic methods are fast at processing but hard to design rigorously, whilst manual methods are accurate but slow at processing. In particular, manual methods of acoustic data analysis are constrained by a 1:1 time relationship between the data and its analysts. This constraint arises from the inherent need to listen to the audio data. This paper demonstrates how the efficiency of crowdsourced sound analysis can be increased by an order of magnitude through the visual inspection of audio visualized as spectrograms. Experimental data suggests that an analysis speedup of 12× is obtainable for suitable types of acoustic analysis, given that only spectrograms are shown.