873 results for big data analytics


Relevance: 100.00%

Summary:

Resources from the Singapore Summer School 2014 hosted by NUS. ws-summerschool.comp.nus.edu.sg

Relevance: 100.00%

Summary:

Abstract

Big data is a fashionable topic nowadays, regardless of what people mean when they use the term. But being big is just a matter of volume, although there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this would imply big data, but for the majority of problems much less data is, and will be, needed. In this talk we explore the trade-offs involved and the main problems that come with big data, using the Web as a case study: scalability, redundancy, bias, noise, spam, and privacy.

Speaker Biography: Ricardo Baeza-Yates

Ricardo Baeza-Yates is VP of Research for Yahoo Labs, leading teams in the United States, Europe and Latin America since 2006, and based in Sunnyvale, California, since August 2014. During this time he has led the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor, and before that founder and Director of the Center for Web Research, at the Dept. of Computing Science of the University of Chile (on leave of absence to this day). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that he obtained two master's degrees (M.Sc. CS and M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley, with a second, enlarged edition in 2011 that won the ASIST 2012 Book of the Year award.

He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991, and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Relevance: 100.00%

Summary:

We are sympathetic to Bentley et al.'s attempt to encompass the wisdom of crowds in a generative model, but posit that success at using Big Data will involve more sensitive measurements and more, and more varied, sources of information, as well as building on the indirect information available through technology, from ancillary technical features to data from brain-computer interfaces.

Relevance: 100.00%

Summary:

JASMIN is a super-data-cluster designed to provide a high-performance, high-volume data analysis environment for the UK environmental science community. Thus far JASMIN has been used primarily by the atmospheric science and earth observation communities, both to support their direct scientific workflow and to curate data products in the STFC Centre for Environmental Data Archival (CEDA). The initial JASMIN configuration and first experiences are reported here. Useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and use that there was a pent-up demand for a suitable big-data analysis environment. This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, the storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.

Relevance: 100.00%

Summary:

Owing to continuous advances in the computational power of handheld devices such as smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM, dealing with concept drift, is also reported. Potential applications of PDM in the era of Big Data are discussed for a variety of domains, including security, business and telemedicine.

Relevance: 100.00%

Summary:

The term 'big data' has recently emerged to describe a range of technological and commercial trends enabling the storage and analysis of huge amounts of customer data, such as that generated by social networks and mobile devices. Much of the commercial promise of big data is in the ability to generate valuable insights from collecting new types and volumes of data in ways that were not previously economically viable. At the same time a number of questions have been raised about the implications for individual privacy. This paper explores key perspectives underlying the emergence of big data, and considers both the opportunities and ethical challenges raised for market research.

Relevance: 100.00%

Summary:

Pervasive healthcare aims to deliver deinstitutionalised healthcare services to patients anytime and anywhere. It involves remote data collection through mobile devices and sensor networks, where the data is usually large in volume, varied in format and high in frequency. The nature of big data, such as volume, variety, velocity and veracity, together with its analytical capabilities, complements the delivery of pervasive healthcare. However, there is limited research intertwining these two domains. Most research focuses mainly on the technical context of big data applications in the healthcare sector. Little attention has been paid to the strategic role of big data, which impacts the quality of healthcare service provision at the organisational level. Therefore, this paper delivers a conceptual view of a big data architecture for pervasive healthcare, via an intensive literature review, to address the aforementioned research problems. This paper provides three major contributions: 1) it identifies the research themes of big data and pervasive healthcare, 2) it establishes the relationships between the research themes, which later compose the big data architecture for pervasive healthcare, and 3) it sheds light on future research, such as semiosis and sense-making, and enables practitioners to implement big data in pervasive healthcare through the proposed architecture.

Relevance: 100.00%

Summary:

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges on data mining algorithms, such as concept drift, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because data throughput is potentially high. Real-time classification algorithms that are fast and adaptive to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary built from Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
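To make the idea of a micro-cluster summary concrete, the following is a minimal sketch and not the MC-NN algorithm from the paper, whose details the abstract does not give. It assumes each micro-cluster stores only a label, a count and a per-feature linear sum, classifies by the nearest centroid, and opens a new micro-cluster whenever the nearest one carries the wrong label. The class names and the update rule are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Statistical summary of a group of instances: label, count, linear sum."""
    label: str
    n: int
    linear_sum: list

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: the raw instance does not need to be stored.
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]


class MicroClusterStreamClassifier:
    """Hypothetical nearest-centroid stream classifier (illustrative only)."""

    def __init__(self):
        self.clusters = []

    def _nearest(self, x):
        if not self.clusters:
            return None
        return min(self.clusters, key=lambda c: math.dist(c.centroid(), x))

    def predict(self, x):
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn(self, x, y):
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == y:
            nearest.absorb(x)  # nearest summary agrees: fold the instance in
        else:
            # Assumed rule: a disagreement (or an empty model) opens a new cluster.
            self.clusters.append(MicroCluster(y, 1, list(x)))


# Usage: instances arrive one at a time and are discarded after learning.
clf = MicroClusterStreamClassifier()
for x, y in [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([5.0, 5.0], "b"), ([5.1, 4.9], "b")]:
    clf.learn(x, y)
```

Because every micro-cluster is updated independently, the summaries could in principle be partitioned across workers, which is the kind of parallelism the abstract alludes to; this sketch does not implement it.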

Relevance: 100.00%

Summary:

Widespread commercial use of the internet has significantly increased the volume and scope of data being collected by organisations. ‘Big data’ has emerged as a term to encapsulate both the technical and commercial aspects of this growing data collection activity. To date, much of the discussion of big data has centred upon its transformational potential for innovation and efficiency, yet there has been less reflection on its wider implications beyond commercial value creation. This paper builds upon normal accident theory (NAT) to analyse the broader ethical implications of big data. It argues that the strategies behind big data require organisational systems that leave them vulnerable to normal accidents, that is to say some form of accident or disaster that is both unanticipated and inevitable. Whilst NAT has previously focused on the consequences of physical accidents, this paper suggests a new form of system accident that we label data accidents. These have distinct, less tangible and more complex characteristics and raise significant questions over the role of individual privacy in a ‘data society’. The paper concludes by considering the ways in which the risks of such data accidents might be managed or mitigated.

Relevance: 100.00%

Summary:

The size and complexity of data sets generated within ecosystem-level programmes merit their capture, curation, storage, analysis, synthesis and visualisation using Big Data approaches. This review looks at previous attempts to organise and analyse such data through the International Biological Programme and draws on the mistakes made and the lessons learned for effective Big Data approaches to current Research Councils United Kingdom (RCUK) ecosystem-level programmes, using Biodiversity and Ecosystem Service Sustainability (BESS) and Environmental Virtual Observatory Pilot (EVOp) as exemplars. The challenges raised by such data are identified and explored, and suggestions are made for the two major issues of extending analyses across different spatio-temporal scales and effectively integrating quantitative and qualitative data.

Relevance: 100.00%

Summary:

With the arrival of the Big Data era, properly utilizing the power of big data is becoming increasingly essential for the strength and competitiveness of businesses and organizations. Big data poses grand challenges from different perspectives, such as processing, communication, security, and privacy. In this talk, we discuss the big data challenges in network traffic classification and our solutions to them. The significance of the research lies in the fact that network traffic on the current Internet increases exponentially each year. Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine-learning techniques to classification methods based on flow statistical features. In this talk, we propose a series of novel approaches for traffic classification, which can improve the classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approaches and their performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic datasets to validate the proposed approaches. The results show that traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples. Our work has significant impact on security applications.

Relevance: 100.00%

Summary:

This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system, so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.
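The tiered construction described above, where ensemble meta classifiers serve as members of higher-tier ensemble meta classifiers, can be illustrated with a small sketch. This is not the LIME implementation: the abstract does not name the meta classifiers used, so the sketch substitutes a single hypothetical meta classifier, `FoldDropEnsemble`, in which each member is trained with one fold of the data left out, and a 1-nearest-neighbour base classifier. All names and the toy data are assumptions.

```python
import functools
import math
from collections import Counter


class OneNearestNeighbour:
    """Base classifier: predicts the label of the closest training point."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        i = min(range(len(self.X)), key=lambda j: math.dist(self.X[j], x))
        return self.y[i]


class FoldDropEnsemble:
    """Hypothetical meta classifier: member k is trained without fold k."""

    def __init__(self, make_member, n_members=3):
        self.make_member = make_member  # factory for members (base or ensemble)
        self.n_members = n_members
        self.members = []

    def fit(self, X, y):
        self.members = []
        for k in range(self.n_members):
            keep = [i for i in range(len(X)) if i % self.n_members != k]
            member = self.make_member()
            member.fit([X[i] for i in keep], [y[i] for i in keep])
            self.members.append(member)
        return self

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]  # majority vote


def build_tiers(n_tiers, make_base, n_members=3):
    """Wrap the base factory in n_tiers nested ensemble factories."""
    make = make_base
    for _ in range(n_tiers):
        make = functools.partial(FoldDropEnsemble, make, n_members)
    return make()


# Toy data: class "a" near 0, class "b" near 10, interleaved in the stream.
X = [[(i % 2) * 10 + 0.1 * i] for i in range(12)]
y = ["a" if i % 2 == 0 else "b" for i in range(12)]

# Two tiers of three members each give 3 * 3 = 9 base classifiers.
model = build_tiers(2, OneNearestNeighbour).fit(X, y)
```

The number of base classifiers grows as `n_members ** n_tiers`, which gives some intuition for why the abstract calls these classifiers very large and suited only to big data.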