891 results for Big data analytics
Given the strong current interest in installing clusters dedicated to data processing with Hadoop, a Linux distribution has been designed that automates all the associated tasks. This distribution makes it possible to deploy onto a cluster and perform a basic configuration of it in as unattended a manner as possible.
This final degree project belongs to the area of artificial intelligence and presents a use case that takes real data on traffic accidents (accident counts, deaths, injuries, etc.) and analyzes them together with data that may be related to the accidents, such as the vehicle fleet, the temperatures in the area of the accidents, etc., in order to identify possible cause-effect relationships.
This final degree project aims to provide a solution to a hypothetical commission from the European Union to build a relational database for storing data on citizens' physical activity, obtained from wearable devices, and data on health status and diagnosed illnesses, collected by the information systems of the various health services. With all these data gathered, our database will make it possible, through high-level applications, to extract useful information that reveals the real health status of citizens and to design interventions and campaigns to improve it.
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
The research in this Master's thesis concerns Big Data transfer over parallel data links. My main objective was to assist the research team at the Saint-Petersburg National Research University ITMO in carrying out this project and to apply Green IT methods to the data transfer system. The team's goal is to transfer Big Data over parallel data links using an SDN OpenFlow approach. My task as a team member was to compare existing data transfer applications in order to determine which achieves the highest data transfer speed under which conditions, and to explain why. For this thesis, five utilities were compared: Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts were developed that generate random binary data (incompressible, so that the comparison between utilities is fair), execute the utilities with specified parameters, record log files, results, and system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, hence the need to reduce energy consumption and make the transfers greener. As part of the Green IT approach, our team used the OpenStack cloud computing infrastructure. It is more efficient to allocate a specific amount of hardware resources to test different scenarios than to use all the resources of our testbed. Testing our implementation on the OpenStack infrastructure shows that the virtual channel carries no other traffic, so we can achieve the highest possible throughput. With the final results, we are in a position to identify which utilities deliver faster data transfer in different scenarios with specific TCP parameters, and we can use them on real network data links.
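The abstract mentions scripts that create random, incompressible binary data so that no transfer utility gains an advantage from compression. A minimal sketch of that idea in Python follows; it is not the thesis's actual script. The function names, the 1 MiB chunk size, and the use of zlib as the compressibility check are all assumptions made for illustration.

```python
import os
import zlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per write; hypothetical choice


def write_random_file(path, size_bytes, chunk_size=CHUNK_SIZE):
    """Write size_bytes of OS-provided random data to path and return the size."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            # os.urandom yields bytes with no exploitable redundancy,
            # so a compressor cannot shrink them meaningfully.
            block = os.urandom(min(chunk_size, remaining))
            handle.write(block)
            remaining -= len(block)
    return size_bytes


def compression_ratio(path, sample_bytes=CHUNK_SIZE):
    """Return compressed size divided by original size for the file's first sample_bytes."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    if not sample:
        raise ValueError("file is empty")
    return len(zlib.compress(sample, 9)) / len(sample)
```

A ratio close to 1.0 from `compression_ratio` suggests the payload is effectively incompressible, so differences in measured throughput should come from the transfer utilities themselves and not from the test data.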
Rapid technological development and changes in the operating environment encourage organizations to adopt innovative solutions in order to remain competitive and keep up with development. The traditional pursuit of cost savings and operational efficiency has been displaced by the desire to strengthen and grow competitive advantage. Despite these incentives, studies show that most organizations operating in Finland do not manage innovations based on different technologies very holistically; their use of IT is instead largely reactive. In this study we therefore examine how the adoption of IT innovations, and of big data in particular, is managed in Finland, as well as possible differences between the two. The study was conducted with qualitative research methods, using the focus group method to collect the empirical material. The study presents the management of IT innovation adoption as a process that starts from a stimulus received through communication channels. Knowledge is reinforced in the persuasion stage, which triggers the evaluation stage. Through the evaluation of the benefits and costs of IT innovations, the process finally proceeds to the decision on adoption. Adoption is also affected by various background factors, which can promote or hinder the uptake of an IT innovation. In the decision-making stage, the organization's IT management and business management each have their own roles, which are shaped by the organization's division of labor and the size of the investment. The ways in which the studied organizations manage adoption are indicated by how the stages of their adoption and decision-making processes proceed, which background factors influence the adoption decision, and what kinds of benefits are sought. Determining how big data is managed also depends on whether the organization has a strategy or action plan for exploiting it.
As the conclusion of the thesis, we find that the adoption of IT innovations in general is managed in three ways: strategically, reactively, and as forced by change. The differences between these management approaches emerge through the size of the investment, the decision-making leading to adoption, and the reasons underlying the adoption. In general, IT innovations appear to be managed in roughly equal proportions strategically, reactively, and as forced by change. In the management of big data adoption we observed features of strategic and reactive management only. The management of IT innovation adoption in general and that of big data adoption differ in that big data appears to be managed even less strategically and its decision-making responsibilities are more fragmented. In general, clear and comprehensive strategies or action plans for managing the adoption of IT innovations were rare among the studied organizations.
In the new age of information technology, big data has grown into a prominent phenomenon. As information technology evolves, organizations have begun to adopt big data and apply it as a tool throughout their decision-making processes. Research on big data has grown in recent years, but mainly from a technical stance, and there is a void in business-related cases. This thesis fills that gap by addressing big data challenges and failure cases. The Technology-Organization-Environment framework was applied to carry out a literature review on trends in Business Intelligence and Knowledge Management information system failures. The review of extant literature drew on a collection of leading information systems journals. Academic papers and articles on big data, Business Intelligence, Decision Support Systems, and Knowledge Management systems were studied from both failure and success perspectives in order to build a model for big data failure. I then delineate the contribution of the information system failure literature, as it is the principal dynamic behind the Technology-Organization-Environment framework. The gathered literature was then categorised and a failure model was developed from the identified critical failure points. The failure constructs were further categorized, defined, and tabulated into a contextual diagram. The developed model and table are designed to act as a comprehensive starting point and as general guidance for academics, CIOs, or other system stakeholders, facilitating decision-making in the big data adoption process by measuring the effect of technological, organizational, and environmental variables on perceived benefits, dissatisfaction, and discontinued use.
Resources from the Singapore Summer School 2014 hosted by NUS. ws-summerschool.comp.nus.edu.sg
Abstract: Big data is nowadays a fashionable topic, independently of what people mean when they use the term. But being big is just a matter of volume, although there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this would imply big data, but for the majority of problems much less data is needed. In this talk we explore the trade-offs involved and the main problems that come with big data, using the Web as a case study: scalability, redundancy, bias, noise, spam, and privacy. Speaker Biography: Ricardo Baeza-Yates is VP of Research for Yahoo Labs, leading teams in the United States, Europe and Latin America since 2006 and based in Sunnyvale, California, since August 2014. During this time he has led the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and, before that, founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second, enlarged edition in 2011 that won the ASIST 2012 Book of the Year award.
He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
We are sympathetic to Bentley et al.'s attempt to encompass the wisdom of crowds in a generative model, but posit that success at using Big Data will involve more sensitive measurements and more, and more varied, sources of information, and will build on the indirect information available through technology, from ancillary technical features to data from brain-computer interfaces.
JASMIN is a super-data-cluster designed to provide a high-performance, high-volume data analysis environment for the UK environmental science community. Thus far JASMIN has been used primarily by the atmospheric science and earth observation communities, both to support their direct scientific workflow and to support the curation of data products in the STFC Centre for Environmental Data Archival (CEDA). The initial JASMIN configuration and first experiences are reported here. Useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and in use that there was pent-up demand for a suitable big-data analysis environment. This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, the storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.
Owing to continuous advances in the computational power of handheld devices such as smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM, dealing with concept drift, is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains, including security, business and telemedicine, are discussed.