917 results for big data.
This thesis investigates the legal, ethical, technical, and psychological issues of general data processing and artificial intelligence practices and the explainability of AI systems. It consists of two main parts. In the initial section, we provide a comprehensive overview of the big data processing ecosystem and the main challenges we face today. We then evaluate the GDPR’s data privacy framework in the European Union. The Trustworthy AI Framework proposed by the EU’s High-Level Expert Group on AI (AI HLEG) is examined in detail. The ethical principles for the foundation and realization of Trustworthy AI are analyzed along with the assessment list prepared by the AI HLEG. Then, we list the main big data challenges the European researchers and institutions identified and provide a literature review on the technical and organizational measures to address these challenges. A quantitative analysis is conducted on the identified big data challenges and the measures to address them, which leads to practical recommendations for better data processing and AI practices in the EU. In the subsequent part, we concentrate on the explainability of AI systems. We clarify the terminology and list the goals aimed at the explainability of AI systems. We identify the reasons for the explainability-accuracy trade-off and how we can address it. We conduct a comparative cognitive analysis between human reasoning and machine-generated explanations with the aim of understanding how explainable AI can contribute to human reasoning. We then focus on the technical and legal responses to remedy the explainability problem. In this part, GDPR’s right to explanation framework and safeguards are analyzed in-depth with their contribution to the realization of Trustworthy AI. Then, we analyze the explanation techniques applicable at different stages of machine learning and propose several recommendations in chronological order to develop GDPR-compliant and Trustworthy XAI systems.
Big data and AI are paving the way to promising scenarios in clinical practice and research. However, the use of such technologies might clash with GDPR requirements. Today, two forces are driving EU policies in this domain. The first is the necessity to protect individuals’ safety and fundamental rights. The second is to incentivize the deployment of innovative technologies. The first objective is pursued by legislative acts such as the GDPR or the AIA; the second is supported by the new data strategy recently launched by the European Commission. Against this background, the thesis analyses the issue of GDPR compliance when big data and AI systems are implemented in the health domain. The thesis focuses on the use of co-regulatory tools for compliance with the GDPR. This work argues that there are two levels of co-regulation in the EU legal system. The first, more general, is the approach pursued by the EU legislator when shaping legislative measures that deal with fast-evolving technologies. The GDPR can be deemed a co-regulatory solution since it mainly introduces general requirements, whose implementation must then be interpreted by the addressee of the law following a risk-based approach. This approach, although useful, is costly and sometimes burdensome for organisations. The second co-regulatory level is represented by specific co-regulatory tools, such as codes of conduct and certification mechanisms. These tools are meant to guide and support the interpretation effort of the addressee of the law. The thesis argues that the lack of co-regulatory tools intended to implement data protection law in specific situations could be an obstacle to the deployment of innovative solutions in complex scenarios such as the health ecosystem. The thesis advances hypotheses, at a theoretical level, about the reasons for this lack of co-regulatory solutions.
The topic of this thesis stems from the idea of bringing together two themes that are becoming increasingly important today, namely the circular economy and big data, and its aim is to identify points of connection between the two. In today’s technological world, which is turning everything we hold in our hands digital, more and more studies are being carried out to understand how sustainability can be supported by emerging technologies. The circular economy is a new economic paradigm capable of replacing growth models built on a linear view, aiming to reduce waste and to radically rethink how products are designed and used over time. In this transition towards a circular economy, it can be useful to consider adopting new emerging technologies to simplify production processes and implement more sustainable policies, which are increasingly valued by consumers as well. All of this will be supported by the increasingly significant use of big data, that is, large volumes of information-rich data that, through careful analysis, make it possible to develop production plans that follow the circular paradigm. This is achieved thanks to new, increasingly innovative digital systems and to specialized professionals who are acquiring ever more knowledge in this field.
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
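The idea of compressing only informational content while leaving functional content untouched can be illustrated with a small sketch. This is not the CaPC scheme from the thesis, whose details the abstract does not give. It assumes, purely for illustration, that runs of non-word characters (delimiters, whitespace, newlines) are the functional content and that repeated words are the informational content. The function names and the `z<hex>` code format are hypothetical.

```python
import re
from collections import Counter
from itertools import count

# Assumption: runs of non-word characters (commas, spaces, newlines, ...)
# are "functional" content and must survive compression unchanged.
SEPARATORS = re.compile(r"(\W+)")


def build_codebook(text, max_entries=256):
    """Map frequently repeated words to shorter codes absent from the text."""
    words = [w for w in SEPARATORS.split(text)[::2] if w]
    vocab = set(words)
    # Hypothetical code format: "z" followed by a hex counter.
    candidates = (f"z{i:x}" for i in count())
    codebook = {}
    for word, occurrences in Counter(words).most_common():
        if occurrences < 2 or len(codebook) >= max_entries:
            break
        # Skip any code that already occurs as a word, so decoding is unambiguous.
        code = next(c for c in candidates if c not in vocab)
        if len(code) < len(word):
            codebook[word] = code
    return codebook


def compress(text, codebook):
    """Replace words by their codes; separators are copied through as-is."""
    parts = SEPARATORS.split(text)
    parts[::2] = [codebook.get(w, w) for w in parts[::2]]
    return "".join(parts)


def decompress(text, codebook):
    """Invert compress() using the same codebook."""
    reverse = {code: word for word, code in codebook.items()}
    parts = SEPARATORS.split(text)
    parts[::2] = [reverse.get(w, w) for w in parts[::2]]
    return "".join(parts)


if __name__ == "__main__":
    sample = (
        "timestamp,hostname,status\n"
        "2014-01-01,server,OK\n"
        "2014-01-02,server,OK\n"
    )
    book = build_codebook(sample)
    packed = compress(sample, book)
    print(packed)
    print(decompress(packed, book) == sample)
```

Because delimiters and line breaks are preserved, a record reader that splits on newlines and commas would still see the same number of records and fields in the compressed text, which is the kind of transparency the abstract describes.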
Dissertation submitted for the degree of Master in Computer Science and Engineering (Mestre em Engenharia Informática)
Dissertation submitted for the degree of Master in Computer Science and Engineering (Mestre em Engenharia Informática)
We are living in the era of Big Data, a time characterized by the continuous creation of vast amounts of data, originating from different sources and in different formats. First with the rise of social networks and, more recently, with the advent of the Internet of Things (IoT), in which everyone and (eventually) everything is linked to the Internet, data with enormous potential for organizations is being continuously generated. In order to be more competitive, organizations want to access and explore all the richness present in those data. Indeed, Big Data is only as valuable as the insights organizations gather from it to make better decisions, which is the main goal of Business Intelligence. In this paper we describe an experiment in which data obtained from a NoSQL data source (a database technology explicitly developed to deal with the specificities of Big Data) is used to feed a Business Intelligence solution.
This work compares open-source and proprietary transport protocols for high-speed data transmission over IP networks. The commonly used TCP needs significant improvement, since it was developed as a general-purpose transport protocol and first introduced four decades ago. In today’s networks, TCP does not meet all communication needs. For this reason, other transport protocols have been developed and are used successfully, e.g. for Big Data movement. Within the scope of this research, the following protocols were investigated for their efficiency on 10 Gbps links: UDT, RBUDP, MTP and RWTP. The protocols were tested under different impairments, with round-trip times of up to 400 ms and packet loss of up to 2%. The investigated parameters are the data rate under different network conditions, the CPU load on sender and receiver during the experiments, the size of feedback data, CPU usage per Gbps, and the amount of feedback data per GiByte of effectively transmitted data. RWTP showed the best performance and fair resource consumption. Among the open-source projects, RBUDP showed the best behavior.
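The two normalized metrics named in this abstract, CPU usage per Gbps and feedback data per GiByte of transmitted data, follow from simple arithmetic. The sketch below shows one plausible way to derive them from raw measurements. The `TransferRun` class, its field names, and the example numbers are assumptions for illustration, not the authors’ tooling or results.

```python
from dataclasses import dataclass


@dataclass
class TransferRun:
    """Raw measurements from one hypothetical transfer experiment."""

    payload_bytes: int    # effectively transmitted user data
    feedback_bytes: int   # control/acknowledgement traffic sent back
    duration_s: float     # wall-clock duration of the transfer
    cpu_percent: float    # mean CPU load on one endpoint during the run

    @property
    def data_rate_gbps(self) -> float:
        # Gbps uses decimal giga: 1 Gbps = 1e9 bits per second.
        return self.payload_bytes * 8 / self.duration_s / 1e9

    @property
    def cpu_per_gbps(self) -> float:
        # CPU load normalized by the achieved data rate.
        return self.cpu_percent / self.data_rate_gbps

    @property
    def feedback_bytes_per_gib(self) -> float:
        # GiByte uses binary giga: 1 GiB = 2**30 bytes.
        return self.feedback_bytes / (self.payload_bytes / 2**30)


if __name__ == "__main__":
    # Made-up numbers, only to exercise the formulas.
    run = TransferRun(
        payload_bytes=10 * 2**30,
        feedback_bytes=5 * 2**20,
        duration_s=10.0,
        cpu_percent=60.0,
    )
    print(f"{run.data_rate_gbps:.2f} Gbps")
    print(f"{run.cpu_per_gbps:.2f} % CPU per Gbps")
    print(f"{run.feedback_bytes_per_gib:.0f} feedback bytes per GiB")
```

Normalizing by data rate and by transmitted volume makes runs with different throughput comparable, which matters when protocols are tested under varying delay and loss.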
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
This master’s thesis examines how Entrepreneurial, Customer and Knowledge Management Orientations are needed when small retail firms use Big data technology in their Customer Knowledge Management. A vision of the ability of small retailers to move to the Big data era is based on empirical evidence of owner-managers’ attitudes and the firms’ processes. Abductive content analysis was used as a research strategy, and the qualitative data was collected through theme interviews with the owner-managers of 11 small retail firms. The biggest obstacles to the use of Big data by small retail firms are a lack of information about the new technology, a lack of Knowledge Management Orientation, and a lack of a proactive dimension in Entrepreneurial and Customer Orientations. A strong reactive customer-led orientation and the owner-manager’s capacity for systems thinking support the development of Customer Knowledge Management. The low level of technology use prevents the utilization of customer information. Co-operation between firms or with educational organizations may significantly enhance the use of Big data technology by small retail firms.
Big data has been predicted to hold an exploitation potential of hundreds of billions of dollars. Big data refers to huge and rapidly growing masses of data originating from numerous different sources. The aim of this bachelor’s thesis is to study how big data can be utilized in supply chain management and in the different areas of the supply chain. The work is a literature review based on the literature on big data and supply chain management, and in particular on scientific articles that combine the two. By utilizing big data, the supply chain can be made more efficient, revenues maximized, and supply and demand better matched. Utilizing big data also improves risk management, decision-making, readiness for change, and stakeholder relations. Big data makes it possible to build an overall view of the customer, which can be used to optimize marketing, segmentation, pricing, and product placement. Big data also makes it possible to improve procurement, production, and maintenance, and to track transport and inventories more efficiently. Utilizing big data is challenging and involves technological, organizational, and process-related challenges. One solution is to outsource the deployment and use of big data analytics, but this carries risks of its own.
The purpose of this bachelor’s thesis was to find out what kinds of business opportunities and challenges are associated with Big Data and its characteristics, and how Big Data is defined in a modern and up-to-date way. The research problem was approached by means of a narrative literature review. In other words, the thesis is a unified overview of the current situation compiled from scattered information. The source material consists mainly of scientific articles, but textbook material, conference publications, and news articles were also used. The academic literature sources used in the study contained many mutually similar views on the research topic. Based on them, two tables of the observed opportunities and challenges were formed, and the rows of the tables were named after the characteristics that describe them. In the study, the business opportunities and challenges were divided into five main categories and four subcategories. The study was carried out from a business perspective, so it bypassed many technical aspects of Big Data. The thesis is interdisciplinary in nature, and it seeks to observe one of the newest terms in computer science in a business context. The thesis found that the characteristics associated with Big Data offer opportunities that, based on the detection of correlations, could be divided into opportunities for more precise market segmentation and support for decision-making. The opportunities for real-time monitoring are based on the velocity and size of Big Data, that is, its continuous growth. The challenges related to the characteristics can be divided into five categories, some of which relate to the operating environment and some to the organization’s internal operations.
Sustainable development has been studied for decades, whereas not all the possibilities of big data are yet known. Sustainable development and big data have not yet been studied together extensively, but it can already be stated that there are connections between the two. The work deals with the utilization of big data and the effects of the opportunities it offers on sustainable business. At the beginning of the work, big data and the dimensions of sustainable development are defined; on this basis, the research part examines in depth the benefits of big data and the means of applying it in support of sustainable business. The aim of the work is to find out how big data can be utilized in the different areas of a company’s sustainable business. In the work, sustainable business is divided into business management and practical operational activities. Business management includes the company’s strategy and innovation activities. In the operational functions of sustainable business, the focus is on manufacturing, product life-cycle management, supply chain management, and information management. The work offers means and solutions with which a company can develop its sustainable business. Based on the research part, it can be concluded that big data and its considered use are beneficial in sustainable business.
This is a research discussion about the Hampshire Hub - see http://protohub.net/. The aim is to find out more about the project, and discuss future collaboration and sharing of ideas. Mark Braggins (Hampshire Hub Partnership) will introduce the Hampshire Hub programme, setting out its main objectives, work done to date, next steps including the Hampshire data store (which will use the PublishMyData linked data platform), and opportunities for the University of Southampton to engage with the programme, including the forthcoming Hampshire Hackathons. Bill Roberts (Swirrl) will give an overview of the PublishMyData platform, and how it will help deliver the objectives of the Hampshire Hub. He will detail some of the new functionality being added to the platform. Steve Peters (DCLG Open Data Communities) will focus on developing a web of data that blends and combines local and national data sources around localities and common topics/themes. This will include observations on the potential of employing emerging big data sources to help deliver more effective, better targeted public services. Steve will illustrate this with practical examples of DCLG’s work to publish its own data in a SPARQL end-point, so that it can be used over the web alongside related third-party sources. He will share examples of some of the practical challenges, particularly around querying and re-using geographic Linked Data in a federated world of SPARQL end-points.