891 resultados para Big data analytics
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Tiedon hyödyntäminen on yhä merkittävämmässä osassa markkinoinnin perustehtävien toteuttamisessa. Tiedon avulla asiakkaiden tarpeita saadaan tunnistettua paremmin ja niihin voidaan vastata tehokkaammin. Lisäksi erityisesti viimevuosien teknologinen kehitys on parantanut tiedon hyödyntämismahdollisuuksia markkinoinnissa merkittävästi. Tämä tutkielma käsittelee uudenlaista tiedon hyödyntämistä markkinointiviestinnässä. Työssä tutkitaan tiedon kehityksen uutta aikakautta, big dataa, joka vaikuttaa erityisesti markkinointiviestinnän kohdentamisen kehittymiseen. Kohdentamisen kanavana tutkielmassa tarkastellaan mobiiliverkkopankkia, mistä syystä markkinointiviestinnän tutkiminen on rajattu työssä käsittämään lähinnä asiakaspalvelun sekä mainonnan. Tutkielman tarkoituksena on vastata kysymykseen: Mitkä ovat big datan tuomat mahdollisuudet ja haasteet markkinointiviestinnän kohdentamisessa mobiiliverkkopankissa? Vastaus tähän tutkimuskysymykseen muodostetaan kahden osakysymyksen avulla: Mitä mahdollisuuksia ja haasteita big data tuo markkinointiviestinnän kohdentamiseen? Millainen markkinointiviestinnän kohdentamisen kanava on mobiiliverkkopankki? Tutkimuskysymykseen vastattiin sekä tutkielman teoriaosassa toteutetun teoreettis-käsitteellisen aikaisemman tutkimuksen läpikäynnin että erillisen empiirisen tapaustutkimuksen avulla. Laadullinen tapaustutkimus suoritettiin teemahaastatteluina S-Pankin mobiiliverkkopankkiin, Smobiiliin, liittyen. Teemahaastatteluissa haastateltiin seitsemää asiantuntijaa sekä kahta S-mobiilin käyttäjää. Tutkielman teoriaosassa tuli ilmi, että big datan avulla kuluttaja on mahdollista tuntea kokonaisuudessaan paremmin, mikä parantaa perinteisiä sekä tarjoaa myös täysin uusia markkinointiviestinnän kohdentamisen keinoja. Näiden kautta on mahdollista vaikuttaa yrityksen kilpailuetuun. Teoriaosuudessa todettiin big datan tuomien haasteiden liittyvän ilmiön uutuuteen ja tuntemattomuuteen, tiedonhallintaan sekä yritysten ulkopuolelta tuleviin haasteisiin koskien yksityisyydensuojaa, kuluttajien mielipiteitä sekä erilaisia määrättyjä rajoitteita. Tapaustutkimuksen tulokset erosivat näistä löydöksistä ainoastaan haasteiden tärkeyden painotuksissa: suurimpana tiedonhallintaan liittyvänä haasteena empiirisessä tutkimuksessa tuli esiin teknologioiden tarve tutkielman teoriaosuudessa ilmenneen asiantuntijuuden tarpeen sijaan. Lisäksi ulkoisia haasteita ei koettu merkittävinä haasteina empiirisessä tutkimuksessa. Tapaustutkimuksen tulokset tukivat tutkielman teoriaosuudessa muodostettua kuvaa mobiiliverkkopankista markkinointiviestinnän kohdentamisen kanavana: se on henkilökohtainen pankkisovellus, jonka avulla tarkasti tunnettu asiakas voidaan tavoittaa suojatussa ympäristössä tehokkaasti, ajasta ja paikasta riippumatta. Mobiiliverkkopankkia käytetään oletettavasti päämääräkeskeisesti, mutta mahdollisesti myös osittain viihdykkeellisesti sekä ajankulutustarkoituksiin. Lisäksi sovelluksen sisällä on mahdollista saada asiakkaan jakamaton huomio, jota älypuhelimen erilaiset käyttötilanteet voivat kuitenkin käytännössä häiritä. Mobiiliverkkopankin kautta toteutetun markkinointiviestinnän kohdentamisen hyväksyttävyyteen voitaneen vaikuttaa luvan pyytämisen, pull-tyyppisen mainonnan, viestinnän hyödyllisyyden sekä viestivän yrityksen brändin luotettavuuden kautta. Tutkielman aihepiiri on vielä hyvin tuore ja muuttuva, mistä syystä siihen liittyvät ilmiöt ja termit ovat osittain vakiintumattomia sekä hankalasti hahmotettavissa. Tämä tuo esiin tarpeellisia jatkotutkimusmahdollisuuksia liittyen esimerkiksi big data -termin käsiteanalyyttiseen tutkimiseen. Jatkotutkimusta olisi hyödyllistä suorittaa myös koskien big datan ja mobiiliverkkopankin avulla toteutetun kohdentamisen
Resumo:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but is limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often requires researchers have a concrete question and doesn't facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
Resumo:
Cet essai est présenté en tant que mémoire de maîtrise dans le cadre du programme de droit des technologies de l’information. Ce mémoire traite de différents modèles d’affaires qui ont pour caractéristique commune de commercialiser les données dans le contexte des technologies de l’information. Les pratiques commerciales observées sont peu connues et l’un des objectifs est d’informer le lecteur quant au fonctionnement de ces pratiques. Dans le but de bien situer les enjeux, cet essai discutera d’abord des concepts théoriques de vie privée et de protection des renseignements personnels. Une fois ce survol tracé, les pratiques de « data brokerage », de « cloud computing » et des solutions « analytics » seront décortiquées. Au cours de cette description, les enjeux juridiques soulevés par chaque aspect de la pratique en question seront étudiés. Enfin, le dernier chapitre de cet essai sera réservé à deux enjeux, soit le rôle du consentement et la sécurité des données, qui ne relèvent pas d’une pratique commerciale spécifique, mais qui sont avant tout des conséquences directes de l’évolution des technologies de l’information.
Resumo:
Cet essai est présenté en tant que mémoire de maîtrise dans le cadre du programme de droit des technologies de l’information. Ce mémoire traite de différents modèles d’affaires qui ont pour caractéristique commune de commercialiser les données dans le contexte des technologies de l’information. Les pratiques commerciales observées sont peu connues et l’un des objectifs est d’informer le lecteur quant au fonctionnement de ces pratiques. Dans le but de bien situer les enjeux, cet essai discutera d’abord des concepts théoriques de vie privée et de protection des renseignements personnels. Une fois ce survol tracé, les pratiques de « data brokerage », de « cloud computing » et des solutions « analytics » seront décortiquées. Au cours de cette description, les enjeux juridiques soulevés par chaque aspect de la pratique en question seront étudiés. Enfin, le dernier chapitre de cet essai sera réservé à deux enjeux, soit le rôle du consentement et la sécurité des données, qui ne relèvent pas d’une pratique commerciale spécifique, mais qui sont avant tout des conséquences directes de l’évolution des technologies de l’information.
Resumo:
Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.
Resumo:
The thesis is the result of work conducted during a period of six months at the Strategy department of Automobili Lamborghini S.p.A. in Sant'Agata Bolognese (BO) and concerns the study and analysis of Big Data relating to Lamborghini's connected cars. The Big Data is a project of Connected Car Project House, that is an inter-departmental team which works toward the definition of the Lamborghini corporate connectivity strategy and its implementation in the product portfolio. The Data of the connected cars is one of the hottest topics right now in the automotive industry; in fact, all the largest automotive companies are investi,ng a lot in this direction, in order to derive the greatest advantages both from a purely economic point of view, because from these data you can understand a lot the behaviors and habits of each driver, and from a technological point of view because it will increasingly promote the development of 5G that will be an important enabler for the future of connectivity. The main purpose of the work by Lamborghini prospective is to analyze the data of the connected cars, in particular a data-set referred to connected Huracans that had been already placed on the market, and, starting from that point, derive valuable Key Performance Indicators (KPIs) on which the company could partly base the decisions to be made in the near future. The key result that we have obtained at the end of this period was the creation of a Dashboard, in which is possible to visualize many parameters and indicators both related to driving habits and the use of the vehicle itself, which has brought great insights on the huge potential and value that is present behind the study of these data. The final Demo of the project has received great interest, not only from the whole strategy department but also from all the other business areas of Lamborghini, making mostly a great awareness that this will be the road to follow in the coming years.
Resumo:
Una gestione, un’analisi e un’interpretazione efficienti dei big data possono cambiare il modello lavorativo, modificare i risultati, aumentare le produzioni, e possono aprire nuove strade per l’assistenza sanitaria moderna. L'obiettivo di questo studio è incentrato sulla costruzione di una dashboard interattiva di un nuovo modello e nuove prestazioni nell’ambito della Sanità territoriale. Lo scopo è quello di fornire al cliente una piattaforma di Data Visualization che mostra risultati utili relativi ai dati sanitari in modo da fornire agli utilizzatori sia informazioni descrittive che statistiche sulla attuale gestione delle cure e delle terapie somministrate. Si propone uno strumento che consente la navigazione dei dati analizzando l’andamento di un set di indicatori di fine vita calcolati a partire da pazienti oncologici della Regione Emilia Romagna in un arco temporale che va dal 2010 ad oggi.
Resumo:
Nel panorama aziendale odierno, risulta essere di fondamentale importanza la capacità, da parte di un’azienda o di una società di servizi, di orientare in modo programmatico la propria innovazione in modo tale da poter essere competitivi sul mercato. In molti casi, questo e significa investire una cospicua somma di denaro in progetti che andranno a migliorare aspetti essenziali del prodotto o del servizio e che avranno un importante impatto sulla trasformazione digitale dell’azienda. Lo studio che viene proposto riguarda in particolar modo due approcci che sono tipicamente in antitesi tra loro proprio per il fatto che si basano su due tipologie di dati differenti, i Big Data e i Thick Data. I due approcci sono rispettivamente il Data Science e il Design Thinking. Nel corso dei seguenti capitoli, dopo aver definito gli approcci di Design Thinking e Data Science, verrà definito il concetto di blending e la problematica che ruota attorno all’intersezione dei due metodi di innovazione. Per mettere in evidenza i diversi aspetti che riguardano la tematica, verranno riportati anche casi di aziende che hanno integrato i due approcci nei loro processi di innovazione, ottenendo importanti risultati. In particolar modo verrà riportato il lavoro di ricerca svolto dall’autore riguardo l'esame, la classificazione e l'analisi della letteratura esistente all'intersezione dell'innovazione guidata dai dati e dal pensiero progettuale. Infine viene riportato un caso aziendale che è stato condotto presso la realtà ospedaliero-sanitaria di Parma in cui, a fronte di una problematica relativa al rapporto tra clinici dell’ospedale e clinici del territorio, si è progettato un sistema innovativo attraverso l’utilizzo del Design Thinking. Inoltre, si cercherà di sviluppare un’analisi critica di tipo “what-if” al fine di elaborare un possibile scenario di integrazione di metodi o tecniche provenienti anche dal mondo del Data Science e applicarlo al caso studio in oggetto.
Resumo:
A classical application of biosignal analysis has been the psychophysiological detection of deception, also known as the polygraph test, which is currently a part of standard practices of law enforcement agencies and several other institutions worldwide. Although its validity is far from gathering consensus, the underlying psychophysiological principles are still an interesting add-on for more informal applications. In this paper we present an experimental off-the-person hardware setup, propose a set of feature extraction criteria and provide a comparison of two classification approaches, targeting the detection of deception in the context of a role-playing interactive multimedia environment. Our work is primarily targeted at recreational use in the context of a science exhibition, where the main goal is to present basic concepts related with knowledge discovery, biosignal analysis and psychophysiology in an educational way, using techniques that are simple enough to be understood by children of different ages. Nonetheless, this setting will also allow us to build a significant data corpus, annotated with ground-truth information, and collected with non-intrusive sensors, enabling more advanced research on the topic. Experimental results have shown interesting findings and provided useful guidelines for future work. Pattern Recognition
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.
Resumo:
Na atualidade, existe uma quantidade de dados criados diariamente que ultrapassam em muito as mais otimistas espectativas estabelecidas na década anterior. Estes dados têm origens bastante diversas e apresentam-se sobre várias formas. Este novo conceito que dá pelo nome de Big Data está a colocar novos e rebuscados desafios ao seu armazenamento, tratamento e manipulação. Os tradicionais sistemas de armazenamento não se apresentam como a solução indicada para este problema. Estes desafios são alguns dos mais analisados e dissertados temas informáticos do momento. Várias tecnologias têm emergido com esta nova era, das quais se salienta um novo paradigma de armazenamento, o movimento NoSQL. Esta nova filosofia de armazenamento visa responder às necessidades de armazenamento e processamento destes volumosos e heterogéneos dados. Os armazéns de dados são um dos componentes mais importantes do âmbito Business Intelligence e são, maioritariamente, utilizados como uma ferramenta de apoio aos processos de tomada decisão, levados a cabo no dia-a-dia de uma organização. A sua componente histórica implica que grandes volumes de dados sejam armazenados, tratados e analisados tendo por base os seus repositórios. Algumas organizações começam a ter problemas para gerir e armazenar estes grandes volumes de informação. Esse facto deve-se, em grande parte, à estrutura de armazenamento que lhes serve de base. Os sistemas de gestão de bases de dados relacionais são, há algumas décadas, considerados como o método primordial de armazenamento de informação num armazém de dados. De facto, estes sistemas começam a não se mostrar capazes de armazenar e gerir os dados operacionais das organizações, sendo consequentemente cada vez menos recomendada a sua utilização em armazéns de dados. É intrinsecamente interessante o pensamento de que as bases de dados relacionais começam a perder a luta contra o volume de dados, numa altura em que um novo paradigma de armazenamento surge, exatamente com o intuito de dominar o grande volume inerente aos dados Big Data. Ainda é mais interessante o pensamento de que, possivelmente, estes novos sistemas NoSQL podem trazer vantagens para o mundo dos armazéns de dados. Assim, neste trabalho de mestrado, irá ser estudada a viabilidade e as implicações da adoção de bases de dados NoSQL, no contexto de armazéns de dados, em comparação com a abordagem tradicional, implementada sobre sistemas relacionais. Para alcançar esta tarefa, vários estudos foram operados tendo por base o sistema relacional SQL Server 2014 e os sistemas NoSQL, MongoDB e Cassandra. Várias etapas do processo de desenho e implementação de um armazém de dados foram comparadas entre os três sistemas, sendo que três armazéns de dados distintos foram criados tendo por base cada um dos sistemas. Toda a investigação realizada neste trabalho culmina no confronto da performance de consultas, realizadas nos três sistemas.
Resumo:
O desenvolvimento de aplicações para dispositivos móveis já não é uma área recente, contudo continua a crescer a um ritmo veloz. É notório o avanço tecnológico dos últimos anos e a crescente popularidade destes dispositivos. Este avanço deve-se não só à grande evolução no que diz respeito às características destes dispositivos, mas também à possibilidade de criar aplicações inovadoras, práticas e passíveis de solucionar os problemas dos utilizadores em geral. Nesse sentido, as necessidades do quotidiano obrigam à implementação de soluções que satisfaçam os utilizadores, e nos dias de hoje, essa satisfação muitas vezes passa pelos dispositivos móveis, que já tem um papel fundamental na vida das pessoas. Atendendo ao aumento do número de raptos de crianças e à insegurança que se verifica nos dias de hoje, as quais dificultam a tarefa de todos os pais/cuidadores que procuraram manter as suas crianças a salvo, é relevante criar uma nova ferramenta capaz de os auxiliar nesta árdua tarefa. A partir desta realidade, e com vista a cumprir os aspetos acima mencionados, surge assim esta dissertação de mestrado. Esta aborda o estudo e implementação efetuados no sentido de desenvolver um sistema de monitorização de crianças. Assim, o objetivo deste projeto passa por desenvolver uma aplicação nativa para Android e um back-end, utilizando um servidor de base de dados NoSQL para o armazenamento da informação, aplicando os conceitos estudados e as tecnologias existentes. A solução tem como principais premissas: ser o mais user-friendly possível, a otimização, a escalabilidade para outras situações (outros tipos de monitorizações) e a aplicação das mais recentes tecnologias. Assim sendo, um dos estudos mais aprofundados nesta dissertação de mestrado está relacionado com as bases de dados NoSQL, dada a sua importância no projeto.
Resumo:
Sectorization means dividing a whole into parts (sectors), a procedure that occurs in many contexts and applications, usually to achieve some goal or to facilitate an activity. The objective may be a better organization or simplification of a large problem into smaller sub-problems. Examples of applications are political districting and sales territory division. When designing/comparing sectors some characteristics such as contiguity, equilibrium and compactness are usually considered. This paper presents and describes new generic measures and proposes a new measure, desirability, connected with the idea of preference.
Resumo:
Context-aware recommendation of personalised tourism resources is possible because of personal mobile devices and powerful data filtering algorithms. The devices contribute with computing capabilities, on board sensors, ubiquitous Internet access and continuous user monitoring, whereas the filtering algorithms provide the ability to match the profile (interests and the context) of the tourist against a large knowledge bases of tourism resources. While, in terms of technology, personal mobile devices can gather user-related information, including the user context and access multiple data sources, the creation and maintenance of an updated knowledge base of tourism-related resources requires a collaborative approach due to the heterogeneity, volume and dynamic nature of the resources. The current PhD thesis aims to contribute to the solution of this problem by adopting a Crowdsourcing approach for the collaborative maintenance of the knowledge base of resources, Trust and Reputation for the validation of uploaded resources as well as publishers, Big Data for user profiling and context-aware filtering algorithms for the personalised recommendation of tourism resources.