818 resultados para big data
Resumo:
Peer-to-peer information sharing has fundamentally changed customer decision-making process. Recent developments in information technologies have enabled digital sharing platforms to influence various granular aspects of the information sharing process. Despite the growing importance of digital information sharing, little research has examined the optimal design choices for a platform seeking to maximize returns from information sharing. My dissertation seeks to fill this gap. Specifically, I study novel interventions that can be implemented by the platform at different stages of the information sharing. In collaboration with a leading for-profit platform and a non-profit platform, I conduct three large-scale field experiments to causally identify the impact of these interventions on customers’ sharing behaviors as well as the sharing outcomes. The first essay examines whether and how a firm can enhance social contagion by simply varying the message shared by customers with their friends. Using a large randomized field experiment, I find that i) adding only information about the sender’s purchase status increases the likelihood of recipients’ purchase; ii) adding only information about referral reward increases recipients’ follow-up referrals; and iii) adding information about both the sender’s purchase as well as the referral rewards increases neither the likelihood of purchase nor follow-up referrals. I then discuss the underlying mechanisms. The second essay studies whether and how a firm can design unconditional incentive to engage customers who already reveal willingness to share. I conduct a field experiment to examine the impact of incentive design on sender’s purchase as well as further referral behavior. I find evidence that incentive structure has a significant, but interestingly opposing, impact on both outcomes. The results also provide insights about senders’ motives in sharing. The third essay examines whether and how a non-profit platform can use mobile messaging to leverage recipients’ social ties to encourage blood donation. I design a large field experiment to causally identify the impact of different types of information and incentives on donor’s self-donation and group donation behavior. My results show that non-profits can stimulate group effect and increase blood donation, but only with group reward. Such group reward works by motivating a different donor population. In summary, the findings from the three studies will offer valuable insights for platforms and social enterprises on how to engineer digital platforms to create social contagion. The rich data from randomized experiments and complementary sources (archive and survey) also allows me to test the underlying mechanism at work. In this way, my dissertation provides both managerial implication and theoretical contribution to the phenomenon of peer-to-peer information sharing.
Resumo:
In recent years the technological world has grown by incorporating billions of small sensing devices, collecting and sharing real-world information. As the number of such devices grows, it becomes increasingly difficult to manage all these new information sources. There is no uniform way to share, process and understand context information. In previous publications we discussed efficient ways to organize context information that is independent of structure and representation. However, our previous solution suffers from semantic sensitivity. In this paper we review semantic methods that can be used to minimize this issue, and propose an unsupervised semantic similarity solution that combines distributional profiles with public web services. Our solution was evaluated against Miller-Charles dataset, achieving a correlation of 0.6.
Resumo:
O livro organizado por Kira Tarapanoff nos apresenta a temática inteligência organizacional e competitiva no contexto da Web 2.0., a partir de quatro distintos enfoques: I. Web 2.0: novas oportunidades para a atividade de inteligência e Big Data; II. Novas arquiteturas informacionais; III. Desenvolvimento de estratégias por meio da Web 2.0; IV. Metodologias. O livro reúne nove capítulos elaborados por quinze autores brasileiros e um finlandês, este último traduzido para o português, alinhados ao tema principal do livro.
Resumo:
Background: Rheumatoid arthritis (RA) is a chronic inflammatory arthritis that causes significant morbidity and mortality and has no cure. Although early treatment strategies and biologic therapies such as TNFα blocking antibodies have revolutionised treatment, there still remains considerable unmet need. JAK kinase inhibitors, which target multiple inflammatory cytokines, have shown efficacy in treating RA although their exact mechanism of action remains to be determined. Stratified medicine promises to deliver the right drug to the right patient at the right time by using predictive ‘omic biomarkers discovered using bioinformatic and “Big Data” techniques. Therefore, knowledge across the realms of clinical rheumatology, applied immunology, bioinformatics and data science is required to realise this goal. Aim: To use bioinformatic tools to analyse the transcriptome of CD14 macrophages derived from patients with inflammatory arthritis and define a JAK/STAT signature. Thereafter to investigate the role of JAK inhibition on inflammatory cytokine production in a macrophage cell contact activation assay. Finally, to investigate JAK inhibition, following RA synovial fluid stimulation of monocytes. Methods and Results: Using bioinformatic software such as limma from the Bioconductor repository, I determined that there was a JAK/STAT signature in synovial CD14 macrophages from patients with RA and this differed from psoriatic arthritis samples. JAK inhibition using a JAK1/3 inhibitor tofacitinib reduced TNFα production when macrophages were cell contact activated by cytokine stimulated CD4 T-cells. Other pro-inflammatory cytokines such as IL-6 and chemokines such as IP-10 were also reduced. RA synovial fluid failed to stimulate monocytes to phosphorylate STAT1, 3 or 6 but CD4 T-cells activated STAT3 with this stimulus. RNA sequencing of synovial fluid stimulated CD4 T-cells showed an upregulation of SOCS3, BCL6 and SBNO2, a gene associated with RA but with unknown function and tofacitinib reversed this. Conclusion: These studies demonstrate that tofacitinib is effective at reducing inflammatory mediator production in a macrophage cell contact assay and also affects soluble factor mediated stimulation of CD4 T-cells. This suggests that the effectiveness of JAK inhibition is due to inhibition of multiple cytokine pathways such as IL-6, IL-15 and interferon. RNA sequencing is a useful tool to identify non-coding RNA transcripts that are associated with synovial fluid stimulation and JAK inhibition but these require further validation. SBNO2, a gene that is associated with RA, may be biomarker of tofacitinib treatment but requires further investigation and validation in wider disease cohorts.
Resumo:
La ciencia de la computación arrancó con la era de las máquinas tabulables para después pasar a las programables. Sin embargo el mundo actual vive una transformación radical de la información. Por un lado la avalancha masiva de datos, el llamado Big Data hace que los sistemas requieran de una inteligencia adicional para extraer conocimiento válido de los datos. Por otro lado demandamos cada día más ordenadores que nos entiendan y se comuniquen mejor con nosotros. La computación cognitiva, la nueva era de la computación, viene a responder a estas necesidades: sistemas que utilizan la inteligencia biológica como modelo para establecer una relación más satisfactoria con los seres humanos. El lenguaje natural, la capacidad de moverse en un mundo ambiguo y el aprendizaje son características de los sistemas cognitivos, uno de los cuales, IBM Watson es el ejemplo más elocuente en la actualidad de este nuevo paradigma.
Resumo:
During the last decades, we assisted to what is called “information explosion”. With the advent of the new technologies and new contexts, the volume, velocity and variety of data has increased exponentially, becoming what is known today as big data. Among them, we emphasize telecommunications operators, which gather, using network monitoring equipment, millions of network event records, the Call Detail Records (CDRs) and the Event Detail Records (EDRs), commonly known as xDRs. These records are stored and later processed to compute network performance and quality of service metrics. With the ever increasing number of collected xDRs, its generated volume needing to be stored has increased exponentially, making the current solutions based on relational databases not suited anymore. To tackle this problem, the relational data store can be replaced by Hadoop File System (HDFS). However, HDFS is simply a distributed file system, this way not supporting any aspect of the relational paradigm. To overcome this difficulty, this paper presents a framework that enables the current systems inserting data into relational databases, to keep doing it transparently when migrating to Hadoop. As proof of concept, the developed platform was integrated with the Altaia - a performance and QoS management of telecommunications networks and services.
Resumo:
People go through their life making all kinds of decisions, and some of these decisions affect their demand for transportation, for example, their choices of where to live and where to work, how and when to travel and which route to take. Transport related choices are typically time dependent and characterized by large number of alternatives that can be spatially correlated. This thesis deals with models that can be used to analyze and predict discrete choices in large-scale networks. The proposed models and methods are highly relevant for, but not limited to, transport applications. We model decisions as sequences of choices within the dynamic discrete choice framework, also known as parametric Markov decision processes. Such models are known to be difficult to estimate and to apply to make predictions because dynamic programming problems need to be solved in order to compute choice probabilities. In this thesis we show that it is possible to explore the network structure and the flexibility of dynamic programming so that the dynamic discrete choice modeling approach is not only useful to model time dependent choices, but also makes it easier to model large-scale static choices. The thesis consists of seven articles containing a number of models and methods for estimating, applying and testing large-scale discrete choice models. In the following we group the contributions under three themes: route choice modeling, large-scale multivariate extreme value (MEV) model estimation and nonlinear optimization algorithms. Five articles are related to route choice modeling. We propose different dynamic discrete choice models that allow paths to be correlated based on the MEV and mixed logit models. The resulting route choice models become expensive to estimate and we deal with this challenge by proposing innovative methods that allow to reduce the estimation cost. For example, we propose a decomposition method that not only opens up for possibility of mixing, but also speeds up the estimation for simple logit models, which has implications also for traffic simulation. Moreover, we compare the utility maximization and regret minimization decision rules, and we propose a misspecification test for logit-based route choice models. The second theme is related to the estimation of static discrete choice models with large choice sets. We establish that a class of MEV models can be reformulated as dynamic discrete choice models on the networks of correlation structures. These dynamic models can then be estimated quickly using dynamic programming techniques and an efficient nonlinear optimization algorithm. Finally, the third theme focuses on structured quasi-Newton techniques for estimating discrete choice models by maximum likelihood. We examine and adapt switching methods that can be easily integrated into usual optimization algorithms (line search and trust region) to accelerate the estimation process. The proposed dynamic discrete choice models and estimation methods can be used in various discrete choice applications. In the area of big data analytics, models that can deal with large choice sets and sequential choices are important. Our research can therefore be of interest in various demand analysis applications (predictive analytics) or can be integrated with optimization models (prescriptive analytics). Furthermore, our studies indicate the potential of dynamic programming techniques in this context, even for static models, which opens up a variety of future research directions.
Resumo:
The effective supplier evaluation and purchasing processes are of vital importance to business organizations, making the suppliers selection problem a fundamental key issue to their success. We consider a complex supplier selection problem with multiple products where minimum package quantities, minimum order values related to delivery costs, and discounted pricing schemes are taken into account. Our main contribution is to present a mixed integer linear programming (MILP) model for this supplier selection problem. The model is used to solve several examples including three real case studies from an electronic equipment assembly company.
Resumo:
Worldwide air traffic tends to increase and for many airports it is no longer an op-tion to expand terminals and runways, so airports are trying to maximize their op-erational efficiency. Many airports already operate near their maximal capacity. Peak hours imply operational bottlenecks and cause chained delays across flights impacting passengers, airlines and airports. Therefore there is a need for the opti-mization of the ground movements at the airports. The ground movement prob-lem consists of routing the departing planes from the gate to the runway for take-off, and the arriving planes from the runway to the gate, and to schedule their movements. The main goal is to minimize the time spent by the planes during their ground movements while respecting all the rules established by the Ad-vanced Surface Movement, Guidance and Control Systems of the International Civil Aviation. Each aircraft event (arrival or departing authorization) generates a new environment and therefore a new instance of the Ground Movement Prob-lem. The optimization approach proposed is based on an Iterated Local Search and provides a fast heuristic solution for each real-time event generated instance granting all safety regulations. Preliminary computational results are reported for real data comparing the heuristic solutions with the solutions obtained using a mixed-integer programming approach.
Resumo:
Las transformaciones tecnológicas y de información que está experimentando la sociedad, especialmente en la última década, está produciendo un crecimiento exponencial de los datos en todos los ámbitos de la sociedad. Los datos que se generan en los diferentes ámbitos se corresponden con elementos primarios de información que por sí solos son irrelevantes como apoyo a las tomas de decisiones. Para que estos datos puedan ser de utilidad en cualquier proceso de decisión, es preciso que se conviertan en información, es decir, en un conjunto de datos procesados con un significado, para ayudar a crear conocimiento. Estos procesos de transformación de datos en información se componen de diferentes fases como la localización de las fuentes de información, captura, análisis y medición.Este cambio tecnológico y a su vez de la sociedad ha provocado un aumento de las fuentes de información, de manera que cualquier persona, empresas u organización, puede generar información que puede ser relevante para el negocio de las empresas o gobiernos. Localizar estas fuentes, identificar información relevante en la fuente y almacenar la información que generan, la cual puede tener diferentes formatos, es el primer paso de todo el proceso anteriormente descrito, el cual tiene que ser ejecutado de manera correcta ya que el resto de fases dependen de las fuentes y datos recolectados. Para la identificación de información relevante en las fuentes se han creado lo que se denomina, robot de búsqueda, los cuales examinan de manera automática una fuente de información, localizando y recolectando datos que puedan ser de interés.En este trabajo se diseña e implementa un robot de conocimiento junto con los sistemas de captura de información online para fuentes hipertextuales y redes sociales.
Resumo:
The big data era has dramatically transformed our lives; however, security incidents such as data breaches can put sensitive data (e.g. photos, identities, genomes) at risk. To protect users' data privacy, there is a growing interest in building secure cloud computing systems, which keep sensitive data inputs hidden, even from computation providers. Conceptually, secure cloud computing systems leverage cryptographic techniques (e.g., secure multiparty computation) and trusted hardware (e.g. secure processors) to instantiate a “secure” abstract machine consisting of a CPU and encrypted memory, so that an adversary cannot learn information through either the computation within the CPU or the data in the memory. Unfortunately, evidence has shown that side channels (e.g. memory accesses, timing, and termination) in such a “secure” abstract machine may potentially leak highly sensitive information, including cryptographic keys that form the root of trust for the secure systems. This thesis broadly expands the investigation of a research direction called trace oblivious computation, where programming language techniques are employed to prevent side channel information leakage. We demonstrate the feasibility of trace oblivious computation, by formalizing and building several systems, including GhostRider, which is a hardware-software co-design to provide a hardware-based trace oblivious computing solution, SCVM, which is an automatic RAM-model secure computation system, and ObliVM, which is a programming framework to facilitate programmers to develop applications. All of these systems enjoy formal security guarantees while demonstrating a better performance than prior systems, by one to several orders of magnitude.
Resumo:
Uno de los grandes retos de la HPC (High Performance Computing) consiste en optimizar el subsistema de Entrada/Salida, (E/S), o I/O (Input/Output). Ken Batcher resume este hecho en la siguiente frase: "Un supercomputador es un dispositivo que convierte los problemas limitados por la potencia de cálculo en problemas limitados por la E/S" ("A Supercomputer is a device for turning compute-bound problems into I/O-bound problems") . En otras palabras, el cuello de botella ya no reside tanto en el procesamiento de los datos como en la disponibilidad de los mismos. Además, este problema se exacerbará con la llegada del Exascale y la popularización de las aplicaciones Big Data. En este contexto, esta tesis contribuye a mejorar el rendimiento y la facilidad de uso del subsistema de E/S de los sistemas de supercomputación. Principalmente se proponen dos contribuciones al respecto: i) una interfaz de E/S desarrollada para el lenguaje Chapel que mejora la productividad del programador a la hora de codificar las operaciones de E/S; y ii) una implementación optimizada del almacenamiento de datos de secuencias genéticas. Con más detalle, la primera contribución estudia y analiza distintas optimizaciones de la E/S en Chapel, al tiempo que provee a los usuarios de una interfaz simple para el acceso paralelo y distribuido a los datos contenidos en ficheros. Por tanto, contribuimos tanto a aumentar la productividad de los desarrolladores, como a que la implementación sea lo más óptima posible. La segunda contribución también se enmarca dentro de los problemas de E/S, pero en este caso se centra en mejorar el almacenamiento de los datos de secuencias genéticas, incluyendo su compresión, y en permitir un uso eficiente de esos datos por parte de las aplicaciones existentes, permitiendo una recuperación eficiente tanto de forma secuencial como aleatoria. Adicionalmente, proponemos una implementación paralela basada en Chapel.
Resumo:
Têm-se notado nos últimos anos um crescimento na adoção de tecnologias de computação em nuvem, com uma adesão inicial por parte de particulares e pequenas empresas, e mais recentemente por grandes organizações. Esta tecnologia tem servido de base ao aparecimento de um conjunto de novas tendências, como a Internet das Coisas ligando os nossos equipamentos pessoais e wearables às redes sociais, processos de big data que permitem tipificar comportamentos de clientes ou ainda facilitar a vida ao cidadão com serviços de atendimento integrados. No entanto, tal como em todas as novas tendências disruptivas, que trazem consigo um conjunto de oportunidades, trazem também um conjunto de novos riscos que são necessários de serem equacionados. Embora este caminho praticamente se torne inevitável para uma grande parte de empresas e entidades governamentais, a sua adoção como funcionamento deve ser alvo de uma permanente avaliação e monitorização entre as vantagens e riscos associados. Para tal, é fundamental que as organizações se dotem de uma eficiente gestão do risco, de modo que possam tipificar os riscos (identificar, analisar e quantificar) e orientar-se de uma forma segura e metódica para este novo paradigma. Caso não o façam, os riscos ficam evidenciados, desde uma possível perda de competitividade face às suas congéneres, falta de confiança dos clientes, dos parceiros de negócio e podendo culminar numa total inatividade do negócio. Com esta tese de mestrado desenvolve-se uma análise genérica de risco tendo como base a Norma ISO 31000:2009 e a elaboração de uma proposta de registo de risco, que possa servir de auxiliar em processos de tomada de decisão na contratação e manutenção de serviços de Computação em Nuvem por responsáveis de organizações privadas ou estatais.
Resumo:
Las tecnologías relacionadas con el análisis de datos masivos están empezando a revolucionar nuestra forma de vivir, nos demos cuenta de ello o no. Desde las grandes compañías, que utilizan big data para la mejora de sus resultados, hasta nuestros teléfonos, que lo usan para medir nuestra actividad física. La medicina no es ajena a esta tecnología, que puede utilizarla para mejorar los diagnósticos y establecer planes de seguimiento personalizados a los pacientes. En particular, el trastorno bipolar requiere de atención constante por parte de los profesionales médicos. Con el objetivo de contribuir a esta labor, se presenta una plataforma, denominada bip4cast, que pretende predecir con antelación las crisis de estos enfermos. Uno de sus componentes es una aplicación web creada para realizar el seguimiento a los pacientes y representar gráficamente los datos de que se dispone con el objetivo de que el médico sea capaz de evaluar el estado del paciente, analizando el riesgo de recaída. Además, se estudian las diferentes visualizaciones implementadas en la aplicación con el objetivo de comprobar si se adaptan correctamente a los objetivos que se pretenden alcanzar con ellas. Para ello, generaremos datos aleatorios y representaremos estos gráficamente, examinando las posibles conclusiones que de ellos pudieran extraerse.
Resumo:
Technologies for Big Data and Data Science are receiving increasing research interest nowadays. This paper introduces the prototyping architecture of a tool aimed to solve Big Data Optimization problems. Our tool combines the jMetal framework for multi-objective optimization with Apache Spark, a technology that is gaining momentum. In particular, we make use of the streaming facilities of Spark to feed an optimization problem with data from different sources. We demonstrate the use of our tool by solving a dynamic bi-objective instance of the Traveling Salesman Problem (TSP) based on near real-time traffic data from New York City, which is updated several times per minute. Our experiment shows that both jMetal and Spark can be integrated providing a software platform to deal with dynamic multi-optimization problems.