845 results for mining data streams
Abstract:
Companies worldwide currently invest significant effort in materiality analysis, whose aim is to explain corporate sustainability in an annual report. Materiality reflects the social, economic and environmental issues that matter most to a company and its stakeholders. Many studies and standards have been proposed to establish the main steps for identifying the specific topics to include in a sustainability report. However, few existing quantitative and structured approaches help in understanding how to deal with the identified topics and how to prioritise them so that the most valuable ones are shown effectively. Moreover, traditional approaches involve a long and complex procedure in which many people have to be reached and interviewed and several companies' reports have to be read to extract the material topics to be discussed in the sustainability report. This dissertation proposes an automated mechanism for gathering the opinions of stakeholders and of the company in order to identify relevant issues. To this end, text mining techniques are used to analyse textual documents written by either a stakeholder or the reporting company. A measure is then extracted of how much a document deals with a set of defined topics. This information is finally used to prioritise topics according to how much the author's opinion matters. The entire work is based on a real case study in the telecommunications domain.
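The abstract does not say which text mining measure or weighting scheme is used. As a minimal sketch, assuming a simple keyword-share score per topic and a hypothetical per-author weight (the names `topic_scores`, `prioritise`, and the keyword sets are illustrative and do not come from the dissertation), the scoring and prioritisation steps might look like this:

```python
import re
from collections import Counter


def topic_scores(text, topic_keywords):
    """Share of a document's tokens that belong to each topic's keyword set.

    Assumption: keyword share stands in for the unspecified measure of
    "how much a document deals with" a topic.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty documents
    return {
        topic: sum(counts[word] for word in words) / total
        for topic, words in topic_keywords.items()
    }


def prioritise(documents, topic_keywords):
    """Rank topics by the author-weighted sum of their per-document scores.

    `documents` is a list of (text, author_weight) pairs; the weight is a
    hypothetical stand-in for how much the author's opinion matters.
    """
    totals = {topic: 0.0 for topic in topic_keywords}
    for text, author_weight in documents:
        for topic, score in topic_scores(text, topic_keywords).items():
            totals[topic] += author_weight * score
    return sorted(totals, key=totals.get, reverse=True)


topics = {
    "privacy": {"privacy", "data"},
    "energy": {"energy", "emissions"},
}
docs = [
    ("Data privacy matters", 2.0),      # e.g. a heavily weighted stakeholder
    ("Energy use and emissions", 1.0),  # e.g. a company report
]
ranking = prioritise(docs, topics)
```

Here `ranking` lists the topics from highest to lowest weighted score; a real system would likely replace the keyword share with a more robust topic measure.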
Abstract:
The compound eyes of mantis shrimps, a group of tropical marine crustaceans, incorporate principles of serial and parallel processing of visual information that may be applicable to artificial imaging systems. Their eyes include numerous specializations for analysis of the spectral and polarizational properties of light, and include more photoreceptor classes for analysis of ultraviolet light, color, and polarization than occur in any other known visual system. This is possible because receptors in different regions of the eye are anatomically diverse and incorporate unusual structural features, such as spectral filters, not seen in other compound eyes. Unlike eyes of most other animals, eyes of mantis shrimps must move to acquire some types of visual information and to integrate color and polarization with spatial vision. Information leaving the retina appears to be processed into numerous parallel data streams leading into the central nervous system, greatly reducing the analytical requirements at higher levels. Many of these unusual features of mantis shrimp vision may inspire new sensor designs for machine vision.
Abstract:
The apposition compound eyes of stomatopod crustaceans contain a morphologically distinct eye region specialized for color and polarization vision, called the mid-band. In two stomatopod superfamilies, the mid-band is constructed from six rows of enlarged ommatidia containing multiple photoreceptor classes for spectral and polarization vision. The aim of this study was to begin to analyze the underlying neuroarchitecture, the design of which might reveal clues as to how the visual system interprets the multiple channels of information supplied by the retina and communicates them to deeper levels of the brain. Reduced silver methods were used to investigate the axon pathways from different retinal regions to the lamina ganglionaris and from there to the medulla externa, the medulla interna, and the medulla terminalis. A swollen band of neuropil, here termed the accessory lobe, projects across the equator of the lamina ganglionaris, the medulla externa, and the medulla interna and represents, structurally, the retina's mid-band. Serial semithin and ultrathin resin sections were used to reconstruct the projection of photoreceptor axons from the retina to the lamina ganglionaris. The eight axons originating from one ommatidium project to the same lamina cartridge. Seven short visual fibers end at two distinct levels in each lamina cartridge, thus geometrically separating the two channels of polarization and spectral information. The eighth visual fiber runs axially through the cartridge and terminates in the medulla externa. We conclude that spatial, color, and polarization information is divided into three parallel data streams from the retina to the central nervous system. (C) 2003 Wiley-Liss, Inc.
Abstract:
This work uses a stacked pin structure, based on a hydrogenated amorphous silicon carbide alloy (a-Si:H and/or a-SiC:H), that acts as an optical filter in the visible region of the electromagnetic spectrum. The device is intended to demultiplex optical signals, supported by an algorithm that autonomously recognises the signal transmitted on each channel. The aim of this thesis is to implement an algorithm that autonomously recognises the information transmitted on each channel by reading the photocurrent supplied by the device. The topic follows from the conclusions of earlier work in which this device and others of identical configuration were analysed in order to explore their use in implementing WDM technology. Three transmission channels were used in this work (blue, 470 nm; green, 525 nm; and red, 626 nm), together with several types of background radiation. The spectral response and the temporal response of the device photocurrent were measured under different experimental conditions. The channel wavelength and the wavelength of the applied background were varied, while the channel intensity and the transmission frequency were held constant. The results made it possible to assess the influence of the background radiation and of the voltage applied to the device, using different data sequences transmitted on the various channels. Under reverse bias, red background radiation amplifies the photocurrent of the blue channel, and blue background radiation amplifies the red and green channels. Under forward bias, only blue background radiation amplifies the photocurrent of the red channel. For both biases, green background radiation has little influence on the other channels. Two algorithms were implemented to recognise the information on each channel.
The first approach used the information contained in the photocurrent measurements generated by the device under reverse and forward bias. By comparing the two measurements, an algorithm that recognises the individual channels was developed and tested. The second approach recognised the information on each channel with background radiation applied, combining the photocurrent measurements generated by the device under reverse bias without background radiation with those generated under reverse bias with background radiation. By comparing these two measurements, the second algorithm, which recognises the individual channels on the basis of applied background radiation, was developed and tested.
Abstract:
The UOC has found that, in the Diploma in Business Studies (Diplomatura de Ciències Empresarials), a quarter of students do not continue their studies after the first semester. The UOC, as the client, has provided enrolment data covering 20 semesters of this programme. The task is to find the possible cause or causes of this dropout and to propose a way to prevent it.
Abstract:
With new optical network technologies, an ever larger quantity of data can be carried on a single wavelength, up to 40 gigabits per second (Gbps). Individual data streams, by contrast, require far less bandwidth. Traffic grooming is a technique that makes efficient use of the bandwidth offered by a wavelength: it aggregates several low-rate data streams into a single data entity that can be carried on one wavelength. Wavelength division multiplexing (WDM) allows several wavelengths to be carried on the same fibre. Using the two techniques together, WDM and traffic grooming, makes it possible to carry data on the order of terabits per second (Tbps) over a single optical fibre. Traffic protection therefore becomes vital in optical networks, since a single failure can disrupt thousands of users and cause losses of up to several million dollars for the operator and the network's users. Protection consists of reserving additional capacity to carry traffic in the event of a network failure. This thesis studies traffic grooming and traffic protection using p-cycles in optical networks under dynamic traffic. Most existing work considers static traffic, where the network state and the traffic are given at the outset and do not change. Moreover, most of this work uses heuristics or methods that have difficulty solving large instances. Under dynamic traffic, two major difficulties are added to the problems studied, because the traffic in the network changes continually.
The first is that the solution proposed for the previous period, even if optimised, is not necessarily optimised or optimal for the current period, so the solution must be re-optimised. The second is that solving the problem for a given period differs from solving it for the initial period, because connections in progress in the network must not be disturbed too much at each time period. The study of traffic grooming under dynamic traffic proposes different scenarios for coping with this type of traffic, with the objective of maximising the bandwidth of the connections accepted at each time period. Mathematical formulations of the scenarios considered for the grooming problem are proposed. Our work on the protection problem considers two types of p-cycles: those protecting links (basic p-cycles) and FIPP p-cycles (p-cycles protecting paths). This work first proposed different scenarios for managing protection p-cycles under dynamic traffic. A study of the stability of p-cycles under dynamic traffic was then carried out. Formulations of the different scenarios were proposed, and the solution methods used can handle larger problems than those presented in the literature. We rely on column generation to implicitly enumerate the most promising cycles. In the study of path-protecting p-cycles, or FIPP p-cycles, we proposed formulations for the master problem and the auxiliary problem. We used a hierarchical decomposition of the problem, which allows us to obtain better results in a reasonable time.
As with basic p-cycles, we studied the stability of FIPP p-cycles under dynamic traffic. The work shows that, depending on the optimisation criterion, basic p-cycles (protecting links) and FIPP p-cycles (protecting paths) can be very stable.
Advances in therapeutic risk management through signal detection and risk minimisation tool analyses
Abstract:
The four main activities of therapeutic risk management are the identification, assessment, minimisation, and communication of risk. This thesis addresses issues of risk identification and minimisation through two studies whose objectives are to: 1) develop and validate a data mining tool for signal detection using Quebec health care databases; 2) conduct a systematic review to characterise the risk minimisation interventions (RMIs) that have been implemented. The signal detection tool is based on the maximized sequential probability ratio test (MaxSPRT), using data on dispensed drugs and medical care collected in a retrospective cohort of 87,389 community-dwelling elderly people who were members of the Quebec health insurance plan between 2000 and 2009. Four known drug-adverse event (AE) associations and two negative controls were used. The systematic review drew on a literature review and on the websites of six main regulatory agencies. The nature of the RMIs was described and gaps in their implementation were identified. The analytical method led to signal detection in one of the four drug-AE combinations. The main contributions are: a) the first signal detection tool based on Canadian administrative databases; b) methodological contributions through accounting for the depletion-of-susceptibles effect and controlling for the patient's health status. The review identified 119 RMIs in the literature and 1,112 RMIs on the regulatory agencies' websites. The review showed that RMIs have increased since the introduction of regulatory guidance in 2005, but their effectiveness remains poorly demonstrated.
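The abstract names MaxSPRT but does not say which variant was used or how its thresholds were set. As a minimal sketch, assuming the Poisson form of the test, the statistic at each look compares the cumulative observed event count with the count expected under no excess risk. The critical value below is a hypothetical placeholder; in practice it is derived from the planned surveillance length and the chosen type I error.

```python
import math


def poisson_maxsprt_llr(observed, expected):
    """Log-likelihood ratio of the Poisson MaxSPRT at one look.

    The statistic is zero when the observed count does not exceed the
    expected count, because only an excess of events is of interest.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


def first_signal(cum_observed, cum_expected, critical_value):
    """Index of the first look whose statistic reaches the critical value.

    Returns None if no look signals. `critical_value` is an assumed input
    here, not a value taken from the study.
    """
    for look, (c, mu) in enumerate(zip(cum_observed, cum_expected)):
        if poisson_maxsprt_llr(c, mu) >= critical_value:
            return look
    return None


# Hypothetical cumulative counts over three looks, for illustration only.
signal_at = first_signal([1, 3, 10], [1.0, 2.5, 5.0], critical_value=1.5)
```

With these illustrative numbers the statistic stays near zero for the first two looks and crosses the assumed threshold at the third.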
Abstract:
A new generation of advanced surveillance systems is being conceived as a collection of multi-sensor components, such as video, audio and mobile robots, that interact cooperatively to enhance situation awareness and assist surveillance personnel. The prominent issues that these systems face are: the improvement of existing intelligent video surveillance systems, the inclusion of wireless networks, the use of low-power sensors, the design architecture, the communication between different components, the fusion of data emerging from different types of sensors, the location of personnel (providers and consumers) and the scalability of the system. This paper focuses on the aspects pertaining to real-time distributed architecture and scalability. For example, to meet real-time requirements, these systems need to process data streams in concurrent environments, designed by taking into account scheduling and synchronisation. The paper proposes a framework for the design of visual surveillance systems based on components derived from the principles of Real Time Networks/Data Oriented Requirements Implementation Scheme (RTN/DORIS). It also proposes the implementation of these components using the well-known middleware technology Common Object Request Broker Architecture (CORBA). Results using this architecture for video surveillance are presented through an implemented prototype.
Abstract:
We explore the influence of the choice of attenuation factor on Katz centrality indices for evolving communication networks. For given snapshots of a network observed over a period of time, recently developed communicability indices aim to identify the best broadcasters and listeners in the network. In this article, we examine the sensitivity of communicability indices to the attenuation factor constraint, in relation to the spectral radius (the largest eigenvalue) of the network at any point in time and its computation in the case of large networks. We propose relaxed communicability measures in which the spectral radius bound on the attenuation factor is relaxed and the adjacency matrix is normalised in order to maintain the convergence of the measure. Using a vitality-based measure of both standard and relaxed communicability indices, we look at ways of establishing the most important individuals for broadcasting and receiving messages in relation to community bridging roles. We illustrate our findings with two examples of real-life networks: the MIT reality mining data set of daily communications between 106 individuals during one year, and a UK Twitter mentions network of direct messages on Twitter between 12.4k individuals during one week.
Abstract:
Sea surface temperature (SST) measurements are required by operational ocean and atmospheric forecasting systems to constrain modeled upper ocean circulation and thermal structure. The Global Ocean Data Assimilation Experiment (GODAE) High Resolution SST Pilot Project (GHRSST-PP) was initiated to address these needs by coordinating the provision of accurate, high-resolution, SST products for the global domain. The pilot project is now complete, but activities continue within the Group for High Resolution SST (GHRSST). The pilot project focused on harmonizing diverse satellite and in situ data streams that were indexed, processed, quality controlled, analyzed, and documented within a Regional/Global Task Sharing (R/GTS) framework implemented in an internationally distributed manner. Data with meaningful error estimates developed within GHRSST are provided by services within R/GTS. Currently, several terabytes of data are processed at international centers daily, creating more than 25 gigabytes of product. Ensemble SST analyses together with anomaly SST outputs are generated each day, providing confidence in SST analyses via diagnostic outputs. Diagnostic data sets are generated and Web interfaces are provided to monitor the quality of observation and analysis products. GHRSST research and development projects continue to tackle problems of instrument calibration, algorithm development, diurnal variability, skin temperature deviation, and validation/verification of GHRSST products. GHRSST also works closely with applications and users, providing a forum for discussion and feedback between SST users and producers on a regular basis. All data within the GHRSST R/GTS framework are freely available. This paper reviews the progress of GHRSST-PP, highlighting achievements that have been fundamental to the success of the pilot project.
Abstract:
In this article, we investigate how the choice of the attenuation factor in an extended version of Katz centrality influences the centrality of the nodes in evolving communication networks. For given snapshots of a network, observed over a period of time, recently developed communicability indices aim to identify the best broadcasters and listeners (receivers) in the network. Here we explore the attenuation factor constraint, in relation to the spectral radius (the largest eigenvalue) of the network at any point in time and its computation in the case of large networks. We compare three different communicability measures: standard, exponential, and relaxed (where the spectral radius bound on the attenuation factor is relaxed and the adjacency matrix is normalised, in order to maintain the convergence of the measure). Furthermore, using a vitality-based measure of both standard and relaxed communicability indices, we look at the ways of establishing the most important individuals for broadcasting and receiving of messages related to community bridging roles. We compare those measures with the scores produced by an iterative version of the PageRank algorithm and illustrate our findings with examples of real-life evolving networks: the MIT reality mining data set, consisting of daily communications between 106 individuals over the period of one year; a UK Twitter mentions network, constructed from the direct tweets between 12.4k individuals during one week; and a subset of the Enron email data set.
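To make the role of the attenuation factor concrete, here is a minimal sketch of broadcast and receive scores for a sequence of snapshots. It assumes the "standard" measure is the product of Katz resolvents, Q = (I - aA[0])^{-1} (I - aA[1])^{-1} ..., with broadcast scores as row sums and receive scores as column sums of Q; the abstract does not spell this out, and the function names are illustrative. Each resolvent is applied through the Neumann iteration x = b + aAx, which converges only when a is below the reciprocal of the spectral radius of that snapshot.

```python
def apply_resolvent(adjacency, a, b, transpose=False, iterations=200):
    """Approximate (I - a*A)^{-1} b by iterating x = b + a*A*x.

    Requires a < 1 / spectral_radius(A); otherwise the iteration diverges.
    With transpose=True the matrix A is replaced by its transpose.
    """
    n = len(b)
    x = list(b)
    for _ in range(iterations):
        x = [
            b[i]
            + a * sum(
                (adjacency[j][i] if transpose else adjacency[i][j]) * x[j]
                for j in range(n)
            )
            for i in range(n)
        ]
    return x


def broadcast_receive(snapshots, a):
    """Row sums (broadcast) and column sums (receive) of the resolvent product."""
    n = len(snapshots[0])
    # Q * 1: the right-most (latest) resolvent acts on the vector first.
    broadcast = [1.0] * n
    for adjacency in reversed(snapshots):
        broadcast = apply_resolvent(adjacency, a, broadcast)
    # Q^T * 1: the earliest snapshot's transposed resolvent acts first.
    receive = [1.0] * n
    for adjacency in snapshots:
        receive = apply_resolvent(adjacency, a, receive, transpose=True)
    return broadcast, receive


# Toy example: node 0 messages node 1 on day one, node 1 messages node 2 on day two.
day_one = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
day_two = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
broadcast, receive = broadcast_receive([day_one, day_two], a=0.5)
```

In this toy network node 0 gets the highest broadcast score, because its message can reach node 2 through a walk that respects the order of the snapshots, and node 2 gets the highest receive score for the same reason.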
Abstract:
Body Sensor Networks (BSNs) have been recently introduced for the remote monitoring of human activities in a broad range of application domains, such as health care, emergency management, fitness and behaviour surveillance. BSNs can be deployed in a community of people and can generate large amounts of contextual data that require a scalable approach for storage, processing and analysis. Cloud computing can provide a flexible storage and processing infrastructure to perform both online and offline analysis of data streams generated in BSNs. This paper proposes BodyCloud, a SaaS approach for community BSNs that supports the development and deployment of Cloud-assisted BSN applications. BodyCloud is a multi-tier application-level architecture that integrates a Cloud computing platform and BSN data streams middleware. BodyCloud provides programming abstractions that allow the rapid development of community BSN applications. This work describes the general architecture of the proposed approach and presents a case study for the real-time monitoring and analysis of cardiac data streams of many individuals.
Abstract:
Body area networks (BANs) are emerging as an enabling technology for many human-centered application domains such as health care, sport, fitness, wellness, ergonomics, emergency, safety, security, and sociality. A BAN, which basically consists of wireless wearable sensor nodes usually coordinated by a static or mobile device, is mainly exploited to monitor individual assisted-living subjects. Data generated by a BAN can be processed in real time by the BAN coordinator and/or transmitted to a server side for online/offline processing and long-term storage. A network of BANs worn by a community of people produces large amounts of contextual data that require a scalable and efficient approach for processing and storage. Cloud computing can provide a flexible storage and processing infrastructure to perform both online and offline analysis of body sensor data streams. In this paper, we motivate the introduction of Cloud-assisted BANs along with the main challenges that need to be addressed for their development and management. The current state of the art is reviewed and framed according to the main requirements for effective Cloud-assisted BAN architectures. Finally, relevant open research issues in terms of efficiency, scalability, security, interoperability, prototyping, dynamic deployment and management are discussed.
Abstract:
Our digital universe is rapidly expanding: more and more daily activities are digitally recorded, and data arrives in streams, needs to be analyzed in real time, and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with new incoming data, have been developed. The majority of those algorithms focus on improving predictive performance and assume that a model update is always desired as soon as possible and as frequently as possible. In this study we consider a potential model update as an investment decision, which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams: cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework makes it possible to characterize and decompose the costs of model updates, and to assess and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.
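The abstract does not give the framework's cost model. As a minimal sketch of the underlying idea, assuming the benefit of an update can be expressed in the same units as its cost, the decision rule reduces to a return-on-investment check. All names and numbers below are hypothetical and do not come from the paper.

```python
def update_roi(expected_gain, value_per_unit_gain, update_cost):
    """Return on investment of a model update: (benefit - cost) / cost.

    expected_gain: anticipated improvement in predictive performance.
    value_per_unit_gain: assumed monetary value of one unit of improvement.
    update_cost: assumed total cost of retraining and redeploying (> 0).
    """
    if update_cost <= 0:
        raise ValueError("update_cost must be positive")
    benefit = expected_gain * value_per_unit_gain
    return (benefit - update_cost) / update_cost


def should_update(expected_gain, value_per_unit_gain, update_cost, min_roi=0.0):
    """Update only when the expected return reaches the required minimum."""
    return update_roi(expected_gain, value_per_unit_gain, update_cost) >= min_roi


# Illustrative figures only: a large expected gain justifies the update,
# a small one does not.
worth_it = should_update(0.02, 1000.0, 10.0)
not_worth_it = should_update(0.005, 1000.0, 10.0)
```

An adaptive learner that always updates corresponds to ignoring this check; the cost-sensitive view makes the threshold `min_roi` an explicit design choice.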
Abstract:
One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.