16 results for Data anonymization and sanitization

at Instituto Politécnico do Porto, Portugal


Relevance:

100.00%

Publisher:

Abstract:

This paper studies the statistical distributions of worldwide earthquakes from 1963 to 2012. A Cartesian grid, dividing Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multi-dimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to a better understanding of earthquake distributions.
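
As a rough illustration of the entropy-based indices mentioned in this abstract, the sketch below computes the Shannon entropy and the classical Jensen–Shannon divergence for two per-cell event-count distributions. The counts, array sizes and function names are invented for illustration; they are not data or code from the paper, and the generalized (fractional) variants are not shown.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def jensen_shannon(p, q):
    """Classical Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical earthquake counts per grid cell for two regions (illustrative only).
counts_a = np.array([120, 30, 5, 0, 45], dtype=float)
counts_b = np.array([80, 60, 10, 2, 30], dtype=float)
p, q = counts_a / counts_a.sum(), counts_b / counts_b.sum()

print(f"H(p) = {shannon_entropy(p):.3f} nats")
print(f"JSD(p, q) = {jensen_shannon(p, q):.3f}")
```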

Relevance:

100.00%

Publisher:

Abstract:

New arguments proving that successive (repeated) measurements have a memory and actually remember each other are presented. The recognition of this peculiarity can essentially change the existing paradigm associated with conventional observation of the behavior of different complex systems and lead towards the application of an intermediate model (IM). This IM can provide a very accurate fit of the measured data in terms of Prony's decomposition. This decomposition, in turn, contains a small set of fitting parameters relative to the number of initial data points and allows comparing the measured data in cases where a “best fit” model based on specific physical principles is absent. As an example, we consider two X-ray diffractometers (defined in the paper as A (“cheap”) and B (“expensive”)) that are used, after proper calibration, for measuring the same substance (corundum α-Al2O3). The amplitude-frequency response (AFR) obtained in the frame of Prony's decomposition can be used for comparison of the spectra recorded from the A and B X-ray diffractometers (XRDs) for calibration and other practical purposes. We also prove that the Fourier decomposition can be adapted to an “ideal” experiment without memory, while Prony's decomposition corresponds to a real measurement and, in this case, can be fitted in the frame of the IM. New statistical parameters describing the properties of experimental equipment (irrespective of their internal “filling”) are found. The suggested approach is rather general and can be used for the calibration and comparison of different complex dynamical systems for practical purposes.
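
The following is a minimal least-squares Prony sketch, included only to make the idea of "a small set of fitting parameters" concrete: it fits a signal as a sum of complex exponential modes. The test signal, mode count and function names are assumptions for illustration, not the intermediate model or the diffractometer data from the paper.

```python
import numpy as np

def prony(x, p):
    """Least-squares Prony fit: x[k] ~ sum_i A_i * z_i**k with p exponential modes."""
    N = len(x)
    # 1) linear-prediction coefficients from the shifted-signal system
    M = np.column_stack([x[p - j - 1:N - j - 1] for j in range(p)])
    a = np.linalg.lstsq(M, -x[p:N], rcond=None)[0]
    # 2) the modes are the roots of the characteristic polynomial
    z = np.roots(np.concatenate(([1.0], a)))
    # 3) amplitudes from a Vandermonde least-squares fit
    V = np.vander(z, N, increasing=True).T   # row k contains z_i**k
    A = np.linalg.lstsq(V, x, rcond=None)[0]
    return A, z

# Illustrative signal: two decaying oscillations (four exponential modes).
k = np.arange(200)
x = 1.5 * 0.97**k * np.cos(0.3 * k) + 0.8 * 0.99**k * np.cos(0.05 * k)
A, z = prony(x.astype(complex), p=4)
print("recovered decay factors |z|:", np.round(np.abs(z), 3))
```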

Relevance:

100.00%

Publisher:

Abstract:

Currently, power systems (PS) already accommodate a substantial penetration of distributed generation (DG) and operate in competitive environments. In the future, as a result of liberalisation and political regulations, PS will have to deal with large-scale integration of DG and other distributed energy resources (DER), such as storage, and provide market agents with the means to ensure flexible and secure operation. This cannot be done with the traditional PS operational tools used today, such as the rather restricted Supervisory Control and Data Acquisition (SCADA) information systems [1]. The trend towards using local generation in the active operation of the power system requires new solutions for the data management system. The relevant standards have been developed separately over the last few years, so there is a need to unify them in order to obtain a common and interoperable solution. For distribution operation, the CIM models described in IEC 61968/70 are especially relevant. In Europe, dispersed and renewable energy resources (D&RER) are mostly operated without remote control mechanisms and feed the maximum amount of available power into the grid. To improve network operation performance, the idea of virtual power plants (VPP) will become a reality. In the future, the power generation of D&RER will be scheduled with high accuracy. In order to realize decentralized VPP energy management, communication facilities with standardized interfaces and protocols are needed. IEC 61850 is suitable to serve as a general standard for all communication tasks in power systems [2]. The paper deals with international activities and experiences in the implementation of a new data management and communication concept in the distribution system. The difficulties in coordinating the inconsistent communication and data management standards, developed in parallel, are addressed first in the paper. The upcoming unification work, taking into account the growing role of D&RER in the PS, is shown. It is possible to overcome the lag in current practical experience using new tools for creating and maintaining the CIM data and for simulating the IEC 61850 protocol, a prototype of which is presented in the paper. The origin and the accuracy of the data requirements depend on the data use (e.g. operation or planning), so some remarks concerning the definition of the digital interface incorporated in the merging unit idea, from the power utility point of view, are also presented in the paper. To summarize, some required future work has been identified.
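
To make the idea of scheduling aggregated D&RER in a virtual power plant slightly more concrete, here is a deliberately simplified Python sketch; the class and attribute names are invented and do not correspond to CIM (IEC 61968/61970) classes or IEC 61850 logical nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DERUnit:
    """Simplified distributed energy resource record (illustrative, not a CIM class)."""
    unit_id: str
    kind: str                 # e.g. "PV", "CHP", "storage"
    rated_kw: float

@dataclass
class VirtualPowerPlant:
    """Toy aggregation of DER units with an hourly dispatch schedule in kW."""
    name: str
    units: List[DERUnit] = field(default_factory=list)
    schedule_kw: Dict[int, float] = field(default_factory=dict)  # hour -> setpoint

    def total_capacity_kw(self) -> float:
        return sum(u.rated_kw for u in self.units)

    def set_hour(self, hour: int, setpoint_kw: float) -> None:
        # Clamp the schedule to the aggregated capacity of the plant.
        self.schedule_kw[hour] = min(setpoint_kw, self.total_capacity_kw())

vpp = VirtualPowerPlant("demo-vpp", [DERUnit("pv-01", "PV", 150.0),
                                     DERUnit("chp-01", "CHP", 300.0)])
vpp.set_hour(12, 500.0)
print(vpp.schedule_kw)   # {12: 450.0}
```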

Relevance:

100.00%

Publisher:

Abstract:

Identity is traditionally defined as an emission concept [1]. Yet, some research points out that external factors can influence it [2]; [3]; [4]. This subject is even more relevant when corporate brands are considered. According to Aaker [5], the number, power and credibility of corporate associations are greater in the case of corporate brands. The literature recognizes the influence of relationships between companies on identity management. Yet, given the increasingly important role of corporate brands, it is surprising that to date no attempt has been made to evaluate that influence in the management of corporate brand identity. Keller and Lehman [6] also highlight relationships and customer experience as two areas requiring more investigation. In line with this, the authors intend to develop empirical research to evaluate the influence of relationships between brands on corporate brand identity from an internal perspective, by interviewing internal stakeholders (brand managers and internal clients). This paper is organized into the following main sections: theoretical background, research methodology, data analysis and conclusions and, finally, cues for future investigation.

Relevance:

100.00%

Publisher:

Abstract:

Cyanobacteria are widely recognized as a valuable source of bioactive metabolites. The majority of such compounds have been isolated from so-called complex cyanobacteria, such as filamentous or colonial forms, which usually display a larger number of biosynthetic gene clusters in their genomes, when compared to free-living unicellular forms. Nevertheless, picocyanobacteria are also known to have potential to produce bioactive natural products. Here, we report the isolation of hierridin B from the marine picocyanobacterium Cyanobium sp. LEGE 06113. This compound had previously been isolated from the filamentous epiphytic cyanobacterium Phormidium ectocarpi SAG 60.90, and had been shown to possess antiplasmodial activity. A phylogenetic analysis of the 16S rRNA gene from both strains confirmed that these cyanobacteria derive from different evolutionary lineages. We further investigated the biological activity of hierridin B, and tested its cytotoxicity towards a panel of human cancer cell lines; it showed selective cytotoxicity towards HT-29 colon adenocarcinoma cells.

Relevance:

100.00%

Publisher:

Abstract:

Consider the problem of disseminating data from an arbitrary source node to all other nodes in a distributed computer system, such as a Wireless Sensor Network (WSN). We assume that wireless broadcast is used and that nodes do not know the topology. We propose new protocols that disseminate data faster and use fewer broadcasts than the simple broadcast protocol.
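
For reference, the sketch below simulates the baseline simple broadcast protocol (every node rebroadcasts the message exactly once) on a random geometric topology and counts the broadcasts used; the improved protocols proposed in the paper are not reproduced, and the topology parameters are arbitrary.

```python
import random
from collections import deque

def make_topology(n=60, radius=0.22, seed=1):
    """Random geometric graph: nodes hear each other within 'radius' (illustrative)."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = [[j for j in range(n) if j != i and
             (pos[i][0]-pos[j][0])**2 + (pos[i][1]-pos[j][1])**2 <= radius**2]
            for i in range(n)]
    return nbrs

def flood(nbrs, src=0):
    """Simple broadcast: every reached node rebroadcasts the message once."""
    seen, q, tx = {src}, deque([src]), 0
    while q:
        u = q.popleft()
        tx += 1                       # node u broadcasts once
        for v in nbrs[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return len(seen), tx

nbrs = make_topology()
covered, transmissions = flood(nbrs)
print(f"covered {covered}/{len(nbrs)} nodes with {transmissions} broadcasts")
```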

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the communication stack of the REMPLI system: a structure using power-lines and IP-based networks for communication, data acquisition and control of energy distribution and consumption. It is furthermore prepared to use alternative communication media such as GSM or analog modem connections. The REMPLI system provides communication services for existing applications, namely automated meter reading, energy billing and domotic applications. The communication stack, consisting of physical, network, transport and application layers, is described, as well as the communication services provided by the system. We show how the peculiarities of power-line communication influence the design of the communication stack, by introducing requirements to efficiently use the limited bandwidth, optimize traffic and implement fair use of the communication medium among the many communication partners.
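
To illustrate the layering idea only, the following toy sketch wraps an application payload with transport- and network-level headers on its way to a placeholder physical layer; the field layout and service names are invented and are not the REMPLI PDU formats.

```python
import json

# Toy layering: each layer wraps the payload with a small header (illustrative only;
# the actual REMPLI stack and its frame formats are not reproduced here).
def app_layer(reading: dict) -> bytes:
    return json.dumps({"svc": "meter-read", "data": reading}).encode()

def transport_layer(payload: bytes, seq: int) -> bytes:
    return seq.to_bytes(2, "big") + payload           # sequence number for reassembly

def network_layer(segment: bytes, dst: int) -> bytes:
    return dst.to_bytes(2, "big") + segment           # destination node address

def physical_layer(frame: bytes) -> bytes:
    # Placeholder for the power-line modulation step.
    return frame

pdu = physical_layer(network_layer(transport_layer(app_layer({"kwh": 42.7}), seq=1), dst=7))
print(len(pdu), "bytes on the wire")
```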

Relevance:

100.00%

Publisher:

Abstract:

The goal of this paper is to show that the DGPS data Internet service we designed and developed provides campus-wide real-time access to Differential GPS (DGPS) data and, thus, supports precise outdoor navigation. First we describe the developed distributed system in terms of architecture (a three-tier client/server application), services provided (real-time DGPS data transportation from remote DGPS sources and campus-wide data dissemination) and transmission modes implemented (raw and frame mode over TCP and UDP). Then we present and discuss the results obtained and, finally, we draw some conclusions.
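
As a loose illustration of the "raw mode over UDP" transmission described above, the sketch below pushes opaque DGPS correction frames to a UDP endpoint and receives them on the other side; host, port and the sample bytes are placeholders, and none of the actual system's framing or three-tier architecture is reproduced.

```python
import socket

def send_corrections(frames, host="127.0.0.1", port=2101):
    """Send each raw correction frame as one UDP datagram (host/port are placeholders)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for frame in frames:
            sock.sendto(frame, (host, port))

def receive_corrections(host="127.0.0.1", port=2101, count=1):
    """Receive 'count' datagrams and report their size and origin."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(count):
            data, addr = sock.recvfrom(4096)
            print(f"{len(data)}-byte correction frame from {addr}")

# Usage (start receive_corrections() in another process first):
# send_corrections([b"\x01\x02\x03"])   # illustrative payload, not a real RTCM frame
```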

Relevance:

100.00%

Publisher:

Abstract:

Managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). The physical parameters of the data center (such as power, temperature, pressure, humidity) are tightly coupled with computations, even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved in a cloud infrastructure hosted in the data center. In this paper, we describe a data collection and distribution architecture that enables gathering physical parameters of a large data center at a very high temporal and spatial resolution of the sensor measurements. We think this is an important characteristic to enable more accurate heat-flow models of the data center and, with them, find opportunities to optimize energy consumption. Having a high-resolution picture of the data center conditions also enables minimizing local hotspots, performing more accurate predictive maintenance (pending failures in cooling and other infrastructure equipment can be more promptly detected) and more accurate billing. We detail this architecture and define the structure of the underlying messaging system that is used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.
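
Below is a minimal sketch of what a sensor-reading message and a collector keyed by location might look like, assuming a JSON payload with sensor id, rack, quantity and timestamp; the schema and numbers are hypothetical and not the messaging system defined in the paper.

```python
import json, time
from collections import defaultdict
from statistics import mean

# Hypothetical reading schema: sensor id, rack location, quantity, value, timestamp.
def make_reading(sensor_id, rack, quantity, value):
    return {"sensor": sensor_id, "rack": rack, "quantity": quantity,
            "value": value, "ts": time.time()}

class Collector:
    """Toy collector: groups readings by rack so per-rack statistics can feed a heat-flow model."""
    def __init__(self):
        self.by_rack = defaultdict(list)

    def ingest(self, msg: str) -> None:
        r = json.loads(msg)
        if r["quantity"] == "temperature_c":
            self.by_rack[r["rack"]].append(r["value"])

    def rack_means(self):
        return {rack: mean(vals) for rack, vals in self.by_rack.items()}

c = Collector()
for value, rack in [(24.1, "A01"), (25.3, "A01"), (31.8, "B07")]:
    c.ingest(json.dumps(make_reading("t-001", rack, "temperature_c", value)))
print(c.rack_means())   # e.g. {'A01': 24.7, 'B07': 31.8}
```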

Relevance:

100.00%

Publisher:

Abstract:

This dissertation addresses the problem of building a data warehouse for AdClick, a company operating in the digital marketing field. Digital marketing is a type of marketing that uses digital communication channels with the same purpose as the traditional method: promoting goods, businesses and services and attracting new customers. There are several digital marketing strategies aimed at these goals, most notably organic traffic and paid traffic. Organic traffic is characterized by marketing actions that involve no costs for advertising and/or attracting potential customers. Paid traffic, in turn, requires investment in campaigns capable of boosting visibility and attracting new customers. The dissertation first reviews the state of the art in business intelligence and data warehousing and presents their main advantages for companies. Business intelligence systems are necessary because companies currently hold large volumes of information-rich data that can only be properly exploited using the capabilities of such systems. Accordingly, the first step in developing a business intelligence system is to concentrate all data in a single, integrated system capable of supporting decision-making. It is here that the data warehouse appears as the single, ideal system for this type of requirement. In this dissertation, the data sources that will feed the data warehouse were surveyed and the company's existing business processes were contextualized. The data warehouse was then built: the dimensions and fact tables were created, the processes for extracting and loading data into the data warehouse were defined, and the various views were created. Regarding the impact of this dissertation, the partner company gains several business advantages from the implementation of the data warehouse and the ETL processes that load all the information sources. Among these advantages are the centralization of information, greater flexibility for managers in how they access information, and the processing of data so that information can be extracted from it.
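
As a generic illustration of the dimension/fact-table idea discussed above, the sketch below builds a tiny star schema in SQLite and runs a view-style aggregation per traffic type; the table names, columns and rows are invented and are not the dissertation's actual data warehouse model.

```python
import sqlite3

# Minimal star-schema sketch with hypothetical tables (dim_campaign / fact_traffic).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_campaign (
    campaign_key INTEGER PRIMARY KEY,
    name TEXT,
    traffic_type TEXT            -- 'organic' or 'paid'
);
CREATE TABLE fact_traffic (
    date_key TEXT,
    campaign_key INTEGER REFERENCES dim_campaign(campaign_key),
    visits INTEGER,
    cost_eur REAL
);
""")
con.executemany("INSERT INTO dim_campaign VALUES (?, ?, ?)",
                [(1, "blog-seo", "organic"), (2, "search-ads", "paid")])
con.executemany("INSERT INTO fact_traffic VALUES (?, ?, ?, ?)",
                [("2014-05-01", 1, 830, 0.0), ("2014-05-01", 2, 412, 65.0)])

# A simple view-style aggregation: visits and cost per traffic type.
for row in con.execute("""
    SELECT d.traffic_type, SUM(f.visits), SUM(f.cost_eur)
    FROM fact_traffic f JOIN dim_campaign d USING (campaign_key)
    GROUP BY d.traffic_type"""):
    print(row)
```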

Relevance:

100.00%

Publisher:

Abstract:

Master's in Informatics Engineering - Area of Specialization in Knowledge and Decision Technologies

Relevance:

100.00%

Publisher:

Abstract:

The WWW is a huge, open, heterogeneous system; however, its content is mainly human-oriented. The Semantic Web needs to ensure that data is readable and “understandable” by intelligent software agents, through the use of explicit and formal semantics. Ontologies constitute a privileged artifact for capturing the semantics of WWW data. Temporal and spatial dimensions are transversal to most knowledge domains and are therefore fundamental for the reasoning process of software agents. Representing the temporal/spatial evolution of concepts and their relations in OWL (the W3C standard for ontologies) is not straightforward. Although several strategies have been proposed to tackle this problem, there is still no formal and standard approach. The main goal of this work is the development of methods/tools to support the engineering of temporal and spatial aspects in intelligent systems through the use of OWL ontologies. An existing ontology engineering method, Fonte, was used as the framework for the development of this work. As the main contributions of this work, Fonte was re-engineered in order to: i) support the spatial dimension; ii) work with OWL ontologies; iii) support the application of Ontology Design Patterns. Finally, the capabilities of the proposed approach were demonstrated by engineering time and space in a demo ontology about football.
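
The sketch below shows one common way of attaching a validity interval to a relation via time slices, using rdflib; the namespace, classes and properties are invented for illustration and are not the Fonte method's vocabulary or a specific Ontology Design Pattern from the work.

```python
from rdflib import Graph, Namespace, RDF, Literal

# Toy 4D-fluents-style sketch: a player's club membership holds only during a time slice.
EX = Namespace("http://example.org/football#")
g = Graph()
g.bind("ex", EX)

slice1 = EX["playerA_at_clubX_2010_2014"]      # hypothetical identifier
g.add((EX.playerA, EX.hasTemporalPart, slice1))
g.add((slice1, RDF.type, EX.PlayerTimeSlice))
g.add((slice1, EX.playsFor, EX.clubX))
g.add((slice1, EX.validFrom, Literal("2010-07-01")))
g.add((slice1, EX.validUntil, Literal("2014-06-30")))

print(g.serialize(format="turtle"))
```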

Relevance:

100.00%

Publisher:

Abstract:

In-network storage of data in wireless sensor networks contributes to reducing the communications inside the network and favors data aggregation. In this paper, we consider the use of n out of m codes and data dispersal in combination with in-network storage. In particular, we provide an abstract model of in-network storage to show how n out of m codes can be used, and we discuss how this can be achieved in five case studies. We also define a model aimed at evaluating the probability of correct data encoding and decoding; we exploit this model and simulations to show how, in the case studies, the parameters of the n out of m codes and the network should be configured in order to achieve correct data coding and decoding with high probability.
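
Under the simplifying assumption that each of the m dispersed fragments survives independently with the same probability, the probability of successful decoding (any n fragments suffice) is a binomial tail, as in the sketch below; the parameter values are illustrative and the paper's full encoding/decoding model is richer than this.

```python
from math import comb

def p_decode(m: int, n: int, p_fragment: float) -> float:
    """Probability of recovering the data when any n of the m stored fragments
    suffice and each fragment independently survives with probability p_fragment."""
    return sum(comb(m, k) * p_fragment**k * (1 - p_fragment)**(m - k)
               for k in range(n, m + 1))

# Illustrative numbers only: 12 fragments dispersed, any 8 recover the datum.
for p in (0.80, 0.90, 0.95):
    print(f"per-fragment survival {p:.2f} -> P(decode) = {p_decode(12, 8, p):.4f}")
```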

Relevance:

100.00%

Publisher:

Abstract:

Nowadays, data centers are large energy consumers, and this trend is expected to increase further in the coming years, considering the growth of cloud services. A large portion of this power consumption is due to the control of the physical parameters of the data center (such as temperature and humidity). However, these physical parameters are tightly coupled with computations, and even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved in the cloud infrastructure hosted in the data center. Therefore, managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). In this paper, we describe a data collection and distribution architecture that enables gathering physical parameters of a large data center at a very high temporal and spatial resolution of the sensor measurements. We think this is an important characteristic to enable more accurate heat-flow models of the data center and, with them, find opportunities to optimize energy consumption. Having a high-resolution picture of the data center conditions also enables minimizing local hot-spots, performing more accurate predictive maintenance (failures in all infrastructure equipment can be more promptly detected) and more accurate billing. We detail this architecture and define the structure of the underlying messaging system that is used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.
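
As a small example of what a high spatial-resolution picture enables, the sketch below flags possible local hot-spots in a per-rack temperature grid; the grid values and threshold are invented and not measurements from the study.

```python
import numpy as np

def hot_spots(temp_grid: np.ndarray, margin_c: float = 4.0):
    """Flag cells whose temperature exceeds the grid mean by more than margin_c."""
    return np.argwhere(temp_grid > temp_grid.mean() + margin_c)

# Hypothetical (rows x racks) temperature samples from the sensor network.
grid = np.array([[24.0, 24.5, 25.1, 24.8],
                 [24.2, 31.9, 25.0, 24.6],     # one anomalously warm rack
                 [23.9, 24.4, 24.7, 24.3]])
for r, c in hot_spots(grid):
    print(f"possible hot-spot at row {r}, rack {c}: {grid[r, c]:.1f} °C")
```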

Relevance:

100.00%

Publisher:

Abstract:

Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries, causing a serious disaster. This paper analyzes industrial accident data series from the perspective of dynamical systems. First, we process real-world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For the early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits the experimental data better. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data, and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
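
For illustration, the sketch below estimates a power-law exponent with the standard continuous maximum-likelihood formula and computes a Kullback–Leibler divergence between two discrete distributions; the fatality counts and probabilities are invented, and the double-PL fitting and MDS steps of the paper are not reproduced.

```python
import numpy as np

def powerlaw_alpha_mle(x, xmin):
    """Continuous power-law exponent estimate: alpha = 1 + n / sum(ln(x / xmin))."""
    x = np.asarray([v for v in x if v >= xmin], dtype=float)
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats (zero-probability terms are skipped for simplicity)."""
    mask = (p > 0) & (q > 0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical fatality counts per accident, illustrative only.
fatalities = np.array([1, 1, 2, 2, 3, 4, 5, 8, 13, 45, 120, 310], dtype=float)
print("alpha estimate:", round(powerlaw_alpha_mle(fatalities, xmin=1.0), 2))

p = np.array([0.5, 0.3, 0.15, 0.05])
q = np.array([0.4, 0.35, 0.2, 0.05])
print("KL(p || q):", round(kl_divergence(p, q), 4))
```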