934 resultados para Data anonymization and sanitization


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least square sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mestrado em Engenharia Informática - Área de Especialização em Tecnologias do Conhecimento e Decisão

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in the last years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35, 42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data. ORA SIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postrocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given thresh old. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalizati on. The selected vectors are then projected onto this subspace and a simplex is found by an MV T pro cess. ORA SIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating sign al and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selectin g the set of signal eigenvalue s that best represents the data, in the least-square sense [48,49 ], we note, however, that VCA works with projected and with unprojected data. The extraction of the end members exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FIND R algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data on to a direction orthogonal to the subspace spanned by the endmembers already determined. The new end member signature corresponds to the extreme of the projection. The algorithm iterates until all end members are exhausted. VCA performs much better than PPI and better than or comparable to N-FI NDR; yet it has a computational complexity between on e and two orders of magnitude lower than N-FINDR. The chapter is structure d as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

WWW is a huge, open, heterogeneous system, however its contents data is mainly human oriented. The Semantic Web needs to assure that data is readable and “understandable” to intelligent software agents, though the use of explicit and formal semantics. Ontologies constitute a privileged artifact for capturing the semantic of the WWW data. Temporal and spatial dimensions are transversal to the generality of knowledge domains and therefore are fundamental for the reasoning process of software agents. Representing temporal/spatial evolution of concepts and their relations in OWL (W3C standard for ontologies) it is not straightforward. Although proposed several strategies to tackle this problem but there is still no formal and standard approach. This work main goal consists of development of methods/tools to support the engineering of temporal and spatial aspects in intelligent systems through the use of OWL ontologies. An existing method for ontology engineering, Fonte was used as framework for the development of this work. As main contributions of this work Fonte was re-engineered in order to: i) support the spatial dimension; ii) work with OWL Ontologies; iii) and support the application of Ontology Design Patterns. Finally, the capabilities of the proposed approach were demonstrated by engineering time and space in a demo ontology about football.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An individual experiences double coverage when he bene ts from more than one health insurance plan at the same time. This paper examines the impact of such supplementary insurance on the demand for health care services. Its novelty is that within the context of count data modelling and without imposing restrictive parametric assumptions, the analysis is carried out for di¤erent points of the conditional distribution, not only for its mean location. Results indicate that moral hazard is present across the whole outcome distribution for both public and private second layers of health insurance coverage but with greater magnitude in the latter group. By looking at di¤erent points we unveil that stronger double coverage e¤ects are smaller for high levels of usage. We use data for Portugal, taking advantage of particular features of the public and private protection schemes on top of the statutory National Health Service. By exploring the last Portuguese Health Survey, we were able to evaluate their impacts on the consumption of doctor visi

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis submitted to the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, for the degree of Doctor of Philosophy in Biochemistry

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In-network storage of data in wireless sensor networks contributes to reduce the communications inside the network and to favor data aggregation. In this paper, we consider the use of n out of m codes and data dispersal in combination to in-network storage. In particular, we provide an abstract model of in-network storage to show how n out of m codes can be used, and we discuss how this can be achieved in five cases of study. We also define a model aimed at evaluating the probability of correct data encoding and decoding, we exploit this model and simulations to show how, in the cases of study, the parameters of the n out of m codes and the network should be configured in order to achieve correct data coding and decoding with high probability.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, data centers are large energy consumers and the trend for next years is expected to increase further, considering the growth in the order of cloud services. A large portion of this power consumption is due to the control of physical parameters of the data center (such as temperature and humidity). However, these physical parameters are tightly coupled with computations, and even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved in the cloud infrastructure hosted in the data center. Therefore, managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). In this paper, we describe a data collection and distribution architecture that enables gathering physical parameters of a large data center at a very high temporal and spatial resolution of the sensor measurements. We think this is an important characteristic to enable more accurate heat-flow models of the data center and with them, find opportunities to optimize energy consumptions. Having a high-resolution picture of the data center conditions, also enables minimizing local hot-spots, perform more accurate predictive maintenance (failures in all infrastructure equipments can be more promptly detected) and more accurate billing. We detail this architecture and define the structure of the underlying messaging system that is used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries causing a serious disaster. This paper analyzes the industrial accident data series in the perspective of dynamical systems. First, we process real world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits better into the experimental data. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Electrotécnica e de Computadores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present work aims to achieve and further develop a hydrogeomechanical approach in Caldas da Cavaca hydromineral system rock mass (Aguiar da Beira, NW Portugal), and contribute to a better understanding of the hydrogeological conceptual site model. A collection of several data, namely geology, hydrogeology, rock and soil geotechnics, borehole hydraulics and hydrogeomechanics, was retrieved from three rock slopes (Lagoa, Amores and Cancela). To accomplish a comprehensive analysis and rock engineering conceptualisation of the site, a multi‐technical approach were used, such as, field and laboratory techniques, hydrogeotechnical mapping, hydrogeomechanical zoning and hydrogeomechanical scheme classifications and indexes. In addition, a hydrogeomechanical data analysis and assessment, such as Hydro‐Potential (HP)‐Value technique, JW Joint Water Reduction index, Hydraulic Classification (HC) System were applied on rock slopes. The hydrogeomechanical zone HGMZ 1 of Lagoa slope achieved higher hydraulic conductivities with poorer rock mass quality results, followed by the hydrogeomechanical zone HGMZ 2 of Lagoa slope, with poor to fair rock mass quality and lower hydraulic parameters. In addition, Amores slope had a fair to good rock mass quality and the lowest hydraulic conductivity. The hydrogeomechanical zone HGMZ 3 of Lagoa slope, and the hydrogeomechanical zones HGMZ 1 and HGMZ 2 of Cancela slope had a fair to poor rock mass quality but were completely dry. Geographical Information Systems (GIS) mapping technologies was used in overall hydrogeological and hydrogeomechanical data integration in order to improve the hydrogeological conceptual site model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Upper gastrointestinal bleeding is the severe complication of stress-related mucosal disease in hospitalized patients. In intensive care units (ICU), risk factors are well defined and only mechanical ventilation and coagulopathy proved to be relevant for significant bleeding. On the contrary, in non-ICU settings there is no consensus about this issue. Nevertheless, omeprazole is still widely used in prophylaxis of bleeding. The objective of our study was to evaluate the relevance of stress-related mucosal disease bleeding in patients admitted to an internal medicine ward, and the role of omeprazole in its prophylaxis. METHODS: We conducted a retrospective study in which we analysed consecutive patients who were admitted to our ward over a year. We recorded demographic characteristics of the patients, potential risk factors for stress-related mucosal disease (clinical data, laboratory, and medication), administration of prophylactic omeprazole, and total cost of this prophylaxis. Patients with active gastrointestinal bleeding on the admission were excluded. We recorded every upper gastrointestinal bleeding event with clinical relevance. RESULTS: Five hundred and thirty-five patients, mean age 70 years, mean length of stay 9.6+/-7.7 days; 140 (26.2%) patients were treated with 40 mg of omeprazole intravenously, 193 (36.1%) with 20mg of omeprazole orally, and 202 (37.8%) patients had no prophylaxis. There was only one episode (0.2%) of clinically relevant bleeding. CONCLUSION: In patients admitted to an internal medicine ward, incidence of upper gastrointestinal bleeding as a complication of stress-related mucosal disease is low. We found that there is no advantage in prophylaxis with omeprazole.