8 resultados para Data anonymization and sanitization

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, the use of Reverse Engineering systems has got a considerable interest for a wide number of applications. Therefore, many research activities are focused on accuracy and precision of the acquired data and post processing phase improvements. In this context, this PhD Thesis deals with the definition of two novel methods for data post processing and data fusion between physical and geometrical information. In particular a technique has been defined for error definition in 3D points’ coordinates acquired by an optical triangulation laser scanner, with the aim to identify adequate correction arrays to apply under different acquisition parameters and operative conditions. Systematic error in data acquired is thus compensated, in order to increase accuracy value. Moreover, the definition of a 3D thermogram is examined. Object geometrical information and its thermal properties, coming from a thermographic inspection, are combined in order to have a temperature value for each recognizable point. Data acquired by an optical triangulation laser scanner are also used to normalize temperature values and make thermal data independent from thermal-camera point of view.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important source of information in cosmology. The excellent accuracy of the recent CMB data of WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis. We focus on their optimization in order to have a complete exploitation of the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low resolution maps, the comparison of the Angular Power Spectrum (APS) extraction performances of the optimal QML method, implemented in the code called BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method has been then applied to the Planck data at large angular scales to extract the CMB APS. The same method has been applied also to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model, the possible origins for this results have been discussed. The Cromaster code instead has been applied to the 408 MHz and 1.42 GHz surveys focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, for which are necessary high accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device and its performances are compared to the devices used nowadays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Internet of Things (IoT) is the next industrial revolution: we will interact naturally with real and virtual devices as a key part of our daily life. This technology shift is expected to be greater than the Web and Mobile combined. As extremely different technologies are needed to build connected devices, the Internet of Things field is a junction between electronics, telecommunications and software engineering. Internet of Things application development happens in silos, often using proprietary and closed communication protocols. There is the common belief that only if we can solve the interoperability problem we can have a real Internet of Things. After a deep analysis of the IoT protocols, we identified a set of primitives for IoT applications. We argue that each IoT protocol can be expressed in term of those primitives, thus solving the interoperability problem at the application protocol level. Moreover, the primitives are network and transport independent and make no assumption in that regard. This dissertation presents our implementation of an IoT platform: the Ponte project. Privacy issues follows the rise of the Internet of Things: it is clear that the IoT must ensure resilience to attacks, data authentication, access control and client privacy. We argue that it is not possible to solve the privacy issue without solving the interoperability problem: enforcing privacy rules implies the need to limit and filter the data delivery process. However, filtering data require knowledge of how the format and the semantics of the data: after an analysis of the possible data formats and representations for the IoT, we identify JSON-LD and the Semantic Web as the best solution for IoT applications. Then, this dissertation present our approach to increase the throughput of filtering semantic data by a factor of ten.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the era of the Internet of Everything, a user with a handheld or wearable device equipped with sensing capability has become a producer as well as a consumer of information and services. The more powerful these devices get, the more likely it is that they will generate and share content locally, leading to the presence of distributed information sources and the diminishing role of centralized servers. As of current practice, we rely on infrastructure acting as an intermediary, providing access to the data. However, infrastructure-based connectivity might not always be available or the best alternative. Moreover, it is often the case where the data and the processes acting upon them are of local scopus. Answers to a query about a nearby object, an information source, a process, an experience, an ability, etc. could be answered locally without reliance on infrastructure-based platforms. The data might have temporal validity limited to or bounded to a geographical area and/or the social context where the user is immersed in. In this envisioned scenario users could interact locally without the need for a central authority, hence, the claim of an infrastructure-less, provider-less platform. The data is owned by the users and consulted locally as opposed to the current approach of making them available globally and stay on forever. From a technical viewpoint, this network resembles a Delay/Disruption Tolerant Network where consumers and producers might be spatially and temporally decoupled exchanging information with each other in an adhoc fashion. To this end, we propose some novel data gathering and dissemination strategies for use in urban-wide environments which do not rely on strict infrastructure mediation. While preserving the general aspects of our study and without loss of generality, we focus our attention toward practical applicative scenarios which help us capture the characteristics of opportunistic communication networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aging process is characterized by the progressive fitness decline experienced at all the levels of physiological organization, from single molecules up to the whole organism. Studies confirmed inflammaging, a chronic low-level inflammation, as a deeply intertwined partner of the aging process, which may provide the “common soil” upon which age-related diseases develop and flourish. Thus, albeit inflammation per se represents a physiological process, it can rapidly become detrimental if it goes out of control causing an excess of local and systemic inflammatory response, a striking risk factor for the elderly population. Developing interventions to counteract the establishment of this state is thus a top priority. Diet, among other factors, represents a good candidate to regulate inflammation. Building on top of this consideration, the EU project NU-AGE is now trying to assess if a Mediterranean diet, fortified for the elderly population needs, may help in modulating inflammaging. To do so, NU-AGE enrolled a total of 1250 subjects, half of which followed a 1-year long diet, and characterized them by mean of the most advanced –omics and non –omics analyses. The aim of this thesis was the development of a solid data management pipeline able to efficiently cope with the results of these assays, which are now flowing inside a centralized database, ready to be used to test the most disparate scientific hypotheses. At the same time, the work hereby described encompasses the data analysis of the GEHA project, which was focused on identifying the genetic determinants of longevity, with a particular focus on developing and applying a method for detecting epistatic interactions in human mtDNA. Eventually, in an effort to propel the adoption of NGS technologies in everyday pipeline, we developed a NGS variant calling pipeline devoted to solve all the sequencing-related issues of the mtDNA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data-flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the life-time of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge amount of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and a particular care has to be used to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”), to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: with the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and to the maintenance of the data archives. However, the field I was personally responsible for was photometry and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curves production and analysis.