921 results for Processing wikipedia data
Abstract:
Structural Health Monitoring (SHM) is the process of characterizing existing civil structures for the purposes of damage detection and structural identification. It relies first of all on the collection of data, which are inevitably affected by noise. In this work, a procedure based on EMD-thresholding techniques is proposed to denoise the measured acceleration signal. Moreover, the velocity and displacement responses are estimated starting from the measured acceleration.
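As an illustration of the last step, velocity and displacement can be obtained from a sampled acceleration record by numerical integration. The sketch below is not the procedure of the work itself, which the abstract does not detail: it assumes uniform sampling, zero initial velocity and displacement, and a plain trapezoidal rule, and the function name `cumulative_trapezoid` is hypothetical. Real records would normally also need denoising and drift correction before integration.

```python
def cumulative_trapezoid(samples, dt):
    """Integrate a uniformly sampled signal with the trapezoidal rule.

    Assumes the integral is zero at the first sample (zero initial condition).
    """
    integral = [0.0]
    for previous, current in zip(samples, samples[1:]):
        integral.append(integral[-1] + 0.5 * (previous + current) * dt)
    return integral


# Synthetic example: constant acceleration of 2 m/s^2 sampled at 100 Hz for 1 s.
dt = 0.01
acceleration = [2.0] * 101
velocity = cumulative_trapezoid(acceleration, dt)      # v(t) = 2 t
displacement = cumulative_trapezoid(velocity, dt)      # x(t) = t^2
```

With this synthetic input the final velocity is close to 2 m/s and the final displacement close to 1 m, which matches the analytical result for constant acceleration.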
Abstract:
A main objective of human movement analysis is the quantitative description of joint kinematics and kinetics. This information has great potential for addressing clinical problems in both orthopaedics and motor rehabilitation. Previous studies have shown that the assessment of kinematics and kinetics from stereophotogrammetric data requires a setup phase, special equipment and expertise to operate. Moreover, this procedure may make subjects feel uneasy and may interfere with their walking. The general aim of this thesis is the implementation and evaluation of new 2D markerless techniques, in order to contribute to the development of an alternative to the traditional stereophotogrammetric techniques. At first, the study focused on the estimation of the ankle-foot complex kinematics during the stance phase of gait. Two particular cases were considered: subjects barefoot and subjects wearing ankle socks. The use of socks was investigated in view of the development of the hybrid method proposed in this work. Different algorithms were analyzed, evaluated and implemented in order to obtain a 2D markerless solution for estimating the kinematics in both cases. The proposed technique was validated against a traditional stereophotogrammetric system. The implementation of the technique leads towards an alternative to the traditional stereophotogrammetric system that is easy to configure and more comfortable for the subject. Then, the abovementioned technique was improved so that knee flexion/extension could be measured with a 2D markerless technique. The main changes to the implementation concerned occlusion handling and background segmentation. With the additional constraints, the proposed technique was applied to the estimation of knee flexion/extension and compared with a traditional stereophotogrammetric system.
Results showed that the knee flexion/extension estimates from the traditional stereophotogrammetric system and the proposed markerless system were highly comparable, making the latter a potential alternative for clinical use. A contribution has also been made to the estimation of lower limb kinematics in children with cerebral palsy (CP). For this purpose, a hybrid technique, which uses high-cut underwear and ankle socks as “segmental markers” in combination with a markerless methodology, was proposed. The proposed hybrid technique differs from the abovementioned markerless technique in the algorithm chosen. Results showed that the proposed hybrid technique can become a simple and low-cost alternative to the traditional stereophotogrammetric systems.
Abstract:
The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the lifetime of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge number of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment. Particular care must be taken to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”) and to devise methods to keep under control, and where necessary correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and the maintenance of the data archives. However, the field I was personally responsible for was photometry, and in particular relative photometry for the production of short-term light curves.
In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria is included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light-curve production and analysis.
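To make the notion of relative photometry concrete, the sketch below converts aperture fluxes of a target and of a comparison star into a differential light curve using the standard magnitude relation Δm = −2.5 log10(F_target / F_reference). It is only an illustration under stated assumptions: the function names are hypothetical, a single comparison star is assumed, and the flux values are invented. It does not reproduce the pipeline described in the thesis.

```python
import math


def differential_magnitude(target_flux, reference_flux):
    """Magnitude difference between target and reference star in one frame."""
    return -2.5 * math.log10(target_flux / reference_flux)


def relative_light_curve(target_fluxes, reference_fluxes):
    """Differential magnitudes per frame, shifted to zero mean."""
    magnitudes = [
        differential_magnitude(t, r)
        for t, r in zip(target_fluxes, reference_fluxes)
    ]
    mean = sum(magnitudes) / len(magnitudes)
    return [m - mean for m in magnitudes]


# Invented fluxes (arbitrary units) for three consecutive frames.
target = [1000.0, 990.0, 1010.0]
reference = [2000.0, 2000.0, 2000.0]
curve = relative_light_curve(target, reference)
```

Dividing by a comparison star measured in the same frame removes, to first order, variations that affect both stars equally, such as changes in atmospheric transparency.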
Abstract:
Radiometals play an important role in nuclear medicine as components of diagnostic or therapeutic agents. In the present work the radiochemical aspects of production and processing of very promising radiometals of the third group of the periodic table, namely radiogallium and radiolanthanides, are investigated. The 68Ge/68Ga generator (68Ge, T½ = 270.8 d) provides a cyclotron-independent source of positron-emitting 68Ga (T½ = 68 min), which can be used for coordinative labelling. However, for labelling of biomolecules via bifunctional chelators, particularly if legal aspects of production of radiopharmaceuticals are considered, 68Ga(III) as eluted initially needs to be pre-concentrated and purified. The first experimental chapter describes a system for simple and efficient handling of the 68Ge/68Ga generator eluates with a cation-exchange micro-chromatography column as the main component. Chemical purification and volume concentration of 68Ga(III) are carried out in hydrochloric acid – acetone media. Finally, generator-produced 68Ga(III) is obtained with an excellent radiochemical and chemical purity in a minimised volume, in a form directly applicable to the synthesis of 68Ga-labelled radiopharmaceuticals. For labelling with 68Ga(III), the somatostatin analogue DOTA-octreotides (DOTATOC, DOTANOC) are used. 68Ga-DOTATOC and 68Ga-DOTANOC were successfully used to diagnose human somatostatin receptor-expressing tumours with PET/CT. Additionally, the proposed method was adapted for purification and medical utilisation of the cyclotron-produced SPECT gallium radionuclide 67Ga(III). The second experimental chapter discusses a diagnostic radiolanthanide, 140Nd, produced by irradiation of macro amounts of natural CeO2 and Pr2O3 in natCe(3He,xn)140Nd and 141Pr(p,2n)140Nd nuclear reactions, respectively. With this produced and processed 140Nd, an efficient 140Nd/140Pr radionuclide generator system has been developed and evaluated.
The principle of radiochemical separation of the mother and daughter radiolanthanides is based on physical-chemical transitions (hot-atom effects) of 140Pr following the electron capture process of 140Nd. The mother radionuclide 140Nd(III) is quantitatively absorbed on a solid phase matrix in the chemical form of 140Nd-DOTA-conjugated complexes, while the daughter nuclide 140Pr is generated as an ionic species. With a very high elution yield and satisfactory chemical and radiolytical stability, the system is able to provide the short-lived positron-emitting radiolanthanide 140Pr for PET investigations. In the third experimental chapter, analogously to physical-chemical transitions after the radioactive decay of 140Nd in 140Pr-DOTA, the rupture of the chemical bond between a radiolanthanide and the DOTA ligand after the thermal neutron capture reaction (Szilard-Chalmers effect) was evaluated for production of the relevant radiolanthanides with high specific activity at the TRIGA II Mainz nuclear reactor. The physical-chemical model was developed and first quantitative data are presented. As an example, 166Ho could be produced with a specific activity higher than its limiting value for TRIGA II Mainz, namely about 2 GBq/mg versus 0.9 GBq/mg. While free 166Ho(III) is produced in situ, it does not form a 166Ho-DOTA complex and therefore can be separated from the inactive 165Ho-DOTA material. The analysis of the experimental data shows that radionuclides with half-life T½ < 64 h can be produced at the TRIGA II Mainz nuclear reactor with specific activity higher than any available from irradiation of simple targets, e.g. oxides.
Abstract:
The objective of the work is the evaluation of the potential capabilities of navigation satellite signals to retrieve basic atmospheric parameters. A detailed study has been performed on the assumptions more or less explicitly contained in the common processing steps of navigation signals. A probabilistic procedure has been designed for measuring vertical discretised profiles of pressure, temperature and water vapour and their associated errors. Numerical experiments on a synthetic dataset have been performed with the main objective of quantifying the information that could be gained from such an approach, using entropy and relative entropy as testing parameters. A simulator of phase delay and bending of a GNSS signal travelling across the atmosphere has been developed for this purpose.
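The two information measures named above can be sketched for discrete probability distributions as follows. This is a minimal illustration, assuming base-2 logarithms and distributions given as lists of probabilities; comparing a prior with a posterior distribution is an assumed use case, since the abstract does not say how the measures are applied, and the numbers are invented.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits.

    Assumes q is non-zero wherever p is non-zero.
    """
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


# Invented example: a flat prior over four states and a sharper posterior.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.7, 0.1, 0.1, 0.1]
entropy_reduction = entropy(prior) - entropy(posterior)
information_gain = relative_entropy(posterior, prior)
```

For a uniform prior the two quantities coincide, so this example yields the same number for the entropy reduction and for the relative entropy of the posterior with respect to the prior.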
Abstract:
This PhD thesis concerns geochemical constraints on recycling and partial melting of Archean continental crust. A natural example of such processes was found in the Iisalmi area of Central Finland. The rocks from this area are Middle to Late Archean in age and experienced metamorphism and partial melting between 2.7 and 2.63 Ga. The work is based on extensive field work. It is furthermore founded on bulk rock geochemical data as well as in-situ analyses of minerals. All geochemical data were obtained at the Institute of Geosciences, University of Mainz, using X-ray fluorescence, solution ICP-MS and laser ablation-ICP-MS for bulk rock geochemical analyses. Mineral analyses were accomplished by electron microprobe and laser ablation ICP-MS. Fluid inclusions were studied by microscope on a heating-freezing stage at the Geoscience Center, University of Göttingen. Part I focuses on the development of a new analytical method for bulk rock trace element determination by laser ablation-ICP-MS using homogeneous glasses fused from rock powder on an iridium strip heater. This method is applicable to mafic rock samples whose melts have low viscosities and homogenize quickly at temperatures of ~1200°C. Highly viscous melts of felsic samples prevent melting and homogenization at comparable temperatures. Fusion of felsic samples can be enabled by addition of MgO to the rock powder and adjustment of melting temperature and melting duration to the rock composition. Advantages of the fusion method are low detection limits compared to XRF analyses, avoidance of the wet-chemical processing and strong acids used in solution ICP-MS, and smaller sample volumes compared to the other methods. Part II of the thesis uses bulk rock geochemical data and results from fluid inclusion studies to discriminate between the melting processes observed in different rock types.
Fluid inclusion studies demonstrate a major change in fluid composition from CO2-dominated fluids in granulites to aqueous fluids in TTG gneisses and amphibolites. Partial melts were generated in the dry, CO2-rich environment by dehydration melting reactions of amphibole which, in addition to tonalitic melts, produced the anhydrous mineral assemblages of granulites (grt + cpx + pl ± amph or opx + cpx + pl + amph). Trace element modeling showed that mafic granulites are residues of 10-30 % melt extraction from amphibolitic precursor rocks. The maximum degree of melting in intermediate granulites was ~10 %, as inferred from modal abundances of amphibole, clinopyroxene and orthopyroxene. Carbonic inclusions are absent in upper-amphibolite facies migmatites, whereas aqueous inclusions with up to 20 wt% NaCl are abundant. This suggests that melting within TTG gneisses and amphibolites took place in the presence of an aqueous fluid phase that enabled melting at the wet solidus at temperatures of 700-750°C. The strong disruption of pre-metamorphic structures in some outcrops suggests that the maximum amount of melt in TTG gneisses was ~25 vol%. The presence of leucosomes in all rock types is taken as the principal evidence for melt formation. However, mineralogical appearance as well as major and trace element composition of many leucosomes imply that leucosomes seldom represent frozen in-situ melts. They are better considered as remnants of the melt channel network, i.e. pathways along which melts escaped from the system. Part III of the thesis describes how analyses of minerals from a specific rock type (granulite) can be used to determine partition coefficients between different minerals and between minerals and melt suitable for lower crustal conditions. The trace element analyses by laser ablation-ICP-MS show coherent distribution among the principal mineral phases independent of rock composition.
REE contents in amphibole are about 3 times higher than REE contents in clinopyroxene from the same sample. This consistency has to be taken into consideration in models of lower crustal melting where amphibole is replaced by clinopyroxene in the course of melting. A lack of equilibrium is observed between matrix clinopyroxene / amphibole and garnet porphyroblasts which suggests a late stage growth of garnet and slow diffusion and equilibration of the REE during metamorphism. The data provide a first set of distribution coefficients of the transition metals (Sc, V, Cr, Ni) in the lower crust. In addition, analyses of ilmenite and apatite demonstrate the strong influence of accessory phases on trace element distribution. Apatite contains high amounts of REE and Sr while ilmenite incorporates about 20-30 times higher amounts of Nb and Ta than amphibole. Furthermore, trace element mineral analyses provide evidence for magmatic processes such as melt depletion, melt segregation, accumulation and fractionation as well as metasomatism having operated in this high-grade anatectic area.
Abstract:
Advances in biomedical signal acquisition systems for motion analysis have led to low-cost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis focuses on motor impairment in Parkinson's disease (PD). Different databases related to subjects with Parkinson's disease in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify subjects at high risk of PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes, and automatically assess the severity of symptoms in the home setting.
Abstract:
This thesis investigates two distinct research topics. The main topic (Part I) is the computational modelling of cardiomyocytes derived from human stem cells, both embryonic (hESC-CM) and induced-pluripotent (hiPSC-CM). The aim of this research line is to develop models of the electrophysiology of hESC-CM and hiPSC-CM in order to integrate the available experimental data and to obtain in-silico models that can be used for studying, making new hypotheses and planning experiments on aspects not fully understood yet, such as the maturation process, the functionality of the Ca2+ handling, or why the hESC-CM/hiPSC-CM action potentials (APs) show some differences with respect to APs from adult cardiomyocytes. Chapter I.1 introduces the main concepts about hESC-CMs/hiPSC-CMs, the cardiac AP, and computational modelling. Chapter I.2 presents the hESC-CM AP model, able to simulate the maturation process through two developmental stages, Early and Late, based on experimental and literature data. Chapter I.3 describes the hiPSC-CM AP model, able to simulate the ventricular-like and atrial-like phenotypes. This model was used to assess which currents are responsible for the differences between the ventricular-like AP and the adult ventricular AP. The secondary topic (Part II) consists of the study of texture descriptors for biological image processing. Chapter II.1 provides an overview of important texture descriptors such as Local Binary Pattern or Local Phase Quantization. Moreover, the non-binary coding and the multi-threshold approach are introduced here. Chapter II.2 shows that the non-binary coding and the multi-threshold approach improve the classification performance on images of cells and sub-cellular parts, taken from six datasets. Chapter II.3 describes the case study of the classification of indirect immunofluorescence images of HEp2 cells, used for the antinuclear antibody clinical test. Finally, the general conclusions are reported.
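For readers unfamiliar with the Local Binary Pattern descriptor mentioned in Part II, the sketch below computes the basic 8-neighbour LBP code of a single 3x3 patch: each neighbour is compared with the centre pixel and contributes one bit. The clockwise bit ordering starting at the top-left pixel and the "greater than or equal" comparison are conventions assumed here, and the function name is hypothetical. The non-binary coding and multi-threshold variants studied in the thesis are not shown.

```python
def lbp_code(patch):
    """Basic Local Binary Pattern code of a 3x3 patch (list of three rows).

    Neighbours are visited clockwise starting from the top-left pixel;
    a neighbour sets its bit when it is >= the centre value.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Only the top-left neighbour is brighter than the centre, so only bit 0 is set.
example_patch = [
    [9, 1, 1],
    [1, 5, 1],
    [1, 1, 1],
]
example_code = lbp_code(example_patch)
```

A texture descriptor is then typically built as the histogram of these codes over all pixels of an image region.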
Abstract:
This thesis presents several data processing and compression techniques capable of addressing the strict requirements of wireless sensor networks (WSNs). After a general overview of sensor networks, the energy problem is introduced, dividing the different energy reduction approaches according to the subsystem they try to optimize. To manage the complexity brought by these techniques, a quick overview of the most common middlewares for WSNs is given, describing in detail SPINE2, a framework for data processing in the node environment. The focus is then shifted to the in-network aggregation techniques, used to reduce the data sent by the network nodes in order to prolong the network lifetime as much as possible. Among the several techniques, the most promising approach is Compressive Sensing (CS). To investigate this technique, a practical implementation of the algorithm is compared against a simpler aggregation scheme, deriving a mixed algorithm able to successfully reduce the power consumption. The analysis moves from compression implemented on single nodes to CS for signal ensembles, trying to exploit the correlations among sensors and nodes to improve compression and reconstruction quality. The two main techniques for signal ensembles, Distributed CS (DCS) and Kronecker CS (KCS), are introduced and compared on a common set of data gathered by real deployments. The best trade-off between reconstruction quality and power consumption is then investigated. The usage of CS is also addressed when the signal of interest is sampled at a sub-Nyquist rate, evaluating the reconstruction performance. Finally, the group sparsity CS (GS-CS) is compared to another well-known technique for reconstruction of signals from a highly sub-sampled version. These two frameworks are again compared on a real data set, and an insightful analysis of the trade-off between reconstruction quality and lifetime is given.
Abstract:
Perfusion CT imaging of the liver has the potential to improve the evaluation of tumour angiogenesis. Quantitative parameters can be obtained by applying mathematical models to the Time Attenuation Curve (TAC). However, accurate quantification of perfusion parameters is still difficult owing, for example, to the algorithms employed, the mathematical model, the patient’s weight and cardiac output, and the acquisition system. In this thesis, new parameters and alternative methodologies for liver perfusion CT are presented in order to investigate the causes of variability of this technique. Firstly, analyses were made to assess the variability related to the mathematical model used to compute arterial Blood Flow (BFa) values. Results were obtained by implementing algorithms based on the “maximum slope method” and the “dual input one compartment model”. Statistical analysis on simulated data demonstrated that the two methods are not interchangeable. However, the slope method is always applicable in a clinical context. Then, the variability related to TAC processing in the application of the slope method was analyzed. Comparison of the results with manual selection made it possible to identify the best automatic algorithm to compute BFa. The consistency of a Standardized Perfusion Index (SPV) was evaluated and a simplified calibration procedure was proposed. Finally, the quantitative value of the perfusion map was analyzed. The ROI approach and the map approach provide related values of BFa, which means that the pixel-by-pixel algorithm gives reliable quantitative results. In the pixel-by-pixel approach, too, the slope method gives better results. In conclusion, the development of new automatic algorithms for a consistent computation of BFa, and the analysis and definition of a simplified technique to compute the SPV parameter, represent an improvement in the field of liver perfusion CT analysis.
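The idea behind the maximum slope method can be sketched as follows: perfusion is estimated as the steepest rise of the tissue TAC divided by the peak enhancement of the arterial input. This is a deliberately simplified illustration and not the algorithm implemented in the thesis. It assumes finite-difference slopes on unsmoothed curves, takes the first arterial sample as the baseline, and ignores the restriction to the arterial phase that liver protocols normally apply. The function names and the sample values are hypothetical.

```python
def maximum_slope(times, values):
    """Largest finite-difference slope between consecutive samples."""
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    return max(slopes)


def arterial_blood_flow(times, tissue_tac, aorta_tac):
    """Simplified maximum slope estimate, in 1/s when times are in seconds.

    Assumption: the first aortic sample is the unenhanced baseline.
    """
    peak_arterial_enhancement = max(aorta_tac) - aorta_tac[0]
    return maximum_slope(times, tissue_tac) / peak_arterial_enhancement


# Invented curves in Hounsfield units, sampled once per second.
times = [0.0, 1.0, 2.0, 3.0]
tissue = [60.0, 62.0, 66.0, 68.0]
aorta = [50.0, 150.0, 250.0, 100.0]
bfa = arterial_blood_flow(times, tissue, aorta)
```

Because the estimate depends on a single steepest segment, it is sensitive to noise in the TAC, which is one reason the choice of TAC processing matters for the result.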
Abstract:
Ultrasound imaging is widely used in medical diagnostics as it is the fastest, least invasive, and least expensive imaging modality. However, ultrasound images are intrinsically difficult to interpret. In this scenario, Computer Aided Detection (CAD) systems can be used to support physicians during diagnosis by providing them with a second opinion. This thesis discusses efficient ultrasound processing techniques for computer aided medical diagnostics, focusing on two major topics: (i) Ultrasound Tissue Characterization (UTC), aimed at characterizing and differentiating between healthy and diseased tissue; (ii) Ultrasound Image Segmentation (UIS), aimed at detecting the boundaries of anatomical structures to automatically measure organ dimensions and compute clinically relevant functional indices. Research on UTC produced a CAD tool for prostate cancer detection to improve the biopsy protocol. In particular, this thesis contributes: (i) the development of a robust classification system; (ii) the exploitation of parallel computing on GPU for real-time performance; (iii) the introduction of both an innovative Semi-Supervised Learning algorithm and a novel supervised/semi-supervised learning scheme for CAD system training that improve system performance while reducing the data collection effort and avoiding waste of collected data. The tool provides physicians with a risk map highlighting suspect tissue areas, allowing them to perform a lesion-directed biopsy. Clinical validation demonstrated the system's validity as a diagnostic support tool and its effectiveness at reducing the number of biopsy cores required for an accurate diagnosis. For UIS, the research developed a heart disease diagnostic tool based on Real-Time 3D Echocardiography. The thesis contributions to this application are: (i) the development of an automated GPU-based level-set segmentation framework for 3D images; (ii) the application of this framework to myocardium segmentation.
Experimental results showed the high efficiency and flexibility of the proposed framework. Its effectiveness as a tool for quantitative analysis of 3D cardiac morphology and function was demonstrated through clinical validation.
Abstract:
Theoretical models are developed for the continuous-wave and pulsed laser incision and cut of thin single and multi-layer films. A one-dimensional steady-state model establishes the theoretical foundations of the problem by combining a power-balance integral with heat flow in the direction of laser motion. In this approach, classical modelling methods for laser processing are extended by introducing multi-layer optical absorption and thermal properties. The calculation domain is consequently divided in correspondence with the progressive removal of individual layers. A second, time-domain numerical model for the short-pulse laser ablation of metals accounts for changes in optical and thermal properties during a single laser pulse. With sufficient fluence, the target surface is heated towards its critical temperature and homogeneous boiling or "phase explosion" takes place. Improvements are seen over previous works with the more accurate calculation of optical absorption and shielding of the incident beam by the ablation products. A third, general time-domain numerical laser processing model combines ablation depth and energy absorption data from the short-pulse model with two-dimensional heat flow in an arbitrary multi-layer structure. Layer removal is the result of both progressive short-pulse ablation and classical vaporisation due to long-term heating of the sample. At low velocity, pulsed laser exposure of multi-layer films comprising aluminium-plastic and aluminium-paper is found to be characterised by short-pulse ablation of the metallic layer and vaporisation or degradation of the others due to thermal conduction from the former. At high velocity, all layers of the two films are ultimately removed by vaporisation or degradation as the average beam power is increased to achieve a complete cut. The transition velocity between the two characteristic removal types is shown to be a function of the pulse repetition rate.
An experimental investigation validates the simulation results and provides new laser processing data for some typical packaging materials.
Abstract:
Over the past decades, several countries have acquired large amounts of area-covering Airborne Electromagnetic (AEM) data. The contribution of airborne geophysics to both groundwater resource mapping and management has increased dramatically, proving how appropriate those systems are for large-scale and efficient groundwater surveying. We start with the processing and inversion of two AEM datasets from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada: the AeroTEM III dataset (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area during the fall of 2011. We demonstrate that, in the presence of multiple datasets, whether AEM or ground data, careful processing, inversion, post-processing, data integration and data calibration is the proper approach for providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from geological surveys and universities to private companies, which often own large geophysical databases to be interpreted for geological and/or hydrogeological purposes. In this study we investigate in depth the role of integrating several complementary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifer volumes (i.e. the potential amount of groundwater resources) as well as for hydrogeological flow model prediction.
In addition, we investigated the impact of an AEM dataset on hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground-based TEM dataset and/or only borehole data.
Abstract:
The present thesis addresses several experimental questions regarding the nature of the processes underlying the larger centro-parietal Late Positive Potential (LPP) measured during the viewing of emotional (both pleasant and unpleasant) compared to neutral pictures. During a passive viewing condition, this modulatory difference is significantly reduced with picture repetition, but it does not completely habituate even after a massive repetition of the same picture exemplar. In order to investigate the obligatory nature of the affective modulation of the LPP, in Study 1 we introduced a competing task during repetitive exposure to affective pictures. Picture repetition occurred in a passive viewing context or during a categorization task, in which pictures depicting any means of transportation were presented as targets, and repeated pictures (affectively engaging images) served as distractor stimuli. Results indicated that the impact of repetition on the LPP affective modulation was very similar between the passive and the task contexts, indicating that the affective processing of visual stimuli reflects an obligatory process that occurs even though participants were engaged in a categorization task. In Study 2 we assessed whether the decrease of the LPP affective modulation persists over time, by presenting on day 2 the same set of pictures that were massively repeated on day 1. Results indicated that the reduction of the emotional modulation of the LPP to repeated pictures persisted even after a 1-day interval, suggesting a contribution of long-term memory processes to the affective habituation of the LPP. Taken together, the data provide new information regarding the processes underlying the affective modulation of the late positive potential.
Abstract:
Data deduplication describes a class of approaches that reduce the storage capacity needed to store data or the amount of data that has to be transferred over a network. These approaches detect coarse-grained redundancies within a data set, e.g. a file system, and remove them. One of the most important applications of data deduplication is in backup storage systems, where these approaches are able to reduce the storage requirements to a small fraction of the logical backup data size. This thesis introduces multiple new extensions of so-called fingerprinting-based data deduplication. It starts with the presentation of a novel system design, which allows using a cluster of servers to perform exact data deduplication with small chunks in a scalable way. Afterwards, a combination of compression approaches for an important, but often overlooked, data structure in data deduplication systems, so-called block and file recipes, is introduced. Using these compression approaches, which exploit unique properties of data deduplication systems, the size of these recipes can be reduced by more than 92% in all investigated data sets. As file recipes can occupy a significant fraction of the overall storage capacity of data deduplication systems, the compression enables significant savings. A technique to increase the write throughput of data deduplication systems, based on the aforementioned block and file recipes, is introduced next. The novel Block Locality Caching (BLC) uses properties of block and file recipes to overcome the chunk lookup disk bottleneck of data deduplication systems. This chunk lookup disk bottleneck limits either the scalability or the throughput of data deduplication systems. The presented BLC overcomes the disk bottleneck more efficiently than existing approaches.
Furthermore, it is shown that it is less prone to aging effects. Finally, it is investigated whether large HPC storage systems exhibit redundancies that can be found by fingerprinting-based data deduplication. Over 3 PB of HPC storage data from different data sets have been analyzed. In most data sets, between 20 and 30% of the data can be classified as redundant. According to these results, future work on HPC storage systems should further investigate how data deduplication can be integrated into such systems. This thesis presents important novel work in different areas of data deduplication research.
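The core idea of fingerprinting-based deduplication can be sketched in a few lines: data are split into chunks, each chunk is identified by a cryptographic fingerprint, only one copy per fingerprint is stored, and a recipe (the ordered list of fingerprints) allows the original data to be rebuilt. The sketch below is an illustration under stated assumptions and not the system design of the thesis: it uses fixed-size chunks and SHA-1 purely for simplicity, keeps everything in memory, and the function names are hypothetical.

```python
import hashlib


def deduplicate(data, chunk_size=4):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns the chunk store (fingerprint -> chunk) and the recipe, i.e. the
    ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep only the first copy
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


data = b"ABCDABCDEFGHABCD"
store, recipe = deduplicate(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
```

In this toy input three of the four chunks are identical, so only 8 of the 16 bytes are stored. The recipe still holds one entry per chunk, which illustrates why recipes themselves can take up a noticeable share of the storage.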