951 results for Data storage


Relevance:

30.00%

Publisher:

Abstract:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud, such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments.

In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime.

In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics.

Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, and link prediction. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
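
To make the idea of progressive samples concrete, the following minimal Python sketch (our illustration, not the NOW! implementation) computes an aggregate over progressively larger samples drawn in a fixed, seeded order, so that early answers are deterministic and repeatable and the work done for one sample is reused for the next:

import random

def progressive_mean(rows, fractions=(0.01, 0.05, 0.1, 0.5, 1.0), seed=42):
    """Yield (fraction, estimated mean) over progressively larger samples."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)       # deterministic, repeatable sample order
    running_sum, seen = 0.0, 0
    for frac in fractions:
        target = int(len(rows) * frac)
        while seen < target:                 # reuse work done for earlier samples
            running_sum += rows[order[seen]]
            seen += 1
        if seen:
            yield frac, running_sum / seen

if __name__ == "__main__":
    table = [float(i % 97) for i in range(100_000)]   # stand-in for a large relation
    for frac, estimate in progressive_mean(table):
        print(f"progress {frac:5.0%}: mean ≈ {estimate:.3f}")

With deterministic progress semantics of this kind, re-running the same query over the same data reproduces the same sequence of early answers, which is what gives the user repeatability and provenance.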

Relevance:

30.00%

Publisher:

Abstract:

One challenge in data assimilation (DA) methods is how the error covariance for the model state is computed. Ensemble methods have been proposed for producing error covariance estimates, as the error is propagated in time using the non-linear model. Variational methods, on the other hand, use concepts from control theory, whereby the state estimate is optimized from both the background and the measurements. Numerical optimization schemes are applied, which solve the problems of memory storage and of the huge matrix inversions needed by classical Kalman filter methods. The Variational Ensemble Kalman Filter (VEnKF), a method inspired by the Variational Kalman Filter (VKF), enjoys the benefits of both ensemble methods and variational methods. It avoids the filter inbreeding problems which emerge when the ensemble spread underestimates the true error covariance; in VEnKF this is tackled by resampling the ensemble every time measurements are available. One advantage of VEnKF over VKF is that it needs neither tangent linear code nor adjoint code.

In this thesis, VEnKF has been applied to a two-dimensional shallow water model simulating a dam-break experiment. The model is a public code, with water height measurements recorded at seven stations along the mid-line of the 21.2 m long, 1.4 m wide flume. Because the data were too sparse to assimilate the 30 171-dimensional model state vector, we chose to interpolate the data both in time and in space. The results of the assimilation were compared with those of a pure simulation. We found that the results obtained with VEnKF were more realistic, without the numerical artifacts present in the pure simulation.

Creating wrapper code for a model and a DA scheme can be challenging, especially when the two were designed independently or are poorly documented. In this thesis we present a non-intrusive approach to coupling a model and a DA scheme. An external program is used to send and receive information between the model and the DA procedure via files. The advantage of this method is that the required changes to the model code are minimal: only a few lines to handle input and output. Besides being simple to implement, the approach can be employed even if the model and the DA scheme are written in different programming languages, because the communication does not go through code. The non-intrusive approach accommodates parallel computing simply by having the control program wait until all processes have finished before the DA procedure is invoked. It is worth mentioning the overhead introduced by the approach, as at every assimilation cycle both the model and the DA procedure have to be initialized. Nonetheless, the method can be an ideal basis for a benchmark platform for testing DA methods.

The non-intrusive VEnKF has been applied to the multi-purpose hydrodynamic model COHERENS to assimilate Total Suspended Matter (TSM) in Lake Säkylän Pyhäjärvi. The lake has an area of 154 km2 and an average depth of 5.4 m. Turbidity and chlorophyll-a concentrations from MERIS satellite images for 7 days between May 16 and July 6, 2009 were available. The effect of the organic matter was computationally eliminated to obtain TSM data. Because of the computational demands of both COHERENS and VEnKF, we chose to use a 1 km grid resolution. The results of the VEnKF were compared with the measurements recorded at an automatic station located in the north-western part of the lake; however, due to the sparsity of the TSM data in both time and space, the match was poor. The use of multiple automatic stations with real-time data is important to alleviate the temporal sparsity problem; combined with DA, this would, for instance, help in better understanding environmental hazard variables. We found that using a very large ensemble size does not necessarily improve the results, because there is a limit beyond which additional ensemble members add very little to the performance. The successful implementation of the non-intrusive VEnKF and the performance limit on ensemble size point towards the emerging area of Reduced Order Modelling (ROM). In ROM, running the full-blown model is avoided in order to save computational resources. When ROM is combined with the non-intrusive DA approach, it might yield a cheaper algorithm that relaxes the computational challenges existing in the field of modelling and DA.
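
As an illustration of the non-intrusive, file-based coupling described above, the sketch below is ours and not the thesis code: the external "model" and the analysis step are stand-ins, and all file names are hypothetical. It shows a control loop that alternates a forecast step and an analysis step, exchanging the model state only through files, so that the model side needs only a few lines of input/output.

import numpy as np

def external_model_step(state_file="state_in.npy", out_file="state_out.npy"):
    """Stand-in for launching the model executable (normally done via subprocess)."""
    x = np.load(state_file)
    np.save(out_file, x + 0.1 * np.random.randn(*x.shape))   # fake model dynamics

def venkf_analysis(forecast, obs, obs_std=0.05):
    """Placeholder analysis step: nudge the forecast towards the observations."""
    gain = forecast.var() / (forecast.var() + obs_std ** 2)  # scalar Kalman-like gain
    return forecast + gain * (obs - forecast)

np.save("state_in.npy", np.zeros(8))                         # initial state, written to file
for cycle in range(5):
    external_model_step()                                    # forecast: the model advances the state
    forecast = np.load("state_out.npy")                      # read the model output file
    observations = np.full(8, 1.0)                           # fake (interpolated) measurements
    np.save("state_in.npy", venkf_analysis(forecast, observations))  # analysis handed back via file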

Relevance:

30.00%

Publisher:

Abstract:

The value of integrating a heat storage into a geothermal district heating system has been investigated. The behaviour of the system under a novel operational strategy has been simulated, focusing on the energetic, economic and environmental effects of incorporating the heat storage within the system. A typical geothermal district heating system consists of several production wells, a system of pipelines for the transportation of the hot water to end-users, one or more re-injection wells and peak-up devices (usually fossil-fuel boilers). Traditionally in these systems, the production wells change their production rate throughout the day according to heat demand, and if their maximum capacity is exceeded the peak-up devices are used to meet the balance of the heat demand. In this study, it is proposed to maintain a constant geothermal production and add a heat storage to the network. Hot water is then stored when heat demand is lower than the production, and the stored hot water is released into the system to cover the peak demands (or part of them). The intention is not to phase out the peak-up devices entirely, but to decrease their use, as these will often be installed anyway for back-up purposes. Both the integration of a heat storage into such a system and the novel operational strategy are the main novelties of this thesis.

A robust algorithm for the sizing of these systems has been developed. The main inputs are the geothermal production data, the heat demand data over one year or more, and the topology of the installation. The outputs are the sizing of the whole system, including the necessary number of production wells, the size of the heat storage and the dimensions of the pipelines, amongst others. The results provide several useful insights into the initial design considerations for these systems, emphasizing particularly the importance of heat losses. Simulations are carried out for three different sizing cases of the installation (small, medium and large) to examine the influence of system scale.

In the second phase of work, two algorithms are developed which study in detail the operation of the installation over a single day and over a whole year, respectively. The first algorithm can be a powerful tool for the operators of the installation, who can know a priori how to operate the installation on any given day for a given heat demand. The second algorithm is used to obtain the amount of electricity used by the pumps as well as the amount of fuel used by the peak-up boilers over a whole year. These comprise the main operational costs of the installation and are among the main inputs of the third part of the study.

In the third part of the study, an integrated energetic, economic and environmental analysis of the studied installation is carried out, together with a comparison with the traditional case. The results show that by implementing heat storage under the novel operational strategy, heat is generated more cheaply (all the financial indices improve), more geothermal energy is utilised and less fuel is used in the peak-up boilers, with consequent environmental benefits, compared to the traditional case. Furthermore, it is shown that the most attractive sizing case is the large one, although the addition of the heat storage has the greatest impact on the medium sizing case. In other words, the geothermal component of the installation should be sized as large as possible. This analysis indicates that the proposed solution is beneficial from energetic, economic and environmental perspectives, and the aim of this study is therefore achieved to its full extent. Furthermore, the new models for the sizing, operation and economic/energetic/environmental analyses of these kinds of systems can be used with few adaptations for real cases, making the practical applicability of this study evident. With this study as a starting point, further work could include the integration of these systems with end-user demands, further analysis of component parts of the installation (such as the heat exchangers), and the integration of a heat pump to maximise the utilisation of geothermal energy.
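
The proposed operational strategy can be illustrated with a simple dispatch rule; the sketch below is ours, not the thesis algorithm, and the demand values, units and sizes are invented for the example. Geothermal output is held constant, surplus heat charges the storage, and peak demand is met first from the storage and only then from the peak-up boiler.

def dispatch(demand_profile, geo_output_mw, storage_capacity_mwh, dt_h=1.0):
    """Constant geothermal production with storage charging/discharging and boiler backup."""
    stored = 0.0
    plan = []
    for demand in demand_profile:                       # MW, one value per time step
        surplus = geo_output_mw - demand
        if surplus >= 0:                                # charge the storage with the surplus
            stored += min(surplus * dt_h, storage_capacity_mwh - stored)
            boiler = 0.0
        else:                                           # discharge the storage, then use the boiler
            discharge = min(-surplus * dt_h, stored)
            stored -= discharge
            boiler = -surplus - discharge / dt_h
        plan.append({"demand_MW": demand, "stored_MWh": round(stored, 1), "boiler_MW": round(boiler, 1)})
    return plan

if __name__ == "__main__":
    demand = [20, 18, 25, 40, 55, 35, 22]               # MW over a sample day
    for step in dispatch(demand, geo_output_mw=30, storage_capacity_mwh=60):
        print(step)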

Relevance:

30.00%

Publisher:

Abstract:

The share of variable renewable energy in electricity generation has grown exponentially during recent decades and, driven by the pursuit of environmental targets, the trend is set to continue at an increasing pace. The two most important resources, wind and insolation, both bear the burden of intermittency, creating a need for regulation and posing a threat to grid stability. One way to deal with the imbalance between demand and generation is to store electricity temporarily, which was addressed in this thesis by implementing a dynamic model of adiabatic compressed air energy storage (CAES) with the Apros dynamic simulation software. Based on a literature review, the existing models were found insufficient for studying transient situations due to their simplifications, and, despite its importance, the investigation of part-load operation has not yet been possible with satisfactory precision. As a key result of the thesis, the cycle efficiency at the design point was simulated to be 58.7%, which correlated well with values reported in the literature and was validated through analytical calculations. The performance at part load was validated against models in the literature, showing good correlation. By introducing wind resource and electricity demand data to the model, grid operation of CAES was studied. To enable this dynamic operation, start-up and shutdown sequences were approximated in the dynamic environment for, as far as is known, the first time, and a user component for the compressor variable guide vanes (VGV) was implemented. Even in its current state, the modularly designed model offers a framework for numerous studies. The validity of the model is limited by the accuracy of the VGV correlations at part load, and the implementation of heat losses in the thermal energy storage is necessary to enable longer simulations. More extensive use of forecasts is one of the important targets of development if the system operation is to be optimised in the future.
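
For reference, the cycle (round-trip) efficiency quoted above is the ratio of the electrical energy recovered during discharge to the electrical energy consumed during charging; in the adiabatic concept, the compression heat held in the thermal energy storage closes this balance without burning fuel. In our own notation (not the thesis's symbols):

\eta_{\mathrm{cycle}} \;=\; \frac{E_{\mathrm{el,out}}}{E_{\mathrm{el,in}}} \;=\; \frac{\int_{\mathrm{discharge}} P_{\mathrm{turbine}}\,dt}{\int_{\mathrm{charge}} P_{\mathrm{compressor}}\,dt} \;\approx\; 0.587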

Relevance:

30.00%

Publisher:

Abstract:

The subpolar North Atlantic (SPNA) is important in the global carbon cycle because of the deep water ventilation processes that lead to both a high uptake of atmospheric CO2 and large inventories of anthropogenic CO2 (C-ant). It is therefore crucial to understand its response to increasing anthropogenic pressures. In this work, the budgets of dissolved inorganic carbon (DIC), C-ant and natural DIC (DICnat) in the eastern SPNA in the 2000s are jointly analyzed using in situ data. The DICnat budget is found to be in steady state, confirming a long-standing hypothesis from in situ data for the first time. Biological activity drives the uptake of natural CO2 from the atmosphere. The C-ant increase in the ocean is solely responsible for the DIC storage rate, which is explained by advection of C-ant from the subtropics (65%) and the C-ant air-sea flux (35%). These results demonstrate that C-ant is accumulating in the SPNA without affecting the natural carbon cycle.
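
Schematically, and in our notation rather than the paper's, the budget argument reads:

\mathrm{DIC} = \mathrm{DIC}_{\mathrm{nat}} + C_{\mathrm{ant}}, \qquad \frac{d\,\mathrm{DIC}_{\mathrm{nat}}}{dt} \approx 0 \;\;\Rightarrow\;\; \frac{d\,\mathrm{DIC}}{dt} \approx \frac{dC_{\mathrm{ant}}}{dt}

\frac{dC_{\mathrm{ant}}}{dt} \;\approx\; \underbrace{0.65\,\frac{dC_{\mathrm{ant}}}{dt}}_{\text{advection from the subtropics}} \;+\; \underbrace{0.35\,\frac{dC_{\mathrm{ant}}}{dt}}_{\text{air--sea flux}}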

Relevance:

30.00%

Publisher:

Abstract:

Background: Improper handling has been identified as one of the major reasons for the decline in vaccine potency at the time of administration. Loss of potency becomes evident when immunised individuals contract the diseases the vaccines were meant to prevent. Objective: To assess the factors associated with vaccine handling and storage practices. Methods: This was a cross-sectional study. Three-stage sampling was used to recruit 380 vaccine handlers from 273 health facilities in 11 Local Government Areas in Ibadan. Data were analysed using SPSS version 16. Results: Seventy-three percent were aware of vaccine handling and storage guidelines, with 68.4% having ever read such guidelines. Only 15.3% had read a guideline less than 1 month prior to the study. About 65.0% had received training on vaccine management. Incorrect handling practices reported included storing injections with vaccines (13.7%) and maintaining vaccine temperature using ice blocks (7.6%). About 43.0% had good knowledge of vaccine management, while 66.1% had good vaccine management practices. Respondents who had good knowledge of vaccine handling and storage [OR=10.0, 95%CI (5.28 – 18.94), p < 0.001] and who had received formal training on vaccine management [OR=5.3, 95%CI (2.50 – 11.14), p < 0.001] were more likely to have good vaccine handling and storage practices. Conclusion: Regular training is recommended to enhance vaccine handling and storage practices.

Relevance:

30.00%

Publisher:

Abstract:

A significant amount of Expendable Bathythermograph (XBT) data has been collected in the Mediterranean Sea since 1999 in the framework of operational oceanography activities. The management and storage of such a volume of data pose significant challenges and opportunities. The SeaDataNet project, a pan-European infrastructure for marine data diffusion, provides a convenient way to avoid dispersion of these temperature vertical profiles and to facilitate access for a wider public. The XBT data flow is described, along with the recent improvements in the quality-check procedures and the consistency of the available historical data set. The main features of the SeaDataNet services and the advantages of using this system for long-term data archiving are presented. Finally, a focus on the Ligurian Sea is included in order to provide an example of the kind of information and final products, aimed at different users, that can be easily derived from the SeaDataNet web portal.

Relevance:

30.00%

Publisher:

Abstract:

The CATARINA Leg1 cruise was carried out from June 22 to July 24, 2012 on board the B/O Sarmiento de Gamboa, under the scientific supervision of Aida Rios (CSIC-IIM). It included the repeat of the OVIDE hydrological section, previously performed in June 2002, 2004, 2006, 2008 and 2010 as part of the CLIVAR program (section name A25), under the supervision of Herlé Mercier (CNRS-LPO). This section begins near Lisbon (Portugal), runs through the West European Basin and the Iceland Basin, crosses the Reykjanes Ridge 300 miles north of the Charlie-Gibbs Fracture Zone, and ends at Cape Hoppe (southeast tip of Greenland). The objective of this repeated hydrological section is to monitor the variability of water mass properties and the main current transports in the basin, complementing the international observation array relevant for climate studies. In addition, the Labrador Sea was partly sampled (stations 101-108) between Greenland and Newfoundland, but heavy weather conditions prevented the completion of the section south of 53°40’N. The quality of the CTD data is essential to reach the first objective of the CATARINA project, i.e. to quantify the Meridional Overturning Circulation and water mass ventilation changes and their effect on the changes in the anthropogenic carbon ocean uptake and storage capacity. The CATARINA project was mainly funded by the Spanish Ministry of Science and Innovation and co-funded by the Fondo Europeo de Desarrollo Regional. The hydrological OVIDE section includes 95 surface-to-bottom stations from coast to coast, collecting profiles of temperature, salinity, oxygen and currents, spaced by 2 to 25 Nm depending on the steepness of the topography. The positions of the stations closely follow those of OVIDE 2002. In addition, 8 stations were carried out in the Labrador Sea. From the 24 bottles closed at various depths at each station, sea water samples were used for salinity and oxygen calibration, and for measurements of biogeochemical components that are not reported here. The data were acquired with a Seabird CTD (SBE911+) and an SBE43 dissolved oxygen sensor, belonging to the Spanish UTM group. The SBE data processing software was used for decoding and cleaning the raw data. Then, the LPO Matlab toolbox was used to calibrate and bin the data, as was done for the previous OVIDE cruises, using on the one hand the pre- and post-cruise calibration results for the pressure and temperature sensors (done at Ifremer), and on the other hand the water samples from the 24 bottles of the rosette at each station for the salinity and dissolved oxygen data. A final accuracy of 0.002°C, 0.002 psu and 0.04 ml/l (2.3 µmol/kg) was obtained on the final profiles of temperature, salinity and dissolved oxygen, compatible with the international requirements issued from the WOCE program.
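
The calibration and binning steps mentioned above follow a common CTD post-processing pattern; the Python sketch below is ours (not the LPO Matlab toolbox), and the 2 dbar bin size, toy profile and bottle offset are invented for the example. It averages a profile into pressure bins and applies a linear correction fitted against bottle samples.

import numpy as np

def bin_profile(pressure, variable, bin_dbar=2.0):
    """Average a CTD variable into regular pressure bins."""
    edges = np.arange(0.0, pressure.max() + bin_dbar, bin_dbar)
    idx = np.digitize(pressure, edges)
    centres, means = [], []
    for k in range(1, len(edges)):
        sel = idx == k
        if sel.any():
            centres.append(0.5 * (edges[k - 1] + edges[k]))
            means.append(variable[sel].mean())
    return np.array(centres), np.array(means)

def calibrate(ctd_values, bottle_values):
    """Fit CTD values against bottle samples and return the corrected CTD values."""
    slope, offset = np.polyfit(ctd_values, bottle_values, 1)
    return slope * ctd_values + offset

p = np.linspace(0, 1000, 5000)                                 # toy 0-1000 dbar profile
sal = 35.0 + 0.5 * np.exp(-p / 300.0) + 0.002 * np.random.randn(p.size)
p_binned, sal_binned = bin_profile(p, sal)
sal_cal = calibrate(sal_binned, sal_binned + 0.003)            # pretend bottle offset of 0.003
print(f"mean correction applied: {np.mean(sal_cal - sal_binned):.4f}")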

Relevance:

30.00%

Publisher:

Abstract:

A comprehensive environmental monitoring program was conducted in the Ojo Guareña cave system (Spain), one of the longest cave systems in Europe, to assess the magnitude of the spatiotemporal changes in carbon dioxide gas (CO2) in the cave–soil–atmosphere profile. The key climate-driven processes involved in gas exchange, primarily gas diffusion and cave ventilation due to advective forces, were characterized. The spatial distributions of both processes were described through measurements of CO2 and its carbon isotopic signal (δ13C[CO2]) from exterior, soil and cave air samples analyzed by cavity ring-down spectroscopy (CRDS). The trigger mechanisms of air advection (temperature or air density differences or barometric imbalances) were controlled by continuous logging systems. Radon monitoring was also used to characterize the changing airflow that results in a predictable seasonal or daily pattern of CO2 concentrations and its carbon isotopic signal. Large daily oscillations of CO2 levels, ranging from 680 to 1900 ppm day−1 on average, were registered during the daily oscillations of the exterior air temperature around the cave air temperature. These daily variations in CO2 concentration were unobservable once the outside air temperature was continuously below the cave temperature and a prevailing advective-renewal of cave air was established, such that the daily-averaged concentrations of CO2 reached minimum values close to atmospheric background. The daily pulses of CO2 and other tracer gases such as radon (222Rn) were smoothed in the inner cave locations, where fluctuation of both gases was primarily correlated with medium-term changes in air pressure. A pooled analysis of these data provided evidence that atmospheric air that is inhaled into dynamically ventilated caves can then return to the lower troposphere as CO2-rich cave air.

Relevance:

30.00%

Publisher:

Abstract:

This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. The study ranges from a deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised machine learning classification system to set up machinery able to predict future data access patterns - i.e. the so-called “popularity” of the CMS datasets on the Grid - with a focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WLCG) computing centers (Tiers), and in particular the distributed analysis system sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, makes it possible to gain precious insight on storage occupancy over time and on the different access patterns, and ultimately to extract suggested actions based on this information (e.g. targeted disk clean-up and/or data replication). In this sense, the application of machine learning techniques makes it possible to learn from past data and to gain predictive power for future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns at different levels of depth. Chapter 4 offers a brief introduction to basic machine learning concepts, describes their application in CMS, and discusses the results obtained using this approach in the context of this thesis.
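
A minimal sketch of the kind of supervised "popularity" classifier discussed here is shown below. This is our illustration, not the CMS machinery: the features and labels are synthetic stand-ins for quantities such as past accesses, distinct users, dataset size and data tier.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.poisson(20, n),        # accesses in the past window
    rng.poisson(5, n),         # distinct users
    rng.uniform(0, 5000, n),   # dataset size (GB)
    rng.integers(0, 8, n),     # data-tier code (illustrative)
])
y = (X[:, 0] + 3 * X[:, 1] + rng.normal(0, 5, n) > 30).astype(int)   # "popular in the next window" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"hold-out accuracy: {clf.score(X_te, y_te):.2f}")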

Relevance:

30.00%

Publisher:

Abstract:

Sweet potato is an important strategic agricultural crop grown in many countries around the world. The roots and aerial vine components of the crop are used for human consumption and, to some extent, as a cheap source of animal feed. In spite of its economic value and growing contribution to health and nutrition, harvested sweet potato roots and aerial vine components have a limited shelf-life and are easily susceptible to post-harvest losses. Although the post-harvest losses of both sweet potato roots and aerial vine components are significant, no information is available to support the design and development of appropriate storage and preservation systems. In this context, the present study was initiated to improve scientific knowledge about sweet potato post-harvest handling. Additionally, the study also seeks to develop a PV-ventilated mud storehouse for the storage of sweet potato roots under tropical conditions.

In study one, the airflow resistance of sweet potato aerial vine components was investigated. The influence of different operating parameters, such as airflow rate, moisture content and bulk depth at different levels, on airflow resistance was analyzed. All the operating parameters were observed to have a significant (P < 0.01) effect on airflow resistance. Prediction models were developed and were found to adequately describe the experimental pressure drop data.

In study two, the resistance to airflow through unwashed and clean sweet potato roots was investigated. The effects of the roots' shape factor, surface roughness, orientation to airflow, and the presence of a soil fraction on airflow resistance were also assessed. The pressure drop through unwashed and clean sweet potato roots was observed to increase with higher airflow, bed depth, root grade composition, and presence of a soil fraction. The physical properties of the roots were incorporated into a modified Ergun model and compared with a modified Shedd's model. The modified Ergun model provided the best fit to the experimental data when compared with the modified Shedd's model (see the equation after this summary).

In study three, the effects of sweet potato root size (medium and large) and of different air velocities and temperatures on the cooling and heating rates and times of individual sweet potato roots were investigated. A simulation model based on the fundamental solution of the transient equations was also proposed for estimating the cooling and heating times at the centre of sweet potato roots. The results showed that increasing air velocity during cooling and heating significantly (P < 0.05) affects the cooling and heating times. Furthermore, the cooling and heating times were significantly different (P < 0.05) between medium and large size sweet potato roots. Comparison of the simulation results with experimental data confirmed that the transient simulation model can be used to accurately estimate the cooling and heating times of whole sweet potato roots under forced convection conditions.

In study four, the performance of charcoal evaporative cooling pad configurations for integration into sweet potato root storage systems was investigated. The experiments were carried out at different levels of air velocity and water flow rate, and with three pad configurations: single layer pad (SLP), double layer pad (DLP) and triple layer pad (TLP), made out of small and large size charcoal particles. The results showed that higher air velocity has a tremendous effect on pressure drop. Increasing the water flow rate above the range tested had no practical benefits in terms of cooling. It was observed that the DLP and TLP configurations, with their larger wet surface area for both types of pads, provided high cooling efficiencies.

In study five, a CFD technique in the ANSYS Fluent software was used to simulate the airflow distribution in a low-cost mud storehouse. By theoretically investigating different geometries of the air inlet, plenum chamber and outlet, as well as their placement, using ANSYS Fluent, an acceptable geometry with uniform air distribution was selected and constructed. Experimental measurements validated the selected design.

In study six, the performance of the developed PV-ventilated system was investigated. Field measurements showed satisfactory results for the directly coupled PV-ventilated system. Furthermore, the option of integrating a low-cost evaporative cooling system into the mud storage structure was also investigated. The results showed a reduction of the ambient temperature inside the mud storehouse, while relative humidity was enhanced. The ability of the developed storage system to provide and maintain the airflow, temperature and relative humidity that are the key parameters for shelf-life extension of sweet potato roots highlights its potential to reduce post-harvest losses at the farmer level, particularly under tropical climate conditions.
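
For orientation, the classical Ergun equation on which the modified model of study two builds expresses the pressure drop per unit bed depth as the sum of a viscous and an inertial term (the thesis adds root-specific shape, roughness and soil-fraction effects, which are not shown here):

\frac{\Delta P}{L} \;=\; \frac{150\,\mu\,(1-\varepsilon)^2}{\varepsilon^3 d_p^2}\,u \;+\; \frac{1.75\,\rho\,(1-\varepsilon)}{\varepsilon^3 d_p}\,u^2

where u is the superficial air velocity, ε the bed porosity, d_p an equivalent particle (root) diameter, μ the air viscosity and ρ the air density.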

Relevance:

30.00%

Publisher:

Abstract:

Power-to-Gas storage systems have the potential to address the grid-stability issues that arise when an increasing share of power is generated from sources with a highly variable output. Although proof-of-concept demonstrations of these systems have been promising, the behaviour of the processes in off-design conditions is not easily predictable. The primary aim of this PhD project was to evaluate the performance of an original Power-to-Gas system made up of innovative components. To achieve this, a numerical model has been developed to simulate the characteristics and behaviour of the several components when the whole system is coupled with a renewable source. The developed model has been applied to a large variety of scenarios, evaluating the performance of the considered process while exploiting a limited amount of experimental data. The model has then been used to compare different Power-to-Gas concepts in a realistic operating scenario. Several goals have been achieved. In the concept phase, the possibility of thermally integrating the high-temperature components has been demonstrated. Then, the parameters that affect the energy performance of a Power-to-Gas system coupled with a renewable source have been identified, providing general recommendations on the design of hybrid systems; these parameters are: 1) the ratio between the storage system size and the renewable generator size; 2) the type of coupled renewable source; 3) the related production profile. Finally, the results of the comparative analysis highlight that configurations with a renewable source that is highly oversized with respect to the storage system show the maximum achievable profit.
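
The first of these parameters can be illustrated with a toy scan; this sketch is ours, not the thesis model, and the production profile and sizes are invented. For each storage-to-generator size ratio, it computes the fraction of the renewable generation that the Power-to-Gas unit can actually absorb.

import numpy as np

def absorbed_fraction(production_mw, p2g_power_mw):
    """Fraction of renewable energy that a P2G unit of given rated power can take in."""
    return np.minimum(production_mw, p2g_power_mw).sum() / production_mw.sum()

hours = np.arange(24 * 365)
rng = np.random.default_rng(1)
production = np.clip(50 * np.sin(2 * np.pi * hours / 24) + 10 * rng.standard_normal(hours.size), 0, None)

for ratio in (0.25, 0.5, 1.0, 2.0):          # storage (P2G) size / generator size
    frac = absorbed_fraction(production, p2g_power_mw=ratio * production.max())
    print(f"size ratio {ratio:4.2f}: {frac:5.1%} of renewable energy absorbed")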

Relevance:

30.00%

Publisher:

Abstract:

The General Data Protection Regulation (GDPR) has been designed to help promote a view in favor of the interests of individuals instead of large corporations. However, there is a need for more dedicated technologies that can help companies comply with the GDPR while enabling people to exercise their rights. We argue that such a dedicated solution must address two main issues: the need for more transparency towards individuals regarding the management of their personal information, and their often hindered ability to access their personal data and make it interoperable, so that the exercise of one's rights becomes straightforward. We aim to provide a system that helps to push personal data management towards the individual's control, i.e., a personal information management system (PIMS). By using distributed storage and decentralized computing networks to control online services, control over users' personal information could be shifted towards those directly concerned, i.e., the data subjects. The use of Distributed Ledger Technologies (DLTs) and Decentralized File Storage (DFS) as an implementation of decentralized systems is of paramount importance in this case. The structure of this dissertation follows an incremental approach to describing a set of decentralized systems and models that revolve around personal data and their subjects. Each chapter builds upon the previous one and discusses the technical implementation of a system and its relation to the corresponding regulations. We refer to the EU regulatory framework, including the GDPR, eIDAS and the Data Governance Act, to derive the functional and non-functional drivers of our final system architecture. In our PIMS design, personal data are kept in a Personal Data Space (PDS), consisting of encrypted personal data referring to the subject and stored in a DFS. On top of that, a network of authorization servers acts as a data intermediary, providing access to potential data recipients through smart contracts.
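
The data flow just described can be sketched as follows. This is our minimal illustration, not the dissertation's implementation: the in-memory dictionary stands in for the DFS, the grant set stands in for the smart contracts, and all class and variable names are hypothetical.

import hashlib
from cryptography.fernet import Fernet   # third-party: pip install cryptography

class PersonalDataSpace:
    def __init__(self):
        self._dfs = {}            # content-addressed storage: CID -> ciphertext (DFS stand-in)
        self._grants = set()      # (cid, recipient) pairs, as a smart contract would record
        self._key = Fernet.generate_key()

    def store(self, payload: bytes) -> str:
        """Encrypt personal data and store it under a DFS-style content identifier."""
        ciphertext = Fernet(self._key).encrypt(payload)
        cid = hashlib.sha256(ciphertext).hexdigest()
        self._dfs[cid] = ciphertext
        return cid

    def grant(self, cid: str, recipient: str) -> None:
        """Data subject authorizes a recipient for a given piece of data."""
        self._grants.add((cid, recipient))

    def fetch(self, cid: str, recipient: str) -> bytes:
        """Recipients obtain plaintext only if an authorization has been recorded."""
        if (cid, recipient) not in self._grants:
            raise PermissionError("no authorization recorded for this recipient")
        return Fernet(self._key).decrypt(self._dfs[cid])

pds = PersonalDataSpace()
cid = pds.store(b'{"name": "Alice", "email": "alice@example.org"}')
pds.grant(cid, "insurance-service")
print(pds.fetch(cid, "insurance-service"))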

Relevance:

30.00%

Publisher:

Abstract:

The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key areas whose progress depends on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists have developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials: their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations lead to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously being developed to enable new possibilities, in both the experimental and computational realms, and scientists strive to adopt cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management and the proliferation of custom data formats and storage procedures, in both experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limits the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it covers: developing features for specific classes of advanced materials and using them to train machine learning models that accelerate computational predictions for molecular compounds; developing methods for organizing non-homogeneous materials data; automating the process of using device simulations to train machine learning models; and dealing with scattered experimental data and using them to discover new patterns.
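
A minimal example of the featurize-then-learn workflow for molecular compounds is sketched below. It is ours, not the thesis code: the element list, composition-fraction features, formulas and target values are toy placeholders for the much richer, class-specific descriptors developed in the work.

import re
import numpy as np
from sklearn.linear_model import Ridge

ELEMENTS = ["C", "H", "N", "O", "S"]

def featurize(formula: str) -> np.ndarray:
    """Element-fraction features from a simple formula such as 'C6H6' or 'C2H6O'."""
    counts = dict.fromkeys(ELEMENTS, 0.0)
    for sym, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if sym in counts:
            counts[sym] += float(num) if num else 1.0
    total = sum(counts.values()) or 1.0
    return np.array([counts[e] / total for e in ELEMENTS])

formulas = ["C6H6", "C2H6O", "CH4", "C3H8", "C2H4O2", "C6H12O6"]
X = np.vstack([featurize(f) for f in formulas])
y = np.array([2.1, 1.3, 0.8, 1.1, 1.6, 2.4])          # toy target property
model = Ridge(alpha=1e-2).fit(X, y)                   # any regressor could sit here
print(model.predict(featurize("C4H10").reshape(1, -1)))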

Relevance:

30.00%

Publisher:

Abstract:

LHC experiments produce an enormous amount of data, estimated to be of the order of a few petabytes per year. Data management takes place on the Worldwide LHC Computing Grid (WLCG) infrastructure, for both storage and processing operations. In recent years, however, many more resources have become available on High Performance Computing (HPC) farms, which generally consist of many computing nodes, each with a high number of processors. Large collaborations are working to use these resources in the most efficient way, compatibly with the constraints imposed by their computing models (data distributed on the Grid, authentication, software dependencies, etc.). The aim of this thesis project is to develop a software framework that allows users to run a typical data analysis workflow of the ATLAS experiment on HPC systems. The developed analysis framework will be deployed on the computing resources of the Open Physics Hub project and on the CINECA Marconi100 cluster, in view of the switch-on of the Leonardo supercomputer, foreseen in 2023.
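
As a rough illustration of the glue such a framework needs (this is not the framework developed in the thesis), the sketch below renders a batch script for a SLURM-scheduled cluster; the partition, account, payload script and dataset names are placeholders.

from pathlib import Path

SLURM_TEMPLATE = """#!/bin/bash
#SBATCH --job-name={name}
#SBATCH --nodes={nodes}
#SBATCH --ntasks-per-node={tasks}
#SBATCH --time={walltime}
#SBATCH --partition={partition}
#SBATCH --account={account}

srun python run_analysis.py --input {input_dataset} --output {output_dir}
"""

def write_job(name, input_dataset, output_dir, nodes=1, tasks=32,
              walltime="04:00:00", partition="hpc_partition", account="my_project"):
    """Render the batch script; on the cluster it would be handed to 'sbatch'."""
    script = Path(f"{name}.sh")
    script.write_text(SLURM_TEMPLATE.format(
        name=name, nodes=nodes, tasks=tasks, walltime=walltime,
        partition=partition, account=account,
        input_dataset=input_dataset, output_dir=output_dir))
    return script

if __name__ == "__main__":
    print(write_job("atlas_demo", input_dataset="some_atlas_dataset", output_dir="results/"))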