3 resultados para downloading of data
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
The present work studies a km-scale data assimilation scheme based on a LETKF developed for the COSMO model. The aim is to evaluate the impact of the assimilation of two different types of data: temperature, humidity, pressure and wind data from conventional networks (SYNOP, TEMP, AIREP reports) and 3d reflectivity from radar volume. A 3-hourly continuous assimilation cycle has been implemented over an Italian domain, based on a 20 member ensemble, with boundary conditions provided from ECMWF ENS. Three different experiments have been run for evaluating the performance of the assimilation on one week in October 2014 during which Genova flood and Parma flood took place: a control run of the data assimilation cycle with assimilation of data from conventional networks only, a second run in which the SPPT scheme is activated into the COSMO model, a third run in which also reflectivity volumes from meteorological radar are assimilated. Objective evaluation of the experiments has been carried out both on case studies and on the entire week: check of the analysis increments, computing the Desroziers statistics for SYNOP, TEMP, AIREP and RADAR, over the Italian domain, verification of the analyses against data not assimilated (temperature at the lowest model level objectively verified against SYNOP data), and objective verification of the deterministic forecasts initialised with the KENDA analyses for each of the three experiments.
Resumo:
PhEDEx, the CMS transfer management system, during the first LHC Run has moved about 150 PB and currently it is moving about 2.5 PB of data per week over the Worldwide LHC Computing Grid (WLGC). It was designed to complete each transfer required by users at the expense of the waiting time necessary for its completion. For this reason, after several years of operations, data regarding transfer latencies has been collected and stored into log files containing useful analyzable informations. Then, starting from the analysis of several typical CMS transfer workflows, a categorization of such latencies has been made with a focus on the different factors that contribute to the transfer completion time. The analysis presented in this thesis will provide the necessary information for equipping PhEDEx in the future with a set of new tools in order to proactively identify and fix any latency issues. PhEDEx, il sistema di gestione dei trasferimenti di CMS, durante il primo Run di LHC ha trasferito all’incirca 150 PB ed attualmente trasferisce circa 2.5 PB di dati alla settimana attraverso la Worldwide LHC Computing Grid (WLCG). Questo sistema è stato progettato per completare ogni trasferimento richiesto dall’utente a spese del tempo necessario per il suo completamento. Dopo svariati anni di operazioni con tale strumento, sono stati raccolti dati relativi alle latenze di trasferimento ed immagazzinati in log files contenenti informazioni utili per l’analisi. A questo punto, partendo dall’analisi di una ampia mole di trasferimenti in CMS, è stata effettuata una suddivisione di queste latenze ponendo particolare attenzione nei confronti dei fattori che contribuiscono al tempo di completamento del trasferimento. L’analisi presentata in questa tesi permetterà di equipaggiare PhEDEx con un insieme di utili strumenti in modo tale da identificare proattivamente queste latenze e adottare le opportune tattiche per minimizzare l’impatto sugli utenti finali.
Resumo:
This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.