2 resultados para Significance-driven computing

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern scientific discoveries are driven by an unsatisfiable demand for computational resources. High-Performance Computing (HPC) systems are an aggregation of computing power to deliver considerably higher performance than one typical desktop computer can provide, to solve large problems in science, engineering, or business. An HPC room in the datacenter is a complex controlled environment that hosts thousands of computing nodes that consume electrical power in the range of megawatts, which gets completely transformed into heat. Although a datacenter contains sophisticated cooling systems, our studies indicate quantitative evidence of thermal bottlenecks in real-life production workload, showing the presence of significant spatial and temporal thermal and power heterogeneity. Therefore minor thermal issues/anomalies can potentially start a chain of events that leads to an unbalance between the amount of heat generated by the computing nodes and the heat removed by the cooling system originating thermal hazards. Although thermal anomalies are rare events, anomaly detection/prediction in time is vital to avoid IT and facility equipment damage and outage of the datacenter, with severe societal and business losses. For this reason, automated approaches to detect thermal anomalies in datacenters have considerable potential. This thesis analyzed and characterized the power and thermal characteristics of a Tier0 datacenter (CINECA) during production and under abnormal thermal conditions. Then, a Deep Learning (DL)-powered thermal hazard prediction framework is proposed. The proposed models are validated against real thermal hazard events reported for the studied HPC cluster while in production. This thesis is the first empirical study of thermal anomaly detection and prediction techniques of a real large-scale HPC system to the best of my knowledge. For this thesis, I used a large-scale dataset, monitoring data of tens of thousands of sensors for around 24 months with a data collection rate of around 20 seconds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.