8 resultados para Distributed virtual machine
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program when the data that will be produced will reach the exascale. ML techniques have been successfully used in many areas of HEP nevertheless, the development of a ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested and new features have been added. A first MLaaS solution has been developed by automatizing the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user after being authenticated and authorized can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant to be added to the INFN Cloud portfolio of services.
Resumo:
In this thesis, the author presents a query language for an RDF (Resource Description Framework) database and discusses its applications in the context of the HELM project (the Hypertextual Electronic Library of Mathematics). This language aims at meeting the main requirements coming from the RDF community. in particular it includes: a human readable textual syntax and a machine-processable XML (Extensible Markup Language) syntax both for queries and for query results, a rigorously exposed formal semantics, a graph-oriented RDF data access model capable of exploring an entire RDF graph (including both RDF Models and RDF Schemata), a full set of Boolean operators to compose the query constraints, fully customizable and highly structured query results having a 4-dimensional geometry, some constructions taken from ordinary programming languages that simplify the formulation of complex queries. The HELM project aims at integrating the modern tools for the automation of formal reasoning with the most recent electronic publishing technologies, in order create and maintain a hypertextual, distributed virtual library of formal mathematical knowledge. In the spirit of the Semantic Web, the documents of this library include RDF metadata describing their structure and content in a machine-understandable form. Using the author's query engine, HELM exploits this information to implement some functionalities allowing the interactive and automatic retrieval of documents on the basis of content-aware requests that take into account the mathematical nature of these documents.
Resumo:
Generic programming is likely to become a new challenge for a critical mass of developers. Therefore, it is crucial to refine the support for generic programming in mainstream Object-Oriented languages — both at the design and at the implementation level — as well as to suggest novel ways to exploit the additional degree of expressiveness made available by genericity. This study is meant to provide a contribution towards bringing Java genericity to a more mature stage with respect to mainstream programming practice, by increasing the effectiveness of its implementation, and by revealing its full expressive power in real world scenario. With respect to the current research setting, the main contribution of the thesis is twofold. First, we propose a revised implementation for Java generics that greatly increases the expressiveness of the Java platform by adding reification support for generic types. Secondly, we show how Java genericity can be leveraged in a real world case-study in the context of the multi-paradigm language integration. Several approaches have been proposed in order to overcome the lack of reification of generic types in the Java programming language. Existing approaches tackle the problem of reification of generic types by defining new translation techniques which would allow for a runtime representation of generics and wildcards. Unfortunately most approaches suffer from several problems: heterogeneous translations are known to be problematic when considering reification of generic methods and wildcards. On the other hand, more sophisticated techniques requiring changes in the Java runtime, supports reified generics through a true language extension (where clauses) so that backward compatibility is compromised. In this thesis we develop a sophisticated type-passing technique for addressing the problem of reification of generic types in the Java programming language; this approach — first pioneered by the so called EGO translator — is here turned into a full-blown solution which reifies generic types inside the Java Virtual Machine (JVM) itself, thus overcoming both performance penalties and compatibility issues of the original EGO translator. Java-Prolog integration Integrating Object-Oriented and declarative programming has been the subject of several researches and corresponding technologies. Such proposals come in two flavours, either attempting at joining the two paradigms, or simply providing an interface library for accessing Prolog declarative features from a mainstream Object-Oriented languages such as Java. Both solutions have however drawbacks: in the case of hybrid languages featuring both Object-Oriented and logic traits, such resulting language is typically too complex, thus making mainstream application development an harder task; in the case of library-based integration approaches there is no true language integration, and some “boilerplate code” has to be implemented to fix the paradigm mismatch. In this thesis we develop a framework called PatJ which promotes seamless exploitation of Prolog programming in Java. A sophisticated usage of generics/wildcards allows to define a precise mapping between Object-Oriented and declarative features. PatJ defines a hierarchy of classes where the bidirectional semantics of Prolog terms is modelled directly at the level of the Java generic type-system.
Resumo:
Large scale wireless adhoc networks of computers, sensors, PDAs etc. (i.e. nodes) are revolutionizing connectivity and leading to a paradigm shift from centralized systems to highly distributed and dynamic environments. An example of adhoc networks are sensor networks, which are usually composed by small units able to sense and transmit to a sink elementary data which are successively processed by an external machine. Recent improvements in the memory and computational power of sensors, together with the reduction of energy consumptions, are rapidly changing the potential of such systems, moving the attention towards datacentric sensor networks. A plethora of routing and data management algorithms have been proposed for the network path discovery ranging from broadcasting/floodingbased approaches to those using global positioning systems (GPS). We studied WGrid, a novel decentralized infrastructure that organizes wireless devices in an adhoc manner, where each node has one or more virtual coordinates through which both message routing and data management occur without reliance on either flooding/broadcasting operations or GPS. The resulting adhoc network does not suffer from the deadend problem, which happens in geographicbased routing when a node is unable to locate a neighbor closer to the destination than itself. WGrid allow multidimensional data management capability since nodes' virtual coordinates can act as a distributed database without needing neither special implementation or reorganization. Any kind of data (both single and multidimensional) can be distributed, stored and managed. We will show how a location service can be easily implemented so that any search is reduced to a simple query, like for any other data type. WGrid has then been extended by adopting a replication methodology. We called the resulting algorithm WRGrid. Just like WGrid, WRGrid acts as a distributed database without needing neither special implementation nor reorganization and any kind of data can be distributed, stored and managed. We have evaluated the benefits of replication on data management, finding out, from experimental results, that it can halve the average number of hops in the network. The direct consequence of this fact are a significant improvement on energy consumption and a workload balancing among sensors (number of messages routed by each node). Finally, thanks to the replications, whose number can be arbitrarily chosen, the resulting sensor network can face sensors disconnections/connections, due to failures of sensors, without data loss. Another extension to {WGrid} is {W*Grid} which extends it by strongly improving network recovery performance from link and/or device failures that may happen due to crashes or battery exhaustion of devices or to temporary obstacles. W*Grid guarantees, by construction, at least two disjoint paths between each couple of nodes. This implies that the recovery in W*Grid occurs without broadcasting transmissions and guaranteeing robustness while drastically reducing the energy consumption. An extensive number of simulations shows the efficiency, robustness and traffic road of resulting networks under several scenarios of device density and of number of coordinates. Performance analysis have been compared to existent algorithms in order to validate the results.
Resumo:
Beamforming entails joint processing of multiple signals received or transmitted by an array of antennas. This thesis addresses the implementation of beamforming in two distinct systems, namely a distributed network of independent sensors, and a broad-band multi-beam satellite network. With the rising popularity of wireless sensors, scientists are taking advantage of the flexibility of these devices, which come with very low implementation costs. Simplicity, however, is intertwined with scarce power resources, which must be carefully rationed to ensure successful measurement campaigns throughout the whole duration of the application. In this scenario, distributed beamforming is a cooperative communication technique, which allows nodes in the network to emulate a virtual antenna array seeking power gains in the order of the size of the network itself, when required to deliver a common message signal to the receiver. To achieve a desired beamforming configuration, however, all nodes in the network must agree upon the same phase reference, which is challenging in a distributed set-up where all devices are independent. The first part of this thesis presents new algorithms for phase alignment, which prove to be more energy efficient than existing solutions. With the ever-growing demand for broad-band connectivity, satellite systems have the great potential to guarantee service where terrestrial systems can not penetrate. In order to satisfy the constantly increasing demand for throughput, satellites are equipped with multi-fed reflector antennas to resolve spatially separated signals. However, incrementing the number of feeds on the payload corresponds to burdening the link between the satellite and the gateway with an extensive amount of signaling, and to possibly calling for much more expensive multiple-gateway infrastructures. This thesis focuses on an on-board non-adaptive signal processing scheme denoted as Coarse Beamforming, whose objective is to reduce the communication load on the link between the ground station and space segment.
Resumo:
Flow features inside centrifugal compressor stages are very complicated to simulate with numerical tools due to the highly complex geometry and varying gas conditions all across the machine. For this reason, a big effort is currently being made to increase the fidelity of the numerical models during the design and validation phases. Computational Fluid Dynamics (CFD) plays an increasing role in the assessment of the performance prediction of centrifugal compressor stages. Historically, CFD was considered reliable for performance prediction on a qualitatively level, whereas tests were necessary to predict compressors performance on a quantitatively basis. In fact "standard" CFD with only the flow-path and blades included into the computational domain is known to be weak in capturing efficiency level and operating range accurately due to the under-estimation of losses and the lack of secondary flows modeling. This research project aims to fill the gap in accuracy between "standard" CFD and tests data by including a high fidelity reproduction of the gas domain and the use of advanced numerical models and tools introduced in the author's OEM in-house CFD code. In other words, this thesis describes a methodology by which virtual tests can be conducted on single stages and multistage centrifugal compressors in a similar fashion to a typical rig test that guarantee end users to operate machines with a confidence level not achievable before. Furthermore, the new "high fidelity" approach allowed understanding flow phenomena not fully captured before, increasing aerodynamicists capability and confidence in designing high efficiency and high reliable centrifugal compressor stages.
Resumo:
The research activity focused on the study, design and evaluation of innovative human-machine interfaces based on virtual three-dimensional environments. It is based on the brain electrical activities recorded in real time through the electrical impulses emitted by the brain waves of the user. The achieved target is to identify and sort in real time the different brain states and adapt the interface and/or stimuli to the corresponding emotional state of the user. The setup of an experimental facility based on an innovative experimental methodology for “man in the loop" simulation was established. It allowed involving during pilot training in virtually simulated flights, both pilot and flight examiner, in order to compare the subjective evaluations of this latter to the objective measurements of the brain activity of the pilot. This was done recording all the relevant information versus a time-line. Different combinations of emotional intensities obtained, led to an evaluation of the current situational awareness of the user. These results have a great implication in the current training methodology of the pilots, and its use could be extended as a tool that can improve the evaluation of a pilot/crew performance in interacting with the aircraft when performing tasks and procedures, especially in critical situations. This research also resulted in the design of an interface that adapts the control of the machine to the situation awareness of the user. The new concept worked on, aimed at improving the efficiency between a user and the interface, and gaining capacity by reducing the user’s workload and hence improving the system overall safety. This innovative research combining emotions measured through electroencephalography resulted in a human-machine interface that would have three aeronautical related applications: • An evaluation tool during the pilot training; • An input for cockpit environment; • An adaptation tool of the cockpit automation.
Resumo:
Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.