8 results for Source code (Computer science)
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The availability of a huge amount of source code from code archives and open-source projects opens up the possibility of merging the research fields of machine learning, programming languages, and software engineering. This area is often referred to as Big Code: programming languages are treated in place of natural languages, and different features and patterns of code can be exploited to perform many useful tasks and build supporting tools. Among all the possible applications that can be developed within the area of Big Code, the work presented in this thesis focuses mainly on two tasks: Programming Language Identification (PLI) and Software Defect Prediction (SDP) for source code. Programming language identification is commonly needed in program comprehension, and it is usually performed directly by developers. However, at large scale, such as in widely used archives (GitHub, Software Heritage), automating this task is desirable. To this end, the problem is analyzed from different points of view (text-based and image-based learning approaches), and different models are created with particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and ensuring the reliability of software products. In the past, defects were found by manual inspection or with automatic static and dynamic analyzers. Now, this task can be automated using learning approaches that can speed up and improve the related procedures. Here, two models have been built and analyzed to detect some of the most common bugs and errors at different levels of code granularity (file and method levels). The data used and the model architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both the PLI and SDP tasks, and differences and similarities with respect to other related works are discussed.
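To make the text-based view of PLI concrete, here is a deliberately simple baseline: a multinomial naive Bayes classifier over character trigrams, with add-one smoothing and a uniform class prior. It is not one of the models built in the thesis; the helper `char_ngrams`, the class `NGramLanguageIdentifier`, and the tiny training set are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split a code snippet into overlapping character n-grams (space-padded)."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramLanguageIdentifier:
    """Toy multinomial naive Bayes over character n-grams, uniform class prior."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}  # language -> n-gram counts
        self.totals: dict[str, int] = {}      # language -> total n-grams seen
        self.vocab: set[str] = set()          # every n-gram seen in training

    def fit(self, samples: list[tuple[str, str]]) -> "NGramLanguageIdentifier":
        for code, lang in samples:
            grams = char_ngrams(code, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_likelihood(self, code: str, lang: str) -> float:
        counts, total, v = self.counts[lang], self.totals[lang], len(self.vocab)
        # Add-one (Laplace) smoothing keeps unseen n-grams from zeroing a class.
        return sum(math.log((counts[g] + 1) / (total + v))
                   for g in char_ngrams(code, self.n))

    def predict(self, code: str) -> str:
        return max(self.counts, key=lambda lang: self.log_likelihood(code, lang))


# Hypothetical, tiny training set: (snippet, language label).
TRAIN = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("import os\nprint(os.getcwd())", "Python"),
    ('#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }', "C"),
    ("int add(int a, int b) { return a + b; }", "C"),
]

clf = NGramLanguageIdentifier().fit(TRAIN)
print(clf.predict("def scale(x, k):\n    return x * k"))
```

Character n-grams pick up lexical cues such as `def ` or `#in` without a per-language tokenizer, which is one reason baselines of this kind scale easily; a real system would need far more data and a proper evaluation.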
Abstract:
Interactive theorem provers (ITPs for short) are tools whose ultimate aim is to certify proofs written by human beings. To reach that objective, they have to fill the gap between the high-level language used by humans to communicate and reason about mathematics and the lower-level language that a machine is able to “understand” and process. The user perceives this gap in terms of missing features or inefficiencies. The developer tries to accommodate the user's requests without increasing the already high complexity of these applications. We believe that satisfactory solutions can only come from a strong synergy between users and developers. We devoted most of our PhD to designing and developing the Matita interactive theorem prover. The software was born in the Computer Science Department of the University of Bologna as the result of combining all the technologies developed by the HELM team (to which we belong) for the MoWGLI project. The MoWGLI project aimed at making the libraries of formalised mathematics of various interactive theorem provers accessible through the web, taking Coq as the main test case. The motivations for giving life to a new ITP are:
• to study the architecture of these tools, with the aim of understanding the source of their complexity;
• to exploit such knowledge to experiment with new solutions that, for backward-compatibility reasons, would be hard (if not impossible) to test on a widely used system like Coq.
Matita is based on the Curry-Howard isomorphism, adopting the Calculus of Inductive Constructions (CIC) as its logical foundation. Proof objects are thus, to some extent, compatible with those produced by the Coq ITP, which is itself able to import and process those generated using Matita. Although the two systems have a lot in common, they share no code at all, and even most of the algorithmic solutions are different.
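The Curry-Howard isomorphism mentioned above identifies propositions with types and proofs with programs. As a small illustration, the Lean 4 snippet below proves a proposition by writing a function, and shows the same term shape acting on ordinary data. Lean is used here only because its type theory is closely related to CIC; this is neither Matita nor Coq syntax, and the names `swapAnd`, `pairSwap`, and `modusPonens` are hypothetical.

```lean
-- Proposition as type, proof as program: a proof of `p ∧ q → q ∧ p`
-- is a function mapping a pair of proofs to the swapped pair.
theorem swapAnd (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- The same term shape, acting on data instead of proofs.
def pairSwap {α β : Type} : α × β → β × α :=
  fun ⟨a, b⟩ => (b, a)

-- Modus ponens (implication elimination) is just function application.
theorem modusPonens (p q : Prop) (hpq : p → q) (hp : p) : q :=
  hpq hp
```

A proof object in this setting is therefore a term that the kernel type-checks, which is what makes exchanging proof objects between systems based on similar calculi conceivable at all.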
The thesis is composed of two parts, in which we describe our experience as a user and as a developer of interactive provers, respectively. In particular, the first part is based on two different formalisation experiences:
• our internship in the Mathematical Components team (INRIA), which is formalising the finite group theory required to attack the Feit-Thompson Theorem. To tackle this result, which gives an effective classification of finite groups of odd order, the team adopts the SSReflect extension of Coq, developed by Georges Gonthier for the proof of the Four Colour Theorem;
• our collaboration in the D.A.M.A. Project, whose goal is the formalisation of abstract measure theory in Matita, leading to a constructive proof of Lebesgue's Dominated Convergence Theorem.
The most notable issues we faced, analysed in this part of the thesis, are the following: the difficulties arising when using “black box” automation in large formalisations; the impossibility for a user (especially a newcomer) to master the context of a library of already formalised results; the uncomfortable big-step execution of proof commands historically adopted in ITPs; and the difficulty of encoding mathematical structures with a notion of inheritance in a type theory without subtyping, like CIC. In the second part of the manuscript, many of these issues are analysed through the lens of an ITP developer, describing the solutions we adopted in the implementation of Matita to solve these problems: integrated searching facilities to assist the user in handling large libraries of formalised results; a small-step execution semantics for proof commands; a flexible implementation of coercive subtyping allowing multiple inheritance with shared substructures; and automatic tactics, integrated with the searching facilities, that generate proof commands (and not only proof objects, which are usually kept hidden from the user), one of which is specifically designed to be user-driven.
Abstract:
Interactive theorem provers are tools designed for the certification of formal proofs developed by means of human-machine collaboration. Formal proofs obtained in this way cover a large variety of logical theories, ranging from the branches of mainstream mathematics to the field of software verification. The border between these two worlds is marked by results in theoretical computer science and proofs related to the metatheory of programming languages. This last field, which is an obvious application of interactive theorem proving, nonetheless poses a serious challenge to the users of such tools, due both to the particularly structured way in which these proofs are constructed and to difficulties related to the management of notions typical of programming languages, like variable binding. This thesis is composed of two parts, discussing our experience in the development of the Matita interactive theorem prover and its use in the mechanization of the metatheory of programming languages. More specifically, part I covers:
- the results of our effort to provide a better framework for the development of tactics for Matita, in order to make their implementation and debugging easier, which also results in much clearer code;
- a discussion of the implementation of two tactics, providing infrastructure for the unification of constructor forms and the inversion of inductive predicates; we point out interactions between induction and inversion and provide an advancement over the state of the art.
In the second part of the thesis, we focus on aspects related to the formalization of programming languages.
We describe two works of ours:
- a discussion of basic issues we encountered in our formalizations of part 1A of the POPLmark challenge, where we apply the extended inversion principles we implemented for Matita;
- a formalization of an algebraic logical framework, posing more complex challenges, including multiple binding and a form of hereditary substitution; for the encoding of binding, this work adopts an extension of Masahiko Sato's canonical locally named representation, which we designed during our visit to the Laboratory for Foundations of Computer Science at the University of Edinburgh, under the supervision of Randy Pollack.
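Inversion of an inductive predicate, listed among the part I topics above, can be illustrated with the classic evenness predicate. In the Lean 4 sketch below (again not Matita syntax; `Ev` and `ev_inversion` are hypothetical names), the `cases` tactic inverts a hypothesis `Ev (n + 2)`. To do so it unifies the index `n + 2` with the conclusion of each constructor: the `zero` case is discarded because `0` and `n + 2` have different head constructors (`Nat.zero` versus `Nat.succ`), and in the `step` case injectivity of `Nat.succ` identifies the two indices. This is the standard behaviour of an inversion tactic, not the extended principles developed in the thesis.

```lean
-- A hypothetical inductive predicate: evenness of natural numbers.
inductive Ev : Nat → Prop
  | zero : Ev 0
  | step (n : Nat) : Ev n → Ev (n + 2)

-- Inversion: from `Ev (n + 2)` recover `Ev n`.
-- Only the `step` alternative remains after unifying the indices.
theorem ev_inversion (n : Nat) (h : Ev (n + 2)) : Ev n := by
  cases h with
  | step _ h' => exact h'
```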
Abstract:
Internet of Things systems are pervasive systems that have evolved from cyber-physical systems into large-scale systems. Due to the number of technologies involved, software development faces several integration challenges. Among them, those preventing proper integration stem from system heterogeneity and thus concern interoperability. From a software engineering perspective, developers mostly experience the lack of interoperability in two phases of software development: programming and deployment. On the one hand, modern software tends to be distributed across several components, each adopting its most appropriate technology stack, which pushes programmers to write code in a protocol- and data-agnostic way. On the other hand, each software component should run in its most appropriate execution environment; as a result, system architects strive to automate deployment on distributed infrastructures. This dissertation aims to improve the development process by introducing proper tools to handle certain aspects of system heterogeneity. Our effort focuses on three of these aspects and, for each of them, we propose a tool addressing the underlying challenge. The first tool aims to handle heterogeneity at the transport- and application-protocol level, the second to manage different data formats, and the third to obtain optimal deployment. To realize the tools, we adopted a linguistic approach, i.e. we provided specific linguistic abstractions that increase the expressive power of the programming language developers use, helping them write better solutions in more straightforward ways. To validate the approach, we implemented use cases showing that the tools can be used in practice and that they help achieve the expected level of interoperability.
In conclusion, to move a step towards the realization of an integrated Internet of Things ecosystem, we target programmers and architects and propose that they use the presented tools to ease the software development process.
Abstract:
One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems, including, ambitiously, that observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing on their experiences. This ability, which is found to various degrees in all intelligent biological beings, allows them to adapt and react properly to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone of the creation of intelligent artificial agents. Modern Deep Learning approaches have allowed researchers and industry to make great advances towards solving many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this age of renewed interest in AI has enabled the creation of extremely useful applications, concerningly little effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is catastrophic forgetting. This phenomenon, which was discovered in the 1990s, naturally occurs in Deep Learning architectures when classic learning paradigms are applied to learn incrementally from a stream of experiences. This dissertation revolves around Continual Learning, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work takes a comprehensive view of continual learning by considering the algorithmic, benchmarking, and application aspects of this field.
This dissertation will also touch on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, and the theoretical and practical aspects concerning public competitions in this field.
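Among the many strategies proposed to mitigate catastrophic forgetting, rehearsal (replaying a small memory of past examples while training on new ones) is one of the simplest. The sketch below shows a fixed-size replay buffer filled by reservoir sampling, so that the memory stays a uniform sample of everything seen so far in the stream. It is a generic illustration, not a method attributed to this dissertation; the class name and the experience labels are hypothetical, and the model-training loop is omitted.

```python
import random
from typing import Any, Optional


class ReservoirReplayBuffer:
    """Fixed-size rehearsal memory kept as a uniform sample of the whole stream."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.memory: list[Any] = []
        self.seen = 0  # examples observed so far, across all experiences
        self.rng = random.Random(seed)

    def add(self, example: Any) -> None:
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            # Reservoir sampling (Algorithm R): keep the new example with
            # probability capacity / seen, overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = example

    def sample(self, k: int) -> list[Any]:
        """Draw up to k stored examples, without replacement, to replay."""
        return self.rng.sample(self.memory, min(k, len(self.memory)))


# Two consecutive experiences in a stream (hypothetical labels).
buffer = ReservoirReplayBuffer(capacity=100, seed=0)
for x in range(5_000):
    buffer.add(("experience_1", x))
for x in range(5_000):
    buffer.add(("experience_2", x))
replay_batch = buffer.sample(32)
```

In a training loop, each minibatch from the current experience would be combined with a batch returned by `buffer.sample(k)` before the gradient step, so that old data keeps contributing to the loss.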
Abstract:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics, and electronics are all key areas that depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists have developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. It becomes even more complex when dealing with advanced functional materials, whose properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment; subtle alterations lead to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, in both the experimental and computational realms. Scientists strive to adopt cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management and by the proliferation of custom data formats and storage procedures, in both experimental and computational research. Results are difficult to find, interpret, and reuse, and a huge amount of time is spent interpreting and reorganizing data. This also strongly limits the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above.
Specifically, it covers: developing features for specific classes of advanced materials and using them to train machine learning models and accelerate computational predictions for molecular compounds; developing a method for organizing non-homogeneous materials data; automating the process of using device simulations to train machine learning models; and handling scattered experimental data and using it to discover new patterns.
Abstract:
Knowledge graphs and ontologies are closely related concepts in the field of knowledge representation. In recent years, knowledge graphs have gained increasing popularity and serve as essential components in many knowledge engineering projects, which view them as crucial to their success. Ontologies provide the conceptual foundation of a knowledge graph. Ontology modeling is an iterative engineering process that consists of steps such as the elicitation and formalization of requirements, and the development, testing, refactoring, and release of the ontology. Testing is a crucial yet occasionally overlooked step of the process, owing to the lack of integrated tools to support it. As a result of this gap in the state of the art, ontology testing is performed manually, which requires considerable time and effort from ontology engineers. The lack of tool support is noticeable in the requirement elicitation process as well. In this respect, the growing adoption and accessibility of knowledge graphs allow for the development and use of automated tools that assist in eliciting requirements from such a complementary source of data. Therefore, this doctoral research focuses on developing methods and tools that support the requirement elicitation and testing steps of an ontology engineering process. To support ontology testing, we have developed XDTesting, a web application integrated with the GitHub platform that serves as an ontology testing manager. Concurrently, to support the elicitation and documentation of competency questions, we have defined and implemented RevOnt, a method to extract competency questions from knowledge graphs. Both methods have been evaluated through their implementation, and the results are promising.
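In ontology engineering, competency questions are commonly turned into queries whose expected answers act as unit tests. The sketch below illustrates that idea on a toy knowledge graph stored as a set of triples, using a single triple pattern in place of a real SPARQL query. It is not how XDTesting or RevOnt are implemented; `match`, `run_cq_test`, and the `ex:` identifiers are hypothetical.

```python
# Hypothetical toy knowledge graph: (subject, predicate, object) triples.
Triple = tuple[str, str, str]

KG: set[Triple] = {
    ("ex:s1", "ex:observes", "ex:Temperature"),
    ("ex:s2", "ex:observes", "ex:Humidity"),
    ("ex:s3", "ex:observes", "ex:Temperature"),
}


def match(triples: set[Triple], pattern: Triple) -> list[dict[str, str]]:
    """Return variable bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position of the pattern matched
            results.append(binding)
    return results


def run_cq_test(triples: set[Triple], pattern: Triple, var: str, expected: set[str]) -> bool:
    """A competency-question test passes when the query's answers equal the expected set."""
    answers = {b[var] for b in match(triples, pattern)}
    return answers == expected


# CQ: "Which sensors observe temperature?"
CQ_PATTERN: Triple = ("?sensor", "ex:observes", "ex:Temperature")
print(run_cq_test(KG, CQ_PATTERN, "?sensor", {"ex:s1", "ex:s3"}))
```

A full setup would typically run SPARQL queries against an ontology populated with test data and report each competency question as passing or failing.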
Abstract:
The pervasive availability of connected devices in every industrial and societal sector is pushing for an evolution of the well-established cloud computing model. The emerging paradigm of the cloud continuum embraces this decentralization trend and envisions virtualized computing resources physically located between traditional datacenters and data sources. By executing totally or partially closer to the network edge, applications can react more quickly to events, thus enabling advanced forms of automation and intelligence. However, these applications also induce new data-intensive workloads with low-latency constraints that require the adoption of specialized resources, such as high-performance communication options (e.g., RDMA, DPDK, XDP). Unfortunately, cloud providers still struggle to integrate these options into their infrastructures. This risks undermining the principle of generality that underlies the economies of scale of cloud computing, by forcing developers to tailor their code to low-level APIs, non-standard programming models, and static execution environments. This thesis proposes a novel system architecture that empowers cloud platforms across the whole cloud continuum with Network Acceleration as a Service (NAaaS). To provide commodity yet efficient access to acceleration, this architecture defines a layer of agnostic high-performance I/O APIs, exposed to applications and clearly separated from the heterogeneous protocols, interfaces, and hardware devices that implement it. A novel system component embodies this decoupling by offering a set of agnostic OS features to applications: memory management for zero-copy transfers, asynchronous I/O processing, and efficient packet scheduling. This thesis also explores the design space of possible implementations of this architecture by proposing two reference middleware systems and adopting them to support interactive use cases in the cloud continuum: a serverless platform and an Industry 4.0 scenario.
A detailed discussion and a thorough performance evaluation demonstrate that the proposed architecture is suitable for enabling easy-to-use, flexible integration of modern network acceleration into next-generation cloud platforms.
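The decoupling between an agnostic I/O API and the mechanisms that implement it can be sketched in a few lines. The Python sketch below assumes that applications obtain buffers from the I/O layer instead of allocating their own: an abstract `IOBackend` exposes allocation, send, and receive, and a toy in-process `LoopbackBackend` hands the very same buffer object from sender to receiver, so no payload bytes are copied. All names are hypothetical; real backends such as RDMA, DPDK, or XDP cannot be reproduced in Python, and asynchronous processing and packet scheduling are omitted.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class IOBackend(ABC):
    """Hypothetical agnostic I/O API: applications code against this interface only."""

    @abstractmethod
    def alloc(self, size: int) -> memoryview:
        """Lend the application a buffer owned by the I/O layer."""

    @abstractmethod
    def send(self, buf: memoryview) -> None:
        """Hand a filled buffer to the backend for transmission."""

    @abstractmethod
    def recv(self) -> Optional[memoryview]:
        """Return the next received buffer, or None if nothing is pending."""


class LoopbackBackend(IOBackend):
    """Toy backend: 'transmits' by queueing buffer references in-process."""

    def __init__(self) -> None:
        self._queue: deque[memoryview] = deque()

    def alloc(self, size: int) -> memoryview:
        return memoryview(bytearray(size))

    def send(self, buf: memoryview) -> None:
        self._queue.append(buf)  # enqueue the view itself: no payload copy

    def recv(self) -> Optional[memoryview]:
        return self._queue.popleft() if self._queue else None


def send_message(backend: IOBackend, payload: bytes) -> None:
    """Application code: unaware of which backend implements the API."""
    buf = backend.alloc(len(payload))
    buf[:] = payload  # write directly into the backend-owned buffer
    backend.send(buf)


backend = LoopbackBackend()
send_message(backend, b"hello")
received = backend.recv()
```

Swapping `LoopbackBackend` for another implementation of `IOBackend` would leave `send_message` untouched, which is the property the architecture's API layer is meant to provide.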