950 results for: Software repository mining. Process mining. Software developer contribution
Abstract:
Mining Software Repositories (MSR) is a research area that analyzes software repositories in order to derive relevant information for the research and practice of software engineering. The main goal of repository mining is to transform the static information stored in repositories (e.g., a code repository or change request system) into valuable information that supports decision making in software projects. Another research area, Process Mining (PM), aims to discover the characteristics of the underlying processes of business organizations, supporting process improvement and documentation. Recent works have performed several analyses combining MSR and PM techniques: (i) to investigate the evolution of software projects; (ii) to understand the real underlying process of a project; and (iii) to create defect prediction models. However, few research works have focused on analyzing the contributions of software developers by means of MSR and PM techniques. In this context, this dissertation presents two empirical studies that assess the contribution of software developers to an open-source project and a commercial project using those techniques. The contributions of developers are assessed from three different perspectives: (i) buggy commits; (ii) the size of commits; and (iii) the most important bugs. For the open-source project, 12,827 commits and 8,410 bugs were analyzed, while 4,663 commits and 1,898 bugs were analyzed for the commercial project. Our results indicate that, for the open-source project, the developers classified as core developers contributed more buggy commits (although they also contributed the majority of commits), more code to the project (commit size) and more important bug fixes, while for the commercial project the results could not indicate statistically significant differences between developer groups.
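The first perspective above, buggy commits, is typically recovered from version control metadata. As a minimal illustration (not the dissertation's exact method), the sketch below classifies commits as bug-related using an SZZ-style keyword heuristic over `git log` output; the repository path, keyword pattern, and per-author grouping are assumptions made for the example.

```python
import re
import subprocess
from collections import Counter

# Heuristic (SZZ-style keyword matching): a commit whose message mentions a
# fix is treated as bug-related. The pattern below is an illustrative
# assumption, not the dissertation's exact classification criteria.
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|patch)\b", re.IGNORECASE)

def commits_per_author(repo_path: str) -> tuple[Counter, Counter]:
    """Count total and bug-related commits per author in a git repository."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%an%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    total, buggy = Counter(), Counter()
    for line in log.splitlines():
        author, _, subject = line.partition("\t")
        total[author] += 1
        if FIX_PATTERN.search(subject):
            buggy[author] += 1
    return total, buggy

if __name__ == "__main__":
    total, buggy = commits_per_author(".")
    for author, n in total.most_common(10):
        print(f"{author}: {buggy[author]}/{n} bug-related commits")
```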
Abstract:
Initially this thesis examines the various mechanisms by which technology is acquired within anodizing plants. In so doing, the history of the evolution of anodizing technology is recorded, with particular reference to the growth of major markets and to the contribution of the marketing efforts of the aluminium industry. The business economics of various types of anodizing plants are analyzed. Consideration is also given to the impact of developments in anodizing technology on production economics and market growth. The economic costs associated with work rejected for process defects are considered. Recent changes in the industry have created conditions whereby information technology has a potentially important role to play in retaining existing knowledge. One such contribution is exemplified by the expert system which has been developed for the identification of anodizing process defects. Instead of using a "rule-based" expert system, a commercial neural network program has been adapted for the task. The advantage of neural networks over "rule-based" systems is that they are better suited to production problems, since the actual conditions prevailing when the defect was produced are often not known with certainty. In using the expert system, the user first identifies the process stage at which the defect probably occurred and is then directed to a file enabling the actual defect to be identified. After making this identification, the user can consult a database which gives a more detailed description of the defect, advises on remedial action and provides a bibliography of papers relating to the defect. The database uses a proprietary hypertext program, which also provides rapid cross-referencing to similar types of defect. Additionally, a graphics file can be accessed which (where appropriate) will display a graphic of the defect on screen. A total of 117 defects are included, together with 221 literature references, supplemented by 48 cross-reference hyperlinks. The main text of the thesis contains 179 literature references. (DX186565)
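The abstract motivates neural networks by their tolerance of uncertain process conditions. The sketch below is a hypothetical illustration of that idea using a small multilayer perceptron; the features, defect classes, and training data are invented for the example and do not reproduce the commercial program the thesis adapted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical sketch: a neural network maps uncertain process observations
# to likely defect classes, tolerating inputs a strict rule base would
# reject. Features and labels below are invented for illustration.
# Features: [bath temperature (C), current density (A/dm^2), anodizing time (min)]
X = np.array([
    [18.0, 1.5, 30], [19.0, 1.4, 35],   # conditions labeled "burning"
    [25.0, 1.0, 20], [26.0, 1.1, 25],   # conditions labeled "soft coating"
    [20.0, 1.3, 40], [21.0, 1.2, 45],   # conditions labeled "no defect"
])
y = ["burning", "burning", "soft coating", "soft coating",
     "no defect", "no defect"]

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
print(clf.predict([[18.5, 1.45, 32]]))  # -> likely "burning"
```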
Abstract:
Software repositories have been getting a lot of attention from researchers in recent years. In order to analyze software repositories, it is necessary to first extract raw data from the version control and problem tracking systems. This poses two challenges: (1) extraction requires a non-trivial effort, and (2) the results depend on the heuristics used during extraction. These challenges burden researchers who are new to the community and make it difficult to benchmark software repository mining, since it is almost impossible to reproduce experiments done by another team. In this paper we present the TA-RE corpus. TA-RE collects extracted data from software repositories in order to build a collection of projects that will simplify the extraction process. Additionally, the collection can be used for benchmarking. As a first step, we propose an exchange language capable of making sharing and reusing data as simple as possible.
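The abstract proposes an exchange language without specifying it. Purely as an illustration of what a shared, tool-independent record of extracted repository data could look like, the sketch below serializes commit records, with their bug links, as JSON lines; the schema and field names are assumptions, not the TA-RE format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record schema for exchanging extracted repository data;
# the actual TA-RE exchange language is not given in the abstract.
@dataclass
class CommitRecord:
    project: str
    commit_id: str
    author: str
    timestamp: str             # ISO 8601
    message: str
    linked_bug_ids: list[str]  # bugs referenced by this commit, if any

def dump_records(records, path):
    """Write one JSON object per line so corpora can be streamed and diffed."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(asdict(rec)) + "\n")

dump_records(
    [CommitRecord("example", "a1b2c3", "alice", "2007-01-01T12:00:00Z",
                  "Fix NPE in parser", ["BUG-42"])],
    "tare_corpus.jsonl",
)
```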
Abstract:
The maintenance and evolution of software systems has become a highly critical task over recent years due to the diversity and high demand of functionality, devices and users. Understanding and analyzing how new changes impact the quality attributes of the architecture of such systems is an essential prerequisite for avoiding the deterioration of their quality during their evolution. This thesis proposes an automated approach for analyzing the variation of the performance quality attribute in terms of execution time (response time). It is implemented by a framework that adopts dynamic analysis and software repository mining techniques to provide an automated way of revealing potential sources – commits and issues – of performance variation in scenarios during the evolution of software systems. The approach defines four phases: (i) preparation – choosing the scenarios and preparing the target releases; (ii) dynamic analysis – determining the performance of scenarios and methods by computing their execution times; (iii) variation analysis – processing and comparing the dynamic analysis results for different releases; and (iv) repository mining – identifying issues and commits associated with the detected performance variation. Empirical studies were carried out to evaluate the approach from different perspectives. An exploratory study analyzed the feasibility of applying the approach to systems from different domains to automatically identify source code elements with performance variation and the changes that affected those elements during an evolution. This study analyzed three systems: (i) SIGAA – a web system for academic management; (ii) ArgoUML – a UML modeling tool; and (iii) Netty – a framework for network applications. Another study performed an evolutionary analysis by applying the approach to multiple releases of Netty and of the web frameworks Wicket and Jetty. In this study, 21 releases (seven of each system) were analyzed, totaling 57 scenarios. In summary, 14 scenarios with significant performance variation were found for Netty, 13 for Wicket and 9 for Jetty. Additionally, feedback was obtained from eight developers of these systems through an online questionnaire. Finally, in the last study, a performance regression model was developed to indicate the properties of commits that are most likely to cause performance degradation. In total, 997 commits were mined, of which 103 were retrieved from degraded source code elements and 19 from optimized ones, while 875 had no impact on execution time. The number of days before the release and the day of the week proved to be the most relevant variables of performance-degrading commits in our model. The receiver operating characteristic (ROC) area of the regression model is 60%, which means that using the model to decide whether a commit will cause degradation or not is 10% better than a random decision.
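As a minimal illustration of phase (iii), variation analysis, the sketch below compares execution-time samples of one scenario across two releases and flags a statistically significant shift. The use of a Mann-Whitney U test and the significance threshold are illustrative assumptions, not necessarily the thesis's exact procedure.

```python
from statistics import mean
from scipy.stats import mannwhitneyu  # non-parametric test for timing samples

# Sketch of variation analysis: compare execution-time samples of the same
# scenario in two releases. Threshold and test choice are assumptions.
def has_significant_variation(times_old, times_new, alpha=0.05):
    """Flag a scenario whose timing distribution shifted between releases."""
    _, p_value = mannwhitneyu(times_old, times_new, alternative="two-sided")
    return p_value < alpha

scenario_v1 = [102.3, 99.8, 101.5, 100.9, 103.0]   # ms per run, release N
scenario_v2 = [131.9, 128.4, 130.2, 129.7, 133.1]  # ms per run, release N+1
if has_significant_variation(scenario_v1, scenario_v2):
    delta = mean(scenario_v2) - mean(scenario_v1)
    print(f"performance variation detected: {delta:+.1f} ms on average")
```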
Abstract:
The configuration of a 3G radio network is managed by adjusting parameters stored in the radio network database. In the management software, thousands of radio network parameters appear as user interface components, which are continuously added, changed and removed over the software's life cycle according to customer needs. The implementation process for adding parameters is laborious and mechanical for the software developer. The goal of this master's thesis was to develop a code generator that automatically produces all the code of that implementation process from the specifications that are already available today. The generator developed in this work speeds up the programmer's work by eliminating one time-consuming and mechanical work phase. As a result, the software code becomes more uniform and the company's software production costs are reduced, as the programmer's skill can be focused on more demanding tasks.
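As a minimal illustration of such a generator (the thesis's actual specification format and target code are not given in the abstract), the sketch below emits UI binding code from a declarative parameter specification; the spec fields and the emitted template are hypothetical.

```python
# Hypothetical parameter specification and code template; the real
# generator's inputs and outputs are company-specific.
PARAM_SPEC = [
    {"name": "txPower", "type": "int", "min": 0, "max": 50,
     "label": "TX power (dBm)"},
    {"name": "cellRadius", "type": "float", "min": 0.1, "max": 35.0,
     "label": "Cell radius (km)"},
]

TEMPLATE = '''\
def build_{name}_field(form):
    """Auto-generated UI binding for radio network parameter '{name}'."""
    field = form.add_numeric_field(label="{label}",
                                   minimum={min}, maximum={max})
    field.bind_to_database_column("{name}")
    return field
'''

def generate(spec):
    """Emit one UI-binding function per parameter in the specification."""
    return "\n".join(TEMPLATE.format(**param) for param in spec)

print(generate(PARAM_SPEC))
```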
Abstract:
The present thesis is focused on the minimization of experimental efforts for the prediction of pollutant propagation in rivers by mathematical modelling and knowledge re-use. Mathematical modelling is based on the well-known advection-dispersion equation, while the knowledge re-use approach employs the methods of case-based reasoning, graphical analysis and text mining. The thesis contribution to the pollutant transport research field consists of: (1) analytical and numerical models for pollutant transport prediction; (2) two novel techniques which enable the use of variable parameters along rivers in analytical models; (3) models for the estimation of pollutant transport characteristic parameters (velocity, dispersion coefficient and nutrient transformation rates) as functions of water flow, channel characteristics and/or seasonality; (4) a graphical analysis method for the identification of pollution sources along rivers; (5) a case-based reasoning tool for the identification of crucial information related to pollutant transport modelling; and (6) the application of a software tool for the reuse of information during pollutant transport modelling research. These support tools are applicable in the water quality research field as well as in practice, as they can be involved in multiple activities. The models are capable of predicting pollutant propagation along rivers in cases of both ordinary pollution and accidents. They can also be applied to other, similar rivers when modelling pollutant transport with little available experimental concentration data, because the parameter estimation models developed in the present thesis enable the calculation of transport characteristic parameters as functions of river hydraulic parameters and/or seasonality. The similarity between rivers is assessed using case-based reasoning tools, and additional necessary information can be identified by using the software for information reuse. Such systems support users and open up possibilities for new modelling methods, monitoring facilities and better river water quality management tools. They are also useful for the estimation of the environmental impact of possible technological changes and can be applied in the pre-design stage and/or in the practical use of processes.
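For reference, the advection-dispersion equation mentioned above, in its common one-dimensional form with a first-order transformation term (a standard formulation, not necessarily the exact variant used in the thesis), reads:

```latex
\frac{\partial C}{\partial t}
  = D\,\frac{\partial^2 C}{\partial x^2}
  - u\,\frac{\partial C}{\partial x}
  - kC
```

where C is the pollutant concentration, u the flow velocity, D the dispersion coefficient, and k a first-order transformation rate, matching the characteristic parameters (velocity, dispersion coefficient, transformation rates) that the thesis estimates.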
Abstract:
The purpose of this thesis is to examine software licensing, how a software developer can benefit from it, and to define specifications for the licensing system of a software-based medical technology product. The thesis is divided into theoretical and empirical parts. In the theoretical part, the concept of software licensing and the different aspects connected to it are examined with the help of research material. Based on this research, in the empirical part, a licensing system for a medical software product called iCentral is designed. The empirical part is based on interviews, a questionnaire, and the author's own experience gained while working for the case company. The thesis has great practical importance for the case company, proposing both an ideal and, more importantly, a practical implementation for a product's licensing system. The thesis shows that electronic licensing is a viable option for selling medical technology products without a need to revise existing procedures or the enterprise resource system in the case company.
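The abstract does not describe the licensing mechanism itself; purely as a hypothetical sketch of one common electronic-licensing approach, the code below issues and verifies a keyed-hash license code bound to a customer and a feature set. All names and the format are invented for the example.

```python
import hashlib
import hmac

# Hypothetical licensing scheme: a license code is a payload plus a
# truncated HMAC tag that only the vendor's secret can produce.
SECRET = b"vendor-private-key"  # held only by the vendor (assumption)

def issue_license(customer: str, features: str) -> str:
    payload = f"{customer}|{features}"
    tag = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{payload}|{tag}"

def verify_license(license_code: str) -> bool:
    payload, _, tag = license_code.rpartition("|")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(tag, expected)

code = issue_license("Hospital X", "iCentral-monitoring")
print(code, verify_license(code))
```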
Abstract:
Process algebraic architectural description languages provide a formal means for modeling software systems and assessing their properties. In order to bridge the gap between system modeling and system implementation, in this thesis an approach is proposed for automatically generating multithreaded object-oriented code from process algebraic architectural descriptions, in a way that preserves – under certain assumptions – the properties proved at the architectural level. The approach is divided into three phases, which are illustrated by means of a running example based on an audio processing system. First, we develop an architecture-driven technique for thread coordination management, which is completely automated through a suitable package. Second, we address the translation of the algebraically-specified behavior of the individual software units into thread templates, which will have to be filled in by the software developer according to certain guidelines. Third, we discuss performance issues related to the suitability of synthesizing monitors rather than threads from software unit descriptions that satisfy specific constraints. In addition to the running example, we present two case studies about a video animation repainting system and the implementation of a leader election algorithm, in order to summarize the whole approach. The outcome of this thesis is the implementation of the proposed approach in a translator called PADL2Java and its integration in the architecture-centric verification tool TwoTowers.
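The third phase above weighs synthesizing monitors rather than threads. As a language-neutral illustration (the actual PADL2Java output is Java code and is not reproduced here), the sketch below shows a monitor-style passive object whose blocking methods encode a coordination protocol; the bounded-buffer example is an assumption loosely inspired by the audio processing running example.

```python
import threading

# Sketch of the monitor alternative: instead of a dedicated thread, a
# passive object whose methods run under one lock, so the coordination
# protocol holds by construction. Illustrative, not PADL2Java output.
class AudioBufferMonitor:
    def __init__(self, capacity: int):
        self._items = []
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, sample):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()       # block producer when full
            self._items.append(sample)
            self._not_empty.notify()        # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()      # block consumer when empty
            sample = self._items.pop(0)
            self._not_full.notify()         # wake one waiting producer
            return sample
```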
Abstract:
Software systems need to continuously change to remain useful. Change appears in several forms and needs to be accommodated at different levels. We propose ChangeBoxes as a mechanism to encapsulate, manage, analyze and exploit changes to software systems. Our thesis is that only by making change explicit and manipulable can we enable the software developer to manage software change more effectively than is currently possible. Furthermore, we argue that we need new insights into assessing the impact of changes and that we need to provide new tools and techniques to manage them. We report on the results of some initial prototyping efforts, and we outline a series of research activities that we have started in order to explore the potential of ChangeBoxes.
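The abstract does not spell out the ChangeBox data structure; as a hypothetical sketch only, the code below shows what a change made explicit and manipulable could look like as a first-class object with apply and revert operations. It is not the actual ChangeBoxes design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical first-class change record, invented to illustrate
# "explicit and manipulable" change; not the ChangeBoxes implementation.
@dataclass
class ChangeBox:
    author: str
    description: str
    # entity name -> (old definition, new definition)
    deltas: dict = field(default_factory=dict)
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def record(self, entity: str, old_src: str, new_src: str):
        self.deltas[entity] = (old_src, new_src)

    def apply_to(self, system: dict):
        """Install the new definitions into a system model."""
        for entity, (_, new_src) in self.deltas.items():
            system[entity] = new_src

    def revert_from(self, system: dict):
        """Roll the encapsulated change back."""
        for entity, (old_src, _) in self.deltas.items():
            system[entity] = old_src
```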
Abstract:
Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have proven useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large number of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples, may become an obstacle for a scientist aiming to reuse the work of other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse, and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have potential to become useful for users designing new workflows.
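As a minimal illustration of the graph-based view described above (the thesis's own mining techniques are not detailed in the abstract), the sketch below represents workflows as labeled directed graphs and tests whether a candidate abstraction occurs as a fragment of another workflow via subgraph isomorphism; the step labels are invented for the example.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Workflows as directed graphs with step-type labels; a candidate
# abstraction is detected as a fragment of another workflow. Labels and
# the matcher choice are illustrative assumptions.
def make_workflow(steps, edges):
    g = nx.DiGraph()
    for name, kind in steps:
        g.add_node(name, kind=kind)
    g.add_edges_from(edges)
    return g

wf_a = make_workflow(
    [("fetch", "download"), ("clean", "filter"), ("plot", "visualize")],
    [("fetch", "clean"), ("clean", "plot")],
)
fragment = make_workflow(
    [("f", "download"), ("c", "filter")],
    [("f", "c")],
)

matcher = isomorphism.DiGraphMatcher(
    wf_a, fragment, node_match=lambda a, b: a["kind"] == b["kind"]
)
print("fragment reused in wf_a:", matcher.subgraph_is_isomorphic())
```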
Abstract:
In an industrial environment, knowing the process one is working with is crucial to ensure its good functioning. In the present work, developed at the Prio Biocombustíveis S.A. facilities, the methanol recovery process was characterized using process data collected during this work together with historical process data, starting with the characterization of key process streams. Based on the information retrieved from the stream characterization, the Aspen Plus® process simulation software was used to replicate the process and perform a sensitivity analysis with the objective of assessing the relative importance of certain key process variables (reflux/feed ratio, reflux temperature, reboiler outlet temperature, and methanol, glycerol and water feed compositions). The work proceeded with the application of a set of statistical tools, starting with Principal Components Analysis (PCA), from which the interactions between process variables and their contribution to the process variability were studied. Next, Design of Experiments (DoE) was to be used to acquire experimental data and, with it, create a model for the water content in the distillate; however, the necessary conditions for this method were not met, so it was abandoned. The Multiple Linear Regression (MLR) method was then used with the available data, producing several empirical models for the water content in the distillate, the best fit having an R² of 92.93% and an AARD of 19.44%. Although the AARD is still relatively high, the model is adequate for fast estimates of the distillate's quality. As for fouling, its presence was noticed many times during this work. Since the fouling could not be measured directly, the reboiler inlet steam pressure was used as an indicator of fouling growth and of its variation with the amount of Used Cooking Oil incorporated in the whole process. Comparing the steam cost associated with the reboiler's operation when fouling is low (steam pressure of 1.5 bar) with the cost when fouling is high (steam pressure of 3 bar), the cost increases by about 58% as the fouling increases.
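As a minimal illustration of the MLR step described above, the sketch below fits a linear model to synthetic data and reports R² and AARD (average absolute relative deviation); the predictors and data are invented stand-ins for the plant variables, not the actual model with R² = 92.93%.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Sketch: regress the distillate's water content on a few process variables
# and report R² and AARD. Synthetic data; predictor choice is an assumption.
rng = np.random.default_rng(0)
X = rng.uniform([0.5, 60.0, 95.0], [2.0, 80.0, 110.0], size=(40, 3))
#   columns: reflux/feed ratio, reflux temperature (C), reboiler outlet T (C)
true_coef = np.array([-1.2, 0.03, 0.05])
y = X @ true_coef + 2.0 + rng.normal(scale=0.1, size=40)  # % water in distillate

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)
r2 = r2_score(y, y_hat)
aard = np.mean(np.abs((y - y_hat) / y)) * 100  # average abs. relative deviation
print(f"R² = {r2:.2%}, AARD = {aard:.2f}%")
```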
Abstract:
The use of mechanical shear connectors, mainly headed stud bolts, is the most common way to achieve steel-concrete composite action. The encasement of the steel beam within the slab depth results in increased strength and stiffness, reducing the total height of the floor. In this investigation, three partially encased composite beams were tested under flexural conditions, the main objective being to investigate alternative positions for the headed studs. To provide longitudinal shear resistance between the I-shaped beam and the concrete, two positions of the studs were investigated: vertically welded on the bottom flange and horizontally welded on the faces of the web. The experimental results show that the headed studs are effective in providing composite action and increase the bending strength. Furthermore, the headed studs welded vertically on the bottom flange proved to be the most reliable position.
Abstract:
According to the literature, successfully implemented performance measurement and management bring an organization many benefits. The performance of software work affects the profitability of software companies and the success of software projects. Efforts to improve the performance of software work have largely focused on process improvement. However, many factors beyond process indicators underlie the performance of software work. Commitment and motivation are seen as increasingly important factors behind software work performance, so performance management must also take the personnel perspective into account better than at present. The goal of this study was to examine, through an analysis of performance management frameworks, the underlying factors of software work performance, the role of motivation, and management styles, what kind of performance measurement and management system (PMS) would support the management of software work performance while taking the personnel perspective into account. The study analyzed previous research on the topic, and company experts in the field were interviewed. As a result, the study presents the most important underlying factors of software work performance, whose improvement performance management should enable. These were found to be closely related to the personnel's motivational factors, whose engaging development management should also support. The results were distilled into recommendations concerning management and a measurement model that can be used in managing the performance of software work while taking the personnel perspective into account. The model describes the factors to be measured and managed at the individual and team levels, in managerial work, and in human resource management (HRM).