7 resultados para Mining software repositories
em Universidad Politécnica de Madrid
Resumo:
Based on the empirical evidence that the ratio of email messages in public mailing lists to versioning system commits has remained relatively constant along the history of the Apache Software Foundation (ASF), this paper has as goal to study what can be inferred from such a metric for projects of the ASF. We have found that the metric seems to be an intensive metric as it is independent of the size of the project, its activity, or the number of developers, and remains relatively independent of the technology or functional area of the project. Our analysis provides evidence that the metric is related to the technical effervescence and popularity of project, and as such can be a good candidate to measure its healthy evolution. Other, similar metrics -like the ratio of developer messages to commits and the ratio of issue tracker messages to commits- are studied for several projects as well, in order to see if they have similar characteristics.
Resumo:
In the last decade, a large number of software repositories have been created for different purposes. In this paper we present a survey of the publicly available repositories and classify the most common ones as well as discussing the problems faced by researchers when applying machine learning or statistical techniques to them.
Resumo:
Ubiquitous computing software needs to be autonomous so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource aware and context aware manner since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states as well as parameter settings on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm?s execution and we concentrate on decision tree classifier. We also define taxonomy on data mining quality so that tradeoff between prediction accuracy and classification specificity of each behavior model that classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined and experimental validation of the proposed approach is also performed.
Resumo:
ImageJ es un programa informático de tratamiento digital de imagen orientado principalmente hacia el ámbito de las ciencias de la salud. Se trata de un software de dominio público y de código abierto desarrollado en lenguaje Java en las instituciones del National Institutes of Health de Estados Unidos. Incluye por defecto potentes herramientas para editar, procesar y analizar imágenes de casi cualquier tipo y formato. Sin embargo, su mayor virtud reside en su extensibilidad: las funcionalidades de ImageJ pueden ampliarse hasta resolver casi cualquier problema de tratamiento digital de imagen mediante macros, scripts y, especialmente, plugins programables en lenguaje Java gracias a la API que ofrece. Además, ImageJ cuenta con repositorios oficiales en los que es posible obtener de forma gratuita macros, scripts y plugins aplicables en multitud de entornos gracias a la labor de la extensa comunidad de desarrolladores de ImageJ, que los depura, mejora y amplia frecuentemente. Este documento es la memoria de un proyecto que consiste en el análisis detallado de las herramientas de tratamiento digital de imagen que ofrece ImageJ. Tiene por objetivo determinar si ImageJ, a pesar de estar más enfocado a las ciencias de la salud, puede resultar útil en el entorno de la Escuela Técnica Superior de Ingeniería y Sistemas de Telecomunicación de la Universidad Politécnica de Madrid, y en tal caso, resaltar las características que pudieran resultar más beneficiosas en este ámbito y servir además como guía introductoria. En las siguientes páginas se examinan una a una las herramientas de ImageJ (versión 1.48q), su funcionamiento y los mecanismos subyacentes. Se sigue el orden marcado por los menús de la interfaz de usuario: el primer capítulo abarca las herramientas destinadas a la manipulación de imágenes en general (menú Image); el segundo, las herramientas de procesado (menú Process); el tercero, las herramientas de análisis (menú Analyze); y el cuarto y último, las herramientas relacionadas con la extensibilidad de ImageJ (menú Plugins). ABSTRACT. ImageJ is a digital image processing computer program which is mainly focused at the health sciences field. It is a public domain, open source software developed in Java language at the National Institutes of Health of the United States of America. It includes powerful built-in tools to edit, process and analyze almost every type of image in nearly every format. However, its main virtue is its extensibility: ImageJ functionalities can be widened to solve nearly every situation found in digital image processing through macros, scripts and, specially, plugins programmed in Java language thanks to the ImageJ API. In addition, ImageJ has official repositories where it is possible to freely get many different macros, scripts and plugins thanks to the work carried out by the ImageJ developers community, which continuously debug, improve and widen them. This document is a report which explains a detailed analysis of all the digital image processing tools offered by ImageJ. Its final goal is to determine if ImageJ can be useful to the environment of Escuela Tecnica Superior de Ingenierfa y Sistemas de Telecomunicacion of Universidad Politecnica de Madrid, in spite of being focused at the health sciences field. In such a case, it also aims to highlight the characteristics which could be more beneficial in this field, and serve as an introductory guide too. In the following pages, all of the ImageJ tools (version 1.48q) are examined one by one, as well as their work and the underlying mechanics. The document follows the order established by the menus in ImageJ: the first chapter covers all the tools destined to manipulate images in general (menu Image); the second one covers all the processing tools (menu Process); the third one includes analyzing tools (menu Analyze); and finally, the fourth one contains all those tools related to ImageJ extensibility (menu Plugins).
Resumo:
The mobile apps market is a tremendous success, with millions of apps downloaded and used every day by users spread all around the world. For apps’ developers, having their apps published on one of the major app stores (e.g. Google Play market) is just the beginning of the apps lifecycle. Indeed, in order to successfully compete with the other apps in the market, an app has to be updated frequently by adding new attractive features and by fixing existing bugs. Clearly, any developer interested in increasing the success of her app should try to implement features desired by the app’s users and to fix bugs affecting the user experience of many of them. A precious source of information to decide how to collect users’ opinions and wishes is represented by the reviews left by users on the store from which they downloaded the app. However, to exploit such information the app’s developer should manually read each user review and verify if it contains useful information (e.g. suggestions for new features). This is something not doable if the app receives hundreds of reviews per day, as happens for the very popular apps on the market. In this work, our aim is to provide support to mobile apps developers by proposing a novel approach exploiting data mining, natural language processing, machine learning, and clustering techniques in order to classify the user reviews on the basis of the information they contain (e.g. useless, suggestion for new features, bugs reporting). Such an approach has been empirically evaluated and made available in a web-‐based tool publicly available to all apps’ developers. The achieved results showed that the developed tool: (i) is able to correctly categorise user reviews on the basis of their content (e.g. isolating those reporting bugs) with 78% of accuracy, (ii) produces clusters of reviews (e.g. groups together reviews indicating exactly the same bug to be fixed) that are meaningful from a developer’s point-‐of-‐view, and (iii) is considered useful by a software company working in the mobile apps’ development market.
Resumo:
Los flujos de trabajo científicos han sido adoptados durante la última década para representar los métodos computacionales utilizados en experimentos in silico, así como para dar soporte a sus publicaciones asociadas. Dichos flujos de trabajo han demostrado ser útiles para compartir y reproducir experimentos científicos, permitiendo a investigadores visualizar, depurar y ahorrar tiempo a la hora de re-ejecutar un trabajo realizado con anterioridad. Sin embargo, los flujos de trabajo científicos pueden ser en ocasiones difíciles de entender y reutilizar. Esto es debido a impedimentos como el gran número de flujos de trabajo existentes en repositorios, su heterogeneidad o la falta generalizada de documentación y ejemplos de uso. Además, dado que normalmente es posible implementar un mismo método utilizando algoritmos o técnicas distintas, flujos de trabajo aparentemente distintos pueden estar relacionados a un determinado nivel de abstracción, basándose, por ejemplo, en su funcionalidad común. Esta tesis se centra en la reutilización de flujos de trabajo y su abstracción mediante la exploración de relaciones entre los flujos de trabajo de un repositorio y la extracción de abstracciones que podrían ayudar a la hora de reutilizar otros flujos de trabajo existentes. Para ello, se propone un modelo simple de representación de flujos de trabajo y sus ejecuciones, se analizan las abstracciones típicas que se pueden encontrar en los repositorios de flujos de trabajo, se exploran las prácticas habituales de los usuarios a la hora de reutilizar flujos de trabajo existentes y se describe un método para descubrir abstracciones útiles para usuarios, basadas en técnicas existentes de teoría de grafos. Los resultados obtenidos exponen las abstracciones y prácticas comunes de usuarios en términos de reutilización de flujos de trabajo, y muestran cómo las abstracciones que se extraen automáticamente tienen potencial para ser reutilizadas por usuarios que buscan diseñar nuevos flujos de trabajo. Abstract Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have demonstrated to be useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large amount of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples may become an obstacle for a scientist aiming to reuse the work from other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have potential to become useful for users designing new workflows.
Resumo:
Esta memoria es el resultado de un proyecto cuyo objetivo ha sido realizar un análisis de la posible aplicación de técnicas relativas al Process Mining para entornos AmI (Ambient Intelligence). Dicho análisis tiene la facultad de presentar de forma clara los resultados extraídos de los procesos relativos a un caso de uso planteado, así como de aplicar dichos resultados a aplicaciones relativas a entornos AmI, como automatización de tareas o simulación social basada en agentes. Para que dicho análisis sea comprensible por el lector, se presentan detalladas explicaciones de los conceptos tratados y las técnicas empleadas. Además, se analizan exhaustivamente las dos herramientas software más utilizadas en cuanto a minería de procesos se refiere, ProM y Disco, presentando ventajas e inconvenientes de cada una, así como una comparación entre las dos. Posteriormente se ha desarrollado una metodología para el análisis de procesos con la herramienta ProM, anteriormente mencionada, explicando cuidadosamente cada uno de los pasos así como los fundamentos de los algoritmos utilizados. Por último, se han presentado las conclusiones extraídas del trabajo, así como las posibles líneas de continuación del proyecto.