950 resultados para Software repository mining. Process mining. Software developer contribution
Resumo:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Resumo:
Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.
Resumo:
This article explores the contribution that artisanal and small-scale mining (ASM) makes to poverty reduction in Tanzania, based on data on gold and diamond mining in Mwanza Region. The evidence suggests that people working in mining or related services are less likely to be in poverty than those with other occupations. However, the picture is complex; while mining income can help reduce poverty and provide a buffer from livelihood shocks, peoples inability to obtain a formal mineral claim, or to effectively exploit their claims, contributes to insecurity. This is reinforced by a context in which ASM is peripheral to large-scale mining interests, is only gradually being addressed within national poverty reduction policies, and is segregated from district-level planning.
Resumo:
This paper introduces an architecture for identifying and modelling in real-time at a copper mine using new technologies as M2M and cloud computing with a server in the cloud and an Android client inside the mine. The proposed design brings up pervasive mining, a system with wider coverage, higher communication efficiency, better fault-tolerance, and anytime anywhere availability. This solution was designed for a plant inside the mine which cannot tolerate interruption and for which their identification in situ, in real time, is an essential part of the system to control aspects such as instability by adjusting their corresponding parameters without stopping the process.
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Resumo:
Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations including modern data mining processes onboard these small devices. A decade of research has proved the feasibility of what has been termed as Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it is not before 2010 until the authors of this book initiated the Pocket Data Mining (PDM) project exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment on this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains including security, business and telemedicine are discussed.
Resumo:
Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.
Resumo:
Managing software maintenance is rarely a precise task due to uncertainties concerned with resources and services descriptions. Even when a well-established maintenance process is followed, the risk of delaying tasks remains if the new services are not precisely described or when resources change during process execution. Also, the delay of a task at an early process stage may represent a different delay at the end of the process, depending on complexity or services reliability requirements. This paper presents a knowledge-based representation (Bayesian Networks) for maintenance project delays based on specialists experience and a corresponding tool to help in managing software maintenance projects. (c) 2006 Elsevier Ltd. All rights reserved.
Resumo:
In software development organizations there is sometimes a need for change. In order to meet continuously increasing demands from their customers, Sandvik IT Services- SITS, at Sandvik in Sweden, required improving the way they worked with software development. Due to issues like a lot of work in progress and lot of simultaneous tasks for individuals in the teams that caused stress, it was almost impossible to address the question of working with improvements. In order to enable the improvement process Kanban was introduced in the software development teams. Kanban for software development is a change method created by David J. Anderson. The purpose of this thesis is twofold. One part is to assess what effects Kanban has had on the software development teams. The other part is to make a documentation of the Kanban implementation process at SITS. The documentation has been made on the basis of both company internal resources and observations of the Kanban implementation process. The effects of Kanban have been researched with an interview survey to the teams that have gone through the Kick start of the Kanban process. The result of the thesis is also twofold. One part of the result is an extensive documentation of the implementation process of Kanban at SITS. The other part is an assessment of the effects that Kanban has had at SITS. The major effects have been that the teams are experiencing less stress, more focus on quality and better customer collaboration. It is also evident is that it takes time for some effects to evolve when implementing Kanban
Resumo:
Architecture description languages (ADLs) are used to specify high-level, compositional views of a software application. ADL research focuses on software composed of prefabricated parts, so-called software components. ADLs usually come equipped with rigorous state-transition style semantics, facilitating verification and analysis of specifications. Consequently, ADLs are well suited to configuring distributed and event-based systems. However, additional expressive power is required for the description of enterprise software architectures – in particular, those built upon newer middleware, such as implementations of Java’s EJB specification, or Microsoft’s COM+/.NET. The enterprise requires distributed software solutions that are scalable, business-oriented and mission-critical. We can make progress toward attaining these qualities at various stages of the software development process. In particular, progress at the architectural level can be leveraged through use of an ADL that incorporates trust and dependability analysis. Also, current industry approaches to enterprise development do not address several important architectural design issues. The TrustME ADL is designed to meet these requirements, through combining approaches to software architecture specification with rigorous design-by-contract ideas. In this paper, we focus on several aspects of TrustME that facilitate specification and analysis of middleware-based architectures for trusted enterprise computing systems.
Resumo:
O objetivo da pesquisa atém-se primeiramente em elaborar um protocolo que permita analisar, por meio de um conjunto de indicadores, o processo de reutilização de software no desenvolvimento de sistemas de informação modelando objetos de negócios. O protocolo concebido compõe-se de um modelo analítico e de grades de análise, a serem empregadas na classificação e tabulação dos dados obtidos empiricamente. Com vistas à validação inicial do protocolo de análise, realiza-se um estudo de caso. A investigação ocorre num dos primeiros e, no momento, maior projeto de fornecimento de elementos de software reutilizáveis destinados a negócios, o IBM SANFRANCISCO, bem como no primeiro projeto desenvolvido no Brasil com base no por ele disponibilizado, o sistema Apontamento Universal de Horas (TIME SHEET System). Quanto à aplicabilidade do protocolo na prática, este se mostra abrangente e adequado à compreensão do processo. Quanto aos resultados do estudo de caso, a análise dos dados revela uma situação em que as expectativas (dos pesquisadores) de reutilização de elementos de software orientadas a negócio eram superiores ao observado. Houve, entretanto, reutilização de elementos de baixo nível, que forneceram a infra-estrutura necessária para o desenvolvimento do projeto. Os resultados contextualizados diante das expectativas de reutilização (dos desenvolvedores) são positivos, na medida em que houve benefícios metodológicos e tecnológicos decorrentes da parceria realizada. Por outro lado, constatam-se alguns aspectos restritivos para o desenvolvedor de aplicativos, em virtude de escolhas arbitrárias realizadas pelo provedor de elementos reutilizáveis.
Resumo:
Este trabalho apresenta uma arquitetura para Ambientes de Desenvolvimento de Software (ADS). Esta arquitetura é baseada em produtos comerciais de prateleira (COTS), principalmente em um Sistema de Gerência de Workflow – SGW (Microsoft Exchange 2000 Server – E2K) - e tem como plataforma de funcionamento a Internet, integrando também algumas ferramentas que fazem parte do grande conjunto de aplicativos que é utilizado no processo de desenvolvimento de software. O desenvolvimento de um protótipo (WOSDIE – WOrkflow-based Software Development Integrated Environment) baseado na arquitetura apresentada é descrito em detalhes, mostrando as etapas de construção, funções implementadas e dispositivos necessários para a integração de um SGW, ferramentas de desenvolvimento, banco de dados (WSS – Web Storage System) e outros, para a construção de um ADS. O processo de software aplicado no WOSDIE foi extraído do RUP (Rational Unified Process – Processo Unificado Rational). Este processo foi modelado na ferramenta Workflow Designer, que permite a modelagem dos processos de workflow dentro do E2K. A ativação de ferramentas a partir de um navegador Web e o armazenamento dos artefatos produzidos em um projeto de software também são abordados. O E2K faz o monitoramento dos eventos que ocorrem dentro do ambiente WOSDIE, definindo, a partir das condições modeladas no Workflow Designer, quais atividades devem ser iniciadas após o término de alguma atividade anterior e quem é o responsável pela execução destas novas atividades (assinalamento de atividades). A arquitetura proposta e o protótipo WOSDIE são avaliados segundo alguns critérios retirados de vários trabalhos. Estas avaliações mostram em mais detalhes as características da arquitetura proposta e proporcionam uma descrição das vantagens e problemas associados ao WOSDIE.
Resumo:
A tecnologia de processos de desenvolvimento de software ´e uma importante área de estudo e pesquisas na Engenharia de Software que envolve a construção de ferramentas e ambientes para modelagem, execução, simulação e evolução de processos de desenvolvimento de software, conhecidos como PSEEs (do inglês: Process-Centered Software Engineering Environments). Um modelo de processo de software é uma estrutura complexa que relaciona elementos gerenciáveis (i.e. artefatos, agentes, e atividades) que constituem o processo de software. Esta complexidade, geralmente, dificulta a percepção e entendimento do processo por parte dos profissionais envolvidos, principalmente quando estes profissionais têm acesso apenas a uma visão geral do modelo. Desta forma, há necessidade de mecanismos para visualização e acompanhamento dos processos, fornecendo informações adequadas aos diferentes estados, abstraindo as informações relevantes tanto as fases presentes no processo de desenvolvimento quanto ao agente envolvido, além de facilitar a interação e o entendimento humano sobre os elementos do processo. Estudos afirmam que a maneira como são apresentadas as informações do modelo de processo pode influenciar no sucesso ou não do desenvolvimento do software, assim como facilitar a adoção da tecnologia pela indústria de software. Este trabalho visa contribuir nas pesquisas que buscam mecanismos e cientes para a visualização de processos de software apresentando a abordagem APSEE-Monitor destinada ao apoio a visualização de processos de software durante a sua execução. O principal objetivo desta pesquisa é apresentar um modelo formal de apoio a visualização de processos capaz de extrair dados de processos e organizá-los em sub-domínios de informações de interesse do gerente de processos. Neste trabalho aplica-se o conceito de múltiplas perspectivas como uma estratégia viável para a abstração e organização das informações presentes no modelo de processos. A solução proposta destaca-se por estender a definção original de perspectivas e fornecer uma estratégia de extração dos dados através de uma especificação formal utilizando o paradigma PROSOFT-Algébrico. Além disso, o trabalho apresenta um conjunto de requisitos relativos a interação entre gerentes de processos e PSEEs, a definição formal das perspectivas, uma gramática que define a linguagem de consulta aos processos, e um protótipo da aplicação.
Resumo:
The aim of this case study was to investigate how the evolutionary process of the development of software, especially the change of paradigm for the factory software, has affected the structure of the Board of Relacionamento, Desenvolvimento e Informações - DRD at a public company called Dataprev and also reflecting or not a return to the model taylorist-fordist of the organization on production. For this study, there were interviews and a questionnaire applied, as well as documentary and literature review on the following topics: organizational structure, factory software, taylorism and fordism. From the data collected and analyzed, in the perspective studied, it was understood that there is not a return to the model taylorist-fordist to organize the production in the process of adopting the concept of a software factory in Dataprev. In addition to that, there was a flexibility in the organizational structure of its units of development - the softwares factories Dataprev.