998 resultados para clone detection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Business process model repositories capture precious knowledge about an organization or a business domain. In many cases, these repositories contain hundreds or even thousands of models and they represent several man-years of effort. Over time, process model repositories tend to accumulate duplicate fragments, as new process models are created by copying and merging fragments from other models. This calls for methods to detect duplicate fragments in process models that can be refactored as separate subprocesses in order to increase readability and maintainability. This paper presents an indexing structure to support the fast detection of clones in large process model repositories. Experiments show that the algorithm scales to repositories with hundreds of models. The experimental results also show that a significant number of non-trivial clones can be found in process model repositories taken from industrial practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As organizations reach to higher levels of business process management maturity, they often find themselves maintaining repositories of hundreds or even thousands of process models, representing valuable knowledge about their operations. Over time, process model repositories tend to accumulate duplicate fragments (also called clones) as new process models are created or extended by copying and merging fragments from other models. This calls for methods to detect clones in process models, so that these clones can be refactored as separate subprocesses in order to improve maintainability. This paper presents an indexing structure to support the fast detection of clones in large process model repositories. The proposed index is based on a novel combination of a method for process model decomposition (specifically the Refined Process Structure Tree), with established graph canonization and string matching techniques. Experiments show that the algorithm scales to repositories with hundreds of models. The experimental results also show that a significant number of non-trivial clones can be found in process model repositories taken from industrial practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Evidence exists that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication may stem from the fact that the repository describes variants of the same pro- cesses and/or because of copy/pasting activity throughout the lifetime of the repository. Previous work has put forward techniques for identifying duplicate fragments (clones) that can be refactored into shared subprocesses. However, these techniques are limited to finding exact clones. This paper analyzes the prob- lem of approximate clone detection and puts forward two techniques for detecting clusters of approximate clones. Experiments show that the proposed techniques are able to accurately retrieve clusters of approximate clones that originate from copy/pasting followed by independent modifications to the copied fragments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In all applications of clone detection it is important to have precise and efficient clone identification algorithms. This paper proposes and outlines a new algorithm, KClone for clone detection that incorporates a novel combination of lexical and local dependence analysis to achieve precision, while retaining speed. The paper also reports on the initial results of a case study using an implementation of KClone with which we have been experimenting. The results indi- cate the ability of KClone to find types-1,2, and 3 clones compared to token-based and PDG-based techniques. The paper also reports results of an initial empirical study of the performance of KClone compared to CCFinderX.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Code clone detection helps connect developers across projects, if we do it on a large scale. The cornerstones that allow clone detection to work on a large scale are: (1) bad hashing (2) lightweight parsing using regular expressions and (3) MapReduce pipelines. Bad hashing means to determine whether or not two artifacts are similar by checking whether their hashes are identical. We show a bad hashing scheme that works well on source code. Lightweight parsing using regular expressions is our technique of obtaining entire parse trees from regular expressions, robustly and efficiently. We detail the algorithm and implementation of one such regular expression engine. MapReduce pipelines are a way of expressing a computation such that it can automatically and simply be parallelized. We detail the design and implementation of one such MapReduce pipeline that is efficient and debuggable. We show a clone detector that combines these cornerstones to detect code clones across all projects, across all versions of each project.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Approximate clone detection is the process of identifying similar process fragments in business process model collections. The tool presented in this paper can efficiently cluster approximate clones in large process model repositories. Once a repository is clustered, users can filter and browse the clusters using different filtering parameters. Our tool can also visualize clusters in the 2D space, allowing a better understanding of clusters and their member fragments. This demonstration will be useful for researchers and practitioners working on large process model repositories, where process standardization is a critical task for increasing the consistency and reducing the complexity of the repository.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted to logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises for example when the repository covers multiple variants of the same processes or due to copy-pasting. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This article studies the broader problem of approximate clone detection in process models. The article proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hi- erarchical Agglomerative Clustering (HAC). The article also defines a measure of standardizability of an approximate clone cluster, meaning the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Code clones are portions of source code which are similar to the original program code. The presence of code clones is considered as a bad feature of software as the maintenance of software becomes difficult due to the presence of code clones. Methods for code clone detection have gained immense significance in the last few years as they play a significant role in engineering applications such as analysis of program code, program understanding, plagiarism detection, error detection, code compaction and many more similar tasks. Despite of all these facts, several features of code clones if properly utilized can make software development process easier. In this work, we have pointed out such a feature of code clones which highlight the relevance of code clones in test sequence identification. Here program slicing is used in code clone detection. In addition, a classification of code clones is presented and the benefit of using program slicing in code clone detection is also mentioned in this work.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent years, it has been observed that software clones and plagiarism are becoming an increased threat for one?s creativity. Clones are the results of copying and using other?s work. According to the Merriam – Webster dictionary, “A clone is one that appears to be a copy of an original form”. It is synonym to duplicate. Clones lead to redundancy of codes, but not all redundant code is a clone.On basis of this background knowledge ,in order to safeguard one?s idea and to avoid intentional code duplication for pretending other?s work as if their owns, software clone detection should be emphasized more. The objective of this paper is to review the methods for clone detection and to apply those methods for finding the extent of plagiarism occurrence among the Swedish Universities in Master level computer science department and to analyze the results.The rest part of the paper, discuss about software plagiarism detection which employs data analysis technique and then statistical analysis of the results.Plagiarism is an act of stealing and passing off the idea?s and words of another person?s as one?s own. Using data analysis technique, samples(Master level computer Science thesis report) were taken from various Swedish universities and processed in Ephorus anti plagiarism software detection. Ephorus gives the percentage of plagiarism for each thesis document, from this results statistical analysis were carried out using Minitab Software.The results gives a very low percentage of Plagiarism extent among the Swedish universities, which concludes that Plagiarism is not a threat to Sweden?s standard of education in computer science.This paper is based on data analysis, intelligence techniques, EPHORUS software plagiarism detection tool and MINITAB statistical software analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The lack of effective tools have hampered our ability to assess the size, growth and ages of clonal plants. With Serenoa repens (saw palmetto) as a model, we introduce a novel analytical framework that integrates DNA fingerprinting and mathematical modelling to simulate growth and estimate ages of clonal plants. We also demonstrate the application of such life-history information of clonal plants to provide insight into management plans. Serenoa is an ecologically important foundation species in many Southeastern United States ecosystems; yet, many land managers consider Serenoa a troublesome invasive plant. Accordingly, management plans have been developed to reduce or eliminate Serenoa with little understanding of its life history. Using Amplified Fragment Length Polymorphisms, we genotyped 263 Serenoa and 134 Sabal etonia (a sympatric non-clonal palmetto) samples collected from a 20 X 20 m study plot in Florida scrub. Sabal samples were used to assign small field-unidentifiable palmettos to Serenoa or Sabal and also as a negative control for clone detection. We then mathematically modelled clonal networks to estimate genet ages. Our results suggest that Serenoa predominantly propagate via vegetative sprouts and 10000-year-old genets may be common, while showing no evidence of clone formation by Sabal. The results of this and our previous studies suggest that: (i) Serenoa has been part of scrub associations for thousands of years, (ii) Serenoa invasion are unlikely and (ii) once Serenoa is eliminated from local communities, its restoration will be difficult. Reevaluation of the current management tools and plans is an urgent task.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The lack of effective tools has hampered our ability to assess the size, growth and ages of clonal plants. With Serenoa repens (saw palmetto) as a model, we introduce a novel analytical frame work that integrates DNA fingerprinting and mathematical modelling to simulate growth and estimate ages of clonal plants. We also demonstrate the application of such life-history information of clonal plants to provide insight into management plans. Serenoa is an ecologically important foundation species in many Southeastern United States ecosystems; yet, many land managers consider Serenoa a troublesome invasive plant. Accordingly, management plans have been developed to reduce or eliminate Serenoa with little understanding of its life history. Using Amplified Fragment Length Polymorphisms, we genotyped 263 Serenoa and 134 Sabal etonia (a sympatric non-clonal palmetto) samples collected from a 20 x 20 m study plot in Florida scrub. Sabal samples were used to assign small field-unidentifiable palmettos to Serenoa or Sabal and also as a negative control for clone detection. We then mathematically modelled clonal networks to estimate genet ages. Our results suggest that Serenoa predominantly propagate via vegetative sprouts and 10000-year-old genets maybe common, while showing no evidence of clone formation by Sabal. The results of this and our previous studies suggest that: (i) Serenoa has been part of scrub associations for thousands of years, (ii) Serenoa invasions are unlikely and (ii) once Serenoa is eliminated from local communities, its restoration will be difficult. Reevaluation of the current management tools and plans is an urgent task.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Code patterns, including programming patterns and design patterns, are good references for programming language feature improvement and software re-engineering. However, to our knowledge, no existing research has attempted to detect code patterns based on code clone detection technology. In this study, we build upon the previous work and propose to detect and analyze code patterns from a collection of open source projects using NiPAT technology. Because design patterns are most closely associated with object-oriented languages, we choose Java and Python projects to conduct our study. The tool we use for detecting patterns is NiPAT, a pattern detecting tool originally developed for the TXL programming language based on the NiCad clone detector. We extend NiPAT for the Java and Python programming languages. Then, we try to identify all the patterns from the pattern report and classify them into several different categories. In the end of the study, we analyze all the patterns and compare the differences between Java and Python patterns.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As organizations reach higher levels of business process management maturity, they often find themselves maintaining very large process model repositories, representing valuable knowledge about their operations. A common practice within these repositories is to create new process models, or extend existing ones, by copying and merging fragments from other models. We contend that if these duplicate fragments, a.k.a. ex- act clones, can be identified and factored out as shared subprocesses, the repository’s maintainability can be greatly improved. With this purpose in mind, we propose an indexing structure to support fast detection of clones in process model repositories. Moreover, we show how this index can be used to efficiently query a process model repository for fragments. This index, called RPSDAG, is based on a novel combination of a method for process model decomposition (namely the Refined Process Structure Tree), with established graph canonization and string matching techniques. We evaluated the RPSDAG with large process model repositories from industrial practice. The experiments show that a significant number of non-trivial clones can be efficiently found in such repositories, and that fragment queries can be handled efficiently.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Composting refers to aerobic degradation of organic material and is one of the main waste treatment methods used in Finland for treating separated organic waste. The composting process allows converting organic waste to a humus-like end product which can be used to increase the organic matter in agricultural soils, in gardening, or in landscaping. Microbes play a key role as degraders during the composting-process, and the microbiology of composting has been studied for decades, but there are still open questions regarding the microbiota in industrial composting processes. It is known that with the traditional, culturing-based methods only a small fraction, below 1%, of the species in a sample is normally detected. In recent years an immense diversity of bacteria, fungi and archaea has been found to occupy many different environments. Therefore the methods of characterising microbes constantly need to be developed further. In this thesis the presence of fungi and bacteria in full-scale and pilot-scale composting processes was characterised with cloning and sequencing. Several clone libraries were constructed and altogether nearly 6000 clones were sequenced. The microbial communities detected in this study were found to differ from the compost microbes observed in previous research with cultivation based methods or with molecular methods from processes of smaller scale, although there were similarities as well. The bacterial diversity was high. Based on the non-parametric coverage estimations, the number of bacterial operational taxonomic units (OTU) in certain stages of composting was over 500. Sequences similar to Lactobacillus and Acetobacteria were frequently detected in the early stages of drum composting. In tunnel stages of composting the bacterial community comprised of Bacillus, Thermoactinomyces, Actinobacteria and Lactobacillus. The fungal diversity was found to be high and phylotypes similar to yeasts were abundantly found in the full-scale drum and tunnel processes. In addition to phylotypes similar to Candida, Pichia and Geotrichum moulds from genus Thermomyces and Penicillium were observed in tunnel stages of composting. Zygomycetes were detected in the pilot-scale composting processes and in the compost piles. In some of the samples there were a few abundant phylotypes present in the clone libraries that masked the rare ones. The rare phylotypes were of interest and a method for collecting them from clone libraries for sequencing was developed. With negative selection of the abundant phylotyps the rare ones were picked from the clone libraries. Thus 41% of the clones in the studied clone libraries were sequenced. Since microbes play a central role in composting and in many other biotechnological processes, rapid methods for characterization of microbial diversity would be of value, both scientifically and commercially. Current methods, however, lack sensitivity and specificity and are therefore under development. Microarrays have been used in microbial ecology for a decade to study the presence or absence of certain microbes of interest in a multiplex manner. The sequence database collected in this thesis was used as basis for probe design and microarray development. The enzyme assisted detection method, ligation-detection-reaction (LDR) based microarray, was adapted for species-level detection of microbes characteristic of each stage of the composting process. With the use of a specially designed control probe it was established that a species specific probe can detect target DNA representing as little as 0.04% of total DNA in a sample. The developed microarray can be used to monitor composting processes or the hygienisation of the compost end product. A large compost microbe sequence dataset was collected and analysed in this thesis. The results provide valuable information on microbial community composition during industrial scale composting processes. The microarray method was developed based on the sequence database collected in this study. The method can be utilised in following the fate of interesting microbes during composting process in an extremely sensitive and specific manner. The platform for the microarray is universal and the method can easily be adapted for studying microbes from environments other than compost.