986 resultados para source code querying
Resumo:
Code clone detection helps connect developers across projects, if we do it on a large scale. The cornerstones that allow clone detection to work on a large scale are: (1) bad hashing (2) lightweight parsing using regular expressions and (3) MapReduce pipelines. Bad hashing means to determine whether or not two artifacts are similar by checking whether their hashes are identical. We show a bad hashing scheme that works well on source code. Lightweight parsing using regular expressions is our technique of obtaining entire parse trees from regular expressions, robustly and efficiently. We detail the algorithm and implementation of one such regular expression engine. MapReduce pipelines are a way of expressing a computation such that it can automatically and simply be parallelized. We detail the design and implementation of one such MapReduce pipeline that is efficient and debuggable. We show a clone detector that combines these cornerstones to detect code clones across all projects, across all versions of each project.
Resumo:
Background Gray scale images make the bulk of data in bio-medical image analysis, and hence, the main focus of many image processing tasks lies in the processing of these monochrome images. With ever improving acquisition devices, spatial and temporal image resolution increases, and data sets become very large. Various image processing frameworks exists that make the development of new algorithms easy by using high level programming languages or visual programming. These frameworks are also accessable to researchers that have no background or little in software development because they take care of otherwise complex tasks. Specifically, the management of working memory is taken care of automatically, usually at the price of requiring more it. As a result, processing large data sets with these tools becomes increasingly difficult on work station class computers. One alternative to using these high level processing tools is the development of new algorithms in a languages like C++, that gives the developer full control over how memory is handled, but the resulting workflow for the prototyping of new algorithms is rather time intensive, and also not appropriate for a researcher with little or no knowledge in software development. Another alternative is in using command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation by using shell scripts. Although not as convenient as, e.g. visual programming, this approach is still accessable to researchers without a background in computer science. However, only few tools exist that provide this kind of processing interface, they are usually quite task specific, and don’t provide an clear approach when one wants to shape a new command line tool from a prototype shell script. Results The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that make it possible to run image processing tasks interactively in a command shell and to prototype by using the according shell scripting language. Since the hard disk becomes the temporal storage memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the likes, the transition from shell scripts to full fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single tasks command line tools makes it easy to extend MIA, usually without the requirement to touch or recompile existing code. Conclusion In this article, we describe the general design of MIA, a general purpouse framework for gray scale image processing. We demonstrated the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcome in orthognathic surgery. With MIA prototyping algorithms by using shell scripts that combine small, single-task command line tools is a viable alternative to the use of high level languages, an approach that is especially useful when large data sets need to be processed.
Resumo:
Extensible Business Reporting Language (XBRL) is being adopted by European regulators as a data standard for the exchange of business information. This paper examines the approach of XBRL International (XII) to the meta-data standard's development and diffusion. We theorise the development of XBRL using concepts drawn from a model of successful open source projects. Comparison of the open source model to XBRL enables us to identify a number of interesting similarities and differences. In common with open source projects, the benefits and progress of XBRL have been overstated and 'hyped' by enthusiastic participants. While XBRL is an open data standard in terms of access to the equivalent of its 'source code' we find that the governance structure of the XBRL consortium is significantly different to a model open source approach. The barrier to participation that is created by requiring paid membership and a focus on transacting business at physical conferences and meetings is identified as particularly critical. Decisions about the technical structure of XBRL, the regulator-led pattern of adoption and the organisation of XII are discussed. Finally areas for future research are identified.
Resumo:
In recent years, a surprising new phenomenon has emerged in which globally-distributed online communities collaborate to create useful and sophisticated computer software. These open source software groups are comprised of generally unaffiliated individuals and organizations who work in a seemingly chaotic fashion and who participate on a voluntary basis without direct financial incentive. ^ The purpose of this research is to investigate the relationship between the social network structure of these intriguing groups and their level of output and activity, where social network structure is defined as (1) closure or connectedness within the group, (2) bridging ties which extend outside of the group, and (3) leader centrality within the group. Based on well-tested theories of social capital and centrality in teams, propositions were formulated which suggest that social network structures associated with successful open source software project communities will exhibit high levels of bridging and moderate levels of closure and leader centrality. ^ The research setting was the SourceForge hosting organization and a study population of 143 project communities was identified. Independent variables included measures of closure and leader centrality defined over conversational ties, along with measures of bridging defined over membership ties. Dependent variables included source code commits and software releases for community output, and software downloads and project site page views for community activity. A cross-sectional study design was used and archival data were extracted and aggregated for the two-year period following the first release of project software. The resulting compiled variables were analyzed using multiple linear and quadratic regressions, controlling for group size and conversational volume. ^ Contrary to theory-based expectations, the surprising results showed that successful project groups exhibited low levels of closure and that the levels of bridging and leader centrality were not important factors of success. These findings suggest that the creation and use of open source software may represent a fundamentally new socio-technical development process which disrupts the team paradigm and which triggers the need for building new theories of collaborative development. These new theories could point towards the broader application of open source methods for the creation of knowledge-based products other than software. ^
Resumo:
In recent years, a surprising new phenomenon has emerged in which globally-distributed online communities collaborate to create useful and sophisticated computer software. These open source software groups are comprised of generally unaffiliated individuals and organizations who work in a seemingly chaotic fashion and who participate on a voluntary basis without direct financial incentive. The purpose of this research is to investigate the relationship between the social network structure of these intriguing groups and their level of output and activity, where social network structure is defined as 1) closure or connectedness within the group, 2) bridging ties which extend outside of the group, and 3) leader centrality within the group. Based on well-tested theories of social capital and centrality in teams, propositions were formulated which suggest that social network structures associated with successful open source software project communities will exhibit high levels of bridging and moderate levels of closure and leader centrality. The research setting was the SourceForge hosting organization and a study population of 143 project communities was identified. Independent variables included measures of closure and leader centrality defined over conversational ties, along with measures of bridging defined over membership ties. Dependent variables included source code commits and software releases for community output, and software downloads and project site page views for community activity. A cross-sectional study design was used and archival data were extracted and aggregated for the two-year period following the first release of project software. The resulting compiled variables were analyzed using multiple linear and quadratic regressions, controlling for group size and conversational volume. Contrary to theory-based expectations, the surprising results showed that successful project groups exhibited low levels of closure and that the levels of bridging and leader centrality were not important factors of success. These findings suggest that the creation and use of open source software may represent a fundamentally new socio-technical development process which disrupts the team paradigm and which triggers the need for building new theories of collaborative development. These new theories could point towards the broader application of open source methods for the creation of knowledge-based products other than software.
Resumo:
Context: Obfuscation is a common technique used to protect software against mali- cious reverse engineering. Obfuscators manipulate the source code to make it harder to analyze and more difficult to understand for the attacker. Although different ob- fuscation algorithms and implementations are available, they have never been directly compared in a large scale study. Aim: This paper aims at evaluating and quantifying the effect of several different obfuscation implementations (both open source and commercial), to help developers and project manager to decide which one could be adopted. Method: In this study we applied 44 obfuscations to 18 subject applications covering a total of 4 millions lines of code. The effectiveness of these source code obfuscations has been measured using 10 code metrics, considering modularity, size and complexity of code. Results: Results show that some of the considered obfuscations are effective in mak- ing code metrics change substantially from original to obfuscated code, although this change (called potency of the obfuscation) is different on different metrics. In the pa- per we recommend which obfuscations to select, given the security requirements of the software to be protected.
Resumo:
The availability of a huge amount of source code from code archives and open-source projects opens up the possibility to merge machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code where programming languages are treated instead of natural languages while different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications which can be developed within the area of Big Code, the work presented in this research thesis mainly focuses on two particular tasks: the Programming Language Identification (PLI) and the Software Defect Prediction (SDP) for source codes. Programming language identification is commonly needed in program comprehension and it is usually performed directly by developers. However, when it comes at big scales, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text and image-based learning approaches) and different models are created paying particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were searched by manual inspection or using automatic static and dynamic analyzers. Now, the automation of this task can be tackled using learning approaches that can speed up and improve related procedures. Here, two models have been built and analyzed to detect some of the commonest bugs and errors at different code granularity levels (file and method levels). Exploited data and models’ architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both PLI and SDP tasks while differences and similarities concerning other related works are discussed.
Resumo:
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS) represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, denominated sequence tags. These techniques have improved our understanding about the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is made through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags, considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed in samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidences suggesting that it is possible to find more congruous clusters after using S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution for determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/.S3T source code and datasets can also be downloaded from the aforementioned website.
Resumo:
Abstract — The analytical methods based on evaluation models of interactive systems were proposed as an alternative to user testing in the last stages of the software development due to its costs. However, the use of isolated behavioral models of the system limits the results of the analytical methods. An example of these limitations relates to the fact that they are unable to identify implementation issues that will impact on usability. With the introduction of model-based testing we are enable to test if the implemented software meets the specified model. This paper presents an model-based approach for test cases generation from the static analysis of source code.
Resumo:
A large and growing amount of software systems rely on non-trivial coordination logic for making use of third party services or components. Therefore, it is of outmost importance to understand and capture rigorously this continuously growing layer of coordination as this will make easier not only the veri cation of such systems with respect to their original speci cations, but also maintenance, further development, testing, deployment and integration. This paper introduces a method based on several program analysis techniques (namely, dependence graphs, program slicing, and graph pattern analysis) to extract coordination logic from legacy systems source code. This process is driven by a series of pre-de ned coordination patterns and captured by a special purpose graph structure from which coordination speci cations can be generated in a number of di erent formalisms
Resumo:
Current software development often relies on non-trivial coordination logic for combining autonomous services, eventually running on different platforms. As a rule, however, such a coordination layer is strongly woven within the application at source code level. Therefore, its precise identification becomes a major methodological (and technical) problem and a challenge to any program understanding or refactoring process. The approach introduced in this paper resorts to slicing techniques to extract coordination data from source code. Such data are captured in a specific dependency graph structure from which a coordination model can be recovered either in the form of an Orc specification or as a collection of code fragments corresponding to the identification of typical coordination patterns in the system. Tool support is also discussed
Resumo:
What sort of component coordination strategies emerge in a software integration process? How can such strategies be discovered and further analysed? How close are they to the coordination component of the envisaged architectural model which was supposed to guide the integration process? This paper introduces a framework in which such questions can be discussed and illustrates its use by describing part of a real case-study. The approach is based on a methodology which enables semi-automatic discovery of coordination patterns from source code, combining generalized slicing techniques and graph manipulation
Resumo:
Cryptographic software development is a challenging eld: high performance must be achieved, while ensuring correctness and com- pliance with low-level security policies. CAO is a domain speci c language designed to assist development of cryptographic software. An important feature of this language is the design of a novel type system introducing native types such as prede ned sized vectors, matrices and bit strings, residue classes modulo an integer, nite elds and nite eld extensions, allowing for extensive static validation of source code. We present the formalisation, validation and implementation of this type system
Resumo:
In the context of an e ort to develop methodologies to support the evaluation of interactive system, this paper investigates an approach to detect graphical user interface bad smells. Our approach consists in detecting user interface bad smells through model-based reverse engineering from source code. Models are used to de ne which widgets are present in the interface, when can particular graphical user interface (GUI) events occur, under which conditions, which system actions are executed, and which GUI state is generated next.
Resumo:
This paper presents a catalog of smells in the context of interactive applications. These so-called usability smells are indicators of poor design on an application’s user interface, with the potential to hinder not only its usability but also its maintenance and evolution. To eliminate such usability smells we discuss a set of program/usability refactorings. In order to validate the presented usability smells catalog, and the associated refactorings, we present a preliminary empirical study with software developers in the context of a real open source hospital management application. Moreover, a tool that computes graphical user interface behavior models, giving the applications’ source code, is used to automatically detect usability smells at the model level.