590 results for Software clones Detection
Abstract:
This work-in-progress paper presents an ensemble-based model for detecting and mitigating Distributed Denial-of-Service (DDoS) attacks, and its partial implementation. The model utilises network traffic analysis and MIB (Management Information Base) server load analysis features for detecting a wide range of network and application layer DDoS attacks and distinguishing them from Flash Events. The proposed model will be evaluated against realistic synthetic network traffic generated using a software-based traffic generator that we have developed as part of this research. In this paper, we summarise our previous work, highlight the current work being undertaken along with preliminary results obtained and outline the future directions of our work.
Abstract:
Approximate clone detection is the process of identifying similar process fragments in business process model collections. The tool presented in this paper can efficiently cluster approximate clones in large process model repositories. Once a repository is clustered, users can filter and browse the clusters using different filtering parameters. Our tool can also visualize clusters in the 2D space, allowing a better understanding of clusters and their member fragments. This demonstration will be useful for researchers and practitioners working on large process model repositories, where process standardization is a critical task for increasing the consistency and reducing the complexity of the repository.
Abstract:
Static analysis is an approach to checking the source code or compiled code of applications before it is executed. Chess and McGraw state that static analysis promises to identify common coding problems automatically. While manual code checking is also a form of static analysis, software tools are used in most cases to perform the checks. Chess and McGraw additionally claim that good static checkers can help to spot and eradicate common security bugs.
Abstract:
This report describes the available functionality and use of the ClusterEval evaluation software. It implements novel and standard measures for the evaluation of cluster quality. This software has been used at the INEX XML Mining track and in the MediaEval Social Event Detection task.
Abstract:
Background: Hyperhomocysteinemia as a consequence of the MTHFR 677 C > T variant is associated with cardiovascular disease and stroke. Another factor that can potentially contribute to these disorders is a depleted nitric oxide level, which can be due to the presence of the eNOS +894 G > T and eNOS −786 T > C variants, which make an individual more susceptible to endothelial dysfunction. A number of genotyping methods have been developed to investigate these variants. However, simultaneous detection methods using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis are still lacking. In this study, a novel multiplex PCR-RFLP method for the simultaneous detection of the MTHFR 677 C > T, eNOS +894 G > T and eNOS −786 T > C variants was developed. A total of 114 healthy Malay subjects were recruited. The three variants were genotyped using the novel multiplex PCR-RFLP and confirmed by DNA sequencing as well as snpBLAST. Allele frequencies of the three variants were calculated using the Hardy-Weinberg equation.
Methods: The 114 healthy volunteers were recruited for this study, and their DNA was extracted. Primer pairs were designed using Primer 3 Software version 0.4.0 and validated against the BLAST database. The primer specificity, functionality and annealing temperature were tested using uniplex PCR methods that were later combined into a single multiplex PCR. Restriction Fragment Length Polymorphism (RFLP) analysis was performed in three separate tubes, followed by agarose gel electrophoresis. The residual PCR product was purified and sent for DNA sequencing.
Results: The allele frequencies for MTHFR 677 C > T were 0.89 (C allele) and 0.11 (T allele); for eNOS +894 G > T, the allele frequencies were 0.58 (G allele) and 0.43 (T allele); and for eNOS −786 T > C, the allele frequencies were 0.87 (T allele) and 0.13 (C allele).
Conclusions: Our PCR-RFLP method is simple, cost-effective and time-saving. It can be used to genotype subjects for the MTHFR 677 C > T, eNOS +894 G > T and eNOS −786 T > C variants simultaneously, with 100% concordance with DNA sequencing data. This method can be routinely used for rapid investigation of these three variants.
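The allele-frequency step mentioned in this abstract can be illustrated with a minimal sketch. The function names and the genotype counts below are hypothetical and are not taken from the study; the sketch only shows the standard allele-counting calculation and the genotype counts expected under Hardy-Weinberg equilibrium for a biallelic variant.

```python
def allele_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Return (p, q): frequencies of the reference and variant alleles.

    Each subject carries two alleles, so homozygotes contribute two copies
    of one allele and heterozygotes contribute one copy of each.
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotyped subjects")
    p = (2 * n_ref_hom + n_het) / total_alleles
    q = (2 * n_alt_hom + n_het) / total_alleles
    return p, q


def hardy_weinberg_expected(p, q, n_subjects):
    """Expected genotype counts (p^2, 2pq, q^2) scaled to n_subjects."""
    return (p * p * n_subjects, 2 * p * q * n_subjects, q * q * n_subjects)


# Hypothetical counts for illustration only (not the study's data).
p, q = allele_frequencies(n_ref_hom=50, n_het=40, n_alt_hom=10)
expected = hardy_weinberg_expected(p, q, n_subjects=100)
```

Comparing the expected counts with the observed ones (for example with a chi-square test) is the usual way to check whether a sample deviates from Hardy-Weinberg equilibrium.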
Abstract:
This paper presents a new framework for distributed intrusion detection based on taint marking. Our system tracks information flows between applications of multiple hosts gathered in groups (i.e., sets of hosts sharing the same distributed information flow policy) by attaching taint labels to system objects such as files, sockets, Inter Process Communication (IPC) abstractions, and memory mappings. Labels are carried over the network by tainting network packets. A distributed information flow policy is defined for each group at the host level by labeling information and defining how users and applications can legally access, alter or transfer information towards other trusted or untrusted hosts. As opposed to existing approaches, where information is most often represented by two security levels (low/high, public/private, etc.), our model identifies each piece of information within a distributed system and defines their legal interactions in a fine-grained manner. Hosts store and exchange security labels in a peer-to-peer fashion, and there is no central monitor. Our IDS is implemented in the Linux kernel as a Linux Security Module (LSM) and runs standard software on commodity hardware with no required modification. The only trusted code is our modified operating system kernel. Finally, we present a scenario of intrusion in a web service running on multiple hosts, and show how our distributed IDS is able to report security violations at each host level.
Abstract:
The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved or at worst comparable performance to earlier approaches for standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
Abstract:
Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises, for example, when the repository covers multiple variants of the same processes or due to copy-pasting. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This article studies the broader problem of approximate clone detection in process models. The article proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hierarchical Agglomerative Clustering (HAC). The article also defines a measure of standardizability of an approximate clone cluster, meaning the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable.
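To make the DBSCAN part of this abstract concrete, here is a minimal sketch of the generic DBSCAN algorithm over a precomputed distance matrix. It is not the article's implementation: the abstract does not say how distances between process fragments are computed, so the matrix `dist`, the radius `eps` and the density threshold `min_pts` are all assumed inputs.

```python
def dbscan(dist, eps, min_pts):
    """Generic DBSCAN on a symmetric distance matrix `dist` (list of lists).

    Returns one label per item: -1 for noise, 0..k-1 for cluster ids.
    How distances between fragments are obtained is left to the caller.
    """
    n = len(dist)
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * n

    def neighbours(i):
        # All items within eps of item i (includes i itself).
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: expand
                queue.extend(reachable)
        cluster += 1
    return labels


# Toy example: five items placed on a line; distance is absolute difference.
positions = [0, 1, 2, 50, 90]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan(dist, eps=1, min_pts=2)
```

In this toy run the first three items form one cluster and the two isolated items are labelled as noise, which mirrors how fragments without close neighbours would be left out of any clone cluster.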
Abstract:
This paper presents a technique for the automated removal of noise from process execution logs. Noise is the result of data quality issues such as logging errors and manifests itself in the form of infrequent process behavior. The proposed technique generates an abstract representation of an event log as an automaton capturing the direct-follows relations between event labels. This automaton is then pruned of arcs with low relative frequency and used to remove from the log those events that do not fit the automaton, which are identified as outliers. The technique has been extensively evaluated on top of various automated process discovery algorithms using both artificial logs with different levels of noise and a variety of real-life logs. The results show that the technique significantly improves the quality of the discovered process model along fitness, appropriateness and simplicity, without negative effects on generalization. Further, the technique scales well to large and complex logs.
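The pipeline outlined in this abstract (build a direct-follows automaton, prune infrequent arcs, drop events that no longer fit) can be sketched in a much simplified form. Everything specific here is an assumption rather than the paper's method: the function name `filter_log`, the definition of relative frequency as an arc's count divided by the most frequent arc's count, and the greedy left-to-right removal that always keeps the first event of a trace.

```python
from collections import Counter


def filter_log(log, threshold):
    """Simplified noise filter over a log given as a list of traces.

    Each trace is a list of event labels. Arcs whose relative frequency
    (assumed here: count / count of the most frequent arc) is below
    `threshold` are pruned; events reachable only via pruned arcs are dropped.
    """
    # 1. Count direct-follows arcs (a, b): b immediately follows a.
    arcs = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            arcs[(a, b)] += 1
    if not arcs:
        return [list(trace) for trace in log]

    # 2. Prune arcs with low relative frequency.
    max_count = max(arcs.values())
    kept = {arc for arc, count in arcs.items() if count / max_count >= threshold}

    # 3. Greedily replay each trace, dropping events that do not fit.
    filtered = []
    for trace in log:
        if not trace:
            filtered.append([])
            continue
        out = [trace[0]]  # assumption: the first event is always kept
        for event in trace[1:]:
            if (out[-1], event) in kept:
                out.append(event)
        filtered.append(out)
    return filtered


# Toy log: nine regular traces and one with an infrequent extra event "x".
log = [["a", "b", "c"]] * 9 + [["a", "x", "b", "c"]]
cleaned = filter_log(log, threshold=0.2)
```

A greedy replay is only one possible way to decide which events to drop; it is used here because it keeps the sketch short, not because the abstract prescribes it.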
Abstract:
Environmental data, such as water quality data, usually include measurements that fall below detection limits because of limitations of the instruments or of the analytical methods used. The fact that some responses are not detected needs to be properly taken into account in the statistical analysis of such data. However, it is well known that analyzing a data set with detection limits is challenging, and we often have to rely on traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
Abstract:
There is increased interest in the use of Unmanned Aerial Vehicles (UAVs) for wildlife and feral animal monitoring around the world. This paper describes a novel system that uses a predictive dynamic application to place the UAV ahead of a user, a low-cost thermal camera, and a small onboard computer that identifies heat signatures of a target animal from a predetermined altitude and transmits that target's GPS coordinates. A map is generated and various data sets and graphs are displayed using a GUI designed for easy use. The paper describes the hardware and software architecture and the probabilistic model of the downward-facing camera for the detection of an animal. Behavioral dynamics of target movement are used to design a Kalman filter and Markov model based prediction algorithm that places the UAV ahead of the user. Geometrical concepts and the Haversine formula are applied to the maximum likelihood case in order to predict a future state of the user, thus delivering a new waypoint for autonomous navigation. Results show that the system is capable of autonomously locating animals from a predetermined height and generating a map showing the location of the animals ahead of the user.
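The Haversine formula named in this abstract gives the great-circle distance between two latitude/longitude points. The sketch below shows the standard formula only; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions, and how the paper combines the distance with its prediction model is not reproduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, roughly 111 km.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
```

For the short distances involved in placing a waypoint a few tens of metres ahead of a user, a flat-Earth approximation would give nearly the same result, but the Haversine form stays accurate at any range.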