983 results for Software clones Detection
Abstract:
Software development companies seek to reduce costs through designs that provide: a) ease of distributing the development work with minimal communication among the parties; b) modifiability, allowing a module to be changed without disturbing the other parts; and c) understandability, allowing one module of the system to be studied at a time. These basic software design characteristics are achieved through the design of quasi-decomposable systems, whose theoretical model was introduced by Simon in his search for a general theory of systems. In the field of software design, Parnas proposed a practical way to achieve quasi-decomposable systems, called the Information Hiding Principle. The Information Hiding Principle is a different criterion for decomposing a system into modules, whose application yields the desirable characteristics of an efficient design at the level of the development and maintenance process. The Principle and the object-oriented approach are related because the object-oriented approach facilitates the implementation of the Principle; this is why, as objects began to take hold, difficulties in learning object-oriented software design appeared alongside them, and they persist to this day, as reported in the literature. Difficulties in learning object-oriented software design have a great impact both in the classroom and in the profession. Detecting these difficulties will allow teachers to correct or redirect them before they are carried over to industry; industry, in turn, can be warned of potential problems in the software development process. This thesis investigates the difficulties in learning object-oriented software design through an empirical study, conducted as a qualitative case study in three parts. The first was an initial study aimed at understanding the students' knowledge of the Information Hiding Principle before instruction began. The second was a study carried out throughout the period of instruction in order to identify the software design difficulties and their level of persistence. The third studied the essential learning difficulties and their possible origins. The participants belonged to the Software Design course of the European Master in Software Engineering at the Escuela Técnica Superior de Ingenieros Informáticos of the Universidad Politécnica de Madrid. The qualitative data used for the analysis came from observations during lectures and student presentations, interviews with the students, and exercises submitted over the period of instruction. The difficulties presented in this thesis, from their different perspectives, provide concrete knowledge of a particular case study and make relevant contributions to software design, teaching, industry, and methodology.
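As a minimal illustration of the Information Hiding Principle discussed in this abstract (a generic sketch, not material from the thesis), the following Python fragment hides a module's data representation behind a small, stable interface; the class name and the dictionary-based representation are assumptions made purely for the example.

```python
class SymbolTable:
    """Hides its storage representation behind a stable interface.

    Clients depend only on add() and lookup(); the internal choice of a
    dict (or, later, a sorted list or a trie) can change without touching
    client code: the data representation is the module's design secret.
    """

    def __init__(self):
        self._entries = {}          # hidden design decision: a hash map

    def add(self, name, value):
        self._entries[name] = value

    def lookup(self, name):
        return self._entries.get(name)


# Client code is written purely against the interface:
table = SymbolTable()
table.add("x", 42)
print(table.lookup("x"))            # -> 42
```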
Abstract:
This thesis focuses on the analysis of two complementary aspects of cybercrime (that is, crime perpetrated through the network to make money). These two aspects are the infected machines used to obtain economic profit from crime through different actions (for example, clickfraud, DDoS, spam) and the server infrastructure used to manage these machines (for example, C&C servers, exploit servers, monetization servers, redirectors). The first part investigates the exposure of victim computers to threats. For this analysis we used the metadata contained in WINE-BR, a Symantec dataset. This dataset contains installation metadata of executable files (for example, file hash, publisher, installation date, file name, file version) from 8.4 million Windows users. We associated this metadata with the vulnerabilities in the National Vulnerability Database (NVD) and the Open Sourced Vulnerability Database (OSVDB) in order to track vulnerability decay over time and to observe how quickly users patch their systems and, therefore, their exposure to possible attacks. We identified three factors that can influence the patching activity of victim computers: shared code, the type of user, and exploits. We present two new attacks against shared code and an analysis of how user knowledge and exploit availability influence patching activity. For the 80 vulnerabilities in our database that affect code shared between two applications, the time between the patch releases for the different applications is up to 118 days (with a median of 11 days). The second part proposes new active probing techniques to detect and analyze malicious server infrastructures. We leverage active probing to detect malicious servers on the Internet. We start with the analysis and detection of exploit server operations. We identify as an operation the servers that are controlled by the same people and possibly take part in the same infection campaign. We analyzed a total of 500 exploit servers over a period of one year, where 2/3 of the operations had a single server and 1/3 had multiple servers. We extended the technique for detecting exploit servers to other server types (for example, C&C servers, monetization servers, redirectors) and achieved Internet-scale probing for the different categories of malicious servers. These new techniques have been incorporated into a new tool called CyberProbe. To detect these servers we developed a novel technique called Adversarial Fingerprint Generation, a methodology for generating a unique request-response model that identifies a server family (that is, the type and the operation to which the server belongs). Starting from a malware file and an active server of a given family, CyberProbe can generate a valid fingerprint to detect all the live servers of that family. We performed 11 Internet-wide scans, detecting 151 malicious servers; of these 151 servers, 75% were unknown to public databases of malicious servers.
Another issue that arises while detecting malicious servers is that some of them may be hidden behind a silent reverse proxy. To measure the prevalence of this network configuration and to improve the capabilities of CyberProbe, we developed RevProbe, a new tool that detects reverse proxies by exploiting leaks in the configuration of web reverse proxies. RevProbe identifies that 16% of the active malicious IP addresses analyzed correspond to reverse proxies, that 92% of them are silent compared to 55% for benign reverse proxies, and that they are mainly used for load balancing across multiple servers. ABSTRACT In this dissertation we investigate two fundamental aspects of cybercrime: the infection of machines used to monetize the crime and the malicious server infrastructures that are used to manage the infected machines. In the first part of this dissertation, we analyze how fast software vendors apply patches to secure client applications, identifying shared code as an important factor in patch deployment. Shared code is code present in multiple programs. When a vulnerability affects shared code, the usual linear vulnerability life cycle no longer describes how patch deployment takes place. In this work we show the consequences of shared code vulnerabilities and demonstrate two novel attacks that can be used to exploit this condition. In the second part of this dissertation we analyze malicious server infrastructures. Our contributions are: a technique to cluster exploit server operations, a tool named CyberProbe that performs large-scale detection of different categories of malicious servers, and RevProbe, a tool that detects silent reverse proxies. We start by identifying exploit server operations, that is, exploit servers managed by the same people. We investigate a total of 500 exploit servers over a period of more than 13 months. We collected malware from these servers and all the metadata related to the communication with the servers. Using this metadata we extracted different features to group together servers managed by the same entity (i.e., an exploit server operation), and we discovered that 2/3 of the operations have a single server while 1/3 have multiple servers. Next, we present CyberProbe, a tool that detects different malicious server types through a novel technique called adversarial fingerprint generation (AFG). The idea behind CyberProbe's AFG is to run a piece of malware and observe its network communication towards malicious servers. It then replays this communication to the malicious server and outputs a fingerprint (i.e., a port selection function, a probe generation function, and a signature generation function). Once the fingerprint is generated, CyberProbe scans the Internet with the fingerprint and finds all the servers of a given family. We performed a total of 11 Internet-wide scans, finding 151 new servers starting from 15 seed servers. This gives CyberProbe a 10 times amplification factor. Moreover, we compared CyberProbe with existing blacklists on the Internet, finding that only 40% of the servers detected by CyberProbe were listed. To enhance the capabilities of CyberProbe we developed RevProbe, a reverse proxy detection tool that can be integrated with CyberProbe to allow precise detection of silent reverse proxies used to hide malicious servers.
RevProbe leverages leakage-based detection techniques to detect whether a malicious server is hidden behind a silent reverse proxy and to reveal the server infrastructure behind it. At the core of RevProbe is the analysis of differences in the traffic observed when interacting with a remote server.
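To make the fingerprint idea concrete, here is a small, hypothetical Python sketch in the spirit of adversarial fingerprint generation as summarised above: a fingerprint bundles a target port, a probe builder, and a response signature, and scanning simply replays the probe to candidate hosts and keeps those whose reply matches. The family name, probe bytes, signature, and addresses are all invented for illustration; this is not CyberProbe's actual implementation.

```python
import re
import socket
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Fingerprint:
    """Request-response model for one malicious-server family (illustrative)."""
    family: str
    port: int                               # port selection
    build_probe: Callable[[], bytes]        # probe generation
    signature: re.Pattern                   # signature matching on the reply

def scan(targets: Iterable[str], fp: Fingerprint, timeout: float = 3.0):
    """Replay the probe to each target and report hosts whose reply matches."""
    hits = []
    for host in targets:
        try:
            with socket.create_connection((host, fp.port), timeout=timeout) as s:
                s.sendall(fp.build_probe())
                reply = s.recv(4096)
            if fp.signature.search(reply):
                hits.append(host)
        except OSError:
            pass                            # unreachable or filtered host
    return hits

# Hypothetical fingerprint: an HTTP probe whose reply betrays a C&C family.
fake_family = Fingerprint(
    family="example-cc",
    port=80,
    build_probe=lambda: b"GET /gate.php?id=00 HTTP/1.1\r\nHost: x\r\n\r\n",
    signature=re.compile(rb"X-Bot-Task:"),
)
# print(scan(["203.0.113.10", "203.0.113.11"], fake_family))
```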
Abstract:
Tree nut allergies are considered an important health issue in developed countries. To comply with the regulations on food labeling, reliable allergen detection methods are required. In this work we isolated almond-specific recombinant antibody fragments (scFv) from a commercial phage display library, bypassing the use of live animals and hence being consistent with the latest policies on animal welfare. To this end, an iterative selection procedure employing the Tomlinson I phage display library and a crude almond protein extract was carried out. Two different almond-specific scFv (named PD1F6 and PD2C9) were isolated after two rounds of biopanning, and an indirect phage ELISA was implemented to detect the presence of almond protein in foodstuffs. The isolated scFvs proved to be highly specific and allowed detection of 40 ng mL−1 and 100 ng mL−1 of raw and roasted almond protein, respectively. The practical detection limit of the assay in almond-spiked food products was 0.1 mg g−1 (110–120 ppm). The developed indirect phage ELISA was validated by analysis of 92 commercial food products, showing good correlation with the results obtained by a previously developed real-time PCR method for the detection of almond in foodstuffs. The selected phage clones can be affinity matured to improve their sensitivity and genetically engineered to be employed in different assay formats.
Abstract:
Detection of loss of heterozygosity (LOH) by comparison of normal and tumor genotypes using PCR-based microsatellite loci provides considerable advantages over traditional Southern blotting-based approaches. However, current methodologies are limited by several factors, including the numbers of loci that can be evaluated for LOH in a single experiment, the discrimination of true alleles versus "stutter bands," and the use of radionucleotides in detecting PCR products. Here we describe methods for high throughput simultaneous assessment of LOH at multiple loci in human tumors; these methods rely on the detection of amplified microsatellite loci by fluorescence-based DNA sequencing technology. Data generated by this approach are processed by several computer software programs that enable the automated linear quantitation and calculation of allelic ratios, allowing rapid ascertainment of LOH. As a test of this approach, genotypes at a series of loci on chromosome 4 were determined for 58 carcinomas of the uterine cervix. The results underscore the efficacy, sensitivity, and remarkable reproducibility of this approach to LOH detection and provide subchromosomal localization of two regions of chromosome 4 commonly altered in cervical tumors.
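The allelic-ratio computation mentioned above can be pictured with a short, hypothetical sketch: given fluorescence peak heights for the two alleles of an informative locus in matched normal and tumor samples, an imbalance ratio is computed and compared against a cut-off. The 0.5 threshold (and its reciprocal) is a commonly used convention assumed here, not necessarily the value used in this study.

```python
def allelic_imbalance_ratio(normal_a1, normal_a2, tumor_a1, tumor_a2):
    """Ratio of tumor allele balance to normal allele balance.

    Inputs are fluorescence peak heights (or areas) of the two alleles at an
    informative (heterozygous) microsatellite locus.
    """
    return (tumor_a1 / tumor_a2) / (normal_a1 / normal_a2)

def call_loh(normal_a1, normal_a2, tumor_a1, tumor_a2, cutoff=0.5):
    """Call LOH when one allele is disproportionately lost in the tumor."""
    r = allelic_imbalance_ratio(normal_a1, normal_a2, tumor_a1, tumor_a2)
    return r <= cutoff or r >= 1.0 / cutoff

# Example: allele 1 nearly lost in the tumor relative to the matched normal.
print(call_loh(normal_a1=1050, normal_a2=980, tumor_a1=240, tumor_a2=910))  # True
```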
Abstract:
In the chemical textile domain, experts have to analyse chemical components and substances that might be harmful for their usage in clothing and textiles. Part of this analysis is performed by searching for opinions and reports people have expressed concerning these products in the Social Web. However, this type of information on the Internet is not as frequent for this domain as for others, so its detection and classification is difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough, and could lead to health problems, such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation, which focuses on sentiment classification and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.
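As a rough, hypothetical illustration of the kind of supervised pipeline described above, the sketch below trains a small bag-of-words polarity/complaint classifier with scikit-learn; the inline sentences and labels are invented, and the lexical resources such as WordNet that the framework also exploits are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: 1 = negative/complaint, 0 = non-negative.
sentences = [
    "this shirt gave me a rash, the dye must contain something harmful",
    "the fabric smells of chemicals and irritated my skin",
    "very comfortable t-shirt, no problems after many washes",
    "great quality trousers, the material feels safe and soft",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

print(model.predict(["the colour treatment burned my skin"]))   # likely [1]
```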
Abstract:
When analysing software metrics, users find that visualisation tools lack support for (1) the detection of patterns within metrics and (2) the analysis of software corpora. In this paper we present Explora, a visualisation tool designed for the simultaneous analysis of multiple metrics of systems in software corpora. Explora incorporates a novel lightweight visualisation technique called PolyGrid that promotes the detection of graphical patterns. We present an example where we analyse the relation of subtype polymorphism with inheritance and invocation in corpora of Smalltalk and Java systems and find that (1) subtype polymorphism is more likely to be found in large hierarchies; (2) as class hierarchies grow horizontally, they also do so vertically; and (3) in polymorphic hierarchies the length of the name of the classes is orthogonal to the cardinality of the call sites.
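A toy sketch of the kind of per-hierarchy measurements that such corpus analyses correlate (vertical depth versus horizontal breadth of an inheritance tree) is given below; the class names and the parent map are invented and nothing here is taken from Explora or PolyGrid.

```python
from collections import defaultdict

# Hypothetical class hierarchy: child -> parent (None marks the root).
parents = {
    "Object": None,
    "Collection": "Object", "Sequence": "Collection", "Array": "Sequence",
    "OrderedList": "Sequence", "Set": "Collection", "Bag": "Collection",
    "Magnitude": "Object", "Number": "Magnitude", "Integer": "Number",
}

children = defaultdict(list)
for cls, parent in parents.items():
    if parent is not None:
        children[parent].append(cls)

def depth(cls):
    """Number of inheritance links from the root down to cls."""
    return 0 if parents[cls] is None else 1 + depth(parents[cls])

# Vertical size: deepest class; horizontal size: widest group of siblings.
hierarchy_depth = max(depth(c) for c in parents)
hierarchy_width = max(len(v) for v in children.values())
print(hierarchy_depth, hierarchy_width)   # 3 and 3 for this toy tree
```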
Abstract:
Transportation Department, Office of University Research, Washington, D.C.
Abstract:
Background: This paper describes SeqDoC, a simple, web-based tool to carry out direct comparison of ABI sequence chromatograms. This allows the rapid identification of single nucleotide polymorphisms (SNPs) and point mutations without the need to install or learn more complicated analysis software. Results: SeqDoC produces a subtracted trace showing differences between a reference and test chromatogram, and is optimised to emphasise those characteristic of single base changes. It automatically aligns sequences, and produces straightforward graphical output. The use of direct comparison of the sequence chromatograms means that artefacts introduced by automatic base-calling software are avoided. Homozygous and heterozygous substitutions and insertion/deletion events are all readily identified. SeqDoC successfully highlights nucleotide changes missed by the Staden package 'tracediff' program. Conclusion: SeqDoC is ideal for small-scale SNP identification, for identification of changes in random mutagenesis screens, and for verification of PCR amplification fidelity. Differences are highlighted, not interpreted, allowing the investigator to make the ultimate decision on the nature of the change.
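The core operation, subtracting an aligned test trace from a reference trace so that single-base differences stand out, can be sketched as follows; the arrays are synthetic stand-ins for real ABI chromatogram channels, and SeqDoC's alignment and smoothing steps are omitted.

```python
import numpy as np

# Synthetic single-channel traces: the test trace has an extra peak near
# position 30, where a substitution would change the local signal.
x = np.arange(100)
reference = np.exp(-((x - 50) ** 2) / 20.0)
test = reference + 0.8 * np.exp(-((x - 30) ** 2) / 10.0)

# Subtracted trace: values far from zero flag positions where the test
# chromatogram differs from the reference (a candidate SNP / point mutation).
difference = test - reference
candidates = np.where(np.abs(difference) > 0.4)[0]
print(candidates)   # positions around 30
```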
Abstract:
Background: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. Results: GANN (available at http://bioinformatics.org.au/gann) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. Conclusion: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.
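A drastically simplified, hypothetical version of the encoding step described above (turning a DNA region into numeric indices before classification) is sketched below, using GC content and a dinucleotide frequency as stand-ins for GANN's real sequence and structural indices; the example sequences, labels, and the logistic-regression classifier are illustrative only.

```python
from sklearn.linear_model import LogisticRegression

def composition_indices(seq):
    """Map a DNA sequence to a few toy numeric indices.

    Stand-ins for the sequence/structure indices used by tools like GANN:
    GC content and the frequency of the flexible TA dinucleotide step.
    """
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    ta_steps = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA")
    return [gc, ta_steps / (len(seq) - 1)]

# Hypothetical positive (binding-site-like) and negative (background) examples.
positives = ["TATAATGGGC", "TTATAATGCC"]
negatives = ["GCGCGGGCCG", "CCGGCGCGGG"]

X = [composition_indices(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

clf = LogisticRegression().fit(X, y)
print(clf.predict([composition_indices("ATATAATGCG")]))   # likely [1]
```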
Abstract:
Automatic signature verification is a well-established and active area of research with numerous applications such as bank check verification, ATM access, etc. This paper proposes a novel approach to the problem of automatic off-line signature verification and forgery detection. The proposed approach is based on fuzzy modeling that employs the Takagi-Sugeno (TS) model. Signature verification and forgery detection are carried out using angle features extracted with a box approach. Each feature corresponds to a fuzzy set. The features are fuzzified by an exponential membership function involved in the TS model, which is modified to include structural parameters. The structural parameters are devised to take account of possible variations due to handwriting styles and to reflect moods. The membership functions constitute weights in the TS model. The optimization of the output of the TS model with respect to the structural parameters yields the solution for the parameters. We have also derived two TS models by considering a rule for each input feature in the first formulation (multiple rules) and a single rule for all input features in the second formulation. In this work, we have found that the TS model with multiple rules is better than the TS model with a single rule for detecting three types of forgeries (random, skilled, and unskilled) from a large database of sample signatures, in addition to verifying genuine signatures. We have also devised three approaches, viz., an innovative approach and two intuitive approaches, using the TS model with multiple rules for improved performance.
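To give a flavour of the computation, the sketch below reduces the approach to its simplest ingredient: exponential memberships of a sample's angle features in fuzzy sets learned from genuine signatures, averaged into a match score. The TS-model consequents, the structural parameters, and their optimization are deliberately left out, and every number is invented.

```python
import math

def exp_membership(x, center, spread):
    """Exponential membership of a feature value x in its fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * spread ** 2))

def match_score(features, fuzzy_sets):
    """Average membership of the sample's angle features in the fuzzy sets
    learned from genuine signatures (a plain stand-in for the TS-model output)."""
    memberships = [exp_membership(x, c, s) for x, (c, s) in zip(features, fuzzy_sets)]
    return sum(memberships) / len(memberships)

# Invented fuzzy sets (center, spread) for three angle features of one writer.
fuzzy_sets = [(0.62, 0.10), (0.35, 0.08), (0.80, 0.12)]

genuine_like = [0.60, 0.37, 0.78]
forgery_like = [0.20, 0.70, 0.40]

print(round(match_score(genuine_like, fuzzy_sets), 3))   # close to 1 -> accept
print(round(match_score(forgery_like, fuzzy_sets), 3))   # near 0 -> reject
```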
Abstract:
Antigenic variation in Plasmodium falciparum erythrocyte membrane protein 1, caused by a switch in transcription of the encoding var gene, is an important feature of malaria. In this study, we quantified the relative abundance of var gene transcripts present in P. falciparum parasite clones using real-time reverse transcription-polymerase chain reaction (RT-PCR) and conventional RT-PCR combined with cloning and sequencing, with the aim of directly comparing the results obtained. When there was sufficient abundance of RNA for the real-time RT-PCR assay to be operating within the region of good reproducibility, RT-PCR and real-time RT-PCR tended to identify the same dominant transcript, although some transcript-specific issues were identified. When there were differences in the estimated relative amounts of minor transcripts, the RT-PCR assay tended to produce higher estimates than real-time RT-PCR. These results provide valuable information comparing RT-PCR and real-time RT-PCR analysis of samples with small quantities of RNA as might be expected in the analysis of field or clinical samples.
Abstract:
This paper presents an innovative approach for signature verification and forgery detection based on fuzzy modeling. The signature image is binarized, resized to a fixed-size window, and then thinned. The thinned image is partitioned into eight sub-images called boxes; this partition is done using the horizontal density approximation approach. Each sub-image is then further resized and partitioned into twelve further sub-images using the uniform partitioning approach. The feature considered is the normalized vector angle (α) from each box. Each feature extracted from the sample signatures gives rise to a fuzzy set. Since the choice of a proper fuzzification function is crucial for verification, we have devised a new fuzzification function with structural parameters, which is able to adapt to the variations in fuzzy sets. This function is employed to develop a complete forgery detection and verification system.
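The box-based feature extraction can be made concrete with the hypothetical sketch below, which partitions a binarized signature image into a uniform grid of boxes and derives one angle per box from the centroid of the ink pixels; the thinning step, the horizontal-density partitioning, and the paper's exact angle definition are simplified away.

```python
import numpy as np

def box_angle_features(binary_img, rows=2, cols=4):
    """One normalized angle per box: direction of the ink centroid as seen
    from the box's lower-left corner, scaled to [0, 1] (angle / 90 degrees)."""
    h, w = binary_img.shape
    features = []
    for r in range(rows):
        for c in range(cols):
            box = binary_img[r * h // rows:(r + 1) * h // rows,
                             c * w // cols:(c + 1) * w // cols]
            ys, xs = np.nonzero(box)
            if len(xs) == 0:
                features.append(0.0)                  # empty box: no ink
                continue
            dx = xs.mean() + 1.0                      # avoid a zero denominator
            dy = (box.shape[0] - 1) - ys.mean()       # measure up from the bottom
            angle = np.degrees(np.arctan2(dy, dx))
            features.append(float(angle) / 90.0)
    return features

# Tiny synthetic "signature": a diagonal stroke on a 20x40 binary canvas.
img = np.zeros((20, 40), dtype=np.uint8)
for i in range(20):
    img[19 - i, 2 * i] = 1

print(box_angle_features(img))   # 8 normalized angles, one per box
```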
Abstract:
The matched filter detector is well known as the optimum detector for use in communication, as well as in radar systems, for signals corrupted by Additive White Gaussian Noise (A.W.G.N.). Non-coherent F.S.K. and differentially coherent P.S.K. (D.P.S.K.) detection schemes, which employ a new approach to realizing the matched filter processor, are investigated. The new approach utilizes pulse compression techniques, well known in radar systems, to facilitate the implementation of the matched filter in the form of the Pulse Compressor Matched Filter (P.C.M.F.). Both detection schemes feature a mixer-P.C.M.F. compound as their predetector processor. The compound is utilized to convert F.S.K. modulation into pulse position modulation, and P.S.K. modulation into pulse polarity modulation. The mechanisms of both detection schemes are studied by examining the properties of the autocorrelation function (A.C.F.) at the output of the P.C.M.F. The effects produced by time delay and carrier interference on the output A.C.F. are determined. Work related to the F.S.K. detection scheme is mostly confined to verifying its validity, whereas the D.P.S.K. detection scheme has not been reported before. Consequently, an experimental system was constructed, which utilized combined hardware and software and operated under the supervision of a microprocessor system. The experimental system was used to develop error-rate models for both detection schemes under investigation. The performance of both F.S.K. and D.P.S.K. detection schemes was established in the presence of A.W.G.N., practical imperfections, time delay, and carrier interference. The results highlight the candidacy of both detection schemes for use in the field of digital data communication and, in particular, the D.P.S.K. detection scheme, which performed very close to optimum against a background of A.W.G.N.
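As a brief numerical illustration of the matched-filter/pulse-compression idea underlying the thesis (not its mixer-P.C.M.F. hardware), the sketch below correlates a noisy, delayed linear chirp against a stored replica; the compressed output peaks at the true delay. All waveform parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference waveform: a linear FM chirp, the kind of signal pulse
# compression is designed for.
n = 256
t = np.arange(n) / n
chirp = np.cos(2 * np.pi * (5 * t + 40 * t ** 2))

# Received signal: the chirp delayed by 300 samples, buried in A.W.G.N.
delay = 300
received = rng.normal(0.0, 1.0, 1024)
received[delay:delay + n] += chirp

# Matched filtering: convolve with the time-reversed replica, which is
# equivalent to correlating with the replica itself.  The compressed
# output peaks where the chirp actually starts.
compressed = np.correlate(received, chirp, mode="valid")
print(int(np.argmax(compressed)))   # about 300
```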
Abstract:
The requirement for systems to continue to operate satisfactorily in the presence of faults has led to the development of techniques for the construction of fault-tolerant software. This thesis addresses the problem of error detection and recovery in distributed systems which consist of a set of communicating sequential processes. A method is presented for the `a priori' design of conversations for this class of distributed system. Petri nets are used to represent the state and to solve state reachability problems for concurrent systems. The dynamic behaviour of the system can be characterised by a state-change table derived from the state reachability tree. Systematic conversation generation is possible by defining a closed boundary on any branch of the state-change table. By relating the state-change table to process attributes, the method ensures that all necessary processes are included in the conversation. The method also ensures properly nested conversations. An implementation of the conversation scheme using the concurrent language occam is proposed. The structure of the conversation is defined using the special features of occam. The proposed implementation gives a structure which is independent of the application and of the number of processes involved. Finally, the integrity of inter-process communications is investigated. The basic communication primitives used in message-passing systems are seen to have deficiencies when applied to systems with safety implications. Using a Petri net model, a boundary for a time-out mechanism is proposed which will increase the integrity of a system which involves inter-process communications.
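As a small, self-contained illustration of the reachability analysis the method relies on (not the thesis's conversation-generation procedure itself), the sketch below enumerates the reachable markings of a toy place/transition net by breadth-first search; the net and initial marking are invented.

```python
from collections import deque

# Toy Petri net: transitions consume tokens from input places and produce
# tokens on output places.  Markings are tuples of token counts per place
# (places: p0, p1, p2).
places = ["p0", "p1", "p2"]
transitions = {
    "t1": ({"p0": 1}, {"p1": 1}),          # p0 -> p1
    "t2": ({"p1": 1}, {"p2": 1}),          # p1 -> p2
    "t3": ({"p2": 1}, {"p0": 1}),          # p2 -> p0  (a cycle)
}

def fire(marking, pre, post):
    """Return the successor marking, or None if the transition is not enabled."""
    m = dict(zip(places, marking))
    if any(m[p] < n for p, n in pre.items()):
        return None
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] += n
    return tuple(m[p] for p in places)

def reachable(initial):
    """Breadth-first enumeration of all markings reachable from `initial`."""
    seen, queue = {initial}, deque([initial])
    while queue:
        marking = queue.popleft()
        for pre, post in transitions.values():
            nxt = fire(marking, pre, post)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable((2, 0, 0))))   # e.g. (2,0,0), (1,1,0), (1,0,1), ...
```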