820 resultados para Search-based algorithms
Resumo:
The paper has been presented at the International Conference Pioneers of Bulgarian Mathematics, Dedicated to Nikola Obreshko and Lubomir Tschakalo , So a, July, 2006.
Resumo:
Reverse engineering is usually the stepping stone of a variety of at-tacks aiming at identifying sensitive information (keys, credentials, data, algo-rithms) or vulnerabilities and flaws for broader exploitation. Software applica-tions are usually deployed as identical binary code installed on millions of com-puters, enabling an adversary to develop a generic reverse-engineering strategy that, if working on one code instance, could be applied to crack all the other in-stances. A solution to mitigate this problem is represented by Software Diversity, which aims at creating several structurally different (but functionally equivalent) binary code versions out of the same source code, so that even if a successful attack can be elaborated for one version, it should not work on a diversified ver-sion. In this paper, we address the problem of maximizing software diversity from a search-based optimization point of view. The program to protect is subject to a catalogue of transformations to generate many candidate versions. The problem of selecting the subset of most diversified versions to be deployed is formulated as an optimisation problem, that we tackle with different search heuristics. We show the applicability of this approach on some popular Android apps.
Resumo:
Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.
Resumo:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach, when using RL, has been to apply value function based algorithms, the system here detailed is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task
Resumo:
Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.
Resumo:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach, when using RL, has been to apply value function based algorithms, the system here detailed is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task
Resumo:
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which will further explore the parameter space. In this proposal, we envisioned that the initial solutions provided by meta-learning are located in good regions of the search space (i.e. they are closer to optimum solutions). Hence, the search algorithm would need to evaluate a lower number of candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates when compared to their components applied in isolation.
Resumo:
Solution of structural reliability problems by the First Order method require optimization algorithms to find the smallest distance between a limit state function and the origin of standard Gaussian space. The Hassofer-Lind-Rackwitz-Fiessler (HLRF) algorithm, developed specifically for this purpose, has been shown to be efficient but not robust, as it fails to converge for a significant number of problems. On the other hand, recent developments in general (augmented Lagrangian) optimization techniques have not been tested in aplication to structural reliability problems. In the present article, three new optimization algorithms for structural reliability analysis are presented. One algorithm is based on the HLRF, but uses a new differentiable merit function with Wolfe conditions to select step length in linear search. It is shown in the article that, under certain assumptions, the proposed algorithm generates a sequence that converges to the local minimizer of the problem. Two new augmented Lagrangian methods are also presented, which use quadratic penalties to solve nonlinear problems with equality constraints. Performance and robustness of the new algorithms is compared to the classic augmented Lagrangian method, to HLRF and to the improved HLRF (iHLRF) algorithms, in the solution of 25 benchmark problems from the literature. The new proposed HLRF algorithm is shown to be more robust than HLRF or iHLRF, and as efficient as the iHLRF algorithm. The two augmented Lagrangian methods proposed herein are shown to be more robust and more efficient than the classical augmented Lagrangian method.
Resumo:
To promote the use of bicycle transportation mode in times of increasing urban traffic congestion, Broward County Metropolitan Planning Organization funded the development of a Web-based trip planner for cyclists. This presentation demonstrates the integration of the ArcGIS Server 9.3 environment with the ArcGIS JavaScript Extension for Google Maps API and the Google Local Search Control for Maps API. This allows the use of Google mashup GIS functionality, i.e., Google local search for selection of trip start, trip destination, and intermediate waypoints, and the integration of Google Maps base layers. The ArcGIS Network Analyst extension is used for the route search, where algorithms for fastest, safest, simplest, most scenic, and shortest routes are imbedded. This presentation also describes how attributes of the underlying network sources have been combined to facilitate the search for optimized routes.
Resumo:
To promote the use of bicycle transportation mode in times of increasing urban traffic congestion, Broward County Metropolitan Planning Organization funded the development of a Web-based trip planner for cyclists. This presentation demonstrates the integration of the ArcGIS Server 9.3 environment with the ArcGIS JavaScript Extension for Google Maps API and the Google Local Search Control for Maps API. This allows the use of Google mashup GIS functionality, i.e., Google local search for selection of trip start, trip destination, and intermediate waypoints, and the integration of Google Maps base layers. The ArcGIS Network Analyst extension is used for the route search, where algorithms for fastest, safest, simplest, most scenic, and shortest routes are imbedded. This presentation also describes how attributes of the underlying network sources have been combined to facilitate the search for optimized routes.
Resumo:
In practical applications of optimization it is common to have several conflicting objective functions to optimize. Frequently, these functions are subject to noise or can be of black-box type, preventing the use of derivative-based techniques. We propose a novel multiobjective derivative-free methodology, calling it direct multisearch (DMS), which does not aggregate any of the objective functions. Our framework is inspired by the search/poll paradigm of direct-search methods of directional type and uses the concept of Pareto dominance to maintain a list of nondominated points (from which the new iterates or poll centers are chosen). The aim of our method is to generate as many points in the Pareto front as possible from the polling procedure itself, while keeping the whole framework general enough to accommodate other disseminating strategies, in particular, when using the (here also) optional search step. DMS generalizes to multiobjective optimization (MOO) all direct-search methods of directional type. We prove under the common assumptions used in direct search for single objective optimization that at least one limit point of the sequence of iterates generated by DMS lies in (a stationary form of) the Pareto front. However, extensive computational experience has shown that our methodology has an impressive capability of generating the whole Pareto front, even without using a search step. Two by-products of this paper are (i) the development of a collection of test problems for MOO and (ii) the extension of performance and data profiles to MOO, allowing a comparison of several solvers on a large set of test problems, in terms of their efficiency and robustness to determine Pareto fronts.
Resumo:
This paper introduces a new unsupervised hyperspectral unmixing method conceived to linear but highly mixed hyperspectral data sets, in which the simplex of minimum volume, usually estimated by the purely geometrically based algorithms, is far way from the true simplex associated with the endmembers. The proposed method, an extension of our previous studies, resorts to the statistical framework. The abundance fraction prior is a mixture of Dirichlet densities, thus automatically enforcing the constraints on the abundance fractions imposed by the acquisition process, namely, nonnegativity and sum-to-one. A cyclic minimization algorithm is developed where the following are observed: 1) The number of Dirichlet modes is inferred based on the minimum description length principle; 2) a generalized expectation maximization algorithm is derived to infer the model parameters; and 3) a sequence of augmented Lagrangian-based optimizations is used to compute the signatures of the endmembers. Experiments on simulated and real data are presented to show the effectiveness of the proposed algorithm in unmixing problems beyond the reach of the geometrically based state-of-the-art competitors.
Resumo:
O escalonamento é uma das decisões mais importantes no funcionamento de uma linha de produção. No âmbito desta dissertação foi realizada uma descrição do problema do escalonamento, identificando alguns métodos para a optimização dos problemas de escalonamento. Foi realizado um estudo ao caso do problema de máquina única através do teste de várias instâncias com o objectivo de minimizar o atraso pesado, aplicando uma Meta-Heurística baseada na Pesquisa Local e dois algoritmos baseados no SB. Os resultados obtidos reflectem que os algoritmos baseados no SB apresentaram resultados mais próximos do óptimo, em relação ao algoritmo baseado na PL. Os resultados obtidos permitem sustentar a hipótese de não existirem algoritmos específicos para os problemas de escalonamento. A melhor forma de encontrar uma solução de boa qualidade em tempo útil é experimentar diferentes algoritmos e comparar o desempenho das soluções obtidas.
Resumo:
We present new metaheuristics for solving real crew scheduling problemsin a public transportation bus company. Since the crews of thesecompanies are drivers, we will designate the problem by the bus-driverscheduling problem. Crew scheduling problems are well known and severalmathematical programming based techniques have been proposed to solvethem, in particular using the set-covering formulation. However, inpractice, there exists the need for improvement in terms of computationalefficiency and capacity of solving large-scale instances. Moreover, thereal bus-driver scheduling problems that we consider can present variantaspects of the set covering, as for example a different objectivefunction, implying that alternative solutions methods have to bedeveloped. We propose metaheuristics based on the following approaches:GRASP (greedy randomized adaptive search procedure), tabu search andgenetic algorithms. These metaheuristics also present some innovationfeatures based on and genetic algorithms. These metaheuristics alsopresent some innovation features based on the structure of the crewscheduling problem, that guide the search efficiently and able them tofind good solutions. Some of these new features can also be applied inthe development of heuristics to other combinatorial optimizationproblems. A summary of computational results with real-data problems ispresented.
Resumo:
PURPOSE: To assess how different diagnostic decision aids perform in terms of sensitivity, specificity, and harm. METHODS: Four diagnostic decision aids were compared, as applied to a simulated patient population: a findings-based algorithm following a linear or branched pathway, a serial threshold-based strategy, and a parallel threshold-based strategy. Headache in immune-compromised HIV patients in a developing country was used as an example. Diagnoses included cryptococcal meningitis, cerebral toxoplasmosis, tuberculous meningitis, bacterial meningitis, and malaria. Data were derived from literature and expert opinion. Diagnostic strategies' validity was assessed in terms of sensitivity, specificity, and harm related to mortality and morbidity. Sensitivity analyses and Monte Carlo simulation were performed. RESULTS: The parallel threshold-based approach led to a sensitivity of 92% and a specificity of 65%. Sensitivities of the serial threshold-based approach and the branched and linear algorithms were 47%, 47%, and 74%, respectively, and the specificities were 85%, 95%, and 96%. The parallel threshold-based approach resulted in the least harm, with the serial threshold-based approach, the branched algorithm, and the linear algorithm being associated with 1.56-, 1.44-, and 1.17-times higher harm, respectively. Findings were corroborated by sensitivity and Monte Carlo analyses. CONCLUSION: A threshold-based diagnostic approach is designed to find the optimal trade-off that minimizes expected harm, enhancing sensitivity and lowering specificity when appropriate, as in the given example of a symptom pointing to several life-threatening diseases. Findings-based algorithms, in contrast, solely consider clinical observations. A parallel workup, as opposed to a serial workup, additionally allows for all potential diseases to be reviewed, further reducing false negatives. The parallel threshold-based approach might, however, not be as good in other disease settings.