915 resultados para random search algorithms
Resumo:
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which will further explore the parameter space. In this proposal, we envisioned that the initial solutions provided by meta-learning are located in good regions of the search space (i.e. they are closer to optimum solutions). Hence, the search algorithm would need to evaluate a lower number of candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates when compared to their components applied in isolation.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.
Resumo:
A tandem mass spectral database system consists of a library of reference spectra and a search program. State-of-the-art search programs show a high tolerance for variability in compound-specific fragmentation patterns produced by collision-induced decomposition and enable sensitive and specific 'identity search'. In this communication, performance characteristics of two search algorithms combined with the 'Wiley Registry of Tandem Mass Spectral Data, MSforID' (Wiley Registry MSMS, John Wiley and Sons, Hoboken, NJ, USA) were evaluated. The search algorithms tested were the MSMS search algorithm implemented in the NIST MS Search program 2.0g (NIST, Gaithersburg, MD, USA) and the MSforID algorithm (John Wiley and Sons, Hoboken, NJ, USA). Sample spectra were acquired on different instruments and, thus, covered a broad range of possible experimental conditions or were generated in silico. For each algorithm, more than 30,000 matches were performed. Statistical evaluation of the library search results revealed that principally both search algorithms can be combined with the Wiley Registry MSMS to create a reliable identification tool. It appears, however, that a higher degree of spectral similarity is necessary to obtain a correct match with the NIST MS Search program. This characteristic of the NIST MS Search program has a positive effect on specificity as it helps to avoid false positive matches (type I errors), but reduces sensitivity. Thus, particularly with sample spectra acquired on instruments differing in their Setup from tandem-in-space type fragmentation, a comparably higher number of false negative matches (type II errors) were observed by searching the Wiley Registry MSMS.
Resumo:
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems in across a broad range of bioinformatics problems. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain.
Resumo:
Hardware/Software partitioning (HSP) is a key task for embedded system co-design. The main goal of this task is to decide which components of an application are to be executed in a general purpose processor (software) and which ones, on a specific hardware, taking into account a set of restrictions expressed by metrics. In last years, several approaches have been proposed for solving the HSP problem, directed by metaheuristic algorithms. However, due to diversity of models and metrics used, the choice of the best suited algorithm is an open problem yet. This article presents the results of applying a fuzzy approach to the HSP problem. This approach is more flexible than many others due to the fact that it is possible to accept quite good solutions or to reject other ones which do not seem good. In this work we compare six metaheuristic algorithms: Random Search, Tabu Search, Simulated Annealing, Hill Climbing, Genetic Algorithm and Evolutionary Strategy. The presented model is aimed to simultaneously minimize the hardware area and the execution time. The obtained results show that Restart Hill Climbing is the best performing algorithm in most cases.
Resumo:
El particionado hardware/software es una tarea fundamental en el co-diseño de sistemas embebidos. En ella se decide, teniendo en cuenta las métricas de diseño, qué componentes se ejecutarán en un procesador de propósito general (software) y cuáles en un hardware específico. En los últimos años se han propuesto diversas soluciones al problema del particionado dirigidas por algoritmos metaheurísticos. Sin embargo, debido a la diversidad de modelos y métricas utilizadas, la elección del algoritmo más apropiado sigue siendo un problema abierto. En este trabajo se presenta una comparación de seis algoritmos metaheurísticos: Búsqueda aleatoria (Random search), Búsqueda tabú (Tabu search), Recocido simulado (Simulated annealing), Escalador de colinas estocástico (Stochastic hill climbing), Algoritmo genético (Genetic algorithm) y Estrategia evolutiva (Evolution strategy). El modelo utilizado en la comparación está dirigido a minimizar el área ocupada y el tiempo de ejecución, las restricciones del modelo son consideradas como penalizaciones para incluir en el espacio de búsqueda otras soluciones. Los resultados muestran que los algoritmos Escalador de colinas estocástico y Estrategia evolutiva son los que mejores resultados obtienen en general, seguidos por el Algoritmo genético.
Resumo:
The problem of finding the optimal join ordering executing a query to a relational database management system is a combinatorial optimization problem, which makes deterministic exhaustive solution search unacceptable for queries with a great number of joined relations. In this work an adaptive genetic algorithm with dynamic population size is proposed for optimizing large join queries. The performance of the algorithm is compared with that of several classical non-deterministic optimization algorithms. Experiments have been performed optimizing several random queries against a randomly generated data dictionary. The proposed adaptive genetic algorithm with probabilistic selection operator outperforms in a number of test runs the canonical genetic algorithm with Elitist selection as well as two common random search strategies and proves to be a viable alternative to existing non-deterministic optimization approaches.
Resumo:
Design as seen from the designer's perspective is a series of amazing imaginative jumps or creative leaps. But design as seen by the design historian is a smooth progression or evolution of ideas that they seem self-evident and inevitable after the event. But the next step is anything but obvious for the artist/creator/inventor/designer stuck at that point just before the creative leap. They know where they have come from and have a general sense of where they are going, but often do not have a precise target or goal. This is why it is misleading to talk of design as a problem-solving activity - it is better defined as a problem-finding activity. This has been very frustrating for those trying to assist the design process with computer-based, problem-solving techniques. By the time the problem has been defined, it has been solved. Indeed the solution is often the very definition of the problem. Design must be creative-or it is mere imitation. But since this crucial creative leap seem inevitable after the event, the question must arise, can we find some way of searching the space ahead? Of course there are serious problems of knowing what we are looking for and the vastness of the search space. It may be better to discard altogether the term "searching" in the context of the design process: Conceptual analogies such as search, search spaces and fitness landscapes aim to elucidate the design process. However, the vastness of the multidimensional spaces involved make these analogies misguided and they thereby actually result in further confounding the issue. The term search becomes a misnomer since it has connotations that imply that it is possible to find what you are looking for. In such vast spaces the term search must be discarded. Thus, any attempt at searching for the highest peak in the fitness landscape as an optimal solution is also meaningless. Futhermore, even the very existence of a fitness landscape is fallacious. Although alternatives in the same region of the vast space can be compared to one another, distant alternatives will stem from radically different roots and will therefore not be comparable in any straightforward manner (Janssen 2000). Nevertheless we still have this tantalizing possibility that if a creative idea seems inevitable after the event, then somehow might the process be rserved? This may be as improbable as attempting to reverse time. A more helpful analogy is from nature, where it is generally assumed that the process of evolution is not long-term goal directed or teleological. Dennett points out a common minsunderstanding of Darwinism: the idea that evolution by natural selection is a procedure for producing human beings. Evolution can have produced humankind by an algorithmic process, without its being true that evolution is an algorithm for producing us. If we were to wind the tape of life back and run this algorithm again, the likelihood of "us" being created again is infinitesimally small (Gould 1989; Dennett 1995). But nevertheless Mother Nature has proved a remarkably successful, resourceful, and imaginative inventor generating a constant flow of incredible new design ideas to fire our imagination. Hence the current interest in the potential of the evolutionary paradigm in design. These evolutionary methods are frequently based on techniques such as the application of evolutionary algorithms that are usually thought of as search algorithms. It is necessary to abandon such connections with searching and see the evolutionary algorithm as a direct analogy with the evolutionary processes of nature. The process of natural selection can generate a wealth of alternative experiements, and the better ones survive. There is no one solution, there is no optimal solution, but there is continuous experiment. Nature is profligate with her prototyping and ruthless in her elimination of less successful experiments. Most importantly, nature has all the time in the world. As designers we cannot afford prototyping and ruthless experiment, nor can we operate on the time scale of the natural design process. Instead we can use the computer to compress space and time and to perform virtual prototyping and evaluation before committing ourselves to actual prototypes. This is the hypothesis underlying the evolutionary paradigm in design (1992, 1995).
Resumo:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
Resumo:
Seaport container terminals are an important part of the logistics systems in international trades. This paper investigates the relationship between quay cranes, yard machines and container storage locations in a multi-berth and multi-ship environment. The aims are to develop a model for improving the operation efficiency of the seaports and to develop an analytical tool for yard operation planning. Due to the fact that the container transfer times are sequence-dependent and with the large number of variables involve, the proposed model cannot be solved in a reasonable time interval for realistically sized problems. For this reason, List Scheduling and Tabu Search algorithms have been developed to solve this formidable and NP-hard scheduling problem. Numerical implementations have been analysed and promising results have been achieved.
Resumo:
The main aim of this thesis is to analyse and optimise a public hospital Emergency Department. The Emergency Department (ED) is a complex system with limited resources and a high demand for these resources. Adding to the complexity is the stochastic nature of almost every element and characteristic in the ED. The interaction with other functional areas also complicates the system as these areas have a huge impact on the ED and the ED is powerless to change them. Therefore it is imperative that OR be applied to the ED to improve the performance within the constraints of the system. The main characteristics of the system to optimise included tardiness, adherence to waiting time targets, access block and length of stay. A validated and verified simulation model was built to model the real life system. This enabled detailed analysis of resources and flow without disruption to the actual ED. A wide range of different policies for the ED and a variety of resources were able to be investigated. Of particular interest was the number and type of beds in the ED and also the shift times of physicians. One point worth noting was that neither of these resources work in isolation and for optimisation of the system both resources need to be investigated in tandem. The ED was likened to a flow shop scheduling problem with the patients and beds being synonymous with the jobs and machines typically found in manufacturing problems. This enabled an analytic scheduling approach. Constructive heuristics were developed to reactively schedule the system in real time and these were able to improve the performance of the system. Metaheuristics that optimised the system were also developed and analysed. An innovative hybrid Simulated Annealing and Tabu Search algorithm was developed that out-performed both simulated annealing and tabu search algorithms by combining some of their features. The new algorithm achieves a more optimal solution and does so in a shorter time.
Resumo:
Multi-Objective optimization for designing of a benchmark cogeneration system known as CGAM cogeneration system has been performed. In optimization approach, the thermoeconomic and Environmental aspects have been considered, simultaneously. The environmental objective function has been defined and expressed in cost terms. One of the most suitable optimization techniques developed using a particular class of search algorithms known as; Multi-Objective Particle Swarm Optimization (MOPSO) algorithm has been used here. This approach has been applied to find the set of Pareto optimal solutions with respect to the aforementioned objective functions. An example of fuzzy decision-making with the aid of Bellman-Zadeh approach has been presented and a final optimal solution has been introduced.
Resumo:
Motivation: Gene silencing, also called RNA interference, requires reliable assessment of silencer impacts. A critical task is to find matches between silencer oligomers and sites in the genome, in accordance with one-to-many matching rules (G-U matching, with provision for mismatches). Fast search algorithms are required to support silencer impact assessments in procedures for designing effective silencer sequences.Results: The article presents a matching algorithm and data structures specialized for matching searches, including a kernel procedure that addresses a Boolean version of the database task called the skyline search. Besides exact matches, the algorithm is extended to allow for the location-specific mismatches applicable in plants. Computational tests show that the algorithm is significantly faster than suffix-tree alternatives. © The Author 2010. Published by Oxford University Press. All rights reserved.
Resumo:
Forest management is facing new challenges under climate change. By adjusting thinning regimes, conventional forest management can be adapted to various objectives of utilization of forest resources, such as wood quality, forest bioenergy, and carbon sequestration. This thesis aims to develop and apply a simulation-optimization system as a tool for an interdisciplinary understanding of the interactions between wood science, forest ecology, and forest economics. In this thesis, the OptiFor software was developed for forest resources management. The OptiFor simulation-optimization system integrated the process-based growth model PipeQual, wood quality models, biomass production and carbon emission models, as well as energy wood and commercial logging models into a single optimization model. Osyczka s direct and random search algorithm was employed to identify optimal values for a set of decision variables. The numerical studies in this thesis broadened our current knowledge and understanding of the relationships between wood science, forest ecology, and forest economics. The results for timber production show that optimal thinning regimes depend on site quality and initial stand characteristics. Taking wood properties into account, our results show that increasing the intensity of thinning resulted in lower wood density and shorter fibers. The addition of nutrients accelerated volume growth, but lowered wood quality for Norway spruce. Integrating energy wood harvesting into conventional forest management showed that conventional forest management without energy wood harvesting was still superior in sparse stands of Scots pine. Energy wood from pre-commercial thinning turned out to be optimal for dense stands. When carbon balance is taken into account, our results show that changing carbon assessment methods leads to very different optimal thinning regimes and average carbon stocks. Raising the carbon price resulted in longer rotations and a higher mean annual increment, as well as a significantly higher average carbon stock over the rotation.