100 results for Search-based algorithms
Abstract:
A new parameter-estimation algorithm, which minimises the cross-validated prediction error for linear-in-the-parameter models, is proposed, based on stacked regression and an evolutionary algorithm. Using a criterion called the mean dispersion error (MDE), it is first shown that cross-validation is very important for prediction in linear-in-the-parameter models. Stacked regression, which can be regarded as a sophisticated type of cross-validation, is then introduced based on an evolutionary algorithm, to produce a new parameter-estimation algorithm that preserves the parsimony of a concise model structure determined using the forward orthogonal least-squares (OLS) algorithm. The PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.
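The PRESS (prediction error sum of squares) statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest linear-in-the-parameter model, a straight line y = a + b*x fitted by ordinary least squares, and is not the paper's stacked-regression algorithm. The function names and sample data are hypothetical. The sketch shows that the leave-one-out errors can be obtained from a single fit by dividing each residual by one minus its leverage, and checks this against an explicit refit for every left-out point.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def press_shortcut(xs, ys):
    """PRESS from one fit: sum of (residual / (1 - leverage))^2."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (a + b * x)
        # Leverage of this point (diagonal of the hat matrix) for a line fit.
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        total += (residual / (1.0 - leverage)) ** 2
    return total


def press_brute_force(xs, ys):
    """PRESS by refitting the line with each point left out in turn."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical sample data, chosen only for illustration.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 2.2, 2.8, 4.1]
    print(press_shortcut(xs, ys), press_brute_force(xs, ys))
```

The two functions agree up to floating-point rounding, which is why PRESS is a cheap cross-validation criterion for least-squares models.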
Abstract:
This paper describes the implementation of a semantic web search engine on conversation-style transcripts. Our choice of data is Hansard, a publicly available conversation-style transcript of parliamentary debates. The current search engine implementation on Hansard is limited to running search queries based on keywords or phrases, and hence lacks the ability to make semantic inferences from user queries. By making use of knowledge such as the relationships between members of parliament, constituencies, terms of office, and topics of debates, the search results can be improved in terms of both relevance and coverage. Our contribution is not algorithmic; instead, we describe how we exploit a collection of external data sources, ontologies, semantic web vocabularies and named entity extraction to analyse the underlying semantics of user queries and to semantically enrich the search index, thereby improving the quality of results.
Abstract:
With the fast development of the Internet, wireless communications and semiconductor devices, home networking has received significant attention. Consumer products can collect and transmit various types of data in the home environment. Typical consumer sensors are often equipped with tiny, irreplaceable batteries, and it is therefore of the utmost importance to design energy-efficient algorithms that prolong the home network lifetime and reduce the number of devices going to landfill. Sink mobility is an important technique for improving home network performance, including energy consumption, lifetime and end-to-end delay. It can also largely mitigate the hot spots near the sink node. The selection of an optimal moving trajectory for the sink node(s) is an NP-hard problem, and jointly optimizing routing algorithms with the mobile sink moving strategy is a significant and challenging research issue. The influence of multiple static sink nodes on energy consumption under different scale networks is first studied, and an Energy-efficient Multi-sink Clustering Algorithm (EMCA) is proposed and tested. Then, the influence of mobile sink velocity, position and number on network performance is studied, and a Mobile-sink based Energy-efficient Clustering Algorithm (MECA) is proposed. Simulation results validate the performance of the two proposed algorithms, which can be deployed in a consumer home network environment.
Abstract:
Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n = 30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
Abstract:
The pipe sizing of water networks via evolutionary algorithms is of great interest because it allows the selection of alternative economical solutions that meet a set of design requirements. However, available evolutionary methods are numerous, and methodologies to compare the performance of these methods beyond obtaining a minimal solution for a given problem are currently lacking. A methodology to compare algorithms based on an efficiency rate (E) is presented here and applied to the pipe-sizing problem of four medium-sized benchmark networks (Hanoi, New York Tunnel, GoYang and R-9 Joao Pessoa). E numerically determines the performance of a given algorithm while also considering the quality of the obtained solution and the required computational effort. From the wide range of available evolutionary algorithms, four algorithms were selected to implement the methodology: a PseudoGenetic Algorithm (PGA), Particle Swarm Optimization (PSO), Harmony Search (HS) and a modified Shuffled Frog Leaping Algorithm (SFLA). After more than 500,000 simulations, a statistical analysis was performed based on the specific parameters each algorithm requires to operate, and finally, E was analyzed for each network and algorithm. The efficiency measure indicated that PGA is the most efficient algorithm for problems of greater complexity and that HS is the most efficient algorithm for less complex problems. However, the main contribution of this work is that the proposed efficiency rate provides a neutral strategy to compare optimization algorithms and may be useful in the future to select the most appropriate algorithm for different types of optimization problems.
Abstract:
The challenge of moving past the classic Window Icons Menus Pointer (WIMP) interface, i.e. by turning it ‘3D’, has resulted in much research and development. To evaluate the impact of 3D on the ‘finding a target picture in a folder’ task, we built a 3D WIMP interface that allowed the systematic manipulation of visual depth, visual aids, and the semantic category distribution of targets versus non-targets, as well as the detailed measurement of lower-level stimuli features. Across two separate experiments, one large-sample web-based experiment to understand associations, and one controlled lab experiment using eye tracking to understand user focus, we investigated how visual depth, use of visual aids, use of semantic categories, and lower-level stimuli features (i.e. contrast, colour and luminance) impact how successfully participants are able to search for, and detect, the target image. Moreover, in the lab-based experiment, we captured pupillometry measurements to allow consideration of the influence of increasing cognitive load as a result of either an increasing number of items on the screen or the inclusion of visual depth. Our findings showed that increasing the visible layers of depth, and inclusion of converging lines, did not impact target detection times, errors, or failure rates. Low-level features, including colour, luminance, and number of edges, did correlate with differences in target detection times, errors, and failure rates. Our results also revealed that semantic sorting algorithms significantly decreased target detection times. Increased semantic contrasts between a target and its neighbours correlated with an increase in detection errors.
Finally, pupillometric data did not provide evidence of any correlation between the number of visible layers of depth and pupil size. However, using structural equation modelling, we demonstrated that cognitive load does influence detection failure rates when there are luminance contrasts between the target and its surrounding neighbours. Results suggest that WIMP interaction designers should consider stimulus-driven factors, which were shown to influence the efficiency with which a target icon can be found in a 3D WIMP interface.
Abstract:
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
Abstract:
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability.
Abstract:
In this paper, we present a distributed computing framework for problems characterized by a highly irregular search tree, whereby no reliable workload prediction is available. The framework is based on a peer-to-peer computing environment and dynamic load balancing. The system allows for dynamic resource aggregation, does not depend on any specific meta-computing middleware and is suitable for large-scale, multi-domain, heterogeneous environments, such as computational Grids. Dynamic load balancing policies based on global statistics are known to provide optimal load balancing performance, while randomized techniques provide high scalability. The proposed method combines both advantages and adopts distributed job-pools and a randomized polling technique. The framework has been successfully adopted in a parallel search algorithm for subgraph mining and evaluated on a molecular compounds dataset. The parallel application has shown good scalability and close-to-linear speedup in a distributed network of workstations.
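The combination of distributed job-pools and randomized polling described in this abstract can be sketched as a small single-process simulation. This is an illustrative sketch under stated assumptions, not the framework's actual protocol: the tree-generating function `children`, the rule of taking half of a peer's pool, and the round-based scheduling are all hypothetical choices made for the example.

```python
import random
from collections import deque


def children(node):
    """Hypothetical irregular search tree: fan-out depends on the node key."""
    depth, key = node
    if depth >= 6:
        return []
    fan_out = (key * 7 + depth) % 4  # between 0 and 3 children
    return [(depth + 1, key * 4 + i + 1) for i in range(fan_out)]


def count_sequential(root):
    """Reference result: number of nodes visited by a plain depth-first search."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(node))
    return visited


def run_workers(root, n_workers=4, seed=0):
    """Simulate workers with local job pools and randomized polling.

    Returns the number of nodes processed by each worker.
    """
    rng = random.Random(seed)
    pools = [deque() for _ in range(n_workers)]
    pools[0].append(root)
    processed = [0] * n_workers
    while any(pools):  # an empty deque is falsy
        for w in range(n_workers):
            if not pools[w]:
                # Idle worker polls one randomly chosen peer (receiver-initiated)
                # and takes half of that peer's oldest jobs if it has any to spare.
                peer = rng.randrange(n_workers)
                if peer != w and len(pools[peer]) > 1:
                    for _ in range(len(pools[peer]) // 2):
                        pools[w].append(pools[peer].popleft())
                continue
            node = pools[w].pop()
            processed[w] += 1
            pools[w].extend(children(node))
    return processed


if __name__ == "__main__":
    root = (0, 1)
    print(run_workers(root), count_sequential(root))
```

Because no worker needs global load statistics, the only coordination cost is the occasional poll; the sum of the per-worker counts equals the sequential node count.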
Abstract:
We have developed a highly parallel design for a simple genetic algorithm using a pipeline of systolic arrays. The systolic design provides high throughput and unidirectional pipelining by exploiting the implicit parallelism in the genetic operators. The design is significant because, unlike other hardware genetic algorithms, it is independent of both the fitness function and the particular chromosome length used in a problem. We have designed and simulated a version of the mutation array using Xilinx FPGA tools to investigate the feasibility of hardware implementation. A simple 5-chromosome mutation array occupies 195 CLBs and is capable of performing more than one million mutations per second.
I. Introduction: Genetic algorithms (GAs) are established search and optimization techniques which have been applied to a range of engineering and applied problems with considerable success [1]. They operate by maintaining a population of trial solutions encoded using a suitable encoding scheme.
Abstract:
A parallel hardware random number generator for use with a VLSI genetic algorithm processing device is proposed. The design uses a systolic array of mixed congruential random number generators. The generators are constantly reseeded with the outputs of the preceding generators to avoid significant biasing of the randomness of the array, which would result in longer times for the algorithm to converge to a solution.
1 Introduction: In recent years there has been a growing interest in developing hardware genetic algorithm devices [1, 2, 3]. A genetic algorithm (GA) is a stochastic search and optimization technique which attempts to capture the power of natural selection by evolving a population of candidate solutions by a process of selection and reproduction [4]. In keeping with the evolutionary analogy, the solutions are called chromosomes, with each chromosome containing a number of genes. Chromosomes are commonly simple binary strings, the bits being the genes.
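A mixed congruential generator computes x_next = (a * x + c) mod m with a non-zero increment c. The sketch below shows that recurrence and one loose software reading of "reseeding each generator with the output of the preceding one". The constants (a widely used 32-bit parameter set) and the ring arrangement in `step_array` are assumptions made for illustration; the abstract does not specify the parameters or the array topology of the hardware design.

```python
# Assumed 32-bit parameters; the paper's actual constants are not given.
MODULUS = 2 ** 32
MULTIPLIER = 1664525
INCREMENT = 1013904223  # non-zero increment makes the generator "mixed"


def lcg_step(state):
    """One step of the mixed congruential recurrence."""
    return (MULTIPLIER * state + INCREMENT) % MODULUS


def step_array(states):
    """Advance every generator once, then reseed each cell with the output
    of the preceding cell (cell 0 takes the output of the last cell).

    The ring wiring is a hypothetical stand-in for the systolic array.
    """
    outputs = [lcg_step(s) for s in states]
    return [outputs[i - 1] for i in range(len(outputs))]


if __name__ == "__main__":
    states = [1, 2, 3, 4]
    for _ in range(3):
        states = step_array(states)
        # Use the top bit of each state as one random bit per generator.
        print([s >> 31 for s in states])
```

Taking high-order bits, as in the usage loop, is the usual choice with power-of-two moduli because the low-order bits of such generators have short periods.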
Abstract:
Smooth flow of production in construction is hampered by disparity between individual trade teams' goals and the goals of stable production flow for the project as a whole. This is exacerbated by the difficulty of visualizing the flow of work in a construction project. While the [...] addresses some of the issues in [...], building information modeling (BIM) provides a powerful platform for visualizing work flow in control systems that also enable pull flow and deeper collaboration between teams on and off site. The requirements for implementation of a BIM-enabled pull flow construction management software system based on the Last Planner System™, called ‘KanBIM’, have been specified, and a set of functional mock-ups of the proposed system has been implemented and evaluated in a series of three focus group workshops. The requirements cover the areas of maintenance of work flow stability, enabling negotiation and commitment between teams, lean production planning with sophisticated pull flow control, and effective communication and visualization of flow. The evaluation results show that the system holds the potential to improve work flow and reduce waste by providing both process and product visualization at the work face.