978 results for SEARCH TECHNIQUE
Abstract:
The keyword-based search technique suffers from the problems of synonymic and polysemic queries. Current approaches address only the problem of synonymic queries, in which different queries may express the same information requirement. The problem of polysemic queries, i.e., the same query having different intentions, still remains unaddressed. In this paper, we propose the notion of intent clusters, whose members share the same intention. We develop a clustering algorithm that uses the user session information in query logs, in addition to query-URL entries, to identify clusters of queries having the same intention. The proposed approach has been studied through case examples from actual AOL log data, and the clustering algorithm is shown to be successful in discerning user intentions.
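The abstract does not spell out the clustering procedure; the sketch below only illustrates the general idea of grouping query occurrences by session and clicked URLs, and the names (`intent_clusters`, `min_overlap`) and the Jaccard threshold are assumptions rather than the paper's method.

```python
from collections import defaultdict

def intent_clusters(log_entries, min_overlap=0.5):
    """Cluster query occurrences by the URLs clicked within each session.

    log_entries: iterable of (session_id, query, clicked_url) tuples, the kind
    of fields available in the AOL query log.  Because occurrences are keyed by
    (session, query), the same query string can land in different clusters,
    i.e. different intents.
    """
    clicks = defaultdict(set)                      # (session, query) -> clicked URLs
    for session, query, url in log_entries:
        clicks[(session, query)].add(url)

    clusters = []                                  # each cluster approximates one intent
    for occ, urls in clicks.items():
        for cluster in clusters:
            shared = urls & cluster["urls"]
            union = urls | cluster["urls"]
            if union and len(shared) / len(union) >= min_overlap:
                cluster["queries"].append(occ)
                cluster["urls"] |= urls
                break
        else:
            clusters.append({"queries": [occ], "urls": set(urls)})
    return [c["queries"] for c in clusters]

if __name__ == "__main__":
    log = [(1, "jaguar", "www.jaguar.com"),
           (2, "jaguar", "en.wikipedia.org/wiki/Jaguar"),
           (3, "jaguar price", "www.jaguar.com")]
    print(intent_clusters(log))   # occurrences 1 and 3 share an intent; 2 does not
```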
Abstract:
Stochastic Diffusion Search is an efficient probabilistic best-fit search technique, capable of transformation-invariant pattern matching. Although inherently parallel in operation, it is difficult to implement efficiently in hardware because it requires full inter-agent connectivity. This paper describes a lattice implementation which, while qualitatively retaining the properties of the original algorithm, restricts connectivity, enabling simpler implementation on parallel hardware. Diffusion times are examined for different network topologies, ranging from ordered lattices through small-world networks to random graphs.
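As a rough illustration of the idea, not the paper's implementation, the toy sketch below runs standard SDS string matching but lets inactive agents poll only their nearest neighbours on a ring lattice; all parameter names and values are assumptions.

```python
import random

def lattice_sds(text, model, n_agents=50, k_neighbours=2, iterations=200, seed=0):
    """Toy Stochastic Diffusion Search with lattice-restricted diffusion.

    Agents hold candidate positions of `model` inside `text`.  In standard SDS
    an inactive agent polls a uniformly random agent; here it polls only one of
    its k nearest neighbours on a ring lattice, mimicking the restricted
    connectivity described in the abstract.
    """
    rng = random.Random(seed)
    n_pos = len(text) - len(model) + 1
    hyps = [rng.randrange(n_pos) for _ in range(n_agents)]
    active = [False] * n_agents

    for _ in range(iterations):
        # Test phase: each agent checks one randomly chosen character of the model.
        for i, h in enumerate(hyps):
            j = rng.randrange(len(model))
            active[i] = text[h + j] == model[j]
        # Diffusion phase: inactive agents copy a hypothesis from a lattice neighbour.
        for i in range(n_agents):
            if not active[i]:
                offset = rng.choice([d for d in range(-k_neighbours, k_neighbours + 1) if d])
                neighbour = (i + offset) % n_agents
                hyps[i] = hyps[neighbour] if active[neighbour] else rng.randrange(n_pos)
    return max(set(hyps), key=hyps.count)   # position backed by the largest agent cluster

if __name__ == "__main__":
    # With enough iterations the largest cluster sits at index 10, the start of "world".
    print(lattice_sds("xxxxhello worldxxxx", "world"))
```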
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which further explores the parameter space. In this proposal, we envision that the initial solutions provided by meta-learning are located in good regions of the search space (i.e., closer to optimal solutions), so the search algorithm needs to evaluate fewer candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates than their components applied in isolation.
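A minimal sketch of the PSO half of this idea, assuming scikit-learn's SVR and cross-validated negative MSE as fitness; the `seeds` argument stands in for the configurations that meta-learning would recommend (the meta-database itself, and all swarm constants, are assumptions).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def pso_svr(X, y, seeds, n_particles=8, iters=15, rng=np.random.default_rng(0)):
    """PSO over log2(C) and log2(gamma), warm-started with meta-learned settings."""
    def fitness(p):
        c, g = 2.0 ** p
        return cross_val_score(SVR(C=c, gamma=g), X, y, cv=3,
                               scoring="neg_mean_squared_error").mean()

    pos = rng.uniform([-2, -6], [6, 2], size=(n_particles, 2))
    pos[:len(seeds)] = seeds                      # meta-learning provides the initial points
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmax()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, [-2, -6], [6, 2])
        f = np.array([fitness(p) for p in pos])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmax()].copy()
    return 2.0 ** gbest                           # best (C, gamma) found

if __name__ == "__main__":
    X, y = make_regression(n_samples=120, n_features=8, noise=0.1, random_state=1)
    print(pso_svr(X, y, seeds=np.array([[2.0, -3.0], [4.0, -5.0]])))
```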
Abstract:
The Three-Layer distributed mediation architecture, designed by the Secure System Architecture laboratory, employed a layered framework of presence, integration, and homogenization mediators. The architecture has no central component that could affect system reliability, and a distributed search technique was adopted in the system to increase its reliability. An Enhanced Chord-like algorithm (E-Chord) was designed and deployed in the integration layer. E-Chord is a skip-list algorithm based on a Distributed Hash Table (DHT), a distributed but structured architecture: distributed in the sense that no central unit is required to maintain indexes, and structured in the sense that indexes are distributed over the nodes in a systematic manner. Each node maintains three kinds of routing information: a frequency list, a successor/predecessor list, and a finger table. No node maintains all indexes, and each node knows about some other nodes in the system. These nodes, also called composer mediators, are connected in a P2P fashion. A special composer mediator called the global mediator initiates the keyword-based matching decomposition of the request using the E-Chord. It generates an Integrated Data Structure Graph (IDSG) on the fly, creates association and dependency relations between nodes in the IDSG, and then generates a Global IDSG (GIDSG). The GIDSG is a plan that guides the global mediator in integrating data. It is also used to stream data from the mediators in the homogenization layer, which are connected to the data sources. The connectors start sending data to the global mediator just after the global mediator creates the GIDSG and just before it sends the answer to the presence mediator. Using the E-Chord and the GIDSG makes the mediation system more scalable than using a central global schema repository, since all the composers in the integration layer are capable of handling and routing requests. Also, when a composer fails, the failure only minimally affects the entire mediation system.
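E-Chord's frequency list and decomposition logic are not spelled out in the abstract; the sketch below shows only the standard Chord-style finger-table routing that such a DHT builds on, with a toy identifier space and made-up node IDs.

```python
M = 5                                   # identifier space of 2**M ids

class Node:
    """Minimal Chord-style node: a successor pointer plus a finger table."""
    def __init__(self, node_id):
        self.id = node_id
        self.fingers = []               # i-th entry: first node >= id + 2**i

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def build_ring(ids):
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}
    for n in nodes.values():
        for i in range(M):
            target = (n.id + 2 ** i) % (2 ** M)
            succ = min((x for x in ids if x >= target), default=ids[0])
            n.fingers.append(nodes[succ])
    return nodes

def lookup(node, key, hops=0):
    """Route a key around the ring using finger tables; returns (owner, hops)."""
    succ = node.fingers[0]
    if in_interval(key, node.id, succ.id):
        return succ, hops
    # Forward to the closest preceding finger, as in the Chord protocol.
    for f in reversed(node.fingers):
        if in_interval(f.id, node.id, key) and f.id != key:
            return lookup(f, key, hops + 1)
    return lookup(succ, key, hops + 1)

if __name__ == "__main__":
    ring = build_ring([1, 4, 9, 11, 14, 18, 20, 21, 28])
    owner, hops = lookup(ring[1], 26)
    print(owner.id, hops)               # key 26 is owned by node 28
```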
Abstract:
Evolutionary algorithms are playing an increasingly important role as search methods in cognitive science domains. In this study, methodological issues in the use of evolutionary algorithms were investigated via simulations in which procedures were systematically varied to modify the selection pressures on populations of evolving agents. Traditional roulette wheel, tournament, and variations of these selection algorithms were compared on the “needle-in-a-haystack” problem developed by Hinton and Nowlan in their 1987 study of the Baldwin effect. The task is an important one for cognitive science, as it demonstrates the power of learning as a local search technique in smoothing a fitness landscape that lacks gradient information. One aspect that has continued to foster interest in the problem is the observation of residual learning ability in simulated populations even after long periods of time. Effective evolutionary algorithms balance their search effort between broad exploration of the search space and in-depth exploitation of promising solutions already found. Issues discussed include the differential effects of rank and proportional selection, the tradeoff between migration of populations towards good solutions and maintenance of diversity, and the development of measures that illustrate how each selection algorithm affects the search process over generations. We show that both roulette wheel and tournament algorithms can be modified to appropriately balance search between exploration and exploitation, and effectively eliminate residual learning in this problem.
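For concreteness, minimal versions of the two classic selection operators compared in the study; this is generic textbook code, not the study's implementation or its needle-in-a-haystack setup.

```python
import random

def roulette(population, fitness, rng):
    """Fitness-proportional (roulette wheel) selection of one parent."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if acc >= pick:
            return individual
    return population[-1]

def tournament(population, fitness, rng, k=3):
    """Tournament selection: the best of k randomly drawn contestants wins."""
    contestants = rng.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitness[i])]

if __name__ == "__main__":
    rng = random.Random(0)
    pop = ["a", "b", "c", "d"]
    fit = [1.0, 1.0, 1.0, 20.0]          # one highly fit "needle" individual
    print([roulette(pop, fit, rng) for _ in range(5)])
    print([tournament(pop, fit, rng) for _ in range(5)])
```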
Abstract:
An A-DNA type double helical conformation was observed in the single crystal X-ray structure of the octamer d(G-G-T-A-T-A-C-C), 1, and its 5-bromouracil-containing analogue, 2. The structure of the isomorphous crystals (space group P61) was solved by a search technique based on packing criteria and R-factor calculations, with use of only low order data. At the present stage of refinement the R factors are 31% for 1 and 28% for 2 at a resolution of 2.25 Å (0.225 nm). The molecules interact through their minor grooves by hydrogen bonding and base to sugar van der Waals contacts. The stable A conformation observed in the crystal may have some structural relevance to promoter regions, where the T-A-T-A sequence is frequently found.
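For reference, the crystallographic R factor quoted here is the standard agreement index between observed and calculated structure-factor amplitudes:

```latex
R \;=\; \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}{\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```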
Abstract:
Bottlenecking operations in a complex coal rail system cost mining companies millions of dollars. To address this issue, this paper investigates a real-world coal rail system and aims to optimise coal railing operations under constraints of limited resources (e.g., a limited number of locomotives and wagons). In the literature, most studies consider the train scheduling problem on a single-track railway network to be strongly NP-hard and thus develop metaheuristics as the main solution methods. In this paper, a new mathematical programming model is formulated and implemented in an optimization programming language based on a constraint programming (CP) approach. A new depth-first search technique is developed and embedded inside the CP model to obtain an optimised coal railing timetable efficiently. Computational experiments demonstrate that high-quality solutions are obtainable in industry-scale applications. To provide insightful decisions, sensitivity analysis is conducted in terms of different scenarios and specific criteria.
Keywords: Train scheduling · Rail transportation · Coal mining · Constraint programming
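The paper's CP model is not reproduced here; the toy sketch below only illustrates depth-first search with a simple bound over dispatch orders on one single-track segment, with invented data and a `headway` parameter as assumptions.

```python
def schedule_single_track(trains, headway=15):
    """Depth-first branch-and-bound over train dispatch orders on one segment.

    trains: dict name -> (release_time, running_time) in minutes.  Real
    instances add wagon/locomotive resources, sidings, and maintenance windows.
    """
    best = {"makespan": float("inf"), "order": None}

    def dfs(remaining, clock, order):
        if clock >= best["makespan"]:          # bound: prune dominated branches
            return
        if not remaining:
            best["makespan"], best["order"] = clock, order
            return
        for t in sorted(remaining):            # branch: choose the next train to dispatch
            release, run = trains[t]
            start = max(clock, release)
            dfs(remaining - {t}, start + run + headway, order + [t])

    dfs(frozenset(trains), 0, [])
    return best

if __name__ == "__main__":
    demo = {"T1": (0, 60), "T2": (20, 45), "T3": (10, 30)}
    print(schedule_single_track(demo))
```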
Abstract:
An intelligent computer aided defect analysis (ICADA) system, based on artificial intelligence techniques, has been developed to identify the design, process, or material parameters which could be responsible for the occurrence of defective castings in a manufacturing campaign. The data on defective castings for a particular time frame, which is an input to the ICADA system, has been analysed. It was observed that a large proportion, i.e. 50-80%, of all the defective castings produced in a foundry have two, three or four types of defects occurring above a threshold proportion, say 10%. Also, a large number of defect types are either not found at all or found in a very small proportion, below a threshold value of 2%. An important feature of the ICADA system is the recognition of this pattern in the analysis. Thirty casting defect types, and a large number of causes numbering between 50 and 70 for each, as identified in the AFS analysis of casting defects (the standard reference source for the casting process), constituted the foundation for building the knowledge base. The scientific rationale underlying the formation of a defect during the casting process was identified and 38 metacauses were coded. Process, material and design parameters which contribute to the metacauses were systematically examined and 112 were identified as rootcauses. The interconnections between defects, metacauses and rootcauses were represented as a three-tier structured graph, and the handling of uncertainty in the occurrence of events such as defects, metacauses and rootcauses was achieved by Bayesian analysis. The hill-climbing search technique, associated with forward reasoning, was employed to recognize one or several rootcauses.
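As a rough illustration only: a hill-climbing search over candidate rootcause sets with a noisy-OR style score standing in for the paper's Bayesian analysis; the probability tables and cause names in the example are invented.

```python
import random

def hill_climb_root_causes(defects, p_defect_given_cause, candidate_causes,
                           iterations=200, seed=0):
    """Greedy hill climbing over subsets of rootcauses explaining observed defects."""
    rng = random.Random(seed)

    def score(causes):
        s = 1.0
        for d in defects:
            p_not = 1.0
            for c in causes:
                p_not *= 1.0 - p_defect_given_cause.get((d, c), 0.0)
            s *= (1.0 - p_not)                # noisy-OR likelihood of defect d
        return s * (0.5 ** len(causes))       # penalise large explanations

    current = set()
    for _ in range(iterations):
        flip = rng.choice(candidate_causes)   # toggle one rootcause in or out
        neighbour = current ^ {flip}
        if score(neighbour) > score(current):
            current = neighbour
    return current

if __name__ == "__main__":
    defects = ["blowhole", "shrinkage"]
    tables = {("blowhole", "wet sand"): 0.8, ("shrinkage", "low pouring temp"): 0.7,
              ("blowhole", "low pouring temp"): 0.2}
    print(hill_climb_root_causes(defects, tables,
                                 ["wet sand", "low pouring temp", "bad gating"]))
```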
Abstract:
Clustering is one of the most popular methods for data exploration. Clustering partitions a data set into sub-partitions based on some measure, such as a distance measure, with each partition carrying its own significant information. A number of algorithms have been explored for this purpose; one of them is Particle Swarm Optimization (PSO), a population-based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of Particle Swarm Optimization in which each feature of the data set is given significance by adding random weights, which also minimizes any distortions in the data set. The performance of the proposed algorithm is evaluated using benchmark datasets from the Machine Learning Repository. The experimental results show that the proposed methodology performs significantly better than previously reported approaches.
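A minimal sketch, assuming the "significance" weights are random per-feature multipliers applied before a PSO search for cluster centres with quantisation error as fitness; all names and constants are illustrative, not the paper's algorithm.

```python
import numpy as np

def weighted_pso_kmeans(X, k, n_particles=10, iters=30, seed=0):
    """PSO search for k cluster centres on feature-weighted data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.uniform(0.5, 1.5, d)                  # random per-feature weights
    Xw = X * w

    def fitness(centres):
        dist = np.linalg.norm(Xw[:, None, :] - centres[None], axis=2)
        return dist.min(axis=1).sum()             # quantisation error (lower is better)

    pos = Xw[rng.integers(0, n, (n_particles, k))]          # centres drawn from data points
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([fitness(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    centres, weights = weighted_pso_kmeans(X, k=2)
    print(np.round(centres, 2), np.round(weights, 2))
```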
Abstract:
Data clustering is a common technique for statistical data analysis, used in many fields including machine learning and data mining. Clustering is the grouping of a data set or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait according to some defined distance measure. In this paper we present a genetically improved version of the particle swarm optimization algorithm, a population-based heuristic search technique derived from particle swarm intelligence and the concepts of genetic algorithms (GA). The algorithm combines PSO concepts such as the velocity and position update rules with GA concepts such as selection, crossover and mutation. The performance of the proposed algorithm is evaluated using benchmark datasets from the Machine Learning Repository, and it outperforms both k-means and the standard PSO algorithm.
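One way such a hybrid can look (the abstract does not specify the operators): a GA-style step, tournament selection with arithmetic crossover and Gaussian mutation, applied to a swarm of candidate centre sets, to be interleaved with PSO velocity/position updates like those sketched above. Everything here is an assumption for illustration.

```python
import numpy as np

def genetic_step(positions, fitness, rng, p_mut=0.1, sigma=0.1):
    """One GA-style step applied to a PSO swarm of candidate centre sets.

    positions: array (n_particles, k, d) of cluster centres (the PSO particles).
    fitness:   array (n_particles,) of quantisation errors (lower is better).
    """
    n = len(positions)

    def tournament():
        i, j = rng.integers(0, n, 2)
        return positions[i] if fitness[i] < fitness[j] else positions[j]

    children = []
    for _ in range(n):
        a, b = tournament(), tournament()
        alpha = rng.random()
        child = alpha * a + (1.0 - alpha) * b                         # arithmetic crossover
        mask = rng.random(child.shape) < p_mut
        child = child + mask * rng.normal(0.0, sigma, child.shape)    # Gaussian mutation
        children.append(child)
    return np.stack(children)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    swarm = rng.normal(size=(6, 2, 2))                 # 6 particles, k=2 centres in 2-D
    errs = rng.random(6)
    print(genetic_step(swarm, errs, rng).shape)        # (6, 2, 2)
```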
Abstract:
This paper proposes a method for automatically identifying fringe order. A simplified Otsu algorithm is used to obtain a threshold, the positions of the interference fringes are then searched within a 45° range, and finally the different fringes are identified by a region-search technique. Experimental results show that the method can reliably obtain the order of interference fringes oriented at approximately 45° to 90°.
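For reference, a plain histogram-based Otsu threshold in Python; the paper's simplified variant and the subsequent 45° fringe search are not reproduced, and the demo image is synthetic.

```python
import numpy as np

def otsu_threshold(image):
    """Otsu's method: choose the grey level that maximises between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.concatenate([rng.integers(0, 80, 500), rng.integers(170, 255, 500)])
    img = img.astype(np.uint8).reshape(25, 40)
    print(otsu_threshold(img))       # a value separating the two intensity modes
```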
Abstract:
Since Wireless Sensor Networks (WSNs) are subject to failures, fault-tolerance becomes an important requirement for many WSN applications. Fault-tolerance can be enabled in different areas of WSN design and operation, including the Medium Access Control (MAC) layer and the initial topology design. To be robust to failures, a MAC protocol must be able to adapt to traffic fluctuations and topology dynamics. We design ER-MAC, which can switch from energy-efficient operation in normal monitoring to reliable and fast delivery for emergency monitoring, and vice versa. It can also prioritise high-priority packets and guarantee fair packet deliveries from all sensor nodes. Topology design supports fault-tolerance by ensuring that there are alternative acceptable routes to data sinks when failures occur. We provide solutions for four topology planning problems: Additional Relay Placement (ARP), Additional Backup Placement (ABP), Multiple Sink Placement (MSP), and Multiple Sink and Relay Placement (MSRP). Our solutions use a local search technique based on Greedy Randomized Adaptive Search Procedures (GRASP). GRASP-ARP deploys relays for (k,l)-sink-connectivity, where each sensor node must have k vertex-disjoint paths of length ≤ l. To count how many disjoint paths a node has, we propose Counting-Paths. GRASP-ABP deploys fewer relays than GRASP-ARP by focusing only on the most important nodes – those whose failure has the worst effect. To identify such nodes, we define Length-constrained Connectivity and Rerouting Centrality (l-CRC). Greedy-MSP and GRASP-MSP place minimal-cost sinks to ensure that each sensor node in the network is double-covered, i.e. has two length-bounded paths to two sinks. Greedy-MSRP and GRASP-MSRP deploy sinks and relays with minimal cost to make the network double-covered and non-critical, i.e. all sensor nodes must have length-bounded alternative paths to sinks when an arbitrary sensor node fails. We then evaluate the fault-tolerance of each topology in data gathering simulations using ER-MAC.
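The GRASP variants above optimise much richer objectives (k vertex-disjoint length-bounded paths, double coverage); the sketch below shows only the generic GRASP skeleton, greedy randomized construction plus local search, on a deliberately simplified relay-coverage problem with invented data.

```python
import random

def grasp_cover(sensors, candidates, covers, iterations=20, alpha=0.3, seed=0):
    """GRASP sketch for a simplified relay-coverage problem.

    covers(c) returns the set of sensors that candidate relay c would cover.
    """
    rng = random.Random(seed)
    best = None

    for _ in range(iterations):
        # Construction phase: repeatedly pick a relay from the restricted candidate list (RCL).
        uncovered, chosen = set(sensors), set()
        while uncovered:
            gains = {c: len(covers(c) & uncovered) for c in candidates if c not in chosen}
            g_max = max(gains.values())
            if g_max == 0:                      # no remaining candidate helps
                break
            rcl = [c for c, g in gains.items() if g >= (1 - alpha) * g_max]
            pick = rng.choice(rcl)
            chosen.add(pick)
            uncovered -= covers(pick)
        # Local search phase: drop relays whose sensors stay covered by the rest.
        for c in sorted(chosen):
            rest = chosen - {c}
            covered = set().union(*(covers(r) for r in rest)) if rest else set()
            if set(sensors) <= covered:
                chosen = rest
        if best is None or len(chosen) < len(best):
            best = chosen
    return best

if __name__ == "__main__":
    coverage = {"r1": {1, 2}, "r2": {2, 3}, "r3": {3, 4}, "r4": {1, 2, 3, 4}}
    print(grasp_cover([1, 2, 3, 4], list(coverage), coverage.__getitem__))
```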
Abstract:
This paper presents a new algorithm for learning the structure of a special type of Bayesian network. The conditional phase-type (C-Ph) distribution is a Bayesian network that models the probabilistic causal relationships between a skewed continuous variable, modelled by the Coxian phase-type distribution (a special type of Markov model), and a set of interacting discrete variables. The algorithm takes a dataset as input and produces the structure, parameters and graphical representations of the fit of the C-Ph distribution as output. The algorithm, which uses a greedy-search technique and has been implemented in MATLAB, is evaluated using a simulated data set consisting of 20,000 cases. The results show that the original C-Ph distribution is recaptured, and the fit of the network to the data is discussed.
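The structure-learning procedure itself is not given in the abstract; as background, the sketch below evaluates the density of the continuous part, a Coxian phase-type distribution, from its rates. SciPy's matrix exponential is assumed and the rates in the example are invented.

```python
import numpy as np
from scipy.linalg import expm

def coxian_density(t, lam, mu):
    """Density of a Coxian phase-type distribution at time t.

    lam[i] is the rate of moving from phase i to phase i+1, mu[i] the rate of
    absorption (exit) from phase i; the process always starts in phase 1.
    f(t) = alpha * exp(S t) * s0 with s0 = -S 1.
    """
    n = len(mu)
    S = np.zeros((n, n))
    for i in range(n):
        S[i, i] = -(mu[i] + (lam[i] if i < n - 1 else 0.0))
        if i < n - 1:
            S[i, i + 1] = lam[i]
    alpha = np.zeros(n)
    alpha[0] = 1.0                      # start in the first phase
    s0 = -S @ np.ones(n)                # exit-rate vector
    return float(alpha @ expm(S * t) @ s0)

if __name__ == "__main__":
    # Two-phase Coxian: exit rate 0.5 from phase 1, onward rate 1.0, exit rate 2.0 from phase 2.
    print(coxian_density(1.0, lam=[1.0], mu=[0.5, 2.0]))
```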
Abstract:
We propose a trace-driven approach to predict the performance degradation of disk request response times due to storage device contention in consolidated virtualized environments. Our performance model evaluates a queueing network with fair share scheduling using trace-driven simulation. The model parameters can be deduced from measurements obtained inside Virtual Machines (VMs) from a system where a single VM accesses a remote storage server. The parameterized model can then be used to predict the effect of storage contention when multiple VMs are consolidated on the same virtualized server. The model parameter estimation relies on a search technique that tries to estimate the splitting and merging of blocks at the Virtual Machine Monitor (VMM) level in the case of multiple competing VMs. Simulation experiments based on traces of the Postmark and FFSB disk benchmarks show that our model is able to accurately predict the impact of workload consolidation on VM disk IO response times.
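A very rough sketch of the estimation idea, assuming a single FCFS queue replayed from a trace and a grid search over a hypothetical block-merge factor; the paper's actual model uses fair-share scheduling across several VMs and is not reproduced here.

```python
import numpy as np

def simulate_fcfs(arrivals, service_times):
    """Replay a trace through a single FCFS queue; return the mean response time."""
    finish = 0.0
    responses = []
    for a, s in zip(arrivals, service_times):
        start = max(a, finish)
        finish = start + s
        responses.append(finish - a)
    return float(np.mean(responses))

def fit_merge_factor(arrivals, base_service, measured_mean, candidates):
    """Pick the block-merge factor whose simulated mean response time best
    matches the measured one (a grid-search stand-in for the estimation step)."""
    best, best_err = None, float("inf")
    for m in candidates:
        sim = simulate_fcfs(arrivals, base_service / m)   # merging m blocks per request
        err = abs(sim - measured_mean)
        if err < best_err:
            best, best_err = m, err
    return best, best_err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arrivals = np.cumsum(rng.exponential(5.0, 500))       # synthetic arrival trace (ms)
    base_service = rng.exponential(8.0, 500)              # synthetic per-request service (ms)
    measured = 12.0                                       # hypothetical measured mean (ms)
    print(fit_merge_factor(arrivals, base_service, measured, [1, 2, 3, 4]))
```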