934 results for search methods
Abstract:
R. Daly and Q. Shen. Methods to accelerate the learning of Bayesian network structures. In Proceedings of the 2007 UK Workshop on Computational Intelligence.
Abstract:
Many real-world image analysis problems, such as face recognition and hand pose estimation, involve recognizing a large number of classes of objects or shapes. Large margin methods, such as AdaBoost and Support Vector Machines (SVMs), often provide competitive accuracy rates, but at the cost of evaluating a large number of binary classifiers, thus making it difficult to apply such methods when thousands or millions of classes need to be recognized. This thesis proposes a filter-and-refine framework whereby, given a test pattern, a small number of candidate classes are identified efficiently at the filter step, and computationally expensive large margin classifiers are used to evaluate these candidates at the refine step. Two different filtering methods are proposed: ClassMap and OVA-VS (One-vs.-All classification using Vector Search). ClassMap is an embedding-based method that works for both boosted classifiers and SVMs and tends to map patterns and their associated classes close to each other in a vector space. OVA-VS maps OVA classifiers and test patterns to vectors based on the weights and outputs of the weak classifiers of the boosting scheme. At runtime, finding the strongest-responding OVA classifier becomes a classical vector search problem, where well-known methods can be used to gain efficiency. In our experiments, the proposed methods achieve significant speed-ups, in some cases up to two orders of magnitude, compared to exhaustive evaluation of all OVA classifiers. This was achieved in hand pose recognition and face recognition systems where the number of classes ranges from 535 to 48,600.
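The core of the OVA-VS idea can be illustrated compactly: because each one-vs.-all score is an inner product between a class weight vector and the vector of weak-classifier outputs, the filter step reduces to a vector search. Below is a minimal Python sketch; `weak_outputs`, `class_weight_vectors` and `full_classifiers` are hypothetical stand-ins, not the thesis's actual implementation.

```python
import numpy as np

def filter_and_refine(x, weak_outputs, class_weight_vectors, full_classifiers, k=10):
    """Filter-and-refine sketch: each one-vs.-all (OVA) class c scores
    w_c . h(x), so the filter step is a plain vector search."""
    h = weak_outputs(x)                        # outputs of the shared weak classifiers
    scores = class_weight_vectors @ h          # inner products with all OVA weight vectors
    candidates = np.argsort(scores)[::-1][:k]  # filter: top-k strongest responses
    # refine: evaluate only the k expensive large-margin classifiers
    return max(candidates, key=lambda c: full_classifiers[c](x))
```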
Abstract:
This thesis elaborates on the problem of preprocessing a large graph so that single-pair shortest-path queries can be answered quickly at runtime. Computing shortest paths is a well-studied problem, but exact algorithms do not scale well to huge real-world graphs in applications that require very short response times. The focus is on approximate methods for distance estimation, in particular landmark-based distance indexing. This approach involves choosing some nodes as landmarks and computing offline, for each node in the graph, its embedding, i.e., the vector of its distances from all the landmarks. At runtime, when the distance between a pair of nodes is queried, it can be quickly estimated by combining the embeddings of the two nodes. Choosing optimal landmarks is shown to be hard, and thus heuristic solutions are employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the techniques presented in this thesis is tested experimentally using five different real-world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach, which selects landmarks at random. Finally, they are applied to two important problems arising naturally in large-scale graphs, namely social search and community detection.
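The landmark scheme described above admits a very small illustration. The sketch below (hypothetical names, plain BFS, so unweighted graphs only) builds the offline embeddings and combines them at query time via the triangle inequality, which is the standard way such embeddings yield distance bounds.

```python
import collections

def bfs_distances(adj, source):
    """Unweighted single-source shortest-path distances via BFS."""
    dist = {source: 0}
    queue = collections.deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_index(adj, landmarks):
    """Offline: embed each node as its vector of distances to the landmarks."""
    per_landmark = [bfs_distances(adj, l) for l in landmarks]
    inf = float("inf")
    return {v: [d.get(v, inf) for d in per_landmark] for v in adj}

def estimate_distance(index, u, v):
    """Online: max|d(u,l) - d(v,l)| lower-bounds d(u,v) and
    min(d(u,l) + d(v,l)) upper-bounds it, by the triangle inequality."""
    eu, ev = index[u], index[v]
    lower = max(abs(a - b) for a, b in zip(eu, ev))
    upper = min(a + b for a, b in zip(eu, ev))
    return lower, upper
```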
Abstract:
Much work has been done on learning from failure in search to boost solving of combinatorial problems, such as clause-learning and clause-weighting in Boolean satisfiability (SAT), nogood and explanation-based learning, and constraint weighting in constraint satisfaction problems (CSPs). Many of the top solvers in SAT use clause learning to good effect. A similar approach (nogood learning) has not had as large an impact in CSPs. Constraint weighting is a less fine-grained approach where the information learnt gives an approximation as to which variables may be the sources of greatest contention. In this work we present two methods for learning from search using restarts, in order to identify these critical variables prior to solving. Both methods are based on the conflict-directed heuristic (weighted-degree heuristic) introduced by Boussemart et al. and are aimed at producing a better-informed version of the heuristic by gathering information through restarting and probing of the search space prior to solving, while minimizing the overhead of these restarts. We further examine the impact of different sampling strategies and different measurements of contention, and assess different restarting strategies for the heuristic. Finally, two applications for constraint weighting are considered in detail: dynamic constraint satisfaction problems and unary resource scheduling problems.
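For concreteness, here is a minimal sketch of the weighted-degree bookkeeping the heuristic builds on (the dom/wdeg variant; names are illustrative, not the authors' code). The probing idea is then to run short randomised restarts before the real search purely to accumulate these weights.

```python
class WeightedDegree:
    """Sketch of weighted-degree (wdeg) bookkeeping: every constraint starts at
    weight 1 and is incremented whenever its propagation wipes out a domain,
    so the weights accumulate evidence about the sources of contention."""

    def __init__(self, constraints):
        self.weight = {c: 1 for c in constraints}

    def on_conflict(self, constraint):
        self.weight[constraint] += 1          # learn from this failure

    def select_variable(self, unassigned, constraints_of, domain):
        # dom/wdeg: prefer variables with small domains attached to
        # heavy (conflict-prone) constraints
        def score(x):
            wdeg = sum(self.weight[c] for c in constraints_of[x])
            return len(domain[x]) / wdeg
        return min(unassigned, key=score)
```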
Abstract:
Predicting from first-principles calculations whether mixed metallic elements phase-separate or form ordered structures is a major challenge of current materials research. It can be partially addressed in cases where experiments suggest the underlying lattice is conserved, using cluster expansion (CE) and a variety of exhaustive evaluation or genetic search algorithms. Evolutionary algorithms have been recently introduced to search for stable off-lattice structures at fixed mixture compositions. The general off-lattice problem is still unsolved. We present an integrated approach of CE and high-throughput ab initio calculations (HT) applicable to the full range of compositions in binary systems where the constituent elements or the intermediate ordered structures have different lattice types. The HT method replaces the search algorithms by direct calculation of a moderate number of naturally occurring prototypes representing all crystal systems and guides CE calculations of derivative structures. This synergy achieves the precision of the CE and the guiding strengths of the HT. Its application to poorly characterized binary Hf systems, believed to be phase-separating, defines three classes of alloys where CE and HT complement each other to uncover new ordered structures.
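As a toy illustration of the cluster-expansion form this approach relies on, the configuration energy is a sum of effective cluster interactions (ECIs) times products of site occupation variables. The sketch below assumes the cluster list and fitted ECIs are given; all names are illustrative.

```python
from math import prod

def ce_energy(spins, clusters, ecis):
    """Toy cluster expansion: E(sigma) = sum_alpha J_alpha * prod_{i in alpha} sigma_i,
    where sigma_i = +1/-1 encodes which element occupies lattice site i and the
    J_alpha (ECIs) would be fitted to ab initio energies."""
    return sum(j * prod(spins[i] for i in cluster)
               for cluster, j in zip(clusters, ecis))

# e.g. nearest-neighbour pair clusters on a 4-site chain:
# ce_energy([1, -1, 1, -1], [(0, 1), (1, 2), (2, 3)], [0.1, 0.1, 0.1]) -> -0.3
```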
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
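The multiplicity correction mentioned above comes from placing a Beta prior on the SNP inclusion probability, which induces a Beta-Binomial prior over model sizes. The generic sketch below (not the MISA R API; model marginal likelihoods are assumed to be given) shows how such a prior combines with the data to yield posterior inclusion probabilities.

```python
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_inclusion(models, log_marglik, p, a=1.0, b=1.0):
    """Generic Bayesian model-search sketch: `models` are tuples of included
    SNP indices out of p candidates. A Beta(a, b) prior on the inclusion
    probability induces a Beta-Binomial prior over model sizes, which is the
    intrinsic multiplicity correction."""
    log_post = {}
    for m in models:
        k = len(m)
        log_prior = log_beta(k + a, p - k + b) - log_beta(a, b)
        log_post[m] = log_marglik[m] + log_prior
    mx = max(log_post.values())
    z = sum(exp(lp - mx) for lp in log_post.values())
    post = {m: exp(lp - mx) / z for m, lp in log_post.items()}
    # posterior inclusion probability of SNP j: mass of models containing j
    return {j: sum(pr for m, pr in post.items() if j in m) for j in range(p)}
```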
Abstract:
OBJECTIVE: To investigate the effect of statin use after radical prostatectomy (RP) on biochemical recurrence (BCR) in patients with prostate cancer who never received statins before RP. PATIENTS AND METHODS: We conducted a retrospective analysis of 1146 RP patients within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Multivariable Cox proportional hazards analyses were used to examine differences in risk of BCR between post-RP statin users vs nonusers. To account for varying start dates and duration of statin use during follow-up, post-RP statin use was treated as a time-dependent variable. In a secondary analysis, models were stratified by race to examine the association of post-RP statin use with BCR among black and non-black men. RESULTS: After adjusting for clinical and pathological characteristics, post-RP statin use was significantly associated with 36% reduced risk of BCR (hazard ratio [HR] 0.64, 95% confidence interval [CI] 0.47-0.87; P = 0.004). Post-RP statin use remained associated with reduced risk of BCR after adjusting for preoperative serum cholesterol levels. In secondary analysis, after stratification by race, this protective association was significant in non-black (HR 0.49, 95% CI 0.32-0.75; P = 0.001) but not black men (HR 0.82, 95% CI 0.53-1.28; P = 0.384). CONCLUSION: In this retrospective cohort of men undergoing RP, post-RP statin use was significantly associated with reduced risk of BCR. Whether the association between post-RP statin use and BCR differs by race requires further study. Given these findings, coupled with other studies suggesting that statins may reduce risk of advanced prostate cancer, randomised controlled trials are warranted to formally test the hypothesis that statins slow prostate cancer progression.
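To make the time-dependent coding concrete: each patient's follow-up is split into intervals, and the statin covariate switches on at the interval where use begins. A minimal sketch using the open-source lifelines library, with fabricated toy rows (not SEARCH data):

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Toy long-format data: one row per follow-up interval per patient; `statin`
# flips from 0 to 1 when use starts, making it a time-dependent covariate.
df = pd.DataFrame({
    "id":     [1, 1, 2, 3, 3, 4],
    "start":  [0, 12, 0, 0, 6, 0],
    "stop":   [12, 40, 45, 6, 50, 30],
    "statin": [0, 1, 0, 0, 1, 0],
    "bcr":    [0, 1, 1, 0, 0, 1],   # biochemical recurrence at interval end
})

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", start_col="start", stop_col="stop", event_col="bcr")
print(ctv.summary["exp(coef)"])     # hazard ratio for statin use
```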
Abstract:
The concept of 'nested methods' is adopted to solve the location-routeing problem. Unlike the sequential and iterative approaches, in this method we treat the routeing element as a sub-problem within the larger problem of location. Efficient techniques that take into account the above concept and which use a neighbourhood structure inspired from computational geometry are presented. A simple version of tabu search is also embedded into our methods to improve the solutions further. Computational testing is carried out on five sets of problems of 400 customers with five levels of depot fixed costs, and the results obtained are encouraging.
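A skeletal version of the nested idea, with the routeing sub-problem hidden behind a `routing_cost` callback and a simple tabu list on depot openings and closings (all names are hypothetical; the paper's computational-geometry neighbourhood structure is not reproduced):

```python
import random

def nested_location_routing(depots, customers, routing_cost, fixed_cost,
                            iters=200, tabu_len=7):
    """Nested-methods skeleton: location is the master problem, and every
    candidate set of open depots is priced by solving (or approximating)
    the routeing sub-problem via the routing_cost callback.
    Assumes more depots than the tabu tenure."""
    def total_cost(open_depots):
        if not open_depots:
            return float("inf")
        return (sum(fixed_cost[d] for d in open_depots)
                + routing_cost(open_depots, customers))

    current = set(random.sample(list(depots), max(1, len(depots) // 2)))
    best, best_cost = set(current), total_cost(current)
    tabu = []                                  # recently flipped depots
    for _ in range(iters):
        moves = [d for d in depots if d not in tabu]
        d = min(moves, key=lambda d: total_cost(current ^ {d}))
        current ^= {d}                         # open or close the chosen depot
        tabu = (tabu + [d])[-tabu_len:]
        c = total_cost(current)
        if c < best_cost:
            best, best_cost = set(current), c
    return best, best_cost
```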
Abstract:
Unstructured grid meshes used in most commercial CFD codes inevitably adopt collocated variable solution schemes. These schemes have several shortcomings, mainly due to the interpolation of the pressure gradient, that lead to slow convergence. In this publication we show how it is possible to use a much more stable staggered mesh arrangement in an unstructured code. Several alternative groupings of variables are investigated in a search for the optimum scheme.
Abstract:
Exam timetabling is one of the most important administrative activities that takes place in academic institutions. In this paper we present a critical discussion of the research on exam timetabling in the last decade or so. These last ten years have seen an increased level of attention on this important topic. There has been a range of significant contributions to the scientific literature, both in terms of theoretical and practical aspects. The main aim of this survey is to highlight the new trends and key research achievements of the last decade. We also aim to outline a range of relevant important research issues and challenges that have been generated by this body of work.
We first define the problem and review previous survey papers. Algorithmic approaches are then classified and discussed. These include early techniques (e.g. graph heuristics) and state-of-the-art approaches including meta-heuristics, constraint-based methods, multi-criteria techniques, hybridisations, and recent new trends concerning neighbourhood structures, which are motivated by raising the generality of the approaches. Summarising tables are presented to provide an overall view of these techniques. We discuss some issues on decomposition techniques, system tools and languages, models and complexity. We also present and discuss some important issues which have come to light concerning the public benchmark exam timetabling data. Different versions of problem datasets with the same name have been circulating in the scientific community in the last ten years, which has generated a significant amount of confusion. We clarify the situation and present a re-naming of the widely studied datasets to avoid future confusion. We also highlight which research papers have dealt with which dataset. Finally, we draw upon our discussion of the literature to present a (non-exhaustive) range of potential future research directions and open issues in exam timetabling research.
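As one concrete example of the early graph heuristics the survey mentions: exams become vertices of a conflict graph (an edge whenever two exams share students) and periods become colours, and a largest-degree-first greedy colouring is a classic baseline. A minimal sketch with illustrative names:

```python
def largest_degree_first(conflicts):
    """Classic graph heuristic for exam timetabling: order exams by conflict
    degree, then greedily assign each one the first period (colour) not used
    by any already-scheduled conflicting exam."""
    order = sorted(conflicts, key=lambda e: len(conflicts[e]), reverse=True)
    period = {}
    for exam in order:
        used = {period[n] for n in conflicts[exam] if n in period}
        p = 0
        while p in used:
            p += 1
        period[exam] = p
    return period

# conflicts as adjacency sets, e.g.
# largest_degree_first({"maths": {"physics"}, "physics": {"maths"}, "art": set()})
```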
Abstract:
We present experimental results on benchmark problems in 3D cubic lattice structures with the Miyazawa-Jernigan energy function for two local search procedures that utilise the pull-move set: (i) population-based local search (PLS) that traverses the energy landscape with greedy steps towards (potential) local minima, followed by upward steps up to a certain level of the objective function; (ii) simulated annealing with a logarithmic cooling schedule (LSA). The parameter settings for PLS are derived from short LSA runs executed in pre-processing, and the procedure utilises tabu lists generated for each member of the population. In terms of the total number of energy function evaluations the two methods perform equally well; however, PLS has the potential of being parallelised, with an expected speed-up in the region of the population size. Furthermore, both methods require a significantly smaller number of function evaluations when compared to Monte Carlo simulations with kink-jump moves.
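The logarithmic cooling schedule that defines LSA is easy to state in code. The sketch below is generic: the `neighbour` callback would draw a random pull-move and `energy` would be the Miyazawa-Jernigan function, but both are left abstract here.

```python
import math
import random

def logarithmic_sa(initial, neighbour, energy, c, steps):
    """Simulated annealing with a logarithmic cooling schedule:
    t_k = c / ln(k + 2), accepting an uphill move with probability
    exp(-(E' - E) / t_k) (Metropolis rule)."""
    x, e = initial, energy(initial)
    best, best_e = x, e
    for k in range(steps):
        t = c / math.log(k + 2)
        y = neighbour(x)
        ey = energy(y)
        if ey <= e or random.random() < math.exp(-(ey - e) / t):
            x, e = y, ey
            if e < best_e:
                best, best_e = x, e
    return best, best_e
```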
Abstract:
Incidence calculus is a mechanism for probabilistic reasoning in which sets of possible worlds, called incidences, are associated with axioms, and probabilities are then associated with these sets. Inference rules are used to deduce bounds on the incidence of formulae which are not axioms, and bounds for the probability of such a formula can then be obtained. In practice an assignment of probabilities directly to axioms may be given, and it is then necessary to find an assignment of incidences which will reproduce these probabilities. We show that this task of assigning incidences can be viewed as a tree-searching problem, and two techniques for performing this search are discussed. One of these is a new proposal involving a depth-first search, while the other incorporates a random element. A Prolog implementation of these methods has been developed. The two approaches are compared for efficiency and the significance of their results is discussed. Finally we discuss a new proposal for applying techniques from linear programming to incidence calculus.
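The depth-first proposal can be caricatured as subset search: for each axiom, find a set of possible worlds whose weights sum to the axiom's assigned probability. The sketch below is deliberately simplified, treating axioms independently and ignoring the logical relations between formulae that a real incidence calculus implementation must respect; all names are illustrative.

```python
def assign_incidences(axioms, worlds, weight, target, tol=1e-9):
    """Depth-first sketch: for each axiom, search for a set of possible worlds
    (its incidence) whose weights sum to the axiom's assigned probability."""
    def dfs(remaining, chosen, total):
        if abs(total - target[axiom]) < tol:
            return set(chosen)
        if not remaining or total > target[axiom] + tol:
            return None                        # prune: overshot or exhausted
        w, rest = remaining[0], remaining[1:]
        return (dfs(rest, chosen + [w], total + weight[w])   # include w
                or dfs(rest, chosen, total))                 # or exclude w

    result = {}
    for axiom in axioms:
        incidence = dfs(list(worlds), [], 0.0)
        if incidence is None:
            return None      # no subset of worlds reproduces this probability
        result[axiom] = incidence
    return result
```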
Abstract:
Geophysics may assist scent dogs and divers in the search of water bodies for human and animal remains, contraband, weapons and explosives by surveying large areas rapidly and identifying targets or environmental hazards. The most commonly applied methods are described and evaluated for forensic searches. Seismic reflection or refraction and CHIRPS are useful for deep, open water bodies and identifying large targets, yet limited in streams and ponds. The use of ground penetrating radar (GPR) on water (WPR) is of limited use in deep waters (over 20 m) but is advantageous in the search for non-metallic targets in small ditches and ponds. Large metal or metal-bearing targets can be successfully imaged in deep waters by using towfish magnetometers; in shallow waters such a towfish cannot be used, so a non-metalliferous boat can carry a terrestrial magnetometer. Each device has its uses, depending on the target and location: unknown target make-up (e.g. a homicide victim with or without a metal object) may be best located using a range of methods (the multi-proxy approach), depending on water depth. Geophysics may not definitively find the target, but can provide areas for elimination and detailed search by dogs and divers, saving time and effort.
Abstract:
Background: Postal and electronic questionnaires are widely used for data collection in epidemiological studies, but non-response reduces the effective sample size and can introduce bias. Finding ways to increase response to postal and electronic questionnaires would improve the quality of health research. Objectives: To identify effective strategies to increase response to postal and electronic questionnaires. Search strategy: We searched 14 electronic databases to February 2008 and manually searched the reference lists of relevant trials and reviews, and all issues of two journals. We contacted the authors of all trials or reviews to ask about unpublished trials. Where necessary, we also contacted authors to confirm methods of allocation used and to clarify results presented. We assessed the eligibility of each trial using pre-defined criteria. Selection criteria: Randomised controlled trials of methods to increase response to postal or electronic questionnaires. Data collection and analysis: We extracted data on the trial participants, the intervention, the number randomised to intervention and comparison groups, and allocation concealment. For each strategy, we estimated pooled odds ratios (OR) and 95% confidence intervals (CI) in a random-effects model. We assessed evidence for selection bias using Egger's weighted regression method, Begg's rank correlation test and funnel plots. We assessed heterogeneity among trial odds ratios using a Chi² test, and the degree of inconsistency between trial results was quantified using the I² statistic. Main results: Postal: We found 481 eligible trials. The trials evaluated 110 different ways of increasing response to postal questionnaires. We found substantial heterogeneity among trial results in half of the strategies. The odds of response were approximately doubled using monetary incentives (odds ratio 1.87; 95% CI 1.73 to 2.04; heterogeneity P < 0.00001, I² = 84%), recorded delivery (1.76; 95% CI 1.43 to 2.18; P = 0.0001, I² = 71%), a teaser on the envelope - e.g. a comment suggesting to participants that they may benefit if they open it (3.08; 95% CI 1.27 to 7.44) and a more interesting questionnaire topic (2.00; 95% CI 1.32 to 3.04; P = 0.06, I² = 80%). The odds of response were substantially higher with pre-notification (1.45; 95% CI 1.29 to 1.63; P < 0.00001, I² = 89%), follow-up contact (1.35; 95% CI 1.18 to 1.55; P < 0.00001, I² = 76%), unconditional incentives (1.61; 1.36 to 1.89; P < 0.00001, I² = 88%), shorter questionnaires (1.64; 95% CI 1.43 to 1.87; P < 0.00001, I² = 91%), providing a second copy of the questionnaire at follow-up (1.46; 95% CI 1.13 to 1.90; P < 0.00001, I² = 82%), mentioning an obligation to respond (1.61; 95% CI 1.16 to 2.22; P = 0.98, I² = 0%) and university sponsorship (1.32; 95% CI 1.13 to 1.54; P < 0.00001, I² = 83%). The odds of response were also increased with non-monetary incentives (1.15; 95% CI 1.08 to 1.22; P < 0.00001, I² = 79%), personalised questionnaires (1.14; 95% CI 1.07 to 1.22; P < 0.00001, I² = 63%), use of hand-written addresses (1.25; 95% CI 1.08 to 1.45; P = 0.32, I² = 14%), use of stamped return envelopes as opposed to franked return envelopes (1.24; 95% CI 1.14 to 1.35; P < 0.00001, I² = 69%), an assurance of confidentiality (1.33; 95% CI 1.24 to 1.42) and first-class outward mailing (1.11; 95% CI 1.02 to 1.21; P = 0.78, I² = 0%). The odds of response were reduced when the questionnaire included questions of a sensitive nature (0.94; 95% CI 0.88 to 1.00; P = 0.51, I² = 0%).
Electronic: We found 32 eligible trials. The trials evaluated 27 different ways of increasing response to electronic questionnaires. We found substantial heterogeneity among trial results in half of the strategies. The odds of response were increased by more than a half using non-monetary incentives (1.72; 95% CI 1.09 to 2.72; heterogeneity P < 0.00001, I² = 95%), shorter e-questionnaires (1.73; 1.40 to 2.13; P = 0.08, I² = 68%), including a statement that others had responded (1.52; 95% CI 1.36 to 1.70), and a more interesting topic (1.85; 95% CI 1.52 to 2.26). The odds of response increased by a third using a lottery with immediate notification of results (1.37; 95% CI 1.13 to 1.65), an offer of survey results (1.36; 95% CI 1.15 to 1.61), and using a white background (1.31; 95% CI 1.10 to 1.56). The odds of response were also increased with personalised e-questionnaires (1.24; 95% CI 1.17 to 1.32; P = 0.07, I² = 41%), using a simple header (1.23; 95% CI 1.03 to 1.48), using textual representation of response categories (1.19; 95% CI 1.05 to 1.36), and giving a deadline (1.18; 95% CI 1.03 to 1.34). The odds of response tripled when a picture was included in an e-mail (3.05; 95% CI 1.84 to 5.06; P = 0.27, I² = 19%). The odds of response were reduced when "Survey" was mentioned in the e-mail subject line (0.81; 95% CI 0.67 to 0.97; P = 0.33, I² = 0%), and when the e-mail included a male signature (0.55; 95% CI 0.38 to 0.80; P = 0.96, I² = 0%). Authors' conclusions: Health researchers using postal and electronic questionnaires can increase response using the strategies shown to be effective in this systematic review.
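For readers unfamiliar with the machinery behind these pooled estimates, the sketch below reconstructs a DerSimonian-Laird random-effects pooled odds ratio and the I² statistic from per-trial ORs and 95% CIs. It is a generic illustration, not the review's actual software.

```python
import math

def random_effects_or(or_list, ci_list):
    """DerSimonian-Laird pooling sketch: back-derive each trial's log-OR
    variance from its 95% CI, estimate tau^2 from Cochran's Q, and return
    the pooled OR together with the I^2 inconsistency statistic (%)."""
    y = [math.log(o) for o in or_list]                       # per-trial log-ORs
    v = [((math.log(hi) - math.log(lo)) / (2 * 1.96)) ** 2   # log-OR variances
         for lo, hi in ci_list]
    w = [1 / vi for vi in v]                                 # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    k = len(y)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                       # between-trial variance
    wr = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), i2
```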
Abstract:
A technique for automatic exploration of the genetic search region through fuzzy coding (Sharma and Irwin, 2003) has been proposed. Fuzzy coding (FC) provides the value of a variable on the basis of the optimum number of selected fuzzy sets and their effectiveness in terms of degree-of-membership. It is an indirect encoding method and has been shown to perform better than other conventional binary, Gray and floating-point encoding methods. However, the static range of the membership functions is a major problem in fuzzy coding, resulting in longer times to arrive at an optimum solution in large or complicated search spaces. This paper proposes a new algorithm, called fuzzy coding with a dynamic range (FCDR), which dynamically allocates the range of the variables to evolve an effective search region, thereby achieving faster convergence. Results are presented for two benchmark optimisation problems, and also for a case study involving neural identification of a highly non-linear pH neutralisation process from experimental data. It is shown that dynamic exploration of the genetic search region is effective for parameter optimisation in problems where the search space is complicated.
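To illustrate the decoding step that fuzzy coding relies on, here is a minimal sketch with evenly spaced set centres; the paper's actual membership machinery and the FCDR update rule are not reproduced, and all names are illustrative.

```python
def decode_fuzzy_gene(memberships, low, high):
    """Fuzzy-coding decode sketch: a gene carries degrees of membership for
    fuzzy sets whose centres are spread evenly over [low, high]; the decoded
    variable value is the membership-weighted average of those centres.
    FCDR's refinement is to re-allocate [low, high] dynamically during the
    run instead of fixing it in advance. Assumes at least two fuzzy sets."""
    n = len(memberships)
    centres = [low + (high - low) * i / (n - 1) for i in range(n)]
    total = sum(memberships)
    if total == 0:
        return (low + high) / 2       # degenerate gene: fall back to midpoint
    return sum(m * c for m, c in zip(memberships, centres)) / total

# e.g. decode_fuzzy_gene([0.0, 0.3, 0.7, 0.0, 0.0], 0.0, 10.0) -> 4.25
```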