66 results for k-nearest neighbours

in University of Queensland eSpace - Australia


Relevance:

100.00%

Publisher:

Abstract:

Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine learning algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely used classification datasets from UCI are used, and experiments are conducted across different confidence levels for racing. The results reveal that the k-NN classifier is significantly sensitive to the value of its model parameter. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.
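
The racing procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`knn_loo_error`, `hoeffding_radius`, `race_k`), the one-dimensional features, and the leave-one-out 0/1 error are all assumptions made for brevity. Each candidate k accumulates an error estimate one test point at a time, and a candidate is dropped once its Hoeffding lower bound exceeds the best surviving upper bound.

```python
import math
import random


def knn_loo_error(data, i, k):
    """Leave-one-out 0/1 error of a k-NN majority vote on point i.

    data is a list of (x, label) pairs with scalar x (an assumption
    made to keep the sketch short).
    """
    x, y = data[i]
    others = [p for j, p in enumerate(data) if j != i]
    others.sort(key=lambda p: abs(p[0] - x))
    votes = [label for _, label in others[:k]]
    predicted = max(set(votes), key=votes.count)
    return 0.0 if predicted == y else 1.0


def hoeffding_radius(n, delta, value_range=1.0):
    """Half-width of the two-sided Hoeffding interval after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def race_k(data, candidate_ks, delta=0.05, seed=0):
    """Race candidate k values; return (surviving ks, evaluations used)."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)

    alive = set(candidate_ks)
    error_sum = {k: 0.0 for k in candidate_ks}
    evaluations = 0

    for n, i in enumerate(order, start=1):
        for k in alive:
            error_sum[k] += knn_loo_error(data, i, k)
            evaluations += 1
        eps = hoeffding_radius(n, delta)
        # Smallest upper bound on the error among surviving models.
        best_upper = min(error_sum[k] / n + eps for k in alive)
        # Drop models whose lower bound is already worse than that.
        alive = {k for k in alive if error_sum[k] / n - eps <= best_upper}

    return alive, evaluations
```

A smaller `delta` widens the intervals, so fewer models are eliminated early and the savings over exhaustive evaluation shrink, which matches the trade-off between confidence level and computational savings that the abstract reports.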

Relevance:

90.00%

Publisher:

Abstract:

A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
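
The idea of ranking objects by distance bounds instead of exact shortest paths can be illustrated with a small pruning step. This is a sketch under assumptions, not the paper's algorithm: the function name `prune_by_bounds` and the dictionary layout are hypothetical, and the bounds are taken as given rather than derived from a terrain model. An object whose lower bound exceeds the k-th smallest upper bound cannot be among the k nearest, because at least k other objects are guaranteed to be closer.

```python
def prune_by_bounds(bounds, k):
    """Discard objects that cannot be among the k nearest neighbours.

    bounds maps an object id to a (lower, upper) pair of estimated
    surface distances. Returns the surviving ids, ordered by lower bound.
    """
    if k < 1 or k > len(bounds):
        raise ValueError("k must be between 1 and the number of objects")
    # At least k objects lie within the k-th smallest upper bound.
    kth_upper = sorted(upper for _, upper in bounds.values())[k - 1]
    survivors = [obj for obj, (lower, _) in bounds.items() if lower <= kth_upper]
    return sorted(survivors, key=lambda obj: bounds[obj][0])


# Hypothetical bounds estimated on a coarse terrain model.
coarse = {"a": (1.0, 2.0), "b": (1.5, 3.0), "c": (4.0, 5.0), "d": (2.5, 6.0)}
print(prune_by_bounds(coarse, k=2))  # ['a', 'b', 'd']
```

If more than k objects survive, the bounds of the survivors would be tightened on a finer-resolution model and the pruning repeated, so exact shortest paths are never needed for objects that are ruled out early.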

Relevance:

80.00%

Publisher:

Abstract:

Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); and example-based methods such as k-nearest neighbors (Duda & Hart, 1973) and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).

Relevance:

80.00%

Publisher:

Abstract:

This paper presents results on the simulation of the solid state sintering of copper wires using Monte Carlo techniques based on elements of lattice theory and cellular automata. The initial structure is superimposed onto a triangular, two-dimensional lattice, where each lattice site corresponds to either an atom or vacancy. The number of vacancies varies with the simulation temperature, while a cluster of vacancies is a pore. To simulate sintering, lattice sites are picked at random and reoriented in terms of an atomistic model governing mass transport. The probability that an atom has sufficient energy to jump to a vacant lattice site is related to the jump frequency, and hence the diffusion coefficient, while the probability that an atomic jump will be accepted is related to the change in energy of the system as a result of the jump, as determined by the change in the number of nearest neighbours. The jump frequency is also used to relate model time, measured in Monte Carlo Steps, to the actual sintering time. The model incorporates bulk, grain boundary and surface diffusion terms and includes vacancy annihilation on the grain boundaries. The predictions of the model were found to be consistent with experimental data, both in terms of the microstructural evolution and in terms of the sintering time. (C) 2002 Elsevier Science B.V. All rights reserved.
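
The acceptance step described above, where the probability of accepting a jump depends on the change in the number of nearest neighbours, can be illustrated with a Metropolis-style rule. This is only a sketch under stated assumptions: the abstract does not give the exact acceptance formula, so the Metropolis form, the function name `jump_acceptance_probability`, and the single `bond_energy_ev` parameter are all hypothetical choices for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def jump_acceptance_probability(neighbours_before, neighbours_after,
                                bond_energy_ev, temperature_k):
    """Metropolis-style acceptance probability for one atomic jump.

    Assumes each nearest-neighbour bond lowers the system energy by
    bond_energy_ev, so gaining neighbours is energetically favourable.
    """
    delta_e = -bond_energy_ev * (neighbours_after - neighbours_before)
    if delta_e <= 0.0:
        return 1.0  # energy does not increase: always accept
    return math.exp(-delta_e / (BOLTZMANN_EV_PER_K * temperature_k))
```

On a triangular lattice each site has six nearest neighbours, so the neighbour counts passed in would range from 0 to 6. Under this rule, a jump that increases the neighbour count is always accepted, while a jump that breaks bonds is accepted with a probability that grows with temperature.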

Relevance:

80.00%

Publisher:

Abstract:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File), based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to compute the approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted, and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
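
The two-stage search described for OVA-LOW (rank slices by the distance from their centers to the query, then scan only the top `delta` slices) can be sketched as below. This is a simplified illustration, not the published algorithm: the function name `approx_knn` and the slice layout are assumptions, and the sketch scans exact vectors, whereas the paper visits VA-file approximations.

```python
import heapq
import math


def approx_knn(slices, query, k, delta):
    """Approximate kNN that scans only the delta slices nearest the query.

    slices is a list of dicts with a "center" point and a list of
    "vectors" (a hypothetical layout chosen for this sketch).
    """
    # Stage 1: rank slices by the distance from their center to the query.
    ranked = sorted(slices, key=lambda s: math.dist(s["center"], query))
    # Stage 2: scan every vector in the selected slices only.
    pool = [v for s in ranked[:delta] for v in s["vectors"]]
    return heapq.nsmallest(k, pool, key=lambda v: math.dist(v, query))


slices = [
    {"center": (0.0, 0.0), "vectors": [(0.0, 2.0), (1.0, 0.0), (0.5, 0.5)]},
    {"center": (10.0, 10.0), "vectors": [(9.0, 9.0), (10.0, 11.0)]},
]
print(approx_knn(slices, query=(0.0, 0.0), k=2, delta=1))
# [(0.5, 0.5), (1.0, 0.0)]
```

Raising `delta` scans more slices, which costs more but makes it less likely that a true neighbour sitting in a lower-ranked slice is missed.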

Relevance:

30.00%

Publisher:

Abstract:

Nearest-neighbour balance is considered a desirable property for an experiment to possess in situations where experimental units are influenced by their neighbours. This paper introduces a measure of the degree of nearest-neighbour balance of a design. The measure is used in an algorithm which generates nearest-neighbour balanced designs and is readily modified to obtain designs with various types of nearest-neighbour balance. Nearest-neighbour balanced designs are produced for a wide class of parameter settings, and in particular for those settings for which such designs cannot be found by existing direct combinatorial methods. In addition, designs with unequal row and column sizes, and designs with border plots are constructed using the approach presented here.

Relevance:

30.00%

Publisher:

Abstract:

Most sugarcane breeding programs in Australia use large unreplicated trials to evaluate clones in the early stages of selection. Commercial varieties that are replicated provide a method of local control of soil fertility. Although such methods may be useful in detecting broad trends in the field, variation often occurs on a much smaller scale. Methods such as spatial analysis adjust a plot for variability by using information from immediate neighbours. These techniques are routinely used to analyse cereal data in Australia and have resulted in increased accuracy and precision in the estimates of variety effects. In this paper, spatial analyses in which the variability is decomposed into local, natural, and extraneous components are applied to early selection trials in sugarcane. Interplot competition in cane yield and trend in sugar content were substantial in many of the trials and there were often large differences in the selections between the spatial and current method used by the Bureau of Sugar Experiment Stations. A joint modelling approach for tonnes sugar per hectare in response to fertility trends and interplot competition is recommended.

Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher:

Abstract:

In this ambitious book, Burgoon, Stern, and Dillman present the most comprehensive coverage of the literature on interpersonal adaptation that I have seen in recent years. Their mission is to make a critical examination of this whole area from both theoretical and methodological perspectives, and then to present their own synthetic theory (interpersonal adaptation theory, IAT) and research agenda. Such a mission produces very high expectations in readers, and inevitably some readers will feel that the authors do not achieve all of it. Personally, I was impressed by how much they do achieve, and I was intrigued by the questions they did not answer. One can ask no more than this of any single book.

Relevance:

20.00%

Publisher:

Abstract:

Our previous investigations of possible lung mechanisms underlying the effectiveness of nebulized morphine for the relief of dyspnoea have shown a high density of non-conventional opioid binding sites in rat airways with similar binding characteristics (opioid alkaloid-sensitive, opioid peptide-insensitive) to that of putative μ3-opioid receptors on immune cells. To investigate whether these lung opioid binding sites are functional receptors, this study was designed to determine (using superfusion) whether morphine modulates the K+-evoked release of the pro-inflammatory neuropeptide, substance P (SP), from rat peripheral airways. Importantly, K+-evoked SP release was Ca2+-dependent, consistent with vesicular release. Submicromolar concentrations of morphine (1 and 200 nM) inhibited K+-evoked SP release from rat peripheral airways in a naloxone (1 μM) reversible manner. By contrast, 1 μM morphine enhanced K+-evoked SP release and this effect was not reversed by 1 μM naloxone. However, 100 μM naloxone not only antagonized the facilitatory effect of 1 μM morphine on K+-evoked SP release from rat peripheral airways but it inhibited release to a similar extent as 200 nM morphine. It is possible that these latter effects are mediated by non-conventional opioid receptors located on mast cells, activation of which causes naloxone-reversible histamine release that in turn augments the release of SP from sensory nerve terminals in the peripheral airways. Clearly, further studies are required to investigate this possibility. (C) 1997 Academic Press Limited.

Relevance:

20.00%

Publisher:

Abstract:

PCR-based cancer diagnosis requires detection of rare mutations in k-ras, p53 or other genes. The assumption has been that mutant and wild-type sequences amplify with near equal efficiency, so that they are eventually present in proportions representative of the starting material. Work on factor IX suggests that this assumption is invalid for one case of near-sequence identity. To test the generality of this phenomenon and its relevance to cancer diagnosis, primers distant from point mutations in p53 and k-ras were used to amplify wild-type and mutant sequences from these genes. A substantial bias against PCR amplification of mutants was observed for two regions of the p53 gene and one region of k-ras. For k-ras and p53, bias was observed when the wild-type and mutant sequences were amplified separately or when mixed in equal proportions before PCR. Bias was present with proofreading and non-proofreading polymerases. Mutant and wild-type segments of the factor V, cystic fibrosis transmembrane conductance regulator, and prothrombin genes were amplified and did not exhibit PCR bias. Therefore, the assumption of equal PCR efficiency for point mutant and wild-type sequences is invalid in several systems. Quantitative or diagnostic PCR will require validation for each locus, and enrichment strategies may be needed to optimize detection of mutants.