121 resultados para k-nearest neighbours estimation

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine teaming algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely-used classification datasets from UCI are used and experiments conducted across different confidence levels for racing. The results reveal a significant amount of sensitivity of the k-nn classifier to its model parameter value. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A k-NN query finds the k nearest-neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN queries, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of k-NN queries based on the Euclidian distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need of costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents results on the simulation of the solid state sintering of copper wires using Monte Carlo techniques based on elements of lattice theory and cellular automata. The initial structure is superimposed onto a triangular, two-dimensional lattice, where each lattice site corresponds to either an atom or vacancy. The number of vacancies varies with the simulation temperature, while a cluster of vacancies is a pore. To simulate sintering, lattice sites are picked at random and reoriented in terms of an atomistic model governing mass transport. The probability that an atom has sufficient energy to jump to a vacant lattice site is related to the jump frequency, and hence the diffusion coefficient, while the probability that an atomic jump will be accepted is related to the change in energy of the system as a result of the jump, as determined by the change in the number of nearest neighbours. The jump frequency is also used to relate model time, measured in Monte Carlo Steps, to the actual sintering time. The model incorporates bulk, grain boundary and surface diffusion terms and includes vacancy annihilation on the grain boundaries. The predictions of the model were found to be consistent with experimental data, both in terms of the microstructural evolution and in terms of the sintering time. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we investigate a Bayesian procedure for the estimation of a flexible generalised distribution, notably the MacGillivray adaptation of the g-and-κ distribution. This distribution, described through its inverse cdf or quantile function, generalises the standard normal through extra parameters which together describe skewness and kurtosis. The standard quantile-based methods for estimating the parameters of generalised distributions are often arbitrary and do not rely on computation of the likelihood. MCMC, however, provides a simulation-based alternative for obtaining the maximum likelihood estimates of parameters of these distributions or for deriving posterior estimates of the parameters through a Bayesian framework. In this paper we adopt the latter approach, The proposed methodology is illustrated through an application in which the parameter of interest is slightly skewed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nearest–neighbour balance is considered a desirable property for an experiment to possess in situations where experimental units are influenced by their neighbours. This paper introduces a measure of the degree of nearest–neighbour balance of a design. The measure is used in an algorithm which generates nearest–neighbour balanced designs and is readily modified to obtain designs with various types of nearest–neighbour balance. Nearest–neighbour balanced designs are produced for a wide class of parameter settings, and in particular for those settings for which such designs cannot be found by existing direct combinatorial methods. In addition, designs with unequal row and column sizes, and designs with border plots are constructed using the approach presented here.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bioelectrical impedance analysis (BIA) was used to assess body composition in rats fed on either standard laboratory diet or on a high-fat diet designed to induce obesity. Bioelectrical impedance analysis predictions of total body water and thus fat-free mass (FFM) for the group mean values were generally within 5% of the measured values by tritiated water ((H2O)-H-3) dilution. The limits of agreement for the procedure were, however, large, approximately +/-25%, limiting the applicability of the technique for measurement of body composition in individual animals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background From the mid-1980s to mid-1990s, the WHO MONICA Project monitored coronary events and classic risk factors for coronary heart disease (CHD) in 38 populations from 21 countries. We assessed the extent to which changes in these risk factors explain the variation in the trends in coronary-event rates across the populations. Methods In men and women aged 35-64 years, non-fatal myocardial infarction and coronary deaths were registered continuously to assess trends in rates of coronary events. We carried out population surveys to estimate trends in risk factors. Trends in event rates were regressed on trends in risk score and in individual risk factors. Findings Smoking rates decreased in most male populations but trends were mixed in women; mean blood pressures and cholesterol concentrations decreased, body-mass index increased, and overall risk scores and coronary-event rates decreased. The model of trends in 10-year coronary-event rates against risk scores and single risk factors showed a poor fit, but this was improved with a 4-year time lag for coronary events. The explanatory power of the analyses was limited by imprecision of the estimates and homogeneity of trends in the study populations. Interpretation Changes in the classic risk factors seem to partly explain the variation in population trends in CHD. Residual variance is attributable to difficulties in measurement and analysis, including time lag, and to factors that were not included, such as medical interventions. The results support prevention policies based on the classic risk factors but suggest potential for prevention beyond these.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dendritic cells (DC) are considered to be the major cell type responsible for induction of primary immune responses. While they have been shown to play a critical role in eliciting allosensitization via the direct pathway, there is evidence that maturational and/or activational heterogeneity between DC in different donor organs may be crucial to allograft outcome. Despite such an important perceived role for DC, no accurate estimates of their number in commonly transplanted organs have been reported. Therefore, leukocytes and DC were visualized and enumerated in cryostat sections of normal mouse (C57BL/10, B10.BR, C3H) liver, heart, kidney and pancreas by immunohistochemistry (CD45 and MHC class II staining, respectively). Total immunopositive cell number and MHC class II+ cell density (C57BL/10 mice only) were estimated using established morphometric techniques - the fractionator and disector principles, respectively. Liver contained considerably more leukocytes (similar to 5-20 x 10(6)) and DC (similar to 1-3 x 10(6)) than the other organs examined (pancreas: similar to 0.6 x 10(6) and similar to 0.35 x 10(6): heart: similar to 0.8 x 10(6) and similar to 0.4 x 10(6); kidney similar to 1.2 x 10(6) and 0.65 x 10(6), respectively). In liver, DC comprised a lower proportion of all leukocytes (similar to 15-25%) than in the other parenchymal organs examined (similar to 40-60%). Comparatively, DC density in C57BL/10 mice was heart > kidney > pancreas much greater than liver (similar to 6.6 x 10(6), 5 x 10(6), 4.5 x 10(6) and 1.1 x 10(6) cells/cm(3), respectively). When compared to previously published data on allograft survival, the results indicate that the absolute number of MHC class II+ DC present in a donor organ is a poor predictor of graft outcome. Survival of solid organ allografts is more closely related to the density of the donor DC network within the graft. (C) 2000 Elsevier Science B.V. All rights reserved.