70 results for K-nearest neighbour
in University of Queensland eSpace - Australia
Abstract:
Nearest–neighbour balance is considered a desirable property for an experiment to possess in situations where experimental units are influenced by their neighbours. This paper introduces a measure of the degree of nearest–neighbour balance of a design. The measure is used in an algorithm which generates nearest–neighbour balanced designs and is readily modified to obtain designs with various types of nearest–neighbour balance. Nearest–neighbour balanced designs are produced for a wide class of parameter settings, and in particular for those settings for which such designs cannot be found by existing direct combinatorial methods. In addition, designs with unequal row and column sizes, and designs with border plots are constructed using the approach presented here.
Abstract:
Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine learning algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely-used classification datasets from UCI are used and experiments conducted across different confidence levels for racing. The results reveal a significant amount of sensitivity of the k-nn classifier to its model parameter value. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.
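The racing idea can be illustrated with a minimal sketch. It assumes 0/1 loss (so the error range is 1), leave-one-out evaluation, and the usual Hoeffding bound eps = sqrt(ln(2/delta) / (2n)); the function names and the elimination rule below are illustrative assumptions, not the procedure used in the paper.

```python
import math

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def hoeffding_race(data, candidate_ks, delta=0.05):
    """Race candidate k values on leave-one-out errors.

    data is a non-empty list of (feature_tuple, label) pairs.
    Returns {k: mean error} for the candidates that survive the race.
    """
    errors = {k: 0 for k in candidate_ks}
    alive = set(candidate_ks)
    n = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        n += 1
        # Only surviving candidates are evaluated; this is where the savings arise.
        for k in alive:
            errors[k] += knn_predict(rest, x, k) != y
        # Hoeffding half-width for a mean of n values bounded in [0, 1].
        eps = math.sqrt(math.log(2 / delta) / (2 * n))
        best_upper = min(errors[k] / n for k in alive) + eps
        # Drop a candidate once its lower bound exceeds the best upper bound.
        alive = {k for k in alive if errors[k] / n - eps <= best_upper}
    return {k: errors[k] / n for k in alive}
```

A smaller delta widens the bound, so fewer candidates are eliminated early and the risk of discarding the best k falls, at the price of more evaluations.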
Abstract:
We investigate the internal dynamics of two cellular automaton models with heterogeneous strength fields and differing nearest neighbour laws. One model is a crack-like automaton, transferring all stress from a rupture zone to the surroundings. The other automaton is a partial stress drop automaton, transferring only a fraction of the stress within a rupture zone to the surroundings. To study the evolution of stress, the mean spectral density $f(k_r)$ of a stress deficit field is examined prior to, and immediately following, ruptures in both models. Both models display a power-law relationship between $f(k_r)$ and spatial wavenumber $k_r$ of the form $f(k_r) \sim k_r^{-\beta}$. In the crack model, the evolution of stress deficit is consistent with cyclic approach to, and retreat from, a critical state in which large events occur. The approach to criticality is driven by tectonic loading. Short-range stress transfer in the model does not affect the approach to criticality of broad regions in the model. The evolution of stress deficit in the partial stress drop model is consistent with small fluctuations about a mean state of high stress, behaviour indicative of a self-organised critical system. Despite statistics similar to natural earthquakes, these simplified models lack a physical basis. Physically motivated models of earthquakes also display dynamical complexity similar to that of a critical point system. Studies of dynamical complexity in physical models of earthquakes may lead to advancement towards a physical theory for earthquakes.
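As an illustration of how an exponent beta could be estimated from a spectrum of the power-law form f ~ k^(-beta), the sketch below fits a straight line in log-log space by ordinary least squares. The function name and the choice of an unweighted fit are assumptions made for this example; the abstract does not say how the authors estimated beta.

```python
import math

def fit_power_law_exponent(wavenumbers, spectral_density):
    """Return beta such that spectral_density ~ wavenumbers ** (-beta).

    Uses the least-squares slope of log(f) against log(k); all inputs
    must be strictly positive.
    """
    xs = [math.log(k) for k in wavenumbers]
    ys = [math.log(f) for f in spectral_density]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -beta, so negate it.
    return -sxy / sxx
```

Tracking this estimate before and after each rupture is one way to follow how the stress deficit spectrum changes over time.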
Abstract:
Most sugarcane breeding programs in Australia use large unreplicated trials to evaluate clones in the early stages of selection. Commercial varieties that are replicated provide a method of local control of soil fertility. Although such methods may be useful in detecting broad trends in the field, variation often occurs on a much smaller scale. Methods such as spatial analysis adjust a plot for variability by using information from immediate neighbours. These techniques are routinely used to analyse cereal data in Australia and have resulted in increased accuracy and precision in the estimates of variety effects. In this paper, spatial analyses in which the variability is decomposed into local, natural, and extraneous components are applied to early selection trials in sugarcane. Interplot competition in cane yield and trend in sugar content were substantial in many of the trials, and there were often large differences in the selections between the spatial method and the current method used by the Bureau of Sugar Experiment Stations. A joint modelling approach for tonnes of sugar per hectare in response to fertility trends and interplot competition is recommended.
Abstract:
Scorpion toxins are common experimental tools for studies of biochemical and pharmacological properties of ion channels. The number of functionally annotated scorpion toxins is steadily growing, but the number of identified toxin sequences is increasing at a much faster pace. With an estimated 100,000 different variants, bioinformatic analysis of scorpion toxins is becoming a necessary tool for their systematic functional analysis. Here, we report a bioinformatics-driven system involving scorpion toxin structural classification, functional annotation, database technology, sequence comparison, nearest neighbour analysis, and decision rules which produces highly accurate predictions of scorpion toxin functional properties. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need for the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
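Ranking by distance bounds can be sketched as follows. The example assumes each object already carries a (lower, upper) bound on its surface distance to the query; how those bounds are derived from a multiresolution terrain model is not described in the abstract, so they are simply taken as input here. The function name is hypothetical.

```python
def prune_by_bounds(bounds, k):
    """Return the ids of objects that can still be among the k nearest.

    bounds maps an object id to (lower, upper), where the true distance
    is known to lie in that interval. At least k objects have a true
    distance no larger than the k-th smallest upper bound, so any object
    whose lower bound exceeds that value cannot be a k-NN result.
    """
    kth_upper = sorted(upper for _, upper in bounds.values())[k - 1]
    return {obj for obj, (lower, _) in bounds.items() if lower <= kth_upper}
```

For example, with bounds `{"a": (1, 2), "b": (1.5, 3), "c": (2.5, 4), "d": (5, 6)}` and `k=2`, object `d` is discarded without any shortest-path computation, and only the remaining candidates would need tighter bounds from a finer terrain resolution.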
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time variant, noisy, and high dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been used in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); and example-based methods such as k-nearest neighbors (Duda & Hart, 1973) and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
Abstract:
The deep-sea pearleye, Scopelarchus michaelsarsi (Scopelarchidae) is a mesopelagic teleost with asymmetric or tubular eyes. The main retina subtends a large dorsal binocular field, while the accessory retina subtends a restricted monocular field of lateral visual space. Ocular specializations to increase the lateral visual field include an oblique pupil and a corneal lens pad. A detailed morphological and topographic study of the photoreceptors and retinal ganglion cells reveals seven specializations: a centronasal region of the main retina with ungrouped rod-like photoreceptors overlying a retinal tapetum; a region of high ganglion cell density (area centralis of 56.1 × 10³ cells per mm²) in the centrolateral region of the main retina; a centrotemporal region of the main retina with grouped rod-like photoreceptors; a region (area giganto cellularis) of large (32.2 ± 5.6 μm²), alpha-like ganglion cells arranged in a regular array (nearest neighbour distance 53.5 ± 9.3 μm with a conformity ratio of 5.8) in the temporal main retina; an accessory retina with grouped rod-like photoreceptors; a nasotemporal band of a mixture of rod- and cone-like photoreceptors restricted to the ventral accessory retina; and a retinal diverticulum comprised of a ventral region of differentiated accessory retina located medial to the optic nerve head. Retrograde labelling from the optic nerve with DiI shows that approximately 14% of the cells in the ganglion cell layer of the main retina are displaced amacrine cells at 1.5 mm eccentricity. Cryosectioning of the tubular eye confirms Matthiessen's ratio (2.59), and calculations of the spatial resolving power suggest that the function of the area centralis (7.4 cycles per degree/8.1 minutes of arc) and the cohort of temporal alpha-like ganglion cells (0.85 cycles per degree/70.6 minutes of arc) in the main retina may be different. Low summation ratios in these various retinal zones suggest that each zone may mediate distinct visual tasks in a certain region of the visual field by optimizing sensitivity and/or resolving power.
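The conformity ratio quoted above is conventionally the mean nearest neighbour distance divided by its standard deviation, which agrees with the reported figures (53.5 / 9.3 is about 5.8). The sketch below computes it for a set of cell positions; the function names are illustrative, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
import math
import statistics

def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def conformity_ratio(points):
    """Mean nearest neighbour distance divided by its sample standard deviation.

    Higher values indicate a more regular mosaic. The ratio is undefined
    for a perfectly regular array, where the standard deviation is zero.
    """
    distances = nearest_neighbour_distances(points)
    return statistics.mean(distances) / statistics.stdev(distances)
```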
Abstract:
The effect of increasing population density on the formation of pits, their size and spatial distribution, and on levels of mortality was examined in the antlion Myrmeleon acer Walker. Antlions were kept at densities ranging from 0.4 to 12.8 individuals per 100 cm². The distribution of pits was regular or uniform across all densities, but antlions constructed proportionally fewer and smaller pits as density increased. Mortality through cannibalism was very low and only occurred at densities greater than five individuals per 100 cm². Antlions in artificially crowded situations frequently relocated their pits and, when more space became available, individuals became more dispersed with time. Redistribution of this species results from active avoidance of other antlions and sand throwing associated with pit construction and maintenance, rather than any attempt to optimise prey capture per se.
Abstract:
1. Cluster analysis of reference sites with similar biota is the initial step in creating River Invertebrate Prediction and Classification System (RIVPACS) and similar river bioassessment models such as Australian River Assessment System (AUSRIVAS). This paper describes and tests an alternative prediction method, Assessment by Nearest Neighbour Analysis (ANNA), based on the same philosophy as RIVPACS and AUSRIVAS but without the grouping step that some people view as artificial. 2. The steps in creating ANNA models are: (i) weighting the predictor variables using a multivariate approach analogous to principal axis correlations, (ii) calculating the weighted Euclidean distance from a test site to the reference sites based on the environmental predictors, (iii) predicting the faunal composition based on the nearest reference sites and (iv) calculating an observed/expected (O/E) ratio analogous to RIVPACS/AUSRIVAS. 3. The paper compares AUSRIVAS and ANNA models on 17 datasets representing a variety of habitats and seasons. First, it examines each model's regressions for Observed versus Expected number of taxa, including the r², intercept and slope. Second, the two models' assessments of 79 test sites in New Zealand are compared. Third, the models are compared on test and presumed reference sites along a known trace metal gradient. Fourth, ANNA models are evaluated for Western Australia, a geographically distinct region of Australia. The comparisons demonstrate that ANNA and AUSRIVAS are generally equivalent in performance, although ANNA turns out to be potentially more robust for the O versus E regressions and is potentially more accurate on the trace metal gradient sites. 4. The ANNA method is recommended for use in bioassessment of rivers, at least for corroborating the results of the well established AUSRIVAS- and RIVPACS-type models, if not to replace them.
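Steps (ii) to (iv) can be sketched as below. Several details are assumptions made only for illustration: the predictor weights are supplied by the caller rather than derived as in step (i), each taxon's expected probability is its unweighted frequency among the k nearest reference sites, and only taxa with probability of at least 0.5 are counted. The abstract does not specify these choices, and the function names are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two environmental vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def observed_over_expected(test_env, test_taxa, reference_sites, weights,
                           k=3, threshold=0.5):
    """O/E ratio for a test site.

    reference_sites is a list of (env_vector, set_of_taxa) pairs and must
    contain at least one site with at least one taxon.
    """
    # Step (ii): rank reference sites by weighted distance to the test site.
    nearest = sorted(
        reference_sites,
        key=lambda site: weighted_distance(test_env, site[0], weights),
    )[:k]
    # Step (iii): a taxon's probability is its frequency among the neighbours.
    all_taxa = set().union(*(taxa for _, taxa in nearest))
    prob = {t: sum(t in taxa for _, taxa in nearest) / len(nearest)
            for t in all_taxa}
    predicted = {t: p for t, p in prob.items() if p >= threshold}
    # Step (iv): compare the taxa actually found with the expected count.
    expected = sum(predicted.values())
    observed = sum(1 for t in predicted if t in test_taxa)
    return observed / expected
```

An O/E value near 1 indicates that the test site supports about as many of the predicted taxa as comparable reference sites, while lower values suggest impairment.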
Abstract:
Genetic diversity and population structure were investigated across the core range of Tasmanian devils (Sarcophilus laniarius; Dasyuridae), a wide-ranging marsupial carnivore restricted to the island of Tasmania. Heterozygosity (0.386-0.467) and allelic diversity (2.7-3.3) were low in all subpopulations and allelic size ranges were small and almost continuous, consistent with a founder effect. Island effects and repeated periods of low population density may also have contributed to the low variation. Within continuous habitat, gene flow appears extensive up to 50 km (high assignment rates to source or close neighbour populations; nonsignificant values of pairwise F_ST), in agreement with movement data. At larger scales (150-250 km), gene flow is reduced (significant pairwise F_ST) but there is no evidence for isolation by distance. The most substantial genetic structuring was observed for comparisons spanning unsuitable habitat, implying limited dispersal of devils between the well-connected, eastern populations and a smaller northwestern population. The genetic distinctiveness of the northwestern population was reflected in all analyses: unique alleles; multivariate analyses of gene frequency (multidimensional scaling, minimum spanning tree, nearest neighbour); high self-assignment (95%); two distinct populations for Tasmania were detected in isolation by distance and in Bayesian model-based clustering analyses. Marsupial carnivores appear to have stronger population subdivisions than their placental counterparts.
Abstract:
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File), based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
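The slice-selection step described for OVA-LOW can be sketched in a simplified form. This example stores exact vectors in each slice rather than VA-file approximations, and the data layout and function name are assumptions made for illustration; it shows only how delta limits the number of slices scanned, not the OVA-File structure itself.

```python
import heapq
import math

def approximate_knn(slices, query, k, delta):
    """Approximate k nearest neighbours of query.

    slices is a list of (center, vectors) pairs. Only the delta slices
    whose centers lie closest to the query are scanned, so a larger
    delta costs more work but can only improve the result.
    """
    # Rank slices by the distance between their center and the query.
    chosen = sorted(slices, key=lambda s: math.dist(s[0], query))[:delta]
    # Visit every vector in the selected slices and keep the k closest.
    candidates = [v for _, vectors in chosen for v in vectors]
    return heapq.nsmallest(k, candidates, key=lambda v: math.dist(v, query))
```

Setting delta to the total number of slices reduces this to an exact linear scan, which makes the cost versus quality trade-off easy to measure.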
Abstract:
Predicting the various responses of different species to changes in landscape structure is a formidable challenge to landscape ecology. Based on expert knowledge and landscape ecological theory, we develop five competing a priori models for predicting the presence/absence of the Koala (Phascolarctos cinereus) in Noosa Shire, south-east Queensland (Australia). A priori predictions were nested within three levels of ecological organization: in situ (site level) habitat (< 1 ha), patch level (100 ha) and landscape level (100-1000 ha). To test the models, Koala surveys and habitat surveys (n = 245) were conducted across the habitat mosaic. After taking into account tree species preferences, the patch and landscape context, and the neighbourhood effect of adjacent present sites, we applied logistic regression and hierarchical partitioning analyses to rank the alternative models and the explanatory variables. The strongest support was for a multilevel model, with Koala presence best predicted by the proportion of the landscape occupied by high quality habitat, the neighbourhood effect, the mean nearest neighbour distance between forest patches, the density of forest patches and the density of sealed roads. When tested against independent data (n = 105) using a receiver operator characteristic curve, the multilevel model performed moderately well. The study is consistent with recent assertions that habitat loss is the major driver of population decline; however, landscape configuration and roads have an important effect that needs to be incorporated into Koala conservation strategies.
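Testing a presence/absence model against independent data with a receiver operator characteristic curve is commonly summarised by the area under that curve (AUC). The sketch below computes the AUC through its pairwise-ranking interpretation; it is a generic illustration with a hypothetical function name, not the authors' evaluation code, and the abstract does not state which summary statistic was reported.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for predicted scores and 0/1 labels.

    Equals the probability that a randomly chosen presence site receives
    a higher score than a randomly chosen absence site, with ties
    counted as half. Both classes must be represented.
    """
    present = [s for s, y in zip(scores, labels) if y == 1]
    absent = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > a) + 0.5 * (p == a) for p in present for a in absent)
    return wins / (len(present) * len(absent))
```

A value of 0.5 corresponds to a model with no discriminating ability and 1.0 to perfect separation of presence and absence sites.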
Abstract:
The loss and fragmentation of forest habitats by human land use are recognised as important factors influencing the decline of forest-dependent fauna. Mammal species that are dependent upon forest habitats are particularly sensitive to habitat loss and fragmentation because they have highly specific habitat requirements, and in many cases have limited ability to move through and utilise the land use matrix. We addressed this problem using a case study of the koala (Phascolarctos cinereus) surveyed in a fragmented rural-urban landscape in southeast Queensland, Australia. We applied a logistic modelling and hierarchical partitioning analysis to determine the importance of forest area and its configuration relative to site (local) and patch-level habitat variables. After taking into account spatial auto-correlation and the year of survey, we found koala occurrence increased with the area of all forest habitats, habitat patch size and the proportion of primary Eucalyptus tree species; and decreased with mean nearest neighbour distance between forest patches, the density of forest patches, and the density of sealed roads. The difference between the effect of habitat area and configuration was not as strong as theory predicts, with the configuration of remnant forest becoming increasingly important as the area of forest habitat declines. We conclude that the area of forest, its configuration across the landscape, as well as the land use matrix, are important determinants of koala occurrence, and that habitat configuration should not be overlooked in the conservation of forest-dependent mammals, such as the koala. We highlight the implications of these findings for koala conservation. (c) 2006 Elsevier Ltd. All rights reserved.