969 resultados para K-nearest neighbour
Resumo:
The length and time scales accessible to optical tweezers make them an ideal tool for the examination of colloidal systems. Embedded high-refractive-index tracer particles in an index-matched hard sphere suspension provide 'handles' within the system to investigate the mechanical behaviour. Passive observations of the motion of a single probe particle give information about the linear response behaviour of the system, which can be linked to the macroscopic frequency-dependent viscous and elastic moduli of the suspension. Separate 'dragging' experiments allow observation of a sample's nonlinear response to an applied stress on a particle-by particle basis. Optical force measurements have given new data about the dynamics of phase transitions and particle interactions; an example in this study is the transition from liquid-like to solid-like behaviour, and the emergence of a yield stress and other effects attributable to nearest-neighbour caging effects. The forces needed to break such cages and the frequency of these cage breaking events are investigated in detail for systems close to the glass transition.
A benchmark-driven modelling approach for evaluating deployment choices on a multi-core architecture
Resumo:
The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often includes cache, memory, network controllers and in some cases floating point units (as in the AMD Bulldozer), which mean that the access time depends on the mapping of application tasks, and the core's location within the system. Heterogeneity further increases with the use of hardware-accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend for shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that various runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and non-standard task to core mappings can dramatically alter performance. To find this out, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work, loop-based array updates and nearest-neighbour halo-exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, with interpolation between results as necessary.
Resumo:
There is now an emerging need for an efficient modeling strategy to develop a new generation of monitoring systems. One method of approaching the modeling of complex processes is to obtain a global model. It should be able to capture the basic or general behavior of the system, by means of a linear or quadratic regression, and then superimpose a local model on it that can capture the localized nonlinearities of the system. In this paper, a novel method based on a hybrid incremental modeling approach is designed and applied for tool wear detection in turning processes. It involves a two-step iterative process that combines a global model with a local model to take advantage of their underlying, complementary capacities. Thus, the first step constructs a global model using a least squares regression. A local model using the fuzzy k-nearest-neighbors smoothing algorithm is obtained in the second step. A comparative study then demonstrates that the hybrid incremental model provides better error-based performance indices for detecting tool wear than a transductive neurofuzzy model and an inductive neurofuzzy model.
Resumo:
Scorpion toxins are common experimental tools for studies of biochemical and pharmacological properties of ion channels. The number of functionally annotated scorpion toxins is steadily growing, but the number of identified toxin sequences is increasing at much faster pace. With an estimated 100,000 different variants, bioinformatic analysis of scorpion toxins is becoming a necessary tool for their systematic functional analysis. Here, we report a bioinformatics-driven system involving scorpion toxin structural classification, functional annotation, database technology, sequence comparison, nearest neighbour analysis, and decision rules which produces highly accurate predictions of scorpion toxin functional properties. (c) 2005 Elsevier Inc. All rights reserved.
Resumo:
A k-NN query finds the k nearest-neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN queries, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of k-NN queries based on the Euclidian distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need of costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
Resumo:
Strontium has been substituted for calcium in the glass series (SiO2)49.46(Na2O)26.38(P2O5)1.07(CaO)23.08x(SrO)x (where x = 0, 11.54, 23.08) to elucidate their underlying atomic-scale structural characteristics as a basis for understanding features related to the bioactivity. These bioactive glasses have been investigated using isomorphic neutron and X-ray diffraction, Sr K-edge EXAFS and solid state 17O, 23Na, 29Si, 31P and 43Ca magic-angle-spinning (MAS) NMR. An effective isomorphic substitution first-order difference function has been applied to the neutron diffraction data, confirming that Ca and Sr behave in a similar manner within the glass network, with residual differences attributed to solely the variation in ionic radius between the two species. The diffraction data provides the first direct experimental evidence of split Ca–O nearest-neighbour correlations in these melt quench bioactive glasses, together with an analogous splitting of the Sr–O correlations; the correlations are attributed to the metal ions correlated either to bridging or to non-bridging oxygen atoms. Triple quantum (3Q) 43Ca MAS NMR corroborates the split Ca–O correlations. Successful simplification of the 2 < r (A) < 3 region via the difference method has also revealed two distinct Na environments. These environments are attributed to sodium correlated either to bridging or to nonbridging oxygen atoms. Complementary multinuclear MAS NMR, Sr K-edge EXAFS and X-ray diffraction data supports the structural model presented. The structural sites present will be intimately related to their release properties in physiological fluids such as plasma and saliva, and hence the bioactivity of the material. Detailed structural knowledge is therefore a prerequisite for optimising material design.
Resumo:
The full set of partial structure factors for glassy germania, or GeO2, were accurately measured by using the method of isotopic substitution in neutron diffraction in order to elucidate the nature of the pair correlations for this archetypal strong glass former. The results show that the basic tetrahedral Ge(O-1/2)(4) building blocks share corners with a mean inter-tetrahedral Ge-O-Ge bond angle of 132(2)degrees. The topological and chemical ordering in the resultant network displays two characteristic length scales at distances greater than the nearest neighbour. One of these describes the intermediate range order, and manifests itself by the appearance of a first sharp diffraction peak in the measured diffraction patterns at a scattering vector k(FSDP) approximate to 1.53 angstrom(-1), while the other describes so-called extended range order, and is associated with the principal peak at k(PP) = 2.66( 1) angstrom(-1). We find that there is an interplay between the relative importance of the ordering on these length scales for tetrahedral network forming glasses that is dominated by the extended range ordering with increasing glass fragility. The measured partial structure factors for glassy GeO2 are used to reproduce the total structure factor measured by using high energy x-ray diffraction and the experimental results are also compared to those obtained by using classical and first principles molecular dynamics simulations.
Resumo:
Nearest neighbour collaborative filtering (NNCF) algorithms are commonly used in multimedia recommender systems to suggest media items based on the ratings of users with similar preferences. However, the prediction accuracy of NNCF algorithms is affected by the reduced number of items – the subset of items co-rated by both users – typically used to determine the similarity between pairs of users. In this paper, we propose a different approach, which substantially enhances the accuracy of the neighbour selection process – a user-based CF (UbCF) with semantic neighbour discovery (SND). Our neighbour discovery methodology, which assesses pairs of users by taking into account all the items rated at least by one of the users instead of just the set of co-rated items, semantically enriches this enlarged set of items using linked data and, finally, applies the Collinearity and Proximity Similarity metric (CPS), which combines the cosine similarity with Chebyschev distance dissimilarity metric. We tested the proposed SND against the Pearson Correlation neighbour discovery algorithm off-line, using the HetRec data set, and the results show a clear improvement in terms of accuracy and execution time for the predicted recommendations.
Resumo:
A problemática relacionada com a modelação da qualidade da água de albufeiras pode ser abordada de diversos pontos de vista. Neste trabalho recorre-se a metodologias de resolução de problemas que emanam da Área Cientifica da Inteligência Artificial, assim como a ferramentas utilizadas na procura de soluções como as Árvores de Decisão, as Redes Neuronais Artificiais e a Aproximação de Vizinhanças. Actualmente os métodos de avaliação da qualidade da água são muito restritivos já que não permitem aferir a qualidade da água em tempo real. O desenvolvimento de modelos de previsão baseados em técnicas de Descoberta de Conhecimento em Bases de Dados, mostrou ser uma alternativa tendo em vista um comportamento pró-activo que pode contribuir decisivamente para diagnosticar, preservar e requalificar as albufeiras. No decurso do trabalho, foi utilizada a aprendizagem não-supervisionada tendo em vista estudar a dinâmica das albufeiras sendo descritos dois comportamentos distintos, relacionados com a época do ano. ABSTRACT: The problems related to the modelling of water quality in reservoirs can be approached from different viewpoints. This work resorts to methods of resolving problems emanating from the Scientific Area of Artificial lntelligence as well as to tools used in the search for solutions such as Decision Trees, Artificial Neural Networks and Nearest-Neighbour Method. Currently, the methods for assessing water quality are very restrictive because they do not indicate the water quality in real time. The development of forecasting models, based on techniques of Knowledge Discovery in Databases, shows to be an alternative in view of a pro-active behavior that may contribute to diagnose, maintain and requalify the water bodies. ln this work. unsupervised learning was used to study the dynamics of reservoirs, being described two distinct behaviors, related to the time of year.
Resumo:
The efficacy of fluorescence spectroscopy to detect squamous cell carcinoma is evaluated in an animal model following laser excitation at 442 and 532 nm. Lesions are chemically induced with a topical DMBA application at the left lateral tongue of Golden Syrian hamsters. The animals are investigated every 2 weeks after the 4th week of induction until a total of 26 weeks. The right lateral tongue of each animal is considered as a control site (normal contralateral tissue) and the induced lesions are analyzed as a set of points covering the entire clinically detectable area. Based on fluorescence spectral differences, four indices are determined to discriminate normal and carcinoma tissues, based on intraspectral analysis. The spectral data are also analyzed using a multivariate data analysis and the results are compared with histology as the diagnostic gold standard. The best result achieved is for blue excitation using the KNN (K-nearest neighbor, a interspectral analysis) algorithm with a sensitivity of 95.7% and a specificity of 91.6%. These high indices indicate that fluorescence spectroscopy may constitute a fast noninvasive auxiliary tool for diagnostic of cancer within the oral cavity. (C) 2008 Society of Photo-Optical Instrumentation Engineers.
Resumo:
Quality control of toys for avoiding children exposure to potentially toxic elements is of utmost relevance and it is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at authors` laboratory for direct analysis of plastic toys and one of the main difficulties for the determination of Cd. Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models from LIBS spectra were tested for sampling toys that present potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys and by using Partial Least Squares - Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). The classification models and validations were carried out with 40 and 11 test samples, respectively. Best results were obtained when KNN was used, with corrected predictions varying from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
A rapid method for classification of mineral waters is proposed. The discrimination power was evaluated by a novel combination of chemometric data analysis and qualitative multi-elemental fingerprints of mineral water samples acquired from different regions of the Brazilian territory. The classification of mineral waters was assessed using only the wavelength emission intensities obtained by inductively coupled plasma optical emission spectrometry (ICP OES), monitoring different lines of Al, B, Ba, Ca, Cl, Cu, Co, Cr, Fe, K, Mg, Mn, Na, Ni, P, Pb, S, Sb, Si, Sr, Ti, V, and Zn, and Be, Dy, Gd, In, La, Sc and Y as internal standards. Data acquisition was done under robust (RC) and non-robust (NRC) conditions. Also, the combination of signal intensities of two or more emission lines for each element were evaluated instead of the individual lines. The performance of two classification-k-nearest neighbor (kNN) and soft independent modeling of class analogy (SIMCA)-and preprocessing algorithms, autoscaling and Pareto scaling, were evaluated for the ability to differentiate between the various samples in each approach tested (combination of robust or non-robust conditions with use of individual lines or sum of the intensities of emission lines). It was shown that qualitative ICP OES fingerprinting in combination with multivariate analysis is a promising analytical tool that has potential to become a recognized procedure for rapid authenticity and adulteration testing of mineral water samples or other material whose physicochemical properties (or origin) are directly related to mineral content.
Resumo:
Recently, we have built a classification model that is capable of assigning a given sesquiterpene lactone (STL) into exactly one tribe of the plant family Asteraceae from which the STL has been isolated. Although many plant species are able to biosynthesize a set of peculiar compounds, the occurrence of the same secondary metabolites in more than one tribe of Asteraceae is frequent. Building on our previous work, in this paper, we explore the possibility of assigning an STL to more than one tribe (class) simultaneously. When an object may belong to more than one class simultaneously, it is called multilabeled. In this work, we present a general overview of the techniques available to examine multilabeled data. The problem of evaluating the performance of a multilabeled classifier is discussed. Two particular multilabeled classification methods-cross-training with support vector machines (ct-SVM) and multilabeled k-nearest neighbors (M-L-kNN)were applied to the classification of the STLs into seven tribes from the plant family Asteraceae. The results are compared to a single-label classification and are analyzed from a chemotaxonomic point of view. The multilabeled approach allowed us to (1) model the reality as closely as possible, (2) improve our understanding of the relationship between the secondary metabolite profiles of different Asteraceae tribes, and (3) significantly decrease the number of plant sources to be considered for finding a certain STL. The presented classification models are useful for the targeted collection of plants with the objective of finding plant sources of natural compounds that are biologically active or possess other specific properties of interest.
Resumo:
The supervised pattern recognition methods K-Nearest Neighbors (KNN), stepwise discriminant analysis (SDA), and soft independent modelling of class analogy (SIMCA) were employed in this work with the aim to investigate the relationship between the molecular structure of 27 cannabinoid compounds and their analgesic activity. Previous analyses using two unsupervised pattern recognition methods (PCA-principal component analysis and HCA-hierarchical cluster analysis) were performed and five descriptors were selected as the most relevants for the analgesic activity of the compounds studied: R (3) (charge density on substituent at position C(3)), Q (1) (charge on atom C(1)), A (surface area), log P (logarithm of the partition coefficient) and MR (molecular refractivity). The supervised pattern recognition methods (SDA, KNN, and SIMCA) were employed in order to construct a reliable model that can be able to predict the analgesic activity of new cannabinoid compounds and to validate our previous study. The results obtained using the SDA, KNN, and SIMCA methods agree perfectly with our previous model. Comparing the SDA, KNN, and SIMCA results with the PCA and HCA ones we could notice that all multivariate statistical methods classified the cannabinoid compounds studied in three groups exactly in the same way: active, moderately active, and inactive.
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).