115 resultados para K-nearest neighbors method

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A two-stage methodology is developed to obtain future projections of daily relative humidity in a river basin for climate change scenarios. In the first stage, Support Vector Machine (SVM) models are developed to downscale nine sets of predictor variables (large-scale atmospheric variables) for Intergovernmental Panel on Climate Change Special Report on Emissions Scenarios (SRES) (A1B, A2, B1, and COMMIT) to R (H) in a river basin at monthly scale. Uncertainty in the future projections of R (H) is studied for combinations of SRES scenarios, and predictors selected. Subsequently, in the second stage, the monthly sequences of R (H) are disaggregated to daily scale using k-nearest neighbor method. The effectiveness of the developed methodology is demonstrated through application to the catchment of Malaprabha reservoir in India. For downscaling, the probable predictor variables are extracted from the (1) National Centers for Environmental Prediction reanalysis data set for the period 1978-2000 and (2) simulations of the third-generation Canadian Coupled Global Climate Model for the period 1978-2100. The performance of the downscaling and disaggregation models is evaluated by split sample validation. Results show that among the SVM models, the model developed using predictors pertaining to only land location performed better. The R (H) is projected to increase in the future for A1B and A2 scenarios, while no trend is discerned for B1 and COMMIT.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are many popular models available for classification of documents like Naïve Bayes Classifier, k-Nearest Neighbors and Support Vector Machine. In all these cases, the representation is based on the “Bag of words” model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well known topic model, Latent Dirichlet Allocation(LDA), and to integrate them in vector space model for classification. Experiments show a better performance of classifiers with the new Bag of Phrases model against related representation models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a robust method for mosaicing of document images using features derived from connected components. Each connected component is described using the Angular Radial Tran. form (ART). To ensure geometric consistency during feature matching, the ART coefficients of a connected component are augmented with those of its two nearest neighbors. The proposed method addresses two critical issues often encountered in correspondence matching: (i) The stability of features and (ii) Robustness against false matches due to the multiple instances of characters in a document image. The use of connected components guarantees a stable localization across images. The augmented features ensure a successful correspondence matching even in the presence of multiple similar regions within the page. We illustrate the effectiveness of the proposed method on camera captured document images exhibiting large variations in viewpoint, illumination and scale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electronic, magnetic, and structural properties of graphene flakes depend sensitively upon the type of edge atoms. We present a simple software tool for determining the type of edge atoms in a honeycomb lattice. The algorithm is based on nearest neighbor counting. Whether an edge atom is of armchair or zigzag type is decided by the unique pattern of its nearest neighbors. Particular attention is paid to the practical aspects of using the tool, as additional features such as extracting out the edges from the lattice could help in analyzing images from transmission microscopy or other experimental probes. Ultimately, the tool in combination with density-functional theory or tight-binding method can also be helpful in correlating the properties of graphene flakes with the different armchair-to-zigzag ratios. Program summary Program title: edgecount Catalogue identifier: AEIA_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIA_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 66685 No. of bytes in distributed program, including test data, etc.: 485 381 Distribution format: tar.gz Programming language: FORTRAN 90/95 Computer: Most UNIX-based platforms Operating system: Linux, Mac OS Classification: 16.1, 7.8 Nature of problem: Detection and classification of edge atoms in a finite patch of honeycomb lattice. Solution method: Build nearest neighbor (NN) list; assign types to edge atoms on the basis of their NN pattern. Running time: Typically similar to second(s) for all examples. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Finite volume methods traditionally employ dimension by dimension extension of the one-dimensional reconstruction and averaging procedures to achieve spatial discretization of the governing partial differential equations on a structured Cartesian mesh in multiple dimensions. This simple approach based on tensor product stencils introduces an undesirable grid orientation dependence in the computed solution. The resulting anisotropic errors lead to a disparity in the calculations that is most prominent between directions parallel and diagonal to the grid lines. In this work we develop isotropic finite volume discretization schemes which minimize such grid orientation effects in multidimensional calculations by eliminating the directional bias in the lowest order term in the truncation error. Explicit isotropic expressions that relate the cell face averaged line and surface integrals of a function and its derivatives to the given cell area and volume averages are derived in two and three dimensions, respectively. It is found that a family of isotropic approximations with a free parameter can be derived by combining isotropic schemes based on next-nearest and next-next-nearest neighbors in three dimensions. Use of these isotropic expressions alone in a standard finite volume framework, however, is found to be insufficient in enforcing rotational invariance when the flux vector is nonlinear and/or spatially non-uniform. The rotationally invariant terms which lead to a loss of isotropy in such cases are explicitly identified and recast in a differential form. Various forms of flux correction terms which allow for a full recovery of rotational invariance in the lowest order truncation error terms, while preserving the formal order of accuracy and discrete conservation of the original finite volume method, are developed. Numerical tests in two and three dimensions attest the superior directional attributes of the proposed isotropic finite volume method. Prominent anisotropic errors, such as spurious asymmetric distortions on a circular reaction-diffusion wave that feature in the conventional finite volume implementation are effectively suppressed through isotropic finite volume discretization. Furthermore, for a given spatial resolution, a striking improvement in the prediction of kinetic energy decay rate corresponding to a general two-dimensional incompressible flow field is observed with the use of an isotropic finite volume method instead of the conventional discretization. (C) 2014 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Skew correction of complex document images is a difficult task. We propose an edge-based connected component approach for robust skew correction of documents with complex layout and content. The algorithm essentially consists of two steps - an 'initialization' step to determine the image orientation from the centroids of the connected components and a 'search' step to find the actual skew of the image. During initialization, we choose two different sets of points regularly spaced across the the image, one from the left to right and the other from top to bottom. The image orientation is determined from the slope between the two succesive nearest neighbors of each of the points in the chosen set. The search step finds succesive nearest neighbors that satisfy the parameters obtained in the initialization step. The final skew is determined from the slopes obtained in the 'search' step. Unlike other connected component based methods, the proposed method does not require any binarization step that generally precedes connected component analysis. The method works well for scanned documents with complex layout of any skew with a precision of 0.5 degrees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of `Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using datam from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical community, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ultra thin films of pure silicon nitride were grown on a Si (1 1 1) surface by exposing the surface to radio-frequency (RF) nitrogen plasma with a high content of nitrogen atoms. The effect of annealing of silicon nitride surface was investigated with core-level photoelectron spectroscopy. The Si 2p photoelectron spectra reveals a characteristic series of components for the Si species, not only in stoichiometric Si3N4 (Si4+) but also in the intermediate nitridation states with one (Si1+) or three (Si3+) nitrogen nearest neighbors. The Si 2p core-level shifts for the Si1+, Si3+, and Si4+ components are determined to be 0.64, 2.20, and 3.05 eV, respectively. In annealed sample it has been observed that the Si4+ component in the Si 2p spectra is significantly improved, which clearly indicates the crystalline nature of silicon nitride. The high resolution X-ray diffraction (HRXRD), scanning electron microscopy (SEM) and photoluminescence (PL) studies showed a significant improvement of the crystalline qualities and enhancement of the optical properties of GaN grown on the stoichiometric Si3N4 by molecular beam epitaxy (MBE). (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many downscaling techniques have been developed in the past few years for projection of station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs) to assess the hydrological impacts of climate change. This article compares the performances of three downscaling methods, viz. conditional random field (CRF), K-nearest neighbour (KNN) and support vector machine (SVM) methods in downscaling precipitation in the Punjab region of India, belonging to the monsoon regime. The CRF model is a recently developed method for downscaling hydrological variables in a probabilistic framework, while the SVM model is a popular machine learning tool useful in terms of its ability to generalize and capture nonlinear relationships between predictors and predictand. The KNN model is an analogue-type method that queries days similar to a given feature vector from the training data and classifies future days by random sampling from a weighted set of K closest training examples. The models are applied for downscaling monsoon (June to September) daily precipitation at six locations in Punjab. Model performances with respect to reproduction of various statistics such as dry and wet spell length distributions, daily rainfall distribution, and intersite correlations are examined. It is found that the CRF and KNN models perform slightly better than the SVM model in reproducing most daily rainfall statistics. These models are then used to project future precipitation at the six locations. Output from the Canadian global climate model (CGCM3) GCM for three scenarios, viz. A1B, A2, and B1 is used for projection of future precipitation. The projections show a change in probability density functions of daily rainfall amount and changes in the wet and dry spell distributions of daily precipitation. Copyright (C) 2011 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: There has been growing interest in integrative taxonomy that uses data from multiple disciplines for species delimitation. Typically, in such studies, monophyly is taken as a proxy for taxonomic distinctiveness and these units are treated as potential species. However, monophyly could arise due to stochastic processes. Thus here, we have employed a recently developed tool based on coalescent approach to ascertain the taxonomic distinctiveness of various monophyletic units. Subsequently, the species status of these taxonomic units was further tested using corroborative evidence from morphology and ecology. This inter-disciplinary approach was implemented on endemic centipedes of the genus Digitipes (Attems 1930) from the Western Ghats (WG) biodiversity hotspot of India. The species of the genus Digitipes are morphologically conserved, despite their ancient late Cretaceous origin. Principal Findings: Our coalescent analysis based on mitochondrial dataset indicated the presence of nine putative species. The integrative approach, which includes nuclear, morphology, and climate datasets supported distinctiveness of eight putative species, of which three represent described species and five were new species. Among the five new species, three were morphologically cryptic species, emphasizing the effectiveness of this approach in discovering cryptic diversity in less explored areas of the tropics like the WG. In addition, species pairs showed variable divergence along the molecular, morphological and climate axes. Conclusions: A multidisciplinary approach illustrated here is successful in discovering cryptic diversity with an indication that the current estimates of invertebrate species richness for the WG might have been underestimated. Additionally, the importance of measuring multiple secondary properties of species while defining species boundaries was highlighted given variable divergence of each species pair across the disciplines.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Climate change impact assessment studies involve downscaling large-scale atmospheric predictor variables (LSAPVs) simulated by general circulation models (GCMs) to site-scale meteorological variables. This article presents a least-square support vector machine (LS-SVM)-based methodology for multi-site downscaling of maximum and minimum daily temperature series. The methodology involves (1) delineation of sites in the study area into clusters based on correlation structure of predictands, (2) downscaling LSAPVs to monthly time series of predictands at a representative site identified in each of the clusters, (3) translation of the downscaled information in each cluster from the representative site to that at other sites using LS-SVM inter-site regression relationships, and (4) disaggregation of the information at each site from monthly to daily time scale using k-nearest neighbour disaggregation methodology. Effectiveness of the methodology is demonstrated by application to data pertaining to four sites in the catchment of Beas river basin, India. Simulations of Canadian coupled global climate model (CGCM3.1/T63) for four IPCC SRES scenarios namely A1B, A2, B1 and COMMIT were downscaled to future projections of the predictands in the study area. Comparison of results with those based on recently proposed multivariate multiple linear regression (MMLR) based downscaling method and multi-site multivariate statistical downscaling (MMSD) method indicate that the proposed method is promising and it can be considered as a feasible choice in statistical downscaling studies. The performance of the method in downscaling daily minimum temperature was found to be better when compared with that in downscaling daily maximum temperature. Results indicate an increase in annual average maximum and minimum temperatures at all the sites for A1B, A2 and B1 scenarios. The projected increment is high for A2 scenario, and it is followed by that for A1B, B1 and COMMIT scenarios. Projections, in general, indicated an increase in mean monthly maximum and minimum temperatures during January to February and October to December.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Prediction of queue waiting times of jobs submitted to production parallel batch systems is important to provide overall estimates to users and can also help meta-schedulers make scheduling decisions. In this work, we have developed a framework for predicting ranges of queue waiting times for jobs by employing multi-class classification of similar jobs in history. Our hierarchical prediction strategy first predicts the point wait time of a job using dynamic k-Nearest Neighbor (kNN) method. It then performs a multi-class classification using Support Vector Machines (SVMs) among all the classes of the jobs. The probabilities given by the SVM for the class predicted using k-NN and its neighboring classes are used to provide a set of ranges of predicted wait times with probabilities. We have used these predictions and probabilities in a meta-scheduling strategy that distributes jobs to different queues/sites in a multi-queue/grid environment for minimizing wait times of the jobs. Experiments with different production supercomputer job traces show that our prediction strategies can give correct predictions for about 77-87% of the jobs, and also result in about 12% improved accuracy when compared to the next best existing method. Experiments with our meta-scheduling strategy using different production and synthetic job traces for various system sizes, partitioning schemes and different workloads, show that the meta-scheduling strategy gives much improved performance when compared to existing scheduling policies by reducing the overall average queue waiting times of the jobs by about 47%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We demonstrate the phenomenon of self-organized criticality (SOC) in a simple random walk model described by a random walk of a myopic ant, i.e., a walker who can see only nearest neighbors. The ant acts on the underlying lattice aiming at uniform digging, i.e., reduction of the height profile of the surface but is unaffected by the underlying lattice. In one, two, and three dimensions we have explored this model and have obtained power laws in the time intervals between consecutive events of "digging." Being a simple random walk, the power laws in space translate to power laws in time. We also study the finite size scaling of asymptotic scale invariant process as well as dynamic scaling in this system. This model differs qualitatively from the cascade models of SOC.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A comparative first principles study has been carried out for EuLiH3 (ELH) and EuTiO3 (ETO) using the generalized gradient approximation +U approach. While ELH exhibits ferromagnetic ground state for all volumes, the magnetic ground state of ETO has the tendency to switch from G-type antiferromagnetic to a ferromagnetic state with change in volume. The marked difference in magnetic behavior and magnitude of the nearest neighbors exchange interaction of both the compounds are shown to be related to the difference in their respective electronic structure near the Fermi level. The Ti 3d states are shown to play predominant role in weakening the strength of the exchange interaction in ETO.