971 resultados para K-Nearest Neighbors
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
In this paper we investigate whether conventional text categorization methods may suffice to infer different verbal intelligence levels. This research goal relies on the hypothesis that the vocabulary that speakers make use of reflects their verbal intelligence levels. Automatic verbal intelligence estimation of users in a spoken language dialog system may be useful when defining an optimal dialog strategy by improving its adaptation capabilities. The work is based on a corpus containing descriptions (i.e. monologs) of a short film by test persons yielding different educational backgrounds and the verbal intelligence scores of the speakers. First, a one-way analysis of variance was performed to compare the monologs with the film transcription and to demonstrate that there are differences in the vocabulary used by the test persons yielding different verbal intelligence levels. Then, for the classification task, the monologs were represented as feature vectors using the classical TF–IDF weighting scheme. The Naive Bayes, k-nearest neighbors and Rocchio classifiers were tested. In this paper we describe and compare these classification approaches, define the optimal classification parameters and discuss the classification results obtained.
Resumo:
La importancia de los sistemas de recomendación ha experimentado un crecimiento exponencial como consecuencia del auge de las redes sociales. En esta tesis doctoral presentaré una amplia visión sobre el estado del arte de los sistemas de recomendación. Incialmente, estos estaba basados en fitrado demográfico, basado en contendio o colaborativo. En la actualidad, estos sistemas incorporan alguna información social al proceso de recomendación. En el futuro utilizarán información implicita, local y personal proveniente del Internet de las cosas. Los sistemas de recomendación basados en filtrado colaborativo se pueden modificar con el fin de realizar recomendaciones a grupos de usuarios. Existen trabajos previos que han incluido estas modificaciones en diferentes etapas del algoritmo de filtrado colaborativo: búsqueda de los vecinos, predicción de las votaciones y elección de las recomendaciones. En esta tesis doctoral proporcionaré un nuevo método que realizar el proceso de unficación (pasar de varios usuarios a un grupo) en el primer paso del algoritmo de filtrado colaborativo: cálculo de la métrica de similaridad. Proporcionaré una formalización completa del método propuesto. Explicaré cómo obtener el conjunto de k vecinos del grupo de usuarios y mostraré cómo obtener recomendaciones usando dichos vecinos. Asimismo, incluiré un ejemplo detallando cada paso del método propuesto en un sistema de recomendación compuesto por 8 usuarios y 10 items. Las principales características del método propuesto son: (a) es más rápido (más eficiente) que las alternativas proporcionadas por otros autores, y (b) es al menos tan exacto y preciso como otras soluciones estudiadas. Para contrastar esta hipótesis realizaré varios experimentos que miden la precisión, la exactitud y el rendimiento del método. Los resultados obtenidos se compararán con los resultados de otras alternativas utilizadas en la recomendación de grupos. Los experimentos se realizarán con las bases de datos de MovieLens y Netflix. ABSTRACT The importance of recommender systems has grown exponentially with the advent of social networks. In this PhD thesis I will provide a wide vision about the state of the art of recommender systems. They were initially based on demographic, contentbased and collaborative filtering. Currently, these systems incorporate some social information to the recommendation process. In the future, they will use implicit, local and personal information from the Internet of Things. As we will see here, recommender systems based on collaborative filtering can be used to perform recommendations to group of users. Previous works have made this modification in different stages of the collaborative filtering algorithm: establishing the neighborhood, prediction phase and determination of recommended items. In this PhD thesis I will provide a new method that carry out the unification process (many users to one group) in the first stage of the collaborative filtering algorithm: similarity metric computation. I will provide a full formalization of the proposed method. I will explain how to obtain the k nearest neighbors of the group of users and I will show how to get recommendations using those users. I will also include a running example of a recommender system with 8 users and 10 items detailing all the steps of the method I will present. The main highlights of the proposed method are: (a) it will be faster (more efficient) that the alternatives provided by other authors, and (b) it will be at least as precise and accurate as other studied solutions. To check this hypothesis I will conduct several experiments measuring the accuracy, the precision and the performance of my method. I will compare these results with the results generated by other methods of group recommendation. The experiments will be carried out using MovieLens and Netflix datasets.
Resumo:
The method of case-based reasoning for a solution of problems of real-time diagnostics and forecasting in intelligent decision support systems (IDSS) is considered. Special attention is drawn to case library structure for real-time IDSS (RT IDSS) and algorithm of k-nearest neighbors type. This work was supported by RFBR.
Resumo:
It is well established that accent recognition can be as accurate as up to 95% when the signals are noise-free, using feature extraction techniques such as mel-frequency cepstral coefficients and binary classifiers such as discriminant analysis, support vector machine and k-nearest neighbors. In this paper, we demonstrate that the predictive performance can be reduced by as much as 15% when the signals are noisy. Specifically, in this paper we perturb the signals with different levels of white noise, and as the noise become stronger, the out-of-sample predictive performance deteriorates from 95% to 80%, although the in-sample prediction gives overly-optimistic results. ACM Computing Classification System (1998): C.3, C.5.1, H.1.2, H.2.4., G.3.
Resumo:
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques.
Resumo:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.
Resumo:
Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and as a consequence spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting edge remotely sensed and geospatial datasets are essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of US Forest Service was integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake State. Targeting small-area application of the state-of-art remote sensing, LiDAR (light detection and ranging) data was integrated with the field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive standing volume map in a cost-effective way. The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.
Resumo:
Uncorrelated random scale-free networks are useful null models to check the accuracy and the analytical solutions of dynamical processes defined on complex networks. We propose and analyze a model capable of generating random uncorrelated scale-free networks with no multiple and self-connections. The model is based on the classical configuration model, with an additional restriction on the maximum possible degree of the vertices. We check numerically that the proposed model indeed generates scale-free networks with no two- and three-vertex correlations, as measured by the average degree of the nearest neighbors and the clustering coefficient of the vertices of degree k, respectively.
Resumo:
In this thesis, a classi cation problem in predicting credit worthiness of a customer is tackled. This is done by proposing a reliable classi cation procedure on a given data set. The aim of this thesis is to design a model that gives the best classi cation accuracy to e ectively predict bankruptcy. FRPCA techniques proposed by Yang and Wang have been preferred since they are tolerant to certain type of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3 from which the best method is chosen. Two di erent approaches are used at the classi cation stage: Similarity classi er and FKNN classi er. Algorithms are tested with Australian credit card screening data set. Results obtained indicate a mean classi cation accuracy of 83.22% using FRPCA1 with similarity classi- er. The FKNN approach yields a mean classi cation accuracy of 85.93% when used with FRPCA2, making it a better method for the suitable choices of the number of nearest neighbors and fuzziness parameters. Details on the calibration of the fuzziness parameter and other parameters associated with the similarity classi er are discussed.
Resumo:
The growth and magnetic properties of Tin Selenide (SnSe) doped with Eu(2+) Sn(1-x)Eu(x)Se (x=2.5%) were investigated. Q-band (34 GHz) electron paramagnetic resonance measurements show that the site symmetry of Eu(2+) at 4.2 K is orthorhombic and the Lande factor was determined to be g=1.99 +/- 0.01. The exchange coupling between nearest-neighbor (NN) Eu(2+) ions was estimated from magnetization and magnetic-susceptibility measurements using a model that takes into account the magnetic contributions of single ions, pairs and triplets. The exchange interaction between Eu(2+) nearest neighbors was found to be antiferromagnetic with an estimated average value of J(p)/k(B) =-0.18 +/- 0.03 K. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
The electronic structure of Mg impurity in zincblende (c-)GaN is investigated by using the ab initio full potential linear-augmented plane-wave method and the local density-functional approximation. Full geometry optimization calculations, including nearest and next-nearest neighbor displacements, are performed for the impurity in the neutral and negatively charged states. A value of 190 ± 10 meV was obtained for the Franck-Condon shift to the thermal energy, which is in good agreement with that observed in recent low temperature photoluminescence and Hall-effect measurements. We conclude that the nearest and next-nearest neighbors of the Mg impurity replacing Ga in C-GaN undergo outward relaxations which play an important role in the determination of the center acceptor energies.
Resumo:
Thiosemicarbazones are cruzain inhibitors which have been identified as potential antitrypanosomal agents. In this work, several molecular properties were calculated at the density functional theory (DFT)/B3LYP/6-311G* level for a set of 44 thiosemicarbazones. Unsupervised and supervised pattern recognition techniques (hierarchical cluster analysis, principal component analysis, kth-nearest neighbors, and soft independent modeling by class analogy) were used to obtain structureactivity relationship models, which are able to classify unknown compounds according to their activities. The chemometric analyses performed here revealed that 12 descriptors can be considered responsible for the discrimination between high and low activity compounds. Classification models were validated with an external test set, showing that predictive classifications were achieved with the selected variable set. The results obtained here are in good agreement with previous findings from the literature, suggesting that our models can be useful on further investigations on the molecular determinants for the antichagasic activity. (C) 2012 Wiley Periodicals, Inc.
Resumo:
2000 Mathematics Subject Classification: 68T01, 62H30, 32C09.