7 results for Fuzzy K Nearest Neighbor

at Universidade Federal do Rio Grande do Norte (UFRN)


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Classifying proteins into structural classes, which involves inferring patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory, so the problem has drawn the attention of many researchers in Bioinformatics. Given the large gap between the number of known protein sequences and the number of three-dimensional structures determined experimentally, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential. In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work has imbalanced classes, class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to mitigate the problem. To evaluate the ML methods, a cross-validation procedure is applied, in which classifier accuracy is measured by the mean classification error rate on independent test sets. These means are compared pairwise using a hypothesis test to evaluate whether the difference between them is statistically significant.
Among the individual classifiers, Support Vector Machine achieved the best accuracy. The multi-classification systems (homogeneous and heterogeneous) generally performed as well as or better than the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as the meta-classifier. The Voting method, despite its simplicity, proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, they did improve the classification error for the minority class. In this context, the NCL technique proved to be more appropriate.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The objective of research in artificial intelligence is to enable computers to perform functions that humans carry out using knowledge and reasoning. This work was developed in the area of machine learning, the branch of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn. The objective of this work is to analyze a feature selection method for ensemble systems. The proposed method follows the filter approach to feature selection: it uses variance and Spearman correlation to rank the features, and reward and punishment strategies to measure the importance of each feature for identifying the classes. For each ensemble, several different configurations were used, ranging from non-hybrid (homogeneous) to hybrid (heterogeneous) ensemble structures. They were submitted to five combination methods (voting, sum, weighted sum, Multilayer Perceptron and naïve Bayes), which were applied to six distinct databases (real and artificial). The classifiers applied during the experiments were k-nearest neighbor, Multilayer Perceptron, naïve Bayes and decision tree. Finally, the performance of the ensembles was compared under three conditions: no feature selection, the original filter-approach feature selection method, and the proposed method. For this comparison, a statistical test was applied, which demonstrated a significant improvement in the precision of the ensembles.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The pair contact process (PCP) is a nonequilibrium stochastic model which, like the basic contact process (CP), exhibits a phase transition to an absorbing state. While the absorbing state of the CP corresponds to a unique configuration (the empty lattice), the PCP has infinitely many absorbing configurations. Numerical and theoretical studies nevertheless indicate that the PCP belongs to the same universality class as the CP (the directed percolation class), but with anomalies in the critical spreading dynamics. An infinite number of absorbing configurations arises in the PCP because all processes (creation and annihilation) require a nearest-neighbor pair of particles. The diffusive pair contact process (PCPD) was proposed by Grassberger in 1982, but interest in the problem follows its rediscovery through the Langevin description. On the basis of numerical results and renormalization group arguments, Carlon, Henkel and Schollwöck (2001) suggested that certain critical exponents in the PCPD had values similar to those of the parity-conserving (PC) class. On the other hand, Hinrichsen (2001) reported simulation results inconsistent with the PC class and proposed that the PCPD belongs to a new universality class. The controversy regarding the universality of the PCPD remains unresolved. In the PCPD, a nearest-neighbor pair of particles is necessary for the processes of creation and annihilation, but the particles diffuse individually. In this work we study the PCPD with pair diffusion, in which isolated particles cannot move; a nearest-neighbor pair diffuses as a unit. Using quasistationary simulations, we determined with good precision the critical point and critical exponents for three values of the diffusive probability: D=0.5 and D=0.1. For D=0.5: PC=0.89007(3), β/v=0.252(9), z=1.573(1), =1.10(2), m=1.1758(24). For D=0.1: PC=0.9172(1), β/v=0.252(9), z=1.579(11), =1.11(4), m=1.173(4).
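
The pair requirement described in this abstract can be illustrated with a minimal sketch of the basic (non-diffusive) PCP on a one-dimensional ring. It assumes the commonly used rules: a nearest-neighbor pair is picked at random, annihilates with probability p, and otherwise tries to create a particle on one of the two sites adjacent to the pair. The function names and the parameter p are illustrative, and this is not the pair-diffusion variant or the quasistationary method studied in the thesis.

```python
import random


def count_pairs(lattice):
    """Count occupied nearest-neighbor pairs on a ring (periodic boundaries)."""
    n = len(lattice)
    return sum(1 for i in range(n) if lattice[i] and lattice[(i + 1) % n])


def pcp_step(lattice, p, rng):
    """Attempt one PCP event in place.

    Returns False when no nearest-neighbor pair exists, i.e. the
    configuration is absorbing (even if isolated particles remain).
    """
    n = len(lattice)
    pairs = [i for i in range(n) if lattice[i] and lattice[(i + 1) % n]]
    if not pairs:
        return False
    i = rng.choice(pairs)
    j = (i + 1) % n
    if rng.random() < p:
        # Annihilation: both particles of the chosen pair are removed.
        lattice[i] = lattice[j] = 0
    else:
        # Creation: occupy one of the two sites adjacent to the pair
        # (nothing changes if that site is already occupied).
        target = (i - 1) % n if rng.random() < 0.5 else (j + 1) % n
        lattice[target] = 1
    return True
```

A configuration such as [1, 0, 1, 0] contains particles but no pairs, so pcp_step returns False for it; every such configuration is absorbing, which is why the PCP has infinitely many absorbing configurations on an infinite lattice.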

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Digital Elevation Models (DEMs) are numerical representations of a portion of the Earth's surface. Among the several factors that affect the quality of a DEM, the input data and the choice of interpolation algorithm deserve particular attention. At the same time, several numerical models are now used to characterize nearshore hydrodynamics and morphological changes in coastal areas, and their validation is based on field data collection. Regardless of the complexity of the physical processes being modeled, little attention has been given to the intrinsic bathymetric interpolation built into the numerical models for each specific application. This study therefore aims to investigate and quantify the influence of the bathymetry, as obtained from a DEM, on a hydrodynamic circulation model of a coastal stretch off the State of Rio Grande do Norte, Northeast Brazil. This coastal region is characterized by strong hydrodynamic and littoral processes, resulting in a very dynamic morphology with shallow coastal bathymetry. Important economic activities, such as oil exploitation and production, fisheries, salt ponds, shrimp farms and tourism, also affect the local ecosystems and themselves influence the local hydrodynamics. This makes the region one of the most important for the development of the State, but also increases the possibility of serious environmental accidents. SisBaHiA® (Sistema Básico de Hidrodinâmica Ambiental, Environmental Hydrodynamics System) was chosen as the hydrodynamic model because it has been successfully employed at several locations along the Brazilian coast. This model was developed by the Coastal and Oceanographical Engineering Group of the Ocean Engineering Program at the Federal University of Rio de Janeiro.
Several interpolation methods were tested for the construction of the DEM, namely Natural Neighbor, Kriging, Triangulation with Linear Interpolation, Inverse Distance to a Power, Nearest Neighbor, and Minimum Curvature, all implemented in the software Surfer®. The bathymetry used as reference for the DEM was obtained from nautical charts provided by the Brazilian Hydrographic Service of the Brazilian Navy and from a field survey conducted in 2005. Changes in flow velocity and free surface elevation were evaluated from three perspectives: a spatial view along three profiles perpendicular to the coast and one profile parallel to the coast; a temporal view at three central nodes of the grid over 30 days; and a hodograph analysis of the U and V velocity components over different tidal cycles. Only small, negligible variations in sea surface elevation were identified. However, the differences in the magnitude and direction of the flow velocities were significant, depending on the DEM.
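
To make the role of the interpolator concrete, the following minimal sketch compares two of the methods named in this abstract, Inverse Distance to a Power and Nearest Neighbor, on scattered depth samples. It uses the textbook formulas only; the function names, the default power of 2 and the sample values are assumptions for illustration, and it does not reproduce Surfer's implementation or its additional options.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse Distance to a Power estimate at (x, y).

    samples is a list of (x, y, z) tuples; each sample is weighted by
    1 / distance**power, and an exact hit returns that sample's z value.
    """
    num = 0.0
    den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den


def nearest(samples, x, y):
    """Nearest Neighbor estimate: the z value of the closest sample."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]
```

With two hypothetical soundings of -2 m and -8 m placed 10 m apart, a point 2 m from the first gets about -2.35 m from idw but exactly -2 m from nearest, which shows how the same input data can yield different bathymetric grids.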

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The objective is to establish a methodology for monitoring oil spills on the sea surface in the Submerged Exploration Area of the Polo Region of Guamaré, in the State of Rio Grande do Norte, using orbital Synthetic Aperture Radar (SAR) images integrated with meteo-oceanographic products. The methodology was applied in the following stages: (1) the creation of a base map of the Exploration Area; (2) the processing of NOAA/AVHRR and ERS-2 images to generate meteo-oceanographic products; (3) the processing of RADARSAT-1 images for monitoring oil spills; (4) the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products; and (5) the structuring of a database. The integration of the RADARSAT-1 image of the Potiguar Basin dated 21.05.99 with the base map of the Exploration Area of the Polo Region of Guamaré, intended to identify the probable sources of the oil spots, was used successfully to detect a probable oil spot next to the outlet of the submarine outfall in the Exploration Area. To support the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products, a methodology was developed for classifying the oil spills identified in RADARSAT-1 images. For this, the following unsupervised classification algorithms were tested: K-means, Fuzzy k-means and Isodata. These algorithms are part of the PCI Geomatics software, which was used to filter the RADARSAT-1 images. To validate the results, the oil spills submitted to unsupervised classification were compared with the results of the Semivariogram Textural Classifier (STC). This classifier was developed specifically for oil spill classification and requires the PCI software for the entire processing of RADARSAT-1 images. Finally, the classification results were analyzed through visual analysis, calculation of the proportionality of magnitudes, and statistical analysis.
For the three classification algorithms tested, no significant differences were found relative to the spills classified with the STC in any of the analyses considered. Therefore, considering all the procedures, the described methodology can be successfully applied using the unsupervised classifiers tested, reducing the time needed to identify and classify oil spills compared with the STC classifier.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM procedure, called ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data using interval membership degrees. This algorithm allows interval data to be represented without converting it into point data, as happens in other extensions of FCM that deal with interval data. To validate the proposed methodologies, the clusterings produced by the ckMeans, K-Means and FCM algorithms were compared (since the algorithm proposed in this paper to calculate the centers is similar to K-Means), considering three different distances and several well-known databases. For interval data, the results of Interval ckMeans were compared with those of other clustering algorithms applied to an interval database containing the minimum and maximum monthly temperatures for a given year in 37 cities distributed across continents.
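
For reference, the standard FCM iteration that ckMeans modifies can be sketched as follows. This is the textbook algorithm with Euclidean distance and fuzzifier m, not the ckMeans center calculation, which the abstract does not specify; the function names, the default m = 2 and the stopping tolerance are assumptions for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Return u[j][i], the degree to which point j belongs to cluster i.

    Standard FCM rule: u is proportional to distance**(-2 / (m - 1)),
    normalized so that each row sums to 1.
    """
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: full membership there.
            row = [1.0 if di == 0.0 else 0.0 for di in d]
        else:
            row = [di ** (-2.0 / (m - 1.0)) for di in d]
        total = sum(row)
        u.append([r / total for r in row])
    return u


def update_centers(points, u, m=2.0):
    """Standard FCM centers: means of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wj * x[k] for wj, x in zip(w, points)) / total
            for k in range(dim)
        ))
    return centers


def fcm(points, centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate the two updates until the centers stop moving."""
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

Because every point contributes to every center through its membership degree, each iteration touches all point-cluster combinations; that per-iteration cost and the number of iterations are what the abstract aims to reduce.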