964 results for Classification technique
Abstract:
Semi-supervised learning is a classification paradigm in which only a few labeled instances are available for the training process. To compensate for this scarcity of initial label information, the information provided by the unlabeled instances is also considered. In this paper, we propose a nature-inspired semi-supervised learning technique based on attraction forces. Instances are represented as points in a k-dimensional space, and the movement of data points is modeled as a dynamical system. As the system runs, data items with the same label cooperate with each other, and data items with different labels compete with one another to attract unlabeled points by applying a specific force function. In this way, all unlabeled data items can be classified when the system reaches its stable state. A stability analysis of the proposed dynamical system is performed and some heuristics are proposed for parameter setting. Simulation results show that the proposed technique achieves good classification results on artificial data sets and is comparable to well-known semi-supervised techniques on benchmark data sets.
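As a rough illustration of this kind of attraction-force dynamics (the force function, step size, and stopping rule below are hypothetical choices for the sketch, not the ones proposed in the paper), a minimal numpy version might look like this:

```python
import numpy as np

def force_based_ssl(X, y, n_iter=200, step=0.05, eps=1e-9):
    """Toy attraction-force semi-supervised classifier (illustrative only).

    X : (n, k) data points; y : (n,) labels, with -1 marking unlabeled points.
    Labeled points of each class attract the unlabeled points with an
    inverse-square pull; unlabeled points drift until the system settles,
    then each takes the class that pulled it hardest.
    """
    X = X.astype(float).copy()
    labeled = y != -1
    classes = np.unique(y[labeled])

    for _ in range(n_iter):
        moved = X[~labeled]
        net = np.zeros_like(moved)
        for c in classes:
            anchors = X[labeled & (y == c)]
            diff = anchors[None, :, :] - moved[:, None, :]        # (u, a, k)
            dist = np.linalg.norm(diff, axis=2, keepdims=True) + eps
            net += (diff / dist**3).sum(axis=1)                   # inverse-square pull
        X[~labeled] += step * net / (np.abs(net).max() + eps)     # damped update

    # assign each unlabeled point to the class with the strongest total pull
    y_out = y.copy()
    for i in np.where(~labeled)[0]:
        pulls = [(1.0 / (np.linalg.norm(X[labeled & (y == c)] - X[i], axis=1) + eps)**2).sum()
                 for c in classes]
        y_out[i] = classes[int(np.argmax(pulls))]
    return y_out
```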
Abstract:
The purpose of this Thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of the galaxies (spectral features, colours, morphological indices), and to help unveil the evolution of these parameters from $z \sim 1$ to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects ($\sim 10\,000$ galaxies in the redshift range $0 < z \lesssim 1.2$) and its highly reliable redshift and spectral property determinations, we first adopt and extend the \emph{classification cube method}, developed by Mignoli et al. (2009), which exploits the bimodal properties of galaxies (spectral, photometric and morphological) separately and then combines these three subclassifications. We use this classification method as a test for a newly devised statistical classification, based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering up to 8 different properties simultaneously. The PCA+UFP analysis is a very powerful and robust tool to probe the nature and the evolution of galaxies in a survey. It allows the classification of galaxies to be defined with fewer uncertainties and adds the flexibility to be adapted to different parameters: being a fuzzy classification, it avoids the problems inherent in a hard classification, such as the classification cube presented in the first part of this work. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data and for this reason it can be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high. ``Early'' and ``late'' type galaxies are well defined by the spectral, photometric and morphological properties, both when these are considered separately and then combined (classification cube) and when they are treated as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are ``averaged out'' during the process. This method allowed us to observe the \emph{downsizing} effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determined transition mass $M_{\mathrm{cross}}$ is in good agreement with other values in the literature.
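As a loose sketch of a PCA-plus-fuzzy-clustering pipeline in this spirit (a plain fuzzy c-means stands in for the UFP algorithm, and the input galaxy properties are synthetic placeholders rather than zCOSMOS measurements):

```python
import numpy as np
from sklearn.decomposition import PCA

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means (stand-in for the UFP step); returns memberships U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))            # random soft memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]      # membership-weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2 / (m - 1)))
        U_new /= U_new.sum(axis=1, keepdims=True)         # normalize each row to 1
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U

# hypothetical, standardized galaxy properties (colours, spectral and morphological indices)
galaxy_props = np.random.default_rng(1).normal(size=(500, 8))
pcs = PCA(n_components=3).fit_transform(galaxy_props)     # project onto principal components
memberships = fuzzy_cmeans(pcs, c=2)                      # soft "early"/"late" memberships
hard_labels = memberships.argmax(axis=1)                  # hard assignment, if needed
```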
Abstract:
The main objective of this study was to do a statistical analysis of ecological types from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses the Relevance Vector Machine algorithm for ecological classification between forestland and wetland. This binary classification technique was then applied to several other tree species, producing a hierarchical classification of the subclasses of a given target class. We also attempted to use an airborne image of the same forest area: combining it with image analysis and different image-processing operations, we tried to extract good features and then used them to classify forestland and wetland.
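The hierarchical use of a binary classifier described above could be sketched as follows; a logistic regression stands in for the Relevance Vector Machine (an RVM implementation is not assumed available), and the class hierarchy shown is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cascade(X, y, hierarchy):
    """Train one binary classifier per split in a class hierarchy.

    hierarchy: ordered list of (name, positive_classes) splits, e.g.
      [("forest_vs_wetland", {"pine", "spruce", "birch"}),
       ("conifer_vs_deciduous", {"pine", "spruce"}),
       ("pine_vs_spruce", {"pine"})].
    """
    models = []
    mask = np.ones(len(y), dtype=bool)                    # samples still in play
    for name, positives in hierarchy:
        target = np.isin(y[mask], list(positives)).astype(int)
        clf = LogisticRegression(max_iter=1000).fit(X[mask], target)
        models.append((name, positives, clf))
        mask[mask] = target == 1                          # descend into the positive branch
    return models

def predict_cascade(x, models):
    """Walk the cascade for one sample; report the first split where it falls out."""
    label = "root"
    for name, positives, clf in models:
        if clf.predict(x.reshape(1, -1))[0] == 0:
            return f"rejected at split '{name}'"
        label = "/".join(sorted(positives))
    return label
```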
Abstract:
Malware has become a major threat in recent years due to the ease with which it spreads through the Internet. Malware detection has become more difficult with the use of compression, polymorphic methods, and techniques to detect and disable security software. These and other obfuscation techniques pose a problem for detection and classification schemes that analyze malware behavior. In this paper we propose a distributed architecture to improve malware collection using different honeypot technologies to increase the variety of malware collected. We also present a daemon tool developed to grab malware distributed through spam and a pre-classification technique that uses antivirus technology to separate malware into generic classes. © 2009 SPIE.
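A toy illustration of the pre-classification idea (the detection-name patterns and generic classes below are made up for the sketch, not those used by the authors):

```python
import re

# hypothetical mapping from antivirus detection-name patterns to generic malware classes
GENERIC_CLASSES = {
    "worm":       re.compile(r"worm", re.I),
    "trojan":     re.compile(r"troj", re.I),
    "backdoor":   re.compile(r"backdoor|bdoor", re.I),
    "downloader": re.compile(r"down(load)?er", re.I),
}

def pre_classify(av_label: str) -> str:
    """Map an antivirus detection name to a coarse generic class (first match wins)."""
    for generic, pattern in GENERIC_CLASSES.items():
        if pattern.search(av_label):
            return generic
    return "unknown"

# example detection names (made up):
print(pre_classify("Win32.Worm.AutoRun.Gen"))    # -> worm
print(pre_classify("Backdoor.Agent.Generic"))    # -> backdoor
```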
Abstract:
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning, and it readily identifies patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by extracting features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances with the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the pattern formation, but also is able to improve the performance of traditional classification techniques. Furthermore, as the complexity of the class configuration increases, for example through greater mixing among different classes, a larger weight on the high level term is required to obtain correct classification. This confirms that high level classification is especially important in complex classification situations. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it yields an improvement in the overall pattern recognition rate.
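A minimal sketch of the hybrid mixing idea; both terms below are crude stand-ins (the paper's high level term is built from richer complex-network measures of the class subgraphs, not the mean pairwise distance used here), and the weight lambda is the mixing parameter:

```python
import numpy as np

def _mean_pairwise(X):
    """Mean pairwise distance of a point set (a crude structural summary)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return d[np.triu_indices(len(X), k=1)].mean()

def hybrid_scores(X_train, y_train, x, lam=0.3, k=3):
    """Mix a distance-based (low level) and a structure-based (high level) class score."""
    classes = np.unique(y_train)
    low, high = [], []
    for c in classes:
        Xc = X_train[y_train == c]
        d = np.sort(np.linalg.norm(Xc - x, axis=1))[:k]
        low.append(1.0 / (d.mean() + 1e-12))               # closer neighbours -> higher score
        # conformity: how little does adding x change the class's structure?
        change = abs(_mean_pairwise(np.vstack([Xc, x])) - _mean_pairwise(Xc))
        high.append(1.0 / (change + 1e-12))
    low = np.array(low) / np.sum(low)                       # normalize each term over classes
    high = np.array(high) / np.sum(high)
    mixed = (1 - lam) * low + lam * high                    # hybrid class membership
    return dict(zip(classes, mixed))
```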
Abstract:
Power distribution automation and control are important tools in the current restructured electricity markets. Unfortunately, due to their stochastic nature, faults in distribution systems are hardly avoidable. This paper proposes a novel fault diagnosis scheme for power distribution systems, composed of three different processes: fault detection and classification, fault location, and fault section determination. The fault detection and classification technique is wavelet-based. The fault-location technique is impedance-based and uses local voltage and current fundamental phasors. The fault section determination method is based on an artificial neural network and uses the local current and voltage signals to estimate the faulted section. The proposed hybrid scheme was validated through Alternative Transients Program/Electromagnetic Transients Program (ATP/EMTP) simulations and was implemented as embedded software. It is currently used as a fault diagnosis tool in a Southern Brazilian power distribution company.
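The wavelet-based detection stage could be sketched with PyWavelets; the mother wavelet, decomposition level, and threshold below are assumptions for illustration, not the values used in the paper:

```python
import numpy as np
import pywt

def detect_fault(current, wavelet="db4", level=3, threshold=5.0):
    """Flag a fault when the finest-scale detail-coefficient energy spikes."""
    coeffs = pywt.wavedec(current, wavelet, level=level)
    d1 = coeffs[-1]                                  # finest-scale detail coefficients
    window = 16
    energy = np.convolve(d1**2, np.ones(window), mode="same")   # sliding energy
    baseline = np.median(energy) + 1e-12
    return np.any(energy > threshold * baseline), energy / baseline

# synthetic 60 Hz phase current with a short spike standing in for a fault transient
t = np.linspace(0, 0.2, 2000)
i_a = np.sin(2 * np.pi * 60 * t)
i_a[1200:1210] += 3.0
fault, ratio = detect_fault(i_a)
print("fault detected:", fault)
```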
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
The objective of this work was to evaluate the use of multispectral remote sensing for site-specific nitrogen fertilizer management. Satellite imagery from the advanced spaceborne thermal emission and reflection radiometer (Aster) was acquired over a 23 ha corn-planted area in Iran. For the collection of field samples, a total of 53 pixels were selected by systematic randomized sampling, and the total nitrogen content in corn leaf tissues within these pixels was evaluated. To predict corn canopy nitrogen content, different vegetation indices, such as the normalized difference vegetation index (NDVI), soil-adjusted vegetation index (Savi), optimized soil-adjusted vegetation index (Osavi), modified chlorophyll absorption ratio index 2 (MCARI2), and modified triangle vegetation index 2 (MTVI2), were investigated. A supervised classification using the spectral angle mapper (SAM) classifier was performed to generate a nitrogen fertilization map. The MTVI2 presented the highest correlation (R²=0.87) and is a good predictor of corn canopy nitrogen content at the V13 stage, 60 days after planting. Aster imagery can therefore be used to predict nitrogen status in the corn canopy. Classification results indicate three levels of required nitrogen per pixel: low (0-2.5 kg), medium (2.5-3 kg), and high (3-3.3 kg).
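The NDVI and SAM steps can be written down compactly; the band values and nitrogen-level reference spectra below are placeholders, not the Aster bands or field-derived spectra of the study:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, pixel-wise."""
    return (nir - red) / (nir + red + 1e-12)

def spectral_angle_mapper(pixels, references):
    """Assign each pixel spectrum to the reference spectrum with the smallest angle.

    pixels:     (n_pixels, n_bands) reflectances
    references: (n_classes, n_bands) reference spectra (e.g., low/medium/high N)
    """
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    angles = np.arccos(np.clip(p @ r.T, -1.0, 1.0))       # (n_pixels, n_classes)
    return angles.argmin(axis=1)

# hypothetical 4-band pixel spectra and three nitrogen-level reference spectra
pixels = np.random.default_rng(0).uniform(0.05, 0.6, size=(53, 4))
refs = np.array([[0.10, 0.20, 0.40, 0.50],
                 [0.10, 0.25, 0.45, 0.55],
                 [0.12, 0.30, 0.50, 0.60]])
classes = spectral_angle_mapper(pixels, refs)              # 0 = low, 1 = medium, 2 = high N
```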
Abstract:
In our study we use a kernel-based learning technique, Support Vector Machine Regression, to predict the melting point of drug-like compounds in terms of topological descriptors, topological charge indices, connectivity indices, and 2D autocorrelations. The machine learning model was designed, trained, and tested using a dataset of 100 compounds, and it was found that an SVMReg model with an RBF kernel could predict the melting point with a mean absolute error of 15.5854 and a root mean squared error of 19.7576.
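A minimal scikit-learn sketch of this kind of model (the descriptors, hyperparameters, and 100-compound dataset are synthetic placeholders, not the study's data or tuned settings):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

# X: molecular descriptors (topological indices, charge indices, 2D autocorrelations, ...)
# y: melting points; both are synthetic stand-ins here
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 150 + 30 * X[:, 0] + rng.normal(scale=10, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MAE: ", mean_absolute_error(y_te, pred))
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
```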
Abstract:
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand, the problem is very challenging because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterative values and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm, using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
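To see why the dense quadratic form becomes unmanageable, the SVM dual can be written as an explicit QP; the toy sketch below (assuming the cvxopt package is available, with a linear kernel and made-up data) stores the full n x n matrix, which is exactly what the decomposition algorithm avoids:

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual_qp(X, y, C=1.0):
    """Solve the dense SVM dual QP directly; memory grows as O(n^2) in the number of points."""
    n = len(y)
    K = X @ X.T                                       # linear-kernel Gram matrix (n x n)
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))        # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    solvers.options["show_progress"] = False
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()                 # Lagrange multipliers alpha

# small toy problem; for tens of thousands of points the n x n matrix P alone
# would exceed memory, motivating decomposition into sub-problems
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(+1, 1, (20, 2))])
y = np.array([-1.0] * 20 + [+1.0] * 20)
alpha = svm_dual_qp(X, y, C=1.0)
print("support vectors:", np.sum(alpha > 1e-5))
```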
Abstract:
Purpose: To evaluate the evolution over time of clinical and functional outcomes of symptomatic discoid lateral meniscus treated arthroscopically and to investigate the relationship between associated intra-articular findings and outcomes. Methods: Of all patients treated arthroscopically between 1995 and 2010, patients treated for symptomatic discoid meniscus were identified in the hospital charts. Baseline data (demographics, previous trauma of the ipsilateral knee, and associated intra-articular findings) and medium-term outcome data from clinical follow-up examinations (pain, locking, snapping and instability of the operated knee) were extracted from clinical records. Telephone interviews were conducted at long-term follow-up in 28 patients (31 knees). Interviews comprised clinical outcomes as well as functional outcomes as assessed by the International Knee Documentation Committee Subjective Knee Evaluation Form (IKDC). Results: All patients underwent arthroscopic partial meniscectomy. The mean follow-up time for data extracted from clinical records was 11 months (SD ± 12). A significant improvement was found for pain in 77% (p<0.001), locking in 13% (p=0.045), and snapping in 39% (p<0.005). The mean follow-up time of the telephone interview was 60 months (SD ± 43). Improvement from baseline was generally less after five years than after one year, and functional outcomes on the IKDC indicated abnormal function after surgery (IKDC mean = 84.5, SD ± 20). In some patients, 5-year outcomes were even worse than their preoperative condition. Nonetheless, 74% of patients perceived their knee function as improved. Furthermore, better results were seen in patients without any associated intra-articular findings. Conclusions: Arthroscopic partial meniscectomy is an effective intervention to relieve symptoms in patients with discoid meniscus in the medium term; however, results tend to deteriorate over time. A trend towards better outcomes in patients with no associated intra-articular findings was observed.
Abstract:
This document is a study carried out for companies belonging to Colombia's auto-parts sector. The purpose of this research is to understand the current situation of the company, its economic sector, its development, and recommendations for its improvement. The research applied the structural analysis of strategic sectors, a methodology that consists of studying the sector through an analysis of industry overcrowding, a survey of the competitive landscape, an analysis of market forces, and a study of competitors. In addition, the cross-impact method (MIC MAC), presented by Michel Godet (1993), was applied, which describes the relationships present in a system based on information collected from the actors involved in the study. From this study, recommendations were generated around the key variables for the strategic direction of the company studied, which will allow it to improve its performance in the sector.
Abstract:
A strong link exists between stratospheric variability and anomalous weather patterns at the earth’s surface. Specifically, during extreme variability of the Arctic polar vortex termed a “weak vortex event,” anomalies can descend from the upper stratosphere to the surface on time scales of weeks. Subsequently, outbreaks of cold air have been noted in high northern latitudes, as well as a quadrupole pattern in surface temperature over the Atlantic and western European sectors, but it is currently not understood why certain events descend to the surface while others do not. This study compares a new classification technique for weak vortex events, based on the distribution of potential vorticity, with an existing technique and demonstrates that the subdivision of such events into vortex displacements and vortex splits has important implications for tropospheric weather patterns on weekly to monthly time scales. Using reanalysis data it is found that vortex splitting events are correlated with surface weather and lead to positive temperature anomalies over eastern North America of more than 1.5 K, and negative anomalies over Eurasia of up to −3 K. Associated with this is an increase in high-latitude blocking in both the Atlantic and Pacific sectors and a decrease in European blocking. The corresponding signals are weaker during displacement events, although ultimately they are shown to be related to cold-air outbreaks over North America. Because of the importance of stratosphere–troposphere coupling for seasonal climate predictability, identifying the type of stratospheric variability in order to capture the correct surface response will be necessary.
Abstract:
This paper presents a new framework for generating triangular meshes from textured color images. The proposed framework combines a texture classification technique, called the W-operator, with Imesh, a method originally conceived to generate simplicial meshes from gray scale images. An extension of W-operators to handle textured color images is proposed, which employs a combination of RGB and HSV channels and Sequential Floating Forward Search guided by a mean conditional entropy criterion to extract features from the training data. The W-operator is built into the local error estimation used by Imesh to choose the mesh vertices. Furthermore, the W-operator also makes it possible to assign a label to the triangles during mesh construction, thus yielding a segmented mesh at the end of the process. The presented results show that the combination of W-operators with Imesh gives rise to a texture classification-based triangle mesh generation framework that outperforms pixel-based methods. Crown Copyright (C) 2009 Published by Elsevier Inc. All rights reserved.
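A simplified sketch of the feature-extraction and selection idea: plain greedy forward selection by mean conditional entropy stands in for SFFS, and the per-pixel features are a bare RGB+HSV stack rather than true W-operator windows (the image is assumed to be a float array in [0, 1]):

```python
import numpy as np
from skimage.color import rgb2hsv

def pixel_features(rgb_img):
    """Stack RGB and HSV channels into one per-pixel feature vector (values in [0, 1])."""
    hsv = rgb2hsv(rgb_img)
    return np.concatenate([rgb_img.reshape(-1, 3), hsv.reshape(-1, 3)], axis=1)

def mean_conditional_entropy(F, labels, bins=8):
    """H(label | quantized feature tuple), averaged over the observed tuples."""
    q = np.floor(F * (bins - 1e-9)).astype(int)              # quantize features to [0, bins)
    keys = [tuple(row) for row in q]
    ent, n = 0.0, len(labels)
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        p = np.bincount(labels[idx]) / len(idx)
        p = p[p > 0]
        ent += (len(idx) / n) * (-(p * np.log2(p)).sum())
    return ent

def greedy_forward_selection(F, labels, n_select=3):
    """Pick features one at a time, minimizing the mean conditional entropy."""
    selected, remaining = [], list(range(F.shape[1]))
    while len(selected) < n_select:
        best = min(remaining,
                   key=lambda j: mean_conditional_entropy(F[:, selected + [j]], labels))
        selected.append(best)
        remaining.remove(best)
    return selected
```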
Abstract:
This work aims to implement an intelligent computational tool to identify non-technical losses and to select their most relevant features, using information from a database containing the industrial consumer profiles of a power company. The solution to this problem is neither trivial nor of merely regional interest: minimizing non-technical losses helps guarantee investments in product quality and in the maintenance of power systems, demands introduced by the competitive environment that followed the privatization period on the national scene. This work applies the WEKA software to the proposed objective, comparing various classification techniques combined with optimization through intelligent algorithms; in this way, applications on Smart Grids can be automated. © 2012 IEEE.
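The study uses WEKA; an equivalent classifier comparison and feature-relevance ranking could be sketched in Python as below, where the consumer features, labels, and chosen classifiers are placeholders:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import mutual_info_classif

# hypothetical consumer profiles: billed energy, demand, load factor, etc.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 1).astype(int)  # 1 = suspected loss

# compare several classifiers under 5-fold cross-validation
for name, clf in [("decision tree", DecisionTreeClassifier()),
                  ("random forest", RandomForestClassifier(n_estimators=100)),
                  ("naive Bayes", GaussianNB())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:14s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")

# rank features by mutual information with the non-technical-loss label
relevance = mutual_info_classif(X, y, random_state=0)
print("most relevant features:", np.argsort(relevance)[::-1][:3])
```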