918 results for Supervised classifiers
Abstract:
Learning automata are adaptive decision making devices that are found useful in a variety of machine learning and pattern recognition applications. Although most learning automata methods deal with the case of finitely many actions for the automaton, there are also models of continuous-action-set learning automata (CALA). A team of such CALA can be useful in stochastic optimization problems where one has access only to noise-corrupted values of the objective function. In this paper, we present a novel formulation for noise-tolerant learning of linear classifiers using a CALA team. We consider the general case of nonuniform noise, where the probability that the class label of an example is wrong may be a function of the feature vector of the example. The objective is to learn the underlying separating hyperplane given only such noisy examples. We present an algorithm employing a team of CALA and prove, under some conditions on the class conditional densities, that the algorithm achieves noise-tolerant learning as long as the probability of wrong label for any example is less than 0.5. We also present some empirical results to illustrate the effectiveness of the algorithm.
Abstract:
The problem of unsupervised anomaly detection arises in a wide variety of practical applications. While one-class support vector machines have demonstrated their effectiveness as an anomaly detection technique, their ability to model large datasets is limited due to their memory and time complexity for training. To address this issue for supervised learning of kernel machines, there has been growing interest in random projection methods as an alternative to the computationally expensive problems of kernel matrix construction and support vector optimisation. In this paper we leverage the theory of nonlinear random projections and propose the Randomised One-class SVM (R1SVM), which is an efficient and scalable anomaly detection technique that can be trained on large-scale datasets. Our empirical analysis on several real-life and synthetic datasets shows that our randomised 1SVM algorithm achieves accuracy comparable to or better than deep autoencoder and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.
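The random-projection idea in this abstract can be illustrated with random Fourier features, one common nonlinear random projection for the RBF kernel. This is a minimal sketch and not the R1SVM implementation: the function names, the value of `gamma`, and the number of features are assumptions chosen for illustration. It only shows that inner products of the projected vectors approximate kernel values, which is what would let a linear one-class model stand in for a kernelised one.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a function mapping a dim-vector to random Fourier features.

    The features approximate the RBF kernel exp(-gamma * ||x - y||^2).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the Gaussian frequencies
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def rbf(x, y, gamma):
    """Exact RBF kernel value, used only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


transform = make_rff(dim=3, n_features=4000, gamma=0.5)
x, y = [0.2, -0.1, 0.4], [0.5, 0.3, -0.2]
approx = dot(transform(x), transform(y))
exact = rbf(x, y, gamma=0.5)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

The approximation error shrinks roughly with the square root of the number of features, so the feature count trades accuracy against the cost of the subsequent linear model.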
Abstract:
State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution. Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image-sets, we exploit Csiszár f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive definite kernels on the statistical manifold, which let us make use of more powerful classification schemes to match image-sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space where f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over the state-of-the-art image-set matching methods.
Abstract:
BACKGROUND Polygenic risk scores comprising established susceptibility variants have been shown to be informative classifiers for several complex diseases including prostate cancer. For prostate cancer it is unknown if inclusion of genetic markers that have so far not been associated with prostate cancer risk at a genome-wide significant level will improve disease prediction. METHODS We built polygenic risk scores in a large training set comprising over 25,000 individuals. Initially 65 established prostate cancer susceptibility variants were selected. After LD pruning, additional variants were prioritized based on their association with prostate cancer. Six-fold cross validation was performed to assess genetic risk scores and optimize the number of additional variants to be included. The final model was evaluated in an independent study population including 1,370 cases and 1,239 controls. RESULTS The polygenic risk score with 65 established susceptibility variants provided an area under the curve (AUC) of 0.67. Adding an additional 68 novel variants significantly increased the AUC to 0.68 (P = 0.0012) and the net reclassification index with 0.21 (P = 8.5E-08). All novel variants were located in genomic regions established as associated with prostate cancer risk. CONCLUSIONS Inclusion of additional genetic variants from established prostate cancer susceptibility regions improves disease prediction. Prostate 75:1467–1474, 2015. © 2015 Wiley Periodicals, Inc.
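As a rough illustration of the AUC figures quoted in this abstract, the sketch below scores a few individuals and computes the AUC as the fraction of case-control pairs in which the case has the higher score. Everything in it is hypothetical: the weights, genotypes, and labels are toy values, and treating the risk score as a weighted sum of risk-allele counts is an assumption about a typical polygenic score, not a description of the study's model.

```python
def risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    return sum(g * w for g, w in zip(genotype, weights))


def auc(scores, labels):
    """Probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical per-variant weights and toy genotypes (not study data).
weights = [0.2, 0.1, 0.3]
genotypes = [[0, 1, 0], [2, 0, 0], [1, 1, 0], [2, 1, 2]]
labels = [0, 0, 1, 1]  # 1 = case, 0 = control

scores = [risk_score(g, weights) for g in genotypes]
result = auc(scores, labels)
print(result)
```

Here three of the four case-control pairs are ordered correctly, so the toy AUC is 0.75. An AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation.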
Abstract:
We report a hierarchical blind script identifier for 11 different Indian scripts. An initial grouping of the 11 scripts is accomplished at the first level of this hierarchy. At the subsequent level, we recognize the script in each group. The various nodes of this tree use different feature-classifier combinations. A database of 20,000 words of different font styles and sizes is collected and used for each script. The effectiveness of Gabor and Discrete Cosine Transform features has been independently evaluated using nearest neighbor, linear discriminant, and support vector machine classifiers. The minimum and maximum accuracies obtained using this hierarchical mechanism are 92.2% and 97.6%, respectively.
Abstract:
Carbon fiber reinforced polymer (CFRP) composite specimens with different thickness, geometry, and stacking sequences were subjected to fatigue spectrum loading in stages. Another set of specimens was subjected to static compression load. On-line acoustic emission (AE) monitoring was carried out during these tests. Two artificial neural networks, a Kohonen self-organizing feature map (KSOM) and a multi-layer perceptron (MLP), have been developed for AE signal analysis. AE signals from specimens were clustered using the unsupervised learning KSOM. These clusters were correlated to the failure modes using available a priori information such as AE signal amplitude distributions, time of occurrence of signals, ultrasonic imaging, design of the laminates (stacking sequences, orientation of fibers), and AE parametric plots. Thereafter, AE signals generated from the rest of the specimens were classified by the supervised learning MLP. The network developed is made suitable for on-line monitoring of AE signals in the presence of noise, and can be used for detection and identification of failure modes and their growth. The results indicate that the characteristics of AE signals from different failure modes in CFRP remain largely unaffected by the type of load, fiber orientation, and stacking sequences, being representative of the type of failure phenomena. The type of loading can affect only the extent of damage allowed before the specimens fail and hence the number of AE signals during the test. The artificial neural networks (ANN) developed and the methods and procedures adopted show significant success in AE signal characterization in noisy environments (detection and identification of failure modes and their growth).
Abstract:
Support Vector Machines (SVMs) are hyperplane classifiers defined in a kernel-induced feature space. The data-size-dependent training time complexity of SVMs usually prohibits their use in applications involving more than a few thousand data points. In this paper we propose a novel kernel-based incremental data clustering approach and its use for scaling non-linear Support Vector Machines to handle large data sets. The clustering method introduced can find cluster abstractions of the training data in a kernel-induced feature space. These cluster abstractions are then used for selective-sampling-based training of Support Vector Machines to reduce the training time without compromising the generalization performance. Experiments done with real-world datasets show that this approach gives good generalization performance at reasonable computational expense.
Abstract:
This paper focuses on optimisation algorithms inspired by swarm intelligence for satellite image classification from high-resolution multispectral satellite images. Amongst the multiple benefits and uses of remote sensing, one of the most important has been its use in solving the problem of land cover mapping. As the frontiers of space technology advance, the knowledge derived from the satellite data has also grown in sophistication. Image classification forms the core of the solution to the land cover mapping problem. No single classifier has proved able to classify satisfactorily all the basic land cover classes of an urban region. In both supervised and unsupervised classification methods, evolutionary algorithms are not exploited to their full potential. This work tackles land cover mapping with Ant Colony Optimisation (ACO) and Particle Swarm Optimisation (PSO), which are arguably the most popular algorithms in this category. We present the results of classification techniques using swarm intelligence for the problem of land cover mapping for an urban region. High-resolution QuickBird data has been used for the experiments.
Abstract:
In recent reports, adolescents and young adults (AYA) with acute lymphoblastic leukemia (ALL) have had a better outcome with pediatric treatment than with adult protocols. ALL can be classified into biologic subgroups according to immunophenotype and cytogenetics, with different clinical characteristics and outcome. The proportions of the subgroups are different in children and adults. ALL subtypes in AYA patients are less well characterized. In this study, the treatment and outcome of ALL in AYA patients aged 10-25 years in Finland on pediatric and adult protocols was retrospectively analyzed. In total, 245 patients were included. The proportions of biologic subgroups in different age groups were determined. Patients with initially normal or failed karyotype were examined with oligonucleotide microarray-based comparative genomic hybridization (aCGH). Deletions and instability of chromosome 9p were also screened in ALL patients. In addition, patients with other hematologic malignancies were screened for 9p instability. aCGH data were also used to determine a gene set that classifies AYA patients at diagnosis according to their risk of relapse. Receiver operating characteristic analysis was used to assess the value of the set of genes as prognostic classifiers. The 5-year event-free survival of AYA patients treated with pediatric or adult protocols was 67% and 60% (p=0.30), respectively. White blood cell count larger than 100 × 10^9/l was associated with poor prognosis. Patients treated with pediatric protocols and assigned to an intermediate-risk group fared significantly better than those of the pediatric high-risk or adult treatment groups. Deletions of 9p were detected in 46% of AYA ALL patients. The chromosomal region 9p21.3 was always affected, and the CDKN2A gene was always deleted. In about 15% of AYA patients, the 9p21.3 deletion was smaller than 200 kb in size, and therefore, probably undetectable with conventional methods.
Deletion of 9p was the most common aberration of AYA ALL patients with initially normal karyotype. Instability of 9p, defined as multiple separate areas of copy number loss or homozygous loss within a larger heterozygous area in 9p, was detected in 19% (n=27) of ALL patients. This abnormality was restricted to ALL; none of the patients with other hematologic malignancies had the aberration. The prognostic model identification procedure resulted in a model of four genes: BAK1, CDKN2B, GSTM1, and MT1F. The copy number profile combinations of these genes differentiated between AYA ALL patients at diagnosis depending on their risk of relapse. Deletions of CDKN2B and BAK1 in combination with amplification of GSTM1 and MT1F were associated with a higher probability of relapse. Unlike all previous studies, we found that the outcome of AYA patients with ALL treated using pediatric or adult therapeutic protocols was comparable. The success of adult ALL therapy emphasizes the benefit of referral of patients to academic centers and adherence to research protocols. 9p deletions and instability are common features of ALL and may act together with oncogene-activating translocations in leukemogenesis. New and more sensitive methods of molecular cytogenetics can reveal previously cryptic genetic aberrations with an important role in leukemic development and prognosis and that may be potential targets of therapy. aCGH also provides a viable approach for model design aiming at evaluation of risk of relapse in ALL.
Abstract:
Agricultural pests are responsible for millions of dollars in crop losses and management costs every year. In order to implement optimal site-specific treatments and reduce control costs, new methods to accurately monitor and assess pest damage need to be investigated. In this paper we explore the combination of unmanned aerial vehicles (UAV), remote sensing and machine learning techniques as a promising technology to address this challenge. The deployment of UAVs as a sensor platform is a rapidly growing field of study for biosecurity and precision agriculture applications. In this experiment, a data collection campaign is performed over a sorghum crop severely damaged by white grubs (Coleoptera: Scarabaeidae). The larvae of these scarab beetles feed on the roots of plants, which in turn impairs root exploration of the soil profile. In the field, crop health status could be classified according to three levels: bare soil where plants were decimated, transition zones of reduced plant density and healthy canopy areas. In this study, we describe the UAV platform deployed to collect high-resolution RGB imagery as well as the image processing pipeline implemented to create an orthoimage. An unsupervised machine learning approach is formulated in order to create a meaningful partition of the image into each of the crop levels. The aim of the approach is to simplify the image analysis step by minimizing user input requirements and avoiding the manual data labeling necessary in supervised learning approaches. The implemented algorithm is based on the K-means clustering algorithm. In order to control high-frequency components present in the feature space, a neighbourhood-oriented parameter is introduced by applying Gaussian convolution kernels prior to K-means. The outcome of this approach is a soft K-means algorithm similar to the EM algorithm for Gaussian mixture models. 
The results show the algorithm delivers decision boundaries that consistently classify the field into three clusters, one for each crop health level. The methodology presented in this paper represents an avenue for further research towards automated crop damage assessments and biosecurity surveillance.
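The effect of applying a Gaussian kernel before K-means can be sketched on a one-dimensional strip of pixel intensities. This is a simplified illustration under stated assumptions, not the paper's pipeline: it uses plain (hard) K-means on scalars rather than the soft variant the abstract describes, and the intensity values, `sigma`, `radius`, and function names are all hypothetical.

```python
import math


def gaussian_smooth(values, sigma, radius):
    """Smooth a 1-D signal with a Gaussian kernel, renormalised at the edges."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    out = []
    for i in range(len(values)):
        total, weight = 0.0, 0.0
        for k, d in zip(kernel, offsets):
            j = i + d
            if 0 <= j < len(values):
                total += k * values[j]
                weight += k
        out.append(total / weight)
    return out


def kmeans_1d(values, k, iters=50):
    """Plain (hard) K-means on scalars with evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


# Toy strip: bare soil, transition zone, healthy canopy (hypothetical values),
# with one isolated noisy pixel inside the bare-soil zone.
row = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
row[4] = 0.5

raw_labels = kmeans_1d(row, 3)
smooth_labels = kmeans_1d(gaussian_smooth(row, sigma=1.5, radius=4), 3)
print(raw_labels[3], raw_labels[4])
print(smooth_labels[3], smooth_labels[4])
```

Without smoothing, the isolated pixel is assigned to a different cluster from its neighbours. After smoothing it joins the surrounding zone, which is the kind of high-frequency suppression the neighbourhood parameter is meant to provide.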
Abstract:
In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by the ongoing success of convolutional networks in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot’s behaviour during navigation tasks. The system is made available to the community as a ROS module.
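How a Bayesian filter can enforce temporal coherence over per-frame class scores may be sketched with a small discrete filter. The sketch makes several assumptions that the abstract does not state: the classifier outputs are treated as observation likelihoods, the transition model is a single "stay in the same place" probability, and the class names and numbers are invented for illustration.

```python
def bayes_filter_step(belief, likelihood, stay_prob=0.9):
    """One predict-update cycle of a discrete Bayes filter over place classes.

    belief      current probability of each class
    likelihood  per-class classifier score for the new frame (assumed to act
                as an observation likelihood)
    stay_prob   assumed probability that the robot remains in the same class
    """
    n = len(belief)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    # Predict: diffuse the belief through the sticky transition model.
    predicted = [
        sum(belief[j] * (stay_prob if i == j else switch_prob)
            for j in range(n))
        for i in range(n)
    ]
    # Update: weight by the new observation and renormalise.
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


classes = ["office", "corridor"]  # hypothetical place categories
belief = [0.5, 0.5]
frames = [[0.8, 0.2], [0.8, 0.2], [0.8, 0.2], [0.4, 0.6]]  # last frame disagrees
for scores in frames:
    belief = bayes_filter_step(belief, scores)

best = classes[belief.index(max(belief))]
print(best, [round(b, 3) for b in belief])
```

A single frame that favours "corridor" lowers the belief in "office" but does not flip it, because the transition model carries over evidence from the earlier frames.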
Abstract:
This paper investigates how students’ learning experience can be enhanced by participating in the Industry-Based Learning (IBL) program. In this program, university students enter industry to experience how a business is run. The students learn not from textbooks or lecturers but by doing. This new learning experience can be very interesting for students but at the same time can also be challenging. The research involves interviewing a number of students from the IBL programs, academic staff from the participating university who have experience in supervising students, and industry employees who supported and supervised the students in their work placements. The research findings offer useful insights and create new knowledge in the field of education and learning. The research contributes to the existing knowledge by providing a new understanding of the topic as it applies to the Indonesian context.
Abstract:
This paper gives a new iterative algorithm for kernel logistic regression. It is based on the solution of a dual problem using ideas similar to those of the Sequential Minimal Optimization algorithm for Support Vector Machines. Asymptotic convergence of the algorithm is proved. Computational experiments show that the algorithm is robust and fast. The algorithmic ideas can also be used to give a fast dual algorithm for solving the optimization problem arising in the inner loop of Gaussian Process classifiers.
Abstract:
This paper aims at evaluating the methods of multiclass support vector machines (SVMs) for effective use in distance relay coordination. Also, it describes a strategy of supportive systems to aid the conventional protection philosophy in combating situations where protection systems have maloperated and/or information is missing and provide selective and secure coordinations. SVMs have considerable potential as zone classifiers of distance relay coordination. This typically requires a multiclass SVM classifier to effectively analyze/build the underlying concept between reach of different zones and the apparent impedance trajectory during fault. Several methods have been proposed for multiclass classification where typically several binary SVM classifiers are combined together. Some authors have extended binary SVM classification to one-step single optimization operation considering all classes at once. In this paper, one-step multiclass classification, one-against-all, and one-against-one multiclass methods are compared for their performance with respect to accuracy, number of iterations, number of support vectors, training, and testing time. The performance analysis of these three methods is presented on three data sets belonging to training and testing patterns of three supportive systems for a region and part of a network, which is an equivalent 526-bus system of the practical Indian Western grid.
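The difference between the one-against-all and one-against-one decompositions compared in this abstract can be made concrete by listing the binary subproblems each one creates. This is only a bookkeeping sketch: no SVM is trained, and the zone names, function names, and the majority-vote rule used to combine pairwise decisions are illustrative assumptions.

```python
from itertools import combinations


def one_against_all_tasks(classes):
    """One binary problem per class: that class versus all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def one_against_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Combine one-against-one decisions by counting wins per class."""
    counts = {}
    for winner in pairwise_winners:
        counts[winner] = counts.get(winner, 0) + 1
    return max(counts, key=counts.get)


zones = ["zone1", "zone2", "zone3"]  # hypothetical relay zones
ova = one_against_all_tasks(zones)
ovo = one_against_one_tasks(zones)
print(len(ova), len(ovo))

# Hypothetical winners of the three pairwise classifiers for one fault sample.
decision = majority_vote(["zone1", "zone3", "zone3"])
print(decision)
```

For k classes, one-against-all needs k binary classifiers trained on all the data, while one-against-one needs k(k-1)/2 classifiers each trained on two classes only. The counts coincide at k = 3 and diverge for larger k, which is one reason training time and the number of support vectors differ between the methods.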
Abstract:
Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96.
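A horizontal projection profile can be sketched as the count of ink pixels in each row of a binary image. The abstract does not say how the profile is length-normalized, so the sketch below assumes one possible reading: resampling the profile to a fixed number of points by linear interpolation. The function names and the toy image are hypothetical.

```python
def horizontal_projection_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def length_normalize(profile, length):
    """Resample a profile to a fixed length by linear interpolation.

    This is an assumed interpretation of "length-normalized".
    """
    if length < 2 or len(profile) < 2:
        raise ValueError("profile and target length must have at least 2 points")
    step = (len(profile) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(profile) - 1)
        frac = pos - left
        out.append(profile[left] * (1 - frac) + profile[right] * frac)
    return out


# Toy block: three "text lines" separated by blank rows.
block = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
hpp = horizontal_projection_profile(block)
feature = length_normalize(hpp, 9)
print(hpp)
print(feature)
```

The alternating peaks and valleys in the profile are the regularity that printed text lines are assumed to produce. A fixed-length version lets blocks of different heights share one feature dimension.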