958 results for Label noise
Abstract:
Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2008] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2008] result by virtue of being negatively unbounded. The loss is a modification of the hinge loss, where one does not clamp at zero; hence, we call it the unhinged loss. We show that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong l2 regularisation makes most standard learners SLN-robust. Experiments confirm the unhinged loss’ SLN-robustness.
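The following is a minimal NumPy sketch of the unhinged loss. It assumes a linear scorer v = ⟨w, x⟩ and an ℓ2 penalty (λ/2)‖w‖². The loss is linear in v, so the regularised empirical risk has the closed-form minimiser w = mean(y_i x_i)/λ. Flipping each label with probability ρ scales that mean by (1 − 2ρ) in expectation. The direction of w, and hence the classifier, is therefore preserved. The Gaussian data, ρ = 0.2 and λ = 1 are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def unhinged_loss(y, v):
    """Unhinged loss: the hinge loss max(0, 1 - y*v) without the clamp at zero."""
    return 1.0 - y * v

def unhinged_solution(X, y, lam):
    """Minimiser of mean_i(1 - y_i <w, x_i>) + (lam / 2) * ||w||^2.

    The gradient is -mean_i(y_i x_i) + lam * w, so the optimum is mean_i(y_i x_i) / lam.
    """
    return (y[:, None] * X).mean(axis=0) / lam

rng = np.random.default_rng(0)
n, rho, lam = 20_000, 0.2, 1.0
mu = np.array([2.0, 1.0])

# Illustrative data: balanced labels, x ~ N(y * mu, I).
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.standard_normal((n, 2))

# Symmetric label noise: flip each label independently with probability rho.
flip = rng.random(n) < rho
y_noisy = np.where(flip, -y, y)

w_clean = unhinged_solution(X, y, lam)
w_noisy = unhinged_solution(X, y_noisy, lam)

# Cosine similarity between the clean and noisy solutions (1.0 = same direction).
cos = w_clean @ w_noisy / (np.linalg.norm(w_clean) * np.linalg.norm(w_noisy))
print("clean:", w_clean, "noisy:", w_noisy, "cosine:", cos)
```

The noisy solution is roughly (1 − 2ρ) times the clean one. The magnitude shrinks, but the sign of ⟨w, x⟩ is unaffected.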
Abstract:
In many applications, the training data from which one needs to learn a classifier are corrupted with label noise. Many standard algorithms, such as SVM, perform poorly in the presence of label noise. In this paper we investigate the robustness of risk minimization to label noise. We prove a sufficient condition on a loss function for risk minimization under that loss to be tolerant to uniform label noise. We show that the 0-1 loss, sigmoid loss, ramp loss and probit loss satisfy this condition, whereas none of the standard convex loss functions do. We also prove that, by choosing a sufficiently large value of a parameter in the loss function, the sigmoid loss, ramp loss and probit loss can also be made tolerant to nonuniform label noise, provided the classes are separable under the noise-free data distribution. Through extensive empirical studies, we show that risk minimization under the 0-1 loss, the sigmoid loss and the ramp loss is much more robust to label noise than the SVM algorithm. (C) 2015 Elsevier B.V. All rights reserved.
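The sufficient condition is commonly stated as a symmetry requirement: ℓ(f(x), 1) + ℓ(f(x), −1) = K for some constant K. In terms of the margin z = y f(x), this reads ℓ(z) + ℓ(−z) = K. Treat this exact form as an assumption, since the abstract does not spell it out. The sketch below checks the condition numerically for the losses named above. The parametrisations are illustrative choices: the β scale parameter, the [0, 1] clipping of the ramp, and counting a zero margin as half an error for the 0-1 loss.

```python
import math

# Each loss is written as a function of the margin z = y * f(x).
def zero_one_loss(z):
    # Counting z == 0 as half an error is a tie-breaking convention.
    if z < 0:
        return 1.0
    if z > 0:
        return 0.0
    return 0.5

def sigmoid_loss(z, beta=1.0):
    return 1.0 / (1.0 + math.exp(beta * z))

def ramp_loss(z, beta=1.0):
    # Linear in the margin, clipped to [0, 1].
    return min(1.0, max(0.0, (1.0 - beta * z) / 2.0))

def probit_loss(z, beta=1.0):
    # Standard normal CDF evaluated at -beta * z.
    return 0.5 * math.erfc(beta * z / math.sqrt(2.0))

def hinge_loss(z):
    return max(0.0, 1.0 - z)

def symmetry_sums(loss, margins):
    """Values of loss(z) + loss(-z); constant on the grid when the symmetry condition holds."""
    return [loss(z) + loss(-z) for z in margins]

margins = [i / 10 for i in range(-30, 31)]
for name, loss in [("0-1", zero_one_loss), ("sigmoid", sigmoid_loss),
                   ("ramp", ramp_loss), ("probit", probit_loss), ("hinge", hinge_loss)]:
    sums = symmetry_sums(loss, margins)
    print(f"{name:8s} min={min(sums):.6f} max={max(sums):.6f}")
```

For the hinge loss, ℓ(z) + ℓ(−z) equals 2 for |z| ≤ 1 and 1 + |z| beyond that, so this convex loss fails the test.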
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Social media classification problems have drawn increasing attention in the past few years. With the rapid development of the Internet and the popularity of computers, social networks (social media platforms) hold an astronomical amount of information. The datasets are generally large scale and often corrupted by noise, and noise in the training set strongly affects the performance of supervised learning (classification) techniques. This thesis presents a budget-driven One-class SVM approach suitable for large-scale social media data classification. Our approach is based on an existing online One-class SVM learning algorithm, referred to as STOCS (Self-Tuning One-Class SVM). To justify this choice, we first analyze the noise resilience of STOCS using synthetic data. The experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle big data classification problems for social media data, we introduce several budget-driven features that allow the algorithm to be trained within limited time and memory. In addition, the resulting algorithm can easily adapt to changes in dynamic data at minimal computational cost. Compared with two state-of-the-art approaches, LIBLINEAR and kNN, our approach is shown to be competitive while requiring less memory and time.
Abstract:
Learning automata are adaptive decision-making devices that have proved useful in a variety of machine learning and pattern recognition applications. Although most learning automata methods deal with automata that have finitely many actions, there are also models of continuous-action-set learning automata (CALA). A team of such CALA can be useful in stochastic optimization problems where one has access only to noise-corrupted values of the objective function. In this paper, we present a novel formulation for noise-tolerant learning of linear classifiers using a CALA team. We consider the general case of nonuniform noise, where the probability that the class label of an example is wrong may be a function of the feature vector of the example. The objective is to learn the underlying separating hyperplane given only such noisy examples. We present an algorithm employing a team of CALA and prove, under some conditions on the class conditional densities, that the algorithm achieves noise-tolerant learning as long as the probability of a wrong label for any example is less than 0.5. We also present some empirical results to illustrate the effectiveness of the algorithm.
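The role of the condition η(x) < 0.5 can be illustrated without the CALA machinery. The sketch below is an assumption-laden illustration, not the paper's algorithm. Clean labels come from a hypothetical hyperplane w_true, and each label is flipped with a hypothetical feature-dependent probability η(x) < 0.5. The code then computes the 0-1 risk on the noisy labels, averaged exactly over the flips. Every point where a candidate hyperplane disagrees with w_true adds 1 − 2η(x) > 0 to that risk, so w_true remains the minimiser. This suggests why a learner that effectively minimises 0-1 risk on the noisy data can still recover the separating hyperplane.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = rng.standard_normal((n, 2))
w_true = np.array([1.0, -0.5])      # hypothetical separating hyperplane (through the origin)
y = np.sign(X @ w_true)             # noise-free, linearly separable labels

# Hypothetical nonuniform noise: the flip probability depends on x and stays below 0.5.
eta = 0.45 / (1.0 + np.exp(-X[:, 0]))

def expected_noisy_01_risk(w):
    """0-1 risk of sign(<w, x>) on the noisy labels, averaged exactly over the label flips."""
    agrees_with_clean = np.sign(X @ w) == y
    # Agreeing with the clean label is an error only if that label was flipped (prob. eta);
    # disagreeing is an error unless it was flipped (prob. 1 - eta).
    return np.mean(np.where(agrees_with_clean, eta, 1.0 - eta))

theta = 0.3                          # rotate w_true by 0.3 rad to get a competing hyperplane
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
w_rotated = rotation @ w_true

print("risk(w_true)    =", expected_noisy_01_risk(w_true))
print("risk(w_rotated) =", expected_noisy_01_risk(w_rotated))
```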
Abstract:
In this paper, we explore noise-tolerant learning of classifiers. We formulate the problem as follows. We assume that there is an unobservable, noise-free training set. The actual training set given to the learning algorithm is obtained from this ideal data set by corrupting the class label of each example. The probability that an example's label is corrupted is a function of its feature vector. This model accounts for most kinds of noisy data encountered in practice. We say that a learning method is noise tolerant if the classifiers learnt from noise-free data and from noisy data have the same classification accuracy on the noise-free data. In this paper, we analyze the noise-tolerance properties of risk minimization under different loss functions. We show three results. Risk minimization under the 0-1 loss function has impressive noise-tolerance properties. Risk minimization under the squared error loss is tolerant only to uniform noise. Risk minimization under other loss functions is not noise tolerant. We conclude with some discussion of the implications of these theoretical results.
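For the squared error loss over linear functions (an assumption made here), the risk minimiser is w = E[xxᵀ]⁻¹E[ỹx]. Uniform noise with flip rate η gives E[ỹ | x] = (1 − 2η) y. The noisy minimiser is therefore the clean one scaled by 1 − 2η > 0, so the induced classifier sign(⟨w, x⟩) is unchanged. The NumPy sketch below checks this on an empirical distribution of x, taking the expectation over label noise exactly. The data, the bias column and η = 0.3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Two features plus a constant column acting as the bias term.
X = np.column_stack([rng.standard_normal((n, 2)), np.ones(n)])
y = np.sign(X[:, 0] + 0.5 * X[:, 1] - 0.2)   # hypothetical noise-free labels

def squared_loss_minimiser(X, targets):
    """argmin_w mean((targets - X @ w)^2), solved by least squares."""
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

eta = 0.3   # uniform flip probability (must be < 0.5)

# Squared-loss risk depends on the noisy labels only through E[y_noisy | x] (plus a constant),
# and under uniform noise E[y_noisy | x] = (1 - 2 * eta) * y.
w_clean = squared_loss_minimiser(X, y)
w_noisy = squared_loss_minimiser(X, (1 - 2 * eta) * y)

print("clean:", w_clean)
print("noisy:", w_noisy)
```

A feature-dependent η(x) turns the targets into (1 − 2η(x)) y, which is no longer a single rescaling of y. The argument then breaks down, consistent with squared loss being tolerant only to uniform noise.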
Abstract:
This thesis investigates the design and implementation of a label-free optical biosensing system built on a robust, integrated on-chip platform. The goal has been to transition optical micro-resonator-based label-free biosensing from a laborious and delicate laboratory demonstration to a tool for the analytical life scientist. This has been pursued along four avenues: (1) the design and fabrication of high-Q integrated planar microdisk optical resonators in silicon nitride on silica, (2) the demonstration of a high-speed optoelectronic swept-frequency laser source, (3) the development and integration of a microfluidic analyte delivery system, and (4) the introduction of a novel differential measurement technique for the reduction of environmental noise.
The optical part of this system combines two major recent developments in optical and laser physics: the high-Q optical resonator and the phase-locked, electronically controlled swept-frequency semiconductor laser. The laser operates at a wavelength relevant for aqueous sensing. It replaces expensive and fragile mechanically tuned laser sources whose frequency sweeps have limited speed, accuracy and reliability. The high-Q optical resonator forms a monolithic unit with an integrated optical waveguide and is fabricated using standard semiconductor lithography methods. Monolithic integration makes the system significantly more robust and flexible than current embodiments, which rely on precariously coupling fragile optical fibers to resonators. The silicon nitride on silica material system also allows future implementations at shorter wavelengths. The sensor includes an integrated microfluidic flow cell for precise, low-volume delivery of analytes to the resonator surface. We demonstrate, with high sensitivity, the refractive index sensing action of the system and the specific and nonspecific adsorption of proteins onto the resonator surface. We discuss the measurement challenges that environmental noise poses to system performance. We then propose, implement and demonstrate a differential sensing measurement that restores high-performance sensing.
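Suppose a sensing resonator and a reference resonator experience the same environmental drift, for example thermal drift. Subtracting the reference trace then cancels that common-mode term, leaving the binding signal plus uncorrelated noise. This is the generic reason a differential measurement helps. The sketch below simulates it with hypothetical drift, noise and binding-step values. It illustrates common-mode rejection in general and is not the specific differential scheme implemented in the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 100.0, 2001)                  # time, arbitrary units

# Hypothetical signals, in arbitrary resonance-shift units.
drift = 0.05 * np.sin(2 * np.pi * t / 40.0) + 0.002 * t   # common-mode environmental drift
binding = np.where(t > 50.0, 0.1, 0.0)              # step caused by analyte binding

sensor = binding + drift + 0.005 * rng.standard_normal(t.size)   # exposed to analyte
reference = drift + 0.005 * rng.standard_normal(t.size)          # sees the drift only
differential = sensor - reference                                 # common-mode drift cancels

raw_error = np.std(sensor - binding)
diff_error = np.std(differential - binding)
print(f"raw error: {raw_error:.4f}, differential error: {diff_error:.4f}")
```

The uncorrelated noise of the two channels adds in quadrature, so its standard deviation grows by about √2. The approach therefore pays off when common-mode drift dominates, as assumed here.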
The instrument developed in this work represents an adaptable and cost-effective platform capable of various sensitive, label-free measurements relevant to biophysics, biomolecular interactions, cell signaling, and a wide range of other life science fields. Further development is necessary before it can perform binding assays or thermodynamic and kinetic measurements; however, this work lays the foundation for demonstrating these applications.
Abstract:
Detecting a single molecule without the use of labels has been a long-standing goal of bioengineers and physicists. This capability would simplify applications ranging from single-molecule binding studies to public health and security, improved drug screening, medical diagnostics, and genome sequencing. One promising technique with the potential to detect single molecules is the microtoroid optical resonator. The main obstacle, however, is reducing the measurement noise level enough that a single molecule can be distinguished from the background. We have used laser frequency locking in combination with balanced detection and data processing techniques to reduce the noise level of these devices. We report the detection of a wide range of nanoscale objects: nanoparticles with radii from 100 to 2.5 nm, exosomes, ribosomes, and single protein molecules (mouse immunoglobulin G and human interleukin-2). We further extend the exosome results towards a non-invasive tumor biopsy assay. Our results, covering several orders of magnitude of particle radius (100 nm to 2 nm), agree with the "reactive" model prediction for the frequency shift of the resonator upon particle binding. In addition, we demonstrate that molecular weight may be estimated from the frequency shift through a simple formula, providing a basis for an "optical mass spectrometer" in solution. We anticipate that our results will enable many applications, including more sensitive medical diagnostics and fundamental studies of single receptor-ligand and protein-protein interactions in real time. The thesis summarizes what we have achieved thus far and shows that the goal of detecting a single molecule without the use of labels can now be realized.
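The reactive model, in the small-particle (quasi-static) limit, suggests one way frequency shift can track molecular weight. Under that model, the shift is proportional to the particle's excess polarizability, α_ex = 4πε₀ n_m² r³ (n_p² − n_m²)/(n_p² + 2n_m²). For a fixed refractive index, α_ex scales as r³, as does the mass of a particle of fixed density. Assuming also that both particles bind where the resonator field intensity is the same, the molecular weight could be estimated by scaling a reference measurement. The sketch below encodes this reasoning. The function names, the reference values and this particular proportionality are assumptions for illustration, not necessarily the "simple formula" used in the thesis.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m

def excess_polarizability(radius_m, n_particle, n_medium):
    """Quasi-static excess polarizability of a dielectric sphere in a medium (SI units)."""
    np2, nm2 = n_particle ** 2, n_medium ** 2
    return 4 * math.pi * EPS0 * nm2 * radius_m ** 3 * (np2 - nm2) / (np2 + 2 * nm2)

def shift_ratio(radius_m, ref_radius_m, n_particle, n_medium):
    """Predicted ratio of resonance shifts for two particles of the same material,
    assuming both bind where the resonator field intensity is the same."""
    return (excess_polarizability(radius_m, n_particle, n_medium)
            / excess_polarizability(ref_radius_m, n_particle, n_medium))

def estimate_molecular_weight(shift, ref_shift, ref_molecular_weight):
    """Hypothetical calibration: molecular weight taken as proportional to shift."""
    return ref_molecular_weight * shift / ref_shift

# Illustrative values: a polystyrene-like index (1.59) in water (1.33).
print(shift_ratio(50e-9, 100e-9, 1.59, 1.33))   # halving the radius gives about 1/8 of the shift
```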
Abstract:
Unlabelled single- and double-stranded DNA (ssDNA and dsDNA, respectively) has been detected at nanomolar concentrations (on the order of 10^-9 M) by surface-enhanced Raman spectroscopy. Under appropriate conditions, the sequences spontaneously adsorbed to the surface of both Ag and Au colloids through their nucleobases. This allowed highly reproducible spectra with good signal-to-noise ratios to be recorded on completely unmodified samples. It also eliminated the need to promote adsorption by introducing external linkers, such as thiols. The spectra of model ssDNA sequences contained bands of all the bases present and showed systematic changes when the overall base composition was altered. Initial tests also showed that small but reproducible changes could be detected between oligonucleotides with the same bases arranged in a different order. The spectra of five ssDNA sequences corresponding to different strains of the Escherichia coli bacterium were sufficiently composition-dependent that they could be differentiated without any advanced multivariate data analysis techniques.
Abstract:
Low-frequency noise in an electrolyte-insulator-semiconductor (EIS) structure functionalized with multilayers of polyamidoamine (PAMAM) dendrimer and single-walled carbon nanotubes (SWNTs) is studied. The noise spectral density exhibits a 1/f^γ dependence. The power factor is γ ≈ 0.8 for the bare EIS sensor and γ = 0.8 to 1.8 for the functionalized one. The gate-voltage noise spectral density is practically independent of the pH of the solution and increases with increasing gate voltage or gate-leakage current. Functionalization of an EIS structure with a PAMAM/SWNT multilayer was found to substantially reduce the 1/f noise. To interpret the noise behavior of bare and functionalized EIS devices, a gate-current noise model for capacitive EIS structures has been developed, based on an equivalent flatband-voltage fluctuation concept.
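Because log S(f) = log A − γ log f, the exponent γ in a 1/f^γ spectrum is typically estimated by a straight-line fit in log-log coordinates. Below is a minimal NumPy sketch on a synthetic spectrum. The amplitude, frequency range and scatter are arbitrary illustrative values, and γ = 0.8 mirrors the bare-sensor value above.

```python
import numpy as np

def fit_flicker_exponent(freqs, psd):
    """Fit S(f) = A / f**gamma by linear regression of log10(S) on log10(f); returns (gamma, A)."""
    slope, intercept = np.polyfit(np.log10(freqs), np.log10(psd), 1)
    return -slope, 10.0 ** intercept

rng = np.random.default_rng(4)
freqs = np.logspace(0, 3, 200)              # 1 Hz to 1 kHz
true_gamma, true_A = 0.8, 1e-10             # illustrative values, arbitrary units
# Synthetic spectrum with multiplicative (log-normal) scatter around the power law.
psd = true_A / freqs ** true_gamma * np.exp(0.1 * rng.standard_normal(freqs.size))

gamma_hat, A_hat = fit_flicker_exponent(freqs, psd)
print(f"estimated gamma = {gamma_hat:.3f}, A = {A_hat:.3e}")
```

In practice, the spectral density would come from measured gate-voltage fluctuations, for example via a Welch estimate. The fit would then be restricted to the band where 1/f noise dominates.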