447 resultados para classifiers
Resumo:
Using techniques from Statistical Physics, the annealed VC entropy for hyperplanes in high dimensional spaces is calculated as a function of the margin for a spherical Gaussian distribution of inputs.
Resumo:
We apply methods of Statistical Mechanics to study the generalization performance of Support vector Machines in large data spaces.
Resumo:
We address the important bioinformatics problem of predicting protein function from a protein's primary sequence. We consider the functional classification of G-Protein-Coupled Receptors (GPCRs), whose functions are specified in a class hierarchy. We tackle this task using a novel top-down hierarchical classification system where, for each node in the class hierarchy, the predictor attributes to be used in that node and the classifier to be applied to the selected attributes are chosen in a data-driven manner. Compared with a previous hierarchical classification system selecting classifiers only, our new system significantly reduced processing time without significantly sacrificing predictive accuracy.
Resumo:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Resumo:
Binary distributed representations of vector data (numerical, textual, visual) are investigated in classification tasks. A comparative analysis of results for various methods and tasks using artificial and real-world data is given.
Resumo:
This paper presents an analysis of different techniques that is designed to aid a researcher in determining which of the classification techniques would be most appropriate to choose the ridge, robust and linear regression methods for predicting outcomes for specific quasi-stationary process.
Resumo:
Social media has become an effective channel for communicating both trends and public opinion on current events. However the automatic topic classification of social media content pose various challenges. Topic classification is a common technique used for automatically capturing themes that emerge from social media streams. However, such techniques are sensitive to the evolution of topics when new event-dependent vocabularies start to emerge (e.g., Crimea becoming relevant to War Conflict during the Ukraine crisis in 2014). Therefore, traditional supervised classification methods which rely on labelled data could rapidly become outdated. In this paper we propose a novel transfer learning approach to address the classification task of new data when the only available labelled data belong to a previous epoch. This approach relies on the incorporation of knowledge from DBpedia graphs. Our findings show promising results in understanding how features age, and how semantic features can support the evolution of topic classifiers.
Resumo:
Acute respiratory infections caused by bacterial or viral pathogens are among the most common reasons for seeking medical care. Despite improvements in pathogen-based diagnostics, most patients receive inappropriate antibiotics. Host response biomarkers offer an alternative diagnostic approach to direct antimicrobial use. This observational cohort study determined whether host gene expression patterns discriminate noninfectious from infectious illness and bacterial from viral causes of acute respiratory infection in the acute care setting. Peripheral whole blood gene expression from 273 subjects with community-onset acute respiratory infection (ARI) or noninfectious illness, as well as 44 healthy controls, was measured using microarrays. Sparse logistic regression was used to develop classifiers for bacterial ARI (71 probes), viral ARI (33 probes), or a noninfectious cause of illness (26 probes). Overall accuracy was 87% (238 of 273 concordant with clinical adjudication), which was more accurate than procalcitonin (78%, P < 0.03) and three published classifiers of bacterial versus viral infection (78 to 83%). The classifiers developed here externally validated in five publicly available data sets (AUC, 0.90 to 0.99). A sixth publicly available data set included 25 patients with co-identification of bacterial and viral pathogens. Applying the ARI classifiers defined four distinct groups: a host response to bacterial ARI, viral ARI, coinfection, and neither a bacterial nor a viral response. These findings create an opportunity to develop and use host gene expression classifiers as diagnostic platforms to combat inappropriate antibiotic use and emerging antibiotic resistance.
Resumo:
Brain injury due to lack of oxygen or impaired blood flow around the time of birth, may cause long term neurological dysfunction or death in severe cases. The treatments need to be initiated as soon as possible and tailored according to the nature of the injury to achieve best outcomes. The Electroencephalogram (EEG) currently provides the best insight into neurological activities. However, its interpretation presents formidable challenge for the neurophsiologists. Moreover, such expertise is not widely available particularly around the clock in a typical busy Neonatal Intensive Care Unit (NICU). Therefore, an automated computerized system for detecting and grading the severity of brain injuries could be of great help for medical staff to diagnose and then initiate on-time treatments. In this study, automated systems for detection of neonatal seizures and grading the severity of Hypoxic-Ischemic Encephalopathy (HIE) using EEG and Heart Rate (HR) signals are presented. It is well known that there is a lot of contextual and temporal information present in the EEG and HR signals if examined at longer time scale. The systems developed in the past, exploited this information either at very early stage of the system without any intelligent block or at very later stage where presence of such information is much reduced. This work has particularly focused on the development of a system that can incorporate the contextual information at the middle (classifier) level. This is achieved by using dynamic classifiers that are able to process the sequences of feature vectors rather than only one feature vector at a time.
Resumo:
Requirement engineering is a key issue in the development of a software project. Like any other development activity it is not without risks. This work is about the empirical study of risks of requirements by applying machine learning techniques, specifically Bayesian networks classifiers. We have defined several models to predict the risk level for a given requirement using three dataset that collect metrics taken from the requirement specifications of different projects. The classification accuracy of the Bayesian models obtained is evaluated and compared using several classification performance measures. The results of the experiments show that the Bayesians networks allow obtaining valid predictors. Specifically, a tree augmented network structure shows a competitive experimental performance in all datasets. Besides, the relations established between the variables collected to determine the level of risk in a requirement, match with those set by requirement engineers. We show that Bayesian networks are valid tools for the automation of risks assessment in requirement engineering.
Resumo:
Pitch Estimation, also known as Fundamental Frequency (F0) estimation, has been a popular research topic for many years, and is still investigated nowadays. The goal of Pitch Estimation is to find the pitch or fundamental frequency of a digital recording of a speech or musical notes. It plays an important role, because it is the key to identify which notes are being played and at what time. Pitch Estimation of real instruments is a very hard task to address. Each instrument has its own physical characteristics, which reflects in different spectral characteristics. Furthermore, the recording conditions can vary from studio to studio and background noises must be considered. This dissertation presents a novel approach to the problem of Pitch Estimation, using Cartesian Genetic Programming (CGP).We take advantage of evolutionary algorithms, in particular CGP, to explore and evolve complex mathematical functions that act as classifiers. These classifiers are used to identify piano notes pitches in an audio signal. To help us with the codification of the problem, we built a highly flexible CGP Toolbox, generic enough to encode different kind of programs. The encoded evolutionary algorithm is the one known as 1 + , and we can choose the value for . The toolbox is very simple to use. Settings such as the mutation probability, number of runs and generations are configurable. The cartesian representation of CGP can take multiple forms and it is able to encode function parameters. It is prepared to handle with different type of fitness functions: minimization of f(x) and maximization of f(x) and has a useful system of callbacks. We trained 61 classifiers corresponding to 61 piano notes. A training set of audio signals was used for each of the classifiers: half were signals with the same pitch as the classifier (true positive signals) and the other half were signals with different pitches (true negative signals). F-measure was used for the fitness function. Signals with the same pitch of the classifier that were correctly identified by the classifier, count as a true positives. Signals with the same pitch of the classifier that were not correctly identified by the classifier, count as a false negatives. Signals with different pitch of the classifier that were not identified by the classifier, count as a true negatives. Signals with different pitch of the classifier that were identified by the classifier, count as a false positives. Our first approach was to evolve classifiers for identifying artifical signals, created by mathematical functions: sine, sawtooth and square waves. Our function set is basically composed by filtering operations on vectors and by arithmetic operations with constants and vectors. All the classifiers correctly identified true positive signals and did not identify true negative signals. We then moved to real audio recordings. For testing the classifiers, we picked different audio signals from the ones used during the training phase. For a first approach, the obtained results were very promising, but could be improved. We have made slight changes to our approach and the number of false positives reduced 33%, compared to the first approach. We then applied the evolved classifiers to polyphonic audio signals, and the results indicate that our approach is a good starting point for addressing the problem of Pitch Estimation.
Resumo:
This newsletter from the Soil Classifiers Advisory Council gives information about the Council including professional licenses for soil classifiers, office hours and state holidays, council members and staff and online services.
Resumo:
Universidade Estadual de Campinas . Faculdade de Educação Física
Resumo:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
Resumo:
This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A. under the receiver operating characteristics (ROC) curve. The A, result for the committee machine is compared with the A, results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier Tests are carried out using the student's t-distribution. The committee machine classifier outperforms the MLP SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A, values of the four methods are, in order 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.