918 results for Supervised classifiers
Abstract:
We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label. By harnessing a recent theoretical result on the gradient of mutual information, the above optimization problem can be solved directly using gradient descent, without requiring simplification of the objective function. Theoretical analysis and empirical comparison are made between the proposed method and two closely related methods, and comparisons are also made with a method in which Rényi entropy is used to define the mutual information (in this case the gradient may be computed simply, under a special parameter setting). Relative to these alternative approaches, the proposed method achieves promising results on real datasets. Copyright 2012 by the author(s)/owner(s).
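The optimisation described above can be illustrated with a minimal sketch. The snippet below maximises a Gaussian class-conditional approximation of the mutual information I(W^T x; y) over an orthonormal projection W, using a crude finite-difference gradient; the paper instead works with a nonparametric MI estimate and its analytic gradient, so the estimator, step size and re-orthonormalisation here are illustrative assumptions, not the authors' method.

```python
import numpy as np

def gaussian_mi(W, X, y, reg=1e-6):
    """Estimate I(W^T x; y) under Gaussian class-conditional assumptions:
    I = H(Z) - sum_c p_c H(Z | y=c), with H = 0.5*log det(2*pi*e*Cov)."""
    y = np.asarray(y)
    Z = X @ W
    d = Z.shape[1]
    ent = lambda S: 0.5 * np.log(np.linalg.det(2 * np.pi * np.e * (S + reg * np.eye(d))))
    h_total = ent(np.cov(Z, rowvar=False))
    h_cond = sum(np.mean(y == c) * ent(np.cov(Z[y == c], rowvar=False))
                 for c in np.unique(y))
    return h_total - h_cond

def fit_projection(X, y, d=2, steps=200, lr=0.1, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.normal(size=(X.shape[1], d)))[0]   # orthonormal initialisation
    for _ in range(steps):
        base = gaussian_mi(W, X, y)
        grad = np.zeros_like(W)
        for i in range(W.shape[0]):                         # finite-difference gradient
            for j in range(W.shape[1]):
                Wp = W.copy(); Wp[i, j] += eps
                grad[i, j] = (gaussian_mi(Wp, X, y) - base) / eps
        W = np.linalg.qr(W + lr * grad)[0]                  # ascent step + re-orthonormalise
    return W
```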
Abstract:
During mitotic cell cycles, DNA is exposed to many types of endogenous and exogenous damaging agents that can cause double-strand breaks (DSBs). In S. cerevisiae, DSBs are primarily repaired by mitotic recombination, which can lead to loss-of-heterozygosity (LOH). Genetic recombination occurs in both meiosis and mitosis. While the genome-wide distribution of meiotic recombination events has been studied intensively, mitotic recombination events had not been mapped in an unbiased way throughout the genome until recently. Methods for selecting mitotic crossovers and mapping their positions have recently been developed in our lab. Our current approach uses a diploid yeast strain that is heterozygous for about 55,000 SNPs and employs SNP microarrays to map LOH events throughout the genome. These methods allow us to examine selected crossovers and unselected mitotic recombination events (crossover, noncrossover and BIR) at about 1 kb resolution across the genome. Using this method, we generated maps of spontaneous and UV-induced LOH events. In this study, we explore machine learning and variable selection techniques to build a predictive model for where LOH events occur in the genome.
We simulated control tracts drawn at random from the yeast genome, resembling the LOH tracts in terms of tract length and location with respect to single-nucleotide-polymorphism positions. We then extracted roughly 1,100 features, such as base composition, histone modifications and the presence of tandem repeats, and trained classifiers to distinguish control tracts from LOH tracts. We identified interesting features with good predictive value. We also found that, with the current repertoire of features, prediction is generally better for spontaneous LOH events than for UV-induced LOH events.
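As a hedged sketch of the final step, the function below trains a classifier on a feature matrix of tracts and ranks features by importance; the abstract does not name the classifiers or variable selection methods used, so the random forest and the function name are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rank_tract_features(X, y, n_top=20):
    """X: one row per tract with the ~1,100 features (base composition, histone
    modifications, tandem repeats, ...); y: 1 for real LOH tracts, 0 for simulated
    control tracts. Returns cross-validated accuracy and the most predictive features."""
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    clf.fit(X, y)
    top = np.argsort(clf.feature_importances_)[::-1][:n_top]
    return acc, top
```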
Abstract:
Automatic taxonomic categorisation of 23 species of dinoflagellates was demonstrated using field-collected specimens. These dinoflagellates have been responsible for the majority of toxic and noxious phytoplankton blooms that have occurred in the coastal waters of the European Union in recent years and have had a severe impact on the aquaculture industry. The performance of human 'expert' ecologists/taxonomists in identifying these species was compared with that achieved by 2 artificial neural network classifiers (multilayer perceptron and radial basis function networks) and 2 other statistical techniques, k-Nearest Neighbour and Quadratic Discriminant Analysis. The neural network classifiers outperformed the classical statistical techniques. Over extended trials, the human experts averaged 85%, while the radial basis function network achieved a best performance of 83%, the multilayer perceptron 66%, k-Nearest Neighbour 60%, and Quadratic Discriminant Analysis 56%.
Abstract:
Noise is one of the main factors degrading the quality of original multichannel remote sensing data, and its presence affects classification efficiency, object detection, etc. Thus, pre-filtering is often used to remove noise and improve performance in the final tasks of multichannel remote sensing. Recent studies indicate that the classical additive noise model is not adequate for images formed by modern multichannel sensors operating in the visible and infrared bands. However, this fact is often ignored by researchers designing noise removal methods and algorithms. Because of this, we focus on the classification of multichannel remote sensing images in the case of signal-dependent noise present in the component images. Three approaches to filtering multichannel images under the considered noise model are analysed, all based on the block-wise discrete cosine transform (DCT). The study is carried out not only in terms of conventional filtering efficiency metrics (MSE) but also in terms of multichannel data classification accuracy (probability of correct classification, confusion matrix). The proposed classification system combines a pre-processing stage, where a DCT-based filter processes the blocks of the multichannel remote sensing image, and a classification stage. Two modern classifiers are employed: a radial basis function neural network and support vector machines. Simulations are carried out for a three-channel image from the Landsat TM sensor. Different learning cases are considered: using noise-free samples of the test multichannel image, the noisy multichannel image, and the pre-filtered one. It is shown that using the pre-filtered image for training produces better classification than learning from the noisy image. It is demonstrated that the best results for both groups of quantitative criteria are obtained when the proposed 3D discrete cosine transform filter equipped with a variance-stabilizing transform is applied. The classification results obtained for data pre-filtered in different ways are in agreement for both considered classifiers. A comparison of classifier performance is carried out as well. The radial basis function neural network classifier is less sensitive to noise in the original images, but after pre-filtering the performance of both classifiers is approximately the same.
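A minimal sketch of the pre-filtering stage is given below: the signal-dependent noise is first stabilised with an Anscombe-type transform, the 3D DCT of each image block (spatial block times all channels) is hard-thresholded, and the result is transformed back. The block size, threshold and the simple algebraic inverse of the transform are assumptions for illustration; the actual filter in the paper may differ in all of these details.

```python
import numpy as np
from scipy.fft import dctn, idctn

def anscombe(x):       # variance-stabilising transform for Poisson-like noise (assumption)
    return 2.0 * np.sqrt(np.maximum(x, 0) + 3.0 / 8.0)

def inv_anscombe(z):   # simple algebraic inverse (biased, but adequate for a sketch)
    return (z / 2.0) ** 2 - 3.0 / 8.0

def dct3d_denoise(img, thr=2.7, block=8):
    """Hard-threshold the 3D DCT of non-overlapping block x block x channels cubes.
    After the VST the noise standard deviation is roughly 1, so thr ~ 2.7 sigma."""
    z = anscombe(img.astype(float))
    out = z.copy()
    H, W, _ = z.shape
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            coef = dctn(z[i:i + block, j:j + block, :], norm="ortho")
            coef[np.abs(coef) < thr] = 0.0                  # keep only strong coefficients
            out[i:i + block, j:j + block, :] = idctn(coef, norm="ortho")
    return inv_anscombe(out)

# denoised = dct3d_denoise(noisy_bands); the classifier is then trained on the
# pre-filtered image rather than the noisy one, as the abstract recommends.
```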
Abstract:
Unsupervised hyperspectral remote-sensing reflectance data (<15 km from the shore) were collected from a moving research vessel. Two different processing methods were compared. The results were similar to concurrent Aqua-MODIS and Suomi-NPP-VIIRS satellite data.
Abstract:
The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance for the management of fisheries, recovery plans and scenario exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitment of seven fish species of the North East Atlantic (anchovy, sardine, mackerel, horse mackerel, hake, blue whiting and albacore), using spawning, environmental and climatic data. In addition, the use of the probabilistic flexible naive Bayes classifier (FNBC) is proposed as a modelling approach in order to reduce uncertainty for fisheries management purposes. These improvements aim to provide better probability estimates for each possible outcome (low, medium and high recruitment) based on kernel density estimation, which is crucial for informed management decision making under high uncertainty. Finally, a comparison between goodness-of-fit and generalization power is provided in order to assess the reliability of the final forecasting models. It is found that in most cases the proposed methodology provides useful information for management, whereas the case of horse mackerel illustrates the limitations of the approach. The proposed improvements allow a better probabilistic estimation of the different scenarios, i.e. they reduce the uncertainty in the provided forecasts.
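One way to picture the flexible naive Bayes idea is the sketch below, in which each class-conditional feature density is a Gaussian kernel density estimate and class probabilities come from Bayes' rule under the naive independence assumption; the class name and implementation details are illustrative and not the authors' exact FNBC.

```python
import numpy as np
from scipy.stats import gaussian_kde

class FlexibleNaiveBayes:
    """Naive Bayes with per-class, per-feature Gaussian kernel density estimates."""
    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        self.kdes_ = {c: [gaussian_kde(X[y == c, j]) for j in range(X.shape[1])]
                      for c in self.classes_}
        return self
    def predict_proba(self, X):
        scores = np.column_stack([
            np.log(self.priors_[c]) +
            sum(np.log(self.kdes_[c][j](X[:, j]) + 1e-300) for j in range(X.shape[1]))
            for c in self.classes_])
        scores -= scores.max(axis=1, keepdims=True)          # numerical stability
        p = np.exp(scores)
        return p / p.sum(axis=1, keepdims=True)

# Usage: probabilities over the low / medium / high recruitment classes, e.g.
# FlexibleNaiveBayes().fit(X_train, y_train).predict_proba(X_new)
```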
Abstract:
Objectives: To determine whether diagnostic triage by general practitioners (GPs) or rheumatology nurses (RNs) can improve the positive predictive value of referrals to early arthritis clinics (EACs).
Methods: Four GPs and two RNs were trained in the assessment of early inflammatory arthritis (IA) by four visits to an EAC supervised by hospital rheumatologists. Patients referred to one of three EACs were recruited for study and assessed independently by a GP, an RN and one of six rheumatologists. Each assessor was asked to record their clinical findings and whether they considered the patient to have IA. Each was then asked to judge the appropriateness of the referral according to predetermined guidelines. The rheumatologists had been shown previously to have a satisfactory level of agreement in the assessment of IA.
Results: Ninety-six patients were approached and all consented to take part in the study. In 49 cases (51%), the rheumatologist judged that the patient had IA and that the referral was appropriate. The assessments of GPs and RNs were compared with those of the rheumatologists. Levels of agreement were measured using the kappa value, where 1.0 represents total unanimity. The kappa value was 0.77 for the GPs when compared with the rheumatologists and 0.79 for the RNs. Significant stiffness in the morning or after rest and objective joint swelling were the most important clinical features enabling the GPs and RNs to discriminate between IA and non-IA conditions.
Conclusion: Diagnostic triage by GPs or RNs improved the positive predictive value of referrals to an EAC with a degree of accuracy approaching that of a group of experienced rheumatologists.
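For readers unfamiliar with the agreement statistic quoted in the Results above, the sketch below shows how a kappa value is typically computed from two raters' binary judgements (IA vs non-IA); the numbers in the example are made up and are not the study data.

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    a, b = np.asarray(a), np.asarray(b)
    p_obs = np.mean(a == b)                                         # observed agreement
    p_exp = sum(np.mean(a == l) * np.mean(b == l)                   # agreement by chance
                for l in np.union1d(a, b))
    return (p_obs - p_exp) / (1 - p_exp)

gp    = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]   # hypothetical GP judgements (1 = IA)
rheum = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]   # hypothetical rheumatologist judgements
print(cohens_kappa(gp, rheum))
```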
Abstract:
The grading of crushed aggregate is usually carried out by sieving. We describe a new image-based approach to the automatic grading of such materials. The operational setting addressed is one in which the camera is located directly over a conveyor belt. Our approach characterizes the information content of each image, taking into account relative variation in the pixel data and resolution scale. In feature space, we find very good class separation using a multidimensional linear classifier. The innovations in this work include (i) introducing an effective image-based approach into this application area, and (ii) supervised classification using wavelet entropy-based features.
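A hedged sketch of the feature extraction is shown below: the relative energies of the 2D wavelet sub-bands and their Shannon entropy form the feature vector for each conveyor-belt image, which is then fed to a linear classifier. The wavelet, decomposition depth and the exact definition of 'wavelet entropy' are assumptions; the paper's formulation may differ.

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wavelet_entropy_features(image, wavelet="db2", levels=4):
    """Relative sub-band energies of a 2D wavelet decomposition plus their Shannon entropy."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    energies = [np.sum(coeffs[0] ** 2)]                    # approximation band
    for detail in coeffs[1:]:                              # (horizontal, vertical, diagonal)
        energies.extend(np.sum(d ** 2) for d in detail)
    p = np.array(energies) / np.sum(energies)
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return np.append(p, entropy)

# X = np.array([wavelet_entropy_features(img) for img in images]); y = grade_labels
# clf = LinearDiscriminantAnalysis().fit(X, y)             # multidimensional linear classifier
```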
Abstract:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbour (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to its ability to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of the K-NN rule based on a Tabu Search (TS) heuristic. The proposed TS heuristic in combination with the K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. The experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
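The sketch below illustrates the general shape of such a hybrid: a Tabu Search over discretised feature weights (weight 0 meaning the feature is deselected) with cross-validated K-NN accuracy as the objective. The weight levels, tabu tenure and neighbourhood are assumptions chosen for brevity, not the paper's exact design, and the search assumes more features than the tabu tenure.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_score(w, X, y, k=3):
    """Cross-validated accuracy of K-NN on weighted features (weight 0 = deselected)."""
    return cross_val_score(KNeighborsClassifier(n_neighbors=k), X * w, y, cv=5).mean()

def tabu_search(X, y, levels=(0.0, 0.25, 0.5, 0.75, 1.0), iters=50, tenure=7, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w = rng.choice(levels, size=n)                         # initial weight vector
    best_w, best_s = w.copy(), knn_score(w, X, y)
    tabu = []
    for _ in range(iters):
        # Neighbourhood: change the weight of a single non-tabu feature.
        moves = [(j, lv) for j in range(n) for lv in levels
                 if lv != w[j] and j not in tabu]
        j, lv = max(moves, key=lambda m: knn_score(
            np.where(np.arange(n) == m[0], m[1], w), X, y))
        w[j] = lv
        tabu = (tabu + [j])[-tenure:]                      # short-term memory
        s = knn_score(w, X, y)
        if s > best_s:
            best_w, best_s = w.copy(), s                   # track the best solution seen
    return best_w, best_s
```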
Abstract:
In this paper we follow on from our research into SLPI by assessing the immunomodulatory activity of elafin - an antiprotease related to SLPI and also present on the respiratory tract. We demonstrate for the first time that exogenously applied elafin inhibits lipopolysaccharide-induced activation of the NF-kappaB and AP-1 pathways in monocytes. I designed this project and supervised Marcus Butler during his MD thesis.
Abstract:
This paper investigates the learning of a wide class of single-hidden-layer feedforward neural networks (SLFNs) with two sets of adjustable parameters, i.e., the nonlinear parameters in the hidden nodes and the linear output weights. The main objective is both to speed up the convergence of second-order learning algorithms such as Levenberg-Marquardt (LM) and to improve the network performance. This is achieved here by reducing the dimension of the solution space and by introducing a new Jacobian matrix. Unlike conventional supervised learning methods, which optimize these two sets of parameters simultaneously, the linear output weights are first converted into dependent parameters, thereby removing the need for their explicit computation. Consequently, the neural network (NN) learning is performed over a solution space of reduced dimension. A new Jacobian matrix is then proposed for use with the popular second-order learning methods in order to achieve a more accurate approximation of the cost function. The efficacy of the proposed method is shown through an analysis of the computational complexity and by presenting simulation results from four different examples.
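A small sketch of the 'dependent output weights' idea follows: for any setting of the nonlinear hidden-node parameters, the linear output weights are obtained in closed form by least squares, so a Levenberg-Marquardt solver only searches the reduced space of nonlinear parameters. The sigmoid activation, the numerical Jacobian (the paper derives an explicit reduced Jacobian) and the requirement that there are more samples than nonlinear parameters are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import least_squares

def hidden_output(theta, X, n_hidden):
    """Hidden-layer output matrix of a sigmoid SLFN; theta packs input weights and biases."""
    d = X.shape[1]
    W = theta[: d * n_hidden].reshape(d, n_hidden)
    b = theta[d * n_hidden:]
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def residuals(theta, X, y, n_hidden):
    """Output weights are dependent parameters: solved by linear least squares for the
    current theta, so the optimiser never sees them explicitly."""
    H = hidden_output(theta, X, n_hidden)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return H @ beta - y

def train_slfn(X, y, n_hidden=10, seed=0):
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(scale=0.5, size=X.shape[1] * n_hidden + n_hidden)
    sol = least_squares(residuals, theta0, method="lm", args=(X, y, n_hidden))
    H = hidden_output(sol.x, X, n_hidden)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return sol.x, beta
```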
Abstract:
The support vector machine (SVM) is a powerful technique for data classification. Despite its good theoretical foundations and high classification accuracy, a standard SVM is not suitable for the classification of large data sets, because the training complexity of the SVM depends heavily on the size of the data set. This paper presents a novel SVM classification approach for large data sets based on minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the cluster centers are used for a first SVM classification. We then use the clusters whose centers are support vectors, together with the clusters containing points from different classes, to perform a second SVM classification. At this stage most of the data are removed. Several experimental results show that the proposed approach achieves classification accuracy comparable to classic SVM while training is significantly faster than several other SVM classifiers.
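The two-stage procedure can be sketched as follows. For simplicity the sketch clusters each class separately with ordinary k-means (standing in for the minimum enclosing ball clustering of the paper), so the mixed-class cluster rule from the abstract does not arise; the function names and these simplifications are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def two_stage_svm(X, y, n_clusters=200, seed=0):
    # Stage 1: cluster each class and train an SVM on the cluster centres only.
    centres, centre_labels, members = [], [], []
    for c in np.unique(y):
        Xc = X[y == c]
        km = KMeans(n_clusters=min(n_clusters, len(Xc)), n_init=10,
                    random_state=seed).fit(Xc)
        centres.append(km.cluster_centers_)
        centre_labels.append(np.full(km.n_clusters, c))
        members.extend(Xc[km.labels_ == k] for k in range(km.n_clusters))
    centres = np.vstack(centres)
    centre_labels = np.concatenate(centre_labels)
    svm1 = SVC(kernel="rbf").fit(centres, centre_labels)

    # Stage 2: keep only the points in clusters whose centres became support vectors
    # (the remaining data are discarded) and retrain on those points.
    keep_X = np.vstack([members[i] for i in svm1.support_])
    keep_y = np.concatenate([np.full(len(members[i]), centre_labels[i])
                             for i in svm1.support_])
    return SVC(kernel="rbf").fit(keep_X, keep_y)
```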
Abstract:
A study was performed to determine whether targeted metabolic profiling of cattle sera could be used to establish a predictive tool for identifying hormone misuse in cattle. Metabolites were assayed in heifers (n = 5) treated with nortestosterone decanoate (0.85 mg/kg body weight), untreated heifers (n = 5), steers (n = 5) treated with oestradiol benzoate (0.15 mg/kg body weight) and untreated steers (n = 5). Treatments were administered on days 0, 14, and 28 throughout a 42 day study period. Two support vector machines (SVMs) were trained, respectively, from heifer and steer data to identify hormone-treated animals. The performance of both SVM classifiers was evaluated by the sensitivity and specificity of treatment prediction. The SVM trained on steer data achieved 97.33% sensitivity and 93.85% specificity, while the one trained on heifer data achieved 94.67% sensitivity and 87.69% specificity. The solutions of the SVM classifiers were further exploited to determine the days on which classification accuracy was most reliable. For heifers and steers, days 17-35 were determined to be the most selective. In summary, bioinformatics applied to targeted metabolic profiles generated from standard clinical chemistry analyses has yielded an accurate, inexpensive, high-throughput test for predicting steroid abuse in cattle.
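The evaluation criterion used above is easy to reproduce; the sketch below cross-validates an SVM on a metabolite matrix and reports sensitivity and specificity, with the kernel, fold count and data layout being assumptions rather than the study's protocol.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def sensitivity_specificity(X, y):
    """X: one row of targeted metabolite measurements per serum sample;
    y: 1 for hormone-treated animals, 0 for untreated controls."""
    y = np.asarray(y)
    pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=5)
    tp = np.sum((pred == 1) & (y == 1)); fn = np.sum((pred == 0) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0)); fp = np.sum((pred == 1) & (y == 0))
    return tp / (tp + fn), tn / (tn + fp)    # sensitivity, specificity
```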
Abstract:
Logistic regression and Gaussian mixture model (GMM) classifiers have been trained to estimate the probability of acute myocardial infarction (AMI) in patients based upon the concentrations of a panel of cardiac markers. The panel consists of two new markers, fatty acid binding protein (FABP) and glycogen phosphorylase BB (GPBB), in addition to the traditional cardiac troponin I (cTnI), creatine kinase MB (CKMB) and myoglobin. The effect of using principal component analysis (PCA) and Fisher discriminant analysis (FDA) to preprocess the marker concentrations was also investigated. The need for classifiers to give an accurate estimate of the probability of AMI is argued, and three categories of performance measure are described, namely discriminatory ability, sharpness, and reliability. Numerical performance measures for each category are given and applied. The optimum classifier, based solely upon the samples taken on admission, was the logistic regression classifier using FDA preprocessing. This gave an accuracy of 0.85 (95% confidence interval: 0.78-0.91) and a normalised Brier score of 0.89. When samples from both admission and a further time, 1-6 h later, were included, the performance increased significantly, showing that logistic regression classifiers can indeed use the information from the five cardiac markers to accurately and reliably estimate the probability of AMI. © Springer-Verlag London Limited 2008.
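A compact sketch of the best-performing configuration (FDA preprocessing followed by logistic regression, scored with the Brier score) is given below; the pipeline components and variable names are assumptions that merely mirror the description in the abstract, not the authors' code.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.pipeline import make_pipeline

def ami_probability_model(X_train, y_train, X_test, y_test):
    """Columns of X: admission concentrations of FABP, GPBB, cTnI, CKMB and myoglobin
    (assumed layout); y: 1 for AMI, 0 otherwise."""
    model = make_pipeline(LinearDiscriminantAnalysis(), LogisticRegression())
    model.fit(X_train, y_train)
    p_ami = model.predict_proba(X_test)[:, 1]      # estimated probability of AMI
    return p_ami, brier_score_loss(y_test, p_ami)
```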
Abstract:
The identification and classification of network traffic and protocols is a vital step in many quality of service and security systems. Traffic classification strategies must evolve, alongside the protocols utilising the Internet, to overcome the use of ephemeral or masquerading port numbers and transport-layer encryption. This research expands the concept of using machine learning on the initial statistics of a flow of packets to determine its underlying protocol. Recognising the need for efficient training/retraining of a classifier and the requirement for fast classification, the authors investigate a new application of k-means clustering referred to as 'two-way' classification. The 'two-way' classification uniquely analyses a bidirectional flow as two unidirectional flows and is shown, through experiments on real network traffic, to improve classification accuracy by as much as 18% when measured against similar proposals. It achieves this accuracy while generating fewer clusters, that is, fewer comparisons are needed to classify a flow. 'Two-way' classification offers a new way to improve the accuracy and efficiency of machine-learning statistical classifiers while still maintaining the fast training times associated with k-means.
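The 'two-way' idea can be sketched as follows: each bidirectional flow contributes two unidirectional feature vectors, k-means clusters them, each cluster is labelled with the majority protocol of its members, and a new flow is classified from the predictions of its two directions. The flow representation, the number of clusters and the way the two directional predictions are combined are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_two_way(flows, k=50, seed=0):
    """Assumed input: each flow is a dict with 'fwd' and 'rev' statistic vectors
    (e.g. packet-size and inter-arrival statistics) and an integer 'proto' label."""
    X = np.array([f[d] for f in flows for d in ("fwd", "rev")])
    y = np.array([f["proto"] for f in flows for _ in ("fwd", "rev")])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    labels = {c: np.bincount(y[km.labels_ == c]).argmax()   # majority protocol per cluster
              for c in range(k)}
    return km, labels

def classify_flow(km, labels, flow):
    """Classify each direction separately and return both predictions; how the two
    directional predictions are fused is left open (the paper's rule may differ)."""
    return tuple(labels[km.predict(np.asarray(flow[d]).reshape(1, -1))[0]]
                 for d in ("fwd", "rev"))
```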