208 results for Classification criterion
Abstract:
Objective: To develop and evaluate machine learning techniques that identify limb fractures and other abnormalities (e.g. dislocations) from radiology reports. Materials and Methods: 99 free-text reports of limb radiology examinations were acquired from an Australian public hospital. Two clinicians were employed to identify fractures and abnormalities from the reports; a third, senior clinician resolved disagreements. These assessors found that 48 of the 99 reports referred to fractures or abnormalities of limb structures. Automated methods were then used to extract features from these reports that could be useful for their automatic classification. The Naive Bayes classification algorithm and two implementations of the support vector machine algorithm were formally evaluated using cross-fold validation over the 99 reports. Results: The Naive Bayes classifier accurately identifies fractures and other abnormalities from the radiology reports. These results were achieved when extracting stemmed token bigram and negation features, and when using these features in combination with SNOMED CT concepts related to abnormalities and disorders. The latter feature has not been used in previous work on classifying free-text radiology reports. Discussion: Automated classification methods proved effective at identifying fractures and other abnormalities from radiology reports (F-measure up to 92.31%). Key to the success of these techniques are features such as stemmed token bigrams, negations, and SNOMED CT concepts associated with morphologic abnormalities and disorders. Conclusion: This investigation shows promising early results; future work will further validate and strengthen the proposed approaches.
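The feature pipeline described above (stemmed token bigrams plus negation features fed to Naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix-stripping `stem` function is a crude stand-in for a real stemmer, the negation cues and three-token negation scope are hypothetical, SNOMED CT concept features and cross-fold validation are omitted, and the four training reports are invented.

```python
import math
import re
from collections import Counter

# Hypothetical crude suffix stripper standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "es", "s")
# Hypothetical negation cues; the study's negation rules are not described.
NEGATION_CUES = {"no", "not", "without"}


def stem(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def features(report, window=3):
    """Stemmed token bigrams plus NEG_ markers for tokens following a negation cue."""
    tokens = [stem(t) for t in re.findall(r"[a-z]+", report.lower())]
    feats = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            feats.extend(f"NEG_{t}" for t in tokens[i + 1 : i + 1 + window])
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, reports, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for report, label in zip(reports, labels):
            self.counts[label].update(features(report))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, report):
        feats = [f for f in features(report) if f in self.vocab]

        def log_score(c):
            denom = sum(self.counts[c].values()) + len(self.vocab)
            return self.log_priors[c] + sum(math.log((self.counts[c][f] + 1) / denom) for f in feats)

        return max(self.classes, key=log_score)


def f_measure(y_true, y_pred, positive="abnormal"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy reports, not data from the study.
TRAIN = [
    ("Transverse fracture of the distal radius.", "abnormal"),
    ("Dislocation of the shoulder joint.", "abnormal"),
    ("No fracture or dislocation seen.", "normal"),
    ("Normal alignment. No fracture identified.", "normal"),
]
model = NaiveBayes().fit([r for r, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("Fracture of the distal radius."))
```

The negation markers matter here: in "No fracture seen." the token "fracture" also yields `NEG_fracture`, which only the normal reports contain, so the negated mention pulls the prediction towards "normal".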
Abstract:
Spatially explicit modelling of grassland classes is important for site-specific planning aimed at improving grassland and environmental management over large areas. In this study, a climate-based grassland classification model, the Comprehensive and Sequential Classification System (CSCS), was integrated with spatially interpolated climate data to classify grassland in Gansu province, China. The study area has complex topography, with plateaus, high mountains, basins and deserts. To improve the quality of the interpolated climate data, and hence of the spatial classification over this complex terrain, three linear regression methods for interpolating climate variables were evaluated: an analytic method based on multiple regression and residues (AMMRR); a modification of AMMRR that adds the effect of slope and aspect to the interpolation analysis (M-AMMRR); and a method that replaces the inverse distance weighting (IDW) approach for residue interpolation in M-AMMRR with ordinary kriging (I-AMMRR). The interpolated climate variables were annual cumulative temperature and annual total precipitation, and the outputs of the best interpolation method were then used in the CSCS model to classify the grassland in the study area. The results indicated that the AMMRR and M-AMMRR methods generated acceptable climate surfaces, but the best model fit and cross-validation results were achieved by the I-AMMRR method. Twenty-six grassland classes were identified in the study area. The four grassland vegetation classes that together covered more than half of the study area were "cool temperate-arid temperate zonal semi-desert", "cool temperate-humid forest steppe and deciduous broad-leaved forest", "temperate-extra-arid temperate zonal desert", and "frigid per-humid rain tundra and alpine meadow".
The vegetation classification map generated in this study provides spatial information on the locations and extents of the different grassland classes. This information can support decision-making by government agencies in land-use planning, environmental management, and vegetation and biodiversity conservation. It can also help land managers estimate safe carrying capacities, which in turn can help prevent overgrazing and land degradation.
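The regression-plus-residue idea behind AMMRR can be sketched as follows: fit a trend from a terrain predictor, interpolate the station residues spatially with IDW (the residue step the abstract attributes to M-AMMRR), then add the two. This is a simplified illustration, not the study's method: it assumes a single predictor (elevation) instead of multiple regression, omits slope, aspect and kriging, uses an IDW power of 2 as an assumption, and the station values are hypothetical.

```python
import math


def fit_trend(elevations, values):
    """Least-squares fit of value = a + b * elevation (one predictor, for brevity)."""
    n = len(elevations)
    mean_e, mean_v = sum(elevations) / n, sum(values) / n
    sxx = sum((e - mean_e) ** 2 for e in elevations)
    sxy = sum((e - mean_e) * (v - mean_v) for e, v in zip(elevations, values))
    b = sxy / sxx
    return mean_v - b * mean_e, b


def idw(points, values, target, power=2.0):
    """Inverse distance weighting; returns the station value if target hits a station."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(px - target[0], py - target[1])
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def interpolate(stations, target):
    """stations: (x, y, elevation, value) tuples; target: (x, y, elevation)."""
    elevations = [s[2] for s in stations]
    values = [s[3] for s in stations]
    a, b = fit_trend(elevations, values)
    residues = [v - (a + b * e) for e, v in zip(elevations, values)]
    trend = a + b * target[2]
    return trend + idw([(s[0], s[1]) for s in stations], residues, target[:2])


# Hypothetical stations: (x km, y km, elevation m, value of a climate variable).
STATIONS = [(0, 0, 500, 17.5), (10, 0, 1500, 11.2), (0, 10, 2500, 5.3), (10, 10, 3000, 2.4)]
print(round(interpolate(STATIONS, (5, 5, 1800)), 2))
```

Because the residues are added back, the surface reproduces each station's value exactly at the station location, while the trend term carries the elevation effect into unsampled terrain.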
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, which are given context through downstream assembly and annotation, a process that requires the reads to be consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise, and their success depends on careful feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data, with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large-scale genomics.
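To make read-level classification concrete, a minimal sketch follows. The abstract does not state which features were used; this assumes k-mer frequency profiles (a common representation of DNA reads) with k = 3, and substitutes a deliberately simple nearest-centroid classifier for the SVMs and random forests discussed. The training reads are invented.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(read, k=3):
    """Relative frequencies of all 4**k DNA k-mers in a read (windows with N etc. are ignored)."""
    counts = Counter(read[i : i + k] for i in range(len(read) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(reads_by_species, k=3):
    """One mean k-mer profile (centroid) per species."""
    centroids = {}
    for species, reads in reads_by_species.items():
        profiles = [kmer_profile(r, k) for r in reads]
        centroids[species] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def classify(read, centroids, k=3):
    profile = kmer_profile(read, k)
    return max(centroids, key=lambda sp: cosine(profile, centroids[sp]))


# Invented toy reads standing in for labelled reference data.
CENTROIDS = train({
    "at_rich": ["ATATTAATATAT", "TTAATTATAA"],
    "gc_rich": ["GCGCCGGCGC", "CCGGCGCGGC"],
})
print(classify("ATTATATA", CENTROIDS))
```

Each read is classified independently of the others, so this kind of step parallelises trivially, for example by mapping `classify` over chunks of reads with `concurrent.futures.ProcessPoolExecutor`.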
Abstract:
In this paper, we propose a new multi-class steganalysis method for binary images. The proposed method can identify the type of steganographic technique used by examining the given binary image. It can also distinguish an image with a hidden message from one without. To do so, we extract a set of features from the binary image, combining a feature extraction method extended from our previous work with new methods proposed in this paper. Based on the extracted feature sets, we construct the multi-class steganalysis using an SVM classifier. We also present empirical results demonstrating that the proposed method can effectively identify five different types of steganography.
Abstract:
Background: Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, because information resources are dispersed and large amounts of unstructured information must be processed manually, accurate point-of-care diagnosis is often difficult. Aims: This research reports an initial experimental evaluation of a clinician-informed automated method that addresses initial misdiagnoses associated with the delayed receipt of unstructured radiology reports. Method: A method was developed that resembles clinical reasoning for identifying limb abnormalities. It consists of a gazetteer of keywords related to radiological findings and classifies an X-ray report as abnormal if the report contains evidence listed in the gazetteer. A set of 99 narrative reports of radiological findings was sourced from a tertiary hospital. Reports were manually assessed by two clinicians, and discrepancies were adjudicated by a third, expert ED clinician; the final manual classification generated by the expert ED clinician was used as ground truth to empirically evaluate the approach. Results: The automated method, which attempts to identify limb abnormalities by searching for keywords expressed by clinicians, achieved an F-measure of 0.80 and an accuracy of 0.80. Conclusion: While the automated clinician-driven method achieved promising performance, a number of avenues for improvement using advanced natural language processing (NLP) and machine learning techniques were identified.
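The gazetteer rule described above (a report is abnormal if it contains a gazetteer term) and its evaluation by F-measure and accuracy can be sketched as follows. The keyword list, reports and reference labels are hypothetical; the clinicians' actual gazetteer is not given in the abstract.

```python
import re

# Hypothetical gazetteer; the clinicians' actual keyword list is not given.
GAZETTEER = {"fracture", "fractures", "dislocation", "subluxation", "avulsion"}


def classify_report(report, gazetteer=GAZETTEER):
    """Label a report abnormal if any gazetteer term appears as a token."""
    tokens = set(re.findall(r"[a-z]+", report.lower()))
    return "abnormal" if tokens & gazetteer else "normal"


def evaluate(y_true, y_pred, positive="abnormal"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"f_measure": f, "accuracy": accuracy}


# Invented example reports with hypothetical reference labels.
REPORTS = [
    "Displaced fracture of the ulna.",
    "Normal study.",
    "No fracture seen.",
    "Anterior dislocation of the shoulder.",
]
TRUTH = ["abnormal", "normal", "normal", "abnormal"]
print(evaluate(TRUTH, [classify_report(r) for r in REPORTS]))
```

In this toy run, "No fracture seen." is flagged as abnormal because plain keyword matching ignores negation, the kind of gap that the NLP techniques mentioned in the conclusion could address.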
Abstract:
Background: Cancer monitoring and prevention rely critically on the timely notification of cancer cases. However, abstracting and classifying cancer from the free text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. Aims: This paper investigates approaches for automatically detecting notifiable cancer as the cause of death in free-text death certificates supplied to Cancer Registries. Method: A number of machine learning classifiers were studied. Features were extracted using natural language processing techniques and the Medtex toolkit, and included stemmed words, bigrams, and concepts from the SNOMED CT medical terminology. The baseline was a keyword spotter using keywords extracted from the long descriptions of cancer-related ICD-10 codes. Results: Death certificates listing notifiable cancer as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance, with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounted for the first 18 of the top 40 evaluated runs and was the most robust classifier, with a variance of 0.001141, half the variance of the other classifiers. Conclusion: Feature selection had the most significant influence on classifier performance, although the type of classifier employed also affected performance. In contrast, the feature weighting scheme had a negligible effect on performance.
Specifically, stemmed tokens, with or without SNOMED CT concepts, were found to be the most effective features when combined with an SVM classifier.
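The keyword-spotter baseline can be sketched by turning the content words of ICD-10 code descriptions into keywords. This is an assumed reconstruction, not the paper's baseline: it uses only four WHO ICD-10 category titles (C18, C34, C50, C61), a tiny stopword list, and invented certificate texts. It also computes the false negative rate reported in the abstract.

```python
import re

# A few WHO ICD-10 category titles used as example "long descriptions".
ICD10_DESCRIPTIONS = {
    "C18": "Malignant neoplasm of colon",
    "C34": "Malignant neoplasm of bronchus and lung",
    "C50": "Malignant neoplasm of breast",
    "C61": "Malignant neoplasm of prostate",
}
STOPWORDS = {"of", "and"}


def build_keywords(descriptions):
    """Content words from the code descriptions become spotter keywords."""
    keywords = set()
    for text in descriptions.values():
        keywords.update(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)
    return keywords


def spot(certificate, keywords):
    """Flag a certificate as cancer-related if any keyword appears in it."""
    return bool(set(re.findall(r"[a-z]+", certificate.lower())) & keywords)


def false_negative_rate(y_true, y_pred):
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    return fn / (fn + tp) if fn + tp else 0.0


KEYWORDS = build_keywords(ICD10_DESCRIPTIONS)
# Invented certificate texts with hypothetical reference labels.
CERTIFICATES = [
    ("Metastatic malignant neoplasm of breast", True),
    ("Acute myocardial infarction", False),
    ("Metastatic adenocarcinoma", True),
    ("Pneumonia of left lung", False),
]
predictions = [spot(text, KEYWORDS) for text, _ in CERTIFICATES]
print(predictions, false_negative_rate([y for _, y in CERTIFICATES], predictions))
```

The toy run shows how a purely lexical baseline can both miss cancers phrased differently ("adenocarcinoma") and over-trigger on organ names ("lung").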
Abstract:
The authors must be congratulated for their original and important study. The flooding of urbanised areas constitutes a hazard to the population and infrastructure. Flows through inundated urban environments have been studied only recently, and few studies have considered the potential impact of flowing waters on pedestrians...
Abstract:
OBJECTIVES: To compare the classification accuracy of previously published RT3 accelerometer cut-points for youth, using energy expenditure measured via portable indirect calorimetry as the criterion measure. DESIGN: Cross-sectional cross-validation study. METHODS: 100 children (mean age 11.2±2.8 years, 61% male) completed 12 standardized activity trials (3 sedentary, 5 lifestyle and 4 ambulatory) while wearing an RT3 accelerometer. V̇O2 was measured concurrently using the Oxycon Mobile portable calorimeter. Cut-points by Vanhelst (VH), Rowlands (RW), Chu (CH), Kavouras (KV) and the RT3 manufacturer (RT3M) were used to classify physical activity (PA) intensity as sedentary (SED), light (LPA), moderate (MPA) or vigorous (VPA). Classification accuracy was evaluated using the area under the Receiver Operating Characteristic curve (ROC-AUC) and weighted Kappa (κ). RESULTS: For moderate-to-vigorous PA (MVPA), VH, KV and RW exhibited excellent classification accuracy (ROC-AUC≥0.90), while CH and RT3M exhibited good classification accuracy (ROC-AUC>0.80). Classification accuracy for LPA was fair to poor (ROC-AUC<0.76). For SED, VH exhibited excellent classification accuracy (ROC-AUC>0.90), while RW, CH, and RT3M exhibited good classification accuracy (ROC-AUC>0.80). Kappa statistics ranged from 0.67 (VH) to 0.55 (CH). CONCLUSIONS: All cut-points provided acceptable classification accuracy for SED and MVPA, but limited accuracy for LPA. On the basis of classification accuracy over all four levels of intensity, the use of the VH cut-points is recommended.
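The comparison above has two steps: map device counts to an intensity level via cut points, then measure agreement with the criterion levels using weighted kappa. A minimal sketch follows, assuming linear disagreement weights (the abstract does not say which weighting was used; quadratic weights are a common alternative). The cut-point values and data are hypothetical, not the published RT3 cut-points.

```python
from itertools import product

LEVELS = ["SED", "LPA", "MPA", "VPA"]


def classify(counts, cut_points):
    """Map a count to an intensity index (0=SED ... 3=VPA).
    cut_points: ascending lower bounds for LPA, MPA and VPA."""
    level = 0
    for i, threshold in enumerate(cut_points, start=1):
        if counts >= threshold:
            level = i
    return level


def weighted_kappa(a, b, k=len(LEVELS)):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    n = len(a)
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    rows = [sum(observed[i]) for i in range(k)]
    cols = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    obs_dis = exp_dis = 0.0
    for i, j in product(range(k), repeat=2):
        w = abs(i - j) / (k - 1)
        obs_dis += w * observed[i][j] / n
        exp_dis += w * rows[i] * cols[j] / (n * n)
    return 1.0 - obs_dis / exp_dis


# Hypothetical cut points and data, not the published RT3 values.
CUT_POINTS = [100, 1500, 4000]
criterion = [0, 0, 1, 2, 3, 1]  # intensity levels from energy expenditure
device_counts = [40, 120, 900, 2000, 5200, 1600]
predicted = [classify(c, CUT_POINTS) for c in device_counts]
print(predicted, round(weighted_kappa(criterion, predicted), 3))
```

Weighting makes a SED-versus-VPA disagreement count three times as much as a SED-versus-LPA one, which suits ordinal intensity categories. The formula is undefined when both raters use a single identical category throughout, since the expected disagreement is then zero.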
Abstract:
Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices, and then interpreting such matrices as points on Riemannian manifolds, can lead to increased classification performance. Manifold geometry is typically taken into account by (1) embedding the manifolds in tangent spaces, or (2) embedding them into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, the manifold shape is only approximated, which can cause a loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms while preserving the true manifold shape well. Specifically, we propose to project SPD matrices using a set of random projection hyperplanes over RKHS into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification.
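The idea of representing an SPD matrix by its projection coefficients onto random hyperplanes in an RKHS can be sketched with the kernel trick. This is an illustrative assumption, not the paper's construction: it uses a Stein-divergence-based kernel on 2x2 SPD matrices (so determinants have a closed form), builds each hyperplane normal as a random Gaussian combination of mapped anchor matrices, and omits any centering or normalisation. `sigma=1.0` and the anchor matrices are illustrative choices.

```python
import math
import random


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def stein_divergence(x, y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(XY) for 2x2 SPD matrices."""
    mean = [[(x[i][j] + y[i][j]) / 2 for j in range(2)] for i in range(2)]
    return math.log(det2(mean)) - 0.5 * (math.log(det2(x)) + math.log(det2(y)))


def kernel(x, y, sigma=1.0):
    return math.exp(-sigma * stein_divergence(x, y))


def random_hyperplanes(n_anchors, n_planes, seed=0):
    """Each hyperplane normal is a random combination of the mapped anchor matrices."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_anchors)] for _ in range(n_planes)]


def project(x, anchors, hyperplanes, sigma=1.0):
    """RKHS inner products <w_p, phi(X)> = sum_j beta_pj * k(Z_j, X)."""
    k = [kernel(z, x, sigma) for z in anchors]
    return [sum(beta * kj for beta, kj in zip(betas, k)) for betas in hyperplanes]


# Hypothetical 2x2 SPD anchor matrices, e.g. covariance descriptors of training images.
ANCHORS = [((2.0, 0.5), (0.5, 1.0)), ((1.0, 0.0), (0.0, 3.0)), ((1.5, -0.2), (-0.2, 0.8))]
PLANES = random_hyperplanes(len(ANCHORS), n_planes=4)
x = ((1.2, 0.1), (0.1, 2.0))
print(project(x, ANCHORS, PLANES))  # a plain 4-D Euclidean vector
```

The output is an ordinary list of floats, so any unmodified Euclidean learner can consume it, which is the property the abstract emphasises.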
Abstract:
This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method for identifying the presence of ANAs, owing to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings: it is subjective as well as time- and labour-intensive. Computer Aided Diagnostic (CAD) systems, which automatically classify a HEp-2 cell image into one of its known patterns (e.g. speckled, homogeneous), have been developed to address these problems. Most existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which comprises regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations for generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.
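Regional histograms of visual words can be sketched as a descriptor that concatenates a whole-image histogram with per-region histograms. The quadrant layout below is a generic spatial-pyramid arrangement chosen as an assumption; CPM's actual region definition and its Multiple Kernel Learning weights are not described in the abstract. The visual-word map (the result of assigning local patches to a codebook) is hypothetical.

```python
from collections import Counter


def histogram(words, vocab_size):
    """Normalised histogram of visual-word indices."""
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in range(vocab_size)]


def pyramid_descriptor(word_map, vocab_size):
    """Concatenate a whole-image histogram with one histogram per image quadrant.
    word_map: 2-D list holding the visual-word index assigned to each local patch."""
    h, w = len(word_map), len(word_map[0])
    descriptor = histogram([v for row in word_map for v in row], vocab_size)
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            region = [word_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            descriptor += histogram(region, vocab_size)
    return descriptor


def intersection_kernel(h1, h2):
    """Histogram intersection, one possible base kernel for multiple kernel learning."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 4x4 map of visual-word indices (vocabulary of 3 words).
WORD_MAP = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
]
desc = pyramid_descriptor(WORD_MAP, vocab_size=3)
print(len(desc), intersection_kernel(desc, desc))
```

In an MKL setting, one kernel per pyramid level or region would typically be combined with learned weights; here a single intersection kernel over the full descriptor stands in for that step.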
Abstract:
Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may cause the two closest clusters to represent different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing its clusters to resemble the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on the Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).
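A Frobenius-norm subspace distance based on reconstruction error can be sketched with a standard identity. For orthonormal bases $U, V$ of equal dimension $k$, reconstructing each basis vector of $V$ from $\operatorname{span}(U)$ gives a total squared error $k - \|U^\top V\|_F^2$, which equals $\tfrac{1}{2}\|UU^\top - VV^\top\|_F^2$. The paper's exact definition is not given in the abstract, so this is an assumed illustration; bases are assumed already orthonormal, and the sparse-representation steps are omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruction_error(x, basis):
    """Squared norm of x minus its projection onto span(basis); basis must be orthonormal."""
    coeffs = [dot(x, b) for b in basis]
    recon = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(x))]
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def subspace_distance(u_basis, v_basis):
    """Square root of the total error of reconstructing V's basis vectors from span(U).
    For orthonormal bases of equal dimension k this equals
    sqrt(k - ||U^T V||_F^2) = ||U U^T - V V^T||_F / sqrt(2)."""
    return math.sqrt(sum(reconstruction_error(v, u_basis) for v in v_basis))


s = 1 / math.sqrt(2)
print(subspace_distance([(1.0, 0.0)], [(s, s)]))  # lines 45 degrees apart in R^2
```

The distance is 0 for identical subspaces and reaches its maximum, $\sqrt{k}$, when the subspaces are orthogonal.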
Abstract:
The absence of comparative validity studies has prevented researchers from reaching consensus regarding the application of intensity-related accelerometer cut points for children and adolescents. PURPOSE: This study aimed to evaluate the classification accuracy of five sets of independently developed ActiGraph cut points using energy expenditure, measured by indirect calorimetry, as a criterion reference standard. METHODS: A total of 206 participants between the ages of 5 and 15 yr completed 12 standardized activity trials. Trials consisted of sedentary activities (lying down, writing, computer game), lifestyle activities (sweeping, laundry, throw and catch, aerobics, basketball), and ambulatory activities (comfortable walk, brisk walk, brisk treadmill walk, running). During each trial, participants wore an ActiGraph GT1M, and VO2 was measured breath-by-breath using the Oxycon Mobile portable metabolic system. Physical activity intensity was estimated using five independently developed cut points: Freedson/Trost (FT), Puyau (PU), Treuth (TR), Mattocks (MT), and Evenson (EV). Classification accuracy was evaluated via weighted κ statistics and area under the receiver operating characteristic curve (ROC-AUC). RESULTS: Across all four intensity levels, the EV (κ = 0.68) and FT (κ = 0.66) cut points exhibited significantly better agreement than TR (κ = 0.62), MT (κ = 0.54), and PU (κ = 0.36). The EV and FT cut points exhibited significantly better classification accuracy for moderate-to-vigorous-intensity physical activity (ROC-AUC = 0.90) than the TR, PU, or MT cut points (ROC-AUC = 0.77-0.85). Only the EV cut points provided acceptable classification accuracy for all four levels of physical activity intensity and performed well among children of all ages. The widely applied sedentary cut point of 100 counts per minute exhibited excellent classification accuracy (ROC-AUC = 0.90).
CONCLUSIONS: On the basis of these findings, we recommend that researchers use the EV ActiGraph cut points to estimate time spent in sedentary, light-, moderate-, and vigorous-intensity activity in children and adolescents. Copyright © 2011 by the American College of Sports Medicine.
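ROC-AUC can be computed with the Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half. The sketch applies it to the binary decision of a single cut point against criterion MVPA labels. This is one way to score a fixed cut point, assumed here for illustration, not necessarily the study's exact procedure. The counts, labels and cut point are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate: P(positive scores higher than negative), ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def cut_point_auc(counts, is_mvpa, cut_point):
    """AUC of the binary decision 'counts >= cut_point' against criterion MVPA labels."""
    decisions = [1 if c >= cut_point else 0 for c in counts]
    pos = [d for d, y in zip(decisions, is_mvpa) if y]
    neg = [d for d, y in zip(decisions, is_mvpa) if not y]
    return roc_auc(pos, neg)


# Hypothetical counts per minute and criterion labels (MVPA by energy expenditure).
COUNTS = [3000, 2600, 1900, 2200, 1500, 900, 400, 2500]
IS_MVPA = [True, True, True, False, False, False, False, True]
print(cut_point_auc(COUNTS, IS_MVPA, cut_point=2000))
```

For a binary decision, this AUC reduces algebraically to (sensitivity + specificity) / 2. In the toy data, both sensitivity and specificity are 0.75, so the AUC is 0.75.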
Abstract:
The unique physical and movement characteristics of children necessitate the development of population-specific accelerometer equations and cut points. The purpose of this study is to develop an ecologically valid cut point for the Biotrainer Pro monitor that reflects a threshold for moderate-intensity physical activity in elementary school children. A sample of 30 children (ages 8-12) wore a Biotrainer monitor while completing a series of 7 movement tasks (calibration phase) and while participating in an organized group activity (cross-validation phase). Videotapes from each session were processed using a computerized direct-observation technique to provide a criterion measure of physical activity. Analyses involved mixed-model regression and receiver operating characteristic (ROC) curves. The results indicated that a cut point of 4 counts/min provides the optimal balance between the related needs for sensitivity (accurately detecting activity) and specificity (limiting misclassification of inactivity as activity). Results with the cross-validation data demonstrated that this value yielded the best overall kappa (.58) and a high classification agreement (84%) for activity determination. The specificity of 93% demonstrates that the proposed cut point rarely misclassifies inactivity as activity; however, the lower sensitivity of 61% suggests that some minutes of activity might be incorrectly classified as inactivity. The cut point of 4 counts/min provides an ecologically valid threshold for capturing physical activity in children using the Biotrainer Pro activity monitor.
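Choosing a cut point that balances sensitivity and specificity can be sketched as a search over candidate thresholds. The abstract does not name its balancing criterion, so this sketch assumes maximising Youden's index J = sensitivity + specificity - 1. Activity is the positive class, so specificity is the share of inactive minutes correctly left as inactive. The minute-level counts and observations are hypothetical, and both classes must be present to avoid division by zero.

```python
def sensitivity_specificity(counts, active, cut_point):
    """Treat counts >= cut_point as 'active'; criterion labels come from direct observation."""
    tp = sum(c >= cut_point and a for c, a in zip(counts, active))
    fn = sum(c < cut_point and a for c, a in zip(counts, active))
    tn = sum(c < cut_point and not a for c, a in zip(counts, active))
    fp = sum(c >= cut_point and not a for c, a in zip(counts, active))
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_point(counts, active):
    """Candidate with the largest Youden index J = sensitivity + specificity - 1
    (ties go to the lowest candidate)."""
    def youden(cp):
        sens, spec = sensitivity_specificity(counts, active, cp)
        return sens + spec - 1
    return max(sorted(set(counts)), key=youden)


# Hypothetical minute-level counts and observed activity (True = active).
COUNTS = [0, 1, 2, 5, 3, 4, 6, 8]
ACTIVE = [False, False, False, False, True, True, True, True]
print(best_cut_point(COUNTS, ACTIVE))
```

Raising the threshold trades sensitivity for specificity, which mirrors the 93% specificity versus 61% sensitivity pattern reported above. Other criteria, such as a minimum-sensitivity constraint, would pick different points on the same curve.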
Abstract:
Objective: The present study aimed to develop accelerometer cut points to classify physical activities (PA) by intensity in preschoolers and to investigate discrepancies in PA levels when applying various accelerometer cut points. Methods: To calibrate the accelerometer, 18 preschoolers (5.8 ± 0.4 years) performed eleven structured activities and one free play session while wearing a GT1M ActiGraph accelerometer using 15 s epochs. The structured activities were chosen based on the direct observation system Children's Activity Rating Scale (CARS), while the criterion measure of PA intensity during free play was provided by a second-by-second observation protocol (modified CARS). Receiver Operating Characteristic (ROC) curve analyses were used to determine the accelerometer cut points. To examine classification differences, accelerometer data from four consecutive days for 114 preschoolers (5.5 ± 0.3 years) were classified by intensity according to previously published and the newly developed accelerometer cut points. Differences in predicted PA levels were evaluated using repeated measures ANOVA and the chi-square test. Results: Cut points were identified at 373 counts/15 s for light (sensitivity: 86%; specificity: 91%; area under ROC curve: 0.95), 585 counts/15 s for moderate (87%; 82%; 0.91) and 881 counts/15 s for vigorous PA (88%; 91%; 0.94). Further, applying various accelerometer cut points to the same data resulted in statistically and biologically significant differences in PA. Conclusions: Accelerometer cut points with good discriminatory power for differentiating between PA levels in preschoolers were developed, and the choice of accelerometer cut points can result in large discrepancies in estimated PA levels.
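Applying the reported cut points (373, 585 and 881 counts per 15 s epoch) to epoch data can be sketched as follows. The sketch assumes that an epoch reaching a cut point belongs to that intensity level (counts >= threshold), since the abstract does not state the boundary convention. The epoch counts and the alternative cut-point set are invented to show how different cut points shift the estimated PA levels.

```python
from collections import Counter

# Cut points reported in the abstract, in counts per 15 s epoch (lower bounds).
PRESCHOOL_CUT_POINTS = {"light": 373, "moderate": 585, "vigorous": 881}


def classify_epoch(counts, cut_points=PRESCHOOL_CUT_POINTS):
    """Return the highest intensity whose lower bound the epoch reaches."""
    label = "sedentary"
    for name, threshold in sorted(cut_points.items(), key=lambda kv: kv[1]):
        if counts >= threshold:
            label = name
    return label


def minutes_by_intensity(epoch_counts, cut_points=PRESCHOOL_CUT_POINTS):
    """Minutes per intensity level, given one count value per 15 s epoch."""
    tally = Counter(classify_epoch(c, cut_points) for c in epoch_counts)
    return {label: n * 15 / 60 for label, n in tally.items()}


# Hypothetical 15 s epochs; an invented alternative cut-point set shows how results shift.
EPOCHS = [0, 0, 400, 700, 900, 900, 900, 900]
ALTERNATIVE = {"light": 300, "moderate": 800, "vigorous": 1200}
print(minutes_by_intensity(EPOCHS))
print(minutes_by_intensity(EPOCHS, ALTERNATIVE))
```

With the same eight epochs, the reported cut points yield one minute of vigorous activity, while the invented alternative yields none, reclassifying those epochs as moderate. This is a small-scale version of the discrepancy the study reports.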