337 resultados para statistical classification


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This chapter addresses data modelling as a means of promoting statistical literacy in the early grades. Consideration is first given to the importance of increasing young children’s exposure to statistical reasoning experiences and how data modelling can be a rich means of doing so. Selected components of data modelling are then reviewed, followed by a report on some findings from the third-year of a three-year longitudinal study across grades one through three.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

At NDSS 2012, Yan et al. analyzed the security of several challenge-response type user authentication protocols against passive observers, and proposed a generic counting based statistical attack to recover the secret of some counting based protocols given a number of observed authentication sessions. Roughly speaking, the attack is based on the fact that secret (pass) objects appear in challenges with a different probability from non-secret (decoy) objects when the responses are taken into account. Although they mentioned that a protocol susceptible to this attack should minimize this difference, they did not give details as to how this can be achieved barring a few suggestions. In this paper, we attempt to fill this gap by generalizing the attack with a much more comprehensive theoretical analysis. Our treatment is more quantitative which enables us to describe a method to theoretically estimate a lower bound on the number of sessions a protocol can be safely used against the attack. Our results include 1) two proposed fixes to make counting protocols practically safe against the attack at the cost of usability, 2) the observation that the attack can be used on non-counting based protocols too as long as challenge generation is contrived, 3) and two main design principles for user authentication protocols which can be considered as extensions of the principles from Yan et al. This detailed theoretical treatment can be used as a guideline during the design of counting based protocols to determine their susceptibility to this attack. The Foxtail protocol, one of the protocols analyzed by Yan et al., is used as a representative to illustrate our theoretical and experimental results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a new multi-class steganalysis for binary image. The proposed method can identify the type of steganographic technique used by examining on the given binary image. In addition, our proposed method is also capable of differentiating an image with hidden message from the one without hidden message. In order to do that, we will extract some features from the binary image. The feature extraction method used is a combination of the method extended from our previous work and some new methods proposed in this paper. Based on the extracted feature sets, we construct our multi-class steganalysis from the SVM classifier. We also present the empirical works to demonstrate that the proposed method can effectively identify five different types of steganography.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and a vast amount of manual processing of unstructured information, accurate point-of-care diagnosis is often difficult. Aims The aim of this research is to report initial experimental evaluation of a clinician-informed automated method for the issue of initial misdiagnoses associated with delayed receipt of unstructured radiology reports. Method A method was developed that resembles clinical reasoning for identifying limb abnormalities. The method consists of a gazetteer of keywords related to radiological findings; the method classifies an X-ray report as abnormal if it contains evidence contained in the gazetteer. A set of 99 narrative reports of radiological findings was sourced from a tertiary hospital. Reports were manually assessed by two clinicians and discrepancies were validated by a third expert ED clinician; the final manual classification generated by the expert ED clinician was used as ground truth to empirically evaluate the approach. Results The automated method that attempts to individuate limb abnormalities by searching for keywords expressed by clinicians achieved an F-measure of 0.80 and an accuracy of 0.80. Conclusion While the automated clinician-driven method achieved promising performances, a number of avenues for improvement were identified using advanced natural language processing (NLP) and machine learning techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Cancer monitoring and prevention relies on the critical aspect of timely notification of cancer cases. However, the abstraction and classification of cancer from the free-text of pathology reports and other relevant documents, such as death certificates, exist as complex and time-consuming activities. Aims In this paper, approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries are investigated. Method A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit. The numerous features encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long description of ICD-10 cancer related codes. Results Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved best performance with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs, and entails the most robust classifier with a variance of 0.001141, half the variance of the other classifiers. Conclusion The selection of features significantly produced the most influences on the performance of the classifiers, although the type of classifier employed also affects performance. In contrast, the feature weighting schema created a negligible effect on performance. Specifically, it is found that stemmed tokens with or without SNOMED CT concepts create the most effective feature when combined with an SVM classifier.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, manifold shape is only approximated which can cause loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms, with the true manifold shape well-preserved. Specifically, we propose to project SPD matrices using a set of random projection hyperplanes over RKHS into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems have been developed to address these problems, which automatically classify a HEp-2 cell image into one of its known patterns (eg. speckled, homogeneous). Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which is comprised of regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters to represent different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to have resemblance to the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose This Study evaluated the predictive validity of three previously published ActiGraph energy expenditure (EE) prediction equations developed for children and adolescents. Methods A total of 45 healthy children and adolescents (mean age: 13.7 +/- 2.6 yr) completed four 5-min activity trials (normal walking. brisk walking, easy running, and fast running) in ail indoor exercise facility. During each trial, participants were all ActiGraph accelerometer oil the right hip. EE was monitored breath by breath using the Cosmed K4b(2) portable indirect calorimetry system. Differences and associations between measured and predicted EE were assessed using dependent t-tests and Pearson correlations, respectively. Classification accuracy was assessed using percent agreement, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. Results None of the equations accurately predicted mean energy expenditure during each of the four activity trials. Each equation, however, accurately predicted mean EE in at least one activity trial. The Puyau equation accurately predicted EE during slow walking. The Trost equation accurately predicted EE during slow running. The Freedson equation accurately predicted EE during fast running. None of the three equations accurately predicted EE during brisk walking. The equations exhibited fair to excellent classification accuracy with respect to activity intensity. with the Trost equation exhibiting the highest classification accuracy and the Puyau equation exhibiting the lowest. Conclusions These data suggest that the three accelerometer prediction equations do not accurately predict EE on a minute-by-minute basis in children and adolescents during overground walking and running. The equations maybe, however, for estimating participation in moderate and vigorous activity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article develops methods for spatially predicting daily change of dissolved oxygen (Dochange) at both sampled locations (134 freshwater sites in 2002 and 2003) and other locations of interest throughout a river network in South East Queensland, Australia. In order to deal with the relative sparseness of the monitoring locations in comparison to the number of locations where one might want to make predictions, we make a classification of the river and stream locations. We then implement optimal spatial prediction (ordinary and constrained kriging) from geostatistics. Because of their directed-tree structure, rivers and streams offer special challenges. A complete approach to spatial prediction on a river network is given, with special attention paid to environmental exceedances. The methodology is used to produce a map of Dochange predictions for 2003. Dochange is one of the variables measured as part of the Ecosystem Health Monitoring Program conducted within the Moreton Bay Waterways and Catchments Partnership.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The cotton strip assay (CSA) is an established technique for measuring soil microbial activity. The technique involves burying cotton strips and measuring their tensile strength after a certain time. This gives a measure of the rotting rate, R, of the cotton strips. R is then a measure of soil microbial activity. This paper examines properties of the technique and indicates how the assay can be optimised. Humidity conditioning of the cotton strips before measuring their tensile strength reduced the within and between day variance and enabled the distribution of the tensile strength measurements to approximate normality. The test data came from a three-way factorial experiment (two soils, two temperatures, three moisture levels). The cotton strips were buried in the soil for intervals of time ranging up to 6 weeks. This enabled the rate of loss of cotton tensile strength with time to be studied under a range of conditions. An inverse cubic model accounted for greater than 90% of the total variation within each treatment combination. This offers support for summarising the decomposition process by a single parameter R. The approximate variance of the decomposition rate was estimated from a function incorporating the variance of tensile strength and the differential of the function for the rate of decomposition, R, with respect to tensile strength. This variance function has a minimum when the measured strength is approximately 2/3 that of the original strength. The estimates of R are almost unbiased and relatively robust against the cotton strips being left in the soil for more or less than the optimal time. We conclude that the rotting rate X should be measured using the inverse cubic equation, and that the cotton strips should be left in the soil until their strength has been reduced to about 2/3.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.