847 resultados para Classification Methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a new multi-class steganalysis for binary image. The proposed method can identify the type of steganographic technique used by examining on the given binary image. In addition, our proposed method is also capable of differentiating an image with hidden message from the one without hidden message. In order to do that, we will extract some features from the binary image. The feature extraction method used is a combination of the method extended from our previous work and some new methods proposed in this paper. Based on the extracted feature sets, we construct our multi-class steganalysis from the SVM classifier. We also present the empirical works to demonstrate that the proposed method can effectively identify five different types of steganography.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Cancer monitoring and prevention relies on the critical aspect of timely notification of cancer cases. However, the abstraction and classification of cancer from the free-text of pathology reports and other relevant documents, such as death certificates, exist as complex and time-consuming activities. Aims In this paper, approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries are investigated. Method A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit. The numerous features encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long description of ICD-10 cancer related codes. Results Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved best performance with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs, and entails the most robust classifier with a variance of 0.001141, half the variance of the other classifiers. Conclusion The selection of features significantly produced the most influences on the performance of the classifiers, although the type of classifier employed also affects performance. In contrast, the feature weighting schema created a negligible effect on performance. Specifically, it is found that stemmed tokens with or without SNOMED CT concepts create the most effective feature when combined with an SVM classifier.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, manifold shape is only approximated which can cause loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms, with the true manifold shape well-preserved. Specifically, we propose to project SPD matrices using a set of random projection hyperplanes over RKHS into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST..

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background There is a need for better understanding of the dispersion of classification-related variable to develop an evidence-based classification of athletes with a disability participating in stationary throwing events. Objectives The purposes of this study are (A) to describe tools designed to comprehend and represent the dispersion of the performance between successive classes, and (B) to present this dispersion for the elite male and female stationary shot-putters who participated in Beijing 2008 Paralympic Games. Study design Retrospective study Methods This study analysed a total of 479 attempts performed by 114 male and female stationary shot-putters in three F30s (F32-F34) and six F50s (F52-F58) classes during the course of eight events during Beijing 2008 Paralympic Games. Results The average differences of best performance were 1.46±0.46 m for males between F54 and F58 classes as well as 1.06±1.18 m for females between F55 and F58 classes. The results demonstrated a linear relationship between best performance and classification while revealing two male Gold Medallists in F33 and F52 classes were outliers. Conclusions This study confirms the benefits of the comparative matrices, performance continuum and dispersion plots to comprehend classification-related variables. The work presented here represents a stepping stone into biomechanical analyses of stationary throwers, particularly on the eve of the London 2012 Paralympic Games where new evidences could be gathered.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Semantic perception and object labeling are key requirements for robots interacting with objects on a higher level. Symbolic annotation of objects allows the usage of planning algorithms for object interaction, for instance in a typical fetchand-carry scenario. In current research, perception is usually based on 3D scene reconstruction and geometric model matching, where trained features are matched with a 3D sample point cloud. In this work we propose a semantic perception method which is based on spatio-semantic features. These features are defined in a natural, symbolic way, such as geometry and spatial relation. In contrast to point-based model matching methods, a spatial ontology is used where objects are rather described how they "look like", similar to how a human would described unknown objects to another person. A fuzzy based reasoning approach matches perceivable features with a spatial ontology of the objects. The approach provides a method which is able to deal with senor noise and occlusions. Another advantage is that no training phase is needed in order to learn object features. The use-case of the proposed method is the detection of soil sample containers in an outdoor environment which have to be collected by a mobile robot. The approach is verified using real world experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis introduces a new way of using prior information in a spatial model and develops scalable algorithms for fitting this model to large imaging datasets. These methods are employed for image-guided radiation therapy and satellite based classification of land use and water quality. This study has utilized a pre-computation step to achieve a hundredfold improvement in the elapsed runtime for model fitting. This makes it much more feasible to apply these models to real-world problems, and enables full Bayesian inference for images with a million or more pixels.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bitboards allow the efficient encoding of games for computer play and the application of fast bitwiseparallel algorithms for common game-related operations. This article describes: (1) a selection of bitboard techniques including an introduction to bitboards and bitwise operations; (2) a classification scheme that distinguishes filter, query and update methods, and; (3) a sampling of bitboard algorithms for a range of games other than chess, with notes on their performance and practical application.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To classify each stage for a progressing disease such as Alzheimer’s disease is a key issue for the disease prevention and treatment. In this study, we derived structural brain networks from diffusion-weighted MRI using whole-brain tractography since there is growing interest in relating connectivity measures to clinical, cognitive, and genetic data. Relatively little work has usedmachine learning to make inferences about variations in brain networks in the progression of the Alzheimer’s disease. Here we developed a framework to utilize generalized low rank approximations of matrices (GLRAM) and modified linear discrimination analysis for unsupervised feature learning and classification of connectivity matrices. We apply the methods to brain networks derived from DWI scans of 41 people with Alzheimer’s disease, 73 people with EMCI, 38 people with LMCI, 47 elderly healthy controls and 221 young healthy controls. Our results show that this new framework can significantly improve classification accuracy when combining multiple datasets; this suggests the value of using data beyond the classification task at hand to model variations in brain connectivity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Human expert analyses are commonly used in bioacoustic studies and can potentially limit the reproducibility of these results. In this paper, a machine learning method is presented to statistically classify avian vocalizations. Automated approaches were applied to isolate bird songs from long field recordings, assess song similarities, and classify songs into distinct variants. Because no positive controls were available to assess the true classification of variants, multiple replicates of automatic classification of song variants were analyzed to investigate clustering uncertainty. The automatic classifications were more similar to the expert classifications than expected by chance. Application of these methods demonstrated the presence of discrete song variants in an island population of the New Zealand hihi (Notiomystis cincta). The geographic patterns of song variation were then revealed by integrating over classification replicates. Because this automated approach considers variation in song variant classification, it reduces potential human bias and facilitates the reproducibility of the results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Social media platforms, that foster user generated content, have altered the ways consumers search for product related information. Conducting online searches, reading product reviews, and comparing products ratings, is becoming a more common information seeking pathway. This research demonstrates that info-active consumers are becoming less reliant on information provided by retailers or manufacturers, hence marketing generated online content may have a reduced impact on their purchasing behaviour. The results of this study indicate that beyond traditional methods of segmenting consumers, in the online context, new classifications such as info-active and info-passive would be beneficial in digital marketing. This cross-sectional, mixed-methods study is based on 43 in-depth interviews and an online survey with 500 consumers from 30 countries.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM), produced much better classification and prediction results in comparison with those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples, and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and the linear discriminant analysis (LDA) methods of data analysis; essentially, qualitative results suggested that discrimination of the CM samples from three different provinces was possible with the combined matrix producing best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM) were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Within online learning communities, receiving timely and meaningful insights into the quality of learning activities is an important part of an effective educational experience. Commonly adopted methods – such as the Community of Inquiry framework – rely on manual coding of online discussion transcripts, which is a costly and time consuming process. There are several efforts underway to enable the automated classification of online discussion messages using supervised machine learning, which would enable the real-time analysis of interactions occurring within online learning communities. This paper investigates the importance of incorporating features that utilise the structure of on-line discussions for the classification of "cognitive presence" – the central dimension of the Community of Inquiry framework focusing on the quality of students' critical thinking within online learning communities. We implemented a Conditional Random Field classification solution, which incorporates structural features that may be useful in increasing classification performance over other implementations. Our approach leads to an improvement in classification accuracy of 5.8% over current existing techniques when tested on the same dataset, with a precision and recall of 0.630 and 0.504 respectively.