50 results for Discriminative Itemsets
Abstract:
In this research, we introduce an approach to enhance the discriminative capability of features by employing image-to-image variation minimization. To minimize image-to-image variation, we estimate the cover image from the stego image by decompressing the stego image, transforming the decompressed image, and recompressing it. Since the embedding operation in image steganography is effectively a process of adding noise to the image, applying these three steps smooths out the noise, and hence an estimate of the cover image can be obtained.
Abstract:
Accurate and detailed measurement of an individual's physical activity is a key requirement for helping researchers understand the relationship between physical activity and health. Accelerometers have become the method of choice for measuring physical activity due to their small size, low cost, convenience and their ability to provide objective information about physical activity. However, interpreting accelerometer data once it has been collected can be challenging. In this work, we applied machine learning algorithms to the task of physical activity recognition from triaxial accelerometer data. We employed a simple but effective approach of dividing the accelerometer data into short non-overlapping windows, converting each window into a feature vector, and treating each feature vector as an i.i.d. training instance for a supervised learning algorithm. In addition, we improved on this simple approach with a multi-scale ensemble method that did not need to commit to a single window size and was able to leverage the fact that physical activities produce time series with repetitive patterns and that discriminative features for physical activity occur at different temporal scales.
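The windowing step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `window_features` and the choice of per-axis mean and standard deviation as features are assumptions made for the example.

```python
from statistics import mean, pstdev

def window_features(samples, window_size):
    """Split triaxial samples into non-overlapping windows and summarise each one.

    samples: list of (x, y, z) tuples; window_size: samples per window.
    Returns one feature vector per complete window:
    [mean_x, std_x, mean_y, std_y, mean_z, std_z] (features chosen for illustration).
    """
    features = []
    # Step by window_size so windows never overlap; a trailing partial window is dropped.
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        vector = []
        for axis in range(3):
            values = [s[axis] for s in window]
            vector.extend([mean(values), pstdev(values)])
        features.append(vector)
    return features

# Five samples with a window of two give two complete windows; the last sample is dropped.
signal = [(0.0, 1.0, 2.0), (2.0, 1.0, 0.0), (1.0, 1.0, 1.0), (3.0, 1.0, 5.0), (9.0, 9.0, 9.0)]
vectors = window_features(signal, window_size=2)
```

Each returned vector could then be passed, with its activity label, to any supervised learner. A multi-scale variant would repeat the call with several values of `window_size` and combine the resulting classifiers.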
Abstract:
Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
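The Stein divergence between two symmetric positive definite matrices X and Y is S(X, Y) = log det((X + Y) / 2) - (1/2) log det(XY). The sketch below computes it for 2x2 matrices and builds a similarity vector against a set of class representers. The mapping exp(-S) from divergence to similarity, and the function names, are assumptions for illustration; the abstract does not specify them.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def stein_divergence(x, y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y) for 2x2 SPD matrices."""
    mid = [[(x[i][j] + y[i][j]) / 2 for j in range(2)] for i in range(2)]
    # det(XY) = det(X) * det(Y), so the product matrix need not be formed.
    return math.log(det2(mid)) - 0.5 * (math.log(det2(x)) + math.log(det2(y)))

def similarity_vector(x, representers):
    """Hypothetical mapping: exp(-S) turns each divergence into a similarity in (0, 1]."""
    return [math.exp(-stein_divergence(x, r)) for r in representers]

identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[4.0, 0.0], [0.0, 4.0]]
sims = similarity_vector(identity, [identity, scaled])
```

The divergence is symmetric and is zero only when the two matrices are equal, so a matrix has similarity 1 to itself under this mapping.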
Abstract:
In the TREC Web Diversity track, novelty-biased cumulative gain (α-NDCG) is one of the official measures to assess retrieval performance of IR systems. The measure is characterised by a parameter, α, the effect of which has not been thoroughly investigated. We find that common settings of α, e.g. α=0.5, may prevent the measure from behaving as desired when evaluating result diversification. This is because it excessively penalises systems that cover many intents while it rewards those that redundantly cover only a few intents. This issue is crucial since it strongly influences systems at top ranks. We revisit our previously proposed threshold, suggesting that α be set on a per-query basis. The intuitiveness of the measure is then studied by examining actual rankings from TREC 09-10 Web track submissions. By varying α according to our query-based threshold, the discriminative power of α-NDCG is not harmed and, in fact, our approach improves α-NDCG's robustness. Experimental results show that the threshold for α can make the measure more intuitive than its common settings do.
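To make the role of α concrete, the sketch below computes the unnormalised α-DCG of a ranking: a document's gain for an intent is discounted by (1 - α) for every earlier document that already covered that intent. The normalisation by an ideal ranking and the query-based threshold proposed in the abstract are not reproduced here. The function name and the toy rankings are assumptions for illustration.

```python
import math

def alpha_dcg(ranking, alpha):
    """Unnormalised alpha-DCG.

    ranking: list of sets, each holding the intents one document is relevant to.
    alpha: redundancy penalty in [0, 1]; 0 means repeated intents are not penalised.
    """
    seen = {}  # intent -> number of earlier documents that covered it
    score = 0.0
    for rank, intents in enumerate(ranking, start=1):
        # Each repeat of an intent shrinks its gain by a factor (1 - alpha).
        gain = sum((1 - alpha) ** seen.get(i, 0) for i in intents)
        score += gain / math.log2(rank + 1)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return score

diverse = [{"a"}, {"b"}]    # second document covers a new intent
redundant = [{"a"}, {"a"}]  # second document repeats the first intent

diverse_score = alpha_dcg(diverse, alpha=0.5)
redundant_score = alpha_dcg(redundant, alpha=0.5)
```

With α = 0 the two toy rankings receive the same score, and the gap between them widens as α grows, which is why the choice of α affects how the measure separates systems.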
Abstract:
This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.
Abstract:
This thesis investigates face recognition in video under the presence of large pose variations. It proposes a solution that performs simultaneous detection of facial landmarks and head poses across large pose variations, employs discriminative modelling of feature distributions of faces with varying poses, and applies fusion of multiple classifiers to pose-mismatch recognition. Experiments on several benchmark datasets have demonstrated that improved performance is achieved using the proposed solution.
Abstract:
Head and neck cancers (HNCs) represent a significant and ever-growing burden to modern society, mainly due to the lack of early diagnostic methods. A significant number of HNCs are associated with drinking, smoking, chewing betel nut, and human papilloma virus (HPV) infections. We have analyzed DNA methylation patterns in tumor and normal tissue samples collected from head and neck squamous cell carcinoma (HNSCC) patients who were smokers. We have identified novel methylation sites in the promoter of the mediator complex subunit 15 (MED15/PCQAP) gene (encoding a co-factor important for regulation of transcription initiation for promoters of many genes), hypermethylated specifically in tumor cells. Two clusters of CpG dinucleotides methylated in tumors, but not in normal tissue from the same patients, were identified. These CpG methylation events were further validated in saliva samples from a separate cohort of HNSCC patients (who developed cancer due to smoking or HPV infections) and healthy controls using methylation-specific PCR (MSP). We used saliva as a biological medium because it is non-invasive, in close proximity to the tumors, easy to collect, and an economically viable option for large-scale screening studies. The methylation levels for the two identified CpG clusters were significantly different between the saliva samples collected from healthy controls and HNSCC individuals (Welch's t-test returning P < 0.05 and Mann-Whitney test P < 0.01 for both). The developed MSP assays also provided a good discriminative ability with AUC values of 0.70 (P < 0.01) and 0.63 (P < 0.05). The identified novel CpG methylation sites may serve as potential non-invasive biomarkers for detecting HNSCC. © the authors.
Abstract:
Background: MicroRNAs (miRNAs) are known to play an important role in cancer development by post-transcriptionally affecting the expression of critical genes. The aims of this study were two-fold: (i) to develop a robust method to isolate miRNAs from small volumes of saliva and (ii) to develop a panel of saliva-based diagnostic biomarkers for the detection of head and neck squamous cell carcinoma (HNSCC). Methods: Five differentially expressed miRNAs were selected from miScript™ miRNA microarray data generated using saliva from five HNSCC patients and five healthy controls. Their differential expression was subsequently confirmed by RT-qPCR using saliva samples from healthy controls (n = 56) and HNSCC patients (n = 56). These samples were divided into two different cohorts, i.e., a first confirmatory cohort (n = 21) and a second independent validation cohort (n = 35), to narrow down the miRNA diagnostic panel to three miRNAs: miR-9, miR-134 and miR-191. This diagnostic panel was independently validated using HNSCC miRNA expression data from The Cancer Genome Atlas (TCGA), encompassing 334 tumours and 39 adjacent normal tissues. Receiver operating characteristic (ROC) curve analysis was performed to assess the diagnostic capacity of the panel. Results: On average 60 ng/μL miRNA was isolated from 200 μL of saliva. Overall a good correlation was observed between the microarray data and the RT-qPCR data. We found that miR-9 (P < 0.0001), miR-134 (P < 0.0001) and miR-191 (P < 0.001) were differentially expressed between saliva from HNSCC patients and healthy controls, and that these miRNAs provided a good discriminative capacity with area under the curve (AUC) values of 0.85 (P < 0.0001), 0.74 (P < 0.001) and 0.98 (P < 0.0001), respectively. In addition, we found that the salivary miRNA data showed a good correlation with the TCGA miRNA data, thereby providing an independent validation.
Conclusions: We show that we have developed a reliable method to isolate miRNAs from small volumes of saliva, and that the saliva-derived miRNAs miR-9, miR-134 and miR-191 may serve as novel biomarkers to reliably detect HNSCC. © 2014 International Society for Cellular Oncology.
Abstract:
We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.
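The area under the ROC curve used as a measure of discriminative power can be computed directly from its probabilistic definition: the chance that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The sketch below ranks individual predictor variables this way. It is not the feature selection algorithm described in the abstract, which evaluates sets of variables; the variable names and data are invented for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: outcome labels and three candidate predictor variables.
labels = [0, 0, 1, 1]
candidates = {
    "var_a": [0.1, 0.4, 0.35, 0.8],  # one positive is outranked by a negative
    "var_b": [1, 2, 3, 4],           # separates the classes perfectly
    "var_c": [5, 5, 5, 5],           # carries no information
}

# Rank variables from most to least discriminative.
ranked = sorted(candidates, key=lambda name: auc(candidates[name], labels), reverse=True)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so `var_c` lands last and `var_b` first in `ranked`.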
Abstract:
Selection of features that will permit accurate pattern classification is a difficult task. However, if a particular data set is represented by discrete valued features, it becomes possible to determine empirically the contribution that each feature makes to the discrimination between classes. This paper extends the discrimination bound method so that both the maximum and average discrimination expected on unseen test data can be estimated. These estimation techniques are the basis of a backwards elimination algorithm that can be used to rank features in order of their discriminative power. Two problems are used to demonstrate this feature selection process: classification of the Mushroom Database, and a real-world, pregnancy-related medical risk prediction task: assessment of the risk of perinatal death.
Abstract:
Corner detection has shown its great importance in many computer vision tasks. However, in real-world applications, noise in the image strongly affects the performance of corner detectors. Few corner detectors have been designed to be robust to heavy noise to date, partly because the noise can be reduced by a denoising procedure. In this paper, we present a corner detector that can find discriminative corners in images contaminated by noise of different levels, without any denoising procedure. Candidate corners (i.e., features) are first detected by a modified SUSAN approach, and then false corners in noise are rejected based on their local characteristics. Features in flat regions are removed based on their intensity centroid, and features on edge structures are removed using the Harris response. The detector is self-adaptive to noise since the image signal-to-noise ratio (SNR) is automatically estimated to choose an appropriate threshold for refining features. Experimental results show that our detector has better performance at locating discriminative corners in images with strong noise than other widely used corner or keypoint detectors.
Abstract:
Combining local spatio-temporal features with a bag-of-visual-words model is a popular approach to human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting extracted features into a form appropriate for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while in css-LDA separate class-specific topics are learned instead of a common set of topics across the entire dataset. In our representation, topics are learned first and then each video is represented as a topic proportion vector, which is comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through experiments carried out on two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.
Abstract:
For traditional information filtering (IF) models, it is often assumed that the documents in one collection are only related to one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling was proposed to generate statistical models to represent multiple topics in a collection of documents, but in a topic model, topics are represented by distributions over words, which are limited in their ability to distinctively represent the semantics of topics. Patterns are always thought to be more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate the document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.
Abstract:
Background: Symptom burden in chronic kidney disease (CKD) is poorly understood. To date, the majority of research focuses on single symptoms and there is a lack of suitable multidimensional symptom measures. The purpose of this study was to modify, translate, cross-culturally adapt and psychometrically analyse the Dialysis Symptom Index (DSI). Methods: The study methods involved four phases: modification, translation, pilot-testing with a bilingual non-CKD sample and then psychometric testing with the target population. Content validity was assessed using an expert panel. Inter-rater agreement, test-retest reliability and Cronbach’s alpha coefficient were calculated to demonstrate reliability of the modified DSI. Discriminative and convergent validity were assessed to demonstrate construct validity. Results: Content validity index during translation was 0.98. In the pilot study with 25 bilingual students, moderate to perfect agreement (Kappa statistic = 0.60-1.00) was found between English and Arabic versions of the modified DSI. The main study recruited 433 patients with CKD stages 4 and 5. The modified DSI was able to discriminate between non-dialysis and dialysis groups (p < 0.001) and demonstrated convergent validity with domains of the Kidney Disease Quality of Life short form. Excellent test-retest and internal consistency (Cronbach’s α = 0.91) reliability were also demonstrated. Conclusion: The Arabic version of the modified DSI demonstrated good psychometric properties, measures the multidimensional nature of symptoms and can be used to assess symptom burden at different stages of CKD. The modified instrument, renamed the CKD Symptom Burden Index (CKD-SBI), should encourage greater clinical and research attention to symptom burden in CKD.
Abstract:
In this paper, we propose a highly reliable fault diagnosis scheme for incipient low-speed rolling element bearing failures. The scheme consists of fault feature calculation, discriminative fault feature analysis, and fault classification. The proposed approach first computes wavelet-based fault features, including the respective relative wavelet packet node energy and entropy, by applying a wavelet packet transform to an incoming acoustic emission signal. The most discriminative fault features are then filtered from the originally produced feature vector by using discriminative fault feature analysis based on a binary bat algorithm (BBA). Finally, the proposed approach employs one-against-all multiclass support vector machines to identify multiple low-speed rolling element bearing defects. This study compares the proposed BBA-based dimensionality reduction scheme with four other dimensionality reduction methodologies in terms of classification performance. Experimental results show that the proposed methodology is superior to other dimensionality reduction approaches, yielding an average classification accuracy of 94.9%, 95.8%, and 98.4% under bearing rotational speeds at 20 revolutions-per-minute (RPM), 80 RPM, and 140 RPM, respectively.
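The feature calculation stage named in this abstract, relative wavelet packet node energy and entropy, can be sketched with a Haar wavelet packet transform. The abstract does not state which wavelet or which entropy definition was used, so both are assumptions here: an orthonormal Haar filter, and the Shannon entropy of each node's normalised squared coefficients. All function names are hypothetical.

```python
import math

def haar_split(x):
    """One orthonormal Haar step: returns the (low-pass, high-pass) halves of x."""
    s = math.sqrt(2)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    low = [(a + b) / s for a, b in pairs]
    high = [(a - b) / s for a, b in pairs]
    return low, high

def wavelet_packet_nodes(signal, levels):
    """Full packet tree: every node is split at every level.

    Assumes len(signal) is divisible by 2 ** levels.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def node_features(nodes):
    """Relative energy and Shannon entropy per terminal node (assumes a non-zero signal)."""
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    relative_energy = [e / total for e in energies]
    entropy = []
    for node, e in zip(nodes, energies):
        if e == 0:
            entropy.append(0.0)  # an empty node carries no information
            continue
        probs = [c * c / e for c in node if c != 0]
        entropy.append(-sum(p * math.log(p) for p in probs))
    return relative_energy, entropy

# A constant signal puts all of its energy in the lowest-frequency node.
nodes = wavelet_packet_nodes([1.0] * 8, levels=2)
rel, ent = node_features(nodes)
```

Concatenating `rel` and `ent` gives a feature vector of the kind from which a selection method, such as the binary bat algorithm mentioned in the abstract, would pick the most discriminative entries.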