50 results for Discriminative Itemsets

in Queensland University of Technology - ePrints Archive


Relevance:

100.00%

Abstract:

This paper presents a single-pass algorithm for mining discriminative itemsets in data streams using a novel data structure and the tilted-time window model. Discriminative itemsets are defined as itemsets that are frequent in one data stream and whose frequency in that stream is much higher than in the rest of the streams in the dataset. To limit the size of the data structure, we propose a pruning process that results in a compact tree structure containing the discriminative itemsets. Empirical analysis shows the sound time and space complexity of the proposed method.
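
The definition above can be illustrated with a brute-force sketch. This is not the paper's single-pass tree algorithm; it only checks the definition on two small in-memory "streams". The thresholds `min_support` and `min_ratio`, the function names, and the sample data are assumptions made for illustration, and both streams are assumed to be non-empty.

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items across the transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def discriminative_itemsets(target, others, min_support=0.5, min_ratio=2.0, max_size=2):
    """Itemsets frequent in `target` and at least min_ratio times more frequent than in `others`.

    min_support and min_ratio are hypothetical thresholds, not values from the paper.
    """
    target_counts = itemset_counts(target, max_size)
    other_counts = itemset_counts(others, max_size)
    result = {}
    for itemset, count in target_counts.items():
        freq_target = count / len(target)
        if freq_target < min_support:
            continue  # not frequent in the target stream
        freq_other = other_counts[itemset] / len(others)
        # An itemset absent from the other stream is treated as maximally discriminative.
        ratio = float("inf") if freq_other == 0 else freq_target / freq_other
        if ratio >= min_ratio:
            result[itemset] = (freq_target, freq_other)
    return result


# Made-up data: two tiny streams of transactions.
stream_a = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["b", "c"]]
stream_b = [["a", "c"], ["b", "c"], ["c"], ["a", "b", "c"]]
found = discriminative_itemsets(stream_a, stream_b)
```

With this sample data, `found` holds `("b",)` and `("a", "b")`, the two itemsets whose relative frequency in `stream_a` is at least twice that in `stream_b`.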

Relevance:

20.00%

Abstract:

This work proposes to improve spoken term detection (STD) accuracy by optimising the Figure of Merit (FOM). In this article, the index takes the form of a phonetic posterior-feature matrix. Accuracy is improved by formulating STD as a discriminative training problem and directly optimising the FOM, through its use as an objective function to train a transformation of the index. The outcome of indexing is then a matrix of enhanced posterior-features that are directly tailored for the STD task. The technique is shown to improve the FOM by up to 13% on held-out data. Additional analysis explores the effect of the technique on phone recognition accuracy, examines the actual values of the learned transform, and demonstrates that using an extended training data set results in further improvement in the FOM.

Relevance:

20.00%

Abstract:

Eigen-based techniques and other monolithic approaches to face recognition have long been a cornerstone in the face recognition community due to the high dimensionality of face images. Eigen-face techniques provide minimal reconstruction error and limit high-frequency content while linear discriminant-based techniques (fisher-faces) allow the construction of subspaces which preserve discriminatory information. This paper presents a frequency decomposition approach for improved face recognition performance utilising three well-known techniques: Wavelets; Gabor / Log-Gabor; and the Discrete Cosine Transform. Experimentation illustrates that frequency domain partitioning prior to dimensionality reduction increases the information available for classification and greatly increases face recognition performance for both eigen-face and fisher-face approaches.

Relevance:

20.00%

Abstract:

Objective. To investigate the reliability and validity of five squat-based loading tests that are clinically appropriate for jumper's knee. The loading tests were step up, double leg squat, double leg squat on a 25-degree decline (decline squat), single leg decline squat, and decline hop. Design. Cross-sectional controlled cohort. Subjects without knee pain formed the control group; those with extensor tendon pain formed the jumper's knee group. Setting. Institutional athlete study group in Australia. Participants. Fifty-six elite adolescent basketball players participated in this study; thirteen comprised the jumper's knee group and fifteen athletes formed a control group. Intervention. Each subject performed each loading test for baseline and reliability data on the first testing day. Subjects then performed three days of intensive (6 h daily) basketball training, after which each loading test was reexamined. Main outcome measures. Eleven-point interval scale for pain. Results. The tests that best detected a change in pain due to intensive workload were the single leg decline squat and single leg decline hop. This study found that decline tests have better discriminative ability than the standard squat to detect change in jumper's knee pain due to intensive training. The typical error for these tests ranged from 0.3 to 0.5; however, caution should be exercised in the interpretation of these reliability figures due to relatively low scores. Conclusions. The single leg decline squat is recommended in the physical assessment of adolescent jumper's knee. The decline squat was selected as the best clinical test over the decline hop because it was easier to standardise performance.

Relevance:

20.00%

Abstract:

This paper proposes a highly reliable fault diagnosis approach for low-speed bearings. The proposed approach first extracts wavelet-based fault features that represent diverse symptoms of multiple low-speed bearing defects. The most useful fault features for diagnosis are then selected by utilizing a genetic algorithm (GA)-based kernel discriminative feature analysis cooperating with one-against-all multicategory support vector machines (OAA MCSVMs). Finally, each support vector machine is individually trained with its own feature vector that includes the most discriminative fault features, offering the highest classification performance. In this study, the effectiveness of the proposed GA-based kernel discriminative feature analysis and the classification ability of individually trained OAA MCSVMs are addressed in terms of average classification accuracy. In addition, the proposed GA-based kernel discriminative feature analysis is compared with four other state-of-the-art feature analysis approaches. Experimental results indicate that the proposed approach is superior to other feature analysis methodologies, yielding an average classification accuracy of 98.06% and 94.49% under rotational speeds of 50 revolutions-per-minute (RPM) and 80 RPM, respectively. Furthermore, the individually trained MCSVMs with their own optimal fault features based on the proposed GA-based kernel discriminative feature analysis outperform the standard OAA MCSVMs, showing an average accuracy of 98.66% and 95.01% for bearings under rotational speeds of 50 RPM and 80 RPM, respectively.

Relevance:

10.00%

Abstract:

Most work on association rule mining has focused primarily on the efficiency of the approach, with less emphasis on the quality of the derived rules. Often for a dataset, a huge number of rules can be derived, but many of them are redundant with respect to other rules and thus useless in practice. The extremely large number of rules makes it difficult for end users to comprehend and therefore effectively use the discovered rules, and thus significantly reduces the effectiveness of rule mining algorithms. If the extracted knowledge can't be effectively used in solving real-world problems, the effort of extracting the knowledge is worth little. This is a serious problem that has not yet been solved satisfactorily. In this paper, we propose a concise representation called the Reliable Approximate basis for representing non-redundant approximate association rules. We prove that redundancy elimination based on the proposed basis does not reduce the belief in the extracted rules. We also prove that all approximate association rules can be deduced from the Reliable Approximate basis. Therefore the basis is a lossless representation of approximate association rules.

Relevance:

10.00%

Abstract:

Association rule mining has made many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them are redundant with respect to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy; then we propose a concise representation called the Reliable basis for representing non-redundant association rules, for both exact rules and approximate rules. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. With this criterion, we can determine the boundary between redundancy and non-redundancy to ensure eliminating as many redundant rules as possible without reducing the inference capacity of, and the belief in, the remaining extracted non-redundant rules. We prove that redundancy elimination based on the proposed Reliable basis does not reduce the belief in the extracted rules. We also prove that all association rules can be deduced from the Reliable basis. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules.
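
The abstract names the certainty factor as the strength measure but does not give its formula. The following is a minimal sketch, assuming the commonly used definition in terms of the rule confidence conf(X -> Y) and the consequent support supp(Y); the function name, parameter names, and sample numbers are hypothetical.

```python
def certainty_factor(confidence, consequent_support):
    """Certainty factor of a rule X -> Y, assuming the usual piecewise definition.

    confidence: conf(X -> Y), in [0, 1]
    consequent_support: supp(Y), in [0, 1]
    """
    if confidence > consequent_support:
        # X raises the likelihood of Y: scale the gain by the room left above supp(Y).
        return (confidence - consequent_support) / (1.0 - consequent_support)
    if confidence < consequent_support:
        # X lowers the likelihood of Y: scale the loss by supp(Y); result is negative.
        return (confidence - consequent_support) / consequent_support
    return 0.0  # X tells us nothing about Y


# Made-up numbers for illustration.
cf_positive = certainty_factor(0.8, 0.5)
cf_negative = certainty_factor(0.2, 0.5)
cf_neutral = certainty_factor(0.5, 0.5)
```

Under this definition the value lies in [-1, 1]: positive when the antecedent makes the consequent more likely than its baseline support, negative when it makes it less likely, and zero when the two are independent.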

Relevance:

10.00%

Abstract:

Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a person's identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel, wired or wireless. A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMMs are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability, an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMM-based classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier.
Further, the domains for the modelling of session variation were contrasted to find a number of common factors; however, the SVM domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system, providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several characteristics beneficial to the task of unsupervised model adaptation, prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate that the approach provides performance improvements over both the use of the complete candidate dataset and the best heuristically-selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.

Relevance:

10.00%

Abstract:

For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. 
Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.

Relevance:

10.00%

Abstract:

Purpose. The objective of this study was to explore the discriminative capacity of non-contact corneal esthesiometry (NCCE) when compared with the neuropathy disability score (NDS), a validated, standard method of diagnosing clinically significant diabetic neuropathy. Methods. Eighty-one participants with type 2 diabetes, no history of ocular disease, trauma, or surgery and no history of systemic disease that may affect the cornea were enrolled. Participants were ineligible if there was a history of neuropathy due to a non-diabetic cause or a current diabetic foot ulcer or infection. Corneal sensitivity threshold was measured on the eye on the side of the dominant hand at a distance of 10 mm from the center of the cornea using a stimulus duration of 0.9 s. The NDS was measured, producing a score ranging from 0 to 10. To determine the optimal cutoff point of corneal sensitivity that identified the presence of neuropathy (diagnosed by NDS), the Youden index and “closest-to-(0,1)” criteria were used. Results. The receiver-operator characteristic curve for NCCE for the presence of neuropathy (NDS ≥3) had an area under the curve of 0.73 (p = 0.001) and, for the presence of moderate neuropathy (NDS ≥6), an area of 0.71 (p = 0.003). By using the Youden index, for an NDS ≥3, the sensitivity of NCCE was 70% and specificity was 75%, and a corneal sensitivity threshold of 0.66 mbar or higher indicated the presence of neuropathy. When NDS ≥6 (indicating risk of foot ulceration) was applied, the sensitivity was 52% with a specificity of 85%. Conclusions. NCCE is a sensitive test for the diagnosis of minimal and more advanced diabetic neuropathy and may serve as a useful surrogate marker for diabetic and perhaps other neuropathies.
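
The two cutoff criteria named in the Methods can be sketched as follows. The Youden index is J = sensitivity + specificity - 1, so the reported 70% sensitivity and 75% specificity correspond to J = 0.45; the "closest-to-(0,1)" criterion picks the cutoff whose ROC point is nearest the top-left corner. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the measurements and labels are made up, and a value at or above the cutoff is assumed to count as a positive test.

```python
import math


def sensitivity_specificity(values, has_condition, cutoff):
    """Treat value >= cutoff as a positive test; return (sensitivity, specificity)."""
    pairs = list(zip(values, has_condition))
    tp = sum(1 for v, d in pairs if d and v >= cutoff)
    fn = sum(1 for v, d in pairs if d and v < cutoff)
    tn = sum(1 for v, d in pairs if not d and v < cutoff)
    fp = sum(1 for v, d in pairs if not d and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoffs(values, has_condition):
    """Return (cutoff maximising the Youden index, cutoff closest to the ROC corner (0, 1))."""
    candidates = sorted(set(values))

    def youden(cutoff):
        sens, spec = sensitivity_specificity(values, has_condition, cutoff)
        return sens + spec - 1

    def corner_distance(cutoff):
        sens, spec = sensitivity_specificity(values, has_condition, cutoff)
        # The ROC point is (1 - specificity, sensitivity); the ideal corner is (0, 1).
        return math.hypot(1 - spec, 1 - sens)

    return max(candidates, key=youden), min(candidates, key=corner_distance)


# Made-up thresholds and diagnoses, not data from the study.
thresholds = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
neuropathy = [False, False, False, True, False, False, True, True, True, True]
youden_cutoff, corner_cutoff = best_cutoffs(thresholds, neuropathy)
```

On this toy data both criteria select the same cutoff (0.8); on real data they can disagree, which is one reason to report both.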

Relevance:

10.00%

Abstract:

Background: In response to the need for more comprehensive quality assessment within Australian residential aged care facilities, the Clinical Care Indicator (CCI) Tool was developed to collect outcome data as a means of making inferences about quality. A national trial of its effectiveness and a Brisbane-based trial of its use within the quality improvement context determined the CCI Tool represented a potentially valuable addition to the Australian aged care system. This document describes the next phase in the CCI Tool's development, the aims of which were to establish validity and reliability of the CCI Tool, and to develop quality indicator thresholds (benchmarks) for use in Australia. The CCI Tool is now known as the ResCareQA (Residential Care Quality Assessment). Methods: The study aims were achieved through a combination of quantitative data analysis and expert panel consultations using a modified Delphi process. The expert panel consisted of experienced aged care clinicians, managers, and academics; they were initially consulted to determine face and content validity of the ResCareQA, and later to develop thresholds of quality. To analyse its psychometric properties, ResCareQA forms were completed for all residents (N=498) of nine aged care facilities throughout Queensland. Kappa statistics were used to assess inter-rater and test-retest reliability, and Cronbach's alpha coefficient was calculated to determine internal consistency. For concurrent validity, equivalent items on the ResCareQA and the Resident Classification Scales (RCS) were compared using Spearman's rank order correlations, while discriminative validity was assessed using the known-groups technique, comparing ResCareQA results between groups with differing care needs, as well as between male and female residents.
Rank-ordered facility results for each clinical care indicator (CCI) were circulated to the panel; upper and lower thresholds for each CCI were nominated by panel members and refined through a Delphi process. These thresholds indicate excellent care at one extreme and questionable care at the other. Results: Minor modifications were made to the assessment, and it was renamed the ResCareQA. Agreement on its content was reached after two Delphi rounds; the final version contains 24 questions across four domains, enabling generation of 36 CCIs. Both test-retest and inter-rater reliability were sound with median kappa values of 0.74 (test-retest) and 0.91 (inter-rater); internal consistency was not as strong, with a Cronbach's alpha of 0.46. Because the ResCareQA does not provide a single combined score, comparisons for concurrent validity were made with the RCS on an item-by-item basis, with most resultant correlations being quite low. Discriminative validity analyses, however, revealed highly significant differences in total number of CCIs between high care and low care groups (t(199)=10.77, p=0.000), while the differences between male and female residents were not significant (t(414)=0.56, p=0.58). Clinical outcomes varied both within and between facilities; agreed upper and lower thresholds were finalised after three Delphi rounds. Conclusions: The ResCareQA provides a comprehensive, easily administered means of monitoring quality in residential aged care facilities that can be reliably used on multiple occasions. The relatively modest internal consistency score was likely due to the multi-factorial nature of quality, and the absence of an aggregate result for the assessment.
Measurement of concurrent validity proved difficult in the absence of a gold standard, but the sound discriminative validity results suggest that the ResCareQA has acceptable validity and could be confidently used as an indication of care quality within Australian residential aged care facilities. The thresholds, while preliminary due to small sample size, enable users to make judgements about quality within and between facilities. Thus it is recommended the ResCareQA be adopted for wider use.
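
The abstract reports kappa statistics for inter-rater and test-retest reliability without saying which variant was used. As a minimal sketch, assuming Cohen's kappa for two raters scoring the same residents on a categorical item, agreement beyond chance can be computed as below; the function name and the ratings are made up for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. both raters always use the same single category.
    """
    n = len(ratings_a)
    # Observed agreement: share of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up ratings of ten residents by two assessors.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
rater_2 = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "no", "yes"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on nine of ten residents (observed agreement 0.9) against a chance agreement of 0.5, giving a kappa of 0.8.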

Relevance:

10.00%

Abstract:

Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. 
The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that the non-redundant rules alone are sufficient to solve real problems, without needing the entire rule set.
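
The frequent closed itemsets and generators mentioned above can be illustrated with a brute-force sketch. It assumes the standard definitions: a frequent itemset is closed if no proper superset has the same support, and it is a generator if no proper subset has the same support. The empty itemset is ignored here for simplicity, the function names and transactions are hypothetical, and the exhaustive enumeration is only suitable for tiny examples; it is not the paper's mining procedure.

```python
from itertools import combinations


def support_counts(transactions, min_count):
    """Support count of every non-empty itemset that appears in at least min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= set(t))
            if count >= min_count:
                counts[frozenset(combo)] = count
    return counts


def closed_and_generators(counts):
    """Split frequent itemsets into closed itemsets and generators (the sets may overlap)."""
    # Closed: no proper superset with the same support.
    closed = {s for s in counts
              if not any(s < t and counts[t] == counts[s] for t in counts)}
    # Generator: no proper (non-empty) subset with the same support.
    generators = {s for s in counts
                  if not any(t < s and counts[t] == counts[s] for t in counts)}
    return closed, generators


# Made-up transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
counts = support_counts(transactions, min_count=2)
closed, generators = closed_and_generators(counts)
```

In this example `{b}` is a generator of the closed itemset `{a, b}` (both have support 3), which is the kind of generator/closed-itemset pair from which a rule such as b -> a can be formed.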

Relevance:

10.00%

Abstract:

Facial expression recognition (FER) algorithms mainly focus on classification into a small discrete set of emotions or representation of emotions using facial action units (AUs). Dimensional representation of emotions as continuous values in an arousal-valence space is relatively less investigated. It is not fully known whether fusion of geometric and texture features will result in better dimensional representation of spontaneous emotions. Moreover, the performance of many previously proposed approaches to dimensional representation has not been evaluated thoroughly on publicly available databases. To address these limitations, this paper presents an evaluation framework for dimensional representation of spontaneous facial expressions using texture and geometric features. SIFT, Gabor and LBP features are extracted around facial fiducial points and fused with FAP distance features. The CFS algorithm is adopted for discriminative texture feature selection. Experimental results evaluated on the publicly accessible NVIE database demonstrate that fusion of texture and geometry does not lead to a much better performance than using texture alone, but does result in a significant performance improvement over geometry alone. LBP features perform the best when fused with geometric features. Distributions of arousal and valence for different emotions obtained via the feature extraction process are compared with those obtained from subjective ground truth values assigned by viewers. Predicted valence is found to have a more similar distribution to ground truth than arousal in terms of covariance or Bhattacharyya distance, but it shows a greater distance between the means.