855 resultados para optimal feature selection


Relevância:

80.00% 80.00%

Publicador:

Resumo:

For the first time in this paper we present results showing the effect of speaker head pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that by selecting coefficients with minimum variance w.r.t. pose angle, recognition performance can be improved when train-test pose angles differ. Experiments are conducted using the initial phase of a unique multi view Audio-Visual database designed specifically for research and development of pose-invariant lip-reading systems. We firstly show that it is the higher order horizontal spatial frequency components that become most detrimental as the pose deviates. Secondly we assess the performance of different feature selection masks across a range of pose angles including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy based selection during profile view lip-reading.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Blind steganalysis of JPEG images is addressed by modeling the correlations among the DCT coefficients using K -variate (K = 2) p.d.f. estimates (p.d.f.s) constructed by means of Markov random field (MRF) cliques. The reasoning of using high variate p.d.f.s together with MRF cliques for image steganalysis is explained via a classical detection problem. Although our approach has many improvements over the current state-of-the-art, it suffers from the high dimensionality and the sparseness of the high variate p.d.f.s. The dimensionality problem as well as the sparseness problem are solved heuristically by means of dimensionality reduction and feature selection algorithms. The detection accuracy of the proposed method(s) is evaluated over Memon's (30.000 images) and Goljan's (1912 images) image sets. It is shown that practically applicable steganalysis systems are possible with a suitable dimensionality reduction technique and these systems can provide, in general, improved detection accuracy over the current state-of-the-art. Experimental results also justify this assertion.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Both embodied and symbolic accounts of conceptual organization would predict partial sharing and partial differentiation between the neural activations seen for concepts activated via different stimulus modalities. But cross-participant and cross-session variability in BOLD activity patterns makes analyses of such patterns with MVPA methods challenging. Here, we examine the effect of cross-modal and individual variation on the machine learning analysis of fMRI data recorded during a word property generation task. We present the same set of living and non-living concepts (land-mammals, or work tools) to a cohort of Japanese participants in two sessions: the first using auditory presentation of spoken words; the second using visual presentation of words written in Japanese characters. Classification accuracies confirmed that these semantic categories could be detected in single trials, with within-session predictive accuracies of 80-90%. However cross-session prediction (learning from auditory-task data to classify data from the written-word-task, or vice versa) suffered from a performance penalty, achieving 65-75% (still individually significant at p « 0.05). We carried out several follow-on analyses to investigate the reason for this shortfall, concluding that distributional differences in neither time nor space alone could account for it. Rather, combined spatio-temporal patterns of activity need to be identified for successful cross-session learning, and this suggests that feature selection strategies could be modified to take advantage of this.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes since they allow the analysis of root causes even when the number of input variables is high. In this work, we investigate the performance of three techniques for classification within a Monte Carlo strategy with the aim of root cause analysis. We consider the naive bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely, a LASSO-like implementation with a L1 norm regularization and a fully Bayesian implementation of the logistic model, the so called relevance vector machine. Several challenges can arise when estimating such models mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, the situation where the number of variables is higher than the number of available data points and the case of unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show advantages and drawbacks of each method, highlighting the superior performance in term of classification accuracy for the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can be used to get more robust insights into the problem under analysis when faced with challenging modelling conditions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present the results of exploratory experiments using lexical valence extracted from brain using electroencephalography (EEG) for sentiment analysis. We selected 78 English words (36 for training and 42 for testing), presented as stimuli to 3 English native speakers. EEG signals were recorded from the subjects while they performed a mental imaging task for each word stimulus. Wavelet decomposition was employed to extract EEG features from the time-frequency domain. The extracted features were used as inputs to a sparse multinomial logistic regression (SMLR) classifier for valence classification, after univariate ANOVA feature selection. After mapping EEG signals to sentiment valences, we exploited the lexical polarity extracted from brain data for the prediction of the valence of 12 sentences taken from the SemEval-2007 shared task, and compared it against existing lexical resources.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid the inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. A valuable question is then how small do strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies even for polytree-shaped diagrams with ternary variables and a single value node is NP-hard, but the case of binary variables was left open. In this paper we address such a case, by first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parametrized by the shape of the diagram, the number of value nodes and the maximum variable cardinality.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Urothelial cancer (UC) is highly recurrent and can progress from non-invasive (NMIUC) to a more aggressive muscle-invasive (MIUC) subtype that invades the muscle tissue layer of the bladder. We present a proof of principle study that network-based features of gene pairs can be used to improve classifier performance and the functional analysis of urothelial cancer gene expression data. In the first step of our procedure each individual sample of a UC gene expression dataset is inflated by gene pair expression ratios that are defined based on a given network structure. In the second step an elastic net feature selection procedure for network-based signatures is applied to discriminate between NMIUC and MIUC samples. We performed a repeated random subsampling cross validation in three independent datasets. The network signatures were characterized by a functional enrichment analysis and studied for the enrichment of known cancer genes. We observed that the network-based gene signatures from meta collections of proteinprotein interaction (PPI) databases such as CPDB and the PPI databases HPRD and BioGrid improved the classification performance compared to single gene based signatures. The network based signatures that were derived from PPI databases showed a prominent enrichment of cancer genes (e.g., TP53, TRIM27 and HNRNPA2Bl). We provide a novel integrative approach for large-scale gene expression analysis for the identification and development of novel diagnostical targets in bladder cancer. Further, our method allowed to link cancer gene associations to network-based expression signatures that are not observed in gene-based expression signatures.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2013

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2014

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The objective of this study was to develop, test and benchmark a framework and a predictive risk model for hospital emergency readmission within 12 months. We performed the development using routinely collected Hospital Episode Statistics data covering inpatient hospital admissions in England. Three different timeframes were used for training, testing and benchmarking: 1999 to 2004, 2000 to 2005 and 2004 to 2009 financial years. Each timeframe includes 20% of all inpatients admitted within the trigger year. The comparisons were made using positive predictive value, sensitivity and specificity for different risk cut-offs, risk bands and top risk segments, together with the receiver operating characteristic curve. The constructed Bayes Point Machine using this feature selection framework produces a risk probability for each admitted patient, and it was validated for different timeframes, sub-populations and cut-off points. At risk cut-off of 50%, the positive predictive value was 69.3% to 73.7%, the specificity was 88.0% to 88.9% and sensitivity was 44.5% to 46.3% across different timeframes. Also, the area under the receiver operating characteristic curve was 73.0% to 74.3%. The developed framework and model performed considerably better than existing modelling approaches with high precision and moderate sensitivity.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertação para obtenção do grau de Mestre em Engenharia Informática

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Feature selection plays an important role in knowledge discovery and data mining nowadays. In traditional rough set theory, feature selection using reduct - the minimal discerning set of attributes - is an important area. Nevertheless, the original definition of a reduct is restrictive, so in one of the previous research it was proposed to take into account not only the horizontal reduction of information by feature selection, but also a vertical reduction considering suitable subsets of the original set of objects. Following the work mentioned above, a new approach to generate bireducts using a multi--objective genetic algorithm was proposed. Although the genetic algorithms were used to calculate reduct in some previous works, we did not find any work where genetic algorithms were adopted to calculate bireducts. Compared to the works done before in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated a quality of each bireduct by values of two objective functions as evolution progresses, so consequently a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine the significant difference between the results. The experiment showed that the proposed method was able to reduce the number of bireducts necessary in order to receive a good prediction accuracy. Also, the influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was analyzed. It was shown that the prediction accuracies of the proposed method are comparable with the best results in machine learning literature, and some of them outperformed it.