874 resultados para correlation-based feature selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main objective of the project is to enhance the already effective health-monitoring system (HUMS) for helicopters by analysing structural vibrations to recognise different flight conditions directly from sensor information. The goal of this paper is to develop a new method to select those sensors and frequency bands that are best for detecting changes in flight conditions. We projected frequency information to a 2-dimensional space in order to visualise flight-condition transitions using the Generative Topographic Mapping (GTM) and a variant which supports simultaneous feature selection. We created an objective measure of the separation between different flight conditions in the visualisation space by calculating the Kullback-Leibler (KL) divergence between Gaussian mixture models (GMMs) fitted to each class: the higher the KL-divergence, the better the interclass separation. To find the optimal combination of sensors, they were considered in pairs, triples and groups of four sensors. The sensor triples provided the best result in terms of KL-divergence. We also found that the use of a variational training algorithm for the GMMs gave more reliable results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed and three different kinds of applications with respect to classification tasks are considered. The summary of obtained results concerning the accuracy of classification schemes is presented with the conclusion about the search for the most appropriate feature extraction method. The problem how to discover knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis studies survival analysis techniques dealing with censoring to produce predictive tools that predict the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring indicates that some patients do not continue follow up, so their outcome class is unknown. Methods dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. Therefore, this thesis presents a new solution to high censoring by modifying an approach that was incapable of differentiating between risks groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depends on Cox's model, however machine learning classifiers (MLC) are preferred. Few methods adopted MLC to perform survival FS, but they cannot be used with high censoring. This thesis proposes two FS methods which use MLC to evaluate features. The two FS methods use the new solution to deal with censoring. They combine factor analysis with greedy stepwise FS search which allows eliminated features to enter the FS process. The first FS method searches for the best neural networks' configuration and subset of features. The second approach combines support vector machines, neural networks, and K nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) for improving the performance of individual classifiers. It presents a new hybrid FS process by using MCS as a wrapper method and merging it with the iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model such as; Akaike and Bayesian information criteria, and least absolute shrinkage and selector operator in the log-rank test's p-values, sensitivity, and concordance. This proves that the proposed techniques are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set patients’ appropriate future observation plan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

lmage super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided in single and multi image methods. This thesis focuses on developing algorithms based on mathematical theories for single image super­ resolution problems. lndeed, in arder to estimate an output image, we adopta mixed approach: i.e., we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although the existing methods already per- form well, they do not take into account the geometry of the data to: regularize the solution, cluster data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), learn dictionaries (they are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we proposed three new methods to overcome these deficiencies. First, we developed SE-ASDS (a structure tensor based regularization term) in arder to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the- art algorithms. Then, we proposed AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for recon- structing a given input test sample, where we take into account the underlying geometry of the data. AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings. Next, we proposed aSOB strategy which takes into account the geometry of the data and the dictionary size. The aSOB strategy outperforms both PCA and PGA methods. Finally, we combine all our methods in a unique algorithm, named G2SR. Our proposed G2SR algorithm shows better visual and quantitative results when compared to the results of state-of-the-art methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Piotr Omenzetter and Simon Hoell's work within the Lloyd's Register Foundation Centre for Safety and Reliability Engineering at the University of Aberdeen is supported by Lloyd’s Register Foundation. The Foundation helps to protect life and property by supporting engineering-related education, public engagement and the application of research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Piotr Omenzetter and Simon Hoell's work within the Lloyd's Register Foundation Centre for Safety and Reliability Engineering at the University of Aberdeen is supported by Lloyd’s Register Foundation. The Foundation helps to protect life and property by supporting engineering-related education, public engagement and the application of research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interest rate sensitivity assessment framework based on fixed income yield indexes is developed and applied to two types of emerging market corporate debt: investment grade and high yield exposures. Our research advances beyond the correlation analyses focused on co- movements in yields and/or spreads of risky and risk-free assets. We show that correlation- based analyses of interest rate sensitivity could appear rather inconclusive and, hence, we investigate the bottom line profit and loss of a hypothetical model portfolio of corporates. We consider historical data covering the period 2002 – 2015, which enable us to assess interest rate sensitivity of assets during the development, the apogee, and the aftermath of the global financial crisis. Based on empirical evidence, both for investment and speculative grades securities, we find that the emerging market corporates exhibit two different regimes of sensitivity to interest rate changes. We observe switching from a positive sensitivity under the normal market conditions to a negative one during distressed phases of business cycles. This research sheds light on how financial institutions may approach interest rate risk management, evidencing that even plain vanilla portfolios of emerging market corporates, which on average could appear rather insensitive to the interest rate risk in fact present a binary behavior of their interest rate sensitivities. Our findings allow banks and financial institutions for optimizing economic capital under Basel III regulatory capital rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a methodology for short-term load forecasting based on genetic algorithm feature selection and artificial neural network modeling. A feed forward artificial neural network is used to model the 24-h ahead load based on past consumption, weather and stock index data. A genetic algorithm is used in order to find the best subset of variables for modeling. Three data sets of different geographical locations, encompassing areas of different dimensions with distinct load profiles are used in order to evaluate the methodology. The developed approach was found to generate models achieving a minimum mean average percentage error under 2 %. The feature selection algorithm was able to significantly reduce the number of used features and increase the accuracy of the models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Forage peanut improvement for use in grass?legume mixtures is expected to have a great impact on the sustainability of Brazilian livestock production. Eighteen cloned Arachis spp. ecotypes were evaluated under clipping in a Brazilian Cerrado region and results analysed using a mixed model methodology. The objective was to estimate genetic and phenotypic parameters and to select the best ecotypes based on selection index applied on their predicted genotypic value. The traits of total dry-matter (DM) and leaf DM yield presented moderate (0_30 < h2g < 0_50) to high (>0_50) broad-sense heritability, in contrast to the low genetic variability in nutritional quality-associated traits. Ecotypes of Arachis spp. contained average crude protein concentrations of 224 g kg _1 DM in leaves and 138 g kg _1 DM in stems, supporting the potential role of these species to overcome the low protein content in Cerrado pastures. The correlations between yield traits and traits associated with low nutritional value in leaves were consistently significant and positive. Genetic correlations among all the yield traits evaluated during the rainy or dry seasons were significant and positive. The ecotypes were ranked based on selection index. The next step is to validate long-term selection of grass?Arachis in combination with pastures under competition and adjusted grazing in the Cerrado region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Breast weight has great economic importance in poultry industry, and may be associated with other variables. This work aimed to estimate phenotypic correlations between performance (live body weight at 7 and 28 days, and at slaughter, and depth of the breast muscle measured by ultrasonography), carcass (eviscerated body weight and leg weight) and body composition (heart, liver and abdominal fat weight) traits in a broiler line, and quantify the direct and indirect influence of these traits on breast weight. Path analysis was used by expanding the matrix of partial correlation in coefficients which give the direct influence of one trait on another, regardless the effect of the other traits. The simultaneous maintenance of live body weight at slaughter and eviscerated body weight in the matrix of correlations might be harmful for statistical analysis involving systems of normal equations, like path analysis, due to the observed multicollinearity. The live body weight at slaughter and the depth of the breast muscle as measured by ultrasonography directly affected breast weight and were identified as the most responsible factors for the magnitude of the correlation coefficients obtained between the studied traits and breast weight. Individual pre-selection for these traits could favor an increased breast weight in the future reproducer candidates of this line if the broilers' environmental conditions and housing are maintained, since the live body weight at slaughter and the depth of breast muscle measured by ultrasonography were directly related to breast weight.