6 results for data redundancy

in Deakin Research Online - Australia


Relevance: 70.00%

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated according to the data characteristics and user preferences.
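The abstract does not spell out the mapping strategy, so the following is a minimal sketch of the general idea: per-gene scores from several filters are normalised and fused into a single "goodness" distribution, which then biases the genetic algorithm's population initialization and mutation. All function names, the averaging-based fusion, and the parameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def fuse_filter_scores(score_lists):
    """Fuse per-gene scores from several filters into one 'goodness' vector.

    Each filter's scores are min-max normalised so no single filter
    dominates, then averaged across filters (a simple fusion choice).
    """
    fused = np.zeros_like(np.asarray(score_lists[0], dtype=float))
    for scores in score_lists:
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        fused += (s - s.min()) / rng if rng > 0 else np.ones_like(s)
    fused /= len(score_lists)
    return fused / fused.sum()          # probability distribution over genes

def init_population(goodness, pop_size, subset_size, rng):
    """Initialise GA chromosomes by sampling genes proportional to goodness."""
    n_genes = len(goodness)
    return [rng.choice(n_genes, size=subset_size, replace=False, p=goodness)
            for _ in range(pop_size)]

def mutate(chromosome, goodness, rng, rate=0.1):
    """Replace a few genes, again drawing replacements from the fused scores."""
    chrom = chromosome.copy()
    for i in range(len(chrom)):
        if rng.random() < rate:
            candidates = np.setdiff1d(np.arange(len(goodness)), chrom)
            p = goodness[candidates] / goodness[candidates].sum()
            chrom[i] = rng.choice(candidates, p=p)
    return chrom

# toy usage: 3 filters scoring 100 genes
rng = np.random.default_rng(0)
scores = [rng.random(100) for _ in range(3)]
goodness = fuse_filter_scores(scores)
population = init_population(goodness, pop_size=20, subset_size=10, rng=rng)
child = mutate(population[0], goodness, rng)
```

Biasing both initialization and mutation toward high-goodness genes is what lets the ensemble converge on compact subsets more quickly while still allowing the wrapper search to explore.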

Relevance: 70.00%

Abstract:

The success of a Wireless Sensor Network (WSN) deployment strongly depends on the quality of service (QoS) it provides regarding issues such as data accuracy, data aggregation delays, and network lifetime maximisation. This is especially challenging in data fusion mechanisms, where a small fraction of low-quality data in the fusion input may negatively impact the overall fusion result. In this paper, we present a fuzzy-based data fusion approach for WSNs with the aim of increasing the QoS whilst reducing the energy consumption of the sensor network. The proposed approach is able to distinguish and aggregate only the true values of the collected data, thus reducing the burden of processing the entire data set at the base station (BS). It is also able to eliminate redundant data and consequently reduce energy consumption, thus increasing the network lifetime. We studied the effectiveness of the proposed data fusion approach experimentally and compared it with two baseline approaches in terms of data collection, number of transferred data packets, and energy consumption. The results of the experiments show that the proposed approach achieves better results than the baseline approaches.
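The abstract does not specify the fuzzy membership functions used, so the sketch below only illustrates the general pattern: each reading's distance from a local reference (here, the median) is mapped to a fuzzy confidence, low-confidence readings are dropped as likely faulty or redundant, and the remainder are aggregated as a confidence-weighted mean before transmission to the base station. Names, the triangular membership, and thresholds are illustrative assumptions.

```python
def confidence(value, centre, spread):
    """Triangular fuzzy membership: 1 at the centre, 0 beyond +/- spread."""
    return max(0.0, 1.0 - abs(value - centre) / spread)

def fuse_readings(readings, spread=5.0, min_conf=0.2):
    """Fuse one round of sensor readings into a single value.

    Readings far from the local median get low confidence and are dropped,
    so a small fraction of low-quality data cannot skew the fusion result
    and redundant/outlier values are not forwarded to the base station.
    """
    centre = sorted(readings)[len(readings) // 2]          # median as reference
    weighted = [(v, confidence(v, centre, spread)) for v in readings]
    kept = [(v, c) for v, c in weighted if c >= min_conf]
    total = sum(c for _, c in kept)
    return sum(v * c for v, c in kept) / total if total else centre

# toy usage: one faulty sensor (68.0) among temperature readings
print(fuse_readings([21.3, 21.5, 21.1, 68.0, 21.4]))
```

Filtering and aggregating at the cluster head in this way means only one fused value, rather than every raw reading, has to be transmitted, which is where the energy saving comes from.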

Relevance: 30.00%

Abstract:

Recently, much attention has been given to mass spectrometry (MS) based disease classification, diagnosis, and protein-based biomarker identification. Similar to microarray-based investigation, proteomic data generated by such high-throughput experiments often have a high feature-to-sample ratio. Moreover, biological information and patterns are confounded with data noise, redundancy, and outliers. Thus, the development of algorithms and procedures for the analysis and interpretation of such data is of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a k-means clustering based feature extraction and selection procedure to bridge filter selection and wrapper selection methods. The potentially informative mass/charge (m/z) markers selected by filters are subjected to k-means clustering for correlation and redundancy reduction, and a multi-objective genetic algorithm selector is then employed to identify discriminative m/z markers among those generated by the k-means clustering step. Experimental results obtained with the proposed method indicate that it is suitable for m/z biomarker selection and MS-based sample classification.
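A minimal sketch of the filter → k-means → wrapper bridge described above, assuming scikit-learn is available: the filter step here is a simple ANOVA F-score ranking, correlated m/z features are clustered, and one representative per cluster is retained. The multi-objective genetic algorithm wrapper stage is omitted for brevity and replaced by a best-per-cluster pick, so this is an illustration of the pipeline shape rather than the paper's system; all names and parameters are assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.cluster import KMeans

def kmeans_bridge(X, y, n_filter=200, n_clusters=20, seed=0):
    """Filter m/z features, cluster the survivors, keep one per cluster.

    1. Filter: rank features by ANOVA F-score, keep the top `n_filter`.
    2. k-means: cluster the retained feature *columns* so correlated
       (redundant) m/z markers fall into the same cluster.
    3. Pick the highest-scoring feature in each cluster as its
       representative; these would normally feed the GA wrapper stage.
    """
    f_scores, _ = f_classif(X, y)
    top = np.argsort(f_scores)[::-1][:n_filter]

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X[:, top].T)        # cluster feature profiles

    selected = []
    for c in range(n_clusters):
        members = top[labels == c]
        if members.size:
            selected.append(members[np.argmax(f_scores[members])])
    return np.array(selected)

# toy usage: 60 spectra x 1000 m/z bins, binary labels
rng = np.random.default_rng(0)
X = rng.random((60, 1000))
y = rng.integers(0, 2, 60)
print(kmeans_bridge(X, y, n_filter=100, n_clusters=10))
```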

Relevance: 30.00%

Abstract:

People with special medical monitoring needs can, these days, be sent home and remotely monitored through the use of data-logging medical sensors and a transmission base station. While this can improve quality of life by allowing the patient to spend most of their time at home, most current systems rely on hardwired landline connections or expensive mobile data transmissions to send data to a medical facility. The aim of this paper is to investigate and develop an approach that increases the freedom of a monitored patient and decreases costs by utilising mobile technologies and SMS messaging to transmit data from patient to medico. To this end, we evaluated the capabilities of SMS and propose a generic communications protocol that works within the constraints of the SMS format while providing the necessary redundancy and robustness to be used for the transmission of non-critical medical telemetry from data-logging medical sensors.
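The abstract does not define the protocol's frame layout, so the following is a minimal sketch of how telemetry might be packed into a 160-character SMS payload with the kind of redundancy and robustness described: a sequence number so the receiver can detect missing or out-of-order messages, and a checksum so corrupted messages can be rejected and retransmitted. The field layout, names, and sizes are assumptions for illustration, not the paper's protocol.

```python
import zlib

MAX_SMS_CHARS = 160  # standard 7-bit GSM SMS payload limit

def encode_frame(device_id, seq, readings):
    """Pack sensor readings into one SMS-sized text frame.

    Layout (illustrative): DEVICE|SEQ|v1,v2,...|CRC32
    The CRC lets the receiver discard corrupted frames; the sequence
    number lets it spot gaps and request retransmission of lost frames.
    """
    body = f"{device_id}|{seq:04d}|{','.join(f'{v:.1f}' for v in readings)}"
    crc = zlib.crc32(body.encode("ascii")) & 0xFFFFFFFF
    frame = f"{body}|{crc:08X}"
    if len(frame) > MAX_SMS_CHARS:
        raise ValueError("readings do not fit in a single SMS frame")
    return frame

def decode_frame(frame):
    """Validate and unpack a frame; returns None if the checksum fails."""
    body, _, crc_hex = frame.rpartition("|")
    if zlib.crc32(body.encode("ascii")) & 0xFFFFFFFF != int(crc_hex, 16):
        return None                    # corrupted in transit, ask for resend
    device_id, seq, values = body.split("|")
    return device_id, int(seq), [float(v) for v in values.split(",")]

# toy usage
frame = encode_frame("BP01", 7, [120.5, 80.2, 72.0])
print(frame)
print(decode_frame(frame))
```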

Relevance: 30.00%

Abstract:

This paper proposes a novel hierarchical data fusion technique for the non-destructive testing (NDT) and condition assessment of timber utility poles. The new method analyzes stress wave data from multi-sensor, multi-excitation guided wave testing using a hierarchical data fusion model consisting of feature extraction, data compression, pattern recognition, and decision fusion algorithms. The researchers validate the proposed technique using guided wave tests of a sample of in situ timber poles. The actual health states of these poles are known from autopsies conducted after the testing, forming the ground truth for supervised classification. In the proposed method, a data fusion level extracts the main features from the sampled stress wave signals using power spectral density (PSD) estimation, the wavelet packet transform (WPT), and empirical mode decomposition (EMD). These features are then compiled into a feature vector via real-number encoding and passed to the next level for further processing. Principal component analysis (PCA) is adopted for feature compression and to minimize information redundancy and noise interference. In the feature fusion level, two classifiers based on support vector machines (SVM) are applied to sensor-separated data of the two excitation types and the pole condition is identified. In the decision-making fusion level, Dempster–Shafer (D-S) evidence theory is employed to integrate the results from the individual sensors to obtain a final decision. The results of the in situ timber pole testing show that the proposed hierarchical data fusion model is able to distinguish between healthy and faulty poles, demonstrating the effectiveness of the new method.
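A minimal sketch of the final decision-fusion step, assuming each SVM output has already been converted into basic probability masses over the frame {healthy, faulty, theta}, where theta stands for the full frame ("don't know"); Dempster's rule then combines the masses from the individual sensors. How the paper maps SVM outputs to masses is not stated in the abstract, so the mass values and helper names below are purely illustrative.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over {'healthy', 'faulty', 'theta'}.

    Conflict arises when one source assigns mass to 'healthy' and the
    other to 'faulty'; Dempster's rule discards that conflicting mass
    and renormalises what remains.
    """
    hypotheses = ("healthy", "faulty", "theta")
    combined = {h: 0.0 for h in hypotheses}
    conflict = 0.0
    for a in hypotheses:
        for b in hypotheses:
            mass = m1[a] * m2[b]
            if a == b:
                combined[a] += mass
            elif "theta" in (a, b):                 # theta intersects everything
                combined[b if a == "theta" else a] += mass
            else:                                   # healthy vs faulty: conflict
                conflict += mass
    k = 1.0 - conflict
    return {h: v / k for h, v in combined.items()}

# toy usage: two sensors' SVM outputs mapped to masses
sensor_1 = {"healthy": 0.7, "faulty": 0.2, "theta": 0.1}
sensor_2 = {"healthy": 0.6, "faulty": 0.3, "theta": 0.1}
fused = dempster_combine(sensor_1, sensor_2)
print(fused, "->", max(("healthy", "faulty"), key=fused.get))
```

Keeping an explicit theta mass lets a sensor whose classifier is unsure contribute little either way, so one uncertain sensor does not override confident evidence from the others.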