855 results for optimal feature selection


Relevance:

90.00% 90.00%

Publisher:

Abstract:

The increasing interconnection of information and communication systems leads to a further rise in complexity and therefore to a further increase in security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long since ceased to offer protection against intrusion attempts into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyze information from network components and hosts in order to detect unusual behavior and security violations automatically. While signature-based approaches can detect only already known attack patterns, anomaly-based IDS are also able to detect new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in the optimal processing of the massive volume of network data and in the development of an adaptive detection model that works in real time. To address these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic "queuing concept" to process the large volume of incoming network data, continuously constructs network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on the "Enhanced Growing Hierarchical Self Organizing Map" (EGHSOM), a model of normal network behavior (NNB), and an "update model". In OptiFilter, Tcpdump and SNMP traps are used to continuously aggregate network packets and host events. These aggregated network packets and host events are further analyzed and converted into connection vectors.
To improve the detection rate of the adaptive classifier, the GHSOM artificial neural network is studied intensively and substantially extended. This dissertation proposes and discusses several approaches. A classification-confidence margin threshold is defined to uncover unknown malicious connections; the stability of the growth topology is increased through novel approaches to initializing the weight vectors and through strengthening the winner neurons; and a self-adaptive procedure is introduced so that the model can be updated continuously. In addition, the main task of the NNB model is to further examine the unknown connections detected by the EGHSOM and to check whether they are normal. However, network traffic data changes constantly because of the concept drift phenomenon, which produces non-stationary network data in real time. This phenomenon is better controlled by the update model. The EGHSOM model can detect new anomalies effectively, and the NNB model adapts optimally to changes in the network data. In the experimental investigations, the framework showed promising results. In the first experiment, the framework was evaluated in offline operating mode. OptiFilter was evaluated with offline, synthetic, and realistic data. The adaptive classifier was evaluated with 10-fold cross-validation to estimate its accuracy. In the second experiment, the framework was installed on a 1 to 10 GB network link and evaluated in real time in online operating mode. OptiFilter successfully converted the massive volume of network data into structured connection vectors, and the adaptive classifier classified them precisely.
The comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all other approaches. This can be attributed to the following key points: processing of the collected network data, achieving the best performance (such as overall accuracy), detecting unknown connections, and developing a real-time intrusion detection model.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In rapid scan Fourier transform spectrometry, we show that the noise in the wavelet coefficients resulting from the filter bank decomposition of the complex insertion loss function is linearly related to the noise power in the sample interferogram by a noise amplification factor. By maximizing an objective function composed of the power of the wavelet coefficients divided by the noise amplification factor, optimal feature extraction in the wavelet domain is performed. The performance of a classifier based on the output of a filter bank is shown to be considerably better than that of a Euclidean distance classifier in the original spectral domain. An optimization procedure results in a further improvement of the wavelet classifier. The procedure is suitable for enhancing the contrast or classifying spectra acquired by either continuous wave or THz transient spectrometers as well as for increasing the dynamic range of THz imaging systems. (C) 2003 Optical Society of America.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

One of the major aims of BCI research is to achieve faster and more efficient control of external devices. The identification of individual tap events in a motor imagery BCI is therefore a desirable goal. EEG is recorded from subjects performing and imagining finger taps with their left and right hands. A Differential Evolution based feature selection wrapper is used to identify optimal features in the spatial and frequency domains for tap identification. Channel-frequency band combinations are found which allow differentiation of tap vs. no-tap control conditions for executed and imagined taps. Left vs. right hand taps may also be differentiated with features found in this manner. A sliding time window is then used to accurately identify individual taps in the executed tap and imagined tap conditions. Highly statistically significant classification accuracies are achieved with time windows of 0.5 s and more, allowing taps to be identified on a single-trial basis.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant, as the accuracy is often worse than that of a Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement in classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved in classifying the test data was 65.5%, using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Creating rules for clone selection in different projects is a hard task when traditional implementations are used to control all the processes of the system. An algebraic language is an alternative approach that manages the whole system flow in a flexible way. To increase versatility and consistency in defining the rules for optimal clone selection, this paper presents the software OCI 2, which uses process algebra to govern the flow behavior of the system. OCI 2, controlled by an algebraic approach, was applied to elaborate rules for selecting clones containing unique genes in the partial genome of the bacterium Bradyrhizobium elkanii Semia 587 and in the whole genome of the bacterium Xanthomonas axonopodis pv. citri. Copyright© (2009) by the International Society for Research in Science and Technology.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This work proposes a system for the classification of industrial steel pieces by means of a magnetic nondestructive device. The proposed classification system has two main stages: an online system stage and an off-line system stage. In the online stage, the system classifies inputs and saves misclassification information for later analysis. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis, and a Sequential Backward Selection algorithm. A trash-data recycling algorithm is also proposed to obtain the optimal feedback samples from among the misclassified ones.
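
The Sequential Forward Selection element named in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the Feature Cluster Grow and Sequential Backward Selection steps are omitted, and the names `sequential_forward_selection` and `toy_score` are hypothetical. The `score` callable stands in for whatever classification rate a wrapped classifier would report for a feature subset.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_features: int,
) -> List[int]:
    """Greedily add the feature that most improves `score`; stop when none does."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every subset obtained by adding one more feature.
        trial = {f: score(frozenset(selected + [f])) for f in candidates}
        best_f = max(trial, key=trial.get)
        if trial[best_f] <= best_score:
            break  # no candidate improves the current subset
        selected.append(best_f)
        best_score = trial[best_f]
    return selected


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a classification rate: features 1 and 3 help,
    and every selected feature costs a small penalty."""
    useful = {1: 0.30, 3: 0.20}
    return 0.5 + sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


print(sequential_forward_selection(5, toy_score, max_features=4))  # [1, 3]
```

In a real wrapper, `score` would train and validate the classifier on each candidate subset, which is where nearly all of the computational cost lies.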

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be dealt with effectively through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used with imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence interval or statistical test of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
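
The HMSS criterion discussed in this abstract can be sketched in a few lines. The formula is the ordinary harmonic mean, 2·Se·Sp/(Se + Sp); the function names and the toy confusion-matrix counts are assumptions made for illustration and do not come from the study.

```python
def hmss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain classification accuracy, for comparison."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced test set: 10 cases, 90 controls,
# and a classifier that labels every subject as a control.
print(accuracy(tp=0, fn=10, tn=90, fp=0))  # 0.9
print(hmss(tp=0, fn=10, tn=90, fp=0))      # 0.0
```

The toy numbers show why such a criterion suits imbalanced data: accuracy rewards the trivial majority-class classifier, whereas HMSS collapses to zero as soon as either sensitivity or specificity does.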

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

One of the barriers to applying structural health monitoring (SHM) techniques based on guided elastic waves (GLW) in aircraft is the harmful influence of environmental and operational conditions (EOC). This thesis studies that influence and its compensation, focusing on variations in load state and temperature. The compensation of these effects is based on Artificial Neural Networks (ANN) using experimental data processed with the Chirplet Transform. Changes in geometry and material properties relative to the initial state of the structure (the damage) cause changes in the GLW waveform (which we call the damage-sensitive feature, or DSF). Signal processing techniques can be used to find a relationship between these variations and the damage; this is known as SHM. However, variations in the EOC also produce changes in the acquired GLW data (DSF) that cause errors in damage diagnosis algorithms (SHM). This happens because the damage and EOC signatures in the DSF are of the same order. It is therefore necessary to quantify and compensate for the effect of the EOC on the GLW. Although several methodologies exist to compensate for EOC effects, such as “Optimal Baseline Selection” (OBS) or “Baseline Signal Stretching” (BSS), they are used exclusively to compensate for thermal effects. The method proposed in this thesis combines experimental data analysis, as in the OBS method, with models based on Artificial Neural Networks (ANN) that replace the physical modeling required by the BSS method. The experimental data analysis consists of applying the Chirplet Transform (CT) to extract the EOC signature on the DSF. This information, obtained under various EOC, is used to train an ANN.
The ANN then acts as an interpolator of baselines for the undamaged structure, generating reference information for any EOC. Comparing actual DSF measurements with the values simulated by the ANN yields the damage signature in the DSF, which enables damage diagnosis. This scheme has been applied and verified, under various EOC, for a one-dimensional structure with a single damage path and for a structure representative of an aircraft fuselage, with curvature and multiple stiffening elements, subjected to a complex load state, with multiple damage paths. The effects of the EOC were studied in detail in the one-dimensional structure and generalized to the fuselage, demonstrating that the method is independent of the structural configuration and of the type of sensors used for GLW data acquisition. Moreover, this methodology can be used for the simultaneous compensation of a variety of measurable EOC that affect guided elastic wave data acquisition. The main result of this thesis, among others, is the CT-ANN methodology for compensating EOC in SHM techniques based on guided elastic waves for damage diagnosis.
ABSTRACT One of the open problems in implementing Structural Health Monitoring techniques based on guided elastic waves in real aircraft structures in operation is the influence of the environmental and operational conditions (EOC) on the damage diagnosis problem. This thesis deals with the compensation of these environmental and operational effects, specifically temperature and external loading, by using the Chirplet Transform together with Artificial Neural Networks. It is well known that the guided elastic waveform is affected by the appearance of damage (what is known as the damage-sensitive feature or DSF). The DSF is also modified by the temperature and by the load applied to the structure. The EOC cause variations in the acquired data (DSF) and lead to mistakes in damage diagnosis algorithms, because the waveform changes due to EOC variations are of the same order as those due to damage. It is difficult to separate the two effects in order to avoid damage diagnosis mistakes. Therefore it is necessary to quantify and compensate for the effect of the EOC on the GLW forms. There are several approaches to compensating for EOC effects, such as Optimal Baseline Selection (OBS) or Baseline Signal Stretching (BSS). Usually, they are used for temperature compensation. The new method proposed here combines experimental data analysis, as in the OBS method, with Artificial Neural Network (ANN) models that replace the physical modelling required by the BSS method. The experimental data analysis is based on applying the Chirplet Transform (CT) to extract the EOC signature on the DSF. The information obtained under varying EOC is used to train an ANN. The ANN then acts as a baseline interpolator for the undamaged structure, generating reference information at any EOC. By comparing real measurements of the DSF against the ANN simulated values, the damage signature appears clearly in the DSF, enabling an accurate damage diagnosis. This scheme has been applied over a range of EOC to a one-dimensional structure containing a single damage path and to a two-dimensional real fuselage structure with stiffener elements and multiple damage paths. The EOC effects tested in the one-dimensional structure have been generalized to the fuselage, showing that the method is independent of the structural arrangement and of the type of sensors used for GLW data acquisition. Moreover, it can be used for the simultaneous compensation of a variety of measurable EOC that affect guided wave data acquisition. The main result of this thesis, among others, is the CT-ANN methodology for the compensation of EOC in GLW-based SHM techniques for damage diagnosis.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with a graphical representation of the data and a pattern recognition discriminant function. The technique also yields a distances frequency distribution that might be useful in detecting clustering in the data or for estimating parameters useful in discriminating between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. The thesis offers three distinct contributions to cluster analysis and pattern recognition techniques. The first contribution is the introduction of a transformation function in the technique of nonlinear mapping. The second contribution is the use of the distances frequency distribution instead of the distances time-sequence in nonlinear mapping. The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of the nonlinear mapping technique. The third chapter describes the first development step in the technique of nonlinear mapping, namely the introduction of the "transformation function". The fourth chapter describes the second development step of the nonlinear mapping technique: the use of the distances frequency distribution instead of the distances time-sequence. The chapter also includes the formulation of the new generalised and normalised error function. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program for cluster analysis and pattern recognition that integrates all the new features.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, UPDRS is determined by the subjective clinical evaluation of the patient's ability to adequately cope with a range of tasks. In this study, we extend recent findings that UPDRS can be objectively assessed to clinically useful accuracy using simple, self-administered speech tests, without requiring the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients, recruited to a six-month, multi-centre trial) and propose a number of novel, nonlinear signal processing algorithms which reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, which is fed into non-parametric regression and classification algorithms, mapping the signal processing algorithm outputs to UPDRS. We demonstrate rapid, accurate replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points difference from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials into novel PD treatments.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In this paper, we investigate the secrecy performance of an energy harvesting relay system, where a legitimate source communicates with a legitimate destination via the assistance of multiple trusted relays. In the considered system, the source and relays deploy the time-switching-based radio frequency energy harvesting technique to harvest energy from a multi-antenna beacon. Different antenna selection and relay selection schemes are applied to enhance the security of the system. Specifically, two relay selection schemes based on the partial and full knowledge of channel state information, i.e., optimal relay selection and partial relay selection, and two antenna selection schemes for harvesting energy at source and relays, i.e., maximizing energy harvesting channel for the source and maximizing energy harvesting channel for the selected relay, are proposed. The exact and asymptotic expressions of secrecy outage probability in these schemes are derived. We demonstrate that applying relay selection approaches in the considered energy harvesting system can enhance the security performance. In particular, optimal relay selection scheme outperforms partial relay selection scheme and achieves full secrecy diversity order, regardless of energy harvesting scenarios.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper deals with the problem of using data mining models in a real-world situation where the user cannot provide all the inputs with which the predictive model was built. A learning system framework, Query Based Learning System (QBLS), is developed to improve the performance of predictive models in practice, where not all inputs are available for querying the system. An automatic feature selection algorithm called Query Based Feature Selection (QBFS) is developed for selecting features to obtain a balance between the relative minimum subset of features and the relative maximum classification accuracy. The performance of the QBLS system and the QBFS algorithm is successfully demonstrated with a real-world application.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in the literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences: the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area underneath the ROC curve (A′) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A′ of 96.11%. This result suggests that the small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
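
For readers unfamiliar with the A′ metric reported above, the following minimal sketch computes the area under the ROC curve as the fraction of positive/negative pairs that the detector ranks correctly, with ties counted as half. The function name `auc` and the toy scores and labels are hypothetical and unrelated to the Cohn-Kanade experiments.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector outputs for one action unit (1 = AU present).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples. It is used here only because it makes the definition explicit; a sort-based rank computation gives the same value more cheaply.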