864 results for Feature selection algorithm
Abstract:
In this paper, we present a system for aircraft structural health monitoring based on artificial immune systems with negative selection. Inspired by a biological process, the principle of self/non-self discrimination is used to identify and characterize signs of structural failure. The main application of this method is to assist in the inspection of aircraft structures by detecting and characterizing flaws and by supporting decision making in order to avoid disasters. To test the method, we propose a model of an aluminum beam. The results show that the method is robust and accurate.
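The negative selection principle can be illustrated with a minimal real-valued sketch. It assumes damage-sensitive features already scaled to the unit hypercube and a Euclidean matching radius; the function names, the radius, and the random detector generation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def train_detectors(self_samples, n_detectors, radius, rng, max_tries=100_000):
    """Negative selection: keep random candidates that match no 'self' (healthy) sample.

    Assumes features are scaled to [0, 1]^d (hypothetical preprocessing step).
    """
    self_samples = np.asarray(self_samples, dtype=float)
    dim = self_samples.shape[1]
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = rng.uniform(0.0, 1.0, size=dim)
        # Distance from the candidate to its nearest healthy sample
        nearest_self = np.min(np.linalg.norm(self_samples - candidate, axis=1))
        if nearest_self > radius:  # candidate does not recognize "self", so keep it
            detectors.append(candidate)
    return np.array(detectors).reshape(-1, dim)


def is_non_self(sample, detectors, radius):
    """Flag a sample as 'non-self' (possible damage) if any detector matches it."""
    if len(detectors) == 0:
        return False
    dists = np.linalg.norm(detectors - np.asarray(sample, dtype=float), axis=1)
    return bool(np.min(dists) <= radius)


rng = np.random.default_rng(0)
# Hypothetical healthy-state features clustered around (0.2, 0.2)
healthy = 0.2 + 0.02 * rng.standard_normal((50, 2))
detectors = train_detectors(healthy, n_detectors=1000, radius=0.1, rng=rng)
print(is_non_self(healthy[0], detectors, 0.1))  # a healthy sample: never flagged
print(is_non_self([0.9, 0.9], detectors, 0.1))  # far from the healthy cluster
```

By construction, no detector lies within the radius of a training "self" sample, so healthy training data are never flagged; how well unseen damage is covered depends on the number of detectors and the radius.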
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes in isolation. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of a system and to study how the components of the network interact with each other and keep their functions stable. In general, however, there are not enough data to recover GNs accurately from expression levels alone; this leads to the curse of dimensionality, in which the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate other biological data, rather than only expression profiles, into the inference process. The use of various types of biological information in inference methods has increased significantly, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy attractive is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes from the expression data. Although several data-integration studies have improved the performance of network inference methods, the actual contribution of each type of biological information to the improvement is not clear.
Methods: We propose a methodology for including biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluate and compare the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG, and (d) KEGG+GO.
Results and conclusions: This work presents a first comparison of the gain from using prior biological information in GN inference for a eukaryotic organism (P. falciparum).
Our results indicate that information based on direct interactions can produce a larger gain than data about less specific relationships, such as GO or KEGG. As expected, the results also show that biological information is very important for improving the inference. We also compared the gain for inference of the global network with the gain for the hubs only. The results indicate that biological information can improve the identification of the most highly connected proteins.
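The idea of measuring the gain from prior knowledge can be sketched with a deliberately simple scoring rule: a convex combination of absolute Pearson correlation (from expression data) and a prior edge-confidence matrix, evaluated by precision among the top-k ranked gene pairs. The weight `lam`, the correlation-based score, the precision metric, and the `prior_gain` definition are all assumptions for illustration; the inference algorithm used in the paper is not specified here.

```python
import numpy as np


def edge_scores(expr, prior, lam):
    """Blend expression-based and prior-based evidence for every gene pair.

    expr:  samples x genes expression matrix
    prior: genes x genes matrix of prior confidences in [0, 1] (e.g. from PPI data)
    lam:   weight of the prior; 0 = expression only, 1 = prior only
    """
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(corr, 0.0)  # ignore self-loops
    return (1.0 - lam) * corr + lam * np.asarray(prior, dtype=float)


def precision_at_k(scores, truth, k):
    """Fraction of the k highest-scoring gene pairs that are true edges."""
    iu = np.triu_indices_from(scores, k=1)  # count each undirected pair once
    top = np.argsort(scores[iu])[::-1][:k]
    return float(np.asarray(truth)[iu][top].mean())


def prior_gain(expr, prior, truth, k, lam=0.5):
    """Hypothetical 'gain': precision with the prior minus precision without it."""
    with_prior = precision_at_k(edge_scores(expr, prior, lam), truth, k)
    without = precision_at_k(edge_scores(expr, prior, 0.0), truth, k)
    return with_prior - without


rng = np.random.default_rng(1)
truth = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (1, 2), (3, 4)]:
    truth[a, b] = truth[b, a] = 1
expr = rng.standard_normal((10, 6))  # few, noisy samples: the small-n regime
print(prior_gain(expr, truth, truth, k=3, lam=0.5))
```

Here the prior is the true network itself, which is an idealized upper bound; a real prior (PPI, Rosetta stone, KEGG, GO) would be incomplete and noisy.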
Abstract:
Material tracking is becoming increasingly important in the metal industry: a metal strip must pass through a defined program during the manufacturing process, and only then is the quality of the end product guaranteed. Current practice is to assign each metal strip a number, which is then marked on the strip. When strips are stored for days between two production steps, this method proves error-prone: the markings can, for example, be lost, mixed up, misread, or become illegible. In 2007, iba AG filed a patent for identifying metal strips by their thickness profile (Anhaus [3]); this makes it possible to establish the identity of a metal strip beyond doubt, so reliable material tracking became possible.
It turned out, however, that the thickness profiles, which are affected by measurement errors and can be regarded as long time series, cannot be compared successfully with existing methods (e.g., L2-distance minimization or Dynamic Time Warping).
This work presents an efficient feature-based algorithm for comparing two time series. It is robust to noise and measurement dropouts, and it is invariant to coordinate transformations of the time series such as scaling and translation. Comparisons with partial time series are also possible. Our framework combines high accuracy with high speed: more than 99.5% of the queries to our test database, which consists of real profiles, are answered correctly. With several hundred time-series comparisons per second, it is about 10 times faster than the established methods of time-series analysis, which, moreover, cannot answer more than 90% of the queries correctly. The algorithm has proven suitable for industrial use.
iba AG uses it in a thickness-profile-based monitoring system for material tracking that is unique worldwide and is already in successful use in its first steel and aluminum rolling mills.
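The feature-based algorithm itself is not described in enough detail to reproduce. As a stand-in that illustrates two of the stated requirements (invariance to scaling and translation of the thickness values, and matching a partial series against a longer one), the sketch below uses a plain z-normalized sliding-window Euclidean distance. This is a different, well-known baseline technique, not the iba AG algorithm; it handles only value-axis scaling and offset, and it is not robust to measurement dropouts.

```python
import numpy as np


def znorm(x):
    """Remove offset and scale so that a*x + b matches x (for a > 0)."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def best_match(series, query):
    """Return (start index, distance) of the window of `series` closest to `query`."""
    series = np.asarray(series, dtype=float)
    q = znorm(query)
    m = len(q)
    dists = np.array([
        np.linalg.norm(znorm(series[i:i + m]) - q)
        for i in range(len(series) - m + 1)
    ])
    i = int(np.argmin(dists))
    return i, float(dists[i])


rng = np.random.default_rng(0)
profile = np.cumsum(rng.normal(size=500))  # synthetic stand-in for a thickness profile
partial = 3.0 * profile[120:170] + 7.0     # rescaled, shifted excerpt of it
print(best_match(profile, partial))
```

The brute-force scan costs O(n·m) per query, which is one reason dedicated feature-based methods are attractive for long profiles.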
Abstract:
The aim of this paper is to evaluate the diagnostic contribution of various types of texture features to the discrimination of hepatic tissue in abdominal non-enhanced computed tomography (CT) images. Regions of interest (ROIs) corresponding to the classes normal liver, cyst, hemangioma, and hepatocellular carcinoma were drawn by an experienced radiologist. For each ROI, five distinct sets of texture features are extracted using First Order Statistics (FOS), the Spatial Gray Level Dependence Matrix (SGLDM), the Gray Level Difference Method (GLDM), Laws' Texture Energy Measures (TEM), and Fractal Dimension Measurements (FDM). To evaluate the ability of the texture features to discriminate the various types of hepatic tissue, each feature set, or its reduced version after genetic-algorithm-based feature selection, was fed to a feed-forward neural network (NN) classifier. For each NN, the area under the receiver operating characteristic (ROC) curve (Az) was calculated for all one-vs-all discriminations of hepatic tissue. Additionally, the total Az for the multi-class discrimination task was estimated. The results show that features derived from FOS discriminate hepatic tissue better than the other texture features (total Az: 0.802 ± 0.083).
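First Order Statistics depend only on the gray-level histogram of an ROI, not on the spatial arrangement of its pixels. A minimal sketch, assuming 8-bit gray levels and a typical feature set (mean, variance, skewness, kurtosis, entropy); the paper's exact list of FOS features is not given, so this selection is an assumption.

```python
import numpy as np


def first_order_stats(roi, levels=256):
    """Histogram-based (first-order) texture features of an ROI."""
    x = np.asarray(roi, dtype=float).ravel()
    mean = x.mean()
    var = x.var()
    sd = np.sqrt(var)
    # Standardized third and fourth moments; defined as 0 for a flat ROI
    skew = ((x - mean) ** 3).mean() / sd ** 3 if sd > 0 else 0.0
    kurt = ((x - mean) ** 4).mean() / sd ** 4 if sd > 0 else 0.0
    hist, _ = np.histogram(x, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    entropy = float(-np.sum(p * np.log2(p)))
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}


roi = np.array([[0, 255]] * 8)  # synthetic two-level ROI
print(first_order_stats(roi))
```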
Abstract:
In this paper, a computer-aided diagnostic (CAD) system for classifying hepatic lesions in computed tomography (CT) images is presented. Regions of interest (ROIs) taken from nonenhanced CT images of normal liver, hepatic cysts, hemangiomas, and hepatocellular carcinomas were used as input to the system. The proposed system consists of two modules: a feature extraction module and a classification module. The feature extraction module calculates the average gray level and 48 texture characteristics derived from the spatial gray-level co-occurrence matrices of the ROIs. The classification module consists of three sequentially placed feed-forward neural networks (NNs). The first NN classifies regions as normal or pathological liver. The second NN characterizes the pathological liver regions as cyst or "other disease." The third NN classifies "other disease" as hemangioma or hepatocellular carcinoma. Three feature selection techniques were applied to each individual NN: sequential forward selection, sequential floating forward selection, and a genetic algorithm for feature selection. The comparative study of these dimensionality reduction methods shows that genetic algorithms yield lower-dimensional feature vectors and improved classification performance.
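Sequential forward selection, the simplest of the three techniques compared, greedily adds whichever feature most improves a criterion. A generic sketch, where `score` is a hypothetical callback standing in for the classification performance of an NN trained on the candidate subset.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    k: int,
) -> List[int]:
    """Greedily grow a feature subset of size k, one best feature at a time."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one giving the best subset score
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


weights = [0.1, 0.9, 0.5, 0.0]  # toy additive criterion
print(sequential_forward_selection(4, lambda s: sum(weights[i] for i in s), k=2))  # -> [1, 2]
```

Once a feature is added it is never removed (the "nesting effect"); sequential floating forward selection adds conditional backward steps to address this.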
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a large number of SNPs simultaneously raises a multiple-testing problem and produces false-positive results. Although this problem can be addressed effectively with approaches such as Bonferroni correction, permutation testing, and false discovery rate control, the joint effects of several genes, each with a weak effect, may still go undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs across the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of SNP features far exceeds the number of observations.
In this study, we take two steps to achieve this goal. First, we selected 1000 SNPs with an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification approach based on the sequential information bottleneck method, wrapped inside different search algorithms, to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to examine the relationship between each SNP and disease from another point of view.
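The two-step strategy (a cheap filter to shortlist SNPs, then a wrapper that evaluates small subsets exhaustively) can be sketched as follows. `filter_score` and `subset_score` are hypothetical callbacks; in the study they would correspond to a per-SNP filter criterion and a classifier-based evaluation of a subset. Exhaustive search is only feasible because the shortlist and the maximum subset size are small.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def filter_then_exhaustive(
    n_features: int,
    filter_score: Callable[[int], float],
    subset_score: Callable[[Sequence[int]], float],
    top_n: int,
    max_size: int,
) -> Tuple[Tuple[int, ...], float]:
    """Step 1: keep the top_n features by filter score.
    Step 2: evaluate every subset of size 1..max_size of that shortlist."""
    shortlist = sorted(range(n_features), key=filter_score, reverse=True)[:top_n]
    best, best_score = (), float("-inf")
    for size in range(1, max_size + 1):
        for subset in combinations(shortlist, size):
            s = subset_score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score


marginal = [0.1, 0.9, 0.8, 0.2, 0.7]  # toy per-SNP filter scores
# Toy wrapper criterion in which SNPs 1 and 4 act jointly
joint = lambda s: 1.0 if set(s) == {1, 4} else 0.1 * len(s)
print(filter_then_exhaustive(5, marginal.__getitem__, joint, top_n=3, max_size=3))
```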
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) works better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets (one, two, or three SNPs) based on the best 100 composite 2-SNPs can find an optimal subset, and that adding more SNPs through heuristic algorithms does not always improve the performance of the SNP subsets. Although sequential forward floating selection can be applied to avoid the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from exploring more complex subset states.
Our results also indicate that, unlike classification accuracy, HMSS can be used as a criterion for evaluating the classification ability of a function on imbalanced data without modifying the original dataset. The four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be used to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in these studies.
Our results show that the best test probability-HMSS of sIB for predicting CVD, stroke, CAD, and psoriasis is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls reaches 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies when the test accuracy among cases is required to be at least 0.4. Conversely, the highest test accuracy of sIB for diagnosing disease among cases reaches 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies when the test accuracy among controls is required to be at least 0.4.
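HMSS is the harmonic mean of sensitivity and specificity. The sketch below computes it from confusion-matrix counts and shows why it suits imbalanced data: a classifier that always predicts the majority class can reach high accuracy, but its sensitivity is zero, and so is its HMSS. The counts are made up for illustration.

```python
def hmss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity and specificity."""
    sens = tp / (tp + fn)  # true positive rate among cases
    spec = tn / (tn + fp)  # true negative rate among controls
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced test set: 5 cases, 95 controls,
# and a classifier that always predicts "control"
print(accuracy(tp=0, fn=5, tn=95, fp=0))  # 0.95
print(hmss(tp=0, fn=5, tn=95, fp=0))      # 0.0
```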
A further genome-wide association analysis using the chi-square test detected no significant SNPs at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. In the WTCCC study, only two significant SNPs associated with CAD were detected. In the genome-wide study of psoriasis, most of the top 20 SNP markers with high classification accuracy are also significantly associated with the disease by the chi-square test at the cut-off value 1.11E-07.
Although our classification methods achieve high accuracy in this study, a complete description of these classification results (95% confidence intervals or statistical tests of differences) would require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high predictive ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
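The per-SNP association test can be sketched with the standard library alone, assuming a 2 x 3 contingency table of cases/controls by genotype (the abstract does not say whether genotype or allele counts were used). With (2 - 1)(3 - 1) = 2 degrees of freedom, the chi-square survival function has the closed form exp(-x/2), so no statistics package is needed. The counts are made up; comparing the p-value against a genome-wide cut-off such as 9.09451E-08 mirrors the procedure described above.

```python
import math
from typing import List, Tuple


def chi_square_2x3(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square test for a 2 x 3 (status x genotype) table.

    Returns (statistic, p_value). Assumes every row and column total is > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, for which P(X > x) = exp(-x / 2)
    p_value = math.exp(-stat / 2)
    return stat, p_value


cases = [50, 30, 20]     # hypothetical genotype counts: AA, Aa, aa
controls = [20, 30, 50]
stat, p = chi_square_2x3([cases, controls])
print(round(stat, 3), p < 9.09451e-08)  # -> 25.714 False
```

Even this clearly unbalanced table (p of roughly 2.6E-06) fails a genome-wide cut-off, which illustrates how strict such thresholds are.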
Abstract:
Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS.
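How a dynamically weighted ensemble of stored models can come to favor the concept that is currently recurring can be sketched with a weighted-majority update: each stored model's weight is multiplied by a factor `beta` whenever it misclassifies an incoming example. This is a generic stand-in; the abstract does not specify MReC-DFS's weighting rule, and the models below are toy functions.

```python
from typing import Callable, Dict, Hashable, List, Sequence


class WeightedEnsemble:
    """Weighted-majority voting over stored models (a generic stand-in)."""

    def __init__(self, models: Sequence[Callable[[object], Hashable]], beta: float = 0.5):
        self.models = list(models)
        self.weights: List[float] = [1.0] * len(self.models)
        self.beta = beta  # penalty factor applied on each mistake

    def predict(self, x):
        votes: Dict[Hashable, float] = {}
        for model, w in zip(self.models, self.weights):
            y = model(x)
            votes[y] = votes.get(y, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, x, y_true) -> None:
        # Down-weight every stored model that got this example wrong
        for i, model in enumerate(self.models):
            if model(x) != y_true:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


models = [lambda x: 0, lambda x: 1, lambda x: x % 2]  # toy "stored concept" models
ens = WeightedEnsemble(models, beta=0.5)
for x in range(10):  # stream whose current concept is y = x % 2
    ens.update(x, x % 2)
print(ens.predict(3))  # -> 1
```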
Abstract:
On-line partial discharge (PD) measurements have become a common technique for assessing the insulation condition of installed high-voltage (HV) insulated cables. When on-line tests are performed in noisy environments, or when more than one source of pulse-shaped signals is present in a cable system, accurate diagnosis is difficult. In these cases, an adequate choice of non-conventional measuring technique and the implementation of effective signal processing tools are essential for a correct evaluation of insulation degradation. After a specific noise rejection filter is applied, many signals can be identified as potential PD pulses; a classification tool is therefore required to discriminate between the PD sources involved. This paper proposes an efficient method for classifying PD signals and pulse-type noise interferences measured in power cables with high-frequency current transformer (HFCT) sensors. A signal feature generation algorithm calculates representative parameters associated with the waveform of each acquired pulse, so that the pulses can be separated into different clusters. The efficiency of the proposed clustering technique is demonstrated with an example in which three different PD sources and several pulse-shaped interferences were measured simultaneously in a cable system with an HFCT.
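The abstract does not name the parameters produced by its signal feature generation algorithm. A common choice in the PD literature is the pair of equivalent time length T and equivalent bandwidth W of each pulse, computed from its energy distribution in time and frequency; the sketch below uses this pair with a basic k-means purely as an illustrative assumption. The sampling rate and the synthetic Gaussian pulses are made up.

```python
import numpy as np


def tw_features(pulse, fs):
    """Equivalent time length T and bandwidth W of one pulse (energy-weighted spreads)."""
    s = np.asarray(pulse, dtype=float)
    t = np.arange(len(s)) / fs
    energy = np.sum(s ** 2)
    t0 = np.sum(t * s ** 2) / energy  # energy centroid in time
    T = np.sqrt(np.sum((t - t0) ** 2 * s ** 2) / energy)
    power = np.abs(np.fft.rfft(s)) ** 2
    f = np.fft.rfftfreq(len(s), d=1.0 / fs)
    W = np.sqrt(np.sum(f ** 2 * power) / np.sum(power))
    return T, W


def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


fs = 100e6  # hypothetical 100 MS/s acquisition
t = np.arange(512)
narrow = [a * np.exp(-((t - 256) / 2.0) ** 2) for a in (1, 2, 3, 4, 5)]
wide = [a * np.exp(-((t - 256) / 20.0) ** 2) for a in (1, 2, 3, 4, 5)]
features = np.array([tw_features(p, fs) for p in narrow + wide])
# Standardize T and W so that neither dominates the Euclidean distance
z = (features - features.mean(axis=0)) / features.std(axis=0)
print(kmeans(z, k=2))  # -> [0 0 0 0 0 1 1 1 1 1]
```

Because T and W are normalized by pulse energy, pulses of the same shape but different amplitude map to the same point, so clusters reflect waveform shape rather than magnitude.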
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, the UPDRS is determined by subjective clinical evaluation of the patient's ability to cope adequately with a range of tasks. In this study, we extend recent findings that the UPDRS can be assessed objectively, to clinically useful accuracy, using simple self-administered speech tests that do not require the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients recruited to a six-month, multi-centre trial) and propose a number of novel nonlinear signal processing algorithms that reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, whose outputs are fed into non-parametric regression and classification algorithms that map them to the UPDRS. We demonstrate rapid replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, and accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials of novel PD treatments.
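As an example of the classical dysphonia measures on which such speech-based systems build, local jitter quantifies the cycle-to-cycle variation of the vocal fold vibration period. The abstract does not list which measures were used, so treat this choice as an illustrative assumption; the period values are synthetic.

```python
from typing import Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Mean absolute difference between consecutive periods divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Synthetic glottal periods in seconds (about 100 Hz with slight alternation)
print(round(local_jitter([0.0100, 0.0101, 0.0100, 0.0101, 0.0100]), 4))  # -> 0.01
```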
Abstract:
Feature selection is important in the medical field for many reasons. However, selecting important variables is difficult in the presence of censoring, a characteristic unique to survival data analysis. This paper proposes an approach that deals with the censoring problem in endovascular aortic repair survival data through Bayesian networks. The approach is embedded in a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches, such as ensemble artificial neural networks, to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches, namely the least absolute shrinkage and selection operator (LASSO) and Akaike information criterion (AIC) methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curve for each of the two datasets, which were collected separately from two centers in the United Kingdom. Furthermore, ensembles constructed with center 1 data improved the concordance index of the center 2 predictions compared with a model built with a single network. Although the final reduced model obtained with the neural networks and their ensembles is larger than those of the other methods, it outperformed them in both concordance index and sensitivity for center 2 prediction. This indicates that the reduced model is more powerful for cross-center prediction.
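The concordance index used above to compare models can be computed directly from survival times, event indicators, and predicted risks. A minimal sketch of Harrell's C, assuming that a higher risk score means earlier expected failure, that a pair is comparable only when the shorter time is an observed event, and that ties in risk count one half; ties in time are simply skipped here, which is one of several conventions.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs whose risk ordering matches survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]                   # synthetic follow-up times
events = [True, True, False, True]     # third subject is censored
print(concordance_index(times, events, [4, 3, 2, 1]))  # -> 1.0
```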
Abstract:
The ability to use Software Defined Radio (SDR) in civilian mobile applications will make it possible for the next generation of mobile devices to handle multi-standard personal wireless devices and ubiquitous wireless devices. The original military standard gave SDR many beneficial characteristics, but it also resulted in a number of disadvantages. Many challenges in commercializing SDR are still of interest to the software radio research community. Four main issues that have already been addressed are performance, size, weight, and power.
This investigation presents an in-depth study of SDR inter-component communication in terms of total link delay as a function of the number of components and the packet sizes, in systems based on the Software Communication Architecture (SCA). The study is based on experiments on a controlled-environment platform. Results suggest that the total link delay does not increase linearly with the number of components and the packet sizes. A closed-form expression for the delay was modeled using a logistic function of the number of components and the packet sizes. The model performed well when the number of components was large.
In mobile applications, energy consumption has become one of the most crucial limitations. SDR will not only provide the flexibility of multi-protocol support; this desirable feature will also bring a choice of mobile protocols. Having such a variety of choices creates the problem of selecting the most appropriate protocol for transmission. A real-time algorithm for optimizing energy efficiency was also investigated. Communication energy models, including switching estimation, were used to develop a waveform selection algorithm. Simulations were performed to validate the concept.
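The energy-aware waveform selection idea can be sketched as choosing the waveform with the lowest estimated energy for the next transmission, where switching away from the current waveform costs a fixed amount of energy. The per-bit energies, the switching cost, and the linear energy model are hypothetical; the abstract does not give the actual communication energy models.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Waveform:
    name: str
    energy_per_bit_j: float  # hypothetical transmit energy per bit, in joules


def select_waveform(waveforms: Sequence[Waveform], current: Optional[Waveform],
                    payload_bits: int, switch_cost_j: float) -> Waveform:
    """Pick the waveform minimizing transmit energy plus any switching energy."""
    def total_energy(w: Waveform) -> float:
        energy = w.energy_per_bit_j * payload_bits
        if current is not None and w.name != current.name:
            energy += switch_cost_j  # reconfiguration cost of changing waveform
        return energy
    return min(waveforms, key=total_energy)


a = Waveform("A", 2e-9)
b = Waveform("B", 1e-9)
# Small payload: switching to B is not worth it; large payload: it is
print(select_waveform([a, b], a, payload_bits=1_000, switch_cost_j=1e-5).name)      # -> A
print(select_waveform([a, b], a, payload_bits=1_000_000, switch_cost_j=1e-5).name)  # -> B
```

Including the switching term keeps the selector from oscillating between waveforms for short transmissions, where reconfiguration energy would dominate.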
Abstract:
Piotr Omenzetter and Simon Hoell's work within the Lloyd's Register Foundation Centre for Safety and Reliability Engineering at the University of Aberdeen is supported by Lloyd’s Register Foundation. The Foundation helps to protect life and property by supporting engineering-related education, public engagement and the application of research.