864 resultados para Feature selection algorithm


Relevância:

90.00% 90.00%

Publicador:

Resumo:

As an immune-inspired algorithm, the Dendritic Cell Algorithm (DCA), produces promising performance in the field of anomaly detection. This paper presents the application of the DCA to a standard data set, the KDD 99 data set. The results of different implementation versions of the DCA, including antigen multiplier and moving time windows, are reported. The real-valued Negative Selection Algorithm (NSA) using constant-sized detectors and the C4.5 decision tree algorithm are used, to conduct a baseline comparison. The results suggest that the DCA is applicable to KDD 99 data set, and the antigen multiplier and moving time windows have the same effect on the DCA for this particular data set. The real-valued NSA with contant-sized detectors is not applicable to the data set. And the C4.5 decision tree algorithm provides a benchmark of the classification performance for this data set.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The ability to use Software Defined Radio (SDR) in the civilian mobile applications will make it possible for the next generation of mobile devices to handle multi-standard personal wireless devices and ubiquitous wireless devices. The original military standard created many beneficial characteristics for SDR, but resulted in a number of disadvantages as well. Many challenges in commercializing SDR are still the subject of interest in the software radio research community. Four main issues that have been already addressed are performance, size, weight, and power. This investigation presents an in-depth study of SDR inter-components communications in terms of total link delay related to the number of components and packet sizes in systems based on Software Communication Architecture (SCA). The study is based on the investigation of the controlled environment platform. Results suggest that the total link delay does not linearly increase with the number of components and the packet sizes. The closed form expression of the delay was modeled using a logistic function in terms of the number of components and packet sizes. The model performed well when the number of components was large. Based upon the mobility applications, energy consumption has become one of the most crucial limitations. SDR will not only provide flexibility of multi-protocol support, but this desirable feature will also bring a choice of mobile protocols. Having such a variety of choices available creates a problem in the selection of the most appropriate protocol to transmit. An investigation in a real-time algorithm to optimize energy efficiency was also performed. Communication energy models were used including switching estimation to develop a waveform selection algorithm. Simulations were performed to validate the concept.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The continuous growth of peer-to-peer networks has made them responsible for a considerable portion of the current Internet traffic. For this reason, improvements in P2P network resources usage are of central importance. One effective approach for addressing this issue is the deployment of locality algorithms, which allow the system to optimize the peers` selection policy for different network situations and, thus, maximize performance. To date, several locality algorithms have been proposed for use in P2P networks. However, they usually adopt heterogeneous criteria for measuring the proximity between peers, which hinders a coherent comparison between the different solutions. In this paper, we develop a thoroughly review of popular locality algorithms, based on three main characteristics: the adopted network architecture, distance metric, and resulting peer selection algorithm. As result of this study, we propose a novel and generic taxonomy for locality algorithms in peer-to-peer networks, aiming to enable a better and more coherent evaluation of any individual locality algorithm.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been observed in the literature so far. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. In this paper, feature selection techniques are firstly described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms: support vector machine and probability classifier are chosen to be the spike occurrence predictors and are discussed in details. Realistic market data are used to test the proposed model with promising results.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertação para obtenção do grau de Mestre em Engenharia Informática

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia de Eletrónica e Telecomunicações

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Locomotor tasks characterization plays an important role in trying to improve the quality of life of a growing elderly population. This paper focuses on this matter by trying to characterize the locomotion of two population groups with different functional fitness levels (high or low) while executing three different tasks-gait, stair ascent and stair descent. Features were extracted from gait data, and feature selection methods were used in order to get the set of features that allow differentiation between functional fitness level. Unsupervised learning was used to validate the sets obtained and, ultimately, indicated that it is possible to distinguish the two population groups. The sets of best discriminate features for each task are identified and thoroughly analysed. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the field of appearance-based robot localization, the mainstream approach uses a quantized representation of local image features. An alternative strategy is the exploitation of raw feature descriptors, thus avoiding approximations due to quantization. In this work, the quantized and non-quantized representations are compared with respect to their discriminativity, in the context of the robot global localization problem. Having demonstrated the advantages of the non-quantized representation, the paper proposes mechanisms to reduce the computational burden this approach would carry, when applied in its simplest form. This reduction is achieved through a hierarchical strategy which gradually discards candidate locations and by exploring two simplifying assumptions about the training data. The potential of the non-quantized representation is exploited by resorting to the entropy-discriminativity relation. The idea behind this approach is that the non-quantized representation facilitates the assessment of the distinctiveness of features, through the entropy measure. Building on this finding, the robustness of the localization system is enhanced by modulating the importance of features according to the entropy measure. Experimental results support the effectiveness of this approach, as well as the validity of the proposed computation reduction methods.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertation to Obtain Master Degree in Biomedical Engineering

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Botnets are a group of computers infected with a specific sub-set of a malware family and controlled by one individual, called botmaster. This kind of networks are used not only, but also for virtual extorsion, spam campaigns and identity theft. They implement different types of evasion techniques that make it harder for one to group and detect botnet traffic. This thesis introduces one methodology, called CONDENSER, that outputs clusters through a self-organizing map and that identify domain names generated by an unknown pseudo-random seed that is known by the botnet herder(s). Aditionally DNS Crawler is proposed, this system saves historic DNS data for fast-flux and double fastflux detection, and is used to identify live C&Cs IPs used by real botnets. A program, called CHEWER, was developed to automate the calculation of the SVM parameters and features that better perform against the available domain names associated with DGAs. CONDENSER and DNS Crawler were developed with scalability in mind so the detection of fast-flux and double fast-flux networks become faster. We used a SVM for the DGA classififer, selecting a total of 11 attributes and achieving a Precision of 77,9% and a F-Measure of 83,2%. The feature selection method identified the 3 most significant attributes of the total set of attributes. For clustering, a Self-Organizing Map was used on a total of 81 attributes. The conclusions of this thesis were accepted in Botconf through a submited article. Botconf is known conferênce for research, mitigation and discovery of botnets tailled for the industry, where is presented current work and research. This conference is known for having security and anti-virus companies, law enforcement agencies and researchers.