791 resultados para Rule-Based Classification
Resumo:
Internet traffic classification is a relevant and mature research field, anyway of growing importance and with still open technical challenges, also due to the pervasive presence of Internet-connected devices into everyday life. We claim the need for innovative traffic classification solutions capable of being lightweight, of adopting a domain-based approach, of not only concentrating on application-level protocol categorization but also classifying Internet traffic by subject. To this purpose, this paper originally proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, with the possibility to be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word-embeddings from a mixed dataset composed by domain names and natural language corpuses in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience that demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $ 3860 year. Reported experimental results about Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names do have a relation with the kind of traffic directed towards them, therefore using specifically trained word embeddings we are able to classify them in customizable categories. We also show that training word embeddings on larger natural language corpuses leads improvements in terms of precision up to 180%.
Resumo:
Includes index.
Resumo:
Contribution from Bureau of Entomology and Plant Quarantine.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Terrain classification based on markov random field texture modeling of SAR and SAR coherency images
Resumo:
This paper presents a neural network based technique for the classification of segments of road images into cracks and normal images. The density and histogram features are extracted. The features are passed to a neural network for the classification of images into images with and without cracks. Once images are classified into cracks and non-cracks, they are passed to another neural network for the classification of a crack type after segmentation. Some experiments were conducted and promising results were obtained. The selected results and a comparative analysis are included in this paper.
Resumo:
Task classification is introduced as a method for the evaluation of monitoring behaviour in different task situations. On the basis of an analysis of different monitoring tasks, a task classification system comprising four task 'dimensions' is proposed. The perceptual speed and flexibility of closure categories, which are identified with signal discrimination type, comprise the principal dimension in this taxonomy, the others being sense modality, the time course of events, and source complexity. It is also proposed that decision theory provides the most complete method for the analysis of performance in monitoring tasks. Several different aspects of decision theory in relation to monitoring behaviour are described. A method is also outlined whereby both accuracy and latency measures of performance may be analysed within the same decision theory framework. Eight experiments and an organizational study are reported. The results show that a distinction can be made between the perceptual efficiency (sensitivity) of a monitor and his criterial level of response, and that in most monitoring situations, there is no decrement in efficiency over the work period, but an increase in the strictness of the response criterion. The range of tasks exhibiting either or both of these performance trends can be specified within the task classification system. In particular, it is shown that a sensitivity decrement is only obtained for 'speed' tasks with a high stimulation rate. A distinctive feature of 'speed' tasks is that target detection requires the discrimination of a change in a stimulus relative to preceding stimuli, whereas in 'closure' tasks, the information required for the discrimination of targets is presented at the same point In time. In the final study, the specification of tasks yielding sensitivity decrements is shown to be consistent with a task classification analysis of the monitoring literature. It is also demonstrated that the signal type dimension has a major influence on the consistency of individual differences in performance in different tasks. The results provide an empirical validation for the 'speed' and 'closure' categories, and suggest that individual differences are not completely task specific but are dependent on the demands common to different tasks. Task classification is therefore shovn to enable improved generalizations to be made of the factors affecting 1) performance trends over time, and 2) the consistencv of performance in different tasks. A decision theory analysis of response latencies is shown to support the view that criterion shifts are obtained in some tasks, while sensitivity shifts are obtained in others. The results of a psychophysiological study also suggest that evoked potential latency measures may provide temporal correlates of criterion shifts in monitoring tasks. Among other results, the finding that the latencies of negative responses do not increase over time is taken to invalidate arousal-based theories of performance trends over a work period. An interpretation in terms of expectancy, however, provides a more reliable explanation of criterion shifts. Although the mechanisms underlying the sensitivity decrement are not completely clear, the results rule out 'unitary' theories such as observing response and coupling theory. It is suggested that an interpretation in terms of the memory data limitations on information processing provides the most parsimonious explanation of all the results in the literature relating to sensitivity decrement. Task classification therefore enables the refinement and selection of theories of monitoring behaviour in terms of their reliability in generalizing predictions to a wide range of tasks. It is thus concluded that task classification and decision theory provide a reliable basis for the assessment and analysis of monitoring behaviour in different task situations.
Resumo:
The analysis of bacterial genomes for epidemiological purposes often results in the production of a banding profile of DNA fragments characteristic of the genome under investigation. These may be produced using various methods, many of which involve the cutting or amplification of DNA into defined and reproducible characteristic fragments. It is frequently of interest to enquire whether the bacterial isolates are naturally classifiable into distinct groups based on their DNA profiles. A major problem with this approach is whether classification or clustering of the data is even appropriate. It is always possible to classify such data but it does not follow that the strains they represent are ‘actually’ classifiable into well-defined separate parts. Hence, the act of classification does not in itself answer the question: do the strains consist of a number of different distinct groups or species or do they merge imperceptibly into one another because DNA profiles vary continuously? Nevertheless, we may still wish to classify the data for ‘convenience’ even though strains may vary continuously, and such a classification has been called a ‘dissection’. This Statnote discusses the use of classificatory methods in analyzing the DNA profiles from a sample of bacterial isolates.
Resumo:
Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.
Resumo:
This paper presents an approach to development of intelligent search system and automatic document classification and cataloging tools for CASE-system based on metadata. The described method uses advantages of ontology approach and traditional approach based on keywords. The method has powerful intelligent means and it can be integrated with existing document search systems.