163 results for User classification
in CentAUR: Central Archive University of Reading - UK
Abstract:
An extensive set of machine learning and pattern classification techniques trained and tested on the KDD dataset failed to detect most of the user-to-root attacks. This paper aims to provide an approach for mitigating the negative aspects of that dataset, which led to low detection rates. A genetic algorithm is employed to implement rules for detecting various types of attacks. Rules are formed from the features of the dataset identified as the most important for each attack type. In this way we introduce a high level of generality and thus achieve high detection rates, while also greatly reducing the system training time. We then re-check the decisions of the user-to-root rules against the rules that detect other types of attacks. In this way we decrease the false-positive rate. The model was verified on KDD 99, demonstrating higher detection rates than those reported by the state of the art while maintaining a low false-positive rate.
Abstract:
We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM), suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Kuhn–Tucker (KKT) theorem is used to solve the dual optimization problem, which consists of simultaneously satisfying the smallest training error and the smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.
Abstract:
This paper reports the current state of work to simplify our previous model-based methods for visual tracking of vehicles for use in a real-time system intended to provide continuous monitoring and classification of traffic from a fixed camera on a busy multi-lane motorway. The main constraints of the system design were: (i) all low level processing to be carried out by low-cost auxiliary hardware, (ii) all 3-D reasoning to be carried out automatically off-line, at set-up time. The system developed uses three main stages: (i) pose and model hypothesis using 1-D templates, (ii) hypothesis tracking, and (iii) hypothesis verification, using 2-D templates. Stages (i) & (iii) have radically different computing performance and computational costs, and need to be carefully balanced for efficiency. Together, they provide an effective way to locate, track and classify vehicles.
Abstract:
Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.
Abstract:
Bloom-forming and toxin-producing cyanobacteria remain a persistent nuisance across the world. Modelling of cyanobacteria in freshwaters is an important tool for understanding their population dynamics and predicting bloom occurrence in lakes and rivers. In this paper existing key models of cyanobacteria are reviewed, evaluated and classified. Two major groups emerge: deterministic mathematical and artificial neural network models. Mathematical models can be further subcategorized into those concerned with impounded water bodies and those concerned with rivers. Most existing models focus on a single aspect such as the growth or transport mechanisms, but there are a few models which couple both.
Abstract:
The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work, three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types to these variables was tested using classification tree analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best-fitting tree. The final 9-node tree selected correctly classified 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria (66.7%) community types and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation point to the need for improved geomorphological mapping and suggest the presence of transitional communities between existing community types.
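As an illustration of the accuracy measures named in this abstract, the following minimal sketch computes overall accuracy and per-class user's accuracy from an error matrix. It assumes the common convention that rows hold the predicted (mapped) class and columns hold the class observed in the field. The function names and the matrix values are hypothetical and are not taken from the study.

```python
def overall_accuracy(matrix):
    """Fraction of all validation points that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def users_accuracy(matrix):
    """Per-class user's accuracy: correct points divided by the row total.

    Assumes rows are predicted classes and columns are field (reference)
    classes. A class with no predicted points yields NaN.
    """
    result = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        result.append(row[i] / row_total if row_total else float("nan"))
    return result


# Hypothetical 3-class error matrix (made-up counts, not the study's data).
example = [
    [5, 0, 0],
    [1, 4, 1],
    [2, 1, 6],
]

print(overall_accuracy(example))
print(users_accuracy(example))
```

User's accuracy answers the question a map reader cares about: given that the map shows a class at a location, how often is that class actually found there.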
Abstract:
This paper describes the user modeling component of EPIAIM, a consultation system for data analysis in epidemiology. The component is aimed at representing knowledge of concepts in the domain, so that their explanations can be adapted to user needs. The first part of the paper describes two studies aimed at analysing user requirements. The first one is a questionnaire study which examines the respondents' familiarity with concepts. The second one is an analysis of concept descriptions in textbooks and from expert epidemiologists, which examines how discourse strategies are tailored to the level of experience of the expected audience. The second part of the paper describes how the results of these studies have been used to design the user modeling component of EPIAIM. This module works in a two-step approach. In the first step, a few trigger questions allow the activation of a stereotype that includes a "body" and an "inference component". The body is the representation of the body of knowledge that a class of users is expected to know, along with the probability that the knowledge is known. In the inference component, the learning process of concepts is represented as a belief network. Hence, in the second step the belief network is used to refine the initial default information in the stereotype's body. This is done by asking a few questions on those concepts where it is uncertain whether or not they are known to the user, and propagating this new evidence to revise the whole situation. The system has been implemented on a workstation under UNIX. An example of functioning is presented, and advantages and limitations of the approach are discussed.
Abstract:
In recent years there has been a growing debate over whether or not standards should be produced for user system interfaces. Those in favor of standardization argue that standards in this area will result in more usable systems, while those against argue that standardization is neither practical nor desirable. The present paper reviews both sides of this debate in relation to expert systems. It argues that in many areas guidelines are more appropriate than standards for user interface design.
Abstract:
Context: Learning can be regarded as knowledge construction in which prior knowledge and experience serve as a basis for the learners to expand their knowledge base. Such a process of knowledge construction has to take place continuously in order to enhance the learners’ competence in a competitive working environment. As information consumers, individual users demand personalised information provision which meets their own specific purposes, goals, and expectations. Objectives: The current methods in requirements engineering are capable of modelling the common user’s behaviour in the domain of knowledge construction. The users’ requirements can be represented as a case in the defined structure which can be reasoned to enable the requirements analysis. Such analysis needs to be enhanced so that personalised information provision can be tackled and modelled. However, there is a lack of suitable modelling methods to achieve this end. This paper presents a new ontological method for capturing an individual user’s requirements and transforming the requirements into personalised information provision specifications. Hence the right information can be provided to the right user for the right purpose. Method: An experiment was conducted based on the qualitative method. A medium-sized group of users participated to validate the method and its techniques, i.e. articulates, maps, configures, and learning content. The results were used as feedback for improvement. Result: The research work has produced an ontology model with a set of techniques which support the functions for profiling user’s requirements, reasoning requirements patterns, generating workflow from norms, and formulating information provision specifications. Conclusion: The current requirements engineering approaches provide the methodical capability for developing solutions. Our research outcome, i.e. the ontology model with the techniques, can further enhance the RE approaches for modelling the individual user’s needs and discovering the user’s requirements.
Abstract:
In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.