820 resultados para Data classification
Resumo:
Fast Classification (FC) networks were inspired by a biologically plausible mechanism for short term memory where learning occurs instantaneously. Both weights and the topology for an FC network are mapped directly from the training samples by using a prescriptive training scheme. Only two presentations of the training data are required to train an FC network. Compared with iterative learning algorithms such as Back-propagation (which may require many hundreds of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks may be suitable for applications where real-time classification is needed. In this paper, the FC networks are applied for the real-time extraction of gene expressions for Chlamydia microarray data. Both the classification performance and learning time of the FC networks are compared with the Multi-Layer Proceptron (MLP) networks and support-vector-machines (SVM) in the same classification task. The FC networks are shown to have extremely fast learning time and comparable classification accuracy.
Resumo:
In this paper, we describe the evaluation of a method for building detection by the Dempster-Shafer fusion of LIDAR data and multispectral images. For that purpose, ground truth was digitised for two test sites with quite different characteristics. Using these data sets, the heuristic model for the probability mass assignments of the method is validated, and rules for the tuning of the parameters of this model are discussed. Further we evaluate the contributions of the individual cues used in the classification process to the quality of the classification results. Our results show the degree to which the overall correctness of the results can be improved by fusing LIDAR data with multispectral images.
Resumo:
This paper presents load profiles of electricity customers, using the knowledge discovery in databases (KDD) procedure, a data mining technique, to determine the load profiles for different types of customers. In this paper, the current load profiling methods are compared using data mining techniques, by analysing and evaluating these classification techniques. The objective of this study is to determine the best load profiling methods and data mining techniques to classify, detect and predict non-technical losses in the distribution sector, due to faulty metering and billing errors, as well as to gather knowledge on customer behaviour and preferences so as to gain a competitive advantage in the deregulated market. This paper focuses mainly on the comparative analysis of the classification techniques selected; a forthcoming paper will focus on the detection and prediction methods.
Resumo:
A reliable perception of the real world is a key-feature for an autonomous vehicle and the Advanced Driver Assistance Systems (ADAS). Obstacles detection (OD) is one of the main components for the correct reconstruction of the dynamic world. Historical approaches based on stereo vision and other 3D perception technologies (e.g. LIDAR) have been adapted to the ADAS first and autonomous ground vehicles, after, providing excellent results. The obstacles detection is a very broad field and this domain counts a lot of works in the last years. In academic research, it has been clearly established the essential role of these systems to realize active safety systems for accident prevention, reflecting also the innovative systems introduced by industry. These systems need to accurately assess situational criticalities and simultaneously assess awareness of these criticalities by the driver; it requires that the obstacles detection algorithms must be reliable and accurate, providing: a real-time output, a stable and robust representation of the environment and an estimation independent from lighting and weather conditions. Initial systems relied on only one exteroceptive sensor (e.g. radar or laser for ACC and camera for LDW) in addition to proprioceptive sensors such as wheel speed and yaw rate sensors. But, current systems, such as ACC operating at the entire speed range or autonomous braking for collision avoidance, require the use of multiple sensors since individually they can not meet these requirements. It has led the community to move towards the use of a combination of them in order to exploit the benefits of each one. Pedestrians and vehicles detection are ones of the major thrusts in situational criticalities assessment, still remaining an active area of research. ADASs are the most prominent use case of pedestrians and vehicles detection. Vehicles should be equipped with sensing capabilities able to detect and act on objects in dangerous situations, where the driver would not be able to avoid a collision. A full ADAS or autonomous vehicle, with regard to pedestrians and vehicles, would not only include detection but also tracking, orientation, intent analysis, and collision prediction. The system detects obstacles using a probabilistic occupancy grid built from a multi-resolution disparity map. Obstacles classification is based on an AdaBoost SoftCascade trained on Aggregate Channel Features. A final stage of tracking and fusion guarantees stability and robustness to the result.
Resumo:
Relationships between clustering, description length, and regularisation are pointed out, motivating the introduction of a cost function with a description length interpretation and the unusual and useful property of having its minimum approximated by the densest mode of a distribution. A simple inverse kinematics example is used to demonstrate that this property can be used to select and learn one branch of a multi-valued mapping. This property is also used to develop a method for setting regularisation parameters according to the scale on which structure is exhibited in the training data. The regularisation technique is demonstrated on two real data sets, a classification problem and a regression problem.
Resumo:
We consider the problem of assigning an input vector bfx to one of m classes by predicting P(c|bfx) for c = 1, ldots, m. For a two-class problem, the probability of class 1 given bfx is estimated by s(y(bfx)), where s(y) = 1/(1 + e-y). A Gaussian process prior is placed on y(bfx), and is combined with the training data to obtain predictions for new bfx points. We provide a Bayesian treatment, integrating over uncertainty in y and in the parameters that control the Gaussian process prior; the necessary integration over y is carried out using Laplace's approximation. The method is generalized to multi-class problems (m >2) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
Resumo:
We discuss the Application of TAP mean field methods known from Statistical Mechanics of disordered systems to Bayesian classification with Gaussian processes. In contrast to previous applications, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given.
Resumo:
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vectors machines, simulation results for three small benchmark data sets are presented. They show 1. that one may get state of the art performance by using the leave-one-out estimator for model selection and 2. the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is a taken as a strong support for the internal consistency of the mean field approach.
Resumo:
While the retrieval of existing designs to prevent unnecessary duplication of parts is a recognised strategy in the control of design costs the available techniques to achieve this, even in product data management systems, are limited in performance or require large resources. A novel system has been developed based on a new version of an existing coding system (CAMAC) that allows automatic coding of engineering drawings and their subsequent retrieval using a drawing of the desired component as the input. The ability to find designs using a detail drawing rather than textual descriptions is a significant achievement in itself. Previous testing of the system has demonstrated this capability but if a means could be found to find parts from a simple sketch then its practical application would be much more effective. This paper describes the development and testing of such a search capability using a database of over 3000 engineering components.
Resumo:
We consider the problem of assigning an input vector to one of m classes by predicting P(c|x) for c=1,...,m. For a two-class problem, the probability of class one given x is estimated by s(y(x)), where s(y)=1/(1+e-y). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points. We provide a Bayesian treatment, integrating over uncertainty in y and in the parameters that control the Gaussian process prior the necessary integration over y is carried out using Laplace's approximation. The method is generalized to multiclass problems (m>2) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
Resumo:
Conventional feed forward Neural Networks have used the sum-of-squares cost function for training. A new cost function is presented here with a description length interpretation based on Rissanen's Minimum Description Length principle. It is a heuristic that has a rough interpretation as the number of data points fit by the model. Not concerned with finding optimal descriptions, the cost function prefers to form minimum descriptions in a naive way for computational convenience. The cost function is called the Naive Description Length cost function. Finding minimum description models will be shown to be closely related to the identification of clusters in the data. As a consequence the minimum of this cost function approximates the most probable mode of the data rather than the sum-of-squares cost function that approximates the mean. The new cost function is shown to provide information about the structure of the data. This is done by inspecting the dependence of the error to the amount of regularisation. This structure provides a method of selecting regularisation parameters as an alternative or supplement to Bayesian methods. The new cost function is tested on a number of multi-valued problems such as a simple inverse kinematics problem. It is also tested on a number of classification and regression problems. The mode-seeking property of this cost function is shown to improve prediction in time series problems. Description length principles are used in a similar fashion to derive a regulariser to control network complexity.
Resumo:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Resumo:
In any investigation in optometry involving more that two treatment or patient groups, an investigator should be using ANOVA to analyse the results assuming that the data conform reasonably well to the assumptions of the analysis. Ideally, specific null hypotheses should be built into the experiment from the start so that the treatments variation can be partitioned to test these effects directly. If 'post-hoc' tests are used, then an experimenter should examine the degree of protection offered by the test against the possibilities of making either a type 1 or a type 2 error. All experimenters should be aware of the complexity of ANOVA. The present article describes only one common form of the analysis, viz., that which applies to a single classification of the treatments in a randomised design. There are many different forms of the analysis each of which is appropriate to the analysis of a specific experimental design. The uses of some of the most common forms of ANOVA in optometry have been described in a further article. If in any doubt, an investigator should consult a statistician with experience of the analysis of experiments in optometry since once embarked upon an experiment with an unsuitable design, there may be little that a statistician can do to help.
Resumo:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Resumo:
Task classification is introduced as a method for the evaluation of monitoring behaviour in different task situations. On the basis of an analysis of different monitoring tasks, a task classification system comprising four task 'dimensions' is proposed. The perceptual speed and flexibility of closure categories, which are identified with signal discrimination type, comprise the principal dimension in this taxonomy, the others being sense modality, the time course of events, and source complexity. It is also proposed that decision theory provides the most complete method for the analysis of performance in monitoring tasks. Several different aspects of decision theory in relation to monitoring behaviour are described. A method is also outlined whereby both accuracy and latency measures of performance may be analysed within the same decision theory framework. Eight experiments and an organizational study are reported. The results show that a distinction can be made between the perceptual efficiency (sensitivity) of a monitor and his criterial level of response, and that in most monitoring situations, there is no decrement in efficiency over the work period, but an increase in the strictness of the response criterion. The range of tasks exhibiting either or both of these performance trends can be specified within the task classification system. In particular, it is shown that a sensitivity decrement is only obtained for 'speed' tasks with a high stimulation rate. A distinctive feature of 'speed' tasks is that target detection requires the discrimination of a change in a stimulus relative to preceding stimuli, whereas in 'closure' tasks, the information required for the discrimination of targets is presented at the same point In time. In the final study, the specification of tasks yielding sensitivity decrements is shown to be consistent with a task classification analysis of the monitoring literature. It is also demonstrated that the signal type dimension has a major influence on the consistency of individual differences in performance in different tasks. The results provide an empirical validation for the 'speed' and 'closure' categories, and suggest that individual differences are not completely task specific but are dependent on the demands common to different tasks. Task classification is therefore shovn to enable improved generalizations to be made of the factors affecting 1) performance trends over time, and 2) the consistencv of performance in different tasks. A decision theory analysis of response latencies is shown to support the view that criterion shifts are obtained in some tasks, while sensitivity shifts are obtained in others. The results of a psychophysiological study also suggest that evoked potential latency measures may provide temporal correlates of criterion shifts in monitoring tasks. Among other results, the finding that the latencies of negative responses do not increase over time is taken to invalidate arousal-based theories of performance trends over a work period. An interpretation in terms of expectancy, however, provides a more reliable explanation of criterion shifts. Although the mechanisms underlying the sensitivity decrement are not completely clear, the results rule out 'unitary' theories such as observing response and coupling theory. It is suggested that an interpretation in terms of the memory data limitations on information processing provides the most parsimonious explanation of all the results in the literature relating to sensitivity decrement. Task classification therefore enables the refinement and selection of theories of monitoring behaviour in terms of their reliability in generalizing predictions to a wide range of tasks. It is thus concluded that task classification and decision theory provide a reliable basis for the assessment and analysis of monitoring behaviour in different task situations.