886 results for Feature sizes
Abstract:
Non-technical loss (NTL) identification and prediction are important tasks for many utilities. Data from the customer information system (CIS) can be used for NTL analysis; however, to perform this analysis accurately and efficiently, the original CIS data must be pre-processed before any detailed NTL analysis is carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of results through faster processing, higher accuracy, and simpler models with fewer features. A detailed feature selection analysis is presented in the paper. Both time-domain and load-shape data are compared in terms of accuracy, consistency, and statistical dependencies between features.
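As an illustration of the kind of filter-style feature selection the paper describes, the following sketch scores candidate features by their mutual information with an NTL label and keeps the top-ranked subset. The synthetic data, the feature count, and the k=10 cutoff are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of a filter-style feature selection step on CIS-like
# consumption data; the data, shapes, and k=10 cutoff are illustrative only.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))       # e.g. 24 load-shape readings per customer
y = rng.integers(0, 2, size=500)     # 1 = suspected non-technical loss, 0 = normal

# Score each feature by its statistical dependency on the NTL label and
# keep only the k most informative ones before clustering/classification.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_reduced.shape)
```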
Abstract:
Software simulation models are computer programs that need to be verified and debugged like any other software. In previous work, a method for error isolation in simulation models was proposed. The method relies on a set of feature matrices that can be used to determine which part of the model implementation is responsible for deviations in the model's output. Currently, these feature matrices have to be generated by hand from the model implementation, which is a tedious and error-prone task. In this paper, a method based on mutation analysis, together with prototype tool support, is presented for verifying the manually generated feature matrices. Applying the method and tool to a model of wastewater treatment shows that the feature matrices can be verified effectively using a minimal number of mutants.
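A minimal sketch of the mutation-analysis idea, assuming a toy model with two code parts and a hand-written feature matrix (both invented here, not taken from the paper): each part is mutated in turn, and the output features that deviate from the baseline run are compared against what the matrix claims.

```python
# Toy sketch of mutation-based checking of a hand-written feature matrix.
# The "model" and its two code parts are invented for illustration; the
# real method mutates the actual model implementation.
import numpy as np

def model(x, mutate=None):
    # code part "growth" drives output feature f1; part "decay" drives f2
    growth = 2.0 * x if mutate != "growth" else 2.5 * x   # mutant perturbs a constant
    decay = np.exp(-x) if mutate != "decay" else np.exp(-0.5 * x)
    return {"f1": growth.sum(), "f2": decay.sum()}

# Manually derived feature matrix: which code part is claimed to affect which feature.
feature_matrix = {"growth": {"f1"}, "decay": {"f2"}}

x = np.linspace(0.0, 5.0, 50)
baseline = model(x)
for part, claimed in feature_matrix.items():
    mutant_out = model(x, mutate=part)
    deviating = {f for f in baseline if abs(mutant_out[f] - baseline[f]) > 1e-9}
    status = "consistent" if deviating == claimed else "MISMATCH"
    print(f"mutating {part}: deviates {deviating}, claimed {claimed} -> {status}")
```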
Abstract:
The software implementation of the emergency shutdown feature in a major radiotherapy system was analyzed using a directed form of code review based on module dependences. Dependences between modules are labelled with the particular assumptions they embody; this allows one to trace through the code and identify the fragments responsible for critical features. An 'assumption tree' is constructed in parallel, showing the assumptions each module makes about others. The root of the assumption tree is the critical feature of interest, and its leaves represent assumptions which, if not valid, might cause the critical feature to fail. The analysis revealed some unexpected assumptions that motivated improvements to the code.
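The assumption-tree construction can be sketched as a simple traversal over labelled module dependences. The module names and assumption labels below are hypothetical stand-ins, not the radiotherapy system's actual structure.

```python
# Hedged sketch of building an 'assumption tree' from labelled module
# dependences; all module names and assumption labels are hypothetical.
deps = {
    # module -> list of (depended-on module, assumption made about it)
    "shutdown": [("sensor", "reports dose within 5 ms"),
                 ("actuator", "closes beam on command")],
    "sensor":   [("adc", "returns calibrated values")],
    "actuator": [],
    "adc":      [],
}

def assumption_tree(module, indent=0):
    # Root is the module carrying the critical feature; leaves are the
    # assumptions that, if invalid, could cause that feature to fail.
    for dep, assumption in deps[module]:
        print("  " * indent + f"{module} assumes {dep}: {assumption}")
        assumption_tree(dep, indent + 1)

assumption_tree("shutdown")
```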
Abstract:
A practical Bayesian approach for inference in neural network models has been available for ten years, yet it is used infrequently in medical applications. In this chapter we show how both regularisation and feature selection can bring significant benefits in diagnostic tasks through two case studies: heart arrhythmia classification based on ECG data and the prognosis of lupus. In the first of these, the number of variables was reduced by two thirds without significantly affecting performance; in the second, only the Bayesian models achieved acceptable accuracy. In both tasks, neural networks outperformed other pattern recognition approaches.
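The chapter's models are Bayesian neural networks; as a compact stand-in for the automatic relevance determination idea that drives Bayesian feature selection, the sketch below fits a Bayesian linear model (scikit-learn's ARDRegression) whose per-feature priors shrink irrelevant inputs. The synthetic data and the choice of a linear model are assumptions, not the chapter's method.

```python
# ARD-style relevance determination in a Bayesian *linear* model, as a
# simplified stand-in for the neural-network evidence framework.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))
# Only the first 4 of 12 features actually drive the target.
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + 0.1 * rng.normal(size=300)

ard = ARDRegression()
ard.fit(X, y)

# Features whose posterior weights are shrunk towards zero can be dropped,
# mirroring the kind of variable reduction reported for the ECG task.
print("estimated weights:", np.round(ard.coef_, 2))
```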
Abstract:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics, but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection, by modeling irrelevant features with a separate noise model, but also gives feature saliency values that help the user assess the significance of each feature. We compare the quality of the projection obtained using the new approach with projections from traditional GTM and self-organizing map (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections.
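A much-simplified sketch of the feature-saliency mechanism: each feature value is explained either by a structure-aware 'relevant' density or by a common 'noise' density, and the mixture weight (the saliency) is estimated by EM. The toy below assumes known cluster labels to stay short; the proposed GTM variant learns the latent structure itself, so this is only an illustration of the saliency update.

```python
# Toy feature-saliency estimation: saliency rho_j is the probability that
# feature j is explained by the cluster-dependent 'relevant' density rather
# than a pooled 'noise' density. Labels are given here; GTM would not need them.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 500
z = rng.integers(0, 2, size=n)                       # two latent clusters
X = np.empty((n, 4))
X[:, :2] = rng.normal(loc=3.0 * z[:, None], scale=0.7, size=(n, 2))  # informative
X[:, 2:] = rng.normal(loc=0.0, scale=1.0, size=(n, 2))               # irrelevant

# Relevant model: per-cluster, per-feature Gaussians; noise model: pooled.
mu_k = np.stack([X[z == k].mean(0) for k in (0, 1)])
sd_k = np.stack([X[z == k].std(0) for k in (0, 1)])
noise = norm(X.mean(0), X.std(0))

rho = np.full(4, 0.5)
for _ in range(30):
    rel = rho * norm(mu_k[z], sd_k[z]).pdf(X)        # density under own cluster
    irr = (1 - rho) * noise.pdf(X)
    rho = (rel / (rel + irr)).mean(0)                # EM update of saliencies

# Informative features approach saliency 1; irrelevant ones stay near the 0.5 prior.
print("feature saliencies:", np.round(rho, 2))
```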
Abstract:
The thesis presents a two-dimensional Risk Assessment Method (RAM) in which the assessment of risk to groundwater resources incorporates both the quantification of the probability that contaminant source terms occur and the assessment of the resulting impacts. The approach emphasizes a greater dependency on the potential pollution sources, rather than the traditional approach in which assessment is based mainly on intrinsic geo-hydrologic parameters. The risk is calculated using Monte Carlo simulation, whereby random pollution events are generated according to the distribution of historically occurring events or an a priori probability distribution. Integrated mathematical models then simulate contaminant concentrations at predefined monitoring points within the aquifer. The spatial and temporal distributions of the concentrations are calculated from repeated realisations, and the number of times a user-defined concentration magnitude is exceeded is quantified as a risk.

The method was set up by integrating MODFLOW-2000, MT3DMS, and a FORTRAN-coded risk model, and automated using a DOS batch-processing file. GIS software was employed to produce the input files and to present the results. The functionality of the method, as well as its sensitivity to model grid sizes, contaminant loading rates, lengths of stress periods, and the historical frequencies of occurrence of pollution events, was evaluated using hypothetical scenarios and a case study. Chloride-related pollution sources were compiled and used as indicative potential contaminant sources for the case study. At any active model cell, if a randomly generated number is less than the probability of pollution occurrence, the risk model generates a synthetic contaminant source term as an input to the transport model.

The results of applying the method are presented in the form of tables, graphs, and spatial maps. Varying the model grid sizes indicates no significant effect on the simulated groundwater head, and the simulated frequency of daily occurrence of pollution incidents is likewise independent of the model dimensions. However, the simulated total contaminant mass generated within the aquifer, and the associated volumetric numerical error, appear to increase with increasing grid size; the contaminant plume also migrates faster with coarse grids than with finer ones. The number of daily contaminant source terms generated, and consequently the total contaminant mass within the aquifer, increases nonlinearly with the frequency of occurrence of pollution events. The risk of pollution from a number of sources all occurring by chance together was evaluated and presented quantitatively as risk maps. This capability to combine the risk to a groundwater feature from numerous potential sources of pollution proved to be a major strength of the method and a clear advantage over contemporary risk and vulnerability methods.
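The per-cell Monte Carlo event generation and exceedance counting can be sketched as below. The grid size, event probabilities, source mass, threshold, and the trivial decay stand-in for the MODFLOW-2000/MT3DMS transport step are all illustrative assumptions, not the thesis's actual configuration.

```python
# Toy sketch of the Monte Carlo risk quantification step. The "transport
# model" here is a crude dilution/decay stand-in for MODFLOW-2000/MT3DMS.
import numpy as np

rng = np.random.default_rng(3)
nrow, ncol = 20, 20
n_realisations, n_days = 200, 365
p_event = np.full((nrow, ncol), 0.002)      # daily pollution probability per cell
source_mass = 50.0                          # synthetic chloride source term
threshold = 250.0                           # user-defined concentration limit

exceed = np.zeros((nrow, ncol))
for _ in range(n_realisations):
    conc = np.zeros((nrow, ncol))
    for _ in range(n_days):
        # If a random number falls below the occurrence probability, a
        # synthetic source term enters the (stand-in) transport model.
        events = rng.random((nrow, ncol)) < p_event
        conc += source_mass * events
        conc *= 0.99                        # crude first-order decay/dilution
    exceed += conc > threshold

# Risk map: fraction of realisations in which each cell exceeds the limit.
risk_map = exceed / n_realisations
print("max cell risk:", ris_map.max() if False else risk_map.max())
```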