21 resultados para CLASSIFICATIONS
Resumo:
Learning from mistakes has proven to be an effective way of learning in the interactive document classifications. In this paper we propose an approach to effectively learning from mistakes in the email filtering process. Our system has employed both SVM and Winnow machine learning algorithms to learn from misclassified email documents and refine the email filtering process accordingly. Our experiments have shown that the training of an email filter becomes much effective and faster
Resumo:
There have been many models developed by scientists to assist decision-makers in making socio-economic and environmental decisions. It is now recognised that there is a shift in the dominant paradigm to making decisions with stakeholders, rather than making decisions for stakeholders. Our paper investigates two case studies where group model building has been undertaken for maintaining biodiversity in Australia. The first case study focuses on preservation and management of green spaces and biodiversity in metropolitan Melbourne under the umbrella of the Melbourne 2030 planning strategy. A geographical information system is used to collate a number of spatial datasets encompassing a range of cultural and natural assets data layers including: existing open spaces, waterways, threatened fauna and flora, ecological vegetation covers, registered cultural heritage sites, and existing land parcel zoning. Group model building is incorporated into the study through eliciting weightings and ratings of importance for each datasets from urban planners to formulate different urban green system scenarios. The second case study focuses on modelling ecoregions from spatial datasets for the state of Queensland. The modelling combines collaborative expert knowledge and a vast amount of environmental data to build biogeographical classifications of regions. An information elicitation process is used to capture expert knowledge of ecoregions as geographical descriptions, and to transform this into prior probability distributions that characterise regions in terms of environmental variables. This prior information is combined with measured data on the environmental variables within a Bayesian modelling technique to produce the final classified regions. We describe how linked views between descriptive information, mapping and statistical plots are used to decide upon representative regions that satisfy a number of criteria for biodiversity and conservation. This paper discusses the advantages and problems encountered when undertaking group model building. Future research will extend the group model building approach to include interested individuals and community groups.
Resumo:
Sex segregation in employment is a phenomenon that can be observed and analysed at different levels, ranging from comparisons between broad classifications by industry or occupation through to finely defined jobs within such classifications. From an aggregate perspective, the contribution of information technology (IT) employment to sex segregation is clear--it remains a highly male-dominated field apparently imbued with the ongoing masculinity of science and technology. While this situation is clearly contrary to hopes of a new industry freed from traditional distinctions between 'men's' and 'women's' work, it comes as little surprise to most feminist and labour studies analysts. An extensive literature documents the persistently masculine culture of IT employment and education (see, among many, Margolis and Fisher 2002; Wajcman 1991; Webster 1996; Wright 1996, 1997), and the idea that new occupations might escape sexism by sidestepping 'old traditions' has been effectively critiqued by writers such as Adam, who notes the fallacy of assuming a spontaneous emergence of equality in new settings (2005: 140).
Resumo:
The Operator Choice Model (OCM) was developed to model the behaviour of operators attending to complex tasks involving interdependent concurrent activities, such as in Air Traffic Control (ATC). The purpose of the OCM is to provide a flexible framework for modelling and simulation that can be used for quantitative analyses in human reliability assessment, comparison between human computer interaction (HCI) designs, and analysis of operator workload. The OCM virtual operator is essentially a cycle of four processes: Scan Classify Decide Action Perform Action. Once a cycle is complete, the operator will return to the Scan process. It is also possible to truncate a cycle and return to Scan after each of the processes. These processes are described using Continuous Time Probabilistic Automata (CTPA). The details of the probability and timing models are specific to the domain of application, and need to be specified using domain experts. We are building an application of the OCM for use in ATC. In order to develop a realistic model we are calibrating the probability and timing models that comprise each process using experimental data from a series of experiments conducted with student subjects. These experiments have identified the factors that influence perception and decision making in simplified conflict detection and resolution tasks. This paper presents an application of the OCM approach to a simple ATC conflict detection experiment. The aim is to calibrate the OCM so that its behaviour resembles that of the experimental subjects when it is challenged with the same task. Its behaviour should also interpolate when challenged with scenarios similar to those used to calibrate it. The approach illustrated here uses logistic regression to model the classifications made by the subjects. This model is fitted to the calibration data, and provides an extrapolation to classifications in scenarios outside of the calibration data. A simple strategy is used to calibrate the timing component of the model, and the results for reaction times are compared between the OCM and the student subjects. While this approach to timing does not capture the full complexity of the reaction time distribution seen in the data from the student subjects, the mean and the tail of the distributions are similar.
Resumo:
Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from customer information system (CIS) can be used for NTL analysis. However, in order to accurately and efficiently perform NTL analysis, the original data from CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classifications. By removing irrelevant and redundant features, feature selection is an essential step in data mining process in finding optimal subset of features to improve the quality of result by giving faster time processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.
Resumo:
Machine learning techniques for prediction and rule extraction from artificial neural network methods are used. The hypothesis that market sentiment and IPO specific attributes are equally responsible for first-day IPO returns in the US stock market is tested. Machine learning methods used are Bayesian classifications, support vector machines, decision tree techniques, rule learners and artificial neural networks. The outcomes of the research are predictions and rules associated With first-day returns of technology IPOs. The hypothesis that first-day returns of technology IPOs are equally determined by IPO specific and market sentiment is rejected. Instead lower yielding IPOs are determined by IPO specific and market sentiment attributes, while higher yielding IPOs are largely dependent on IPO specific attributes.