786 resultados para Data mining models


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Die zunehmende Vernetzung der Informations- und Kommunikationssysteme führt zu einer weiteren Erhöhung der Komplexität und damit auch zu einer weiteren Zunahme von Sicherheitslücken. Klassische Schutzmechanismen wie Firewall-Systeme und Anti-Malware-Lösungen bieten schon lange keinen Schutz mehr vor Eindringversuchen in IT-Infrastrukturen. Als ein sehr wirkungsvolles Instrument zum Schutz gegenüber Cyber-Attacken haben sich hierbei die Intrusion Detection Systeme (IDS) etabliert. Solche Systeme sammeln und analysieren Informationen von Netzwerkkomponenten und Rechnern, um ungewöhnliches Verhalten und Sicherheitsverletzungen automatisiert festzustellen. Während signatur-basierte Ansätze nur bereits bekannte Angriffsmuster detektieren können, sind anomalie-basierte IDS auch in der Lage, neue bisher unbekannte Angriffe (Zero-Day-Attacks) frühzeitig zu erkennen. Das Kernproblem von Intrusion Detection Systeme besteht jedoch in der optimalen Verarbeitung der gewaltigen Netzdaten und der Entwicklung eines in Echtzeit arbeitenden adaptiven Erkennungsmodells. Um diese Herausforderungen lösen zu können, stellt diese Dissertation ein Framework bereit, das aus zwei Hauptteilen besteht. Der erste Teil, OptiFilter genannt, verwendet ein dynamisches "Queuing Concept", um die zahlreich anfallenden Netzdaten weiter zu verarbeiten, baut fortlaufend Netzverbindungen auf, und exportiert strukturierte Input-Daten für das IDS. Den zweiten Teil stellt ein adaptiver Klassifikator dar, der ein Klassifikator-Modell basierend auf "Enhanced Growing Hierarchical Self Organizing Map" (EGHSOM), ein Modell für Netzwerk Normalzustand (NNB) und ein "Update Model" umfasst. In dem OptiFilter werden Tcpdump und SNMP traps benutzt, um die Netzwerkpakete und Hostereignisse fortlaufend zu aggregieren. Diese aggregierten Netzwerkpackete und Hostereignisse werden weiter analysiert und in Verbindungsvektoren umgewandelt. Zur Verbesserung der Erkennungsrate des adaptiven Klassifikators wird das künstliche neuronale Netz GHSOM intensiv untersucht und wesentlich weiterentwickelt. In dieser Dissertation werden unterschiedliche Ansätze vorgeschlagen und diskutiert. So wird eine classification-confidence margin threshold definiert, um die unbekannten bösartigen Verbindungen aufzudecken, die Stabilität der Wachstumstopologie durch neuartige Ansätze für die Initialisierung der Gewichtvektoren und durch die Stärkung der Winner Neuronen erhöht, und ein selbst-adaptives Verfahren eingeführt, um das Modell ständig aktualisieren zu können. Darüber hinaus besteht die Hauptaufgabe des NNB-Modells in der weiteren Untersuchung der erkannten unbekannten Verbindungen von der EGHSOM und der Überprüfung, ob sie normal sind. Jedoch, ändern sich die Netzverkehrsdaten wegen des Concept drif Phänomens ständig, was in Echtzeit zur Erzeugung nicht stationärer Netzdaten führt. Dieses Phänomen wird von dem Update-Modell besser kontrolliert. Das EGHSOM-Modell kann die neuen Anomalien effektiv erkennen und das NNB-Model passt die Änderungen in Netzdaten optimal an. Bei den experimentellen Untersuchungen hat das Framework erfolgversprechende Ergebnisse gezeigt. Im ersten Experiment wurde das Framework in Offline-Betriebsmodus evaluiert. Der OptiFilter wurde mit offline-, synthetischen- und realistischen Daten ausgewertet. Der adaptive Klassifikator wurde mit dem 10-Fold Cross Validation Verfahren evaluiert, um dessen Genauigkeit abzuschätzen. Im zweiten Experiment wurde das Framework auf einer 1 bis 10 GB Netzwerkstrecke installiert und im Online-Betriebsmodus in Echtzeit ausgewertet. Der OptiFilter hat erfolgreich die gewaltige Menge von Netzdaten in die strukturierten Verbindungsvektoren umgewandelt und der adaptive Klassifikator hat sie präzise klassifiziert. Die Vergleichsstudie zwischen dem entwickelten Framework und anderen bekannten IDS-Ansätzen zeigt, dass der vorgeschlagene IDSFramework alle anderen Ansätze übertrifft. Dies lässt sich auf folgende Kernpunkte zurückführen: Bearbeitung der gesammelten Netzdaten, Erreichung der besten Performanz (wie die Gesamtgenauigkeit), Detektieren unbekannter Verbindungen und Entwicklung des in Echtzeit arbeitenden Erkennungsmodells von Eindringversuchen.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acquired brain injury (ABI) is one of the leading causes of death and disability in the world and is associated with high health care costs as a result of the acute treatment and long term rehabilitation involved. Different algorithms and methods have been proposed to predict the effectiveness of rehabilitation programs. In general, research has focused on predicting the overall improvement of patients with ABI. The purpose of this study is the novel application of data mining (DM) techniques to predict the outcomes of cognitive rehabilitation in patients with ABI. We generate three predictive models that allow us to obtain new knowledge to evaluate and improve the effectiveness of the cognitive rehabilitation process. Decision tree (DT), multilayer perceptron (MLP) and general regression neural network (GRNN) have been used to construct the prediction models. 10-fold cross validation was carried out in order to test the algorithms, using the Institut Guttmann Neurorehabilitation Hospital (IG) patients database. Performance of the models was tested through specificity, sensitivity and accuracy analysis and confusion matrix analysis. The experimental results obtained by DT are clearly superior with a prediction average accuracy of 90.38%, while MLP and GRRN obtained a 78.7% and 75.96%, respectively. This study allows to increase the knowledge about the contributing factors of an ABI patient recovery and to estimate treatment efficacy in individual patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last few years there has been a heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within this data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques such as artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering techniques, are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness, health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percent of the one repetition maximum. 1RM, dynamic muscular strength, one repetition maximum or one execution maximum, is operationally defined as the heaviest load that can be moved over a specific range of motion, one time and with correct performance. The safety of the 1RM assessment has been questioned as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle the problem of predicting the 1RM from submaximal loads, in order to avoid or at least, reduce the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press action, until reaching their actual 1RM. Also, a comparison of different existing prediction equations is carried out. The LR model seems to outperform the ANN and GP models for the 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press action, whilst other subjects (n = 20) performed statistically significant (p < 0:05) more repetitions at 70% than at 75% of their actual 1RM in the bench press action. Rate of perceived exertion (RPE) seems not to be a good predictor for 1RM when all the sets are performed until exhaustion, as no significant differences (p < 0:05) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated to 1RM (p < 0:05) than body weight. O'Connor et al. 1RM prediction equation seems to arise from the data gathered and seems to be the most accurate 1RM prediction equation from those proposed in literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM literature equations. Finally, future lines of research are proposed related to the problem of the 1RM prediction by means of genetic algorithms, neural networks and clustering techniques. RESUMEN En los últimos años ha habido un creciente interés en el tratamiento y análisis de datos con el propósito de descubrir relaciones, patrones y conocimiento oculto en los mismos. Las técnicas de data mining (también llamadas de \Descubrimiento de conocimiento en bases de datos\) se han aplicado consistentemente a lo gran de un gran espectro de áreas como el marketing, inversiones, detección de fraude, producción industrial, telecomunicaciones y salud. En este estudio, técnicas bien conocidas de data mining como las redes neuronales artificiales (ANN), programación genética (GP), regresión lineal con selección hacia adelante (LR) y la técnica de clustering k-means, se proponen a la comunidad del deporte y la salud con el objetivo de ayudar con la prescripción del entrenamiento de fuerza. Una apropiada prescripción de entrenamiento de fuerza es efectiva no solo para mejorar el estado de forma general, sino para mejorar la salud e incrementar la calidad de vida. La intensidad en un ejercicio de fuerza se prescribe generalmente como un porcentaje de la repetición máxima. 1RM, fuerza muscular dinámica, una repetición máxima o una ejecución máxima, se define operacionalmente como la carga máxima que puede ser movida en un rango de movimiento específico, una vez y con una técnica correcta. La seguridad de las pruebas de 1RM ha sido cuestionada debido a que el gran esfuerzo requerido para llevarlas a cabo puede derivar en serias lesiones musculares. Las ecuaciones predictivas pueden ayudar a atajar el problema de la predicción de la 1RM con cargas sub-máximas y son empleadas con el propósito de eliminar o al menos, reducir los riesgos asociados. En este estudio, se construyeron distintos modelos a partir de los datos recogidos de 30 hombres que realizaron hasta 5 series al fallo en el ejercicio press de banca a distintos porcentajes de la 1RM, hasta llegar a su 1RM real. También se muestra una comparación de algunas de las distintas ecuaciones de predicción propuestas con anterioridad. El modelo LR parece superar a los modelos ANN y GP para la predicción de la 1RM entre 1 y 10 repeticiones. Al 75% de la 1RM algunos sujetos (n = 5) pudieron realizar 13 repeticiones con una técnica apropiada en el ejercicio press de banca, mientras que otros (n = 20) realizaron significativamente (p < 0:05) más repeticiones al 70% que al 75% de su 1RM en el press de banca. El ínndice de esfuerzo percibido (RPE) parece no ser un buen predictor del 1RM cuando todas las series se realizan al fallo, puesto que no existen diferencias signifiativas (p < 0:05) en el RPE al 75%, 80% y el 90% de la 1RM. Además, los años de experiencia y las horas semanales dedicadas al entrenamiento de fuerza están más correlacionadas con la 1RM (p < 0:05) que el peso corporal. La ecuación de O'Connor et al. parece surgir de los datos recogidos y parece ser la ecuación de predicción de 1RM más precisa de aquellas propuestas en la literatura y empleadas en este estudio. La ecuación de predicción de la 1RM de Epley es reproducida mediante simulación de datos a partir de algunas ecuaciones de predicción de la 1RM propuestas con anterioridad. Finalmente, se proponen futuras líneas de investigación relacionadas con el problema de la predicción de la 1RM mediante algoritmos genéticos, redes neuronales y técnicas de clustering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electricity market price forecast is a changeling yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry with spikes. which may be (ens or even hundreds of times higher than the normal price. Such electricity spikes are very difficult to be predicted. So far. most of the research on electricity price forecast is based on the normal range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as the price spikes. The normal price can be, predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecasted based on a data mining approach. This paper focuses on the spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour. electricity), demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database to find out the internal relationships between electricity price spikes and these proposed. The mining results are used to form the price spike forecast model. This proposed model is able to generate forecasted price spike, level of spike and associated forecast confidence level. The model is tested with the Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Melanoma is a highly aggressive and therapy resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. Concluding, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work proposes a method based on both preprocessing and data mining with the objective of identify harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory essays, i.e., real data were acquired from residential loads. Thus, the residential system created in laboratory was fed by a configurable power source and in its output were placed the loads and the power quality analyzers (all measurements were stored in a microcomputer). So, the data were submitted to pre-processing, which was based on attribute selection techniques in order to minimize the complexity in identifying the loads. A newer database was generated maintaining only the attributes selected, thus, Artificial Neural Networks were trained to realized the identification of loads. In order to validate the methodology proposed, the loads were fed both under ideal conditions (without harmonics), but also by harmonic voltages within limits pre-established. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (procedures to delivery energy employed by Brazilian`s utilities). The results obtained seek to validate the methodology proposed and furnish a method that can serve as alternative to conventional methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been observed in the literature so far. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. In this paper, feature selection techniques are firstly described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms: support vector machine and probability classifier are chosen to be the spike occurrence predictors and are discussed in details. Realistic market data are used to test the proposed model with promising results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.