944 resultados para data mining applications


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This report demonstrates the development of: • Development of software agents for data mining • Link data mining to building model in virtual environments • Link knowledge development with building model in virtual environments • Demonstration of software agents for data mining • Populate with maintenance data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The building life cycle process is complex and prone to fragmentation as it moves through its various stages. The number of participants, and the diversity, specialisation and isolation both in space and time of their activities, have dramatically increased over time. The data generated within the construction industry has become increasingly overwhelming. Most currently available computer tools for the building industry have offered productivity improvement in the transmission of graphical drawings and textual specifications, without addressing more fundamental changes in building life cycle management. Facility managers and building owners are primarily concerned with highlighting areas of existing or potential maintenance problems in order to be able to improve the building performance, satisfying occupants and minimising turnover especially the operational cost of maintenance. In doing so, they collect large amounts of data that is stored in the building’s maintenance database. The work described in this paper is targeted at adding value to the design and maintenance of buildings by turning maintenance data into information and knowledge. Data mining technology presents an opportunity to increase significantly the rate at which the volumes of data generated through the maintenance process can be turned into useful information. This can be done using classification algorithms to discover patterns and correlations within a large volume of data. This paper presents how and what data mining techniques can be applied on maintenance data of buildings to identify the impediments to better performance of building assets. It demonstrates what sorts of knowledge can be found in maintenance records. The benefits to the construction industry lie in turning passive data in databases into knowledge that can improve the efficiency of the maintenance process and of future designs that incorporate that maintenance knowledge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This project is an extension of a previous CRC project (220-059-B) which developed a program for life prediction of gutters in Queensland schools. A number of sources of information on service life of metallic building components were formed into databases linked to a Case-Based Reasoning Engine which extracted relevant cases from each source. In the initial software, no attempt was made to choose between the results offered or construct a case for retention in the casebase. In this phase of the project, alternative data mining techniques will be explored and evaluated. A process for selecting a unique service life prediction for each query will also be investigated. This report summarises the initial evaluation of several data mining techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The management of main material prices of provincial highway project quota has problems of lag and blindness. Framework of provincial highway project quota data MIS and main material price data warehouse were established based on WEB firstly. Then concrete processes of provincial highway project main material prices were brought forward based on BP neural network algorithmic. After that standard BP algorithmic, additional momentum modify BP network algorithmic, self-adaptive study speed improved BP network algorithmic were compared in predicting highway project main prices. The result indicated that it is feasible to predict highway main material prices using BP NN, and using self-adaptive study speed improved BP network algorithmic is the relatively best one.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The wide range of contributing factors and circumstances surrounding crashes on road curves suggest that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationship between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data, and reasons using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers, who are between 50 and 59 years old, driving during evening peak hours are involved with a collision, had a lowest crash risk. Drivers between 25 and 29 years old, driving from around midnight to 6 am and in a new car has the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with driving experience of 25 to 42 years, who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based tools approach to analyse road crashes. Our data mining approach is supported with proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type with the view to designing tailored countermeasures.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and Kappa statistic together with misclassification rate and area under the ROC curve (AUC) are used for evaluation of models generated using different prediction algorithms. The performance based on models derived from feature reduced datasets reveal the filter method, Cfs subset evaluation, to be most consistently effective although Consistency derived subsets tended to slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_ G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The effect of the pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has lead to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques; including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present rate of technological advance continues to place significant demands on data storage devices. The sheer amount of digital data being generated each year along with consumer expectations, fuels these demands. At present, most digital data is stored magnetically, in the form of hard disk drives or on magnetic tape. The increase in areal density (AD) of magnetic hard disk drives over the past 50 years has been of the order of 100 million times, and current devices are storing data at ADs of the order of hundreds of gigabits per square inch. However, it has been known for some time that the progress in this form of data storage is approaching fundamental limits. The main limitation relates to the lower size limit that an individual bit can have for stable storage. Various techniques for overcoming these fundamental limits are currently the focus of considerable research effort. Most attempt to improve current data storage methods, or modify these slightly for higher density storage. Alternatively, three dimensional optical data storage is a promising field for the information storage needs of the future, offering very high density, high speed memory. There are two ways in which data may be recorded in a three dimensional optical medium; either bit-by-bit (similar in principle to an optical disc medium such as CD or DVD) or by using pages of bit data. Bit-by-bit techniques for three dimensional storage offer high density but are inherently slow due to the serial nature of data access. Page-based techniques, where a two-dimensional page of data bits is written in one write operation, can offer significantly higher data rates, due to their parallel nature. Holographic Data Storage (HDS) is one such page-oriented optical memory technique. This field of research has been active for several decades, but with few commercial products presently available. Another page-oriented optical memory technique involves recording pages of data as phase masks in a photorefractive medium. A photorefractive material is one by which the refractive index can be modified by light of the appropriate wavelength and intensity, and this property can be used to store information in these materials. In phase mask storage, two dimensional pages of data are recorded into a photorefractive crystal, as refractive index changes in the medium. A low-intensity readout beam propagating through the medium will have its intensity profile modified by these refractive index changes and a CCD camera can be used to monitor the readout beam, and thus read the stored data. The main aim of this research was to investigate data storage using phase masks in the photorefractive crystal, lithium niobate (LiNbO3). Firstly the experimental methods for storing the two dimensional pages of data (a set of vertical stripes of varying lengths) in the medium are presented. The laser beam used for writing, whose intensity profile is modified by an amplitudemask which contains a pattern of the information to be stored, illuminates the lithium niobate crystal and the photorefractive effect causes the patterns to be stored as refractive index changes in the medium. These patterns are read out non-destructively using a low intensity probe beam and a CCD camera. A common complication of information storage in photorefractive crystals is the issue of destructive readout. This is a problem particularly for holographic data storage, where the readout beam should be at the same wavelength as the beam used for writing. Since the charge carriers in the medium are still sensitive to the read light field, the readout beam erases the stored information. A method to avoid this is by using thermal fixing. Here the photorefractive medium is heated to temperatures above 150�C; this process forms an ionic grating in the medium. This ionic grating is insensitive to the readout beam and therefore the information is not erased during readout. A non-contact method for determining temperature change in a lithium niobate crystal is presented in this thesis. The temperature-dependent birefringent properties of the medium cause intensity oscillations to be observed for a beam propagating through the medium during a change in temperature. It is shown that each oscillation corresponds to a particular temperature change, and by counting the number of oscillations observed, the temperature change of the medium can be deduced. The presented technique for measuring temperature change could easily be applied to a situation where thermal fixing of data in a photorefractive medium is required. Furthermore, by using an expanded beam and monitoring the intensity oscillations over a wide region, it is shown that the temperature in various locations of the crystal can be monitored simultaneously. This technique could be used to deduce temperature gradients in the medium. It is shown that the three dimensional nature of the recording medium causes interesting degradation effects to occur when the patterns are written for a longer-than-optimal time. This degradation results in the splitting of the vertical stripes in the data pattern, and for long writing exposure times this process can result in the complete deterioration of the information in the medium. It is shown in that simply by using incoherent illumination, the original pattern can be recovered from the degraded state. The reason for the recovery is that the refractive index changes causing the degradation are of a smaller magnitude since they are induced by the write field components scattered from the written structures. During incoherent erasure, the lower magnitude refractive index changes are neutralised first, allowing the original pattern to be recovered. The degradation process is shown to be reversed during the recovery process, and a simple relationship is found relating the time at which particular features appear during degradation and recovery. A further outcome of this work is that the minimum stripe width of 30 ìm is required for accurate storage and recovery of the information in the medium, any size smaller than this results in incomplete recovery. The degradation and recovery process could be applied to an application in image scrambling or cryptography for optical information storage. A two dimensional numerical model based on the finite-difference beam propagation method (FD-BPM) is presented and used to gain insight into the pattern storage process. The model shows that the degradation of the patterns is due to the complicated path taken by the write beam as it propagates through the crystal, and in particular the scattering of this beam from the induced refractive index structures in the medium. The model indicates that the highest quality pattern storage would be achieved with a thin 0.5 mm medium; however this type of medium would also remove the degradation property of the patterns and the subsequent recovery process. To overcome the simplistic treatment of the refractive index change in the FD-BPM model, a fully three dimensional photorefractive model developed by Devaux is presented. This model shows significant insight into the pattern storage, particularly for the degradation and recovery process, and confirms the theory that the recovery of the degraded patterns is possible since the refractive index changes responsible for the degradation are of a smaller magnitude. Finally, detailed analysis of the pattern formation and degradation dynamics for periodic patterns of various periodicities is presented. It is shown that stripe widths in the write beam of greater than 150 ìm result in the formation of different types of refractive index changes, compared with the stripes of smaller widths. As a result, it is shown that the pattern storage method discussed in this thesis has an upper feature size limit of 150 ìm, for accurate and reliable pattern storage.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large complex databases. Application of data mining to manufacturing is relatively limited mainly because of complexity of manufacturing data. Growing self organizing map (GSOM) algorithm has been proven to be an efficient algorithm to analyze unsupervised DNA data. However, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper a data mining methodology has been proposed using a GSOM tool which was developed using a modified GSOM algorithm. The proposed method is used to generate clusters for good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposed an automatic identification of variables to find the most probable causative factor(s) that discriminate between good and faulty product by quickly examining the historical manufacturing data. The proposed method offers the manufacturers to smoothen the production flow and improve the quality of the products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road safety is a major concern worldwide. Road safety will improve as road conditions and their effects on crashes are continually investigated. This paper proposes to use the capability of data mining to include the greater set of road variables for all available crashes with skid resistance values across the Queensland state main road network in order to understand the relationships among crash, traffic and road variables. This paper presents a data mining based methodology for the road asset management data to find out the various road properties that contribute unduly to crashes. The models demonstrate high levels of accuracy in predicting crashes in roads when various road properties are included. This paper presents the findings of these models to show the relationships among skid resistance, crashes, crash characteristics and other road characteristics such as seal type, seal age, road type, texture depth, lane count, pavement width, rutting, speed limit, traffic rates intersections, traffic signage and road design and so on.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is commonly accepted that wet roads have higher risk of crash than dry roads; however, providing evidence to support this assumption presents some difficulty. This paper presents a data mining case study in which predictive data mining is applied to model the skid resistance and crash relationship to search for discernable differences in the probability of wet and dry road segments having crashes based on skid resistance. The models identify an increased probability of wet road segments having crashes for mid-range skid resistance values.