Digital Library

916 results for Rough Set

Applying data mining to assess crash risk on curves

Relevance:

60.00%

Publisher:

Abstract:

The wide range of contributing factors and circumstances surrounding crashes on road curves suggests that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify the contributing factors that influence the risk of a crash and the relationships between them. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data and to reason with the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers between 50 and 59 years old who were involved in a collision while driving during evening peak hours had the lowest crash risk. Drivers between 25 and 29 years old, driving a new car between around midnight and 6 am, had the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with 25 to 42 years of driving experience who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based approaches to analysing road crashes. Our data mining approach is supported by proven theory and will allow road safety practitioners to understand the dependencies between contributing factors and crash type, with a view to designing tailored countermeasures.
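As a rough illustration of what "discovering dependencies" with rough set theory can mean, the sketch below computes the lower and upper approximations of a target set and the dependency degree of a decision attribute on a set of condition attributes. The records, attribute names and values are invented for illustration and are not taken from the insurance dataset; the function names are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper

def dependency(rows, cond, dec):
    """Dependency degree: size of the positive region divided by |U|."""
    positive = set()
    for decision_block in partition(rows, dec):
        lower, _ = approximations(rows, cond, decision_block)
        positive |= lower
    return len(positive) / len(rows)

# Invented toy records (assumption, not real claim data).
claims = [
    {"age": "A", "time": "night",   "risk": "high"},
    {"age": "A", "time": "night",   "risk": "high"},
    {"age": "B", "time": "evening", "risk": "low"},
    {"age": "B", "time": "evening", "risk": "high"},
    {"age": "B", "time": "morning", "risk": "low"},
]
high = {i for i, r in enumerate(claims) if r["risk"] == "high"}
lower, upper = approximations(claims, ["age", "time"], high)
gamma = dependency(claims, ["age", "time"], ["risk"])
```

Records in the lower approximation are certainly high risk given the chosen attributes, while records that are only in the upper approximation are possibly high risk; a dependency degree below 1 signals that the condition attributes do not fully determine the decision.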

Two-stage model for information filtering

Relevance:

60.00%

Publisher:

Mining patterns and factors contributing to crash severity on road curves

Relevance:

60.00%

Publisher:

Abstract:

Road curves are an important feature of road infrastructure, and many serious crashes occur on them. In Queensland, twice as many fatalities occur on curves as on straight roads. Therefore, there is a need to reduce drivers' exposure to crash risk on road curves. Road crashes in Australia and in the Organisation for Economic Co-operation and Development (OECD) have plateaued in the last five years (2004 to 2008), and the road safety community is desperately seeking innovative interventions to reduce the number of crashes. However, designing an innovative and effective intervention may prove difficult, as it relies on providing theoretical foundation, coherence, understanding, and structure to both the design and the validation of the efficiency of the new intervention. Researchers from multiple disciplines have developed various models to determine the contributing factors for crashes on road curves with a view towards reducing the crash rate. However, most of the existing methods are based on statistical analysis of contributing factors described in government crash reports. In order to further explore the contributing factors related to crashes on road curves, this thesis designs a novel method to analyse and validate these contributing factors. The use of crash claim reports from an insurance company is proposed for analysis using data mining techniques. To the best of our knowledge, this is the first attempt to use data mining techniques to analyse crashes on road curves. A text mining technique is employed to identify the contributing factors, as the reports consist of thousands of textual descriptions. Beyond identifying the contributing factors, few studies to date have investigated the relationships between these factors, especially for crashes on road curves. Thus, this study proposes the use of the rough set analysis technique to determine these relationships.
The results from this analysis are used to assess the effect of these contributing factors on crash severity. The findings obtained through the data mining techniques presented in this thesis are consistent with previously identified contributing factors. Furthermore, this thesis has identified new contributing factors to crashes and the relationships between them. A significant pattern related to crash severity is the time of day: severe road crashes occur more frequently in the evening or at night. Tree collision is another common pattern: crashes that occur in the morning and involve hitting a tree are likely to have a higher crash severity. Another factor that influences crash severity is the age of the driver. Most age groups face a high crash severity, except for drivers between 60 and 100 years old, who have the lowest crash severity. The significant relationship identified between contributing factors involves the time of the crash, the year of manufacture of the vehicle, the age of the driver and hitting a tree. Having identified new contributing factors and relationships, a validation process is carried out using a traffic simulator to determine their accuracy. The validation process indicates that the results are accurate. This demonstrates that data mining techniques are a powerful tool in road safety research and can be usefully applied within the Intelligent Transport System (ITS) domain. The research presented in this thesis provides an insight into the complexity of crashes on road curves. The findings of this research have important implications for both practitioners and academics. For road safety practitioners, the results illustrate practical benefits for the design of interventions for road curves that will potentially help to decrease related injuries and fatalities. For academics, this research opens up a new research methodology for assessing the severity of crashes on road curves.

Discovering novel knowledge using granule mining

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an extended granule mining based methodology that describes the relationships between granules not only by traditional support and confidence, but by diversity and condition diversity as well. Diversity measures how diversely a granule is associated with other granules; it provides a kind of novel knowledge in databases. We also provide an algorithm to implement the proposed methodology. Experiments conducted to characterize a real network traffic data collection show that the proposed concepts and algorithm are promising.

Granule mining and its application for network traffic characterization

Relevance:

60.00%

Publisher:

Abstract:

Decision tables and decision rules play an important role in rough set based data analysis, which compresses databases into granules and describes the associations between granules. Granule mining was also proposed to interpret decision rules in terms of association rules and a multi-tier structure. In this paper, we further extend granule mining to describe the relationships between granules not only by traditional support and confidence, but by diversity and condition diversity as well. Diversity measures how diversely a granule is associated with other granules; it provides a kind of novel knowledge in databases. Experiments are conducted to test the proposed concepts for describing the characteristics of a real network traffic data collection. The results show that the proposed concepts are promising.
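To make the idea of compressing a decision table into granules more concrete, here is a minimal sketch that groups rows into condition and decision granules and reports the traditional support and confidence of each "condition granule, decision granule" rule. The traffic records and attribute names are hypothetical, and the diversity measures are left out because the abstract does not define them.

```python
from collections import Counter

def granule_rules(rows, cond, dec):
    """Map (condition granule, decision granule) -> (support, confidence)."""
    n = len(rows)
    cond_counts = Counter(tuple(r[a] for a in cond) for r in rows)
    pair_counts = Counter(
        (tuple(r[a] for a in cond), tuple(r[a] for a in dec)) for r in rows
    )
    rules = {}
    for (cg, dg), count in pair_counts.items():
        support = count / n                   # share of all rows covered
        confidence = count / cond_counts[cg]  # share within the condition granule
        rules[(cg, dg)] = (support, confidence)
    return rules

# Hypothetical network traffic records (assumption, for illustration only).
flows = [
    {"proto": "tcp", "port": "80", "label": "web"},
    {"proto": "tcp", "port": "80", "label": "web"},
    {"proto": "tcp", "port": "80", "label": "scan"},
    {"proto": "udp", "port": "53", "label": "dns"},
]
rules = granule_rules(flows, ["proto", "port"], ["label"])
web_support, web_confidence = rules[(("tcp", "80"), ("web",))]
```

Each distinct combination of condition values is one granule, so a table with many repeated rows shrinks to a handful of weighted rules.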

Association hierarchy mining and its application for network traffic characterisation

Relevance:

60.00%

Publisher:

Abstract:

This thesis presents an association rule mining approach, association hierarchy mining (AHM). Unlike traditional two-step bottom-up rule mining, AHM adopts a one-step top-down rule mining strategy to improve the efficiency and effectiveness of mining association rules from datasets. The thesis also presents a novel approach to evaluating the quality of knowledge discovered by AHM, which focuses on the information difference between the discovered knowledge and the original datasets. Experiments performed on a real application, characterizing network traffic behaviour, have shown that AHM achieves encouraging performance.

A novel approach to assessing road-curve crash severity

Relevance:

60.00%

Publisher:

Abstract:

Curves are a common feature of road infrastructure; however, crashes on road curves are associated with increased risk of injury and fatality to vehicle occupants. Countermeasures require the identification of contributing factors. However, current approaches to identifying contributors use traditional statistical methods and have not used self-reported narrative claims to identify factors related to the driver, vehicle and environment in a systemic way. Text mining of 3434 road-curve crash claim records filed between 1 January 2003 and 31 December 2005 at a major insurer in Queensland, Australia, was undertaken to identify risk levels and contributing factors. Rough set analysis was used on insurance claim narratives to identify significant contributing factors to crashes and their associated severity. New contributing factors unique to curve crashes were identified (e.g., tree, phone, over-steer) in addition to those previously identified via traditional statistical analysis of police and licensing authority records. Text mining is a novel methodology to improve knowledge related to risk and contributing factors to road-curve crash severity. Future road-curve crash countermeasures should more fully consider the interrelationships between the environment, the road, the driver and the vehicle, and education campaigns in particular could highlight the increased risk of crashes on road curves.

Knowledge mining based on rough sets and multi-agent systems

Relevance:

60.00%

Publisher:

Abstract:

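By optimizing the degree of dependency between the condition attributes and the decision attributes in a knowledge representation system, rough sets are studied in depth and combined with a multi-agent system. Using a discrete particle swarm algorithm, a rough set knowledge reduction algorithm based on particle swarm optimization is proposed, which addresses the inability of heuristic algorithms to search globally for reducts. Finally, the effectiveness of the approach is verified through an application to dispatching information in a mine.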

Engine fault diagnosis based on rough sets and a BP neural network

Relevance:

60.00%

Publisher:

Abstract:

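Because spectral analysis monitoring data for engines contain too many types of wear particles, feeding all of this particle information directly into a neural network leads to problems such as too many input-layer neurons and an overly complex network structure. This paper introduces rough sets into engine fault diagnosis: exploiting the strength of rough sets in attribute reduction, it removes redundant wear particles, extracts the important ones, and uses them as the inputs of a BP neural network to build an engine fault diagnosis model. The method reduces the number of input-layer neurons, simplifies the network structure, and shortens network training time. Because redundant wear particles are eliminated, it also reduces the errors caused by inaccurate information about those particles, effectively improving the accuracy of fault diagnosis. Finally, a worked example verifies the accuracy and effectiveness of the algorithm and the diagnosis model.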
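The attribute reduction step can be sketched with a generic greedy heuristic driven by the rough set dependency degree. This is not necessarily the algorithm used in the paper; the element names, discretized levels and fault labels below are invented, and the function names are hypothetical.

```python
from collections import defaultdict

def dependency(rows, cond, dec):
    """Fraction of rows whose condition class maps to one decision value."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in cond)].append(r[dec])
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, cond, dec):
    """Add the most informative attribute until the full dependency is reached."""
    full = dependency(rows, cond, dec)
    chosen = []
    while dependency(rows, chosen, dec) < full:
        remaining = [a for a in cond if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, chosen + [a], dec))
        chosen.append(best)
    return chosen

# Invented, already discretized wear-particle readings (assumption).
samples = [
    {"Fe": "high", "Cu": "low",  "Pb": "low", "fault": "bearing"},
    {"Fe": "high", "Cu": "high", "Pb": "low", "fault": "bearing"},
    {"Fe": "low",  "Cu": "low",  "Pb": "low", "fault": "none"},
    {"Fe": "low",  "Cu": "high", "Pb": "low", "fault": "none"},
]
kept = greedy_reduct(samples, ["Fe", "Cu", "Pb"], "fault")
```

In this toy table only `Fe` is kept, so a network trained on the reduced data would need one input neuron instead of three. A greedy search of this kind can return a superset of a minimal reduct, which is one reason global search methods are sometimes preferred.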

A computational intelligent approach to multi-factor analysis of violent crime information systems

Relevance:

60.00%

Publisher:

Abstract:

Various scientific studies have explored the causes of violent behaviour from different perspectives, with psychological tests, in particular, applied to the analysis of crime factors. The relationship between bi-factors has also been extensively studied including the link between age and crime. In reality, many factors interact to contribute to criminal behaviour and as such there is a need to have a greater level of insight into its complex nature. In this article we analyse violent crime information systems containing data on psychological, environmental and genetic factors. Our approach combines elements of rough set theory with fuzzy logic and particle swarm optimisation to yield an algorithm and methodology that can effectively extract multi-knowledge from information systems. The experimental results show that our approach outperforms alternative genetic algorithm and dynamic reduct-based techniques for reduct identification and has the added advantage of identifying multiple reducts and hence multi-knowledge (rules). Identified rules are consistent with classical statistical analysis of violent crime data and also reveal new insights into the interaction between several factors. As such, the results are helpful in improving our understanding of the factors contributing to violent crime and in highlighting the existence of hidden and intangible relationships between crime factors.

Statistical Machine Learning Techniques for the Prediction of Learning Disabilities in School-Age Children

Relevance:

60.00%

Publisher:

Abstract:

Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs the child's ability to carry out one or many specific tasks. Children with learning disabilities are neither slow nor intellectually disabled. The disorder can make it difficult for a child to learn as quickly, or in the same way, as a child who is not affected by a learning disability. An affected child can have normal or above-average intelligence. Such children may have difficulty paying attention, with reading or letter recognition, or with mathematics. This does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques is used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge-based tool is proposed that applies statistical machine learning or data mining techniques to the knowledge obtained from clinical information. The basic idea of the tool is to increase the accuracy of learning disability assessment and reduce the time it takes. Different statistical machine learning techniques in data mining are used in the study.
Further objectives of this research are to identify the important parameters for LD prediction using data mining techniques, to uncover hidden relationships between the symptoms of LD, and to estimate the relative significance of each symptom. The developed tool has many advantages over the traditional method of using checklists to determine learning disabilities. To improve the performance of various classifiers, we developed several preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models is also developed for LD prediction, and the importance of preprocessing is studied here as well. A Graphical User Interface (GUI) is designed to provide an integrated knowledge-based tool for predicting LD as well as its degree. The tool stores the details of the children in a student database and retrieves their LD reports as and when required. The present study demonstrates the effectiveness of the tool developed using various machine learning techniques. It also identifies the important parameters of LD and accurately predicts learning disability in school-age children. This thesis makes several major contributions in technical, general and social areas. The results are found to be very beneficial to parents, teachers and institutions, who are able to diagnose a child's problem at an early stage and seek proper treatment or counselling at the right time so as to avoid academic and social losses.

Prediction of the bulking phenomenon in wastewater treatment plants

Relevance:

60.00%

Publisher:

Abstract:

The control and prediction of wastewater treatment plants pursue an important goal: to avoid breaking the environmental balance by always keeping the system in stable operating conditions. It is known that qualitative information (coming from microscopic examinations and subjective remarks) has a deep influence on the activated sludge process, in particular on the total amount of effluent suspended solids, one of the measures of overall plant performance. The search for an input-output model of this variable and the prediction of sudden increases (bulking episodes) is thus a central concern in ensuring that current discharge limits are met. Unfortunately, the strong interrelation between variables, their heterogeneity and the very high amount of missing information make the use of traditional techniques difficult, or even impossible. Through the combined use of several methods, mainly rough set theory and artificial neural networks, reasonable prediction models are found, which also reveal the differing importance of the variables and provide insight into the process dynamics.

Revising the National Ambient Air Quality Standards: The integration of science and policy

Relevance:

60.00%

Publisher:

Abstract:

Under the Clean Air Act, Congress granted discretionary decision making authority to the Administrator of the Environmental Protection Agency (EPA). This discretionary authority involves setting standards to protect the public's health with an "adequate margin of safety" based on current scientific knowledge. The Administrator of the EPA is usually not a scientist, and for the National Ambient Air Quality Standard (NAAQS) for particulate matter (PM), the Administrator faced the task of revising a standard when several scientific factors were ambiguous. These factors included: (1) no identifiable threshold below which health effects are not manifested, (2) no biological basis to explain the reported associations between particulate matter and adverse health effects, and (3) no consensus among the members of the Clean Air Scientific Advisory Committee (CASAC) as to what an appropriate PM indicator, averaging period, or value would be for the revised standard.
This project recommends and demonstrates a tool, integrated assessment (IA), to aid the Administrator in making a public health policy decision in the face of ambiguous scientific factors. IA is an interdisciplinary approach to decision making that has been used to deal with complex issues involving many uncertainties, particularly climate change analyses. Two IA approaches are presented: a rough set analysis by which the expertise of CASAC members can be better utilized, and a flag model for incorporating the views of stakeholders into the standard setting process.
The rough set analysis can describe minimal and maximal conditions about the current science pertaining to PM and health effects. Similarly, a flag model can evaluate agreement or lack of agreement by various stakeholder groups to the proposed standard in the PM review process.
The use of these IA tools will enable the Administrator to (1) complete the NAAQS review in a manner that is in closer compliance with the Clean Air Act, (2) expand the input from CASAC, (3) take into consideration the views of the stakeholders, and (4) retain discretionary decision making authority.

Clustering web documents using hierarchical representation with multi-granularity

Relevance:

60.00%

Publisher:

Abstract:

Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity and ignores the bridging paragraph granularity. This two-level granularity may lead to unsatisfactory clustering results with "false correlation". To deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with ontology all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-Score.
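The way a tolerance rough set strategy can soften the zero-similarity problem is sketched below using the general tolerance rough set model for text, in which two terms are tolerant of each other when they co-occur in at least `theta` text units. This is an assumed, simplified form and not HRMM itself; the units (which could be paragraphs), the terms and the threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def tolerance_classes(units, theta):
    """Tolerance class of a term: itself plus terms co-occurring >= theta times."""
    co_occurrence = Counter()
    for unit in units:
        for a, b in combinations(sorted(set(unit)), 2):
            co_occurrence[(a, b)] += 1
    classes = {t: {t} for unit in units for t in unit}
    for (a, b), count in co_occurrence.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes

def upper_approximation(terms, classes):
    """All terms whose tolerance class overlaps the given term set."""
    terms = set(terms)
    return {t for t, cls in classes.items() if cls & terms}

def jaccard(x, y):
    """Set overlap similarity in [0, 1]."""
    return len(x & y) / len(x | y)

# Invented text units (assumption, for illustration only).
units = [
    {"rough", "set", "reduct"},
    {"rough", "set", "granule"},
    {"granule", "cluster"},
    {"reduct", "attribute"},
]
classes = tolerance_classes(units, theta=2)
a, b = {"rough"}, {"set"}
plain = jaccard(a, b)
enriched = jaccard(upper_approximation(a, classes),
                   upper_approximation(b, classes))
```

The two single-term units share no term, so their plain similarity is zero, but their upper approximations coincide because "rough" and "set" co-occur often enough to be tolerant of each other.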

A Distance-Based Method for Attribute Reduction in Incomplete Decision Systems

Relevance:

60.00%

Publisher:

Abstract:

There are limitations in recent research undertaken on attribute reduction in incomplete decision systems. In this paper, we propose a distance-based method for attribute reduction in an incomplete decision system. In addition, we prove theoretically that our method is more effective than some other methods.
