360 resultados para Data Mining, Clustering, PSA, Pavement Deflection
Resumo:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has lead to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques; including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.
Resumo:
Developing safe and sustainable road systems is a common goal in all countries. Applications to assist with road asset management and crash minimization are sought universally. This paper presents a data mining methodology using decision trees for modeling the crash proneness of road segments using available road and crash attributes. The models quantify the concept of crash proneness and demonstrate that road segments with only a few crashes have more in common with non-crash roads than roads with higher crash counts. This paper also examines ways of dealing with highly unbalanced data sets encountered in the study.
Resumo:
It is commonly accepted that wet roads have higher risk of crash than dry roads; however, providing evidence to support this assumption presents some difficulty. This paper presents a data mining case study in which predictive data mining is applied to model the skid resistance and crash relationship to search for discernable differences in the probability of wet and dry road segments having crashes based on skid resistance. The models identify an increased probability of wet road segments having crashes for mid-range skid resistance values.
Resumo:
Road crashes cost world and Australian society a significant proportion of GDP, affecting productivity and causing significant suffering for communities and individuals. This paper presents a case study that generates data mining models that contribute to understanding of road crashes by allowing examination of the role of skid resistance (F60) and other road attributes in road crashes. Predictive data mining algorithms, primarily regression trees, were used to produce road segment crash count models from the road and traffic attributes of crash scenarios. The rules derived from the regression trees provide evidence of the significance of road attributes in contributing to crash, with a focus on the evaluation of skid resistance.
Resumo:
We propose to use the Tensor Space Modeling (TSM) to represent and analyze the user’s web log data that consists of multiple interests and spans across multiple dimensions. Further we propose to use the decomposition factors of the Tensors for clustering the users based on similarity of search behaviour. Preliminary results show that the proposed method outperforms the traditional Vector Space Model (VSM) based clustering.
Resumo:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.
Resumo:
Data mining techniques extract repeated and useful patterns from a large data set that in turn are utilized to predict the outcome of future events. The main purpose of the research presented in this paper is to investigate data mining strategies and develop an efficient framework for multi-attribute project information analysis to predict the performance of construction projects. The research team first reviewed existing data mining algorithms, applied them to systematically analyze a large project data set collected by the survey, and finally proposed a data-mining-based decision support framework for project performance prediction. To evaluate the potential of the framework, a case study was conducted using data collected from 139 capital projects and analyzed the relationship between use of information technology and project cost performance. The study results showed that the proposed framework has potential to promote fast, easy to use, interpretable, and accurate project data analysis.
Resumo:
Key decisions at the collection, pre-processing, transformation, mining and interpretation phase of any knowledge discovery from database (KDD) process depend heavily on assumptions and theorectical perspectives relating to the type of task to be performed and characteristics of data sourced. In this article, we compare and contrast theoretical perspectives and assumptions taken in data mining exercises in the legal domain with those adopted in data mining in TCM and allopathic medicine. The juxtaposition results in insights for the application of KDD for Traditional Chinese Medicine.
Resumo:
This thesis takes a new data mining approach for analyzing road/crash data by developing models for the whole road network and generating a crash risk profile. Roads with an elevated crash risk due to road surface friction deficit are identified. The regression tree model, predicting road segment crash rate, is applied in a novel deployment coined regression tree extrapolation that produces a skid resistance/crash rate curve. Using extrapolation allows the method to be applied across the network and cope with the high proportion of missing road surface friction values. This risk profiling method can be applied in other domains.
Resumo:
Road surface skid resistance has been shown to have a strong relationship to road crash risk, however, applying the current method of using investigatory levels to identify crash prone roads is problematic as they may fail in identifying risky roads outside of the norm. The proposed method analyses a complex and formerly impenetrable volume of data from roads and crashes using data mining. This method rapidly identifies roads with elevated crash-rate, potentially due to skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values which occurs during network-wide crash analysis, and allows risk assessment of the major proportion of roads without skid resistance values.
Resumo:
This thesis is a study for automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can largely reduce noises in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for the study of Data Mining and Web Intelligence.
Resumo:
This thesis describes the development of a robust and novel prototype to address the data quality problems that relate to the dimension of outlier data. It thoroughly investigates the associated problems with regards to detecting, assessing and determining the severity of the problem of outlier data; and proposes granule-mining based alternative techniques to significantly improve the effectiveness of mining and assessing outlier data.
Resumo:
This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed on Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. This analysis is grounded in the frame analysis perspective and is different than sentiment analysis. Instead of looking for explicit evaluations, such as “he is guilty” or “he is innocent”, we showcase through the results how opinions can be identified by complex articulations of more implicit symbolic devices such as examples and metaphors repeatedly mentioned. Different frames are adopted by users as more information about the case is revealed: from a more episodic one, highly used in the very beginning, to more systemic approaches, highlighting the association of the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.
Resumo:
Road asset managers are seeking analysis of the whole road network to supplement statistical analyses of small subsets of homogeneous roadway. This study outlines the use of data mining capable of analyzing the wide range of situations found on the network, with a focus on the role of skid resistance in the cause of crashes. Results from the analyses show that on non-crash-prone roads with low crash rates, skid resistance contributes only in a minor way, whereas on high-crash roadways, skid resistance often contributes significantly in the calculation of the crash rate. The results provide evidence supporting a causal relationship between skid resistance and crashes and highlight the importance of the role of skid resistance in decision making in road asset management.