361 resultados para mining areas
Resumo:
This paper deals with the problem of using the data mining models in a real-world situation where the user can not provide all the inputs with which the predictive model is built. A learning system framework, Query Based Learning System (QBLS), is developed for improving the performance of the predictive models in practice where not all inputs are available for querying to the system. The automatic feature selection algorithm called Query Based Feature Selection (QBFS) is developed for selecting features to obtain a balance between the relative minimum subset of features and the relative maximum classification accuracy. Performance of the QBLS system and the QBFS algorithm is successfully demonstrated with a real-world application
Resumo:
The management of main material prices of provincial highway project quota has problems of lag and blindness. Framework of provincial highway project quota data MIS and main material price data warehouse were established based on WEB firstly. Then concrete processes of provincial highway project main material prices were brought forward based on BP neural network algorithmic. After that standard BP algorithmic, additional momentum modify BP network algorithmic, self-adaptive study speed improved BP network algorithmic were compared in predicting highway project main prices. The result indicated that it is feasible to predict highway main material prices using BP NN, and using self-adaptive study speed improved BP network algorithmic is the relatively best one.
Resumo:
Queensland University of Technology (QUT) is faced with a rapidly growing research agenda built upon a strategic research capacity-building program. This presentation will outline the results of a project that has recently investigated QUT’s research support requirements and which has developed a model for the support of eResearch across the university. QUT’s research building strategy has produced growth at the faculty level and within its research institutes. This increased research activity is pushing the need for university-wide eResearch platforms capable of providing infrastructure and support in areas such as collaboration, data, networking, authentication and authorisation, workflows and the grid. One of the driving forces behind the investigation is data-centric nature of modern research. It is now critical that researchers have access to supported infrastructure that allows the collection, analysis, aggregation and sharing of large data volumes for exploration and mining in order to gain new insights and to generate new knowledge. However, recent surveys into current research data management practices by the Australian Partnership for Sustainable Repositories (APSR) and by QUT itself, has revealed serious shortcomings in areas such as research data management, especially its long term maintenance for reuse and authoritative evidence of research findings. While these internal university pressures are building, at the same time there are external pressures that are magnifying them. For example, recent compliance guidelines from bodies such as the ARC, and NHMRC and Universities Australia indicate that institutions need to provide facilities for the safe and secure storage of research data along with a surrounding set of policies, on its retention, ownership and accessibility. The newly formed Australian National Data Service (ANDS) is developing strategies and guidelines for research data management and research institutions are a central focus, responsible for managing and storing institutional data on platforms that can be federated nationally and internationally for wider use. For some time QUT has recognised the importance of eResearch and has been active in a number of related areas: ePrints to digitally publish research papers, grid computing portals and workflows, institutional-wide provisioning and authentication systems, and legal protocols for copyright management. QUT also has two widely recognised centres focused on fundamental research into eResearch itself: The OAK LAW project (Open Access to Knowledge) which focuses upon legal issues relating eResearch and the Microsoft QUT eResearch Centre whose goal is to accelerate scientific research discovery, through new smart software. In order to better harness all of these resources and improve research outcomes, the university recently established a project to investigate how it might better organise the support of eResearch. This presentation will outline the project outcomes, which include a flexible and sustainable eResearch support service model addressing short and longer term research needs, identification of resource requirements required to establish and sustain the service, and the development of research data management policies and implementation plans.
Resumo:
A number of factors have been shown to influence residential property prices in various locations. Studies have identified the importance of location in relation to services, transport and proximity to negative factors such as power lines and cell phone towers. Often the socio-economic status of a residential precinct can determine the overall quality and nature of the streetscapes in that area, with higher value suburbs or locations offering a better visual appearance compared to areas where these factors are not present. However, does the same value for a good streetscape apply in lower socio-economic areas or a buyers more motivated by less aesthetic factors such as size of the house, construction materials or land size. This paper analyses specific streets in a lower to middle socio-economic suburb of Christchurch New Zealand to determine if the location of a house in a street with good streetscape appeal has greater value, investment performance and saleability compared to adjoining streets with less aesthetic appeal.
Resumo:
Real-Time Kinematic (RTK) positioning is a technique used to provide precise positioning services at centimetre accuracy level in the context of Global Navigation Satellite Systems (GNSS). While a Network-based RTK (N-RTK) system involves multiple continuously operating reference stations (CORS), the simplest form of a NRTK system is a single-base RTK. In Australia there are several NRTK services operating in different states and over 1000 single-base RTK systems to support precise positioning applications for surveying, mining, agriculture, and civil construction in regional areas. Additionally, future generation GNSS constellations, including modernised GPS, Galileo, GLONASS, and Compass, with multiple frequencies have been either developed or will become fully operational in the next decade. A trend of future development of RTK systems is to make use of various isolated operating network and single-base RTK systems and multiple GNSS constellations for extended service coverage and improved performance. Several computational challenges have been identified for future NRTK services including: • Multiple GNSS constellations and multiple frequencies • Large scale, wide area NRTK services with a network of networks • Complex computation algorithms and processes • Greater part of positioning processes shifting from user end to network centre with the ability to cope with hundreds of simultaneous users’ requests (reverse RTK) There are two major requirements for NRTK data processing based on the four challenges faced by future NRTK systems, expandable computing power and scalable data sharing/transferring capability. This research explores new approaches to address these future NRTK challenges and requirements using the Grid Computing facility, in particular for large data processing burdens and complex computation algorithms. A Grid Computing based NRTK framework is proposed in this research, which is a layered framework consisting of: 1) Client layer with the form of Grid portal; 2) Service layer; 3) Execution layer. The user’s request is passed through these layers, and scheduled to different Grid nodes in the network infrastructure. A proof-of-concept demonstration for the proposed framework is performed in a five-node Grid environment at QUT and also Grid Australia. The Networked Transport of RTCM via Internet Protocol (Ntrip) open source software is adopted to download real-time RTCM data from multiple reference stations through the Internet, followed by job scheduling and simplified RTK computing. The system performance has been analysed and the results have preliminarily demonstrated the concepts and functionality of the new NRTK framework based on Grid Computing, whilst some aspects of the performance of the system are yet to be improved in future work.
Resumo:
Research has noted a ‘pronounced pattern of increase with increasing remoteness' of death rates in road crashes. However, crash characteristics by remoteness are not commonly or consistently reported, with definitions of rural and urban often relying on proxy representations such as prevailing speed limit. The current paper seeks to evaluate the efficacy of the Accessibility / Remoteness Index of Australia (ARIA+) to identifying trends in road crashes. ARIA+ does not rely on road-specific measures and uses distances to populated centres to attribute a score to an area, which can in turn be grouped into 5 classifications of increasing remoteness. The current paper uses applications of these classifications at the broad level of Australian Bureau of Statistics' Statistical Local Areas, thus avoiding precise crash locating or dedicated mapping software. Analyses used Queensland road crash database details for all 31,346 crashes resulting in a fatality or hospitalisation occurring between 1st July, 2001 and 30th June 2006 inclusive. Results showed that this simplified application of ARIA+ aligned with previous definitions such as speed limit, while also providing further delineation. Differences in crash contributing factors were noted with increasing remoteness such as a greater representation of alcohol and ‘excessive speed for circumstances.' Other factors such as the predominance of younger drivers in crashes differed little by remoteness classification. The results are discussed in terms of the utility of remoteness as a graduated rather than binary (rural/urban) construct and the potential for combining ARIA crash data with census and hospital datasets.
Resumo:
Research has shown that road lane width impacts on driver behaviour. This literature review provides guidelines to assist in the design, construction and retrofitting of urban roads to accommodate road users' safety requirements. It focuses on the impacts of lane widths on cyclists and motor vehicle safety behaviour. The literature review commenced with a search of library databases. Peer reviewed articles and road authority (local, state and national) reports were reviewed. The majority of studies investigating the effects of lane width on driver behaviour were simulator based, while research into cycling safety involved data collected from actual traffic environments. Results show that marked road lane width influences perceived task difficulty, risk perception and possibly speed choice. The positioning of cyclists in traffic lanes is influenced by the presence of on-road cycling facilities and the total roadway width. The lateral displacement between bicycle and vehicle is smallest when a bicycle facility is present. Lower, or reduced, vehicle speeds play a significant role in improving bicyclist and pedestrian safety. It is also shown that if road lane widths in urban areas were reduced, to a functional width that was less than the current guidelines of 3.5m, it could result in a safer road environment for all road users.
Resumo:
Classical negotiation models are weak in supporting real-world business negotiations because these models often assume that the preference information of each negotiator is made public. Although parametric learning methods have been proposed for acquiring the preference information of negotiation opponents, these methods suffer from the strong assumptions about the specific utility function and negotiation mechanism employed by the opponents. Consequently, it is difficult to apply these learning methods to the heterogeneous negotiation agents participating in e‑marketplaces. This paper illustrates the design, development, and evaluation of a nonparametric negotiation knowledge discovery method which is underpinned by the well-known Bayesian learning paradigm. According to our empirical testing, the novel knowledge discovery method can speed up the negotiation processes while maintaining negotiation effectiveness. To the best of our knowledge, this is the first nonparametric negotiation knowledge discovery method developed and evaluated in the context of multi-issue bargaining over e‑marketplaces.
Resumo:
It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.
Resumo:
Dealing with the ever-growing information overload in the Internet, Recommender Systems are widely used online to suggest potential customers item they may like or find useful. Collaborative Filtering is the most popular techniques for Recommender Systems which collects opinions from customers in the form of ratings on items, services or service providers. In addition to the customer rating about a service provider, there is also a good number of online customer feedback information available over the Internet as customer reviews, comments, newsgroups post, discussion forums or blogs which is collectively called user generated contents. This information can be used to generate the public reputation of the service providers’. To do this, data mining techniques, specially recently emerged opinion mining could be a useful tool. In this paper we present a state of the art review of Opinion Mining from online customer feedback. We critically evaluate the existing work and expose cutting edge area of interest in opinion mining. We also classify the approaches taken by different researchers into several categories and sub-categories. Each of those steps is analyzed with their strength and limitations in this paper.
Resumo:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).
Resumo:
The wide range of contributing factors and circumstances surrounding crashes on road curves suggest that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationship between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data, and reasons using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers, who are between 50 and 59 years old, driving during evening peak hours are involved with a collision, had a lowest crash risk. Drivers between 25 and 29 years old, driving from around midnight to 6 am and in a new car has the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with driving experience of 25 to 42 years, who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based tools approach to analyse road crashes. Our data mining approach is supported with proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type with the view to designing tailored countermeasures.
Resumo:
Despite all attempts to prevent fraud, it continues to be a major threat to industry and government. Traditionally, organizations have focused on fraud prevention rather than detection, to combat fraud. In this paper we present a role mining inspired approach to represent user behaviour in Enterprise Resource Planning (ERP) systems, primarily aimed at detecting opportunities to commit fraud or potentially suspicious activities. We have adapted an approach which uses set theory to create transaction profiles based on analysis of user activity records. Based on these transaction profiles, we propose a set of (1) anomaly types to detect potentially suspicious user behaviour and (2) scenarios to identify inadequate segregation of duties in an ERP environment. In addition, we present two algorithms to construct a directed acyclic graph to represent relationships between transaction profiles. Experiments were conducted using a real dataset obtained from a teaching environment and a demonstration dataset, both using SAP R/3, presently the most predominant ERP system. The results of this empirical research demonstrate the effectiveness of the proposed approach.