851 resultados para Graph mining
Resumo:
Many large coal mining operations in Australia rely heavily on the rail network to transport coal from mines to coal terminals at ports for shipment. Over the last few years, due to the fast growing demand, the coal rail network is becoming one of the worst industrial bottlenecks in Australia. As a result, this provides great incentives for pursuing better optimisation and control strategies for the operation of the whole rail transportation system under network and terminal capacity constraints. This PhD research aims to achieve a significant efficiency improvement in a coal rail network on the basis of the development of standard modelling approaches and generic solution techniques. Generally, the train scheduling problem can be modelled as a Blocking Parallel- Machine Job-Shop Scheduling (BPMJSS) problem. In a BPMJSS model for train scheduling, trains and sections respectively are synonymous with jobs and machines and an operation is regarded as the movement/traversal of a train across a section. To begin, an improved shifting bottleneck procedure algorithm combined with metaheuristics has been developed to efficiently solve the Parallel-Machine Job- Shop Scheduling (PMJSS) problems without the blocking conditions. Due to the lack of buffer space, the real-life train scheduling should consider blocking or hold-while-wait constraints, which means that a track section cannot release and must hold a train until the next section on the routing becomes available. As a consequence, the problem has been considered as BPMJSS with the blocking conditions. To develop efficient solution techniques for BPMJSS, extensive studies on the nonclassical scheduling problems regarding the various buffer conditions (i.e. blocking, no-wait, limited-buffer, unlimited-buffer and combined-buffer) have been done. In this procedure, an alternative graph as an extension of the classical disjunctive graph is developed and specially designed for the non-classical scheduling problems such as the blocking flow-shop scheduling (BFSS), no-wait flow-shop scheduling (NWFSS), and blocking job-shop scheduling (BJSS) problems. By exploring the blocking characteristics based on the alternative graph, a new algorithm called the topological-sequence algorithm is developed for solving the non-classical scheduling problems. To indicate the preeminence of the proposed algorithm, we compare it with two known algorithms (i.e. Recursive Procedure and Directed Graph) in the literature. Moreover, we define a new type of non-classical scheduling problem, called combined-buffer flow-shop scheduling (CBFSS), which covers four extreme cases: the classical FSS (FSS) with infinite buffer, the blocking FSS (BFSS) with no buffer, the no-wait FSS (NWFSS) and the limited-buffer FSS (LBFSS). After exploring the structural properties of CBFSS, we propose an innovative constructive algorithm named the LK algorithm to construct the feasible CBFSS schedule. Detailed numerical illustrations for the various cases are presented and analysed. By adjusting only the attributes in the data input, the proposed LK algorithm is generic and enables the construction of the feasible schedules for many types of non-classical scheduling problems with different buffer constraints. Inspired by the shifting bottleneck procedure algorithm for PMJSS and characteristic analysis based on the alternative graph for non-classical scheduling problems, a new constructive algorithm called the Feasibility Satisfaction Procedure (FSP) is proposed to obtain the feasible BPMJSS solution. A real-world train scheduling case is used for illustrating and comparing the PMJSS and BPMJSS models. Some real-life applications including considering the train length, upgrading the track sections, accelerating a tardy train and changing the bottleneck sections are discussed. Furthermore, the BPMJSS model is generalised to be a No-Wait Blocking Parallel- Machine Job-Shop Scheduling (NWBPMJSS) problem for scheduling the trains with priorities, in which prioritised trains such as express passenger trains are considered simultaneously with non-prioritised trains such as freight trains. In this case, no-wait conditions, which are more restrictive constraints than blocking constraints, arise when considering the prioritised trains that should traverse continuously without any interruption or any unplanned pauses because of the high cost of waiting during travel. In comparison, non-prioritised trains are allowed to enter the next section immediately if possible or to remain in a section until the next section on the routing becomes available. Based on the FSP algorithm, a more generic algorithm called the SE algorithm is developed to solve a class of train scheduling problems in terms of different conditions in train scheduling environments. To construct the feasible train schedule, the proposed SE algorithm consists of many individual modules including the feasibility-satisfaction procedure, time-determination procedure, tune-up procedure and conflict-resolve procedure algorithms. To find a good train schedule, a two-stage hybrid heuristic algorithm called the SE-BIH algorithm is developed by combining the constructive heuristic (i.e. the SE algorithm) and the local-search heuristic (i.e. the Best-Insertion- Heuristic algorithm). To optimise the train schedule, a three-stage algorithm called the SE-BIH-TS algorithm is developed by combining the tabu search (TS) metaheuristic with the SE-BIH algorithm. Finally, a case study is performed for a complex real-world coal rail network under network and terminal capacity constraints. The computational results validate that the proposed methodology would be very promising because it can be applied as a fundamental tool for modelling and solving many real-world scheduling problems.
Resumo:
Keyword Spotting is the task of detecting keywords of interest within continu- ous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unre- stricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have su®ered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the ¯eld of keyword spotting. The ¯rst major contribution is the development of a novel keyword veri¯cation method named Cohort Word Veri¯cation. This method combines high level lin- guistic information with cohort-based veri¯cation techniques to obtain dramatic improvements in veri¯cation performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique aug- ments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains signi¯cant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple veri¯er fusion for the task of keyword veri¯cation. The reported experiments demonstrate that substantial improvements in veri¯cation performance can be obtained through the fusion of multiple keyword veri¯ers. The research focuses on combinations of speech background model based veri¯ers and cohort word veri¯ers. The ¯nal major contribution is a comprehensive study of the e®ects of limited training data for keyword spotting. This study is performed with consideration as to how these e®ects impact the immediate development and deployment of speech technologies for non-English languages.
Resumo:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has lead to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques; including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.
Resumo:
This paper examines Australian media representations of the male managers of two global mining corporations, Rio Tinto and BHP Billiton. These organizations are transnational (or multinational) corporations with assets and/or operations across national boundaries (Dunning and Lundan, 2008), and indeed their respective Chief Executive Officers, Tom Albanese and Marius Kloppers are two of the most economically (and arguably politically) powerful in the world overseeing 37 000 and 39 000 employees internationally. With a 2008 profit of US$15.962 billion and assets of US$ 75.889 Billion BHP Billiton is the world's largest mining company. In terms of its profits and assets Rio Tinto ranks fourth in the world, but with operations in six countries (mainly Canada and Australia) and a 2008 profit of US$10.3 billion it is also emblematic of the transnational in that its ‘budget is larger than that of all but a few nations’ (Giddens, 2003, p. 62).
Resumo:
Information Overload and Mismatch are two fundamental problems affecting the effectiveness of information filtering systems. Even though both term-based and patternbased approaches have been proposed to address the problems of overload and mismatch, neither of these approaches alone can provide a satisfactory solution to address these problems. This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. The experimental results based on the RCV1 corpus show that the proposed twostage filtering model significantly outperforms the both termbased and pattern-based information filtering models.
Resumo:
Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large complex databases. Application of data mining to manufacturing is relatively limited mainly because of complexity of manufacturing data. Growing self organizing map (GSOM) algorithm has been proven to be an efficient algorithm to analyze unsupervised DNA data. However, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper a data mining methodology has been proposed using a GSOM tool which was developed using a modified GSOM algorithm. The proposed method is used to generate clusters for good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposed an automatic identification of variables to find the most probable causative factor(s) that discriminate between good and faulty product by quickly examining the historical manufacturing data. The proposed method offers the manufacturers to smoothen the production flow and improve the quality of the products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.
Resumo:
In the late 20th century, a value-shift began to influence political thinking, recognising the need for environmentally, socially and culturally sustainable resource development. This shift entailed moves away from thinking of nature and culture as separate entities - The former existing merely to serve the latter. Cultural landscape theory recognises 'nature' as at once both 'natural', and as a 'cultural' construct. As such it may offer a framework through which to progress in the quest for 'sustainable development'. This 2005 Masters thesis makes a contribution to that quest by asking whether contemporary developments in cultural landscape theory can contribute to rehabilitation strategies for Australian open-cut coal mining landscapes, an examplar resource development landscape. A thematic historial overview of landscape values and resource development in Australis post-1788, and a review of cultural landscape theory literature contribute to the formation of the theoretical framework: "reconnecting the interrupted landscape". The author then explores a possible application of this framework within the Australian open-cut coal mining landscape.
Resumo:
Road safety is a major concern worldwide. Road safety will improve as road conditions and their effects on crashes are continually investigated. This paper proposes to use the capability of data mining to include the greater set of road variables for all available crashes with skid resistance values across the Queensland state main road network in order to understand the relationships among crash, traffic and road variables. This paper presents a data mining based methodology for the road asset management data to find out the various road properties that contribute unduly to crashes. The models demonstrate high levels of accuracy in predicting crashes in roads when various road properties are included. This paper presents the findings of these models to show the relationships among skid resistance, crashes, crash characteristics and other road characteristics such as seal type, seal age, road type, texture depth, lane count, pavement width, rutting, speed limit, traffic rates intersections, traffic signage and road design and so on.
Resumo:
Developing safe and sustainable road systems is a common goal in all countries. Applications to assist with road asset management and crash minimization are sought universally. This paper presents a data mining methodology using decision trees for modeling the crash proneness of road segments using available road and crash attributes. The models quantify the concept of crash proneness and demonstrate that road segments with only a few crashes have more in common with non-crash roads than roads with higher crash counts. This paper also examines ways of dealing with highly unbalanced data sets encountered in the study.
Resumo:
It is commonly accepted that wet roads have higher risk of crash than dry roads; however, providing evidence to support this assumption presents some difficulty. This paper presents a data mining case study in which predictive data mining is applied to model the skid resistance and crash relationship to search for discernable differences in the probability of wet and dry road segments having crashes based on skid resistance. The models identify an increased probability of wet road segments having crashes for mid-range skid resistance values.
Resumo:
Road crashes cost world and Australian society a significant proportion of GDP, affecting productivity and causing significant suffering for communities and individuals. This paper presents a case study that generates data mining models that contribute to understanding of road crashes by allowing examination of the role of skid resistance (F60) and other road attributes in road crashes. Predictive data mining algorithms, primarily regression trees, were used to produce road segment crash count models from the road and traffic attributes of crash scenarios. The rules derived from the regression trees provide evidence of the significance of road attributes in contributing to crash, with a focus on the evaluation of skid resistance.
Resumo:
Item folksonomy or tag information is a kind of typical and prevalent web 2.0 information. Item folksonmy contains rich opinion information of users on item classifications and descriptions. It can be used as another important information source to conduct opinion mining. On the other hand, each item is associated with taxonomy information that reflects the viewpoints of experts. In this paper, we propose to mine for users’ opinions on items based on item taxonomy developed by experts and folksonomy contributed by users. In addition, we explore how to make personalized item recommendations based on users’ opinions. The experiments conducted on real word datasets collected from Amazon.com and CiteULike demonstrated the effectiveness of the proposed approaches.
Resumo:
This is the final report from a study into the social impact of mining in Queensland.
Resumo:
It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well.