866 resultados para mining closure
Resumo:
This thesis presents an association rule mining approach, association hierarchy mining (AHM). Different to the traditional two-step bottom-up rule mining, AHM adopts one-step top-down rule mining strategy to improve the efficiency and effectiveness of mining association rules from datasets. The thesis also presents a novel approach to evaluate the quality of knowledge discovered by AHM, which focuses on evaluating information difference between the discovered knowledge and the original datasets. Experiments performed on the real application, characterizing network traffic behaviour, have shown that AHM achieves encouraging performance.
Resumo:
This research is a step forward in improving the accuracy of detecting anomaly in a data graph representing connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques utilising different types of structural input features. The methods are presented within a multi-layered framework which provides the full requirements needed for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.
Resumo:
This paper presents a single pass algorithm for mining discriminative Itemsets in data streams using a novel data structure and the tilted-time window model. Discriminative Itemsets are defined as Itemsets that are frequent in one data stream and their frequency in that stream is much higher than the rest of the streams in the dataset. In order to deal with the data structure size, we propose a pruning process that results in the compact tree structure containing discriminative Itemsets. Empirical analysis shows the sound time and space complexity of the proposed method.
Resumo:
This paper examines the social licence to operate (SLO) of Western Australia's (WA's) mining industry in the context of the state's ‘developmentalist’ agenda. We draw on the findings of a multi-disciplinary body of new research on the risks and challenges posed byWA's mining industry for environmental, social and economic sustainability. We synthesise the findings of this work against the backdrop of the broader debates on corporate social responsibility (CSR) and resource governance. In light of the data presented, this paper takes issue with the mining sector's SLO and its assessment of social and environmental impacts in WA for three inter-related reasons. A state government ideologically wedded to resource-led growth is seen to offer the resource sector a political licence to operate and to give insufficient attention to its potential social and environmental impacts. As a result, the resource sector can adopt a self-serving CSR agenda built on a limited win–win logic and operate with a ‘quasi social licence’ that is restricted to mere economic legitimacy. Overall, this paper problematises the political-cum-commercial construction and neoliberalisation of the SLO and raises questions about the impact of mining in WA.
Resumo:
Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
Resumo:
This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree structure guided scheme based enumeration approach is defined using BOCF for systematically enumerating the valid subtrees only. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. The extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.
Resumo:
Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.
Resumo:
Recent growth and expansion of the fly-in/fly-out (FIFO) model of mining in remote rural Australia has led to concerns about the health and well-being of those employed by the mines and those in the small rural communities where they are based. A particular concern has been the potential disruption to sexual norms in mining towns and increases in sexually transmitted infections (STIs) and HIV.
Resumo:
Can the mining boom be blamed for the rising rates of sexually transmitted infections (STIs) in some states? The Australian Medical Association thinks so, with its Queensland president Dr Richard Kidd attributing rising rates of gonorrhoea, syphilis and chlamydia in Queensland and Western Australia to bored and cashed-up miners.
Resumo:
This project is a step forward in the study of text mining where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several start-of-art benchmarking methods on real-life data-sets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles clustering documents with multiple feature spaces.
Resumo:
Objective: To explore fly-in fly-out (FIFO) mining workers' attitudes towards the leisure time they spend in mining camps, the recreational and social aspects of mining camp culture, the camps' communal and recreational infrastructure and activities, and implications for health. Design: In-depth semistructured interviews. Setting: Individual interviews at locations convenient for each participant. Participants: A total of seven participants, one female and six males. The age group varied within 20–59 years. Marital status varied across participants. Main outcome measures: A qualitative approach was used to interview participants, with responses thematically analysed. Findings highlight how the recreational infrastructure and activities at mining camps impact participants' enjoyment of the camps and their feelings of community and social inclusion. Results: Three main areas of need were identified in the interviews, as follows: (i) on-site facilities and activities; (ii) the role of infrastructure in facilitating a sense of community; and (iii) barriers to social interaction. Conclusion: Recreational infrastructure and activities enhance the experience of FIFO workers at mining camps. The availability of quality recreational facilities helps promote social interaction, provides for greater social inclusion and improves the experience of mining camps for their temporary FIFO residents. The infrastructure also needs to allow for privacy and individual recreational activities, which participants identified as important emotional needs. Developing appropriate recreational infrastructure at mining camps would enhance social interactions among FIFO workers, improve their well-being and foster a sense of community. Introducing infrastructure to promote social and recreational activities could also reduce alcohol-related social exclusion.
Resumo:
Due to the availability of huge number of web services, finding an appropriate Web service according to the requirements of a service consumer is still a challenge. Moreover, sometimes a single web service is unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pair shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer requirement. The recommendation of individual and composite Web services composition for a service request is finally made. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison to traditional keyword-based discovery methods.
Resumo:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of large scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, there has been often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences; yet, how to effectively use large scale patterns remains a hard problem in text mining. To make a breakthrough in this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern based methods.
Resumo:
Human resources are often responsible for the execution of business processes. In order to evaluate resource performance and identify best practices as well as opportunities for improvement, managers need objective information about resource behaviours. Companies often use information systems to support their processes and these systems record information about process execution in event logs. We present a framework for analysing and evaluating resource behaviour through mining such event logs. The framework provides a method for extracting descriptive information about resource skills, utilisation, preferences, productivity and collaboration patterns; a method for analysing relationships between different resource behaviours and outcomes; and a method for evaluating the overall resource productivity, tracking its changes over time and comparing it with the productivity of other resources. To demonstrate the applicability of our framework we apply it to analyse behaviours of employees in an Australian company and evaluate its usefulness by a survey among managers in industry.
Resumo:
Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research in the last half a century, protein adsorption cannot be predicted with an engineering level, design-orientated accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives the published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity and range of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable-the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided in two separate sub-sets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the prediction of the thickness and the surface tension of the protein-covered layers are of particular relevance to the design of microfluidics devices.