849 results for Incremental mining


Relevance:

20.00%

Abstract:

Identifying product families is considered an effective way to accommodate increasing product variety across diverse market niches. In this paper, we propose a novel framework for identifying product families by using a similarity measure for a common form of product design data, the BOM (Bill of Materials), based on data mining techniques such as frequent mining and clustering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that captures not only the content and topology but also the frequent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and are used as clustering input to group the product families. When applied to real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.

Relevance:

20.00%

Abstract:

Recent growth and expansion of the fly-in/fly-out (FIFO) model of mining in remote rural Australia has led to concerns about the health and well-being of those employed by the mines and those in the small rural communities where they are based. A particular concern has been the potential disruption to sexual norms in mining towns and increases in sexually transmitted infections (STIs) and HIV.

Relevance:

20.00%

Abstract:

Can the mining boom be blamed for the rising rates of sexually transmitted infections (STIs) in some states? The Australian Medical Association thinks so, with its Queensland president Dr Richard Kidd attributing rising rates of gonorrhoea, syphilis and chlamydia in Queensland and Western Australia to bored and cashed-up miners.

Relevance:

20.00%

Abstract:

This project is a step forward in the study of text mining, where text representation enhanced with semantic information plays a significant role. It develops effective methods for entity-oriented retrieval, semantic relation identification and text clustering that utilise semantically annotated data. These methods are based on an enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmarking methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles the clustering of documents with multiple feature spaces.

Relevance:

20.00%

Abstract:

Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation in performance when a small variation in the training data or in the parameters occurs, is a major issue for machine learning models, and even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) Gaussian prior variance as one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features, as a group of basic features, play an important role in learning effective models across all iterations.

Relevance:

20.00%

Abstract:

Objective: To explore fly-in fly-out (FIFO) mining workers' attitudes towards the leisure time they spend in mining camps, the recreational and social aspects of mining camp culture, the camps' communal and recreational infrastructure and activities, and implications for health. Design: In-depth semistructured interviews. Setting: Individual interviews at locations convenient for each participant. Participants: A total of seven participants, one female and six males. Ages ranged from 20 to 59 years. Marital status varied across participants. Main outcome measures: A qualitative approach was used to interview participants, with responses thematically analysed. Findings highlight how the recreational infrastructure and activities at mining camps impact participants' enjoyment of the camps and their feelings of community and social inclusion. Results: Three main areas of need were identified in the interviews, as follows: (i) on-site facilities and activities; (ii) the role of infrastructure in facilitating a sense of community; and (iii) barriers to social interaction. Conclusion: Recreational infrastructure and activities enhance the experience of FIFO workers at mining camps. The availability of quality recreational facilities helps promote social interaction, provides for greater social inclusion and improves the experience of mining camps for their temporary FIFO residents. The infrastructure also needs to allow for privacy and individual recreational activities, which participants identified as important emotional needs. Developing appropriate recreational infrastructure at mining camps would enhance social interactions among FIFO workers, improve their well-being and foster a sense of community. Introducing infrastructure to promote social and recreational activities could also reduce alcohol-related social exclusion.

Relevance:

20.00%

Abstract:

Because a huge number of Web services are available, finding an appropriate Web service that meets the requirements of a service consumer is still a challenge. Moreover, a single Web service is sometimes unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related Web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pair shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer requirement. Finally, individual and composite Web services are recommended for a service request. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison to traditional keyword-based discovery methods.
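The abstract does not say which all-pair shortest-path algorithm is used, so the following is only a minimal sketch that assumes Floyd-Warshall over a small service graph. The service names and the integer edge costs (standing in for semantic distances, lower meaning more closely related) are hypothetical and are not taken from the paper.

```python
import math

def floyd_warshall(nodes, edges):
    """All-pairs shortest paths; edges maps (u, v) -> non-negative cost."""
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        if cost < dist[u][v]:
            dist[u][v] = cost
            nxt[u][v] = v
    # Allow each node k in turn to act as an intermediate hop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def composition(nxt, source, target):
    """Rebuild the chain of services from source to target ([] if unreachable)."""
    if source == target:
        return [source]
    if nxt[source][target] is None:
        return []
    chain = [source]
    while source != target:
        source = nxt[source][target]
        chain.append(source)
    return chain

# Hypothetical services and costs, for illustration only.
services = ["request", "geocode", "weather", "notify"]
costs = {
    ("request", "geocode"): 2,
    ("geocode", "weather"): 3,
    ("request", "weather"): 9,
    ("weather", "notify"): 1,
}
dist, nxt = floyd_warshall(services, costs)
best_chain = composition(nxt, "request", "notify")
best_cost = dist["request"]["notify"]
```

Here the cheapest composition from the request to `notify` goes through `geocode` and `weather`, because the direct `request` to `weather` link is more expensive than the two-hop route.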

Relevance:

20.00%

Abstract:

It is a big challenge to guarantee the quality of relevance features discovered in text documents for describing user preferences, because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, it has often been hypothesised that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.

Relevance:

20.00%

Abstract:

Human resources are often responsible for the execution of business processes. In order to evaluate resource performance and identify best practices as well as opportunities for improvement, managers need objective information about resource behaviours. Companies often use information systems to support their processes and these systems record information about process execution in event logs. We present a framework for analysing and evaluating resource behaviour through mining such event logs. The framework provides a method for extracting descriptive information about resource skills, utilisation, preferences, productivity and collaboration patterns; a method for analysing relationships between different resource behaviours and outcomes; and a method for evaluating the overall resource productivity, tracking its changes over time and comparing it with the productivity of other resources. To demonstrate the applicability of our framework we apply it to analyse behaviours of employees in an Australian company and evaluate its usefulness by a survey among managers in industry.
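As a rough illustration of the kind of descriptive information that can be extracted from an event log, the sketch below computes busy time, completed tasks and handovers per resource. The log layout `(case, activity, resource, start, end)`, the resource names and the measures themselves are assumptions made for this example; the abstract does not specify the framework's actual data model or metrics.

```python
from collections import Counter, defaultdict

def resource_summary(log):
    """Summarise a log of (case, activity, resource, start, end) tuples."""
    busy = defaultdict(float)   # total working time per resource
    completed = Counter()       # number of completed tasks per resource
    handovers = Counter()       # (from_resource, to_resource) -> count
    last_resource = {}          # last resource seen in each case
    # Process events case by case, ordered by start time.
    for case, activity, resource, start, end in sorted(log, key=lambda e: (e[0], e[3])):
        busy[resource] += end - start
        completed[resource] += 1
        previous = last_resource.get(case)
        if previous is not None and previous != resource:
            handovers[(previous, resource)] += 1
        last_resource[case] = resource
    return dict(busy), dict(completed), dict(handovers)

# Hypothetical event log; times are in arbitrary units.
event_log = [
    ("c1", "register", "ann", 0, 2),
    ("c1", "check", "bob", 2, 5),
    ("c1", "approve", "ann", 5, 6),
    ("c2", "register", "ann", 1, 3),
    ("c2", "check", "bob", 3, 4),
]
busy, completed, handovers = resource_summary(event_log)
```

Dividing each busy time by the length of the observation window would give a simple utilisation figure, and the handover counts are one possible proxy for collaboration patterns.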

Relevance:

20.00%

Abstract:

Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research over the last half century, protein adsorption cannot be predicted with an engineering-level, design-orientated accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives the published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity and range of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable, the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided into two separate sub-sets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the predictions of the thickness and the surface tension of the protein-covered layers are of particular relevance to the design of microfluidics devices.
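To illustrate what a piecewise linear regression with a breakpoint involves, here is a minimal single-variable sketch: it tries every split of the sorted data, fits an ordinary least-squares line to each side, and keeps the split with the smallest total squared error. The data are synthetic, the function names are hypothetical, and the abstract's actual analysis uses several input variables, so this should not be read as the authors' procedure.

```python
def ols(xs, ys):
    """Simple least-squares line; returns (intercept, slope, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # assumes x values are not all equal
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse

def fit_breakpoint(xs, ys, min_points=2):
    """Search all splits and keep the two-line fit with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:i], pairs[i:]
        l_int, l_slope, l_sse = ols(*zip(*left))
        r_int, r_slope, r_sse = ols(*zip(*right))
        total = l_sse + r_sse
        if best is None or total < best["sse"]:
            best = {
                "sse": total,
                # Report the breakpoint midway between the two segments.
                "breakpoint": (left[-1][0] + right[0][0]) / 2,
                "left": (l_int, l_slope),
                "right": (r_int, r_slope),
            }
    return best

# Synthetic data: slope 2 up to x = 4, then slope 0.5.
xs = list(range(10))
ys = [2 * x if x <= 4 else 8 + 0.5 * (x - 4) for x in xs]
fit = fit_breakpoint(xs, ys)
```

Because the synthetic curve is continuous at x = 4, the point at x = 4 lies on both lines, so the search may place the breakpoint just before or just after it; the two recovered slopes are the same either way.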