962 resultados para Process mining


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background The requirement for dual screening of titles and abstracts to select papers to examine in full text can create a huge workload, not least when the topic is complex and a broad search strategy is required, resulting in a large number of results. An automated system to reduce this burden, while still assuring high accuracy, has the potential to provide huge efficiency savings within the review process. Objectives To undertake a direct comparison of manual screening with a semi‐automated process (priority screening) using a machine classifier. The research is being carried out as part of the current update of a population‐level public health review. Methods Authors have hand selected studies for the review update, in duplicate, using the standard Cochrane Handbook methodology. A retrospective analysis, simulating a quasi‐‘active learning’ process (whereby a classifier is repeatedly trained based on ‘manually’ labelled data) will be completed, using different starting parameters. Tests will be carried out to see how far different training sets, and the size of the training set, affect the classification performance; i.e. what percentage of papers would need to be manually screened to locate 100% of those papers included as a result of the traditional manual method. Results From a search retrieval set of 9555 papers, authors excluded 9494 papers at title/abstract and 52 at full text, leaving 9 papers for inclusion in the review update. The ability of the machine classifier to reduce the percentage of papers that need to be manually screened to identify all the included studies, under different training conditions, will be reported. Conclusions The findings of this study will be presented along with an estimate of any efficiency gains for the author team if the screening process can be semi‐automated using text mining methodology, along with a discussion of the implications for text mining in screening papers within complex health reviews.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Companies standardise and automate their business processes in order to improve process eff ciency and minimise operational risks. However, it is di fficult to eliminate all process risks during the process design stage due to the fact that processes often run in complex and changeable environments and rely on human resources. Timely identification of process risks is crucial in order to insure the achievement of process goals. Business processes are often supported by information systems that record information about their executions in event logs. In this article we present an approach and a supporting tool for the evaluation of the overall process risk and for the prediction of process outcomes based on the analysis of information recorded in event logs. It can help managers evaluate the overall risk exposure of their business processes, track the evolution of overall process risk, identify changes and predict process outcomes based on the current value of overall process risk. The approach was implemented and validated using synthetic event logs and through a case study with a real event log.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we illustrate a set of features of the Apromore process model repository for analyzing business process variants. Two types of analysis are provided: one is static and based on differences on the process control flow, the other is dynamic and based on differences in the process behavior between the variants. These features combine techniques for the management of large process model collections with those for mining process knowledge from process execution logs. The tool demonstration will be useful for researchers and practitioners working on large process model collections and process execution logs, and specifically for those with an interest in understanding, managing and consolidating business process variants both within and across organizational boundaries.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving the access to its data. It also contributed to creation of several new tools for microarray data manipulation and establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. The data comparability and minimisation of the systematic measurement errors that are characteristic to each lab- oratory in this large cross-laboratories integrated dataset, was ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose built sample ontology. A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporate indirect comparison of statistical methods for finding differentially expressed genes and point to the need to study gene expression on a global level.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Springsure Creek Coal (SCC) intends to develop a coal mine using the long wall mining process under grain farming land near Emerald in Central Queensland (CQ). While this technology will result in some subsidence of the land surface, SCC wishes to maintain productivity of the grain cropping land in the precinct after coal mining. However, the impact of the surface subsidence resulting from that mining process on productivity of cropping land in any Australian landscape is currently unclear. A research protocol to investigate the impacts of subsidence on grain productivity for when the SCC project becomes operational is proposed. The protocol has wider application for other similar mining projects throughout the country. A copy of the full report is accessible on www.aginstitute.com.au.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data mining involves nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where parameters of population size, the number of points of crossover and mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification datamining problems. Michigan style of classifier was used to build the classifier and the system was tested with machine learning databases of Pima Indian Diabetes database, Wisconsin Breast Cancer database and few others. The performance of our algorithm is better than others.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A mathematical model has been developed for the gas carburising (diffusion) process using finite volume method. The computer simulation has been carried out for an industrial gas carburising process. The model's predictions are in good agreement with industrial experimental data and with data collected from the literature. A study of various mass transfer and diffusion coefficients has been carried out in order to suggest which correlations should be used for the gas carburising process. The model has been interfaced in a Windows environment using a graphical user interface. In this way, the model is extremely user friendly. The sensitivity analysis of various parameters such as initial carbon concentration in the specimen, carbon potential of the atmosphere, temperature of the process, etc. has been carried out using the model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The utility of a soil microbe, namely Bacillus polymyxa, in the removal of organic reagents such as dodecylamine, ether diamine, isopropyl xanthate and sodium oleate from aqueous solutions is demonstrated. Time-bound removal of the above organic reagents from an alkaline solution was investigated under different experimental conditions during bacterial growth and in the presence of metabolites by frequent monitoring of residual concentrations as a function of time, reagent concentration and cell density. The stages and mechanisms in the biodegradation process were monitored through UV-visible and FTIR spectroscopy. Surface chemistry of the bacterial cells as well as the biosorption tendency for various organics were also established through electrokinetic and adsorption density measurements. Both the cationic amines were found to be biosorbed followed by their degradation through bacterial metabolism. The presence of the organic reagents promoted bacterial growth through effective bacterial utilization of nitrogen and carbon from the organics. Under optimal conditions, complete degradation and bioremoval of all the organics could be achieved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We study the problem of analyzing influence of various factors affecting individual messages posted in social media. The problem is challenging because of various types of influences propagating through the social media network that act simultaneously on any user. Additionally, the topic composition of the influencing factors and the susceptibility of users to these influences evolve over time. This problem has not been studied before, and off-the-shelf models are unsuitable for this purpose. To capture the complex interplay of these various factors, we propose a new non-parametric model called the Dynamic Multi-Relational Chinese Restaurant Process. This accounts for the user network for data generation and also allows the parameters to evolve over time. Designing inference algorithms for this model suited for large scale social-media data is another challenge. To this end, we propose a scalable and multi-threaded inference algorithm based on online Gibbs Sampling. Extensive evaluations on large-scale Twitter and Face book data show that the extracted topics when applied to authorship and commenting prediction outperform state-of-the-art baselines. More importantly, our model produces valuable insights on topic trends and user personality trends beyond the capability of existing approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most research on technology roadmapping has focused on its practical applications and the development of methods to enhance its operational process. Thus, despite a demand for well-supported, systematic information, little attention has been paid to how/which information can be utilised in technology roadmapping. Therefore, this paper aims at proposing a methodology to structure technological information in order to facilitate the process. To this end, eight methods are suggested to provide useful information for technology roadmapping: summary, information extraction, clustering, mapping, navigation, linking, indicators and comparison. This research identifies the characteristics of significant data that can potentially be used in roadmapping, and presents an approach to extracting important information from such raw data through various data mining techniques including text mining, multi-dimensional scaling and K-means clustering. In addition, this paper explains how this approach can be applied in each step of roadmapping. The proposed approach is applied to develop a roadmap of radio-frequency identification (RFID) technology to illustrate the process practically. © 2013 © 2013 Taylor & Francis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Grattan, J.P., Al-Saad, Z., Gilbertson, D.D., Karaki, L.O., Pyatt, F.B 2005 Analyses of patterns of copper and lead mineralisation in human skeletons excavated from an ancient mining and smelting centre in the Jordanian desert Mineralogical Magazine. 69(5) 653-666.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time-interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Two efficient methods to find frequent arrangements of temporal intervals are described; the first one is tree-based and uses depth first search to mine the set of frequent arrangements, whereas the second one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints consisting of regular expressions and gap constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A modeling strategy is presented to solve the governing equations of fluid flow, temperature (with solidification), and stress in an integrated manner. These equations are discretized using finite volume methods on unstructured grids, which provide the capability to represent complex domains. Both the cell-centered and vertex-based forms of the finite volume discretization procedure are explained, and the overall integrated solution procedure using these techniques with suitable solvers is detailed. Two industrial processes, based on the casting of metals, are used to demonstrate the capabilities of the resultant modeling framework. This manufacturing process requires a high degree of coupling between the governing physical equations to accurately predict potential defects. Comparisons between model predictions and experimental observations are given.