963 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevance: 100.00%

Abstract:

Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted with logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.
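As a toy sketch of the trace-clustering idea the paper builds on (not the authors' technique; the activity labels are invented), traces sharing the same activity set can be grouped into per-variant sub-logs, each of which would then be handed to a discovery algorithm separately:

```python
from collections import defaultdict

def cluster_by_activity_set(log):
    """Group traces that use the same set of activities.

    `log` is a list of traces; each trace is a sequence of activity labels.
    Returns a dict mapping frozenset-of-activities -> list of traces.
    This is a deliberately simple stand-in for the trace-clustering step.
    """
    clusters = defaultdict(list)
    for trace in log:
        clusters[frozenset(trace)].append(trace)
    return dict(clusters)

log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
clusters = cluster_by_activity_set(log)  # two variants: approve vs reject
```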

Relevance: 100.00%

Abstract:

Multi-document summarization, which addresses the problem of information overload, has been widely used in various real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum can generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the Document Understanding Conference (DUC) 2007 data. The results demonstrate the effectiveness and efficiency of our proposed approach.

Relevance: 100.00%

Abstract:

Purpose - There are many library automation packages available as open-source software, comprising two modules: a staff-client module and an online public access catalogue (OPAC). Although the OPACs of these library automation packages provide advanced features for searching and retrieving bibliographic records, none of them facilitates full-text searching. Most of the available open-source digital library software facilitates indexing and searching of full-text documents in different formats. This paper makes an effort to enable full-text search in the widely used open-source library automation package Koha, by integrating it with two open-source digital library software packages, Greenstone Digital Library Software (GSDL) and Fedora Generic Search Service (FGSS), independently. Design/methodology/approach - The implementation makes use of the Search and Retrieval by URL (SRU) feature available in Koha, GSDL and FGSS. The full-text documents are indexed both in Koha and in GSDL or FGSS. Findings - Full-text searching capability in Koha is achieved by integrating either GSDL or FGSS into Koha and by passing an SRU request to GSDL or FGSS from Koha. The full-text documents are indexed both in the library automation package (Koha) and in the digital library software (GSDL, FGSS). Originality/value - This is the first implementation enabling full-text search in library automation software by integrating it with digital library software.
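The SRU request that ties the systems together can be sketched as follows. The parameter names (`version`, `operation`, `query`, `maximumRecords`) come from the SRU specification; the endpoint URL and CQL query below are placeholders, not the actual Koha/GSDL/FGSS configuration:

```python
from urllib.parse import urlencode

def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request URL.

    The parameter names follow SRU 1.1; the base URL is a placeholder.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)

url = build_sru_url("http://example.org/sru", 'dc.title="digital library"')
```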

Relevance: 100.00%

Abstract:

Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing the statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated.
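For the serial-episode special case, the non-overlapped frequency on which the thresholds are defined can be counted with a greedy left-to-right scan (a minimal sketch over an invented event sequence; the paper's method covers general partial orders):

```python
def non_overlapped_frequency(sequence, episode):
    """Count non-overlapped occurrences of a serial episode.

    `sequence` is a list of event types; `episode` is the ordered list of
    event types of a serial episode. The scan tracks how far into the
    episode we have matched; each complete match is counted and the scan
    restarts, so counted occurrences never share events.
    """
    count, pos = 0, 0
    for event in sequence:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count

freq = non_overlapped_frequency(list("ABXABYAB"), list("AB"))  # 3 occurrences
```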

Relevance: 100.00%

Abstract:

This paper presents a method to generate new melodies, based on conserving the semiotic structure of a template piece. A pattern discovery algorithm is applied to a template piece to extract significant segments: those that are repeated and those that are transposed in the piece. Two strategies are combined to describe the semiotic coherence structure of the template piece: inter-segment coherence and intra-segment coherence. Once the structure is described it is used as a template for new musical content that is generated using a statistical model created from a corpus of bertso melodies and iteratively improved using a stochastic optimization method. Results show that the method presented here effectively describes a coherence structure of a piece by discovering repetition and transposition relations between segments, and also by representing the relations among notes within the segments. For bertso generation, the method correctly conserves all intra- and inter-segment coherence of the template, and the optimization method produces coherent generated melodies.
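The repetition/transposition discovery at the heart of the structure description can be illustrated with a toy interval comparison (a sketch with invented pitch numbers, not the paper's pattern discovery algorithm):

```python
def intervals(seg):
    """Successive pitch intervals of a segment."""
    return [b - a for a, b in zip(seg, seg[1:])]

def is_transposition(seg_a, seg_b):
    """Two segments are related by transposition (or exact repetition)
    when their interval patterns are identical."""
    return len(seg_a) == len(seg_b) and intervals(seg_a) == intervals(seg_b)

def find_occurrences(pitches, segment):
    """Start indices where `segment` occurs exactly or transposed."""
    n = len(segment)
    return [i for i in range(len(pitches) - n + 1)
            if is_transposition(pitches[i:i + n], segment)]

# MIDI-style pitch numbers: the motif 60-62-64 recurs transposed up a fifth.
hits = find_occurrences([60, 62, 64, 67, 69, 71], [60, 62, 64])
```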

Relevance: 100.00%

Abstract:

Automated assembly of mechanical devices is studied by researching methods of operating assembly equipment in a variable manner; that is, systems which may be configured to perform many different assembly operations are studied. The general parts assembly operation involves the removal of alignment errors within some tolerance and without damaging the parts. Two methods for eliminating alignment errors are discussed: a priori suppression and measurement and removal. Both methods are studied, with the more novel measurement-and-removal technique being studied in greater detail. During the study of this technique, a fast and accurate six degree-of-freedom position sensor based on a light-stripe vision technique was developed. Specifications for the sensor were derived from an assembly-system error analysis. Studies on extracting accurate information from the sensor by optimally reducing redundant information, filtering quantization noise, and careful calibration procedures were performed. Prototype assembly systems for both error elimination techniques were implemented and used to assemble several products. The assembly system based on the a priori suppression technique uses a number of mechanical assembly tools and software systems which extend the capabilities of industrial robots. The need for the tools was determined through an assembly task analysis of several consumer and automotive products. The assembly system based on the measurement-and-removal technique used the six degree-of-freedom position sensor to measure part misalignments. Robot commands for aligning the parts were automatically calculated based on the sensor data and executed.

Relevance: 100.00%

Abstract:

Gait disturbances are a common feature of Parkinson’s disease, one of the most severe being freezing of gait. Sensory cueing is a common method used to facilitate stepping in people with Parkinson’s. Recent work has shown that, compared to walking to a metronome, Parkinson’s patients without freezing of gait (nFOG) showed reduced gait variability when imitating recorded sounds of footsteps made on gravel. However, it is not known whether these benefits are realised through the continuity of the acoustic information or the action-relevance. Furthermore, no study has examined whether these benefits extend to Parkinson’s patients with freezing of gait. We prepared four different auditory cues (varying in action-relevance and acoustic continuity) and asked 19 Parkinson’s patients (10 nFOG, 9 with freezing of gait (FOG)) to step in place to each cue. Results showed a superiority of action-relevant cues (regardless of cue-continuity) for inducing reductions in step coefficient of variation (CV). Acoustic continuity was associated with a significant reduction in swing CV. Neither cue-continuity nor action-relevance was independently sufficient to increase the time spent stepping before freezing. However, combining both attributes in the same cue did yield significant improvements. This study demonstrates the potential of using action-sounds as sensory cues for Parkinson’s patients with freezing of gait. We suggest that the improvements shown might be considered audio-motor ‘priming’ (i.e., listening to the sounds of footsteps will engage sensorimotor circuitry relevant to the production of that same action, thus effectively bypassing the defective basal ganglia).
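The step coefficient of variation used as the outcome measure is simply the relative dispersion of the step intervals; a sketch with invented step times (not the study's data):

```python
from statistics import mean, stdev

def coefficient_of_variation(step_times):
    """Step CV as used in gait studies: standard deviation of the step
    intervals divided by their mean, as a percentage. Lower values
    indicate more regular stepping."""
    return 100.0 * stdev(step_times) / mean(step_times)

# Hypothetical step intervals (seconds) under two cueing conditions.
cv_metronome = coefficient_of_variation([0.50, 0.56, 0.48, 0.58])
cv_footsteps = coefficient_of_variation([0.52, 0.53, 0.52, 0.54])
```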

Relevance: 100.00%

Abstract:

In the recent past, hardly anyone could have predicted this course of GIS development. GIS is moving from the desktop to the cloud. Web 2.0 enabled people to input data into the web, and these data are becoming increasingly geolocated. The resulting large volumes of data form what is called "Big Data", which scientists still do not fully know how to handle. Different data mining tools are used to try to extract useful information from this Big Data. In our study, we deal with one part of these data: User Generated Geographic Content (UGGC). The Panoramio initiative allows people to upload photos and describe them with tags. These photos are geolocated, meaning they have an exact location on the Earth's surface in a given spatial reference system. Using data mining tools, we try to answer whether it is possible to extract land use information from Panoramio photo tags, and to what extent this information can be accurate. Finally, we compared different data mining methods to determine which performs best on this kind of data, which is text. Our answers are quite encouraging: with more than 70% accuracy, we showed that extracting land use information is possible to some extent. We also found the Memory Based Reasoning (MBR) method to be the most suitable for this kind of data in all cases.
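Memory Based Reasoning is essentially nearest-neighbour classification over stored examples; a minimal sketch with invented tags and land-use labels (the study's actual features and classes differ):

```python
def jaccard(a, b):
    """Similarity between two tag sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def mbr_classify(train, tags, k=3):
    """Memory-based reasoning in its simplest form: a k-nearest-neighbour
    majority vote over stored (tag set, land-use label) examples."""
    ranked = sorted(train, key=lambda ex: jaccard(ex[0], tags), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

train = [
    ({"beach", "sea", "sand"}, "coast"),
    ({"sea", "harbour", "boat"}, "coast"),
    ({"street", "building", "car"}, "urban"),
    ({"skyscraper", "street"}, "urban"),
]
label = mbr_classify(train, {"sea", "beach", "boat"}, k=3)
```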

Relevance: 100.00%

Abstract:

Information systems for business are frequently heavily reliant on software. Two important feedback-related effects of embedding software in a business process are identified. First, the system dynamics of the software maintenance process can become complex, particularly in the number and scope of the feedback loops. Secondly, responsiveness to feedback can have a big effect on the evolvability of the information system. Ways have been explored to provide an effective mechanism for improving the quality of feedback between stakeholders during software maintenance. Understanding can be improved by using representations of information systems that are both service-based and architectural in scope. The conflicting forces that encourage change or stability can be resolved using patterns and pattern languages. A morphology of information systems pattern languages has been described to facilitate the identification and reuse of patterns and pattern languages. The kind of planning process needed to achieve consensus on a system's evolution is also considered.

Relevance: 100.00%

Abstract:

We show how multivariate GARCH models can be used to generate a time-varying “information share” (Hasbrouck, 1995) to represent the changing patterns of price discovery in closely related securities. We find that time-varying information shares can improve credit spread predictions.

Relevance: 100.00%

Abstract:

Case studies of the organizational implementation of traditional business computing have often emphasized the importance of context in research design and data analysis. The emergence of computing phenomena that pervade different contexts within and even beyond the organizational boundary suggests the need to disaggregate the notion of context to allow for finer levels of contextual analysis. Indeed we demonstrate that a failure to consider interdependent levels of context in organizational case studies of computing technologies that even approach ubiquity runs the risk of partial and even incorrect conclusions being drawn. We illustrate this argument by means of two explanatory case studies of intranet and mobile technology implementation in organizations. Based on the extant literature on context in case study design and examples drawn from the cases, we propose a range of interconnected and interrelated contexts to consider in the research design of explanatory cases of ubiquitous technology implementation in organizations.

Relevance: 100.00%

Abstract:

Discovering frequent patterns plays an essential role in many data mining applications. The aim of frequent pattern mining is to obtain information about the most common patterns that appear together. However, designing an efficient model to mine these patterns is still challenging given the size of current databases. Therefore, we propose an Efficient Frequent Pattern Mining Model (EFP-M2) to mine frequent patterns in a timely manner. The results show that the algorithm in EFP-M2 outperforms the benchmarked FP-Growth by at least two orders of magnitude.
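For contrast with efficient models such as FP-Growth or the proposed EFP-M2, the brute-force formulation of frequent pattern mining looks like this (a sketch with invented transactions, not the EFP-M2 algorithm):

```python
from itertools import combinations

def frequent_patterns(transactions, min_support):
    """Naive frequent-itemset mining: count every candidate itemset and
    keep those appearing in at least `min_support` transactions.
    Exponential in the number of distinct items, which is exactly why
    tree-based models such as FP-Growth exist."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                frequent[candidate] = support
    return frequent

patterns = frequent_patterns(
    [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk", "eggs"}],
    min_support=2,
)
```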

Relevance: 100.00%

Abstract:

Indirect patterns are valuable, hidden information in a transactional database. An indirect pattern captures a high dependency between two items that rarely occur together but appear indirectly via other items. Indirect pattern mining is important because it can reveal new knowledge in certain application domains. Therefore, we propose an Indirect Pattern Mining Algorithm (IPMA) to mine indirect patterns from a data repository. IPMA embeds a measure called Critical Relative Support (CRS) rather than the common interestingness measures. The results show that IPMA successfully generates indirect patterns across various threshold values.
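The notion of an indirect pattern can be sketched with plain co-occurrence counts (item names and thresholds are invented; the paper's CRS measure replaces these simple counts):

```python
from itertools import combinations

def cooccurrence(transactions, x, y):
    """Number of transactions containing both items."""
    return sum(1 for t in transactions if x in t and y in t)

def indirect_patterns(transactions, pair_max, mediator_min):
    """Item pairs (a, b) that rarely co-occur directly (<= pair_max) but
    each co-occur frequently (>= mediator_min) with a shared mediator m."""
    items = sorted({i for t in transactions for i in t})
    found = []
    for a, b in combinations(items, 2):
        if cooccurrence(transactions, a, b) > pair_max:
            continue
        for m in items:
            if m in (a, b):
                continue
            if (cooccurrence(transactions, a, m) >= mediator_min
                    and cooccurrence(transactions, b, m) >= mediator_min):
                found.append((a, b, m))
    return found

# Tea and coffee rarely co-occur, but both co-occur with sugar.
patterns = indirect_patterns(
    [{"tea", "sugar"}, {"coffee", "sugar"}, {"tea", "sugar"},
     {"coffee", "sugar"}, {"tea", "coffee"}],
    pair_max=1, mediator_min=2,
)
```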

Relevance: 100.00%

Abstract:

We present a large-scale mood analysis of social media texts. The paper is organised in three parts: (1) we address the problem of feature selection and classification of mood in the blogosphere; (2) we extract global mood patterns at different levels of aggregation from a large-scale data set of approximately 18 million documents; and (3) we extract the mood trajectory of an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting the discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used; we show that these features are efficient, do not require a training phase, and yield classification results comparable to state-of-the-art supervised feature selection schemes. On mood patterns, we provide empirical results for mood organisation in the blogosphere, analogous to the structure of human emotion proposed independently in the psychology literature. On community structure discovery, we show that a sentiment-based approach can yield useful insights into community formation.

Relevance: 100.00%

Abstract:

Text clustering can be considered as a four-step process consisting of feature extraction, text representation, document clustering and cluster interpretation. Most text clustering models consider text as an unordered collection of words. However, the semantics of text would be better captured if word sequences were taken into account.

In this paper we propose a sequence based text clustering model where four novel sequence based components are introduced in each of the four steps in the text clustering process.

Experiments conducted on the Reuters dataset and Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence based model, in terms of capturing context with semantics, accuracy and speed, compared to clustering of documents based on single words and n-gram based models.
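The contrast between an unordered bag of words and a sequence-based representation can be sketched with word bigrams (a minimal illustration, not the proposed model's components):

```python
def ngram_vector(text, n=2):
    """Represent a text by its word n-gram counts, so that word order
    contributes to similarity, in contrast to an unordered bag of words."""
    words = text.lower().split()
    vec = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        vec[gram] = vec.get(gram, 0) + 1
    return vec

def dice_similarity(a, b):
    """Dice coefficient over the two n-gram sets."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0

# Same words, different order: a bag-of-words model would call these identical,
# while the bigram representation keeps them distinct.
s1 = dice_similarity(ngram_vector("the dog bit the man"),
                     ngram_vector("the man bit the dog"))
```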