The importance of actively managing and analyzing business processes is acknowledged in organizations more than ever. Business processes form an essential part of an organization, and their application areas are manifold. Most organizations keep records of the activities they carry out for auditing purposes, but these records are rarely used for analysis. This paper describes the design and implementation of a process analysis tool that uses a process definition and its execution logs to replay, analyze and visualize a variety of performance metrics. Performance analysis of existing and planned process models helps organizations detect bottlenecks in their processes and make more effective process improvement decisions. Our technique is applied to processes modeled in the YAWL language. Execution logs of process instances are compared against the corresponding YAWL process model and replayed in a robust manner, taking into account any noise in the logs. Finally, the performance characteristics obtained from replaying the log in the model are projected onto the model.


In the field of process mining, the use of event logs for root cause analysis is increasingly studied. In such an analysis, the availability of attributes (features) that may explain the root cause of a phenomenon is crucial. Currently, these attributes are obtained from raw event logs more or less on a case-by-case basis: a generalized, systematic approach that captures this process is still lacking. This paper proposes a systematic approach to enriching and transforming event logs in order to obtain the attributes required for root cause analysis with classical data mining techniques, namely classification techniques. The approach is formalized, and its applicability has been validated using both self-generated and publicly available logs.
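
As a rough illustration of the enrichment step described above, the sketch below derives one row of case-level attributes per process instance from a raw event log, so that a standard classifier could consume them. The event tuple layout, the attribute names (`n_events`, `duration_h`, `n_resources`, `last_activity`) and the sample data are all assumptions made for illustration; the paper's own formalization is not reproduced here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event layout: (case_id, activity, resource, ISO timestamp).
events = [
    ("c1", "register", "ann", "2014-01-01T09:00:00"),
    ("c1", "check", "bob", "2014-01-01T11:00:00"),
    ("c1", "approve", "ann", "2014-01-01T12:00:00"),
    ("c2", "register", "ann", "2014-01-02T09:00:00"),
    ("c2", "reject", "cid", "2014-01-02T09:30:00"),
]

def case_features(events):
    """Turn raw events into one attribute row per case (illustrative attributes only)."""
    by_case = defaultdict(list)
    for case_id, activity, resource, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity, resource))
    rows = {}
    for case_id, evs in by_case.items():
        evs.sort(key=lambda e: e[0])  # order the case's events by timestamp
        first, last = evs[0][0], evs[-1][0]
        rows[case_id] = {
            "n_events": len(evs),
            "duration_h": (last - first).total_seconds() / 3600,
            "n_resources": len({r for _, _, r in evs}),
            "last_activity": evs[-1][1],
        }
    return rows

features = case_features(events)
```

Each row in `features` could then be paired with a class label (for example, whether the case was delayed) and passed to any off-the-shelf classification technique.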


Citizen Science projects are initiatives in which members of the general public participate in scientific research and perform or manage research-related tasks such as data collection and data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting citizens to volunteer their time and skills to help with data analysis, the reliability of the contributed data varies widely. Data reliability is a significant issue in Citizen Science because of the number and diversity of people and devices involved. Participants may submit low-quality, misleading, inaccurate, or even malicious data, so improving data reliability has become an urgent need. This study investigates techniques to enhance the reliability of data contributed by citizens to scientific research projects, especially acoustic sensing projects. In particular, we propose a reputation framework to enhance data reliability, and we investigate critical elements that should be considered when designing and developing new reputation systems.


This paper addresses the problem of identifying and explaining behavioral differences between two business process event logs. The paper presents a method that, given two event logs, returns a set of statements in natural language capturing behavior that is present or frequent in one log, while absent or infrequent in the other. This log delta analysis method allows users to diagnose differences between normal and deviant executions of a process or between two versions or variants of a process. The method relies on a novel approach to losslessly encode an event log as an event structure, combined with a frequency-enhanced technique for differencing pairs of event structures. A validation of the proposed method shows that it accurately diagnoses typical change patterns and can explain differences between normal and deviant cases in a real-life log, more compactly and precisely than previously proposed methods.
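
The method itself encodes each log as an event structure, which is beyond the scope of a short example. As a much simpler stand-in, the sketch below compares how often each directly-follows pair of activities occurs in two logs and phrases the larger gaps as natural-language statements. The log format (a list of traces, each a list of activity names), the `threshold` parameter and the sentence template are assumptions for illustration; unlike the paper's encoding, this simplification is not lossless.

```python
from collections import Counter

def df_frequencies(log):
    """Fraction of traces in which each directly-follows pair (a, b) occurs."""
    counts = Counter()
    for trace in log:
        counts.update(set(zip(trace, trace[1:])))  # count each pair once per trace
    return {pair: n / len(log) for pair, n in counts.items()}

def log_delta(log_a, log_b, threshold=0.5):
    """Statements for pairs whose per-trace frequency differs by at least threshold."""
    fa, fb = df_frequencies(log_a), df_frequencies(log_b)
    statements = []
    for pair in sorted(set(fa) | set(fb)):
        pa, pb = fa.get(pair, 0.0), fb.get(pair, 0.0)
        if abs(pa - pb) >= threshold:
            more, less = ("log A", "log B") if pa > pb else ("log B", "log A")
            statements.append(
                f"'{pair[0]}' is directly followed by '{pair[1]}' more often in {more} "
                f"({max(pa, pb):.0%}) than in {less} ({min(pa, pb):.0%})."
            )
    return statements

# Toy logs: in the deviant log, one trace skips activity "b".
normal = [["a", "b", "c"], ["a", "b", "c"]]
deviant = [["a", "c"], ["a", "b", "c"]]
statements = log_delta(normal, deviant)
```

On these toy logs the sketch reports, among other things, that "a" is directly followed by "c" only in the deviant log, which is the kind of difference a user would want surfaced.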


Facet-based sentiment analysis involves discovering the latent facets, sentiments and their associations. Traditional facet-based sentiment analysis algorithms typically perform the various tasks in sequence, and fail to take advantage of the mutual reinforcement of the tasks. Additionally, inferring sentiment levels typically requires domain knowledge or human intervention. In this paper, we propose a series of probabilistic models that jointly discover latent facets and sentiment topics, and also order the sentiment topics with respect to a multi-point scale, in a language- and domain-independent manner. This is achieved by simultaneously capturing both short-range syntactic structure and long-range semantic dependencies between the sentiment and facet words. The models further incorporate coherence in reviews, where reviewers dwell on one facet or sentiment level before moving on, for more accurate facet and sentiment discovery. For reviews that are supplemented with ratings, our models automatically order the latent sentiment topics, without requiring seed words or domain knowledge. To the best of our knowledge, our work is the first attempt to combine the notions of syntactic and semantic dependencies in the domain of review mining. Further, the concept of facet and sentiment coherence has not been explored earlier either. Extensive experimental results on real-world review data show that the proposed models outperform various state-of-the-art baselines for facet-based sentiment analysis.


This thesis does not focus on the dynamic relationship between Twitter and stock prices. Instead, it tries to understand whether relevant information extracted from tweets can improve investors’ stock-picking ability and generate alpha in portfolio choice relative to a benchmark. Despite the short period analyzed, the results are promising: applied to an equity portfolio, the sentiment analysis performed by Social Market Analytics Inc. generates positive abnormal returns that are statistically significant both in and out of sample.


Speaker(s): Jon Hare. Time: 25/06/2014, 11:00-11:50. Location: B32/3077. Abstract: The aggregation of items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information on the social web. This task is challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised. In this talk I'll describe some of our recent work on trend and event detection in multimedia data streams. We focus on scalable streaming algorithms that can be applied to multimedia data streams from the web and the social web. The talk will cover two particular aspects of our work: mining Twitter for trending images by detecting near duplicates; and detecting social events in multimedia data with streaming clustering algorithms. I'll describe our techniques in detail, and explore open questions and areas of potential future work in both of these tasks.


Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer’s complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. Because data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution for efficiently analysing bulk textual feedback data. Even when the same parts and/or sub-parts are replaced, the actual service cost of a repair is often distinctly different from that of similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to help provide additional information about the reasons for cost variation. Professional terms and conventions were included in the dictionary to avoid ambiguity and improve the results. Testing shows that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. The solution has been used effectively by a large aircraft MRO agency, with positive results.
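
To make the idea concrete, here is a minimal sketch of combining regular expressions with a glossary dictionary to normalize a free-text inspection note. The glossary entries, the part keywords, the output field names and the sample report are all invented for illustration; they are not taken from the paper or from any real MRO glossary.

```python
import re

# Hypothetical glossary: shop-floor abbreviation or variant -> normalized term.
GLOSSARY = {
    "corr": "corrosion",
    "corroded": "corrosion",
    "crk": "crack",
    "cracked": "crack",
    "repl": "replace",
    "replaced": "replace",
    "insp": "inspect",
}

# Matches a quantity directly before a part keyword, e.g. "2 blades" or "1 seal".
QTY_PART = re.compile(r"\b(\d+)\s+(blade|seal|bearing)s?\b", re.IGNORECASE)
WORD = re.compile(r"[a-z]+")

def normalize_report(text):
    """Extract normalized terms and (part, quantity) pairs from a free-text note."""
    lowered = text.lower()
    # Map every glossary hit to its normalized term; the set removes duplicates.
    terms = sorted({GLOSSARY[w] for w in WORD.findall(lowered) if w in GLOSSARY})
    parts = {part.lower(): int(qty) for qty, part in QTY_PART.findall(text)}
    return {"terms": terms, "parts": parts}

record = normalize_report("Insp found corr on 2 blades; repl 1 seal, housing crk.")
```

The resulting `record` is a flat, normalized structure of the kind that downstream analysis and costing tools can consume, in contrast to the raw sentence.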


Education is a complex systems-engineering endeavour: it underpins the training of high-quality talent, helps society make full use of educational outcomes, and promotes the healthy development of education itself. In education, student scores are a very important quantitative evaluation indicator; they objectively reflect the effects of the educational system and are an important basis for many evidence-based decisions. This paper uses a clustering algorithm and a decision tree to analyze student scores comprehensively, and obtains useful results. The results are shown to be valuable for teaching and management.


Engineering academic units might engage with social media for a range of purposes, including general communication with students, staff, alumni, other important stakeholders and the wider community; student recruitment; and marketing and promotion more generally. This paper presents an investigation into the use of Twitter by six engineering academic units internationally, using publicly available Twitter data over an 18-month period for analysis and visualization, to characterize the engagement by engineering academic units with one popular social media tool. Widely varying levels of activity were observed, from essentially undirected 'Megaphone' Tweeting, through to sustained and complex interactions with multiple external accounts. This work provides insights into how engineering academic units are using Twitter and how they might more effectively use the platform to achieve their individual objectives for institutional social media communications and marketing, and offers a methodology for future research. © 2014 Taylor & Francis.


The work presented here characterises the engagement of one university library with two social media platforms popular with academic libraries. The collected data are analysed to identify the forms of Twitter and Facebook activity that engage library stakeholders in social media conversations. Associations were observed between: i) directed tweets from the library and mentions of the library by others on Twitter; and ii) comments from the library and comments from others on Facebook. Three broad classes of Twitter user interacting with the library were revealed: i) accounts strongly linked to the library with multiple to/from tweets; ii) those weakly linked to the library with, typically, a single tweet; and iii) those indirectly linked to the library through tweets mentioning the library and sent by other users. Two divergent forms of Facebook interaction with the library were highlighted: i) a library post generating a large sequence of comments, typically in response to a competition/challenge; and ii) a library post with no comments, typically a photo post or a post inviting readers to click a link to find out more about an event/service. The work presented here is an initial investigation that provides useful insights, and offers a methodology for future research.


Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper studies the characteristics of online autism communities (called Clinical) in comparison with other online communities (called Control), using data from 110 LiveJournal weblog communities. We analyze these online autism communities comprehensively using machine learning techniques, studying three key aspects expressed in the blog posts made by community members: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the Clinical group has lower valence than that of the Control group, indicative of poorer moods. Topics and language styles are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for online communities of individuals with special needs.


Technological developments in the IT and telecommunications sectors have transformed the way organizations communicate with their audiences. Social media allow information to be exchanged instantaneously, in many-to-many communication. With few studies in the area, many companies venture into social media without a strategy and generally end up damaging their own image. This study was therefore conceived to contribute, analytically and on the basis of the main concepts of Public Relations, so that organizations can effectively take advantage of their online presence to build relationships, specifically on Twitter. Twitter is a social medium with a mature audience that demands dynamic information and quick answers, is based on dialogue, and recalls the idea of text messages (SMS). To better present the results of this research, three organizations with expertise in Twitter were chosen: Bradesco, Positivo Informática and Ponto Frio. They were chosen as case studies because they operate in different segments and are large companies with reputable commercial operations in Brazil. To analyze their profiles, several authors were studied, such as Fábio França, Maria Aparecida Ferrari, Margarida Kunsch and Marlene Machiori. The intent of analyzing the Twitter profiles of these organizations is to understand whether they are using strategies to create and maintain relationships with their followers and how this occurs within specific categories, given that other companies have committed serious errors and harmed their business through mismanagement of social media. The profiles were therefore analyzed using the netnographic methodology. As a result, it was observed that the organizations have not yet developed the relational character of social media, treating it as just another advertising channel. It was observed that Positivo Informática has no specific strategy for Twitter...


“Say you like this page.” This is one of the many invitations addressed to those who browse the Internet every day. Whether one is reading an article on the La Repubblica website or visiting the blog of a celebrity or a politician, references to social networks are now a constant presence on web pages. The ease of staying in touch with friends, and the possibility of connecting at any time, have led Web 2.0 users to intensify their discussions and to comment on the topics and content produced by others in a continuous and complex back-and-forth. This environment may have fostered the development of a new perspective on the Net, understood as a new way of seeing oneself and relating to others, of expressing oneself, and of sharing one's stories and one's own history. To explore these themes, it was decided to observe some of the most widespread social networks, including Twitter and Facebook, and, in order to collect the most significant data from the latter, to develop a dedicated software application. This thesis covers the theoretical aspects that led to this nationwide research and the analysis of the project requirements; it examines the design dynamics and the development of the application within the constraints imposed by Facebook, integrating a user questionnaire with the reading of the data. After describing the testing and deployment phases, the thesis includes a preliminary analysis of the data obtained through pre-processing within the application itself.