93 results for analytics
Abstract:
Today’s information systems log vast amounts of data. These collections of data (implicitly) describe events (e.g. placing an order or taking a blood test) and, hence, provide information on the actual execution of business processes. The analysis of such data provides an excellent starting point for business process improvement. This is the realm of process mining, an area which has provided a repertoire of many analysis techniques. Despite the impressive capabilities of existing process mining algorithms, dealing with the abundance of data recorded by contemporary systems and devices remains a challenge. Of particular importance is the capability to guide the meaningful interpretation of “oceans of data” by process analysts. To this end, insights from the field of visual analytics can be leveraged. This article proposes an approach where process states are reconstructed from event logs and visualised in succession, leading to an animated history of a process. This approach is customisable in how a process state, partially defined through a collection of activity instances, is visualised: one can select a map and specify a projection of events on this map based on the properties of the events. This paper describes a comprehensive implementation of the proposal. It was realised using the open-source process mining framework ProM. Moreover, this paper also reports on an evaluation of the approach conducted with Suncorp, one of Australia’s largest insurance companies.
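The idea of reconstructing process states from an event log and showing them in succession can be sketched roughly as follows. This is only an illustrative sketch, not the ProM implementation described in the abstract: the `Event` record, its `lifecycle` field, and the helper names `state_at` and `animate` are assumptions made for the example, and a process state is simplified to the set of activity instances running at a given time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions for this sketch."""
    case_id: str
    activity: str
    lifecycle: str   # "start" or "complete"
    timestamp: float


def state_at(events, t):
    """Return the set of (case_id, activity) instances running at time t."""
    active = set()
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.timestamp > t:
            break
        key = (e.case_id, e.activity)
        if e.lifecycle == "start":
            active.add(key)
        elif e.lifecycle == "complete":
            active.discard(key)
    return active


def animate(events, times):
    """Reconstruct one state per requested time point, in order."""
    return [state_at(events, t) for t in times]


# Small made-up log used only to exercise the functions above.
LOG = [
    Event("c1", "place order", "start", 0.0),
    Event("c2", "place order", "start", 1.0),
    Event("c1", "place order", "complete", 2.0),
    Event("c1", "check payment", "start", 3.0),
    Event("c2", "place order", "complete", 4.0),
]
```

Each state in the returned sequence could then be projected onto a chosen map (for example a timeline or an organisational chart) using event properties, which is the customisation step the abstract refers to.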
Abstract:
In this chapter, we draw out the relevant themes from a range of critical scholarship from the small body of digital media and software studies work that has focused on the politics of Twitter data and the sociotechnical means by which access is regulated. We highlight in particular the contested relationships between social media research (in both academic and non-academic contexts) and the data wholesale, retail, and analytics industries that feed on them. In the second major section of the chapter we discuss in detail the pragmatic edge of these politics in terms of what kinds of scientific research are and are not possible in the current political economy of Twitter data access. Finally, at the end of the chapter we return to the much broader implications of these issues for the politics of knowledge, demonstrating how the apparently microscopic level of how the Twitter API mediates access to Twitter data actually inscribes and influences the macro level of the global political economy of science itself, through re-inscribing institutional and traditional disciplinary privilege. We conclude with some speculations about future developments in data rights and data philanthropy that may at least mitigate some of these negative impacts.
Abstract:
Background: A major challenge for assessing students’ conceptual understanding of STEM subjects is the capacity of assessment tools to reliably and robustly evaluate student thinking and reasoning. Multiple-choice tests are typically used to assess student learning and are designed to include distractors that can indicate students’ incomplete understanding of a topic or concept based on which distractor the student selects. However, these tests fail to provide the critical information uncovering the how and why of students’ reasoning for their multiple-choice selections. Open-ended or structured response questions are one method for capturing higher level thinking, but are often costly in terms of time and attention to properly assess student responses. Purpose: The goal of this study is to evaluate methods for automatically assessing open-ended responses, e.g. students’ written explanations and reasoning for multiple-choice selections. Design/Method: We incorporated an open response component for an online signals and systems multiple-choice test to capture written explanations of students’ selections. The effectiveness of an automated approach for identifying and assessing student conceptual understanding was evaluated by comparing results of lexical analysis software packages (Leximancer and NVivo) to expert human analysis of student responses. In order to understand and delineate the process for effectively analysing text provided by students, the researchers evaluated strengths and weaknesses of both the human and automated approaches. Results: Human and automated analyses revealed both correct and incorrect associations for certain conceptual areas. For some questions, associations emerged that were not anticipated or included in the distractor selections, showing how multiple-choice questions alone fail to capture the comprehensive picture of student understanding.
The comparison of textual analysis methods revealed the capability of automated lexical analysis software to assist in the identification of concepts and their relationships for large textual data sets. We also identified several challenges to using automated analysis as well as manual and computer-assisted analysis. Conclusions: This study highlighted the usefulness of incorporating and analysing students’ reasoning or explanations in understanding how students think about certain conceptual ideas. The ultimate value of automating the evaluation of written explanations is that it can be applied more frequently and at various stages of instruction to formatively evaluate conceptual understanding and engage students in reflective
Abstract:
Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately, however, due to heavy occlusion between players, variation in resolution and pose, and fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e. missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach, though, is that the compressibility is low (i.e. the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model, which uses a role representation and operates in a low-dimensional space, to clean up the noisy detections. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras, applied the approach to approximately 200,000 frames of data from a state-of-the-art real-time player detector, and compared the results to manually labeled data.
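The occupancy-map baseline mentioned in this abstract (zones, per-zone detection counts, concatenated frames) might look roughly like the sketch below. It does not implement the bilinear spatiotemporal basis model the paper proposes. The function names, the grid layout, and the choice to drop detections outside the field are assumptions made for illustration; the 91.4 m by 55 m pitch size used in the usage values is the standard field-hockey dimension rather than a figure from the abstract.

```python
def occupancy_map(detections, field_size, grid):
    """Count player detections per zone for one frame.

    detections: iterable of (x, y) positions in metres
    field_size: (length, width) of the field in metres
    grid:       (cols, rows) number of zones along length and width
    Returns a flat list of counts in row-major order.
    """
    length, width = field_size
    cols, rows = grid
    counts = [0] * (rows * cols)
    for x, y in detections:
        if not (0 <= x <= length and 0 <= y <= width):
            continue  # assumption: detections outside the field are dropped
        col = min(int(x / length * cols), cols - 1)
        row = min(int(y / width * rows), rows - 1)
        counts[row * cols + col] += 1
    return counts


def play_descriptor(frames, field_size, grid):
    """Concatenate per-frame occupancy maps into one feature vector."""
    feature = []
    for detections in frames:
        feature.extend(occupancy_map(detections, field_size, grid))
    return feature


PITCH = (91.4, 55.0)  # assumed field-hockey pitch size in metres
```

The descriptor length grows as zones times frames, which gives a feel for why the abstract describes the compressibility of this representation as low.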
Abstract:
The upstream oil and gas industry has been contending with massive data sets and monolithic files for many years, but “Big Data” is a relatively new concept that has the potential to significantly re-shape the industry. Despite the impressive amount of value that is being realized by Big Data technologies in other parts of the marketplace, however, much of the data collected within the oil and gas sector tends to be discarded, ignored, or analyzed in a very cursory way. This viewpoint examines existing data management practices in the upstream oil and gas industry, and compares them to practices and philosophies that have emerged in organizations that are leading the way in Big Data. The comparison shows that, in companies that are widely considered to be leaders in Big Data analytics, data is regarded as a valuable asset—but this is usually not true within the oil and gas industry insofar as data is frequently regarded there as descriptive information about a physical asset rather than something that is valuable in and of itself. The paper then discusses how the industry could potentially extract more value from data, and concludes with a series of policy-related questions to this end.
Abstract:
Many organizations realize that increasing amounts of data (“Big Data”) need to be dealt with intelligently in order to compete with other organizations in terms of efficiency, speed and services. The goal is not to collect as much data as possible, but to turn event data into valuable insights that can be used to improve business processes. However, data-oriented analysis approaches fail to relate event data to process models. At the same time, large organizations are generating piles of process models that are disconnected from the real processes and information systems. In this chapter we propose to manage large collections of process models and event data in an integrated manner. Observed and modeled behavior need to be continuously compared and aligned. This results in a “liquid” business process model collection, i.e. a collection of process models that is in sync with the actual organizational behavior. The collection should self-adapt to evolving organizational behavior and incorporate relevant execution data (e.g. process performance and resource utilization) extracted from the logs, thereby allowing insightful reports to be produced from factual organizational data.
Abstract:
Traditional text classification technology based on machine learning and data mining techniques has made great progress. However, it remains difficult to draw an exact decision boundary between relevant and irrelevant objects in binary classification, due to the considerable uncertainty produced by traditional algorithms. The proposed model CTTC (Centroid Training for Text Classification) aims to build an uncertainty boundary to absorb as many indeterminate objects as possible so as to elevate the certainty of the relevant and irrelevant groups through the centroid clustering and training process. The clustering starts from the two training subsets, labelled as relevant and irrelevant respectively, to create two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through the subsequent iterative multi-learning process, and all of them collaboratively help predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it can display the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), which is an important standard dataset in the field. The experimental results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
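The three-way separation into POS, NEG and BND can be illustrated with a small sketch. The abstract does not state the rule CTTC uses to assign samples, so the rule below (the difference in cosine similarity to the two centroids, compared against a `margin` threshold) is purely an assumption for illustration, as are the function names and the default margin. The iterative training of the two pairs of centroid vectors is not shown.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def three_way_split(samples, pos_centroid, neg_centroid, margin=0.2):
    """Assign each sample to POS, NEG or the uncertain boundary BND.

    Assumed rule (not taken from the paper): a sample is POS or NEG only
    if it is closer to one centroid than the other by more than `margin`.
    """
    groups = {"POS": [], "NEG": [], "BND": []}
    for v in samples:
        diff = cosine(v, pos_centroid) - cosine(v, neg_centroid)
        if diff > margin:
            groups["POS"].append(v)
        elif diff < -margin:
            groups["NEG"].append(v)
        else:
            groups["BND"].append(v)
    return groups


# Toy two-dimensional "documents", made up for the example.
relevant = [[1.0, 0.0], [0.9, 0.1]]
irrelevant = [[0.0, 1.0], [0.1, 0.9]]
groups = three_way_split(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    centroid(relevant),
    centroid(irrelevant),
)
```

In this toy run the vector `[1.0, 1.0]` is equally similar to both centroids and therefore lands in BND, which is the kind of indeterminate object the boundary region is meant to absorb.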
Abstract:
Clinical Data Warehousing: A Business Analytic approach for managing health data
Abstract:
Research on attrition has focused on the economic significance of low graduation rates in terms of costs to students (fees that do not culminate in a credential) and impact on future income. For a student who fails a unit and repeats the unit multiple times, the financial impact is significant and lasting (Bexley, Daroesman, Arkoudis & James 2013). There are obvious advantages for the timely completion of a degree, both for the student and the institution. Advantages to students include fee minimisation, enhanced engagement opportunities, effectual pathway to employment and a sense of worth, morale and cohort-identity benefits. Work undertaken by the QUT Analytics Project in 2013 and 2014 explored student engagement patterns capturing a variety of data sources and specifically, the use of LMS amongst students in 804 undergraduate units in one semester. Units with high failure rates were given further attention and it was found that students who were repeating a unit were less likely to pass the unit than students attempting it for the first time. In this repeating cohort, academic and behavioural variables were consistently more significant in the modelling than were any demographic variables, indicating that a student’s performance at university is far more impacted by what they do once they arrive than it is by where they come from. The aim of this poster session is to examine the findings and commonalities of a number of case studies that articulated the engagement activities of repeating students (which included collating data from Individual Unit Reports, academic and peer advising programs and engagement with virtual learning resources). Understanding the profile of the repeating student cohort is therefore as important as considering the characteristics of successful students so that the institution might be better placed to target the repeating students and make proactive interventions as early as possible.
Abstract:
With organisations facing significant challenges to remain competitive, Business Process Improvement (BPI) initiatives are often conducted to improve the efficiency and effectiveness of their business processes, focussing on time, cost, and quality improvements. Event logs, which contain a detailed record of business operations over a certain time period as recorded by an organisation's information systems, are the first step towards initiating evidence-based BPI activities. Given an (original) event log as a starting point, an approach to explore better ways to execute a business process was developed, resulting in an improved (perturbed) event log. Identifying the differences between the original event log and the perturbed event log can provide valuable insights, helping organisations to improve their processes. However, there is a lack of automated techniques to detect the differences between two event logs. Therefore, this research aims to develop visualisation techniques to provide targeted analysis of resource reallocation and activity rescheduling. The differences between two event logs are first identified. The changes between the two event logs are then conceptualised and realised with a number of visualisations. With the proposed visualisations, analysts will be able to identify the changes related to resource and time, resulting in a more efficient business process. Ultimately, analysts can make use of this comparative information to initiate evidence-based BPI activities.
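The first step described here, identifying differences between the original and the perturbed event log, could be sketched along the following lines. The log representation (a dictionary from `(case_id, activity)` to `(resource, start_time)`) and the function name `diff_logs` are assumptions made for this example and are not taken from the research itself. Only resource reallocation and activity rescheduling are detected, since those are the two change types the abstract names.

```python
def diff_logs(original, perturbed):
    """List resource and start-time changes between two event logs.

    Each log maps (case_id, activity) -> (resource, start_time).
    Only activity instances present in both logs are compared.
    Returns tuples of (key, change_type, old_value, new_value).
    """
    changes = []
    for key in sorted(original.keys() & perturbed.keys()):
        old_resource, old_start = original[key]
        new_resource, new_start = perturbed[key]
        if old_resource != new_resource:
            changes.append((key, "resource", old_resource, new_resource))
        if old_start != new_start:
            changes.append((key, "start", old_start, new_start))
    return changes


# Made-up logs: start times are hours of the day.
original_log = {
    ("c1", "assess claim"): ("ann", 9),
    ("c1", "approve claim"): ("bob", 11),
}
perturbed_log = {
    ("c1", "assess claim"): ("ann", 9),
    ("c1", "approve claim"): ("cat", 10),
}
changes = diff_logs(original_log, perturbed_log)
```

A list of changes in this shape could then feed a visualisation, for example by grouping entries by resource or plotting old against new start times.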
Abstract:
One of the aims of Deleuze. Guattari. Schizoanalysis. Education. is to focus on the radical reconfiguration that education is undergoing, impacting educator, administrator, institution and ‘sector’ alike. More to the point, it is the responses to that process of reconfiguration - this newly emerging assemblage - that are a key focal point in this issue. Essential to these responses, we propose, is Deleuze and Guattari’s method of schizoanalysis, which offers a way to not only understand the rules of this new game, but also, hopefully, some escape from the promise of a brave new world of continuous education and motivation. A brave new world of digitised courses, impersonal and corporate expertise, updatable performance metrics, Massive Open Online Courses (MOOCs), learning analytics, transformative teaching and learning, online high-stakes testing in the name of transforming and augmenting human capital overlays the corporeal practices of institutional surveillance, examination and categorical sorting. A brave new world, importantly, where people’s continuous education is instituted less, or not simply, through disciplinary practices, and increasingly through a constant and continuous sampling and profiling of not simply performance but their activity, measured against the profiled activity of a ‘like’ age group, person, or an institution. This continuous education, including the sampling that accompanies it, we are all informed through various information and marketing campaigns, is in our best interest. An interest that is driven and governed by an ever-increasing corporatisation and monetisation of ‘the knowledge sector’, as well as an interest that is sustained through an ever-increasing, as well as continuous, debt.
Abstract:
Product reviews are the foremost source of information for customers and manufacturers to help them make appropriate purchasing and production decisions. Natural language data is typically very sparse; the most common words are those that do not carry a lot of semantic content, and occurrences of any particular content-bearing word are rare, while co-occurrences of these words are rarer. Mining product aspects, along with corresponding opinions, is essential for Aspect-Based Opinion Mining (ABOM) as a result of the e-commerce revolution. Therefore, the need for automatic mining of reviews has reached a peak. In this work, we deal with ABOM as a sequence labelling problem and propose a supervised extraction method to identify product aspects and corresponding opinions. We use Conditional Random Fields (CRFs) to solve the extraction problem and propose a feature function to enhance accuracy. The proposed method is evaluated using two different datasets. We also evaluate the effectiveness of the feature function and the optimisation through multiple experiments.
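Treating aspect and opinion extraction as sequence labelling means giving every token in a review a label. A minimal sketch of that data preparation is shown below, using the common BIO tagging scheme and a simple per-token feature dictionary of the kind usually fed to a CRF. The tag names `ASP` and `OPI`, the helper names, and the particular features are assumptions for illustration; the abstract does not describe the paper's own feature function, and no CRF training is shown.

```python
def bio_labels(tokens, spans):
    """Convert annotated spans to one BIO label per token.

    spans: list of (start, end_exclusive, kind), where kind is a
    hypothetical tag such as "ASP" (aspect) or "OPI" (opinion).
    """
    labels = ["O"] * len(tokens)
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"
    return labels


def token_features(tokens, i):
    """Illustrative features for token i (not the paper's feature function)."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Made-up review sentence with one aspect span and one opinion span.
tokens = ["The", "battery", "life", "is", "great"]
labels = bio_labels(tokens, [(1, 3, "ASP"), (4, 5, "OPI")])
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Pairs of feature lists and label lists in this form are the usual training input for a linear-chain CRF, which learns to predict the label sequence for unseen reviews.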
Abstract:
As critical infrastructure such as transportation hubs continues to grow in complexity, greater importance is placed on monitoring these facilities to ensure their secure and efficient operation. In order to achieve these goals, technology continues to evolve in response to the needs of various types of infrastructure. To date, however, surveillance technology has been primarily concerned with security, and little attention has been paid to assisting operations and monitoring performance in real time. Consequently, solutions have emerged to provide real-time measurements of queues and crowding in spaces, but have been installed as system add-ons (rather than making better use of existing infrastructure), resulting in expensive infrastructure outlay for the owner/operator, and an overload of surveillance systems which in itself creates further complexity. Given that many critical infrastructure facilities already have camera networks installed, it is much more desirable to better utilise these networks to address operational monitoring as well as security needs. Recently, a growing number of approaches have been proposed to monitor operational aspects such as pedestrian throughput, crowd size and dwell times. In this paper, we explore how these techniques relate to and complement the more commonly seen security analytics, and demonstrate the value that operational analytics can add by showing their performance on airport surveillance data. We explore how multiple analytics and systems can be combined to better leverage the large amount of data that is available, and we discuss the applicability and resulting benefits of the proposed framework for the ongoing operation of airports and airport networks.
Abstract:
Techniques to align spatio-temporal data for large-scale analysis of human group behaviour have been developed. Application of the techniques to sports databases enables sports teams' characteristic styles of play to be discovered and compared for tactical analysis. Applications in surveillance to recognise group activities in real time for person re-identification from low-resolution video footage have also been developed.