624 results for logs
Abstract:
Recent studies on automatic new topic identification in Web search engine user sessions demonstrated that neural networks are successful in automatic new topic identification. However, most of this work applied new topic identification algorithms to data logs from a single search engine. In this study, we investigate whether the application of neural networks for automatic new topic identification is more successful on some search engines than others. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and Excite are used in this study. Findings of this study suggest that query logs with more topic shifts tend to provide more successful results on shift-based performance measures, whereas logs with more topic continuations tend to provide better results on continuation-based performance measures.
Abstract:
Handling information overload online, from the user's point of view, is a big challenge, especially when the number of websites is growing rapidly due to growth in e-commerce and other related activities. Personalization based on user needs is the key to solving the problem of information overload. Personalization methods help in identifying relevant information, which may be liked by a user. User profile and object profile are the important elements of a personalization system. When creating user and object profiles, most of the existing methods adopt two-dimensional similarity methods based on vector or matrix models in order to find inter-user and inter-object similarity. Moreover, for recommending similar objects to users, personalization systems use the users-users, items-items and users-items similarity measures. In most cases similarity measures such as Euclidean, Manhattan, cosine and many others based on vector or matrix methods are used to find the similarities. Web logs are high-dimensional datasets, consisting of multiple users and multiple searches, each with many attributes. Two-dimensional data analysis methods may often overlook latent relationships that may exist between users and items. In contrast to other studies, this thesis utilises tensors, the high-dimensional data models, to build user and object profiles and to find the inter-relationships between users-users and users-items. To create an improved personalized Web system, this thesis proposes to build three types of profiles: individual user, group users and object profiles utilising decomposition factors of tensor data models. A hybrid recommendation approach utilising group profiles (forming the basis of a collaborative filtering method) and object profiles (forming the basis of a content-based method) in conjunction with individual user profiles (forming the basis of a model based approach) is proposed for making effective recommendations.
A tensor-based clustering method is proposed that utilises the outcomes of popular tensor decomposition techniques such as PARAFAC, Tucker and HOSVD to group similar instances. An individual user profile, showing the user's highest interest, is represented by the top dimension values, extracted from the component matrix obtained after tensor decomposition. A group profile, showing similar users and their highest interest, is built by clustering similar users based on tensor decomposed values. A group profile is represented by the top association rules (containing various unique object combinations) that are derived from the searches made by the users of the cluster. An object profile is created to represent similar objects clustered on the basis of their similarity of features. Depending on the category of a user (known, anonymous or frequent visitor to the website), any of the profiles or their combinations is used for making personalized recommendations. A ranking algorithm is also proposed that utilizes the personalized information to order and rank the recommendations. The proposed methodology is evaluated on data collected from a real life car website. Empirical analysis confirms the effectiveness of recommendations made by the proposed approach over other collaborative filtering and content-based recommendation approaches based on two-dimensional data analysis methods.
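The vector-based similarity measures named in this abstract (Euclidean, Manhattan, cosine) can be illustrated with a minimal sketch. The interest vectors `user_a` and `user_b` are hypothetical and are not taken from the thesis; the sketch shows only the two-dimensional baseline the thesis contrasts with, not its tensor-based method.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical per-user interest scores over three car attributes.
user_a = [3.0, 0.0, 4.0]
user_b = [0.0, 0.0, 8.0]

print(euclidean(user_a, user_b))
print(manhattan(user_a, user_b))
print(cosine_similarity(user_a, user_b))
```

Distances grow as users differ, whereas cosine similarity ignores vector length and compares only the direction of interest, which is one reason the choice of measure affects which users are grouped together.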
Abstract:
A service-oriented system is composed of independent software units, namely services, that interact with one another exclusively through message exchanges. The proper functioning of such a system depends on whether or not each individual service behaves as the other services expect it to behave. Since services may be developed and operated independently, it is unrealistic to assume that this is always the case. This article addresses the problem of checking and quantifying how much the actual behavior of a service, as recorded in message logs, conforms to the expected behavior as specified in a process model. We consider the case where the expected behavior is defined using the BPEL industry standard (Business Process Execution Language for Web Services). BPEL process definitions are translated into Petri nets and Petri net-based conformance checking techniques are applied to derive two complementary indicators of conformance: fitness and appropriateness. The approach has been implemented in a toolset for business process analysis and mining, namely ProM, and has been tested in an environment comprising multiple Oracle BPEL servers.
Abstract:
Introduction: Human herpesvirus 8 (HHV8) is necessary for Kaposi sarcoma (KS) to develop, but whether peripheral blood viral load is a marker of KS burden (total number of KS lesions), KS progression (the rate of eruption of new KS lesions), or both is unclear. We investigated these relationships in persons with AIDS. Methods: Newly diagnosed patients with AIDS-related KS attending Mulago Hospital, in Kampala, Uganda, were assessed for KS burden and progression by questionnaire and medical examination. Venous blood samples were taken for HHV8 load measurements by PCR. Associations were examined with odds ratio (OR) and 95% confidence intervals (CI) from logistic regression models and with t-tests. Results: Among 74 patients (59% men), median age was 34.5 years (interquartile range [IQR], 28.5-41). HHV8 DNA was detected in 93% and quantified in 77% of patients. Median virus load was 3.8 log10/10^6 peripheral blood cells (IQR 3.4-5.0) and was higher in men than women (4.4 vs. 3.8 logs; p=0.04), in patients with faster (>20 lesions per year) than slower rate of KS lesion eruption (4.5 vs. 3.6 logs; p<0.001), and higher, but not significantly, among patients with more (>median [20] KS lesions) than fewer KS lesions (4.4 vs. 4.0 logs; p=0.16). HHV8 load was unrelated to CD4 lymphocyte count (p=0.23). Conclusions: We show significant association of HHV8 load in peripheral blood with rate of eruption of KS lesions, but not with total lesion count. Our results suggest that viral load increases concurrently with development of new KS lesions.
Abstract:
The importance of actively managing and analyzing business processes is acknowledged more than ever in organizations nowadays. Business processes form an essential part of an organization and their application areas are manifold. Most organizations keep records of various activities that have been carried out for auditing purposes, but they are rarely used for analysis purposes. This paper describes the design and implementation of a process analysis tool that replays, analyzes and visualizes a variety of performance metrics using a process definition and its execution logs. Performing performance analysis on existing and planned process models offers a great way for organizations to detect bottlenecks within their processes and allows them to make more effective process improvement decisions. Our technique is applied to processes modeled in the YAWL language. Execution logs of process instances are compared against the corresponding YAWL process model and replayed in a robust manner, taking into account any noise in the logs. Finally, performance characteristics, obtained from replaying the log in the model, are projected onto the model.
Abstract:
Organisations are constantly seeking efficiency improvements for their business processes in terms of time and cost. Management accounting enables reporting of the detailed cost of operations for decision-making purposes, although significant effort is required to gather accurate operational data. Business process management is concerned with systematically documenting, managing, automating, and optimising processes. Process mining gives valuable insight into processes through analysis of events recorded by an IT system in the form of an event log with the focus on efficient utilisation of time and resources, although its primary focus is not on cost implications. In this paper, we propose a framework to support management accounting decisions on cost control by automatically incorporating cost data with historical data from event logs for monitoring, predicting and reporting process-related costs. We also illustrate how accurate, relevant and timely management accounting style cost reports can be produced on demand by extending the open-source process mining framework ProM.
Abstract:
This paper proposes a technique that supports process participants in making risk-informed decisions, with the aim of reducing process risks. Risk reduction involves decreasing the likelihood and severity of a process fault. Given a process exposed to risks, e.g. a financial process exposed to a risk of reputation loss, we enact this process and whenever a process participant needs to provide input to the process, e.g. by selecting the next task to execute or by filling out a form, we prompt the participant with the expected risk that a given fault will occur given the particular input. These risks are predicted by traversing decision trees generated from the logs of past process executions and considering process data, involved resources, task durations and contextual information like task frequencies. The approach has been implemented in the YAWL system and its effectiveness evaluated. The results show that the process instances executed in the tests complete with substantially fewer faults and with lower fault severities, when taking into account the recommendations provided by our technique.
Abstract:
Smartphones are steadily gaining popularity, creating new application areas as their capabilities increase in terms of computational power, sensors and communication. Emerging new features of mobile devices give opportunity to new threats. Android is one of the newer operating systems targeting smartphones. While being based on a Linux kernel, Android has unique properties and specific limitations due to its mobile nature. This makes it harder to detect and react upon malware attacks if using conventional techniques. In this paper, we propose an Android Application Sandbox (AASandbox) which is able to perform both static and dynamic analysis on Android programs to automatically detect suspicious applications. Static analysis scans the software for malicious patterns without installing it. Dynamic analysis executes the application in a fully isolated environment, i.e. sandbox, which intervenes and logs low-level interactions with the system for further analysis. Both the sandbox and the detection algorithms can be deployed in the cloud, providing a fast and distributed detection of suspicious software in a mobile software store akin to Google's Android Market. Additionally, AASandbox might be used to improve the efficiency of classical anti-virus applications available for the Android operating system.
Abstract:
Automated process discovery techniques aim at extracting models from information system logs in order to shed light on the business processes supported by these systems. Existing techniques in this space are effective when applied to relatively small or regular logs, but otherwise generate large and spaghetti-like models. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. The result is a collection of process models -- each one representing a variant of the business process -- as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically by means of subprocess extraction. The proposed technique allows users to set a desired bound for the complexity of the produced models. Experiments on real-life logs show that the technique produces collections of models that are up to 64% smaller than those extracted under the same complexity bounds by applying existing trace clustering techniques.
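The idea of splitting a log into clusters before discovery can be sketched in its simplest form: grouping cases whose activity sequences are identical. This is only a stand-in for real trace clustering, which groups similar (not just identical) traces, and it is not the technique proposed in the paper. The log layout, `group_by_variant` and the activity names are hypothetical.

```python
from collections import defaultdict

def group_by_variant(log):
    """Group case ids by their exact activity sequence.

    log: dict mapping a case id to its ordered list of activity names.
    Returns a dict mapping each distinct sequence (as a tuple) to the
    sorted list of case ids that followed it.
    """
    clusters = defaultdict(list)
    for case_id, trace in log.items():
        clusters[tuple(trace)].append(case_id)
    return {variant: sorted(cases) for variant, cases in clusters.items()}

# Hypothetical toy event log with three cases and two variants.
toy_log = {
    "c1": ["register", "check", "approve"],
    "c2": ["register", "check", "reject"],
    "c3": ["register", "check", "approve"],
}

variants = group_by_variant(toy_log)
for variant, cases in variants.items():
    print(" -> ".join(variant), cases)
```

A discovery algorithm would then be run once per group, yielding one smaller model per variant instead of a single all-encompassing model.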
Abstract:
Extracting and aggregating the relevant event records relating to an identified security incident from the multitude of heterogeneous logs in an enterprise network is a difficult challenge. Presenting the information in a meaningful way is an additional challenge. This paper looks at solutions to this problem by first identifying three main transforms: log collection, correlation, and visual transformation. Having identified that the CEE project will address the first transform, this paper focuses on the second, while the third is left for future work. To aggregate event records by correlation, we demonstrate the use of two correlation methods, simple and composite. These make use of a defined mapping schema and confidence values to dynamically query the normalised dataset and to constrain result events to within a time window. Doing so improves the quality of results, which is required for the iterative re-querying process being undertaken. Final results of the process are output as nodes and edges suitable for presentation as a network graph.
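Constraining correlated events to a time window can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's simple or composite method: the record fields (`time`, `src_ip`, `source`), the function `correlate` and the sample events are all hypothetical, and the mapping schema and confidence values mentioned in the abstract are not modelled.

```python
from datetime import datetime, timedelta

def correlate(seed, events, key, window):
    """Return events that share seed[key] and fall within +/- window of the seed's time."""
    return [
        e for e in events
        if e is not seed
        and e.get(key) == seed.get(key)
        and abs(e["time"] - seed["time"]) <= window
    ]

# Hypothetical normalised event records from three different log sources.
events = [
    {"time": datetime(2020, 1, 1, 12, 0, 0), "src_ip": "10.0.0.5", "source": "firewall"},
    {"time": datetime(2020, 1, 1, 12, 0, 30), "src_ip": "10.0.0.5", "source": "proxy"},
    {"time": datetime(2020, 1, 1, 13, 0, 0), "src_ip": "10.0.0.5", "source": "auth"},
    {"time": datetime(2020, 1, 1, 12, 0, 10), "src_ip": "10.0.0.9", "source": "proxy"},
]

# Start from the firewall record and pull in records with the same source address.
related = correlate(events[0], events, "src_ip", timedelta(minutes=5))
print([e["source"] for e in related])
```

Each returned record could in turn serve as a new seed, which is one way to picture the iterative re-querying the abstract mentions; seed and result pairs then map naturally onto the nodes and edges of a graph.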
Abstract:
This study uses borehole geophysical log data of sonic velocity and electrical resistivity to estimate permeability in sandstones in the northern Galilee Basin, Queensland. The prior estimates of permeability are calculated according to the deterministic log–log linear empirical correlations between electrical resistivity and measured permeability. Both negative and positive relationships are influenced by the clay content. The prior estimates of permeability are updated in a Bayesian framework for three boreholes using both the cokriging (CK) method and a normal linear regression (NLR) approach to infer the likelihood function. The results show that the mean permeability estimated from the CK-based Bayesian method is in better agreement with the measured permeability when a fairly apparent linear relationship exists between the logarithm of permeability and sonic velocity. In contrast, the NLR-based Bayesian approach gives better estimates of permeability for boreholes where no linear relationship exists between the logarithm of permeability and sonic velocity.
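A log–log linear empirical correlation of the kind used for the prior estimates has the form log10(k) = a + b·log10(R), where k is permeability and R is resistivity. The sketch below fits a and b by ordinary least squares. The data are synthetic, generated from k = 100·R^-2 purely for illustration; they are not measurements from the study, and the function names are hypothetical. The Bayesian updating step is not shown.

```python
import math

def fit_log_log(resistivity, permeability):
    """Least-squares fit of log10(k) = intercept + slope * log10(R)."""
    xs = [math.log10(r) for r in resistivity]
    ys = [math.log10(k) for k in permeability]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_permeability(r, intercept, slope):
    """Back-transform the fitted line to a permeability estimate."""
    return 10 ** (intercept + slope * math.log10(r))

# Synthetic, noise-free data following k = 100 * R**-2 (a negative relationship).
resistivity = [1.0, 10.0, 100.0]
permeability = [100.0, 1.0, 0.01]

intercept, slope = fit_log_log(resistivity, permeability)
print(intercept, slope)
print(predict_permeability(10.0, intercept, slope))
```

The sign of the fitted slope distinguishes the negative and positive relationships mentioned in the abstract.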
Abstract:
This thesis examines the social practice of homework. It explores how homework is shaped by the discourses, policies and guidelines in circulation in a society at any given time with particular reference to one school district in the province of Newfoundland and Labrador, Canada. This study investigates how contemporary homework reconstitutes the home as a pedagogical site where the power of the institution of schooling circulates regularly from school to home. It examines how the educational system shapes the organization of family life and how family experiences with homework may be different in different sites depending on the accessibility of various forms of cultural capital. This study employs a qualitative approach, incorporating multiple case studies, and is complemented by insights from institutional ethnography and critical discourse analysis. It draws on the theoretical concepts of Foucault including power and power relations, and governmentality and surveillance, as well as Bourdieu’s concepts of economic, social and cultural capital for analysis. It employs concepts from Bourdieu’s work as they have been expanded on by researchers including Reay (1998), Lareau (2000), and Griffith and Smith (2005). The studies of these researchers allowed for an examination of homework as it related to families and mothers’ work. Smith’s (1987; 1999) concepts of ruling relations, mothers’ unpaid labour, and the engine of inequality were also employed in the analysis. Family interviews with ten volunteer families, teacher focus group sessions with 15 teachers from six schools, homework artefacts, school newsletters, homework brochures, and publicly available assessment and evaluation policy documents from one school district were analyzed. From this analysis key themes emerged and the findings are documented throughout five data analysis chapters. This study shows a change in education in response to a system shaped by standards, accountability and testing. 
It documents an increased transference of educational responsibility from one educational stakeholder to another. This transference of responsibility shifts downward until it eventually reaches the family in the form of homework and educational activities. Texts in the form of brochures and newsletters, sent home from school, make available to parents specific subject positions that act as instruments of normalization. These subject positions promote a particular ‘ideal’ family that has access to certain types of cultural capital needed to meet the school’s expectations. However, the study shows that these resources are not equally available to all and some families struggle to obtain what is necessary to complete educational activities in the home. The increase in transference of educational work from the school to the home results in greater work for parents, particularly mothers. As well, consideration is given to mothers’ role in homework and how, in turn, classroom instructional practices are sometimes dependent on the work completed at home, with differential effects for children. This study confirms previous findings that it is mothers who assume the greatest role in the educational trajectory of their children. An important finding in this research is that it is not only middle-class mothers who dedicate extensive time working hard to ensure their children’s educational success; working-class mothers also make substantial contributions of time and resources to their children’s education. The assignments and educational activities distributed as homework require parents’ knowledge of technical school pedagogy to help their children. Much of the homework being sent home from schools is in the area of literacy, particularly reading, but requires parents to do more than read with children. A key finding is that the practices of parents are changing and being reconfigured by the expectations of schools in regard to reading.
Parents are now being required to monitor and supervise children’s reading, as well as help children complete reading logs, written reading responses, and follow up questions. The reality of family life as discussed by the participants in this study does not match the ‘ideal’ as portrayed in the educational documents. Homework sessions often create frustrations and tensions between parents and children. Some of the greatest struggles for families were created by mathematical homework, homework for those enrolled in the French Immersion program, and the work required to complete Literature, Heritage and Science Fair projects. Even when institutionalized and objectified capital was readily available, many families still encountered struggles when trying to carry out the assigned educational tasks. This thesis argues that homework and education-related activities play out differently in different homes. Consideration of this significance may assist educators to better understand and appreciate the vast difference in families and the ways in which each family can contribute to their children’s educational trajectory.
Abstract:
The rapid growth of visual information on the Web has led to immense interest in multimedia information retrieval (MIR). While advancement in MIR systems has achieved some success in specific domains, particularly the content-based approaches, general Web users still struggle to find the images they want. Despite the success in content-based object recognition or concept extraction, the major problem in current Web image searching remains in the querying process. Since most online users only express their needs in semantic terms or objects, systems that utilize visual features (e.g., color or texture) to search images create a semantic gap which hinders general users from fully expressing their needs. In addition, query-by-example (QBE) retrieval imposes extra obstacles for exploratory search because users may not always have the representative image at hand or in mind when starting a search (i.e. the page zero problem). As a result, the majority of current online image search engines (e.g., Google, Yahoo, and Flickr) still primarily use textual queries to search. The problem with query-based retrieval systems is that they only capture users’ information need in terms of formal queries; the implicit and abstract parts of users’ information needs are inevitably overlooked. Hence, users often struggle to formulate queries that best represent their needs, and some compromises have to be made. Studies of Web search logs suggest that multimedia searches are more difficult than textual Web searches, and Web image searching is the most difficult compared to video or audio searches. Hence, online users need to put in more effort when searching multimedia contents, especially for image searches. Most interactions in Web image searching occur during query reformulation. While log analysis provides intriguing views on how the majority of users search, their search needs or motivations are ultimately neglected.
User studies on image searching have attempted to understand users’ search contexts in terms of users’ background (e.g., knowledge, profession, motivation for search and task types) and the search outcomes (e.g., use of retrieved images, search performance). However, these studies typically focused on particular domains with a selective group of professional users. General users’ Web image searching contexts and behaviors are poorly understood, although they represent the majority of online image searching activities nowadays. We argue that only by understanding Web image users’ contexts can the current Web search engines further improve their usefulness and provide more efficient searches. In order to understand users’ search contexts, a user study was conducted based on university students’ Web image searching in News, Travel, and commercial Product domains. The three search domains were deliberately chosen to reflect image users’ interests in people, time, event, location, and objects. We investigated participants’ Web image searching behavior, with the focus on query reformulation and search strategies. Participants’ search contexts such as their search background, motivation for search, and search outcomes were gathered by questionnaires. The searching activity was recorded with participants’ think aloud data for analyzing significant search patterns. The relationships between participants’ search contexts and corresponding search strategies were discovered using a Grounded Theory approach. Our key findings include the following aspects:
- Effects of users' interactive intents on query reformulation patterns and search strategies
- Effects of task domain on task specificity and task difficulty, as well as on some specific searching behaviors
- Effects of searching experience on result expansion strategies
A contextual image searching model was constructed based on these findings.
The model helped us understand Web image searching from the user's perspective, and introduced a context-aware searching paradigm for current retrieval systems. A query recommendation tool was also developed to demonstrate how users’ query reformulation contexts can potentially contribute to more efficient searching.
Abstract:
Process mining encompasses the research area which is concerned with knowledge discovery from event logs. One common process mining task focuses on conformance checking, comparing discovered or designed process models with actual real-life behavior as captured in event logs in order to assess the “goodness” of the process model. This paper introduces a novel conformance checking method to measure how well a process model performs in terms of precision and generalization with respect to the actual executions of a process as recorded in an event log. Our approach differs from related work in the sense that we apply the concept of so-called weighted artificial negative events towards conformance checking, leading to more robust results, especially when dealing with less complete event logs that only contain a subset of all possible process execution behavior. In addition, our technique offers a novel way to estimate a process model’s ability to generalize. Existing literature has focused mainly on the fitness (recall) and precision (appropriateness) of process models, whereas generalization has been much more difficult to estimate. The described algorithms are implemented in a number of ProM plugins, and a Petri net conformance checking tool was developed to inspect process model conformance in a visual manner.
Abstract:
Process-aware information systems (PAISs) can be configured using a reference process model, which is typically obtained via expert interviews. Over time, however, contextual factors and system requirements may cause the operational process to start deviating from this reference model. While a reference model should ideally be updated to remain aligned with such changes, this is a costly and often neglected activity. We present a new process mining technique that automatically improves the reference model on the basis of the observed behavior as recorded in the event logs of a PAIS. We discuss how to balance the four basic quality dimensions for process mining (fitness, precision, simplicity and generalization) and a new dimension, namely the structural similarity between the reference model and the discovered model. We demonstrate the applicability of this technique using a real-life scenario from a Dutch municipality.