620 results for Web Mining
Abstract:
Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem. This paper introduces a novel unsupervised pattern-based method to extract within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method to identify different categories of within-organisation spatial information. The results show that the proposed method achieves a high F-score, indicating that this purely syntactic method, based on web search and an analysis of HTML structure, is well suited to retrieving within-organisation spatial information.
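A minimal sketch of the general idea of pattern-based extraction from HTML structure (not the paper's actual method): repeated markup around staff entries can be turned into an extraction pattern. The HTML snippet and the "Room <code>" convention below are illustrative assumptions.

```python
import re

# Hypothetical staff-listing fragment; the structure, not the words, drives extraction.
html = """
<ul>
  <li><b>Dr. A. Smith</b> - Room S-1042</li>
  <li><b>Prof. B. Jones</b> - Room S-1101</li>
</ul>
"""

# One pattern per repeated <li> structure: a bold name followed by a room code.
pattern = re.compile(r"<li><b>([^<]+)</b>\s*-\s*Room\s+([A-Z]-\d+)</li>")

def extract_locations(page: str) -> list[tuple[str, str]]:
    """Return (person, room) pairs matched by the structural pattern."""
    return pattern.findall(page)

print(extract_locations(html))
```

In practice the pattern itself would be induced from the page rather than hand-written, which is where the unsupervised part of such a method lies.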
Abstract:
Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why an “n ≪ all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regard to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.
Abstract:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be concentrated in certain regions of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation termed loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated by a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions of promising quality and is substantially faster than several benchmark centroid-based semi-supervised document clustering methods.
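A toy sketch of the locus idea, heavily simplified and not the paper's algorithm: each cluster is represented by the combined vector of its top-ranked members rather than the centroid of all members. Cosine similarity stands in for search-engine ranking scores here; the clusters, queries, and documents are invented for illustration.

```python
from collections import Counter
import math

def vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def locus(members: list[str], query: str, k: int = 2) -> Counter:
    """Locus = combined term vector of the k members ranked highest
    against the cluster's query (a stand-in for ranking scores)."""
    ranked = sorted(members, key=lambda d: cosine(vec(d), vec(query)), reverse=True)
    combined = Counter()
    for d in ranked[:k]:
        combined += vec(d)
    return combined

clusters = {
    "sport": ["football match goal", "tennis match serve", "stock price"],
    "finance": ["stock market price", "bank interest rate", "goal keeper"],
}
queries = {"sport": "sport match", "finance": "market finance"}
loci = {c: locus(m, queries[c]) for c, m in clusters.items()}

def assign(doc: str) -> str:
    """Assign a document to the cluster with the most similar locus."""
    return max(loci, key=lambda c: cosine(vec(doc), loci[c]))

print(assign("football goal score"))
```

Because a locus ignores low-ranked members, assignment is cheaper and less sensitive to stray documents than a full centroid would be, which is the intuition the abstract appeals to.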
Abstract:
With the explosion of information resources, there is a pressing need to understand interesting text features or topics in massive amounts of text. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance on two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters-21578.
Abstract:
My thesis examined an alternative approach, referred to as the unitary taxation approach to the allocation of profit, which arises from the notion that, because a multinational group exists as a single economic entity, it should be taxed as one taxable unit. The plausibility of a unitary taxation regime achieving international acceptance and agreement is highly contestable due to implementation issues and questions of economic and political feasibility. Using a case-study approach focusing on Freeport-McMoRan and Rio Tinto's mining operations in Indonesia, this thesis compares both tax regimes against the criteria for a good tax system: equity, efficiency, neutrality and simplicity. This thesis evaluates key issues that arise when implementing a unitary taxation approach with formulary apportionment in the context of mining multinational firms in Indonesia.
Abstract:
Purpose Peer-review programmes in radiation oncology are used to facilitate the process and evaluation of clinical decision-making. However, web-based peer-review methods are still uncommon. This study analysed an inter-centre, web-based peer-review case conference as a method of facilitating the decision-making process in radiation oncology. Methodology A benchmark form was designed based on the American Society for Radiation Oncology targets for radiation oncology peer review. This was used for evaluating the contents of the peer-review case presentations on 40 cases, selected from three participating radiation oncology centres. A scoring system was used for comparison of data, and a survey was conducted to analyse the experiences of radiation oncology professionals who attended the web-based peer-review meetings, in order to identify priorities for improvement. Results The mean scores for the evaluations were 82.7, 84.5, 86.3 and 87.3% for cervical, prostate, breast and head and neck presentations, respectively. The survey showed that radiation oncology professionals were confident about the role of web-based peer review in facilitating sharing of good practice, stimulating professionalism and promoting professional growth. The participants were satisfied with the quality of the audio and visual aspects of the web-based meeting. Conclusion The results of this study suggest that simple inter-centre web-based peer-review case conferences are a feasible technique for peer review in radiation oncology. Limitations such as data security and confidentiality can be overcome by the use of appropriate structure and technology. To drive the issues of quality and safety a step further, small radiotherapy departments may need to consider web-based peer-review case conferences as part of their routine quality assurance practices.
Abstract:
Existing process mining techniques provide summary views of the overall process performance over a period of time, allowing analysts to identify bottlenecks and associated performance issues. However, these tools are not designed to help analysts understand how bottlenecks form and dissolve over time, nor how the formation and dissolution of bottlenecks – and associated fluctuations in demand and capacity – affect the overall process performance. This paper presents an approach to analyze the evolution of process performance via a notion of Staged Process Flow (SPF). An SPF abstracts a business process as a series of queues corresponding to stages. The paper defines a number of stage characteristics and visualizations that collectively allow process performance evolution to be analyzed from multiple perspectives. The approach has been implemented in the ProM process mining framework. The paper demonstrates the advantages of the SPF approach over state-of-the-art process performance mining tools using two publicly available real-life event logs.
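The queue-per-stage abstraction can be sketched in a few lines (an assumed event-log shape, not the ProM implementation): given when each case enters and leaves a stage, the queue length of a stage at an instant is simply the number of cases occupying it.

```python
# Hypothetical event log: (case id, stage, enter time, exit time).
log = [
    ("c1", "triage", 0, 3),
    ("c1", "treat", 3, 9),
    ("c2", "triage", 1, 5),
    ("c3", "triage", 2, 4),
    ("c2", "treat", 5, 8),
]

def queue_lengths(log, t):
    """Number of cases occupying each stage at time t."""
    lengths = {}
    for _case, stage, enter, exit_ in log:
        if enter <= t < exit_:
            lengths[stage] = lengths.get(stage, 0) + 1
    return lengths

print(queue_lengths(log, 2.5))
```

Sampling this function over time yields the evolving per-stage queue profile from which bottleneck formation and dissolution can be read off.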
Abstract:
Background Psychotic-like experiences (PLEs) are subclinical delusional ideas and perceptual disturbances that have been associated with a range of adverse mental health outcomes. This study reports a qualitative and quantitative analysis of the acceptability, usability and short term outcomes of Get Real, a web program for PLEs in young people. Methods Participants were twelve respondents to an online survey, who reported at least one PLE in the previous 3 months, and were currently distressed. Ratings of the program were collected after participants trialled it for a month. Individual semi-structured interviews then elicited qualitative feedback, which was analyzed using Consensual Qualitative Research (CQR) methodology. PLEs and distress were reassessed at 3 months post-baseline. Results User ratings supported the program's acceptability, usability and perceived utility. Significant reductions in the number, frequency and severity of PLE-related distress were found at 3 months follow-up. The CQR analysis identified four qualitative domains: initial and current understandings of PLEs, responses to the program, and context of its use. Initial understanding involved emotional reactions, avoidance or minimization, limited coping skills and non-psychotic attributions. After using the program, participants saw PLEs as normal and common, had greater self-awareness and understanding of stress, and reported increased capacity to cope and accept experiences. Positive responses to the program focused on its normalization of PLEs, usefulness of its strategies, self-monitoring of mood, and information putting PLEs into perspective. Some respondents wanted more specific and individualized information, thought the program would be more useful for other audiences, or doubted its effectiveness. The program was mostly used in low-stress situations. Conclusions The current study provided initial support for the acceptability, utility and positive short-term outcomes of Get Real. The program now requires efficacy testing in randomized controlled trials.
Abstract:
This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely used text readability measures rely on surface-level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has important implications for search engines, because search result personalisation strategies that consider users’ reading ability may fail if incorrect text readability estimations are computed.
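The dependence on surface-level characteristics is easy to illustrate with a minimal sketch (not any specific published measure): a readability proxy built from average sentence length and average word length shifts as soon as the extractor leaves navigation boilerplate in the text.

```python
import re

def readability_features(text: str) -> tuple[float, float]:
    """Surface-level proxy: (avg words per sentence, avg letters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len

# Two extractions of the same hypothetical page: one clean, one keeping menu text.
clean = "The cat sat on the mat. It slept all day."
noisy = clean + " Home About Contact Subscribe Privacy Policy"

print(readability_features(clean))
print(readability_features(noisy))
```

The boilerplate-laden extraction reports longer "sentences" and longer words, so any personalisation built on these estimates would misjudge the page's difficulty.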
Abstract:
We present a novel framework and algorithms for the analysis of Web service interfaces to improve the efficiency of application integration in wide-spanning business networks. Our approach addresses the notorious issue of large and overloaded operational signatures, which are becoming increasingly prevalent on the Internet and being opened up for third-party service aggregation. Extending existing techniques that refactor service interfaces based on derived artefacts of applications, namely business entities, we propose heuristics for deriving relations between business entities and, in turn, the permissible orders in which operations are invoked. As a result, service operations are refactored along business-entity CRUD lines, from which behavioural protocols are generated, supporting fine-grained and flexible service discovery, composition and interaction. A prototypical implementation and analysis of web services, including those of commercial logistic systems (Fedex), are used to validate the algorithms and open up further insights into service interface synthesis.
Abstract:
The growth of APIs and Web services on the Internet, especially through larger enterprise systems increasingly being leveraged for Cloud and software-as-a-service opportunities, poses challenges to improving the efficiency of integration with these services. Interfaces of enterprise systems are typically larger, more complex and overloaded, with single operations having multiple data entities and parameter sets, supporting varying requests, and reflecting versioning across different system releases, compared to the fine-grained operations of contemporary interfaces. We propose a technique to support the refactoring of service interfaces by deriving business entities and their relationships. In this paper, we focus on the behavioural aspects of service interfaces, aiming to discover the sequential dependencies of operations (otherwise known as protocol extraction) based on the entities and relationships derived. Specifically, we propose heuristics according to these relationships and, in turn, derive permissible orders in which operations are invoked. As a result, service operations can be refactored along business entity CRUD lines, with explicit behavioural protocols as part of an interface definition. This supports flexible service discovery, composition and integration. A prototypical implementation and analysis of existing Web services, including those of commercial logistic systems (Fedex), are used to validate the algorithms proposed in the paper.
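One such heuristic can be sketched as follows (an assumed, simplified version, not the authors' algorithm): group operations by the business entity they touch, then order them by the CRUD life-cycle so that a permissible invocation protocol can be read off per entity. The operation names are invented for illustration.

```python
# Life-cycle ordering heuristic: an entity is created before it is read,
# read before updated, updated before deleted.
CRUD_ORDER = {"create": 0, "read": 1, "update": 2, "delete": 3}

operations = [
    "deleteShipment", "createShipment", "trackShipment", "updateShipment",
    "createLabel", "readLabel",
]

def parse(op: str) -> tuple[str, str]:
    """Split an operation name into (entity, CRUD verb)."""
    for verb in ("create", "read", "update", "delete", "track"):
        if op.startswith(verb):
            entity = op[len(verb):]
            # Map domain verbs like "track" onto the nearest CRUD verb.
            return entity, "read" if verb == "track" else verb
    raise ValueError(op)

protocol: dict[str, list] = {}
for op in operations:
    entity, verb = parse(op)
    protocol.setdefault(entity, []).append((CRUD_ORDER[verb], op))
for entity, ops in protocol.items():
    protocol[entity] = [op for _, op in sorted(ops)]

print(protocol)
```

Real interfaces need entity derivation and data-dependency analysis rather than name matching, but the output shape (a per-entity operation ordering) is what a behavioural protocol in an interface definition would capture.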
Abstract:
User-generated information such as product reviews has been booming since the advent of Web 2.0, and rich information about the reviewed products is buried in such big data. To facilitate identifying useful information in product (e.g., camera) reviews, opinion mining has been proposed and widely used in recent years. As the most critical step of opinion mining, feature extraction aims to extract significant product features from review texts. However, most existing approaches only find individual features rather than identifying the hierarchical relationships between product features. In this paper, we propose an approach that finds both features and feature relationships, structured as a feature hierarchy, referred to as a feature taxonomy in the remainder of the paper. Specifically, by making use of frequent patterns and association rules, we construct the feature taxonomy to profile the product at multiple levels instead of a single level, which provides more detailed information about the product. Experiments conducted on real-world review datasets show that our proposed method is capable of identifying product features and relations effectively.
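A toy sketch of the frequent-pattern-plus-association-rule idea (a simplified illustration, not the paper's construction): candidate features are terms with sufficient support across reviews, and a hierarchy edge is proposed when the rule child → parent has high confidence. The reviews and thresholds below are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical camera reviews, reduced to sets of candidate feature terms.
reviews = [
    {"camera", "lens", "battery"},
    {"camera", "lens"},
    {"camera", "battery", "screen"},
    {"camera", "lens", "zoom"},
]

MIN_SUPPORT = 2       # a feature must appear in at least 2 reviews
MIN_CONFIDENCE = 0.8  # conf(child -> parent) threshold for a taxonomy edge

support = Counter(t for r in reviews for t in r)
features = {t for t, c in support.items() if c >= MIN_SUPPORT}

# Support of feature pairs (frequent 2-patterns).
pair_support = Counter()
for r in reviews:
    for a, b in combinations(sorted(r & features), 2):
        pair_support[(a, b)] += 1

edges = set()
for (a, b), c in pair_support.items():
    # Orient each edge from the rarer term (child) to the more common one (parent).
    child, parent = (a, b) if support[a] < support[b] else (b, a)
    if c / support[child] >= MIN_CONFIDENCE:
        edges.add((parent, child))

print(sorted(edges))
```

On this toy data the result places "lens" and "battery" under "camera", i.e. a two-level profile of the product rather than a flat feature list.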
Abstract:
Web data can often be represented in free tree form; however, few free tree mining methods exist. In this paper, a computationally fast algorithm, FreeS, is presented to discover all frequently occurring free subtrees in a database of labelled free trees. FreeS is designed using an optimal canonical form, BOCF, that can uniquely represent free trees even in the presence of isomorphism. To avoid enumerating false positive candidates, it utilises an enumeration approach based on a tree-structure-guided scheme. This paper presents lemmas that introduce conditions governing the generation of free tree candidates during enumeration. An empirical study using both real and synthetic datasets shows that FreeS is scalable and significantly outperforms (i.e., is a few orders of magnitude faster than) the state-of-the-art frequent free tree mining algorithms HybridTreeMiner and FreeTreeMiner.