826 results for Data mining models
Abstract:
Having a reliable understanding of the behaviours, problems, and performance of existing processes is important for enabling a targeted process improvement initiative. Recently, there has been an increase in the application of innovative process mining techniques to facilitate an evidence-based understanding of organizations' business processes. Nevertheless, the application of these techniques in the finance domain in Australia is, at best, scarce. This paper details a 6-month case study on the application of process mining in one of the largest insurance companies in Australia. In particular, it details the challenges encountered, the lessons learned, and the results obtained. Through this case study, we not only validated existing ‘lessons learned’ from other similar case studies, but also gained new insights that can benefit other practitioners applying process mining in their respective fields.
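Process mining usually starts from an event log. In that log, each case (for example, one insurance claim) is recorded as an ordered sequence of activities. The minimal sketch below counts the directly-follows relation, which many process discovery algorithms build on. It assumes a toy log with hypothetical activity names and is an illustration, not the tooling used in the case study.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b
    across all traces (cases) in an event log."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical claim-handling log: each inner list is one case.
log = [
    ["lodge", "assess", "approve", "pay"],
    ["lodge", "assess", "reject"],
    ["lodge", "assess", "request_info", "assess", "approve", "pay"],
]
dfg = directly_follows(log)
```

A pair such as ("request_info", "assess") signals a rework loop. That is the kind of behaviour a targeted improvement initiative would examine.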
Abstract:
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of the state-of-the-art development in these areas. We begin by discussing the meaning of tagging in the different sound areas. Some examples of sound tagging applications are introduced to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area, and the common features of and differences between the three areas are identified as well. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of the countries that have been engaged in sound tagging research over the years.
Abstract:
Automatic call recognition is vital for environmental monitoring. Pattern recognition has been applied to automatic species recognition for years. However, few studies have applied formal syntactic methods to the analysis of species call structure. This paper introduces a novel method that adopts timed and probabilistic automata for automatic species recognition, using acoustic components as the primitives. We demonstrate the method on one Australian bird species, the Eastern Yellow Robin.
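The sketch below shows one way a timed probabilistic automaton could score a call. The call is treated as a sequence of (acoustic component label, duration) pairs. Each transition carries a probability and an allowed duration interval, and a sequence is rejected if any component falls outside its interval. The component labels, states, probabilities, and durations are all hypothetical, and the model is simplified (acceptance by reaching a final state) rather than the paper's actual Eastern Yellow Robin model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    target: str
    prob: float     # probability of taking this transition from its source state
    min_dur: float  # allowed duration of the acoustic component, in seconds
    max_dur: float


# Hypothetical call model, keyed by (current state, component label).
TRANSITIONS = {
    ("start", "whistle"): Transition("s1", 1.0, 0.05, 0.30),
    ("s1", "whistle"): Transition("s1", 0.6, 0.05, 0.30),
    ("s1", "chirp"): Transition("end", 0.4, 0.01, 0.10),
}
ACCEPTING = {"end"}


def sequence_probability(components, start="start"):
    """Probability that the automaton generates the (label, duration) sequence;
    0.0 if a label has no transition, a duration violates its timing
    constraint, or the run does not end in an accepting state."""
    state, prob = start, 1.0
    for label, duration in components:
        t = TRANSITIONS.get((state, label))
        if t is None or not (t.min_dur <= duration <= t.max_dur):
            return 0.0
        state, prob = t.target, prob * t.prob
    return prob if state in ACCEPTING else 0.0
```

For example, `sequence_probability([("whistle", 0.2), ("whistle", 0.1), ("chirp", 0.05)])` evaluates to approximately 0.24 (1.0 × 0.6 × 0.4). The timing constraints let two calls with the same component order but different rhythms be told apart.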
Abstract:
The concept of older adults contributing to society in a meaningful way has been termed ‘active ageing’. Active ageing reflects changes in prevailing theories of social and psychological aspects of ageing, with a focus on individuals' strengths as opposed to their deficits or pathology. To explore predictors of active ageing, the Australian Active Ageing (Triple A) project group undertook a national postal survey of participants over the age of 50 years. Participants were recruited randomly through their 2004 membership of a large Australia-wide seniors' organisation. The survey comprised 178 items covering paid and voluntary work, learning, social, spiritual, emotional, health and home, life events, and demographics. A 45% response rate (2655 returned surveys) reflected the expected balance of gender, age, and geographic representation of participants. The data were analysed using data mining techniques to derive generalizations from individual situations. Data mining identifies valid, novel, potentially useful, and understandable patterns and trends in data. The results of the clustering analysis indicate that physical and emotional health, combined with the desire to learn, were the most significant factors in active ageing. The findings suggest that remaining active in later life is not only directly related to the maintenance of emotional and physical health, but may also be significantly intertwined with the opportunity to engage in ongoing learning activities that are relevant to the individual. The findings also suggest that practitioners and policy makers need to incorporate older people's learning needs into the development of services and policy frameworks.
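The abstract does not name the specific clustering algorithm used. The sketch below therefore uses plain k-means, one common choice, only to illustrate how respondents could be grouped by survey-derived scores. The two-feature points (for example, a health score and an interest-in-learning score) are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points. Centroids are initialised with
    the first k points for reproducibility (k-means++ is the more common choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels
```

Applied to respondents' scores, each resulting cluster would then be profiled, for instance by checking which survey items have the highest mean values within it.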
Abstract:
Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces can accommodate the effects of various image variations and capture the dynamic properties of actions. Subspaces form a non-Euclidean, curved Riemannian manifold known as a Grassmann manifold. Inference on such manifolds is usually achieved by embedding them in higher-dimensional Euclidean spaces. In this paper, we instead propose to embed Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To obtain an efficient method, we propose graph-based local discriminant analysis. It uses within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on the KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy compared with several state-of-the-art methods. These include the kernel version of the affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words, and a hierarchy of discriminative space-time neighbourhood features.
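Kernel methods on the Grassmann manifold need a positive definite kernel between subspaces. A commonly used choice is the projection kernel, k(X, Y) = ||X^T Y||_F^2, for orthonormal bases X and Y; we do not claim it is the exact kernel used in the paper. The minimal sketch below computes this kernel in plain Python. It then uses the kernel to build within-class and between-class similarity graphs of the kind described above. The rule for building the graphs (link each subspace to its k most similar neighbours) is an assumption, and the discriminant projection itself is omitted.

```python
def projection_kernel(X, Y):
    """Projection kernel ||X^T Y||_F^2 between two subspaces, each given as a
    list of orthonormal basis vectors (the columns of its basis matrix)."""
    return sum(sum(a * b for a, b in zip(x, y)) ** 2 for x in X for y in Y)


def similarity_graphs(subspaces, labels, k=1):
    """Within-class graph: link each subspace to its k most similar same-class
    subspaces. Between-class graph: link it to its k most similar subspaces
    from other classes. Edges are returned as sets of (i, j) pairs with i < j."""
    n = len(subspaces)
    within, between = set(), set()
    for i in range(n):
        sims = sorted(
            ((projection_kernel(subspaces[i], subspaces[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        same = [j for _, j in sims if labels[j] == labels[i]][:k]
        diff = [j for _, j in sims if labels[j] != labels[i]][:k]
        within.update((min(i, j), max(i, j)) for j in same)
        between.update((min(i, j), max(i, j)) for j in diff)
    return within, between
```

For a p-dimensional subspace, the kernel with itself equals p. Two orthogonal subspaces give 0. The kernel is therefore a natural similarity for weighting graph edges.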
Abstract:
Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into the final pixel-level foreground segmentation. Unlike many pixel-based methods, the approach does not require ad-hoc postprocessing of foreground masks. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains, on average, better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose using tracking performance as an unbiased measure of the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
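The step that integrates overlapping block decisions into a pixel-level mask can be illustrated with a simple voting scheme. In this scheme, each pixel's foreground probability is the fraction of the blocks covering it that the cascade labelled as foreground. The minimal sketch below takes the block decisions as given and does not implement the texture descriptor or classifier cascade. The fusion rule and the 0.5 threshold are assumptions and may differ from the paper's exact probabilistic formulation.

```python
def foreground_mask(height, width, block, step, block_is_fg, threshold=0.5):
    """Fuse overlapping block-level decisions into a pixel-level mask.

    block_is_fg maps the (row, col) of each block's top-left corner to True
    (foreground) or False (background). Returns the per-pixel foreground
    probability and a boolean mask where that probability exceeds threshold."""
    votes = [[0] * width for _ in range(height)]
    cover = [[0] * width for _ in range(height)]
    for r in range(0, height - block + 1, step):
        for c in range(0, width - block + 1, step):
            fg = int(block_is_fg.get((r, c), False))
            for y in range(r, r + block):
                for x in range(c, c + block):
                    cover[y][x] += 1
                    votes[y][x] += fg
    prob = [
        [votes[y][x] / cover[y][x] if cover[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]
    mask = [[p > threshold for p in row] for row in prob]
    return prob, mask
```

Because every pixel is covered by several blocks when step < block, an isolated wrong block decision is outvoted by its neighbours. This is one reason a block-based approach can give smooth contours without ad-hoc postprocessing.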
Abstract:
Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. Clustering methods from data mining can be used to group users in social networks. However, existing general-purpose clustering algorithms perform poorly on social network data due to the special nature of users' data in social networks. One main reason is the constraints that need to be considered when grouping users in social networks. Another reason is the need to capture a large amount of information about users, which adds computational complexity to an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm based on a global similarity measure that takes into consideration users' constraints and their importance in social networks. Each constraint's importance is calculated from how often that constraint occurs in the dataset. The performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm increases the accuracy of matching users in social networks by 10% compared with other algorithms.
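The sketch below shows one possible reading of the two ideas in the abstract. First, it weights each constraint by its occurrence frequency in the dataset. Second, it compares two users with a weighted share of agreeing constraints. The attribute names, the normalisation, and the exact similarity formula are hypothetical illustrations rather than the paper's definitions. The clustering step that would consume these similarities is not shown.

```python
from collections import Counter


def constraint_weights(users):
    """Weight each constraint (profile attribute) by how often it occurs in the
    dataset, normalised so that all weights sum to 1."""
    counts = Counter(attr for profile in users for attr in profile)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}


def global_similarity(u, v, weights):
    """Weighted share of constraints on which two users agree, taken over the
    constraints that at least one of them specifies."""
    attrs = set(u) | set(v)
    denom = sum(weights.get(a, 0.0) for a in attrs)
    if denom == 0:
        return 0.0
    agree = sum(weights.get(a, 0.0) for a in attrs if u.get(a) == v.get(a))
    return agree / denom


# Hypothetical dating-site profiles.
users = [
    {"age": "30s", "city": "BNE", "smoker": "no"},
    {"age": "30s", "city": "SYD"},
    {"age": "20s", "city": "BNE", "smoker": "no"},
]
weights = constraint_weights(users)
```

Here "age" and "city" (each specified by 3 of the 3 users) weigh more than "smoker" (specified by 2 users). Agreement on a widely used constraint therefore moves the similarity more.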
Abstract:
With the explosion of Web 2.0 applications such as blogs, social and professional networks, and various other types of social media, rich online information and many new sources of knowledge flood users. This poses a great challenge in terms of information overload. It is critical to use intelligent agent software systems to help users find the right information in an abundance of Web data. Recommender systems can help users deal with the information overload problem efficiently by suggesting items (e.g., information and products) that match users’ personal interests. Recommender technology has been successfully employed in many applications, such as recommending films, music, and books. The purpose of this report is to give an overview of existing technologies for building personalized recommender systems in social networking environments. It also proposes a research direction for addressing user profiling and cold start problems by exploiting the user-generated content newly available in Web 2.0.
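User-based collaborative filtering is one of the standard techniques such an overview covers. The sketch below uses hypothetical users and ratings. It scores items a user has not rated by the cosine-similarity-weighted ratings of other users. It also exposes the cold start problem the report targets: a user with no ratings has zero similarity to everyone, so every candidate item scores zero and the ranking carries no personal signal.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts item -> rating)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(target, ratings, top_n=3):
    """Rank items the target user has not rated by the similarity-weighted
    ratings of all other users and return the top_n item ids."""
    scores = {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        for item, r in their.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"film_a": 5, "film_b": 4},
    "bob": {"film_a": 5, "film_b": 4, "film_c": 5},
    "cat": {"film_a": 1, "film_d": 5},
}
```

Here `recommend("ann", ratings)` ranks "film_c" (liked by the very similar "bob") above "film_d". Profiles built from user-generated content such as tags are one way to give a new user a non-zero similarity.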
Abstract:
Social tags in Web 2.0 are becoming another important information source for profiling users' interests and preferences to make personalized recommendations. The free-style vocabulary of tags and the long-tail distributions of tags and items cause a problem of low information sharing. To solve it, this paper proposes an approach that makes personalized recommendations by integrating the social tags given by users with an item taxonomy whose standard vocabulary and hierarchical structure are provided by experts. The experimental results show that the proposed approach can effectively improve information sharing and recommendation accuracy.
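One simple way to combine free-style tags with an expert taxonomy is to credit every category on an item's path to the taxonomy root in the user's profile. Items that share no tags can then still match through shared categories. The minimal sketch below assumes hypothetical items, tags, and categories, and uses a plain additive score. It illustrates the idea, not the paper's method.

```python
from collections import Counter

# Hypothetical expert taxonomy: child category -> parent (None for the root).
PARENT = {"books": None, "fiction": "books", "sci-fi": "fiction", "crime": "fiction"}
ITEM_CATEGORY = {"dune": "sci-fi", "foundation": "sci-fi", "gone_girl": "crime", "new_scifi": "sci-fi"}
ITEM_TAGS = {"dune": ["space", "classic"], "foundation": ["space"], "gone_girl": ["thriller"], "new_scifi": []}


def ancestors(category):
    """Return the category followed by all of its ancestors up to the root."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def features(item):
    """Tag features plus taxonomy-path features for one item."""
    return [("tag", t) for t in ITEM_TAGS.get(item, [])] + [("cat", c) for c in ancestors(ITEM_CATEGORY[item])]


def build_profile(user_items):
    """Count the tags and taxonomy categories of the items a user has tagged."""
    profile = Counter()
    for item in user_items:
        profile.update(features(item))
    return profile


def score(profile, item):
    """Sum the profile weights of a candidate item's tags and categories."""
    return sum(profile[f] for f in features(item))
```

For a user who tagged "dune" and "foundation", the untagged long-tail item "new_scifi" still scores 6 through the shared "sci-fi", "fiction", and "books" categories. This is the information-sharing effect the taxonomy is meant to provide.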
Abstract:
Online social networks can be modelled as graphs; in this paper, we analyze the use of graph metrics for identifying users with anomalous relationships to other users. A framework is proposed for analyzing how effective various graph-theoretic properties are at detecting anomalous users. These properties include the number of neighbouring nodes and edges, betweenness centrality, and community cohesiveness. Experimental results on real-world data collected from online social networks show that the majority of users typically have friends who are also friends with one another, whereas the graphs of anomalous users typically do not follow this common rule. Empirical analysis also shows that the relationship between average betweenness centrality and edges identifies anomalies more accurately than other approaches.
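The "friends who are also friends with one another" observation corresponds to a node's local clustering coefficient. This is the share of pairs of neighbours that are themselves connected. The minimal sketch below computes this coefficient together with the neighbour and edge counts of a node's ego network, using hypothetical user names. Betweenness centrality is not computed here; libraries such as NetworkX provide it as `betweenness_centrality`.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) friendship pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def egonet_features(adj, node):
    """Neighbour count, number of edges in the ego network (spokes from the
    node plus links among its neighbours), and local clustering coefficient."""
    nbrs = adj[node]
    k = len(nbrs)
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    clustering = links / (k * (k - 1) / 2) if k > 1 else 0.0
    return {"neighbours": k, "edges": k + links, "clustering": clustering}


# Hypothetical network: "ann" has interlinked friends; "spam" does not.
adj = build_adjacency([
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("cat", "dan"),
    ("spam", "x"), ("spam", "y"), ("spam", "z"),
])
```

Both users have three neighbours, but "ann" has a clustering coefficient of 2/3 while "spam" has 0. This is the pattern the abstract reports for anomalous users.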
Abstract:
Reasoning with uncertain knowledge and belief has long been recognized as an important research issue in Artificial Intelligence (AI). Several methodologies have been proposed in the past, including knowledge-based systems, fuzzy sets, and probability theory. The probabilistic approach became popular mainly due to a knowledge representation framework called Bayesian networks. Bayesian networks have earned a reputation as powerful tools for modeling complex problems involving uncertain knowledge. Uncertain knowledge exists in domains such as medicine, law, geographical information systems, and design, because it is difficult to retrieve all knowledge and experience from experts. In the design domain, experts believe that design style is an intangible concept and that knowledge about it is difficult to represent formally. The aim of this research is to find ways to represent design style knowledge in Bayesian networks. We show that these networks can be used for diagnosis (inference) and classification of design style. Furniture design style is used as the example domain; however, the method can be applied to any other domain.
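Classifying a style from observed design features amounts to computing the posterior P(style | evidence) in the network. The minimal sketch below uses a hypothetical star-shaped network in which Style is the only parent of two observed features. All styles, features, and probabilities are invented for illustration, and the abstract does not give the paper's actual network structure. Inference is done by exact enumeration. Unobserved leaf features sum to 1 and can therefore simply be left out of the product.

```python
# Hypothetical network: Style -> leg shape, Style -> ornamentation level.
PRIOR = {"Baroque": 0.3, "Shaker": 0.3, "Modern": 0.4}
CPT = {
    "legs": {  # P(leg shape | style)
        "Baroque": {"cabriole": 0.7, "tapered": 0.2, "straight": 0.1},
        "Shaker": {"cabriole": 0.05, "tapered": 0.75, "straight": 0.2},
        "Modern": {"cabriole": 0.05, "tapered": 0.25, "straight": 0.7},
    },
    "ornament": {  # P(ornamentation level | style)
        "Baroque": {"high": 0.8, "low": 0.2},
        "Shaker": {"high": 0.05, "low": 0.95},
        "Modern": {"high": 0.1, "low": 0.9},
    },
}


def posterior(evidence):
    """P(style | evidence) by enumeration: multiply the prior by the likelihood
    of each observed feature, then normalise over all styles."""
    joint = {}
    for style, p in PRIOR.items():
        for feature, value in evidence.items():
            p *= CPT[feature][style][value]
        joint[style] = p
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}
```

With cabriole legs and high ornamentation, "Baroque" is by far the most probable style. The same tables, read in the other direction as P(feature | style), support the diagnostic use mentioned above.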
Abstract:
Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm that reduces dimensionality based on the study of ontological structure is also introduced. The proposed approach was evaluated against typical machine learning methods, including SVMs, Rocchio, and kNN, with promising results.
Abstract:
As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. To enhance customer satisfaction and shopping experiences, it has become important to analyze customer reviews and extract opinions on the products that customers buy. Thus, opinion mining is becoming more important than before, especially for analyzing and forecasting customers’ behavior for business purposes. Making the right decisions about new products or services, based on data about customers’ characteristics, means profit for an organization or company. This paper proposes a new architecture for opinion mining that uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step to achieve this objective is to transform comments (opinions) into a fact table that includes several dimensions, such as customers, products, time, and locations. This research presents a comprehensive way to calculate customers’ opinion orientation for all possible product attributes.
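Once comments are turned into fact-table rows, orientation per product attribute becomes an ordinary roll-up. That roll-up can be sliced on any dimension. The minimal sketch below assumes that each opinion has already been extracted and labelled +1 or -1 by some upstream step, which is not shown here. The field names and sample rows are hypothetical, and the sketch is not the paper's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionFact:
    """One fact-table row: dimension keys plus an orientation measure."""
    customer: str
    product: str
    attribute: str    # product attribute the opinion is about
    time: str         # e.g. "2013-Q1"
    location: str
    orientation: int  # +1 positive, -1 negative (assumed from an upstream step)


def orientation_by_attribute(facts, **filters):
    """Net orientation per (product, attribute), optionally sliced on any
    dimension, e.g. location="QLD" or time="2013-Q2"."""
    totals = defaultdict(int)
    for f in facts:
        if all(getattr(f, dim) == val for dim, val in filters.items()):
            totals[(f.product, f.attribute)] += f.orientation
    return dict(totals)


facts = [
    OpinionFact("c1", "phone", "battery", "2013-Q1", "QLD", 1),
    OpinionFact("c2", "phone", "battery", "2013-Q1", "NSW", -1),
    OpinionFact("c3", "phone", "screen", "2013-Q2", "QLD", 1),
    OpinionFact("c1", "phone", "battery", "2013-Q2", "QLD", 1),
]
```

Keeping customers, time, and location as dimensions lets the same facts answer questions such as "how do Queensland customers feel about battery life?" without re-processing the comments.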
Abstract:
News blog hot topics are important for information recommendation services and marketing. However, information overload and personalized management make organizing information more difficult. Moreover, the factors that influence the formation and development of blog hot topics have received little attention. To detect news blog hot topics correctly, the paper first analyzes the development of topics from a new perspective based on the W2T (Wisdom Web of Things) methodology. Namely, the characteristics of blog users, the context of topic propagation, and information granularity are unified to analyze the related problems. Factors such as user behavior patterns, network opinion, and opinion leaders are subsequently identified as important for the development of topics. Then a topic model based on the view of event reports is constructed. Finally, hot topics are identified by their duration, topic novelty, degree of topic growth, and degree of user attention. The experimental results show that the proposed method is feasible and effective.
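The final identification step combines four indicators: duration, novelty, growth, and user attention. The minimal sketch below combines them as a weighted sum after min-max normalisation across topics. The normalisation, the equal default weights, and the sample values are all assumptions, since the abstract does not say how the indicators are combined.

```python
def hotness(topics, weights=None):
    """Rank topics by a weighted sum of min-max normalised indicators.

    topics maps a topic id to a dict with the keys "duration", "novelty",
    "growth", and "attention". The equal default weights are an assumption."""
    keys = ["duration", "novelty", "growth", "attention"]
    weights = weights or {k: 0.25 for k in keys}
    lo = {k: min(t[k] for t in topics.values()) for k in keys}
    hi = {k: max(t[k] for t in topics.values()) for k in keys}

    def norm(k, v):
        # Map each indicator to [0, 1]; a constant indicator contributes 0.
        return (v - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0

    scores = {tid: sum(weights[k] * norm(k, t[k]) for k in keys) for tid, t in topics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical indicators: duration in days, novelty in [0, 1],
# growth as the ratio of recent to earlier posts, attention as comment count.
topics = {
    "a": {"duration": 10, "novelty": 0.9, "growth": 3.0, "attention": 500},
    "b": {"duration": 2, "novelty": 0.1, "growth": 0.5, "attention": 20},
    "c": {"duration": 6, "novelty": 0.5, "growth": 1.75, "attention": 260},
}
```

Min-max normalisation puts indicators measured in days, ratios, and counts on a common scale before weighting. Without it, the indicator with the largest raw values (here, attention) would dominate the score.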