954 resultados para educational data mining


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Online social networks can be modelled as graphs; in this paper, we analyze the use of graph metrics for identifying users with anomalous relationships to other users. A framework is proposed for analyzing the effectiveness of various graph theoretic properties such as the number of neighbouring nodes and edges, betweenness centrality, and community cohesiveness in detecting anomalous users. Experimental results on real-world data collected from online social networks show that the majority of users typically have friends who are friends themselves, whereas anomalous users’ graphs typically do not follow this common rule. Empirical analysis also shows that the relationship between average betweenness centrality and edges identifies anomalies more accurately than other approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Reasoning with uncertain knowledge and belief has long been recognized as an important research issue in Artificial Intelligence (AI). Several methodologies have been proposed in the past, including knowledge-based systems, fuzzy sets, and probability theory. The probabilistic approach became popular mainly due to a knowledge representation framework called Bayesian networks. Bayesian networks have earned reputation of being powerful tools for modeling complex problem involving uncertain knowledge. Uncertain knowledge exists in domains such as medicine, law, geographical information systems and design as it is difficult to retrieve all knowledge and experience from experts. In design domain, experts believe that design style is an intangible concept and that its knowledge is difficult to be presented in a formal way. The aim of the research is to find ways to represent design style knowledge in Bayesian net works. We showed that these networks can be used for diagnosis (inferences) and classification of design style. The furniture design style is selected as an example domain, however the method can be used for any other domain.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Finding and labelling semantic features patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult; the rapidly increasing volume of online documents makes a bottleneck in finding meaningful textual patterns. Aiming to deal with these issues, we propose an unsupervised documnent labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was promisingly evaluated by compared with typical machine learning methods including SVMs, Rocchio, and kNN.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. In order to enhance customer satisfaction and their shopping experiences, it has become important to analysis customers reviews to extract opinions on the products that they buy. Thus, Opinion Mining is getting more important than before especially in doing analysis and forecasting about customers’ behavior for businesses purpose. The right decision in producing new products or services based on data about customers’ characteristics means profit for organization/company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step to achieve this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as, customers, products, time and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible products’ attributes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In order to comprehend user information needs by concepts, this paper introduces a novel method to match relevance features with ontological concepts. The method first discovers relevance features from user local instances. Then, a concept matching approach is developed for matching these features to accurate concepts in a global knowledge base. This approach is significant for the transition of informative descriptor and conceptional descriptor. The proposed method is elaborately evaluated by comparing against three information gathering baseline models. The experimental results shows the matching approach is successful and achieves a series of remarkable improvements on search effectiveness.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

News blog hot topics are important for the information recommendation service and marketing. However, information overload and personalized management make the information arrangement more difficult. Moreover, what influences the formation and development of blog hot topics is seldom paid attention to. In order to correctly detect news blog hot topics, the paper first analyzes the development of topics in a new perspective based on W2T (Wisdom Web of Things) methodology. Namely, the characteristics of blog users, context of topic propagation and information granularity are unified to analyze the related problems. Some factors such as the user behavior pattern, network opinion and opinion leader are subsequently identified to be important for the development of topics. Then the topic model based on the view of event reports is constructed. At last, hot topics are identified by the duration, topic novelty, degree of topic growth and degree of user attention. The experimental results show that the proposed method is feasible and effective.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

With the explosive growth of resources available through the Internet, information mismatching and overload have become a severe concern to users. Web users are commonly overwhelmed by huge volume of information and are faced with the challenge of finding the most relevant and reliable information in a timely manner. Personalised information gathering and recommender systems represent state-of-the-art tools for efficient selection of the most relevant and reliable information resources, and the interest in such systems has increased dramatically over the last few years. However, web personalization has not yet been well-exploited; difficulties arise while selecting resources through recommender systems from a technological and social perspective. Aiming to promote high quality research in order to overcome these challenges, this paper provides a comprehensive survey on the recent work and achievements in the areas of personalised web information gathering and recommender systems. The report covers concept-based techniques exploited in personalised information gathering and recommender systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Topic modeling has been widely utilized in the fields of information retrieval, text mining, text classification etc. Most existing statistical topic modeling methods such as LDA and pLSA generate a term based representation to represent a topic by selecting single words from multinomial word distribution over this topic. There are two main shortcomings: firstly, popular or common words occur very often across different topics that bring ambiguity to understand topics; secondly, single words lack coherent semantic meaning to accurately represent topics. In order to overcome these problems, in this paper, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantic rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes the content and delivery of a software internationalisation subject (ITN677) that was developed for Master of Information Technology (MIT) students in the Faculty of Information Technology at Queensland University of Technology. This elective subject introduces students to the strategies, technologies, techniques and current development associated with this growing 'software development for the world' specialty area. Students learn what is involved in planning and managing a software internationalisation project as well as designing, building and using a software internationalisation application. Students also learn about how a software internationalisation project must fit into an over-all product localisation and globalisation that may include culturalisation, tailored system architectures, and reliance upon industry standards. In addition, students are exposed to the different software development techniques used by organizations in this arena and the perils and pitfalls of managing software internationalisation projects.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Iris based identity verification is highly reliable but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The work described in this technical report is part of an ongoing project to build practical tools for the manipulation, analysis and visualisation of recordings of the natural environment. This report describes the methods we use to remove background noise from spectrograms. It updates techniques previously described in Towsey and Planitz (2011), Technical report: acoustic analysis of the natural environment, downloadable from: http://eprints.qut.edu.au/41131/. It also describes noise removal from wave-forms, a technique not described in the above 2011 technical report.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In a classification problem typically we face two challenging issues, the diverse characteristic of negative documents and sometimes a lot of negative documents that are closed to positive documents. Therefore, it is hard for a single classifier to clearly classify incoming documents into classes. This paper proposes a novel gradual problem solving to create a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall based classifier, for this stage. The second stage is a precision-oriented “fine tuning”, concentrates on minimizing the number of false positive documents by applying pattern (a statistical phrase) mining techniques. In this stage a pattern-based scoring is followed by threshold setting (thresholding). Experiment shows that our statistical phrase based two-stage classifier is promising.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery. In this article, we shed light on some interesting phenomena of the Web: the deep Web, which surfaces database records as Web pages; the Semantic Web, which de�nes meaningful data exchange formats; XML, which has established itself as a lingua franca for Web data exchange; and domain-speci�c markup languages, which are designed based on XML syntax with the goal of preserving semantics in targeted domains. We detail these four developments in Web technology, and explain how they can be used for data mining. Our goal is to show that all these areas can be as useful for knowledge discovery as the HTML-based part of the Web.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Tags or personal metadata for annotating web resources have been widely adopted in Web 2.0 sites. However, as tags are freely chosen by users, the vocabularies are diverse, ambiguous and sometimes only meaningful to individuals. Tag recommenders may assist users during tagging process. Its objective is to suggest relevant tags to use as well as to help consolidating vocabulary in the systems. In this paper we discuss our approach for providing personalized tag recommendation by making use of existing domain ontology generated from folksonomy. Specifically we evaluated the approach in sparse situation. The evaluation shows that the proposed ontology-based method has improved the accuracy of tag recommendation in this situation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Tag recommendation is a specific recommendation task for recommending metadata (tag) for a web resource (item) during user annotation process. In this context, sparsity problem refers to situation where tags need to be produced for items with few annotations or for user who tags few items. Most of the state of the art approaches in tag recommendation are rarely evaluated or perform poorly under this situation. This paper presents a combined method for mitigating sparsity problem in tag recommendation by mainly expanding and ranking candidate tags based on similar items’ tags and existing tag ontology. We evaluated the approach on two public social bookmarking datasets. The experiment results show better accuracy for recommendation in sparsity situation over several state of the art methods.