967 resultados para 080109 Pattern Recognition and Data Mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Topic recommendation can help users deal with the information overload issue in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics and micro-blogs, and the temporal information of micro-blogs to find semantically and temporally relevant topics of each topic, and to profile users' time-drifting topic interests. The Content based, Nearest Neighborhood based and Matrix Factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in the experiments conducted on a real world dataset that collected from Twitter.com.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosion of Web 2.0 application such as blogs, social and professional networks, and various other types of social media, the rich online information and various new sources of knowledge flood users and hence pose a great challenge in terms of information overload. It is critical to use intelligent agent software systems to assist users in finding the right information from an abundance of Web data. Recommender systems can help users deal with information overload problem efficiently by suggesting items (e.g., information and products) that match users’ personal interests. The recommender technology has been successfully employed in many applications such as recommending films, music, books, etc. The purpose of this report is to give an overview of existing technologies for building personalized recommender systems in social networking environment, to propose a research direction for addressing user profiling and cold start problems by exploiting user-generated content newly available in Web 2.0.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Online social networks can be modelled as graphs; in this paper, we analyze the use of graph metrics for identifying users with anomalous relationships to other users. A framework is proposed for analyzing the effectiveness of various graph theoretic properties such as the number of neighbouring nodes and edges, betweenness centrality, and community cohesiveness in detecting anomalous users. Experimental results on real-world data collected from online social networks show that the majority of users typically have friends who are friends themselves, whereas anomalous users’ graphs typically do not follow this common rule. Empirical analysis also shows that the relationship between average betweenness centrality and edges identifies anomalies more accurately than other approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Finding and labelling semantic features patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult; the rapidly increasing volume of online documents makes a bottleneck in finding meaningful textual patterns. Aiming to deal with these issues, we propose an unsupervised documnent labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was promisingly evaluated by compared with typical machine learning methods including SVMs, Rocchio, and kNN.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. In order to enhance customer satisfaction and their shopping experiences, it has become important to analysis customers reviews to extract opinions on the products that they buy. Thus, Opinion Mining is getting more important than before especially in doing analysis and forecasting about customers’ behavior for businesses purpose. The right decision in producing new products or services based on data about customers’ characteristics means profit for organization/company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step to achieve this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as, customers, products, time and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible products’ attributes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to comprehend user information needs by concepts, this paper introduces a novel method to match relevance features with ontological concepts. The method first discovers relevance features from user local instances. Then, a concept matching approach is developed for matching these features to accurate concepts in a global knowledge base. This approach is significant for the transition of informative descriptor and conceptional descriptor. The proposed method is elaborately evaluated by comparing against three information gathering baseline models. The experimental results shows the matching approach is successful and achieves a series of remarkable improvements on search effectiveness.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

News blog hot topics are important for the information recommendation service and marketing. However, information overload and personalized management make the information arrangement more difficult. Moreover, what influences the formation and development of blog hot topics is seldom paid attention to. In order to correctly detect news blog hot topics, the paper first analyzes the development of topics in a new perspective based on W2T (Wisdom Web of Things) methodology. Namely, the characteristics of blog users, context of topic propagation and information granularity are unified to analyze the related problems. Some factors such as the user behavior pattern, network opinion and opinion leader are subsequently identified to be important for the development of topics. Then the topic model based on the view of event reports is constructed. At last, hot topics are identified by the duration, topic novelty, degree of topic growth and degree of user attention. The experimental results show that the proposed method is feasible and effective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Iris based identity verification is highly reliable but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tags or personal metadata for annotating web resources have been widely adopted in Web 2.0 sites. However, as tags are freely chosen by users, the vocabularies are diverse, ambiguous and sometimes only meaningful to individuals. Tag recommenders may assist users during tagging process. Its objective is to suggest relevant tags to use as well as to help consolidating vocabulary in the systems. In this paper we discuss our approach for providing personalized tag recommendation by making use of existing domain ontology generated from folksonomy. Specifically we evaluated the approach in sparse situation. The evaluation shows that the proposed ontology-based method has improved the accuracy of tag recommendation in this situation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tag recommendation is a specific recommendation task for recommending metadata (tag) for a web resource (item) during user annotation process. In this context, sparsity problem refers to situation where tags need to be produced for items with few annotations or for user who tags few items. Most of the state of the art approaches in tag recommendation are rarely evaluated or perform poorly under this situation. This paper presents a combined method for mitigating sparsity problem in tag recommendation by mainly expanding and ranking candidate tags based on similar items’ tags and existing tag ontology. We evaluated the approach on two public social bookmarking datasets. The experiment results show better accuracy for recommendation in sparsity situation over several state of the art methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed meth-od outperforms state-of-the-art text clustering approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Crashes on motorway contribute to a significant proportion (40-50%) of non-recurrent motorway congestions. Hence reduce crashes will help address congestion issues (Meyer, 2008). Crash likelihood estimation studies commonly focus on traffic conditions in a Short time window around the time of crash while longer-term pre-crash traffic flow trends are neglected. In this paper we will show, through data mining techniques, that a relationship between pre-crash traffic flow patterns and crash occurrence on motorways exists, and that this knowledge has the potential to improve the accuracy of existing models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with traffic flow data of one hour prior to the crash using an incident detection algorithm. Traffic flow trends (traffic speed/occupancy time series) revealed that crashes could be clustered with regards of the dominant traffic flow pattern prior to the crash. Using the k-means clustering method allowed the crashes to be clustered based on their flow trends rather than their distance. Four major trends have been found in the clustering results. Based on these findings, crash likelihood estimation algorithms can be fine-tuned based on the monitored traffic flow conditions with a sliding window of 60 minutes to increase accuracy of the results and minimize false alarms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper elaborates the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) Tasks of the 2013 MediaEval Benchmark. We extended the constrained clustering algorithm to apply to the first semi-supervised clustering task, and we compared several classifiers with Latent Dirichlet Allocation as feature selector in the second event classification task. The proposed approach focuses on scalability and efficient memory allocation when applied to a high dimensional data with large clusters. Results of the first task show the effectiveness of the proposed method. Results from task 2 indicate that attention on the imbalance categories distributions is needed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the a mission should be aborted due to mechanical or other failure. On-board cameras provide information that can be used in the determination of potential landing sites, which are continually updated and ranked to prevent injury and minimize damage. Pulse Coupled Neural Networks have been used for the detection of features in images that assist in the classification of vegetation and can be used to minimize damage to the aerial vehicle. However, a significant drawback in the use of PCNNs is that they are computationally expensive and have been more suited to off-line applications on conventional computing architectures. As heterogeneous computing architectures are becoming more common, an OpenCL implementation of a PCNN feature generator is presented and its performance is compared across OpenCL kernels designed for CPU, GPU and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images obtained during unmanned aerial vehicle trials to determine the plausibility for real-time feature detection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Association rule mining is one technique that is widely used when querying databases, especially those that are transactional, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focusing on the quality of rules from single level datasets with many interestingness measures proposed. However, with multi-level datasets now being common there is a lack of interestingness measures developed for multi-level and cross-level rules. Single level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach, which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this chapter we propose two approaches which measure multi-level association rules to help evaluate their interestingness by considering the database’s underlying taxonomy. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.