783 resultados para Information Mining


Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents an automated image‐based safety assessment method for earthmoving and surface mining activities. The literature review revealed the possible causes of accidents on earthmoving operations, investigated the spatial risk factors of these types of accident, and identified spatial data needs for automated safety assessment based on current safety regulations. Image‐based data collection devices and algorithms for safety assessment were then evaluated. Analysis methods and rules for monitoring safety violations were also discussed. The experimental results showed that the safety assessment method collected spatial data using stereo vision cameras, applied object identification and tracking algorithms, and finally utilized identified and tracked object information for safety decision making.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The growing importance and need of data processing for information extraction is vital for Web databases. Due to the sheer size and volume of databases, retrieval of relevant information as needed by users has become a cumbersome process. Information seekers are faced by information overloading - too many result sets are returned for their queries. Moreover, too few or no results are returned if a specific query is asked. This paper proposes a ranking algorithm that gives higher preference to a user’s current search and also utilizes profile information in order to obtain the relevant results for a user’s query.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new relationship type of social networks - online dating - are gaining popularity. With a large member base, users of a dating network are overloaded with choices about their ideal partners. Recommendation methods can be utilized to overcome this problem. However, traditional recommendation methods do not work effectively for online dating networks where the dataset is sparse and large, and a two-way matching is required. This paper applies social networking concepts to solve the problem of developing a recommendation method for online dating networks. We propose a method by using clustering, SimRank and adapted SimRank algorithms to recommend matching candidates. Empirical results show that the proposed method can achieve nearly double the performance of the traditional collaborative filtering and common neighbor methods of recommendation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Discovering proper search intents is a vi- tal process to return desired results. It is constantly a hot research topic regarding information retrieval in recent years. Existing methods are mainly limited by utilizing context-based mining, query expansion, and user profiling techniques, which are still suffering from the issue of ambiguity in search queries. In this pa- per, we introduce a novel ontology-based approach in terms of a world knowledge base in order to construct personalized ontologies for identifying adequate con- cept levels for matching user search intents. An iter- ative mining algorithm is designed for evaluating po- tential intents level by level until meeting the best re- sult. The propose-to-attempt approach is evaluated in a large volume RCV1 data set, and experimental results indicate a distinct improvement on top precision after compared with baseline models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Collaborative question answering (cQA) portals such as Yahoo! Answers allow users as askers or answer authors to communicate, and exchange information through the asking and answering of questions in the network. In their current set-up, answers to a question are arranged in chronological order. For effective information retrieval, it will be advantageous to have the users’ answers ranked according to their quality. This paper proposes a novel approach of evaluating and ranking the users’answers and recommending the top-n quality answers to information seekers. The proposed approach is based on a user-reputation method which assigns a score to an answer reflecting its answer author’s reputation level in the network. The proposed approach is evaluated on a dataset collected from a live cQA, namely, Yahoo! Answers. To compare the results obtained by the non-content-based user-reputation method, experiments were also conducted with several content-based methods that assign a score to an answer reflecting its content quality. Various combinations of non-content and content-based scores were also used in comparing results. Empirical analysis shows that the proposed method is able to rank the users’ answers and recommend the top-n answers with good accuracy. Results of the proposed method outperform the content-based methods, various combinations, and the results obtained by the popular link analysis method, HITS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Information mismatch and overload are two fundamental issues influencing the effectiveness of information filtering systems. Even though both term-based and pattern-based approaches have been proposed to address the issues, neither of these approaches alone can provide a satisfactory decision for determining the relevant information. This paper presents a novel two-stage decision model for solving the issues. The first stage is a novel rough analysis model to address the overload problem. The second stage is a pattern taxonomy mining model to address the mismatch problem. The experimental results on RCV1 and TREC filtering topics show that the proposed model significantly outperforms the state-of-the-art filtering systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet high expectations of users and find relevant information. Although existing search engine technologies can find valuable information, however, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach allowing personalised search using ontology, user profile and collaborative filtering. This approach finds the context of user query with least user’s involvement, using ontology. Simultaneously, this approach uses time-based automatic user profile updating with user’s changing behaviour. Subsequently, this approach uses recommendations from similar users using collaborative filtering technique. The proposed method is evaluated with the FIRE 2010 dataset and manually generated dataset. Empirical analysis reveals that Precision, Recall and F-Score of most of the queries for many users are improved with proposed method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

For more than a decade research in the field of context aware computing has aimed to find ways to exploit situational information that can be detected by mobile computing and sensor technologies. The goal is to provide people with new and improved applications, enhanced functionality and better use experience (Dey, 2001). Early applications focused on representing or computing on physical parameters, such as showing your location and the location of people or things around you. Such applications might show where the next bus is, which of your friends is in the vicinity and so on. With the advent of social networking software and microblogging sites such as Facebook and Twitter, recommender systems and so on context-aware computing is moving towards mining the social web in order to provide better representations and understanding of context, including social context. In this paper we begin by recapping different theoretical framings of context. We then discuss the problem of context- aware computing from a design perspective.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is a big challenge to acquire correct user profiles for personalized text classification since users may be unsure in providing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback in describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases due to the term independence assumption and uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It basically applies data mining to discover knowledge from relevant and non-relevant text and constraints specific knowledge by reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics support that the proposed technique achieves encouraging performance in comparing with the state-of-the-art relevance feedback models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The rapid growth in the number of users using social networks and the information that a social network requires about their users make the traditional matching systems insufficiently adept at matching users within social networks. This paper introduces the use of clustering to form communities of users and, then, uses these communities to generate matches. Forming communities within a social network helps to reduce the number of users that the matching system needs to consider, and helps to overcome other problems from which social networks suffer, such as the absence of user activities' information about a new user. The proposed system has been evaluated on a dataset obtained from an online dating website. Empirical analysis shows that accuracy of the matching process is increased using the community information.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is nearly 10 years since the introduction of s 299(1)(f) Corporations Act , which requires the disclosure of information regarding a company's environmental performance within its annual report. This provision has generated considerable debate in the years since its introduction, fundamentally between proponents of either a voluntary or mandatory environmental reporting framework. This study examines the adequacy of the current regulatory framework. The environmental reporting practices of 24 listed companies in the resources industries are assessed relative to a standard set by the Global Reporting Initiative (GRI) Sustainability Reporting Guidelines. These Guidelines are argued to represent "international best practice" in environmental reporting and a "scorecard" approach is used to score the quality of disclosure according to this voluntary benchmark. Larger companies in the sample tend to report environmental information over and above the level required by legislation. Some, but not all companies present a stand-alone environmental/sustainability report. However, smaller companies provide minimal information in compliance with s 299(1)(f) . The findings indicate that "international best practice" environmental reporting is unlikely to be achieved by Australian companies under the current regulatory framework. In the current regulatory environment that scrutinises s 299(1)(f) , this article provides some preliminary evidence of the quality of disclosures generated in the Australian market.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.