Digital Library

8 results for spam

in Queensland University of Technology - ePrints Archive

High-Order Concept Associations Mining and Inferential Language Modeling for Online Review Spam Detection

Relevance: 20.00%

Abstract:

Although many incidents of fake online consumer reviews have been reported, very few studies to date have examined the trustworthiness of online consumer reviews. One reason is the lack of an effective computational method to separate untruthful reviews (i.e., spam) from legitimate ones (i.e., ham), given that prominent spam features are often missing in online reviews. The main contribution of our research is the development of a novel review spam detection method underpinned by an unsupervised inferential language modeling framework. Another contribution is the development of a high-order concept association mining method, which provides the essential term association knowledge to bootstrap the performance of untruthful review detection. Our experimental results confirm that the proposed inferential language model, equipped with high-order concept association knowledge, is effective in untruthful review detection when compared with other baseline methods.

Text mining and probabilistic language modeling for online review spam detecting

Relevance: 20.00%

Abstract:

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.

Indexing without spam

Relevance: 20.00%

Abstract:

The presence of spam in a document ranking is a major issue for Web search engines. Common approaches to coping with spam remove from the document rankings those pages that are likely to contain spam. These approaches are implemented as post-retrieval processes that filter out spam pages only after documents have been retrieved with respect to a user’s query. In this paper we propose removing spam pages at indexing time, thereby obtaining a pruned index that is virtually “spam-free”. We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performance. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required for storing the index. More surprisingly, we found that retrieval performance with a spam-pruned version of a collection’s index does not differ from that obtained by traditional post-retrieval spam filtering approaches.
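The contrast between the two strategies can be illustrated with a minimal sketch. It is not the authors' system: the toy collection, the `spam_ids` labels (assumed to come from some spam classifier), and the helper names `build_index` and `search` are all hypothetical, and simple boolean AND retrieval stands in for a real ranking function.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical toy collection; spam labels are assumed to be given.
docs = {
    "d1": "cheap pills buy now",
    "d2": "spam filtering for web search",
    "d3": "buy web search books",
    "d4": "web search pills cheap cheap",
}
spam_ids = {"d1", "d4"}

# Post-retrieval filtering: index everything, drop spam from each result set.
full_index = build_index(docs)
post_filtered = search(full_index, "web search") - spam_ids

# Index-time pruning: drop spam first, then index only what is left.
pruned_index = build_index(
    {doc_id: text for doc_id, text in docs.items() if doc_id not in spam_ids}
)
pruned = search(pruned_index, "web search")

# Index size measured as the total number of (term, document) postings.
full_postings = sum(len(p) for p in full_index.values())
pruned_postings = sum(len(p) for p in pruned_index.values())
```

In this boolean setting both strategies return the same documents while the pruned index holds fewer postings. With a ranked retrieval model, pruning also changes collection statistics such as document frequencies, so scores could differ; the abstract reports that retrieval performance did not differ in the authors' experiments.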

Taxonomy and control measures of SPAM and SPIM

Relevance: 20.00%

Abstract:

In this age of electronic money transactions, the opportunities for electronic crime have expanded at the same rate as the ever-expanding rise of online services. With the world becoming a global village, crime over the internet transcends boundaries, borders and jurisdictions. This paper critically examines the available literature on spam and the measures available to control it. This is followed by an overview of the literature on device mobility and on how the use of mobile technologies as a communication medium has affected the handling of spam. The review concludes with a summary and a proposed direction for further study.

Critical analysis of spam prevention techniques

Relevance: 20.00%

Abstract:

E-mail spam remains a scourge for users and for internet and network service operators and providers, despite the anti-spam techniques available. Spammers relentlessly circumvent these techniques, which are embedded or installed as software products on both the client and server sides of fixed and mobile devices. This continuous evasion degrades the capabilities of anti-spam techniques, as none of them provides a comprehensive, reliable solution to the problem posed by spam and spammers. A major problem arises, for instance, when these techniques misclassify legitimate e-mails as spam (false positive) or fail to block spam on the SMTP server (false negative). In the latter case the spam passes on to the receiver, yet the originating server neither notices nor has an auto-alert service to indicate that the spam it was designed to prevent has slipped through to the receiver’s SMTP server. The receiver’s SMTP server may in turn fail to stop the spam from reaching the user’s device, again with no auto-alert mechanism to report the failure, causing a staggering cost in lost time, effort and money. This paper presents a comparative literature overview of some of these anti-spam techniques, especially the filtering technologies designed to prevent spam, discussing their merits and demerits with a view to enhancing their capabilities, and offers evaluative recommendations that will be subject to further research.
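The two error types named in the abstract can be made concrete with a small sketch. The function name `filter_error_rates` and the "spam"/"ham" label strings are assumptions made for illustration, not part of the paper; the rates are the usual definitions (false positives over all legitimate mail, false negatives over all spam).

```python
def filter_error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate) for a spam filter.

    actual, predicted: equal-length sequences of "spam" / "ham" labels.
    A false positive is legitimate mail flagged as spam; a false negative
    is spam that the filter let through.
    """
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(actual, predicted))
    false_pos = sum(1 for a, p in pairs if a == "ham" and p == "spam")
    false_neg = sum(1 for a, p in pairs if a == "spam" and p == "ham")
    ham_total = sum(1 for a in actual if a == "ham")
    spam_total = sum(1 for a in actual if a == "spam")
    # Guard against empty classes so the function never divides by zero.
    fpr = false_pos / ham_total if ham_total else 0.0
    fnr = false_neg / spam_total if spam_total else 0.0
    return fpr, fnr


# Hypothetical labels: one legitimate message blocked, one spam delivered.
actual = ["ham", "ham", "ham", "ham", "spam", "spam"]
predicted = ["ham", "spam", "ham", "ham", "spam", "ham"]
fpr, fnr = filter_error_rates(actual, predicted)
```

Here one of four legitimate messages is blocked and one of two spam messages is delivered, giving a false positive rate of 0.25 and a false negative rate of 0.5.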

Toward a language modeling approach for consumer review spam detection

Relevance: 20.00%

Developing a simulator application to test the billing platform in telecom industry

Relevance: 10.00%

Abstract:

A Billing Mediation Platform (BMP) in the telecommunication industry processes real-time streams of Call Detail Records (CDRs), which can arrive in massive numbers each day. The records generated by a BMP can be used for billing, fraud detection, spam filtering, traffic analysis, and churn forecasting. Several of these applications are distinguished by real-time processing that requires low-latency analysis of CDRs. Testing such a platform involves diverse aspects, such as stress testing of analytics for scalability and what-if scenarios, which require generating CDRs with realistic volumes and appropriate properties. The approach of this project is to build a user-friendly and flexible application that helps the development department test its billing solution occasionally. Such generator projects have been around for a while; they differ only in the portions they cover and the purposes they serve. This paper proposes using a simulator application to test BMPs with simulated CDRs. The simulated CDRs can be modified to suit user requirements and represent real-world data.
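As a rough illustration of what generating configurable simulated CDRs might look like, the sketch below produces a reproducible stream of synthetic records. Everything in it is an assumption made for illustration: the `CallDetailRecord` fields, the `simulate_cdrs` function and its parameters, and the exponential arrival and duration distributions are not taken from the paper.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CallDetailRecord:
    """Hypothetical minimal CDR; real platforms carry many more fields."""
    caller: str
    callee: str
    start: datetime
    duration_s: int


def simulate_cdrs(n, seed=0, start=datetime(2024, 1, 1),
                  calls_per_second=1.0, mean_duration_s=120.0,
                  subscribers=1000):
    """Generate n synthetic CDRs with user-adjustable volume and durations."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    records = []
    current = start
    for _ in range(n):
        # Pick two distinct subscribers so nobody calls themselves.
        caller, callee = rng.sample(range(subscribers), 2)
        # Exponential inter-arrival times approximate a Poisson call stream.
        current += timedelta(seconds=rng.expovariate(calls_per_second))
        # Exponential call lengths, rounded and clamped to at least 1 second.
        duration = max(1, round(rng.expovariate(1.0 / mean_duration_s)))
        records.append(CallDetailRecord(
            caller=f"sub-{caller:06d}",
            callee=f"sub-{callee:06d}",
            start=current,
            duration_s=duration,
        ))
    return records


sample = simulate_cdrs(5, seed=42)
```

Raising `calls_per_second` or `n` gives a crude stress-test workload, while changing `mean_duration_s` supports simple what-if scenarios.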

Parallel streaming signature EM-tree: A clustering algorithm for web scale applications

Relevance: 10.00%

Abstract:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered only a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections, where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.
