Biblioteca Digital

Spam emails (unsolicited or junk emails) impose extremely heavy annual costs in time, storage space, and money on private users and companies. To fight spam effectively, it is not enough to stop spam messages from being delivered to the user's inbox. It is necessary either to try to find and prosecute the spammers, who generally hide behind complex networks of infected devices, or to analyze spammer behavior in order to find appropriate defense strategies. However, such a task is difficult because of camouflage techniques, so finding the spammers requires manual analysis of correlated spam emails. To facilitate such an analysis, which must be performed on large quantities of unclassified emails, we propose a categorical clustering methodology, named CCTree, that divides a large volume of spam into campaigns based on their structural similarity. We show the effectiveness and efficiency of the proposed clustering algorithm through several experiments. Next, a self-learning approach is proposed to label spam campaigns according to the spammers' goal, for example phishing. The labeled spam campaigns are used to train a classifier, which can then be applied to classify new spam emails. In addition, the labeled campaigns, together with a set of four other ranking criteria, are ordered according to investigators' priorities. Finally, a semiring-based structure is proposed for the abstract representation of CCTree. The abstract schema of CCTree, named CCTree term, is applied to formalize the parallelization of CCTree. Through a number of mathematical analyses and experimental results, we show the efficiency and effectiveness of the proposed framework.

Las cadenas y el correo spam (Chain letters and spam email)

Relevance: 20.00%

Abstract:

This article is part of the magazine Papel de colgadura of the Facultad de Derecho y Ciencias Sociales at Universidad Icesi in Cali, a publication of cultural dissemination and agitation. The magazine was born of a passion for music, books, illustration, graffiti, comics, the web, partying, cinema, caffeine, and afternoons of conversation over empanadas and beer. It circulates in print twice a year, while its digital edition is updated more frequently.

Developing a simulator application to test the billing platform in telecom industry

Relevance: 10.00%

Abstract:

A Billing Mediation Platform (BMP) in the telecommunications industry processes real-time streams of Call Detail Records (CDRs), whose daily volume can be massive. The records generated by the BMP can be used for billing, fraud detection, spam filtering, traffic analysis, and churn forecasting. Several of these applications require real-time, low-latency analysis of CDRs. Testing such a platform involves diverse aspects, such as stress testing of analytics for scalability and what-if scenarios, which require generating CDRs with realistic volumes and appropriate properties. The approach of this project is to build a user-friendly and flexible application that helps the development department test its billing solution from time to time. Such generator projects have existed for a while; they differ only in the portions they cover and the purposes they serve. This paper proposes using a simulator application to test BMPs with simulated CDRs. The simulated CDRs can be modified according to user requirements and represent real-world data.
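
To make the idea of user-configurable simulated CDRs concrete, here is a minimal sketch. It is not the simulator described in the abstract: the `CDR` fields, the `generate_cdrs` function, and all of its parameters are hypothetical names chosen for illustration, and real CDR layouts are vendor-specific.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CDR:
    # Hypothetical record layout; real CDR formats are vendor-specific.
    caller: str
    callee: str
    start: datetime
    duration_s: int
    call_type: str


def generate_cdrs(n, seed=0, max_duration_s=3600,
                  call_types=("voice", "sms"),
                  first_start=datetime(2020, 1, 1)):
    """Yield n synthetic CDRs whose properties the caller can adjust."""
    rng = random.Random(seed)  # seeded so a test run is reproducible
    start = first_start
    for _ in range(n):
        # Advance the clock by a small random gap between records.
        start += timedelta(seconds=rng.randint(0, 5))
        call_type = rng.choice(call_types)
        # Assumption: SMS records carry no duration.
        duration = 0 if call_type == "sms" else rng.randint(1, max_duration_s)
        yield CDR(
            caller=f"+1555{rng.randint(0, 9_999_999):07d}",
            callee=f"+1555{rng.randint(0, 9_999_999):07d}",
            start=start,
            duration_s=duration,
            call_type=call_type,
        )


# Example: a reproducible batch of 1000 records with calls capped at 10 minutes.
batch = list(generate_cdrs(1000, seed=42, max_duration_s=600))
```

Seeding the generator keeps a stress test repeatable, while the keyword arguments stand in for the "user requirements" the abstract mentions.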

Parallel streaming signature EM-tree: A clustering algorithm for web scale applications

Relevance: 10.00%

Abstract:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered only a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.

Incidental Memory of Younger and Older Adults for Objects Encountered in a Real World Context

Relevance: 10.00%

Abstract:

Effects of context on the perception of, and incidental memory for, real-world objects have predominantly been investigated in younger individuals, under conditions involving a single static viewpoint. We examined the effects of prior object context and object familiarity on both older and younger adults' incidental memory for real objects encountered while they traversed a conference room. Recognition memory for context-typical and context-atypical objects was compared with a third group of unfamiliar objects that were not readily named and that had no strongly associated context. Both older and younger adults demonstrated a typicality effect, showing significantly lower 2-alternative-forced-choice recognition of context-typical than context-atypical objects; for these objects, the recognition of older adults either significantly exceeded, or numerically surpassed, that of younger adults. Testing-awareness elevated recognition but did not interact with age or with object type. Older adults showed significantly higher recognition for context-atypical objects than for unfamiliar objects that had no prior strongly associated context. The observation of a typicality effect in both age groups is consistent with preserved semantic schemata processing in aging. The incidental recognition advantage of older over younger adults for the context-typical and context-atypical objects may reflect aging-related differences in goal-related processing, with older adults under comparatively more novel circumstances being more likely to direct their attention to the external environment, or age-related differences in top-down effortful distraction regulation, with older individuals' attention more readily captured by salient objects in the environment. Older adults' reduced recognition of unfamiliar objects compared to context-atypical objects may reflect possible age differences in contextually driven expectancy violations. 
The latter finding underscores the theoretical and methodological value of including a third type of object (one that is comparatively neutral with respect to its contextual associations) to help differentiate between contextual integration effects (for schema-consistent objects) and expectancy violations (for schema-inconsistent objects).

Estimating labels from label proportions

Relevance: 10.00%

Abstract:

Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice. Copyright 2008 by the author(s)/owner(s).

基于I-Match算法的垃圾邮件过滤研究 (Research on Spam Filtering Based on the I-Match Algorithm)

Relevance: 10.00%

Abstract:

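Electronic mail (e-mail) is currently the most widely used Internet application. As the Internet grows at an astonishing pace, e-mail has become an important channel for distributing malicious information, and spam has become the greatest scourge afflicting the Internet. Because spamming techniques take many forms, spam filtering systems often need to combine multiple filtering techniques to improve their effectiveness. Among these, digest techniques have become one of the important spam filtering methods: a digest is used to judge the similarity between an e-mail and known spam, and the e-mail is classified accordingly. Judging whether a spam filtering algorithm is effective requires considering its recall, precision, and time performance together. The I-Match algorithm decides whether the text content of two e-mails is similar through exact matching of digest values, and it performs particularly well in terms of efficiency. However, I-Match still has many problems in practical use, including lexicon generation that constrains the algorithm's performance and insufficient robustness in the face of attacks. Optimizing the lexicon generation process and improving robustness have therefore become two important issues for applying the algorithm in real systems. The main work of this thesis comprises the following: (1) A similarity analysis of spam, covering the causes of spam similarity and the similarity characteristics that spam exhibits in both time and content. These similarity characteristics are one of the necessary conditions for using digest algorithms in spam filtering. (2) An improvement to the lexicon generation process of I-Match. The mutual information of features is proposed as the feature selection criterion for lexicon generation, and the effects of several different feature selection methods on the algorithm's performance are compared. (3) An analysis of the robustness of I-Match and of how several improved I-Match variants increase it; the improved algorithms are evaluated on real e-mail corpora, and their practicality is analyzed comprehensively. (4) A prototype of the KSpam system, which integrates multiple e-mail filtering methods as plug-ins, together with an implementation scheme for I-Match within KSpam. The system also implements a new automatic e-mail collection function that effectively reduces the mail administrator's work of gathering e-mail corpora.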
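
The exact-match digest idea behind I-Match can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: the function name `imatch_signature` and the tiny hand-picked `LEXICON` are hypothetical, whereas a real lexicon would be derived from corpus statistics (the thesis proposes mutual information for this step).

```python
import hashlib
import re


def imatch_signature(text, lexicon):
    """Hash the sorted set of lexicon terms that occur in the text.

    Two messages that share the same lexicon terms get the same digest,
    regardless of word order, repetition, or out-of-lexicon noise.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    kept = sorted(tokens & lexicon)
    if not kept:
        return None  # no lexicon terms present, so no signature is defined
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Hypothetical lexicon, chosen by hand for illustration only.
LEXICON = {"cheap", "pills", "offer", "click", "winner", "prize"}

spam_a = "Click now!!! CHEAP pills, best offer today"
spam_b = "best offer: cheap pills... click here today"
ham = "Meeting moved to Friday, please confirm"

# Filtering then reduces to a set lookup on digests of known spam.
known_spam = {imatch_signature(spam_a, LEXICON)}
is_spam_b = imatch_signature(spam_b, LEXICON) in known_spam
is_spam_ham = imatch_signature(ham, LEXICON) in known_spam
```

Because the comparison is an exact digest match, a single added or removed lexicon term changes the signature, which is the robustness weakness the abstract refers to.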

70 results for Spam
