994 results for Fraud detection


Relevance: 60.00%

Abstract:

The application of chemometrics in food science has revolutionized the field by allowing the creation of models able to automate a broad range of applications, such as food authenticity assessment and food fraud detection. In order to create effective and general models able to address the complexity of real-life problems, a large number of varied training samples is required: the training dataset has to cover all possible types of sample and instrument variability. However, acquiring such a varied set of samples is a time-consuming and costly process, and collecting samples representative of real-world variation is not always possible, especially in some application fields. To address this problem, a novel framework for applying data augmentation techniques to spectroscopic data has been designed and implemented. It is a carefully designed pipeline of four complementary and independent blocks, each of which can be finely tuned depending on the variance desired to enhance the model's robustness: a) blending spectra, b) changing the baseline, c) shifting along the x-axis, and d) adding random noise.
This novel data augmentation solution has been tested in order to obtain a highly efficient, generalised classification model based on spectroscopic data. Fourier transform mid-infrared (FT-IR) spectroscopic data of eleven pure vegetable oils (106 admixtures), used for the rapid identification of vegetable oil species in mixtures of oils, serve as a case study to demonstrate the influence of this pioneering approach in chemometrics, obtaining a 10% improvement in classification performance, which is crucial in some food adulteration applications.
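A minimal sketch of the four augmentation blocks named above, assuming spectra stored as 1-D NumPy arrays; the function names, parameter ranges and sequencing are illustrative, not the authors' implementation:

```python
import numpy as np

def blend(spec_a, spec_b, alpha=0.5):
    # Block (a): convex blend of two spectra of the same class.
    return alpha * spec_a + (1 - alpha) * spec_b

def change_baseline(spec, offset=0.01, slope=0.0):
    # Block (b): add a constant and/or linearly drifting baseline.
    x = np.linspace(0.0, 1.0, spec.size)
    return spec + offset + slope * x

def shift_x(spec, max_shift=2):
    # Block (c): shift the spectrum a few channels along the x-axis.
    return np.roll(spec, np.random.randint(-max_shift, max_shift + 1))

def add_noise(spec, sigma=0.005):
    # Block (d): additive Gaussian noise mimicking instrument variability.
    return spec + np.random.normal(0.0, sigma, spec.size)

def augment(spec, partner):
    # The four independent blocks applied in sequence, with random settings.
    s = blend(spec, partner, alpha=np.random.uniform(0.5, 1.0))
    s = change_baseline(s, offset=np.random.uniform(-0.02, 0.02))
    s = shift_x(s)
    return add_noise(s)
```

Each augmented copy is used as an extra training sample, so labels must be preserved; blending is therefore restricted to spectra of the same class.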


Relevance: 60.00%

Abstract:

This paper describes a rapid technique, communal analysis suspicion scoring (CASS), for generating numeric suspicion scores on streaming credit applications based on implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, the definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS scales to this large data sample and can rapidly detect early symptoms of identity crime. In addition, new insights have been observed from the relationships between applications.
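The abstract does not give the CASS formulas, so the following is only a hedged toy illustration of the pair-wise idea: count matching identifier attributes between two applications and discount the score by their temporal distance. Attribute names and the decay constant are invented:

```python
from datetime import datetime

ATTRIBUTES = ["name", "address", "phone", "employer"]  # illustrative identifiers

def pair_suspicion(app_a, app_b, half_life_days=30.0):
    # Count shared identifier attributes between the two applications ...
    shared = sum(1 for k in ATTRIBUTES
                 if app_a.get(k) is not None and app_a.get(k) == app_b.get(k))
    # ... and discount by the time gap (recent duplicates are more suspicious).
    days_apart = abs((app_a["date"] - app_b["date"]).days)
    return shared * 0.5 ** (days_apart / half_life_days)

a = {"name": "J. Doe", "phone": "555-0101", "date": datetime(2008, 1, 1)}
b = {"name": "J. Doe", "phone": "555-0101", "date": datetime(2008, 1, 8)}
print(pair_suspicion(a, b))  # higher score = more suspicious application-pair
```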

Relevance: 60.00%

Abstract:

Investigation of the role of hypothesis formation in complex (business) problem solving has resulted in a new approach to hypothesis generation. A prototypical hypothesis generation paradigm for management intelligence has been developed, reflecting a widespread need to support management in such areas as fraud detection and intelligent decision analysis. This dissertation presents this new paradigm and its application to goal-directed problem solving methodologies, including case-based reasoning.

The hypothesis generation model, which is supported by a dynamic hypothesis space, consists of three components: Anomaly Detection, Abductive Reasoning, and Conflict Resolution. Anomaly detection activates the hypothesis generation model by scanning for anomalous data and relations in its working environment; the respective heuristics are triggered by initial indications of anomalous behaviour based on evidence from historical patterns, linkages with other cases, inconsistencies, etc. Abductive reasoning, as implemented in this paradigm, is based on joining conceptual graphs, and provides an inference process that can incorporate a new observation into a world model by determining what assumptions should be added to the world so that it can explain the new observation. Abductive inference is a weak mechanism for generating explanations and hypotheses: although a practical conclusion cannot be guaranteed, the cues provided by the inference are very beneficial. Conflict resolution is crucial for the evaluation of explanations, especially those generated by a weak (abductive) mechanism. The measurements developed in this research for explanations and hypotheses provide an indirect way of estimating the 'quality' of an explanation for given evidence. Such methods are realistic for complex domains such as fraud detection, where the prevailing hypothesis may not always be relevant to the new evidence. In order to survive in rapidly changing environments, it is necessary to bridge the gap between the system's view of the world and reality.

Our research has demonstrated the value of Case-Based Interaction, which utilises a hypothesis structure for the representation of relevant planning and strategic knowledge. Under the guidance of case-based interaction, users are active agents empowered by system knowledge, and the system acquires its auxiliary information/knowledge from this external source. Case studies using the new paradigm and drawn from the insurance industry have attracted wide interest. A prototypical fraud detection system for motor vehicle insurance, based on a hypothesis-guided problem solving mechanism, is now under commercial development. The initial feedback from claims managers is promising.
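As a loose, hypothetical skeleton of the three-component flow (the dissertation's conceptual-graph machinery is far richer), one might wire the stages together like this:

```python
class HypothesisGenerator:
    """Toy three-stage pipeline: anomalies trigger abduction, conflicts prune."""

    def detect_anomalies(self, case, history):
        # Anomaly Detection: flag attributes deviating from historical patterns.
        return [k for k, v in case.items()
                if k in history and v not in history[k]]

    def abduce(self, anomalies):
        # Abductive Reasoning: propose assumptions explaining each anomaly.
        return [f"assume '{a}' was misreported or staged" for a in anomalies]

    def resolve(self, hypotheses, ruled_out):
        # Conflict Resolution: drop hypotheses contradicted by the evidence.
        return [h for h in hypotheses if not any(r in h for r in ruled_out)]

gen = HypothesisGenerator()
history = {"damage": {"minor", "moderate"}, "claim_delay_days": {0, 1, 2}}
case = {"damage": "total loss", "claim_delay_days": 90}
hyps = gen.abduce(gen.detect_anomalies(case, history))
print(gen.resolve(hyps, ruled_out=[]))
```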

Relevance: 60.00%

Abstract:

Artificial neural networks and statistical techniques such as decision trees, discriminant analysis, logistic regression and survival analysis play a crucial role in Business Intelligence. These predictive analytical tools exploit patterns found in historical data to make predictions about future events. In this paper we review some recent developments of a few of these techniques in financial and business intelligence applications such as fraud detection, bankruptcy prediction and credit scoring.
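As a quick, hedged illustration of one of the surveyed techniques, a logistic regression can be fitted to historical records to score new events; the features and data here are synthetic stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # e.g. amount, hour, merchant risk score
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=1000) > 2).astype(int)

model = LogisticRegression().fit(X, y)  # learn patterns from historical data
print(model.predict_proba(X[:5])[:, 1]) # predicted fraud probability per event
```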

Relevance: 60.00%

Abstract:

In internet purchases or telesales paid with credit cards, in many countries such as Brazil and the USA, the card is never physically presented at any point of the purchase or of the delivery of the goods or service, nor are mechanisms such as passwords that would assure the authenticity of the card and its holder widely adopted. At the same time, it is the merchants who bear the costs of these transactions when they turn out to be fraudulent. No previous study in the literature has restricted credit card fraud detection to these channels, nor focused detection on its main stakeholders, the merchants. This work presents the results of applying five of the modelling techniques most cited in the literature, and analyses the power of data sharing by comparing the models' results when trained only on a single store's data versus on data that the store shares with other merchants.
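A hedged sketch of the data-sharing comparison: train the same classifier once on a single store's transactions and once on the pooled data of all stores, then evaluate both on that store's test set. The data, model choice and merchant ids are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 5))                       # transaction features
y = (X[:, 0] - X[:, 3] + rng.normal(size=5000) > 1.5).astype(int)
store = rng.integers(0, 10, size=5000)               # which merchant each sale belongs to

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, store, random_state=1)

solo = RandomForestClassifier(random_state=1).fit(X_tr[s_tr == 0], y_tr[s_tr == 0])
pooled = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)  # shared data

mask = s_te == 0                                     # evaluate on store 0 only
print("own data AUC:", roc_auc_score(y_te[mask], solo.predict_proba(X_te[mask])[:, 1]))
print("shared   AUC:", roc_auc_score(y_te[mask], pooled.predict_proba(X_te[mask])[:, 1]))
```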

Relevance: 60.00%

Abstract:

This thesis comprises three essays. Two investigate taxation-related subjects and the third addresses savings. Although their objects of analysis differ, the three share a common feature: the application of panel-data econometric techniques to novel databases. Two of the essays use GMM estimation in dynamic models; the remaining one applies latent dependent variable models. A brief summary of each essay follows, starting with the two on taxation, which share a common section on the ICMS (the Brazilian state-level value-added tax), and ending with the essay on savings.

The first essay analyses the importance of enforcement as an instrument for deterring tax evasion and increasing tax revenue in the case of a value-added tax in a developing country. The study uses data from the state of São Paulo. To address endogeneity and inertia in the tax revenue series, dynamic panel techniques are employed. Regional GDP and two proxies for enforcement effort, the number and the value of tax fines, serve as control variables. The results show a significant impact of enforcement effort on tax revenues. The essay provides indirect evidence of how tax evasion responds to the penalties applied to cases of evasion. Its conclusions are also relevant to the debate on Brazilian fiscal federalism, especially in the case of a potential tax reform.

The second essay examines one of the main tasks of tax administrations: the periodic selection of taxpayers for audit. Improving the efficiency of firm-selection mechanisms can raise the probability of detecting tax fraud and allocate scarce enforcement resources better. The essay develops such a mechanism by estimating the probability of evasion associated with each taxpayer. This is done, within the restricted universe of audited firms, by "optimally" combining several existing fiscal indicators with information on audit outcomes in latent dependent variable models. Once the coefficients are estimated, the probability of evasion is computed for the entire universe of taxpayers. The method is applied to a panel of micro-data on firms subject to ICMS within the jurisdiction of the Guarulhos tax office, in the state of São Paulo.

The third essay analyses the low savings rates of Latin American countries over recent decades. Using panel-data techniques, it identifies the determinants of the savings rate and then performs a counterfactual analysis using China, which has exhibited high savings rates over the same period, as the benchmark. Special attention is paid to Brazil, which has lagged far behind its BRIC peers in this respect. The essay contributes to the existing literature in several ways: it employs two large databases to analyse the influence of a wide variety of determinants of the savings rate, including demographic and social security variables; it confirms results previously found in the literature, with the robustness conferred by richer databases; and, for some Latin American countries, it shows that their savings rates would tend to rise if they behaved more like China in other areas, although the increase would not be dramatic.
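A hedged sketch of the audit-selection mechanism in the second essay: fit a latent dependent variable (probit) model on the audited firms, then score the whole universe of taxpayers. The fiscal indicators and data are invented placeholders:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
indicators = rng.normal(size=(800, 3))        # fiscal indicators for each firm
audited = rng.random(800) < 0.25              # subset of firms already audited
evaded = (indicators[:, 0] + rng.normal(size=800) > 1).astype(int)  # audit outcome

# Estimate coefficients on the restricted universe of audited firms.
probit = sm.Probit(evaded[audited], sm.add_constant(indicators[audited])).fit(disp=0)

# Predicted evasion probability for every taxpayer, audited or not.
p_evade = probit.predict(sm.add_constant(indicators))
print(np.argsort(-p_evade)[:10])              # top candidates for the next audits
```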

Relevance: 60.00%

Abstract:

In this paper we deal with the problem of feature selection by introducing a new approach based on the Gravitational Search Algorithm (GSA). The proposed algorithm combines the optimization behavior of GSA with the speed of the Optimum-Path Forest (OPF) classifier in order to provide a fast and accurate framework for feature selection. Experiments on datasets from a wide range of applications, such as vowel recognition, image classification and fraud detection in power distribution systems, are conducted in order to assess the robustness of the proposed technique against Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and a Particle Swarm Optimization (PSO)-based algorithm for feature selection.
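The paper's exact GSA formulation is not reproduced in the abstract; the following is a simplified, hedged sketch of gravitational-search-style wrapper feature selection, with a k-NN classifier standing in for OPF and toy mass/force updates:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(3)
n_agents, n_feat, iters = 8, X.shape[1], 20
pos = rng.random((n_agents, n_feat))          # agents = candidate feature masks
vel = np.zeros_like(pos)

def fitness(mask):
    # Wrapper objective: cross-validated accuracy on the selected features.
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(3), X[:, mask], y, cv=3).mean()

for t in range(iters):
    fit = np.array([fitness(p > 0.5) for p in pos])
    mass = (fit - fit.min() + 1e-9) / (fit - fit.min() + 1e-9).sum()
    G = 1.0 - t / iters                       # decaying gravitational constant
    for i in range(n_agents):
        # Simplified attraction toward heavier (fitter) agents.
        force = sum(mass[j] * (pos[j] - pos[i]) for j in range(n_agents) if j != i)
        vel[i] = rng.random(n_feat) * vel[i] + G * force
        pos[i] = np.clip(pos[i] + vel[i], 0.0, 1.0)

best = pos[np.argmax([fitness(p > 0.5) for p in pos])] > 0.5
print("selected features:", np.flatnonzero(best))
```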

Relevance: 60.00%

Abstract:

Fraud detection in energy systems, i.e. the identification of illegal consumers, is the most actively pursued line of study on non-technical losses by electric power companies. Commonly used supervised pattern recognition techniques, such as Artificial Neural Networks and Support Vector Machines, have been applied to the automatic identification of commercial fraud; however, they suffer from slow convergence and a high computational burden. We introduce here the Optimum-Path Forest classifier for fast recognition of non-technical losses; it has been demonstrated to be superior to neural networks and similar to Support Vector Machines, but much faster. Comparisons among these classifiers are also presented. © 2009 IEEE.
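A heavily simplified sketch of the optimum-path idea (not the authors' LibOPF code): prototypes are taken from MST edges that link different classes, training samples are conquered under an f_max (bottleneck) path cost, and a test sample takes the label of the training node offering the cheapest path:

```python
import heapq
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def opf_fit(X, y):
    d = cdist(X, X)
    mst = minimum_spanning_tree(d).tocoo()
    # Prototypes: endpoints of MST edges connecting different classes.
    proto = {i for r, c in zip(mst.row, mst.col) if y[r] != y[c] for i in (r, c)}
    cost, label = np.full(len(X), np.inf), y.copy()
    heap = []
    for p in proto:
        cost[p] = 0.0
        heapq.heappush(heap, (0.0, p))
    while heap:                                # Dijkstra-like competition
        c, s = heapq.heappop(heap)
        if c > cost[s]:
            continue
        for t in range(len(X)):
            new = max(cost[s], d[s, t])        # f_max: bottleneck path cost
            if new < cost[t]:
                cost[t], label[t] = new, label[s]
                heapq.heappush(heap, (new, t))
    return cost, label

def opf_predict(X_train, cost, label, X_test):
    d = cdist(X_test, X_train)
    return label[np.argmin(np.maximum(cost, d), axis=1)]

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 1, 1])
cost, label = opf_fit(X, y)
print(opf_predict(X, cost, label, np.array([[0.2, 0.5], [5.1, 5.4]])))  # [0 1]
```

Training has no iterative convergence step, which is the source of the speed advantage claimed over neural networks and SVMs.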

Relevance: 60.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance: 60.00%

Abstract:

In recent years, applications in domains such as telecommunications, network security or large-scale sensor networks have shown the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for all these applications demanding high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under- or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic parallel-distributed stream processing engine that enables the processing of large data stream volumes. StreamCloud minimizes the distribution and parallelization overhead by introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, StreamCloud's elastic and dynamic load balancing protocols enable effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, StreamCloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real-world applications such as fraud detection and network analysis. The evaluation, conducted on a cluster with more than 300 cores, demonstrates the large scalability of StreamCloud and the effectiveness of its elasticity and fault tolerance mechanisms.
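A toy illustration of the query-splitting idea, assuming hashing on a key attribute; StreamCloud's actual parallelization, elasticity and fault tolerance protocols are far more involved:

```python
from collections import defaultdict

N_WORKERS = 4

def partition(event):
    # Hash-partition the stream so each worker sees a disjoint key range.
    return hash(event["card"]) % N_WORKERS

def subquery(events):
    # One parallel subquery instance: flag cards with bursts of transactions.
    counts = defaultdict(int)
    for e in events:
        counts[e["card"]] += 1
    return {card: n for card, n in counts.items() if n > 2}

stream = [{"card": f"c{i % 5}"} for i in range(20)]   # incoming tuples

shards = defaultdict(list)
for e in stream:
    shards[partition(e)].append(e)     # split the query input across nodes

alerts = {}
for worker, events in shards.items():  # each shard is processed independently
    alerts.update(subquery(events))
print(alerts)
```

Because the partitioning is key-based, subqueries never need to exchange state for per-key aggregates, which is what removes the single-node bottleneck.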

Relevance: 60.00%

Abstract:

In the last few years there has been heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques, namely artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering, are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness and health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percentage of the one repetition maximum. The 1RM (dynamic muscular strength, one repetition maximum or one execution maximum) is operationally defined as the heaviest load that can be moved over a specific range of motion, one time, with correct performance. The safety of the 1RM assessment has been questioned, as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle the problem of estimating the 1RM from submaximal loads, in order to avoid, or at least reduce, the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press action, until reaching their actual 1RM. A comparison of different existing prediction equations is also carried out. The LR model seems to outperform the ANN and GP models for 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press action, whilst other subjects (n = 20) performed significantly (p < 0.05) more repetitions at 70% than at 75% of their actual 1RM. Rating of perceived exertion (RPE) seems not to be a good predictor of the 1RM when all sets are performed to exhaustion, as no significant differences (p < 0.05) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated with the 1RM (p < 0.05) than body weight. The O'Connor et al. prediction equation seems to fit the data gathered and appears to be the most accurate of the 1RM prediction equations proposed in the literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM literature equations. Finally, future lines of research are proposed on the problem of 1RM prediction by means of genetic algorithms, neural networks and clustering techniques.
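The two prediction equations named above are standard and can be stated directly; both estimate the 1RM from a submaximal weight lifted to exhaustion:

```python
def epley_1rm(weight, reps):
    # Epley: 1RM = w * (1 + reps / 30)
    return weight * (1 + reps / 30)

def oconnor_1rm(weight, reps):
    # O'Connor et al.: 1RM = w * (1 + 0.025 * reps)
    return weight * (1 + 0.025 * reps)

# e.g. 80 kg bench-pressed 8 times to exhaustion:
print(epley_1rm(80, 8), oconnor_1rm(80, 8))  # ~101.3 and 96.0 kg
```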

Relevance: 60.00%

Abstract:

People manage a spectrum of identities in cyber domains. Profiling individuals and assigning them to distinct groups or classes have potential applications in targeted services, online fraud detection, extensive social sorting, and cyber-security. This paper presents the Uncertainty of Identity Toolset, a framework for the identification and profiling of users from their social media accounts and e-mail addresses. More specifically, we discuss the design and implementation of two tools of the framework: the Twitter Geographic Profiler tool builds a map of the ethno-cultural communities of a person's friends on the Twitter social media service, and the E-mail Address Profiler tool infers the probable identities of individuals from their e-mail addresses and maps their geographical distribution across the UK. In sum, the paper presents a framework for profiling the digital traces of individuals.
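A hypothetical fragment of the e-mail profiling step, reduced to extracting candidate name tokens from an address's local part (the toolset's actual inference is richer and is not specified in the abstract):

```python
import re

def name_tokens(email):
    # Split the local part on separators and digits to get candidate names.
    local = email.split("@", 1)[0]
    return [t for t in re.split(r"[._\-\d]+", local.lower()) if len(t) > 1]

print(name_tokens("john.smith84@example.co.uk"))  # ['john', 'smith']
```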

Relevance: 60.00%

Abstract:

Individuals living in highly networked societies publish a large amount of personal, and potentially sensitive, information online. Web investigators can exploit such information for a variety of purposes, such as background vetting and fraud detection. However, such investigations require many expensive man-hours of human effort. This paper describes InfoScout, a search tool intended to reduce the time it takes to identify and gather subject-centric information on the Web. InfoScout collects relevance feedback from the investigator in order to re-rank search results, allowing the intended information to be discovered more quickly. Users may still direct their search as they see fit, issuing ad-hoc queries and filtering existing results by keywords. Design choices are informed by prior work and industry collaboration.
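The abstract does not state InfoScout's re-ranking method; a common way to use relevance feedback of this kind is Rocchio-style query refinement over TF-IDF vectors, sketched here:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["john smith london engineer", "smith family bakery york",
        "j smith court case london", "gardening tips for spring"]
vec = TfidfVectorizer()
D = vec.fit_transform(docs)
q = vec.transform(["john smith"])

relevant, irrelevant = [0], [3]      # investigator's feedback on results
q_new = q + 0.75 * D[relevant].mean(axis=0) - 0.25 * D[irrelevant].mean(axis=0)

scores = cosine_similarity(np.asarray(q_new), D.toarray())[0]
print(np.argsort(-scores))           # re-ranked result order
```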

Relevance: 40.00%

Abstract:

The need to pay with mobile devices has spurred the development of payment systems for mobile electronic commerce. In this paper we consider the detection of two important abuses in electronic payment systems: fraud, an intentional deception carried out to secure an unfair gain, and intrusion, any set of actions that attempts to compromise the integrity, confidentiality or availability of a resource. Most of the available fraud and intrusion detection systems for e-payments are specific to the systems in which they have been incorporated. This paper proposes a generic model, called the Activity-Event-Symptoms (AES) model, for detecting fraud and intrusion attacks that appear during the payment process in a mobile commerce environment. The AES model is designed to identify the symptoms of fraud and intrusion by observing the various events/transactions occurring during a mobile commerce activity. Symptom identification is followed by computing suspicion factors for event attributes, and a certainty factor for fraud or intrusion is generated from these suspicion factors. We have tested the proposed system through various case studies on an in-house mobile commerce test bed over wired and wireless networks.
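A hedged sketch of the suspicion-to-certainty step, assuming the classic certainty-factor combination rule (the paper's actual formulas are not given in the abstract); symptom names and weights are invented:

```python
from functools import reduce

def combine(cf_a, cf_b):
    # Classic certainty-factor rule for combining two positive evidences.
    return cf_a + cf_b * (1 - cf_a)

# Suspicion factors computed for attributes of observed events/transactions.
suspicions = {"odd_amount": 0.4, "new_device": 0.3, "geo_mismatch": 0.5}

certainty = reduce(combine, suspicions.values())
print(round(certainty, 2))  # 0.79 -> overall certainty of fraud/intrusion
```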
