22 results for datamining


Relevance:

20.00%

Publisher:

Abstract:

A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
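The counting-and-thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, not the system's actual method: it assumes that an episode is an ordered tuple of event types, that occurrences are counted without overlap, and that frequency means the count divided by the length of the series. The abstract specifies none of these details, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Episode = Tuple[str, ...]


def count_occurrences(series: Sequence[str], episode: Episode) -> int:
    """Count non-overlapping occurrences of a serial episode (assumed definition)."""
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the next event type the tracker is waiting for
    for event in series:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1  # one full occurrence identified
                pos = 0     # start tracking the next occurrence
    return count


def frequent_episodes(
    series: Sequence[str],
    candidates: Sequence[Episode],
    threshold: float,
) -> Dict[Episode, float]:
    """Return the candidates whose frequency exceeds the threshold."""
    if not series:
        return {}
    result = {}
    for episode in candidates:
        # Frequency is assumed here to be count / series length.
        frequency = count_occurrences(series, episode) / len(series)
        if frequency > threshold:
            result[episode] = frequency
    return result
```

With the series `A B C A B D A B C` and a threshold of 0.2, the candidates `(A, B)` and `(A, B, C)` pass, while `(C, A)` does not.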

Relevance:

20.00%

Publisher:

Abstract:

Companies increasingly face higher levels of competition, lower sales margins, and a loss of differentiation in their products and services. On the other hand, technological development, and the Internet in particular, allows companies to stay in close contact with their customers. The same development also allows all of a company's data, operations and transactions to be stored, so that most companies now hold large databases. This work aims to:
• Understand the trends in marketing and how the emergence of new channels such as the Internet can boost the competitive position of companies;
• Establish how much data companies hold and how much potential information it contains;
• Show how datamining can help uncover information hidden in the tangle of data;
• Develop a methodology through a practical case study that seeks to determine which customers are most likely to buy and what their consumption patterns are, and how the company can make use of this information;
• Show how companies can make use of the information discovered and what its economic and strategic value is.

Relevance:

20.00%

Publisher:

Abstract:

This study was commissioned by one of Hungary's largest employers to identify solutions, including BI solutions, that could make the company's business processes more efficient. Within this framework, the authors first surveyed current trends in HR data mining worldwide. They reviewed the tools available for predicting and monitoring employee promotion, and investigated how results from social network analysis can be used in the field of enterprise security. It is fortunate, and fruitful, when real business problems and resources meet the mainstream research of the scientific community. The authors are confident that the claims, conclusions and results published here will be beneficial for Foxconn, and for other companies, in the near future. The work is not finished, however: new research perspectives keep opening up, and a huge amount of information is accumulating in enterprises, waiting to be discovered and analysed. The environment in which an enterprise operates also changes dynamically, so companies face new challenges and new types of business problems.
The authors hope that their research experience will continue to help decision makers solve real-world business problems.

Relevance:

10.00%

Publisher:

Abstract:

Network crawling and visualisation tools and other datamining systems are now advanced enough to provide significant new impulses to the study of cultural activity on the Web. A growing range of studies focus on communicative processes in the blogosphere – including for example Adamic & Glance’s 2005 map of political allegiances during the 2004 U.S. presidential election and Kelly & Etling’s 2008 study of blogging practices in Iran. There remain a number of significant shortcomings in the application of such tools and methodologies to the study of blogging; these relate both to how the content of blogs is analysed, and to how the network maps resulting from such studies are understood. Our project highlights and addresses such shortcomings.

Relevance:

10.00%

Publisher:

Abstract:

Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and quicker time to market expectations. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. Growing self-organizing map (GSOM) is established as an efficient unsupervised datamining algorithm. In this study some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products and a filtering index to reduce noise from the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct the knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.

Relevance:

10.00%

Publisher:

Abstract:

Our paper approaches Twitter through the lens of “platform politics” (Gillespie, 2010), focusing in particular on controversies around user data access, ownership, and control. We characterise different actors in the Twitter data ecosystem: private and institutional end users of Twitter, commercial data resellers such as Gnip and DataSift, data scientists, and finally Twitter, Inc. itself; and describe their conflicting interests. We furthermore study Twitter’s Terms of Service and application programming interface (API) as material instantiations of regulatory instruments used by the platform provider and argue for greater promotion of data rights and literacy to strengthen the position of end users.

Relevance:

10.00%

Publisher:

Abstract:

We examine some variations of standard probability designs that preferentially sample sites based on how easy they are to access. Preferential sampling designs deliver unbiased estimates of mean and sampling variance and will ease the burden of data collection, but at what cost to our design efficiency? Preferential sampling has the potential to either increase or decrease sampling variance depending on the application. We carry out a simulation study to gauge what effect it will have when sampling Soil Organic Carbon (SOC) values in a large agricultural region in south-eastern Australia. Preferential sampling in this region can reduce the distance to travel by up to 16%. Our study is based on a dataset of predicted SOC values produced from a datamining exercise. We consider three designs and two ways to determine ease of access. The overall conclusion is that sampling performance deteriorates as the strength of preferential sampling increases, because regions of high SOC are harder to access, so our designs inadvertently target regions of low SOC. The good news, however, is that Generalised Random Tessellation Stratification (GRTS) sampling designs are not as badly affected as others and GRTS remains an efficient design compared to competitors.
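The trade-off in this abstract, unbiased estimates at the price of higher variance when hard-to-reach sites carry high values, can be illustrated with a small sketch. It does not reproduce the designs in the study: it swaps in Poisson sampling with a Horvitz-Thompson estimator as a simple stand-in, not GRTS. The exponential access weighting, the `strength` parameter and all function names are assumptions made for illustration only.

```python
import math
import random
from typing import List, Sequence


def inclusion_probs(costs: Sequence[float], expected_n: float, strength: float) -> List[float]:
    """Inclusion probabilities that favour cheap-to-access sites.

    strength = 0 gives equal probabilities; larger values favour easy sites more.
    The exponential weighting is a hypothetical choice for this sketch.
    """
    max_cost = max(costs)
    weights = [math.exp(-strength * c / max_cost) for c in costs]
    scale = expected_n / sum(weights)
    return [min(1.0, scale * w) for w in weights]


def ht_mean(values: Sequence[float], probs: Sequence[float], rng: random.Random) -> float:
    """Draw one Poisson sample and return the Horvitz-Thompson estimate of the mean."""
    total = sum(y / p for y, p in zip(values, probs) if rng.random() < p)
    return total / len(values)


def ht_mean_variance(values: Sequence[float], probs: Sequence[float]) -> float:
    """Exact variance of the Horvitz-Thompson mean under Poisson sampling."""
    n_units = len(values)
    return sum(y * y * (1 - p) / p for y, p in zip(values, probs)) / n_units ** 2


if __name__ == "__main__":
    # Synthetic population in which values rise with access cost.
    costs = list(range(1, 101))
    values = [10 + 0.5 * c for c in costs]
    for strength in (0.0, 1.0, 2.0):
        probs = inclusion_probs(costs, expected_n=10, strength=strength)
        print(strength, ht_mean_variance(values, probs))
```

In this synthetic population the variance grows with `strength`, which mirrors the direction of the abstract's conclusion without saying anything about its magnitude in the SOC data.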

Relevance:

10.00%

Publisher:

Abstract:

Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust search and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), in which the population size, the number of crossover points and the mutation rate of each population are fixed adaptively. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search of the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification datamining problems. A Michigan-style classifier was built, and the system was tested on machine learning databases including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. The performance of our algorithm is better than that of the others.

Relevance:

10.00%

Publisher:

Abstract:

We conducted data-mining analyses of genome-wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r² > 0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P = 1.10 × 10⁻³ and rs2274736, P = 1.21 × 10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR = 0.92, 95% CI: 0.86-0.97, P = 5.45 × 10⁻³ and rs2401751, OR = 0.92, 95% CI: 0.86-0.97, P = 5.29 × 10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR = 1.08, 95% CI: 1.02-1.14, P = 6.43 × 10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype-conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given the results that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.

Relevance:

10.00%

Publisher:

Abstract:

The monitoring of undesirable effects following vaccination is complex. Several confounding factors can give rise to spurious, merely temporal associations, which can distort the perception of risk and lead to a generalized distrust of the use of vaccines. Vaccines are complex medicines with unique characteristics, so their monitoring requires specifically designed methodological approaches. This explains why, since the emergence of pharmacovigilance, there has been a drive to develop new methodologies that complement the Spontaneous Reporting Systems already in place. In this work we set out to develop and test a model for monitoring adverse reactions to vaccines, based on users' self-reporting of events occurring after vaccination, and to test its capacity to generate signals by applying disproportionality calculations to data mining. For that purpose we set up an uncontrolled cohort of users vaccinated in Healthcare Centers, who were followed for fifteen days. Adverse vaccine events were recorded by the users themselves in a paper diary. The data were analyzed using descriptive statistics and two quantitative methods of signal generation: the Proportional Reporting Ratio and the Information Component. The methodology produced a sufficient body of evidence for signal generation, and four signals were generated. Regarding the data mining, using the Information Component as a method for generating disproportionality signals seems to increase scientific efficiency by reducing the number of events needed for signal detection.
The information reported by users seems valid as an indicator of non-serious adverse vaccine reactions, allowing events to be recorded without the bias introduced when the reporter evaluates the causal relation. The main adverse events reported were injection site reactions (62.7%) and fever (31.4%).
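The two disproportionality measures named in this abstract can be sketched from a 2×2 table of report counts. The abstract does not say which variant of the Information Component was used; the version below assumes a commonly used simplified form, log2 of observed over expected with a shrinkage term of 0.5 added to both. The function names are hypothetical and the counts in the usage note are made up for illustration, not taken from the study.

```python
import math


def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from a 2x2 table of report counts.

    a: vaccine of interest, event of interest
    b: vaccine of interest, all other events
    c: all other vaccines, event of interest
    d: all other vaccines, all other events
    """
    return (a / (a + b)) / (c / (c + d))


def information_component(a: int, b: int, c: int, d: int, shrink: float = 0.5) -> float:
    """Information Component as log2(observed / expected) with shrinkage.

    The shrinkage of 0.5 is an assumed simplification, not necessarily
    the variant used in the study.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # count expected if vaccine and event were independent
    return math.log2((a + shrink) / (expected + shrink))
```

For made-up counts `a=10, b=90, c=20, d=880`, the PRR is 4.5 and the Information Component is log2(3), about 1.58. Both values point the same way: the event is reported more often with this vaccine than independence would predict.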

Relevance:

10.00%

Publisher:

Abstract:

One major difficulty frustrating the application of linear causal models is that they are not easily adapted to cope with discrete data. This is unfortunate since most real problems involve both continuous and discrete variables. In this paper, we consider a class of graphical models which allow both continuous and discrete variables, and propose a parameter estimation method and a structure discovery algorithm based on Minimum Message Length. Experimental results are given to demonstrate the potential for the application of this method.

Relevance:

10.00%

Publisher:

Relevance:

10.00%

Publisher:

Abstract:

This paper reports on the preparation and management of inconsistent data on damage to residential houses in Victoria, Australia. No specific and fully relevant database is readily available; there are only incomplete paper-based and electronic reports. Extracting from these reports all the information needed to analyse damage to residential houses founded on expansive soils is therefore complicated and time-consuming. Data mining is adopted to develop a database. Statistical methods and Artificial Intelligence methods are used to quantify the quality of data. The paper concludes that the development of such a database could enable BHC to evaluate the usefulness of the reports prepared on the reported damaged properties for further analysis.

Relevance:

10.00%

Publisher:

Abstract:

This paper presents a novel data mining framework for the exploration and extraction of actionable knowledge from data generated by electricity meters. Although a rich source of information for energy consumption analysis, electricity meters produce a voluminous, fast-paced, transient stream of data that conventional approaches are unable to address entirely. In order to overcome these issues, it is important for a data mining framework to incorporate functionality for interim summarization and incremental analysis using intelligent techniques. The proposed Incremental Summarization and Pattern Characterization (ISPC) framework demonstrates this capability. Stream data is structured in a data warehouse based on key dimensions enabling rapid interim summarization. Independently, the IPCL algorithm incrementally characterizes patterns in stream data and correlates these across time. Eventually, characterized patterns are consolidated with interim summarization to facilitate an overall analysis and prediction of energy consumption trends. Results of experiments conducted using the actual data from electricity meters confirm applicability of the ISPC framework.
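The interim summarization idea, keeping running aggregates per key dimension so that a transient stream never has to be stored in full, might look like the sketch below. It covers only that part: the IPCL algorithm is not described in the abstract and is not attempted here. The dimensions (meter id and hour of day), the aggregates kept, and the class names are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RunningSummary:
    """Aggregates for one cell, updated one reading at a time."""
    count: int = 0
    total: float = 0.0
    peak: float = float("-inf")

    def update(self, kwh: float) -> None:
        self.count += 1
        self.total += kwh
        self.peak = max(self.peak, kwh)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class InterimSummarizer:
    """Summarizes a meter stream by (meter id, hour of day); hypothetical dimensions."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, int], RunningSummary] = defaultdict(RunningSummary)

    def ingest(self, meter_id: str, hour: int, kwh: float) -> None:
        # Each reading is folded into its cell and can then be discarded.
        self._cells[(meter_id, hour)].update(kwh)

    def summary(self, meter_id: str, hour: int) -> RunningSummary:
        # Return an empty summary for cells that have seen no readings.
        return self._cells.get((meter_id, hour), RunningSummary())
```

Because each cell holds only a count, a total and a peak, memory grows with the number of dimension combinations and not with the number of readings, which is the property that makes interim summaries practical for a fast stream.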