3 results for Outlier identification
in Queensland University of Technology - ePrints Archive
Abstract:
In this paper, we propose a semi-supervised approach to anomaly detection in Online Social Networks. The social network is modeled as a graph, and features extracted from this graph are used to detect anomalies. A clustering algorithm then groups users based on these features, and fuzzy logic is applied to assign a degree of anomalous behavior to the users in these clusters. Empirical analysis shows the effectiveness of this method.
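The abstract does not specify which graph features, clustering algorithm or membership functions were used. As a rough illustration only, the Python sketch below assumes two common per-user features (normalised degree and local clustering coefficient), plain k-means with k = 2, and a linear-ramp membership function on each user's distance from the centroid of the largest ("normal") cluster. The names `node_features`, `kmeans` and `anomaly_degree` are hypothetical, and the semi-supervised element of the paper is not reproduced.

```python
import math
from itertools import combinations


def node_features(adj):
    """Per-user features (assumed): normalised degree, local clustering coefficient."""
    max_deg = max(len(nbrs) for nbrs in adj.values()) or 1
    feats = {}
    for u, nbrs in adj.items():
        k = len(nbrs)
        # number of edges among u's neighbours
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[u] = (k / max_deg, cc)
    return feats


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic init from the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


def anomaly_degree(adj, k=2):
    """Fuzzy degree of anomaly in [0, 1] for every user (hypothetical scheme)."""
    feats = node_features(adj)
    users = list(feats)
    centroids, labels = kmeans([feats[u] for u in users], k)
    normal = max(range(k), key=labels.count)  # largest cluster treated as "normal"
    dist = {u: math.dist(feats[u], centroids[normal]) for u in users}
    lo, hi = min(dist.values()), max(dist.values())
    # linear-ramp membership function for the fuzzy set "anomalous"
    return {u: (d - lo) / ((hi - lo) or 1.0) for u, d in dist.items()}
```

On a toy graph of two triangles whose six members are all also linked to a single hub user, the hub receives a degree of 1.0 and every triangle member 0.0, because the hub is the only user whose degree/clustering profile departs from the majority cluster.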
Abstract:
This paper presents a technique for the automated removal of noise from process execution logs. Noise is the result of data quality issues such as logging errors and manifests itself in the form of infrequent process behavior. The proposed technique generates an abstract representation of an event log as an automaton capturing the directly-follows relations between event labels. This automaton is then pruned of arcs with low relative frequency and used to remove from the log those events that do not fit the automaton, which are identified as outliers. The technique has been extensively evaluated on top of various automated process discovery algorithms, using both artificial logs with different levels of noise and a variety of real-life logs. The results show that the technique significantly improves the quality of the discovered process model in terms of fitness, appropriateness and simplicity, without negative effects on generalization. Further, the technique scales well to large and complex logs.
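To make the filtering steps concrete, here is a minimal Python sketch assuming a log is a list of traces, each a list of event labels. The relative frequency of an arc (a, b) is taken here as its share of all arcs leaving a; the `threshold` parameter, the artificial `<start>` label and the greedy replay that drops non-fitting events are assumptions for illustration, and the paper's own frequency threshold and event-removal procedure may differ.

```python
from collections import Counter

START = "<start>"  # hypothetical artificial start label


def directly_follows(log):
    """Count directly-follows arcs (a, b), including an arc from START."""
    arcs = Counter()
    for trace in log:
        for a, b in zip([START] + trace, trace):
            arcs[(a, b)] += 1
    return arcs


def prune(arcs, threshold):
    """Keep arcs whose share of their source's outgoing arcs is >= threshold."""
    out = Counter()
    for (a, _), n in arcs.items():
        out[a] += n
    return {arc for arc, n in arcs.items() if n / out[arc[0]] >= threshold}


def filter_log(log, threshold=0.2):
    """Greedily replay each trace, dropping events not allowed by the pruned automaton."""
    kept_arcs = prune(directly_follows(log), threshold)
    cleaned = []
    for trace in log:
        prev, kept = START, []
        for event in trace:
            if (prev, event) in kept_arcs:
                kept.append(event)
                prev = event
            # otherwise the event is treated as an outlier and removed
        cleaned.append(kept)
    return cleaned
```

For a log of nine `a b c` traces plus one `a x b c` trace, the arc (a, x) has relative frequency 0.1 and is pruned at a threshold of 0.2, so `x` is removed and every trace becomes `a b c`.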
Abstract:
Most of the information in linkage analysis for quantitative traits comes from pairs of relatives that are phenotypically most discordant or concordant. Confounding this, within-family outliers arising from non-genetic causes may create false positives and false negatives. We investigated the influence of within-family outliers empirically, using one of the largest genome-wide linkage scans for height. The subjects were drawn from Australian twin cohorts comprising 8447 individuals in 2861 families, providing a total of 5815 possible sibling pairs within sibships. A variance component linkage analysis was performed, either including or excluding the within-family outliers. Using the entire dataset, the largest LOD scores were on chromosomes 15q (LOD 2.3) and 11q (1.5). Excluding within-family outliers increased the LOD score for most regions, but the LOD score on chromosome 15 decreased from 2.3 to 1.2, suggesting that the outliers may create false negatives and false positives, although rare alleles of large effect may also be an explanation. Several regions suggestive of linkage to height were found after removing the outliers, including 1q23.1 (2.0), 3q22.1 (1.9) and 5q32 (2.3). We conclude that investigating the effect of within-family outliers, which is usually neglected, should be a standard quality control measure in linkage analysis for complex traits; it may reduce noise in the search for common variants of modest effect size and help identify rare variants of large effect and clinical significance. We suggest that the effect of within-family outliers deserves further investigation via theoretical and simulation studies.
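The abstract does not say how within-family outliers were defined. One simple, hypothetical screen in the spirit of the discordant-pairs argument above enumerates every possible sibling pair within each sibship and flags pairs whose phenotypic difference is extreme relative to a robust scale estimate. The function name `discordant_sib_pairs`, the 1.4826 × median |difference| scale and the cut-off of 3.5 are all assumptions, not the authors' criteria.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


def discordant_sib_pairs(records, z_cut=3.5):
    """Flag sibling pairs with an extreme phenotypic difference (hypothetical rule).

    records: iterable of (family_id, person_id, phenotype).
    Returns a list of (family_id, person_a, person_b, robust_z).
    """
    families = defaultdict(list)
    for fam, pid, y in records:
        families[fam].append((pid, y))
    # every possible sibling pair within each sibship
    pairs = [(fam, a, b, ya - yb)
             for fam, members in families.items()
             for (a, ya), (b, yb) in combinations(members, 2)]
    if not pairs:
        return []
    # robust scale: 1.4826 * median |difference| approximates the SD of the
    # differences under normality (pair order is arbitrary, so the centre is 0)
    scale = 1.4826 * median(abs(d) for *_, d in pairs)
    if scale == 0:
        return []
    return [(fam, a, b, abs(d) / scale)
            for fam, a, b, d in pairs if abs(d) / scale > z_cut]
```

A median-based scale is used so that a handful of very discordant pairs cannot inflate the spread estimate and thereby mask themselves. Note that a flagged pair does not by itself indicate which sibling is the outlier.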