20 results for OUTLIERS
Abstract:
An outlier-removal-based data cleaning technique is proposed to
clean manually pre-segmented human skin data in colour images.
The 3-dimensional colour data is projected onto three 2-dimensional
planes, from which outliers are removed. The cleaned 2-dimensional
data projections are merged to yield a clean 3D RGB data set. This data
is finally used to build a lookup table and a single Gaussian classifier
for the purpose of human skin detection in colour images.
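The project, clean and merge pipeline described above can be sketched roughly as follows. The abstract does not say which outlier test is applied on each plane or how the projections are merged, so this sketch assumes a Mahalanobis distance cut-off per plane and keeps only pixels that are inliers on all three planes. The function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def inlier_mask_2d(points, threshold=3.0):
    """Boolean mask of rows whose Mahalanobis distance is within `threshold`.

    Assumption: the source abstract does not name its outlier test.
    """
    mean = points.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
    diff = points - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances
    return d2 <= threshold ** 2

def clean_rgb(pixels, threshold=3.0):
    """Keep pixels that are inliers on all three 2D planes (RG, RB, GB).

    Assumption: merging is done by intersecting the three inlier sets.
    """
    keep = np.ones(len(pixels), dtype=bool)
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        keep &= inlier_mask_2d(pixels[:, [a, b]], threshold)
    return pixels[keep]

def fit_gaussian(pixels):
    """Fit a single 3D Gaussian (mean vector and covariance) to the cleaned data."""
    return pixels.mean(axis=0), np.cov(pixels, rowvar=False)

def skin_score(pixel, mean, cov):
    """Squared Mahalanobis distance to the skin model; smaller is more skin-like."""
    diff = np.asarray(pixel, dtype=float) - mean
    return float(diff @ np.linalg.inv(cov) @ diff)
```

A pixel would then be labelled as skin when `skin_score` falls below a cut-off chosen on validation data; the lookup table mentioned in the abstract is not shown here.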
Abstract:
High gene flow is considered the norm for most marine organisms and is expected to limit their ability to adapt to local environments. Few studies have directly compared the patterns of differentiation at neutral and selected gene loci in marine organisms. We analysed a transcriptome-derived panel of 281 SNPs in Atlantic herring (Clupea harengus), a highly migratory small pelagic fish, to elucidate neutral and selected genetic variation among populations and to identify candidate genes for environmental adaptation. We analysed 607 individuals from 18 spawning locations in the northeast Atlantic, including two temperature clines (5-12 °C) and two salinity clines (5-35‰). By combining genome scan and landscape genetic analyses, four genetically distinct groups of herring were identified: Baltic Sea, Baltic-North Sea transition area, North Sea/British Isles and North Atlantic; notably, samples exhibited divergent clustering patterns for neutral and selected loci. We found statistically strong evidence for divergent selection at 16 outlier loci on a global scale, and significant correlations with temperature and salinity at nine loci. On regional scales, we identified two outlier loci with parallel patterns across temperature clines and five loci associated with temperature in the North Sea/North Atlantic. Likewise, we found seven replicated outliers, of which five were significantly associated with low salinity across both salinity clines. Our results reveal a complex pattern of varying spatial genetic variation among outlier loci, likely reflecting adaptations to local environments. In addition to disclosing the fine scale of local adaptation in a highly vagile species, our data emphasize the need to preserve functionally important biodiversity.
Abstract:
The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications across areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that have considered the problem of identifying regions where data objects exhibit a behavior that is atypical of the general dataset. The spatial scan statistic and methods that build upon it mostly adopt the framework of defining a character for regions (e.g., circular or elliptical) of objects and repeatedly sampling regions of such character, followed by applying a statistical test for anomaly detection. In the past decade, there have been efforts from the statistics community to enhance the efficiency of scan statistics as well as to enable discovery of arbitrarily shaped anomalous regions. On the other hand, the data mining community has started to look at determining anomalous regions that have behavior divergent from their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions on spatial data from across the data mining and statistics communities while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them appropriately to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities to help advance the frontier in anomaly detection.
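As a rough illustration of the circular scan framework described above, the sketch below scores every circle centred on a data point with a Poisson log-likelihood ratio in the style of Kulldorff's spatial scan statistic. The choice of the Poisson model is an assumption, since the abstract does not single out one variant, and the function names are hypothetical. The significance test that normally follows (for example, Monte Carlo replication under the null) is omitted.

```python
import numpy as np

def poisson_llr(c, e, total):
    """Log-likelihood ratio for a region with c observed and e expected cases.

    Only regions with an excess of cases (c > e) receive a positive score.
    """
    if e <= 0 or c <= e:
        return 0.0
    inside = c * np.log(c / e)
    outside = (total - c) * np.log((total - c) / (total - e)) if c < total else 0.0
    return float(inside + outside)

def best_circle(coords, cases, population, radii):
    """Scan circles centred on each point; return (score, centre index, radius)."""
    total_cases = cases.sum()
    total_pop = population.sum()
    best = (0.0, None, None)
    for i, centre in enumerate(coords):
        dist = np.linalg.norm(coords - centre, axis=1)
        for r in radii:
            inside = dist <= r
            c = cases[inside].sum()
            # Expected cases if risk were uniform across the population.
            e = total_cases * population[inside].sum() / total_pop
            score = poisson_llr(c, e, total_cases)
            if score > best[0]:
                best = (score, i, r)
    return best
```

Restricting candidates to circles is what makes the search tractable; the arbitrarily shaped regions mentioned in the abstract require a different candidate generator.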
Abstract:
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.); hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a log-ratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat-covered areas are detected as outliers by a robust analysis, and can then be filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
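A classical log-ratio PCA of the kind mentioned above can be sketched as a centred log-ratio (clr) transform followed by an SVD. The abstract does not state which log-ratio transformation was used, so clr is an assumption here, and the robust covariance variant is not shown. The function names are hypothetical.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform: log of each part minus the row mean of the logs."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

def logratio_pca(compositions, n_components=2):
    """Classical PCA of clr-transformed compositions via SVD.

    Returns sample scores, element loadings and the explained-variance
    ratio of each retained component; scores and loadings are the
    ingredients of a biplot.
    """
    z = clr(np.asarray(compositions, dtype=float))
    z_centred = z - z.mean(axis=0)
    u, s, vt = np.linalg.svd(z_centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

All parts must be strictly positive before taking logarithms, so zeros or values below the detection limit would need to be handled first.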
Abstract:
Robust joint modelling is an emerging field of research. Through the advancements in electronic patient healthcare records, the popularity of joint modelling approaches has grown rapidly in recent years, providing simultaneous analysis of longitudinal and survival data. This research advances previous work through the development of a novel robust joint modelling methodology for one of the most common types of standard joint models, that which links a linear mixed model with a Cox proportional hazards model. Through t-distributional assumptions, longitudinal outliers are accommodated, with their detrimental impact being down-weighted, thus providing more efficient and reliable estimates. The robust joint modelling technique and its major benefits are showcased through the analysis of Northern Irish end-stage renal disease patients. With an ageing population and growing prevalence of chronic kidney disease within the United Kingdom, there is a pressing demand to investigate the detrimental relationship between the changing haemoglobin levels of haemodialysis patients and their survival. As outliers within the NI renal data were found to have significantly worse survival, identification of outlying individuals through robust joint modelling may aid nephrologists in improving patient survival. A simulation study was also undertaken to explore the difference between robust and standard joint models in the presence of increasing proportions and extremity of longitudinal outliers. More efficient and reliable estimates were obtained by robust joint models, with increasing contrast between the robust and standard joint models when a greater proportion of more extreme outliers is present. Through illustration of the gains in efficiency and reliability of parameters when outliers exist, the potential of robust joint modelling is evident.
The research presented in this thesis highlights the benefits and stresses the need to utilise a more robust approach to joint modelling in the presence of longitudinal outliers.
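The down-weighting that a t-distributional assumption produces can be illustrated on a much simpler problem than the joint model in the thesis. The sketch below estimates a single location parameter under Student-t errors with a fixed scale, using the standard EM weights (df + 1) / (df + r²). It is not the thesis's method; the function names, `scale` and `df` are hypothetical choices for illustration.

```python
import numpy as np

def t_weights(residuals, scale, df):
    """EM weights implied by a Student-t error model: (df + 1) / (df + (r / scale)^2).

    Large standardized residuals receive weights close to zero.
    """
    standardized = np.asarray(residuals, dtype=float) / scale
    return (df + 1.0) / (df + standardized ** 2)

def robust_mean(y, scale=1.0, df=4.0, n_iter=50):
    """Iteratively reweighted location estimate under t errors (scale held fixed)."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)  # robust starting value
    for _ in range(n_iter):
        w = t_weights(y - mu, scale, df)
        mu = np.sum(w * y) / np.sum(w)
    return mu
```

With values clustered near 10 and a single observation at 30, the outlying value receives a weight near zero, so the estimate stays close to 10 while the ordinary mean is pulled upwards.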