7 resultados para pattern clustering
em BORIS: Bern Open Repository and Information System - Berna - Suiça
Resumo:
We consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.
Resumo:
SUMMARY There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.
Resumo:
The aetiology of childhood cancers remains largely unknown. It has been hypothesized that infections may be involved and that mini-epidemics thereof could result in space-time clustering of incident cases. Most previous studies support spatio-temporal clustering for leukaemia, while results for other diagnostic groups remain mixed. Few studies have corrected for uneven regional population shifts which can lead to spurious detection of clustering. We examined whether there is space-time clustering of childhood cancers in Switzerland identifying cases diagnosed at age <16 years between 1985 and 2010 from the Swiss Childhood Cancer Registry. Knox tests were performed on geocoded residence at birth and diagnosis separately for leukaemia, acute lymphoid leukaemia (ALL), lymphomas, tumours of the central nervous system, neuroblastomas and soft tissue sarcomas. We used Baker's Max statistic to correct for multiple testing and randomly sampled time-, sex- and age-matched controls from the resident population to correct for uneven regional population shifts. We observed space-time clustering of childhood leukaemia at birth (Baker's Max p = 0.045) but not at diagnosis (p = 0.98). Clustering was strongest for a spatial lag of <1 km and a temporal lag of <2 years (Observed/expected close pairs: 124/98; p Knox test = 0.003). A similar clustering pattern was observed for ALL though overall evidence was weaker (Baker's Max p = 0.13). Little evidence of clustering was found for other diagnostic groups (p > 0.2). Our study suggests that childhood leukaemia tends to cluster in space-time due to an etiologic factor present in early life.
Resumo:
Bayesian clustering methods are typically used to identify barriers to gene flow, but they are prone to deduce artificial subdivisions in a study population characterized by an isolation-by-distance pattern (IbD). Here we analysed the landscape genetic structure of a population of wild boars (Sus scrofa) from south-western Germany. Two clustering methods inferred the presence of the same genetic discontinuity. However, the population in question was characterized by a strong IbD pattern. While landscape-resistance modelling failed to identify landscape features that influenced wild boar movement, partial Mantel tests and multiple regression of distance matrices (MRDMs) suggested that the empirically inferred clusters were separated by a genuine barrier. When simulating random lines bisecting the study area, 60% of the unique barriers represented, according to partial Mantel tests and MRDMs, significant obstacles to gene flow. By contrast, the random-lines simulation showed that the boundaries of the inferred empirical clusters corresponded to the most important genetic discontinuity in the study area. Given the degree of habitat fragmentation separating the two empirical partitions, it is likely that the clustering programs correctly identified a barrier to gene flow. The differing results between the work published here and other studies suggest that it will be very difficult to draw general conclusions about habitat permeability in wild boar from individual studies.