98 results for Incremental Clustering


Relevance:

20.00%

Publisher:

Abstract:

This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case, and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. Meanwhile, an outcome refers to a label associated with completed cases, for example, a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating whether or not a given case led to a customer complaint. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.
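The two-phase idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes prefixes have already been encoded as fixed-length numeric vectors, that cluster centers are given, and it swaps in a majority-label rule as a stand-in for the random forest classifiers. The names `assign`, `fit_per_cluster` and `predict` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(vec, centers):
    """Return the index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: euclidean(vec, centers[i]))

def fit_per_cluster(prefixes, labels, centers):
    """Phase 2 stand-in: one majority-label 'classifier' per cluster.

    Assumption: a real system would train a proper classifier
    (the abstract mentions random forests) on each cluster's members.
    """
    votes = {i: Counter() for i in range(len(centers))}
    for vec, label in zip(prefixes, labels):
        votes[assign(vec, centers)][label] += 1
    # Keep only clusters that actually received training examples.
    return {i: c.most_common(1)[0][0] for i, c in votes.items() if c}

def predict(vec, centers, models):
    """Runtime: route the ongoing trace to the closest cluster with a model."""
    order = sorted(range(len(centers)), key=lambda i: euclidean(vec, centers[i]))
    for i in order:
        if i in models:
            return models[i]
    raise ValueError("no trained cluster model available")

# Toy data: encoded prefixes of completed cases and their outcomes.
prefixes = [(1.0, 0.0), (0.9, 0.2), (5.0, 5.0), (5.2, 4.8)]
labels = ["on time", "on time", "late", "late"]
centers = [(1.0, 0.0), (5.0, 5.0)]  # assumed output of phase 1

models = fit_per_cluster(prefixes, labels, centers)
prediction = predict((4.5, 5.1), centers, models)
print(prediction)
```

Routing by distance to a cluster center keeps each classifier specialised on similar prefixes; weighting several nearby clusters by distance, as the abstract hints, would be a natural extension of `predict`.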

Abstract:

Objective To investigate the epidemic characteristics of human cutaneous anthrax (CA) in China, detect the spatiotemporal clusters at the county level for preemptive public health interventions, and evaluate the differences in the epidemiological characteristics within and outside clusters. Methods CA cases reported during 2005–2012 from the national surveillance system were evaluated at the county level using the space-time scan statistic. Comparative analysis of the epidemic characteristics within and outside identified clusters was performed using the χ2 test or the Kruskal-Wallis test. Results The group aged 30–39 years had the highest incidence of CA, and the fatality rate increased with age, with persons ≥70 years showing a fatality rate of 4.04%. Seasonality analysis showed that most CA cases occurred between May/June and September/October of each year. The primary spatiotemporal cluster contained 19 counties from June 2006 to May 2010, and it was mainly located straddling the borders of Sichuan, Gansu, and Qinghai provinces. In these high-risk areas, CA cases were predominantly found among younger, local, male shepherds who were living on agriculture and stockbreeding, and were characterized by high morbidity, low mortality and a shorter period from illness onset to diagnosis. Conclusion CA was geographically and persistently clustered in southwestern China during 2005–2012, with notable differences in the epidemic characteristics within and outside spatiotemporal clusters; this demonstrates the necessity for CA interventions such as enhanced surveillance, health education, and mandatory and standard decontamination or disinfection procedures to be geographically targeted to the areas identified in this study.

Abstract:

Dietary nitrate (NO3−) supplementation with beetroot juice (BR) over 4–6 days has been shown to reduce the O2 cost of submaximal exercise and to improve exercise tolerance. However, it is not known whether shorter (or longer) periods of supplementation have similar (or greater) effects. We therefore investigated the effects of acute and chronic NO3− supplementation on resting blood pressure (BP) and the physiological responses to moderate-intensity exercise and ramp incremental cycle exercise in eight healthy subjects. Following baseline tests, the subjects were assigned in a balanced crossover design to receive BR (0.5 l/day; 5.2 mmol of NO3−/day) and placebo (PL; 0.5 l/day low-calorie juice cordial) treatments. The exercise protocol (two moderate-intensity step tests followed by a ramp test) was repeated 2.5 h following first ingestion (0.5 liter) and after 5 and 15 days of BR and PL. Plasma nitrite concentration (baseline: 454 ± 81 nM) was significantly elevated (+39% at 2.5 h postingestion; +25% at 5 days; +46% at 15 days; P < 0.05) and systolic and diastolic BP (baseline: 127 ± 6 and 72 ± 5 mmHg, respectively) were reduced by ∼4% throughout the BR supplementation period (P < 0.05). Compared with PL, the steady-state V̇o2 during moderate exercise was reduced by ∼4% after 2.5 h and remained similarly reduced after 5 and 15 days of BR (P < 0.05). The ramp test peak power and the work rate at the gas exchange threshold (baseline: 322 ± 67 W and 89 ± 15 W, respectively) were elevated after 15 days of BR (331 ± 68 W and 105 ± 28 W; P < 0.05) but not PL (323 ± 68 W and 84 ± 18 W). These results indicate that dietary NO3− supplementation acutely reduces BP and the O2 cost of submaximal exercise and that these effects are maintained for at least 15 days if supplementation is continued.

Abstract:

The family of location and scale mixtures of Gaussians has the ability to generate a number of flexible distributional forms. The family nests as particular cases several important asymmetric distributions like the Generalized Hyperbolic distribution. The Generalized Hyperbolic distribution in turn nests many other well-known distributions such as the Normal Inverse Gaussian. In a multivariate setting, an extension of the standard location and scale mixture concept is proposed into a so-called multiple scaled framework, which has the advantage of allowing different tail and skewness behaviours in each dimension with arbitrary correlation between dimensions. Estimation of the parameters is provided via an EM algorithm and extended to cover the case of mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape.

Abstract:

We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
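For orientation, the standard (univariate-weight) Gaussian scale mixture that this abstract generalises is usually written as below. The symbols $\mu$, $\Sigma$, $W$, $f_W$, $\theta$, $D$, $A$ and $\Delta_w$ are notation assumed here for illustration; none of them appears in the abstract itself.

```latex
% Standard scale mixture of Gaussians in dimension M, with a single positive weight W:
p(y;\mu,\Sigma,\theta)
  = \int_0^\infty \mathcal{N}_M\!\left(y;\,\mu,\,\Sigma/w\right) f_W(w;\theta)\,\mathrm{d}w .

% Taking W \sim \mathrm{Gamma}(\nu/2,\,\nu/2) (shape, rate) yields the
% multivariate Student t distribution with \nu degrees of freedom.

% One plausible way to write a multidimensional-weight version (an assumption,
% not confirmed by the abstract): decompose \Sigma = D A D^{\top} with D orthogonal
% and A diagonal, and give each dimension its own weight,
p(y;\mu,D,A,\theta)
  = \int_0^\infty \!\!\cdots\! \int_0^\infty
    \mathcal{N}_M\!\left(y;\,\mu,\,D\,\Delta_w A\,D^{\top}\right)
    \prod_{m=1}^{M} f_{W_m}(w_m;\theta_m)\,\mathrm{d}w_1 \cdots \mathrm{d}w_M ,
\qquad \Delta_w = \operatorname{diag}\!\left(w_1^{-1},\dots,w_M^{-1}\right).
```

Under this reading, each weight $w_m$ controls the tail along one eigen-direction of $\Sigma$, which is what would let the marginal tailweights differ.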

Abstract:

Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is a challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations including lighting and background changes, occlusions, and changes in expression and make-up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus, and incorporate this into a novel two-stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues, after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able to effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.

Abstract:

This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
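The third step, k-medoids over pairwise scores, can be illustrated with a short sketch. It assumes the alignment scores have already been converted into a symmetric distance matrix (the abstract does not say how), uses the simple alternating assign/update variant of k-medoids, and takes the initial medoids as an argument so the run is deterministic. The function name `k_medoids` and the toy data are hypothetical.

```python
def k_medoids(dist, medoids, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist:    square list of lists, dist[i][j] = distance between items i and j
    medoids: initial medoid indices (assumed distinct)
    Returns (medoids, labels), where labels[i] is the position in `medoids`
    of the medoid closest to item i.
    """
    n = len(dist)
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: within each cluster, pick the member that minimises
        # the total distance to all other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    labels = [
        min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
        for i in range(n)
    ]
    return medoids, labels

# Toy example: six items on a line, standing in for six trajectories.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]

medoids, labels = k_medoids(dist, medoids=[0, 1])
print(medoids, labels)
```

Because k-medoids only ever looks up entries of `dist`, it works with any dissimilarity derived from alignment scores, with no need for coordinates or cluster means.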

Abstract:

Do SMEs cluster around different types of innovation activities? Are there patterns of SME innovation activities? To investigate, we develop a taxonomy of innovation activities in SMEs using a qualitative study, followed by a survey. First, based upon our qualitative research and literature review, we develop a comprehensive list of innovation activities SMEs typically engage in. We then conduct a factor analysis to determine if these activities can be combined into factors. We identify three innovation activity factors: R&D activities, incremental innovation activities and cost innovation activities. We use these factors to identify three clusters of firms engaging in similar innovation activities: active innovators, incremental innovators and opportunistic innovators. The clusters are enriched by validating that they also exhibit significant internal similarities and external differences in their innovation skills, demographics, industry segments and family business ownership. This research contributes to innovation and SME theory and practice by identifying SME clusters based upon their innovation activities.