935 resultados para clustering


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper outlines the approach taken by the Speech, Audio, Image and Video Technologies laboratory, and the Applied Data Mining Research Group (SAIVT-ADMRG) in the 2014 MediaEval Social Event Detection (SED) task. We participated in the event based clustering subtask (subtask 1), and focused on investigating the incorporation of image features as another source of data to aid clustering. In particular, we developed a descriptor based around the use of super-pixel segmentation, that allows a low dimensional feature that incorporates both colour and texture information to be extracted and used within the popular bag-of-visual-words (BoVW) approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Transit passenger market segmentation enables transit operators to target different classes of transit users for targeted surveys and various operational and strategic planning improvements. However, the existing market segmentation studies in the literature have been generally done using passenger surveys, which have various limitations. The smart card (SC) data from an automated fare collection system facilitate the understanding of the multiday travel pattern of transit passengers and can be used to segment them into identifiable types of similar behaviors and needs. This paper proposes a comprehensive methodology for passenger segmentation solely using SC data. After reconstructing the travel itineraries from SC transactions, this paper adopts the density-based spatial clustering of application with noise (DBSCAN) algorithm to mine the travel pattern of each SC user. An a priori market segmentation approach then segments transit passengers into four identifiable types. The methodology proposed in this paper assists transit operators to understand their passengers and provides them oriented information and services.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There is a wide range of potential study designs for intervention studies to decrease nosocomial infections in hospitals. The analysis is complex due to competing events, clustering, multiple timescales and time-dependent period and intervention variables. This review considers the popular pre-post quasi-experimental design and compares it with randomized designs. Randomization can be done in several ways: randomization of the cluster [intensive care unit (ICU) or hospital] in a parallel design; randomization of the sequence in a cross-over design; and randomization of the time of intervention in a stepped-wedge design. We introduce each design in the context of nosocomial infections and discuss the designs with respect to the following key points: bias, control for nonintervention factors, and generalizability. Statistical issues are discussed. A pre-post-intervention design is often the only choice that will be informative for a retrospective analysis of an outbreak setting. It can be seen as a pilot study with further, more rigorous designs needed to establish causality. To yield internally valid results, randomization is needed. Generally, the first choice in terms of the internal validity should be a parallel cluster randomized trial. However, generalizability might be stronger in a stepped-wedge design because a wider range of ICU clinicians may be convinced to participate, especially if there are pilot studies with promising results. For analysis, the use of extended competing risk models is recommended.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole - term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this work, we present the challenges associated with the two-way recommendation methods in social networks and the solutions. We discuss them from the perspective of community-type social networks such as online dating networks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Samples of Forsythia suspensa from raw (Laoqiao) and ripe (Qingqiao) fruit were analyzed with the use of HPLC-DAD and the EIS-MS techniques. Seventeen peaks were detected, and of these, twelve were identified. Most were related to the glucopyranoside molecular fragment. Samples collected from three geographical areas (Shanxi, Henan and Shandong Provinces), were discriminated with the use of hierarchical clustering analysis (HCA), discriminant analysis (DA), and principal component analysis (PCA) models, but only PCA was able to provide further information about the relationships between objects and loadings; eight peaks were related to the provinces of sample origin. The supervised classification models-K-nearest neighbor (KNN), least squares support vector machines (LS-SVM), and counter propagation artificial neural network (CP-ANN) methods, indicated successful classification but KNN produced 100% classification rate. Thus, the fruit were discriminated on the basis of their places of origin.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately however, due to the heavy occlusion between players, variation in resolution and pose, in addition to fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e. missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach though is that the compressibility is low (i.e. the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model using a role representation to clean-up the noisy detections which operates in a low-dimensional space. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labeled data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Objectives To determine the frequency and types of stressful events experienced by urban Aboriginal and Torres Strait Islander children, and to explore the relationship between these experiences and the children’s physical health and parental concerns about their behaviour and learning ability. Design, setting and participants Cross-sectional study of Aboriginal and Torres Strait Islander children aged ≤ 14 years presenting to an urban Indigenous primary health care service in Brisbane for annual child health checks between March 2007 and March 2010. Main outcome measures Parental or carer report of stressful events ever occurring in the family that may have affected the child. Results Of 344 participating children, 175 (51%) had experienced at least one stressful event. Reported events included the death of a family member or close friend (40; 23%), parental divorce or separation (28; 16%), witness to violence or abuse (20; 11%), or incarceration of a family member (7; 4%). These children were more likely to have parents or carers concerned about their behaviour (P < 0.001) and to have a history of ear (P < 0.001) or skin (P = 0.003) infections. Conclusions Children who had experienced stressful events had poorer physical health and more parental concern about behavioural issues than those who had not. Parental disclosure in the primary health care setting of stressful events that have affected the child necessitates appropriate medical, psychological or social interventions to ameliorate both the immediate and potential lifelong negative impact. However, treating the impact of stressful events is insufficient without dealing with the broader political and societal issues that result in a clustering of stressful events in the Aboriginal and Torres Strait Islander population.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We propose the use of optical flow information as a method for detecting and describing changes in the environment, from the perspective of a mobile camera. We analyze the characteristics of the optical flow signal and demonstrate how robust flow vectors can be generated and used for the detection of depth discontinuities and appearance changes at key locations. To successfully achieve this task, a full discussion on camera positioning, distortion compensation, noise filtering, and parameter estimation is presented. We then extract statistical attributes from the flow signal to describe the location of the scene changes. We also employ clustering and dominant shape of vectors to increase the descriptiveness. Once a database of nodes (where a node is a detected scene change) and their corresponding flow features is created, matching can be performed whenever nodes are encountered, such that topological localization can be achieved. We retrieve the most likely node according to the Mahalanobis and Chi-square distances between the current frame and the database. The results illustrate the applicability of the technique for detecting and describing scene changes in diverse lighting conditions, considering indoor and outdoor environments and different robot platforms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Smart Card Automated Fare Collection (AFC) data has been extensively exploited to understand passenger behavior, passenger segment, trip purpose and improve transit planning through spatial travel pattern analysis. The literature has been evolving from simple to more sophisticated methods such as from aggregated to individual travel pattern analysis, and from stop-to-stop to flexible stop aggregation. However, the issue of high computing complexity has limited these methods in practical applications. This paper proposes a new algorithm named Weighted Stop Density Based Scanning Algorithm with Noise (WS-DBSCAN) based on the classical Density Based Scanning Algorithm with Noise (DBSCAN) algorithm to detect and update the daily changes in travel pattern. WS-DBSCAN converts the classical quadratic computation complexity DBSCAN to a problem of sub-quadratic complexity. The numerical experiment using the real AFC data in South East Queensland, Australia shows that the algorithm costs only 0.45% in computation time compared to the classical DBSCAN, but provides the same clustering results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In recent years, considerable research efforts have been directed to micro-array technologies and their role in providing simultaneous information on expression profiles for thousands of genes. These data, when subjected to clustering and classification procedures, can assist in identifying patterns and providing insight on biological processes. To understand the properties of complex gene expression datasets, graphical representations can be used. Intuitively, the data can be represented in terms of a bipartite graph, with weighted edges corresponding to gene-sample node couples in the dataset. Biologically meaningful subgraphs can be sought, but performance can be influenced both by the search algorithm, and, by the graph-weighting scheme and both merit rigorous investigation. In this paper, we focus on edge-weighting schemes for bipartite graphical representation of gene expression. Two novel methods are presented: the first is based on empirical evidence; the second on a geometric distribution. The schemes are compared for several real datasets, assessing efficiency of performance based on four essential properties: robustness to noise and missing values, discrimination, parameter influence on scheme efficiency and reusability. Recommendations and limitations are briefly discussed. Keywords: Edge-weighting; weighted graphs; gene expression; bi-clustering

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In providing simultaneous information on expression profiles for thousands of genes, microarray technologies have, in recent years, been largely used to investigate mechanisms of gene expression. Clustering and classification of such data can, indeed, highlight patterns and provide insight on biological processes. A common approach is to consider genes and samples of microarray datasets as nodes in a bipartite graphs, where edges are weighted e.g. based on the expression levels. In this paper, using a previously-evaluated weighting scheme, we focus on search algorithms and evaluate, in the context of biclustering, several variations of Genetic Algorithms. We also introduce a new heuristic “Propagate”, which consists in recursively evaluating neighbour solutions with one more or one less active conditions. The results obtained on three well-known datasets show that, for a given weighting scheme,optimal or near-optimal solutions can be identified.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis in software engineering presents a novel automated framework to identify similar operations utilized by multiple algorithms for solving related computing problems. It provides a new effective solution to perform multi-application based algorithm analysis, employing fundamentally light-weight static analysis techniques compared to the state-of-art approaches. Significant performance improvements are achieved across the objective algorithms through enhancing the efficiency of the identified similar operations, targeting discrete application domains.