933 results for agglomerative clustering


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Introduction and aims: Despite evidence that many Australian adolescents have considerable experience with various drug types, little is known about the extent to which adolescents use multiple substances. The aim of this study was to examine the degree of clustering of drug types within individuals, and the extent to which demographic and psychosocial predictors are related to cluster membership. Design and method: A sample of 1402 adolescents aged 12-17 years was extracted from the Australian 2007 National Drug Strategy Household Survey. Extracted data included lifetime use of 10 substances, gender, psychological distress, physical health, perceived peer substance use, socioeconomic disadvantage, and regionality. Latent class analysis was used to determine clusters, and multinomial logistic regression was employed to examine predictors of cluster membership. Results: There were 3 latent classes. The great majority (79.6%) of adolescents used alcohol only, 18.3% were limited range multidrug users (encompassing alcohol, tobacco, and marijuana), and 2% were extended range multidrug users. Perceived peer drug use and psychological distress predicted limited and extended multiple drug use. Psychological distress was a more significant predictor of extended multidrug use compared to limited multidrug use. Discussion and conclusion: In the Australian school-based prevention setting, a very strong focus on alcohol use and the linkages between alcohol, tobacco and marijuana is warranted. Psychological distress may be an important target for screening and early intervention for adolescents who use multiple drugs.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A provenance study of Iron Age pottery specimens originating from the Mngeni river area in South Africa was carried out by applying XRF spectrometry. A total of sixteen major and trace elements were analysed in a batch of 107 potsherds excavated from four different archaeological sites in the aforementioned area. The multivariate statistical programme Correspondence Analysis was used in this study to obtain the relevant clustering patterns according to the similarity of the elemental distributions. Differences and similarities in the clusters obtained for the major and trace elements are discussed.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

PURPOSE The purpose of this study was to demonstrate the potential of near infrared (NIR) spectroscopy for characterizing the health and degenerative state of articular cartilage based on the components of the Mankin score. METHODS Three models of osteoarthritic degeneration induced in laboratory rats by anterior cruciate ligament (ACL) transection, meniscectomy (MSX), and intra-articular injection of monoiodoacetate (1 mg) (MIA) were used in this study. Degeneration was induced in the right knee joint; each model group consisted of 12 rats (N = 36). After 8 weeks, the animals were euthanized and knee joints were collected. A custom-made diffuse reflectance NIR probe of 5-mm diameter was placed on the tibial and femoral surfaces, and spectral data were acquired from each specimen in the wave number range of 4,000 to 12,500 cm⁻¹. After spectral data acquisition, the specimens were fixed and safranin O staining (SOS) was performed to assess disease severity based on the Mankin scoring system. Using multivariate statistical analysis, with spectral preprocessing and wavelength selection technique, the spectral data were then correlated to the structural integrity (SI), cellularity (CEL), and matrix staining (SOS) components of the Mankin score for all the samples tested. RESULTS ACL models showed mild cartilage degeneration, MSX models had moderate degeneration, and MIA models showed severe cartilage degenerative changes both morphologically and histologically. Our results reveal significant linear correlations between the NIR absorption spectra and SI (R² = 94.78%), CEL (R² = 88.03%), and SOS (R² = 96.39%) parameters of all samples in the models. In addition, clustering of the samples according to their level of degeneration, with respect to the Mankin components, was also observed. CONCLUSIONS NIR spectroscopic probing of articular cartilage can potentially provide critical information about the health of articular cartilage matrix in early and advanced stages of osteoarthritis (OA). CLINICAL RELEVANCE This rapid nondestructive method can facilitate clinical appraisal of articular cartilage integrity during arthroscopic surgery.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Descriptions of patients' injuries are recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural network, instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding each misspelt word and replacing it with a sound-alike word. Meaningful phrases have been identified and kept, instead of removing part of a phrase as a stop word. Abbreviations appearing in many forms of entry are manually identified and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature spaces have been built. In experiments, a set of tests is conducted to show which classification method is best for medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision, which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as their removal affects the classification performance.
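The contrast between binary and TF/IDF weighting mentioned in the abstract above can be illustrated with a minimal sketch. The tokenised narratives, the function names, and the plain `log(N/df)` form of IDF are all assumptions made for illustration; the abstract does not state which TF/IDF variant or tooling the authors used.

```python
import math
from collections import Counter


def binary_weights(doc_tokens, vocabulary):
    """1.0 if the term occurs in the document, else 0.0."""
    present = set(doc_tokens)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def tfidf_weights(doc_tokens, vocabulary, corpus):
    """Raw term frequency times log(N / document frequency)."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = []
    for term in vocabulary:
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log(n_docs / df) if df else 0.0
        weights.append(counts[term] * idf)
    return weights


# Hypothetical short injury narratives, already tokenised.
corpus = [
    ["fell", "off", "ladder"],
    ["cut", "finger", "knife"],
    ["fell", "bike", "cut", "knee"],
]
vocabulary = sorted({term for doc in corpus for term in doc})

binary_vec = binary_weights(corpus[0], vocabulary)
tfidf_vec = tfidf_weights(corpus[0], vocabulary, corpus)
```

In documents this short almost every term occurs once, so the TF part carries little information and the weights are driven by IDF alone; binary weighting simply records presence.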

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we summarize our recent work in analyzing and predicting behaviors in sports using spatiotemporal data. We specifically focus on two recent works: 1) predicting the location of a shot in tennis using Hawk-Eye tennis data, and 2) clustering spatiotemporal plays in soccer to discover the methods by which teams in a professional league get a shot on goal.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Identifying product families has been considered an effective way to accommodate the increasing product varieties across diverse market niches. In this paper, we propose a novel framework for identifying product families by using a similarity measure for a common product design data structure, the BOM (Bill of Materials), based on data mining techniques such as frequent mining and clustering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only on the content and topology but also on the frequent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product families. When applied to real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper outlines the approach taken by the Speech, Audio, Image and Video Technologies laboratory and the Applied Data Mining Research Group (SAIVT-ADMRG) in the 2014 MediaEval Social Event Detection (SED) task. We participated in the event-based clustering subtask (subtask 1), and focused on investigating the incorporation of image features as another source of data to aid clustering. In particular, we developed a descriptor based on super-pixel segmentation that allows a low-dimensional feature incorporating both colour and texture information to be extracted and used within the popular bag-of-visual-words (BoVW) approach.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Transit passenger market segmentation enables transit operators to identify different classes of transit users for targeted surveys and various operational and strategic planning improvements. However, the existing market segmentation studies in the literature have generally been done using passenger surveys, which have various limitations. The smart card (SC) data from an automated fare collection system facilitate the understanding of the multiday travel pattern of transit passengers and can be used to segment them into identifiable types of similar behaviors and needs. This paper proposes a comprehensive methodology for passenger segmentation solely using SC data. After reconstructing the travel itineraries from SC transactions, this paper adopts the density-based spatial clustering of applications with noise (DBSCAN) algorithm to mine the travel pattern of each SC user. An a priori market segmentation approach then segments transit passengers into four identifiable types. The methodology proposed in this paper helps transit operators understand their passengers and provide them with targeted information and services.
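As a rough illustration of how DBSCAN could pick out a passenger's habitual travel times, here is a minimal pure-Python sketch of the algorithm. The boarding times, `eps`, and `min_pts` values are hypothetical; the abstract does not specify the features or parameters used in the paper.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical boarding times (hours of day) for one smart card over several days.
boardings = [(7.9,), (8.0,), (8.1,), (8.05,), (17.5,), (17.6,), (17.4,), (12.3,)]
labels = dbscan(boardings, eps=0.25, min_pts=3)
```

With these values the morning and evening boardings form two clusters and the single midday trip is labelled as noise, which is the kind of regular-versus-irregular distinction a travel-pattern analysis might build on.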

Relevance:

10.00% 10.00%

Publisher:

Abstract:

There is a wide range of potential study designs for intervention studies to decrease nosocomial infections in hospitals. The analysis is complex due to competing events, clustering, multiple timescales and time-dependent period and intervention variables. This review considers the popular pre-post quasi-experimental design and compares it with randomized designs. Randomization can be done in several ways: randomization of the cluster [intensive care unit (ICU) or hospital] in a parallel design; randomization of the sequence in a cross-over design; and randomization of the time of intervention in a stepped-wedge design. We introduce each design in the context of nosocomial infections and discuss the designs with respect to the following key points: bias, control for nonintervention factors, and generalizability. Statistical issues are discussed. A pre-post-intervention design is often the only choice that will be informative for a retrospective analysis of an outbreak setting. It can be seen as a pilot study with further, more rigorous designs needed to establish causality. To yield internally valid results, randomization is needed. Generally, the first choice in terms of the internal validity should be a parallel cluster randomized trial. However, generalizability might be stronger in a stepped-wedge design because a wider range of ICU clinicians may be convinced to participate, especially if there are pilot studies with promising results. For analysis, the use of extended competing risk models is recommended.
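To make the stepped-wedge idea concrete, the sketch below randomizes the time at which each cluster (for example, an ICU) crosses over from control to intervention. The function name, the cluster labels, and the one-cluster-per-step layout are illustrative assumptions, not details taken from the review.

```python
import random


def stepped_wedge_schedule(clusters, n_periods, seed=None):
    """Map each cluster to a list of 0/1 flags per period (1 = intervention).

    Every cluster starts under control in period 0 and crosses over at a
    randomly assigned step; once switched, it stays on the intervention.
    """
    clusters = list(clusters)
    if n_periods < len(clusters) + 1:
        raise ValueError("need at least one period per cluster plus a baseline")
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)  # the randomization: which cluster switches at which step
    schedule = {}
    for step, cluster in enumerate(order, start=1):
        schedule[cluster] = [1 if period >= step else 0 for period in range(n_periods)]
    return schedule


# Hypothetical ICUs; four periods give a baseline plus one step per unit.
schedule = stepped_wedge_schedule(["ICU-A", "ICU-B", "ICU-C"], n_periods=4, seed=0)
```

By contrast, a parallel cluster randomized design would assign each unit to a single arm for the whole study, and a cross-over design would randomize the order of the two conditions within each unit.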

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole - term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
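One composition-aware measure from the CoDA literature is the Aitchison distance, the Euclidean distance between centered log-ratio (clr) transforms. The abstract does not say which measures the authors evaluated, so the sketch below is only an illustration of the general idea, with hypothetical term counts.

```python
import math


def closure(counts):
    """Normalise non-negative counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centered log-ratio transform; every part must be strictly positive."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aitchison(a, b):
    """Euclidean distance between clr-transformed compositions."""
    return euclidean(clr(a), clr(b))


# Hypothetical term counts for three documents over a three-term vocabulary.
doc_a = closure([1, 49, 50])
doc_b = closure([2, 49, 49])  # the rare term doubles relative to doc_a
doc_c = closure([1, 39, 60])  # ten counts move between the two common terms
```

Here plain Euclidean distance on the proportions treats `doc_b` as much closer to `doc_a` than `doc_c` is, while the Aitchison distance ranks them the other way round because it responds to the relative (two-fold) change in the rare term. Zero counts need separate handling (for example, a replacement strategy) before the log transform can be applied.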

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this work, we present the challenges associated with the two-way recommendation methods in social networks and the solutions. We discuss them from the perspective of community-type social networks such as online dating networks.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Samples of Forsythia suspensa from raw (Laoqiao) and ripe (Qingqiao) fruit were analyzed with the use of HPLC-DAD and the EIS-MS techniques. Seventeen peaks were detected, and of these, twelve were identified. Most were related to the glucopyranoside molecular fragment. Samples collected from three geographical areas (Shanxi, Henan and Shandong Provinces) were discriminated with the use of hierarchical clustering analysis (HCA), discriminant analysis (DA), and principal component analysis (PCA) models, but only PCA was able to provide further information about the relationships between objects and loadings; eight peaks were related to the provinces of sample origin. The supervised classification models, K-nearest neighbor (KNN), least squares support vector machines (LS-SVM), and counter propagation artificial neural network (CP-ANN), all indicated successful classification, with KNN producing a 100% classification rate. Thus, the fruit were discriminated on the basis of their places of origin.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately, due to the heavy occlusion between players, variation in resolution and pose, and fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e. missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach, though, is that the compressibility is low (i.e. the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model using a role representation to clean up the noisy detections, which operates in a low-dimensional space. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras, evaluated it on approximately 200,000 frames of data from a state-of-the-art real-time player detector, and compared it to manually labeled data.
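The occupancy-map baseline described in this abstract can be sketched in a few lines. The grid resolution, the pitch dimensions, and the function names below are assumptions chosen for illustration; the abstract does not give the zone layout the authors used.

```python
def occupancy_map(detections, field_length, field_width, n_cols, n_rows):
    """Count player detections per zone of a pitch split into n_rows x n_cols."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in detections:
        if not (0 <= x <= field_length and 0 <= y <= field_width):
            continue  # ignore detections that fall outside the pitch
        col = min(int(x / field_length * n_cols), n_cols - 1)
        row = min(int(y / field_width * n_rows), n_rows - 1)
        grid[row][col] += 1
    return grid


def play_feature(frames, **grid_args):
    """Concatenate per-frame occupancy maps into one flat feature vector."""
    feature = []
    for detections in frames:
        for row in occupancy_map(detections, **grid_args):
            feature.extend(row)
    return feature


# Hypothetical detections (metres) over two frames; the second frame has a
# missed detection. Pitch size assumed to be 91.4 m x 55 m.
frames = [[(10.0, 5.0), (80.0, 50.0)], [(10.0, 5.0)]]
feature = play_feature(frames, field_length=91.4, field_width=55.0, n_cols=4, n_rows=2)
```

A missed detection only changes a count, so the feature vector keeps a fixed length regardless of how many players were detected, but the vector grows with every frame and zone, which is the high-dimensionality problem the abstract points to.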

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Objectives To determine the frequency and types of stressful events experienced by urban Aboriginal and Torres Strait Islander children, and to explore the relationship between these experiences and the children’s physical health and parental concerns about their behaviour and learning ability. Design, setting and participants Cross-sectional study of Aboriginal and Torres Strait Islander children aged ≤ 14 years presenting to an urban Indigenous primary health care service in Brisbane for annual child health checks between March 2007 and March 2010. Main outcome measures Parental or carer report of stressful events ever occurring in the family that may have affected the child. Results Of 344 participating children, 175 (51%) had experienced at least one stressful event. Reported events included the death of a family member or close friend (40; 23%), parental divorce or separation (28; 16%), witness to violence or abuse (20; 11%), or incarceration of a family member (7; 4%). These children were more likely to have parents or carers concerned about their behaviour (P < 0.001) and to have a history of ear (P < 0.001) or skin (P = 0.003) infections. Conclusions Children who had experienced stressful events had poorer physical health and more parental concern about behavioural issues than those who had not. Parental disclosure in the primary health care setting of stressful events that have affected the child necessitates appropriate medical, psychological or social interventions to ameliorate both the immediate and potential lifelong negative impact. However, treating the impact of stressful events is insufficient without dealing with the broader political and societal issues that result in a clustering of stressful events in the Aboriginal and Torres Strait Islander population.