42 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices


Relevance:

100.00%

Abstract:

Abstract: Purpose – This paper aims to document women's reflections on their careers over a ten-year period to provide quantitative baseline data on which to frame follow-up in-depth interviews. The participants work in the public service in Queensland (Australia) and had been recommended for, and participated in, women in management (WIM) courses conducted in the early 1990s. Design/methodology/approach – Data were collected by means of a survey (containing closed and open items) which gathered demographic data and data related to employment history, perceptions of success and satisfaction, and the women's future career expectations. Findings – Findings revealed that the percentage of women in middle and senior management had increased over the ten-year period, although not to the extent one might have anticipated, given that the women had been targeted as high flyers by their supervisors. While not content with their classification levels (i.e. seniority), the majority of the cohort viewed their careers as being successful. Practical implications – Questions arise from this study as to why women are still “not getting to the top”. There are also policy implications for the public service concerning women's possible “reinventive contribution” and training implications associated with women only courses. Originality/value – The study is part of an Australian longitudinal study on the careers of women who attended a prestigious women-only management course in the early 1990s in Queensland. This is now becoming a study of older women.

Relevance:

100.00%

Abstract:

Objective: To develop a standard weight descriptor that can be used for estimation of patient size for obese patients. Patients and methods: Data were available from 3849 patients: 2839 from oncology patients (index data set) and 1010 from general medical patients (validation data set). The patients had a wide range of age (16-100 years), weight (25-165 kg) and body mass index (BMI) (12-52 kg/m²) in both data sets. From the normal-weight patients in the oncology data set, an equation for male and female patients was developed to predict their normal weight as the sum of the lean body mass and normal fat body mass. The equations were evaluated by predicting the weight of patients in the general medical data set who had a normal BMI (30 kg/m²).

Relevance:

60.00%

Abstract:

Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
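The gene-screening step described in this abstract can be illustrated with a minimal sketch. It is not the EMMIX-GENE implementation: it assumes normal components in place of the t components named in the abstract, uses a simple median-split start for EM, and leaves out the thresholds on the statistic and on cluster size. The function names (`lrt_statistic`, `rank_genes`) are hypothetical. For each gene it computes the likelihood ratio statistic for one versus two mixture components and lists the genes with the largest statistic first.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_one_component(xs):
    """Maximised log-likelihood of a single normal fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return sum(normal_logpdf(x, mean, var) for x in xs)


def loglik_two_components(xs, iterations=200):
    """Log-likelihood of a two-component normal mixture fitted by EM.

    Assumption: EM starts from a split at the median; the variance floor
    of 1e-6 guards against degenerate components.
    """
    n = len(xs)
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    means = [sum(lower) / len(lower), sum(upper) / len(upper)]
    overall = sum(xs) / n
    start_var = max(sum((x - overall) ** 2 for x in xs) / n, 1e-6)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    loglik = float("-inf")
    for _ in range(iterations):
        # E-step: responsibilities, computed stably in log space.
        # The log-likelihood is evaluated at the current parameters.
        resp = []
        loglik = 0.0
        for x in xs:
            logs = [math.log(weights[k]) + normal_logpdf(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            scaled = [math.exp(v - top) for v in logs]
            total = sum(scaled)
            loglik += top + math.log(total)
            resp.append([s / total for s in scaled])
        # M-step: update weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return loglik


def lrt_statistic(xs):
    """Likelihood ratio statistic, -2 log(lambda), for one versus two components."""
    return max(0.0, 2.0 * (loglik_two_components(xs) - loglik_one_component(xs)))


def rank_genes(expression):
    """expression maps a gene name to its values across the tissue samples."""
    stats = {gene: lrt_statistic(values) for gene, values in expression.items()}
    return sorted(stats, key=stats.get, reverse=True), stats


# Toy data (invented for illustration): one clearly bimodal gene, one flat gene.
expression = {
    "bimodal": [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8],
    "flat": [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
}
order, stats = rank_genes(expression)
```

In this toy example the gene whose values split into two groups receives a much larger statistic than the evenly spread one, which is the property the screening step relies on.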

Relevance:

60.00%

Abstract:

In microarray studies, clustering techniques are often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Relevance:

60.00%

Abstract:

This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).

Relevance:

60.00%

Abstract:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions from k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups that show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.

Relevance:

50.00%

Abstract:

Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach for disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil map units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. © 2001 Elsevier Science B.V. All rights reserved.

Relevance:

50.00%

Abstract:

This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website, while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem. It is an important characteristic for a website that is to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem that represents the degree of success in the application of information technology to the business. We propose a framework of fuzzy logic for the representation of these two problems based on web data mining techniques to fuzzify the attributes of a website.
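As an illustration of what fuzzifying a website attribute can look like, here is a minimal sketch using standard shoulder and triangular membership functions. The attribute (daily visits), the linguistic terms and the breakpoints 100, 500 and 1000 are all assumptions made for this example; the abstract does not specify the framework's actual membership functions.

```python
def falling(x, start, end):
    """Left shoulder: membership 1 up to start, falling linearly to 0 at end."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    """Right shoulder: membership 0 up to start, rising linearly to 1 at end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def triangular(x, left, peak, right):
    """Triangle: 0 outside (left, right), 1 at peak; requires left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify_visits(daily_visits):
    """Map a crisp visit count to degrees of 'low', 'medium' and 'high' popularity.

    The breakpoints are hypothetical and chosen only for illustration.
    """
    return {
        "low": falling(daily_visits, 100, 500),
        "medium": triangular(daily_visits, 100, 500, 1000),
        "high": rising(daily_visits, 500, 1000),
    }


memberships = fuzzify_visits(300)
```

A site with 300 daily visits is then partly "low" and partly "medium" rather than being forced into a single crisp category, which is the sense in which popularity is treated as a fuzzy quantity.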

Relevance:

50.00%

Abstract:

Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.

Relevance:

50.00%

Abstract:

The RKKEE cluster of charged residues located within the cytoplasmic helix of the bacterial mechanosensitive channel, MscL, is essential for the channel function. The structure of MscL determined by x-ray crystallography and electron paramagnetic resonance spectroscopy has revealed discrepancies toward the C-terminus suggesting that the structure of the C-terminal helical bundle differs depending on the pH of the cytoplasm. In this study we examined the effect of pH as well as charge reversal and residue substitution within the RKKEE cluster on the mechanosensitivity of Escherichia coli MscL reconstituted into liposomes using the patch-clamp technique. Protonation of either positively or negatively charged residues within the cluster, achieved by changing the experimental pH or residue substitution within the RKKEE cluster, significantly increased the free energy of activation for the MscL channel due to an increase in activation pressure. Our data suggest that the orientation of the C-terminal helices relative to the aqueous medium is pH dependent, indicating that the RKKEE cluster functions as a proton sensor by adjusting the channel sensitivity to membrane tension in a pH-dependent fashion. A possible implication of our results for the physiology of bacterial cells is briefly discussed.

Relevance:

50.00%

Abstract:

Background & Aims: Steatosis is a frequent histologic finding in chronic hepatitis C (CHC), but it is unclear whether steatosis is an independent predictor for liver fibrosis. We evaluated the association between steatosis and fibrosis and their common correlates in persons with CHC and in subgroup analyses according to hepatitis C virus (HCV) genotype and body mass index. Methods: We conducted a meta-analysis on individual data from 3068 patients with histologically confirmed CHC recruited from 10 clinical centers in Italy, Switzerland, France, Australia, and the United States. Results: Steatosis was present in 1561 patients (50.9%) and fibrosis in 2688 (87.6%). HCV genotype was 1 in 1694 cases (55.2%), 2 in 563 (18.4%), 3 in 669 (21.8%), and 4 in 142 (4.6%). By stepwise logistic regression, steatosis was associated independently with genotype 3, the presence of fibrosis, diabetes, hepatic inflammation, ongoing alcohol abuse, higher body mass index, and older age. Fibrosis was associated independently with inflammatory activity, steatosis, male sex, and older age, whereas HCV genotype 2 was associated with reduced fibrosis. In the subgroup analyses, the association between steatosis and fibrosis invariably was dependent on a simultaneous association between steatosis and hepatic inflammation. Conclusions: In this large and geographically different group of CHC patients, steatosis is confirmed as significantly and independently associated with fibrosis in CHC. Hepatic inflammation may mediate fibrogenesis in patients with liver steatosis. Control of metabolic factors (such as overweight, via lifestyle adjustments) appears important in the management of CHC.

Relevance:

50.00%

Abstract:

Quality of life has been shown to be poor among people living with chronic hepatitis C. However, it is not clear how this relates to the presence of symptoms and their severity. The aim of this study was to describe the typology of a broad array of symptoms that were attributed to hepatitis C virus (HCV) infection. Phase 1 used qualitative methods to identify symptoms. In Phase 2, 188 treatment-naive people living with HCV participated in a quantitative survey. The most prevalent symptom was physical tiredness (86%) followed by irritability (75%), depression (70%), mental tiredness (70%), and abdominal pain (68%). Temporal clustering of symptoms was reported in 62% of participants. Principal components analysis identified four symptom clusters: neuropsychiatric (mental tiredness, poor concentration, forgetfulness, depression, irritability, physical tiredness, and sleep problems); gastrointestinal (day sweats, nausea, food intolerance, night sweats, abdominal pain, poor appetite, and diarrhea); algesic (joint pain, muscle pain, and general body pain); and dysesthetic (noise sensitivity, light sensitivity, skin problems, and headaches). These data demonstrate that symptoms are prevalent in treatment-naive people with HCV and support the hypothesis that symptom clustering occurs.

Relevance:

50.00%

Abstract:

In this paper we present an efficient k-means clustering algorithm for two-dimensional data. The proposed algorithm reorganizes the dataset into the form of a nested binary tree. Data items are compared at each node with only the two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows. We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly, we compare the data item at each node with only the two nearest means to assign the value to the intended cluster. In this way we are able to reduce the computational cost significantly by reducing the number of comparisons with means and by minimizing the use of the Euclidean distance formula. Our results showed that our method can perform the clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
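The saving from comparing each item with only two neighbouring means can be seen in a simplified one-dimensional sketch. This is not the paper's nested-binary-tree algorithm for two-dimensional data: it assumes a single dimension, keeps the means sorted, and uses a binary search so that each value is compared only with the means immediately to its left and right. The function names (`assign_1d`, `kmeans_1d`) are hypothetical.

```python
from bisect import bisect_left


def assign_1d(values, means):
    """Label each value with the index of its nearest mean.

    In one dimension the nearest mean is always one of the two sorted
    means that bracket the value, so two comparisons per value suffice.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    sorted_means = [means[i] for i in order]
    labels = []
    for x in values:
        pos = bisect_left(sorted_means, x)
        # Candidate neighbours: the mean just below x and the mean at or above x.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_means)]
        best = min(candidates, key=lambda p: abs(x - sorted_means[p]))
        labels.append(order[best])
    return labels


def kmeans_1d(values, initial_means, iterations=10):
    """Lloyd-style k-means in one dimension using the two-neighbour assignment."""
    means = list(initial_means)
    for _ in range(iterations):
        labels = assign_1d(values, means)
        for k in range(len(means)):
            members = [x for x, label in zip(values, labels) if label == k]
            if members:  # leave an empty cluster's mean unchanged
                means[k] = sum(members) / len(members)
    return means, assign_1d(values, means)


# Toy data invented for illustration.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_means, final_labels = kmeans_1d(values, [0.0, 6.0])
```

With k means, a plain assignment step costs k distance evaluations per item, whereas the sorted lookup costs one binary search and at most two comparisons.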

Relevance:

40.00%

Abstract:

The Las Campanas Observatory and Anglo-Australian Telescope Rich Cluster Survey (LARCS) is a panoramic imaging and spectroscopic survey of an X-ray luminosity-selected sample of 21 clusters of galaxies at 0.07 < z < 0.16. Charge-coupled device (CCD) imaging was obtained in B and R of typically 2-degree-wide regions centred on the 21 clusters, and the galaxy sample selected from the imaging is being used for an on-going spectroscopic survey of the clusters with the 2dF spectrograph on the Anglo-Australian Telescope. This paper presents the reduction of the imaging data and the photometric analysis used in the survey. Based on an overlapping area of 12.3 deg², we compare the CCD-based LARCS catalogue with the photographic-based galaxy catalogue used for the input to the 2dF Galaxy Redshift Survey (2dFGRS) from the APM, to the completeness of the GRS/APM catalogue, b_J = 19.45. This comparison confirms the reliability of the photometry across our mosaics and between the clusters in our survey. This comparison also provides useful information concerning the properties of the GRS/APM. The stellar contamination in the GRS/APM galaxy catalogue is confirmed as around 5-10 per cent, as originally estimated. However, using the superior sensitivity and spatial resolution in the LARCS survey, evidence is found for four distinct populations of galaxies that are systematically omitted from the GRS/APM catalogue. The characteristics of the 'missing' galaxy populations are described, the reasons for their absence are examined, and the impact they will have on the conclusions drawn from the 2dF Galaxy Redshift Survey is discussed.