925 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
Two deep ice cores from central Greenland, drilled in the 1990s, have played a key role in climate reconstructions of the Northern Hemisphere, but the oldest sections of the cores were disturbed in chronology owing to ice folding near the bedrock. Here we present an undisturbed climate record from a North Greenland ice core, which extends back to 123,000 years before the present, within the last interglacial period. The oxygen isotopes in the ice imply that climate was stable during the last interglacial period, with temperatures 5 °C warmer than today. We find unexpectedly large temperature differences between our new record from northern Greenland and the undisturbed sections of the cores from central Greenland, suggesting that the extent of ice in the Northern Hemisphere modulated the latitudinal temperature gradients in Greenland. This record shows a slow decline in temperatures that marked the initiation of the last glacial period. Our record reveals a hitherto unrecognized warm period initiated by an abrupt climate warming about 115,000 years ago, before glacial conditions were fully developed. This event does not appear to have an immediate Antarctic counterpart, suggesting that the climate see-saw between the hemispheres (which dominated the last glacial period) was not operating at this time.
Abstract:
"May 1980."
Abstract:
"GAO-02-224."
Abstract:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem. It is an important characteristic of a website in order to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem that represents the degree of success in the application of information technology to the business. We propose a framework of fuzzy logic for the representation of these two problems based on web data mining techniques to fuzzify the attributes of a website.
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
Abstract:
The RKKEE cluster of charged residues located within the cytoplasmic helix of the bacterial mechanosensitive channel, MscL, is essential for the channel function. The structure of MscL determined by x-ray crystallography and electron paramagnetic resonance spectroscopy has revealed discrepancies toward the C-terminus suggesting that the structure of the C-terminal helical bundle differs depending on the pH of the cytoplasm. In this study we examined the effect of pH as well as charge reversal and residue substitution within the RKKEE cluster on the mechanosensitivity of Escherichia coli MscL reconstituted into liposomes using the patch-clamp technique. Protonation of either positively or negatively charged residues within the cluster, achieved by changing the experimental pH or residue substitution within the RKKEE cluster, significantly increased the free energy of activation for the MscL channel due to an increase in activation pressure. Our data suggest that the orientation of the C-terminal helices relative to the aqueous medium is pH dependent, indicating that the RKKEE cluster functions as a proton sensor by adjusting the channel sensitivity to membrane tension in a pH-dependent fashion. A possible implication of our results for the physiology of bacterial cells is briefly discussed.
Abstract:
Background & Aims: Steatosis is a frequent histologic finding in chronic hepatitis C (CHC), but it is unclear whether steatosis is an independent predictor for liver fibrosis. We evaluated the association between steatosis and fibrosis and their common correlates in persons with CHC and in subgroup analyses according to hepatitis C virus (HCV) genotype and body mass index. Methods: We conducted a meta-analysis on individual data from 3068 patients with histologically confirmed CHC recruited from 10 clinical centers in Italy, Switzerland, France, Australia, and the United States. Results: Steatosis was present in 1561 patients (50.9%) and fibrosis in 2688 (87.6%). HCV genotype was 1 in 1694 cases (55.2%), 2 in 563 (18.4%), 3 in 669 (21.8%), and 4 in 142 (4.6%). By stepwise logistic regression, steatosis was associated independently with genotype 3, the presence of fibrosis, diabetes, hepatic inflammation, ongoing alcohol abuse, higher body mass index, and older age. Fibrosis was associated independently with inflammatory activity, steatosis, male sex, and older age, whereas HCV genotype 2 was associated with reduced fibrosis. In the subgroup analyses, the association between steatosis and fibrosis invariably was dependent on a simultaneous association between steatosis and hepatic inflammation. Conclusions: In this large and geographically different group of CHC patients, steatosis is confirmed as significantly and independently associated with fibrosis in CHC. Hepatic inflammation may mediate fibrogenesis in patients with liver steatosis. Control of metabolic factors (such as overweight, via lifestyle adjustments) appears important in the management of CHC.
Abstract:
Quality of life has been shown to be poor among people living with chronic hepatitis C. However, it is not clear how this relates to the presence of symptoms and their severity. The aim of this study was to describe the typology of a broad array of symptoms that were attributed to hepatitis C virus (HCV) infection. Phase 1 used qualitative methods to identify symptoms. In Phase 2, 188 treatment-naive people living with HCV participated in a quantitative survey. The most prevalent symptom was physical tiredness (86%) followed by irritability (75%), depression (70%), mental tiredness (70%), and abdominal pain (68%). Temporal clustering of symptoms was reported in 62% of participants. Principal components analysis identified four symptom clusters: neuropsychiatric (mental tiredness, poor concentration, forgetfulness, depression, irritability, physical tiredness, and sleep problems); gastrointestinal (day sweats, nausea, food intolerance, night sweats, abdominal pain, poor appetite, and diarrhea); algesic (joint pain, muscle pain, and general body pain); and dysesthetic (noise sensitivity, light sensitivity, skin problems, and headaches). These data demonstrate that symptoms are prevalent in treatment-naive people with HCV and support the hypothesis that symptom clustering occurs.
Abstract:
In this paper we present an efficient k-Means clustering algorithm for two-dimensional data. The proposed algorithm reorganizes the dataset into a nested binary tree. Data items are compared at each node with only the two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare the data item at each node to only the two nearest means to assign the value to the intended cluster. In this way we are able to reduce the computational cost significantly by reducing the number of comparisons with means and also by minimizing use of the Euclidean distance formula. Our results showed that our method can perform the clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
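The idea of comparing each item with only its two nearest means can be illustrated in one dimension. The sketch below is not the authors' nested-binary-tree algorithm; it is a simplified stand-in that assumes the means are sorted along a single axis and uses binary search to find the two means that bracket each value. The function name `assign_1d` is hypothetical.

```python
from bisect import bisect_left

def assign_1d(values, means):
    """Assign each 1-D value to the index of its nearest mean.

    Assumption: this is a one-dimensional illustration only. After sorting
    the means, the nearest mean to a value must be one of the two means
    that bracket it, so only those two are compared.
    """
    # Remember each mean's original index so labels refer to the input order.
    order = sorted(range(len(means)), key=lambda i: means[i])
    sorted_means = [means[i] for i in order]
    labels = []
    for v in values:
        pos = bisect_left(sorted_means, v)
        # Candidates: the mean just below v and the mean at or above v.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_means)]
        best = min(candidates, key=lambda p: abs(sorted_means[p] - v))
        labels.append(order[best])
    return labels
```

With `k` means, each assignment costs one binary search plus at most two distance comparisons instead of `k`, which is the kind of saving the abstract describes.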
Abstract:
PURPOSE: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. METHODS: The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. RESULTS: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. CONCLUSION: These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research.
Abstract:
Funding and trial registration: Scottish Government Chief Scientist Office grant CZH/3/17. ClinicalTrials.gov registration NCT01602705.
Abstract:
The design of reverse logistics networks has now emerged as a major issue for manufacturers, not only in developed countries where legislation and societal pressures are strong, but also in developing countries where the adoption of reverse logistics practices may offer a competitive advantage. This paper presents a new model for partner selection for reverse logistics centres in green supply chains. The model offers three advantages. Firstly, it enables economic, environmental, and social factors to be considered simultaneously. Secondly, by integrating fuzzy set theory and artificial immune optimization technology, it enables both quantitative and qualitative criteria to be considered simultaneously throughout the whole decision-making process. Thirdly, it extends the flat criteria structure for partner selection evaluation for reverse logistics centres to the more suitable hierarchy structure. The applicability of the model is demonstrated by means of an empirical application based on data from a Chinese electronic equipment and instruments manufacturing company.
Abstract:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
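Three of the quality metrics named above can be sketched in a few lines. This is an illustrative implementation, not the tool described in the abstract: it assumes clusters are given as lists of coordinate tuples, uses one common variant of the Dunn index (smallest between-cluster point distance over largest within-cluster diameter), and applies the Xie-Beni index to a crisp partition by treating memberships as 0 or 1. All function names are hypothetical, and the silhouette score is omitted for brevity.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    min_sep = min(
        dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam

def xie_beni_crisp(clusters):
    """Xie-Beni index with hard (0/1) memberships: SSE / (n * min center gap^2)."""
    centers = [centroid(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    min_gap_sq = min(
        dist(a, b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return sse(clusters) / (n * min_gap_sq)
```

Lower SSE and Xie-Beni values and a higher Dunn index indicate more compact, better separated clusters. The functions assume at least two clusters and at least one cluster with two distinct points.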
Abstract:
Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak-edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
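The effect of a merge threshold on hierarchical clustering can be seen in a minimal sketch. This is not the tool described above: it assumes single-linkage agglomeration on Euclidean distance (a dissimilarity, so clusters merge while their distance stays at or below the threshold), and the function name `single_linkage` is hypothetical.

```python
from math import dist

def single_linkage(points, threshold):
    """Agglomerative clustering that stops merging above a distance threshold.

    Assumption: single linkage, i.e. the distance between two clusters is
    the smallest distance between any pair of their points.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # every remaining pair is farther apart than the threshold
        # j > i, so removing cluster j does not shift cluster i.
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Re-running the function with different `threshold` values mimics moving a slider: a low threshold leaves many small clusters, while a higher one merges them into fewer, larger groups.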