891 results for Fuzzy C-Means clustering
Abstract:
We use quantitative X-ray diffraction to determine the mineralogy of late Quaternary marine sediments from the West and East Greenland shelves offshore from early Tertiary basalt outcrops. Despite the similar basalt outcrop areas (60,000-70,000 km²), there are significant differences between East and West Greenland sediments in the fraction of minerals (e.g. pyroxene) sourced from the basalt outcrops. We demonstrate the differences in mineralogy between East and West Greenland marine sediments on three scales: (1) modern day, (2) late Quaternary inputs and (3) detailed down-core variations in 10 cores from the two margins. On the East Greenland Shelf (EGS), late Quaternary samples have an average quartz weight per cent of 6.2 ± 2.3 versus 12.8 ± 3.9 from the West Greenland Shelf (WGS), and 12.0 ± 4.8 versus 1.9 ± 2.3 wt% for pyroxene. K-means clustering indicated that only 9% of the samples did not fit a simple EGS vs. WGS dichotomy. Sediments from the EGS and WGS are also isotopically distinct, with the EGS having higher εNd (−18 to 4) than the WGS (εNd = −25 to −35). We attribute the striking dichotomy in sediment composition to fundamentally different long-term Quaternary styles of glaciation on the two basalt outcrops.
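As an illustration of the clustering step, the hedged sketch below groups hypothetical [quartz wt%, pyroxene wt%] samples into two clusters with scikit-learn's KMeans and counts how many samples fall outside a simple EGS/WGS dichotomy. The data are synthetic stand-ins drawn around the means quoted above, not the paper's measurements.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical [quartz wt%, pyroxene wt%] samples per shelf, drawn around
# the means and spreads quoted in the abstract (not real measurements).
egs = rng.normal([6.2, 12.0], [2.3, 4.8], size=(50, 2))   # East Greenland Shelf
wgs = rng.normal([12.8, 1.9], [3.9, 2.3], size=(50, 2))   # West Greenland Shelf
X = np.vstack([egs, wgs])
margin = np.array([0] * 50 + [1] * 50)                    # known shelf per sample

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Cluster ids are arbitrary, so take the smaller of the two mismatch rates.
mismatch = min(np.mean(labels != margin), np.mean(labels == margin))
print(f"samples outside the EGS/WGS dichotomy: {mismatch:.0%}")
```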
Abstract:
Two series of volume numberings are used: Memorie ecclesiastiche di Città di Castello, v. [1]-5; Memorie civili di Città di Castello, v.1-2 [i.e., v. 6-7]
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
Collaborative recommendation is one of the most widely used recommendation approaches: it recommends items to a visitor based on the preferences of other users similar to the current one. User profiling techniques applied to Web transaction data can capture such informative knowledge about user tasks and interests. With the discovered usage-pattern information, it becomes possible to recommend more relevant content to Web users or to customize the Web presentation for visitors via collaborative recommendation. It also helps to identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as representatives of usage patterns. Moreover, a hidden task model is derived by characterizing the meaningful latent factor space. Given the discovered user profiles, we choose the best-matched profile, i.e. the one whose preferences are closest to those of the current user, and make collaborative recommendations based on the page weights in the selected profile. Preliminary experimental results on real-world data sets show that the proposed approach makes recommendations accurately and efficiently.
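A minimal sketch of this pipeline under stated substitutions: scikit-learn's NMF stands in for PLSA (the two are closely related for co-occurrence data), and plain k-means replaces the paper's modified variant. The session matrix, factor count, and profile count are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
sessions = rng.integers(0, 3, size=(200, 40)).astype(float)  # users x pages (toy)

# Latent factor space over co-occurrence activity (NMF as a PLSA stand-in).
nmf = NMF(n_components=8, init="nndsvda", random_state=1, max_iter=500)
user_factors = nmf.fit_transform(sessions)

# Usage patterns: cluster users in factor space; centroids act as profiles.
km = KMeans(n_clusters=5, n_init=10, random_state=1).fit(user_factors)
profiles = km.cluster_centers_ @ nmf.components_   # profile -> page weights

def recommend(active_session, top_n=3):
    """Match the active user's factor vector to the closest profile and
    recommend the highest-weighted pages not yet visited."""
    v = nmf.transform(active_session.reshape(1, -1))
    profile = profiles[np.argmin(np.linalg.norm(km.cluster_centers_ - v, axis=1))]
    unseen = np.flatnonzero(active_session == 0)
    return unseen[np.argsort(profile[unseen])[::-1][:top_n]]

print(recommend(sessions[0]))
```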
Abstract:
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an unannotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model which can subsequently be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator data. Experimental results show an improvement over the baseline HVS parser when using the hybrid framework. When compared with HM-SVMs trained on the fully annotated corpus, the hybrid framework gave comparable performance while needing only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.
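The sentence-selection step can be sketched as follows. TF-IDF vectors stand in for whatever sentence representation is actually used, and the toy corpus and cluster count are invented for illustration: each cluster contributes the sentence nearest its centroid as a "most representative" candidate for light annotation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpus = [
    "show me flights from boston to denver",
    "i want a flight to denver from boston",
    "what restaurants are near the hotel",
    "find hotels close to good restaurants",
    "book a morning flight to san francisco",
    "list flights leaving boston before noon",
]
X = TfidfVectorizer().fit_transform(corpus)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for c in range(km.n_clusters):
    members = np.flatnonzero(km.labels_ == c)
    # Pick the member sentence closest to the cluster centroid.
    d = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
    print("representative:", corpus[members[np.argmin(d)]])
```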
Abstract:
Several measures are in common use for quantifying liquidity, each capturing the phenomenon from a different angle (e.g. tightness, depth, and resiliency). This article analyzes the liquidity measures proposed in the literature with multivariate statistical methods on a sample of the Swiss stock market. We performed a principal component analysis (PCA) to obtain the main factors that best summarize the liquidity characteristics and explain the cross-sectional variability of the measures, examined how strongly the individual measures co-move with these factors, and applied k-means clustering to the correlations to define groups of measures with similar properties. We asked whether the variable groups formed from our sample coincide with the measures associated with the individual aspects of liquidity, and whether composite liquidity measures can be defined with which liquidity can be measured along several dimensions. Based on our explorative data analysis, we formed clusters of liquidity measures and compared the resulting groups with expectations and intuition. Our modelling methodology provides a framework for analyzing the correlation between the different aspects of liquidity as well as a means of defining complex liquidity measures.
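An exploratory sketch of the PCA-plus-k-means workflow on a synthetic (days x measures) matrix. Grouping measures by their factor loadings is one plausible reading of the approach, not the authors' exact procedure; the matrix, factor count, and group count are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 10))   # 250 trading days, 10 liquidity measures (toy)
X[:, 5:] += X[:, :5]             # induce co-movement between some measures

Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Z)
loadings = pca.components_.T     # one row of factor loadings per measure

# Group measures that load on the factors in similar ways.
groups = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(loadings)
for g in range(3):
    print(f"group {g}: measures {np.flatnonzero(groups == g).tolist()}")
```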
Abstract:
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand the biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from immense amounts of genomic data is a challenging task, so effective and efficient computational techniques are needed to analyze and interpret them. In this thesis, novel computational methods are proposed to address these challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework integrates the Bayesian network with the Gaussian mixture model. Based on the proposed framework and its conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarrays where biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions over network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
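A much-simplified flavor of one ingredient above: PCA followed by a Gaussian mixture (k-means-initialized, which is scikit-learn's default) on synthetic expression data. The Bayesian-network integration and MCMC components are beyond a short sketch, and all data and dimensions here are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic expression matrix: three gene groups with shifted means.
expr = np.vstack([rng.normal(m, 1.0, size=(60, 30)) for m in (-2, 0, 2)])

reduced = PCA(n_components=5).fit_transform(expr)
gmm = GaussianMixture(n_components=3, random_state=3).fit(reduced)
print("cluster sizes:", np.bincount(gmm.predict(reduced)))
```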
Abstract:
Background: Recurrent spontaneous abortion is a condition that can lead to physical, psychological, and economic problems for both individuals and society. Recently, a small number of genetic polymorphisms in the kinase insert domain-containing receptor (KDR) gene have been examined that can endanger the life of the fetus in pregnant women. Objective: The risk of KDR gene polymorphisms was investigated in Iranian women with idiopathic recurrent spontaneous abortion (RSA). Materials and Methods: A case-control study was performed. One hundred idiopathic RSA patients with at least two consecutive pregnancy losses before 20 weeks of gestational age and normal karyotypes were included in the study. In addition, 100 healthy women with at least one natural pregnancy were studied as the control group. Two functional SNPs located in the KDR gene, rs1870377 (Q472H) and rs2305948 (V297I), as well as one tag SNP in the intron region (rs6838752), were genotyped using the PCR-based restriction fragment length polymorphism (PCR-RFLP) technique. Haplotype frequencies were determined for these three SNPs' genotypes. Genetic STRUCTURE analysis and K-means clustering were performed to study genetic variation. Results: The functional SNP (rs1870377) was highly linked to the tag SNP (rs6838752) (D′ = 0.214; χ² = 16.44, p < 0.001). K-means clustering showed that k = 8 was the best fit for the optimal number of genetic subgroups in our material, in agreement with Neighbor-Joining cluster analysis. Conclusion: In our study, allele and genotype frequencies were not associated with RSA between patients and controls. Inconsistent results across populations, with different allele frequencies among RSA patients and controls, may be due to ethnic variation and the sample sizes used.
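The abstract does not state the fit criterion behind k = 8, so the sketch below uses the silhouette score as one plausible stand-in, sweeping k over a hypothetical genotype matrix with the three SNPs coded 0/1/2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
genotypes = rng.integers(0, 3, size=(200, 3)).astype(float)  # 3 KDR SNPs (toy)

best_k, best_s = None, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=4).fit_predict(genotypes)
    s = silhouette_score(genotypes, labels)  # assumed criterion, not the paper's
    if s > best_s:
        best_k, best_s = k, s
print(f"best k = {best_k} (silhouette = {best_s:.3f})")
```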
Abstract:
The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering and solves the MAP problem as well as Gibbs sampling does, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood, which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it with variational, SVA and sampling approaches, both from a computational complexity perspective and in terms of clustering performance. We demonstrate the wide applicability of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model, whose performance compares favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can be applied to a semiparametric mixed-effects regression model in which the random-effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.
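A toy sketch of the flavor of MAP-DP, not the published algorithm (the authors' code is at http://www.maxlittle.net/): hard assignments trade off squared distance against a rich-get-richer log-count term, and a point opens a new cluster when the DP concentration alpha makes that the cheaper option. All parameters and data are illustrative.

```python
import numpy as np

def toy_map_dp(X, alpha=1.0, sigma2=1.0, n_sweeps=5):
    """Toy hard-assignment sketch (not the published MAP-DP): joining cluster
    k costs ||x - mu_k||^2 / (2*sigma2) - log(N_k), so populous clusters are
    favored; opening a new cluster costs the distance to the data mean minus
    log(alpha)."""
    z = -np.ones(len(X), dtype=int)
    means, counts = [X.mean(axis=0)], [0]      # start with one empty cluster
    for _ in range(n_sweeps):
        for i, x in enumerate(X):
            if z[i] >= 0:
                counts[z[i]] -= 1              # remove x from its cluster
            cost = [np.sum((x - m) ** 2) / (2 * sigma2) - np.log(max(c, 1e-9))
                    for m, c in zip(means, counts)]
            cost_new = np.sum((x - X.mean(axis=0)) ** 2) / (2 * sigma2) - np.log(alpha)
            if min(cost) <= cost_new:
                z[i] = int(np.argmin(cost))
            else:
                z[i] = len(means)              # open a fresh cluster for x
                means.append(x.copy())
                counts.append(0)
            counts[z[i]] += 1
            means[z[i]] = X[z == z[i]].mean(axis=0)   # refresh that cluster mean
    return z

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in ((0, 0), (3, 3))])
print("clusters found:", len(np.unique(toy_map_dp(X))))
```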
Abstract:
Long-term monitoring of acoustic environments is gaining popularity thanks to the wealth of scientific and engineering insights it provides. The increasing interest is due to the constant growth of storage capacity and of the computational power needed to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques for dealing with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view: the equivalent continuous level Leq is the main metric used to define an acoustic environment. Finer analyses involve the use of statistical levels. However, acoustic percentiles rest on temporal assumptions that are not always reliable. A statistical approach based on the study of the occurrences of sound pressure levels brings a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than through portions of energy, yields more specific information about the activity carried out during the measurements; the statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate, and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in a variety of contexts. The method is built on clustering analysis: two algorithms, the Gaussian Mixture Model and K-means clustering, form the core of a process for investigating different active spaces monitored with sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices. The proposed method shows robust and reliable results in describing the acoustic scenario and could represent an important analytical tool for acousticians.
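The clustering core can be sketched by fitting a Gaussian mixture to the distribution of measured sound pressure levels, so that each component's mean approximates the most probable level of one coexisting source. The level series below is synthetic (a quiet background plus speech-like activity), not monitoring data, and the component count is assumed known.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Synthetic sound pressure levels: background ventilation plus talking.
levels = np.concatenate([rng.normal(35, 2, 5000),    # background, ~35 dB
                         rng.normal(60, 4, 2000)])   # speech activity, ~60 dB

gmm = GaussianMixture(n_components=2, random_state=6).fit(levels.reshape(-1, 1))
for mean, w in sorted(zip(gmm.means_.ravel(), gmm.weights_.ravel())):
    print(f"source at ~{mean:.1f} dB (weight {w:.2f})")
```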
Abstract:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. © 2011 Elsevier B.V. All rights reserved.
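For contrast, the pseudo-exhaustive baseline can be sketched as a sweep over candidate cluster numbers with a plain NumPy fuzzy c-means, keeping the partition with the highest fuzzy partition coefficient (FPC). This sketch operates on object data for simplicity, whereas the paper clusters relational (proximity-only) data, and it is not the evolutionary algorithm itself.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means; returns the membership matrix U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))            # random fuzzy partition
    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_ij)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return U

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(mu, 0.4, size=(50, 2)) for mu in ((0, 0), (3, 0), (0, 3))])
for c in range(2, 6):
    U = fcm(X, c)
    print(f"c={c}: FPC={np.mean(U ** 2) * U.shape[1]:.3f}")  # pick the best c
```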
Abstract:
Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data based essentially on the cardinality of the fuzzy subsets that represent objects and their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes, which includes a set of fuzzy indexes introduced by Tolias et al., and we analyze under which conditions it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to clarify the whole process.
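One representative cardinality-based index from such a family (not necessarily the exact Tolias et al. definition) is a fuzzy Jaccard similarity built from sigma-counts, applied equally to objects and their complements; the membership vectors below are invented.

```python
import numpy as np

def fuzzy_jaccard(a, b):
    """|A ∩ B| / |A ∪ B| with sigma-count cardinality and min/max connectives."""
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()

a = np.array([0.9, 0.4, 0.0, 0.7])   # membership degrees of object A (toy)
b = np.array([0.8, 0.5, 0.1, 0.6])   # membership degrees of object B (toy)
s = fuzzy_jaccard(a, b)
s_comp = fuzzy_jaccard(1 - a, 1 - b)  # same index on the complements
print(f"s(A,B)={s:.3f}, s(A',B')={s_comp:.3f}")
```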
Abstract:
Abstract: To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi-, and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of the clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature-space sparsity.
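A hedged sketch of that pipeline on toy POS sequences: n-gram profiles per clause, chi-squared distances raised to a power q, an MDS embedding of the transformed distances, then k-means. The clauses, the exponent q, and the cluster count are all invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

clauses = ["DET NOUN VERB", "DET NOUN VERB ADV", "PRON VERB DET NOUN",
           "ADJ NOUN VERB", "PRON VERB ADV", "DET ADJ NOUN VERB"]
counts = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+").fit_transform(clauses)
P = counts.toarray().astype(float)
P /= P.sum(axis=1, keepdims=True)                  # n-gram profile per clause

w = P.mean(axis=0)                                 # average profile as column weights
D = np.sqrt((((P[:, None] - P[None]) ** 2) / w).sum(axis=2))  # chi-squared distances
q = 0.5                                            # power transformation (assumed)
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D ** q)    # high-dimensional embedding
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb))
```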