4 resultados para Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices

em Digital Commons - Michigan Tech


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporate Big Data analysis into their business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which reduces both computational complexity and space complexity significantly. The proposed stKFCM only requires O(n2) memory where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n2 elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time-step, thus reducing both the computation time in producing the kernel elements and also the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively very small n, can provide clustering performance as accurately as kernel fuzzy c-means run on the entire data set while achieving a significant speedup.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fuzzy community detection is to identify fuzzy communities in a network, which are groups of vertices in the network such that the membership of a vertex in one community is in [0,1] and that the sum of memberships of vertices in all communities equals to 1. Fuzzy communities are pervasive in social networks, but only a few works have been done for fuzzy community detection. Recently, a one-step forward extension of Newman’s Modularity, the most popular quality function for disjoint community detection, results into the Generalized Modularity (GM) that demonstrates good performance in finding well-known fuzzy communities. Thus, GMis chosen as the quality function in our research. We first propose a generalized fuzzy t-norm modularity to investigate the effect of different fuzzy intersection operators on fuzzy community detection, since the introduction of a fuzzy intersection operation is made feasible by GM. The experimental results show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure. Then, we focus on how to find optimal fuzzy communities in a network by directly maximizing GM, which we call it Fuzzy Modularity Maximization (FMM) problem. The effort on FMM problem results into the major contribution of this thesis, an efficient and effective GM-based fuzzy community detection method that could automatically discover a fuzzy partition of a network when it is appropriate, which is much better than fuzzy partitions found by existing fuzzy community detection methods, and a crisp partition of a network when appropriate, which is competitive with partitions resulted from the best disjoint community detections up to now. We address FMM problem by iteratively solving a sub-problem called One-Step Modularity Maximization (OSMM). We present two approaches for solving this iterative procedure: a tree-based global optimizer called Find Best Leaf Node (FBLN) and a heuristic-based local optimizer. The OSMM problem is based on a simplified quadratic knapsack problem that can be solved in linear time; thus, a solution of OSMM can be found in linear time. Since the OSMM algorithm is called within FBLN recursively and the structure of the search tree is non-deterministic, we can see that the FMM/FBLN algorithm runs in a time complexity of at least O (n2). So, we also propose several highly efficient and very effective heuristic algorithms namely FMM/H algorithms. We compared our proposed FMM/H algorithms with two state-of-the-art community detection methods, modified MULTICUT Spectral Fuzzy c-Means (MSFCM) and Genetic Algorithm with a Local Search strategy (GALS), on 10 real-world data sets. The experimental results suggest that the H2 variant of FMM/H is the best performing version. The H2 algorithm is very competitive with GALS in producing maximum modularity partitions and performs much better than MSFCM. On all the 10 data sets, H2 is also 2-3 orders of magnitude faster than GALS. Furthermore, by adopting a simply modified version of the H2 algorithm as a mutation operator, we designed a genetic algorithm for fuzzy community detection, namely GAFCD, where elite selection and early termination are applied. The crossover operator is designed to make GAFCD converge fast and to enhance GAFCD’s ability of jumping out of local minimums. Experimental results on all the data sets show that GAFCD uncovers better community structure than GALS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A non-hierarchical K-means algorithm is used to cluster 47 years (1960–2006) of 10-day HYSPLIT backward trajectories to the Pico Mountain (PM) observatory on a seasonal basis. The resulting cluster centers identify the major transport pathways and collectively comprise a long-term climatology of transport to the observatory. The transport climatology improves our ability to interpret the observations made there and our understanding of pollution source regions to the station and the central North Atlantic region. I determine which pathways dominate transport to the observatory and examine the impacts of these transport patterns on the O3, NOy, NOx, and CO measurements made there during 2001–2006. Transport from the U.S., Canada, and the Atlantic most frequently reaches the station, but Europe, east Africa, and the Pacific can also contribute significantly depending on the season. Transport from Canada was correlated with the North Atlantic Oscillation (NAO) in spring and winter, and transport from the Pacific was uncorrelated with the NAO. The highest CO and O3 are observed during spring. Summer is also characterized by high CO and O3 and the highest NOy and NOx of any season. Previous studies at the station attributed the summer time high CO and O3 to transport of boreal wildfire emissions (for 2002–2004), and boreal fires continued to affect the station during 2005 and 2006. The particle dispersion model FLEXPART was used to calculate anthropogenic and biomass-burning CO tracer values at the station in an attempt to identify the regions responsible for the high CO and O3 observations during spring and biomass-burning impacts in summer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 1969, Lovasz asked whether every connected, vertex-transitive graph has a Hamilton path. This question has generated a considerable amount of interest, yet remains vastly open. To date, there exist no known connected, vertex-transitive graph that does not possess a Hamilton path. For the Cayley graphs, a subclass of vertex-transitive graphs, the following conjecture was made: Weak Lovász Conjecture: Every nontrivial, finite, connected Cayley graph is hamiltonian. The Chen-Quimpo Theorem proves that Cayley graphs on abelian groups flourish with Hamilton cycles, thus prompting Alspach to make the following conjecture: Alspach Conjecture: Every 2k-regular, connected Cayley graph on a finite abelian group has a Hamilton decomposition. Alspach’s conjecture is true for k = 1 and 2, but even the case k = 3 is still open. It is this case that this thesis addresses. Chapters 1–3 give introductory material and past work on the conjecture. Chapter 3 investigates the relationship between 6-regular Cayley graphs and associated quotient graphs. A proof of Alspach’s conjecture is given for the odd order case when k = 3. Chapter 4 provides a proof of the conjecture for even order graphs with 3-element connection sets that have an element generating a subgroup of index 2, and having a linear dependency among the other generators. Chapter 5 shows that if Γ = Cay(A, {s1, s2, s3}) is a connected, 6-regular, abelian Cayley graph of even order, and for some1 ≤ i ≤ 3, Δi = Cay(A/(si), {sj1 , sj2}) is 4-regular, and Δi ≄ Cay(ℤ3, {1, 1}), then Γ has a Hamilton decomposition. Alternatively stated, if Γ = Cay(A, S) is a connected, 6-regular, abelian Cayley graph of even order, then Γ has a Hamilton decomposition if S has no involutions, and for some s ∈ S, Cay(A/(s), S) is 4-regular, and of order at least 4. Finally, the Appendices give computational data resulting from C and MAGMA programs used to generate Hamilton decompositions of certain non-isomorphic Cayley graphs on low order abelian groups.