997 results for complete linkage clustering


Relevance:

30.00%

Publisher:

Abstract:

PhD in Economics.

Relevance:

30.00%

Publisher:

Abstract:

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well-known classical batch k-means model. Specifically, we use k-means, single-pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT samples the data, images the reordered distance matrix to estimate the number of clusters visually, clusters the samples using a relative of single linkage (SL), and then non-iteratively extends the labels to the rest of the dataset using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. Our experiments show that k-means and its modified versions suffer from initialization issues that cause many failures. In contrast, clusiVAT needs no initialization and almost always finds partitions that accurately match the ground truth labels in labeled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; for example, it recovers 97% of the ground truth labels in the real-world KDD Cup 99 data (4,292,637 samples in 41 dimensions) in 76 s.
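The pipeline described in this abstract (sample, cluster the sample with single linkage, extend labels by a nearest-prototype rule) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' clusiVAT implementation: it uses uniform random sampling rather than the paper's sampling scheme, takes the number of clusters `k` as an argument instead of estimating it visually from the reordered distance image, uses Euclidean distance, and treats every sampled point as a prototype. Single linkage is obtained by building a minimum spanning tree with Prim's algorithm (the traversal that VAT-style reordering is closely related to) and cutting its `k - 1` longest edges. The names `sl_clusters_via_mst` and `clusivat_like` are hypothetical.

```python
import math
import random


def sl_clusters_via_mst(points, k):
    """Single-linkage partition into k clusters: Prim's MST, then drop the k-1 longest edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each node to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []  # MST edges as (weight, u, v)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Keep all but the k-1 heaviest MST edges; connected components are the SL clusters.
    edges.sort()
    kept = edges[: max(len(edges) - (k - 1), 0)]
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def clusivat_like(data, k, sample_size, seed=0):
    """Hypothetical sketch: SL-cluster a random sample, then label every point
    with the cluster of its nearest sampled point (non-iterative extension)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), min(sample_size, len(data)))
    sample = [data[i] for i in idx]
    sample_labels = sl_clusters_via_mst(sample, k)
    return [
        sample_labels[min(range(len(sample)), key=lambda j: math.dist(x, sample[j]))]
        for x in data
    ]
```

For example, `clusivat_like([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2, sample_size=4)` assigns one label to the first three points and another to the last three. The extension step is a single pass over the data, which is why it scales to large datasets; the quadratic work is confined to the sample.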

Relevance:

30.00%

Publisher:

Abstract:

The complete mitochondrial genome and a set of polymorphic microsatellite markers were identified by 454 pyrosequencing (1/16th of a plate) for the New Caledonian rainforest spider-ant Leptomyrmex pallens. De novo genome assembly recovered the entire mitochondrial genome with a mean coverage of 8.9-fold (range 1–27-fold). The mitogenome consists of 15,591 base pairs, including 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The genome arrangement is typical of insect taxa and very similar to that of the only other published ant mitogenome, from the genus Solenopsis; the main differences are translocations and inversions of tRNAs. A total of 13 polymorphic loci were also characterized using 41 individuals from a single population in the Aoupinié region, corresponding to workers from 21 nests and 16 foraging workers. We observed moderate genetic variation across most loci (mean number of alleles per locus = 4.50; mean expected heterozygosity = 0.53), and only two loci deviated significantly from Hardy-Weinberg equilibrium, owing to null alleles. Marker independence was confirmed with tests for linkage disequilibrium. Most loci cross-amplified in three additional Leptomyrmex species. The annotated mitogenome and the characterized microsatellite markers will provide useful tools for assessing the colony structure, population genetic patterns, and dispersal strategy of L. pallens in the context of rainforest fragmentation in New Caledonia. Furthermore, this paper confirms a recent line of evidence that comprehensive mitochondrial data can be obtained relatively easily from small next-generation sequencing analyses. Greater synthesis of next-generation sequencing data will play a significant role in expanding the taxonomic representation of mitochondrial genome sequences.
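The two summary statistics reported in this abstract, mean number of alleles per locus and mean expected heterozygosity, can be illustrated with a minimal sketch. It assumes the common textbook definition of expected heterozygosity (Nei's gene diversity), H_e = 1 − Σ p_i², where p_i are allele frequencies pooled from diploid genotypes; the study may instead use a sample-size-corrected estimator. The genotype data (allele sizes in base pairs) and the function names `expected_heterozygosity` and `mean_over_loci` are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """H_e = 1 - sum(p_i^2) for one locus; genotypes are diploid (allele1, allele2) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def mean_over_loci(loci):
    """Return (mean number of alleles per locus, mean H_e) for a dict: locus -> genotypes."""
    n_alleles = [len({a for g in gts for a in g}) for gts in loci.values()]
    h_e = [expected_heterozygosity(gts) for gts in loci.values()]
    return sum(n_alleles) / len(n_alleles), sum(h_e) / len(h_e)


# Hypothetical toy data: one locus with two equally common alleles, one monomorphic locus.
loci = {"LocA": [(152, 156), (152, 156)], "LocB": [(200, 200), (200, 200)]}
print(mean_over_loci(loci))  # (1.5, 0.25)
```

A locus with two equally frequent alleles gives H_e = 0.5 and a monomorphic locus gives 0, so a mean H_e of 0.53 indicates that, on average, a randomly drawn pair of alleles differs slightly more than half the time.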

Relevance:

30.00%

Publisher:

Abstract:

A suite of polymorphic microsatellite markers and the complete mitochondrial genome sequence were developed by next-generation sequencing (NGS) for the critically endangered orange-bellied parrot, Neophema chrysogaster. A total of 14 polymorphic loci were identified and characterized using DNA extractions representing 40 individuals from Melaleuca, Tasmania, sampled in 2002. We observed moderate genetic variation across most loci (mean number of alleles per locus = 2.79; mean expected heterozygosity = 0.53), with no individual locus deviating significantly from Hardy-Weinberg equilibrium. Marker independence was confirmed with tests for linkage disequilibrium, and analyses indicated no evidence of null alleles at any locus. De novo and reference-based assemblies performed with MIRA were used to assemble the N. chrysogaster mitochondrial genome sequence, with a mean coverage of 116-fold (range 89–142-fold). The mitochondrial genome consists of 18,034 base pairs with a typical metazoan mitochondrial gene content: 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a single large non-coding region (control region). The arrangement of mitochondrial genes is also typical of avian taxa. The annotation of the mitochondrial genome and the characterization of 14 microsatellite markers provide a valuable resource for future genetic monitoring of wild and captive N. chrysogaster populations. As found previously, NGS provides a rapid, low-cost, and reliable method for developing polymorphic nuclear genetic markers and determining complete mitochondrial genome sequences when only a fraction of a genome is sequenced.

Relevance:

30.00%

Publisher:

Abstract:

This study examines groups of place names in Germany (within the postal code regions, or Postleitregionen) for existing similarities. The measure used is a frequency vector of trigrams in each group. Applying the average linkage algorithm to this measure produces clusters of spatially contiguous areas, even though the method has no knowledge of where the clusters lie relative to one another. Within each cluster, the ten most frequent n-grams are determined in order to reveal characteristic word particles. The areas delimited by the clusters can be readily explained by historical or linguistic developments. The method used here, however, requires no linguistic, geographical, or historical knowledge, yet it groups names unambiguously while taking a large number of word particles into account in a single step. This approach without prior knowledge distinguishes the present study from most previous investigations.
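The method in this abstract (a trigram frequency vector per group of names, average-linkage clustering of those vectors, then the ten most frequent n-grams per cluster) can be sketched as follows. The abstract does not state the distance measure or preprocessing, so several choices here are assumptions: names are lower-cased and padded with spaces so that word beginnings and endings form their own trigrams, cosine distance compares the frequency vectors, and clustering stops at a caller-chosen number of clusters. The toy place names and all function names are hypothetical.

```python
import math
from collections import Counter


def trigram_profile(names):
    """Frequency vector (a Counter) of character trigrams over a group of place names."""
    counts = Counter()
    for name in names:
        padded = f" {name.lower()} "  # assumption: pad so prefixes/suffixes form trigrams
        counts.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return counts


def cosine_distance(a, b):
    dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing trigrams
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering (UPGMA): repeatedly merge the pair of clusters
    with the smallest mean pairwise distance between their members."""
    keys = list(profiles)
    d = {(a, b): cosine_distance(profiles[a], profiles[b]) for a in keys for b in keys}
    clusters = [[k] for k in keys]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [d[a, b] for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def top_ngrams(profiles, cluster, n=10):
    """The n most frequent trigrams summed over all groups in one cluster."""
    total = Counter()
    for key in cluster:
        total.update(profiles[key])
    return [gram for gram, _ in total.most_common(n)]


# Hypothetical toy regions: two dominated by "-burg" names, two by "-dorf" names.
groups = {"A": ["Aburg", "Bburg"], "B": ["Cburg", "Dburg"],
          "C": ["Adorf", "Bdorf"], "D": ["Cdorf", "Ddorf"]}
profiles = {k: trigram_profile(v) for k, v in groups.items()}
print(average_linkage(profiles, 2))  # [['A', 'B'], ['C', 'D']]
```

As in the study, nothing in this sketch knows where the regions lie; any spatial coherence of the resulting clusters comes only from shared word particles such as `bur`/`urg` versus `dor`/`orf`.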

Relevance:

20.00%

Publisher: