Digital Library

110 results for complete linkage clustering

in Deakin Research Online - Australia

Profiling phishing email based on clustering approach

Relevance: 30.00%

Abstract:

In this paper, an approach for profiling email-borne phishing activities is proposed. Profiling phishing activities is useful in determining the activity of an individual or a particular group of phishers. By generating profiles, phishing activities can be well understood and observed. Typically, work in the area of phishing is aimed at detecting phishing emails, whereas we concentrate on profiling them. We formulate the profiling problem as a clustering problem using the various features in the phishing emails as feature vectors. Further, we generate profiles based on clustering predictions. These predictions are further utilized to generate complete profiles of these emails. The performance of the clustering algorithms at the earlier stage is crucial for the effectiveness of this model. We carried out an experimental evaluation to determine the performance of many classification algorithms by incorporating a clustering approach in our model. Our proposed profiling email-borne phishing algorithm (ProEP) demonstrates promising results with the RatioSize rules for selecting the optimal number of clusters.

A clustering-based multi-layer distributed ensemble for neurological diagnostics in cloud services

Relevance: 30.00%

Abstract:

This paper investigates the problem of minimizing data transfer between different data centers of the cloud during the neurological diagnostics of cardiac autonomic neuropathy (CAN). This problem has never been considered in the literature before. All classifiers considered for the diagnostics of CAN previously assume complete access to all data, which would lead to an enormous burden of data transfer during training if such classifiers were deployed in the cloud. We introduce a new model of clustering-based multi-layer distributed ensembles (CBMLDE). It is designed to eliminate the need to transfer data between different data centers for training of the classifiers. We conducted experiments utilizing a dataset derived from an extensive DiScRi database. Our comprehensive tests have determined the best combinations of options for setting up CBMLDE classifiers. The results demonstrate that CBMLDE classifiers not only completely eliminate the need for patient data transfer, but also have significantly outperformed all base classifiers and simpler counterpart models in all cloud frameworks.

A hybrid approach to clustering in big data

Relevance: 30.00%

Abstract:

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the data-set using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real world KDD-99 cup data (4 292 637 samples in 41 dimensions) in 76 s.
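The sample-then-extend idea described in this abstract can be illustrated with a rough sketch. This is not clusiVAT itself: it skips the visual estimation of the number of clusters and takes `k` as given, it uses plain single linkage rather than the "relative of single linkage" the abstract mentions, and it treats each sampled point as a prototype. The maximin-style sampler and all function names are assumptions chosen to keep the example small and deterministic.

```python
import math


def maximin_sample(data, m):
    """Pick m indices: start at index 0, then repeatedly take the point farthest from those chosen."""
    chosen = [0]
    dist_to_chosen = [math.dist(p, data[0]) for p in data]
    while len(chosen) < m:
        nxt = max(range(len(data)), key=lambda i: dist_to_chosen[i])
        chosen.append(nxt)
        dist_to_chosen = [min(d, math.dist(p, data[nxt]))
                          for d, p in zip(dist_to_chosen, data)]
    return chosen


def single_linkage(points, k):
    """Merge the two clusters with the smallest minimum pairwise distance until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def sample_then_extend(data, k, sample_size):
    """Cluster a small sample, then give every point the label of its nearest sampled point."""
    sample = [data[i] for i in maximin_sample(data, sample_size)]
    sample_labels = single_linkage(sample, k)
    labels = []
    for p in data:
        nearest = min(range(len(sample)), key=lambda s: math.dist(p, sample[s]))
        labels.append(sample_labels[nearest])
    return labels


# Hypothetical toy data: two well-separated groups of 10 points each
group_a = [(0.1 * i, 0.05 * (i % 3)) for i in range(10)]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
data = group_a + group_b
labels = sample_then_extend(data, k=2, sample_size=6)
```

Only the sample is clustered with the quadratic-cost linkage step; the remaining points are labelled in a single pass, which is what makes this style of method attractive for large data sets.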

Characterisation of the complete mitochondrial genome and 13 microsatellite loci through next-generation sequencing for the New Caledonian spider-ant Leptomyrmex pallens

Relevance: 30.00%

Abstract:

The complete mitochondrial genome and a set of polymorphic microsatellite markers were identified by 454 pyrosequencing (1/16th of a plate) for the New Caledonian rainforest spider-ant Leptomyrmex pallens. De novo genome assembly recovered the entire mitochondrial genome with mean coverage of 8.9-fold (range 1-27). The mitogenome consists of 15,591 base pairs including 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The genome arrangement is typical of insect taxa and very similar to the only other published ant mitogenome from the Solenopsis genus, with the main differences consisting of translocations and inversions of tRNAs. A total of 13 polymorphic loci were also characterized using 41 individuals from a single population in the Aoupinié region, corresponding to workers from 21 nests and 16 foraging workers. We observed moderate genetic variation across most loci (mean number of alleles per locus = 4.50; mean expected heterozygosity = 0.53) with evidence of only two loci deviating significantly from Hardy-Weinberg equilibrium due to null alleles. Marker independence was confirmed with tests for linkage disequilibrium. Most loci cross amplified for three additional Leptomyrmex species. The annotation of the mitogenome and characterization of microsatellite markers will provide useful tools for assessing the colony structure, population genetic patterns, and dispersal strategy of L. pallens in the context of rainforest fragmentation in New Caledonia. Furthermore, this paper confirms a recent line of evidence that comprehensive mitochondrial data can be obtained relatively easily from small next-generation sequencing analyses. Greater synthesis of next-generation sequencing data will play a significant role in expanding the taxonomic representation of mitochondrial genome sequences.
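The per-locus summary statistics quoted in this abstract (mean number of alleles per locus and mean expected heterozygosity) can be sketched as follows. The sketch assumes the simple estimator of expected heterozygosity, one minus the sum of squared allele frequencies, with no small-sample correction; the abstract does not say which estimator the authors used. The genotypes below are invented for illustration and are not data from the paper.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (number of alleles, expected heterozygosity) for one diploid locus.

    genotypes: list of (allele, allele) pairs, one pair per individual.
    """
    alleles = [a for pair in genotypes for a in pair]
    counts = Counter(alleles)
    total = len(alleles)
    # Expected heterozygosity: 1 - sum of squared allele frequencies
    he = 1.0 - sum((n / total) ** 2 for n in counts.values())
    return len(counts), he


# Hypothetical genotypes at two loci (allele sizes in base pairs)
loci = {
    "locus_A": [(120, 120), (120, 124), (124, 124), (120, 124)],
    "locus_B": [(200, 204), (204, 208), (200, 200), (208, 208)],
}
stats = {name: locus_stats(g) for name, g in loci.items()}
mean_alleles = sum(n for n, _ in stats.values()) / len(stats)
mean_he = sum(h for _, h in stats.values()) / len(stats)
```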

Microsatellite loci and the complete mitochondrial DNA sequence characterized through next generation sequencing and de novo genome assembly for the critically endangered orange-bellied parrot, Neophema chrysogaster

Relevance: 30.00%

Abstract:

A suite of polymorphic microsatellite markers and the complete mitochondrial genome sequence were developed by next generation sequencing (NGS) for the critically endangered orange-bellied parrot, Neophema chrysogaster. A total of 14 polymorphic loci were identified and characterized using DNA extractions representing 40 individuals from Melaleuca, Tasmania, sampled in 2002. We observed moderate genetic variation across most loci (mean number of alleles per locus = 2.79; mean expected heterozygosity = 0.53) with no evidence of individual loci deviating significantly from Hardy-Weinberg equilibrium. Marker independence was confirmed with tests for linkage disequilibrium, and analyses indicated no evidence of null alleles across loci. De novo and reference-based genome assemblies performed using MIRA were used to assemble the N. chrysogaster mitochondrial genome sequence with mean coverage of 116-fold (range 89 to 142-fold). The mitochondrial genome consists of 18,034 base pairs, and a typical metazoan mitochondrial gene content consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a single large non-coding region (control region). The arrangement of mitochondrial genes is also typical of avian taxa. The annotation of the mitochondrial genome and the characterization of 14 microsatellite markers provide a valuable resource for future genetic monitoring of wild and captive N. chrysogaster populations. As found previously, NGS provides a rapid, low cost and reliable method for polymorphic nuclear genetic marker development and determining complete mitochondrial genome sequences when only a fraction of a genome is sequenced.

Effectively finding relevant web pages from linkage information

Relevance: 20.00%

Abstract:

This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended cocitation analysis of the Web pages. It is intuitive and easy to implement. The second one takes advantage of linear algebra theories to reveal deeper relationships among the Web pages and to identify relevant pages more precisely and effectively. The experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used for various Web applications, such as enhancing Web search. The ideas and techniques in this work would be helpful to other Web-related research.
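As a rough illustration of the cocitation idea behind the first algorithm, the sketch below ranks pages by how many parent pages link to both them and the target page. This is basic cocitation only, not the extended analysis the paper proposes, and the link graph and function name are hypothetical.

```python
from collections import Counter


def cocited_pages(links, target):
    """Rank pages by the number of parent pages that link to both them and `target`.

    links: dict mapping a page to the set of pages it links to.
    """
    parents = [page for page, outs in links.items() if target in outs]
    scores = Counter()
    for parent in parents:
        for sibling in links[parent]:
            if sibling != target:
                scores[sibling] += 1
    return scores.most_common()


# Hypothetical link graph: each hub page links to a few content pages
links = {
    "hub1": {"a", "b", "c"},
    "hub2": {"a", "b"},
    "hub3": {"a", "d"},
    "hub4": {"c", "d"},
}
ranking = cocited_pages(links, "a")
```

Here page `b` shares two parents with `a`, while `c` and `d` share one each, so `b` is ranked as the most relevant page.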

Outsourcing and benchmarking in a rural public hospital : does economic theory provide the complete answer?

Relevance: 20.00%

Abstract:

INTRODUCTION: The ideology and pronouncements of the Australian Government in introducing 'competitive neutrality' to the public sector has improved efficiency and resource usage. In the health sector, the Human Services Department directed that non-clinical and clinical areas be market tested through benchmarking services against the private sector, with the possibility of outsourcing. These services included car parking, computing, laundry, engineering, cleaning, catering, medical imaging (radiology), pathology, pharmacy, allied health and general practice. Managers, when they choose between outsourcing and internal servicing and production, would thus ideally base their decision on economic principles. Williamson's transaction cost theory studies the governance mechanisms that can be used to achieve economic efficiency and proposes that the optimal organisation structure is that which minimises transaction costs or the costs of exchange. Williamson proposes that four variables will affect such costs, namely: (i) frequency of exchange; (ii) asset specificity; (iii) environmental uncertainty; and (iv) threat of opportunism. This paper provides evidence from a rural public hospital and examines whether Williamson's transaction cost theory is applicable. [...] into an analysis that relies solely on transaction [...]

Integrating highlights for more complete sports video summarization

Relevance: 20.00%

Abstract:

Summarization is an essential requirement for achieving a more compact and interesting representation of sports video contents. We propose a framework that integrates highlights into play segments and reveal why we should still retain breaks. Experimental results show that fast detections of whistle sounds, crowd excitement, and text boxes can complement existing techniques for play-breaks and highlights localization.

SCLOPE: an algorithm for clustering data streams of categorical attributes

Relevance: 20.00%

Abstract:

Clustering is a difficult problem especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose SCLOPE, a novel algorithm based on CLOPE's intuitive observation about cluster histograms. Unlike CLOPE however, our algorithm is very fast and operates within the constraints of a data stream environment. In particular, we designed SCLOPE according to the recent CluStream framework. Our evaluation of SCLOPE shows very promising results. It consistently outperforms CLOPE in speed and scalability tests on our data sets while maintaining high cluster purity; it also supports cluster analysis that other algorithms in its class do not.

Complete mitochondrial DNA sequence of the Australian freshwater crayfish, Cherax destructor (Crustacea: Decapoda: Parastacidae): a novel gene order revealed

Relevance: 20.00%

Abstract:

The complete mitochondrial DNA sequence was determined for the Australian freshwater crayfish Cherax destructor (Crustacea: Decapoda: Parastacidae). The 15,895-bp genome is circular with the same gene composition as that found in other metazoans. However, we report a novel gene arrangement with respect to the putative arthropod ancestral gene order and all other arthropod mitochondrial genomes sequenced to date. It is apparent that 11 genes have been translocated (ND1, ND4, ND4L, Cyt b, srRNA, and tRNAs Ser(UGA), Leu(CUN), Ile, Cys, Pro, and Val), two of which have also undergone inversions (tRNAs Pro and Val). The ‘duplication/random loss’ mechanism is a plausible model for the observed translocations, while ‘intramitochondrial recombination’ may account for the gene inversions. In addition, the arrangement of rRNA genes is incompatible with current mitochondrial transcription models, and suggests that a different transcription mechanism may operate in C. destructor.

A linkage measure framework for the real estate sector

Relevance: 20.00%

Abstract:

Linkage is one of the most important factors for gaining competitive advantage. Information on linkages is essential to understanding the structure of an economy, which is in turn important in formulating industry policies and business strategies. The hypothetical extraction method is used to measure the linkages by extracting a sector hypothetically from an economic system in the literature. In the previous research, however, the internal linkage (linkage within a sector) and sectoral linkages (linkage between two specific sectors) are ignored, and there is not a comprehensive framework to measure the linkages of a specific sector. Using the recently published Organisation for Economic Co-operation and Development input-output database at constant prices, this paper aims to resolve these two shortcomings and thereby propose a linkage measure framework to explore the linkages between the real estate sector and other sectors from a new angle. The relative and absolute linkages are termed and the total, backward, forward, internal and sectoral linkage indicators are formulated to investigate the linkages of the real estate sector from all directions. Empirical results show an increasing trend of these linkages, which confirms the increasing role of the real estate sector with economic maturity over the examined period. This framework also can be employed in other sectors.

Complete mitochondrial DNA sequences of the decapod crustaceans Pseudocarcinus gigas (Menippidae) and Macrobrachium rosenbergii (Palaemonidae)

Relevance: 20.00%

Abstract:

The complete mitochondrial DNA sequence was determined for the Australian giant crab Pseudocarcinus gigas (Crustacea: Decapoda: Menippidae) and the giant freshwater shrimp Macrobrachium rosenbergii (Crustacea: Decapoda: Palaemonidae). The Pse. gigas and M. rosenbergii mitochondrial genomes are circular molecules, 15,515 and 15,772 bp in length, respectively, and have the same gene composition as found in other metazoans. The gene arrangement of M. rosenbergii corresponds with that of the presumed ancestral arthropod gene order, represented by Limulus polyphemus, except for the position of the tRNA^Leu(UUR) gene. The Pse. gigas gene arrangement corresponds exactly with that reported for another brachyuran, Portunus trituberculatus, and differs from the M. rosenbergii gene order by only the position of the tRNA^His gene. Given the relative positions of intergenic noncoding nucleotides, the “duplication/random loss” model appears to be the most plausible mechanism for the translocation of this gene. These data represent the first caridean and only the second brachyuran complete mtDNA sequences, and a source of information that will facilitate surveys of intraspecific variation within these commercially important decapod species.

σ-SCLOPE : clustering categorical streams using attribute selection

Relevance: 20.00%

Abstract:

Clustering is a difficult problem especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose σ-SCLOPE, a novel algorithm based on SCLOPE’s intuitive observation about cluster histograms. Unlike SCLOPE however, our algorithm consumes less memory per window and has a better clustering runtime for the same data stream in a given window. This positions σ-SCLOPE as a more attractive option than SCLOPE if a minor loss of clustering accuracy is insignificant in the application.

Linkage measures of the construction sector using the hypothetical extraction method

Relevance: 20.00%

Abstract:

The hypothetical extraction method (HEM) is used to extract a sector hypothetically from an economic system and examine the influence of this extraction on other sectors in the economy. Linkage measures based on the HEM have become increasingly prominent. However, little construction linkage research applies the HEM. Using the recently published Organisation for Economic Co-operation and Development input-output database at constant prices, this research applies the HEM to the construction sector in order to explore the role of this sector in national economies and the quantitative interdependence between the construction sector and the remaining sectors. The output differences before and after the hypothetical extraction reflect the linkages of the construction sector. Empirical results show a declining trend of the total, backward and forward linkages, which confirms the decreasing role of the construction sector with economic maturity over the examined period from a new angle. Analytical results reveal that the unique nature of the construction sector and multifold external factors are the main reasons for the linkage difference between countries. Moreover, hypothesis-testing results indicate that the extraction structures employed in this research are appropriate for analysing the linkages of the construction sector.
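The "output difference before and after extraction" described in this abstract can be sketched with a standard Leontief input-output model, where total output x solves (I - A)x = f. The two-sector coefficient matrix `A` and final demand `f` below are invented for illustration. The extraction structure used here, zeroing the sector's row and column in `A` while keeping final demand, is one common variant and an assumption; the paper tests several structures that the abstract does not spell out.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def total_output(A, f):
    """Leontief model: solve (I - A) x = f for total output x."""
    n = len(f)
    i_minus_a = [[(1.0 if r == c else 0.0) - A[r][c] for c in range(n)]
                 for r in range(n)]
    return solve(i_minus_a, f)


def extract_sector(A, j):
    """Assumed extraction structure: zero sector j's intermediate sales and purchases."""
    n = len(A)
    return [[0.0 if (r == j or c == j) else A[r][c] for c in range(n)]
            for r in range(n)]


# Hypothetical two-sector economy (sector 0 is the one being extracted)
A = [[0.2, 0.3],
     [0.4, 0.1]]
f = [100.0, 200.0]

x_before = total_output(A, f)
x_after = total_output(extract_sector(A, 0), f)
total_linkage = sum(x_before) - sum(x_after)
```

The drop in economy-wide output after the extraction is read as the total linkage of the extracted sector; zeroing only the row or only the column would give forward or backward variants under the same assumptions.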

Density based fuzzy c-means clustering of non-convex patterns

Relevance: 20.00%

Abstract:

We propose a new technique to perform unsupervised data classification (clustering) based on density induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data induced metric, defined with the help of Delaunay triangulation. We detail computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable of identifying non-convex overlapped d-dimensional clusters.
