52 resultados para Unsupervised clustering

em CentAUR: Central Archive University of Reading - UK


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper represents the first step in an on-going work for designing an unsupervised method based on genetic algorithm for intrusion detection. Its main role in a broader system is to notify of an unusual traffic and in that way provide the possibility of detecting unknown attacks. Most of the machine-learning techniques deployed for intrusion detection are supervised as these techniques are generally more accurate, but this implies the need of labeling the data for training and testing which is time-consuming and error-prone. Hence, our goal is to devise an anomaly detector which would be unsupervised, but at the same time robust and accurate. Genetic algorithms are robust and able to avoid getting stuck in local optima, unlike the rest of clustering techniques. The model is verified on KDD99 benchmark dataset, generating a solution competitive with the solutions of the state-of-the-art which demonstrates high possibilities of the proposed method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for data represented in multidimensional input spaces. In this paper, we describe Fast Learning SOM (FLSOM) which adopts a learning algorithm that improves the performance of the standard SOM with respect to the convergence time in the training phase. We show that FLSOM also improves the quality of the map by providing better clustering quality and topology preservation of multidimensional input data. Several tests have been carried out on different multidimensional datasets, which demonstrate better performances of the algorithm in comparison with the original SOM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The present work presents a new method for activity extraction and reporting from video based on the aggregation of fuzzy relations. Trajectory clustering is first employed mainly to discover the points of entry and exit of mobiles appearing in the scene. In a second step, proximity relations between resulting clusters of detected mobiles and contextual elements from the scene are modeled employing fuzzy relations. These can then be aggregated employing typical soft-computing algebra. A clustering algorithm based on the transitive closure calculation of the fuzzy relations allows building the structure of the scene and characterises the ongoing different activities of the scene. Discovered activity zones can be reported as activity maps with different granularities thanks to the analysis of the transitive closure matrix. Taking advantage of the soft relation properties, activity zones and related activities can be labeled in a more human-like language. We present results obtained on real videos corresponding to apron monitoring in the Toulouse airport in France.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The clustering in time (seriality) of extratropical cyclones is responsible for large cumulative insured losses in western Europe, though surprisingly little scientific attention has been given to this important property. This study investigates and quantifies the seriality of extratropical cyclones in the Northern Hemisphere using a point-process approach. A possible mechanism for serial clustering is the time-varying effect of the large-scale flow on individual cyclone tracks. Another mechanism is the generation by one parent cyclone of one or more offspring through secondary cyclogenesis. A long cyclone-track database was constructed for extended October March winters from 1950 to 2003 using 6-h analyses of 850-mb relative vorticity derived from the NCEP NCAR reanalysis. A dispersion statistic based on the varianceto- mean ratio of monthly cyclone counts was used as a measure of clustering. It reveals extensive regions of statistically significant clustering in the European exit region of the North Atlantic storm track and over the central North Pacific. Monthly cyclone counts were regressed on time-varying teleconnection indices with a log-linear Poisson model. Five independent teleconnection patterns were found to be significant factors over Europe: the North Atlantic Oscillation (NAO), the east Atlantic pattern, the Scandinavian pattern, the east Atlantic western Russian pattern, and the polar Eurasian pattern. The NAO alone is not sufficient for explaining the variability of cyclone counts in the North Atlantic region and western Europe. Rate dependence on time-varying teleconnection indices accounts for the variability in monthly cyclone counts, and a cluster process did not need to be invoked.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A methodology for discovering the mechanisms and dynamics of protein clustering on solid surfaces is presented. In situ atomic force microscopy images are quantitatively compared to Monte Carlo simulations using cluster statistics to differentiate various models. We study lysozyme adsorption on mica as a model system and find that all surface-supported clusters are mobile, not just the monomers, with diffusion constant inversely related to cluster size. The surface monomer diffusion constant is measured to be D1∼9×10-16  cm2 s-1, such a low value being difficult to measure using other techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the past decade, the amount of data in biological field has become larger and larger; Bio-techniques for analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes including clustering and visualization, i.e. the Self Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, the performances in terms of quality of result and learning speed are strongly dependent from the neuron weights initialization. In this paper we present a new initialization technique based on a totally connected undirected graph, that report relations among some intersting features of data input. Result of experimental tests, where the proposed algorithm is compared to the original initialization techniques, shows that our technique assures faster learning and better performance in terms of quantization error.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A Bayesian method of classifying observations that are assumed to come from a number of distinct subpopulations is outlined. The method is illustrated with simulated data and applied to the classification of farms according to their level and variability of income. The resultant classification shows a greater diversity of technical charactersitics within farm types than is conventionally the case. The range of mean farm income between groups in the new classification is wider than that of the conventional method and the variability of income within groups is narrower. Results show that the highest income group in 2000 included large specialist dairy farmers and pig and poultry producers, whilst in 2001 it included large and small specialist dairy farms and large mixed dairy and arable farms. In both years the lowest income group is dominated by non-milk producing livestock farms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Apoptosis induced by the death-inducing ligand FasL (CD95L) is a major mechanism of cell death. Trophoblast cells express the Fas receptor yet survive in an environment that is rich in the ligand. We report that basal nitric oxide (NO) production is responsible for the resistance of trophoblasts to FasL-induced apoptosis. In this study we demonstrate that basal NO production resulted in the inhibition of receptor clustering following ligand binding. In addition NO also protected cells through the selective nitrosylation, and inhibition, of protein kinase Cepsilon (PKCepsilon) but not PKCalpha. In the absence of NO production PKCepsilon interacted with, and phosphorylated, the anti-apoptotic protein cFLIP. The interaction is predominantly with the short form of cFLIP and its phosphorylation reduces its recruitment to the death-inducing signaling complex (DISC) that is formed following binding of a death-inducing ligand to its receptor. Inhibition of cFLIP recruitment to the DISC leads to increased activation of caspase 8 and subsequently to apoptosis. Inhibition of PKCepsilon using siRNA significantly reversed the sensitivity to apoptosis induced by inhibition of NO synthesis suggesting that NO-mediated inhibition of PKCepsilon plays an important role in the regulation of Fas-induced apoptosis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The study of motor unit action potential (MUAP) activity from electrornyographic signals is an important stage on neurological investigations that aim to understand the state of the neuromuscular system. In this context, the identification and clustering of MUAPs that exhibit common characteristics, and the assessment of which data features are most relevant for the definition of such cluster structure are central issues. In this paper, we propose the application of an unsupervised Feature Relevance Determination (FRD) method to the analysis of experimental MUAPs obtained from healthy human subjects. In contrast to approaches that require the knowledge of a priori information from the data, this FRD method is embedded on a constrained mixture model, known as Generative Topographic Mapping, which simultaneously performs clustering and visualization of MUAPs. The experimental results of the analysis of a data set consisting of MUAPs measured from the surface of the First Dorsal Interosseous, a hand muscle, indicate that the MUAP features corresponding to the hyperpolarization period in the physisiological process of generation of muscle fibre action potentials are consistently estimated as the most relevant and, therefore, as those that should be paid preferential attention for the interpretation of the MUAP groupings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we address issues in segmentation Of remotely sensed LIDAR (LIght Detection And Ranging) data. The LIDAR data, which were captured by airborne laser scanner, contain 2.5 dimensional (2.5D) terrain surface height information, e.g. houses, vegetation, flat field, river, basin, etc. Our aim in this paper is to segment ground (flat field)from non-ground (houses and high vegetation) in hilly urban areas. By projecting the 2.5D data onto a surface, we obtain a texture map as a grey-level image. Based on the image, Gabor wavelet filters are applied to generate Gabor wavelet features. These features are then grouped into various windows. Among these windows, a combination of their first and second order of statistics is used as a measure to determine the surface properties. The test results have shown that ground areas can successfully be segmented from LIDAR data. Most buildings and high vegetation can be detected. In addition, Gabor wavelet transform can partially remove hill or slope effects in the original data by tuning Gabor parameters.