856 resultados para Data Driven Clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of datastructure is very common in consumer studies where a panel of consumers is asked to assess the global liking of a certain number of products and then, preference scores are arranged in a two-way table Y. External information on both products (physicalchemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z respectively. The aim of this method is to automatically provide a consumer segmentation where all the three matrices play an active role in the classification, getting homogeneous groups from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the product chemical compounds, sensory evaluation and consumer socio-demographic information, purchase behaviour and consumption habits.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

[ES]En este artículo se describe la experiencia de la aplicación de técnicas de EDM (clustering) a un curso disponible en la plataforma Ude@ de la Universidad de Antioquia. El objetivo es clasificar los patrones de interacción de los estudiantes a partir de la información almacenada en la base de datos de la plataforma Moodle. Para ello, se generan informes sobre el uso de los recursos y la autoevaluación que permiten analizar el comportamiento y los patrones de navegación de los estudiantes durante el uso del LMS (Learning Management System).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Il task del data mining si pone come obiettivo l'estrazione automatica di schemi significativi da grandi quantità di dati. Un esempio di schemi che possono essere cercati sono raggruppamenti significativi dei dati, si parla in questo caso di clustering. Gli algoritmi di clustering tradizionali mostrano grossi limiti in caso di dataset ad alta dimensionalità, composti cioè da oggetti descritti da un numero consistente di attributi. Di fronte a queste tipologie di dataset è necessario quindi adottare una diversa metodologia di analisi: il subspace clustering. Il subspace clustering consiste nella visita del reticolo di tutti i possibili sottospazi alla ricerca di gruppi signicativi (cluster). Una ricerca di questo tipo è un'operazione particolarmente costosa dal punto di vista computazionale. Diverse ottimizzazioni sono state proposte al fine di rendere gli algoritmi di subspace clustering più efficienti. In questo lavoro di tesi si è affrontato il problema da un punto di vista diverso: l'utilizzo della parallelizzazione al fine di ridurre il costo computazionale di un algoritmo di subspace clustering.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bioinformatics, in the last few decades, has played a fundamental role to give sense to the huge amount of data produced. Obtained the complete sequence of a genome, the major problem of knowing as much as possible of its coding regions, is crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As it has been recently pointed out by the Critical Assessment of Function Annotations (CAFA), most accurate methods are those based on the transfer-by-homology approach and the most incisive contribution is given by cross-genome comparisons. In the present thesis it is described a non-hierarchical sequence clustering method for protein automatic large-scale annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 millions protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three dimensional structure (when a template is available). This is possible by the way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate including the prediction of Magnesium binding sites in human proteins, the ABC transporters superfamily classification and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for the functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Le tecniche di next generation sequencing costituiscono un potente strumento per diverse applicazioni, soprattutto da quando i loro costi sono iniziati a calare e la qualità dei loro dati a migliorare. Una delle applicazioni del sequencing è certamente la metagenomica, ovvero l'analisi di microorganismi entro un dato ambiente, come per esempio quello dell'intestino. In quest'ambito il sequencing ha permesso di campionare specie batteriche a cui non si riusciva ad accedere con le tradizionali tecniche di coltura. Lo studio delle popolazioni batteriche intestinali è molto importante in quanto queste risultano alterate come effetto ma anche causa di numerose malattie, come quelle metaboliche (obesità, diabete di tipo 2, etc.). In questo lavoro siamo partiti da dati di next generation sequencing del microbiota intestinale di 5 animali (16S rRNA sequencing) [Jeraldo et al.]. Abbiamo applicato algoritmi ottimizzati (UCLUST) per clusterizzare le sequenze generate in OTU (Operational Taxonomic Units), che corrispondono a cluster di specie batteriche ad un determinato livello tassonomico. Abbiamo poi applicato la teoria ecologica a master equation sviluppata da [Volkov et al.] per descrivere la distribuzione dell'abbondanza relativa delle specie (RSA) per i nostri campioni. La RSA è uno strumento ormai validato per lo studio della biodiversità dei sistemi ecologici e mostra una transizione da un andamento a logserie ad uno a lognormale passando da piccole comunità locali isolate a più grandi metacomunità costituite da più comunità locali che possono in qualche modo interagire. Abbiamo mostrato come le OTU di popolazioni batteriche intestinali costituiscono un sistema ecologico che segue queste stesse regole se ottenuto usando diverse soglie di similarità nella procedura di clustering. Ci aspettiamo quindi che questo risultato possa essere sfruttato per la comprensione della dinamica delle popolazioni batteriche e quindi di come queste variano in presenza di particolari malattie.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A prototype vortex-driven air lift pump was developed and experimentally evaluated. It was designed to be easily manufactured and scalable for arbitrary riser diameters. The model tested fit in a 2 inch diameter riser with six air injection nozzles through which airwas injected helically around the perimeter of the riser at an angle of 70º from pure tangential injection. The pump was intended to transport both water and sediment over a large range of submergence ratios. A test apparatus was designed to be able to simulate deep water or oceanic environments. The resulting test setup had a finite reservoir; over the course of a test, the submergence ratio varied from 0.48 to 0.39. For air injection pressures ranging from 10 to 60 psig and for air flow rates of 6 to 15 scfm, the induced water discharge flow rates varied only slightly, due to the limited range of available submergence ratios. The anticipated simulation of deep water environment, with a corresponding equivalent increase in thesubmergence ratio, proved unattainable. The pump prototype successfully transported both water and sediment (sand). Thepercent volume yield of the sediment was in an acceptable range. The pump design has been subsequently used successfully in a 4 inch configuration in a follow-on project. A computer program was written in Matlab to simulate the pump characteristics. The program output water pressures at the location of air injection which were physicallycompatible with the experimental data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clay mineral-rich sedimentary formations are currently under investigation to evaluate their potential use as host formations for installation of deep underground disposal facilities for radioactive waste (e.g. Boom Clay (BE), Opalinus Clay (CH), Callovo-Oxfordian argillite (FR)). The ultimate safety of the corresponding repository concepts depends largely on the capacity of the host formation to limit the flux towards the biosphere of radionuclides (RN) contained in the waste to acceptably low levels. Data for diffusion-driven transfer in these formations shows extreme differences in the measured or modelled behaviour for various radionuclides, e. g. between halogen RN (Cl-36, I-129) and actinides (U-238,U-235, Np-237, Th-232, etc.), which result from major differences between RN of the effects on transport of two phenomena: diffusion and sorption. This paper describes recent research aimed at improving understanding of these two phenomena, focusing on the results of studies carried out during the EC Funmig IP on clayrocks from the above three formations and from the Boda formation (HU). Project results regarding phenomena governing water, cation and anion distribution and mobility in the pore volumes influenced by the negatively-charged surfaces of clay minerals show a convergence of the modelling results for behaviour at the molecular scale and descriptions based on electrical double layer models. Transport models exist which couple ion distribution relative to the clay-solution interface and differentiated diffusive characteristics. These codes are able to reproduce the main trends in behaviour observed experimentally, e.g. D-e(anion) < D-e(HTO) < D-e(cation) and D-e(anion) variations as a function of ionic strength and material density. These trends are also well-explained by models of transport through ideal porous matrices made up of a charged surface material. Experimental validation of these models is good as regards monovalent alkaline cations, in progress for divalent electrostatically-interacting cations (e.g. Sr2+) and still relatively poor for 'strongly sorbing', high K-d cations. Funmig results have clarified understanding of how clayrock mineral composition, and the corresponding organisation of mineral grain assemblages and their associated porosity, can affect mobile solute (anions, HTO) diffusion at different scales (mm to geological formation). In particular, advances made in the capacity to map clayrock mineral grain-porosity organisation at high resolution provide additional elements for understanding diffusion anisotropy and for relating diffusion characteristics measured at different scales. On the other hand, the results of studies focusing on evaluating the potential effects of heterogeneity on mobile species diffusion at the formation scale tend to show that there is a minimal effect when compared to a homogeneous property model. Finally, the results of a natural tracer-based study carried out on the Opalinus Clay formation increase confidence in the use of diffusion parameters measured on laboratory scale samples for predicting diffusion over geological time-space scales. Much effort was placed on improving understanding of coupled sorption-diffusion phenomena for sorbing cations in clayrocks. Results regarding sorption equilibrium in dispersed and compacted materials for weakly to moderately sorbing cations (Sr2+, Cs+, Co2+) tend to show that the same sorption model probably holds in both systems. It was not possible to demonstrate this for highly sorbing elements such as Eu(III) because of the extremely long times needed to reach equilibrium conditions, but there does not seem to be any clear reason why such elements should not have similar behaviour. Diffusion experiments carried out with Sr2+, Cs+ and Eu(III) on all of the clayrocks gave mixed results and tend to show that coupled diffusion-sorption migration is much more complex than expected, leading generally to greater mobility than that predicted by coupling a batch-determined K-d and Ficks law based on the diffusion behaviour of HTO. If the K-d measured on equivalent dispersed systems holds as was shown to be the case for Sr, Cs (and probably Co) for Opalinus Clay, these results indicate that these cations have a D-e value higher than HTO (up to a factor of 10 for Cs+). Results are as yet very limited for very moderate to strongly sorbing species (e.g. Co(II), Eu(III), Cu(II)) because of their very slow transfer characteristics. (C) 2011 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The study describes brain areas involved in medial temporal lobe (mTL) seizures of 12 patients. All patients showed so-called oro-alimentary behavior within the first 20 s of clinical seizure manifestation characteristic of mTL seizures. Single photon emission computed tomography (SPECT) images of regional cerebral blood flow (rCBF) were acquired from the patients in ictal and interictal phases and from normal volunteers. Image analysis employed categorical comparisons with statistical parametric mapping and principal component analysis (PCA) to assess functional connectivity. PCA supplemented the findings of the categorical analysis by decomposing the covariance matrix containing images of patients and healthy subjects into distinct component images of independent variance, including areas not identified by the categorical analysis. Two principal components (PCs) discriminated the subject groups: patients with right or left mTL seizures and normal volunteers, indicating distinct neuronal networks implicated by the seizure. Both PCs were correlated with seizure duration, one positively and the other negatively, confirming their physiological significance. The independence of the two PCs yielded a clear clustering of subject groups. The local pattern within the temporal lobe describes critical relay nodes which are the counterpart of oro-alimentary behavior: (1) right mesial temporal zone and ipsilateral anterior insula in right mTL seizures, and (2) temporal poles on both sides that are densely interconnected by the anterior commissure. Regions remote from the temporal lobe may be related to seizure propagation and include positively and negatively loaded areas. These patterns, the covarying areas of the temporal pole and occipito-basal visual association cortices, for example, are related to known anatomic paths.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a model of spike-driven synaptic plasticity inspired by experimental observations and motivated by the desire to build an electronic hardware device that can learn to classify complex stimuli in a semisupervised fashion. During training, patterns of activity are sequentially imposed on the input neurons, and an additional instructor signal drives the output neurons toward the desired activity. The network is made of integrate-and-fire neurons with constant leak and a floor. The synapses are bistable, and they are modified by the arrival of presynaptic spikes. The sign of the change is determined by both the depolarization and the state of a variable that integrates the postsynaptic action potentials. Following the training phase, the instructor signal is removed, and the output neurons are driven purely by the activity of the input neurons weighted by the plastic synapses. In the absence of stimulation, the synapses preserve their internal state indefinitely. Memories are also very robust to the disruptive action of spontaneous activity. A network of 2000 input neurons is shown to be able to classify correctly a large number (thousands) of highly overlapping patterns (300 classes of preprocessed Latex characters, 30 patterns per class, and a subset of the NIST characters data set) and to generalize with performances that are better than or comparable to those of artificial neural networks. Finally we show that the synaptic dynamics is compatible with many of the experimental observations on the induction of long-term modifications (spike-timing-dependent plasticity and its dependence on both the postsynaptic depolarization and the frequency of pre- and postsynaptic neurons).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

AIMS: Multiple arrhythmia re-inductions were recently shown in His-Purkinje system (HPS) ventricular tachycardia (VT). We hypothesized that HPS VT was a frequent mechanism of repetitive or incessant VT and assessed diagnostic criteria to select patients likely to have HPS VT. METHODS AND RESULTS: Consecutive patients with clustering VT episodes (>3 sustained monomorphic VT within 2 weeks) were included in the analysis. HPS VT was considered plausible in patients with (i) impaired left ventricular function associated with dilated cardiomyopathy or valvular heart disease; or (ii) ECG during VT similar to sinus rhythm QRS or to bundle-branch block QRS. HPS VT was plausible in 12 of 48 patients and HPS VT was demonstrated in 6 of 12 patients (50%, or 13% of the whole study group). Median VT cycle length was 318 ms (250-550). Catheter ablation was successful in all six patients. CONCLUSION: His-Purkinje system VT is found in a significant number of patients with repetitive or incessant VT episodes, and in a large proportion of patients with predefined clinical or electrocardiographic characteristics. Since it is easily amenable to catheter ablation, our data support the screening of all patients with repetitive VT in this regard and an invasive approach in a selected group of patients.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In 1998-2001 Finland suffered the most severe insect outbreak ever recorded, over 500,000 hectares. The outbreak was caused by the common pine sawfly (Diprion pini L.). The outbreak has continued in the study area, Palokangas, ever since. To find a good method to monitor this type of outbreaks, the purpose of this study was to examine the efficacy of multi-temporal ERS-2 and ENVISAT SAR imagery for estimating Scots pine (Pinus sylvestris L.) defoliation. Three methods were tested: unsupervised k-means clustering, supervised linear discriminant analysis (LDA) and logistic regression. In addition, I assessed if harvested areas could be differentiated from the defoliated forest using the same methods. Two different speckle filters were used to determine the effect of filtering on the SAR imagery and subsequent results. The logistic regression performed best, producing a classification accuracy of 81.6% (kappa 0.62) with two classes (no defoliation, >20% defoliation). LDA accuracy was with two classes at best 77.7% (kappa 0.54) and k-means 72.8 (0.46). In general, the largest speckle filter, 5 x 5 image window, performed best. When additional classes were added the accuracy was usually degraded on a step-by-step basis. The results were good, but because of the restrictions in the study they should be confirmed with independent data, before full conclusions can be made that results are reliable. The restrictions include the small size field data and, thus, the problems with accuracy assessment (no separate testing data) as well as the lack of meteorological data from the imaging dates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the current market system, power systems are operated at higher loads for economic reasons. Power system stability becomes a genuine concern in such operating conditions. In case of failure of any larger component, the system may become stressed. These events may start cascading failures, which may lead to blackouts. One of the main reasons of the major recorded blackout events has been the unavailability of system-wide information. Synchrophasor technology has the capability to provide system-wide real time information. Phasor Measurement Units (PMUs) are the basic building block of this technology, which provide the Global Positioning System (GPS) time-stamped voltage and current phasor values along with the frequency. It is being assumed that synchrophasor data of all the buses is available and thus the whole system is fully observable. This information can be used to initiate islanding or system separation to avoid blackouts. A system separation strategy using synchrophasor data has been developed to answer the three main aspects of system separation: (1) When to separate: One class support machines (OC-SVM) is primarily used for the anomaly detection. Here OC-SVM was used to detect wide area instability. OC-SVM has been tested on different stable and unstable cases and it is found that OC-SVM has the capability to detect the wide area instability and thus is capable to answer the question of “when the system should be separated”. (2) Where to separate: The agglomerative clustering technique was used to find the groups of coherent buses. The lines connecting different groups of coherent buses form the separation surface. The rate of change of the bus voltage phase angles has been used as the input to this technique. This technique has the potential to exactly identify the lines to be tripped for the system separation. (3) What to do after separation: Load shedding was performed approximately equal to the sum of power flows along the candidate system separation lines should be initiated before tripping these lines. Therefore it is recommended that load shedding should be initiated before tripping the lines for system separation.