20 results for hierarchical clustering techniques

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance:

90.00%

Publisher:

Abstract:

In this paper we study the classification of spatiotemporal patterns of one-dimensional cellular automata (CA), where the classification comprises CA rules including their initial conditions. We propose an exploratory analysis method based on the normalized compression distance (NCD) of spatiotemporal patterns, which is used as a dissimilarity measure for hierarchical clustering. Our approach is different with respect to the following points. First, the classification of spatiotemporal patterns is comparative because the NCD explicitly evaluates the difference in compressibility between two objects, e.g., strings corresponding to spatiotemporal patterns. This is in contrast to all other measures applied so far in a similar context because they are essentially univariate. Second, Kolmogorov complexity, which underlies the NCD, was used in the classification of CA with respect to their spatiotemporal pattern. Third, our method is semiautomatic, allowing us to investigate hundreds or thousands of CA rules or initial conditions simultaneously to gain insights into their organizational structure. Our numerical results are not only plausible, confirming previous classification attempts, but also shed light on the intricate influence of random initial conditions on the classification results.
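As a rough illustration of the idea in this abstract, the sketch below computes an NCD between spatiotemporal patterns of elementary CA and feeds the distances to a naive agglomerative clustering. It is not the authors' implementation: `zlib` stands in for an ideal compressor, the single-seed initial condition, the rule list, and the average-linkage choice are all assumptions made for the example, and the helper names (`csize`, `ncd`, `ca_pattern`, `average_linkage`) are hypothetical.

```python
import zlib
from itertools import combinations


def csize(data: bytes) -> int:
    """Compressed length in bytes; zlib approximates an ideal compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ca_pattern(rule: int, width: int = 64, steps: int = 64) -> bytes:
    """Spatiotemporal pattern of an elementary CA (assumed: one seeded cell, periodic boundary)."""
    row = [0] * width
    row[width // 2] = 1
    out = []
    for _ in range(steps):
        out.extend(row)
        # Wolfram numbering: the neighborhood (left, center, right) indexes a bit of the rule.
        row = [
            (rule >> (4 * row[(i - 1) % width] + 2 * row[i] + row[(i + 1) % width])) & 1
            for i in range(width)
        ]
    return bytes(out)


def average_linkage(labels, dist):
    """Naive agglomerative clustering with average linkage; returns the merge history."""
    def d(a, b):
        return sum(dist[frozenset((p, q))] for p in a for q in b) / (len(a) * len(b))

    clusters = [(lab,) for lab in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: d(*ab))
        merges.append((a, b, d(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


rules = [0, 30, 90, 110, 250]  # illustrative choice of rules
patterns = {r: ca_pattern(r) for r in rules}
# NCD with a real compressor is only approximately symmetric; one order per pair is used here.
dist = {frozenset((p, q)): ncd(patterns[p], patterns[q]) for p, q in combinations(rules, 2)}
merges = average_linkage(rules, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each printed line is one merge of the dendrogram, so rules whose patterns compress well together should join at low heights.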

Relevance:

90.00%

Publisher:

Abstract:

This paper proposes max separation clustering (MSC), a new non-hierarchical clustering method used for feature extraction from optical emission spectroscopy (OES) data for plasma etch process control applications. OES data is high dimensional and inherently highly redundant with the result that it is difficult if not impossible to recognize useful features and key variables by direct visualization. MSC is developed for clustering variables with distinctive patterns and providing effective pattern representation by a small number of representative variables. The relationship between signal-to-noise ratio (SNR) and clustering performance is highlighted, leading to a requirement that low SNR signals be removed before applying MSC. Experimental results on industrial OES data show that MSC with low SNR signal removal produces effective summarization of the dominant patterns in the data.

Relevance:

90.00%

Publisher:

Abstract:

This paper examines the use of trajectory distance measures and clustering techniques to define normal and abnormal trajectories in the context of pedestrian tracking in public spaces. In order to detect abnormal trajectories, what is meant by a normal trajectory in a given scene is first defined. Then every trajectory that deviates from this normality is classified as abnormal. By combining Dynamic Time Warping and a modified K-Means algorithm for arbitrary-length data series, we have developed an algorithm for trajectory clustering and abnormality detection. The final system performs with an overall accuracy of 83% and 75% when tested on two different standard datasets.
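Dynamic Time Warping, named in the abstract above, can be sketched as follows. This is only the textbook DTW recurrence on 2-D points, not the paper's modified K-Means pipeline; the function name `dtw_distance` and the toy trajectories are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two trajectories (lists of (x, y)) of possibly different lengths."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between the two points
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy trajectories (made up): same path walked at different speeds, and a path with a detour.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
slow = [(0, 0), (0, 0), (1, 0), (2, 0), (2, 0), (3, 0)]
detour = [(0, 0), (1, 2), (2, 2), (3, 0)]

print(dtw_distance(straight, slow))
print(dtw_distance(straight, detour))
```

Because DTW aligns points non-linearly in time, the slower walk along the same path costs nothing, while the detour accumulates distance; a distance of this kind is what lets a clustering step compare series of different lengths.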

Relevance:

80.00%

Publisher:

Abstract:

Juvenile idiopathic arthritis (JIA) comprises a poorly understood group of chronic, childhood onset, autoimmune diseases with variable clinical outcomes. We investigated whether profiling of the synovial fluid (SF) proteome by a fluorescent dye based, two-dimensional gel (DIGE) approach could distinguish patients in whom inflammation extends to affect a large number of joints, early in the disease process. SF samples from 22 JIA patients were analyzed: 10 with oligoarticular arthritis, 5 extended oligoarticular and 7 polyarticular disease. SF samples were labeled with Cy dyes and separated by two-dimensional electrophoresis. Multivariate analyses were used to isolate a panel of proteins which distinguish patient subgroups. Proteins were identified using MALDI-TOF mass spectrometry with expression further verified by Western immunoblotting and immunohistochemistry. Hierarchical clustering based on the expression levels of a set of 40 proteins segregated the extended oligoarticular from the oligoarticular patients (p < 0.05). Expression patterns of the isolated protein panel have also been observed over time, as disease spreads to multiple joints. The data indicate that synovial fluid proteome profiles could be used to stratify patients based on risk of disease extension. These protein profiles may also assist in monitoring therapeutic responses over time and help predict joint damage. © 2009 American Chemical Society.

Relevance:

80.00%

Publisher:

Abstract:

The recent emergence of high-throughput arrays for methylation analysis has made the influence of tumor content on the interpretation of methylation levels increasingly pertinent. However, the degree to which tumor content has an influence, and what degree of tumor content makes a specimen acceptable for accurate analysis, remain unclear. Taking a systematic approach, we analyzed 98 unselected formalin-fixed and paraffin-embedded gastric tumors and matched normal tissue samples using the Illumina GoldenGate methylation assay. Unsupervised hierarchical clustering showed 2 separate clusters with a significant difference in average tumor content levels. The probes identified to be significantly differentially methylated between the tumors and normals also differed according to the tumor content of the samples included, with the sensitivity of identifying the

Relevance:

80.00%

Publisher:

Abstract:

Background: Evidence suggests that in prokaryotes sequence-dependent transcriptional pauses affect the dynamics of transcription and translation, as well as of small genetic circuits. So far, a few pause-prone sequences have been identified from in vitro measurements of transcription elongation kinetics.

Results: Using a stochastic model of gene expression at the nucleotide and codon levels with realistic parameter values, we investigate three different but related questions and present statistical methods for their analysis. First, we show that information from in vivo RNA and protein temporal numbers is sufficient to discriminate between models with and without a pause site in their coding sequence. Second, we demonstrate that it is possible to separate a large variety of models from each other with pauses of various durations and locations in the template by means of hierarchical clustering and a random forest classifier. Third, we introduce an approximate likelihood function that allows the location of a pause site to be estimated.

Conclusions: This method can aid in detecting unknown pause-prone sequences from temporal measurements of RNA and protein numbers at a genome-wide scale and thus elucidate possible roles that these sequences play in the dynamics of genetic networks and phenotype.

Relevance:

80.00%

Publisher:

Abstract:

Background: Ineffective risk stratification can delay diagnosis of serious disease in patients with hematuria. We applied a systems biology approach to analyze clinical, demographic and biomarker measurements (n = 29) collected from 157 hematuric patients: 80 urothelial cancer (UC) and 77 controls with confounding pathologies.

Methods: On the basis of biomarkers, we conducted agglomerative hierarchical clustering to identify patient and biomarker clusters. We then explored the relationship between the patient clusters and clinical characteristics using Chi-square analyses. We determined classification errors and areas under the receiver operating curve of Random Forest Classifiers (RFC) for patient subpopulations using the biomarker clusters to reduce the dimensionality of the data.

Results: Agglomerative clustering identified five patient clusters and seven biomarker clusters. Final diagnosis categories were non-randomly distributed across the five patient clusters. In addition, two of the patient clusters were enriched with patients with ‘low cancer-risk’ characteristics. The biomarkers which contributed to the diagnostic classifiers for these two patient clusters were similar. In contrast, three of the patient clusters were significantly enriched with patients harboring ‘high cancer-risk’ characteristics including proteinuria, aggressive pathological stage and grade, and malignant cytology. Patients in these three clusters included controls, that is, patients with other serious disease and patients with cancers other than UC. Biomarkers which contributed to the diagnostic classifiers for the largest ‘high cancer-risk’ cluster were different from those contributing to the classifiers for the ‘low cancer-risk’ clusters. Biomarkers which contributed to subpopulations that were split according to smoking status, gender and medication were different.

Conclusions: The systems biology approach applied in this study allowed the hematuric patients to cluster naturally on the basis of the heterogeneity within their biomarker data, into five distinct risk subpopulations. Our findings highlight an approach with the promise to unlock the potential of biomarkers. This will be especially valuable in the field of diagnostic bladder cancer where biomarkers are urgently required. Clinicians could interpret risk classification scores in the context of clinical parameters at the time of triage. This could reduce cystoscopies and enable priority diagnosis of aggressive diseases, leading to improved patient outcomes at reduced costs. © 2013 Emmert-Streib et al; licensee BioMed Central Ltd.
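The abstract above relates patient clusters to clinical characteristics with Chi-square analyses. A minimal sketch of that kind of test on a cluster-by-category contingency table follows. The counts are invented for illustration and are not the study's data, and `chi_square` is a hypothetical helper that returns only the Pearson statistic and its degrees of freedom (no p-value).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if cluster membership and category were independent
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: two hypothetical patient clusters; columns: a binary characteristic (present, absent).
independent = [[10, 10], [10, 10]]  # made-up counts with no association
associated = [[20, 0], [0, 20]]     # made-up counts with perfect association

print(chi_square(independent))
print(chi_square(associated))
```

A large statistic relative to the chi-square distribution with the given degrees of freedom suggests that the characteristic is not randomly distributed across clusters, which is the sense in which the abstract reports "non-randomly distributed" diagnoses.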

Relevance:

80.00%

Publisher:

Abstract:

Background: There is no method routinely used to predict response to anthracycline and cyclophosphamide–based chemotherapy in the clinic; therefore patients often receive treatment for breast cancer with no benefit. Loss of the Fanconi anemia/BRCA (FA/BRCA) DNA damage response (DDR) pathway occurs in approximately 25% of breast cancer patients through several mechanisms and results in sensitization to DNA-damaging agents. The aim of this study was to develop an assay to detect DDR-deficient tumors associated with loss of the FA/BRCA pathway, for the purpose of treatment selection.

Methods: DNA microarray data from 21 FA patients and 11 control subjects were analyzed to identify genetic processes associated with a deficiency in DDR. Unsupervised hierarchical clustering was then performed using 60 BRCA1/2 mutant and 47 sporadic tumor samples, and a molecular subgroup was identified that was defined by the molecular processes represented within FA patients. A 44-gene microarray-based assay (the DDR deficiency assay) was developed to prospectively identify this subgroup from formalin-fixed, paraffin-embedded samples. All statistical tests were two-sided.

Results: In a publicly available independent cohort of 203 patients, the assay predicted complete pathologic response vs residual disease after neoadjuvant DNA-damaging chemotherapy (5-fluorouracil, anthracycline, and cyclophosphamide) with an odds ratio of 3.96 (95% confidence interval [CI] = 1.67 to 9.41; P = .002). In a new independent cohort of 191 breast cancer patients treated with adjuvant 5-fluorouracil, epirubicin, and cyclophosphamide, a positive assay result predicted 5-year relapse-free survival with a hazard ratio of 0.37 (95% CI = 0.15 to 0.88; P = .03) compared with the assay negative population.

Conclusions: A formalin-fixed, paraffin-embedded tissue-based assay has been developed and independently validated as a predictor of response and prognosis after anthracycline/cyclophosphamide–based chemotherapy in the neoadjuvant and adjuvant settings. These findings warrant further validation in a prospective clinical study.

Relevance:

80.00%

Publisher:

Abstract:

Traditional Chinese Medicines (TCMs) derived from animal horns are one of the most important types of Chinese medicine. In the present study, a fast and sensitive analytical method was established for qualitative and quantitative determination of 14 nucleosides and nucleobases in animal horns using hydrophilic interaction ultra-high performance liquid chromatography coupled with triple-quadrupole tandem mass spectrometry (HILIC-UPLC-QQQ-MS/MS) in selective reaction monitoring (SRM) mode. The method was optimized and validated, and showed good linearity, precision, repeatability, and accuracy. The method was successfully used to determine the contents of the 14 nucleosides and nucleobases in 25 animal horn samples. Hierarchical clustering analysis (HCA) and principal component analysis (PCA) were performed and the 25 samples were thereby divided into two groups, which agreed with taxonomy. The method may enable a quick and effective search for substitutes for precious horns.

Relevance:

80.00%

Publisher:

Abstract:

Invasive urothelial cell carcinoma (UCC) is characterized by increased chromosomal instability and follows an aggressive clinical course in contrast to non-invasive disease. To identify molecular processes that confer and maintain an aggressive malignant phenotype, we used a high-throughput genome-wide approach to interrogate a cohort of high and low clinical risk UCC tumors. Differential expression analyses highlighted cohesive dysregulation of critical genes involved in the G(2)/M checkpoint in aggressive UCC. Hierarchical clustering based on DNA Damage Response (DDR) genes separated tumors according to a pre-defined clinical risk phenotype. Using array-comparative genomic hybridization, we confirmed that the DDR was disrupted in tumors displaying high genomic instability. We identified DNA copy number gains at 20q13.2-q13.3 (AURKA locus) and determined that overexpression of AURKA accompanied dysregulation of DDR genes in high risk tumors. We postulated that DDR-deficient UCC tumors are advantaged by a selective pressure for AURKA associated override of M phase barriers and confirmed this in an independent tissue microarray series. This mechanism that enables cancer cells to maintain an aggressive phenotype forms a rationale for targeting AURKA as a therapeutic strategy in advanced stage UCC.

Relevance:

80.00%

Publisher:

Abstract:

Background/Purpose: Juvenile idiopathic arthritis (JIA) comprises a poorly understood group of chronic, childhood onset, autoimmune diseases with variable clinical outcomes. We investigated whether profiling of the synovial fluid (SF) proteome by a fluorescent dye based, two-dimensional gel (DIGE) approach could distinguish the subset of patients in whom inflammation extends to affect a large number of joints, early in the disease process. The post-translational modifications to candidate protein markers were verified by a novel deglycosylation strategy.

Methods: SF samples from 57 patients were obtained around time of initial diagnosis of JIA. At 1 year from inclusion patients were categorized according to ILAR criteria as oligoarticular arthritis (n=26), extended oligoarticular (n=8) and polyarticular disease (n=18). SF samples were labeled with Cy dyes and separated by two-dimensional electrophoresis. Multivariate analyses were used to isolate a panel of proteins which distinguish patient subgroups. Proteins were identified using MALDI-TOF mass spectrometry with vitamin D binding protein (VDBP) expression and sialylation further verified by immunohistochemistry, ELISA test and immunoprecipitation. Candidate biomarkers were compared to conventional inflammation measure C-reactive protein (CRP). Sialic acid residues were enzymatically cleaved from immunopurified SF VDBP, enriched by hydrophilic interaction liquid chromatography (HILIC) and analysed by mass spectrometry.

Results: Hierarchical clustering based on the expression levels of a set of 23 proteins segregated the extended-to-be oligoarticular from the oligoarticular patients. A cleaved isoform of VDBP, spot 873, is present at significantly reduced levels in the SF of oligoarticular patients at risk of disease extension, relative to other subgroups (p<0.05). Conversely, total levels of vitamin D binding protein are elevated in plasma and ROC curves indicate an improved diagnostic sensitivity to detect patients at risk of disease extension, over both spot 873 and CRP levels. Sialylated forms of intact immunopurified VDBP were more prevalent in persistent oligoarticular patient synovial fluids.

Conclusion: The data indicate that a subset of the synovial fluid proteome may be used to stratify patients to determine risk of disease extension. Reduced conversion of VDBP to a macrophage activation factor may represent a novel pathway contributing to increased risk of disease extension in JIA patients.

Relevance:

80.00%

Publisher:

Abstract:

To define specific pathways important in the multistep transformation process of normal plasma cells (PCs) to monoclonal gammopathy of uncertain significance (MGUS) and multiple myeloma (MM), we have applied microarray analysis to PCs from 5 healthy donors (N), 7 patients with MGUS, and 24 patients with newly diagnosed MM. Unsupervised hierarchical clustering using 125 genes with a large variation across all samples defined 2 groups: N and MGUS/MM. Supervised analysis identified 263 genes differentially expressed between N and MGUS and 380 genes differentially expressed between N and MM, 197 of which were also differentially regulated between N and MGUS. Only 74 genes were differentially expressed between MGUS and MM samples, indicating that the differences between MGUS and MM are smaller than those between N and MM or N and MGUS. Differentially expressed genes included oncogenes/tumor-suppressor genes (LAF4, RB1, and disabled homolog 2), cell-signaling genes (RAS family members, B-cell signaling and NF-kappaB genes), DNA-binding and transcription-factor genes (XBP1, zinc finger proteins, forkhead box, and ring finger proteins), and developmental genes (WNT and SHH pathways). Understanding the molecular pathogenesis of MM by gene expression profiling has demonstrated sequential genetic changes from N to malignant PCs and highlighted important pathways involved in the transformation of MGUS to MM.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, a hierarchical video structure summarization approach using Laplacian Eigenmap is proposed, where a small set of reference frames is selected from the video sequence to form a reference subspace to measure the dissimilarity between two arbitrary frames. In the proposed summarization scheme, the shot-level key frames are first detected from the continuity of inter-frame dissimilarity, and the sub-shot level and scene level representative frames are then summarized by using K-means clustering. The experiment is carried out on both test videos and movies, and the results show that, in comparison with a similar approach using latent semantic analysis, the proposed approach using Laplacian Eigenmap achieves a better recall rate in keyframe detection and gives an efficient hierarchical summarization at the sub-shot, shot and scene levels.

Relevance:

30.00%

Publisher:

Abstract:

Conditional branches frequently exhibit similar behavior (bias, time-varying behavior, ...), a property that can be used to improve branch prediction accuracy. Branch clustering constructs groups or clusters of branches with similar behavior and applies different branch prediction techniques to each branch cluster. We revisit the topic of branch clustering with the aim of generalizing it. We investigate several methods to measure cluster information, the most effective being the storage of information in the branch target buffer. We also investigate alternative methods of using the branch cluster identification in the branch predictor. With these improvements we arrive at a branch clustering technique that obtains higher accuracy than previous approaches presented in the literature for the gshare predictor. Furthermore, we evaluate our branch clustering technique in a wide range of predictors to show the general applicability of the method. Branch clustering improves the accuracy of the local history (PAg) predictor, the path-based perceptron and the PPM-like predictor, one of the 2004 CBP finalists.

Relevance:

30.00%

Publisher:

Abstract:

The quantity and quality of spatial data are increasing rapidly. This is particularly evident in the case of movement data. Devices capable of accurately recording the position of moving entities have become ubiquitous and created an abundance of movement data. Valuable knowledge concerning processes occurring in the physical world can be extracted from these large movement data sets. Geovisual analytics offers powerful techniques to achieve this. This article describes a new geovisual analytics tool specifically designed for movement data. The tool features the classic space-time cube augmented with a novel clustering approach to identify common behaviour. These techniques were used to analyse pedestrian movement in a city environment which revealed the effectiveness of the tool for identifying spatiotemporal patterns. © 2014 Taylor & Francis.