912 results for hierarchical clustering techniques
Abstract:
Skøt, L., Humphreys, M. O., Armstead, I. P., Heywood, S., Skøt, K. P., Sanderson, R., Thomas, I. D., Chorlton, K. H., & Sackville Hamilton, N. R. (2005). An association mapping approach to identify flowering time genes in natural populations of Lolium perenne (L.). Molecular Breeding, 15(3), 233-245. Sponsorship: BBSRC RAE2008
Abstract:
The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. This database supports a translational research study whose goal is to profile the changes in airway gene expression that are induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current-, former- and never-smoker subjects, and hybridized to Affymetrix HG-U133A Genechips, which measure the level of expression of ~22 500 human transcripts. The microarray data generated along with relevant patient information is uploaded to SIEGE by study administrators using the database's web interface, found at http://pulm.bumc.bu.edu/siegeDB. PERL-coded scripts integrated with SIEGE perform various quality control functions including the processing, filtering and formatting of stored data. The R statistical package is used to import database expression values and execute a number of statistical analyses including t-tests, correlation coefficients and hierarchical clustering. Values from all statistical analyses can be queried through CGI-based tools and web forms found on the ‘Search’ section of the database website. Query results are embedded with graphical capabilities as well as with links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.
Abstract:
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier.
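The thresholding idea in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional k-means with two clusters and a threshold placed midway between the two centroids; the abstract does not state how the authors derive the threshold, so that choice, the function names, and the page counts below are all hypothetical.

```python
def link_percentage(link_chars, total_chars):
    """Share of a page's text characters that sit inside links, in percent."""
    if total_chars == 0:
        return 0.0
    return 100.0 * link_chars / total_chars


def two_means_threshold(values, iterations=100):
    """1-D k-means with k=2; returns the midpoint between the two centroids."""
    lo, hi = min(values), max(values)  # initial centroids: the two extremes
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_group:  # all values identical: nothing to separate
            break
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def classify(percentage, threshold):
    """Pages dominated by link text are treated as index pages."""
    return "index" if percentage > threshold else "article"


# Hypothetical (link characters, total characters) counts for pages of one site.
pages = [(120, 4000), (90, 3500), (200, 5000), (1800, 2400), (2100, 2600), (1500, 2000)]
percentages = [link_percentage(links, total) for links, total in pages]
threshold = two_means_threshold(percentages)
labels = [classify(p, threshold) for p in percentages]
```

Fitting the threshold per site, as the abstract describes, lets a link-heavy news portal and a text-heavy blog each get their own cut-off rather than sharing a global one.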
Abstract:
BACKGROUND: A major challenge in oncology is the selection of the most effective chemotherapeutic agents for individual patients, while the administration of ineffective chemotherapy increases mortality and decreases quality of life in cancer patients. This emphasizes the need to evaluate every patient's probability of responding to each chemotherapeutic agent and limiting the agents used to those most likely to be effective. METHODS AND RESULTS: Using gene expression data on the NCI-60 and corresponding drug sensitivity, mRNA and microRNA profiles were developed representing sensitivity to individual chemotherapeutic agents. The mRNA signatures were tested in an independent cohort of 133 breast cancer patients treated with the TFAC (paclitaxel, 5-fluorouracil, adriamycin, and cyclophosphamide) chemotherapy regimen. To further dissect the biology of resistance, we applied signatures of oncogenic pathway activation and performed hierarchical clustering. We then used mRNA signatures of chemotherapy sensitivity to identify alternative therapeutics for patients resistant to TFAC. Profiles from mRNA and microRNA expression data represent distinct biologic mechanisms of resistance to common cytotoxic agents. The individual mRNA signatures were validated in an independent dataset of breast tumors (P = 0.002, NPV = 82%). When the accuracy of the signatures was analyzed based on molecular variables, the predictive ability was found to be greater in basal-like than non basal-like patients (P = 0.03 and P = 0.06). Samples from patients with co-activated Myc and E2F represented the cohort with the lowest percentage (8%) of responders. Using mRNA signatures of sensitivity to other cytotoxic agents, we predict that TFAC non-responders are more likely to be sensitive to docetaxel (P = 0.04), representing a viable alternative therapy. 
CONCLUSIONS: Our results suggest that the optimal strategy for chemotherapy sensitivity prediction integrates molecular variables such as ER and HER2 status with corresponding microRNA and mRNA expression profiles. Importantly, we also present evidence to support the concept that analysis of molecular variables can present a rational strategy to identifying alternative therapeutic opportunities.
Abstract:
Coastal zooplankton have been investigated since 1984 at a Long Term Ecological Research station MC (LTER-MC) in the inner Gulf of Naples (Tyrrhenian Sea, Western Mediterranean). The sampling site, located between the littoral and the open sea systems, has very active hydrography that affects plankton communities. The present work was aimed at establishing whether, in such a dynamic and variable environment, species associations and homogeneous periods could be identified as characteristic and stable features of the mesozooplankton over the period 1984–2006. Hierarchical clustering was applied to assess species associations based on a matrix of similarities between species (R-mode), and homogeneous periods based on a matrix of similarities between observations (Q-mode). The Indicator Value index [IndVal, Dufrene and Legendre (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol. Monogr., 67, 345–366] was calculated to identify species characterizing each period. Five taxonomic groups with well-defined composition and abundance were identified as robust associations that likely reflect different modes of community functioning. The temporal course of these associations was largely shaped by strong seasonal forcing comprising both physical and biological (e.g. trophic) signals. These associations persisted over the long term, thus indicating some stable characters in the Naples zooplankton time-series, providing evidence of resilience in communities in highly variable coastal conditions.
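The distinction between R-mode and Q-mode clustering in this abstract comes down to which axis of the abundance table is compared. The sketch below illustrates it under stated assumptions: Bray-Curtis dissimilarity and average linkage are common ecological choices but are not named in the abstract, and the small abundance table is invented for illustration.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def average_linkage(vectors, n_clusters, dist=bray_curtis):
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        def linkage(pair):
            a, b = pair
            total = sum(dist(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))

        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical table: rows are observations (sampling dates), columns are species.
abundance = [
    [50, 40, 2, 1],
    [45, 42, 3, 0],
    [3, 2, 60, 55],
    [1, 4, 58, 50],
]

# Q-mode: compare observations (rows) to find homogeneous periods.
q_mode = average_linkage(abundance, n_clusters=2)

# R-mode: compare species (columns) to find species associations.
species_profiles = [list(column) for column in zip(*abundance)]
r_mode = average_linkage(species_profiles, n_clusters=2)
```

In practice the two modes often use different similarity measures, since comparing species profiles across dates raises different scaling questions than comparing dates across species; a single measure is used here only to keep the sketch short.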
Abstract:
Juvenile idiopathic arthritis (JIA) comprises a poorly understood group of chronic, childhood onset, autoimmune diseases with variable clinical outcomes. We investigated whether profiling of the synovial fluid (SF) proteome by a fluorescent dye based, two-dimensional gel (DIGE) approach could distinguish patients in whom inflammation extends to affect a large number of joints, early in the disease process. SF samples from 22 JIA patients were analyzed: 10 with oligoarticular arthritis, 5 extended oligoarticular and 7 polyarticular disease. SF samples were labeled with Cy dyes and separated by two-dimensional electrophoresis. Multivariate analyses were used to isolate a panel of proteins which distinguish patient subgroups. Proteins were identified using MALDI-TOF mass spectrometry with expression further verified by Western immunoblotting and immunohistochemistry. Hierarchical clustering based on the expression levels of a set of 40 proteins segregated the extended oligoarticular from the oligoarticular patients (p < 0.05). Expression patterns of the isolated protein panel have also been observed over time, as disease spreads to multiple joints. The data indicate that synovial fluid proteome profiles could be used to stratify patients based on risk of disease extension. These protein profiles may also assist in monitoring therapeutic responses over time and help predict joint damage. © 2009 American Chemical Society.
Abstract:
The recent emergence of high-throughput arrays for methylation analysis has made the influence of tumor content on the interpretation of methylation levels increasingly pertinent. However, the degree to which tumor content influences the results, and the level of tumor content that makes a specimen acceptable for accurate analysis, remain unclear. Taking a systematic approach, we analyzed 98 unselected formalin-fixed and paraffin-embedded gastric tumors and matched normal tissue samples using the Illumina GoldenGate methylation assay. Unsupervised hierarchical clustering showed 2 separate clusters with a significant difference in average tumor content levels. The probes identified to be significantly differentially methylated between the tumors and normals also differed according to the tumor content of the samples included, with the sensitivity of identifying the
Abstract:
Background: Evidence suggests that in prokaryotes sequence-dependent transcriptional pauses affect the dynamics of transcription and translation, as well as of small genetic circuits. So far, a few pause-prone sequences have been identified from in vitro measurements of transcription elongation kinetics.
Results: Using a stochastic model of gene expression at the nucleotide and codon levels with realistic parameter values, we investigate three different but related questions and present statistical methods for their analysis. First, we show that information from in vivo RNA and protein temporal numbers is sufficient to discriminate between models with and without a pause site in their coding sequence. Second, we demonstrate that it is possible to separate a large variety of models from each other with pauses of various durations and locations in the template by means of a hierarchical clustering and a random forest classifier. Third, we introduce an approximate likelihood function that allows us to estimate the location of a pause site.
Conclusions: This method can aid in detecting unknown pause-prone sequences from temporal measurements of RNA and protein numbers at a genome-wide scale and thus elucidate possible roles that these sequences play in the dynamics of genetic networks and phenotype.
Abstract:
Background: Ineffective risk stratification can delay diagnosis of serious disease in patients with hematuria. We applied a systems biology approach to analyze clinical, demographic and biomarker measurements (n = 29) collected from 157 hematuric patients: 80 urothelial cancer (UC) and 77 controls with confounding pathologies.
Methods: On the basis of biomarkers, we conducted agglomerative hierarchical clustering to identify patient and biomarker clusters. We then explored the relationship between the patient clusters and clinical characteristics using Chi-square analyses. We determined classification errors and areas under the receiver operating curve of Random Forest Classifiers (RFC) for patient subpopulations using the biomarker clusters to reduce the dimensionality of the data.
Results: Agglomerative clustering identified five patient clusters and seven biomarker clusters. Final diagnosis categories were non-randomly distributed across the five patient clusters. In addition, two of the patient clusters were enriched with patients with ‘low cancer-risk’ characteristics. The biomarkers which contributed to the diagnostic classifiers for these two patient clusters were similar. In contrast, three of the patient clusters were significantly enriched with patients harboring ‘high cancer-risk’ characteristics including proteinuria, aggressive pathological stage and grade, and malignant cytology. Patients in these three clusters included controls, that is, patients with other serious disease and patients with cancers other than UC. Biomarkers which contributed to the diagnostic classifiers for the largest ‘high cancer-risk’ cluster were different than those contributing to the classifiers for the ‘low cancer-risk’ clusters. Biomarkers which contributed to subpopulations that were split according to smoking status, gender and medication were different.
Conclusions: The systems biology approach applied in this study allowed the hematuric patients to cluster naturally on the basis of the heterogeneity within their biomarker data, into five distinct risk subpopulations. Our findings highlight an approach with the promise to unlock the potential of biomarkers. This will be especially valuable in the field of diagnostic bladder cancer where biomarkers are urgently required. Clinicians could interpret risk classification scores in the context of clinical parameters at the time of triage. This could reduce cystoscopies and enable priority diagnosis of aggressive diseases, leading to improved patient outcomes at reduced costs. © 2013 Emmert-Streib et al; licensee BioMed Central Ltd.
Abstract:
Background: There is no method routinely used to predict response to anthracycline and cyclophosphamide–based chemotherapy in the clinic; therefore patients often receive treatment for breast cancer with no benefit. Loss of the Fanconi anemia/BRCA (FA/BRCA) DNA damage response (DDR) pathway occurs in approximately 25% of breast cancer patients through several mechanisms and results in sensitization to DNA-damaging agents. The aim of this study was to develop an assay to detect DDR-deficient tumors associated with loss of the FA/BRCA pathway, for the purpose of treatment selection.
Methods: DNA microarray data from 21 FA patients and 11 control subjects were analyzed to identify genetic processes associated with a deficiency in DDR. Unsupervised hierarchical clustering was then performed using 60 BRCA1/2 mutant and 47 sporadic tumor samples, and a molecular subgroup was identified that was defined by the molecular processes represented within FA patients. A 44-gene microarray-based assay (the DDR deficiency assay) was developed to prospectively identify this subgroup from formalin-fixed, paraffin-embedded samples. All statistical tests were two-sided.
Results: In a publicly available independent cohort of 203 patients, the assay predicted complete pathologic response vs residual disease after neoadjuvant DNA-damaging chemotherapy (5-fluorouracil, anthracycline, and cyclophosphamide) with an odds ratio of 3.96 (95% confidence interval [CI] = 1.67 to 9.41; P = .002). In a new independent cohort of 191 breast cancer patients treated with adjuvant 5-fluorouracil, epirubicin, and cyclophosphamide, a positive assay result predicted 5-year relapse-free survival with a hazard ratio of 0.37 (95% CI = 0.15 to 0.88; P = .03) compared with the assay negative population.
Conclusions: A formalin-fixed, paraffin-embedded tissue-based assay has been developed and independently validated as a predictor of response and prognosis after anthracycline/cyclophosphamide–based chemotherapy in the neoadjuvant and adjuvant settings. These findings warrant further validation in a prospective clinical study.
Abstract:
Traditional Chinese Medicines (TCMs) derived from animal horns are one of the most important types of Chinese medicine. In the present study, a fast and sensitive analytical method was established for qualitative and quantitative determination of 14 nucleosides and nucleobases in animal horns using hydrophilic interaction ultra-high performance liquid chromatography coupled with triple-quadrupole tandem mass spectrometry (HILIC-UPLC-QQQ-MS/MS) in selective reaction monitoring (SRM) mode. The method was optimized and validated, and showed good linearity, precision, repeatability, and accuracy. The method was successfully used to determine contents of the 14 nucleosides and nucleobases in 25 animal horn samples. Hierarchical clustering analysis (HCA) and principal component analysis (PCA) were performed and the 25 samples were thereby divided into two groups, which agreed with taxonomy. The method may enable quick and effective search of substitutes for precious horns.
Abstract:
Invasive urothelial cell carcinoma (UCC) is characterized by increased chromosomal instability and follows an aggressive clinical course in contrast to non-invasive disease. To identify molecular processes that confer and maintain an aggressive malignant phenotype, we used a high-throughput genome-wide approach to interrogate a cohort of high and low clinical risk UCC tumors. Differential expression analyses highlighted cohesive dysregulation of critical genes involved in the G2/M checkpoint in aggressive UCC. Hierarchical clustering based on DNA Damage Response (DDR) genes separated tumors according to a pre-defined clinical risk phenotype. Using array-comparative genomic hybridization, we confirmed that the DDR was disrupted in tumors displaying high genomic instability. We identified DNA copy number gains at 20q13.2-q13.3 (AURKA locus) and determined that overexpression of AURKA accompanied dysregulation of DDR genes in high risk tumors. We postulated that DDR-deficient UCC tumors are advantaged by a selective pressure for AURKA-associated override of M phase barriers and confirmed this in an independent tissue microarray series. This mechanism, which enables cancer cells to maintain an aggressive phenotype, forms a rationale for targeting AURKA as a therapeutic strategy in advanced stage UCC.
Abstract:
Background/Purpose: Juvenile idiopathic arthritis (JIA) comprises a poorly understood group of chronic, childhood onset, autoimmune diseases with variable clinical outcomes. We investigated whether profiling of the synovial fluid (SF) proteome by a fluorescent dye based, two-dimensional gel (DIGE) approach could distinguish the subset of patients in whom inflammation extends to affect a large number of joints, early in the disease process. The post-translational modifications to candidate protein markers were verified by a novel deglycosylation strategy.
Methods: SF samples from 57 patients were obtained around time of initial diagnosis of JIA. At 1 year from inclusion patients were categorized according to ILAR criteria as oligoarticular arthritis (n=26), extended oligoarticular (n=8) and polyarticular disease (n=18). SF samples were labeled with Cy dyes and separated by two-dimensional electrophoresis. Multivariate analyses were used to isolate a panel of proteins which distinguish patient subgroups. Proteins were identified using MALDI-TOF mass spectrometry with vitamin D binding protein (VDBP) expression and sialylation further verified by immunohistochemistry, ELISA test and immunoprecipitation. Candidate biomarkers were compared to conventional inflammation measure C-reactive protein (CRP). Sialic acid residues were enzymatically cleaved from immunopurified SF VDBP, enriched by hydrophilic interaction liquid chromatography (HILIC) and analysed by mass spectrometry.
Results: Hierarchical clustering based on the expression levels of a set of 23 proteins segregated the extended-to-be oligoarticular from the oligoarticular patients. A cleaved isoform of VDBP, spot 873, is present at significantly reduced levels in the SF of oligoarticular patients at risk of disease extension, relative to other subgroups (p < 0.05). Conversely, total levels of vitamin D binding protein are elevated in plasma, and ROC curves indicate an improved diagnostic sensitivity to detect patients at risk of disease extension, over both spot 873 and CRP levels. Sialylated forms of intact immunopurified VDBP were more prevalent in persistent oligoarticular patient synovial fluids.
Conclusion: The data indicate that a subset of the synovial fluid proteome may be used to stratify patients to determine risk of disease extension. Reduced conversion of VDBP to a macrophage activation factor may represent a novel pathway contributing to increased risk of disease extension in JIA patients.
Abstract:
Dissertation submitted for the degree of Master in Electrical Engineering, Specialization in Energy
Abstract:
This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fire catalogue containing information on events in Portugal during the period from 1980 to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed close together on the map, forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
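The mutual-information step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple plug-in (histogram) estimator over binned burnt-area amplitudes; the abstract does not describe the estimator or the binning, so the bin edges, function names, and the short series below are hypothetical.

```python
import math
from collections import Counter


def discretize(series, edges):
    """Return, for each value, the number of bin edges it exceeds (its bin index)."""
    return [sum(value > edge for edge in edges) for value in series]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two paired symbol sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * math.log2(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


# Hypothetical burnt-area amplitudes for the same eight time slots in three years.
edges = [0, 10, 100]  # assumed bins: none, small, medium, large
year_a = discretize([0, 0, 5, 120, 0, 30, 0, 0], edges)
year_b = discretize([0, 0, 8, 150, 0, 45, 0, 0], edges)  # same pattern as year_a
year_c = discretize([0, 0, 0, 0, 0, 0, 0, 0], edges)     # no fires at all

mi_similar = mutual_information(year_a, year_b)
mi_unrelated = mutual_information(year_a, year_c)
```

Pairwise values computed this way for every pair of years could be collected into a matrix and passed to a hierarchical clustering or MDS routine, which is the kind of comparison the abstract describes.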