152 results for Datasets
Abstract:
BACKGROUND: Tumorigenesis is characterised by changes in transcriptional control. Extensive transcript expression data have been acquired over the last decade and used to classify prostate cancers. Prostate cancer is, however, a heterogeneous multifocal cancer and this poses challenges in identifying robust transcript biomarkers.
METHODS: In this study, we have undertaken a meta-analysis of publicly available transcriptomic data spanning datasets and technologies from the last decade and encompassing laser capture microdissected and macrodissected sample sets.
RESULTS: We identified a 33-gene signature that can discriminate between benign tissue controls and localised prostate cancers irrespective of detection platform or dissection status. These genes were significantly overexpressed in localised prostate cancer versus benign tissue in at least three datasets within the Oncomine Compendium of Expression Array Data. They were also overexpressed in a recent exon-array dataset and in a prostate cancer RNA-seq dataset generated as part of The Cancer Genome Atlas (TCGA) initiative. Biologically, glycosylation was the single enriched process associated with this 33-gene signature, encompassing four glycosylating enzymes. We went on to evaluate the performance of this signature against three individual markers of prostate cancer in an additional independent dataset: expression of v-ets avian erythroblastosis virus E26 oncogene homolog (ERG), of prostate specific antigen (PSA) and of androgen receptor (AR). Our signature had greater discriminatory power than these markers in separating both localised cancer and metastatic disease from benign tissue, and in separating metastatic disease from localised prostate cancer.
CONCLUSION: Robust transcript biomarkers are present within datasets assembled over many years and from many cohorts, and our study provides both examples of such markers and a strategy for refining and comparing datasets to obtain additional markers as more data are generated.
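The idea of comparing a multi-gene signature with single markers can be illustrated with a small sketch. The abstract does not say how the signature score or its discriminatory power was computed, so the scoring rule used here (mean of per-gene z-scores), the ROC AUC as the performance measure, and the synthetic data are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np


def signature_score(expr: np.ndarray) -> np.ndarray:
    """Hypothetical signature score: mean per-gene z-score for each sample.

    expr has shape (n_samples, n_genes); each gene is standardised across samples.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return z.mean(axis=1)


def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as half a win)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every cancer (positive) score with every benign (negative) score.
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Synthetic illustration only: 20 benign and 20 cancer samples, 33 genes,
# each gene shifted upwards by 0.5 in the cancer samples.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
expr = rng.normal(size=(40, 33)) + 0.5 * labels[:, None]

print("single-gene AUC:", round(auc(expr[:, 0], labels), 3))
print("signature AUC:  ", round(auc(signature_score(expr), labels), 3))
```

Because independent noise partly cancels when many modestly shifted genes are averaged, the signature score usually separates the groups better than any one gene. The exact values printed depend on the random draw.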
Abstract:
Alterations in transcriptional programs are fundamental to the development of cancers. The androgen receptor (AR) is central to the normal development of the prostate gland and to the development of prostate cancer. To a large extent, this is believed to be due to the control of gene expression through the interaction of the AR with chromatin and subsequently with coregulators and the transcriptional machinery. Unbiased genome-wide studies have recently revealed AR recruitment sites that are gene-distal and intragenic rather than associated with proximal promoter regions. Whilst expression profiles from AR-positive primary prostate tumours and cell lines can be related to the AR cistrome in prostate cancer cells, this distribution of binding sites poses significant challenges for making direct mechanistic connections. Furthermore, extrapolating from datasets assembled in one model to other model systems or clinical samples is challenging if we are to use the AR-directed transcriptome to guide the development of novel biomarkers or treatment decisions. This review will provide an overview of the androgen receptor before addressing the challenges and opportunities created by whole-genome studies of the interplay between the androgen receptor and chromatin.
Abstract:
BACKGROUND: The aberrant transcription in cancer of genes normally associated with embryonic tissue differentiation at various organ sites may be a hallmark of tumour progression. For example, neuroendocrine differentiation is found more commonly in cancers destined to progress, including prostate and lung cancers. We sought to identify proteins that are involved in neuroendocrine differentiation and are differentially expressed in aggressive/metastatic tumours.
RESULTS: Expression arrays were used to identify up-regulated transcripts in a neuroendocrine (NE) transgenic mouse model of prostate cancer. Amongst these were several genes normally expressed in neural tissues, including the pro-neural transcription factors Ascl1 and Hes6. Using quantitative RT-PCR and immunohistochemistry, we showed that these same genes were highly expressed in castrate-resistant, metastatic LNCaP cell lines. Finally, we performed a meta-analysis on expression array datasets from human clinical material. The expression of these pro-neural transcripts effectively segregates metastatic from localised prostate cancer and benign tissue, and also sub-clusters a variety of other human cancers.
CONCLUSION: By focussing on transcription factors known to drive normal tissue development, and by comparing expression signatures for normal and malignant mouse tissues, we have identified two transcription factors, Ascl1 and Hes6, which appear to be effective markers of an aggressive phenotype in all prostate models and tissues examined. We suggest that the aberrant initiation of differentiation programs may confer a selective advantage on cells in all contexts, and that this approach to identifying biomarkers therefore has the potential to uncover proteins equally applicable to pre-clinical and clinical cancer biology.
Abstract:
Patterns of glycosylation are important in cancer, but the molecular mechanisms that drive changes in these patterns are often poorly understood. The androgen receptor drives prostate cancer (PCa) development and progression to lethal metastatic castration-resistant disease. Here we used RNA-Seq, coupled with bioinformatic analyses of androgen receptor (AR) binding sites and clinical PCa expression array data, to identify ST6GalNAc1 as a direct and rapidly activated target gene of the AR in PCa cells. ST6GalNAc1 encodes a sialyltransferase that catalyses formation of the cancer-associated sialyl-Tn antigen (sTn), which we find is also induced by androgen exposure. Androgens induce expression of a novel splice variant of ST6GalNAc1 in PCa cells. This splice variant encodes a shorter protein isoform that is still fully functional as a sialyltransferase and is able to induce expression of the sTn antigen. Surprisingly, given its high expression in tumours, stable expression of ST6GalNAc1 in PCa cells reduced formation of stable tumours in mice, reduced cell adhesion and induced a switch towards a more mesenchymal-like cell phenotype in vitro. ST6GalNAc1 has a dynamic expression pattern in clinical datasets, being significantly up-regulated in primary prostate carcinoma but relatively down-regulated in established metastatic tissue. ST6GalNAc1 is frequently up-regulated concurrently with GCNT1, another important glycosylation enzyme previously associated with prostate cancer progression and implicated in sialyl Lewis X antigen synthesis. Together, our data establish an androgen-dependent mechanism for sTn antigen expression in PCa and are consistent with a general role for the androgen receptor in driving important coordinated changes to the glycoproteome during PCa progression.
Abstract:
Symposium Chair: Dr Jennifer McGaughey
Title: Early Warning Systems: problems, pragmatics and potential
Early Warning Systems (EWS) provide a mechanism for staff to recognise, refer and manage deteriorating patients on general hospital wards. Implementation of EWS in practice has required considerable change in the delivery of critical care across hospitals. Drawing on their experience of these changes, the authors will demonstrate the problems and potential of using EWS to improve patient outcomes.
The first paper (Dr Jennifer McGaughey: Early Warning Systems: what works?) reviews the research evidence on the factors that support or constrain the implementation of EWS in practice. These findings explain the processes that influence whether improved patient outcomes are achieved. To improve detection and standardise practice, a National EWS (NEWS) has been implemented in the United Kingdom. The second paper (Catherine Plowright: The implementation of the National EWS in a District General Hospital) focuses on the process of implementing and auditing NEWS. This process improvement is essential for contributing to future collaborative research and to the collection of robust datasets to improve patient safety, as recommended by the Royal College of Physicians (RCP 2012). Successfully implementing NEWS in practice requires strategic planning and staff education. The practical issues of training staff are discussed in the third paper (Collette Laws-Chapman: Simulation as a modality to embed the use of Early Warning Systems), which focuses on using simulation and structured debrief to enhance learning in the early recognition and management of deteriorating patients. This session emphasises the importance of developing cognitive and social skills alongside practical skills in the simulated setting.
Abstract:
Extracellular vesicles (EVs) released by parasites have important roles in establishing and maintaining infection. Analysis of the soluble and vesicular secretions of adult Fasciola hepatica has established a definitive characterisation of the total secretome of this zoonotic parasite. Fasciola secretes at least two sub-populations of EVs that differ in size, cargo molecules and site of release from the parasite. The larger EVs are released from the specialised cells that line the parasite gastrodermus and contain the zymogen of the 37 kDa cathepsin L peptidase, which performs a digestive function. The smaller exosome-like vesicle population originates from multivesicular bodies (MVBs) within the tegumental syncytium and carries many previously described immunomodulatory molecules that could be delivered into host cells. By integrating our proteomics data with recently available transcriptomic datasets, we have detailed the pathways involved in EV biogenesis in F. hepatica. We propose that biogenesis of the small exosome-like vesicles occurs via ESCRT-dependent MVB formation in the tegumental syncytium, after which the vesicles are shed from the apical plasma membrane. Furthermore, we found that the molecular machinery required for EV biogenesis is constitutively expressed across the intra-mammalian developmental stages of the parasite. By contrast, the cargo molecules packaged within the EVs are developmentally regulated, most likely to facilitate the parasite's migration through host tissue and to counteract host immune attack.
Abstract:
The Antrim Coast Road, stretching from the seaport of Larne in the east of Northern Ireland to the famous Giant’s Causeway in the north, has a well-deserved reputation for being one of the most spectacular roads in Europe (Day, 2006). At various locations along the route, fluid interactions between environmental variables and the problematic geology (Jurassic Lias Clay and Triassic Mudstone overlain by Cretaceous Limestone and Tertiary Basalt) result in frequent instances of slope instability within the vadose zone. During such instances of instability, debris flows and composite mudflows encroach on the carriageway and pose a hazard to road users. This paper examines the site investigation, geotechnical and spatial analysis techniques currently being implemented to monitor slope stability at one site, Straidkilly Point, Glenarm, Northern Ireland. An in-depth understanding of the geology was obtained via boreholes, resistivity surveys and laboratory testing. Environmental variables recorded by an on-site weather station were correlated with measured pore water pressure and soil moisture infiltration dynamics.
Terrestrial LiDAR (TLS) was applied to the slope to monitor failures, with surveys carried out on a bi-monthly basis. TLS monitoring allowed the generation of Digital Elevation Models (DEMs) of difference, highlighting areas of recent movement, erosion and deposition. Morphology parameters generated from the DEMs include slope, curvature and multiple measures of roughness. Changes in the structure of the slope, together with the morphological parameters, are characterised and linked to progressive failures identified through the temporal monitoring. In addition to TLS monitoring, aerial LiDAR datasets were used for the spatio-morphological characterisation of the slope at a macro scale. Results from the geotechnical and environmental monitoring were compared with spatial data obtained through terrestrial and airborne LiDAR. This provides a multi-faceted approach to slope stability characterisation that facilitates more informed management of geotechnical risk by the Northern Ireland Roads Service.
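A DEM of difference is the cell-by-cell subtraction of an earlier elevation model from a later one. The sketch below assumes both DEMs are already co-registered on the same grid. It also applies a hypothetical level-of-detection threshold (`lod`) so that small differences are treated as survey noise. The threshold value, the function names and the toy grid are illustrative, not those used at Straidkilly Point.

```python
import numpy as np


def dem_of_difference(dem_before, dem_after, lod=0.05):
    """Difference two co-registered DEMs and classify the change per cell.

    lod is a hypothetical level-of-detection threshold in metres: changes
    smaller in magnitude are treated as survey noise. Classes are
    -1 = erosion (surface lowered), 0 = no detectable change, +1 = deposition.
    """
    dod = np.asarray(dem_after, dtype=float) - np.asarray(dem_before, dtype=float)
    classes = np.zeros(dod.shape, dtype=int)
    classes[dod <= -lod] = -1
    classes[dod >= lod] = 1
    return dod, classes


def volume_change(dod, classes, cell_size):
    """Eroded and deposited volumes (m^3), counting only detectable change."""
    cell_area = cell_size ** 2
    eroded = -dod[classes == -1].sum() * cell_area
    deposited = dod[classes == 1].sum() * cell_area
    return float(eroded), float(deposited)


# Toy 3 x 3 grid with 0.5 m cells (synthetic values).
before = np.zeros((3, 3))
after = before.copy()
after[0, 0] = -0.20   # material lost
after[2, 2] = 0.10    # material accumulated
after[1, 1] = 0.01    # below the detection threshold: treated as noise
dod, classes = dem_of_difference(before, after, lod=0.05)
print(classes)
print(volume_change(dod, classes, cell_size=0.5))
```

In practice, the detection threshold would be derived from the survey and registration error rather than fixed by hand.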
Abstract:
In this research, an agent-based model (ABM) was developed to generate human movement routes between homes and water resources in a rural setting, using commonly available geospatial datasets on population distribution, land cover and landscape resources. ABMs are an object-oriented computational approach to modelling a system. They focus on the interactions of autonomous agents and aim to assess the impact of these agents and their interactions on the system as a whole. An A* pathfinding algorithm was implemented to produce walking routes from data on the terrain in the area. A* extends Dijkstra's algorithm by using a heuristic estimate of the remaining distance to the goal to guide the search, which improves its time performance. In this example, it was possible to impute daily activity movement patterns to the water resource for all villages in a 75 km long study transect across the Luangwa Valley, Zambia. The simulated human movements were statistically similar to empirical observations of travel times to the water resource (Chi-squared test, 95% confidence interval). This indicates that realistic data on human movements can be produced without the costly measurement commonly required, for example through GPS tracking or retrospective or real-time diaries. The approach is transferable between geographical locations, and the product can provide insight into human movement patterns. It is therefore useful in many human exposure-related applications, particularly epidemiological research in rural areas, where spatial heterogeneity in the disease landscape and the space-time proximity of individuals can play a crucial role in disease spread.
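The abstract does not describe the cost surface used for the walking routes. In this minimal A* sketch, each grid cell is assumed to carry a hypothetical walking cost of at least 1 (for example, derived from land cover), with `None` marking impassable cells, and moves are 4-connected. The grid and the function name are illustrative. If the heuristic `h` returned 0, the same loop would be Dijkstra's algorithm: the heuristic only changes the order in which cells are expanded.

```python
import heapq
import math


def a_star(cost, start, goal):
    """A* shortest path on a 4-connected grid.

    cost[r][c] is the (hypothetical) cost of entering cell (r, c), assumed
    to be >= 1; None marks an impassable cell. Returns (total_cost, path),
    or (math.inf, []) if the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])

    def h(cell):
        # Manhattan distance never overestimates the remaining cost when every
        # step costs at least 1, so the first time the goal is popped it is optimal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0.0}
    parent = {start: None}
    open_heap = [(h(start), 0.0, start)]
    while open_heap:
        _, g_cur, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return g_cur, path[::-1]
        if g_cur > g[cell]:
            continue  # stale heap entry superseded by a cheaper route
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                new_g = g_cur + cost[nr][nc]
                if new_g < g.get((nr, nc), math.inf):
                    g[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return math.inf, []


# Hypothetical terrain: 1 = open ground, 5 = dense vegetation, None = river.
terrain = [
    [1, 5, 1],
    [1, 1, 1],
    [None, None, 1],
]
print(a_star(terrain, (0, 0), (0, 2)))
```

Here the route detours around the costly vegetation cell rather than crossing it directly.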
Abstract:
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
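The phrase "posterior mean effect size ... using a prior on effect sizes and LD information" can be made concrete with the infinitesimal special case, often called LDpred-inf. This sketch rests on several assumptions. Genotypes are standardised. The prior is β ~ N(0, h²/M · I). The marginal GWAS estimates satisfy β̃ ~ N(Dβ, D/N), where D is the LD matrix of a region from the reference panel, N is the GWAS sample size and M is the total number of markers. Under these assumptions, the posterior mean is (D + M/(N h²) · I)⁻¹ β̃. As we understand it, the full LDpred method instead uses a point-normal prior fitted by Gibbs sampling, which this sketch does not attempt. The function names and toy numbers are illustrative.

```python
import numpy as np


def ldpred_inf_weights(beta_marginal, ld_matrix, n, h2, m):
    """Posterior mean effect sizes under an infinitesimal (Gaussian) prior.

    Assumes standardised genotypes, prior beta ~ N(0, h2/m * I) and marginal
    estimates beta_marginal ~ N(D @ beta, D / n), where D is the LD
    (correlation) matrix of one region from a reference panel. Then
        E[beta | beta_marginal] = (D + m / (n * h2) * I)^{-1} beta_marginal.
    """
    d = np.asarray(ld_matrix, dtype=float)
    shrink = m / (n * h2)  # larger when there is less data per marker
    rhs = np.asarray(beta_marginal, dtype=float)
    return np.linalg.solve(d + shrink * np.eye(d.shape[0]), rhs)


def polygenic_score(genotypes, weights):
    """Risk score per individual: weighted sum of standardised genotypes."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


# Toy LD block: markers 0 and 1 are strongly correlated, marker 2 is independent.
D = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
beta_tilde = np.array([0.05, 0.045, -0.02])        # synthetic marginal effects
w = ldpred_inf_weights(beta_tilde, D, n=50_000, h2=0.5, m=500_000)
genotypes = np.array([[0.5, -1.0, 1.2], [-0.3, 0.9, 0.0]])  # standardised, synthetic
print(w)
print(polygenic_score(genotypes, w))
```

Unlike pruning and thresholding, which keep one marker per LD block and drop the rest, this estimator retains every marker. It shrinks the effects jointly, so correlated markers share the signal instead of double-counting it.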
Abstract:
The Antrim Coast Road, stretching from the seaport of Larne in the east of Northern Ireland, has a well-deserved reputation for being one of the most spectacular roads in Europe (Day, 2006). However, the problematic geology (Jurassic Lias Clay and Triassic Mudstone overlain by Cretaceous Limestone and Tertiary Basalt) and environmental variables result in frequent instances of slope instability. This instability is manifested in both shallow debris flows and occasional massive rotational movements, creating a geotechnical risk to this highway. This paper describes how a variety of techniques are being used both to assess instability and to monitor movement of these active slopes near one site at Straidkilly Point, Glenarm. An in-depth understanding of the geology was obtained via boreholes, resistivity surveys and laboratory testing. Environmental variables recorded by an on-site weather station were correlated with measured pore water pressure and soil moisture infiltration data. Terrestrial LiDAR (TLS) surveys, carried out on a bi-monthly basis, allowed the generation of Digital Elevation Models (DEMs) of difference, highlighting areas of recent movement, accumulation and depletion. Morphology parameters generated from the DEMs include slope, curvature and multiple measures of roughness. Changes in the structure of the slope, together with the morphological parameters, were characterised and linked to progressive failures identified through the temporal monitoring. In addition to TLS monitoring, aerial LiDAR datasets were used for the spatio-morphological characterisation of the slope at a macro scale. A Differential Global Positioning System (dGPS) was also deployed on site to provide a real-time warning system for gross movements, which were also correlated with environmental conditions. Frequent electrical resistivity tomography (ERT) surveys were carried out to provide a better understanding of long-term changes in soil moisture and to help define the complex geology.
The paper describes how the data obtained via a diverse range of methods have been combined to facilitate more informed management of geotechnical risk by the Northern Ireland Roads Service.
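Slope and a simple roughness measure can be derived from a gridded DEM with a few lines of NumPy. This sketch makes two assumptions. Slope uses central differences from `numpy.gradient` (one-sided at the grid edges). Roughness is taken as the standard deviation of elevation in a moving window, which is only one of the several roughness measures the abstract mentions. Curvature is omitted, and the function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def slope_degrees(dem, cell_size):
    """Slope angle in degrees from a DEM, using finite differences."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def roughness_std(dem, window=3):
    """Roughness as the standard deviation of elevation in a moving window.

    window must be odd; edge cells without a full window are returned as NaN.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    dem = np.asarray(dem, dtype=float)
    pad = window // 2
    out = np.full(dem.shape, np.nan)
    windows = sliding_window_view(dem, (window, window))
    out[pad:dem.shape[0] - pad, pad:dem.shape[1] - pad] = windows.std(axis=(-2, -1))
    return out


# Synthetic planar slope rising 2 m per 2 m cell in the x direction (45 degrees).
plane = np.tile(np.arange(5) * 2.0, (5, 1))
print(slope_degrees(plane, cell_size=2.0))
print(roughness_std(plane))
```

Computing these grids for each survey epoch and differencing them is one way to link changing morphology to progressive failures over time.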
Abstract:
Angiogenesis is important in cancer progression. Promising results in clinical trials have indicated that targeting vascular endothelial growth factor (VEGF) signaling may prolong lung cancer patient survival. In particular, various studies have implicated VEGFA as a potential prognostic marker in lung cancer. However, prognostication using the expression of VEGF receptors (VEGFRs), such as fms-related tyrosine kinase 1 (FLT1; also known as VEGFR1) and kinase insert domain receptor (KDR; also known as VEGFR2), has produced varied results in different lung cancer studies. The present study aimed to investigate the prognostic significance of these three factors, alone or in combination. mRNA expression data were extracted from four independent lung cancer cohorts totaling 583 patients, and the association between mRNA expression and survival was assessed statistically. When VEGFA, FLT1 and KDR expression were considered alone, only VEGFA showed a significant association with patient survival consistently across all four datasets (P<0.05). Patients with high expression of VEGFA and of either receptor had significantly worse survival than patients with low expression of VEGFA and that receptor (P<0.05). Notably, patients whose tumor specimens showed high expression of all three genes had significantly shorter survival than patients with low expression of one, two or all three genes (P<0.05). The results indicate that high expression of both VEGFA and its receptors may be required for cancer progression. Therefore, these three factors should be considered together as a prognostic indicator for lung cancer patients.
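The abstract does not state how "high" and "low" expression were defined. Purely for illustration, this sketch assumes a split at the cohort median. It counts, for each patient, how many of VEGFA, FLT1 and KDR are expressed above that cut-off, and flags patients who are high for all three. The comparison of survival between the resulting groups (for example with Kaplan-Meier curves and a log-rank test) is not shown. The function names and toy values are hypothetical.

```python
import numpy as np

GENES = ("VEGFA", "FLT1", "KDR")


def count_high_genes(expr, genes=GENES):
    """Per patient, the number of genes expressed above the cohort median.

    expr maps gene name -> 1-D sequence of expression values (one per patient).
    The median cut-off is an assumption made for this illustration.
    """
    high = np.stack([np.asarray(expr[g]) > np.median(expr[g]) for g in genes])
    return high.sum(axis=0)


def all_high_mask(expr, genes=GENES):
    """True for patients with high expression of every gene in `genes`."""
    return count_high_genes(expr, genes) == len(genes)


# Synthetic expression values for four patients.
expr = {"VEGFA": [1, 2, 3, 4], "FLT1": [1, 2, 3, 4], "KDR": [1, 2, 4, 3]}
print(count_high_genes(expr))
print(all_high_mask(expr))
```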
Abstract:
A reverse k-nearest-neighbor (RkNN) query returns all objects whose k nearest neighbors contain the query object. In this paper, we consider RkNN query processing in the case where the distances between attribute values are not necessarily metric. Dissimilarities between objects could then be a monotonic aggregate of the dissimilarities between their attribute values, with the aggregation function specified at query time. We outline real-world cases that motivate RkNN processing in such scenarios. We consider the AL-Tree index and its applicability in RkNN query processing, and we develop an approach that exploits the group-level reasoning enabled by the AL-Tree in RkNN processing. We evaluate our approach against a naive approach that performs sequential scans on contiguous data, and against an improved block-based approach that we provide. Our experiments use real-world datasets and synthetic data with varying characteristics. This extensive empirical evaluation shows that our approach outperforms existing methods in terms of computational and disk access costs, leading to significantly better response times.
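The query semantics can be pinned down with a naive sequential-scan baseline, which is the kind of approach the AL-Tree method is compared against. This sketch is not the authors' AL-Tree algorithm or their naive implementation. It only shows the definition, with an object dissimilarity built from possibly non-metric per-attribute dissimilarities and an aggregation function chosen at query time. The attributes, data and function names are hypothetical, and ties are resolved in favour of the query object.

```python
from typing import Callable, Sequence

AttrDissim = Callable[[object, object], float]


def aggregate_dissimilarity(a, b, per_attr: Sequence[AttrDissim],
                            agg: Callable[[list], float]) -> float:
    """Object dissimilarity as an aggregate of per-attribute dissimilarities.

    Each per_attr[i] compares attribute i and need not be a metric; agg
    (for example sum or max) is chosen at query time and should be monotonic.
    """
    return agg([d(x, y) for d, x, y in zip(per_attr, a, b)])


def rknn_naive(data, q, k, dist):
    """Reverse k-NN by sequential scan (O(n^2) dissimilarity evaluations).

    Returns the indices i for which q is among the k nearest neighbors of
    data[i], i.e. fewer than k other objects are strictly closer to data[i]
    than q is.
    """
    result = []
    for i, o in enumerate(data):
        d_q = dist(o, q)
        closer = sum(1 for j, p in enumerate(data) if j != i and dist(o, p) < d_q)
        if closer < k:
            result.append(i)
    return result


# Hypothetical attributes: a numeric price and a categorical colour.
per_attr = [lambda x, y: abs(x - y), lambda x, y: 0.0 if x == y else 1.0]
data = [(10, "red"), (12, "red"), (30, "blue")]
query = (11, "blue")
for agg in (sum, max):  # aggregation function chosen at query time
    dist = lambda a, b, agg=agg: aggregate_dissimilarity(a, b, per_attr, agg)
    print(agg.__name__, rknn_naive(data, query, k=1, dist=dist))
```

Changing only the aggregation function changes the answer set, which is why index structures for this setting cannot precompute object-level distances.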
Abstract:
Most traditional data mining algorithms struggle to cope efficiently with the sheer scale of data. In this paper, we propose a general framework to accelerate existing clustering algorithms so that they can cluster large-scale datasets that contain large numbers of attributes, items and clusters. Our framework uses locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also prove theoretically that our framework has a guaranteed error bound in terms of clustering quality. The framework can be applied to centroid-based clustering algorithms that assign an object to the most similar cluster, and we use the popular K-Modes categorical clustering algorithm to show how the framework can be applied. We validated our framework with five synthetic datasets and a real-world Yahoo! Answers dataset. The experimental results demonstrate that our framework speeds up the existing clustering algorithm by a factor of between 2 and 6 while maintaining comparable cluster purity.
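The following sketch shows how LSH can shrink the search space in the K-Modes assignment step. The abstract does not say which LSH family the framework uses. This example assumes MinHash with banding over (attribute, value) tokens, so that each item is compared only with modes that share at least one hash bucket, falling back to a full search when there is no collision. Only the assignment step is shown (recomputing modes is omitted), and the names, parameters and toy data are hypothetical.

```python
import random
import zlib

PRIME = 4294967311  # a prime just above 2**32, used for the hash family


def token_hash(token):
    """Stable 32-bit hash of an (attribute index, value) token."""
    return zlib.crc32(repr(token).encode("utf-8"))


def minhash(tokens, params):
    """MinHash signature: one minimum per hash function h(x) = (a*x + b) mod PRIME."""
    hashes = [token_hash(t) for t in tokens]
    return tuple(min((a * h + b) % PRIME for h in hashes) for a, b in params)


def hamming(x, y):
    """K-Modes dissimilarity: the number of attributes whose values differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


def lsh_assign(items, modes, bands=4, rows=2, seed=0):
    """Assign each item to its nearest mode, comparing only LSH candidates."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, 2**32), rng.randrange(0, 2**32))
              for _ in range(bands * rows)]

    def signature(obj):
        return minhash({(i, v) for i, v in enumerate(obj)}, params)

    def band_keys(sig):
        return [(b, sig[b * rows:(b + 1) * rows]) for b in range(bands)]

    # Index each mode under every band of its signature.
    buckets = {}
    for m, mode in enumerate(modes):
        for key in band_keys(signature(mode)):
            buckets.setdefault(key, set()).add(m)

    labels = []
    for item in items:
        candidates = set()
        for key in band_keys(signature(item)):
            candidates |= buckets.get(key, set())
        if not candidates:  # no bucket collision: fall back to a full search
            candidates = set(range(len(modes)))
        labels.append(min(sorted(candidates), key=lambda m: hamming(item, modes[m])))
    return labels


modes = [("red", "small", "round"), ("blue", "large", "square")]
items = [("red", "small", "round"), ("blue", "large", "square"), ("red", "small", "square")]
print(lsh_assign(items, modes))
```

Items identical to a mode always collide with it. A near-duplicate such as the third item usually, but not necessarily, lands with its true nearest mode. Occasionally missing that mode is the approximation LSH trades for fewer comparisons.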
Abstract:
Learning or writing regular expressions that identify instances of a specific concept within text documents with high precision and recall is challenging. It is relatively easy to improve the precision of an initial regular expression by identifying the false positives it matches and tweaking the expression to exclude them. However, modifying the expression to improve recall is difficult because, without tools to identify the missing instances, false negatives can only be found by manually analyzing all documents. We focus on partially automating the discovery of missing instances by soliciting minimal user feedback. We present a technique to identify good generalizations of a regular expression that have improved recall while retaining high precision. We empirically demonstrate the effectiveness of the proposed technique compared with existing methods and show results on real-world datasets for a variety of tasks, such as identifying dates, phone numbers, product names and course numbers.
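The trade-off described above can be measured by scoring a candidate expression against a small hand-labelled sample. This sketch is not the authors' technique, which proposes generalizations automatically from user feedback. It only shows the measurement, using a hypothetical date pattern, a hand-written generalization of it and made-up documents.

```python
import re


def precision_recall(pattern, documents, gold):
    """Precision and recall of a regex against hand-labelled instances.

    documents: list of strings; gold: set of (doc_index, matched_text) pairs
    that are true instances of the concept.
    """
    found = {(i, m.group(0)) for i, doc in enumerate(documents)
             for m in re.finditer(pattern, doc)}
    tp = len(found & gold)
    precision = tp / len(found) if found else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall


docs = ["Due 03/14/2021.", "Call 555-0199 on 3/4/2021.", "Ref 12/34/5678"]
gold = {(0, "03/14/2021"), (1, "3/4/2021")}

initial = r"\b\d{2}/\d{2}/\d{4}\b"
generalized = r"\b\d{1,2}/\d{1,2}/\d{4}\b"  # hand-written generalization

print(precision_recall(initial, docs, gold))
print(precision_recall(generalized, docs, gold))
```

Widening `\d{2}` to `\d{1,2}` recovers the missed date "3/4/2021", raising recall from 0.5 to 1.0. Both patterns still accept the invalid "12/34/5678", so precision stays below 1; that false positive is the easy kind to find and exclude.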
Abstract:
Textual problem-solution repositories are available today in various forms, most commonly as problem-solution pairs from community question answering systems. Modern web search engines can suggest possible query completions in real time as users type. We study the problem of generating intelligent query suggestions for users of customized search systems that enable querying over problem-solution repositories. Due to the small scale and specialized nature of such systems, we often cannot rely on query logs to find query suggestions. We propose a retrieval model for generating query suggestions for search on a set of problem-solution pairs. We harness the problem-solution partition inherent in such repositories to improve upon traditional query suggestion mechanisms designed for systems that search over general textual corpora. We evaluate our technique on real problem-solution datasets and illustrate that our technique provides large and statistically significant