970 results for DATASETS
Abstract:
Integrating evidence from multiple domains is useful in prioritizing disease candidate genes for subsequent testing. We ranked all known human genes (n = 3819) under linkage peaks in the Irish Study of High-Density Schizophrenia Families using three different evidence domains: 1) a meta-analysis of microarray gene expression results using the Stanley Brain collection, 2) a schizophrenia protein-protein interaction network, and 3) a systematic literature search. Each gene was assigned a domain-specific p-value and ranked after evaluating the evidence within each domain. For comparison to this ranking process, a large-scale candidate gene hypothesis was also tested by including genes with Gene Ontology terms related to neurodevelopment. Subsequently, genotypes of 3725 SNPs in 167 genes from a custom Illumina iSelect array were used to evaluate the top-ranked vs. hypothesis-selected genes. Seventy-three genes were both highly ranked and involved in neurodevelopment (category 1), while 42 and 52 genes were exclusive to neurodevelopment (category 2) or highly ranked (category 3), respectively. The most significant associations were observed in the genes PRKG1, PRKCE, and CNTN4, but no individual SNPs were significant after correction for multiple testing. Comparison of the approaches showed an excess of significant tests using the hypothesis-driven neurodevelopment category. Random selection of similarly sized genes from two independent genome-wide association studies (GWAS) of schizophrenia showed the excess was unlikely to have arisen by chance. In a further meta-analysis of three GWAS datasets, four candidate SNPs reached nominal significance. Although gene ranking using integrated sources of prior information did not enrich for significant results in the current experiment, gene selection using an a priori hypothesis (neurodevelopment) was superior to random selection. As such, further development of gene ranking strategies using more carefully selected sources of information is warranted.
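As a rough illustration of the evidence-integration step described above, the sketch below combines per-domain p-values into a single gene ranking with Fisher's method (via scipy). The gene names, p-values, and the choice of Fisher's method are assumptions for illustration only; the study's actual scoring scheme is its own.

```python
# Hypothetical sketch: combining per-domain evidence p-values into one
# gene ranking with Fisher's method. All values here are invented.
from scipy.stats import combine_pvalues

# domain order: expression meta-analysis, PPI network, literature search
evidence = {
    "PRKG1": [0.004, 0.030, 0.010],
    "PRKCE": [0.020, 0.008, 0.050],
    "CNTN4": [0.015, 0.200, 0.002],
}

combined = {}
for gene, pvals in evidence.items():
    stat, p = combine_pvalues(pvals, method="fisher")
    combined[gene] = p

# rank genes from strongest to weakest combined evidence
for gene, p in sorted(combined.items(), key=lambda kv: kv[1]):
    print(f"{gene}\tcombined p = {p:.4g}")
```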
Abstract:
FLT3-ITD mutations are prevalent in acute myeloid leukaemia (AML). PRL-3, a metastasis-associated phosphatase, is a downstream target of FLT3-ITD. This study investigates the regulation and function of PRL-3 in leukaemia cell lines and in AML patients carrying FLT3-ITD mutations. PRL-3 expression is upregulated by the FLT3-STAT5 signalling pathway in leukaemia cells, leading to activation of AP-1 transcription factors via the ERK and JNK pathways. PRL-3-depleted AML cells showed a significant decrease in cell growth. Clinically, high PRL-3 mRNA expression was associated with FLT3-ITD mutations in four independent AML datasets comprising 1158 patients. Multivariable Cox regression analysis of our Cohort 1, with 221 patients, identified PRL-3 as a novel prognostic marker independent of other clinical parameters. Kaplan-Meier analysis showed that high PRL-3 mRNA expression was significantly associated with poorer survival among 491 patients with a normal karyotype. Targeting PRL-3 reversed the oncogenic effects in FLT3-ITD AML models in vitro and in vivo. Herein, we suggest that PRL-3 could serve as a prognostic marker to predict poorer survival and as a promising novel therapeutic target for AML patients.
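The survival comparison described above can be sketched with the lifelines library. The snippet reproduces the general shape of such an analysis (median split on expression, Kaplan-Meier fits, log-rank test) on synthetic data; the column names, the split rule, and all values are invented and are not the study's protocol.

```python
# Minimal sketch of a Kaplan-Meier comparison by expression level,
# on synthetic data (not the study's cohorts).
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "prl3_mrna": rng.lognormal(size=n),
    "time": rng.exponential(scale=24, size=n),   # months of follow-up
    "event": rng.integers(0, 2, size=n),         # 1 = death observed
})
high = df["prl3_mrna"] > df["prl3_mrna"].median()

km = KaplanMeierFitter()
for label, grp in [("high PRL-3", df[high]), ("low PRL-3", df[~high])]:
    km.fit(grp["time"], grp["event"], label=label)
    print(label, "median survival:", km.median_survival_time_)

res = logrank_test(df[high]["time"], df[~high]["time"],
                   df[high]["event"], df[~high]["event"])
print("log-rank p =", res.p_value)
```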
Abstract:
Web sites that rely on databases for their content are now ubiquitous. Query result pages are dynamically generated from these databases in response to user-submitted queries. Automatically extracting structured data from query result pages is a challenging problem, as the structure of the data is not explicitly represented. While humans have shown good intuition in visually understanding data records on a query result page as displayed by a web browser, no existing approach to data record extraction has made full use of this intuition. We propose a novel approach, in which we make use of the common sources of evidence that humans use to understand data records on a displayed query result page. These include structural regularity, and visual and content similarity between data records displayed on a query result page. Based on these observations we propose new techniques that can identify each data record individually, while ignoring noise items, such as navigation bars and adverts. We have implemented these techniques in a software prototype, rExtractor, and tested it using two datasets. Our experimental results show that our approach achieves significantly higher accuracy than previous approaches. Furthermore, it establishes the case for use of vision-based algorithms in the context of data extraction from web sites.
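The kind of evidence the approach draws on can be illustrated with a toy grouping rule: candidate page blocks that share tag structure and rendered size are kept as data records, while structural outliers (navigation bars, adverts) are dropped. This is a hypothetical sketch, not the rExtractor algorithm, and the blocks and thresholds are invented.

```python
# Toy illustration: keep blocks that structurally and visually resemble
# at least one other block; isolated outliers are treated as noise.
from difflib import SequenceMatcher

# each candidate block: (tag structure as a string, rendered width in px)
blocks = [
    ("div>h3>a|div>p|div>span.price", 400),
    ("div>h3>a|div>p|div>span.price", 398),
    ("div>h3>a|div>p|div>span.price", 401),
    ("ul>li>a|ul>li>a|ul>li>a", 150),   # navigation bar: structural outlier
]

def similar(a, b, w_a, w_b, min_struct=0.8, max_width_ratio=1.2):
    """Blocks match if their tag structure and rendered width are close."""
    struct = SequenceMatcher(None, a, b).ratio()
    width = max(w_a, w_b) / max(min(w_a, w_b), 1)
    return struct >= min_struct and width <= max_width_ratio

records = [
    (s, w) for i, (s, w) in enumerate(blocks)
    if any(similar(s, t, w, v) for j, (t, v) in enumerate(blocks) if i != j)
]
print(f"{len(records)} data records kept, "
      f"{len(blocks) - len(records)} noise blocks dropped")
```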
Abstract:
Identifying differential expression of genes in psoriatic and healthy skin by microarray data analysis is a key approach to understanding the pathogenesis of psoriasis. Analysis of more than one dataset to identify commonly upregulated genes reduces the likelihood of false positives and narrows down the possible signature genes. Genes controlling the critical balance between T helper 17 and regulatory T cells are of special interest in psoriasis. Our objective was to identify genes that are consistently upregulated in lesional skin across three published microarray datasets. We carried out a reanalysis of gene expression data extracted from three experiments on samples from psoriatic and nonlesional skin, using the same stringency threshold and software, and further compared the expression levels of 92 genes related to the T helper 17 and regulatory T cell signaling pathways. We found 73 probe sets, representing 57 genes, commonly upregulated in lesional skin in all datasets. These included 26 probe sets representing 20 genes with no previous link to the etiopathogenesis of psoriasis. These genes may represent novel therapeutic targets but will require rigorous experimental validation. Our analysis also identified 12 of the 92 genes related to the T helper 17 and regulatory T cell signaling pathways as differentially expressed in the lesional skin samples.
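The core reanalysis logic, intersecting upregulated probe sets across datasets, can be sketched as below. The probe IDs, fold-change and adjusted-p thresholds, and dataset names are invented stand-ins, not the study's data.

```python
# Toy sketch: a probe set counts as commonly upregulated only if it
# passes the same threshold in every dataset. All values are invented.
FOLD_CHANGE_MIN = 2.0
ADJ_P_MAX = 0.05

# dataset -> {probe_set: (fold_change, adjusted_p)}
datasets = {
    "GSE_A": {"201909_at": (3.1, 0.001), "205916_at": (2.4, 0.020)},
    "GSE_B": {"201909_at": (2.8, 0.004), "205916_at": (1.3, 0.300)},
    "GSE_C": {"201909_at": (4.0, 0.0005), "205916_at": (2.9, 0.010)},
}

def upregulated(results):
    return {p for p, (fc, q) in results.items()
            if fc >= FOLD_CHANGE_MIN and q <= ADJ_P_MAX}

common = set.intersection(*(upregulated(d) for d in datasets.values()))
print("commonly upregulated probe sets:", sorted(common))
```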
Abstract:
We examine mid- to late Holocene centennial-scale climate variability in Ireland using proxy data from peatlands, lakes and a speleothem. A high degree of between-record variability is apparent in the proxy data and significant chronological uncertainties are present. However, tephra layers provide a robust tool for correlation and improve the chronological precision of the records. Although we can find no statistically significant coherence in the dataset as a whole, a selection of high-quality peatland water table reconstructions co-vary more than would be expected by chance alone. A locally weighted regression model with bootstrapping can be used to construct a ‘best-estimate’ palaeoclimatic reconstruction from these datasets. Visual comparison and cross-wavelet analysis of peatland water table compilations from Ireland and Northern Britain show that there are some periods of coherence between these records. Some terrestrial palaeoclimatic changes in Ireland appear to coincide with changes in the North Atlantic thermohaline circulation and solar activity. However, these relationships are inconsistent and may be obscured by chronological uncertainties. We conclude by suggesting an agenda for future Holocene climate research in Ireland.
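A minimal sketch of a locally weighted regression with bootstrapping, the compilation method named above, is given below using statsmodels on synthetic, standardized water-table records; the study's smoothing window and resampling scheme may well differ.

```python
# Sketch: lowess 'best estimate' through stacked proxy records, with a
# record-level bootstrap for between-record uncertainty. Data are synthetic.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
ages = np.linspace(0, 5000, 200)                      # cal yr BP
records = [np.sin(ages / 700.0) + rng.normal(0, 0.5, ages.size)
           for _ in range(6)]                         # standardized proxies

x = np.tile(ages, len(records))
y = np.concatenate(records)

# 'best estimate': one locally weighted regression through all records
best_estimate = lowess(y, x, frac=0.1, return_sorted=True)

# bootstrap whole records to express between-record uncertainty
boots = []
for _ in range(200):
    pick = rng.integers(0, len(records), size=len(records))
    yb = np.concatenate([records[i] for i in pick])
    boots.append(lowess(yb, x, frac=0.1, return_sorted=False))
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
print("mean width of 95% bootstrap band:", float(np.mean(hi - lo)))
```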
Abstract:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach that yields semantic representations of words satisfying these three desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
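A rough analogue of NNSE can be assembled from scikit-learn's dictionary learning with non-negativity enforced on the codes. The snippet below is an approximation for illustration only, run on a synthetic co-occurrence matrix rather than the paper's corpus statistics; the paper's own factorization may differ in its constraints and solver.

```python
# Approximate NNSE-style factorization: sparse, non-negative word codes
# over a learned dictionary. The input matrix is a synthetic stand-in.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(2)
# rows: words, columns: corpus-derived context features (synthetic)
cooccurrence = np.abs(rng.normal(size=(100, 50)))

nnse_like = DictionaryLearning(
    n_components=10,              # latent semantic dimensions
    alpha=1.0,                    # sparsity penalty on the codes
    fit_algorithm="cd",           # coordinate descent supports positivity
    transform_algorithm="lasso_cd",
    positive_code=True,           # non-negative word representations
    random_state=0,
)
embeddings = nnse_like.fit_transform(cooccurrence)

print("embedding shape:", embeddings.shape)
print("non-negative:", bool((embeddings >= 0).all()))
print("fraction of zero entries:", float((embeddings == 0).mean()))
```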
Abstract:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
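The combination idea can be sketched simply: normalize and concatenate a topic-based and a type-based vector per word, then compose an adjective-noun phrase by weighted addition. The weighted-additive rule is a common composition baseline used here as an assumption, not the paper's learned model, and all vectors and weights below are invented.

```python
# Sketch: combined topic+type word vectors and additive phrase composition.
import numpy as np

def normalize(v):
    n = np.linalg.norm(v)
    return v / n if n else v

def combined_vector(topic_vec, type_vec):
    # concatenation keeps both evidence sources in one representation
    return np.concatenate([normalize(topic_vec), normalize(type_vec)])

rng = np.random.default_rng(3)
adj = combined_vector(rng.normal(size=50), rng.normal(size=50))
noun = combined_vector(rng.normal(size=50), rng.normal(size=50))

# weighted additive composition (one common baseline for phrase vectors)
phrase = 0.4 * adj + 0.6 * noun
cosine = phrase @ noun / (np.linalg.norm(phrase) * np.linalg.norm(noun))
print("cosine(phrase, noun) =", round(float(cosine), 3))
```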
Abstract:
Our review of paleoclimate information for New Zealand pertaining to the past 30,000 years has identified a general sequence of climatic events, spanning the onset of cold conditions marking the final phase of the Last Glaciation, through to the emergence of full interglacial conditions in the early Holocene. In order to facilitate more detailed assessments of climate variability and any leads or lags in the timing of climate changes across the region, a composite stratotype is proposed for New Zealand. The stratotype is based on terrestrial stratigraphic records and is intended to provide a standard reference for the intercomparison and evaluation of climate proxy records. We nominate a specific stratigraphic type record for each climatic event, using either natural exposure or drill core stratigraphic sections. Type records were selected on the basis of having very good numerical age control and a clear proxy record. In all cases the main proxy of the type record is subfossil pollen. The type record for the period from ca 30 to ca 18 calendar kiloyears BP (cal. ka BP) is designated in lake-bed sediments from a small morainic kettle lake (Galway tarn) in western South Island. The Galway tarn type record spans a period of full glacial conditions (Last Glacial Coldest Period, LGCP) within the Otira Glaciation, and includes three cold stadials separated by two cool interstadials. The type record for the emergence from glacial conditions following the termination of the Last Glaciation (post-Termination amelioration) is in a core of lake sediments from a maar (Pukaki volcanic crater) in Auckland, northern North Island, and spans from ca 18 to 15.64±0.41 cal. ka BP. The type record for the Lateglacial period is an exposure of interbedded peat and mud at montane Kaipo bog, eastern North Island. In this high-resolution type record, an initial mild period was succeeded at 13.74±0.13 cal. ka BP by a cooler period, which after 12.55±0.14 cal. ka BP gave way to a progressive ascent to full interglacial conditions that were achieved by 11.88±0.18 cal. ka BP. Although a type section is not formally designated for the Holocene Interglacial (11.88±0.18 cal. ka BP to the present day), the sedimentary record of Lake Maratoto on the Waikato lowlands, northwestern North Island, is identified as a prospective type section pending the integration and updating of existing stratigraphic and proxy datasets, and age models. The type records are interconnected by one or more dated tephra layers, the ages of which are derived from Bayesian depositional modelling and OxCal-based calibrations using the IntCal09 dataset. Along with the type sections and the Lake Maratoto record, important, well-dated terrestrial reference records are provided for each climate event. Climate proxies from these reference records include pollen flora, stable isotopes from speleothems, beetle and chironomid fauna, and glacier moraines. The regional composite stratotype provides a benchmark against which to compare other records and proxies. Based on the composite stratotype, we provide an updated climate event stratigraphic classification for the New Zealand region.
Abstract:
Background: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique for therapeutics discovery and drug re-purposing based on differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take more than 2 hours to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping.
Results: cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating high-throughput candidate therapeutics discovery. We demonstrate dramatic speed differentials between GPU-assisted and CPU-only execution as the computational load increases for high-accuracy evaluation of statistical significance.
Conclusion: Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution to the discovery of candidate therapeutics by enabling speedy execution of heavy-duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
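For readers unfamiliar with connectivity mapping, the snippet below sketches the kind of signed-rank connection score (in the spirit of sscMap) whose repeated computation cudaMap accelerates. It is a simplified stand-in on random data; the real scoring and its significance testing are more involved, and no GPU code is shown here.

```python
# Simplified connection score between a query gene signature and a set of
# reference expression profiles, illustrated on random data.
import numpy as np

rng = np.random.default_rng(4)
n_genes, n_refs = 1000, 50

# reference profiles as per-profile gene ranks (1 = most up-regulated)
ref_ranks = np.argsort(np.argsort(rng.normal(size=(n_refs, n_genes)),
                                  axis=1), axis=1) + 1

sig_genes = rng.choice(n_genes, size=20, replace=False)   # query signature
sig_signs = rng.choice([-1, 1], size=20)                  # up/down regulation

# signed, centred ranks: genes near the rank extremes contribute most
centred = (n_genes + 1) / 2.0 - ref_ranks[:, sig_genes]
max_per_gene = (n_genes - 1) / 2.0
scores = (sig_signs * centred).sum(axis=1) / (sig_genes.size * max_per_gene)

# strongly negative scores suggest profiles that may reverse the signature
print("score range:", float(scores.min()), "to", float(scores.max()))
```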
Abstract:
The start of the Upper Würmian in the Alps was marked by massive fluvioglacial aggradation prior to the arrival of the Central Alpine glaciers. In 1984, the Subcommission on European Quaternary Stratigraphy defined the clay pit of Baumkirchen (in the foreland of the Inn Valley, Austria) as the stratotype for the Middle to Upper Würmian boundary in the Alps. Key to the selection of this site was its radiocarbon chronology, which still ranks among the most important datasets for this time interval in the Alps. In this study we re-sampled all available original plant specimens and established an accelerator mass spectrometry chronology which supersedes the published 40-year-old chronology. The new data show a much smaller scatter and yielded slightly older conventional radiocarbon dates clustering at ca. 31 ¹⁴C ka BP. When calibrated using INTCAL13, the new data suggest that the sampled interval of 653-681 m in the clay pit was deposited 34-36 cal ka BP. Two new radiocarbon dates on bone fragments found in the fluvioglacial gravel above the banded clays allow us to constrain the timing of the marked change from lacustrine to fluvioglacial sedimentation to ca. 32-33 cal ka BP, which suggests a possible link to the Heinrich 3 event in the North Atlantic.
Abstract:
Model selection between competing models is a key consideration in the discovery of prognostic multigene signatures. The use of appropriate statistical performance measures as well as verification of the biological significance of the signatures is imperative to maximise the chance of external validation of the generated signatures. Current approaches in time-to-event studies often use only a single measure of performance in model selection, such as logrank test p-values, or dichotomise the follow-up times at some phase of the study to facilitate signature discovery. In this study we improve the prognostic signature discovery process through the application of the multivariate partial Cox model combined with the concordance index, the hazard ratio of predictions, independence from available clinical covariates and biological enrichment as measures of signature performance. The proposed framework was applied to discover prognostic multigene signatures from early breast cancer data. The partial Cox model, combined with the multiple performance measures, was used both to guide the selection of the optimal panel of prognostic genes and to predict risk within cross-validation, without dichotomising the follow-up times at any stage. The signatures were successfully externally validated in independent breast cancer datasets, yielding a hazard ratio of 2.55 [1.44, 4.51] for the top-ranking signature.
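One of the performance measures named above, the concordance index, can be computed with lifelines as sketched below on synthetic risk scores and follow-up times; the hazard-ratio, covariate-independence, and enrichment checks are not reproduced here.

```python
# Sketch: concordance index of a signature's risk scores against
# (synthetic) censored survival data.
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(5)
n = 300
risk = rng.normal(size=n)                            # signature risk score
times = rng.exponential(scale=np.exp(-0.5 * risk))   # higher risk, shorter time
events = rng.integers(0, 2, size=n)                  # 1 = event observed

# concordance_index treats larger predictions as longer survival,
# so the risk score is negated before scoring
c = concordance_index(times, -risk, events)
print("c-index:", round(float(c), 3))
```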
Abstract:
According to the axiomatic literature on consensus methods, the best collective choice by one method of preference aggregation can easily be the worst by another. Are award committees, electorates, managers, online retailers, and web-based recommender systems stuck with an impossibility of rational preference aggregation? We investigate this social choice conundrum for seven social choice methods: Condorcet, Borda, Plurality, Antiplurality, the Single Transferable Vote, Coombs, and Plurality Runoff. We rely on Monte Carlo simulations for theoretical results and on twelve ballot datasets from American Psychological Association (APA) presidential elections for empirical results. Each of these elections provides partial rankings of five candidates from about 13,000 to about 20,000 voters. APA preferences are neither domain-restricted nor generated by an Impartial Culture. We find virtually no trace of a Condorcet paradox. In direct contrast with the classical social choice conundrum, competing consensus methods agree remarkably well, especially on the overall best and worst options. The agreement is also robust under perturbations of the preference profile via resampling, even in relatively small pseudosamples. We also explore prescriptive implications of our findings.
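Two of the seven consensus methods being compared, Borda and Condorcet, are easy to state in code; the toy ballots below are invented (the APA ballot data are not reproduced here).

```python
# Self-contained sketch of Borda scoring and the Condorcet winner check
# on toy complete rankings.
ballots = [
    ["A", "B", "C"], ["A", "C", "B"], ["B", "C", "A"],
    ["C", "B", "A"], ["B", "A", "C"],
]
candidates = {c for b in ballots for c in b}

# Borda: a candidate ranked r-th on a ballot of m names scores m - 1 - r points
borda = {c: 0 for c in candidates}
for b in ballots:
    for r, c in enumerate(b):
        borda[c] += len(b) - 1 - r
print("Borda winner:", max(borda, key=borda.get))

# Condorcet: a candidate who beats every rival in pairwise majority contests
def beats(x, y):
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

condorcet = [c for c in candidates
             if all(beats(c, d) for d in candidates if d != c)]
print("Condorcet winner:", condorcet[0] if condorcet else "none (cycle)")
```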
Abstract:
The management of water resources in Ireland prior to the Water Framework Directive (WFD) focussed on surface water and groundwater as separate entities. A critical element of the successful implementation of the WFD is to improve our understanding of the interaction between the two and of the flow mechanisms by which groundwater discharges to surface waters. An improved understanding of the contribution of groundwater to surface water is required for the classification of groundwater body status and the determination of groundwater quality thresholds. The results of the study will also have a wider application to many areas of the WFD.

A subcommittee of the WFD Groundwater Working Group (GWWG) has been formed to develop a methodology for estimating the groundwater contribution to Irish rivers. The group has selected a number of analytical techniques to quantify components of stream flow in an Irish context (Master Recession Curve, Unit Hydrograph, Flood Studies Report methodologies and hydrogeological analytical modelling). The components of stream flow that can be identified include deep groundwater, intermediate flow and overland flow. These analyses have been tested on seven pilot catchments covering a variety of hydrogeological settings and have been used to inform and constrain a mathematical model. The mathematical model used was the NAM (Nedbør-Afstrømnings-Model) rainfall-runoff model, a module of DHI's MIKE 11 modelling suite. The results from these pilot catchments have been used to develop a decision model, based on catchment descriptors from GIS datasets, for the selection of NAM parameters. The datasets used include the mapping of aquifers, vulnerability and subsoils, soils, the Digital Terrain Model, CORINE and lakes. The national coverage of these GIS datasets has allowed the extrapolation of the mathematical model to regional catchments across Ireland.
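As a generic illustration of hydrograph separation, the task the techniques listed above address, the sketch below applies a standard one-parameter recursive digital filter (Lyne-Hollick) to a synthetic discharge series. This filter is a common textbook stand-in, not the subcommittee's chosen method, and all values are invented.

```python
# Generic baseflow separation with a one-parameter recursive filter
# (Lyne-Hollick), on a synthetic daily discharge record.
import numpy as np

rng = np.random.default_rng(6)
days = 365
quickflow_events = rng.exponential(1.0, days) * (rng.random(days) < 0.1)
discharge = 2.0 + np.convolve(quickflow_events, [1.0, 0.6, 0.3], "same")

alpha = 0.925                      # filter parameter, typically 0.9-0.95
quick = np.zeros(days)
for t in range(1, days):
    quick[t] = (alpha * quick[t - 1]
                + 0.5 * (1 + alpha) * (discharge[t] - discharge[t - 1]))
    quick[t] = max(quick[t], 0.0)  # quickflow cannot be negative
baseflow = discharge - quick

bfi = baseflow.sum() / discharge.sum()   # baseflow index of the record
print("baseflow index:", round(float(bfi), 2))
```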
Abstract:
Public policy is expected to be both responsive to societal views and accountable to all citizens. As such, policy is informed, but not governed, by public opinion. Therefore, understanding the attitudes of the public is important, both to help shape and to evaluate policy priorities. In this way, surveys play a potentially important role in the policy making process.
The aim of this paper is to explore the role of survey research in policy making in Northern Ireland, with particular reference to community relations (better known internationally as good relations). In a region which is emerging from 40 years of conflict, community relations is a key policy area.
For more than 20 years, public attitudes to community relations have been recorded and monitored using two key surveys: the Northern Ireland Social Attitudes Survey (1989 to 1996) and the Northern Ireland Life and Times Survey (1998 to present). This paper will illustrate how these important time series datasets have been used to both inform and evaluate government policy in relation to community relations. By using four examples, we will highlight how these survey data have provided key government indicators of community relations, as well as how they have been used by other groups (such as NGOs) within policy consultation debates. Thus, the paper will provide a worked example of the integral and bi-directional relationship between attitude measurement and policy making.
Abstract:
Human action recognition is an important problem in computer vision with many practical applications. However, learning an accurate and discriminative representation of videos from the features extracted from them remains a challenging problem. In this paper, we propose a novel method, low-rank representation based action recognition, to recognize human actions. Given a dictionary, low-rank representation aims at finding the lowest-rank representation of all data, which can capture the global data structures. By virtue of this characteristic, low-rank representation is robust against noise. Experimental results demonstrate the effectiveness of the proposed approach on several publicly available datasets.
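For the noiseless case with the data matrix as its own dictionary, low-rank representation has a well-known closed form, which the sketch below uses to build an affinity matrix over synthetic stand-in features; the paper's full formulation (with a noise term and a general dictionary) is broader than this special case.

```python
# Sketch: with dictionary D = X and no noise, the minimum-nuclear-norm
# representation is Z = Vr Vr^T from the skinny SVD X = U S V^T.
import numpy as np

rng = np.random.default_rng(7)
# 60 clips drawn from 3 latent action subspaces, 100-D features per clip
bases = [rng.normal(size=(100, 4)) for _ in range(3)]
X = np.hstack([b @ rng.normal(size=(4, 20)) for b in bases])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int((s > 1e-8 * s[0]).sum())           # numerical rank
Z = Vt[:r].T @ Vt[:r]                      # lowest-rank representation

# |Z| serves as an affinity matrix: within-action entries dominate
affinity = np.abs(Z)
print("mean within-block affinity:", round(float(affinity[:20, :20].mean()), 3))
print("mean between-block affinity:", round(float(affinity[:20, 20:].mean()), 3))
```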