2 results for VLE data sets


Relevance:

100.00%

Publisher:

Abstract:

Here, we describe gene expression compositional assignment (GECA), a powerful yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains, including bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.

Relevance:

90.00%

Publisher:

Abstract:

Tide gauge data are identified as legacy data given the radical transition in observation method and required output format associated with tide gauges over the 20th century. Observed water level variation in tide gauge records is regarded as the only significant basis for determining recent historical variation (decade to century) in mean sea level and storm surge. Few tide gauge records cover the 20th century, so the Belfast (UK) Harbour tide gauge would be a strategic long-term (110 years) record if the full paper-based records (marigrams) were digitally restructured to allow for consistent data analysis. This paper presents the methodology for extracting a consistent time series of observed water levels from the five different tide gauge positions and machine types at Belfast Harbour, starting in late 1901. Tide gauge data were digitally retrieved from the original analogue (daily) records by scanning the marigrams and then extracting the sequential tidal elevations with graph-line seeking software (Ungraph™). This automation of signal extraction allowed the full Belfast series to be retrieved quickly, relative to any manual x–y digitisation of the signal. Variable-length tidal data sets were restructured to a consistent daily, monthly and annual file format by project-developed software: Merge&Convert and MergeHYD allow consistent water level sampling at both 60 min (past standard) and 10 min intervals, the latter enhancing surge measurement. Belfast tide gauge data have been rectified, validated and quality controlled (IOC 2006 standards). The result is a consistent annual-based legacy data series for Belfast Harbour that includes over 2 million tidal-level observations.
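
The abstract does not say how Merge&Convert or MergeHYD resample the digitised signal, so the following is only a minimal sketch of the general idea of placing irregularly spaced water levels on a fixed 60 min or 10 min grid. The function name `resample_levels`, the use of linear interpolation and the sample values are all assumptions made for illustration; they are not taken from the paper or its software.

```python
from datetime import datetime, timedelta


def resample_levels(samples, step_minutes):
    """Linearly interpolate (time, level) pairs onto a regular time grid.

    Hypothetical helper: assumes at least two samples with distinct
    timestamps. The grid starts at the earliest sample time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    samples = sorted(samples)
    step = timedelta(minutes=step_minutes)
    t, end = samples[0][0], samples[-1][0]
    out = []
    i = 0
    while t <= end:
        # Advance to the segment [samples[i], samples[i + 1]] that contains t.
        while samples[i + 1][0] < t:
            i += 1
        (t0, h0), (t1, h1) = samples[i], samples[i + 1]
        frac = (t - t0) / (t1 - t0)  # position of t within the segment, 0..1
        out.append((t, h0 + frac * (h1 - h0)))
        t += step
    return out


# Invented example values (metres), unevenly spaced as a digitised trace might be.
raw = [
    (datetime(1901, 12, 1, 0, 0), 1.0),
    (datetime(1901, 12, 1, 0, 20), 2.0),
    (datetime(1901, 12, 1, 1, 0), 4.0),
]
ten_minute = resample_levels(raw, 10)
hourly = resample_levels(raw, 60)
```

Under these assumptions, the same extracted trace can feed both the hourly and the 10 min series, with the finer grid retaining more of the short-lived variation that matters for surge measurement.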