89 results for multiple data

at Université de Lausanne, Switzerland


Relevance: 70.00%

Abstract:

Background: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is no generally accepted standard technique for screening a large number of candidates and identifying the best ones. Results: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at http://www.isrec.isb-sib.ch/~vpopovic/research/. Conclusion: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria, and we applied it to the case of breast cancer. We also empirically showed that translating the results from the microarray to the PCR platform was achievable.
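The abstract does not spell out its scoring criteria, but the core idea of ranking control-gene candidates by expression stability across pooled data sets can be sketched as follows. This is a minimal illustration, not the published method: the stability score here (coefficient of variation, with a minimum-expression filter) and all gene names are assumptions.

```python
import math
import random

def rank_control_candidates(expr, min_mean=5.0):
    """Rank genes by expression stability across pooled samples.

    expr: dict mapping gene name -> list of log2 expression values
    pooled across data sets/platforms. Genes with low variability
    (coefficient of variation) and adequate mean expression rank first.
    """
    scores = []
    for gene, values in expr.items():
        mean = sum(values) / len(values)
        if mean < min_mean:
            continue  # too lowly expressed to serve as a reference
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        cv = math.sqrt(var) / mean  # coefficient of variation
        scores.append((cv, gene))
    scores.sort()  # most stable (lowest CV) first
    return [gene for cv, gene in scores]

# synthetic example with hypothetical gene names
random.seed(0)
expr = {
    "stable_gene":   [8.0 + random.gauss(0, 0.1) for _ in range(40)],
    "variable_gene": [8.0 + random.gauss(0, 2.0) for _ in range(40)],
    "low_gene":      [2.0 + random.gauss(0, 0.1) for _ in range(40)],
}
ranking = rank_control_candidates(expr)
print(ranking[0])  # the consistently, highly expressed gene ranks first
```

Screening all genes on a microarray this way is what makes the candidate lists so comprehensive compared to testing a handful of traditional housekeeping genes.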

Relevance: 60.00%

Abstract:

PURPOSE: A number of microarray studies have reported distinct molecular profiles of breast cancers (BC), such as basal-like, ErbB2-like, and two to three luminal-like subtypes. These were associated with different clinical outcomes. However, although the basal and the ErbB2 subtypes are repeatedly recognized, identification of estrogen receptor (ER)-positive subtypes has been inconsistent. Therefore, refinement of their molecular definition is needed. MATERIALS AND METHODS: We have previously reported a gene expression grade index (GGI), which defines histologic grade based on gene expression profiles. Using this algorithm, we assigned ER-positive BC to either high- or low-genomic grade subgroups and compared these with previously reported ER-positive molecular classifications. As further validation, we classified 666 ER-positive samples into subtypes and assessed their clinical outcome. RESULTS: Two ER-positive molecular subgroups (high and low genomic grade) could be defined using the GGI. Despite tracking a single biologic pathway, these were highly comparable to the previously described luminal A and B classification and significantly correlated to the risk groups produced using the 21-gene recurrence score. The two subtypes were associated with statistically distinct clinical outcome in both systemically untreated and tamoxifen-treated populations. CONCLUSION: The use of genomic grade can identify two clinically distinct ER-positive molecular subtypes in a simple and highly reproducible manner across multiple data sets. This study emphasizes the important role of proliferation-related genes in predicting prognosis in ER-positive BC.
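The subgroup assignment described above can be illustrated with a toy GGI-style score. This is only a sketch of the idea: the published GGI uses a specific weighted gene list and scaling, whereas here the score is simply the contrast between grade-3-associated and grade-1-associated genes, and the gene lists (apart from the well-known proliferation markers) are hypothetical.

```python
def genomic_grade_index(sample, up_in_grade3, up_in_grade1):
    """Toy GGI-style score: mean expression of grade-3-associated genes
    minus mean expression of grade-1-associated genes."""
    up3 = sum(sample[g] for g in up_in_grade3) / len(up_in_grade3)
    up1 = sum(sample[g] for g in up_in_grade1) / len(up_in_grade1)
    return up3 - up1

def assign_subtype(sample, up3, up1, threshold=0.0):
    """Split ER-positive samples into high/low genomic grade subgroups."""
    score = genomic_grade_index(sample, up3, up1)
    return "high_genomic_grade" if score > threshold else "low_genomic_grade"

up3 = ["MKI67", "CCNB1"]    # proliferation-related genes, illustrative
up1 = ["GENE_A", "GENE_B"]  # hypothetical grade-1-associated genes
proliferative = {"MKI67": 9.1, "CCNB1": 8.7, "GENE_A": 5.0, "GENE_B": 5.2}
indolent      = {"MKI67": 5.3, "CCNB1": 5.1, "GENE_A": 7.8, "GENE_B": 8.0}
print(assign_subtype(proliferative, up3, up1))  # high_genomic_grade
print(assign_subtype(indolent, up3, up1))       # low_genomic_grade
```

The simplicity of a single threshold on one score is what makes the split reproducible across data sets.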

Relevance: 60.00%

Abstract:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
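The nested cross-validation scheme described above (tuning in an inner loop, performance estimation in an outer loop) can be sketched in a self-contained way. This is a minimal illustration using only a kNN classifier and synthetic two-group data, not the paper's actual setup; the fold counts and candidate k values are arbitrary assumptions.

```python
import random

def knn_predict(train, k, x):
    """Majority vote among the k nearest training samples (squared
    Euclidean distance). train: list of (features, label)."""
    near = sorted(train, key=lambda fl: sum((a - b) ** 2
                                            for a, b in zip(fl[0], x)))[:k]
    votes = [lab for _, lab in near]
    return max(set(votes), key=votes.count)

def accuracy(train, test, k):
    return sum(knn_predict(train, k, x) == y for x, y in test) / len(test)

def nested_cv(data, ks=(1, 3, 5), outer=5, inner=4):
    """Outer loop estimates performance; inner loop tunes k using the
    training portion only, so the test fold never influences tuning."""
    random.shuffle(data)
    folds = [data[i::outer] for i in range(outer)]
    outer_scores = []
    for i in range(outer):
        test = folds[i]
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        ifolds = [train[j::inner] for j in range(inner)]
        def inner_score(k):
            return sum(accuracy([s for m, f in enumerate(ifolds) if m != j
                                 for s in f], ifolds[j], k)
                       for j in range(inner)) / inner
        best_k = max(ks, key=inner_score)
        outer_scores.append(accuracy(train, test, best_k))
    return sum(outer_scores) / outer

# synthetic 'control' vs 'treated' groups separated along one feature
random.seed(1)
data = [([random.gauss(0, 1), random.gauss(0, 1)], 0) for _ in range(60)] + \
       [([random.gauss(3, 1), random.gauss(0, 1)], 1) for _ in range(60)]
est = nested_cv(data)
print(round(est, 2))  # cross-validated accuracy estimate
```

The study's point is that when batch effects are confounded with group membership, this estimate can be biased relative to accuracy on truly independent data, even though the nesting itself is done correctly.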

Relevance: 60.00%

Abstract:

Avalanche forecasting is a complex process involving the assimilation of multiple data sources to make predictions over varying spatial and temporal resolutions. Numerically assisted forecasting often uses nearest neighbour (NN) methods, which are known to have limitations when dealing with high-dimensional data. Support Vector Machines (SVMs) belong to a family of theoretically grounded techniques from machine learning and are designed to deal with high-dimensional data. We apply SVMs to a dataset from Lochaber, Scotland, to assess their applicability in avalanche forecasting. Initial experiments showed that SVMs gave results which were comparable with NN for categorical and probabilistic forecasts. Experiments utilising the ability of SVMs to deal with high dimensionality in producing a spatial forecast show promise, but require further work.
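The nearest neighbour baseline that the SVMs are compared against is easy to sketch: find the k past days most similar to today's conditions and report the fraction on which an avalanche occurred. This is an illustrative stand-in, not the Lochaber system; the feature choices and values below are hypothetical.

```python
import math

def nn_forecast(history, today, k=3):
    """Nearest-neighbour probabilistic avalanche forecast.

    history: list of (feature_vector, avalanche_occurred) for past days.
    Returns the avalanche fraction among the k most similar past days.
    """
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(history, key=lambda day: dist(day[0], today))[:k]
    return sum(1 for _, event in nearest if event) / k

# hypothetical features: [new snow (cm), wind speed (m/s), air temp (C)]
history = [
    ([30.0, 15.0, -5.0], True),
    ([25.0, 12.0, -4.0], True),
    ([2.0, 3.0, 1.0], False),
    ([0.0, 5.0, 2.0], False),
    ([28.0, 14.0, -6.0], True),
    ([1.0, 2.0, 0.0], False),
]
p = nn_forecast(history, [27.0, 13.0, -5.0])
print(p)  # today resembles past avalanche days, so p is high
```

The Euclidean distance at the heart of this method is exactly what degrades as the number of features grows, which motivates trying SVMs on the same data.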

Relevance: 60.00%

Abstract:

Recent years have seen a significant increase in understanding of the host genetic and genomic determinants of susceptibility to HIV-1 infection and disease progression, driven in large part by candidate gene studies, genome-wide association studies, genome-wide transcriptome analyses, and large-scale in vitro genome screens. These studies have identified common variants in some host loci that clearly influence disease progression, characterized the scale and dynamics of gene and protein expression changes in response to infection, and provided the first comprehensive catalogs of genes and pathways involved in viral replication. Experimental models of AIDS and studies in natural hosts of primate lentiviruses have complemented and in some cases extended these findings. As the relevant technology continues to progress, the expectation is that such studies will increase in depth (e.g., to include host whole exome and whole genome sequencing) and in breadth (in particular, by integrating multiple data types).

Relevance: 40.00%

Abstract:

Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the scale of a field site represents a major, and as-of-yet largely unresolved, challenge. To address this problem, we have developed a downscaling procedure based on a non-linear Bayesian sequential simulation approach. The main objective of this algorithm is to estimate the value of the sparsely sampled hydraulic conductivity at non-sampled locations based on its relation to the electrical conductivity logged at collocated wells and to surface resistivity measurements, which are available throughout the studied site. The in situ relationship between the hydraulic and electrical conductivities is described through a non-parametric multivariate kernel density function. Then a stochastic integration of low-resolution, large-scale electrical resistivity tomography (ERT) data in combination with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities is applied. The overall viability of this downscaling approach is tested and validated by comparing flow and transport simulations through the original and the upscaled hydraulic conductivity fields. Our results indicate that the proposed procedure allows us to obtain remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.
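The core step, predicting hydraulic conductivity at a non-sampled location from collocated electrical conductivity via a non-parametric kernel relationship, can be sketched with a Gaussian kernel estimate of the conditional mean. This is a simplification of the paper's Bayesian sequential simulation (which draws stochastic realizations rather than a single mean); the bandwidth and the log-linear synthetic well-log relation are assumptions.

```python
import math
import random

def kernel_estimate_logK(pairs, sigma_obs, h=0.1):
    """Kernel-weighted estimate of log10 hydraulic conductivity given an
    electrical conductivity observation, from (electrical, hydraulic)
    pairs logged at collocated wells. Gaussian kernel, bandwidth h."""
    weights = [math.exp(-0.5 * ((s - sigma_obs) / h) ** 2) for s, _ in pairs]
    total = sum(weights)
    return sum(w * k for w, (_, k) in zip(weights, pairs)) / total

# synthetic well log: log10(K) roughly linear in electrical conductivity
random.seed(2)
pairs = [(s, -6.0 + 2.0 * s + random.gauss(0, 0.05))
         for s in [random.uniform(0.0, 1.0) for _ in range(200)]]
logK = kernel_estimate_logK(pairs, sigma_obs=0.5)
print(round(logK, 1))  # close to the underlying -6.0 + 2.0 * 0.5 = -5.0
```

Because the relationship is estimated non-parametrically from the data themselves, no functional form between the two conductivities needs to be assumed, which is the point of using a kernel density rather than a regression model.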

Relevance: 40.00%

Abstract:

Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The results of both positive and negative acquisitions were combined during data mining to simplify the process and interrogate a larger lipidome in a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
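The data-reduction step, randomly selecting a fixed number of spectra from a histologically defined region of interest to achieve a 10-fold reduction, is straightforward to sketch. The ROI mask, spectrum representation, and seed below are illustrative assumptions.

```python
import random

def reduce_roi(spectra, roi_mask, factor=10, seed=0):
    """Random subsample of spectra inside a region of interest.

    spectra: list of spectra (each a list of intensities).
    roi_mask: parallel list of booleans marking pixels inside the
    histologically defined ROI. factor=10 gives the 10-fold reduction.
    """
    roi = [s for s, inside in zip(spectra, roi_mask) if inside]
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = max(1, len(roi) // factor)
    return rng.sample(roi, n)

# hypothetical acquisition: 1000 spectra, 600 of which fall inside the ROI
spectra = [[float(i)] for i in range(1000)]
roi_mask = [i < 600 for i in range(1000)]
subset = reduce_roi(spectra, roi_mask)
print(len(subset))  # 60 spectra: a 10-fold reduction of the ROI
```

Because the selection is random within the ROI, the subset preserves the region's molecular composition on average, which is what the subsequent principal component analysis confirmed.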

Relevance: 40.00%

Abstract:

The sparsely spaced, highly permeable fractures of the granitic rock aquifer at Stang-er-Brune (Brittany, France) form a well-connected fracture network of high permeability but unknown geometry. Previous work based on optical and acoustic logging together with single-hole and cross-hole flowmeter data acquired in three neighbouring boreholes (70-100 m deep) has identified the most important permeable fractures crossing the boreholes and their hydraulic connections. To constrain possible flow paths by estimating the geometries of known and previously unknown fractures, we have acquired, processed and interpreted multifold single- and cross-hole GPR data using 100 and 250 MHz antennas. The GPR data processing scheme, consisting of time-zero corrections, scaling, bandpass filtering, F-X deconvolution, eigenvector filtering, muting, pre-stack Kirchhoff depth migration and stacking, was used to differentiate fluid-filled fracture reflections from source-generated noise. The final stacked and pre-stack depth-migrated GPR sections provide high-resolution images of individual fractures (dipping 30-90°) in the surroundings of each borehole (2-20 m for the 100 MHz antennas; 2-12 m for the 250 MHz antennas) in a 2D plane projection, and these are of superior quality to those obtained from single-offset sections. Most fractures previously identified from hydraulic testing can be correlated to reflections in the single-hole data. Several previously unknown major near-vertical fractures have also been identified away from the boreholes.
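The processing scheme is a chain of trace-by-trace operations applied in sequence. The first two steps can be sketched as below; this is an illustrative pipeline skeleton only, with a crude gain function standing in for the scaling step (the window length and example trace are assumptions, and the remaining steps would slot into the same chain).

```python
def time_zero_correct(trace, first_break):
    """Shift the trace so the first-break sample becomes t = 0."""
    return trace[first_break:]

def agc_scale(trace, window=3):
    """Crude automatic gain control: divide each sample by the mean
    absolute amplitude within a sliding window around it."""
    out = []
    for i in range(len(trace)):
        lo, hi = max(0, i - window), min(len(trace), i + window + 1)
        mean_amp = sum(abs(v) for v in trace[lo:hi]) / (hi - lo)
        out.append(trace[i] / mean_amp if mean_amp else 0.0)
    return out

def process(trace, first_break):
    """First two steps of the chain; bandpass filtering, F-X
    deconvolution, eigenvector filtering, muting, pre-stack Kirchhoff
    depth migration and stacking would follow in the same style."""
    return agc_scale(time_zero_correct(trace, first_break))

# hypothetical 7-sample trace whose first break is at sample index 2
trace = [0.0, 0.0, 5.0, 2.0, 1.0, 0.5, 0.2]
out = process(trace, first_break=2)
print(len(out))  # 5 samples remain after time-zero correction
```

Expressing the scheme as composable functions mirrors how such processing flows are configured in practice: each step consumes and produces a trace, so steps can be reordered or swapped during testing.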

Relevance: 40.00%

Abstract:

OBJECT: To study a scan protocol for coronary magnetic resonance angiography based on multiple breath-holds featuring 1D motion compensation, and to compare the resulting image quality to a navigator-gated free-breathing acquisition. Image reconstruction was performed using L1-regularized iterative SENSE. MATERIALS AND METHODS: The effects of respiratory motion on the Cartesian sampling scheme were minimized by performing data acquisition in multiple breath-holds. During the scan, repetitive readouts through the k-space center were used to detect and correct the respiratory displacement of the heart by exploiting the self-navigation principle in image reconstruction. In vivo experiments were performed in nine healthy volunteers and the resulting image quality was compared to a navigator-gated reference in terms of vessel length and sharpness. RESULTS: Acquisition in breath-holds is an effective method to reduce the scan time by more than 30% compared to the navigator-gated reference. Although an equivalent mean image quality with respect to the reference was achieved with the proposed method, the 1D motion compensation did not work equally well in all cases. CONCLUSION: In general, the image quality scaled with the robustness of the motion compensation. Nevertheless, the featured setup provides a positive basis for future extension with more advanced motion compensation methods.
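The self-navigation idea, estimating the 1D respiratory displacement of the heart from repeated k-space center readouts, amounts to finding the shift that best aligns each readout's projection with a reference projection. The sketch below uses integer-shift cross-correlation on toy projections; real implementations work on complex readout data with sub-sample precision, so treat this purely as an illustration.

```python
def best_shift(ref, proj, max_shift=5):
    """Estimate 1D displacement as the integer shift maximizing the
    cross-correlation between a self-navigation projection and a
    reference projection."""
    def corr(shift):
        pairs = [(ref[i], proj[i + shift]) for i in range(len(ref))
                 if 0 <= i + shift < len(proj)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_shift, max_shift + 1), key=corr)

# hypothetical projections: the second is the first displaced by 2 samples
ref  = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
proj = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
shift = best_shift(ref, proj)
print(shift)  # 2
```

Once estimated, the shift is applied as a correction to the corresponding k-space segment before reconstruction; when this 1D model fails to capture the actual motion, image quality degrades, which matches the mixed per-case results reported above.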

Relevance: 40.00%

Abstract:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
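At the heart of MKL is a combined kernel built as a weighted sum of per-feature (or per-feature-subset) kernels, with the weights learned from data so that they reveal which features matter. The sketch below computes such a combined kernel with fixed weights standing in for learned ones; the RBF base kernel, feature groups, and values are illustrative assumptions.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) base kernel on a feature subset."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(x, y, feature_groups, weights, gamma=1.0):
    """MKL-style kernel: convex combination of kernels computed on
    thematic feature subsets (e.g. slopes, curvature, directional
    derivatives at different spatial scales)."""
    total = 0.0
    for idx, w in zip(feature_groups, weights):
        xi = [x[i] for i in idx]
        yi = [y[i] for i in idx]
        total += w * rbf_kernel(xi, yi, gamma)
    return total

# two hypothetical groups: slope features (indices 0, 1) and curvature (2)
groups = [(0, 1), (2,)]
weights = [0.8, 0.2]  # in MKL these weights are learned, not fixed
x = [0.1, 0.4, 1.0]
y = [0.1, 0.4, 1.0]
print(combined_kernel(x, y, groups, weights))  # 1.0 at identical points
```

Because each subset gets its own weight, a near-zero learned weight flags an uninformative feature group, which is what makes MKL usable for feature selection as well as prediction.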