23 resultados para Genomics -- Data processing

em Université de Lausanne, Switzerland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract : The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the necessary activities in keeping a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is the first important step that can thus affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This Thesis presents new computational methodologies to study gene regulation. In addition we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factors binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the localization about the binding of TFs to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. C-MYC and SP1 bind preferentially at promoters and when SP1 binds next to C-NIYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the binding sites conservation across mammals, by the permissive underlying chromatin states 'it represents an important control mechanism involved in cellular proliferation, thereby involved in cancer. Secondly, we identify the characteristics of TF estrogen receptor alpha (hERa) target genes and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also on studies about the response of cells to estrogen. We confirm that hERa binding sites are distributed anywhere on the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem analyzing time-series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models where we impose sparseness on the connectivity of the regulatory network. We extend this method enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-genes/TFs regulatory network obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological dataset, have enabled us to better understand gene regulation in humans.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

There are far-reaching conceptual similarities between bi-static surface georadar and post-stack, "zero-offset" seismic reflection data, which is expressed in largely identical processing flows. One important difference is, however, that standard deconvolution algorithms routinely used to enhance the vertical resolution of seismic data are notoriously problematic or even detrimental to the overall signal quality when applied to surface georadar data. We have explored various options for alleviating this problem and have tested them on a geologically well-constrained surface georadar dataset. Standard stochastic and direct deterministic deconvolution approaches proved to be largely unsatisfactory. While least-squares-type deterministic deconvolution showed some promise, the inherent uncertainties involved in estimating the source wavelet introduced some artificial "ringiness". In contrast, we found spectral balancing approaches to be effective, practical and robust means for enhancing the vertical resolution of surface georadar data, particularly, but not exclusively, in the uppermost part of the georadar section, which is notoriously plagued by the interference of the direct air- and groundwaves. For the data considered in this study, it can be argued that band-limited spectral blueing may provide somewhat better results than standard band-limited spectral whitening, particularly in the uppermost part of the section affected by the interference of the air- and groundwaves. Interestingly, this finding is consistent with the fact that the amplitude spectrum resulting from least-squares-type deterministic deconvolution is characterized by a systematic enhancement of higher frequencies at the expense of lower frequencies and hence is blue rather than white. It is also consistent with increasing evidence that spectral "blueness" is a seemingly universal, albeit enigmatic, property of the distribution of reflection coefficients in the Earth. Our results therefore indicate that spectral balancing techniques in general and spectral blueing in particular represent simple, yet effective means of enhancing the vertical resolution of surface georadar data and, in many cases, could turn out to be a preferable alternative to standard deconvolution approaches.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The sparsely spaced highly permeable fractures of the granitic rock aquifer at Stang-er-Brune (Brittany, France) form a well-connected fracture network of high permeability but unknown geometry. Previous work based on optical and acoustic logging together with single-hole and cross-hole flowmeter data acquired in 3 neighbouring boreholes (70-100 m deep) has identified the most important permeable fractures crossing the boreholes and their hydraulic connections. To constrain possible flow paths by estimating the geometries of known and previously unknown fractures, we have acquired, processed and interpreted multifold, single- and cross-hole GPR data using 100 and 250 MHz antennas. The GPR data processing scheme consisting of timezero corrections, scaling, bandpass filtering and F-X deconvolution, eigenvector filtering, muting, pre-stack Kirchhoff depth migration and stacking was used to differentiate fluid-filled fracture reflections from source generated noise. The final stacked and pre-stack depth-migrated GPR sections provide high-resolution images of individual fractures (dipping 30-90°) in the surroundings (2-20 m for the 100 MHz antennas; 2-12 m for the 250 MHz antennas) of each borehole in a 2D plane projection that are of superior quality to those obtained from single-offset sections. Most fractures previously identified from hydraulic testing can be correlated to reflections in the single-hole data. Several previously unknown major near vertical fractures have also been identified away from the boreholes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Microbe browser is a web server providing comparative microbial genomics data. It offers comprehensive, integrated data from GenBank, RefSeq, UniProt, InterPro, Gene Ontology and the Orthologs Matrix Project (OMA) database, displayed along with gene predictions from five software packages. The Microbe browser is daily updated from the source databases and includes all completely sequenced bacterial and archaeal genomes. The data are displayed in an easy-to-use, interactive website based on Ensembl software. The Microbe browser is available at http://microbe.vital-it.ch/. Programmatic access is available through the OMA application programming interface (API) at http://microbe.vital-it.ch/api.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

BACKGROUND: This study describes the prevalence, associated anomalies, and demographic characteristics of cases of multiple congenital anomalies (MCA) in 19 population-based European registries (EUROCAT) covering 959,446 births in 2004 and 2010. METHODS: EUROCAT implemented a computer algorithm for classification of congenital anomaly cases followed by manual review of potential MCA cases by geneticists. MCA cases are defined as cases with two or more major anomalies of different organ systems, excluding sequences, chromosomal and monogenic syndromes. RESULTS: The combination of an epidemiological and clinical approach for classification of cases has improved the quality and accuracy of the MCA data. Total prevalence of MCA cases was 15.8 per 10,000 births. Fetal deaths and termination of pregnancy were significantly more frequent in MCA cases compared with isolated cases (p < 0.001) and MCA cases were more frequently prenatally diagnosed (p < 0.001). Live born infants with MCA were more often born preterm (p < 0.01) and with birth weight < 2500 grams (p < 0.01). Respiratory and ear, face, and neck anomalies were the most likely to occur with other anomalies (34% and 32%) and congenital heart defects and limb anomalies were the least likely to occur with other anomalies (13%) (p < 0.01). However, due to their high prevalence, congenital heart defects were present in half of all MCA cases. Among males with MCA, the frequency of genital anomalies was significantly greater than the frequency of genital anomalies among females with MCA (p < 0.001). CONCLUSION: Although rare, MCA cases are an important public health issue, because of their severity. The EUROCAT database of MCA cases will allow future investigation on the epidemiology of these conditions and related clinical and diagnostic problems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper illustrates the practicality and efficiency of gravimetry for aquifer prospecting in arid zones. Known for the long and tedious data-processing it requires, this method becomes expeditious when simplified as presented here. Its use is then fully justified in a survey of this kind. During the study of the Teloua alluvial aquifer (Agadez, Niger), several ancient channels were clearly and rapidly located. Comparison of the results obtained here with those from previous studies demonstrates anew that for comprehensive prospecting, several complementary geophysical methods should always be employed.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Natural genetic variation can have a pronounced influence on human taste perception, which in turn may influence food preference and dietary choice. Genome-wide association studies represent a powerful tool to understand this influence. To help optimize the design of future genome-wide-association studies on human taste perception we have used the well-known TAS2R38-PROP association as a tool to determine the relative power and efficiency of different phenotyping and data-analysis strategies. The results show that the choice of both data collection and data processing schemes can have a very substantial impact on the power to detect genotypic variation that affects chemosensory perception. Based on these results we provide practical guidelines for the design of future GWAS studies on chemosensory phenotypes. Moreover, in addition to the TAS2R38 gene past studies have implicated a number of other genetic loci to affect taste sensitivity to PROP and the related bitter compound PTC. None of these other locations showed genome-wide significant associations in our study. To facilitate further, target-gene driven, studies on PROP taste perception we provide the genome-wide list of p-values for all SNPs genotyped in the current study.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Many regions of the world, including inland lakes, present with suboptimal conditions for the remotely sensed retrieval of optical signals, thus challenging the limits of available satellite data-processing tools, such as atmospheric correction models (ACM) and water constituent-retrieval (WCR) algorithms. Working in such regions, however, can improve our understanding of remote-sensing tools and their applicabil- ity in new contexts, in addition to potentially offering useful information about aquatic ecology. Here, we assess and compare 32 combinations of two ACMs, two WCRs, and three binary categories of data quality standards to optimize a remotely sensed proxy of plankton biomass in Lake Kivu. Each parameter set is compared against the available ground-truth match-ups using Spearman's right-tailed ρ. Focusing on the best sets from each ACM-WCR combination, their performances are discussed with regard to data distribution, sample size, spatial completeness, and seasonality. The results of this study may be of interest both for ecological studies on Lake Kivu and for epidemio- logical studies of disease, such as cholera, the dynamics of which has been associated with plankton biomass in other regions of the world.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Proteomics has come a long way from the initial qualitative analysis of proteins present in a given sample at a given time ("cataloguing") to large-scale characterization of proteomes, their interactions and dynamic behavior. Originally enabled by breakthroughs in protein separation and visualization (by two-dimensional gels) and protein identification (by mass spectrometry), the discipline now encompasses a large body of protein and peptide separation, labeling, detection and sequencing tools supported by computational data processing. The decisive mass spectrometric developments and most recent instrumentation news are briefly mentioned accompanied by a short review of gel and chromatographic techniques for protein/peptide separation, depletion and enrichment. Special emphasis is placed on quantification techniques: gel-based, and label-free techniques are briefly discussed whereas stable-isotope coding and internal peptide standards are extensively reviewed. Another special chapter is dedicated to software and computing tools for proteomic data processing and validation. A short assessment of the status quo and recommendations for future developments round up this journey through quantitative proteomics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Achieving a high degree of dependability in complex macro-systems is challenging. Because of the large number of components and numerous independent teams involved, an overview of the global system performance is usually lacking to support both design and operation adequately. A functional failure mode, effects and criticality analysis (FMECA) approach is proposed to address the dependability optimisation of large and complex systems. The basic inductive model FMECA has been enriched to include considerations such as operational procedures, alarm systems. environmental and human factors, as well as operation in degraded mode. Its implementation on a commercial software tool allows an active linking between the functional layers of the system and facilitates data processing and retrieval, which enables to contribute actively to the system optimisation. The proposed methodology has been applied to optimise dependability in a railway signalling system. Signalling systems are typical example of large complex systems made of multiple hierarchical layers. The proposed approach appears appropriate to assess the global risk- and availability-level of the system as well as to identify its vulnerabilities. This enriched-FMECA approach enables to overcome some of the limitations and pitfalls previously reported with classical FMECA approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Neuronal oscillations are an important aspect of EEG recordings. These oscillations are supposed to be involved in several cognitive mechanisms. For instance, oscillatory activity is considered a key component for the top-down control of perception. However, measuring this activity and its influence requires precise extraction of frequency components. This processing is not straightforward. Particularly, difficulties with extracting oscillations arise due to their time-varying characteristics. Moreover, when phase information is needed, it is of the utmost importance to extract narrow-band signals. This paper presents a novel method using adaptive filters for tracking and extracting these time-varying oscillations. This scheme is designed to maximize the oscillatory behavior at the output of the adaptive filter. It is then capable of tracking an oscillation and describing its temporal evolution even during low amplitude time segments. Moreover, this method can be extended in order to track several oscillations simultaneously and to use multiple signals. These two extensions are particularly relevant in the framework of EEG data processing, where oscillations are active at the same time in different frequency bands and signals are recorded with multiple sensors. The presented tracking scheme is first tested with synthetic signals in order to highlight its capabilities. Then it is applied to data recorded during a visual shape discrimination experiment for assessing its usefulness during EEG processing and in detecting functionally relevant changes. This method is an interesting additional processing step for providing alternative information compared to classical time-frequency analyses and for improving the detection and analysis of cross-frequency couplings.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Basal ganglia and brain stem nuclei are involved in the pathophysiology of various neurological and neuropsychiatric disorders. Currently available structural T1-weighted (T1w) magnetic resonance images do not provide sufficient contrast for reliable automated segmentation of various subcortical grey matter structures. We use a novel, semi-quantitative magnetization transfer (MT) imaging protocol that overcomes limitations in T1w images, which are mainly due to their sensitivity to the high iron content in subcortical grey matter. We demonstrate improved automated segmentation of putamen, pallidum, pulvinar and substantia nigra using MT images. A comparison with segmentation of high-quality T1w images was performed in 49 healthy subjects. Our results show that MT maps are highly suitable for automated segmentation, and so for multi-subject morphometric studies with a focus on subcortical structures.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

MicroRNAs (miRs) are involved in the pathogenesis of several neoplasms; however, there are no data on their expression patterns and possible roles in adrenocortical tumors. Our objective was to study adrenocortical tumors by an integrative bioinformatics analysis involving miR and transcriptomics profiling, pathway analysis, and a novel, tissue-specific miR target prediction approach. Thirty-six tissue samples including normal adrenocortical tissues, benign adenomas, and adrenocortical carcinomas (ACC) were studied by simultaneous miR and mRNA profiling. A novel data-processing software was used to identify all predicted miR-mRNA interactions retrieved from PicTar, TargetScan, and miRBase. Tissue-specific target prediction was achieved by filtering out mRNAs with undetectable expression and searching for mRNA targets with inverse expression alterations as their regulatory miRs. Target sets and significant microarray data were subjected to Ingenuity Pathway Analysis. Six miRs with significantly different expression were found. miR-184 and miR-503 showed significantly higher, whereas miR-511 and miR-214 showed significantly lower expression in ACCs than in other groups. Expression of miR-210 was significantly lower in cortisol-secreting adenomas than in ACCs. By calculating the difference between dCT(miR-511) and dCT(miR-503) (delta cycle threshold), ACCs could be distinguished from benign adenomas with high sensitivity and specificity. Pathway analysis revealed the possible involvement of G2/M checkpoint damage in ACC pathogenesis. To our knowledge, this is the first report describing miR expression patterns and pathway analysis in sporadic adrenocortical tumors. miR biomarkers may be helpful for the diagnosis of adrenocortical malignancy. This tissue-specific target prediction approach may be used in other tumors too.