149 results for Data encoding
Abstract:
The European Surveillance of Congenital Anomalies (EUROCAT) network of population-based congenital anomaly registries is an important source of epidemiologic information on congenital anomalies in Europe, covering live births, fetal deaths from 20 weeks' gestation, and terminations of pregnancy for fetal anomaly. EUROCAT's policy is to strive for high-quality data, while ensuring consistency and transparency across all member registries. A set of 30 data quality indicators (DQIs) was developed to assess five key elements of data quality: completeness of case ascertainment, accuracy of diagnosis, completeness of information on EUROCAT variables, timeliness of data transmission, and availability of population denominator information. This article describes each of the individual DQIs and presents the output for each registry as well as the EUROCAT (unweighted) average, for 29 full member registries for 2004-2008. This information is also available on the EUROCAT website for previous years. The EUROCAT DQIs allow registries to evaluate their performance in relation to other registries and allow appropriate interpretations to be made of the data collected. The DQIs provide direction for improving data collection and ascertainment, and they allow annual assessment for monitoring continuous improvement. The DQIs are constantly reviewed and refined to best document registry procedures and processes regarding data collection, to ensure appropriateness of the DQIs, and to ensure transparency so that the data collected can make a substantial and useful contribution to epidemiologic research on congenital anomalies.
Abstract:
As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data of different types and from different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/
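One simple way to flag chips whose data were submitted more than once is to fingerprint each chip's values and group identical fingerprints. The sketch below only illustrates that general idea; it is not the Bgee procedure, which this abstract does not detail. The function names, the rounding precision, and the chip identifiers are all hypothetical.

```python
import hashlib
from collections import defaultdict


def chip_fingerprint(intensities, ndigits=2):
    """Hash a chip's probe intensities after rounding, so that identical
    data submitted twice yields the same fingerprint."""
    rounded = ",".join(f"{value:.{ndigits}f}" for value in intensities)
    return hashlib.sha256(rounded.encode("utf-8")).hexdigest()


def find_duplicate_chips(chips):
    """Group chip identifiers that share a fingerprint.

    `chips` maps a (hypothetical) chip identifier to its list of probe
    intensities. Returns sorted groups containing more than one identifier.
    """
    groups = defaultdict(list)
    for chip_id, intensities in chips.items():
        groups[chip_fingerprint(intensities)].append(chip_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


# Hypothetical example data: two submissions carry identical values.
example_chips = {
    "GSM_A": [10.5, 200.25, 33.0],
    "GSM_B": [11.0, 180.0, 35.5],
    "E-XMPL-1_chip3": [10.5, 200.25, 33.0],
}
duplicates = find_duplicate_chips(example_chips)
print(duplicates)
```

Exact fingerprinting of this kind only catches verbatim reuse; partially duplicated experiments would need a comparison at the level of individual chips within each experiment.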
Abstract:
High-throughput technologies are now used to generate more than one type of data from the same biological samples. To properly integrate such data, we propose using co-modules, which describe coherent patterns across paired data sets, and devise several modular methods for their identification. We first test these methods using in silico data, demonstrating that the integrative scheme of our Ping-Pong Algorithm uncovers drug-gene associations more accurately when considering noisy or complex data. Second, we provide an extensive comparative study using the gene-expression and drug-response data from the NCI-60 cell lines. Using information from the DrugBank and the Connectivity Map databases, we show that the Ping-Pong Algorithm predicts drug-gene associations significantly better than other methods. Co-modules provide insights into possible mechanisms of action for a wide range of drugs and suggest new targets for therapy.
Abstract:
ICEclc is a mobile genetic element found in two copies on the chromosome of the bacterium Pseudomonas knackmussii B13. ICEclc harbors genes encoding metabolic pathways for the degradation of chlorocatechols (CLC) and 2-aminophenol (2AP). At low frequencies, ICEclc excises from the chromosome and closes into a circular DNA molecule, which can transfer to another bacterium via conjugation. Once in the recipient cell, ICEclc can reintegrate into the chromosome by site-specific recombination. This thesis aimed to identify the regulatory network underlying the decision for horizontal gene transfer (HGT) of ICEclc. The first chapter is a general introduction to integrative and conjugative elements (ICEs), of which ICEclc is one example. In particular, I emphasize the current knowledge of the regulation and conjugation machineries of the different classes of ICEs. In the second chapter, I describe a transcriptional analysis using microarrays and other experiments to understand expression of ICEclc in exponential and stationary phase. By overlaying transcriptomic profiles with Northern hybridizations and RT-PCR data, we established a transcription map for the entire core region of ICEclc, a region assumed to encode the ICE conjugation process. We also demonstrated that transcription of the ICEclc core is maximal in stationary phase, which correlates with expression of reporter genes fused to key ICEclc promoters. In the third chapter, I present a transcriptome analysis of ICEclc in a variety of different host species, in order to explore whether there are species-specific differences. In the fourth chapter, I focus on the role of a curious ICEclc-encoded TetR-type transcriptional repressor. We find that this gene, which we name mfsR, controls not only its own expression but also that of a set of genes for a putative multi-drug efflux pump (mfsABC).
By using a combination of biochemical and molecular biology techniques, I could show that MfsR specifically binds to operator boxes in two ICEclc promoters (PmfsR and PmfsA), inhibiting the transcription of both the mfsR and mfsABC-orf38184 operons. Although we could not detect a clear phenotype of an mfsABC deletion, we discuss the implications of pump gene reorganizations in ICEclc and close relatives. In the fifth chapter, we find that mfsR not only controls its own expression and that of the mfsABC operon, but also indirectly controls ICEclc transfer. Using gene deletions, microarrays, transfer assays and microscopy-based reporter fusions, we demonstrate that mfsR actually controls a small operon of three regulatory genes. The last gene of this mfsR operon, orf17162, encodes a LysR-type activator whose deletion strongly impairs ICEclc transfer. Interestingly, deletion of mfsR leads to transfer competence in almost all cells, thereby overruling the bistability process in the wild-type. In the sixth and final chapter, I discuss the relevance of the present thesis and the resulting perspectives for future studies.
Abstract:
Sequence homologies suggest that the Bacillus subtilis 168 tagO gene encodes UDP-N-acetylglucosamine:undecaprenyl-P N-acetylglucosaminyl 1-P transferase, the enzyme responsible for catalysing the first step in the synthesis of the teichoic acid linkage unit, i.e. the formation of undecaprenyl-PP-N-acetylglucosamine. Inhibition of tagO expression mediated by an IPTG-inducible P(spac) promoter led to the development of a coccoid cell morphology, a feature characteristic of mutants blocked in teichoic acid synthesis. Indeed, analyses of the cell-wall phosphate content, as well as the incorporation of radioactively labelled precursors, revealed that the synthesis of poly(glycerol phosphate) and poly(glucosyl N-acetylgalactosamine 1-phosphate), the two strain 168 teichoic acids known to share the same linkage unit, was affected. Surprisingly, under phosphate limitation, deficiency of TagO precludes the synthesis of teichuronic acid, which is normally induced under these conditions. The regulatory region of tagO, containing two partly overlapping sigma(A)-controlled promoters, is similar to that of sigA, the gene encoding the major sigma factor responsible for growth. Here, the authors discuss the possibility that TagO may represent a pivotal element in the multi-enzyme complexes responsible for the synthesis of anionic cell-wall polymers, and that it may play one of the key roles in balanced cell growth.
Abstract:
BACKGROUND: Greater tobacco smoking and alcohol consumption and lower body mass index (BMI) increase odds ratios (OR) for oral cavity, oropharyngeal, hypopharyngeal, and laryngeal cancers; however, there are no comprehensive sex-specific comparisons of ORs for these factors. METHODS: We analyzed 2,441 oral cavity (925 women and 1,516 men), 2,297 oropharynx (564 women and 1,733 men), 508 hypopharynx (96 women and 412 men), and 1,740 larynx (237 women and 1,503 men) cases from the INHANCE consortium of 15 head and neck cancer case-control studies. Controls numbered from 7,604 to 13,829 subjects, depending on analysis. Analyses fitted linear-exponential excess OR models. RESULTS: ORs were increased in underweight (<18.5 BMI) relative to normal weight (18.5-24.9) and reduced in overweight and obese categories (≥25 BMI) for all sites and were homogeneous by sex. ORs by smoking and drinking in women compared with men were significantly greater for oropharyngeal cancer (p < 0.01 for both factors), suggestive for hypopharyngeal cancer (p = 0.05 and p = 0.06, respectively), but homogeneous for oral cavity (p = 0.56 and p = 0.64) and laryngeal (p = 0.18 and p = 0.72) cancers. CONCLUSIONS: The extent to which OR modifications of smoking and drinking by sex for oropharyngeal and, possibly, hypopharyngeal cancers represent true associations, or derive from unmeasured confounders or unobserved sex-related disease subtypes (e.g., human papillomavirus-positive oropharyngeal cancer), remains to be clarified.
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields.
The corresponding results indicate that the proposed data integration procedure does indeed allow for obtaining faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high-resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity and the larger-scale trend of the prevailing hydraulic conductivity field. The results also indicate that this novel data integration approach is remarkably flexible and robust and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges. In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov Chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems in order to allow for a more accurate and realistic quantification of the uncertainty associated with the thus inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigate two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution.
The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach was proven to be highly effective in decreasing the number of iterations required for drawing independent samples from the Bayesian posterior distribution.
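The gradual-deformation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the current model is combined with an independent realization using cosine and sine weights; the thesis's actual implementation, prior model, and parameter names are not given here, so `gradual_deformation_proposal` and `theta` are hypothetical. For simplicity the parameters are treated as uncorrelated standard-normal values, whereas a real application would draw the fresh realization from the spatially correlated geostatistical prior.

```python
import math
import random


def gradual_deformation_proposal(current, theta, rng):
    """Propose a new model by mixing the current one with a fresh,
    independent standard-normal realization.

    Because cos(theta)**2 + sin(theta)**2 == 1, the proposal keeps zero
    mean and unit variance if `current` has them. The angle `theta`
    sets the perturbation strength: 0 leaves the model unchanged and
    pi/2 replaces it entirely.
    """
    fresh = [rng.gauss(0.0, 1.0) for _ in current]
    c, s = math.cos(theta), math.sin(theta)
    return [c * m + s * z for m, z in zip(current, fresh)]


# Toy usage with an assumed three-parameter model and a seeded generator.
rng = random.Random(0)
model = [0.3, -1.2, 0.8]
small_step = gradual_deformation_proposal(model, 0.1, rng)
print(small_step)
```

In an MCMC setting, a proposal of this kind would then be accepted or rejected according to the likelihood ratio, with `theta` tuned to balance acceptance rate against step size.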
Abstract:
This paper describes the development of an analytical technique for arsenic analyses that is based on genetically-modified bioreporter bacteria bearing a gene encoding for the production of a green fluorescent protein (gfp). Upon exposure to arsenic (in the aqueous form of arsenite), the bioreporter production of the fluorescent reporter molecule is monitored spectroscopically. We compared the response measured as a function of time and concentration by steady-state fluorimetry (SSF) to that measured by epi-fluorescent microscopy (EFM). SSF is a bulk technique; as such it inherently yields less information, whereas EFM monitors the response of many individual cells simultaneously and data can be processed in terms of population averages or subpopulations. For the bioreporter strain used here, as well as for the literature we cite, the two techniques exhibit similar performance characteristics. The results presented here show that the EFM technique can compete with SSF and shows substantially more promise for future improvement; it is a matter of research interest to develop optimized methods of EFM image analysis and statistical data treatment. EFM is a conduit for understanding the dynamics of individual cell response vs. population response, which is not only a matter of research interest, but is also promising in the practical terms of developing micro-scale analysis.
Abstract:
Sex allocation data in eusocial Hymenoptera (ants, bees and wasps) provide an excellent opportunity to assess the effectiveness of kin selection, because queens and workers differ in their relatedness to females and males. The first studies on sex allocation in eusocial Hymenoptera compared population sex investment ratios across species. Female-biased investment in monogyne (= with single-queen colonies) populations of ants suggested that workers manipulate sex allocation according to their higher relatedness to females than males (relatedness asymmetry). However, several factors may confound these comparisons across species. First, variation in relatedness asymmetry is typically associated with major changes in breeding system and life history that may also affect sex allocation. Secondly, the relative cost of females and males is difficult to estimate across sexually dimorphic taxa, such as ants. Thirdly, each species in the comparison may not represent an independent data point, because of phylogenetic relationships among species. Recently, stronger evidence that workers control sex allocation has been provided by intraspecific studies of sex ratio variation across colonies. In several species of eusocial Hymenoptera, colonies with high relatedness asymmetry produced mostly females, in contrast to colonies with low relatedness asymmetry which produced mostly males. Additional signs of worker control were found by investigating proximate mechanisms of sex ratio manipulation in ants and wasps. However, worker control is not always effective, and further manipulative experiments will be needed to disentangle the multiple evolutionary factors and processes affecting sex allocation in eusocial Hymenoptera.
Abstract:
Ductal carcinoma in situ (DCIS), accounting for 15-25% of all breast cancers, is frequently diagnosed by mammographic examination. This heterogeneous disease requires a rigorous local treatment based, in about two-thirds of cases, on conservative surgery and radiotherapy. DCIS are currently classified on the basis of nuclear grade. Most lesions, and especially high nuclear grade DCIS, are limited to one quadrant. Micropapillary DCIS are likely to be of larger size/extent and thus a conservative approach is often difficult. A careful pathological examination of an oriented excisional biopsy is a pre-requisite for optimal therapy.
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the scale of a field site represents a major, and as yet largely unresolved, challenge. To address this problem, we have developed a downscaling procedure based on a non-linear Bayesian sequential simulation approach. The main objective of this algorithm is to estimate the value of the sparsely sampled hydraulic conductivity at non-sampled locations based on its relation to the electrical conductivity logged at collocated wells and surface resistivity measurements, which are available throughout the studied site. The in situ relationship between the hydraulic and electrical conductivities is described through a non-parametric multivariate kernel density function. Then a stochastic integration of low-resolution, large-scale electrical resistivity tomography (ERT) data in combination with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities is applied. The overall viability of this downscaling approach is tested and validated by comparing flow and transport simulations through the original and the upscaled hydraulic conductivity fields. Our results indicate that the proposed procedure yields remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.
Abstract:
The permeability-glycoprotein efflux-transporter encoded by the multidrug resistance 1 (ABCB1) gene and the cytochromes P450 3A4/5 encoded by the CYP3A4/5 genes are known to interact in the transport and metabolism of many drugs. Recent data have shown that the CYP3A5 genotypes influence blood pressure and that permeability-glycoprotein activity might influence the activity of the renin-angiotensin system. Hence, these 2 genes may contribute to blood pressure regulation in humans. We analyzed the association of variants of the ABCB1 and CYP3A5 genes with ambulatory blood pressure, plasma renin activity, plasma aldosterone, endogenous lithium clearance, and blood pressure response to treatment in 72 families (373 individuals; 55% women; mean age: 46 years) of East African descent. The ABCB1 and CYP3A5 genes interact with urinary sodium excretion in their effect on ambulatory blood pressure (daytime systolic: P=0.05; nighttime systolic and diastolic: P<0.01), suggesting a gene-gene-environment interaction. The combined action of these genes is also associated with postproximal tubular sodium reabsorption, plasma renin activity, plasma aldosterone, and an altered blood pressure response to the angiotensin-converting enzyme inhibitor lisinopril (P<0.05). This is the first reported association of the ABCB1 gene with blood pressure in humans and a demonstration that genes encoding proteins that metabolize and transport drugs and endogenous substrates contribute to blood pressure regulation.