954 results for Genomic data
Abstract:
Mutations within BRCA1 predispose carriers to a high risk of breast and ovarian cancers. BRCA1 functions to maintain genomic stability through the assembly of multiple protein complexes involved in DNA repair, cell-cycle arrest, and transcriptional regulation. Here, we report the identification of a DNA damage-induced BRCA1 protein complex containing BCLAF1 and other key components of the mRNA-splicing machinery. In response to DNA damage, this complex regulates pre-mRNA splicing of a number of genes involved in DNA damage signaling and repair, thereby promoting the stability of these transcripts/proteins. Further, we show that abrogation of this complex results in sensitivity to DNA damage, defective DNA repair, and genomic instability. Interestingly, mutations in a number of proteins found within this complex have been identified in numerous cancer types. These data suggest that regulation of splicing by the BRCA1-mRNA splicing complex plays an important role in the cellular response to DNA damage.
Abstract:
Background: The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates. Methodology/Principal Findings: We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross-validation procedure and receiver operating characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area-under-the-ROC-curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale) and five-fold cross-validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data. Conclusions/Significance: These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking bTB phenotypes. 
However, a larger training population will be required to improve prediction accuracies. © 2014 Tsairidou et al.
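The AUC reported above can be read as the probability that a randomly chosen case receives a higher predicted value than a randomly chosen control. The sketch below illustrates that interpretation only; it is not the authors' pipeline, and the function name `auc_from_scores` and the toy scores are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: fraction of (case, control) pairs in which the case
    has the higher score; ties count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up predicted values (e.g. genomic predictions) and disease status.
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.58 in context.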
Abstract:
Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ~2,000, ~3,700 and ~9,500 SNPs explained ~21%, ~24% and ~29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.
Abstract:
In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task that can, for example, improve treatment decisions. One type of genetic lesion frequently investigated in such disorders is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
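To make the clustering idea concrete, here is a minimal sketch of standard NMF with multiplicative updates for the squared-error objective, followed by assigning each sample to its dominant component. It shows plain NMF only, not the compaction procedure or the quality measures the abstract proposes; the toy matrix and all function names are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (features x samples) as W (features x rank)
    times H (rank x samples) using multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


def assign_clusters(h):
    """Assign each sample (column of H) to the component with the largest weight."""
    rank, n_cols = len(h), len(h[0])
    return [max(range(rank), key=lambda k: h[k][j]) for j in range(n_cols)]


# Made-up copy-number-like matrix: 4 probes (rows) x 4 samples (columns).
toy_v = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
toy_w, toy_h, toy_errors = nmf(toy_v, rank=2)
toy_clusters = assign_clusters(toy_h)
```

Each iteration multiplies matrices whose size grows with the number of probes, which is the scaling problem on high-resolution arrays that motivates compacting the data first.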
Abstract:
BACKGROUND: Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our current limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated with urothelial carcinogenesis.
METHODS: In our study we inferred and compared three GRNs by applying the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of the GRNs to identify molecular targets associated with urothelial cancer.
RESULTS: We found that the urothelial cancer (UC) GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), which all represent known genomic regions frequently deregulated or aberrant in urothelial cancer and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/metastasis), are known cancer-associated markers. The GRNs were highly dataset-specific on the interaction level between individual genes, but showed large similarities on the biological function level represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential and experimental networks, the Bead UC GRN showed the lowest performance compared to the RNAseq and Oligo UC GRNs.
CONCLUSION: To our knowledge, this is the first study investigating genome-scale UC GRNs. RNAseq-based gene expression data is the data platform of choice for GRN inference. Our study offers new avenues for the identification of novel putative diagnostic targets for subsequent studies in bladder tumors.
Abstract:
The androgen receptor (AR) initiates important developmental and oncogenic transcriptional pathways. The AR is known to bind as a homodimer to 15-base pair bipartite palindromic androgen-response elements; however, few direct AR gene targets are known. To identify AR promoter targets, we used chromatin immunoprecipitation with on-chip detection of genomic fragments. We identified 1,532 potential AR-binding sites, including previously known AR gene targets. Many of the new AR target genes show altered expression in prostate cancer. Analysis of sequences underlying AR-binding sites showed that more than 50% of AR-binding sites did not contain the established 15 bp AR-binding element. Unbiased sequence analysis showed 6-bp motifs, which were significantly enriched and were bound directly by the AR in vitro. Binding sequences for the avian erythroblastosis virus E26 homologue (ETS) transcription factor family were also highly enriched, and we uncovered an interaction between the AR and ETS1 at a subset of AR promoter targets.
Abstract:
The cork stopper manufacturing process includes an operation, known as stabilisation, during which humid cork slabs are extensively colonised by fungi. The effects of fungal growth on cork are yet to be completely understood and are considered to be involved in the so-called “cork taint” of bottled wine. It is essential to identify the environmental constraints that define the appearance of the colonising fungal species and to trace their origin to the forest and/or to residents of the manufacturing space. The present article correlates two sets of data, from consecutive years and the same season, of systematic biological sampling of two manufacturing units, located in the North and South of Portugal. Chrysonilia sitophila dominance was identified, followed by a high diversity of Penicillium species. Penicillium glabrum, found in all samples, was the most frequently isolated species. P. glabrum intra-species variability was investigated using DNA fingerprinting techniques, revealing highly discriminative polymorphic markers in the genome. Cluster analysis of P. glabrum data was discussed in relation to the geographical location of strains, and results suggest that P. glabrum arises predominantly from the manufacturing space, although cork resident fungi can also contrib
Abstract:
Background: Ankylosing spondylitis (AS) is a chronic inflammatory disorder characterized by inflammation in the spine and sacroiliac joints, leading to progressive joint ankylosis and progressive deterioration of physical function and quality of life. An early diagnosis and early therapy may contribute to a better prognosis. The identification of biomarkers would be helpful and represents a great challenge for the scientific community. 
Objectives: The present study had the following aims: 1- to characterize the pattern of AS in Portuguese patients; 2- to investigate MHC and non-MHC gene associations with susceptibility and phenotypic features of AS; and 3- to identify candidate genes associated with AS by means of whole-genome microarray. Material and Methods: AS was defined in accordance with the modified New York criteria and AS cases were recruited from hospital outpatient clinics. Demographic and clinical data were recorded and blood samples collected. A random group of HLA-B27 positive patients and controls were selected and typed for HLA class I and II by PCR-rSSOP. The extended HLA haplotypes were estimated by the Expectation Maximization algorithm using Arlequin v3.11 software. Genotyping of IL23R, ERAP1 and ANKH allelic variants was carried out with TaqMan allelic discrimination assays. Association analysis was performed using the Cochran-Armitage and linear regression tests as implemented in PLINK, for dichotomous and quantitative variables, respectively. Gene expression profiling was carried out using Illumina HT-12 Whole-Genome Expression BeadChips and candidate genes were validated using qPCR-based TaqMan Low Density Arrays (TLDAs). Results: A total of 369 patients (62.3% male; mean age 45.4±13.2 years; mean disease duration 11.4±10.5 years) were included. Regarding clinical disease pattern, at the time of assessment, 49.9% had axial disease, 2.4% peripheral disease, 40.9% mixed disease and 7.1% isolated enthesopathic disease. Acute anterior uveitis (33.6%) was the most common extra-articular manifestation. 80.3% of AS patients were HLA-B27 positive. The haplotype A*02/B*27/Cw*02/DRB1*01/DQB1*05 seems to confer susceptibility to AS, whereas A*02/B*27/Cw*01/DRB1*08/DQB1*04 seems to provide protection in terms of disease activity, functional and radiological repercussion. Three markers (two for IL23R and one for ERAP1) showed significant single-locus disease associations. 
Association of these genes with AS in the Portuguese population was confirmed, whereas the ANKH markers studied did not show an association with AS. No association was seen between non-MHC genes and clinical manifestations of AS. A gene expression signature for AS was established; among the fourteen validated genes, a number have a well-documented role in inflammation or in the modulation of cartilage and bone metabolism. Conclusions: A demographic and clinical profile of patients with AS in Portugal was established. Identification of genetic variants of target genes as well as gene expression signatures could provide a better understanding of AS pathophysiology and could be useful to establish models with relevance in terms of susceptibility, prognosis, and potential therapeutic guidance.
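For readers unfamiliar with the Cochran-Armitage trend test named in the methods, the sketch below computes its textbook form for case and control genotype counts with additive weights (0, 1, 2). It is an illustration under those assumptions, not a reproduction of PLINK's implementation or of the study's data; the function name and the toy counts are hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Textbook Cochran-Armitage trend test.

    case_counts / control_counts: genotype counts per category (e.g. AA, Aa, aa).
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    if not (len(case_counts) == len(control_counts) == len(weights)):
        raise ValueError("counts and weights must have the same length")
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    # T = N * sum(t_i * r_i) - R * sum(t_i * n_i)
    t_stat = (n * sum(t * r for t, r in zip(weights, case_counts))
              - n_cases * sum(t * c for t, c in zip(weights, totals)))
    # Var(T) = (R * S / N) * (N * sum(t_i^2 * n_i) - (sum(t_i * n_i))^2)
    spread = (n * sum(t * t * c for t, c in zip(weights, totals))
              - sum(t * c for t, c in zip(weights, totals)) ** 2)
    denom = n_cases * n_controls * spread
    if denom == 0:
        raise ValueError("trend test undefined for these counts")
    chi2 = n * t_stat ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up genotype counts, for illustration only.
toy_chi2, toy_p = cochran_armitage_trend([30, 50, 20], [45, 45, 10])
print(round(toy_chi2, 3), toy_p)
```

With only two genotype categories and weights (0, 1), the statistic reduces to the ordinary Pearson chi-square for a 2×2 table, which is a convenient sanity check.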
Abstract:
The limited ability of common variants to account for the genetic contribution to complex disease has prompted searches for rare variants of large effect, to partly explain the 'missing heritability'. Analyses of genome-wide genotyping data have identified genomic structural variants (GSVs) as a source of such rare causal variants. Recent studies have reported multiple GSV loci associated with risk of obesity. We attempted to replicate these associations by similar analysis of two familial-obesity case-control cohorts and a population cohort, and detected GSVs at 11 out of 18 loci, at frequencies similar to those previously reported. Based on their reported frequencies and effect sizes (OR≥25), we had sufficient statistical power to detect the large majority (80%) of genuine associations at these loci. However, only one obesity association was replicated. Deletion of a 220 kb region on chromosome 16p11.2 has a carrier population frequency of 2×10⁻⁴ (95% confidence interval [9.6×10⁻⁵ to 3.1×10⁻⁴]); accounts overall for 0.5% [0.19%-0.82%] of severe childhood obesity cases (P = 3.8×10⁻¹⁰; odds ratio = 25.0 [9.9-60.6]); and results in a mean body mass index (BMI) increase of 5.8 kg/m² [1.8-10.3] in adults from the general population. We also attempted replication using BMI as a quantitative trait in our population cohort; associations with BMI at or near nominal significance were detected at two further loci near KIF2B and within FOXP2, but these did not survive correction for multiple testing. These findings emphasise several issues of importance when conducting rare GSV association studies, including the need for careful cohort selection and replication strategy, accurate GSV identification, and appropriate correction for multiple testing and/or control of false discovery rate. Moreover, they highlight the potential difficulty in replicating rare CNV associations across different populations. 
Nevertheless, we show that such studies are potentially valuable for the identification of variants making an appreciable contribution to complex disease.
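The closing point about multiple testing and false discovery rate can be illustrated with two common corrections: a Bonferroni threshold and Benjamini-Hochberg adjusted p-values. This is a generic sketch; the abstract does not state which correction the authors applied, and the p-values below are made up.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for a handful of loci (not taken from the study).
toy_p_values = [0.01, 0.04, 0.03, 0.2]
toy_adjusted = benjamini_hochberg(toy_p_values)
toy_threshold = bonferroni_threshold(0.05, len(toy_p_values))
print(toy_adjusted, toy_threshold)
```

Under either correction, a p-value that only just passes the nominal 0.05 level usually stops being significant once several loci are tested, which is the situation described for the two BMI loci.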
Abstract:
Splenic marginal zone lymphoma (SMZL) is a low grade B-cell non-Hodgkin's lymphoma. The molecular pathology of this entity remains poorly understood. To characterise this lymphoma at the molecular level, we performed an integrated analysis of 1) genome wide genetic copy number alterations, 2) gene expression profiles and 3) epigenetic DNA methylation profiles. We have previously shown that SMZL is characterised by recurrent alterations of chromosomes 7q, 6q, 3q, 9q and 18; however, gene resolution oligonucleotide array comparative genomic hybridisation did not reveal evidence of cryptic amplification or deletion in these regions. The most frequently lost 7q32 region contains a cluster of miRNAs. qRT-PCR revealed that three of these (miR-182/96/183) show underexpression in SMZL, and miR-182 is somatically mutated in >20% of cases of SMZL, as well as in >20% of cases of follicular lymphoma, and between 5-15% of cases of chronic lymphocytic leukaemia, MALT-lymphoma and hairy cell leukaemia. We conclude that miR-182 is a strong candidate novel tumour suppressor miRNA in lymphoma. The overall gene expression signature of SMZL was found to be strongly distinct from those of other lymphomas. Functional analysis of gene expression data revealed SMZL to be characterised by abnormalities in B-cell receptor signalling (especially through the CD19/21-PI3K/AKT pathway) and apoptotic pathways. In addition, genes involved in the response to viral infection appeared upregulated. SMZL shows a unique epigenetic profile, but analysis of differentially methylated genes showed few with methylation related transcriptional deregulation, suggesting that DNA methylation abnormalities are not a critical component of the SMZL malignant phenotype.
Abstract:
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
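The benchmarking described above comes down to two per-site quantities: the share of genotype calls that disagree with the gold standard, and the difference in estimated allele frequency. The sketch below assumes genotypes coded as alternate-allele counts (0, 1, 2) and borrows the ±0.1 error threshold from the abstract; the data layout, function names and toy genotypes are hypothetical.

```python
def site_metrics(ngs_calls, gold_calls):
    """Return (discordance rate, NGS minus gold alternate-allele frequency) for one SNP.

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a missing call.
    Only samples called in both data sets are compared.
    """
    pairs = [(n, g) for n, g in zip(ngs_calls, gold_calls)
             if n is not None and g is not None]
    if not pairs:
        raise ValueError("no samples with calls in both data sets")
    discordant = sum(1 for n, g in pairs if n != g)
    n_alleles = 2 * len(pairs)
    ngs_freq = sum(n for n, _ in pairs) / n_alleles
    gold_freq = sum(g for _, g in pairs) / n_alleles
    return discordant / len(pairs), ngs_freq - gold_freq


def flag_unreliable_sites(sites, max_abs_error=0.1):
    """Names of sites whose allele-frequency error exceeds the threshold."""
    flagged = []
    for name, (ngs_calls, gold_calls) in sites.items():
        _, freq_diff = site_metrics(ngs_calls, gold_calls)
        if abs(freq_diff) > max_abs_error:
            flagged.append(name)
    return flagged


# Made-up genotypes for five samples at two SNPs: (NGS calls, gold-standard calls).
toy_sites = {
    "snp_a": ([0, 0, 1, 2, 0], [0, 1, 1, 2, 1]),
    "snp_b": ([1, 1, 2, 0, 0], [1, 1, 2, 0, 0]),
}
toy_rate, toy_diff = site_metrics(*toy_sites["snp_a"])
toy_flagged = flag_unreliable_sites(toy_sites)
print(toy_rate, toy_diff, toy_flagged)
```

A negative frequency difference means the alternate allele is underestimated, that is, the reference allele is overestimated, which is the direction of bias the abstract attributes to mapping.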
Abstract:
Background: Sickle cell anemia, or sickle cell disease, is a major health problem, particularly for patients of African origin. The phenotypic variation of sickle cell anemia complicates patient monitoring and treatment. The genomic architecture responsible for this variability is poorly understood. Rationale: A better grasp of the genetic contribution to the clinical variation of this disease will make it easier to identify patients at risk of developing severe phenotypes and to tailor their care. Objectives: The general objective of this thesis is to fill gaps in knowledge of the genomic epidemiology of sickle cell anemia using a cohort from Benin. The specific objectives are: 1) to characterize the genomic expression profiles associated with the severity of sickle cell anemia; 2) to identify biomarkers of the severity of sickle cell anemia; 3) to identify the genetic regulation of transcriptional variation; 4) to identify statistical interactions between genotype and the level of severity associated with expression; 5) to identify drug targets to improve the condition of patients with sickle cell anemia. Methods: A case-control study of 250 patients and 61 unaffected siblings was conducted at the Centre de Prise en charge Médical Intégré du Nourrisson et de la Femme Enceinte atteints de Drépanocytose, in Benin, between February and December 2010. Results: Our analysis showed that expression profiles are associated with the severity of sickle cell anemia. These profiles are enriched for genes in biological pathways that contribute to disease progression: platelet activation, B lymphocytes, stress, inflammation and cell proliferation. Transcriptional biomarkers distinguished patients with different levels of clinical severity. 
The genetic regulation of variation in gene expression was demonstrated and interactions were identified. Based on these genetic results, drug targets are proposed. Conclusion: This thesis work provides a better understanding of the impact of genomics on the severity of sickle cell anemia and opens prospects for developing targeted treatments to improve the care offered to patients.
Abstract:
The Escherichia coli O26 serogroup includes important food-borne pathogens associated with human and animal diarrheal disease. Current typing methods have revealed great genetic heterogeneity within the O26 group; the data are often inconsistent and focus only on verotoxin (VT)-positive O26 isolates. To improve current understanding of diversity within this serogroup, the genomic relatedness of VT-positive and -negative O26 strains was assessed by comparative genomic indexing. Our results clearly demonstrate that irrespective of virulence characteristics and pathotype designation, the O26 strains show greater genomic similarity to each other than to any other strain included in this study. Our data suggest that enteropathogenic and VT-expressing E. coli O26 strains represent the same clonal lineage and that VT-expressing E. coli O26 strains have gained additional virulence characteristics. Using this approach, we established the core genes which are central to the E. coli species and identified regions of variation from the E. coli K-12 chromosomal backbone.
Abstract:
With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.
Abstract:
The past years have shown an enormous advancement in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and different data scope make it challenging to jointly visualize and integrate diverse data types. We present AmalgamScope, a new interactive software tool that assists scientists with the annotation of the human genome and, in particular, with the integration of annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to integration of the data based on the annotational information and visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the tool's options for exchanging files with a distant server.