997 results for SNP identification
Abstract:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Abstract:
Background: This paper describes SeqDoC, a simple, web-based tool to carry out direct comparison of ABI sequence chromatograms. This allows the rapid identification of single nucleotide polymorphisms (SNPs) and point mutations without the need to install or learn more complicated analysis software. Results: SeqDoC produces a subtracted trace showing differences between a reference and test chromatogram, and is optimised to emphasise those characteristic of single base changes. It automatically aligns sequences, and produces straightforward graphical output. The use of direct comparison of the sequence chromatograms means that artefacts introduced by automatic base-calling software are avoided. Homozygous and heterozygous substitutions and insertion/deletion events are all readily identified. SeqDoC successfully highlights nucleotide changes missed by the Staden package 'tracediff' program. Conclusion: SeqDoC is ideal for small-scale SNP identification, for identification of changes in random mutagenesis screens, and for verification of PCR amplification fidelity. Differences are highlighted, not interpreted, allowing the investigator to make the ultimate decision on the nature of the change.
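The subtraction idea described in this abstract can be illustrated with a minimal Python sketch. This is not SeqDoC's actual algorithm: it assumes the two chromatograms are already aligned and normalised, and the function names, the dictionary layout (base letter mapped to a list of intensities) and the threshold are all hypothetical.

```python
BASES = "ACGT"


def difference_trace(reference, test):
    """Subtract the test trace from the reference trace, channel by channel.

    Both arguments are dicts mapping each base in "ACGT" to a list of
    intensities. The traces are assumed to be aligned already, so every
    channel must have the same length in both inputs.
    """
    diff = {}
    for base in BASES:
        ref_channel, test_channel = reference[base], test[base]
        if len(ref_channel) != len(test_channel):
            raise ValueError("traces must be aligned to the same length")
        diff[base] = [r - t for r, t in zip(ref_channel, test_channel)]
    return diff


def flag_positions(diff, threshold):
    """Return positions where one channel rises while another falls.

    A single base change lowers one channel and raises another, so the
    subtracted trace shows a positive and a negative peak at the same
    position. The threshold is an arbitrary, hypothetical cut-off.
    """
    length = len(diff[BASES[0]])
    flagged = []
    for i in range(length):
        values = [diff[base][i] for base in BASES]
        if max(values) > threshold and min(values) < -threshold:
            flagged.append(i)
    return flagged
```

The sketch only highlights candidate positions; as the abstract notes for SeqDoC itself, interpreting the change is left to the investigator.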
Abstract:
Genetic research of complex diseases is a challenging but exciting area of research. Early progress was limited until the completion of the Human Genome and HapMap projects which, along with the reduction in the cost of genotyping, paved the way for understanding the genetic composition of complex diseases. In this thesis, we focus on statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology, and methods for identifying potentially associated single nucleotide polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we first investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models, using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus turns to methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extend the model to detect interaction effects in population-based case-control studies. In this part of the study, we also develop a machine learning algorithm to cope with large-scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Linkage disequilibrium (LD) is defined as the nonrandom association of alleles at two or more loci in a population and may be a useful tool in a diverse array of applications including disease gene mapping, elucidating the demographic history of populations, and testing hypotheses of human evolution. However, the successful application of LD-based approaches to pertinent genetic questions is hampered by a lack of understanding about the forces that mediate the genome-wide distribution of LD within and between human populations. Delineating the genomic patterns of LD is a complex task that will require interdisciplinary research that transcends traditional scientific boundaries. The research presented in this dissertation is predicated upon the need for interdisciplinary studies, and both theoretical and experimental projects were pursued. In the theoretical studies, I have investigated the effect of genotyping errors and SNP identification strategies on estimates of LD. The primary importance of these two chapters is that they provide insights and guidance for the design of future empirical LD studies. Furthermore, I analyzed the allele frequency distribution of 26,530 single nucleotide polymorphisms (SNPs) in three populations and generated the first-generation natural selection map of the human genome, which will be an important resource for explaining and understanding genomic patterns of LD. Finally, in the experimental study, I describe a novel, simple, low-cost, high-throughput SNP genotyping method. The theoretical analyses and experimental tools developed in this dissertation will facilitate a more complete understanding of patterns of LD in human populations.
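The definition of LD as a nonrandom association of alleles can be made concrete with the standard pairwise measures D, |D'| and r² for two biallelic loci. The Python sketch below is only an illustration of those textbook formulas and is not taken from the dissertation; the function name and the input frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the frequencies of
    those two alleles.
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

With allele frequencies of 0.5 at both loci, a haplotype frequency of 0.25 corresponds to no association (D = 0), whereas a haplotype frequency of 0.5 corresponds to complete association (|D'| = r² = 1).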
Abstract:
This thesis presents a highly sensitive genome-wide search method for recessive mutations. The method is suitable for distantly related samples that are divided into phenotype positives and negatives. High-throughput genotype arrays are used to identify and compare homozygous regions between the cohorts. The method is demonstrated by comparing colorectal cancer patients against unaffected references. The objective is to find homozygous regions and alleles that are more common in cancer patients. We have designed and implemented software tools to automate the data analysis from genotypes to lists of candidate genes and their properties. The programs have been designed around a pipeline architecture that allows their integration with other programs such as biological databases and copy number analysis tools. The integration of the tools is crucial, as the genome-wide analysis of the cohort differences produces many candidate regions not related to the studied phenotype. CohortComparator is a genotype comparison tool that detects homozygous regions and compares their loci and allele constitutions between two sets of samples. The data is visualised in chromosome-specific graphs illustrating the homozygous regions and alleles of each sample. The genomic regions that may harbour recessive mutations are emphasised with different colours, and a scoring scheme is given for these regions. The detection of homozygous regions, cohort comparisons and result annotations are all subject to assumptions, many of which have been parameterized in our programs. The effect of these parameters and the suitable scope of the methods have been evaluated. Samples with different resolutions can be balanced with the genotype estimates of their haplotypes, and they can be used within the same study.
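The detection of homozygous regions can be sketched as a scan for runs of consecutive homozygous calls. The Python example below is not CohortComparator's implementation: the genotype encoding ("AA", "AB", "BB"), the function name and the min_length parameter are assumptions made for illustration, and the input is assumed to contain no missing calls.

```python
def homozygous_runs(genotypes, min_length):
    """Find runs of consecutive homozygous genotype calls.

    genotypes is a list of two-letter calls such as "AA", "AB" or "BB",
    ordered by position along one chromosome. Returns a list of
    (start, end) index pairs, inclusive, for every run that spans at
    least min_length markers.
    """
    runs = []
    start = None  # index where the current run began, if any
    for i, call in enumerate(genotypes):
        is_homozygous = len(call) == 2 and call[0] == call[1]
        if is_homozygous:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run.
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs
```

Comparing the runs found in affected and unaffected samples would then be a separate step; the minimum run length is one example of the kind of assumption that the abstract says has been parameterized.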
Abstract:
The thesis identifies CNV structural variants as possible markers for genomic selection and identifies QTL regions for fatty acid content in the Italian Brown Swiss population. Additionally, it maps the QTL for mastitis resistance in Valdostana Red Pied cattle.
Abstract:
Staphylococcus aureus is a common pathogen that causes a variety of infections including soft tissue infections, impetigo, septicemia, toxic shock and scalded skin syndrome. Traditionally, Methicillin-Resistant Staphylococcus aureus (MRSA) was considered a Hospital-Acquired (HA) infection. It is now recognised that the frequency of infections with MRSA is increasing in the community, and that these infections are not originating from hospital environments. A 2007 report by the Centers for Disease Control and Prevention (CDC) stated that Staphylococcus aureus is the most important cause of serious and fatal infections in the USA. Community-Acquired MRSA (CA-MRSA) are genetically diverse and distinct, meaning they can be identified and tracked by way of genotyping. Genotyping of MRSA using single nucleotide polymorphisms (SNPs) is a rapid and robust method for monitoring MRSA dissemination in the community, specifically that of ST93 (Queensland Clone). It has been shown that a large proportion of CA-MRSA infections in Queensland and New South Wales are caused by ST93. The rationale for this project was that SNP analysis of MLST genes is a rapid and cost-effective method for genotyping and monitoring MRSA dissemination in the community. In this study, 16 different sequence types (ST) were identified, with 41% of isolates identified as ST93, making it the predominant clone. Males and females were infected equally, with an average patient age of 45 years. Phenotypically, all of the ST93 isolates had an identical antimicrobial resistance pattern. They were resistant to the β-lactams penicillin, flu(di)cloxacillin and cephalothin but sensitive to all other antibiotics tested. Virulence factors play an important role in allowing S. aureus to cause disease by way of colonisation, replication and damage to the host. One virulence factor of particular interest is the toxin Panton-Valentine leukocidin (PVL), which is composed of two separate proteins encoded by two adjacent genes.
PVL-positive CA-MRSA have been shown to cause recurrent, chronic or severe skin and soft tissue infections. As a result, it is important that PVL-positive CA-MRSA is genotyped and tracked, especially now that CA-MRSA infections are more prevalent than HA-MRSA infections and are deemed endemic in Australia. 98% of all isolates in this study tested positive for the PVL toxin gene. This study showed that PVL is present in many different community-based STs, not just in ST93 isolates, all of which were PVL positive. With this toxin becoming entrenched in CA-MRSA, genotyping would provide more accurate data and a way of tracking the dissemination. The PVL gene can be sub-typed using allele-specific real-time PCR (RT-PCR) followed by high-resolution melt analysis. This allows the identification of PVL subtypes within the CA-MRSA population and the tracking of these clones in the community.
Abstract:
Genome-wide association studies (GWAS) have identified more than 30 prostate cancer (PrCa) susceptibility loci. One of these (rs2735839) is located close to a plausible candidate susceptibility gene, KLK3, which encodes prostate-specific antigen (PSA). PSA is widely used as a biomarker for PrCa detection and disease monitoring. To refine the association between PrCa and variants in this region, we used genotyping data from a two-stage GWAS using samples from the UK and Australia, and the Cancer Genetic Markers of Susceptibility (CGEMS) study. Genotypes were imputed for 197 and 312 single nucleotide polymorphisms (SNPs) from HapMap2 and the 1000 Genomes Project, respectively. The most significant association with PrCa was with a previously unidentified SNP, rs17632542 (combined P = 3.9 × 10^−22). This association was confirmed by direct genotyping in three stages of the UK/Australian GWAS, involving 10,405 cases and 10,681 controls (combined P = 1.9 × 10^−34). rs17632542 is also shown to be associated with PSA levels and it is a non-synonymous coding SNP (Ile179Thr) in KLK3. Using molecular dynamics simulation, we showed evidence that this variant has the potential to introduce alterations in the protein or affect RNA splicing. We propose that rs17632542 may directly influence PrCa risk.
Abstract:
Metastatic melanoma, a cancer historically refractory to chemotherapeutic strategies, has a poor prognosis and accounts for the majority of skin cancer related mortality. Although the recent approval of two new drugs combating this disease, Ipilimumab and Vemurafenib (PLX4032), has demonstrated for the first time in decades an improvement in overall survival, the clinical efficacy of these drugs has been marred by severe adverse immune reactions and acquired drug resistance in patients, respectively. Thus, understanding the etiology of metastatic melanoma will contribute to the improvement of current therapeutic strategies while leading to the development of novel drug approaches. In order to identify recurrently mutated genes of therapeutic relevance in metastatic melanoma, a panel of stage III local lymph node melanomas was extensively characterised using high-throughput genomic technologies. This led to the identification of mutations in TFG in 5% of melanomas from a candidate gene sequencing approach using SNP array analysis, mutations in MAP3K5 or MAP3K9 in 24% of melanomas through unbiased whole-exome sequencing strategies, and inactivating mutations in NF1 in BRAF/NRAS wild-type tumours through pathway analysis. Lastly, this thesis describes the development of a melanoma-specific mutation panel that can rapidly identify clinically relevant mutation profiles that could guide effective treatment strategies through a personalised therapeutic approach. These findings are discussed with respect to a number of important issues raised by this study, including the current limitations of next-generation sequencing technology, the difficulty in identifying ‘driver’ mutations critical to the development of melanoma due to high carcinogenic exposure to UV radiation, and the ultimate application of mutation screening in a personalised therapeutic setting.
In summary, a number of novel genes involved in metastatic melanoma have been identified that may have relevance for current therapeutic strategies in treating this disease.
Abstract:
High density SNP arrays can be used to identify DNA copy number changes in tumors such as homozygous deletions of tumor suppressor genes and focal amplifications of oncogenes. Illumina Human CNV370 BeadChip arrays were used to assess the genome for unbalanced chromosomal events occurring in 39 cell lines derived from stage III metastatic melanomas. A number of genes previously recognized to have an important role in the development and progression of melanoma were identified including homozygous deletions of CDKN2A (13 of 39 samples), CDKN2B (10 of 39), PTEN (3 of 39), PTPRD (3 of 39), TP53 (1 of 39), and amplifications of CCND1 (2 of 39), MITF (2 of 39), MDM2 (1 of 39), and NRAS (1 of 39). In addition, a number of focal homozygous deletions potentially targeting novel melanoma tumor suppressor genes were identified. Because of their likely functional significance for melanoma progression, FAS, CH25H, BMPR1A, ACTA2, and TFG were investigated in a larger cohort of melanomas through sequencing. Nonsynonymous mutations were identified in BMPR1A (1 of 43), ACTA2 (3 of 43), and TFG (5 of 103). A number of potentially important mutation events occurred in TFG including the identification of a mini mutation "hotspot" at amino acid residue 380 (P380S and P380L) and the presence of multiple mutations in two melanomas. Mutations in TFG may have important clinical relevance for current therapeutic strategies to treat metastatic melanoma.
Abstract:
Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognize in the hybrid F2 progeny, or dependent on parental specific traits. Here, in a screen for leaf hyponasty mutants, we have performed a single backcross of an ethyl methanesulfonate (EMS) generated hyponastic mutant to its parent. Whole genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphism (SNP) residing in HASTY, a previously characterized gene involved in microRNA biogenesis. We have evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines: SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants has great promise for identifying mutations that may be difficult to map using conventional approaches.
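The reasoning behind sequencing a bulked pool of homozygous mutants can be sketched in a few lines of Python. In such a pool, a recessive causal mutation and the SNPs linked to it are expected to approach an alternate-allele fraction of 1.0, while unlinked EMS-induced SNPs segregate at roughly 0.5. The example below is not the NGM, SHOREmap, GATK or samtools pipeline; the function name, the tuple layout and both thresholds are hypothetical.

```python
def candidate_snps(snps, min_fraction=0.9, min_depth=10):
    """Keep SNPs whose alternate allele is nearly fixed in the bulked pool.

    snps is a list of (chromosome, position, ref_reads, alt_reads)
    tuples. Returns (chromosome, position, alt_fraction) for every SNP
    with enough read depth and an alternate-allele fraction at or above
    min_fraction. Both thresholds are arbitrary illustration values.
    """
    candidates = []
    for chrom, pos, ref_reads, alt_reads in snps:
        depth = ref_reads + alt_reads
        if depth < min_depth:
            # Too few reads to estimate the allele fraction reliably.
            continue
        alt_fraction = alt_reads / depth
        if alt_fraction >= min_fraction:
            candidates.append((chrom, pos, alt_fraction))
    return candidates
```

In practice the surviving candidates would still need to be filtered for typical EMS-type changes and for predicted effects on a gene before a causal mutation could be proposed.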
Abstract:
Breast cancer is the second most common cancer worldwide and the most common cancer reported in women. This malignant tumour is characterised by a number of specific features including uncontrolled cell proliferation. It ranks fifth in the world as a cause of cancer death in women. Early diagnosis increases 5-year survival rates up to 95%. Heparan sulfate proteoglycans (HSPGs) are complex proteins composed of a core protein to which a number of highly sulfated side chains are attached. These chains are synthesised by a highly co-ordinated process resulting in distinct sulfation patterns, which determine specific interactions with cell-signaling partners including growth factors, their receptors, ligands and morphogens. The enzymes responsible for chain initiation, elongation and sulfation are critical for creating HS chain variability conferring biological functionality. This study investigated a single nucleotide polymorphism (SNP) in SULF1, the enzyme responsible for the 6-O desulfation of heparan sulfate side chains. We investigated this SNP in an Australian Caucasian case-control breast cancer population and found a significant association between SULF1 and breast cancer at both the allelic and genotypic level (allele, p=0.016; genotype, p=0.032). Our results suggest the rs2623047 SNP in SULF1 may impact breast cancer susceptibility. Specifically, the T allele of rs2623047 in SULF1 is associated with an increased risk of developing breast cancer in our cohort. The identification of markers including SULF1 may improve detection of this disease at its earliest stages, improving patient treatment and prognosis.
Abstract:
The discovery of genetic factors that contribute to schizophrenia susceptibility is a key challenge in understanding the etiology of this disease. Here, we report the identification of a novel schizophrenia candidate gene on chromosome 1q32, plexin A2 (PLXNA2), in a genome-wide association study using 320 patients with schizophrenia of European descent and 325 matched controls. Over 25,000 single-nucleotide polymorphisms (SNPs) located within approximately 14,000 genes were tested. Out of 62 markers found to be associated with disease status, the most consistent finding was observed for a candidate locus on chromosome 1q32. The marker SNP rs752016 showed suggestive association with schizophrenia (odds ratio (OR) = 1.49, P = 0.006). This result was confirmed in an independent case-control sample of European Americans (combined OR = 1.38, P = 0.035) and similar genetic effects were observed in smaller subsets of Latin Americans (OR = 1.26) and Asian Americans (OR = 1.37). Supporting evidence was also obtained from two family-based collections, one of which reached statistical significance (OR = 2.2, P = 0.02). High-density SNP mapping showed that the region of association spans approximately 60 kb of the PLXNA2 gene. Eight out of 14 SNPs genotyped showed statistically significant differences between cases and controls. These results are in accordance with previous genetic findings that identified chromosome 1q32 as a candidate region for schizophrenia. PLXNA2 is a member of the transmembrane semaphorin receptor family that is involved in axonal guidance during development and may modulate neuronal plasticity and regeneration. The PLXNA2 ligand semaphorin 3A has been shown to be upregulated in the cerebellum of individuals with schizophrenia. These observations, together with the genetic results, make PLXNA2 a likely candidate for the 1q32 schizophrenia susceptibility locus.
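The odds ratios quoted in this abstract summarise how much more common an allele is among cases than among controls. As a minimal sketch of the calculation, the Python function below computes an allelic odds ratio with an approximate 95% confidence interval from a 2×2 table of allele counts using the standard log-odds (Woolf) approximation. The function name is hypothetical, no counts from the study are used, and the study itself may have applied a different test.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (OR, lower, upper) for a 2x2 table of allele counts.

    case_risk and case_other are the counts of the risk allele and the
    other allele among cases; control_risk and control_other are the
    same counts among controls. All four counts must be positive.
    """
    estimate = (case_risk * control_other) / (case_other * control_risk)

    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )

    # Approximate 95% confidence interval on the log scale.
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper
```

An interval that excludes 1.0 corresponds roughly to significance at the 5% level, which is one way to read reported pairs such as an OR together with its P value.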
Abstract:
Pangasianodon hypophthalmus is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The current study used Ion Torrent technology to generate EST resources from the kidney of Tra catfish reared at a salinity level of 9 ppt. After trimming and processing, we obtained 2,623,929 reads with an average length of 104 bp. De novo assemblies were generated using CLC Genomics Workbench, Trinity and Velvet/Oases, with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 29,940 contigs, allowing identification of 5,710 putative genes when compared with the NCBI non-redundant database. A large number of single nucleotide polymorphisms (SNPs) were also detected. The sequence collection generated in our study represents the most comprehensive transcriptomic resource for P. hypophthalmus available to date.