70 results for protein sequence classification


Abstract:

Recently, we defined a new syndromic form of X-linked mental retardation in a 4-generation family with a unique clinical phenotype characterized by mild mental retardation, choreoathetosis, and abnormal behavior (MRXS10). Linkage analysis in this family revealed a candidate region of 13.4 Mb between markers DXS1201 and DXS991 on Xp11; therefore, mutation analysis was performed by direct sequencing in most of the 135 annotated genes located in the region. The gene (HADH2) encoding L-3-hydroxyacyl-CoA dehydrogenase II displayed a sequence alteration (c.574 C-->A; p.R192R) in all patients and carrier females that was absent in unaffected male family members and could not be found in 2,500 control X chromosomes, including in those of 500 healthy males. The silent C-->A substitution is located in exon 5 and was shown by western blot to reduce the amount of HADH2 protein by 60%-70% in the patient. Quantitative in vivo and in vitro expression studies revealed a ratio of splicing transcript amounts different from those normally seen in controls. Apparently, the reduced expression of the wild-type fragment, which results in the decreased protein expression, rather than the increased amount of aberrant splicing fragments of the HADH2 gene, is pathogenic. Our data therefore strongly suggest that reduced expression of the HADH2 protein causes MRXS10, a phenotype different from that caused by 2-methyl-3-hydroxybutyryl-CoA dehydrogenase deficiency, which is a neurodegenerative disorder caused by missense mutations in this multifunctional protein.

Abstract:

Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
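
As an illustration of the feature-selection step this abstract alludes to, the sketch below turns a short read into a fixed-length vector of normalised k-mer frequencies, a common input representation for classifiers such as random forests. The abstract does not say which features the authors used, so the k-mer choice, the function name `kmer_features` and the default `k` are assumptions made for the example only.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_features(read: str, k: int = 2) -> List[float]:
    """Return normalised k-mer frequencies of a read in lexicographic k-mer order.

    Hypothetical helper: k-mers containing ambiguous bases (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    read = read.upper()
    # Count every overlapping window of length k in the read.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        # Read shorter than k, or made up only of ambiguous bases.
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

Each read yields a vector of length 4^k regardless of read length, so reads can be featurised independently and in parallel before being passed to whichever ensemble classifier is chosen.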

Abstract:

Intrinsically disordered proteins (IDPs) are a relatively recently defined class of proteins which, under native conditions, lack a unique tertiary structure whilst maintaining essential biological functions. Functional classification of IDPs has implicated such proteins in various physiological processes, including regulation of transcription and translation, signal transduction and protein modification. Actinidia DRM1 (Ade DORMANCY ASSOCIATED GENE 1) represents a robust dormancy marker whose mRNA transcript expression exhibits a strong inverse correlation with the onset of growth following periods of physiological dormancy. Bioinformatic analyses suggest that DRM1 is plant specific and highly conserved at both the nucleotide and protein levels. It is predicted to be an intrinsically disordered protein with two distinct highly conserved domains. Several Actinidia DRM1 homologues, which align into two distinct Actinidia-specific families, Type I and Type II, have been identified. No candidate for the Arabidopsis DRM1 homologue (AtDRM2), an additional family member, has been identified in Actinidia.

Abstract:

The expression of transgenes in plant genomes can be inhibited by either transcriptional gene silencing or posttranscriptional gene silencing (PTGS). Overexpression of the chalcone synthase-A (CHS-A) transgene triggers PTGS of CHS-A and thus results in loss of flower pigmentation in petunia. We previously demonstrated that epigenetic inactivation of CHS-A transgene transcription leads to a reversion of the PTGS phenotype. Although neomycin phosphotransferase II (nptII), a marker gene co-introduced into the genome with the CHS-A transgene, is not normally silenced in petunia, even when CHS-A is silenced, here we found that nptII was silenced in a petunia line in which CHS-A PTGS was induced, but not in the revertant plants that had no PTGS of CHS-A. Transcriptional activity, accumulation of short interfering RNAs, and restoration of mRNA level after infection with viruses that had suppressor proteins of gene silencing indicated that the mechanism for nptII silencing was posttranscriptional. Read-through transcripts of the CHS-A gene toward the nptII gene were detected. Deep-sequencing analysis revealed a striking difference between the predominant size class of small RNAs produced from the read-through transcripts (22 nt) and that from the CHS-A RNAs (21 nt). These results implicate the involvement of read-through transcription and distinct phases of RNA degradation in the coincident PTGS of linked transgenes and provide new insights into the destabilization of transgene expression.

Abstract:

The 3′ UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3′ UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3′ UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments integrating information about conservation, GC content and transition/transversion ratios and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and number of types of segment identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3′ UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3′ UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.
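
To make the idea of an alignment encoding concrete, the sketch below maps each column of a pairwise alignment to a single symbol that combines conservation, GC content and substitution type. The symbol alphabet and the function names are hypothetical choices for illustration; the abstract does not specify the encoding actually used, and the authors' code is in C rather than Python.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def encode_column(a: str, b: str) -> str:
    """Encode one pairwise alignment column as a single symbol (assumed alphabet)."""
    a, b = a.upper(), b.upper()
    if a == "-" or b == "-":
        return "g"  # gap in either sequence
    if a not in BASES or b not in BASES:
        return "n"  # ambiguous base, no information
    if a == b:
        # Conserved column, split by GC content: S = G/C, W = A/T.
        return "S" if a in {"G", "C"} else "W"
    # Mismatch: transition if both bases are purines or both are pyrimidines.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "i" if same_class else "v"


def encode_alignment(seq1: str, seq2: str) -> str:
    """Encode two aligned, equal-length sequences column by column."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return "".join(encode_column(a, b) for a, b in zip(seq1, seq2))
```

The resulting symbol string is the kind of input a segmentation method can partition into classes; a three-way alignment would need a larger alphabet built on the same principle.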

Abstract:

In this paper, the complete mitochondrial genome of Acraea issoria (Lepidoptera: Nymphalidae: Heliconiinae: Acraeini) is reported: a circular molecule of 15,245 bp. In A. issoria, genes are arranged in the same order and orientation as in the completely sequenced mitochondrial genomes of other lepidopteran species, except for the presence of an extra copy of tRNAIle(AUR)b in the control region. All protein-coding genes of the A. issoria mitogenome start with a typical ATN codon and terminate in the common stop codon TAA, except that the COI gene uses TTG as its initiation codon and terminates in a single T residue. All tRNA genes possess the typical cloverleaf secondary structure except for tRNASer(AGN), which has a simple loop and lacks the DHU stem. The sequence, organization and other features of this mitochondrial genome, including nucleotide composition and codon usage, are also reported and compared with those of other sequenced lepidopteran mitochondrial genomes. Some short microsatellite-like repeat regions (e.g., (TA)9, polyA and polyT) are scattered in the control region; however, the conspicuous macro-repeat units commonly found in other insect species are absent.

Abstract:

Background: Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods: The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analyses of the variation observed in these genes were conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results: Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions: The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation.

Abstract:

Uropathogenic Escherichia coli (UPEC) is responsible for the majority of urinary tract infections (UTI). To cause a UTI, UPEC must adhere to the epithelial cells of the urinary tract and overcome the shear flow forces of urine. This function is mediated primarily by fimbrial adhesins, which mediate specific attachment to host cell receptors. Another group of adhesins that contributes to UPEC-mediated UTI is autotransporter (AT) proteins. AT proteins possess a range of virulence properties, such as adherence, aggregation, invasion, and biofilm formation. One recently characterized AT protein of UPEC is UpaH, a large adhesin-involved-in-diffuse-adherence (AIDA-I)-type AT protein that contributes to biofilm formation and bladder colonization. In this study we characterized a series of naturally occurring variants of UpaH. We demonstrate that extensive sequence variation exists within the passenger-encoding domain of UpaH variants from different UPEC strains. This sequence variation is associated with functional heterogeneity with respect to the ability of UpaH to mediate biofilm formation. In contrast, all of the UpaH variants examined retained a conserved ability to mediate binding to extracellular matrix (ECM) proteins. Bioinformatic analysis of the UpaH passenger domain identified a conserved region (UpaHCR) and a hydrophobic region (UpaHHR). Deletion of these domains reduced biofilm formation but not the binding to ECM proteins. Despite variation in the upaH sequence, the transcription of upaH was repressed by a conserved mechanism involving the global regulator H-NS, and mutation of the hns gene relieved this repression. Overall, our findings shed new light on the regulation and functions of the UpaH AT protein.

Abstract:

Autotransporter (AT) proteins are found in all Escherichia coli pathotypes and are often associated with virulence. In this study we took advantage of the large number of available E. coli genome sequences to perform an in-depth bioinformatic analysis of AT-encoding genes. Twenty-eight E. coli genome sequences were probed using an iterative approach, which revealed a total of 215 AT-encoding sequences that represented three major groups of distinct domain architecture: (i) serine protease AT proteins, (ii) trimeric AT adhesins and (iii) AIDA-I-type AT proteins. A number of subgroups were identified within each broad category, and most subgroups contained at least one characterized AT protein; however, seven subgroups contained no previously described proteins. The AIDA-I-type AT proteins represented the largest and most diverse group, with up to 16 subgroups identified from sequence-based comparisons. Nine of the AIDA-I-type AT protein subgroups contained at least one protein that possessed functional properties associated with aggregation and/or biofilm formation, suggesting a high degree of redundancy for this phenotype. The Ag43, YfaL/EhaC, EhaB/UpaC and UpaG subgroups were found in nearly all E. coli strains. Among the remaining subgroups, there was a tendency for AT proteins to be associated with individual E. coli pathotypes, suggesting that they contribute to tissue tropism or symptoms specific to different disease outcomes.

Abstract:

Escherichia coli is the primary cause of urinary tract infection (UTI) in the developed world. The major factors associated with virulence of uropathogenic E. coli (UPEC) are fimbrial adhesins, which mediate specific attachment to host receptors and trigger innate host responses. Another group of adhesins is represented by the autotransporter (AT) subgroup of proteins. In this study, we identified a new AT-encoding gene, termed upaH, present in a 6.5-kb unannotated intergenic region in the genome of the prototypic UPEC strain CFT073. Cloning and sequencing of the upaH gene from CFT073 revealed an intact 8.535-kb coding region, contrary to the published genome sequence. The upaH gene was widely distributed among a large collection of UPEC isolates as well as the E. coli Reference (ECOR) strain collection. Bioinformatic analyses suggest β-helix as the predominant structure in the large N-terminal passenger (α) domain and a 12-strand β-barrel for the C-terminal β-domain of UpaH. We demonstrated that UpaH is expressed at the cell surface of CFT073 and promotes biofilm formation. In the mouse UTI model, deletion of the upaH gene in CFT073 and in two other UPEC strains did not significantly affect colonization of the bladder in single-challenge experiments. However, in competitive colonization experiments, CFT073 significantly outcompeted its upaH isogenic mutant strain in urine and the bladder.

Abstract:

Background: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results: We present STAR, Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination for each position in an amino acid sequence. Example predictions, contrasted with those of alternative tools, illustrate STAR's utility in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion: STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.

Abstract:

Background: The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. Results: RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. Conclusions: This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org
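
The reciprocal best hit requirement mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming similarity-search results are already available as `(query, subject, score)` tuples in both directions; the function names and the tuple layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, alignment score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)   # e.g. koala -> other marsupial
    rev = best_hits(reverse)   # e.g. other marsupial -> koala
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```

Ties are resolved here by keeping the first hit seen with the top score; a real pipeline would need an explicit tie-breaking rule and score or coverage thresholds.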

Abstract:

Methods are presented for the production, affinity purification and analysis of plasmid DNA (pDNA). Batch fermentation is used for the production of the pDNA, and expanded bed chromatography, via the use of a dual affinity glutathione S-transferase (GST) fusion protein, is used for the capture and purification of the pDNA. The protein is composed of GST, which displays affinity for glutathione immobilized to a solid-phase adsorbent, fused to a zinc finger transcription factor, which displays affinity for a target 9-base pair sequence contained within the target pDNA. A Picogreen™ fluorescence assay and/or an ethidium bromide agarose gel electrophoresis assay can be used to analyze the eluted pDNA.

Abstract:

Genetic factors contribute to risk of many common diseases affecting reproduction and fertility. In recent years, methods for genome-wide association studies (GWAS) have revolutionized gene discovery for common traits and diseases. Results of GWAS are documented in the Catalog of Published Genome-Wide Association Studies at the National Human Genome Research Institute and report over 70 publications for 32 traits and diseases associated with reproduction. These include endometriosis, uterine fibroids, age at menarche and age at menopause. Results that pass appropriate stringent levels of significance are generally well replicated in independent studies. Examples of genetic variation affecting twinning rate, infertility, endometriosis and age at menarche demonstrate that the spectrum of disease-related variants for reproductive traits is similar to most other common diseases. GWAS 'hits' provide novel insights into biological pathways and the translational value of these studies lies in discovery of novel gene targets for biomarkers, drug development and greater understanding of environmental factors contributing to disease risk. Results also show that genetic data can help define sub-types of disease and co-morbidity with other traits and diseases. To date, many studies on reproductive traits have used relatively small samples. Future genetic marker studies in large samples with detailed phenotypic and clinical information will yield new insights into disease risk, disease classification and co-morbidity for many diseases associated with reproduction and infertility.