267 resultados para Protein annotation
em Queensland University of Technology - ePrints Archive
Resumo:
Background The sequencing, de novo assembly and annotation of transcriptome datasets generated with next generation sequencing (NGS) has enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation of transcriptomes, however, is a critically important step for transcriptome assemblies generated from short read sequences. Typical benchmarks for assembly and annotation reliability have been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNAseq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assembly produced by four different de novo transcriptome assemblers; Velvet, Oases, Geneious and Trinity, for a number of quality metrics and redundancy. Results Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). Both the Trinity and Oases de novo assemblers produced the best assemblies based on all quality metrics including fewer contigs, increased N50 and average contig length and contigs of greater length. Overall the BLAST and annotation success of our assemblies was not high with only 15-19% of contigs assigned a putative function. Conclusions We believe that any improvement in annotation success of gastropod species will require more gastropod genome sequences, but in particular an increase in mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short read sequencers with the right assembly algorithms. Keywords: Nerita melanotragus; De novo assembly; Transcriptome; Heat shock protein; Ion torrent
Resumo:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST..
Resumo:
Staphylococcus aureus (S. aureus) is a prominent human and livestock pathogen investigated widely using omic technologies. Critically, due to availability, low visibility or scattered resources, robust network and statistical contextualisation of the resulting data is generally under-represented. Here, we present novel meta-analyses of freely-accessible molecular network and gene ontology annotation information resources for S. aureus omics data interpretation. Furthermore, through the application of the gene ontology annotation resources we demonstrate their value and ability (or lack-there-of) to summarise and statistically interpret the emergent properties of gene expression and protein abundance changes using publically available data. This analysis provides simple metrics for network selection and demonstrates the availability and impact that gene ontology annotation selection can have on the contextualisation of bacterial omics data.
Clustering of Protein Structures Using Hydrophobic Free Energy And Solvent Accessibility of Proteins
Resumo:
This work is concerned with the genetic basis of normal human pigmentation variation. Specifically, the role of polymorphisms within the solute carrier family 45 member 2 (SLC45A2 or membrane associated transporter protein; MATP) gene were investigated with respect to variation in hair, skin and eye colour ― both between and within populations. SLC45A2 is an important regulator of melanin production and mutations in the gene underly the most recently identified form of oculocutaneous albinism. There is evidence to suggest that non-synonymous polymorphisms in SLC45A2 are associated with normal pigmentation variation between populations. Therefore, the underlying hypothesis of this thesis is that polymorphisms in SLC45A2 will alter the function or regulation of the protein, thereby altering the important role it plays in melanogenesis and providing a mechanism for normal pigmentation variation. In order to investigate the role that SLC45A2 polymorphisms play in human pigmentation variation, a DNA database was established which collected pigmentation phenotypic information and blood samples of more than 700 individuals. This database was used as the foundation for two association studies outlined in this thesis, the first of which involved genotyping two previously-described non-synonymous polymorphisms, p.Glu272Lys and p.Phe374Leu, in four different population groups. For both polymorphisms, allele frequencies were significantly different between population groups and the 272Lys and 374Leu alleles were strongly associated with black hair, brown eyes and olive skin colour in Caucasians. This was the first report to show that SLC45A2 polymorphisms were associated with normal human intra-population pigmentation variation. The second association study involved genotyping several SLC45A2 promoter polymorphisms to determine if they also played a role in pigmentation variation. Firstly, the transcription start site (TSS), and hence putative proximal promoter region, was identified using 5' RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE). Two alternate TSSs were identified and the putative promoter region was screened for novel polymorphisms using denaturing high performance liquid chromatography (dHPLC). A novel duplication (c.–1176_–1174dupAAT) was identified along with other previously described single nucleotide polymorphisms (c.–1721C>G and c.–1169G>A). Strong linkage disequilibrium ensured that all three polymorphisms were associated with skin colour such that the –1721G, +dup and –1169A alleles were associated with olive skin in Caucasians. No linkage disequilibrium was observed between the promoter and coding region polymorphisms, suggesting independent effects. The association analyses were complemented with functional data, showing that the –1721G, +dup and –1169A alleles significantly decreased SLC45A2 transcriptional activity. Based on in silico bioinformatic analysis that showed these alleles remove a microphthalmia-associated transcription factor (MITF) binding site, and that MITF is a known regulator of SLC45A2 (Baxter and Pavan, 2002; Du and Fisher, 2002), it was postulated that SLC45A2 promoter polymorphisms could contribute to the regulation of pigmentation by altering MITF binding affinity. Further characterisation of the SLC45A2 promoter was carried out using luciferase reporter assays to determine the transcriptional activity of different regions of the promoter. Five constructs were designed of increasing length and their promoter activity evaluated. Constitutive promoter activity was observed within the first ~200 bp and promoter activity increased as the construct size increased. The functional impact of the –1721G, +dup and –1169A alleles, which removed a MITF consensus binding site, were assessed using electrophoretic mobility shift assays (EMSA) and expression analysis of genotyped melanoblast and melanocyte cell lines. EMSA results confirmed that the promoter polymorphisms affected DNA-protein binding. Interestingly, however, the protein/s involved were not MITF, or at least MITF was not the protein directly binding to the DNA. In an effort to more thoroughly characterise the functional consequences of SLC45A2 promoter polymorphisms, the mRNA expression levels of SLC45A2 and MITF were determined in melanocyte/melanoblast cell lines. Based on SLC45A2’s role in processing and trafficking TYRP1 from the trans-Golgi network to stage 2 melanosmes, the mRNA expression of TYRP1 was also investigated. Expression results suggested a coordinated expression of pigmentation genes. This thesis has substantially contributed to the field of pigmentation by showing that SLC45A2 polymorphisms not only show allele frequency differences between population groups, but also contribute to normal pigmentation variation within a Caucasian population. In addition, promoter polymorphisms have been shown to have functional consequences for SLC45A2 transcription and the expression of other pigmentation genes. Combined, the data presented in this work supports the notion that SLC45A2 is an important contributor to normal pigmentation variation and should be the target of further research to elucidate its role in determining pigmentation phenotypes. Understanding SLC45A2’s function may lead to the development of therapeutic interventions for oculocutaneous albinism and other disorders of pigmentation. It may also help in our understanding of skin cancer susceptibility and evolutionary adaptation to different UV environments, and contribute to the forensic application of pigmentation phenotype prediction.
Resumo:
Infection of plant cells by potyviruses induces the formation of cytoplasmic inclusions ranging in size from 200 to 1000 nm. To determine if the ability to form these ordered, insoluble structures is intrinsic to the potyviral cytoplasmic inclusion protein, we have expressed the cytoplasmic inclusion protein from Potato virus Y in tobacco under the control of the chrysanthemum ribulose-1,5-bisphosphate carboxylase small subunit promoter, a highly active, green tissue promoter. No cytoplasmic inclusions were observed in the leaves of transgenic tobacco using transmission electron microscopy, despite being able to clearly visualize these inclusions in Potato virus Y infected tobacco leaves under the same conditions. However, we did observe a wide range of tissue and sub-cellular abnormalities associated with the expression of the Potato virus Y cytoplasmic inclusion protein. These changes included the disruption of normal cell morphology and organization in leaves, mitochondrial and chloroplast internal reorganization, and the formation of atypical lipid accumulations. Despite these significant structural changes, however, transgenic tobacco plants were viable and the results are discussed in the context of potyviral cytoplasmic inclusion protein function.
Resumo:
Image annotation is a significant step towards semantic based image retrieval. Ontology is a popular approach for semantic representation and has been intensively studied for multimedia analysis. However, relations among concepts are seldom used to extract higher-level semantics. Moreover, the ontology inference is often crisp. This paper aims to enable sophisticated semantic querying of images, and thus contributes to 1) an ontology framework to contain both visual and contextual knowledge, and 2) a probabilistic inference approach to reason the high-level concepts based on different sources of information. The experiment on a natural scene database from LabelMe database shows encouraging results.
Resumo:
To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images is a challenging task. Since real-world objects tend to exist in a context within their environment, the computer vision researchers have increasingly incorporated contextual information for improving object recognition. In this paper, we present a method to build a visual contextual ontology from salient objects descriptions for image annotation. The ontologies include not only partOf/kindOf relations, but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed based on ontology relations and probabilistic inference. Different from most of the existing work, we specially exploit how to combine representation of ontology, contextual knowledge and probabilistic inference. The experiments show that image annotation results are improved in the LabelMe dataset.