54 results for Protein Sequence
Abstract:
Genomic and proteomic analyses have attracted a great deal of interest in biological research in recent years. Many methods have been applied to discover useful information contained in the enormous databases of genomic sequences and amino acid sequences. The results of these investigations inspire further research in biological fields in return. These biological sequences, which may be considered as multiscale sequences, have some specific features which need further efforts to characterise using more refined methods. This project aims to study some of these biological challenges with multiscale analysis methods and a stochastic modelling approach. The first part of the thesis aims to cluster some unknown proteins, and classify their families as well as their structural classes. A development in proteomic analysis is concerned with the determination of protein functions. The first step in this development is to classify proteins and predict their families. This motivates us to study some unknown proteins from specific families, and to cluster them into families and structural classes. We select a large number of proteins from the same families or superfamilies, and link them to simulate some unknown large proteins from these families. We use multifractal analysis and the wavelet method to capture the characteristics of these linked proteins. The simulation results show that the method is valid for the classification of large proteins. The second part of the thesis aims to explore the relationship of proteins based on a layered comparison with their components. Many methods are based on homology of proteins because the resemblance at the protein sequence level normally indicates the similarity of functions and structures. However, some proteins may have similar functions with low sequence identity. We consider protein sequences at a detailed level to investigate the problem of comparison of proteins. 
The comparison is based on the empirical mode decomposition (EMD), and protein sequences are detected with the intrinsic mode functions. A measure of similarity is introduced with a new cross-correlation formula. The similarity results show that the EMD is useful for detection of functional relationships of proteins. The third part of the thesis aims to investigate the transcriptional regulatory network of yeast cell cycle via stochastic differential equations. As the investigation of genome-wide gene expressions has become a focus in genomic analysis, researchers have tried to understand the mechanisms of the yeast genome for many years. How cells control gene expressions still needs further investigation. We use a stochastic differential equation to model the expression profile of a target gene. We modify the model with a Gaussian membership function. For each target gene, a transcriptional rate is obtained, and the estimated transcriptional rate is also calculated with the information from five possible transcriptional regulators. Some regulators of these target genes are verified with the related references. With these results, we construct a transcriptional regulatory network for the genes from the yeast Saccharomyces cerevisiae. The construction of transcriptional regulatory network is useful for detecting more mechanisms of the yeast cell cycle.
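The abstract does not give the form of the stochastic differential equation or of the Gaussian membership function, so the following is only a minimal sketch under stated assumptions: a mean-reverting model dx = (rate - decay * x) dt + noise dW, integrated with the Euler-Maruyama scheme, plus the usual Gaussian bell-shaped membership function. The function names and all parameters (`rate`, `decay`, `noise`, `centre`, `width`) are hypothetical and are not taken from the thesis.

```python
import math
import random


def gaussian_membership(x, centre, width):
    """Standard Gaussian membership function; equals 1.0 at x == centre.

    Assumption: the thesis may parameterise its membership function differently.
    """
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def simulate_expression(x0, rate, decay, noise, dt, steps, seed=0):
    """Euler-Maruyama integration of dx = (rate - decay * x) dt + noise dW.

    x0    : initial expression level
    rate  : assumed constant transcriptional rate (hypothetical)
    decay : assumed first-order degradation constant (hypothetical)
    noise : diffusion coefficient; 0 gives the deterministic limit
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        # Brownian increment over one step has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + (rate - decay * x) * dt + noise * dw
        path.append(x)
    return path
```

With `noise=0` the path relaxes towards `rate / decay`, which is a convenient sanity check before adding the stochastic term.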
Abstract:
Hepatitis C virus (HCV) core (C) protein is thought to bind to viral RNA before it undergoes oligomerization leading to RNA encapsidation. Details of these events are so far unknown. The 5ʹ-terminal C protein coding sequence that includes an adenine (A)-rich tract is a part of an internal ribosome entry site (IRES). This nucleotide sequence, but not the corresponding protein sequence, is needed for proper initiation of translation of viral RNA by an IRES-dependent mechanism. In this study, we examined the importance of this sequence for the ability of the C protein to bind to viral RNA. Serially truncated C proteins with deletions from 10 up to 45 N-terminal amino acids were expressed in Escherichia coli, purified and tested for binding to viral RNA by a gel shift assay. The results showed that truncation of the C protein from its N-terminus by more than 10 amino acids almost completely abolished its expression in E. coli. Expression could be restored by adding a tag to the N-terminus of the protein. The tagged proteins truncated by 15 or more amino acids showed an anomalous migration in SDS-PAGE. Truncation by more than 20 amino acids resulted in a complete loss of the ability of the tagged C protein to bind to viral RNA. These results provide clues to the early events in the C protein-RNA interactions leading to C protein oligomerization, RNA encapsidation and virion assembly.
Abstract:
Background Flower development in kiwifruit (Actinidia spp.) is initiated in the first growing season, when undifferentiated primordia are established in latent shoot buds. These primordia can differentiate into flowers in the second growing season, after the winter dormancy period and upon accumulation of adequate winter chilling. Kiwifruit is an important horticultural crop, yet little is known about the molecular regulation of flower development. Results To study kiwifruit flower development, nine MADS-box genes were identified and functionally characterized. Protein sequence alignment, phenotypes obtained upon overexpression in Arabidopsis and expression patterns suggest that the identified genes are required for floral meristem and floral organ specification. Their role during budbreak and flower development was studied. A spontaneous kiwifruit mutant was utilized to correlate the extended expression domains of these flowering genes with abnormal floral development. Conclusions This study provides a description of flower development in kiwifruit at the molecular level. It has identified markers for flower development, and candidates for manipulation of kiwifruit growth, phase change and time of flowering. The expression in normal and aberrant flowers provided a model for kiwifruit flower development.
Abstract:
Background The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
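To make the quantities in this abstract concrete, here is a minimal sketch of how an RWCO profile and the two reported evaluation measures (Pearson CC and RMSE) might be computed. It rests on assumptions not stated in the abstract: contacts are defined by a hard distance cutoff between one representative atom per residue, near-sequence neighbours are excluded, and the sum is normalised by chain length. The published definition may instead use a smooth contact function and different thresholds, so `cutoff` and `min_sep` are illustrative only.

```python
import math


def rwco(coords, cutoff=8.0, min_sep=3):
    """Residue-wise contact order from one 3D point per residue.

    For residue i, sum the sequence separations |i - j| over all residues j
    that lie within `cutoff` and are at least `min_sep` apart in sequence,
    then divide by the chain length. Both thresholds are assumptions.
    """
    n = len(coords)
    values = []
    for i in range(n):
        total = 0
        for j in range(n):
            sep = abs(i - j)
            if sep >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
                total += sep
        values.append(total / n)
    return values


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

A predictor such as the SVR described above would then be scored by calling `pearson(predicted, observed)` and `rmse(predicted, observed)` on each protein's profile.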
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests performed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. 
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as 'regulatory trees', inspired by the phylogenetic tree concept. 
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. 
We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
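The spectrum kernel named in this abstract compares two sequences by the k-mers they share. The sketch below shows the standard construction (the inner product of k-mer count vectors) in plain Python. It is an illustration only: the value of `k`, the normalisation step, and the function names are assumptions, and the thesis's actual feature set (for example the extra position attributes) is not reproduced here.

```python
import math
from collections import Counter


def spectrum(seq, k):
    """Count every overlapping k-mer in `seq` (its k-spectrum)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-mers absent from sb, so no key check is needed.
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalised_spectrum_kernel(a, b, k=3):
    """Cosine-normalised kernel value in [0, 1]; 0.0 if either spectrum is empty."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0
```

A kernel of this form can be supplied to any SVM implementation that accepts a precomputed Gram matrix, which is one common way to apply it to candidate binding-site sequences.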
Abstract:
In the last decade we have come to understand that the growth of cancer cells in general and of breast cancer in particular depends, in many cases, upon growth factors that bind to and activate their receptors. One of these growth factor receptors is the erbB-2 protein, which plays an important role in the prognosis of breast cancer and is overexpressed in nearly 30% of human breast cancer patients. While evidence accumulates to support the relationship between erbB-2 overexpression and poor overall survival in breast cancer, understanding of the biological consequence(s) of erbB-2 overexpression remains elusive. Our recent discovery of gp30 has allowed us to identify a number of related but distinct biological endpoints which appear responsive to signal transduction through the erbB-2 receptor. These endpoints of growth, invasiveness, and differentiation have clear implications for the emergence, maintenance and/or control of malignancy, and represent established endpoints in the assessment of malignant progression in breast cancer. We have shown that gp30 induces a biphasic growth effect on cells with erbB-2 overexpression. We have recently determined the protein sequence of gp30 and cloned its full-length cDNA sequence. We have also cloned two additional forms of the ligand that are believed to be different isoforms. We are currently expressing the different forms in order to determine their biological effects. To elucidate the cellular mechanisms underlying cell growth inhibition by gp30, we tested the effect of this ligand on cell growth and differentiation of human breast cancer cells which overexpress erbB-2 and cells which express low levels of this protooncogene. High concentrations of ligand induced differentiation of cells overexpressing erbB-2, as measured by inhibition of cell growth, increased synthesis of milk components, modulation of E-cadherin and up-regulation of c-jun and c-fos. 
These findings indicate that ligand-induced growth inhibition in cells overexpressing erbB-2 is associated with an apparent induction of differentiation. The availability of gp30-derived synthetic peptides and its full-length cDNAs provides the tools necessary to acquire a better understanding of the mechanism of action of this ligand and the erbB-2 receptor in breast cancer.
Abstract:
Bahia grass, Paspalum notatum, is an important pollen allergen source with a long season of pollination and wide distribution in subtropical and temperate regions. We aimed to characterize the 55 kDa allergen of Bahia grass pollen (BaGP) and ascertain its clinical importance. BaGP extract was separated by 2D-PAGE and immunoblotted with serum IgE of a grass pollen-allergic patient. The amino-terminal protein sequence of the predominant allergen isoform at 55 kDa had similarity with the group 13 allergens of Timothy grass and maize pollen, Phl p 13 and Zea m 13. Four sequences obtained by rapid amplification of the allergen cDNA ends represented multiple isoforms of Pas n 13. The predicted full-length cDNA for Pas n 13 encoded a 423 amino acid glycoprotein including a signal peptide of 28 residues and with a predicted pI of 7.0. Tandem mass spectrometry of tryptic peptides of 2D gel spots identified peptides specific to the deduced amino acid sequence for each of the four Pas n 13 cDNAs, representing 47% of the predicted mature protein sequence of Pas n 13. There was 80.6% and 72.6% amino acid identity with Zea m 13 and Phl p 13, respectively. Reactivity with a Phl p 13-specific monoclonal antibody AF6 supported designation of this allergen as Pas n 13. The allergen was purified from BaGP extract by ammonium sulphate precipitation, hydrophobic interaction and size exclusion chromatography. Purified Pas n 13 reacted with serum IgE of 34 of 71 (48%) grass pollen-allergic patients and specifically inhibited IgE reactivity with the 55 kDa band of BaGP for two grass pollen-allergic donors. Four isoforms of Pas n 13 from pI 6.3-7.8 had IgE-reactivity with grass pollen-allergic sera. The allergenic activity of purified Pas n 13 was demonstrated by activation of basophils from whole blood of three grass pollen-allergic donors tested but not control donors. 
Pas n 13 is thus a clinically relevant pollen allergen of the subtropical Bahia grass likely to be important in eliciting seasonal allergic rhinitis and asthma in grass pollen-allergic patients.
Abstract:
Background. A variety of interactions between up to three different movement proteins (MPs), the coat protein (CP) and genomic DNA mediate the inter- and intra-cellular movement of geminiviruses in the genus Begomovirus. Although movement of viruses in the genus Mastrevirus is less well characterized, direct interactions between a single MP and the CP of these viruses are also clearly involved in both intra- and intercellular trafficking of virus genomic DNA. However, it is currently unknown how specific these MP-CP interactions are, or how disruption of these interactions might affect virus viability. Results. Using chimaeric genomes of two strains of Maize streak virus (MSV) we adopted a genetic approach to investigate the gross biological effects of interfering with interactions between virus MP and CP homologues derived from genetically distinct MSV isolates. MP and CP genes were reciprocally exchanged, individually and in pairs, between maize (MSV-Kom)- and Setaria sp. (MSV-Set)-adapted isolates sharing 78% genome-wide sequence identity. All chimaeras were infectious in Zea mays cv. Jubilee and were characterized in terms of symptomatology and infection efficiency. Compared with their parental viruses, all the chimaeras were attenuated in symptom severity, infection efficiency, and the rate at which symptoms appeared. The exchange of individual MP and CP genes resulted in lower infection efficiency and reduced symptom severity in comparison with exchanges of matched MP-CP pairs. Conclusion. Specific interactions between the mastrevirus MP and CP genes themselves and/or their expression products are important determinants of infection efficiency, rate of symptom development and symptom severity. © 2008 van der Walt et al; licensee BioMed Central Ltd.
Abstract:
Potato leafroll virus (PLRV) is a positive-strand RNA virus that generates subgenomic RNAs (sgRNA) for expression of 3' proximal genes. Small RNA (sRNA) sequencing and mapping of the PLRV-derived sRNAs revealed coverage of the entire viral genome with the exception of four distinctive gaps. Remarkably, these gaps mapped to areas of the PLRV genome with extensive secondary structures, such as the internal ribosome entry site and the 5' transcriptional start sites of sgRNA1 and sgRNA2. The last gap mapped to ~500 nt from the 3' terminus of the PLRV genome and suggested the possible presence of an additional sgRNA for PLRV. Quantitative real-time PCR and northern blot analysis confirmed the expression of sgRNA3, and subsequent analyses placed its 5' transcriptional start site at position 5347 of the PLRV genome. A regulatory role is proposed for the PLRV sgRNA3 as it encodes an RNA-binding protein with specificity to the 5' end of PLRV genomic RNA. © 2013.
Abstract:
GPV is a Chinese serotype isolate of barley yellow dwarf virus (BYDV) that does not react with antisera to MAV, PAV, SGV, RPV and RMV. The sequence of the coat protein (CP) gene of the GPV isolate of BYDV was determined and its amino acid sequence was deduced. The coding region for the putative GPV CP is 603 nucleotides long and encodes a Mr 22 218 (22 ku) protein. Like MAV, PAV and RPV, GPV contains a second ORF within the coat protein coding region. This protein of Mr 17 024 (17 ku) is thought to correspond to the genome-linked virion protein (VPg). The CP coding region of the GPV isolate was compared with those of other BYDV isolates. The nucleotide and amino acid sequences of GPV have greater identity to those of RPV than to those of PAV and MAV. The GPV CP sequence shared 83.7% nucleotide similarity and 77.5% deduced amino acid similarity with RPV, whereas with PAV and MAV it shared 56.9%, 53.2% and 44.1%, 43.8%, respectively. Two primers were designed from the BYDV-GPV CP sequence. The cDNA of the CP gene was produced by RT-PCR. Full-length CP cDNA was inserted into plasmids to construct expression plasmids named pPPI1, pPPI2 and pPPI5, based on different promoters. The recombinant plasmids were identified using an α-32P-dATP labelled CP probe and an α-32P-ATP labelled GPV RNA probe, and by sequencing, to confirm that the plasmids carried genuine GPV CP gene cDNA.
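Percentage figures like those quoted in this abstract come from pairwise comparison of aligned sequences. As a minimal sketch, assuming the two sequences have already been aligned to equal length with `-` marking gaps, percent identity can be computed as below. The abstract does not say which convention its authors used, so the choice here (columns containing a gap are excluded from the denominator) is an assumption, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must be pre-aligned and of equal length. Other conventions
    (for example dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

The same function works for nucleotide and deduced amino acid alignments, since it only compares characters column by column.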
Abstract:
The nucleotide sequence of the coat protein gene of barley yellow dwarf virus (BYDV, PAV serotype) was determined, and the amino acid sequence was deduced. The open reading frame, encoding a protein of relative molecular mass (Mr) 22,047, was confirmed as the coat protein gene by comparison with amino acid sequences of tryptic peptides derived from dissociated virions. In addition, a fragment of this gene expressed in Escherichia coli produced a product which was recognized by antibodies prepared against purified BYDV virions. An overlapping reading frame encoding an Mr 17,147 protein is contained completely within the coat protein gene. © 1988.
Resumo:
Neurodegenerative disorders are heterogenous in nature and include a range of ataxias with oculomotor apraxia, which are characterised by a wide variety of neurological and ophthalmological features. This family includes recessive and dominant disorders. A subfamily of autosomal recessive cerebellar ataxias are characterised by defects in the cellular response to DNA damage. These include the well characterised disorders Ataxia-Telangiectasia (A-T) and Ataxia-Telangiectasia Like Disorder (A-TLD) as well as the recently identified diseases Spinocerebellar ataxia with axonal neuropathy Type 1 (SCAN1), Ataxia with Oculomotor Apraxia Type 2 (AOA2), as well as the subject of this thesis, Ataxia with Oculomotor Apraxia Type 1 (AOA1). AOA1 is caused by mutations in the APTX gene, which is located at chromosomal locus 9p13. This gene codes for the 342 amino acid protein Aprataxin. Mutations in APTX cause destabilization of Aprataxin, thus AOA1 is a result of Aprataxin deficiency. Aprataxin has three functional domains, an N-terminal Forkhead Associated (FHA) phosphoprotein interaction domain, a central Histidine Triad (HIT) nucleotide hydrolase domain and a C-terminal C2H2 zinc finger. Aprataxins FHA domain has homology to FHA domain of the DNA repair protein 5’ polynucleotide kinase 3’ phosphatase (PNKP). PNKP interacts with a range of DNA repair proteins via its FHA domain and plays a critical role in processing damaged DNA termini. The presence of this domain with a nucleotide hydrolase domain and a DNA binding motif implicated that Aprataxin may be involved in DNA repair and that AOA1 may be caused by a DNA repair deficit. This was substantiated by the interaction of Aprataxin with proteins involved in the repair of both single and double strand DNA breaks (XRay Cross-Complementing 1, XRCC4 and Poly-ADP Ribose Polymerase-1) and the hypersensitivity of AOA1 patient cell lines to single and double strand break inducing agents. 
At the commencement of this study little was known about the in vitro and in vivo properties of Aprataxin. Initially this study focused on generation of recombinant Aprataxin proteins to facilitate examination of the in vitro properties of Aprataxin. Using recombinant Aprataxin proteins I found that Aprataxin binds to double stranded DNA. Consistent with a role for Aprataxin as a DNA repair enzyme, this binding is not sequence specific. I also report that the HIT domain of Aprataxin hydrolyses adenosine derivatives and interestingly found that this activity is competitively inhibited by DNA. This provided initial evidence that DNA binds to the HIT domain of Aprataxin. The interaction of DNA with the nucleotide hydrolase domain of Aprataxin provided initial evidence that Aprataxin may be a DNA-processing factor. Following these studies, Aprataxin was found to hydrolyse 5’adenylated DNA, which can be generated by unscheduled ligation at DNA breaks with non-standard termini. I found that cell extracts from AOA1 patients do not have DNA-adenylate hydrolase activity indicating that Aprataxin is the only DNA-adenylate hydrolase in mammalian cells. I further characterised this activity by examining the contribution of the zinc finger and FHA domains to DNA-adenylate hydrolysis by the HIT domain. I found that deletion of the zinc finger ablated the activity of the HIT domain against adenylated DNA, indicating that the zinc finger may be required for the formation of a stable enzyme-substrate complex. Deletion of the FHA domain stimulated DNA-adenylate hydrolysis, which indicated that the activity of the HIT domain may be regulated by the FHA domain. Given that the FHA domain is involved in protein-protein interactions I propose that the activity of Aprataxins HIT domain may be regulated by proteins which interact with its FHA domain. 
We examined this possibility by measuring the DNA-adenylate hydrolase activity of extracts from cells deficient for the Aprataxin-interacting DNA repair proteins XRCC1 and PARP-1. XRCC1 deficiency did not affect Aprataxin activity but I found that Aprataxin is destabilized in the absence of PARP-1, resulting in a deficiency of DNA-adenylate hydrolase activity in PARP-1 knockout cells. This implies a critical role for PARP-1 in the stabilization of Aprataxin. Conversely I found that PARP-1 is destabilized in the absence of Aprataxin. PARP-1 is a central player in a number of DNA repair mechanisms and this implies that not only do AOA1 cells lack Aprataxin, they may also have defects in PARP-1 dependant cellular functions. Based on this I identified a defect in a PARP-1 dependant DNA repair mechanism in AOA1 cells. Additionally, I identified elevated levels of oxidized DNA in AOA1 cells, which is indicative of a defect in Base Excision Repair (BER). I attribute this to the reduced level of the BER protein Apurinic Endonuclease 1 (APE1) I identified in Aprataxin deficient cells. This study has identified and characterised multiple DNA repair defects in AOA1 cells, indicating that Aprataxin deficiency has far-reaching cellular consequences. Consistent with the literature, I show that Aprataxin is a nuclear protein with nucleoplasmic and nucleolar distribution. Previous studies have shown that Aprataxin interacts with the nucleolar rRNA processing factor nucleolin and that AOA1 cells appear to have a mild defect in rRNA synthesis. Given the nucleolar localization of Aprataxin I examined the protein-protein interactions of Aprataxin and found that Aprataxin interacts with a number of rRNA transcription and processing factors. Based on this and the nucleolar localization of Aprataxin I proposed that Aprataxin may have an alternative role in the nucleolus. 
I therefore examined the transcriptional activity of Aprataxin-deficient cells using nucleotide analogue incorporation. I found that AOA1 cells do not display a defect in basal levels of RNA synthesis; however, they display defective transcriptional responses to DNA damage. In summary, this thesis demonstrates that Aprataxin is a DNA repair enzyme responsible for the repair of adenylated DNA termini and that it is required for stabilization of at least two other DNA repair proteins. Thus, not only do AOA1 cells have no Aprataxin protein or activity, they have additional deficiencies in DNA repair mechanisms dependent on poly(ADP-ribose) polymerase-1 (PARP-1) and Apurinic Endonuclease 1 (APE1). I additionally demonstrate DNA damage-inducible transcriptional defects in AOA1 cells, indicating that Aprataxin deficiency confers a broad range of cellular defects and highlighting the complexity of the cellular response to DNA damage. My detailed characterization of the cellular consequences of Aprataxin deficiency provides an important contribution to our understanding of interlinking DNA repair processes.
Abstract:
Over the past decade, plants have been used as expression hosts for the production of pharmaceutically important and commercially valuable proteins. Plants offer many advantages over other expression systems, such as lower production costs, rapid scale-up of production, post-translational modifications similar to those of animals and a low likelihood of contamination with animal pathogens, microbial toxins or oncogenic sequences. However, improving recombinant protein yield remains one of the greatest challenges to molecular farming. In-Plant Activation (InPAct) is a newly developed technology that offers activatable and high-level expression of heterologous proteins in plants. InPAct vectors contain the geminivirus cis elements essential for rolling circle replication (RCR) and are arranged such that the gene of interest is only expressed in the presence of the cognate viral replication-associated protein (Rep). The expression of Rep in planta may be controlled by a tissue-specific, developmentally regulated or chemically inducible promoter such that heterologous protein accumulation can be spatially and temporally controlled. One of the challenges for the successful exploitation of InPAct technology is the control of Rep expression, as even very low levels of this protein can reduce transformation efficiency and cause abnormal phenotypes and premature activation of the InPAct vector in regenerated plants. Tight regulation of transgene expression is also essential when expressing cytotoxic products. Unfortunately, many tissue-specific and inducible promoters are unsuitable for controlling expression of Rep because they retain low-level basal activity in the absence of inducer or in tissues other than the target tissue. This PhD aimed to control Rep activity through the production of single chain variable fragments (scFvs) specific to motif III of Tobacco yellow dwarf virus (TbYDV) Rep. 
Due to the important role played by the conserved motif III in RCR, it was postulated that such scFvs could be used to neutralise the activity of the low amount of Rep expressed from a "leaky" inducible promoter, thus preventing activation of the TbYDV-based InPAct vector until intentional induction. Such scFvs could also offer the potential to confer partial or complete resistance to TbYDV, and possibly heterologous viruses, as motif III is conserved between geminiviruses. Studies were first undertaken to determine the levels of TbYDV Rep and TbYDV replication-associated protein A (RepA) required for optimal transgene expression from a TbYDV-based InPAct vector. Transient assays in a non-regenerable Nicotiana tabacum (NT-1) cell line were undertaken using a TbYDV-based InPAct vector containing the uidA reporter gene (encoding GUS) in combination with TbYDV Rep and RepA under the control of promoters with high (CaMV 35S) or low (Banana bunchy top virus DNA-R, BT1) activity. The replication enhancer protein of Tomato leaf curl begomovirus (ToLCV), REn, was also used in some co-bombardment experiments to examine whether RepA could be substituted by a replication enhancer from another geminivirus genus. GUS expression was assessed both quantitatively and qualitatively by fluorometric and histochemical assays, respectively. GUS expression from the TbYDV-based InPAct vector was found to be greater when Rep was expected to be expressed at low levels (BT1 promoter) rather than high levels (35S promoter). GUS expression was further enhanced when Rep and RepA were co-bombarded with a low ratio of Rep to RepA. Substituting TbYDV RepA with ToLCV REn also enhanced GUS expression but, more importantly, the highest GUS expression was observed when cells were co-transformed with expression vectors directing low levels of Rep and high levels of RepA, irrespective of the level of REn. In this case, GUS expression was approximately 74-fold higher than that from a non-replicating vector. 
The use of different terminators, namely the CaMV 35S and Nos terminators, in InPAct vectors was found to influence GUS expression. In the presence of Rep, GUS expression was greater using pInPActGUS-Nos rather than pInPActGUS-35S. The only instance of GUS expression being greater from vectors containing the 35S terminator was when comparing expression from cells transformed with Rep-, RepA- and REn-expressing vectors and either of the non-replicating vectors, p35SGS-Nos or p35SGS-35S. This difference was most likely caused by an interaction of viral replication proteins with each other and the terminators. These results indicated that (i) the level of replication-associated proteins is critical to high transgene expression, (ii) the choice of terminator within the InPAct vector may affect expression levels and (iii) very low levels of Rep can activate InPAct vectors; hence, controlling its activity is critical. Prior to generating recombinant scFvs, a recombinant TbYDV Rep was produced in E. coli to act as a control to enable screening for Rep-specific antibodies. A bacterial expression vector was constructed to express recombinant TbYDV Rep with an N-terminal His-tag (N-His-Rep). Despite investigating several purification techniques, including Ni-NTA, anion exchange, hydrophobic interaction and size exclusion chromatography, N-His-Rep could only be partially purified using a Ni-NTA column under native conditions. Although it was not certain that this recombinant N-His-Rep had the same conformation as the native TbYDV Rep and was functional, results from an electromobility shift assay (EMSA) showed that N-His-Rep was able to interact with the TbYDV LIR and was, therefore, possibly functional. Two hybridoma cell lines from mice immunised with a synthetic peptide containing the TbYDV Rep motif III amino acid sequence were generated by GenScript (USA). 
Monoclonal antibodies secreted by the two hybridoma cell lines were first screened against denatured N-His-Rep in Western analysis. After demonstrating their ability to bind N-His-Rep, two scFvs (scFv1 and scFv2) were generated using a PCR-based approach. Whereas the variable heavy chain (VH) from both cell lines could be amplified, only the variable light chain (VL) from cell line 1 was amplified. As a result, scFv1 contained VH and VL from cell line 1, whereas scFv2 contained VH from cell line 2 and VL from cell line 1. Both scFvs were first expressed in E. coli in order to evaluate their affinity to the recombinant TbYDV N-His-Rep. The preliminary results demonstrated that both scFvs were able to bind to the denatured N-His-Rep. However, EMSAs revealed that only scFv2 was able to bind to native N-His-Rep and prevent it from interacting with the TbYDV LIR. Each scFv was cloned into plant expression vectors and co-bombarded into NT-1 cells with the TbYDV-based InPAct GUS expression vector and pBT1-Rep to examine whether the scFvs could prevent Rep from mediating RCR. Although it was expected that the addition of the scFvs would result in decreased GUS expression, GUS expression was found to increase slightly. This increase was even more pronounced when the scFvs were targeted to the cell nucleus by the inclusion of the Simian virus 40 (SV40) large T antigen nuclear localisation signal (NLS). It was postulated that the scFvs were binding to a proportion of Rep, leaving a small amount available to mediate RCR. The outcomes of this project provide evidence that very high levels of recombinant protein can theoretically be expressed using InPAct vectors with judicious selection and control of viral replication proteins. However, whether the scFvs generated in this project have sufficient affinity for TbYDV Rep to prevent its activity in a stably transformed plant remains unknown. 
Other scFvs with different combinations of VH and VL may have greater affinity for TbYDV Rep. Such scFvs, when expressed at high levels in planta, might also confer resistance to TbYDV and possibly heterologous geminiviruses.
Abstract:
A wide range of screening strategies have been employed to isolate antibodies and other proteins with specific attributes, including binding affinity, specificity, stability and improved expression. However, there remains no high-throughput system to screen for target-binding proteins in a mammalian, intracellular environment. Such a system would allow binding reagents to be isolated against intracellular clinical targets such as cell signalling proteins associated with tumour formation (p53, ras, cyclin E), proteins associated with neurodegenerative disorders (huntingtin, β-amyloid precursor protein), and various proteins crucial to viral replication (e.g. HIV-1 proteins such as Tat, Rev and Vif-1), which are difficult to screen by phage, ribosome or cell-surface display. This study used the β-lactamase protein complementation assay (PCA) as the display and selection component of a system for screening a protein library in the cytoplasm of HEK 293T cells. The colicin E7 (ColE7) and Immunity protein 7 (Imm7) *Escherichia coli* proteins were used as model interaction partners for developing the system. These proteins drove effective β-lactamase complementation, resulting in a signal-to-noise ratio (9:1 to 13:1) comparable to that of other β-lactamase PCAs described in the literature. The model Imm7-ColE7 interaction was then used to validate protocols for library screening. Single positive cells that harboured the Imm7 and ColE7 binding partners were identified and isolated using flow cytometric cell sorting in combination with the fluorescent β-lactamase substrate, CCF2/AM. A single-cell PCR was then used to amplify the Imm7 coding sequence directly from each sorted cell. Once validated, the screening system was used to screen a protein library based on the Imm7 scaffold against a proof-of-principle target. 
The wild-type Imm7 sequence, as well as mutants with wild-type residues in the ColE7-binding loop, were enriched from the library after a single round of selection, which is consistent with other eukaryotic screening systems such as yeast and mammalian cell-surface display. In summary, this thesis describes a new technology for screening protein libraries in a mammalian, intracellular environment. This system has the potential to complement existing screening technologies by allowing access to intracellular proteins and expanding the range of targets available to the pharmaceutical industry.