156 results for sequence similarity searches
Abstract:
Originally developed in bioinformatics, sequence analysis is increasingly used in the social sciences to study life-course processes. The methodology generally consists of computing dissimilarities between trajectories and, if typologies are sought, clustering the trajectories according to those dissimilarities. The choice of an appropriate dissimilarity measure is a major issue in the sequence analysis of life sequences. Several dissimilarities are available in the literature, but none of them has emerged as the undisputed choice. In this paper, instead of deciding upon one dissimilarity measure, we propose to use an optimal convex combination of different dissimilarities. The optimality is automatically determined by the clustering procedure and is defined with respect to the within-class variance.
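A minimal sketch of the two ingredients named in this abstract, a convex combination of dissimilarity matrices and a within-class dispersion computed from pairwise dissimilarities, might look as follows. The function names, the toy matrices, and the dispersion formula (sum over clusters of squared pairwise dissimilarities divided by twice the cluster size) are illustrative assumptions; the abstract does not give the authors' exact criterion or how the weights are optimised, and any normalisation of the input matrices is left out.

```python
def combine_dissimilarities(matrices, weights):
    """Return the weighted sum of square dissimilarity matrices.

    Assumption: weights are non-negative and sum to 1 (a convex combination).
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


def within_class_dispersion(d, labels):
    """Sum over clusters of (1 / (2 * n_c)) * sum_{i, j in c} d[i][j] ** 2.

    Assumption: this pairwise form stands in for the within-class variance.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    total = 0.0
    for members in clusters.values():
        squared = sum(d[i][j] ** 2 for i in members for j in members)
        total += squared / (2 * len(members))
    return total


# Hypothetical toy data: two dissimilarity measures over three trajectories.
d_a = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
d_b = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
combined = combine_dissimilarities([d_a, d_b], [0.5, 0.5])
dispersion = within_class_dispersion(combined, [0, 0, 1])
```

In a full procedure the weights would be chosen jointly with the partition; the sketch only evaluates one fixed choice of weights and labels.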
Abstract:
Reliability of the performance of biometric identity verification systems remains a significant challenge. Individual biometric samples of the same person (identity class) are not identical at each presentation, and performance degradation arises from intra-class variability and inter-class similarity. These limitations lead to false accepts and false rejects that are dependent. It is therefore difficult to reduce the rate of one type of error without increasing the other. The focus of this dissertation is to investigate a method based on classifier fusion techniques to better control the trade-off between the verification errors, using text-dependent speaker verification as the test platform. A sequential classifier fusion architecture that integrates multi-instance and multi-sample fusion schemes is proposed. This fusion method enables a controlled trade-off between false alarms and false rejects. For statistically independent classifier decisions, analytical expressions for each type of verification error are derived using base classifier performances. As this assumption may not always be valid, these expressions are modified to incorporate the correlation between statistically dependent decisions from clients and impostors. The architecture is empirically evaluated on text-dependent speaker verification, using Hidden Markov Model based digit-dependent speaker models in each stage, with multiple attempts for each digit utterance. The trade-off between the verification errors is controlled using two parameters, the number of decision stages (instances) and the number of attempts at each decision stage (samples), fine-tuned on an evaluation/tuning set. The derived expressions for the error estimates are statistically validated on test data. The performance of the sequential method is further demonstrated to depend on the order of the combination of digits (instances) and the nature of repetitive attempts (samples).
The false rejection and false acceptance rates for the proposed fusion are estimated using the base classifier performances, the variance in correlation between classifier decisions, and the sequence of classifiers with favourable dependence selected using the 'Sequential Error Ratio' criterion. The error rates are better estimated by incorporating user-dependent information (such as speaker-dependent thresholds and speaker-specific digit combinations) and class-dependent information (such as client-impostor-dependent favourable combinations and class-error-based threshold estimation). The proposed architecture is desirable in most speaker verification applications, such as remote authentication and telephone and internet shopping. The tuning of parameters (the number of instances and samples) serves both the security and user convenience requirements of speaker-specific verification. The architecture investigated here is applicable to verification using other biometric modalities such as handwriting, fingerprints and keystrokes.
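To illustrate how the two parameters can trade one error against the other under the independence assumption, the sketch below uses one common decision rule, which is an assumption here and not taken from the dissertation: a stage accepts if any of its attempts is accepted, and the claim is accepted only if every stage accepts. The function name and the example rates are hypothetical, and every base classifier is assumed to have the same false rejection and false acceptance rates.

```python
def sequential_fusion_errors(frr, far, attempts, stages):
    """Return (overall_frr, overall_far) for independent decisions.

    Assumed rule (not from the source): OR over attempts within a stage,
    AND over stages. frr and far are per-attempt base classifier rates.
    """
    # A stage rejects a client only if every attempt is rejected.
    stage_frr = frr ** attempts
    # A stage accepts an impostor if at least one attempt is accepted.
    stage_far = 1.0 - (1.0 - far) ** attempts
    # A client is rejected overall if any stage rejects.
    overall_frr = 1.0 - (1.0 - stage_frr) ** stages
    # An impostor is accepted overall only if every stage accepts.
    overall_far = stage_far ** stages
    return overall_frr, overall_far


# Hypothetical base rates: 10% false rejects, 5% false accepts.
fusion_frr, fusion_far = sequential_fusion_errors(0.10, 0.05, attempts=2, stages=3)
```

Under this rule, more attempts per stage lower false rejects at the cost of false accepts, while more stages do the opposite. Correlated decisions, which the abstract says are handled by modified expressions, are not modelled here.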
Abstract:
Currently two different fatigue tests are being used to investigate the fatigue susceptibility of roof claddings in the cyclone prone areas of Australia. In order to resolve this issue a detailed investigation was conducted to study the nature of cyclonic wind forces using wind tunnel testing and computer modelling and the fatigue behaviour of metal roof claddings using structural testing. This led to the development of an accurate, but complicated loading matrix for a design cyclone. Based on this matrix, a simplified low-high-low loading sequence has been developed for the testing of roofing systems in cyclone prone areas. This paper first reviews the currently used fatigue loading sequences, then presents details of the cyclonic wind loading matrix and finally the development of the new simplified loading sequence. This simplified sequence should become the only suitable test for most of the cyclone prone areas of Australia covered by Region C which suffers from Category 4 cyclones. For Region D which suffers from Category 5 cyclones, the same loading sequence with 20% increased cycles has been recommended. An experimental programme to validate the new simplified loading sequence has been proposed.
Abstract:
Tobacco plants were transformed with a chimeric transgene comprising sequences encoding β-glucuronidase (GUS) and the satellite RNA (satRNA) of cereal yellow dwarf luteovirus. When transgenic plants were infected with potato leafroll luteovirus (PLRV), which replicated the transgene-derived satRNA to a high level, the satellite sequence of the GUS:Sat transgene became densely methylated. Within the satellite region, all 86 cytosines in the upper strand and 73 of the 75 cytosines in the lower strand were either partially or fully methylated. In contrast, very low levels of DNA methylation were detected in the satellite sequence of the transgene in uninfected plants and in the flanking nonsatellite sequences in both infected and uninfected plants. Substantial amounts of truncated GUS:Sat RNA accumulated in the satRNA-replicating plants, and most of the molecules terminated at nucleotides within the first 60 bp of the satellite sequence. Whereas this RNA truncation was associated with high levels of satRNA replication, it appeared to be independent of the levels of DNA methylation in the satellite sequence, suggesting that it is not caused by methylation. All the sequenced GUS:Sat DNA molecules were hypermethylated in plants with replicating satRNA despite the phloem restriction of the helper PLRV. Also, small, sense and antisense ∼22 nt RNAs, derived from the satRNA, were associated with the replicating satellite. These results suggest that the sequence-specific DNA methylation spread into cells in which no satRNA replication occurred and that this was mediated by the spread of unamplified satRNA and/or its associated 22 nt RNA molecules.
Abstract:
Rubus yellow net virus (RYNV) was cloned and sequenced from a red raspberry (Rubus idaeus L.) plant exhibiting symptoms of mosaic and mottling in the leaves. Its genomic sequence indicates that it is a distinct member of the genus Badnavirus, with 7932 bp and seven ORFs, the first three corresponding in size and location to the ORFs found in the type member Commelina yellow mottle virus. Bioinformatic analysis of the genomic sequence detected several features including nucleic acid binding motifs, multiple zinc finger-like sequences and domains associated with cellular signaling. Subsequent sequencing of the small RNAs (sRNAs) from RYNV-infected R. idaeus leaf tissue was used to determine any RYNV sequences targeted by RNA silencing and identified abundant virus-derived small RNAs (vsRNAs). The majority of the vsRNAs were 22 nt in length. We observed a highly uneven genome-wide distribution of vsRNAs with strong clustering to small defined regions distributed over both strands of the RYNV genome. Together, our data show that sequences of the aphid-transmitted pararetrovirus RYNV are targeted in red raspberry by the interfering RNA pathway, a predominant antiviral defense mechanism in plants. © 2013.
Abstract:
Potato leafroll virus (PLRV) is a positive-strand RNA virus that generates subgenomic RNAs (sgRNA) for expression of 3' proximal genes. Small RNA (sRNA) sequencing and mapping of the PLRV-derived sRNAs revealed coverage of the entire viral genome with the exception of four distinctive gaps. Remarkably, these gaps mapped to areas of the PLRV genome with extensive secondary structures, such as the internal ribosome entry site and the 5' transcriptional start sites of sgRNA1 and sgRNA2. The last gap mapped to ~500 nt from the 3' terminus of the PLRV genome and suggested the possible presence of an additional sgRNA for PLRV. Quantitative real-time PCR and northern blot analysis confirmed the expression of sgRNA3, and subsequent analyses placed its 5' transcriptional start site at position 5347 of the PLRV genome. A regulatory role is proposed for the PLRV sgRNA3, as it encodes an RNA-binding protein with specificity to the 5' end of PLRV genomic RNA. © 2013.
Abstract:
The complete nucleotide sequence of Subterranean clover mottle virus (SCMoV) genomic RNA has been determined. The SCMoV genome is 4,258 nucleotides in length. It shares the most nucleotide and amino acid sequence identity with the genome of Lucerne transient streak virus (LTSV). SCMoV RNA encodes four overlapping open reading frames and has a genome organisation similar to that of Cocksfoot mottle virus (CfMV). ORF1 and ORF4 are predicted to encode single proteins. ORF2 is predicted to encode two proteins that are derived from a -1 translational frameshift between two overlapping reading frames (ORF2a and ORF2b). A search of amino acid databases did not find a significant match for ORF1, and the function of this protein remains unclear. ORF2a contains a motif typical of chymotrypsin-like serine proteases, and ORF2b has motifs characteristically present in positive-stranded RNA-dependent RNA polymerases. ORF4 is likely to be expressed from a subgenomic RNA and encodes the viral coat protein. The ORF2a/ORF2b overlapping gene expression strategy used by SCMoV and CfMV is similar to that of the poleroviruses and differs from that of other published sobemoviruses. These results suggest that the sobemoviruses could now be divided into two distinct subgroups: those that express the RNA-dependent RNA polymerase from a single, in-frame polyprotein, and those that express it via a -1 translational frameshifting mechanism.
Abstract:
Barley yellow dwarf luteovirus-GPV (BYDV-GPV) is a common problem in Chinese wheat crops but is unrecorded elsewhere. A defining characteristic of GPV is its capacity to be transmitted efficiently by both Schizaphis graminum and Rhopalosiphum padi. This dual aphid species transmission contrasts with that of BYDV-RPV and BYDV-SGV, globally distributed viruses, which are efficiently transmitted only by Rhopalosiphum padi and Schizaphis graminum respectively. The viral RNA sequences encoding the coat protein (22K) gene, the movement protein (17K) gene, the region surrounding the conserved GDD motif of the polymerase gene and the intergenic sequences between these genes were determined for GPV and an Australian isolate of BYDV-RPV (RPVa). In all three genes, the sequences of GPV and RPVa were more similar to those of an American isolate of BYDV-RPV (RPVu) than to any other luteovirus for which data are available. RPVa and RPVu were very similar, especially their coat proteins, which had 97% identity at the amino acid level. The coat protein of GPV had 76% and 78% amino acid identity with RPVa and RPVu respectively. The data suggest that RPVu and RPVa are correctly named as strains of the same serotype and that GPV is sufficiently different from either RPV strain to be considered a distinct BYDV type. The coat protein and movement protein genes of GPV are very dissimilar to those of SGV. The polymerase sequences of RPVu, RPVa and GPV show close affinities with those of the sobemo-like luteoviruses and little similarity with those of the carmo-like luteoviruses. The sequences of the coat proteins, movement proteins and the polymerase segments of BYDV serotypes, other than RPV and GPV, form a cluster that is separate from their counterpart sequences from dicot-infecting luteoviruses. The RPV and GPV isolates consistently fall within a dicot-infecting cluster. This suggests that RPV and GPV evolved from within this group of viruses.
Since these other viruses all infect dicots it seems likely that their common ancestor infected a dicot and that RPV and GPV evolved from a virus that switched hosts from a dicot to a monocot.
Abstract:
The genomes of an Australian and a Canadian isolate of potato leafroll virus have been cloned and sequenced. The sequences of both isolates are similar (about 93%), but the Canadian isolate (PLRV-C) is more closely related (about 98% identity) to a Scottish (PLRV-S) and a Dutch isolate (PLRV-N) than to the Australian isolate (PLRV-A). The 5'-terminal 18 nucleotide residues of PLRV-C, PLRV-A, PLRV-N and beet western yellows virus have 17 residues in common. In contrast, PLRV-S shows no obvious similarity in this region. PLRV-A and PLRV-C genomic sequences have localized regions of marked diversity, in particular a 600 nucleotide residue sequence in the polymerase gene. These data provide a world-wide perspective on the molecular biology of PLRV strains and their comparison with other luteoviruses and related RNA plant viruses suggests that there are two major subgroups in the plant luteoviruses.
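Identity figures such as those quoted above are typically computed from an alignment. As a rough illustration only, the sketch below computes percent identity for two sequences that are assumed to be already aligned to equal length, skipping any column that contains a gap. The function name, the gap symbol, the gap-handling convention, and the example strings are assumptions; the abstract does not say how its percentages were calculated.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length; columns with a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from any PLRV isolate.
identity = percent_identity("ACGT-A", "ACCTTA")
```

Other conventions (counting gaps as mismatches, or dividing by the shorter sequence length) give different percentages for the same alignment, so reported values are only comparable when the convention is known.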
Abstract:
The nucleotide sequence of the coat protein gene of barley yellow dwarf virus (BYDV, PAV serotype) was determined, and the amino acid sequence was deduced. The open reading frame, encoding a protein of relative molecular mass (Mr) 22,047, was confirmed as the coat protein gene by comparison with amino acid sequences of tryptic peptides derived from dissociated virions. In addition, a fragment of this gene expressed in Escherichia coli produced a product which was recognized by antibodies prepared against purified BYDV virions. An overlapping reading frame encoding an Mr 17,147 protein is contained completely within the coat protein gene. © 1988.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
SIC and DRS are related proteins present in only four of the more than 200 Streptococcus pyogenes emm-types. These proteins inhibit complement-mediated lysis and/or the activity of certain antimicrobial peptides. A gene encoding a homologue of these proteins, herein called DrsG, has been identified in the related bacterium Streptococcus dysgalactiae subsp. equisimilis (SDSE). Here we show that geographically dispersed isolates representing 14 of 50 emm-types examined possess variants of drsG. However, not all isolates within the drsG-positive emm-types possess the gene. Sequence comparisons also reveal a high degree of conservation in different SDSE emm-types. To examine the biological activity of DrsG, recombinant versions of two major DrsG variants, DrsGS and DrsGL, were expressed and purified. Western blot analysis using antisera raised to these proteins demonstrated both variants to be expressed and secreted into culture supernatant. Unlike SIC, but similar to DRS, DrsG does not inhibit complement-mediated lysis. However, like both SIC and DRS, DrsG is a ligand of the cathelicidin LL-37 and is inhibitory to its bactericidal activity in in vitro assays. The greatest similarity between DrsG and DRS/SIC is found in the signal sequence at the amino terminus and proline-rich domains in the C-terminal half of the protein. Conservation of prolines in this latter region also suggests these residues are important in the biology of this family of proteins. This is the first report demonstrating the activity of an AMP inhibitory protein in SDSE. These results also suggest that inhibition of AMP activity is the primary function of this family of proteins. The acquisition of complement inhibitory activity of SIC may reflect its continuing evolution.
Abstract:
Six consecutively hatched cohorts and one cohort of pre-hatch eggs of farmed barramundi (Lates calcarifer) from south Australia were examined for Chlamydia-like organisms associated with epitheliocystis. To identify and characterise the bacteria, 59 gill samples and three pre-hatch egg samples were processed for histology, in situ hybridisation and 16S rRNA amplification, sequencing and comprehensive phylogenetic analysis. Cases of epitheliocystis were observed microscopically and characterised by membrane-enclosed basophilic cysts filled with a granular material that caused hypertrophy of the epithelial cells. In situ hybridisation with a Chlamydiales-specific probe led to specific labelling of the epitheliocystis inclusions within the gill epithelium. Two distinct but closely related 16S rRNA chlamydial sequences were amplified from gill DNA across the seven cohorts, including from pre-hatch eggs. These genotype sequences were found to be novel, sharing 97.1-97.5% similarity with the next closest 16S rRNA sequence, Ca. Similichlamydia latridicola, from Australian striped trumpeter. Comprehensive phylogenetic analysis of these genotype sequences against representative members of the Chlamydiales order and against other epitheliocystis agents revealed these Chlamydia-like organisms to be novel and taxonomically placed them within the recently proposed genus Ca. Similichlamydia. Following Fredricks and Relman's molecular postulates and based on these observations, we propose the epitheliocystis agents of barramundi to be known as "Candidatus Similichlamydia laticola" (sp. nov.).
Abstract:
Chlamydia pecorum is a significant pathogen of domestic livestock and wildlife. We have developed a C. pecorum-specific multilocus sequence analysis (MLSA) scheme to examine the genetic diversity of and relationships between Australian sheep, cattle, and koala isolates. An MLSA of seven concatenated housekeeping gene fragments was performed using 35 isolates, including 18 livestock isolates (11 Australian sheep, one Australian cow, and six U.S. livestock isolates) and 17 Australian koala isolates. Phylogenetic analyses showed that the koala isolates formed a distinct clade, with limited clustering with C. pecorum isolates from Australian sheep. We identified 11 MLSA sequence types (STs) among Australian C. pecorum isolates, 10 of them novel, with koala and sheep sharing at least one identical ST (designated ST2013Aa). ST23, previously identified in global C. pecorum livestock isolates, was observed here in a subset of Australian bovine and sheep isolates. Most notably, ST23 was found in association with multiple disease states and hosts, providing insights into the transmission of this pathogen between livestock hosts. The complexity of the epidemiology of this disease was further highlighted by the observation that at least two sheep were each infected with different C. pecorum STs in the eyes and gastrointestinal tract. We have demonstrated the feasibility of our MLSA scheme for understanding the host relationship that exists between Australian C. pecorum strains and provide the first molecular epidemiological data on infections in Australian livestock hosts.