991 results for sequence database
Abstract:
This paper reports the complete mitochondrial genome of Acraea issoria (Lepidoptera: Nymphalidae: Heliconiinae: Acraeini), a circular molecule of 15,245 bp. In A. issoria, genes are arranged in the same order and orientation as in the completely sequenced mitochondrial genomes of other lepidopteran species, except for the presence of an extra copy of tRNAIle(AUR)b in the control region. All protein-coding genes of the A. issoria mitogenome start with a typical ATN codon and terminate in the common stop codon TAA, except that the COI gene uses TTG as its initial codon and terminates in a single T residue. All tRNA genes possess the typical cloverleaf secondary structure except for tRNASer(AGN), which has a simple loop and lacks the DHU stem. The sequence, organization and other features of this mitochondrial genome, including nucleotide composition and codon usage, are also reported and compared with those of other sequenced lepidopteran mitochondrial genomes. Some short microsatellite-like repeat regions (e.g., (TA)9, polyA and polyT) are scattered in the control region; however, the conspicuous macro-repeat units commonly found in other insect species are absent.
Abstract:
Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation.
Abstract:
Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the most well characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read-length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential for this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended spectrum β-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant and are due to misassembly. The availability of a high quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.
Abstract:
Autotransporter (AT) proteins are found in all Escherichia coli pathotypes and are often associated with virulence. In this study we took advantage of the large number of available E. coli genome sequences to perform an in-depth bioinformatic analysis of AT-encoding genes. Twenty-eight E. coli genome sequences were probed using an iterative approach, which revealed a total of 215 AT-encoding sequences that represented three major groups of distinct domain architecture: (i) serine protease AT proteins, (ii) trimeric AT adhesins and (iii) AIDA-I-type AT proteins. A number of subgroups were identified within each broad category, and most subgroups contained at least one characterized AT protein; however, seven subgroups contained no previously described proteins. The AIDA-I-type AT proteins represented the largest and most diverse group, with up to 16 subgroups identified from sequence-based comparisons. Nine of the AIDA-I-type AT protein subgroups contained at least one protein that possessed functional properties associated with aggregation and/or biofilm formation, suggesting a high degree of redundancy for this phenotype. The Ag43, YfaL/EhaC, EhaB/UpaC and UpaG subgroups were found in nearly all E. coli strains. Among the remaining subgroups, there was a tendency for AT proteins to be associated with individual E. coli pathotypes, suggesting that they contribute to tissue tropism or symptoms specific to different disease outcomes.
Abstract:
Background Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results We present STAR, Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination for each position in an amino acid sequence. Example predictions, contrasted with those of alternative tools, illustrate STAR's utility in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.
Abstract:
We undertook analyses of mitochondrial DNA gene sequences and echolocation calls to resolve phylogenetic relationships among the related bat taxa Rhinolophus pusillus (sampled across China), R. monoceros (Taiwan), R. cornutus (main islands of Japan), and R. c. pumilus (Okinawa, Japan). Phylogenetic trees were constructed and genetic divergence analyses performed by combining new complete mitochondrial cytochrome-b gene sequences and partial mitochondrial control region sequences with published sequences. Our work showed that these 4 taxa formed monophyletic groups in the phylogenetic tree. However, low levels of sequence divergence among the taxa, together with similarities in body size and overlapping echolocation call frequencies, point to a lack of taxonomic distinctiveness. We therefore suggest that these taxa are better considered as geographical subspecies rather than distinct species, although this should not diminish the conservation importance of these island populations, which are important evolutionarily significant units. Based on our findings, we suggest that the similarities in body size and echolocation call frequency in these rhinolophids result from their recent common ancestry, whereas similarities in body size and call frequency with R. hipposideros of Europe are the result of convergent evolution.
Abstract:
Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research over the last half century, protein adsorption cannot be predicted with engineering-level, design-orientated accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives the published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity and range of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable, the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided into two separate subsets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the predictions of the thickness and the surface tension of the protein-covered layers are of particular relevance to the design of microfluidic devices.
Abstract:
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
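The idea of turning shared sub-sequence properties into a distance matrix can be illustrated with the basic D2 statistic, which sums, over all words of length k, the product of the word's counts in the two sequences. The sketch below is a minimal illustration, not the procedure used in the study: the function names are hypothetical, and the conversion of D2 into a distance (one minus a cosine-style normalisation) is an assumption chosen for simplicity.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2 statistic: sum over shared k-mers of the product of counts."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items()
               if word in counts_b)


def d2_distance(seq_a, seq_b, k):
    """Illustrative distance (an assumption, not the study's definition):
    1 - D2(a, b) / sqrt(D2(a, a) * D2(b, b))."""
    norm = math.sqrt(d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k))
    if norm == 0:
        raise ValueError("sequences must be at least k characters long")
    return 1.0 - d2(seq_a, seq_b, k) / norm


def distance_matrix(seqs, k):
    """Pairwise distance matrix as a list of lists, with no alignment step."""
    return [[d2_distance(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG"]
    for row in distance_matrix(toy, k=3):
        print(["%.3f" % value for value in row])
```

A matrix of this kind could then be passed to a distance-based tree-building method such as neighbour joining; the choice of k and of normalisation would need to be tuned for real data.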
Abstract:
Acinetobacter baumannii isolate A1 was recovered in the United Kingdom in 1982 and belongs to global clone 1 (GC1). Here, we present its complete 3.91-Mbp genome sequence, generated via a combination of short-read sequencing (Illumina), long-read sequencing (PacBio), and manual finishing.
Abstract:
Background Globally, over 800 000 children under five die each year from infectious diseases caused by Streptococcus pneumoniae. To understand genetic relatedness between isolates, study transmission routes, assess the impact of human interventions (e.g. vaccines), and determine infection sources, genotyping methods are required. The ‘gold standard’ genotyping method, Multi-Locus Sequence Typing (MLST), is useful for long-term and global studies. Another genotyping method, Multi-Locus Variable Number of Tandem Repeat Analysis (MLVA), has emerged as a more discriminatory, inexpensive and faster technique; however, there is no universally accepted method and it is currently suitable only for short-term and localised epidemiology studies. Currently Australia has no national MLST database, nor has it adopted any MLVA method for short-term or localised studies. This study aims to improve S. pneumoniae genotyping methods by modifying the existing MLVA techniques to be more discriminatory, faster, cheaper and technically less demanding than previously published MLVA methods and MLST. Methods Four different MLVA protocols, including a modified method, were applied to 317 isolates of serotyped invasive S. pneumoniae isolated from sterile body sites of Queensland children under 15 years from 2007–2012. MLST was applied to 202 isolates for comparison. Results The modified MLVA4 is significantly more discriminatory than the ‘gold standard’ MLST method. MLVA4 has similar discrimination to the other MLVA techniques in this study. Failures to amplify particular loci, seen in previous MLVA methods, were minimised in MLVA4. Failure to amplify BOX-13 and Spneu19 was found to be serotype specific. Conclusion We have modified a highly discriminatory MLVA technique for genotyping Queensland invasive S. pneumoniae. MLVA4 has the ability to enhance our understanding of pneumococcal epidemiology and the changing genetics of the pneumococcus in localised and short-term studies.
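The abstract does not say how discriminatory power was measured. One widely used measure for typing methods is the Hunter-Gaston discriminatory index (a form of Simpson's index of diversity), which gives the probability that two isolates drawn at random without replacement receive different types. The sketch below assumes that index purely for illustration; the function name and the toy type labels are hypothetical and are not data from the study.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates assigned to type j and N is the
    total number of isolates."""
    n_total = len(types)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types)
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


if __name__ == "__main__":
    # Hypothetical type assignments for the same six isolates under two schemes.
    coarse_scheme = ["ST1", "ST1", "ST1", "ST2", "ST2", "ST3"]
    fine_scheme = ["A", "B", "C", "D", "D", "E"]
    print(round(discriminatory_index(coarse_scheme), 3))
    print(round(discriminatory_index(fine_scheme), 3))
```

A value closer to 1 means the scheme separates more pairs of isolates; comparing two schemes on the same isolate set is the usual way such an index is applied.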
Abstract:
Public buildings and large infrastructure are typically monitored by tens or hundreds of cameras, all capturing different physical spaces and observing different types of interactions and behaviours. However, to date, in large part due to limited data availability, crowd monitoring and operational surveillance research has focused on single-camera scenarios, which are not representative of real-world applications. In this paper we present a new, publicly available database for large-scale crowd surveillance. Footage from 12 cameras for a full work day, covering the main floor of a busy university campus building, including an internal and external foyer, elevator foyers, and the main external approach, is provided, alongside annotation for crowd counting (single or multi-camera) and pedestrian flow analysis for 10 and 6 sites respectively. We describe how this large dataset can be used to perform distributed monitoring of building utilisation, and demonstrate the potential of this dataset to understand and learn the relationship between different areas of a building.
Abstract:
Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.