888 resultados para Exome sequencing
Resumo:
miRDeep and its varieties are widely used to quantify known and novel micro RNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled off miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Resumo:
This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans,with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
Resumo:
Next Generation Sequencing (NGS) has revolutionised molec- ular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
Resumo:
We isolated and characterized 21 microsatellite loci in the vulnerable and iconic Australian lungfish, Neoceratodus forsteri. Loci were screened across eight individuals from the Burnett River and 40 individuals from the Pine River. Genetic diversity was low with between one and six alleles per locus within populations and a maximum expected heterozygosity of 0.774. These loci will now be available to assess effective population sizes and genetic structure in N. forsteri across its natural range in South East Queensland, Australia.
Resumo:
The high risk of metabolic disease traits in Polynesians may be partly explained by elevated prevalence of genetic variants involved in energy metabolism. The genetics of Polynesian populations has been shaped by island hoping migration events which have possibly favoured thrifty genes. The aim of this study was to sequence the mitochondrial genome in a group of Maoris in an effort to characterise genome variation in this Polynesian population for use in future disease association studies. We sequenced the complete mitochondrial genomes of 20 non-admixed Maori subjects using Affymetrix technology. DNA diversity analyses showed the Maori group exhibited reduced mitochondrial genome diversity compared to other worldwide populations, which is consistent with historical bottleneck and founder effects. Global phylogenetic analysis positioned these Maori subjects specifically within mitochondrial haplogroup - B4a1a1. Interestingly, we identified several novel variants that collectively form new and unique Maori motifs – B4a1a1c, B4a1a1a3 and B4a1a1a5. Compared to ancestral populations we observed an increased frequency of non-synonymous coding variants of several mitochondrial genes in the Maori group, which may be a result of positive selection and/or genetic drift effects. In conclusion, this study reports the first complete mitochondrial genome sequence data for a Maori population. Overall, these new data reveal novel mitochondrial genome signatures in this Polynesian population and enhance the phylogenetic picture of maternal ancestry in Oceania. The increased frequency of several mitochondrial coding variants makes them good candidates for future studies aimed at assessment of metabolic disease risk in Polynesian populations.
Resumo:
Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognize in the hybrid F2 progeny, or dependent on parental specific traits. Here in a screen for leaf hyponasty mutants, we have performed a single backcross of an Ethane Methyl Sulphonate (EMS) generated hyponastic mutant to its parent. Whole genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphisim (SNP) residing in HASTY, a previously characterized gene involved in microRNA biogenesis. We have evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines; SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants has great promise for identifying mutations that may be difficult to map using conventional approaches.
Resumo:
Potato leafroll virus (PLRV) is a positive-strand RNA virus that generates subgenomic RNAs (sgRNA) for expression of 3' proximal genes. Small RNA (sRNA) sequencing and mapping of the PLRV-derived sRNAs revealed coverage of the entire viral genome with the exception of four distinctive gaps. Remarkably, these gaps mapped to areas of PLRV genome with extensive secondary structures, such as the internal ribosome entry site and 5' transcriptional start site of sgRNA1 and sgRNA2. The last gap mapped to ~500. nt from the 3' terminus of PLRV genome and suggested the possible presence of an additional sgRNA for PLRV. Quantitative real-time PCR and northern blot analysis confirmed the expression of sgRNA3 and subsequent analyses placed its 5' transcriptional start site at position 5347 of PLRV genome. A regulatory role is proposed for the PLRV sgRNA3 as it encodes for an RNA-binding protein with specificity to the 5' of PLRV genomic RNA. © 2013.
Resumo:
For the first of the baby boomers turning 65 years of age, after a decade littered with financial shocks (dot.com bubble, sub-prime, global financial crisis, sovereign debt), sequencing risk can represent a significant threat to their retirement nest eggs. This paper takes an outcomeoriented approach to the problem, to provide practical insights into how sequencing risk works and the critical dependency of retirement outcomes on sequencing risk. Our analysis challenges the conventional wisdom that it is the accumulated average of investment returns that matter. We show, instead, that it is the realised sequence of returns which largely determines the sustainability of retirement incomes.
Resumo:
Sorghum is a food and feed cereal crop adapted to heat and drought and a staple for 500 million of the world’s poorest people. Its small diploid genome and phenotypic diversity make it an ideal C4 grass model as a complement to C3 rice. Here we present high coverage (16–45 × ) resequenced genomes of 44 sorghum lines representing the primary gene pool and spanning dimensions of geographic origin, end-use and taxonomic group. We also report the first resequenced genome of S. propinquum, identifying 8 M high-quality SNPs, 1.9 M indels and specific gene loss and gain events in S. bicolor. We observe strong racial structure and a complex domestication history involving at least two distinct domestication events. These assembled genomes enable the leveraging of existing cereal functional genomics data against the novel diversity available in sorghum, providing an unmatched resource for the genetic improvement of sorghum and other grass species.
Resumo:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
Resumo:
In the current market, extensive software development is taking place and the software industry is thriving. Major software giants have stated source code theft as a major threat to revenues. By inserting an identity-establishing watermark in the source code, a company can prove it's ownership over the source code. In this paper, we propose a watermarking scheme for C/C++ source codes by exploiting the language restrictions. If a function calls another function, the latter needs to be defined in the code before the former, unless one uses function pre-declarations. We embed the watermark in the code by imposing an ordering on the mutually independent functions by introducing bogus dependency. Removal of dependency by the attacker to erase the watermark requires extensive manual intervention thereby making the attack infeasible. The scheme is also secure against subtractive and additive attacks. Using our watermarking scheme, an n-bit watermark can be embedded in a program having n independent functions. The scheme is implemented on several sample codes and performance changes are analyzed.
Resumo:
Planning techniques for large scale earthworks have been considered in this article. To improve these activities a “block theoretic” approach was developed that provides an integrated solution consisting of an allocation of cuts to fills and a sequence of cuts and fills over time. It considers the constantly changing terrain by computing haulage routes dynamically. Consequently more realistic haulage costs are used in the decision making process. A digraph is utilised to describe the terrain surface which has been partitioned into uniform grids. It reflects the true state of the terrain, and is altered after each cut and fill. A shortest path algorithm is successively applied to calculate the cost of each haul, and these costs are summed over the entire sequence, to provide a total cost of haulage. To solve this integrated optimisation problem a variety of solution techniques were applied, including constructive algorithms, meta-heuristics and parallel programming. The extensive numerical investigations have successfully shown the applicability of our approach to real sized earthwork problems.
Resumo:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
Resumo:
Background Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA, while providing a user-friendly interface with improved accuracy of prediction. Result We have developed a user-friendly plant miRNA prediction tool called miRPlant. We show using 16 plant miRNA datasets from four different plant species that miRPlant has at least a 10% improvement in accuracy compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNA are shown with all RNAseq reads in a hairpin diagram. Conclusions We have developed miRPlant which extends miRDeep* to various plant species by adopting suitable strategies to identify hairpin excision regions and hairpin structure filtering for plants. miRPlant does not require any third party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills.
Resumo:
Critical stage in open-pit mining is to determine the optimal extraction sequence of blocks, which has significant impacts on mining profitability. In this paper, a more comprehensive block sequencing optimisation model is developed for the open-pit mines. In the model, material characteristics of blocks, grade control, excavator and block sequencing are investigated and integrated to maximise the short-term benefit of mining. Several case studies are modeled and solved by CPLEX MIP and CP engines. Numerical investigations are presented to illustrate and validate the proposed methodology.