953 results for Annotation Tag
Abstract:
This paper presents a novel framework for the unsupervised alignment of an ensemble of temporal sequences. The approach draws inspiration from the axiom that an ensemble of temporal signals stemming from the same source or class should have lower rank when "aligned" than when "misaligned". Our approach shares similarities with recent state-of-the-art methods for unsupervised image ensemble alignment (e.g. RASL), which break the problem into a set of image alignment problems that have well-known solutions (i.e. the Lucas-Kanade algorithm). Similarly, we propose a strategy for decomposing the problem of temporal ensemble alignment into a set of independent sequence problems, which we claim can be solved reliably through Dynamic Time Warping (DTW). We demonstrate the utility of our method on the Cohn-Kanade+ dataset by aligning expression onset across multiple sequences, which allows us to automate the rapid discovery of event annotations.
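As a rough illustration of the pairwise building block named in this abstract, the following is a minimal sketch of classic Dynamic Time Warping between two one-dimensional sequences. It is not the authors' ensemble method (which additionally uses a rank criterion across the whole ensemble); the function name `dtw_distance` and the absolute-difference local cost are assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two 1-D sequences (illustrative sketch).

    Assumes an absolute-difference local cost and the standard step
    pattern (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

Two sequences that differ only by a repeated sample, such as `[0, 1, 2]` and `[0, 1, 1, 2]`, align with zero cost under this sketch.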
Abstract:
Background: The sequencing, de novo assembly and annotation of transcriptome datasets generated with next generation sequencing (NGS) has enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation, however, is a critically important step for transcriptome assemblies generated from short read sequences. Typical benchmarks for assembly and annotation reliability have been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNAseq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assemblies produced by four different de novo transcriptome assemblers (Velvet, Oases, Geneious and Trinity) for a number of quality metrics and redundancy. Results: Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). The Trinity and Oases de novo assemblers produced the best assemblies on all quality metrics, including fewer contigs, higher N50 and average contig length, and longer contigs. Overall, the BLAST and annotation success of our assemblies was not high, with only 15-19% of contigs assigned a putative function. Conclusions: We believe that any improvement in annotation success for gastropod species will require more gastropod genome sequences and, in particular, an increase in mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short read sequencers with the right assembly algorithms. Keywords: Nerita melanotragus; De novo assembly; Transcriptome; Heat shock protein; Ion Torrent
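The N50 metric mentioned in this abstract can be sketched as follows. This is a generic illustration of the usual definition (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size); the function name `n50` is an assumption, and the abstract does not state which tool the authors used to compute it.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (illustrative sketch)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")
```

For contig lengths 2 through 10, the total is 54, and the three longest contigs (10, 9, 8) sum to 27, so the sketch returns 8.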
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Abstract:
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment
Abstract:
This project is a step forward in the study of text mining, where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering that utilise semantically annotated data. These methods are based on an enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmarking methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles the clustering of documents with multiple feature spaces.
Abstract:
Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built with multiple relevance graded data. By handling tagging data used in recommendation systems as an ordinal relevance set of {negative,null,positive}, we propose to build a DCG based recommendation model. We present an efficient and novel learning-to-rank method by optimizing DCG for a recommendation model using the tagging data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that the method is scalable and outperforms the benchmarking methods by generating a quality top-N item recommendation list.
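For readers unfamiliar with the measure, here is a minimal sketch of DCG computed over a ranked list. It assumes the linear-gain form, DCG = Σ rel_i / log2(i + 1), and assumes the ordinal set {negative, null, positive} is encoded as {-1, 0, 1}; the abstract specifies neither the gain function nor the encoding, and the learning-to-rank optimisation itself is not shown.

```python
import math


def dcg(relevances, k=None):
    """Discounted Cumulative Gain of a ranked list (linear-gain variant).

    `relevances` lists the graded relevance of each item in ranked order;
    `k` optionally truncates the list to the top-k positions.
    """
    ranked = relevances if k is None else relevances[:k]
    # Position i (1-based) is discounted by log2(i + 1).
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ranked, start=1))
```

Under the assumed encoding, a negative item in the top positions lowers the score, so a ranking such as `[1, 0, -1]` scores below `[1, 0, 0]`.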
Abstract:
Staphylococcus aureus (S. aureus) is a prominent human and livestock pathogen investigated widely using omic technologies. Critically, due to availability, low visibility or scattered resources, robust network and statistical contextualisation of the resulting data is generally under-represented. Here, we present novel meta-analyses of freely accessible molecular network and gene ontology annotation information resources for S. aureus omics data interpretation. Furthermore, through the application of the gene ontology annotation resources, we demonstrate their value and ability (or lack thereof) to summarise and statistically interpret the emergent properties of gene expression and protein abundance changes using publicly available data. This analysis provides simple metrics for network selection and demonstrates the availability of gene ontology annotation and the impact that its selection can have on the contextualisation of bacterial omics data.
Abstract:
Environmental sensors collect massive amounts of audio data. This thesis investigates computational methods to support human analysts in identifying faunal vocalisations from that audio. A series of experiments was conducted to trial the effectiveness of novel user interfaces. This research examines the rapid scanning of spectrograms, decision support tools for users, and cleaning methods for folksonomies. Together, these investigations demonstrate that providing computational support to human analysts increases their efficiency and accuracy; this allows bioacoustics projects to efficiently utilise their valuable human analysts.
Abstract:
Single nucleotide polymorphisms (SNPs) are widely acknowledged as the marker of choice for many genetic and genomic applications because they show co-dominant inheritance, are highly abundant across genomes and are suitable for high-throughput genotyping. Here we evaluated the applicability of SNP markers developed from Crassostrea gigas and C. virginica expressed sequence tags (ESTs) in closely related Crassostrea and Ostrea species. A total of 213 putative interspecific-level SNPs were identified from re-sequencing data in six amplicons, yielding an average of one interspecific-level SNP per seven bp. The high polymorphism levels observed and the high success rate of transferability show that genic EST-derived SNP markers provide an efficient method for rapid marker development and SNP discovery in closely related oyster species. The six EST-SNP markers identified here will provide useful molecular tools for addressing questions in molecular ecology and evolution studies, including stock analysis (pedigree monitoring) in related oyster taxa.
Abstract:
We derive a new method for determining size-transition matrices (STMs) that eliminates probabilities of negative growth and accounts for individual variability. STMs are an important part of size-structured models, which are used in the stock assessment of aquatic species. The elements of STMs represent the probability of growth from one size class to another, given a time step. The growth increment over this time step can be modelled with a variety of methods, but when a population construct is assumed for the underlying growth model, the resulting STM may contain entries that predict negative growth. To solve this problem, we use a maximum likelihood method that incorporates individual variability in the asymptotic length, relative age at tagging, and measurement error to obtain von Bertalanffy growth model parameter estimates. The statistical moments for the future length given an individual's previous length measurement and time at liberty are then derived. We moment match the true conditional distributions with skewed-normal distributions and use these to accurately estimate the elements of the STMs. The method is investigated with simulated tag-recapture data and tag-recapture data gathered from the Australian eastern king prawn (Melicertus plebejus).
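To illustrate why negative growth can appear, the sketch below uses the Fabens form of the von Bertalanffy model, one common population-level formulation for tag-recapture data, in which the expected increment is (L∞ − L₁)(1 − e^(−kΔt)). It is not the authors' maximum likelihood method, and the function name and parameter values are hypothetical.

```python
import math


def fabens_increment(length_at_release, time_at_liberty, l_inf, k):
    """Expected growth increment under the Fabens form of von Bertalanffy growth.

    With a single population-level asymptotic length `l_inf`, any animal
    released at a length above `l_inf` is predicted to shrink.
    """
    return (l_inf - length_at_release) * (1.0 - math.exp(-k * time_at_liberty))


# Hypothetical parameter values, chosen only to show the sign change.
print(fabens_increment(20.0, 1.0, l_inf=40.0, k=0.5))  # positive increment
print(fabens_increment(45.0, 1.0, l_inf=40.0, k=0.5))  # negative increment
```

Allowing the asymptotic length to vary between individuals, as the abstract describes, is one way to avoid this artefact.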
Abstract:
Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions; 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.
Abstract:
James (1991, Biometrics 47, 1519-1530) constructed unbiased estimating functions for estimating the two parameters in the von Bertalanffy growth curve from tag-recapture data. This paper provides unbiased estimating functions for a class of growth models that incorporate stochastic components and explanatory variables. A simulation study using seasonal growth models indicates that the proposed method works well, while the least-squares methods commonly used in the literature may produce substantially biased estimates. The proposed model and method are also applied to real data from tagged rock lobsters to assess the possible seasonal effect on growth.
Abstract:
Six species of line-caught coral reef fish (Plectropomus spp., Lethrinus miniatus, Lethrinus laticaudis, Lutjanus sebae, Lutjanus malabaricus and Lutjanus erythropterus) were tagged by members of the Australian National Sportsfishing Association (ANSA) in Queensland between 1986 and 2003. Of the 14,757 fish tagged, 1607 were recaptured, and we analysed these data to describe movement and determine factors likely to impact release survival. All species were classified as residents, since over 80% of recaptures for each species occurred within 1 km of the release site. Few individuals (range 0.8-5%) were recaptured more than 20 km from their release point. L. sebae had a higher recapture rate (19.9%) than the other species studied (range 2.1-11.7%). Venting swimbladder gases, regardless of whether or not fish appeared to be suffering from barotrauma, significantly enhanced (P < 0.05) the survival of L. sebae and L. malabaricus but had no significant effect (P > 0.05) on L. erythropterus. The condition of fish on release, subjectively assessed by anglers, had a significant effect on recapture rate only for L. sebae, where fish in "fair" condition had less than half the recapture rate of those assessed as in "excellent" or "good" condition. The recapture rates of L. sebae and L. laticaudis were significantly (P < 0.05) affected by depth, with recapture rate declining at depths exceeding 30 m. Overall, the results showed that depth of capture, release condition and treatment for barotrauma influenced recapture rate for some species, but these effects were not consistent across all species studied. Recommendations were made to the ANSA tagging clubs to record additional information such as injury, hooking location and hook type to enable a more comprehensive future assessment of the factors influencing release survival.
Abstract:
Background: Rhipicephalus (Boophilus) microplus evades the host's haemostatic system through a complex protein array secreted into tick saliva. Serine protease inhibitors (serpins) constitute an important component of saliva and are represented by a large protease inhibitor family in Ixodidae. These secreted and non-secreted inhibitors modulate diverse and essential proteases involved in different physiological processes. Methods: The identification of R. microplus serpin sequences was performed through a web-based bioinformatics environment called Yabi. The database search was conducted on BmiGi V1, BmiGi V2.1, five SSH libraries, Australian tick transcriptome libraries and RmiTR V1 using bioinformatics methods. Semi-quantitative PCR was carried out using different adult tissues and tick development stages. The cDNAs of four identified R. microplus serpins were cloned and expressed in Pichia pastoris in order to determine the biological targets of these serpins using protease inhibition assays. Results: Four of the twenty-two serpins identified in our analysis are new R. microplus serpins, which were named RmS-19 to RmS-22. The analyses of DNA and predicted amino acid sequences showed high conservation of the R. microplus serpin sequences. The expression data suggested ubiquitous expression of RmS, except for RmS-6 and RmS-14, which were expressed only in nymphs and adult female ovaries, respectively. RmS-19 and RmS-20 were expressed in all tissue samples analysed, showing their important role in both parasitic and non-parasitic stages of R. microplus development. RmS-21 was not detected in ovaries and RmS-22 was not identified in ovary and nymph samples, but both were expressed in the rest of the samples analysed. A total of four expressed recombinant serpins showed protease-specific inhibition of Chymotrypsin (RmS-1 and RmS-6), Chymotrypsin / Elastase (RmS-3) and Thrombin (RmS-15).
Conclusion: This study constitutes an important contribution and improvement to the knowledge about the physiological role of R. microplus serpins during the host-tick interaction.