364 results for Dataset
Abstract:
Background: The sequencing, de novo assembly and annotation of transcriptome datasets generated with next generation sequencing (NGS) have enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation, however, remain critically important steps for transcriptome assemblies generated from short read sequences. Typical benchmarks for assembly and annotation reliability have been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNAseq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assemblies produced by four different de novo transcriptome assemblers (Velvet, Oases, Geneious and Trinity) on a number of quality metrics and on redundancy. Results: Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). The Trinity and Oases de novo assemblers produced the best assemblies on all quality metrics, including fewer contigs, higher N50 and average contig length, and contigs of greater length. Overall, the BLAST and annotation success of our assemblies was not high, with only 15-19% of contigs assigned a putative function. Conclusions: We believe that any improvement in annotation success for gastropod species will require more gastropod genome sequences and, in particular, an increase in mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short read sequencers with the right assembly algorithms. Keywords: Nerita melanotragus; De novo assembly; Transcriptome; Heat shock protein; Ion Torrent
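As context for the quality metrics compared above, a minimal sketch (not the authors' pipeline) of how N50 and average contig length are typically computed from a list of contig lengths:

```python
# Illustrative sketch of standard assembly metrics; contig lengths are hypothetical.
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0

contigs = [5000, 3200, 1500, 900, 400]   # hypothetical contig lengths (bp)
print("contigs:", len(contigs))
print("mean length:", sum(contigs) / len(contigs))
print("N50:", n50(contigs))              # -> 3200
```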
Abstract:
Thin plate spline finite element methods are used to fit a surface to an irregularly scattered dataset [S. Roberts, M. Hegland, and I. Altas. Approximation of a Thin Plate Spline Smoother using Continuous Piecewise Polynomial Functions. SIAM J. Numer. Anal., 41(1):208--234, 2003]. The computational bottleneck for this algorithm is the solution of large, ill-conditioned systems of linear equations at each step of a generalised cross validation algorithm. Preconditioning techniques are investigated to accelerate the convergence of the solution of these systems using Krylov subspace methods. The preconditioners under consideration are block diagonal, block triangular and constraint preconditioners [M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numer., 14:1--137, 2005]. The effectiveness of each of these preconditioners is examined on a sample dataset taken from a known surface. From our numerical investigation, constraint preconditioners appear to provide improved convergence for this surface fitting problem compared to block preconditioners.
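For orientation, and without asserting the exact blocks arising from the thin plate spline discretisation, the preconditioner families cited from Benzi, Golub and Liesen apply to saddle point systems of the generic form

\[
\mathcal{A}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix},
\qquad
\mathcal{A} = \begin{pmatrix} A & B^{\top} \\ B & 0 \end{pmatrix},
\]

with (up to sign conventions) block diagonal, block triangular and constraint preconditioners

\[
\mathcal{P}_{d} = \begin{pmatrix} \hat{A} & 0 \\ 0 & \hat{S} \end{pmatrix},
\qquad
\mathcal{P}_{t} = \begin{pmatrix} \hat{A} & B^{\top} \\ 0 & \hat{S} \end{pmatrix},
\qquad
\mathcal{P}_{c} = \begin{pmatrix} G & B^{\top} \\ B & 0 \end{pmatrix},
\]

where \(\hat{A} \approx A\), \(\hat{S} \approx B A^{-1} B^{\top}\) approximates the Schur complement, and \(G\) is an approximation to \(A\) that keeps the constraint blocks exact.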
Abstract:
Schistosoma japonicum infection is believed to be endemic in 28 of the 80 provinces of the Philippines, and the most recent data on schistosomiasis prevalence have shown considerable variability between provinces. To support more efficient allocation of parasitic disease control resources in the country, we aimed to describe the small-scale spatial variation in S. japonicum prevalence across the Philippines, quantify the role of the physical environment in driving this spatial variation, and develop a predictive risk map of S. japonicum infection. Data on S. japonicum infection from 35,754 individuals across the country were geo-located at the barangay level and included in the analysis. The analysis was then stratified geographically for Luzon, the Visayas and Mindanao. Zero-inflated binomial Bayesian geostatistical models of S. japonicum prevalence were developed, and diagnostic uncertainty was incorporated. Results of the analysis show that in all three regions, males and individuals aged ≥ 20 years had significantly higher prevalence of S. japonicum compared with females and children <5 years. The role of the environmental variables differed between regions of the Philippines. S. japonicum infection was widespread in the Visayas, whereas it was much more focal in Luzon and Mindanao. This analysis revealed significant spatial variation in the prevalence of S. japonicum infection in the Philippines. This suggests that a spatially targeted approach to schistosomiasis interventions, including mass drug administration, is warranted. When financially possible, additional schistosomiasis surveys should be prioritized for areas identified as high risk but underrepresented in our dataset.
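As a generic illustration of the model class named above (the paper's exact covariates, priors and diagnostic-uncertainty adjustment are not given in this abstract), a zero-inflated binomial geostatistical model for the number of positives \(Y_i\) out of \(n_i\) individuals examined at location \(\mathbf{s}_i\) can be written as

\[
Y_i \sim
\begin{cases}
0 & \text{with probability } \pi_i,\\
\mathrm{Binomial}(n_i, p_i) & \text{with probability } 1-\pi_i,
\end{cases}
\qquad
\operatorname{logit}(p_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta} + S(\mathbf{s}_i),
\]

where \(\mathbf{x}_i\) collects individual-level (age group, sex) and environmental covariates and \(S(\cdot)\) is a spatially correlated Gaussian random field.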
Abstract:
Online reviews have become increasingly important in the decision-making process. In recent years, the problem of identifying useful reviews for users has attracted significant attention. For instance, in order to select reviews that focus on a particular feature, researchers have proposed a method which extracts all words associated with this feature as the relevant information with which to evaluate and find appropriate reviews. However, the extraction of associated words is not very accurate due to the noise in free review text, and this affects the overall performance negatively. In this paper, we propose a method to select reviews according to a given feature by using a review model generated from a domain ontology called a product feature taxonomy. The proposed review model provides relevant information about the hierarchical relationships of the features in the review, which captures the review characteristics accurately. Our experimental results, based on a real-world review dataset, show that our approach is able to improve review selection performance according to the given criteria effectively.
Abstract:
Epigenetic silencing mediated by CpG methylation is a common feature of many cancers. Characterizing aberrant DNA methylation changes associated with tumor progression may identify potential prognostic markers for prostate cancer (PCa). We treated two PCa cell lines, 22Rv1 and DU-145, with the demethylating agent 5-Aza-2′-deoxycytidine (DAC), and global methylation status was analyzed using a methylation-sensitive restriction enzyme-based differential methylation hybridization strategy followed by genome-wide CpG methylation array profiling. In addition, we examined gene expression changes using a custom microarray. Gene Set Enrichment Analysis (GSEA) identified the most significantly dysregulated pathways. We then assessed the methylation status of candidate genes that showed reduced CpG methylation and increased gene expression after DAC treatment, in Gleason score (GS) 8 vs. GS 6 patients, using three independent patient cohorts: the publicly available The Cancer Genome Atlas (TCGA) dataset and two separate patient cohorts. Our analysis, integrating methylation and gene expression in PCa cell lines with patient tumor data, identified novel potential biomarkers for PCa patients. These markers may help elucidate the pathogenesis of PCa and represent potential prognostic markers for PCa patients.
Abstract:
For robots operating in outdoor environments, a number of factors, including weather, time of day, rough terrain, high speeds, and hardware limitations, make performing vision-based simultaneous localization and mapping with current techniques infeasible, largely because of image blur and/or underexposure, especially on smaller platforms and low-cost hardware. In this paper, we present novel visual place-recognition and odometry techniques that address the challenges posed by low lighting, perceptual change, and low-cost cameras. Our primary contribution is a novel two-step algorithm that combines fast low-resolution whole-image matching with a higher-resolution patch-verification step, as well as image saliency methods that simultaneously improve performance and decrease computing time. The algorithms are demonstrated using consumer cameras mounted on a small vehicle in a mixed urban and vegetated environment and a car traversing highway and suburban streets, at different times of day and night and in various weather conditions. The algorithms achieve reliable mapping over the course of a day, both when incrementally incorporating new visual scenes from different times of day into an existing map, and when using a static map comprising visual scenes captured at only one point in time. Using the two-step place-recognition process, we demonstrate for the first time single-image, error-free place recognition at recall rates above 50% across a day-night dataset without prior training or utilization of image sequences. This place-recognition performance enables topologically correct mapping across day-night cycles.
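A hedged sketch of the two-step idea described above: fast low-resolution whole-image matching proposes candidates, and a higher-resolution patch check verifies them. Thumbnail size, patch size and the verification score are illustrative choices, not the authors' values.

```python
import numpy as np

def low_res_candidates(query, database, size=(16, 12), top_k=5):
    """Rank database images by sum-of-absolute-differences on tiny, patch-normalised thumbnails."""
    def thumb(img):
        h, w = img.shape
        ys = np.linspace(0, h - 1, size[1]).astype(int)
        xs = np.linspace(0, w - 1, size[0]).astype(int)
        t = img[np.ix_(ys, xs)].astype(float)
        return (t - t.mean()) / (t.std() + 1e-6)
    q = thumb(query)
    scores = [np.abs(q - thumb(d)).sum() for d in database]
    return np.argsort(scores)[:top_k]

def verify_patch(query, candidate, patch=48):
    """Second step: normalised correlation of a central high-resolution patch.
    Assumes query and candidate are grayscale images of the same resolution."""
    h, w = query.shape
    cy, cx = h // 2, w // 2
    qp = query[cy - patch:cy + patch, cx - patch:cx + patch].astype(float)
    cp = candidate[cy - patch:cy + patch, cx - patch:cx + patch].astype(float)
    qp = (qp - qp.mean()) / (qp.std() + 1e-6)
    cp = (cp - cp.mean()) / (cp.std() + 1e-6)
    return float((qp * cp).mean())   # higher is better; threshold to accept a match
```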
Abstract:
Spatial data are now prevalent in a wide range of fields including environmental and health science. This has led to the development of a range of approaches for analysing patterns in these data. In this paper, we compare several Bayesian hierarchical models for analysing point-based data based on the discretization of the study region, resulting in grid-based spatial data. The approaches considered include two parametric models and a semiparametric model. We highlight the methodology and computation for each approach. Two simulation studies are undertaken to compare the performance of these models for various structures of simulated point-based data which resemble environmental data. A case study of a real dataset is also conducted to demonstrate a practical application of the modelling approaches. Goodness-of-fit statistics are computed to compare estimates of the intensity functions. The deviance information criterion is also considered as an alternative model evaluation criterion. The results suggest that the adaptive Gaussian Markov random field model performs well for highly sparse point-based data where there are large variations or clustering across the space, whereas the discretized log Gaussian Cox process produces a good fit for dense and clustered point-based data. One should generally consider the nature and structure of the point-based data in order to choose the appropriate method when modelling discretized spatial point-based data.
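As one concrete example of the grid-based formulation compared above, the discretized log Gaussian Cox process models the count \(y_i\) of points falling in grid cell \(A_i\) as

\[
y_i \mid \lambda_i \sim \mathrm{Poisson}(|A_i|\,\lambda_i),
\qquad
\log \lambda_i = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} + S_i,
\qquad
\mathbf{S} \sim \mathrm{N}(\mathbf{0},\, Q^{-1}),
\]

where \(|A_i|\) is the cell area and \(\mathbf{S}\) is a latent Gaussian (Markov) random field with precision matrix \(Q\); the adaptive Gaussian Markov random field and semiparametric alternatives differ mainly in the prior placed on \(\mathbf{S}\). This is a standard formulation given for orientation; the paper's exact parameterisations may differ.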
Abstract:
Stormwater pollution is linked to stream ecosystem degradation. In predicting stormwater pollution, various types of modelling techniques are adopted. The accuracy of predictions provided by these models depends on the data quality, appropriate estimation of model parameters, and the validation undertaken. It is well understood that, unlike water quantity data, available water quality datasets in urban areas span only relatively short time scales, which limits the applicability of the developed models in engineering and ecological assessment of urban waterways. This paper presents the application of leave-one-out (LOO) and Monte Carlo cross validation (MCCV) procedures in a Monte Carlo framework for the validation and estimation of uncertainty associated with pollutant wash-off when models are developed using a limited dataset. It was found that the application of MCCV is likely to result in a more realistic measure of model coefficients than LOO. Most importantly, MCCV and LOO were found to be effective in model validation when dealing with a small sample size, which hinders detailed model validation and can undermine the effectiveness of stormwater quality management strategies.
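To illustrate the difference between the two validation schemes, a toy comparison using scikit-learn and a generic linear model (not the authors' pollutant wash-off model): LOO holds out one observation at a time, while MCCV repeats many random train/test splits and so yields a distribution of validation errors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))                      # small sample, e.g. 20 storm events
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.1, 20)

model = LinearRegression()
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
mccv = ShuffleSplit(n_splits=200, test_size=0.3, random_state=0)  # Monte Carlo CV
mccv_scores = cross_val_score(model, X, y, cv=mccv,
                              scoring="neg_mean_squared_error")

print("LOO  MSE: %.4f" % -loo_scores.mean())
print("MCCV MSE: %.4f +/- %.4f" % (-mccv_scores.mean(), mccv_scores.std()))
```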
Abstract:
Competing events are common in medical research. Ignoring them in the statistical analysis can easily lead to flawed results and conclusions. This article uses a real dataset and a simple simulation to show how standard analysis fails and how such data should be analysed.
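For orientation, the standard quantity of interest in a competing risks setting is the cause-specific cumulative incidence function

\[
F_k(t) = \Pr(T \le t,\ \text{cause} = k) = \int_0^{t} S(u^{-})\, h_k(u)\, du,
\]

where \(h_k\) is the cause-specific hazard of event type \(k\) and \(S(u^{-})\) is the probability of being free of events of any type just before time \(u\). The naive approach of censoring competing events and reporting \(1 - \hat{S}_{\mathrm{KM}}(t)\) systematically overestimates this quantity, which is the kind of failure of standard analysis referred to above.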
Abstract:
In this paper we present a novel scheme for improving speaker diarization by making use of repeating speakers across multiple recordings within a large corpus. We call this technique speaker re-diarization and demonstrate that it is possible to reuse the initial speaker-linked diarization outputs to boost diarization accuracy within individual recordings. We first propose and evaluate two novel re-diarization techniques. We demonstrate their complementary characteristics and fuse the two techniques to successfully conduct speaker re-diarization across the SAIVT-BNEWS corpus of Australian broadcast data. This corpus contains recurring speakers in various independent recordings that need to be linked across the dataset. We show that our speaker re-diarization approach can provide a relative improvement of 23% in diarization error rate (DER), over the original diarization results, as well as improve the estimated number of speakers and the cluster purity and coverage metrics.
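For reference, the diarization error rate reported above is conventionally defined as

\[
\mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{fa}} + T_{\mathrm{spk}}}{T_{\mathrm{total}}},
\]

i.e. the fraction of total scored speech time that is missed, falsely detected as speech, or attributed to the wrong speaker; the 23% figure is a relative reduction of this quantity.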
Abstract:
Rating systems are used by many websites, allowing customers to rate available items according to their own experience. Reputation models are then used to aggregate the available ratings to generate reputation scores for items. A problem with current reputation models is that they provide solutions to enhance accuracy on sparse datasets without considering their performance on dense datasets. In this paper, we propose a novel reputation model to generate more accurate reputation scores for items using any dataset, whether dense or sparse. Our proposed model is a weighted average method where the weights are generated using the normal distribution. Experiments show promising results for the proposed model over state-of-the-art models on both sparse and dense datasets.
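One plausible (and explicitly hypothetical) reading of a normal-distribution weighted average is sketched below: ratings far from a central value receive smaller Gaussian weights, damping outliers on both sparse and dense items. The authors' exact weighting scheme may differ.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def reputation(ratings, sigma=1.0):
    """Weighted average of ratings with Gaussian weights centred on the median rating
    (the choice of centre and sigma are assumptions of this illustration)."""
    ratings = sorted(ratings)
    mu = ratings[len(ratings) // 2]
    weights = [normal_pdf(r, mu, sigma) for r in ratings]
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)

print(reputation([5, 5, 4, 5, 1]))   # the single outlier rating of 1 is down-weighted
```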
Abstract:
Recent modelling of socio-economic costs by the Australian railway industry in 2010 estimated the cost of level crossing accidents to exceed AU$116 million annually. To better understand the causal factors of these accidents, a video analytics application is being developed to automatically detect near-miss incidents using forward-facing videos from trains. As near-miss events occur more frequently than collisions, detecting these occurrences will provide more safety data for analysis. The application will improve the objectivity of near-miss reporting by providing quantitative data about the position of vehicles at level crossings through the automatic analysis of video footage. In this paper we present a novel method for detecting near-miss occurrences at railway level crossings from video data captured on trains. Our system detects and localizes vehicles at railway level crossings. It also detects the position of the railway in order to calculate the distance of the detected vehicles to the railway centerline. The system logs the information about the positions of the vehicles and the railway centerline into a database for further analysis by the safety data recording and analysis system, to determine whether or not an event is a near-miss. We present preliminary results of our system on a dataset of videos taken from a train that passed through 14 railway level crossings, and demonstrate its robustness by showing results on both day and night videos.
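A minimal geometric sketch of the distance computation implied above, assuming the detected vehicle and two points on the rail centerline have already been projected into a common ground-plane coordinate frame (an assumption of this illustration, not a detail given in the abstract):

```python
import math

def point_to_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite 2D line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    den = math.hypot(by - ay, bx - ax)
    return num / den

vehicle = (3.2, 1.0)                       # hypothetical ground-plane position (m)
rail_a, rail_b = (0.0, 0.0), (10.0, 0.0)   # two points on the rail centerline
print(point_to_line_distance(vehicle, rail_a, rail_b))   # -> 1.0 m
```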
Abstract:
Due to the popularity of security cameras in public places, it is of interest to design an intelligent system that can efficiently detect events automatically. This paper proposes a novel algorithm for multi-person event detection. To ensure faster-than-real-time performance, features are extracted directly from compressed MPEG video. A novel histogram-based feature descriptor that captures the angles between extracted particle trajectories is proposed, which allows us to capture the motion patterns of multi-person events in the video. To alleviate the need for fine-grained annotation, we propose the use of Labelled Latent Dirichlet Allocation, a “weakly supervised” method that allows the use of coarse temporal annotations which are much simpler to obtain. This novel system is able to run at approximately ten times real-time, while preserving state-of-the-art detection performance for multi-person events on a 100-hour real-world surveillance dataset (TRECVid SED).
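A hedged sketch of the angle-histogram idea: for each pair of extracted particle trajectories, take the angle between their overall displacement directions and accumulate a normalised histogram. The bin count and the use of whole-trajectory displacements are illustrative choices, not the authors' exact descriptor.

```python
import numpy as np

def angle_histogram(trajectories, bins=8):
    """trajectories: list of sequences of (x, y) points over time."""
    directions = []
    for traj in trajectories:
        d = np.asarray(traj[-1], float) - np.asarray(traj[0], float)
        n = np.linalg.norm(d)
        if n > 0:
            directions.append(d / n)          # unit displacement direction
    angles = []
    for i in range(len(directions)):
        for j in range(i + 1, len(directions)):
            cos_ij = np.clip(np.dot(directions[i], directions[j]), -1.0, 1.0)
            angles.append(np.arccos(cos_ij))  # pairwise angle in [0, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, np.pi))
    return hist / max(hist.sum(), 1)          # normalised descriptor

# Two trajectories moving towards each other put mass near pi (opposing motion).
print(angle_histogram([[(0, 0), (5, 0)], [(10, 0), (5, 0)]]))
```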
Abstract:
This paper proposes a physically motivated reappraisal of manoeuvring models for ships and presents a new model developed from first principles by application of low aspect-ratio aerodynamic theory and Lagrangian mechanics. The coefficients of the model are shown to be related to physical processes, and validation is presented using the results from a planar motion mechanism dataset.
Abstract:
Premise of the study: Plant invasiveness can be promoted by higher values of adaptive traits (e.g., photosynthetic capacity, biomass accumulation), by greater plasticity and coordination of these traits, and by a stronger, positive relative influence of these functionalities on fitness, such as increased reproductive output. However, the datasets supporting this premise rarely include linkages between epidermal-stomatal traits, leaf internal anatomy, and physiological performance. Methods: Three ecological pairs of invasive vs non-invasive (native) woody vine species of South-East Queensland, Australia, were investigated for trait differences in leaf morphology and anatomy under varying light intensity. The linkages of these traits with physiological performance (e.g. water use efficiency, photosynthesis, and leaf construction cost) and with the plant adaptive traits of specific leaf area, biomass, and relative growth rate were also explored. Key results: Mean leaf anatomical traits differed significantly between the two groups, except for stomatal size. Plasticity of traits and, to a very limited extent, their phenotypic integration were higher in the invasive than in the native species. ANOVA, ordination, and analysis of similarity suggest that, for leaf morphology and anatomy, the three functional strategies contribute to the differences between the two groups in the order phenotypic plasticity > trait means > phenotypic integration. Conclusions: The linkages demonstrated in this study between stomatal complex/gross anatomy and physiology are scarce in the ecological literature on plant invasiveness, but the findings suggest that leaf anatomical traits should be considered routinely as part of weed species assessment and in the worldwide leaf economics spectrum.