340 results for datasets


Relevance: 10.00%

Publisher:

Abstract:

Motor unit number estimation (MUNE) is a method that aims to provide a quantitative indicator of the progression of diseases that lead to loss of motor units, such as motor neurone disease. However, the development of a reliable, repeatable and fast real-time MUNE method has hitherto proved elusive. Ridall et al. (2007) implement a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm to produce a posterior distribution for the number of motor units, using a Bayesian hierarchical model that takes into account biological information about motor unit activation. However, we find that the approach can be unreliable for some datasets, since it can suffer from poor cross-dimensional mixing. Here we focus on improved inference by marginalising over latent variables to create the likelihood. In particular, we explore how this can improve RJMCMC mixing, and we investigate alternative approaches that utilise the likelihood (e.g. DIC (Spiegelhalter et al., 2002)). For this model, the marginalisation requires, for larger numbers of motor units, an intractable summation over all combinations of a set of latent binary variables whose joint sample space increases exponentially with the number of motor units. We provide a tractable and accurate approximation for this quantity, and also investigate simulation approaches incorporated into RJMCMC using the results of Andrieu and Roberts (2009).
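To make the scaling problem concrete: with N motor units, the exact marginal likelihood of an observed response is a sum over all 2^N firing patterns. Below is a minimal brute-force sketch, assuming a Bernoulli-firing, Gaussian-noise observation model; this is an illustration of the intractable sum, not the paper's approximation.

```python
import itertools
import numpy as np
from scipy.stats import norm

def exact_marginal_likelihood(y, fire_prob, amplitude, sigma):
    """Marginal likelihood of one observed response y, summing over
    all 2^N firing patterns of N motor units. The cost grows as 2^N,
    which is why the exact sum is intractable for realistic N."""
    n = len(fire_prob)
    total = 0.0
    for pattern in itertools.product([0, 1], repeat=n):
        s = np.array(pattern)
        prior = np.prod(np.where(s == 1, fire_prob, 1.0 - fire_prob))
        total += prior * norm.pdf(y, loc=s @ amplitude, scale=sigma)
    return total

# Feasible for small N only: 10 units already means 1,024 terms,
# and 30 units over a billion.
like = exact_marginal_likelihood(y=3.2, fire_prob=np.full(10, 0.5),
                                 amplitude=np.linspace(0.2, 0.6, 10),
                                 sigma=0.1)
```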

Relevance: 10.00%

Publisher:

Abstract:

Detailed representations of complex flow datasets are often difficult to generate using traditional vector visualisation techniques such as arrow plots and streamlines. This is particularly true when the flow regime changes over time. Texture-based techniques, which are based on the advection of dense textures, are novel techniques for visualising such flows. We review two popular texture-based techniques and their application to flow datasets sourced from active research projects: line integral convolution (LIC) [1] and image-based flow visualisation (IBFV) [18]. We evaluate these techniques and report on their effectiveness from a visualisation perspective, as well as on their ease of implementation and computational overheads.
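As an indication of how LIC works in practice, here is a minimal sketch using simple Euler streamline tracing and a box filter; production LIC implementations use higher-order integrators and smoother convolution kernels.

```python
import numpy as np

def lic(u, v, length=20, rng=None):
    """Minimal line integral convolution: for each pixel, average a
    white-noise texture along a short streamline traced through the
    vector field (u, v)."""
    h, w = u.shape
    rng = rng or np.random.default_rng(0)
    noise = rng.random((h, w))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            acc, cnt = 0.0, 0
            for direction in (1.0, -1.0):   # trace both ways along the flow
                x, y = float(j), float(i)
                for _ in range(length):
                    yi, xi = int(y), int(x)
                    if not (0 <= yi < h and 0 <= xi < w):
                        break
                    acc += noise[yi, xi]
                    cnt += 1
                    speed = np.hypot(u[yi, xi], v[yi, xi]) or 1.0
                    x += direction * u[yi, xi] / speed  # unit step along flow
                    y += direction * v[yi, xi] / speed
            out[i, j] = acc / max(cnt, 1)
    return out

# Example: circular flow on a 64x64 grid.
yy, xx = np.mgrid[-1:1:64j, -1:1:64j]
img = lic(-yy, xx)
```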

Relevance: 10.00%

Publisher:

Abstract:

We conducted an in-situ X-ray micro-computed tomography heating experiment at the Advanced Photon Source (USA) to dehydrate an unconfined 2.3 mm diameter cylinder of Volterra gypsum. We used a purpose-built X-ray transparent furnace to heat the sample to 388 K for a total of 310 min, acquiring a three-dimensional time-series tomography dataset comprising nine time steps. The voxel size of (2.2 μm)³ proved sufficient to pinpoint reaction initiation and the organisation of drainage architecture in space and time. We observed that dehydration commences across a narrow front, which propagates from the margins to the centre of the sample over more than four hours. The advance of this front can be fitted with a square-root function, implying that the initiation of the reaction in the sample can be described as a diffusion process. Novel parallelised computer codes allow the geometry of the porosity and the drainage architecture to be quantified from the very large tomographic datasets (2048³ voxels) in unprecedented detail. We determined the position, volume, shape and orientation of each resolvable pore and tracked these properties over the duration of the experiment. We found that the pore-size distribution follows a power law. Pores tend to be anisotropic but rarely crack-shaped, and have a preferred orientation, likely controlled by a pre-existing fabric in the sample. With ongoing dehydration, pores coalesce into a single interconnected pore cluster that is connected to the surface of the sample cylinder and provides an effective drainage pathway. Our observations can be summarised in a model in which gypsum is stabilised by thermal expansion stresses and locally increased pore fluid pressures until the dehydration front approaches to within about 100 μm. Then the internal stresses are released and dehydration proceeds efficiently, creating new pore space. Pressure release, the production of pores and the advance of the front are coupled in a feedback loop.
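The square-root fit mentioned above is straightforward to reproduce. A hedged sketch follows, with invented front positions standing in for the real measurements from the nine tomographic time steps:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical dehydration-front positions (mm) at the nine time steps;
# the real values come from the time-series dataset described above.
t = np.array([10., 40., 80., 120., 160., 200., 240., 280., 310.])    # min
x = np.array([0.12, 0.25, 0.36, 0.44, 0.50, 0.57, 0.62, 0.67, 0.70])  # mm

sqrt_front = lambda t, k: k * np.sqrt(t)
(k,), _ = curve_fit(sqrt_front, t, x)
print(f"front ~ {k:.3f} * sqrt(t)  (diffusion-like advance)")
```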

Relevance: 10.00%

Publisher:

Abstract:

Exponential growth of genomic data over the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and moved toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through transcriptional regulatory network (TRN) structures, which model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics, and, used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we explored the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can reduce the high false-positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of the inferred transcriptional regulatory networks. In a preliminary exploration of relationships between the key regulatory components of E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Of chief interest was the relationship observed between promoter strength and TFs grouped by their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. The t-tests assessed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level supports our (alternative) hypothesis, albeit this trend may only be present for promoters whose corresponding TFBSs are either all repressors or all activators. Although these observations were specific to σ70, such suggestive results strongly encourage additional investigation as more experimentally confirmed data become available.
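A sketch of the statistical comparison behind that p-value is shown below, assuming a recent SciPy (1.6+) for the one-sided alternative; the promoter-strength values are invented placeholders, not the thesis's data.

```python
import numpy as np
from scipy.stats import ttest_ind

# One-sided two-sample t-test: are promoters carrying only repressor
# TFBSs stronger than promoters carrying only activator TFBSs?
# Strength values here are hypothetical.
activator_promoters = np.array([0.21, 0.18, 0.25, 0.30, 0.22])  # 'weak'
repressor_promoters = np.array([0.35, 0.41, 0.28, 0.44, 0.38])  # 'strong'

t, p = ttest_ind(repressor_promoters, activator_promoters,
                 alternative="greater")
print(p < 0.1)   # support at the 0.1 significance level?
```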
Much of the remainder of the thesis concerns a machine learning study of binding site prediction using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as to the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving the inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation, due to its potential active functionality in non-pathogens and its known participation in the full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system: the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on inferring the regulatory network for each target genome from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships and increased confidence in the predicted regulatory interactions. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival.
Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
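To illustrate the spectrum kernel approach used for the SVM work described above, here is a minimal k-spectrum kernel with a precomputed-kernel SVM. The toy sequences and labels are invented; this is a sketch of the technique, not the thesis's feature set.

```python
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def spectrum_features(seq, k=3):
    """k-mer (k-spectrum) counts of a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seqs_a, seqs_b, k=3):
    """Gram matrix of the k-spectrum kernel: the dot product of the
    k-mer count vectors of each pair of sequences."""
    fa = [spectrum_features(s, k) for s in seqs_a]
    fb = [spectrum_features(s, k) for s in seqs_b]
    gram = np.zeros((len(fa), len(fb)))
    for i, ca in enumerate(fa):
        for j, cb in enumerate(fb):
            gram[i, j] = sum(ca[m] * cb[m] for m in ca if m in cb)
    return gram

# Toy binding-site vs background classification (made-up sequences).
train = ["TGTGACGT", "TGTGATCG", "ACCGTTAA", "GGCATTAC"]
labels = [1, 1, 0, 0]
clf = SVC(kernel="precomputed").fit(spectrum_kernel(train, train), labels)
pred = clf.predict(spectrum_kernel(["TGTGACCA"], train))
```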

Relevance: 10.00%

Publisher:

Abstract:

This paper outlines a method for studying online activity using both qualitative and quantitative methods: topical network analysis. A topical network refers to "the collection of sites commenting on a particular event or issue, and the links between them" (Highfield, Kirchhoff, & Nicolai, 2011, p. 341). The approach complements the analysis of large datasets, enabling the examination and comparison of different discussions as a means of improving our understanding of the uses of social media and other forms of online communication. Developed for an analysis of political blogging, the method also has wider applications for other social media websites such as Twitter.
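A minimal sketch of how such a topical network might be assembled and queried on its quantitative side; the site names and links are invented placeholders, and the qualitative reading of the linked posts sits alongside this.

```python
import networkx as nx

# A topical network: sites commenting on one issue, and the links
# between them (hypothetical example).
links = [
    ("blog-a.example", "news-site.example"),
    ("blog-b.example", "blog-a.example"),
    ("blog-b.example", "news-site.example"),
]
g = nx.DiGraph()
g.add_edges_from(links)

# Quantitative side: which sites anchor the discussion?
print(nx.in_degree_centrality(g))
# Qualitative side: the edge list itself shows whose commentary each
# site engages with, and can be read alongside the posts themselves.
```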

Relevance: 10.00%

Publisher:

Abstract:

Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analysing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.

Results: We compare our method to other feature-selection approaches and demonstrate that mCOPA frequently selects more informative features than differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.

Conclusions: We demonstrate that mCOPA offers advantages over differential expression or variance in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
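For orientation, the classic COPA transform underlying this family of methods median-centres each gene, scales by the median absolute deviation, and ranks genes by an extreme percentile; scoring the lower tail as well covers under-expressed outliers. The sketch below shows that basic transform, not the authors' mCOPA refinements.

```python
import numpy as np

def copa_scores(expr, pct=90):
    """COPA-style outlier scores. expr: genes x samples matrix.
    Median-centre each gene, scale by its MAD, then take an
    upper-tail and a lower-tail percentile per gene."""
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    z = (expr - med) / np.where(mad == 0, 1.0, mad)
    up = np.percentile(z, pct, axis=1)          # over-expressed outliers
    down = np.percentile(z, 100 - pct, axis=1)  # under-expressed outliers
    return up, down

rng = np.random.default_rng(1)
expr = rng.normal(size=(100, 40))
expr[0, :5] += 6.0            # gene 0 is over-expressed in 5 samples
up, down = copa_scores(expr)
print(up.argmax())            # gene 0 gets the top outlier score
```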

Relevance: 10.00%

Publisher:

Abstract:

In this issue of Cancer Discovery, Guagnano and colleagues use a large and diverse annotated collection of cancer cell lines, the Cancer Cell Line Encyclopedia, to correlate whole-genome expression and genomic alteration datasets with cell line sensitivity data for the novel pan-fibroblast growth factor receptor (FGFR) inhibitor NVP-BGJ398. Their findings underscore not only the preclinical utility of such cell line panels in identifying predictive biomarkers, but also the emergence of the FGFRs as valid therapeutic targets across an increasingly broad range of malignancies.

Relevance: 10.00%

Publisher:

Abstract:

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of the state-of-the-art developments in these areas. We begin by discussing the meaning of tagging in the different sound areas, and introduce some examples of sound tagging applications to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare the three areas and summarise research progress to date. Research gaps are identified for each area, and the features they share, as well as those that distinguish them, are also discussed. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries engaged in sound tagging research over the years.

Relevance: 10.00%

Publisher:

Abstract:

Amongst the most prominent uses of Twitter at present is its role in the discussion of widely televised events: Twitter’s own statistics for 2011, for example, list major entertainment spectacles (the MTV Music Awards, the BET Awards) and sports matches (the UEFA Champions League final, the FIFA Women’s World Cup final) amongst the events generating the most tweets per second during the year (Twitter, 2011). User activities during such televised events constitute a specific, unique category of Twitter use which, as preliminary research has shown, differs clearly from the other major event types that generate a high rate of tweets per second (such as crises and breaking news, from the Japanese earthquake and tsunami to the death of Steve Jobs). During such major media events, by contrast, Twitter is used predominantly as a technology of fandom: it serves in the first place as a backchannel to television and other streaming audiovisual media, enabling users to offer their own running commentary on the universally shared media text of the event broadcast as it unfolds live. Centrally, this communion of fans around the shared text is facilitated by the use of Twitter hashtags: unifying textual markers which are now often promoted to prospective audiences by the broadcasters well in advance of the live event itself. This paper examines the use of Twitter as a technology for the expression of shared fandom in the context of a major, internationally televised annual media event: the Eurovision Song Contest. It constitutes a highly publicised, highly choreographed media spectacle whose eventual outcomes are unknown ahead of time, and which attracts a diverse international audience. Our analysis draws on comprehensive datasets for the ‘official’ event hashtags, #eurovision, #esc, and #sbseurovision. Using innovative methods which combine qualitative and quantitative approaches to the analysis of Twitter datasets containing several hundred thousand tweets, we examine overall patterns of participation to discover how audiences express their fandom throughout the event. Minute-by-minute tracking of Twitter activity during the live broadcasts enables us to identify the most resonant moments during each event; we also examine the networks of interaction between participants to detect thematically or geographically determined clusters of interaction, and to identify the most visible and influential participants in each network. Such analysis provides a unique insight into the use of Twitter as a technology for fandom and for what cultural studies research calls ‘audiencing’: the public performance of belonging to the distributed audience for a shared media event. Our work thus contributes to the examination of fandom practices led by Henry Jenkins (2006) and other scholars, and points to Twitter as an important new medium facilitating the connection and communion of such fans.
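Minute-by-minute tracking of this kind reduces to binning tweet timestamps. A sketch follows, assuming a hypothetical CSV export with one tweet per row and a created_at column; the file name and schema are placeholders.

```python
import pandas as pd

# Count tweets per minute for an event hashtag dataset.
tweets = pd.read_csv("eurovision_tweets.csv", parse_dates=["created_at"])
per_minute = (tweets.set_index("created_at")
                    .resample("1min")
                    .size())

# Peaks mark the most resonant moments of the live broadcast.
print(per_minute.nlargest(10))
```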

Relevance: 10.00%

Publisher:

Abstract:

Stormwater is a potential and readily available alternative source of potable water in urban areas. However, its direct use is severely constrained by the presence of toxic pollutants such as heavy metals (HMs). The presence of HMs in stormwater is of concern because of their chronic toxicity and persistent nature. In addition to human health impacts, metals can contribute to adverse ecosystem health impacts on receiving waters. Therefore, the ability to predict the levels of HMs in stormwater is crucial for monitoring stormwater quality and for the design of effective treatment systems. Unfortunately, current laboratory methods for determining HM concentrations are resource intensive and time consuming. In this paper, applications of multivariate data analysis techniques are presented to identify potential surrogate parameters which can be used to determine HM concentrations in stormwater. Accordingly, partial least squares was applied to identify a suite of physicochemical parameters which can serve as indicators of HMs. Datasets with varied characteristics, such as land use and particle size distribution of solids, were analysed to validate the efficacy of the influencing parameters. Iron, manganese, total organic carbon, and inorganic carbon were identified as the predominant parameters that correlate with HM concentrations. The practical extension of the study outcomes to urban stormwater management is also discussed.
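As a sketch of the surrogate-parameter idea: partial least squares can regress HM concentrations on the candidate physicochemical measurements, and the loadings then point at suitable surrogates. The data and column meanings below are placeholders, not the study's measurements.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
# Hypothetical predictors: e.g. Fe, Mn, TOC, inorganic carbon.
X = rng.normal(size=(60, 4))
# Hypothetical heavy-metal concentration driven by those predictors.
Y = X @ np.array([[0.8], [0.5], [0.3], [0.2]]) + rng.normal(0.0, 0.1, (60, 1))

pls = PLSRegression(n_components=2).fit(X, Y)
print(pls.x_loadings_)   # large loadings flag candidate surrogate parameters
print(pls.score(X, Y))   # R^2: how well the surrogates predict HM levels
```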

Relevance: 10.00%

Publisher:

Abstract:

This paper introduces PartSS, a new partition-based filtering method for tasks performing string comparisons under edit distance constraints. PartSS offers improvements over the state-of-the-art method NGPP through a new partitioning scheme, and also improves filtering abilities by exploiting theoretical results on shifting and scaling ranges, thus accelerating the computation of edit distances between strings. PartSS filtering has been implemented within two major data integration tasks: similarity join and approximate membership extraction under edit distance constraints. Evaluation on an extensive range of real-world datasets demonstrates major gains in efficiency over the NGPP and QGrams approaches.
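The pigeonhole principle behind partition-based filters of this kind is easy to state: split one string into τ+1 chunks; if the edit distance to another string is at most τ, at least one chunk must occur in it verbatim. The sketch below illustrates that principle only, not PartSS's actual partitioning scheme.

```python
def passes_partition_filter(s, t, tau):
    """Pigeonhole filter: split s into tau+1 chunks; if
    edit_distance(s, t) <= tau, at least one chunk must occur
    verbatim in t. Survivors still need edit-distance verification."""
    parts = tau + 1
    base, extra = divmod(len(s), parts)
    chunks, pos = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(s[pos:pos + size])
        pos += size
    return any(c and c in t for c in chunks)

# Candidate pairs that fail the filter are pruned before the
# expensive edit-distance computation.
print(passes_partition_filter("similarity", "similarly", tau=2))  # True
```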

Relevance: 10.00%

Publisher:

Abstract:

The rapid increase in the deployment of CCTV systems has led to a greater demand for algorithms that are able to process incoming video feeds. These algorithms are designed to extract information of interest for human operators. Over the past several years, there has been a large effort to detect abnormal activities through computer vision techniques. Typically, the problem is formulated as a novelty detection task, where the system is trained on normal data and is required to detect events which do not fit the learned 'normal' model. Many researchers have tried various sets of features to train different learning models to detect abnormal behaviour in video footage. In this work we propose using a Semi-2D Hidden Markov Model (HMM) to model the normal activities of people; outliers of the model with insufficient likelihood are identified as abnormal activities. Our Semi-2D HMM is designed to model both the temporal and spatial causalities of crowd behaviour, by assuming that the current state of the Hidden Markov Model depends not only on the previous state in the temporal direction, but also on the previous states of the adjacent spatial locations. Two different HMMs are trained to model the vertical and horizontal spatial causal information. Location features, flow features and optical flow textures are used as the features for the model. The proposed approach is evaluated using the publicly available UCSD datasets, and we demonstrate improved performance compared to other state-of-the-art methods.
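A plain one-dimensional HMM is enough to illustrate the likelihood-thresholding idea; the paper's Semi-2D HMM additionally conditions each state on neighbouring spatial locations. The sketch below uses hmmlearn, with random arrays and a percentile threshold standing in for real motion features and a tuned cut-off.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Placeholder 'normal' feature vectors (e.g. location/flow features).
normal = rng.normal(0.0, 1.0, size=(500, 3))
model = GaussianHMM(n_components=4, random_state=0).fit(normal)

# Threshold: 5th percentile of log-likelihoods of normal windows.
threshold = np.percentile(
    [model.score(normal[i:i + 50]) for i in range(0, 450, 50)], 5)

# A test window with unusual statistics scores far below threshold.
test = rng.normal(4.0, 1.0, size=(50, 3))
is_abnormal = model.score(test) < threshold
```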

Relevance: 10.00%

Publisher:

Abstract:

Spatio-temporal interest points are the most popular feature representation in the field of action recognition. A variety of methods have been proposed to detect and describe local patches in video, with several techniques reporting state-of-the-art performance for action recognition. However, the reported results are obtained under different experimental settings with different datasets, making it difficult to compare the various approaches. As a result, we comprehensively evaluate state-of-the-art spatio-temporal features under a common evaluation framework with popular benchmark datasets (KTH, Weizmann) and more challenging datasets such as Hollywood2. The purpose of this work is to provide guidance for researchers when selecting features for different applications with different environmental conditions. We evaluate four popular descriptors (HOG, HOF, HOG/HOF, HOG3D) using a popular bag-of-visual-features representation and Support Vector Machines (SVMs) for classification. Moreover, we provide an in-depth analysis of local feature descriptors and optimise the codebook sizes for different datasets with different descriptors. We demonstrate that motion-based features offer better performance than those that rely solely on spatial information, while features that combine both types of data are more consistent across a variety of conditions, but typically require a larger codebook for optimal performance.
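The bag-of-visual-features pipeline used for this kind of evaluation can be sketched as follows; random arrays stand in for HOG/HOF-style descriptors, and the codebook size is the parameter the paper tunes per dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bovw_histogram(descriptors, codebook):
    """Quantise a video's local descriptors against the codebook and
    return a normalised visual-word histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Placeholder for descriptors pooled from training videos.
train_desc = rng.normal(size=(2000, 72))
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(train_desc)

# One histogram per video, then an SVM for action classification.
videos = [rng.normal(size=(300, 72)) for _ in range(20)]
X = np.array([bovw_histogram(v, codebook) for v in videos])
y = np.repeat([0, 1], 10)              # two toy action classes
clf = SVC(kernel="rbf").fit(X, y)
```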

Relevance: 10.00%

Publisher:

Abstract:

miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool, which is modelled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. The tool can be run on an ordinary personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools on our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star

Relevance: 10.00%

Publisher:

Abstract:

Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces are able to accommodate the effects of various image variations and can capture the dynamic properties of actions. Subspaces form a non-Euclidean, curved Riemannian manifold known as a Grassmann manifold. Inference on manifold spaces is usually achieved by embedding the manifolds in higher-dimensional Euclidean spaces. In this paper, we instead propose to embed the Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To achieve efficient machinery, we propose graph-based local discriminant analysis, which utilises within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on the KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy compared to several state-of-the-art methods, such as the kernel version of the affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words, and the hierarchy of discriminative space-time neighbourhood features.
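One standard way to realise such an embedding is the projection kernel between orthonormal subspace bases, k(A, B) = ||AᵀB||²_F. A sketch follows (random matrices stand in for video feature matrices); the paper's graph-based discriminant analysis would then operate on the resulting Gram matrix.

```python
import numpy as np

def subspace_basis(video_features, dim=5):
    """Orthonormal basis (a point on the Grassmann manifold) spanning
    a video's feature matrix, via thin SVD."""
    u, _, _ = np.linalg.svd(video_features, full_matrices=False)
    return u[:, :dim]

def projection_kernel(A, B):
    """Projection kernel between subspaces with orthonormal bases A, B:
    ||A^T B||_F^2, a positive-definite kernel commonly used to embed
    Grassmann manifolds into a reproducing kernel Hilbert space."""
    return np.linalg.norm(A.T @ B, "fro") ** 2

rng = np.random.default_rng(0)
clips = [rng.normal(size=(100, 30)) for _ in range(4)]  # feature matrices
bases = [subspace_basis(c) for c in clips]
gram = np.array([[projection_kernel(a, b) for b in bases] for a in bases])
```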