996 resultados para Sequence representation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

DNA sequence representation methods are used to denote a gene structure effectively and help in similarities/dissimilarities analysis of coding sequences. Many different kinds of representations have been proposed in the literature. They can be broadly classified into Numerical, Graphical, Geometrical and Hybrid representation methods. DNA structure and function analysis are made easy with graphical and geometrical representation methods since it gives visual representation of a DNA structure. In numerical method, numerical values are assigned to a sequence and digital signal processing methods are used to analyze the sequence. Hybrid approaches are also reported in the literature to analyze DNA sequences. This paper reviews the latest developments in DNA Sequence representation methods. We also present a taxonomy of various methods. A comparison of these methods where ever possible is also done

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that indicate that the use of word embeddings significantly improve the effectiveness of concept extraction models; however, we further determine the influence that the corpora used to generate such features have. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Calligraphic writing presents a rich set of challenges to the human movement control system. These challenges include: initial learning, and recall from memory, of prescribed stroke sequences; critical timing of stroke onsets and durations; fine control of grip and contact forces; and letter-form invariance under voluntary size scaling, which entails fine control of stroke direction and amplitude during recruitment and derecruitment of musculoskeletal degrees of freedom. Experimental and computational studies in behavioral neuroscience have made rapid progress toward explaining the learning, planning and contTOl exercised in tasks that share features with calligraphic writing and drawing. This article summarizes computational neuroscience models and related neurobiological data that reveal critical operations spanning from parallel sequence representations to fine force control. Part one addresses stroke sequencing. It treats competitive queuing (CQ) models of sequence representation, performance, learning, and recall. Part two addresses letter size scaling and motor equivalence. It treats cursive handwriting models together with models in which sensory-motor tmnsformations are performed by circuits that learn inverse differential kinematic mappings. Part three addresses fine-grained control of timing and transient forces, by treating circuit models that learn to solve inverse dynamics problems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST..

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the context of an autologous cell transplantation study, a unilateral biopsy of cortical tissue was surgically performed from the right dorsolateral prefrontal cortex (dlPFC) in two intact adult macaque monkeys (dlPFC lesioned group), together with the implantation of a chronic chamber providing access to the left motor cortex. Three other monkeys were subjected to the same chronic chamber implantation, but without dlPFC biopsy (control group). All monkeys were initially trained to perform sequential manual dexterity tasks, requiring precision grip. The motor performance and the prehension's sequence (temporal order to grasp pellets from different spatial locations) were analysed for each hand. Following the surgery, transient and moderate deficits of manual dexterity per se occurred in both groups, indicating that they were not due to the dlPFC lesion (most likely related to the recording chamber implantation and/or general anaesthesia/medication). In contrast, changes of motor habit were observed for the sequential order of grasping in the two monkeys with dlPFC lesion only. The changes were more prominent in the monkey subjected to the largest lesion, supporting the notion of a specific effect of the dlPFC lesion on the motor habit of the monkeys. These observations are reminiscent of previous studies using conditional tasks with delay that have proposed a specialization of the dlPFC for visuo-spatial working memory, except that this is in a different context of "free-will", non-conditional manual dexterity task, without a component of working memory.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes an efficient pattern extraction algorithm that can be applied on melodic sequences that are represented as strings of abstract intervallic symbols; the melodic representation introduces special “binary don’t care” symbols for intervals that may belong to two partially overlapping intervallic categories. As a special case the well established “step–leap” representation is examined. In the step–leap representation, each melodic diatonic interval is classified as a step (±s), a leap (±l) or a unison (u). Binary don’t care symbols are used to represent the possible overlapping between the various abstract categories e.g. *=s, *=l and #=-s, #=-l. We propose an O(n+d(n-d)+z)-time algorithm for computing all maximal-pairs in a given sequence x=x[1..n], where x contains d occurrences of binary don’t cares and z is the number of reported maximal-pairs.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes an efficient pattern extraction algorithm that can be applied on melodic sequences that are represented as strings of abstract intervallic symbols; the melodic representation introduces special “binary don’t care” symbols for intervals that may belong to two partially overlapping intervallic categories. As a special case the well established “step–leap” representation is examined. In the step–leap representation, each melodic diatonic interval is classified as a step (±s), a leap (±l) or a unison (u). Binary don’t care symbols are used to represent the possible overlapping between the various abstract categories e.g. *=s, *=l and #=-s, #=-l. We propose an O(n+d(n-d)+z)-time algorithm for computing all maximal-pairs in a given sequence x=x[1..n], where x contains d occurrences of binary don’t cares and z is the number of reported maximal-pairs.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximate to23,500 genes, of which only approximate to1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Invasive species pose a significant threat to global economies, agriculture and biodiversity. Despite progress towards understanding the ecological factors associated with plant invasions, limited genomic resources have made it difficult to elucidate the evolutionary and genetic factors responsible for invasiveness. This study presents the first expressed sequence tag (EST) collection for Senecio madagascariensis, a globally invasive plant species. Methods We used pyrosequencing of one normalized and two subtractive libraries, derived from one native and one invasive population, to generate an EST collection. ESTs were assembled into contigs, annotated by BLAST comparison with the NCBI non-redundant protein database and assigned gene ontology (GO) terms from the Plant GO Slim ontologies. Key Results Assembly of the 221 746 sequence reads resulted in 12 442 contigs. Over 50 % (6183) of 12 442 contigs showed significant homology to proteins in the NCBI database, representing approx. 4800 independent transcripts. The molecular transducer GO term was significantly over-represented in the native (South African) subtractive library compared with the invasive (Australian) library. Based on NCBI BLAST hits and literature searches, 40 % of the molecular transducer genes identified in the South African subtractive library are likely to be involved in response to biotic stimuli, such as fungal, bacterial and viral pathogens. Conclusions This EST collection is the first representation of the S. madagascariensis transcriptome and provides an important resource for the discovery of candidate genes associated with plant invasiveness. The over-representation of molecular transducer genes associated with defence responses in the native subtractive library provides preliminary support for aspects of the enemy release and evolution of increased competitive ability hypotheses in this successful invasive. This study highlights the contribution of next-generation sequencing to better understanding the molecular mechanisms underlying ecological hypotheses that are important in successful plant invasions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient transgene expression, generation of mobile gene silencing signals, and hyper-susceptibility to viral infection. Methodology/Results: RNA-seq libraries from 9 different tissues were deep sequenced and assembled, de novo, into a representation of the transcriptome. The assembly, of16GB of sequence, yielded 237,340 contigs, clustering into 119,014 transcripts (unigenes). Between 80 and 85% of reads from all tissues could be mapped back to the full transcriptome. Approximately 63% of the unigenes exhibited a match to the Solgenomics tomato predicted proteins database. Approximately 94% of the Solgenomics N. benthamiana unigene set (16,024 sequences) matched our unigene set (119,014 sequences). Using homology searches we identified 31 homologues that are involved in RNAi-associated pathways in Arabidopsis thaliana, and show that they possess the domains characteristic of these proteins. Of these genes, the RNA dependent RNA polymerase gene, Rdr1, is transcribed but has a 72 nt insertion in exon1 that would cause premature termination of translation. Dicer-like 3 (DCL3) appears to lack both the DEAD helicase motif and second dsRNA binding motif, and DCL2 and AGO4b have unexpectedly high levels of transcription. Conclusions: The assembled and annotated representation of the transcriptome and list of RNAi-associated sequences are accessible at www.benthgenome.com alongside a draft genome assembly. These genomic resources will be very useful for further study of the developmental, metabolic and defense pathways of N. benthamiana and in understanding the mechanisms behind the features which have made it such a well-used model plant. © 2013 Nakasugi et al.