985 results for sequence stratigraphy


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In the mining and analysis of a single long sequence, a fundamental problem is obtaining accurate frequencies of sequential patterns over the sequence. However, we identify that five previous frequency measures suffer from inherent inaccuracies. To obtain more accurate frequencies, we introduce two basic principles for frequency measures, called strict anti-monotonicity and maximum-count. Under the two principles, a new frequency measure is presented, and an algorithm is devised to compute it. Both theoretical analysis and empirical evaluation show that more accurate frequencies can be obtained under the new measure.
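The abstract does not define the new measure, so the following is only a minimal sketch of the general idea of counting a sequential pattern in one long sequence. It assumes a simple, hypothetical measure (the maximum number of occurrences of the pattern as a subsequence, where each occurrence must end before the next begins); the function name `count_non_overlapping` is illustrative and is not taken from the paper.

```python
def count_non_overlapping(sequence, pattern):
    """Count occurrences of `pattern` as a (gapped) subsequence of `sequence`,
    requiring each occurrence to finish before the next one starts.

    Assumption: this is an illustrative measure, not the one proposed in the
    abstract. Greedy left-to-right matching completes every occurrence as
    early as possible, which maximises the count under this restriction.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    matched = 0  # number of pattern items matched in the current occurrence
    for item in sequence:
        if item == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                count += 1
                matched = 0  # start looking for the next occurrence
    return count


events = "ABCABABC"
print(count_non_overlapping(events, "AB"))   # 3
print(count_non_overlapping(events, "ABC"))  # 2
```

In this example the longer pattern "ABC" is counted no more often than its prefix "AB", which is the kind of behaviour an anti-monotonic frequency measure is meant to guarantee.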

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A short motif termed Plasmodium export element (PEXEL) or vacuolar targeting signal (VTS) characterizes Plasmodium proteins exported into the host cell. These proteins mediate host cell modifications essential for parasite survival and virulence. However, several PEXEL-negative exported proteins indicate that the currently predicted malaria exportome is not complete, and it is unknown whether and how these proteins relate to PEXEL-positive export. Here we show that the N-terminal 10 amino acids of the PEXEL-negative exported protein REX2 (ring-exported protein 2) are necessary for its targeting and that a single-point mutation in this region abolishes export. Furthermore, we show that the REX2 transmembrane domain is also essential for export and that, together with the N-terminal region, it is sufficient to promote export of another protein. An N-terminal region and the transmembrane domain of the unrelated PEXEL-negative exported protein SBP1 (skeleton-binding protein 1) can functionally replace the corresponding regions in REX2, suggesting that these sequence features are also present in other PEXEL-negative exported proteins. As with PEXEL proteins, we find that REX2 is processed but, in contrast, we detect no evidence for N-terminal acetylation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in half the time and with a quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
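As a rough illustration of the "sorted k-mer list" data structure mentioned in the abstract, the sketch below builds one list per sequence and merges two lists to find shared k-mers, which could serve as candidate alignment anchors. This is a simplified, single-machine toy using contiguous k-mers; the function names `sorted_kmer_list` and `shared_kmers` and the example sequences are assumptions, and nothing here reflects the actual progressiveMauve or BG/P implementation.

```python
def sorted_kmer_list(sequence, k):
    """Return (k-mer, position) pairs for `sequence`, sorted by k-mer."""
    return sorted((sequence[i:i + k], i) for i in range(len(sequence) - k + 1))


def shared_kmers(list_a, list_b):
    """Merge two sorted k-mer lists and report k-mers present in both.

    Returns a list of (k-mer, positions_in_a, positions_in_b) tuples.
    Because both inputs are sorted, one linear pass is enough.
    """
    matches = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        kmer_a, kmer_b = list_a[i][0], list_b[j][0]
        if kmer_a < kmer_b:
            i += 1
        elif kmer_a > kmer_b:
            j += 1
        else:
            # Collect the full run of this k-mer in each list.
            i_end = i
            while i_end < len(list_a) and list_a[i_end][0] == kmer_a:
                i_end += 1
            j_end = j
            while j_end < len(list_b) and list_b[j_end][0] == kmer_a:
                j_end += 1
            matches.append((kmer_a,
                            [pos for _, pos in list_a[i:i_end]],
                            [pos for _, pos in list_b[j:j_end]]))
            i, j = i_end, j_end
    return matches


# Hypothetical toy sequences, not real genome data.
genome_a = "ACGTACGA"
genome_b = "TTACGAAC"
anchors = shared_kmers(sorted_kmer_list(genome_a, 4), sorted_kmer_list(genome_b, 4))
print(anchors)  # [('ACGA', [4], [2]), ('TACG', [3], [1])]
```

Sorting once and then merging keeps the comparison of two lists linear in their combined length, which is one reason sorted lists are attractive when memory and time are tight.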

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recognising daily activity patterns of people from low-level sensory data is an important problem. Traditional approaches typically rely on generative models such as hidden Markov models and on training with fully labelled data. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support fully supervised learning is often expensive. In this paper, we propose a new approach based on partially-supervised training of discriminative sequence models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that the approach can reduce labelling effort while retaining the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart (i.e. the partially hidden Markov model), even when a substantial number of labels are unavailable.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Expressed Sequence Tags (ESTs) are short DNA sequences generated by sequencing cDNAs transcribed from expressed genes. They can provide significant functional, structural and evolutionary information and thus are a primary resource for gene discovery. EST annotation refers to the analysis of unknown ESTs, which can be performed by database similarity search for possible identities and by database search for functional prediction of translation products. This kind of annotation typically consists of a series of repetitive tasks that should be automated, customizable and amenable to using distributed computing resources. Furthermore, processing of EST data should be done efficiently using a high performance computing platform. In this paper, we describe an EST annotator, EST-PACHPC, which has been developed for harnessing HPC resources, potentially from Grid and Cloud systems, for high throughput EST annotation. The performance analysis of EST-PACHPC has shown that it provides a substantial performance gain in EST annotation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Purpose: Self-rated health has been linked to important health and survival outcomes in individuals with co-morbid depression and cardiovascular disease (CVD). It is not clear how the timing of depression onset relative to CVD onset affects this relationship. We aimed first to identify the prevalence of major depressive disorder (MDD) preceding CVD and second to determine whether the sequence of disease onset is associated with mental and physical self-rated health. Methods: This study utilised cross-sectional, population-based data from 224 respondents of the 2007 Australian National Survey of Mental Health and Wellbeing (NSMHWB). Participants were those who had been diagnosed with MDD and who reported ever having a heart/circulatory condition over their lifetime. Age of onset was reported for each condition. Logistic regression was used to explore differences in self-rated mental and physical health for those reporting pre-cardiac and post-cardiac depression. Results: The proportion of individuals in whom MDD preceded CVD was 80.36% (CI: 72.57-88.15). One-fifth (19.64%, CI: 11.85-27.42) reported MDD onset at the time of, or following, CVD. After controlling for covariates, the final model demonstrated that those reporting post-cardiac depression were significantly less likely to report poor self-rated mental health (OR: 0.36, CI: 0.14-0.93) than those with pre-existing depression. No significant differences were found in self-rated physical health between groups (OR: 0.90, CI: 0.38-2.14). Conclusions: MDD is most common prior to the onset of CVD. Further, there is an association between pre-morbid MDD and poorer self-rated mental health. To our knowledge, this is the first time this has been demonstrated in a national, population-based survey. As self-rated health has been shown to predict important outcomes such as survival, we recommend that those with MDD be identified as vulnerable to CVD onset and poorer health outcomes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Human immunodeficiency virus type 1 (HIV-1) contains two copies of genomic RNA that are noncovalently linked via a palindrome sequence within the dimer initiation site (DIS) stem-loop. In contrast to the current paradigm that the DIS stem or stem-loop is critical for HIV-1 infectivity, which arose from studies using T-cell lines, we demonstrate here that HIV-1 mutants with deletions in the DIS stem-loop are replication competent in peripheral blood mononuclear cells (PBMCs). The DIS mutants contained either the wild-type (5′GCGCGC3′) or an arbitrary (5′ACGCGT3′) palindrome sequence in place of the 39-nucleotide DIS stem-loop (NLCGCGCG and NLACGCGT). These DIS mutants were replication defective in SupT1 cells, concurring with the current model in which DIS mutants are replication defective in T-cell lines. All of the HIV-1 DIS mutants were replication competent in PBMCs over a 40-day infection period and had retained their respective DIS mutations at 40 days postinfection. Although the stability of the virion RNA dimer was not affected by our DIS mutations, the RNA dimers exhibited a diffuse migration profile when compared to the wild type. No defect in protein processing of the Gag and GagProPol precursor proteins was found in the DIS mutants. Our data provide direct evidence that the DIS stem-loop is dispensable for viral replication in PBMCs and that the requirement of the DIS stem-loop in HIV-1 replication is cell type dependent.
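In nucleic acid terms, a "palindrome" is a sequence that equals its own reverse complement, which is what allows two copies of the same strand to base-pair with each other. The short sketch below checks this property for the two hexamers quoted in the abstract; the helper names `reverse_complement` and `is_palindromic` are illustrative, and DNA letters are used for simplicity, as in the abstract.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


print(is_palindromic("GCGCGC"))  # True  (wild-type palindrome)
print(is_palindromic("ACGCGT"))  # True  (arbitrary palindrome)
print(is_palindromic("ACGCGA"))  # False (hypothetical non-palindromic control)
```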