986 results for Sequence Features


Relevance:

100.00%

Publisher:

Abstract:

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that the use of word embeddings significantly improves the effectiveness of concept extraction models; we further determine the influence of the corpora used to generate such features. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.

Relevance:

100.00%

Publisher:

Abstract:

Many experimental studies have shown that many introns of eukaryotic genes function as regulators of transcription. However, comprehensive studies of this problem have not yet been conducted. After checking the transcription frequencies of some Saccharomyces cerevisiae (yeast) genes and their introns, a remarkable phenomenon was discovered: in general, the introns of genes with higher transcription frequencies are longer, and the introns of genes with lower transcription frequencies are shorter. This suggests that the longer introns of genes with higher transcription frequencies may contain some characteristic sequence structures that could enhance the transcription of genes. Therefore, two sets of introns of yeast genes were chosen for further study. The transcription frequencies of the first set of genes are higher (>30), and those of the second set of genes are lower (≤10). Statistical comparison of the occurrence frequencies of oligonucleotides (mainly tetranucleotides and pentanucleotides) detects some oligonucleotides whose occurrence frequencies in the first set of introns are significantly higher than those in the second set of introns, and also significantly higher than those in the exons flanking the introns of the first set. Some of these extracted oligonucleotides are the same as the regulatory elements of transcription revealed by experimental analyses. In addition, the distributions of these extracted oligonucleotides in the two sets of introns and the exons show that the sequence structures of the first set of introns are favorable for transcription of genes.
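
The oligonucleotide comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the pseudocount and the simple frequency-ratio cutoff (`min_ratio`) are assumptions chosen for illustration, and the ratio stands in for the statistical test the abstract mentions but does not specify.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Relative frequency of each overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def enriched_kmers(high_set, low_set, k=4, min_ratio=2.0, pseudo=1e-6):
    """k-mers whose frequency in high_set is at least min_ratio times that in low_set.

    The ratio cutoff and the pseudocount are illustrative assumptions,
    not the significance test used in the study.
    """
    high = kmer_frequencies(high_set, k)
    low = kmer_frequencies(low_set, k)
    ratios = {kmer: f / (low.get(kmer, 0.0) + pseudo) for kmer, f in high.items()}
    hits = [kmer for kmer, r in ratios.items() if r >= min_ratio]
    return sorted(hits, key=lambda kmer: -ratios[kmer])
```

Calling `enriched_kmers(introns_high, introns_low, k=4)` on two lists of intron sequences (hypothetical variable names) would return candidate tetranucleotides; `k=5` gives pentanucleotides.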

Relevance:

100.00%

Publisher:

Abstract:

Background: One of the least common types of alternative splicing is the complete retention of an intron in a mature transcript. Intron retention (IR) is believed to be the result of intron, rather than exon, definition associated with failure of the recognition of weak splice sites flanking short introns. Although studies on individual retained introns have been published, few systematic surveys of large amounts of data have been conducted on the mechanisms that lead to IR. Results: To understand how sequence features are associated with or control IR, and to produce a generalized model that could reveal previously unknown signals that regulate this type of alternative splicing, we partitioned intron retention events observed in human cDNAs into two groups based on the relative abundance of both isoforms and compared relevant features. We found that a higher frequency of IR in human is associated with individual introns that have weaker splice sites, genes with shorter intron lengths, higher expression levels and lower density of both a set of exon splicing silencers (ESSs) and the intronic splicing enhancer GGG. Both groups of retained introns presented events conserved in mouse, in which the retained introns were also short and presented weaker splice sites. Conclusion: Although our results confirmed that weaker splice sites are associated with IR, they showed that this feature alone cannot explain a non-negligible fraction of events. Our analysis suggests that cis-regulatory elements are likely to play a crucial role in regulating IR and also reveals previously unknown features that seem to influence its occurrence. These results highlight the importance of considering the interplay among these features in the regulation of the relative frequency of IR.

Relevance:

70.00%

Publisher:

Abstract:

A short motif termed Plasmodium export element (PEXEL) or vacuolar targeting signal (VTS) characterizes Plasmodium proteins exported into the host cell. These proteins mediate host cell modifications essential for parasite survival and virulence. However, several PEXEL-negative exported proteins indicate that the currently predicted malaria exportome is not complete, and it is unknown whether and how these proteins relate to PEXEL-positive export. Here we show that the N-terminal 10 amino acids of the PEXEL-negative exported protein REX2 (ring-exported protein 2) are necessary for its targeting and that a single-point mutation in this region abolishes export. Furthermore, we show that the REX2 transmembrane domain is also essential for export and that, together with the N-terminal region, it is sufficient to promote export of another protein. An N-terminal region and the transmembrane domain of the unrelated PEXEL-negative exported protein SBP1 (skeleton-binding protein 1) can functionally replace the corresponding regions in REX2, suggesting that these sequence features are also present in other PEXEL-negative exported proteins. As with PEXEL proteins, we find that REX2 is processed, but in contrast we detect no evidence for N-terminal acetylation.

Relevance:

70.00%

Publisher:

Abstract:

Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and less weight to those that appear in more. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize the unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
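
The harvesting and dampening steps of this abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published scoring function: the BLOSUM62-based merging of similar n-grams is omitted, the dampening factor is assumed to be an inverse-class-frequency term `log(number of classes / classes containing the n-gram)`, and all function names are hypothetical.

```python
import math
from collections import Counter

def harvest_ngrams(sequence, n_min=4, n_max=8):
    """Yield every overlapping n-gram of length n_min..n_max from one sequence."""
    for n in range(n_min, n_max + 1):
        for i in range(len(sequence) - n + 1):
            yield sequence[i:i + n]

def discriminative_ngrams(classes, threshold):
    """classes maps a class label to a non-empty list of protein sequences.

    Returns, per class, the n-grams whose dampened score reaches threshold.
    The dampening term is an assumed inverse-class-frequency weight.
    """
    per_class = {}
    for label, seqs in classes.items():
        counts = Counter()
        for s in seqs:
            counts.update(set(harvest_ngrams(s)))  # count each n-gram once per sequence
        per_class[label] = {g: c / len(seqs) for g, c in counts.items()}

    # Number of classes in which each n-gram occurs at least once.
    class_count = Counter(g for freqs in per_class.values() for g in freqs)
    n_classes = len(classes)

    result = {}
    for label, freqs in per_class.items():
        scored = {g: f * math.log(n_classes / class_count[g]) for g, f in freqs.items()}
        result[label] = {g: s for g, s in scored.items() if s >= threshold}
    return result
```

With this weighting, an n-gram present in every class scores zero, while one confined to a single class keeps its within-class frequency scaled by `log(n_classes)`.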

Relevance:

70.00%

Publisher:

Abstract:

Ribosome profiling (Ribo-seq), a promising technology for exploring ribosome decoding rates, is characterized by the presence of infrequent high peaks in ribosome footprint density and by long alignment gaps. Here, to reduce the impact of data heterogeneity, we introduce a simple normalization method, Ribo-seq Unit Step Transformation (RUST). RUST is robust and outperforms other normalization techniques in the presence of heterogeneous noise. We illustrate how RUST can be used for identifying mRNA sequence features that affect ribosome footprint densities globally. We show that a few parameters extracted with RUST are sufficient for predicting experimental densities with high accuracy. Importantly, the application of RUST to 30 publicly available Ribo-seq data sets revealed a substantial variation in sequence determinants of ribosome footprint frequencies, questioning the reliability of Ribo-seq as an accurate representation of local ribosome densities without prior quality control. This emphasizes our incomplete understanding of how protocol parameters affect ribosome footprint densities.
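
The abstract does not spell out the transformation itself. A minimal Python sketch of a unit step normalization is given here under an explicit assumption: each codon's footprint count is replaced by 1 if it exceeds the mean count of its coding region and by 0 otherwise. The function name and the use of the mean as the step threshold are assumptions for illustration.

```python
def unit_step_transform(densities):
    """Binarize per-codon footprint counts for one coding region.

    Assumed rule: 1 if the count exceeds the region's mean, else 0.
    This caps the influence of rare, very high peaks.
    """
    if not densities:
        return []
    mean = sum(densities) / len(densities)
    return [1 if d > mean else 0 for d in densities]
```

Because every position contributes at most 1, a single extreme peak weighs no more than any other above-average position when profiles are aggregated across genes.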

Relevance:

60.00%

Publisher:

Abstract:

Owing to high evolutionary divergence, it is not always possible to identify distantly related protein domains by sequence search techniques. Intermediate sequences possess sequence features of more than one protein and facilitate detection of remotely related proteins. We have recently demonstrated the use of Cascade PSI-BLAST, in which we perform PSI-BLAST for many 'generations', initiating searches from new homologues as well. Such rigorous propagation through generations of PSI-BLAST effectively exploits the role of intermediates in detecting distant similarities between proteins. This approach has been tested on a large number of folds, and its performance in detecting superfamily-level relationships is approximately 35% better than simple PSI-BLAST searches. We present a web server for this search method that permits users to perform Cascade PSI-BLAST searches against the Pfam, SCOP and SwissProt databases. The URL for this server is http://crick.mbu.iisc.ernet.in/~CASCADE/CascadeBlast.html.
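
The generation-by-generation propagation described in this abstract can be sketched as a breadth-first search. The sketch below is an illustration only: `search` is a hypothetical callable standing in for one PSI-BLAST run that returns the identifiers of the hits for a seed sequence, and `max_generations` is an assumed stopping rule.

```python
def cascade_search(query, search, max_generations=3):
    """Propagate a homology search through successive generations.

    search(seed) is a hypothetical stand-in for one PSI-BLAST run and must
    return an iterable of hit identifiers. Returns a dict mapping each
    sequence found to the generation in which it was first detected.
    """
    found = {query: 0}
    frontier = [query]
    for generation in range(1, max_generations + 1):
        next_frontier = []
        for seed in frontier:
            for hit in search(seed):
                if hit not in found:  # only new homologues seed the next generation
                    found[hit] = generation
                    next_frontier.append(hit)
        if not next_frontier:
            break
        frontier = next_frontier
    return found
```

Sequences first detected in generation 2 or later are those reached only through an intermediate, which is the case a single direct search misses.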

Relevance:

60.00%

Publisher:

Abstract:

Enzymes offer many advantages in industrial processes, such as high specificity, mild treatment conditions and low energy requirements. Therefore, the industry has exploited them in many sectors including food processing. Enzymes can modify food properties by acting on small molecules or on polymers such as carbohydrates or proteins. Crosslinking enzymes such as tyrosinases and sulfhydryl oxidases catalyse the formation of novel covalent bonds between specific residues in proteins and/or peptides, thus forming or modifying the protein network of food. In this study, novel secreted fungal proteins with sequence features typical of tyrosinases and sulfhydryl oxidases were identified through a genome mining study. Representatives of both of these enzyme families were selected for heterologous production in the filamentous fungus Trichoderma reesei and biochemical characterisation. Firstly, a novel family of putative tyrosinases carrying a shorter sequence than the previously characterised tyrosinases was discovered. These proteins lacked the whole linker and C-terminal domain that possibly play a role in cofactor incorporation, folding or protein activity. One of these proteins, AoCO4 from Aspergillus oryzae, was produced in T. reesei with a production level of about 1.5 g/l. The enzyme AoCO4 was correctly folded and bound the copper cofactors with a type-3 copper centre. However, the enzyme had only a low level of activity with the phenolic substrates tested. Highest activity was obtained with 4-tert-butylcatechol. Since tyrosine was not a substrate for AoCO4, the enzyme was classified as catechol oxidase. Secondly, the genome analysis for secreted proteins with sequence features typical of flavin-dependent sulfhydryl oxidases pinpointed two previously uncharacterised proteins AoSOX1 and AoSOX2 from A. oryzae. These two novel sulfhydryl oxidases were produced in T. reesei with production levels of 70 and 180 mg/l, respectively, in shake flask cultivations.
AoSOX1 and AoSOX2 were FAD-dependent enzymes with a dimeric tertiary structure and they both showed activity on small sulfhydryl compounds such as glutathione and dithiothreitol, and were drastically inhibited by zinc sulphate. AoSOX2 showed good stability to thermal and chemical denaturation, being superior to AoSOX1 in this respect. Thirdly, the suitability of AoSOX1 as a possible baking improver was elucidated. The effect of AoSOX1, alone and in combination with the widely used improver ascorbic acid, was tested on yeasted wheat dough, both fresh and frozen, and on fresh water-flour dough. In all cases, AoSOX1 had no effect on the fermentation properties of fresh yeasted dough. AoSOX1 negatively affected the fermentation properties of frozen doughs and accelerated the damaging effects of the frozen storage, i.e. giving a softer dough with poorer gas retention abilities than the control. In combination with ascorbic acid, AoSOX1 gave harder doughs. In accordance, rheological studies in yeast-free dough showed that the presence of only AoSOX1 resulted in weaker and more extensible dough whereas a dough with opposite properties was obtained if ascorbic acid was also used. Doughs containing ascorbic acid and increasing amounts of AoSOX1 were harder in a dose-dependent manner. Sulfhydryl oxidase AoSOX1 had an enhancing effect on the dough hardening mechanism of ascorbic acid. This was ascribed mainly to the production of hydrogen peroxide in the SOX reaction which is able to convert the ascorbic acid to the actual improver dehydroascorbic acid. In addition, AoSOX1 could possibly oxidise the free glutathione in the dough and thus prevent the loss of dough strength caused by the spontaneous reduction of the disulfide bonds constituting the dough protein network. Sulfhydryl oxidase AoSOX1 is therefore able to enhance the action of ascorbic acid in wheat dough and could potentially be applied in wheat dough baking.

Relevance:

60.00%

Publisher:

Abstract:

Experimental conditions or the presence of interacting components can lead to variations in the structural models of macromolecules. However, the role of these factors in conformational selection is often omitted by in silico methods to extract dynamic information from protein structural models. Structures of small peptides, considered building blocks for larger macromolecular structural models, can substantially differ in the context of a larger protein. This limitation is more evident in the case of modeling large multi-subunit macromolecular complexes using structures of the individual protein components. Here we report an analysis of variations in structural models of proteins with high sequence similarity. These models were analyzed for sequence features of the protein, the role of scaffolding segments including interacting proteins or affinity tags and the chemical components in the experimental conditions. Conformational features in these structural models could be rationalized by conformational selection events, perhaps induced by experimental conditions. This analysis was performed on a non-redundant dataset of protein structures from different SCOP classes. The sequence-conformation correlations that we note here suggest additional features that could be incorporated by in silico methods to extract dynamic information from protein structural models.

Relevance:

60.00%

Publisher:

Abstract:

State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time. © 2013 IEEE.

Relevance:

60.00%

Publisher:

Abstract:

The lower member of the lower Ganchaigou Formation in the southwestern Qaidam Basin is one of the main exploration targets. As exploration advances, the targets are gradually shifting to lithologic reservoirs, and more precise results on the distribution of sedimentary facies and sandstones are urgently needed. Guided by the theory of sequence stratigraphy and sedimentology, and on the basis of extensive logging, drilling, seismic and chemical test data, the paper comprehensively analyzes the sedimentary facies and sandstones in the lower member of the lower Ganchaigou Formation in southern Chaixi. According to the identification marks of the key interfaces in sequence stratigraphy, two third-order sequences, SQ1 and SQ2, are identified in the lower member of the lower Ganchaigou Formation in the southwestern Qaidam Basin. By calibrating the synthetic seismogram, the seismic, well-drilling and logging sequences are unified. On this basis, the paper selects seven primary cross-sections and builds connecting-well stratigraphic correlations for them. Ultimately, high-resolution sequence stratigraphic frameworks for the lower member of the lower Ganchaigou Formation, consistent with the logging and seismic data, are established. From the study of the features of each sequence, the main styles of base-level cycle stacking that form the third-order sequences are confirmed: an asymmetric “deepening upward” style and a symmetric style.
Study of the spatial distribution of the sequences indicates that SQ1 and SQ2 are thicker near the northwestern wells Shashen 20 and Shaxin1 and thinner near Hongcan 1, Yuejin, Qie 4 and Dong8-Wu3, and that SQ1 is thicker than SQ2. Based on detailed analysis of the depositional facies markers, it is proposed that lake facies and braided river delta facies mainly occurred in the study area. The sub-facies and micro-facies types are also classified and described. Under the control of the high-resolution sequence stratigraphic framework, three source directions, from Arlarer Mountain, Qimantage Mountain and Dongchai Mountain, are identified using the features of heavy mineral assemblages and paleogeomorphology. In addition, the distribution patterns of sedimentary facies within the sequence stratigraphic framework are studied following a "point" (single well), "line" (section), "face" (plane) approach. During deposition of the lower member of the lower Ganchaigou Formation in the southwestern Qaidam Basin, the lake basin was in an early phase of its evolution, gradually expanding as the lake level rose. Combined with physical-property analysis of reservoir sands formed in different sedimentary environments, the paper identifies the favorable sandstone body types: underwater distributary channels of the braided river delta front, coarse sands in mouth bars, and sand bodies in the sand flats of shore-shallow lacustrine facies. Finally, the article comprehensively analyzes the distribution relationship between sedimentary facies and favorable sandstone bodies and proposes that, in sequence SQ1, the Yuejin area, the well east 8-wu3 area, the well qie4-qie1 area and the well hongcan2 area are zones of favorable sandstone distribution.

Relevance:

60.00%

Publisher:

Abstract:

One of the greatest scientific advances of the twentieth century was the development of technology that enables large-scale genome sequencing. However, the information produced by sequencing does not by itself explain a genome's primary structure, evolution and function. To that end, new fields such as molecular biology, genetics and bioinformatics are used to study the various properties and workings of genomes. In this work we are particularly interested in understanding in detail the decoding of the genome carried out in the ribosome and in extracting general rules through analysis of the genome's primary structure, namely codon context and codon distribution. These rules are poorly studied and understood, and it is not known whether they can be obtained through statistics and bioinformatics tools. Traditional methods for studying the distribution of codons in the genome and their context do not provide the tools needed to study these properties at the genomic scale. Count tables with codon distributions, as well as absolute metrics, are currently available in databases. Several applications for characterizing genetic sequences are also available. However, other kinds of statistical approaches and other methods of information visualization were clearly lacking. In the present work, mathematical and computational methods were developed for analysing codon context and also for identifying regions where codon repeats occur. New forms of information visualization were also developed to allow interpretation of the information obtained. The statistical tools included in the model, such as clustering, residual analysis and codon adaptation indices, proved important for characterizing the coding sequences of some genomes.
The ultimate goal is for the information obtained to make it possible to identify the general rules that govern codon context in any genome.
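
Codon context, as discussed in this abstract, can be illustrated by counting adjacent codon pairs and comparing each observed count with the count expected if neighbouring codons were independent. The Python sketch below is an assumption-laden illustration, not the method developed in the work: the function names are hypothetical and the observed/expected ratio is a simple stand-in for the residual analysis the abstract mentions.

```python
from collections import Counter

def codon_pairs(cds):
    """Count adjacent in-frame codon pairs in one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(zip(codons, codons[1:]))

def pair_enrichment(sequences):
    """Observed/expected ratio for each codon pair across coding sequences.

    Expected counts assume the first and second codon of a pair are
    independent; ratios above 1 mark pairs seen more often than expected.
    """
    pairs = Counter()
    for cds in sequences:
        pairs.update(codon_pairs(cds))
    total = sum(pairs.values())
    first, second = Counter(), Counter()
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    return {(a, b): n * total / (first[a] * second[b]) for (a, b), n in pairs.items()}
```

Applied to all coding sequences of a genome, the resulting ratios form a codon-by-codon matrix that could then be clustered or plotted as a heat map.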

Relevance:

60.00%

Publisher:

Abstract:

An efficient strategy for the synthesis of various functionalized azabicyclo[X.Y.0]alkanone compounds has been developed. The synthetic strategy involves the preparation of dipeptides by coupling with vinyl-, allyl-, homoallyl- and homohomoallylglycine units, followed by ring-closing metathesis to give 8-, 9- and 10-membered macrocyclic lactams, which undergo transannular iodolactamization to give bicyclic peptide mimics bearing an iodo group. Transition-metal-catalysed cross-couplings were developed for the synthesis of enantiomerically pure ω-unsaturated amino acids from iodoalanine. The mechanistic study suggests that the iodide is attacked from the less sterically hindered face of the unsaturated macrocyclic lactam, giving an iodonium intermediate. Cyclization then proceeds by a pathway that minimizes diaxial interactions and allylic strain. Iodolactamization of the various unsaturated macrocyclic lactams gave 5,5- and 6,6-iodobicyclic amino acids regio- and diastereoselectively. In addition, a bridged anti-Bredt azabicyclo[4.3.1]alkane imidate was synthesized from a nine-membered unsaturated macrocyclic lactam. Crystallographic and spectroscopic analyses of the 8-, 9- and 10-membered macrocycles, of the 5,5-iodobicyclic compound and of the bridged imidate clearly show the potential of these rigidified dipeptides to serve as mimics of the central residues of type I, II’, II and VI β-turns.

Relevance:

60.00%

Publisher:

Abstract:

Objective: The study aims to investigate a possible correlation between the main clinical and ophthalmological characteristics, age and Robin sequence in patients with the Stickler syndrome. Introduction: The Stickler syndrome is an autosomal dominant genetic disorder, characterised by ocular, orofacial and skeletal anomalies and/or auditory loss. Patients with Robin sequence features and respiratory complications are frequently diagnosed with the Stickler syndrome. The heterogeneous phenotypic manifestations may present a challenge for early clinical diagnosis. Methods: We performed a retrospective study of 98 patients with the Stickler syndrome seen between November 1995 and June 2009. The data were compared to investigate their ocular alterations and association with the Robin sequence. To be included, patients had to present with the following triad: cleft palate, facial features (hypoplastic midface, micrognathia and prominent eyes) and ocular anomalies (myopia and/or abnormalities of the retina). Results: Fifty-one percent of the patients presenting with Robin sequence features had been diagnosed with the Stickler syndrome. Ocular alterations were found in 50% of the patients. Discussion: The Robin sequence may appear as an isolated condition, in association with other features, or as part of other known syndromes. Currently, the diagnosis of the Stickler syndrome is based on clinical signs. Affected individuals eventually develop hearing loss, retinal detachment and blindness. The associated ophthalmological complications are usually progressive and can lead to blindness.