895 results for sequence data mining
Abstract:
One of the challenges of tumour immunology remains the identification of strongly immunogenic tumour antigens for vaccination. Reverse immunology, that is, the procedure to predict and identify immunogenic peptides from the sequence of a gene product of interest, has been postulated to be a particularly efficient, high-throughput approach for tumour antigen discovery. More than a decade after this concept was born, we discuss the reverse immunology approach in terms of costs and efficacy: data mining with bioinformatic algorithms, molecular methods to identify tumour-specific transcripts, prediction and determination of proteasomal cleavage sites, peptide-binding prediction to HLA molecules and experimental validation, assessment of the in vitro and in vivo immunogenic potential of selected peptide antigens, isolation of specific cytolytic T lymphocyte clones and final validation in functional assays of tumour cell recognition. We conclude that the overall low sensitivity and yield of every prediction step often require a compensatory up-scaling of the initial number of candidate sequences to be screened, rendering reverse immunology an unexpectedly complex approach.
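The compounding effect behind this conclusion can be illustrated with a short calculation. The sketch below assumes that each step of the pipeline passes on a fixed fraction of its input; the per-step yields are hypothetical placeholders chosen for easy arithmetic, not figures from the review.

```python
import math

def required_candidates(target_hits, step_yields):
    """Initial number of candidates needed so that, on average,
    `target_hits` survive every step of the pipeline."""
    overall = math.prod(step_yields)  # fraction surviving all steps
    return math.ceil(target_hits / overall)

# Hypothetical yields for four successive steps (illustrative only).
yields = [0.5, 0.25, 0.125, 0.5]
print(required_candidates(1, yields))  # 128
```

Even with moderate per-step yields, the product shrinks quickly, which is why the starting pool has to grow to compensate.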
Abstract:
Background: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB: a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.
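The reported evaluation figures can be cross-checked with the standard F-score formula. The sketch below assumes the reported value is the balanced F-score (harmonic mean of precision and recall); the abstract does not say which variant was used. Because the published precision and recall are themselves rounded, a small difference in the last digit is expected.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract.
print(f"{f_score(0.99, 0.82):.3f}")  # 0.897
```

The result, about 0.897, is consistent with the reported 0.89 if that figure was truncated rather than rounded.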
Abstract:
The bacterial insertion sequence IS21 shares with many insertion sequences a two-step, reactive junction transposition pathway, for which a model is presented in this review: a reactive junction with abutted inverted repeats is first formed and subsequently integrated into the target DNA. The reactive junction occurs in IS21-IS21 tandems and IS21 minicircles. In addition, IS21 shows a unique specialization of transposition functions. By alternative translation initiation, the transposase gene codes for two products: the transposase, capable of promoting both steps of the reactive junction pathway, and the cointegrase, which only promotes the integration of reactive junctions but with higher efficiency. This review also includes a survey of the IS21 family and speculates on the possibility that other members present a similar transpositional specialization.
Abstract:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are discussed as well.
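To make the idea of a kernel-based classifier concrete, here is a minimal sketch of a kernel perceptron with a Gaussian (RBF) kernel. The abstract does not say which kernel algorithms the paper uses, so this is a stand-in chosen for brevity, and the four-point dataset is a toy example rather than environmental data.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(x, X, y, alpha, gamma=1.0):
    """Kernel expansion: weighted sum of kernels to the training points."""
    return sum(a * yj * rbf(xj, x, gamma) for a, xj, yj in zip(alpha, X, y))

def train_kernel_perceptron(X, y, epochs=20, gamma=1.0):
    """Return per-sample mistake counts (the dual coefficients)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision(xi, X, y, alpha, gamma) <= 0:
                alpha[i] += 1  # misclassified: increase this sample's weight
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return alpha

def predict(x, X, y, alpha, gamma=1.0):
    return 1 if decision(x, X, y, alpha, gamma) > 0 else -1

# Toy XOR-style data: not separable by any straight line.
X = [(0, 0), (1, 1), (0, 1), (1, 0)]
y = [1, 1, -1, -1]
alpha = train_kernel_perceptron(X, y)
print([predict(x, X, y, alpha) for x in X])  # [1, 1, -1, -1]
```

The kernel lets a linear learning rule separate classes that are not linearly separable in the original coordinates, which is the property that makes such methods attractive for spatial classification tasks.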
Abstract:
The sequence profile method (Gribskov M, McLachlan AD, Eisenberg D, 1987, Proc Natl Acad Sci USA 84:4355-4358) is a powerful tool to detect distant relationships between amino acid sequences. A profile is a table of position-specific scores and gap penalties, providing a generalized description of a protein motif, which can be used for sequence alignments and database searches instead of an individual sequence. A sequence profile is derived from a multiple sequence alignment. We have found 2 ways to improve the sensitivity of sequence profiles: (1) Sequence weights: Usage of individual weights for each sequence avoids bias toward closely related sequences. These weights are automatically assigned based on the distance of the sequences using a published procedure (Sibbald PR, Argos P, 1990, J Mol Biol 216:813-818). (2) Amino acid substitution table: In addition to the alignment, the construction of a profile also needs an amino acid substitution table. We have found that in some cases a new table, the BLOSUM45 table (Henikoff S, Henikoff JG, 1992, Proc Natl Acad Sci USA 89:10915-10919), is more sensitive than the original Dayhoff table or the modified Dayhoff table used in the current implementation. Profiles derived by the improved method are more sensitive and selective in a number of cases where previous methods have failed to completely separate true members from false positives.
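The construction of a position-specific score table from a weighted alignment can be sketched as follows. This is a simplified illustration of the average-score idea, not the authors' implementation: gap penalties are omitted, the sequence weights are supplied by hand rather than computed with the Sibbald and Argos procedure, and the substitution function `toy_subst` is a hypothetical stand-in for a real table such as BLOSUM45.

```python
def build_profile(alignment, weights, subst, alphabet):
    """For each column, score every residue `a` as the weighted average
    of subst(a, b) over the residues `b` observed in that column."""
    profile = []
    for p in range(len(alignment[0])):
        freq, total = {}, 0.0
        for seq, w in zip(alignment, weights):
            res = seq[p]
            if res == "-":  # gaps do not contribute to the column
                continue
            freq[res] = freq.get(res, 0.0) + w
            total += w
        column = {}
        for a in alphabet:
            if total:
                column[a] = sum(f / total * subst(a, b) for b, f in freq.items())
            else:
                column[a] = 0.0
        profile.append(column)
    return profile

def score(profile, seq):
    """Ungapped score of a sequence with the same length as the profile."""
    return sum(col[res] for col, res in zip(profile, seq))

def toy_subst(a, b):
    """Hypothetical substitution table: +2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0

# The two near-identical sequences share weight, so they do not dominate.
alignment = ["ACD", "ACE", "GCD"]
weights = [0.25, 0.25, 0.5]
profile = build_profile(alignment, weights, toy_subst, "ACDEG")
print(score(profile, "ACD"))  # 3.75
```

Changing the weights or swapping the substitution function changes every column score, which is where the two improvements described in the abstract take effect.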
Abstract:
While the influence of HLA-AB and -DRB1 matching on the outcome of bone marrow transplantation (BMT) with unrelated donors is clear, the evaluation of HLA-C has been hampered by its poor serological definition. Because the low resolution of standard HLA-C typing could explain the significant number of positive cytotoxic T lymphocyte precursor frequency (CTLpf) tests found among HLA-AB-subtype, DRB1/B3/B5-subtype matched patient/donor pairs, we have identified by sequencing the incompatibilities recognized by CD8+ CTL clones obtained from such positive CTLpf tests. In most cases the target molecules were HLA-C antigens that had escaped detection by serology (e.g. Cw*1601, 1502 or 0702). Direct recognition of HLA-C by a CTL clone was demonstrated by lysis of the HLA class I-negative 721.221 cell line transfected with Cw*1601 cDNA. Because of the functional importance of Cw polymorphism, a PCR-SSO oligotyping procedure was set up allowing the resolution of 29 Cw alleles. Oligotyping of a panel of 382 individuals (including 101 patients and their 272 potential unrelated donors, 5 related donors and 4 platelet donors) allowed the determination of HLA-C and HLA A-B-Cw-DRB1 allelic frequencies, as well as a number of A-Cw, B-Cw, and DRB1-Cw associations. Two new HLA-Cw alleles (Cw*02023 and Cw*0707) were identified by DNA sequencing of PCR-amplified exon 2-intron 2-exon 3 amplicons. Furthermore, we determined the degree of HLA-C compatibility in 287 matched pairs that could be formed from 73 patients and their 184 potential unrelated donors compatible for HLA-AB by serology and for HLA-DRB1/B3/B5 by oligotyping. Cw mismatches were identified in 42.1% of these pairs, and AB-subtype oligotyping showed that 30% of these Cw-incompatible pairs were also mismatched for A or B-locus subtype. The degree of HLA-C incompatibility was strongly influenced by the linkage with B alleles and by the ABDR haplotypes. Cw alleles linked with B*4403, B*5101, B18, and B62 haplotypes were frequently mismatched. Apparently, high-resolution DNA typing for HLA-AB does not result in full matching at locus C. Since HLA-C polymorphism is recognized by alloreactive CTLs, such incompatibilities might be as relevant as AB-subtype mismatches in clinical transplantation.
Abstract:
The amino acid sequence of mouse brain beta spectrin (beta fodrin), deduced from the nucleotide sequence of complementary DNA clones, reveals that this non-erythroid beta spectrin comprises 2363 residues, with a molecular weight of 274,449 Da. Brain beta spectrin contains three structural domains and we suggest the position of several functional domains including F-actin, synapsin I, ankyrin and spectrin self-association sites. Analysis of deduced amino acid sequences indicated striking homology and similar structural characteristics of brain beta spectrin repeats beta 11 and beta 12 to globins. In vitro analysis has demonstrated that heme is capable of specific attachment to brain spectrin, suggesting possible new functions in electron transfer, oxygen binding, nitric oxide binding or heme scavenging.
Abstract:
During the last 2 years, several novel genes that encode glucose transporter-like proteins have been identified and characterized. Because of their sequence similarity with GLUT1, these genes appear to belong to the family of solute carriers 2A (SLC2A, protein symbol GLUT). Sequence comparisons of all 13 family members allow the definition of characteristic sugar/polyol transporter signatures: (1) the presence of 12 membrane-spanning helices, (2) seven conserved glycine residues in the helices, (3) several basic and acidic residues at the intracellular surface of the proteins, (4) two conserved tryptophan residues, and (5) two conserved tyrosine residues. On the basis of sequence similarities and characteristic elements, the extended GLUT family can be divided into three subfamilies, namely class I (the previously known glucose transporters GLUT1-4), class II (the previously known fructose transporter GLUT5, plus GLUT7, GLUT9 and GLUT11), and class III (GLUT6, 8, 10, 12, and the myo-inositol transporter HMIT1). Functional characteristics have been reported for some of the novel GLUTs. Like GLUT1-4, they exhibit a tissue/cell-specific expression (GLUT6, leukocytes, brain; GLUT8, testis, blastocysts, brain, muscle, adipocytes; GLUT9, liver, kidney; GLUT10, liver, pancreas; GLUT11, heart, skeletal muscle). GLUT6 and GLUT8 appear to be regulated by sub-cellular redistribution, because they are targeted to intra-cellular compartments by dileucine motifs in a dynamin-dependent manner. Sugar transport has been reported for GLUT6, 8, and 11; HMIT1 has been shown to be an H+/myo-inositol co-transporter. Thus, the members of the extended GLUT family exhibit a surprisingly diverse substrate specificity, and the definition of sequence elements determining this substrate specificity will require a full functional characterization of all members.
Abstract:
The biological properties of wild-type A75/17 and cell culture-adapted Onderstepoort canine distemper virus differ markedly. To learn more about the molecular basis for these differences, we have isolated and sequenced the protein-coding regions of the attachment and fusion proteins of wild-type canine distemper virus strain A75/17. In the attachment protein, a total of 57 amino acid differences were observed between the Onderstepoort strain and strain A75/17, and these were distributed evenly over the entire protein. Interestingly, the attachment protein of strain A75/17 contained an extension of three amino acids at the C terminus. Expression studies showed that the attachment protein of strain A75/17 had a higher apparent molecular mass than the attachment protein of the Onderstepoort strain, in both the presence and absence of tunicamycin. In the fusion protein, 60 amino acid differences were observed between the two strains, of which 44 were clustered in the much smaller F2 portion of the molecule. Significantly, the AUG that has been proposed as a translation initiation codon in the Onderstepoort strain is an AUA codon in strain A75/17. Detailed mutation analyses showed that both the first and second AUGs of strain A75/17 are the major translation initiation sites of the fusion protein. Similar analyses demonstrated that, also in the Onderstepoort strain, the first two AUGs are the translation initiation codons which contribute most to the generation of precursor molecules yielding the mature form of the fusion protein.
Abstract:
A novel member of the tumor necrosis factor (TNF) receptor family, designated TRAMP, has been identified. The structural organization of the 393 amino acid long human TRAMP is most homologous to TNF receptor 1. TRAMP is abundantly expressed on thymocytes and lymphocytes. Its extracellular domain is composed of four cysteine-rich domains, and the cytoplasmic region contains a death domain known to signal apoptosis. Overexpression of TRAMP leads to two major responses, NF-kappaB activation and apoptosis. TRAMP-induced cell death is inhibited by an inhibitor of ICE-like proteases, but not by Bcl-2. In addition, TRAMP does not appear to interact with any of the known apoptosis-inducing ligands of the TNF family.
Abstract:
Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity to deal with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from field reconnaissance (reambulation) of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the soil formation factors relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models with less power to capture the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.
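The training sets of increasing size used in this kind of experiment can be drawn as in the sketch below. It assumes simple random sampling without replacement; the abstract does not state whether the study sampled randomly or by stratum, and the 10,000-record dataset is a placeholder for the real pixel data.

```python
import random

# Fractions of the total data volume listed in the abstract.
FRACTIONS = [0.01, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25]

def subsample(records, fraction, seed=0):
    """Draw a random subset of `records` without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

records = list(range(10_000))  # placeholder for the full dataset
sizes = {f: len(subsample(records, f)) for f in FRACTIONS}
print(sizes[0.05])  # 500
```

Each subset would then be used to train one model, and all models would be evaluated against the same independent field data so that their accuracies are comparable.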
Abstract:
Many species contain genetic lineages that are phylogenetically intermixed with those of other species. In the Sorex araneus group, previous results based on mtDNA and Y chromosome sequence data showed an incongruent position of Sorex granarius within this group. In this study, we explored the relationship between species within the S. araneus group, aiming to resolve the particular position of S. granarius. In this context, we sequenced a total of 2447 base pairs (bp) of X-linked and nuclear genes from 47 individuals of the S. araneus group. The same taxa were also analyzed within a Bayesian framework with nine autosomal microsatellites. These analyses revealed that all markers apart from mtDNA showed similar patterns, suggesting that the problematic position of S. granarius is best explained by an incongruent behavior by mtDNA. Given their close phylogenetic relationship and their close geographic distribution, the most likely explanation for this pattern is past mtDNA introgression from S. araneus race Carlit to S. granarius.
Abstract:
The malic enzyme (ME) gene is a target for both thyroid hormone receptors and peroxisome proliferator-activated receptors (PPAR). Within the ME promoter, two direct repeat (DR)-1-like elements, MEp and MEd, have been identified as putative PPAR response elements (PPRE). We demonstrate that only MEp and not MEd is able to bind PPAR/retinoid X receptor (RXR) heterodimers and mediate peroxisome proliferator signaling. Taking advantage of the close sequence resemblance of MEp and MEd, we have identified crucial determinants of a PPRE. Using reciprocal mutation analyses of these two elements, we show the preference for adenine as the spacing nucleotide between the two half-sites of the PPRE and demonstrate the importance of the first two bases flanking the core DR1 on its 5' side. This latter feature of the PPRE led us to consider the polarity of the PPAR/RXR heterodimer bound to its cognate element. We demonstrate that, in contrast to the polarity of RXR/TR and RXR/RAR bound to DR4 and DR5 elements respectively, PPAR binds to the 5' extended half-site of the response element, while RXR occupies the 3' half-site. Consistent with this polarity is our finding that formation and binding of the PPAR/RXR heterodimer requires an intact hinge T region in RXR while its integrity is not required for binding of the RXR/TR heterodimer to a DR4.
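The layout of a DR1 element (two half-sites separated by one spacing nucleotide, preceded by two 5' flanking bases) can be sketched as a simple sequence scan. The half-site consensus AGGTCA used here is the textbook nuclear receptor motif and is not stated in the abstract; natural elements such as MEp and MEd are only DR1-like, so an exact-match scan is a deliberate simplification, and the test sequence is invented.

```python
import re

HALF_SITE = "AGGTCA"  # textbook consensus half-site (assumption, see above)

# Lookahead so that overlapping candidates are all reported.
# Groups: 5' flank (2 nt), first half-site, spacer (1 nt), second half-site.
DR1 = re.compile(rf"(?=([ACGT]{{2}})({HALF_SITE})([ACGT])({HALF_SITE}))")

def find_dr1(seq):
    """Return perfect DR1 matches with their 5' flank and spacer base."""
    hits = []
    for m in DR1.finditer(seq.upper()):
        hits.append({
            "start": m.start(2),    # 0-based index of the first half-site
            "flank5": m.group(1),   # two bases 5' of the core DR1
            "spacer": m.group(3),   # nucleotide between the half-sites
        })
    return hits

print(find_dr1("ttAAAGGTCAAAGGTCAgg"))
# [{'start': 4, 'flank5': 'AA', 'spacer': 'A'}]
```

Reporting the spacer and the 5' flank separately mirrors the two determinants examined in the abstract: the preference for adenine as the spacer and the contribution of the two flanking bases.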