34 results for Independent Sequence
Abstract:
The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.
Abstract:
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation, and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10,010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them against a prokaryotic sequence database using the program BLASTN to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.
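The taxon-assignment step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLASTN was run with the standard 12-column tabular output, keeps the highest-bitscore hit per read, and uses a hypothetical subject_to_taxon lookup table; the read names, subject names and taxa in the demo are invented.

```python
from collections import Counter


def best_hits(blast_lines):
    """Keep the highest-bitscore hit per read from 12-column tabular BLAST lines."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or comment lines
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)
    return best


def taxon_profile(blast_lines, subject_to_taxon):
    """Return the fraction of reads assigned to each taxon (best hit only)."""
    hits = best_hits(blast_lines)
    counts = Counter(
        subject_to_taxon.get(subject_id, "unassigned")
        for subject_id, _ in hits.values()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Invented demo data: three reads, one of them with two competing hits.
demo_lines = [
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180.0",
    "read1\tsubjB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120.0",
    "read2\tsubjB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150.0",
    "read3\tsubjC\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-20\t90.0",
]
# Hypothetical lookup from database subject to taxon; subjC is deliberately absent.
demo_taxa = {"subjA": "Bacteroidetes", "subjB": "Firmicutes"}
profile = taxon_profile(demo_lines, demo_taxa)
```

Two libraries could then be compared by computing one such profile per library and contrasting the per-taxon fractions.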
Abstract:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
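The decomposition step can be illustrated with a small expectation-maximization fit of two normal distributions to a list of window statistics. This is a sketch under stated assumptions, not the published method: the fitting procedure (EM), the initialization, and the convention that the higher-mean component corresponds to coding windows are all choices made here for illustration, and the demo data are synthetic.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, iterations=200):
    """Fit a two-component 1-D normal mixture by EM.

    Returns (weight_hi, (mean_lo, sd_lo), (mean_hi, sd_hi)), where weight_hi is
    the estimated fraction of windows in the higher-mean component.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    # Initialize from the lower and upper halves of the sorted data.
    mean_lo = sum(xs[:half]) / half
    mean_hi = sum(xs[half:]) / (n - half)
    sd_lo = sd_hi = max(1e-6, (xs[-1] - xs[0]) / 4.0)
    w_hi = 0.5
    for _ in range(iterations):
        # E-step: posterior probability that each window is in the high component.
        resp = []
        for x in xs:
            p_hi = w_hi * normal_pdf(x, mean_hi, sd_hi)
            p_lo = (1.0 - w_hi) * normal_pdf(x, mean_lo, sd_lo)
            total = p_hi + p_lo
            resp.append(p_hi / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n_hi = sum(resp)
        n_lo = n - n_hi
        if n_hi < 1e-9 or n_lo < 1e-9:
            break  # one component has collapsed; keep the last estimates
        mean_hi = sum(r * x for r, x in zip(resp, xs)) / n_hi
        mean_lo = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n_lo
        var_hi = sum(r * (x - mean_hi) ** 2 for r, x in zip(resp, xs)) / n_hi
        var_lo = sum((1.0 - r) * (x - mean_lo) ** 2 for r, x in zip(resp, xs)) / n_lo
        sd_hi = max(1e-6, math.sqrt(var_hi))
        sd_lo = max(1e-6, math.sqrt(var_lo))
        w_hi = n_hi / n
    return w_hi, (mean_lo, sd_lo), (mean_hi, sd_hi)


# Synthetic demo: 700 "noncoding" and 300 "coding" window statistics.
random.seed(0)
windows = [random.gauss(0.0, 1.0) for _ in range(700)]
windows += [random.gauss(5.0, 1.0) for _ in range(300)]
weight_hi, low_component, high_component = fit_two_normals(windows)
```

Under these assumptions, weight_hi plays the role of the estimated coding fraction of the windows; how well it tracks the true coding density depends on how cleanly the chosen coding statistic separates the two populations.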
Abstract:
Selenocysteine (Sec) is co-translationally inserted into selenoproteins in response to codon UGA with the help of the selenocysteine insertion sequence (SECIS) element. The number of selenoproteins in animals varies, with humans having 25 and mice having 24 selenoproteins. To date, however, only one selenoprotein, thioredoxin reductase, has been detected in Caenorhabditis elegans, and this enzyme contains only one Sec. Here, we characterize the selenoproteomes of C.elegans and Caenorhabditis briggsae with three independent algorithms, one searching for pairs of homologous nematode SECIS elements, another searching for Cys- or Sec-containing homologs of potential nematode selenoprotein genes and the third identifying Sec-containing homologs of annotated nematode proteins. These methods suggest that thioredoxin reductase is the only Sec-containing protein in the C.elegans and C.briggsae genomes. In contrast, we identified additional selenoproteins in other nematodes. Assuming that Sec insertion mechanisms are conserved between nematodes and other eukaryotes, the data suggest that nematode selenoproteomes were reduced during evolution, and that in an extreme reduction case Sec insertion systems probably decode only a single UGA codon in C.elegans and C.briggsae genomes. In addition, all detected genes had a rare form of SECIS element containing a guanosine in place of a conserved adenosine present in most other SECIS structures, suggesting that in organisms with small selenoproteomes SECIS elements may change rapidly.
Abstract:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
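The idea of aligning sequences of binding-factor labels rather than nucleotides can be sketched with a plain global alignment in which only identical labels may pair. This is a deliberately simplified illustration and not the adapted restriction-map algorithm: it ignores site coordinates and prediction scores, and the match and gap values and the example label lists are assumptions.

```python
def align_tf_maps(map_a, map_b, match=2.0, gap=-1.0):
    """Globally align two lists of factor labels; only identical labels can pair.

    Returns (score, pairs), where pairs lists the (index_a, index_b) matches.
    """
    n, m = len(map_a), len(map_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either leave one site unaligned ...
            best = max(score[i - 1][j] + gap, score[i][j - 1] + gap)
            # ... or pair the two sites when they carry the same label.
            if map_a[i - 1] == map_b[j - 1]:
                best = max(best, score[i - 1][j - 1] + match)
            score[i][j] = best
    # Trace back from the bottom-right corner to recover the matched sites.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        same = map_a[i - 1] == map_b[j - 1]
        if same and score[i][j] == score[i - 1][j - 1] + match:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Illustrative label lists for two orthologous promoters (invented example).
human_map = ["SP1", "TBP", "CEBP", "NFKB"]
mouse_map = ["SP1", "CEBP", "AP1", "NFKB"]
alignment_score, matched_sites = align_tf_maps(human_map, mouse_map)
```

In this example three labels align (SP1, CEBP, NFKB) and two sites stay unaligned, even though the underlying nucleotide sequences were never compared.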
Abstract:
Background: Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect of non-synonymous SNPs in the context of biological networks such as cell signalling pathways. UniProt provides curated information about the functional and phenotypic effects of sequence variation, including SNPs, as well as on mutations of protein sequences. However, no strategy has been developed to integrate this information with biological networks, with the ultimate goal of studying the impact of the functional effect of SNPs in the structure and dynamics of biological networks. Results: First, we identified the different challenges posed by the integration of the phenotypic effect of sequence variants and mutations with biological networks. Second, we developed a strategy for the combination of data extracted from public resources, such as UniProt, NCBI dbSNP, Reactome and BioModels. We generated attribute files containing phenotypic and genotypic annotations to the nodes of biological networks, which can be imported into network visualization tools such as Cytoscape. These resources allow the mapping and visualization of mutations and natural variations of human proteins and their phenotypic effect on biological networks (e.g. signalling pathways, protein-protein interaction networks, dynamic models). Finally, an example on the use of the sequence variation data in the dynamics of a network model is presented. Conclusion: In this paper we present a general strategy for the integration of pathway and sequence variation data for visualization, analysis and modelling purposes, including the study of the functional impact of protein sequence variations on the dynamics of signalling pathways. 
This is of particular interest when the SNP or mutation is known to be associated with disease. We expect that this approach will help in the study of the functional impact of disease-associated SNPs on the behaviour of cell signalling pathways, which ultimately will lead to a better understanding of the mechanisms underlying complex diseases.
Abstract:
Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
Abstract:
Background: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.
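The evaluation figures quoted above can be related to one another with the usual F-measure formula. The sketch below assumes the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall) and uses the rounded percentages from the abstract, so its result of about 0.897 only approximates the reported 0.89.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values quoted in the abstract: 99% precision, 82% recall.
f1 = f_score(0.99, 0.82)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, the score sits much closer to the 82% recall than to the 99% precision.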
Abstract:
In today's competitive markets, the importance of good scheduling strategies in manufacturing companies leads to the need to develop efficient methods for solving complex scheduling problems. In this paper, we study two production scheduling problems with sequence-dependent setup times. Setup times are one of the most common complications in scheduling problems, and are usually associated with cleaning operations and with changing tools and shapes in machines. The first problem considered is single-machine scheduling with release dates, sequence-dependent setup times and delivery times. The performance measure is the maximum lateness. The second problem is a job-shop scheduling problem with sequence-dependent setup times, where the objective is to minimize the makespan. We present several priority dispatching rules for both problems, followed by a study of their performance. Finally, conclusions and directions for future research are presented.
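A priority dispatching rule for the single-machine problem can be sketched as follows. The rule shown (among released jobs, pick the one with the largest delivery time, breaking ties by the smallest setup) is a hypothetical example chosen for illustration, not one of the rules studied in the paper. The sketch also assumes that a setup starts only once the job has been selected, that missing setup entries mean zero setup, and that the objective is the largest completion time plus delivery time. The job data are invented.

```python
def dispatch(jobs, setup):
    """Greedy dispatching on one machine.

    jobs:  dict name -> (release_date, processing_time, delivery_time)
    setup: dict (previous_job, next_job) -> setup time; missing pairs count as 0
    Returns (sequence, objective), with objective = max(completion + delivery).
    """
    remaining = set(jobs)
    t = 0
    prev = None
    sequence = []
    objective = 0
    while remaining:
        ready = [j for j in remaining if jobs[j][0] <= t]
        if not ready:
            # Machine idles until the next job is released.
            t = min(jobs[j][0] for j in remaining)
            continue
        # Priority: largest delivery time, then smallest setup, then name.
        nxt = min(
            ready,
            key=lambda j: (-jobs[j][2], setup.get((prev, j), 0), j),
        )
        t += setup.get((prev, nxt), 0) + jobs[nxt][1]
        objective = max(objective, t + jobs[nxt][2])
        sequence.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return sequence, objective


# Invented instance: (release date, processing time, delivery time) per job.
demo_jobs = {"A": (0, 3, 2), "B": (0, 2, 6), "C": (4, 1, 5)}
demo_setup = {("B", "A"): 1, ("A", "C"): 2, ("B", "C"): 1, ("A", "B"): 1}
demo_sequence, demo_objective = dispatch(demo_jobs, demo_setup)
```

On this instance the rule schedules B, then A, then C. Job C is not yet released when A is chosen, which illustrates how a purely myopic rule can leave a long-delivery job for last.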
Abstract:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
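The final selection step, minimizing the sum of class complexity and empirical risk, can be written schematically as below. The notation is an assumption introduced here for illustration: $\hat f_k$ denotes the candidate selected from the empirical cover of ${\cal F}_k$, $\hat C_k$ the empirically estimated complexity of that class, $\ell$ a loss function, and $m$ the number of observations in the second part of the data (the abstract does not state precisely which observations enter this comparison).

```latex
\hat{R}(f) = \frac{1}{m} \sum_{i=1}^{m} \ell\bigl(f(X_i), Y_i\bigr),
\qquad
\hat{k} = \arg\min_{k \ge 1} \Bigl[ \hat{R}(\hat{f}_k) + \hat{C}_k \Bigr],
\qquad
\hat{f} = \hat{f}_{\hat{k}} .
```

A richer class can lower the empirical risk term but raises the complexity term, so the minimizer balances approximation against estimation error.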
Abstract:
Description of the stratigraphic sequence and the palaeoenvironmental records of the Holocene sediments of Sant Julià de Boada.
Abstract:
In recent months, pro-sovereignty sentiment among the Catalan population has grown considerably. It is currently one of the central topics of political news in the Catalan and Spanish press. For this reason, in this work we analyse the economic viability of a hypothetical Catalan state. We begin by comparing the macroeconomic data of Catalonia with those of the other countries of the European Union, where we will see that some countries with characteristics similar to those of Catalonia are the European countries with the highest level of development. The core of the sovereignty debate in its economic aspect rests on the fiscal balance, on the one hand, and on the trade balance, on the other. Catalonia's fiscal balance with the Spanish state has shown a chronic deficit for many years. As for the trade balance, Catalan exports abroad are gaining more and more weight relative to Catalan exports to the rest of the Spanish state, so the dependence of Catalan companies on the Spanish state is steadily decreasing. The latest controversy concerns the Spanish state's public investment in Catalonia. Catalans contribute a range of taxes to the state but do not receive their due share in infrastructure investment and public spending, which puts the future of Catalonia at risk.
Abstract:
We present Strömgren uvby and Hβ photometry for a set of 575 northern main sequence A type stars, most of them belonging to the Hipparcos Input Catalogue, with V from 5 mag to 10 mag and with known radial velocities. These observations enlarge the catalogue we began to compile some years ago to more than 1500 stars. Our catalogue includes kinematic and astrophysical data for each star. Our future goal is to perform an accurate analysis of the kinematical behaviour of these stars in the solar neighbourhood.
Abstract:
Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
Abstract:
Within an effective field theory framework, we obtain an expression, with O(1/m^2) accuracy, for the energies of the gluonic excitations between heavy quarks, which holds beyond perturbation theory. For the singlet heavy quark-antiquark energy, in particular, we also obtain an expression in terms of Wilson loops. This provides, twenty years after the seminal work of Eichten and Feinberg, the first complete expression for the heavy quarkonium potential up to O(1/m^2) for pure gluodynamics. Several errors present in the previous literature (also in the work of Eichten and Feinberg) have been corrected. We also briefly discuss the power counting of NRQCD in the nonperturbative regime.