894 results for SEQUENCE DATABASES
Abstract:
This book gives a general view of sequence analysis, the statistical study of successions of states or events. It includes innovative contributions on life course studies, transitions into and out of employment, contemporaneous and historical careers, and political trajectories. The approach presented in this book is now central to the life-course perspective and the study of social processes more generally. This volume promotes the dialogue between approaches to sequence analysis that developed separately, within traditions contrasted in space and disciplines. It includes the latest developments in sequential concepts, coding, atypical datasets and time patterns, optimal matching and alternative algorithms, survey optimization, and visualization. Field studies include original sequential material related to parenting in 19th-century Belgium, higher education and work in Finland and Italy, family formation before and after German reunification, French Jews persecuted in occupied France, long-term trends in electoral participation, and regime democratization. Overall the book reassesses the classical uses of sequences and it promotes new ways of collecting, formatting, representing and processing them. The introduction provides basic sequential concepts and tools, as well as a history of the method. Chapters are presented in a way that is both accessible to the beginner and informative to the expert.
Abstract:
We improved, evaluated, and used Sanger sequencing for quantification of single nucleotide polymorphism (SNP) variants in transcripts and gDNA samples. This improved assay resulted in highly reproducible relative allele frequencies (e.g., for a heterozygous gDNA 50.0+/-1.4%, and for a missense mutation-bearing transcript 46.9+/-3.7%) with a lower detection limit of 3-9%. It provided excellent accuracy and linear correlation between expected and observed relative allele frequencies. This sequencing assay, which can also be used for the quantification of copy number variations (CNVs), methylations, mosaicisms, and DNA pools, enabled us to analyze transcripts of the FBN1 gene in fibroblasts and blood samples of patients with suspected Marfan syndrome not only qualitatively but also quantitatively. We report a total of 18 novel and 19 known FBN1 sequence variants leading to a premature termination codon (PTC), 26 of which we analyzed by quantitative sequencing both at gDNA and cDNA levels. The relative amounts of PTC-containing FBN1 transcripts in fresh and PAXgene-stabilized blood samples were significantly higher (33.0+/-3.9% to 80.0+/-7.2%) than those detected in affected fibroblasts with inhibition of nonsense-mediated mRNA decay (NMD) (11.0+/-2.1% to 25.0+/-1.8%), whereas in fibroblasts without NMD inhibition no mutant alleles could be detected. These results provide evidence for incomplete NMD in leukocytes and have particular importance for RNA-based analyses not only in FBN1 but also in other genes.
Abstract:
Starting from a biologically active recombinant DNA clone of exogenous unintegrated GR mouse mammary tumor virus, we have generated three subclones of PstI fragments of 1.45, 1.1, and 2.0 kb in the plasmid vector pBR322. The nucleotide sequence has been determined for the clone of 1.45 kb, which includes almost the complete region of the long terminal repeat (LTR) plus an adjacent stretch of unique sequence DNA. A short region of the 2.0 kb clone, containing the beginning of the LTR, has also been sequenced. Starting with the A of an initiation codon outside the LTR, we detected an open reading frame of 960 nucleotides, potentially coding for a protein of 320 amino acids (36K). Two hundred nucleotides downstream from the termination codon, and approximately 25 nucleotides upstream from the presumptive initiation site of viral RNA synthesis, we found a promoter-like sequence. The sequence AGTAAA was detected approximately 15-20 nucleotides upstream from the 3' end of virion RNA and probably serves as a polyadenylation signal. The 1.45 kb PstI fragment has been transfected into Ltk- cells together with a plasmid containing the thymidine kinase gene of herpes simplex virus. The virus-specific RNA synthesis detected in a Tk+ cell clone was strongly stimulated by the addition of dexamethasone.
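The relation between the open reading frame length and the protein length quoted above (960 nucleotides, 320 amino acids) is one codon of three nucleotides per residue. The following is a minimal sketch of that bookkeeping, not the authors' procedure: the helper names `first_orf` and `codon_count` and the toy sequence are hypothetical, and the scan simply starts at the first ATG it finds.

```python
# Illustrative only: scan from the first ATG to the first in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str) -> str:
    """Return the ORF starting at the first ATG, excluding the stop codon.

    Returns an empty string if there is no ATG or no in-frame stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    # Step through the sequence codon by codon in the frame set by the ATG.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i]
    return ""


def codon_count(orf: str) -> int:
    """Number of complete codons (one amino acid each) in a coding sequence."""
    return len(orf) // 3


# Toy sequence (hypothetical): ATG + four GCT codons + TAA stop, with flanks.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
orf = first_orf(toy)
print(len(orf), codon_count(orf))
```

Applied to a coding stretch of 960 nucleotides, `codon_count` gives 320, consistent with the protein size stated in the abstract.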
Abstract:
Methicillin-resistant Staphylococcus aureus (MRSA) is a major cause of nosocomial infections worldwide. To differentiate reliably among S. aureus isolates, we recently developed double locus sequence typing (DLST) based on the analysis of partial sequences of clfB and spa genes. In the present study, we evaluated the usefulness of DLST for epidemiological investigations of MRSA by routinely typing 1242 strains isolated in Western Switzerland. Additionally, particular local and international collections were typed by pulsed field gel electrophoresis (PFGE) and DLST to check the compatibility of DLST with the results obtained by PFGE, and for international comparisons. Using DLST, we identified the major MRSA clones of Western Switzerland, and demonstrated the close relationship between local and international clones. The congruence of 88% between the major PFGE and DLST clones indicated that our results obtained by DLST were compatible with earlier results obtained by PFGE. DLST could thus easily be incorporated in a routine surveillance procedure. In addition, the unambiguous definition of DLST types makes this method more suitable than PFGE for long-term epidemiological surveillance. Finally, the comparison of the results obtained by DLST, multilocus sequence typing, PFGE, Staphylococcal cassette chromosome mec typing and the detection of Panton-Valentine leukocidin genes indicated that no typing scheme should be used on its own. It is only the combination of data from different methods that gives the best chance of describing precisely the epidemiology and phylogeny of MRSA.
Abstract:
Three-dimensional sequence stratigraphy is a potent exploration and development tool for the discovery of subtle stratigraphic traps. Reservoir morphology, heterogeneity and subtle stratigraphic trapping mechanisms can be better understood through systematic horizontal identification of sedimentary facies of systems tracts provided by three-dimensional attribute maps used as an important complement to the sequential analysis on the two-dimensional seismic lines and the well log data. On new prospects as well as on already-producing fields, the additional input of sequential analysis on three-dimensional data enables the identification, location and precise delimitation of new potentially productive zones. The first part of this paper presents four typical horizontal seismic facies assigned to the successive systems tracts of a third- or fourth-order sequence deposited in inner to outer neritic conditions on a clastic shelf. The construction of this synthetic representative sequence is based on the observed reproducibility of the horizontal seismic facies response to cyclic eustatic events on more than 35 sequences registered in the Gulf Coast Plio-Pleistocene and Late Miocene, offshore Louisiana in the West Cameron region of the Gulf of Mexico. The second part shows how three-dimensional sequence stratigraphy can contribute to localizing and understanding sedimentary facies associated with productive zones. A case study in the early Middle Miocene Cibicides opima sands shows multiple stacked gas accumulations in the top slope fan, prograding wedge and basal transgressive systems tract of the third-order sequence between SB 15.5 and SB 13.8 Ma.
Abstract:
The data indispensable for carrying out the comprehensive, multi-faceted process of medical technology assessment (MTA) should be collected from a variety of sources. The authors distinguish between type "A" general data, useful for assessment but collected without this specific aim, and type "B" data. Registries of health care procedures or of diseases, as well as clinical databases, are cited as examples of type "B" data, specifically relating to MTA. Since demographic methods are of importance for the evaluation of long-term effects of medical technologies, examples of sources of type "A" data are presented. Their significance for health policy making is discussed.
Abstract:
The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Abstract:
Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments. Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
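The cross-platform probe mapping described above can be illustrated with two flat annotation tables joined on a shared gene identifier. This is a minimal sketch in Python and does not reproduce the annotationTools R API: the tables, column names (`probe`, `gene`), probe IDs and function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated annotation tables (probe ID -> gene ID).
# Real annotation files have many more columns; these are toy stand-ins.
PLATFORM_A = "probe\tgene\nA_001\tG1\nA_002\tG2\nA_003\t\n"
PLATFORM_B = "probe\tgene\nB_100\tG1\nB_101\tG1\nB_102\tG3\n"


def read_probe_to_gene(text):
    """Parse a flat annotation table, skipping probes without a gene ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["probe"]: row["gene"] for row in reader if row["gene"]}


def map_probes(table_a, table_b):
    """Map each annotated probe of platform A to platform B probes of the same gene."""
    probe_to_gene_a = read_probe_to_gene(table_a)
    gene_to_probes_b = defaultdict(list)
    for probe, gene in read_probe_to_gene(table_b).items():
        gene_to_probes_b[gene].append(probe)
    # A probe whose gene is absent from platform B maps to an empty list.
    return {
        probe: gene_to_probes_b.get(gene, [])
        for probe, gene in probe_to_gene_a.items()
    }


mapping = map_probes(PLATFORM_A, PLATFORM_B)
print(mapping)
```

A cross-species mapping would follow the same pattern with one extra lookup step through an orthology table between the two gene identifiers.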
Abstract:
SUMMARY: In order to increase drug safety we must better understand how medication interacts with the body of our patients, and this knowledge should be made easily available to the clinicians prescribing the medication. This thesis contributes to increasing the knowledge of some drug properties and to making this information readily accessible to medical professionals. Furthermore, it investigates the use of therapeutic drug monitoring, drug interaction databases and pharmacogenetic tests in pharmacovigilance. Two pharmacogenetic studies in the naturalistic setting of psychiatric in-patient clinics have been performed, one with the antidepressant mirtazapine, the other with the antipsychotic clozapine. Forty-five depressed patients were treated with mirtazapine and followed for 8 weeks. The therapeutic effect was similar to that seen in previous studies. Enantioselective analyses confirmed an influence of age, gender and smoking on the pharmacokinetics of mirtazapine; they showed a significant influence of the CYP2D6 genotype on the antidepressant-effective S-enantiomer, and for the first time an influence of the CYP2B6 genotype on the plasma concentrations of the 8-OH metabolite was found. The CYP2B6*6/*6 genotype was associated with a better treatment response. A detailed hypothesis of the metabolic pathways of mirtazapine is proposed. In the second pharmacogenetic study, analyses of 75 schizophrenic patients treated with clozapine showed the influence of CYP450 and ABCB1 genotypes on its pharmacokinetics. For the first time we could demonstrate an in vivo effect of the CYP2C19 genotype and an influence of P-glycoprotein on the plasma concentrations of clozapine. Further, we confirmed in vivo the prominent role of CYP1A2 in the metabolism of clozapine. Identifying risk factors for the occurrence of serious adverse drug reactions (SADR) would allow a more individualized and safer drug therapy. SADR are rare events and therefore difficult to study.
We tested the feasibility of a nested matched case-control study to examine the influence of high drug plasma levels and CYP2D6 genotypes on the risk of experiencing an SADR. In our sample we compared 62 SADR cases with 82 controls; both groups were psychiatric patients from the in-patient clinic Königsfelden. Drug plasma levels of >120% of the upper recommended reference value could be identified as a risk factor with a statistically significant odds ratio of 3.5; a similar trend was seen for CYP2D6 poor metabolisers. Although a matched case-control design seems a valid method, 100% matching is not easy to achieve in the relatively small cohort of a single in-patient clinic. However, a nested case-control study is feasible. On the basis of the experience gained in the AMSP+ study, and given that today we have only sparse data indicating that routine drug plasma concentration monitoring and/or pharmacogenetic testing in psychiatry are justified to minimize the risk of ADR, we developed a test algorithm named "TDM plus" (TDM plus interaction checks plus pharmacogenetic testing). Pharmacovigilance programs such as the AMSP project (AMSP = Arzneimittelsicherheit in der Psychiatrie) survey psychiatric in-patients in order to collect SADR and to detect new safety signals. Case reports of such SADR are, although anecdotal, valuable to illustrate rare clinical events and sometimes confirm theoretical assumptions about, e.g., drug interactions. Seven pharmacovigilance case reports are summarized in this thesis. To provide clinicians with meaningful information on the risk of drug combinations, the internet-based drug interaction program mediQ.ch (in German) was developed during the course of this thesis. Risk estimation is based on published clinical and pharmacological information on single drugs and alimentary products, including adverse drug reaction profiles. Information on risk factors such as renal and hepatic insufficiency and specific genotypes is given.
More than 20,000 drug pairs have been described in detail. Over 2000 substances with their metabolic and transport pathways are included, and all information is referenced with links to the published scientific literature or other information sources. Medical professionals from more than 100 hospitals and 300 individual practitioners consult mediQ.ch regularly. Validations with comparisons to other drug interaction programs show good results. Finally, therapeutic drug monitoring, drug interaction programs and pharmacogenetic tests are helpful tools in pharmacovigilance and should, in the absence of sufficient data supporting routine testing, be used as proposed in our TDM plus algorithm.
RÉSUMÉ: To improve the safety of medication use, it is important to better understand how drugs interact in the patient's body. The clinician prescribing a pharmacotherapy must then have simple access to this information. Among other things, this thesis contributes to a better knowledge of the pharmacokinetic characteristics of two drugs. It also examines the use of three tools in pharmacovigilance: therapeutic drug monitoring of plasma levels, a computerized program for estimating the risk of drug combinations, and pharmacogenetic tests. Two clinical pharmacogenetic studies were conducted in the usual setting of a psychiatric clinic: one with mirtazapine (an antidepressant), the other with clozapine (an antipsychotic). Forty-five depressed patients were treated with mirtazapine for 8 weeks. The therapeutic effect was similar to that of previous studies. We confirmed the influence of age and sex on the pharmacokinetics of mirtazapine and the difference in plasma concentrations between smokers and non-smokers. Using enantioselective analyses, we showed a significant influence of the CYP2D6 genotype on the S+ enantiomer, which is mainly responsible for the antidepressant effect. For the first time, we found an influence of the CYP2B6 genotype on plasma levels of 8-OH-mirtazapine. In addition, the CYP2B6*6/*6 genotype was associated with a better therapeutic response. A hypothesis on the detailed metabolic pathways of mirtazapine is proposed. In the second study, 75 schizophrenic patients treated with clozapine were examined to study the influence of the genotypes of the CYP450 isoenzymes and of the transport protein ABCB1 on the pharmacokinetics of this antipsychotic. For the first time, an effect of the CYP2C19 and ABCB1 genotypes on clozapine plasma levels was shown in vivo. The importance of CYP1A2 in the metabolism of clozapine was confirmed. Identifying risk factors for the occurrence of serious adverse effects would allow a more individualized and safer therapy. Serious adverse effects are rare. In a feasibility study (nested matched case-control design) we compared patients with serious adverse effects to control patients taking the same type of medication but without serious adverse effects. Plasma levels above 120% of the upper reference value were associated with a risk, with a significant odds ratio of 3.5. A similar trend appeared for the CYP2D6 genotype. The nested matched case-control design appears to be a valid method, but it presents one difficulty: finding control patients within a single psychiatric clinic. By contrast, conducting a nested case-control study without matching can be recommended. On the basis of our experience with the AMSP+ study, and given that we have only few data justifying routine plasma level monitoring and/or pharmacogenetic testing, we developed a test algorithm named "TDM plus" (TDM + drug interaction checks + pharmacogenetic testing). Pharmacovigilance programs such as AMSP (Arzneimittelsicherheit in der Psychiatrie = drug safety in psychiatry) collect serious adverse effects in hospitalized psychiatric patients in order to identify safety signals. Publishing some of these cases, even anecdotal ones, is valuable. Such reports describe rare events, and sometimes a hypothesis about the potential of a drug interaction can thereby be confirmed. Seven case reports are summarized here. In the course of this thesis, an internet-based program (in German), mediQ.ch, was developed to estimate the potential risk of a drug interaction, in order to offer this useful information to clinicians online. Risk estimates are based on clinical information (including adverse effect profiles) and pharmacological information for each drug or substance in the combination. The program also provides information on risk factors such as renal and hepatic insufficiency and certain genotypes. It currently describes in detail the potential interactions of more than 20,000 drug pairs, and those of 2000 active substances with their metabolic and transport pathways. Each piece of information cites its original source, which can be reached through a hyperlink. The mediQ.ch program is regularly consulted by clinicians from 100 hospitals and by 300 independent practitioners. Initial validations and comparisons with other drug interaction programs show good results. In conclusion, therapeutic drug monitoring, computerized programs containing information on drug interaction potential, and pharmacogenetic tests are valuable tools in pharmacovigilance. We propose using them in accordance with the "TDM plus" algorithm that we have developed.
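The odds ratio reported in the thesis summary above compares the odds of exposure (here, elevated plasma levels) between cases and controls. The sketch below shows the standard unmatched 2x2 calculation with a Wald confidence interval; it is a simplification, since a matched design would normally use a conditional estimate, and the counts are hypothetical toy values because the summary does not give the underlying table.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Unmatched odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval computed on the log odds ratio scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Hypothetical counts, NOT taken from the thesis:
# 10 exposed and 20 unexposed cases, 5 exposed and 40 unexposed controls.
estimate = odds_ratio(10, 20, 5, 40)
low, high = wald_ci(10, 20, 5, 40)
print(estimate, low, high)
```

An interval that excludes 1 corresponds to the kind of statistically significant association the summary describes.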
Abstract:
We propose and validate a multivariate classification algorithm for characterizing changes in human intracranial electroencephalographic data (iEEG) after learning motor sequences. The algorithm is based on a Hidden Markov Model (HMM) that captures spatio-temporal properties of the iEEG at the level of single trials. Continuous intracranial iEEG was acquired during two sessions (one before and one after a night of sleep) in two patients with depth electrodes implanted in several brain areas. They performed a visuomotor sequence (serial reaction time task, SRTT) using the fingers of their non-dominant hand. Our results show that the decoding algorithm correctly classified single iEEG trials from the trained sequence as belonging to either the initial training phase (day 1, before sleep) or a later consolidated phase (day 2, after sleep), whereas it failed to do so for trials belonging to a control condition (pseudo-random sequence). Accurate single-trial classification was achieved by taking advantage of the distributed pattern of neural activity. However, across all the contacts the hippocampus contributed most significantly to the classification accuracy for both patients, and one fronto-striatal contact for one patient. Together, these human intracranial findings demonstrate that a multivariate decoding approach can detect learning-related changes at the level of single-trial iEEG. Because it allows an unbiased identification of brain sites contributing to a behavioral effect (or experimental condition) at the level of single subject, this approach could be usefully applied to assess the neural correlates of other complex cognitive functions in patients implanted with multiple electrodes.
Abstract:
The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.
Abstract:
Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific "AS code" to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part (in human, more than a quarter) of the observed splicing variations are ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.
Abstract:
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation, and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10,010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them against a prokaryotic sequence database using the program BLASTN to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.
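The taxon assignment step described above can be sketched as keeping the best-scoring database hit per read and then tallying taxa. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the BLASTN hits have already been parsed into `(read_id, taxon, score)` tuples, and the read IDs, taxon names and scores are hypothetical toy values.

```python
from collections import Counter


def best_hit_taxa(hits):
    """Assign each read the taxon of its highest-scoring hit."""
    best = {}
    for read_id, taxon, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (taxon, score)
    return {read_id: taxon for read_id, (taxon, _) in best.items()}


def taxon_proportions(assignments):
    """Fraction of assigned reads falling into each taxon."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical parsed hits: (read ID, taxon of the database sequence, bit score).
hits = [
    ("r1", "Bacteroides", 200.0),
    ("r1", "Prevotella", 150.0),
    ("r2", "Bacteroides", 90.0),
    ("r3", "Clostridium", 120.0),
    ("r4", "Bacteroides", 60.0),
]
assignments = best_hit_taxa(hits)
proportions = taxon_proportions(assignments)
print(proportions)
```

The resulting proportions are the kind of profile that could be compared with a 16S-based profile, or between libraries; a real analysis would also need a score or identity threshold below which a read stays unassigned.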
Abstract:
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a "reference set" of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
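Nucleotide-level accuracy of the kind quoted above can be computed by comparing the sets of coding positions in the annotation and in the prediction. The sketch below assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP); the abstract itself does not define the measures, and the exon coordinates and function names are hypothetical.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding nucleotide level.

    Assumed convention: sensitivity = TP / annotated coding nucleotides,
    specificity = TP / predicted coding nucleotides.
    """
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical example: a 100-nt annotated exon and a prediction shifted by 10 nt.
sn, sp = nucleotide_accuracy([(1, 100)], [(11, 110)])
print(sn, sp)
```

In this toy case 90 of the 100 annotated nucleotides are recovered and 90 of the 100 predicted nucleotides are correct, so both measures equal 0.9.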