935 results for Complete Genome Sequence
Abstract:
Madrepora is one of the most ecologically important genera of reef-building scleractinians in the deep sea, occurring from tropical to high-latitude regions. Despite this, the taxonomic affinities and relationships within the genus Madrepora remain unclear. To clarify these issues, we sequenced the mitochondrial (mt) genome of the most widespread Madrepora species, M. oculata, and compared it with data for other scleractinians. The architecture of the M. oculata mt genome was very similar to that of other scleractinians, except for a novel gene rearrangement affecting only cox2 and cox3. This pattern of gene organization was common to four geographically distinct M. oculata individuals as well as the congeneric species M. minutiseptum, but was shared neither by other genera that are closely related on the basis of cox1 sequence analysis nor by other oculinids, suggesting that it might be unique to Madrepora. (C) 2012 Elsevier Inc. All rights reserved.
Abstract:
Background: The mitochondrial DNA of kinetoplastid flagellates is distinctive in the eukaryotic world due to its massive size, complex form and large sequence content. Composed of catenated maxicircles that contain rRNA and protein-coding genes and thousands of heterogeneous minicircles encoding small guide RNAs, the kinetoplast network has evolved along with an extreme form of mRNA processing: uridine insertion and deletion RNA editing. Many maxicircle-encoded mRNAs cannot be translated without this post-transcriptional sequence modification. Results: We present the complete sequence and annotation of the Trypanosoma cruzi maxicircles for the CL Brener and Esmeraldo strains. Gene order is syntenic with Trypanosoma brucei and Leishmania tarentolae maxicircles. The non-coding components have strain-specific repetitive regions and a variable region that is unique to each strain, with the exception of a conserved sequence element that may serve as an origin of replication but shows no sequence identity with L. tarentolae or T. brucei. Alternative assemblies of the variable region demonstrate intra-strain heterogeneity of the maxicircle population. The extent of mRNA editing required for particular genes approximates that seen in T. brucei. Extensively edited genes were more divergent among the genera than non-edited and rRNA genes. Esmeraldo contains a unique 236-bp deletion that removes the 5'-ends of ND4 and CR4 and the intergenic region. Esmeraldo shows additional insertions and deletions outside of areas edited in other species in ND5, MURF1, and MURF2, while CL Brener has a distinct insertion in MURF2. Conclusion: The CL Brener and Esmeraldo maxicircles represent two of three previously defined maxicircle clades and promise utility as taxonomic markers. Restoration of the disrupted reading frames might be accomplished by strain-specific RNA editing. 
Elements in the non-coding region may be important for replication, transcription, and anchoring of the maxicircle within the kinetoplast network.
Abstract:
Background: One of the least common types of alternative splicing is the complete retention of an intron in a mature transcript. Intron retention (IR) is believed to result from intron, rather than exon, definition associated with failure to recognize weak splice sites flanking short introns. Although studies on individual retained introns have been published, few systematic surveys of large amounts of data have been conducted on the mechanisms that lead to IR. Results: To understand how sequence features are associated with or control IR, and to produce a generalized model that could reveal previously unknown signals that regulate this type of alternative splicing, we partitioned intron retention events observed in human cDNAs into two groups based on the relative abundance of both isoforms and compared relevant features. We found that a higher frequency of IR in humans is associated with individual introns that have weaker splice sites, and with genes that have shorter intron lengths, higher expression levels and a lower density of both a set of exon splicing silencers (ESSs) and the intronic splicing enhancer GGG. Both groups of retained introns included events conserved in mouse, in which the retained introns were also short and had weaker splice sites. Conclusion: Although our results confirmed that weaker splice sites are associated with IR, they showed that this feature alone cannot explain a non-negligible fraction of events. Our analysis suggests that cis-regulatory elements are likely to play a crucial role in regulating IR and also reveals previously unknown features that seem to influence its occurrence. These results highlight the importance of considering the interplay among these features in the regulation of the relative frequency of IR.
Abstract:
Background: Plasmodium vivax is the most widely distributed human malaria parasite, responsible for 70–80 million clinical cases each year and a large socioeconomic burden for countries such as Brazil, where it is the most prevalent species. Unfortunately, because this parasite cannot be grown in continuous in vitro culture, research on P. vivax remains largely neglected. Methods: A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low-quality reads. A total of 806 sequences with an average length of 586 bp met these criteria, and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of < 10^-30 was used to define a significant database match. ESTs were manually assigned gene ontology (GO) terms. Results: A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and GO terms were assigned to 164 of them. Conclusion: These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help to validate gene predictions for P. vivax and to create a gene index of this malaria parasite.
Abstract:
Membrane proteins are a large and important class of proteins. They are responsible for several of the key functions in a living cell, e.g. transport of nutrients and ions, cell-cell signaling, and cell-cell adhesion. Despite their importance, it has not been possible to study their structure and organization in much detail because of the difficulty of obtaining 3D structures. In this thesis, theoretical studies of membrane protein sequences and structures have been carried out by analyzing existing experimental data. The data come from several sources, including sequence databases, genome sequencing projects, and 3D structures. Prediction of the membrane-spanning regions by hydrophobicity analysis is a key technique used in several of the studies. A novel method for this is also presented and compared to other methods. The primary questions addressed in the thesis are: What properties are common to all membrane proteins? What is the overall architecture of a membrane protein? What properties govern the integration into the membrane? How many membrane proteins are there and how are they distributed in different organisms? Several of the findings have now been backed up by experiments. An analysis of the large family of G-protein coupled receptors pinpoints differences in length and amino acid composition of loops between proteins with and without a signal peptide, and also differences between extra- and intracellular loops. Known 3D structures of membrane proteins have been studied in terms of hydrophobicity, distribution of secondary structure and amino acid types, position-specific residue variability, and differences between loops and membrane-spanning regions. An analysis of several fully and partially sequenced genomes from eukaryotes, prokaryotes, and archaea has been carried out. 
Several differences in the membrane protein content between organisms were found, the most important being the total number of membrane proteins and the distribution of membrane proteins with a given number of transmembrane segments. Of the properties that were found to be similar in all organisms, the most obvious is the bias in the distribution of positive charges between the extra- and intracellular loops. Finally, an analysis of homologues to membrane proteins with known topology uncovered two related, multi-spanning proteins with opposite predicted orientations. The predicted topologies were verified experimentally, providing a first example of "divergent topology evolution".
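The hydrophobicity analysis mentioned in this abstract can be illustrated with a classic sliding-window hydropathy profile. The sketch below is not the novel method presented in the thesis, which the abstract does not describe; it assumes the Kyte-Doolittle scale, and the 19-residue window and 1.6 threshold are conventional choices taken as assumptions here. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold.

    The window and threshold defaults are assumptions, not values
    taken from the thesis.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h > threshold]


if __name__ == "__main__":
    # Toy sequence: a leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_windows(toy))
```

Runs of adjacent start indices would normally be merged into one predicted segment; real predictors also use additional signals such as the charge bias between loops noted in the abstract.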
Abstract:
The first description of the complete embryonic and larval development of the Canarian abalone (Haliotis tuberculata coccinea Reeve) was made across 39 stages, from fertilization to the appearance of the third tubule on the cephalic tentacles, and illustrated in a microphotographic sequence. Eggs obtained by induced spawning with hydrogen peroxide from the GIA captive broodstock were stocked at a density of 10 eggs/mL and kept at 23 ± 0.5 °C for 62 h until the formation of the third tubule. Live eggs and larvae were continuously observed on a 24 h basis at ×400 magnification under transmitted light. At each stage, specific morphological features, illustrated by microscopic photographs, were described, as well as the time required for their appearance. Fertilized egg diameter was 205 ± 8 µm (mean ± SD), whereas the length and width of larvae ready to undergo metamorphosis were 216.6 ± 5.3 µm and 172 ± 8.8 µm, respectively. Knowledge of larval morphological development acquired through this study will contribute to the improvement of larval rearing techniques for this abalone species.
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing alone determines only the raw nucleotide sequence of a genome. This is only the first step of the genome annotation process, which assigns biological information to each sequence. Annotation is carried out at each level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time-consuming at this scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine learning based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause over-prediction of the most represented examples in all the other predictors developed so far. This important result was achieved through a modification that I made to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task. 
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine learning based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
Abstract:
Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to uncover the information that they encode. The development of new genome sequencing technologies in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Indeed, sequencing is only the first step of the genome annotation process, which consists of assigning biological information to each sequence. Hence, given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects has given rise to new strategies for tackling the basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological function and their reciprocal interactions. Results: The aim of this work has been the implementation of predictive methods that allow information on the properties of genomes and proteins to be extracted from nucleotide and amino acid sequences, by taking advantage of the information provided by comparing the genome sequences of different species. The first part of the work describes a comprehensive large-scale genome comparison of 599 organisms. A total of 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are structurally homogeneous and that can be used for structural annotation of uncharacterized sequences. 
The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis, the codon composition of a non-redundant database comprising 116 prokaryotic genomes was analyzed, and it was shown that a cross-genomic approach allows the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy of 95% in discriminating thermophilic coding sequences. This result outperforms those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
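As a rough illustration of the codon composition analysis described above, the sketch below computes the 64-dimensional codon frequency vector of a coding sequence, the kind of feature vector that could be fed to a Principal Component Analysis. The PCA step itself is omitted, and the function name, the fixed codon order and the handling of ambiguous bases are assumptions rather than details from the thesis.

```python
from itertools import product

# All 64 codons in a fixed, reproducible order.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Codons containing ambiguous bases (for example N) and a trailing
    partial codon are skipped.
    """
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    freqs = codon_frequencies("ATGGCCATGTAA")
    print(freqs[CODONS.index("ATG")])
```

One such vector per genome or per coding sequence gives a matrix with 64 columns, on which a standard PCA implementation could then be run.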
Abstract:
Hepatitis C virus (HCV) is an enveloped virus of the family Flaviviridae. It has a plus-strand RNA genome of approximately 9600 nucleotides that contains a single open reading frame. The genome is flanked at its 5' and 3' ends by non-translated regions (NTRs), which are important for translation and presumably also for replication. The 5' NTR contains an internal ribosome entry site (IRES) that allows cap-independent translation of the viral polyprotein, which is approximately 3000 amino acids long. The polyprotein is cleaved co- and post-translationally by cellular and viral proteases into 10 functional components. The extent to which the 5' NTR is also required for replication of HCV RNA was not known at the start of this work. The 3' NTR has a tripartite structure consisting of a variable region, the polyU/UC tract and the so-called X sequence, a highly conserved 98-nucleotide region that is presumably required for RNA replication and possibly also for translation. However, the exact role of the 3' NTR in these two processes was not known at the start of this work. The aim of the dissertation was therefore a detailed genetic analysis of the NTRs with respect to their importance for RNA translation and replication. The analysis also included RNA structures within the coding region that are highly conserved among different HCV genotypes and that were predicted with various computer-based models. To map the minimal length of the 5' NTR required for RNA replication, a series of chimeras was constructed in which segments of the HCV 5' NTR of different lengths were fused at their 3' end to the poliovirus IRES. With this approach we were able to show that the first 120 nucleotides of the HCV 5' NTR suffice as a minimal domain for replication. Furthermore, there was a clear correlation between the length of the HCV 5' NTR and replication efficiency. 
Replication efficiency increased with increasing length of the 5' NTR and was maximal when the complete 5' element was fused to the poliovirus IRES. The coupling of translation and replication in the HCV 5' NTR found here could point to a mechanism for regulating both functions. However, it has not yet been possible to clarify exactly which regions within the boundaries of the IRES element are required for RNA replication. Studies of the 3' NTR showed that the variable region is dispensable for replication, whereas the X sequence is essential. The polyU/UC tract had to be at least 11-30 uridines long, with maximal replication observed from a length of 30-50 uridines. The addition of heterologous sequences to the 3' end of the HCV RNA led to a strong reduction in replication. In the experiments performed here, none of the elements in the 3' NTR showed a significant influence on translation. A further cis-acting RNA element has been described in the 3' coding region for the NS5B protein. We found that alterations of this structure by silent point mutations inhibited replication, which could be restored by inserting an intact copy of this RNA element into the variable region of the 3' NTR. This experimental approach allowed the structural elements critical for replication to be examined in detail. It showed that both the structure and the primary sequence of the loop regions are essential. In addition, a sequence complementarity was found between the element in the NS5B-coding region and an RNA region in the X sequence of the 3' NTR, which can engage in a so-called "kissing loop" interaction. Using targeted mutations, we were able to show that this RNA:RNA interaction takes place at least transiently and is essential for HCV replication.
Abstract:
The objective of this work is to characterize chromosome 1 of A. thaliana, a small flowering plant used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. To accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of three different dichotomic classes. The descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful for recognizing them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharyya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences, whose existence hints at an underlying error detection and correction mechanism. Further studies are needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, help to unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the presented approach for understanding the management of genetic information.
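The two steps described in this abstract, binary encoding followed by a dependence measure, can be sketched as follows. The abstract does not define the three dichotomic classes, so the encoding below uses a plain purine/pyrimidine split purely as a stand-in assumption; only the mutual information computation is the standard definition. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def to_binary(seq):
    """Stand-in encoding (assumption): purines A/G -> 1, pyrimidines C/T -> 0.

    This is NOT one of the dichotomic classes of the thesis, which the
    abstract does not define. Other characters are dropped.
    """
    return [1 if base in "AG" else 0 for base in seq.upper() if base in "ACGT"]


def mutual_information(bits, lag=1):
    """Mutual information, in bits, between symbols `lag` positions apart."""
    pairs = list(zip(bits, bits[lag:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in joint.items()
    )


if __name__ == "__main__":
    # A strictly alternating sequence has strong short-range dependence.
    print(mutual_information(to_binary("AC" * 50), lag=1))
```

Values near zero indicate that neighbouring symbols are close to independent, while larger values indicate short-range dependence of the kind the abstract reports for coding sequences.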
Abstract:
Immune modulation by herpesviruses, such as cytomegalovirus, is critical for the establishment of acute and persistent infection in the face of a vigorous antiviral immune response of the host. Therefore, the action of immune-modulatory proteins has long been the subject of research, with the final goal of identifying new strategies for antiviral therapy. In the case of murine cytomegalovirus (mCMV), the viral m152 protein has been identified as playing a major role in targeting components of both the innate and the adaptive immune system in terms of infected host-cell recognition in the effector phase of the antiviral immune response. On the one hand, it inhibits cell surface expression of RAE-1 and thereby prevents ligation of the activating natural killer (NK)-cell receptor NKG2D. On the other hand, it decreases cell surface expression of peptide-loaded MHC class I molecules, thereby preventing antigen presentation to CD8 T cells. Ultimately, the outcome of CMV infection is determined by the interplay between viral and cellular factors. In this context, the work presented here has revealed a novel and intriguing connection between viral m152 and cellular interferon (IFN), a key cytokine of the immune system: the m152 promoter region contains an interferon regulatory factor element (IRFE) perfectly matching the consensus sequence of cellular IRFEs. The biological relevance of this regulatory element was first suggested by sequence comparisons revealing its evolutionary conservation among various established laboratory strains of mCMV and more recent low-passage wild-derived virus isolates. Moreover, a search of the mCMV genome revealed only three IRFE sites in the complete sequence. Importantly, the functionality of the IRFE in the m152 promoter was confirmed with the use of a mutant virus, representing a functional deletion of the IRFE, and its corresponding revertant virus. 
In particular, m152 gene expression was found to be inhibited in an IRFE-dependent manner in infected cells. This inhibition proved to have a severe impact on the immune-modulatory function of m152, first demonstrated by restored direct antigen presentation on infected cells for CD8 T-cell activation. Even more importantly, this effect of IRFE-mediated IFN signaling was validated in vivo by showing that the protective antiviral capacity of adoptively transferred, antigen-specific CD8 T cells is also significantly restored by the IRFE-dependent inhibition of m152. Somewhat surprisingly, the decrease in m152 protein simultaneously prevented an enhanced activation of NK cells in acutely infected mice, apparently independent of the RAE-1/NKG2D ligand/receptor interaction but rather due to reduced ‘missing-self’ recognition. Taken together, this work presents a previously unknown mechanism by which IFN signaling controls mCMV immune modulation in acute infection.
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized genome analysis. This innovation yields deeper information at lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). For the statistical analysis, the final aim is statistical testing, and the Negative Binomial distribution is considered the most adequate for modeling these data, especially because it allows for "overdispersion". However, estimating the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it comes closest to the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
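To make the notion of overdispersion concrete: under the usual Negative Binomial parameterization the variance is mu + phi * mu**2, where phi is the dispersion parameter. The sketch below gives a naive per-gene method-of-moments estimate of phi. It is not the mixture model proposed in the thesis; it ignores library-size normalization, and the function name is hypothetical. It mainly illustrates why the estimate is fragile when only a few replicates are available.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments estimate of phi in Var = mu + phi * mu**2.

    Requires at least two replicate counts. Negative estimates, which
    arise when the sample variance is below the mean, are floored at 0.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    return max(0.0, (s2 - m) / (m * m))


if __name__ == "__main__":
    # Three replicates of one gene: mean 20, sample variance 100.
    print(moment_dispersion([10, 20, 30]))
```

With three replicates a single count changes the estimate substantially, which is the motivation the abstract gives for sharing information across genes with similar variability.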
Abstract:
This case presentation documents the treatment sequence of a 74-year-old patient who complained about a sore spot on the palatal mucosa underneath the complete denture. The intraoral examination revealed a dark spot, redness and swelling of the mucosa around this spot, and halitosis. The mucosa exhibited a perforation measuring 3 x 10 mm. A three-dimensional radiographic image showed an impacted canine, which was partly covered by the palatal bone. First, the denture base was relieved and the swelling gradually disappeared. Then a biopsy was taken for histological analysis to exclude any malignant process. Under local anesthesia the tooth, which exhibited a deep carious lesion of the entire crown, was extracted. After surgery a visible collapse of the jaw crest was observed. Over a period of two months the denture was relined with a soft material to improve its fit and to enhance the healing process. With a final rebasing, the existing denture could be adapted again and the patient continued to wear it.
Abstract:
The potential for mitochondrial (mt) DNA mutation accumulation during antiretroviral therapy (ART), and preferential accumulation in patients with lipoatrophy compared with control participants, remains controversial. We sequenced the entire mitochondrial genome, both before ART and after ART exposure, in 29 human immunodeficiency virus (HIV)-infected Swiss HIV Cohort Study participants initiating a first-line thymidine analogue-containing ART regimen. No accumulation of mtDNA mutations or deletions was detected in 13 participants who developed lipoatrophy or in 16 control participants after significant and comparable ART exposure (median duration, 3.3 and 3.7 years, respectively). In HIV-infected persons, the development of lipoatrophy is unlikely to be associated with accumulation of mtDNA mutations detectable in peripheral blood.
Abstract:
Cytochrome P450 enzymes (CYP450s) represent a superfamily of haem-thiolate proteins. CYP450s are most abundant in the liver, a major site of drug metabolism, and play key roles in the metabolism of a variety of substrates, including drugs and environmental contaminants. Interaction of two or more different drugs with the same enzyme can account for adverse effects and failure of therapy. Human CYP3A4 metabolizes about 50% of all known drugs, but little is known about the orthologous CYP450s in horses. We report here the genomic organization of the equine CYP3A gene cluster as well as a comparative analysis with the human CYP3A gene cluster. The equine CYP450 genes of the 3A family are located on ECA 13 between 6.97-7.53 Mb, in a region syntenic to HSA 7 99.05-99.35 Mb. Seven potential, closely linked equine CYP3A genes were found, in contrast to only four genes in the human genome. RNA was isolated from an equine liver sample, and the approximately 1.5-kb coding sequence of six CYP3A genes could be amplified by RT-PCR. Sequencing of the RT-PCR products revealed numerous hitherto unknown single nucleotide polymorphisms (SNPs) in these six CYP3A genes, and one 6-bp deletion compared to the reference sequence (EquCab2.0). The presence of the variants was confirmed in a sample of genomic DNA from the same horse. In conclusion, orthologous genes for the CYP3A family exist in horses, but their number differs from those of the human CYP3A gene family. CYP450 genes of the same family show high homology within and between mammalian species, but can be highly polymorphic.