992 results for Gene clustering


Relevance:

100.00%

Abstract:

The small leucine-rich repeat proteoglycans (SLRPs) are a group of extracellular matrix (ECM) proteins that belong to the leucine-rich repeat (LRR) superfamily of proteins. The LRR is a protein folding motif composed of 20–30 amino acids with leucines in conserved positions. LRR-containing proteins are present in a broad spectrum of organisms and possess diverse cellular functions and localizations. In mammals, the SLRPs are abundant in connective tissues, such as bones, cartilage, tendons, skin, and blood vessels. We have discovered a new member of the class I small leucine-rich repeat proteoglycan (SLRP) family, which is distinct from the other class I SLRPs in that it possesses a unique stretch of aspartate residues at its N-terminus. For this reason, we called the molecule asporin. The deduced amino acid sequence is about 50% identical (and 70% similar) to decorin and biglycan. However, asporin does not contain a serine/glycine dipeptide sequence required for the assembly of O-linked glycosaminoglycans and is probably not a proteoglycan. The tissue expression of asporin partially overlaps with the expression of decorin and biglycan. During mouse embryonic development, asporin mRNA expression was detected primarily in the skeleton and other specialized connective tissues; very little asporin message was detected in the major parenchymal organs. The mouse asporin gene structure is similar to that of biglycan and decorin, with 8 exons. The asporin gene is localized to human chromosome 9q22-9q21.3, where asporin is part of an SLRP gene cluster that includes ECM2, osteoadherin, and osteoglycin. This gene cluster of four LRR-encoding genes is embedded in a 238 kilobase intron of another novel gene named Tes9orf that is expressed primarily in the testes of the adult mouse. The SLRP genes are not present in Drosophila or C. elegans, but reside in three separate gene clusters in the puffer fish, mice and humans.
Targeted disruptions of individual mouse SLRP genes produce minor connective tissue defects such as skin fragility, tendon laxity, minor growth plate defects, and mild osteoporosis. However, double and triple knockouts of SLRP genes exacerbate these phenotypes. Both the double epiphycan/biglycan and the triple PRELP/fibromodulin/biglycan knockout mice exhibit premature osteoarthritis.

Relevance:

80.00%

Abstract:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
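
The sequential procedure described in this abstract can be sketched as a simple loop. This is only an illustration of the control flow: the callable `rejects_fewer` is a hypothetical stand-in for the actual significance test (the abstract uses the full Bayesian significance test, which is not implemented here), and the `max_m` cap is an added assumption to guarantee termination.

```python
from typing import Callable


def select_num_clusters(rejects_fewer: Callable[[int], bool], max_m: int = 20) -> int:
    """Choose the number of mixture components by sequential testing.

    rejects_fewer(m) is a hypothetical callable: it should fit a mixture
    with m components and return True when the hypothesis
    "m - 1 components are enough" is rejected.
    """
    m = 2  # the first step compares 2 components against 1
    while m <= max_m:
        if not rejects_fewer(m):
            return m - 1  # hypothesis accepted: m - 1 components suffice
        m += 1  # hypothesis rejected: try a larger model
    return max_m  # safety cap (an assumption, not part of the abstract)


# Toy stand-in for the real test: pretend the data truly has 4 clusters,
# so "m - 1 is enough" is rejected whenever m - 1 < 4.
def toy_test(m: int) -> bool:
    return m - 1 < 4


chosen = select_num_clusters(toy_test)
print(chosen)
```

In a real analysis, `rejects_fewer` would wrap the model fitting and the test decision; keeping it as a parameter makes the stopping rule independent of the particular test used.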

Relevance:

80.00%

Abstract:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.

Relevance:

70.00%

Abstract:

Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history.

Relevance:

70.00%

Abstract:

The Euphorbiaceae produce a diverse range of diterpenoids, many of which have pharmacological activities. These diterpenoids include ingenol mebutate, which is licensed for the treatment of a precancerous skin condition (actinic keratosis), and phorbol derivatives such as resiniferatoxin and prostratin, which are undergoing investigation for the treatment of severe pain and HIV, respectively. Despite the interest in these diterpenoids, their biosynthesis is poorly understood at present, with the only characterized step being the conversion of geranylgeranyl pyrophosphate into casbene. Here, we report a physical cluster of diterpenoid biosynthetic genes from castor (Ricinus communis), including casbene synthases and cytochrome P450s from the CYP726A subfamily. CYP726A14, CYP726A17, and CYP726A18 were able to catalyze 5-oxidation of casbene, a conserved oxidation step in the biosynthesis of this family of medicinally important diterpenoids. CYP726A16 catalyzed 7,8-epoxidation of 5-keto-casbene and CYP726A15 catalyzed 5-oxidation of neocembrene. Evidence of similar gene clustering was also found in two other Euphorbiaceae, including Euphorbia peplus, the source organism of ingenol mebutate. These results demonstrate conservation of gene clusters at the higher taxonomic level of the plant family and that this phenomenon could prove useful in further elucidating diterpenoid biosynthetic pathways.

Relevance:

70.00%

Abstract:

Objective. We aimed to evaluate whether the differential gene expression profiles of patients with rheumatoid arthritis (RA) could distinguish responders from nonresponders to methotrexate (MTX) and, in the case of MTX nonresponders, responsiveness to MTX plus anti-tumor necrosis factor-alpha (anti-TNF) combined therapy. Methods. We evaluated 25 patients with RA taking MTX 15-20 mg/week as a monotherapy (8 responders and 17 nonresponders). All MTX nonresponders received infliximab and were reassessed after 20 weeks to evaluate their anti-TNF responsiveness using the European League Against Rheumatism response criteria. A differential gene expression analysis from peripheral blood mononuclear cells was performed in terms of hierarchical gene clustering, and an evaluation of differentially expressed genes was performed using the significance analysis of microarrays program. Results. Hierarchical gene expression clustering discriminated MTX responders from nonresponders, and MTX plus anti-TNF responders from nonresponders. The evaluation of only highly modulated genes (fold change > 1.3 or < 0.7) yielded 5 induced (4 antiapoptotic and CCL4) and 4 repressed (4 proapoptotic) genes in MTX nonresponders compared to responders. In MTX plus anti-TNF nonresponders, the CCL4, CD83, and BCL2A1 genes were induced in relation to responders. Conclusion. Study of the gene expression profiles of RA peripheral blood cells permitted differentiation of responders from nonresponders to MTX and anti-TNF. Several candidate genes in MTX nonresponders (CCL4, HTRA2, PRKCD, BCL2A1, CAV1, TNIP1, CASP8AP2, MXD1, and BTG2) and 3 genes in MTX plus anti-TNF nonresponders (CCL4, CD83, and BCL2A1) were identified for further study. (First Release July 1 2012; J Rheumatol 2012;39:1524-32; doi:10.3899/jrheum.120092)
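
The fold-change filter mentioned in this abstract (fold change > 1.3 or < 0.7) can be illustrated with a short sketch. The gene labels and values below are hypothetical placeholders, not data from the study, and the function name is an assumption made for illustration.

```python
def classify_by_fold_change(fold_changes, upper=1.3, lower=0.7):
    """Split genes into induced and repressed sets by fold-change thresholds.

    fold_changes maps a gene label to its fold change (nonresponders
    relative to responders). Values exactly at a threshold are not selected.
    """
    induced = sorted(g for g, fc in fold_changes.items() if fc > upper)
    repressed = sorted(g for g, fc in fold_changes.items() if fc < lower)
    return induced, repressed


# Hypothetical example values, chosen only to exercise each branch.
example = {"GENE_A": 1.8, "GENE_B": 0.5, "GENE_C": 1.1, "GENE_D": 0.7}
induced, repressed = classify_by_fold_change(example)
print(induced, repressed)
```

A filter like this only ranks genes by effect size; the study additionally relied on a significance analysis, which this sketch does not reproduce.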

Relevance:

60.00%

Abstract:

Background: Swine influenza is a highly contagious viral infection in pigs affecting the respiratory tract that can have significant economic impacts. Streptococcus suis serotype 2 is one of the most important post-weaning bacterial pathogens in swine causing different infections, including pneumonia. Both pathogens are important contributors to the porcine respiratory disease complex. Outbreaks of swine influenza virus with a significant level of co-infections due to S. suis have lately been reported. In order to analyze, for the first time, the transcriptional host response of swine tracheal epithelial (NPTr) cells to H1N1 swine influenza virus (swH1N1) infection, S. suis serotype 2 infection and a dual infection, we carried out a comprehensive gene expression profiling using a microarray approach. Results: Gene clustering showed that the swH1N1 and swH1N1/S. suis infections modified the expression of genes in a similar manner. Additionally, infection of NPTr cells by S. suis alone resulted in fewer differentially expressed genes compared to mock-infected cells. However, some important genes coding for inflammatory mediators such as chemokines, interleukins, cell adhesion molecules, and eicosanoids were significantly upregulated in the presence of both pathogens compared to infection with each pathogen individually. This synergy may be the consequence, at least in part, of an increased bacterial adhesion/invasion of epithelial cells previously infected by swH1N1, as recently reported. Conclusion: Influenza virus would replicate in the respiratory epithelium and induce an inflammatory infiltrate comprised of mononuclear cells and neutrophils. In a co-infection situation, although these cells would be unable to phagocytose and kill S. suis, they are highly activated by this pathogen. S. suis is not considered a primary pulmonary pathogen, but an exacerbated production of proinflammatory mediators during a co-infection with influenza virus may be important in the pathogenesis and clinical outcome of S. suis-induced respiratory diseases.

Relevance:

60.00%

Abstract:

Homeobox genes encode DNA-binding proteins, many of which are implicated in the control of embryonic development. Evolutionarily, most homeobox genes fall into two related clades: the ANTP and the PRD classes. Some genes in the ANTP class, notably Hox, ParaHox, and NK genes, have an intriguing arrangement into physical clusters. To investigate the evolutionary history of these gene clusters, we examined homeobox gene chromosomal locations in the cephalochordate amphioxus, Branchiostoma floridae. We deduce that 22 amphioxus ANTP class homeobox genes localize in just three chromosomes. One contains the Hox cluster plus AmphiEn, AmphiMnx, and AmphiDll. The ParaHox cluster resides in another chromosome, whereas a third chromosome contains the NK type homeobox genes, including AmphiMsx and AmphiTlx. By comparative analysis we infer that clustering of ANTP class homeobox genes evolved just once, during a series of extensive cis-duplication events of genes early in animal evolution. A trans-duplication event occurred later to yield the Hox and ParaHox gene clusters on different chromosomes. The results obtained have implications for understanding the origin of homeobox gene clustering, the diversification of the ANTP class of homeobox genes, and the evolution of animal genomes.

Relevance:

40.00%

Abstract:

In microarray studies, clustering techniques are often applied to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Relevance:

40.00%

Abstract:

Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixtured urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixtured urban population. Group I included 150 white subjects; group II, 142 mulatto subjects, and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both ACE I/D (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). Different sample sizes are predicted to be determinant of the power to detect a given genotypic association with a particular phenotype when conducting two-locus association studies in admixtured populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.

Relevance:

40.00%

Abstract:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without the need for expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with Multi-Objective Clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.

Relevance:

40.00%

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
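
To make the point about simplex geometry concrete, the following sketch computes a distance between two count profiles using the centered log-ratio (clr) transform, a standard device from Aitchison's compositional data analysis. This is an assumption about the general framework, not a description of Simcluster's actual implementation; the function names are chosen for illustration, and all counts must be strictly positive (zero counts would need a replacement strategy first).

```python
import math


def closure(counts):
    """Rescale counts so the components sum to 1 (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Two libraries with the same relative abundances but different depth
# are at distance (numerically) zero, unlike under plain Euclidean distance.
same_profile = aitchison_distance([10, 20, 70], [100, 200, 700])
different_profile = aitchison_distance([10, 20, 70], [70, 20, 10])
print(same_profile, different_profile)
```

The scale invariance shown in the example is the property that motivates treating tag counts as compositions: sequencing depth alone should not separate two samples.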

Relevance:

40.00%

Abstract:

Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. 
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.

Relevance:

40.00%

Abstract:

Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak-edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.

Relevance:

30.00%

Abstract:

Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. Accuracy of the expression data was supported by comparing information from known transcripts and the tissue from which each transcript was derived with macroarray data. Hybridization assays resulted in consistent tissue expression profiles, which will be useful to dissect tissue-regulatory networks and to predict functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for functional investigation of novel genes, for determining expression patterns according to developmental stage, for evaluating differences in muscular growth potential between chicken lines, and for identifying tissue-specific genes.