8 resultados para Cluster Analysis. Information Theory. Entropy. Cross Information Potential. Complex Data
em National Center for Biotechnology Information - NCBI
Resumo:
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
Resumo:
Varicella-zoster virus open reading frame 10 (ORF10) protein, the homolog of the herpes simplex virus protein VP16, can transactivate immediate-early promoters from both viruses. A protein sequence comparison procedure termed hydrophobic cluster analysis was used to identify a motif centered at Phe-28, near the amino terminus of ORF10, that strongly resembles the sequence of the activating domain surrounding Phe-442 of VP16. With a series of GAL4-ORF10 fusion proteins, we mapped the ORF10 transcriptional-activation domain to the amino-terminal region (aa 5-79). Extensive mutagenesis of Phe-28 in GAL4-ORF10 fusion proteins demonstrated the importance of an aromatic or bulky hydrophobic amino acid at this position, as shown previously for Phe-442 of VP16. Transactivation by the native ORF10 protein was abolished when Phe-28 was replaced by Ala. Similar amino-terminal domains were identified in the VP16 homologs of other alphaherpesviruses. Hydrophobic cluster analysis correctly predicted activation domains of ORF10 and VP16 that share critical characteristics of a distinctive subclass of acidic activation domains.
Resumo:
Computer models were used to examine whether and under what conditions the multimeric protein complex is inhibited by high concentrations of one of its components—an effect analogous to the prozone phenomenon in precipitin tests. A series of idealized simple “ball-and-stick” structures representing small oligomeric complexes of protein molecules formed by reversible binding reactions were analyzed to determine the binding steps leading to each structure. The equilibrium state of each system was then determined over a range of starting concentrations and Kds and the steady-state concentration of structurally complete oligomer calculated for each situation. A strong inhibitory effect at high concentrations was shown by any protein molecule forming a bridge between two or more separable parts of the complex. By contrast, proteins linked to the outside of the complex by a single bond showed no inhibition whatsoever at any concentration. Nonbridging, multivalent proteins in the body of the complex could show an inhibitory effect or not depending on the structure of the complex and the strength of its bonds. On the basis of this study, we suggest that the prozone phenomenon will occur widely in living cells and that it could be a crucial factor in the regulation of protein complex formation.
Resumo:
Two objects with homologous landmarks are said to be of the same shape if the configurations of landmarks of one object can be exactly matched with that of the other by translation, rotation/reflection, and scaling. The observations on an object are coordinates of its landmarks with reference to a set of orthogonal coordinate axes in an appropriate dimensional space. The origin, choice of units, and orientation of the coordinate axes with respect to an object may be different from object to object. In such a case, how do we quantify the shape of an object, find the mean and variation of shape in a population of objects, compare the mean shapes in two or more different populations, and discriminate between objects belonging to two or more different shape distributions. We develop some methods that are invariant to translation, rotation, and scaling of the observations on each object and thereby provide generalizations of multivariate methods for shape analysis.
Resumo:
Averaged event-related potential (ERP) data recorded from the human scalp reveal electroencephalographic (EEG) activity that is reliably time-locked and phase-locked to experimental events. We report here the application of a method based on information theory that decomposes one or more ERPs recorded at multiple scalp sensors into a sum of components with fixed scalp distributions and sparsely activated, maximally independent time courses. Independent component analysis (ICA) decomposes ERP data into a number of components equal to the number of sensors. The derived components have distinct but not necessarily orthogonal scalp projections. Unlike dipole-fitting methods, the algorithm does not model the locations of their generators in the head. Unlike methods that remove second-order correlations, such as principal component analysis (PCA), ICA also minimizes higher-order dependencies. Applied to detected—and undetected—target ERPs from an auditory vigilance experiment, the algorithm derived ten components that decomposed each of the major response peaks into one or more ICA components with relatively simple scalp distributions. Three of these components were active only when the subject detected the targets, three other components only when the target went undetected, and one in both cases. Three additional components accounted for the steady-state brain response to a 39-Hz background click train. Major features of the decomposition proved robust across sessions and changes in sensor number and placement. This method of ERP analysis can be used to compare responses from multiple stimuli, task conditions, and subject states.
Resumo:
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Resumo:
Neurotrophic factors such as nerve growth factor (NGF) promote a wide variety of responses in neurons, including differentiation, survival, plasticity, and repair. Such actions often require changes in gene expression. To identify the regulated genes and thereby to more fully understand the NGF mechanism, we carried out serial analysis of gene expression (SAGE) profiling of transcripts derived from rat PC12 cells before and after NGF-promoted neuronal differentiation. Multiple criteria supported the reliability of the profile. Approximately 157,000 SAGE tags were analyzed, representing at least 21,000 unique transcripts. Of these, nearly 800 were regulated by 6-fold or more in response to NGF. Approximately 150 of the regulated transcripts have been matched to named genes, the majority of which were not previously known to be NGF-responsive. Functional categorization of the regulated genes provides insight into the complex, integrated mechanism by which NGF promotes its multiple actions. It is anticipated that as genomic sequence information accrues the data derived here will continue to provide information about neurotrophic factor mechanisms.
Resumo:
Early detection is an effective means of reducing cancer mortality. Here, we describe a highly sensitive high-throughput screen that can identify panels of markers for the early detection of solid tumor cells disseminated in peripheral blood. The method is a two-step combination of differential display and high-sensitivity cDNA arrays. In a primary screen, differential display identified 170 candidate marker genes differentially expressed between breast tumor cells and normal breast epithelial cells. In a secondary screen, high-sensitivity arrays assessed expression levels of these genes in 48 blood samples, 22 from healthy volunteers and 26 from breast cancer patients. Cluster analysis identified a group of 12 genes that were elevated in the blood of cancer patients. Permutation analysis of individual genes defined five core genes (P ≤ 0.05, permax test). As a group, the 12 genes generally distinguished accurately between healthy volunteers and patients with breast cancer. Mean expression levels of the 12 genes were elevated in 77% (10 of 13) untreated invasive cancer patients, whereas cluster analysis correctly classified volunteers and patients (P = 0.0022, Fisher's exact test). Quantitative real-time PCR confirmed array results and indicated that the sensitivity of the assay (1:2 × 108 transcripts) was sufficient to detect disseminated solid tumor cells in blood. Expression-based blood assays developed with the screening approach described here have the potential to detect and classify solid tumor cells originating from virtually any primary site in the body.