950 results for Similarity analysis
Abstract:
Modelling class B G-protein-coupled receptors (GPCRs) using class A GPCR structural templates is difficult due to lack of homology. The plant GPCR, GCR1, has homology to both class A and class B GPCRs. We have used this to generate a class A-class B alignment, and by incorporating maximum lagged correlation of entropy and hydrophobicity into a consensus score, we have been able to align receptor transmembrane regions. We have applied this analysis to generate active and inactive homology models of the class B calcitonin gene-related peptide (CGRP) receptor, and have supported it with site-directed mutagenesis data using 122 CGRP receptor residues and 144 published mutagenesis results on other class B GPCRs. The variation of sequence variability with structure, the analysis of polarity violations, the alignment of group-conserved residues and the mutagenesis results at 27 key positions were particularly informative in distinguishing between the proposed and plausible alternative alignments. Furthermore, we have been able to associate the key molecular features of the class B GPCR signalling machinery with their class A counterparts for the first time. These include the [K/R]KLH motif in intracellular loop 1, [I/L]xxxL and KxxK at the intracellular end of TM5 and TM6, the NPXXY/VAVLY motif on TM7 and small group-conserved residues in TM1, TM2, TM3 and TM7. The equivalent of the class A DRY motif is proposed to involve Arg(2.39), His(2.43) and Glu(3.46), which makes a polar lock with T(6.37). These alignments and models provide useful tools for understanding class B GPCR function.
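The abstract above names "maximum lagged correlation" between entropy and hydrophobicity profiles but does not spell out the computation. The following is a minimal sketch of one plausible reading, assuming two per-position numeric profiles and a Pearson correlation evaluated at each relative shift. The function name, the `max_lag` parameter and the random test profile are hypothetical and are not taken from the paper.

```python
import numpy as np

def max_lagged_correlation(x, y, max_lag):
    """Return (best_lag, best_corr): the shift of x relative to y, within
    +/- max_lag positions, that maximizes the Pearson correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag: compare x[i + lag] with y[i]; negative lag: the reverse.
        a, b = (x[lag:], y) if lag >= 0 else (x, y[-lag:])
        n = min(len(a), len(b))
        if n < 3:
            continue  # too few overlapping positions to correlate
        corr = np.corrcoef(a[:n], b[:n])[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Hypothetical profiles: y is x shifted by three positions.
base = np.random.default_rng(0).normal(size=23)
x_profile = base[:20]
y_profile = base[3:23]
best_lag, best_corr = max_lagged_correlation(x_profile, y_profile, max_lag=5)
```

In an alignment setting, the lag with the highest correlation would suggest how far one transmembrane profile should be shifted against the other. How the paper combines this with its consensus score is not stated in the abstract.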
Abstract:
Epitope identification is the basis of modern vaccine design. The present paper studied the supermotif of the HLA-A3 superfamily using comparative molecular similarity indices analysis (CoMSIA). Four alleles with high phenotype frequencies were used: A*1101, A*0301, A*3101 and A*6801. Five physicochemical properties (steric bulk, electrostatic potential, local hydrophobicity, and hydrogen-bond donor and acceptor abilities) were considered, and ‘all fields’ models were produced for each of the alleles. The models have a moderate level of predictivity and there is a good correlation between the data. A revised HLA-A3 supermotif was defined based on the comparison of favoured and disfavoured properties for each position of the MHC-bound peptide. The present study demonstrated that CoMSIA is an effective tool for studying peptide–MHC interactions.
Abstract:
We discuss several approaches to similarity-preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods in terms of embeddings can help develop an approach to their analysis and may lead to the discovery of useful properties.
Abstract:
The paper presents a computational analysis of Bulgarian dialect variation, concentrating on pronunciation differences. It describes the phonetic data set compiled during the project ‘Measuring Linguistic Unity and Diversity in Europe’, which consists of the pronunciations of 157 words collected at 197 sites from all over Bulgaria. We also present the results of analyzing this data set using various quantitative methods and compare them to the traditional scholarship on Bulgarian dialects. The results show that various dialectometrical techniques clearly identify the east-west division of the country along the ‘jat’ border, as well as a third group of varieties in the Rodopi area. The remaining groups specified in the traditional atlases were either not confirmed or confirmed only with low confidence.
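The abstract above does not say which dialectometric measures were applied. A commonly used one is the Levenshtein (edit) distance between transcriptions of the same word at two sites, averaged over words. The sketch below assumes that measure. The romanized word forms are illustrative stand-ins invented for this example, not items from the project's data set, and `site_distance` is a hypothetical helper name.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def site_distance(site_a, site_b):
    """Mean length-normalized edit distance over the words recorded at both sites."""
    shared = sorted(set(site_a) & set(site_b))
    if not shared:
        raise ValueError("the two sites share no words")
    total = 0.0
    for word in shared:
        pron_a, pron_b = site_a[word], site_b[word]
        total += levenshtein(pron_a, pron_b) / max(len(pron_a), len(pron_b), 1)
    return total / len(shared)

# Illustrative (assumed) pronunciations on either side of the 'jat' border.
eastern_site = {"milk": "mljako", "bread": "hljab"}
western_site = {"milk": "mleko", "bread": "hleb"}
distance = site_distance(eastern_site, western_site)
```

Computing this distance for every pair of sites gives a site-by-site distance matrix, which can then be clustered to look for dialect groups.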
Abstract:
In this paper we propose a quantum algorithm to measure the similarity between a pair of unattributed graphs. We design an experiment where the two graphs are merged by establishing a complete set of connections between their nodes and the resulting structure is probed through the evolution of continuous-time quantum walks. In order to analyze the behavior of the walks without causing wave function collapse, we base our analysis on the recently introduced quantum Jensen-Shannon divergence. In particular, we show that the divergence between the evolution of two suitably initialized quantum walks over this structure is maximum when the original pair of graphs is isomorphic. We also prove that under special conditions the divergence is minimum when the sets of eigenvalues of the Hamiltonians associated with the two original graphs have an empty intersection.
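As a rough illustration of the quantity this abstract relies on, the quantum Jensen-Shannon divergence between two density matrices ρ and σ is usually written as S((ρ + σ)/2) − ½S(ρ) − ½S(σ), where S is the von Neumann entropy. The sketch below evaluates that expression with NumPy for small matrices. It does not reproduce the paper's graph-merging construction or the quantum-walk evolution, and the function names are assumptions.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Von Neumann entropy in bits of a Hermitian, unit-trace density matrix."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # treat 0 * log(0) as 0
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def quantum_jensen_shannon_divergence(rho, sigma):
    """QJSD(rho, sigma) = S((rho + sigma) / 2) - S(rho) / 2 - S(sigma) / 2."""
    mixture = 0.5 * (rho + sigma)
    return (von_neumann_entropy(mixture)
            - 0.5 * von_neumann_entropy(rho)
            - 0.5 * von_neumann_entropy(sigma))

# Two orthogonal pure single-qubit states: the divergence reaches its maximum of 1 bit.
state_zero = np.array([[1.0, 0.0], [0.0, 0.0]])
state_one = np.array([[0.0, 0.0], [0.0, 1.0]])
divergence_orthogonal = quantum_jensen_shannon_divergence(state_zero, state_one)
divergence_identical = quantum_jensen_shannon_divergence(state_zero, state_zero)
```

With base-2 logarithms the divergence is bounded between 0 (identical states) and 1 (states with orthogonal support), which is what makes it usable as a similarity score.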
Abstract:
Mammalian C3 is a pivotal complement protein encoded by a single gene. In some vertebrate species, multiple C3 isoforms are products of different C3 genes. The goal of this study was to determine whether multiple genes encode shark C3. A protocol was developed for isolating mRNA from shark blood in order to obtain C3 cDNA clones. RT-PCR amplification of mRNA, using sense (GCGEQNM) and antisense (TWLTAYV) primers encoding conserved regions of human C3, yielded 21 clones. The C3-like clones isolated shared 97% similarity with each other and 40% similarity to human C3. RACE-PCR amplification of shark liver RNA, using gene-specific primers, yielded products ranging from 1800 bp to 3000 bp. A deduced amino acid sequence, corresponding to 408 bp of the 1800 bp fragment, was obtained and showed 51% similarity to human C3. These results suggest that nurse shark C3 might be encoded by more than one gene.
Abstract:
This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and for feasibility studies were provided by Beckman-Coulter. Although the data used are flow cytometric in nature, the mathematics are general in their derivation and extend to other types of data as long as their statistical behavior approximates a Gaussian distribution. Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to account for the accumulation process involved with histogram data. When data are accumulated into a frequency histogram, they are inherently smoothed in a linear fashion, since an averaging effect takes place as the histogram is generated. This new filtering scheme addresses data that are accumulated in the uneven resolution of the channels of the frequency histogram. The qualitative interpretation of flow cytometric data is currently a time-consuming and imprecise method for evaluating histogram data. The proposed method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis.
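The dissertation's figure of merit is not given in the abstract, so the sketch below only illustrates the underlying notion of "percentage of overlap" between two univariate Gaussians. It assumes the overlap is the area under the pointwise minimum of the two densities, which for equal standard deviations has the closed form 2Φ(−|μ₁ − μ₂| / (2σ)). All function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def overlap_numeric(mu1, sigma1, mu2, sigma2, n_points=20001):
    """Area under min(pdf1, pdf2), by the trapezoid rule over +/- 8 sigma."""
    lower = min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
    upper = max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)
    step = (upper - lower) / (n_points - 1)
    area = 0.0
    for i in range(n_points):
        x = lower + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        area += weight * min(normal_pdf(x, mu1, sigma1), normal_pdf(x, mu2, sigma2))
    return area * step

def overlap_equal_sigma(mu1, mu2, sigma):
    """Closed-form overlap when both Gaussians share the same sigma."""
    return 2.0 * normal_cdf(-abs(mu1 - mu2) / (2.0 * sigma))

numeric = overlap_numeric(0.0, 1.0, 2.0, 1.0)
closed_form = overlap_equal_sigma(0.0, 2.0, 1.0)
```

For two unit-variance Gaussians whose means are two standard deviations apart, both routes give an overlap of roughly 32%. The numeric version also handles unequal standard deviations, where the densities can cross twice.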
Abstract:
This work outlines the theoretical advantages of multivariate methods for biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF), and validated using existing data sets. The new techniques, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares – Multiple Linear Regression) and Waveform Similarity (based on NMF), were developed to address the challenging characteristics of biomechanical data: variability and correlation. As a result, these new structure-seeking techniques revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding quantifies the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.
Abstract:
With the development of information technology, the theory and methodology of complex networks have been introduced into language research, transforming the language system into a complex network of nodes and edges for quantitative analysis of language structure. The development of dependency grammar provides theoretical support for the construction of treebank corpora, making statistical analysis of complex networks possible. This paper introduces the theory and methodology of complex networks and builds dependency syntactic networks based on a treebank of speeches from the EEE-4 oral test. By analyzing the overall characteristics of the networks, including the number of edges, the number of nodes, the average degree, the average path length, the network centrality and the degree distribution, it aims to find potential differences and similarities in the networks between various grades of speaking performance. Through clustering analysis, this research intends to demonstrate the discriminating power of the network parameters and to provide a potential reference for scoring speaking performance.
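Several of the network parameters listed in this abstract can be computed directly from a list of dependency edges. The sketch below treats the dependency network as undirected (an assumption, since the abstract does not say whether direction is kept) and uses a three-edge toy sentence that is invented here, not drawn from the EEE-4 treebank. The function name `network_summary` is hypothetical.

```python
from collections import Counter, deque

def network_summary(edges):
    """Basic parameters of an undirected network given as (head, dependent) pairs."""
    adjacency = {}
    for head, dependent in edges:
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)

    n_nodes = len(adjacency)
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    degree_distribution = Counter(len(neighbours) for neighbours in adjacency.values())

    # Average shortest-path length over reachable ordered pairs, via breadth-first search.
    total_length, n_pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total_length += sum(distance.values())
        n_pairs += len(distance) - 1

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "average_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
        "average_path_length": total_length / n_pairs if n_pairs else 0.0,
        "degree_distribution": dict(degree_distribution),
    }

# Toy dependency edges for "students read books quickly" (verb as head).
toy_edges = [("read", "students"), ("read", "books"), ("read", "quickly")]
summary = network_summary(toy_edges)
```

Running the same summary over the network built from each grade band would give one parameter vector per grade, which is the kind of input a clustering analysis could then compare.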
Abstract:
Spam emails (unsolicited or junk emails) impose extremely heavy annual costs in terms of time, storage space and money on private users and companies. To fight the spam problem effectively, it is not enough to stop spam messages from being delivered to the user's inbox. It is necessary either to try to find and prosecute the spammers, who generally hide behind complex networks of infected devices, or to analyze spammer behavior in order to find appropriate defense strategies. However, such a task is difficult because of camouflage techniques, which make a manual analysis of correlated spam necessary to find the spammers. To facilitate such an analysis, which must be carried out on large quantities of unclassified emails, we propose a categorical clustering methodology, named CCTree, that divides a large volume of spam into campaigns based on their structural similarity. We show the effectiveness and efficiency of our proposed clustering algorithm through several experiments. Next, a self-learning approach is proposed to label spam campaigns according to the spammer's goal, for example phishing. The labeled spam campaigns are used to train a classifier, which can be applied to classify new spam emails. In addition, the labeled campaigns, together with a set of four other ranking criteria, are ordered according to investigators' priorities. Finally, a semiring-based structure is proposed for the abstract representation of CCTree. The abstract scheme of CCTree, named CCTree term, is applied to formalize the parallelization of CCTree.
Through a number of mathematical analyses and experimental results, we show the efficiency and effectiveness of the proposed framework.
Abstract:
fuzzySim is an R package for calculating fuzzy similarity in species occurrence patterns. It includes functions for data preparation, such as converting species lists (long format) to presence-absence tables (wide format), obtaining unique abbreviations of species names, or transposing (parts of) complex data frames; and sample data sets for providing practical examples. It can convert binary presence-absence to fuzzy occurrence data, using e.g. trend surface analysis, inverse distance interpolation or prevalence-independent environmental favourability modelling, for multiple species simultaneously. It then calculates fuzzy similarity among (fuzzy) species distributions and/or among (fuzzy) regional species compositions. Currently available similarity indices are Jaccard, Sørensen, Simpson, and Baroni-Urbani & Buser.
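fuzzySim itself is an R package, and its function signatures are not shown in the abstract. The following Python sketch only illustrates the idea of fuzzy versions of two of the listed indices, assuming the common fuzzy-logic convention in which intersection is the element-wise minimum and union is the element-wise maximum. The function names and the example occurrence values are hypothetical.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard index: sum of element-wise minima over sum of element-wise maxima."""
    if len(a) != len(b):
        raise ValueError("occurrence vectors must have the same length")
    union = sum(max(x, y) for x, y in zip(a, b))
    if union == 0:
        return 0.0  # neither species occurs anywhere
    return sum(min(x, y) for x, y in zip(a, b)) / union

def fuzzy_sorensen(a, b):
    """Fuzzy Sorensen index: twice the fuzzy intersection over the summed totals."""
    if len(a) != len(b):
        raise ValueError("occurrence vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / total

# Binary presence-absence across four regions: reduces to the classical Jaccard index.
binary_similarity = fuzzy_jaccard([1, 0, 1, 0], [1, 1, 0, 0])

# Fuzzy occurrence values (for example, favourability) across two regions.
species_a = [0.8, 0.2]
species_b = [0.4, 0.6]
fuzzy_j = fuzzy_jaccard(species_a, species_b)
fuzzy_s = fuzzy_sorensen(species_a, species_b)
```

With 0/1 inputs the fuzzy indices coincide with their binary counterparts, so the same code covers both presence-absence tables and fuzzy occurrence data.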
Abstract:
Background: Aspergillosis has been identified as one of the hospital-acquired infections, but the contribution of water and in-house air as possible sources of Aspergillus infection in immunocompromised individuals such as HIV-TB patients has not been studied in any hospital setting in Nigeria. Objective: To identify and investigate the genetic relationship between clinical and environmental Aspergillus species associated with HIV-TB co-infected patients. Methods: DNA extraction, purification, amplification and sequencing of Internal Transcribed Spacer (ITS) genes were performed using standard protocols. Similarity search using BLAST on NCBI was used for species identification and MEGA 5.0 was used for phylogenetic analysis. Results: Analyses of sequenced ITS genes of fourteen (14) selected Aspergillus isolates identified in the GenBank database revealed Aspergillus niger (28.57%), Aspergillus tubingensis (7.14%), Aspergillus flavus (7.14%) and Aspergillus fumigatus (57.14%). Aspergillus species in the sputum of HIV patients were A. niger, A. fumigatus, A. tubingensis and A. flavus. In addition, A. niger and A. fumigatus were identified from water and open air. Phylogenetic analysis of the sequences showed genetic relatedness between clinical and environmental isolates. Conclusion: Water and air in health care settings in Nigeria are important sources of Aspergillus sp. for HIV-TB patients.
Abstract:
The xeroderma pigmentosum complementation group B (XPB) protein is involved in both DNA repair and transcription in human cells. It is a component of the transcription factor IIH (TFIIH) and is responsible for DNA helicase activity during nucleotide (nt) excision repair (NER). Its high evolutionary conservation has allowed identification of homologous proteins in different organisms, including plants. In contrast to other organisms, Arabidopsis thaliana harbors a duplication of the XPB orthologue (AtXPB1 and AtXPB2), and the proteins encoded by the duplicated genes are very similar (95% amino acid identity). Complementation assays in yeast rad25 mutant strains suggest the involvement of AtXPB2 in DNA repair, as already shown for AtXPB1, indicating that these proteins may be functionally redundant in the removal of DNA lesions in A. thaliana. Although both genes are expressed in a constitutive manner during the plant life cycle, Northern blot analyses suggest that light modulates the expression level of both XPB copies, and transcript levels increase during early stages of development. Considering the high similarity between AtXPB1 and AtXPB2, and that both predicted proteins may act in DNA repair, it is possible that this duplication confers more flexibility and resistance to DNA-damaging agents in thale cress. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
A simple method developed for genomic DNA isolation from fungus was tested on the red alga Gelidium sesquipedale (Clem.) Born et Thur., which is commercially exploited for its high sulfated polysaccharide (agar) content. This method is faster, cheaper, and less toxic than conventional phenol/chloroform methods. Random amplified polymorphic DNA (RAPD) amplifications were performed successfully without the need to purify the DNA. RAPD markers were used to investigate the genetic similarity among three natural populations of G. sesquipedale from southern Portugal. Bulked genomic DNA samples of 15 different individuals were made in each population. These can be conceived of as a sample of the population DNA. Of the 62 primers screened, 41 produced bands and 22 revealed polymorphisms. Genetic similarities among populations were high. Populations that are farther away from each other have the lowest similarity coefficients, whereas the intermediate Ingrina population, located on the south coast, showed higher genetic similarity with the Odeceixe population, located on the southwest coast, than with the Sao Rafael southern population. This suggests a higher genetic flow between Odeceixe and Ingrina, or the result may be a founder effect in the sense that the species has propagated from the east coast to the south coast of Portugal. We conclude that the use of this isolation method with RAPD analysis is appropriate to characterize the genetic variability of this commercial species along its geographical distribution. Large sample sizes can be screened at a relatively low cost. Finding genetic markers for commercial populations of G. sesquipedale may be of industrial interest.