957 results for Text similarity analysis
Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.
Abstract:
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
Abstract:
Query suggestion is an important search engine feature given the explosive and diverse growth of web content. Many kinds of suggestions (queries, images, movies, music, books, etc.) are used every day, and they draw on various types of data sources. If the data are modeled as graphs, a general method can be built for any kind of suggestion. In this paper, we propose a general method for query suggestion that combines two graphs: (1) a query click graph, which captures the relationship between queries frequently clicked on common URLs, and (2) a query text similarity graph, which measures the similarity between two queries using Jaccard similarity. The proposed method provides queries that are literally as well as semantically relevant to users' needs. Simulation results show that the proposed algorithm outperforms the heat diffusion method by providing a larger number of relevant queries. It can be used for recommendation tasks such as query, image, and product suggestion.
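As an illustration of the query text similarity graph mentioned in this abstract, the following is a minimal sketch of Jaccard similarity over query word sets. The whitespace tokenization, the function names, and the `threshold` parameter are assumptions made for the example; the abstract does not specify how the authors tokenize queries or prune edges.

```python
def jaccard_similarity(query_a: str, query_b: str) -> float:
    """Jaccard similarity between the word sets of two queries."""
    # Assumption: queries are lowercased and split on whitespace.
    tokens_a = set(query_a.lower().split())
    tokens_b = set(query_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # assumption: two empty queries are treated as unrelated
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def build_text_similarity_graph(queries, threshold=0.3):
    """Return weighted edges {(q1, q2): score} for pairs at or above threshold.

    The threshold value is hypothetical and only keeps the example graph sparse.
    """
    edges = {}
    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:
            score = jaccard_similarity(q1, q2)
            if score >= threshold:
                edges[(q1, q2)] = score
    return edges


if __name__ == "__main__":
    sample = ["cheap flights paris", "cheap hotels paris", "python tutorial"]
    print(build_text_similarity_graph(sample))
```

In this sketch, two queries sharing two of four distinct words receive a score of 0.5. Edges from such a graph could then be combined with click-graph edges, although how the two graphs are merged is left open here.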
Abstract:
Quantum molecular similarity (QMS) techniques are used to assess the response of the electron density of various small molecules to the application of a static, uniform electric field. Likewise, QMS is used to analyze the changes in electron density generated by the process of floating a basis set. The results obtained show an interrelation between the floating process, the optimum geometry, and the presence of an external field. Cases involving the Le Chatelier principle are discussed, and insight is provided into the changes in bond critical point properties, self-similarity values, and density differences.
Abstract:
HTLV-1 is endemic in Brazil, and HIV/HTLV-1 coinfection has been detected, mostly in the northeast region. Cosmopolitan HTLV-1a is the main subtype circulating in Brazil. This study characterized 17 HTLV-1 isolates from HIV-coinfected patients from southern (n = 7) and southeastern (n = 10) Brazil. HTLV-1 provirus DNA was amplified by nested PCR (env and LTR) and sequenced. Env sequences (705 bp) from 15 isolates and LTR sequences (731 bp) from 17 isolates showed 99.5% and 98.8% similarity among sequences, respectively. Comparison of these sequences with the ATK (HTLV-1a) and Mel5 (HTLV-1c) prototypes showed similarities of 99% and 97.4% with ATK, and of 91.6% and 90.3% with Mel5, for env and LTR, respectively. Phylogenetic analysis showed that all sequences belonged to the transcontinental subgroup A of the Cosmopolitan subtype, clustering in two Latin American clusters.
Abstract:
Frameworks and libraries are indispensable to today's software systems. When they evolve, updating client code is often tedious and costly for developers. Consequently, approaches have been proposed to help developers migrate their code. In general, these approaches cannot automatically identify one-replaced-by-many and many-replaced-by-one method change rules. Moreover, they often trade off recall against precision in their results by using one or more experimental thresholds. We present AURA (AUtomatic change Rule Assistant), a novel hybrid approach that combines call dependency analysis and text similarity analysis to overcome these limitations. We implemented AURA in Java and compared its results on five frameworks with those of three previous approaches by Dagenais and Robillard, M. Kim et al., and Schäfer et al. The comparison shows that, on average, AURA's recall is 53.07% higher than that of the other approaches, with similar precision (0.10% lower).
Abstract:
Pesiqta Rabbati is a unique homiletic midrash that follows the liturgical calendar in its presentation of homilies for festivals and special Sabbaths. This article attempts to utilize Pesiqta Rabbati in order to present a global theory of the literary production of rabbinic/homiletic literature. In respect to Pesiqta Rabbati it explores such areas as dating, textual witnesses, integrative apocalyptic meta-narrative, describing and mapping the structure of the text, internal and external constraints that impacted upon the text, text linguistic analysis, form-analysis: problems in the texts and linguistic gap-filling, transmission of text, strict formalization of a homiletic unit, deconstructing and reconstructing homiletic midrashim based upon form-analytic units of the homily, Neusner’s documentary hypothesis, surface structures of the homiletic unit, and textual variants. The suggested methodology may assist scholars in their production of editions of midrashic works by eliminating superfluous material and in their decoding and defining of ancient texts.
Abstract:
Background: Biomineralization is a process encompassing all mineral-containing tissues produced within an organism. One of the most dynamic examples of this process is the formation of the mollusc shell, which comprises a variety of crystal phases and microstructures. The organic component incorporated within the shell is said to dictate this architecture. However, a general understanding of how this process is achieved remains unclear. The mantle is a conserved organ involved in shell formation throughout molluscs. Specifically, the mantle is thought to be responsible for secreting the protein component of the shell. This study employs molecular approaches to determine the spatial expression of genes within the mantle tissue in order to further elucidate shell biomineralization. Results: A custom microarray platform (PmaxArray 1.0) was generated from the pearl oyster Pinctada maxima. PmaxArray 1.0 consists of 4992 expressed sequence tags (ESTs) originating from mantle tissue. This microarray was used to analyze the spatial expression of ESTs throughout the mantle organ. The mantle was dissected into five discrete regions and analyzed for differential gene expression with PmaxArray 1.0. Over 2000 ESTs were determined to be differentially expressed among the tissue sections, identifying five major expression regions. In situ hybridization validated and further localized the expression of a subset of these ESTs. Comparative sequence similarity analysis of these ESTs revealed that a number of the transcripts were novel, while others showed significant sequence similarity to previously characterized shell-related genes.
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research that focuses on news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting new, previously unreported events, tracking the development of a previously reported event, and grouping together news stories that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to distinguish between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms, but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system.
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30%, depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
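The class-wise comparison described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is used for every semantic class (the abstract says each class can have its own measure, e.g., one based on a geographical taxonomy), and the class names, weights, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty class contributes no similarity
    return dot / (norm_a * norm_b)


def document_similarity(doc_a: dict, doc_b: dict, weights: dict) -> float:
    """Weighted combination of class-wise similarities.

    doc_a, doc_b map a semantic class name to a list of terms.
    weights maps a class name to a non-negative weight (hypothetical values).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for cls, weight in weights.items():
        sim = cosine(Counter(doc_a.get(cls, [])), Counter(doc_b.get(cls, [])))
        score += weight * sim
    return score / total_weight


if __name__ == "__main__":
    a = {"people": ["obama"], "locations": ["paris"], "terms": ["summit", "climate"]}
    b = {"people": ["obama"], "locations": ["berlin"], "terms": ["summit", "climate"]}
    w = {"people": 1.0, "locations": 1.0, "terms": 2.0}
    print(document_similarity(a, b, w))
```

Normalizing by the total weight keeps the result in the range 0 to 1, so scores remain comparable when classes are added or reweighted. A temporal component, as described in the abstract, would be an additional term and is not modeled here.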
Abstract:
Antifolates are competitive inhibitors of dihydrofolate reductase (DHFR), a conserved enzyme that is central to metabolism and widely targeted in pathogenic diseases, cancer, and autoimmune disorders. Although most clinically used antifolates are known to be target specific, some display a fair degree of cross-reactivity with DHFRs from other species. A method that enables identification of determinants of affinity and specificity in target DHFRs from different species and provides guidelines for the design of antifolates is currently lacking. To address this, we first captured the potential druggable space of a DHFR in a substructure called the 'supersite' and classified supersites of DHFRs from 56 species into 16 'site-types' based on pairwise structural similarity. Analysis of supersites across these site-types revealed that DHFRs exhibit varying extents of dissimilarity at structurally equivalent positions in and around the binding site. We were able to explain the pattern of affinities towards chemically diverse antifolates exhibited by DHFRs of different site-types based on these structural differences. We then generated an antifolate-DHFR network by mapping known high-affinity antifolates to their respective supersites and used this to identify antifolates that can be repurposed based on similarity between supersites or antifolates. In this way, we identified 177 human-specific and 458 pathogen-specific antifolates, a large number of which are supported by available experimental data. In light of the clinical importance of DHFR, we present a novel approach to identifying differences in the druggable space of DHFRs that can be utilized for the rational design of antifolates.
Abstract:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po Plain by Regione Lombardia over the last five years. Quantitative provenance analysis (QPA; Weltje and von Eynatten, 2004) of Pleistocene sands was carried out using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set that includes high-resolution bulk petrography and heavy-mineral analyses of Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from west to east (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin by a trunk river flowing longitudinally, parallel to the South Alpine belt (Vezzoli and Garzanti, 2008). This scenario changed rapidly during marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and similarity analysis of core samples show that the longitudinal trunk river was shifted southward at this time by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems, and glacial sediments transported by Alpine valley glaciers invaded the alluvial plain. Key words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity; Canberra Distance; Palaeodrainage
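The keywords of this abstract name the Canberra distance as a similarity measure. The following is a minimal sketch of that distance between two compositional vectors, assuming that components where both values are zero contribute nothing to the sum (a common convention). The sample vectors in the usage block are invented for illustration and are not data from the study.

```python
def canberra_distance(x, y):
    """Canberra distance: sum over i of |x_i - y_i| / (|x_i| + |y_i|).

    Components where both values are zero are skipped (treated as 0/0 = 0),
    which is an assumed convention for this sketch.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


if __name__ == "__main__":
    # Hypothetical mineral percentages for a core sample and a modern river sand.
    core_sample = [10.0, 0.0, 30.0]
    river_sand = [30.0, 0.0, 10.0]
    print(canberra_distance(core_sample, river_sand))
```

Because each term is scaled by the magnitude of the two values, the Canberra distance gives rare components as much influence as abundant ones, which may be why it suits compositional data such as heavy-mineral suites.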
Abstract:
A total of 143 samples were collected from human hands and hospital beds, using swabs in BHI broth, at a teaching hospital in the city of Ribeirão Preto, SP. The samples were incubated at 37 °C for 24 hours, after which the cultures were plated on Petri dishes containing Staphylococcus Medium 110 agar. Colonies typical of the genus Staphylococcus were picked and stored at 4 °C until the catalase, mannitol, hemolysis, DNase, and coagulase tests were performed. The isolated strains were analyzed by RAPD-PCR to assess their degree of similarity. The susceptibility of the isolated strains was tested against 10 different antibiotics. Of the 92 Staphylococcus sp. strains isolated, 67 (72.8%) were identified as coagulase-negative staphylococci and 25 (27.2%) as coagulase-positive staphylococci. The similarity analysis showed great heterogeneity among the strains, although some strains with 100% similarity were isolated. Oxacillin resistance was found in 39 (42%) strains. Two coagulase-negative staphylococcal strains were resistant to vancomycin. Eleven strains (12%) were considered multidrug resistant. Disinfection of staff hands and hospital beds, together with more rational use of antibiotics, may help reduce pathogen transmission and selection pressure, thereby lowering the frequency and lethality of nosocomial infections.
Abstract:
This study aims to characterize and compare the floristic composition of three Cerrado areas (one cerradão and two cerrado sensu stricto areas) in Pratânia, São Paulo state, southeastern Brazil. In total, 250 taxa were found, belonging to four species of pteridophytes, one species of an exotic gymnosperm, and 243 species of angiosperms. Differences in species number and in the proportion of woody and herbaceous components were observed among the three Cerrado areas. The similarity analysis revealed that the cerradão is quite distinct, showing a low level of similarity with the cerrado sensu stricto areas contiguous to it and being more similar to other cerradão areas located in nearby municipalities. © 2012 Check List and Authors.
Abstract:
As distributed collaborative applications and architectures adopt policy-based management for tasks such as access control, network security, and data privacy, the management and consolidation of a large number of policies is becoming a crucial component of such policy-based systems. In large-scale distributed collaborative applications like web services, there is a need to analyze policy interactions and integrate policies. In this thesis, we propose and implement EXAM-S, a comprehensive environment for policy analysis and management, which can be used to perform a variety of functions such as policy property analysis, policy similarity analysis, and policy integration. As part of this environment, we have proposed and implemented new techniques for the analysis of policies that build on an in-depth study of state-of-the-art techniques. Moreover, we propose an approach for solving the heterogeneity problems that usually arise when analyzing policies belonging to different domains. Our work focuses on the analysis of access control policies written in a dialect of XACML (Extensible Access Control Markup Language). We consider XACML policies because XACML is a rich language that can represent many policies of interest to real-world applications and is gaining widespread adoption in industry.
Abstract:
The skills required for reading comprehension are compared and contrasted with those needed to produce correct, coherent, and cohesive written texts in English. Related teaching activities are discussed.
The aim of this article is to establish the relevance of teaching reading and writing skills to students at Madrid Polytechnic University, and to show the relationship and interdependence of these activities in EAP courses. The skills involved in reading and writing processes for academic purposes for L2 students are compared and commented on from a rhetorical point of view. Learning tasks based on text-type analysis are recommended as suitable activities for building writing schemata, and they represent a synthesis of the teaching objectives proposed for English reading and writing courses.