26 results for Lexical similarity

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

60.00%

Abstract:

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
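
The abstract does not spell out the χ² measure. As an illustration only, the sketch below assumes a Kilgarriff-style comparison: for the most frequent words of the two sub-corpora combined, each word's observed count in each sub-corpus is compared with the count expected if it were spread in proportion to sub-corpus size, and lower totals mean more similar sub-corpora. This is not necessarily the paper's exact procedure. The function names, the `top_n` cutoff and the idea of a fixed connective set are hypothetical, and `connective_rate` only handles single-token connectives.

```python
from collections import Counter


def chi2_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the top_n most frequent words of the joint
    corpus (hypothetical helper). Lower values mean more similar sub-corpora."""
    if not tokens_a or not tokens_b:
        raise ValueError("both sub-corpora must be non-empty")
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    joint = counts_a + counts_b
    chi2 = 0.0
    for word, joint_count in joint.most_common(top_n):
        # Expected count if the word were distributed in proportion
        # to the size of each sub-corpus.
        for observed, size in ((counts_a[word], size_a), (counts_b[word], size_b)):
            expected = joint_count * size / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def connective_rate(tokens, connectives):
    """Single-token discourse connectives per 1,000 tokens (hypothetical helper)."""
    if not tokens:
        raise ValueError("token list must be non-empty")
    hits = sum(1 for token in tokens if token.lower() in connectives)
    return 1000.0 * hits / len(tokens)
```

Comparing `connective_rate` across sub-parts (for example, with an illustrative set such as `{"however", "therefore", "thus"}`) gives a narrower, connective-specific signal alongside the general `chi2_distance` figure.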

Relevance:

20.00%

Abstract:

Genome predictions based on selected genes would be a very welcome approach for taxonomic studies, including DNA-DNA similarity, G+C content and representative phylogeny of bacteria. At present, DNA-DNA hybridizations are still considered the gold standard in species descriptions. However, this method is time-consuming and troublesome, and datasets can vary significantly between experiments as well as between laboratories. For the same reasons, full matrix hybridizations are rarely performed, weakening the significance of the results obtained. The authors established a universal sequencing approach for the three genes recN, rpoA and thdF for the Pasteurellaceae, and determined whether the sequences could be used for predicting DNA-DNA relatedness within the family. The sequence-based similarity values calculated using a previously published formula proved most useful for species and genus separation, indicating that this method provides better resolution and no experimental variation compared to hybridization. By this method, cross-comparisons within the family over species and genus borders easily become possible. The three genes also serve as an indicator of the genome G+C content of a species. A mean divergence of around 1% from the classical method, which in itself has poor reproducibility, was observed. Finally, the three genes can be used alone or in combination with already-established 16S rRNA, rpoB and infB gene-sequencing strategies in a multisequence-based phylogeny for the family Pasteurellaceae. It is proposed to use the three sequences as a taxonomic tool, replacing DNA-DNA hybridization.
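
The abstract states that the three genes indicate genome G+C content but does not give the calibration from gene G+C to genome G+C. The minimal sketch below only computes the G+C percentage of the pooled gene sequences, which would be the raw input to such a calibration. The function names are hypothetical, the gene names are used merely as dictionary keys, and ambiguity codes and gap characters are simply ignored (an assumption).

```python
def gc_content(sequence: str) -> float:
    """G+C percentage over unambiguous bases (A, C, G, T).

    Ambiguity codes such as N, R or Y and gap characters are ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


def concatenated_gc(genes: dict[str, str]) -> float:
    """G+C percentage of several gene sequences pooled together
    (hypothetical helper, e.g. keys "recN", "rpoA", "thdF")."""
    return gc_content("".join(genes.values()))
```

Mapping the resulting percentage onto a genome-wide G+C estimate would still require a fitted relationship from reference genomes, which the abstract does not provide.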

Relevance:

20.00%

Abstract:

Numerous genetic variants of the Echinococcus antigen B (AgB) are encountered within a single metacestode. This could be a reflection of gene redundancy or the result of a somatic hypermutation process. We evaluate the complexity of the AgB multigene family by characterizing the upstream promoter regions of the 4 already known genes (EgAgB1-EgAgB4) and evaluating their redundancy in the genome of 3 Echinococcus species (E. granulosus, E. ortleppi and E. multilocularis) using PCR-based approaches. We have ascertained that the number of AgB gene copies is quite variable, both within and between species. The most repetitive gene seems to be AgB3, of which there are more than 110 copies in E. ortleppi. For E. granulosus, we have cloned and characterized 10 distinct upstream promoter regions of AgB3 from a single metacestode. Our sequences suggest that AgB1 and AgB3 are involved in gene conversion. These results are discussed in light of the role of gene redundancy and recombination in parasite evasion mechanisms of host immunity, which at present are known for protozoan organisms, but virtually unknown for multicellular parasites.

Relevance:

20.00%

Abstract:

Multilocus sequence analysis (MLSA) based on the recN, rpoA and thdF genes was performed on more than 30 species of the family Enterobacteriaceae, with a focus on Cronobacter and the related genus Enterobacter. The sequences provide valuable data for phylogenetic, taxonomic and diagnostic purposes. Phylogenetic analysis showed that the genus Cronobacter forms a homogeneous cluster related to recently described species of Enterobacter, but distant from other species of this genus. Combined sequence information from all three genes is highly representative of the species' %GC content, which is used as a taxonomic marker. Sequence similarity of the three genes, and even of recN alone, can be used to extrapolate genetic similarities between species of Enterobacteriaceae. Finally, the rpoA gene sequence, which is the easiest of the three to determine, provides a powerful diagnostic tool to identify and differentiate species of this family. The comparative analysis gives important insights into the phylogeny and genetic relatedness of the family Enterobacteriaceae and will serve as a basis for further studies and clarifications on the taxonomy of this large and heterogeneous family.
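
The abstract does not say how sequence similarity between species is computed. A common baseline, assumed here purely for illustration, is uncorrected percent identity over gap-free columns of already-aligned sequences, optionally pooled across genes by concatenating the per-gene alignments. The function names are hypothetical, alignment itself is out of scope, and this is a plain identity figure, not a calibrated estimate of genomic similarity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical bases over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip any column containing a gap
        compared += 1
        matches += int(x == y)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def concatenated_identity(genes_a: dict[str, str], genes_b: dict[str, str]) -> float:
    """Identity over per-gene alignments concatenated in a fixed (sorted) order.

    Each genes_a[name] / genes_b[name] pair is assumed to be pre-aligned.
    """
    names = sorted(genes_a)
    if names != sorted(genes_b):
        raise ValueError("both species need the same set of genes")
    return percent_identity(
        "".join(genes_a[n] for n in names),
        "".join(genes_b[n] for n in names),
    )
```

Passing a single-entry dictionary (for example, only `"recN"`) gives the single-gene variant mentioned in the abstract.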