19 results for textual similarity

in Aston University Research Archive


Relevance:

20.00%

Abstract:

Jaccard has been the similarity metric of choice in ecology and forensic psychology for comparison of sites or offences, by species or behaviour. This paper applies a more powerful hierarchical measure - taxonomic similarity (s), recently developed in marine ecology - to the task of behaviourally linking serial crime. Forensic case linkage attempts to identify behaviourally similar offences committed by the same unknown perpetrator (called linked offences). s considers progressively higher-level taxa, such that two sites show some similarity even without shared species. We apply this index by analysing 55 specific offence behaviours classified hierarchically. The behaviours are taken from 16 sexual offences by seven juveniles where each offender committed two or more offences. We demonstrate that both Jaccard and s show linked offences to be significantly more similar than unlinked offences. With up to 20% of the specific behaviours removed in simulations, s is equally or more effective at distinguishing linked offences than where Jaccard uses a full data set. Moreover, s retains a significant difference between linked and unlinked pairs, with up to 50% of the specific behaviours removed. As police decision-making often depends upon incomplete data, s has clear advantages and its application may extend to other crime types. Copyright © 2007 John Wiley & Sons, Ltd.
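As a rough illustration of the baseline metric named in this abstract, the sketch below computes the Jaccard coefficient between sets of behaviour codes. The behaviour names and the `jaccard` helper are hypothetical and are not taken from the study; the hierarchical measure s is not implemented here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared items divided by all distinct items."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical behaviour codes for three offences (illustrative only, not study data)
offence_a = {"weapon_shown", "verbal_threat", "victim_blindfolded"}
offence_b = {"weapon_shown", "verbal_threat", "property_stolen"}
offence_c = {"weapon_used", "gag_applied"}

print(jaccard(offence_a, offence_b))  # 2 shared / 4 distinct = 0.5
print(jaccard(offence_a, offence_c))  # no shared behaviours = 0.0
```

The second pair scores zero although "weapon_shown" and "weapon_used" are arguably related behaviours, which is the kind of case the abstract says a hierarchical measure can still credit with some similarity.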

Relevance:

20.00%

Abstract:

There is evidence for both advantages and disadvantages in normal recognition of living over nonliving things. This paradox has been attributed to high levels of perceptual similarity within living categories having a different effect on performance in different contexts. However, since living things are intrinsically more similar to each other, previous studies could not determine whether the various category effects were due to perceptual similarity, or to other characteristics of living things. We used novel animal and vehicle stimuli that were matched for similarity to measure the influence of perceptual similarity in different contexts. We found that displaying highly similar objects in blocked sets reduced their perceived similarity, eliminating the detrimental effect on naming performance. Experiment 1 demonstrated a disadvantage for highly similar objects in name learning and name verification using mixed groups of similar and dissimilar animals and vehicles. Experiment 2 demonstrated no disadvantage for the same highly similar objects when they were blocked, e.g., similar animals presented alone. Thus, perceptual similarity, rather than other characteristics particular to living things, is affected by context, and could create apparent category effects under certain testing conditions.

Relevance:

20.00%

Abstract:

The present work studies the overall structuring of radio news discourse by investigating three metatextual/interactive functions: (1) Discourse Organizing Elements (DOEs), (2) Attribution and (3) Sentential and Nominal Background Information (SBI & NBI). An extended corpus of about 73,000 words from BBC and Radio Damascus news is used to study DOEs and a restricted corpus of 38,000 words for Attribution and S & NBI. A situational approach is adopted to assess the influence of factors such as medium and audience on these functions and their frequency. It is found that: (1) DOEs are organizational and their frequency is determined by length of text; (2) Attribution functions in accordance with the editor's strategy and its frequency is audience sensitive; and (3) BI provides background information and is determined by audience and news topics. Secondly, the salient grammatical elements in DOEs are discourse deictic demonstratives, address pronouns and nouns referring to 'the news'. Attribution is realized in reporting/reported clauses, and BI in a sentence, a clause or a nominal group. Thirdly, DOEs establish a hierarchy of (1) news, (2) summary/expansion and (3) item: including topic introduction and details. While Attribution is generally, and SBI solely, a function of detailing, NBI and proper names are generally a function of summary and topic introduction. Being primarily addressed to audience and referring metatextually, the functions investigated support Sinclair's interactive and autonomous planes of discourse. They also shed light on the part(s) of the linguistic system which realize the metatextual/interactive function. Strictly, 'discourse structure' inevitably involves a rank-scale; but news discourse also shows a convention of item 'listing'. Hence only within the boundary of variety (ultimately interpreted across language and in its situation) can textual functions and discourse structure be studied.
Finally, interlingual variety study provides invaluable insights into a level of translation that goes beyond matching grammatical systems or situational factors, an interpretive level which has to be described in linguistic analysis of translation data.

Relevance:

20.00%

Abstract:

The aim of this research project is to compare published history textbooks written for upper-secondary/tertiary study in the U.S. and Spain using Halliday's (1994) Theme/Rheme construct. The motivation for using the Theme/Rheme construct to analyze professional texts in the two languages is two-fold. First of all, while there exists a multitude of studies at the grammatical and phonological levels between the two languages, very little analysis has been carried out in comparison at the level of text, beyond that of comparing L1/L2 student writing. Secondly, thematic considerations allow the analyst to highlight areas of textual organization in a systematic way for purposes of comparison. The basic hypothesis tested here rests on the premise that similarity in the social function of the texts results in similar Theme choice and thematic patterning across languages, barring certain linguistic constraints. The corpus for this study consists of 20 texts: 10 from various history textbooks published in the U.S. and 10 from various history textbooks published in Spain. The texts chosen represent a variety of authors, in order to control for author style or preference. Three overall areas of analysis were carried out, representing Halliday's (1994) three metafunctions: the ideational, the interpersonal and the textual. The ideational analysis shows similarities across the two corpora in terms of participant roles and circumstances as Theme, with a slight difference in participants involved in material processes, which is shown to reflect a minor difference in the construal of the field of history in the two cultures. The textual analysis shows overall similarities with respect to text organization, and the interpersonal analysis shows overall similarities as regards the downplay of discrepant interpretations of historical events as well as a low frequency of interactive textual features, manifesting the informational focus of the texts. 
At the same time, differences in results amongst texts within each of the corpora demonstrate the possible effect of subject matter in many cases, and of individual author style in others. Overall, the results confirm that similarity in content, but above all in purpose and audience, results in texts which show similarities in textual features, setting aside certain grammatical constraints.

Relevance:

20.00%

Abstract:

Modelling class B G-protein-coupled receptors (GPCRs) using class A GPCR structural templates is difficult due to lack of homology. The plant GPCR, GCR1, has homology to both class A and class B GPCRs. We have used this to generate a class A-class B alignment, and by incorporating maximum lagged correlation of entropy and hydrophobicity into a consensus score, we have been able to align receptor transmembrane regions. We have applied this analysis to generate active and inactive homology models of the class B calcitonin gene-related peptide (CGRP) receptor, and have supported it with site-directed mutagenesis data using 122 CGRP receptor residues and 144 published mutagenesis results on other class B GPCRs. The variation of sequence variability with structure, the analysis of polarity violations, the alignment of group-conserved residues and the mutagenesis results at 27 key positions were particularly informative in distinguishing between the proposed and plausible alternative alignments. Furthermore, we have been able to associate the key molecular features of the class B GPCR signalling machinery with their class A counterparts for the first time. These include the [K/R]KLH motif in intracellular loop 1, [I/L]xxxL and KxxK at the intracellular end of TM5 and TM6, the NPXXY/VAVLY motif on TM7 and small group-conserved residues in TM1, TM2, TM3 and TM7. The equivalent of the class A DRY motif is proposed to involve Arg(2.39), His(2.43) and Glu(3.46), which makes a polar lock with T(6.37). These alignments and models provide useful tools for understanding class B GPCR function.

Relevance:

20.00%

Abstract:

In April 2009, Google Images added a filter for narrowing search results by colour. Several other systems for searching image databases by colour were also released around this time. These colour-based image retrieval systems enable users to search image databases either by selecting colours from a graphical palette (i.e., query-by-colour), by drawing a representation of the colour layout sought (i.e., query-by-sketch), or both. It was comments left by readers of online articles describing these colour-based image retrieval systems that provided us with the inspiration for this research. We were surprised to learn that the underlying query-based technology used in colour-based image retrieval systems today remains remarkably similar to that of systems developed nearly two decades ago. Discovering this ageing retrieval approach, as well as uncovering a large user demographic requiring image search by colour, made us eager to research more effective approaches for colour-based image retrieval. In this thesis, we detail two user studies designed to compare the effectiveness of systems adopting similarity-based visualisations, query-based approaches, or a combination of both, for colour-based image retrieval. In contrast to query-based approaches, similarity-based visualisations display and arrange database images so that images with similar content are located closer together on screen than images with dissimilar content. This removes the need for queries, as users can instead visually explore the database using interactive navigation tools to retrieve images from the database. 
As we found existing evaluation approaches to be unreliable, we describe how we assessed and compared systems adopting similarity-based visualisations, query-based approaches, or both, meaningfully and systematically using our Mosaic Test - a user-based evaluation approach in which evaluation study participants complete an image mosaic of a predetermined target image using the colour-based image retrieval system under evaluation.

Relevance:

20.00%

Abstract:

Little research has been undertaken into high stakes deception, and even less into high stakes deception in written text. This study addresses that gap. In this thesis, I present a new approach to detecting deception in written narratives based on the definition of deception as a progression and focusing on identifying deceptive linguistic strategy rather than individual cues. I propose a new approach for subdividing whole narratives into their constituent episodes, each of which is linguistically profiled and their progression mapped to identify authors' deceptive strategies based on cue interaction. I conduct a double-blind study using qualitative and quantitative analysis in which linguistic strategy (cue interaction and progression) and overall cue presence are used to predict deception in witness statements. This results in linguistic strategy analysis correctly predicting 85% of deceptive statements (92% overall) compared to 54% (64% overall) with cues identified on a whole statement basis. These results suggest that deception cues are not static, and that the value of individual cues as deception predictors is linked to their interaction with other cues. Results also indicate that in certain cue combinations, individual self-references (I, Me and My), previously believed to be indicators of truthfulness, are effective predictors of deceptive linguistic strategy at work.

Relevance:

20.00%

Abstract:

In this paper, we propose a new similarity measure to compute the pair-wise similarity of text-based documents based on patterns of the words in the documents. First, we develop a kappa measure for pair-wise comparison of documents; then we use an ordered weighted averaging operator to define a document similarity measure for a set of documents.
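The two ingredients named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define its kappa measure, so the code uses Cohen's kappa on word presence/absence over a shared vocabulary, and a standard ordered weighted averaging (OWA) step with weights assumed to sum to 1. The function names, the toy documents and the weights are all hypothetical.

```python
from itertools import combinations


def kappa(doc_a: set, doc_b: set, vocab: set) -> float:
    """Cohen's kappa on word presence/absence (an assumed reading of 'kappa measure')."""
    n = len(vocab)
    # Observed agreement: words that are in both documents or in neither
    p_o = sum((w in doc_a) == (w in doc_b) for w in vocab) / n
    p_a = len(doc_a & vocab) / n
    p_b = len(doc_b & vocab) / n
    # Agreement expected by chance from the two presence rates
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1.0:
        return 1.0  # degenerate case: both documents use all or none of the vocabulary
    return (p_o - p_e) / (1 - p_e)


def owa(values, weights):
    """Ordered weighted averaging: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Toy documents as word sets (illustrative only)
docs = [
    {"text", "similarity", "measure"},
    {"text", "similarity", "kernel"},
    {"protein", "sequence", "kernel"},
]
vocab = set().union(*docs)
pairwise = [kappa(a, b, vocab) for a, b in combinations(docs, 2)]
# Hypothetical weights that emphasise the most similar pairs
print(owa(pairwise, [0.5, 0.3, 0.2]))
```

With weights `[1, 0, 0]` the OWA step returns the maximum pairwise score and with `[0, 0, 1]` the minimum, so the choice of weights controls how optimistic the set-level similarity is.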

Relevance:

20.00%

Abstract:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance:

20.00%

Abstract:

Background - Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. Results - The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. Conclusion - The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
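To make the idea of a kernel over variable-length sequences concrete, the sketch below uses a k-mer spectrum kernel. This is a generic, swapped-in technique chosen for illustration and is not the kernel described in the abstract; the function names, the choice of k = 2 and the example peptide strings are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 2) -> int:
    """Inner product of k-mer count vectors; defined for sequences of any length."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())


def normalized_kernel(a: str, b: str, k: int = 2) -> float:
    """Cosine-normalised kernel so that long sequences do not dominate."""
    norm = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / norm if norm else 0.0


# Hypothetical peptides of different lengths (illustrative strings, not real data)
print(normalized_kernel("AKFVAAWTLKAAA", "KFVAAWTLK"))
print(normalized_kernel("AKFVAAWTLKAAA", "GILGFVFTL"))
```

Because the kernel compares k-mer content rather than aligned positions, no fixed-length encoding of the peptide is required, which is the property the abstract highlights for similarity-based representations.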

Relevance:

20.00%

Abstract:

A set of 38 epitopes and 183 non-epitopes, which bind to alleles of the HLA-A3 supertype, was subjected to a combination of comparative molecular similarity indices analysis (CoMSIA) and soft independent modeling of class analogy (SIMCA). During the process of T cell recognition, T cell receptors (TCRs) interact with the central section of the bound nonamer peptide; thus only positions 4−8 were considered in the study. The derived model distinguished 82% of the epitopes and 73% of the non-epitopes after cross-validation in five groups. The overall preference from the model is for polar amino acids with high electron density and the ability to form hydrogen bonds. These so-called “aggressive” amino acids are flanked by small-sized residues, which enable such residues to protrude from the binding cleft and take an active role in TCR-mediated T cell recognition. Combinations of “aggressive” and “passive” amino acids in the middle part of epitopes constitute a putative TCR binding motif.

Relevance:

20.00%

Abstract:

Epitope identification is the basis of modern vaccine design. The present paper studied the supermotif of the HLA-A3 superfamily, using comparative molecular similarity indices analysis (CoMSIA). Four alleles with high phenotype frequencies were used: A*1101, A*0301, A*3101 and A*6801. Five physicochemical properties (steric bulk, electrostatic potential, local hydrophobicity, and hydrogen-bond donor and acceptor abilities) were considered and ‘all fields’ models were produced for each of the alleles. The models have a moderate level of predictivity and there is a good correlation between the data. A revised HLA-A3 supermotif was defined based on the comparison of favoured and disfavoured properties for each position of the MHC bound peptide. The present study demonstrated that CoMSIA is an effective tool for studying peptide–MHC interactions.

Relevance:

20.00%

Abstract:

Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function, and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from “first passage probability distribution” to summarize statistics of ensemble averaged amino acid propensity values. In this paper, we introduce and elaborate this approach.

Relevance:

20.00%

Abstract:

DUE TO COPYRIGHT RESTRICTIONS, ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY WITH PRIOR ARRANGEMENT