894 results for similarity retrieval
Abstract:
Previous behavioral studies reported a robust effect of increased naming latencies when objects to be named were blocked within semantic category, compared to items blocked between category. This semantic context effect has been attributed to various mechanisms including inhibition or excitation of lexico-semantic representations and incremental learning of associations between semantic features and names, and is hypothesized to increase demands on verbal self-monitoring during speech production. Objects within categories also share many visual structural features, introducing a potential confound when interpreting the level at which the context effect might occur. Consistent with previous findings, we report a significant increase in response latencies when naming categorically related objects within blocks, an effect associated with increased perfusion fMRI signal bilaterally in the hippocampus and in the left middle to posterior superior temporal cortex. No perfusion changes were observed in the middle section of the left middle temporal cortex, a region associated with retrieval of lexical-semantic information in previous object naming studies. Although a manipulation of visual feature similarity did not influence naming latencies, we observed perfusion increases in the perirhinal cortex for naming objects with similar visual features that interacted with the semantic context in which objects were named. These results provide support for the view that the semantic context effect in object naming occurs due to an incremental learning mechanism, and involves increased demands on verbal self-monitoring.
Abstract:
This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topics. These automatically discovered topics are in turn used to improve search engine performance by searching only the topics deemed relevant to a particular user query.
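The idea of searching only the relevant topics can be illustrated with a minimal Python sketch. It is not the thesis's method: the clusters are assumed to be given already, documents are plain bag-of-words vectors, and the names `vectorize`, `cosine`, `centroid`, and `search` are hypothetical helpers introduced here for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts (an assumption; real systems use richer features).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    # Summed term counts; cosine similarity is unaffected by the missing division.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def search(query, clusters, top_clusters=1):
    # Rank clusters by centroid similarity, then rank only the documents
    # inside the best-matching clusters instead of the whole collection.
    q = vectorize(query)
    ranked = sorted(
        clusters,
        key=lambda name: cosine(q, centroid([vectorize(d) for d in clusters[name]])),
        reverse=True,
    )
    candidates = [doc for name in ranked[:top_clusters] for doc in clusters[name]]
    return sorted(candidates, key=lambda d: cosine(q, vectorize(d)), reverse=True)
```

With two toy clusters, a query such as "pasta recipe" is compared against two centroids first, and only the documents in the winning cluster are scored individually.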
Abstract:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic that has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies, so-called Next Generation Sequencing (NGS) approaches, have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues that will lead to substantial performance improvements over BLAST.
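As a rough illustration of binary signatures for sequences, the Python sketch below hashes overlapping k-mers into pseudo-random +1/-1 vectors, sums them, and keeps the sign of each position. This is one common locality-sensitive hashing construction, not necessarily the one used in the work above; the k-mer length, signature width, and the names `kmers`, `signature`, `hamming_similarity`, and `classify` are assumptions made for the example.

```python
import hashlib
import random


def kmers(seq, k=3):
    # Overlapping substrings of length k, treated as the "terms" of a sequence.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vector(kmer, bits):
    # Deterministic pseudo-random +1/-1 vector derived from a hash of the k-mer.
    seed = int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(bits)]


def signature(seq, k=3, bits=64):
    # Sum the k-mer vectors, then keep one bit per position (1 if the sum is positive).
    totals = [0] * bits
    for km in kmers(seq, k):
        for i, value in enumerate(kmer_vector(km, bits)):
            totals[i] += value
    return [1 if t > 0 else 0 for t in totals]


def hamming_similarity(a, b):
    # Fraction of signature positions on which the two signatures agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(query, labelled, k=3, bits=64):
    # Nearest-neighbour prediction: return the family label of the
    # labelled sequence whose signature is closest to the query's.
    q_sig = signature(query, k, bits)
    best = max(
        labelled,
        key=lambda item: hamming_similarity(q_sig, signature(item[0], k, bits)),
    )
    return best[1]
```

Because comparison reduces to counting matching bits in fixed-width signatures, the cost per pair does not depend on sequence length, which is the property that makes this family of methods attractive at scale.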
Abstract:
This project is a step forward in the study of text mining where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmarking methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles the clustering of documents with multiple feature spaces.
Abstract:
The function of a protein can be partially determined by the information contained in its amino acid sequence. It can be assumed that proteins with similar amino acid sequences normally have more similar functions. Hence analysing the similarity of proteins has become one of the most important areas of protein study. In this work, a layered comparison method is used to analyze the similarity of proteins. It is based on the empirical mode decomposition (EMD) method, and protein sequences are characterized by the intrinsic mode functions (IMFs). The similarity of proteins is studied with a new cross-correlation formula. It seems that the EMD method can be used to detect the functional relationship of two proteins. This kind of similarity method is a complement to traditional sequence similarity approaches, which focus on the alignment of amino acids.
Abstract:
This thesis targets the challenging issue of enhancing users' experience of massive and overloaded web information. The novel pattern-based topic model proposed in this thesis can generate high-quality multi-topic user interest models by incorporating statistical topic modelling and pattern mining. We have successfully applied the pattern-based topic model to both information filtering and information retrieval. The success of the proposed model in finding the most relevant information for users comes mainly from its precise semantic representations of documents and its accurate classification of topics at both the document level and the collection level.
Abstract:
Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever-shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation: one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models; second, that it is more effective the more verbose the input; and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
Abstract:
Studies of delayed nonmatching-to-sample (DNMS) performance following lesions of the monkey cortex have revealed a critical circuit of brain regions involved in forming memories and retaining and retrieving stimulus representations. Using event-related functional magnetic resonance imaging (fMRI), we measured brain activity in 10 healthy human participants during performance of a trial-unique visual DNMS task using novel barcode stimuli. The event-related design enabled the identification of activity during the different phases of the task (encoding, retention, and retrieval). Several brain regions identified by monkey studies as being important for successful DNMS performance showed selective activity during the different phases, including the mediodorsal thalamic nucleus (encoding), ventrolateral prefrontal cortex (retention), and perirhinal cortex (retrieval). Regions showing sustained activity within trials included the ventromedial and dorsal prefrontal cortices and occipital cortex. The present study shows the utility of investigating performance on tasks derived from animal models to assist in the identification of brain regions involved in human recognition memory.
Abstract:
Ignoring an object slows subsequent naming responses to it, a phenomenon known as negative priming (NP). A central issue in NP research concerns the level of representation at which the effect occurs. As object naming is typically considered to involve access to abstract semantic representations, Tipper (1985) proposed that the NP effect occurred at this level of processing, and other researchers supported this proposal by demonstrating a similar result with categorically related objects (e.g., Allport et al., 1985; Murray, 1995), an effect referred to as semantic NP. However, objects within categories share more physical or structural features than objects from different categories. Consequently, the NP effect observed with categorically related objects might occur at a structural rather than semantic level of representation. We used event-related fMRI, interleaving overt object naming and image acquisition, to demonstrate for the first time that the semantic NP effect activates the left posterior-mid fusiform and insular-opercular cortices. Moreover, both naming latencies and left posterior-mid fusiform cortex responses were influenced by the structural similarity of prime-probe object pairings in the categorically related condition, increasing with the number of shared features. None of the cerebral regions activated in a previous fMRI study of the identity NP effect (de Zubicaray et al., 2006) showed similar activation during semantic NP, including the left anterolateral temporal cortex, a region considered critical for semantic processing. The results suggest that the identity and semantic NP effects differ with respect to their neural mechanisms, and the label "semantic NP" might be a misnomer. We conclude that the effect is most likely the result of competition between structurally similar category exemplars that determines the efficiency of object name retrieval.