Guaranteeing the quality of features that describe knowledge relevant to users or topics is a challenge because of the large number of features extracted. Most popular term-based feature selection methods suffer from noise: many of the extracted features are irrelevant to user needs. One popular remedy is to extract phrases or n-grams to describe the relevant knowledge; however, extracted n-grams and phrases usually contain a lot of noise as well. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. It then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms within n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
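The weighting step can be illustrated with a deliberately simplified sketch. It assumes that every relevant document carries equal probability mass, that this mass is split evenly among the document's n-grams, and that each n-gram's share is split evenly among its terms. An n-gram is then scored by summing its terms' weights. The equal-split rule and the function names `term_weights` and `ngram_weight` are illustrative assumptions, not the paper's exact extended random set formulation.

```python
from collections import defaultdict


def term_weights(docs):
    """Distribute probability mass from documents to n-grams to terms.

    docs: list of relevant documents, each a list of n-grams (tuples of terms).
    Hypothetical equal-split rule: each document gets 1/len(docs) of the mass,
    shared evenly by its n-grams; each n-gram shares its part evenly by term.
    """
    docs = [d for d in docs if d]  # skip documents that yielded no n-grams
    weights = defaultdict(float)
    for ngrams in docs:
        doc_mass = 1.0 / len(docs)
        for gram in ngrams:
            gram_mass = doc_mass / len(ngrams)
            for term in gram:
                weights[term] += gram_mass / len(gram)
    return dict(weights)


def ngram_weight(gram, weights):
    """Score an n-gram as the sum of its terms' weights."""
    return sum(weights.get(term, 0.0) for term in gram)
```

Under this rule a term that recurs across many n-grams and documents accumulates mass, so n-grams built from such terms rise in the ranking while n-grams made of incidental terms fall.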


Non-small cell lung carcinoma (NSCLC) remains by far the leading cause of cancer-related deaths worldwide. Overexpression of FLIP, which blocks the extrinsic apoptotic pathway by inhibiting caspase-8 activation, has been identified in various cancers. We investigated FLIP and procaspase-8 expression in NSCLC and the effect of histone deacetylase (HDAC) inhibitors on FLIP expression, caspase-8 activation and drug resistance in NSCLC and normal lung cell line models. Immunohistochemical analysis of cytoplasmic and nuclear FLIP and procaspase-8 protein expression was carried out using a novel digital pathology approach. Both FLIP and procaspase-8 were significantly overexpressed in tumours; importantly, high cytoplasmic FLIP expression correlated significantly with shorter overall survival. Treatment with HDAC inhibitors targeting HDAC1-3 downregulated FLIP expression, predominantly via post-transcriptional mechanisms, and this resulted in death receptor- and caspase-8-dependent apoptosis in NSCLC cells but not in normal lung cells. In addition, HDAC inhibitors synergized with TRAIL and cisplatin in NSCLC cells in a FLIP- and caspase-8-dependent manner. Thus, FLIP and procaspase-8 are overexpressed in NSCLC, and high cytoplasmic FLIP expression is indicative of poor prognosis. Targeting high FLIP expression with HDAC1-3 selective inhibitors such as entinostat, in order to exploit high procaspase-8 expression in NSCLC, has promising therapeutic potential, particularly in combination with TRAIL receptor-targeted agents.


Purpose: The presence of lymphocytic infiltration in autonomic ganglia, together with an increased prevalence of autoantibodies and iritis in diabetic patients with autonomic neuropathy, suggests a role for autoimmune mechanisms in the development of diabetic and perhaps somatic neuropathy. Corneal Langerhans cells are antigen-presenting cells that can be identified in immunologic conditions of the cornea using in vivo confocal microscopy. The aim of this study was to assess the presence and density of Langerhans cells (LCs) in Bowman’s layer of the cornea in diabetic patients with varying degrees of neuropathy compared to healthy control subjects. Method: A total of 128 diabetic patients aged 58±1 years with differing severity of neuropathy (neuropathy disability score, NDS, 4.7±0.28) and 26 control subjects aged 53±3 years were examined with in vivo corneal confocal microscopy to quantify LC density. Results: LCs were observed more often in diabetic patients (73.8%) than in control subjects (46.1%), P = 0.001. LC density (number/mm²) was also significantly higher in diabetic patients (17.73±1.45) than in control subjects (6.94±1.58), P = 0.001. LC density correlated significantly with age (r = 0.162, P = 0.047) and with severity of neuropathy assessed by NDS (r = −0.202, P = 0.02). Conclusions: In vivo corneal confocal microscopy enables quantification of Langerhans cells in Bowman’s layer of the cornea. There is a relationship between LC density and the degree of nerve damage. Corneal confocal microscopy could be a valuable tool to establish the role of immune-mediated corneal nerve damage and to provide insights into the pathogenesis of diabetic neuropathy.


The Quantum Probability Ranking Principle (QPRP) has recently been proposed; it accounts for interdependent document relevance when ranking. However, to be instantiated, the QPRP requires a method to approximate the “interference” between two documents. In this poster, we empirically evaluate a number of approximation methods on two TREC test collections for subtopic retrieval. We show that these approximations can lead to significantly better retrieval performance than the state of the art.
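A greedy QPRP-style ranker can be sketched as follows. At each rank it selects the document that maximizes its probability of relevance plus its interference with every document already ranked, written in this sketch as 2·sqrt(P(d)·P(d'))·cos θ. How to approximate cos θ is exactly the open choice the poster evaluates; here it is assumed, purely for illustration, to be the negative cosine similarity of term-frequency vectors. The function names and input formats are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def qprp_rank(probs, vectors, k=None):
    """Greedy QPRP-style ranking.

    probs:   doc id -> estimated probability of relevance P(d)
    vectors: doc id -> term-frequency dict
    Interference with each already-ranked doc r is
    2 * sqrt(P(d) * P(r)) * cos(theta), with cos(theta) approximated
    (an assumption) by -cosine(d, r).
    """
    remaining = set(probs)
    ranking = []
    k = len(probs) if k is None else k
    while remaining and len(ranking) < k:
        def score(d):
            interference = sum(
                2 * math.sqrt(probs[d] * probs[r]) * -cosine(vectors[d], vectors[r])
                for r in ranking)
            return probs[d] + interference
        best = max(sorted(remaining), key=score)  # sorted: deterministic ties
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

With this approximation, a near-duplicate of a document already ranked receives strongly negative interference and can drop below a less relevant but different document, which is the behaviour subtopic retrieval rewards.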


Retrieval with Logical Imaging is derived from belief revision and provides a novel mechanism for estimating the relevance of a document through logical implication (i.e., P(q → d)). In this poster, we perform the first comprehensive evaluation of Logical Imaging (LI) in Information Retrieval (IR) across several TREC test collections. When compared against standard baseline models, LI fails to improve performance. This failure can be attributed to a nuance within the model that causes non-relevant documents to be promoted in the ranking while relevant documents are demoted. This is an important contribution because it not only contextualizes the effectiveness of LI but, crucially, explains why it fails. By addressing this nuance, future LI models could be significantly improved.
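Logical imaging can be sketched over a vocabulary in which each term is treated as a possible world. Each term's prior probability is transferred to its most similar term that makes the antecedent true (a term already in the antecedent keeps its own mass), and the estimate is the total mass that lands on terms which also make the consequent true. The term-as-world representation, the similarity function and the name `imaging_probability` are assumptions for illustration; to follow the notation P(q → d) above, pass the query terms as the antecedent and the document terms as the consequent.

```python
def imaging_probability(antecedent, consequent, prior, sim):
    """Estimate P(antecedent -> consequent) by logical imaging.

    antecedent, consequent: sets of terms (worlds where each is true)
    prior: term -> prior probability over the vocabulary
    sim:   function(t1, t2) -> similarity, used to find the closest world
    """
    if not antecedent:
        return 0.0
    ordered = sorted(antecedent)  # deterministic tie-breaking
    total = 0.0
    for term, mass in prior.items():
        if term in antecedent:
            target = term  # the term is already a world where the antecedent holds
        else:
            # Move this term's mass to the most similar antecedent world.
            target = max(ordered, key=lambda a: sim(term, a))
        if target in consequent:
            total += mass
    return total
```

Because every term's mass is redistributed, the choice of similarity function directly decides which documents gain probability; this transfer step is therefore a natural place to look when analysing why imaging promotes or demotes particular documents.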


In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting, at early ranks, relevant documents that cover many subtopics of a query, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how document ranks are revised to promote diversity. In the first paradigm, subtopic diversification is achieved implicitly, by choosing documents that differ from each other; in the second, it is achieved explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
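As an illustration of the explicit paradigm, the sketch below uses an xQuAD-style objective, a well-known explicit diversification method chosen here only as an example and not necessarily one of the methods compared in the paper. Each candidate's score mixes its relevance with how well it covers subtopics that the already selected documents have left uncovered. The inputs P(d|q), P(s|q) and P(d|q,s), the parameter `lam` and the function name are assumptions supplied by the caller.

```python
def xquad_rank(rel, subtopic_probs, coverage, lam=0.5, k=None):
    """Explicit diversification in the style of xQuAD.

    rel:            doc -> P(d|q)
    subtopic_probs: subtopic -> P(s|q)
    coverage:       doc -> {subtopic: P(d|q,s)}
    """
    remaining = set(rel)
    ranking = []
    # not_covered[s] = product over selected docs of (1 - P(d_j|q,s))
    not_covered = {s: 1.0 for s in subtopic_probs}
    k = len(rel) if k is None else k
    while remaining and len(ranking) < k:
        def score(d):
            cov = coverage.get(d, {})
            diversity = sum(p_s * cov.get(s, 0.0) * not_covered[s]
                            for s, p_s in subtopic_probs.items())
            return (1 - lam) * rel[d] + lam * diversity
        best = max(sorted(remaining), key=score)  # sorted: deterministic ties
        ranking.append(best)
        remaining.remove(best)
        for s in not_covered:
            not_covered[s] *= 1 - coverage.get(best, {}).get(s, 0.0)
    return ranking
```

An implicit strategy would replace the subtopic term with a penalty based on document-to-document similarity; an integration of the two paradigms could combine both signals in one score.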


Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in scenarios where redundancy in retrieved documents is of major concern, as is the case in the sub-topic retrieval task. In this chapter, we propose a new ranking strategy for sub-topic retrieval that builds upon the interdependent document relevance and topic-oriented models. With respect to the topic-oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state-of-the-art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster-based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub-topics of a more general query topic. The experimental results show that our approaches outperform the state-of-the-art strategies with respect to a number of diversity measures.
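Maximal Marginal Relevance, one of the baselines named above, is the classic way to encode document dependencies: at each rank it trades a document's relevance against its maximum similarity to the documents already selected, with λ (`lam`) controlling the balance. A minimal sketch, assuming the caller supplies relevance scores and a pairwise similarity function (the proposed cluster-based method itself is not reproduced here):

```python
def mmr_rank(rel, sim, lam=0.5, k=None):
    """Maximal Marginal Relevance re-ranking.

    rel: doc -> relevance score
    sim: function(d1, d2) -> similarity between two documents
    """
    remaining = set(rel)
    ranking = []
    k = len(rel) if k is None else k
    while remaining and len(ranking) < k:
        def score(d):
            # Redundancy = similarity to the closest already-ranked document.
            redundancy = max((sim(d, r) for r in ranking), default=0.0)
            return lam * rel[d] - (1 - lam) * redundancy
        best = max(sorted(remaining), key=score)  # sorted: deterministic ties
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

Setting `lam=1.0` recovers a pure relevance (PRP-style) ordering, which makes the effect of the dependency term easy to isolate in experiments.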


In recent years, several works have investigated a formal model for Information Retrieval (IR) based on the mathematical formalism underlying quantum theory. These works have mainly exploited geometric and logical-algebraic features of the quantum formalism, for example entanglement, superposition of states, collapse into basis states, and lattice relationships. In this poster I present an analogy between a typical IR scenario and the double-slit experiment. This experiment exhibits interference phenomena between events in a quantum system, which cause the Kolmogorovian law of total probability to fail. The analogy allows me to put forward routes for applying quantum probability theory in IR. However, several questions still need to be addressed; they will be the subject of my PhD research.
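The failure of the law of total probability can be written out in standard double-slit notation. The symbols are introduced here for illustration and are not taken from the poster: x is a point on the detection screen, s_1 and s_2 are the two slits, and φ_i(x) is the complex amplitude for reaching x through slit i, normalised so that |φ_i(x)|² = P(x | s_i) P(s_i).

```latex
\begin{aligned}
\text{Classical (Kolmogorovian):}\quad
  P(x) &= P(x \mid s_1)\,P(s_1) + P(x \mid s_2)\,P(s_2) \\
\text{Quantum:}\quad
  P(x) &= \lvert \varphi_1(x) + \varphi_2(x) \rvert^2 \\
       &= \lvert \varphi_1(x) \rvert^2 + \lvert \varphi_2(x) \rvert^2
          + 2\,\lvert \varphi_1(x) \rvert\,\lvert \varphi_2(x) \rvert \cos\theta
\end{aligned}
```

Here θ is the phase difference between φ_1(x) and φ_2(x). The cross term is the interference term: whenever cos θ ≠ 0 the quantum probability differs from the classical sum, which is the sense in which the law of total probability fails.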


The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater for or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have already been ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches and we investigate document kinematics to visualise the effects of the different approaches on document ranking.


For TREC Crowdsourcing 2011 (Stage 2), we propose a network-based approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold-standard agreements are modelled as a network. To assign worker trustworthiness, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence of trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold-standard agreement. The TurkRank score calculated for each worker is then incorporated into a worker-weighted mean label aggregation.
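A minimal sketch of a TurkRank-like computation, assuming a personalised-PageRank form: trust flows between workers in proportion to their pairwise agreement, and the teleport distribution is proportional to each worker's agreement with the gold standard. The parameter `alpha` stands in for the single parameter weighting co-worker agreement against gold-standard agreement. The exact TurkRank update, the function names and the input format are assumptions for illustration; the second function shows a worker-weighted mean label aggregation.

```python
def turkrank(agree, gold, alpha=0.85, iters=100, tol=1e-10):
    """Personalised-PageRank sketch of worker trust.

    agree: worker -> {co-worker: non-negative agreement weight}
    gold:  worker -> non-negative agreement with the gold standard
    alpha: weight of co-worker agreement versus gold-standard agreement
    """
    workers = sorted(set(agree) | set(gold)
                     | {w for out in agree.values() for w in out})
    n = len(workers)
    gold_total = sum(gold.get(w, 0.0) for w in workers)
    # Teleport distribution proportional to gold-standard agreement.
    prior = {w: gold.get(w, 0.0) / gold_total if gold_total else 1.0 / n
             for w in workers}
    score = dict(prior)
    for _ in range(iters):
        new = {w: (1 - alpha) * prior[w] for w in workers}
        for v in workers:
            out = agree.get(v, {})
            out_total = sum(out.values())
            if out_total == 0:
                # Worker with no agreement edges: spread its score by the prior.
                for w in workers:
                    new[w] += alpha * score[v] * prior[w]
            else:
                for w, a in out.items():
                    new[w] += alpha * score[v] * a / out_total
        delta = sum(abs(new[w] - score[w]) for w in workers)
        score = new
        if delta < tol:
            break
    return score


def weighted_mean_labels(labels, trust):
    """labels: item -> {worker: numeric label}; returns item -> weighted mean."""
    result = {}
    for item, votes in labels.items():
        total = sum(trust.get(w, 0.0) for w in votes)
        if total > 0:
            result[item] = sum(trust.get(w, 0.0) * y for w, y in votes.items()) / total
    return result
```

In this sketch a worker who agrees with neither co-workers nor the gold standard ends up with a trust score of zero, so their labels do not shift the weighted mean.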


In this paper, we define two models of users who require diversity in search results; these models are theoretically grounded in the notions of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing the ranking preferences expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task as expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intent coverage by attributing excessive importance to redundant relevant documents. This behavior is contrary to the user models, which require measures to first assess diversity through the coverage of intents and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and from the documents’ positions in the ranking.
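ERR-IA can be computed as follows, using the standard cascade-model definition of ERR: a document with grade g satisfies the user with probability (2^g − 1)/2^g_max, the user stops at the first satisfying document, and ERR-IA averages the per-intent ERR values weighted by the intent probabilities P(i|q). The function names and the judgment format are illustrative assumptions.

```python
def err(grades, max_grade):
    """Expected Reciprocal Rank for a single intent.

    grades: relevance grades of the ranked documents, in rank order.
    """
    p_continue = 1.0  # probability the user has not yet been satisfied
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability of satisfaction here
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score


def err_ia(ranking, intent_probs, judgments, max_grade=1):
    """ERR-IA: per-intent ERR weighted by P(i|q).

    judgments: intent -> {doc: grade}; unjudged documents get grade 0.
    """
    return sum(p * err([judgments[i].get(d, 0) for d in ranking], max_grade)
               for i, p in intent_probs.items())
```

Scoring two rankings of the same documents with this function, one that places a second document for the same intent at rank 2 and one that places a document for another intent there, is a simple way to probe how the measure trades redundancy against intent coverage.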


This study presents an acoustic emission (AE)-based fault diagnosis method for low-speed bearings using a multi-class relevance vector machine (RVM). A low-speed test rig was developed to simulate various defects at shaft speeds as low as 10 rpm under several loading conditions. The data were acquired using an AE sensor, with the test bearing operating under a constant load (5 kN) at speeds ranging from 20 to 80 rpm. This study aims to find a reliable method or tool for fault diagnosis of low-speed machines based on AE signals. Component analysis was performed to extract bearing features and to reduce the dimensionality of the original feature data. The results show that multi-class RVM offers a promising approach for fault diagnosis of low-speed machines.
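The abstract does not state which component analysis was used; the sketch below assumes principal component analysis (PCA), implemented with power iteration and deflation so that it needs only the Python standard library. Each row would be a feature vector computed from one AE signal (the feature set is not specified in the abstract), and the RVM classification stage is not shown. The function name `pca` and its parameters are illustrative.

```python
import math
import random


def pca(rows, n_components, iters=200, seed=0):
    """Project feature rows onto their top principal components.

    Power iteration finds the dominant eigenvector of the sample covariance
    matrix; deflation removes it before searching for the next one.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim))
        components.append(v)
        # Deflation: subtract the found component's contribution.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    # Reduced features: projections of the centred rows onto the components.
    return [[sum(c[j] * comp[j] for j in range(dim)) for comp in components]
            for c in centered]
```

The reduced rows returned by `pca` would then serve as the input features for the multi-class classifier.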


Karasek's Job Demand-Control model proposes that control mitigates the positive effects of work stressors on employee strain. Evidence to date remains mixed and, although a number of individual-level moderators have been examined, the role of broader, contextual, group factors has been largely overlooked. In this study, the extent to which control buffered or exacerbated the effects of demands on strain at the individual level was hypothesized to be influenced by perceptions of collective efficacy at the group level. Data from 544 employees in Australian organizations, nested within 23 workgroups, revealed significant three-way cross-level interactions among demands, control and collective efficacy on anxiety and job satisfaction. When the group perceived high levels of collective efficacy, high control buffered the negative consequences of high demands on anxiety and satisfaction. Conversely, when the group perceived low levels of collective efficacy, high control exacerbated the negative consequences of high demands on anxiety, but not satisfaction. In addition, a stress-exacerbating effect for high demands on anxiety and satisfaction was found when there was a mismatch between collective efficacy and control (i.e. combined high collective efficacy and low control). These results provide support for the notion that the stressor-strain relationship is moderated by both individual- and group-level factors.
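A three-way cross-level interaction of this kind is commonly specified as a two-level model. The following generic sketch uses symbols introduced here for illustration: S is strain (anxiety or job satisfaction) for employee i in workgroup j, D is demands, C is control, and E is the group's collective efficacy. The study's exact specification, centring and random effects are not given in the abstract.

```latex
\begin{aligned}
\text{Level 1 (employee } i \text{ in group } j\text{):}\quad
  S_{ij} &= \beta_{0j} + \beta_{1j} D_{ij} + \beta_{2j} C_{ij}
            + \beta_{3j} D_{ij} C_{ij} + r_{ij} \\
\text{Level 2 (group } j\text{):}\quad
  \beta_{0j} &= \gamma_{00} + \gamma_{01} E_j + u_{0j} \\
  \beta_{1j} &= \gamma_{10} + \gamma_{11} E_j \\
  \beta_{2j} &= \gamma_{20} + \gamma_{21} E_j \\
  \beta_{3j} &= \gamma_{30} + \gamma_{31} E_j
\end{aligned}
```

In this form, γ_31 carries the demands × control × collective efficacy interaction: it captures how the group-level variable changes the individual-level demands × control slope, which is what allows control to buffer strain in high-efficacy groups and exacerbate it in low-efficacy groups.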


With the growing size and variety of social media files on the web, it is becoming critical to organize them efficiently into clusters for further processing. This paper presents a novel, scalable constrained document clustering method that harnesses search engines capable of dealing with large text collections. Instead of calculating the distance between each document and every cluster centroid, a neighborhood of the best candidate clusters is chosen using a document ranking scheme. To make the method faster and less memory-dependent, in-memory and in-database processing are combined in a semi-incremental manner. The method has been extensively tested in a social event detection application. Empirical analysis shows that the proposed method is efficient in both computation and memory usage while achieving notable accuracy.
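The candidate-neighbourhood idea can be sketched with a tiny in-memory inverted index standing in for the search engine. Each incoming document is used as a query against the cluster centroids, only the top-k retrieved clusters are compared with it by cosine similarity, and a new cluster is opened when no candidate is similar enough. The class name, the parameters `k` and `threshold`, the term-overlap ranking and the new-cluster rule are illustrative assumptions; constraints and the in-database part are not modelled.

```python
import math
from collections import Counter, defaultdict


class CandidateClusterer:
    """Incremental clustering that compares each document only with the
    top-k clusters retrieved through an inverted index over centroids."""

    def __init__(self, k=3, threshold=0.3):
        self.k = k
        self.threshold = threshold
        self.centroids = []            # cluster id -> summed term counts
        self.index = defaultdict(set)  # term -> ids of clusters containing it

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, tokens):
        doc = Counter(tokens)
        # Retrieval step: rank clusters by the number of shared terms.
        hits = Counter()
        for term in doc:
            for cid in self.index.get(term, ()):
                hits[cid] += 1
        candidates = [cid for cid, _ in hits.most_common(self.k)]
        # Exact comparison only against the retrieved neighbourhood.
        best, best_sim = None, 0.0
        for cid in candidates:
            s = self._cosine(doc, self.centroids[cid])
            if s > best_sim:
                best, best_sim = cid, s
        if best is None or best_sim < self.threshold:
            best = len(self.centroids)
            self.centroids.append(Counter())
        # Summed counts point in the same direction as the mean, and cosine
        # similarity is scale-invariant, so no division is needed.
        self.centroids[best].update(doc)
        for term in doc:
            self.index[term].add(best)
        return best
```

The saving comes from the retrieval step: a document that shares no terms with any cluster is never compared with them at all, so the cost per document depends on the neighbourhood size `k` rather than on the total number of clusters.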


Relevation! is a system for performing relevance judgements for information retrieval evaluation. Relevation! is web-based, fully configurable and expandable; it allows researchers to effectively collect assessments and additional qualitative data. The system is easily deployed, allowing assessors to smoothly perform their relevance judging tasks, even remotely. Relevation! is available as an open source project at: http://ielab.github.io/relevation.