996 resultados para Document Ranking


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Finding and labelling semantic features patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult; the rapidly increasing volume of online documents makes a bottleneck in finding meaningful textual patterns. Aiming to deal with these issues, we propose an unsupervised documnent labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was promisingly evaluated by compared with typical machine learning methods including SVMs, Rocchio, and kNN.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Enterprise Systems (ES) can be understood as the de facto standard for holistic operational and managerial support within an organization. Most commonly ES are offered as commercial off-the-shelf packages, requiring customization in the user organization. This process is a complex and resource-intensive task, which often prevents small and midsize enterprises (SME) from undertaking configuration projects. Especially in the SME market independent software vendors provide pre-configured ES for a small customer base. The problem of ES configuration is shifted from the customer to the vendor, but remains critical. We argue that the yet unexplored link between process configuration and business document configuration must be closer examined as both types of configuration are closely tied to one another.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Particulate matter research is essential because of the well known significant adverse effects of aerosol particles on human health and the environment. In particular, identification of the origin or sources of particulate matter emissions is of paramount importance in assisting efforts to control and reduce air pollution in the atmosphere. This thesis aims to: identify the sources of particulate matter; compare pollution conditions at urban, rural and roadside receptor sites; combine information about the sources with meteorological conditions at the sites to locate the emission sources; compare sources based on particle size or mass; and ultimately, provide the basis for control and reduction in particulate matter concentrations in the atmosphere. To achieve these objectives, data was obtained from assorted local and international receptor sites over long sampling periods. The samples were analysed using Ion Beam Analysis and Scanning Mobility Particle Sizer methods to measure the particle mass with chemical composition and the particle size distribution, respectively. Advanced data analysis techniques were employed to derive information from large, complex data sets. Multi-Criteria Decision Making (MCDM), a ranking method, drew on data variability to examine the overall trends, and provided the rank ordering of the sites and years that sampling was conducted. Coupled with the receptor model Positive Matrix Factorisation (PMF), the pollution emission sources were identified and meaningful information pertinent to the prioritisation of control and reduction strategies was obtained. This thesis is presented in the thesis by publication format. It includes four refereed papers which together demonstrate a novel combination of data analysis techniques that enabled particulate matter sources to be identified and sampling site/year ranked. The strength of this source identification process was corroborated when the analysis procedure was expanded to encompass multiple receptor sites. Initially applied to identify the contributing sources at roadside and suburban sites in Brisbane, the technique was subsequently applied to three receptor sites (roadside, urban and rural) located in Hong Kong. The comparable results from these international and national sites over several sampling periods indicated similarities in source contributions between receptor site-types, irrespective of global location and suggested the need to apply these methods to air pollution investigations worldwide. Furthermore, an investigation into particle size distribution data was conducted to deduce the sources of aerosol emissions based on particle size and elemental composition. Considering the adverse effects on human health caused by small-sized particles, knowledge of particle size distribution and their elemental composition provides a different perspective on the pollution problem. This thesis clearly illustrates that the application of an innovative combination of advanced data interpretation methods to identify particulate matter sources and rank sampling sites/years provides the basis for the prioritisation of future air pollution control measures. Moreover, this study contributes significantly to knowledge based on chemical composition of airborne particulate matter in Brisbane, Australia and on the identity and plausible locations of the contributing sources. Such novel source apportionment and ranking procedures are ultimately applicable to environmental investigations worldwide.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Grading osteoarthritic tissue has, until now, been a laboratory process confined to research activities. This thesis establishes a scientific protocol that extends osteoarthritic tissue ranking to surgical practice. The innovative protocol, which now incorporates the structural degeneration of collagen, enhances the traditional Modified Mankin ranking system, enabling its application to real time decision during surgery. Because it is fast and without time consuming laboratory process, it would potentially enable the cataloguing of tissues in osteoarthritic joints in all compartments of diseased joints during surgery for epistemological study and insight into the manifestation of osteoarthritis across age, gender, occupation, physical activities and race.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper elaborates the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) Tasks of the 2013 MediaEval Benchmark. We extended the constrained clustering algorithm to apply to the first semi-supervised clustering task, and we compared several classifiers with Latent Dirichlet Allocation as feature selector in the second event classification task. The proposed approach focuses on scalability and efficient memory allocation when applied to a high dimensional data with large clusters. Results of the first task show the effectiveness of the proposed method. Results from task 2 indicate that attention on the imbalance categories distributions is needed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Besides responding to challenges of rapid urbanization and growing traffic congestion, the development of smart transport systems has attracted much attention in recent times. Many promising initiatives have emerged over the years. Despite these initiatives, there is still a lack of understanding about an appropriate definition of smart transport system. As such, it is challenging to identify the appropriate indicators of ‘smartness’. This paper proposes a comprehensive and practical framework to benchmark cities according to the smartness in their transportation systems. The proposed methodology was illustrated using a set of data collected from 26 cities across the world through web search and contacting relevant transport authorities and agencies. Results showed that London, Seattle and Sydney were among the world’s top smart transport cities. In particular, Seattle and Paris ranked high in smart private transport services while London and Singapore scored high on public transport services. London also appeared to be the smartest in terms of emergency transport services. The key value of the proposed innovative framework lies in a comparative analysis among cities, facilitating city-to-city learning.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cognitive impairment and physical disability are common in Parkinson’s disease (PD). As a result diet can be difficult to measure. This study aimed to evaluate the use of a photographic dietary record (PhDR) in people with PD. During a 12-week nutrition intervention study, 19 individuals with PD kept 3-day PhDRs on three occasions using point-and-shoot digital cameras. Details on food items present in the PhDRs and those not photographed were collected retrospectively during an interview. Following the first use of the PhDR method, the photographer completed a questionnaire (n=18). In addition, the quality of the PhDRs was evaluated at each time point. The person with PD was the sole photographer in 56% of the cases, with the remainder by the carer or combination of person with PD and the carer. The camera was rated as easy to use by 89%, keeping a PhDR was considered acceptable by 94% and none would rather use a “pen and paper” method. Eighty-three percent felt confident to use the camera again to record intake. Of the photos captured (n=730), 89% were of adequate quality (items visible, in-focus), while only 21% could be used alone (without interview information) to assess intake. Over the study, 22% of eating/drinking occasions were not photographed. PhDRs were considered an easy and acceptable method to measure intake among individuals with PD and their carers. The majority of PhDRs were of adequate quality, however in order to quantify intake the interview was necessary to obtain sufficient detail and capture missing items.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, which has been widely utilized in the fields of machine learning and information retrieval, etc. But its effectiveness in information filtering is rarely known. Patterns are always thought to be more representative than single terms for representing documents. In this paper, a novel information filtering model, Pattern-based Topic Model(PBTM) , is proposed to represent the text documents not only using the topic distributions at general level but also using semantic pattern representations at detailed specific level, both of which contribute to the accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity is required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory(PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. And unlike MMR and PT, one of the main advantages of the QPRP is that no parameter estimation/tuning is required; making the QPRP both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Quantum Probability Ranking Principle (QPRP) has been recently proposed, and accounts for interdependent document relevance when ranking. However, to be instantiated, the QPRP requires a method to approximate the interference" between two documents. In this poster, we empirically evaluate a number of different methods of approximation on two TREC test collections for subtopic retrieval. It is shown that these approximations can lead to significantly better retrieval performance over the state of the art.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Retrieval with Logical Imaging is derived from belief revision and provides a novel mechanism for estimating the relevance of a document through logical implication (i.e. P(q -> d)). In this poster, we perform the first comprehensive evaluation of Logical Imaging (LI) in Information Retrieval (IR) across several TREC test Collections. When compared against standard baseline models, we show that LI fails to improve performance. This failure can be attributed to a nuance within the model that means non-relevant documents are promoted in the ranking, while relevant documents are demoted. This is an important contribution because it not only contextualizes the effectiveness of LI, but crucially ex- plains why it fails. By addressing this nuance, future LI models could be significantly improved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work, we summarise the development of a ranking principle based on quantum probability theory, called the Quantum Probability Ranking Principle (QPRP), and we also provide an overview of the initial experiments performed employing the QPRP. The main difference between the QPRP and the classic Probability Ranking Principle, is that the QPRP implicitly captures the dependencies between documents by means of quantum interference". Subsequently, the optimal ranking of documents is not based solely on documents' probability of relevance but also on the interference with the previously ranked documents. Our research shows that the application of quantum theory to problems within information retrieval can lead to consistently better retrieval effectiveness, while still being simple, elegant and tractable.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this paper is to investigate the role of emotion features in diversifying document rankings to improve the effectiveness of Information Retrieval (IR) systems. For this purpose, two approaches are proposed to consider emotion features for diversification, and they are empirically tested on the TREC 678 Interactive Track collection. The results show that emotion features are capable of enhancing retrieval effectiveness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For TREC Crowdsourcing 2011 (Stage 2) we propose a networkbased approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated with a worker-weighted mean label aggregation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we define two models of users that require diversity in search results; these models are theoretically grounded in the notion of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing ranking preferences as expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intents coverage by attributing excessive importance to redundant relevant documents. ERR-IA behavior is contrary to the user models that require measures to first assess diversity through the coverage of intents, and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and the documents positions in the ranking.