966 results for Guido, Tomás
Abstract:
The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies between documents. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have already been ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches, and we investigate document kinematics to visualise the effects of the different approaches on document ranking.
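The difference between the PRP and its dependence-aware alternatives can be sketched as a greedy loop that fills one rank at a time. Under the PRP, a document's score ignores what has already been ranked. The alternatives instead condition the score on the ranked set. The `redundancy_penalised_score` function below is a hypothetical, generic redundancy penalty chosen only for illustration. It is not any specific strategy compared in the paper, and the similarity values are made up.

```python
from typing import Callable, Dict, List, Sequence

# score(doc, ranked) -> float; `ranked` holds documents placed at earlier ranks.
ScoreFn = Callable[[str, Sequence[str]], float]


def greedy_rank(candidates: Sequence[str], score: ScoreFn) -> List[str]:
    """Fill the ranking one position at a time with the best-scoring document."""
    remaining = list(candidates)
    ranked: List[str] = []
    while remaining:
        # max() keeps the first of tied documents, so ties follow input order.
        best = max(remaining, key=lambda d: score(d, ranked))
        ranked.append(best)
        remaining.remove(best)
    return ranked


def prp_score(prob_rel: Dict[str, float]) -> ScoreFn:
    """PRP: a document's score ignores the documents already ranked."""
    return lambda d, ranked: prob_rel[d]


def redundancy_penalised_score(
    prob_rel: Dict[str, float],
    sim: Dict[str, Dict[str, float]],
    penalty: float = 0.5,
) -> ScoreFn:
    """Hypothetical dependent score: demote documents similar to ranked ones."""
    def score(d: str, ranked: Sequence[str]) -> float:
        redundancy = max((sim[d][r] for r in ranked), default=0.0)
        return prob_rel[d] - penalty * redundancy
    return score


prob = {"a": 0.9, "b": 0.85, "c": 0.6}
sim = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1},
}
print(greedy_rank(["a", "b", "c"], prp_score(prob)))                        # ['a', 'b', 'c']
print(greedy_rank(["a", "b", "c"], redundancy_penalised_score(prob, sim)))  # ['a', 'c', 'b']
```

Here `b` is a near-duplicate of `a`. Once `a` has been ranked, `b` is demoted below the less probable but novel `c`. This is the kind of promotion and demotion the abstract describes.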
Abstract:
In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting, at early ranks, relevant documents that cover many subtopics of a query, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised to promote diversity. In the first paradigm, subtopic diversification is achieved implicitly, by choosing documents that are different from each other; in the second, it is achieved explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
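The explicit paradigm can be illustrated with xQuAD (Santos et al.), a well-known explicit diversification method. It is shown here only as an example of the paradigm. It is not necessarily one of the methods compared in the paper, and it is not the proposed integration approach. The sketch assumes that subtopic importance P(s|q) and per-document subtopic coverage P(d|q,s) have already been estimated. All input values in the usage example are invented.

```python
from typing import Dict, List


def xquad_rank(
    rel: Dict[str, float],
    subtopic_prob: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy xQuAD-style explicit diversification.

    rel[d]           -- P(d|q): relevance of document d to the query
    subtopic_prob[s] -- P(s|q): importance of subtopic s for the query
    coverage[d][s]   -- P(d|q,s): how well d covers subtopic s (0 if absent)
    """
    remaining = sorted(rel)  # sorted for deterministic tie-breaking
    # uncovered[s] = product over selected d' of (1 - P(d'|q,s))
    uncovered = {s: 1.0 for s in subtopic_prob}
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        def score(d: str) -> float:
            diversity = sum(
                subtopic_prob[s] * coverage[d].get(s, 0.0) * uncovered[s]
                for s in subtopic_prob
            )
            return (1 - lam) * rel[d] + lam * diversity
        best = max(remaining, key=score)
        ranking.append(best)
        remaining.remove(best)
        # Subtopics covered by the chosen document become less rewarding.
        for s in subtopic_prob:
            uncovered[s] *= 1.0 - coverage[best].get(s, 0.0)
    return ranking


rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
subtopics = {"s1": 0.5, "s2": 0.5}
coverage = {"d1": {"s1": 1.0}, "d2": {"s1": 1.0}, "d3": {"s2": 1.0}}
print(xquad_rank(rel, subtopics, coverage, k=3))  # ['d1', 'd3', 'd2']
```

Ranking by relevance alone would give `d1, d2, d3`. Because `d1` already covers `s1`, the explicit model promotes `d3`, which is the only document covering `s2`.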
Abstract:
In this paper we describe the approaches adopted to generate the runs submitted to ImageCLEFPhoto 2009, with the aim of promoting document diversity in the rankings. Four of our runs are text-based approaches that employ textual statistics extracted from image captions: MMR [1], as a state-of-the-art method for result diversification; two approaches that combine relevance information with clustering techniques; and an instantiation of the Quantum Probability Ranking Principle. The fifth run exploits visual features of the provided images to re-rank the initial results by means of Factor Analysis. The results reveal that our methods based only on text captions consistently improve the performance of the respective baselines, while the approach that combines visual features with textual statistics shows smaller improvements.
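MMR [1] greedily selects the next document by trading its similarity to the query against its maximum similarity to documents already selected. The sketch below is a minimal version. It assumes bag-of-words cosine similarity over caption terms for both similarity functions and λ = 0.5. The runs described above may have used different term weighting and parameter values, and the captions are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: str, captions: Dict[str, str], k: int, lam: float = 0.5) -> List[str]:
    """Greedy MMR: lam * sim(d, q) - (1 - lam) * max over selected s of sim(d, s)."""
    q = Counter(query.lower().split())
    vecs = {d: Counter(text.lower().split()) for d, text in captions.items()}
    remaining = sorted(vecs)  # deterministic order for tie-breaking
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr_score(d: str) -> float:
            redundancy = max((cosine(vecs[d], vecs[s]) for s in selected), default=0.0)
            return lam * cosine(vecs[d], q) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


captions = {
    "a": "red car on road",
    "b": "red car on road",       # duplicate caption of "a"
    "c": "red bicycle on grass",
}
print(mmr("red car", captions, k=2))  # ['a', 'c']
```

The duplicate image `b` matches the query as well as `a` does, but it is skipped at rank 2 because its redundancy with `a` is maximal.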
Abstract:
While the Probability Ranking Principle for Information Retrieval provides the basis for formal models, it makes a very strong assumption: that the relevance of a document is independent of the other documents. However, it has been observed that in real situations this assumption does not always hold. In this paper we propose a reformulation of the Probability Ranking Principle based on quantum theory. Quantum probability theory naturally includes interference effects between events. We posit that this interference captures the dependencies between judgements of document relevance. The outcome is a more sophisticated principle, the Quantum Probability Ranking Principle, which provides a more sensitive ranking that caters for interference/dependence between documents’ relevance.
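As the QPRP is commonly formulated, the document chosen at each rank maximises its probability of relevance plus the interference terms with every document already ranked. Each term has the form 2·sqrt(P(d)·P(d'))·cos θ. The sketch below is illustrative, not the paper's estimator. It assumes that cos θ is approximated by the negative of a similarity score between the two documents, and both the similarity values and the probabilities in the example are made up. When every cos θ is zero, the interference vanishes and the ranking reduces to the PRP.

```python
import math
from typing import Dict, List


def qprp_rank(prob: Dict[str, float], sim: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedy QPRP-style ranking.

    At each rank choose the document d that maximises
        P(d) + sum over already-ranked d' of I(d, d'),
    with interference I(d, d') = 2 * sqrt(P(d) * P(d')) * cos(theta_dd').
    ASSUMPTION: cos(theta_dd') is approximated by -sim[d][d'], so documents
    similar to those already ranked interfere destructively.
    """
    remaining = sorted(prob)  # deterministic tie-breaking
    ranked: List[str] = []
    while remaining:
        def score(d: str) -> float:
            interference = sum(
                2.0 * math.sqrt(prob[d] * prob[r]) * -sim[d][r] for r in ranked
            )
            return prob[d] + interference
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


prob = {"a": 0.8, "b": 0.7, "c": 0.5}
sim = {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 0.9, "c": 0.1}, "c": {"a": 0.1, "b": 0.1}}
print(qprp_rank(prob, sim))  # ['a', 'c', 'b']
```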
Abstract:
Semantic Space models, which provide a numerical representation of word meaning extracted from a corpus of documents, have been formalized in terms of Hermitian operators over real-valued Hilbert spaces by Bruza et al. [1]. The collapse of a word into a particular meaning has been investigated by applying the notion of quantum collapse of superpositional states [2]. While the semantic association between words in a Semantic Space can be computed by means of the Minkowski distance [3] or the cosine of the angle between the vector representations of each pair of words, a new procedure is needed to establish relations between two or more Semantic Spaces. We address the question: how can the distance between different Semantic Spaces be computed? By representing each Semantic Space as a subspace of a more general Hilbert space, the relationship between Semantic Spaces can be computed by means of the subspace distance. Such a distance needs to take into account the difference in dimensions between subspaces. The availability of a distance for comparing different Semantic Subspaces would enable a deeper understanding of the geometry of Semantic Spaces, which could translate into better effectiveness in Information Retrieval tasks.
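One subspace distance that accounts for subspaces of different dimension k and l is d(A, B) = sqrt(max(k, l) − Σᵢ Σⱼ (aᵢ·bⱼ)²). Here {aᵢ} and {bⱼ} are orthonormal bases of the two subspaces. Whether this exact formula is the one intended above is an assumption. The sketch also assumes that each Semantic Space is supplied as a list of real vectors spanning it. An orthonormal basis is built with Gram-Schmidt so the sketch needs only the standard library.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def orthonormalize(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal basis of span(vectors); dependent vectors dropped."""
    basis: List[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def subspace_distance(a_vectors: Sequence[Sequence[float]],
                      b_vectors: Sequence[Sequence[float]]) -> float:
    """sqrt(max(k, l) - sum_ij (a_i . b_j)^2) over orthonormal bases of each span."""
    a = orthonormalize(a_vectors)
    b = orthonormalize(b_vectors)
    k, l = len(a), len(b)
    overlap = sum(dot(u, v) ** 2 for u in a for v in b)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(max(k, l) - overlap, 0.0))


plane = [[1, 0, 0], [0, 1, 0]]
print(subspace_distance(plane, [[1, 0, 0]]))  # 1.0: a line inside the plane
print(subspace_distance(plane, [[0, 0, 1]]))  # about 1.414: a line orthogonal to the plane
```

A line lying inside a plane is still at distance 1 from it, reflecting the dimension mismatch. Two identical subspaces are at distance 0 regardless of the basis used to describe them.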
Abstract:
This study investigates whether, and why, assessing the relevance of clinical records for a clinical retrieval task is cognitively demanding. Previous research has highlighted the challenges that information retrieval systems face when determining the relevance of documents in this domain, e.g., the vocabulary mismatch problem. Determining whether this assessment imposes cognitive load on human assessors, and why, may shed light on the (cognitive) processes that assessors use to determine document relevance in this domain. High cognitive load may impair the user's ability to make accurate relevance judgements; hence the design of IR mechanisms may need to take this into account in order to reduce the load.
Abstract:
In this paper we propose a method that integrates the notion of understandability, as a factor of document relevance, into the evaluation of information retrieval systems for consumer health search. We consider the gain-discount evaluation framework (RBP, nDCG, ERR) and propose two understandability-based variants (uRBP) of rank-biased precision, characterised by an estimation of understandability based on document readability and by different models of how readability influences user understanding of document content. The proposed uRBP measures are empirically contrasted with RBP by comparing the system rankings obtained with each measure. The findings suggest that considering understandability along with topicality in the evaluation of information retrieval systems leads to different claims about system effectiveness than considering topicality alone.
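Rank-biased precision scores a ranking as RBP = (1 − p) Σₖ rₖ pᵏ⁻¹, where p is the user's persistence. An understandability-based variant can be sketched by scaling each gain rₖ by an understandability score uₖ in [0, 1]. The step mapping from a readability grade level to uₖ is a hypothetical stand-in, not either of the two readability models proposed in the paper. The grade levels in the example are also invented.

```python
from typing import Sequence


def rbp(relevance: Sequence[float], p: float) -> float:
    """Rank-biased precision: (1 - p) * sum_k r_k * p^(k - 1), ranks k from 1."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevance))


def urbp(relevance: Sequence[float], understandability: Sequence[float], p: float) -> float:
    """RBP with each gain r_k scaled by an understandability score u_k in [0, 1]."""
    if len(relevance) != len(understandability):
        raise ValueError("need one understandability score per ranked document")
    return (1 - p) * sum(
        r * u * p ** i for i, (r, u) in enumerate(zip(relevance, understandability))
    )


def step_understandability(grade_level: float, threshold: float = 8.0) -> float:
    """HYPOTHETICAL readability model: understandable iff grade <= threshold."""
    return 1.0 if grade_level <= threshold else 0.0


relevance = [1, 1, 0, 1]
grades = [6.0, 12.0, 7.0, 9.0]  # made-up readability grade levels
u = [step_understandability(g) for g in grades]
print(rbp(relevance, p=0.5))      # 0.8125
print(urbp(relevance, u, p=0.5))  # 0.5
```

The two relevant but hard-to-read documents at ranks 2 and 4 no longer contribute gain. This is how a topicality-only measure and an understandability-aware measure can disagree about the same ranking.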
Abstract:
This paper reports on the 2nd ShARe/CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this lab we focus on patients' information needs, as opposed to the more common campaign focus on the specialised information needs of physicians and other healthcare workers. The usage scenario of the lab is to make it easier for patients and their next of kin to understand eHealth information, in particular clinical reports. The 1st ShARe/CLEFeHealth evaluation lab was held in 2013. That lab consisted of three tasks: Task 1 focused on named entity recognition and normalization of disorders; Task 2 on normalization of acronyms/abbreviations; and Task 3 on information retrieval to address questions patients may have when reading clinical reports. This year's lab introduces a new challenge in Task 1 on visual-interactive search and exploration of eHealth data. Its aim is to help patients (or their next of kin) with readability issues related to their hospital discharge documents and with related information search on the Internet. Task 2 then continues the information extraction work of the 2013 lab, specifically focusing on disorder attribute identification and normalization from clinical text. Finally, this year's Task 3 further extends the 2013 information retrieval task by cleaning the 2013 document collection and introducing a new query generation method and multilingual queries. The de-identified clinical reports used by the three tasks came from US intensive care units and originated from the MIMIC II database. Other text documents for Tasks 1 and 3 were from the Internet and originated from the Khresmoi project. Task 2 annotations originated from the ShARe annotations. For Tasks 1 and 3, new annotations, queries, and relevance assessments were created. Interest in Tasks 1, 2, and 3 was registered by 50, 79, and 91 people, respectively. Twenty-four unique teams participated, with 1, 10, and 14 teams in Tasks 1, 2, and 3, respectively.
The teams were from Africa, Asia, Canada, Europe, and North America. The Task 1 submission, reviewed by 5 expert peers, related to the task evaluation category of "Effective use of interaction" and targeted the needs of both expert and novice users. The best system had an Accuracy of 0.868 in Task 2a, an F1-score of 0.576 in Task 2b, and Precision at 10 (P@10) of 0.756 in Task 3. The results demonstrate substantial community interest and the capabilities of these systems in making clinical reports easier for patients to understand. The organisers have made data and tools available for future research and development.
Abstract:
Background: Brominated flame retardants (BFRs) are chemicals widely used in consumer products, including electronics, vehicles, plastics and textiles, to reduce flammability. Experimental animal studies have confirmed that these compounds may interfere with thyroid hormone homeostasis and neurodevelopment, but to date health effects in humans have not been systematically examined. Objectives: To conduct a systematic review of studies on the health impacts of exposure to BFRs in humans, with a particular focus on children. Methods: A systematic review was conducted using the Medline and EMBASE electronic databases up to 1 February 2012. Published cohort, cross-sectional, and case-control studies exploring the relationship between BFR exposure and various health outcomes were included. Results: In total, 36 epidemiological studies meeting the pre-determined inclusion criteria were included. Plausible outcomes associated with BFR exposure include diabetes, neurobehavioral and developmental disorders, cancer, reproductive health effects and alteration in thyroid function. Evidence for a causal relationship between exposure to BFRs and health outcomes was evaluated within the Bradford Hill framework. Conclusion: Although there is suggestive evidence that exposure to BFRs is harmful to health, further epidemiological investigations, particularly among children, and long-term monitoring and surveillance of chemical impacts on humans are required to confirm these relationships.
Abstract:
In this paper, we have compiled and reviewed the most recent literature, published from January 2010 to December 2012, relating to the human exposure, environmental distribution, behaviour, fate and concentration time trends of polybrominated diphenyl ether (PBDE) and hexabromocyclododecane (HBCD) flame retardants, in order to establish their current trends and priorities for future study. Due to the large volume of literature included, we have provided full details of the reviewed studies as Electronic Supplementary Information and here summarise the most relevant findings. Decreasing time trends for penta-mix PBDE congeners were seen for soils in northern Europe, sewage sludge in Sweden and the USA, carp from a US river, trout from three of the Great Lakes, and in Arctic and UK marine mammals and many birds; however, increasing time trends continue in Arctic polar bears and some birds at high trophic levels in northern Europe. This is a result of the time delay inherent in long-range atmospheric transport processes. In general, concentrations of BDE209 (the major component of the deca-mix PBDE product) are continuing to increase. Of major concern is the possible (and likely) debromination of the large reservoir of BDE209 in soils and sediments worldwide to yield lower brominated congeners, which are both more mobile and more toxic; we have compiled the most recent evidence for the occurrence of this degradation process. Numerous studies reported here reinforce the importance of this future concern. Time trends for HBCDs are mixed, with both increases and decreases evident in different matrices and locations, and, notably, with increasing occurrence in birds of prey.
Abstract:
Background: Australian national biomonitoring for persistent organic pollutants (POPs) relies upon age-specific pooled serum samples to characterize central tendencies of concentrations, but does not provide estimates of upper-bound concentrations. This analysis compares population variation in biomonitoring datasets from the US, Canada, Germany, Spain, and Belgium to identify and test patterns potentially useful for estimating population upper-bound reference values for the Australian population. Methods: Arithmetic means and the ratio of the 95th percentile to the arithmetic mean (P95:mean) were assessed by survey for defined age subgroups for three polychlorinated biphenyls (PCBs 138, 153, and 180), hexachlorobenzene (HCB), p,p′-dichlorodiphenyldichloroethylene (DDE), 2,2′,4,4′-tetrabrominated diphenyl ether (PBDE 47), perfluorooctanoic acid (PFOA) and perfluorooctane sulfonate (PFOS). Results: Arithmetic mean concentrations of each analyte varied widely across surveys and age groups. However, P95:mean ratios differed only to a limited extent, with no systematic variation across ages. The average P95:mean ratios were 2.2 for the three PCBs and HCB, 3.0 for DDE, and 2.0 and 2.3 for PFOA and PFOS, respectively. The P95:mean ratio for PBDE 47 was more variable among age groups, ranging from 2.7 to 4.8. The average P95:mean ratios accurately estimated age group-specific P95s in the Flemish Environmental Health Survey II and were used to estimate the P95s for the Australian population by age group from the pooled biomonitoring data. Conclusions: Similar population variation patterns for POPs were observed across multiple surveys, even when absolute concentrations differed widely. These patterns can be used to estimate population upper bounds when only pooled sampling data are available.
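The estimation step reduces to one multiplication: the estimated P95 equals the average P95:mean ratio times the mean concentration. This sketch assumes the concentration measured in a pooled sample stands in for the arithmetic mean of the contributing individuals, which holds exactly only if each individual contributes an equal volume. The ratios come from the abstract above. The pooled value in the example is hypothetical and is not data from the study.

```python
# Average P95:mean ratios reported in the abstract above. PBDE 47 is left out
# because its ratio varied between age groups (2.7 to 4.8).
P95_TO_MEAN = {
    "PCB 138": 2.2, "PCB 153": 2.2, "PCB 180": 2.2, "HCB": 2.2,
    "DDE": 3.0, "PFOA": 2.0, "PFOS": 2.3,
}


def estimate_p95(analyte: str, mean_concentration: float) -> float:
    """Estimate the 95th percentile as (average P95:mean ratio) x mean."""
    if analyte not in P95_TO_MEAN:
        raise ValueError(f"no stable P95:mean ratio for {analyte!r}")
    return P95_TO_MEAN[analyte] * mean_concentration


# Hypothetical pooled mean for one age group (not data from the study).
print(estimate_p95("PFOS", 10.0))  # about 23 (same units as the input, e.g. ng/ml)
```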
Abstract:
Bisphenol A (BPA) is used extensively in food-contact materials and has been detected routinely in populations worldwide, and this exposure has been linked to a range of negative health outcomes in humans. There is some evidence of an association between BPA and different socioeconomic variables, which may be the result of different dietary patterns. The aim of this study was to conduct a preliminary investigation of the association between BPA and socioeconomic status in Australian children using pooled urine specimens and an area-level socioeconomic index. Surplus pathology urine specimens collected from children aged 0–15 years in Queensland, Australia, as samples of convenience (n = 469) were pooled by age, sex and area-level socioeconomic index (n = 67 pools) and analysed for total BPA using online solid-phase extraction LC-MS/MS. Concentrations ranged from 1.08 to 27.4 ng/ml, with a geometric mean of 2.57 ng/ml, and geometric mean exposure was estimated as 70.3 ng/kg/day. Neither BPA concentration nor excretion was associated with age or sex, and the authors found no evidence of an association with socioeconomic status. These results suggest either that BPA exposure is not associated with socioeconomic status in the Australian population because exposures in Australia are relatively homogeneous, or that the socioeconomic gradient in Australia is relatively slight compared with other OECD countries.
Abstract:
Some perfluoroalkyl and polyfluoroalkyl substances (PFASs) have become widespread pollutants detected in human and wildlife samples worldwide. The main objective of this study was to assess temporal trends in PFAS concentrations in human blood in Australia over the last decade (2002–2011), taking age and sex trends into consideration. Pooled human sera from 2002/03 (n = 26), 2008/09 (n = 24) and 2010/11 (n = 24) from South East Queensland, Australia, were obtained from de-identified surplus pathology samples and compared with samples collected previously in 2006/07 (n = 84). A total of 9775 samples in 158 pools were available for the assessment of PFASs. Stratification criteria included sex and age: <16 years (2002/03 only); 0–4 (2006/07, 2008/09, 2010/11); 5–15 (2006/07, 2008/09, 2010/11); and 16–30, 31–45, 46–60, and >60 years (all collection periods). Sera were analyzed using on-line solid-phase extraction coupled to high-performance liquid chromatography-isotope dilution-tandem mass spectrometry. Perfluorooctane sulfonate (PFOS) was detected at the highest concentrations, ranging from 5.3–19.2 ng/ml in 2008/09 to 4.4–17.4 ng/ml in 2010/11. Perfluorooctanoate (PFOA) was detected at the next highest concentrations, ranging from 2.8–7.3 ng/ml in 2008/09 to 3.1–6.5 ng/ml in 2010/11. All other measured PFASs were detected at concentrations <1 ng/ml, with the exception of perfluorohexane sulfonate, which ranged from 1.2–5.7 ng/ml (2008/09) to 1.4–5.4 ng/ml (2010/11). For all adult age groups, mean concentrations of both PFOS and PFOA in 2010/11 were 56% lower than in 2002/03. For 5–15 year olds, the decrease from 2002/03 to 2010/11 was 66% (PFOS) and 63% (PFOA). For 0–4 year olds, the decrease from 2006/07 (when data were first available for this age group) was 50% (PFOS) and 22% (PFOA). This study provides strong evidence for decreasing serum PFOS and PFOA concentrations in an Australian population from 2002 through 2011.
Age trends were variable, and concentrations were higher in males than in females. Global use has been in decline since around 2002, and hence primary exposure levels are expected to be decreasing. Further biomonitoring will allow assessment of PFAS exposures to confirm trends in exposure as primary and eventually secondary sources are depleted.
Abstract:
Active learning approaches reduce the annotation cost that traditional supervised approaches require to reach the same effectiveness, by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically the stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation in performance when small variations occur in the training data or in the parameters, is a major issue for machine learning models, but even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used to represent the text (i.e., morphological, syntactic, or semantic features), and c) the Gaussian prior variance, one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographic, morphological and contextual features, as a group of basic features, play an important role in learning effective models across all iterations.
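Two pieces of the setup above can be sketched without a CRF library. The first is the selection of informative instances. The query strategy shown is least confidence, a common choice for sequence models, but whether the paper used it is an assumption. It presumes the current CRF exposes the probability of its best label sequence for each unlabelled sentence; that input is hypothetical here. The second is a simple stability proxy: the spread of effectiveness across runs with slightly perturbed training data or parameters. Using the population standard deviation for this is also an assumption. All numbers in the example are invented.

```python
import statistics
from typing import Dict, List, Sequence


def least_confidence_batch(best_seq_prob: Dict[str, float], batch_size: int) -> List[str]:
    """Pick the unlabelled sentences whose most likely labelling is least certain.

    best_seq_prob[x] is a HYPOTHETICAL input: P(y*|x), the probability the current
    CRF assigns to its best (Viterbi) label sequence y* for sentence x.
    Least-confidence uncertainty is 1 - P(y*|x), so a lower P means more informative.
    """
    # Sort by ascending P(y*|x), i.e. descending uncertainty; ties broken by id.
    ranked = sorted(best_seq_prob, key=lambda x: (best_seq_prob[x], x))
    return ranked[:batch_size]


def performance_spread(scores: Sequence[float]) -> float:
    """ASSUMED stability proxy: population standard deviation of effectiveness
    (e.g. F1) across runs with slightly perturbed training data or parameters.
    Smaller values indicate a more stable model."""
    return statistics.pstdev(scores)


pool = {"s1": 0.92, "s2": 0.41, "s3": 0.63, "s4": 0.88}
print(least_confidence_batch(pool, 2))  # ['s2', 's3']
```

In an active learning loop, the selected sentences would be annotated and added to the training set. Under incremental learning the model would then be updated; under standard learning it would be retrained from scratch.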