540 results for Cross-lingual document retrieval


Relevance: 30.00%

Abstract:

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are doing useful work by preventing ineffective clusterings, which provide no useful result, from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. Although the approach can be applied to any clustering evaluation, this paper describes its use in the context of document clustering evaluation.
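
As a rough illustration of the idea in this abstract, the sketch below scores a clustering by RMSE distortion and subtracts that score from the average distortion of random clusterings with the same cluster sizes. The function names (`rmse_distortion`, `divergence_from_random`), the use of label shuffling to build the baseline, and the number of trials are assumptions made for illustration; the abstract does not specify how the random baseline is constructed.

```python
import random
from math import sqrt


def rmse_distortion(points, labels):
    """Root mean squared distance from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    squared = sum(
        sum((a - b) ** 2 for a, b in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )
    return sqrt(squared / len(points))


def divergence_from_random(points, labels, trials=20, seed=0):
    """Baseline distortion minus actual distortion (assumed formulation).

    Shuffling the labels keeps the cluster sizes but destroys any structure,
    so a value near zero suggests the clustering is no better than random.
    """
    rng = random.Random(seed)
    baseline = 0.0
    for _ in range(trials):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        baseline += rmse_distortion(points, shuffled)
    baseline /= trials
    return baseline - rmse_distortion(points, labels)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 0, 1, 1]
print(divergence_from_random(points, labels))
```

Because distortion is a lower-is-better measure, the sketch subtracts the actual score from the baseline so that larger values indicate a more useful clustering.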

Relevance: 30.00%

Abstract:

Purpose: The purpose of this study is to explore the safety climate perceptions of the multicultural nursing workforce, and to investigate the influence of diversity of the multicultural nursing workforce on clinical safety in a large tertiary hospital in Saudi Arabia. Background: Working in a multicultural environment is challenging. Each culture has its own unique characteristics and dimensions that shape the language, lifestyle, beliefs, values, customs, traditions, and patterns of behaviour, which expatriate nurses must come to terms with. However, cultural diversity in the health care environment can potentially affect the quality of care and patient safety. Method: A mixed-method case study (survey, interview and document analysis) was employed. A primary study phase entailed the administration of the Safety Climate Survey (SCS). A population sampling strategy was used and 319 nurses participated, yielding a 76.8% response rate. Descriptive and inferential statistics (Kruskal–Wallis test) were used to analyse survey data. Results: The data revealed that nurses perceived the clinical safety climate in this multicultural environment as unsafe, with a mean score of 3.9 out of 5. No significant differences in perceptions of the safety climate were found across age groups or years of nursing experience. A significant difference was observed between the national background categories of nurses and perceptions of safety climate. Conclusion: Cultural diversity within the nursing workforce could have a significant influence on perceptions of clinical safety. These findings have the potential to inform policy and practice related to cultural diversity in Saudi Arabia.

Relevance: 30.00%

Abstract:

The purpose of this study was to investigate the association between temperament in Australian infants aged 2–7 months and feeding practices of their first-time mothers (n=698). Associations between feeding practices and beliefs (Infant Feeding Questionnaire) and infant temperament (easy-difficult continuous scale from the Short Temperament Scale for Infants) were tested using linear and binary logistic regression models adjusted for a comprehensive range of covariates. Mothers of infants with a more difficult temperament reported a lower awareness of infant cues, were more likely to use food to calm and reported high concern about overweight and underweight. The covariate maternal depression score largely mirrored these associations. Infant temperament may be an important variable to consider in future research on the prevention of childhood obesity. In practice, mothers of temperamentally difficult infants may need targeted feeding advice to minimise the adoption of undesirable feeding practices.

Relevance: 30.00%

Abstract:

This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances is compared to that of purely random signatures. The comparison explains why TopSig is only competitive with state-of-the-art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.

Relevance: 30.00%

Abstract:

Search technologies are critical to enable clinical staff to rapidly and effectively access patient information contained in free-text medical records. Medical search is challenging as terms in the query are often general but those in relevant documents are very specific, leading to granularity mismatch. In this paper we propose to tackle granularity mismatch by exploiting subsumption relationships defined in formal medical domain knowledge resources. In symbolic reasoning, a subsumption (or 'is-a') relationship is a parent-child relationship where one concept is a subset of another concept. Subsumed concepts are included in the retrieval function. In addition, we investigate a number of initial methods for combining weights of query concepts and those of subsumed concepts. Subsumption relationships were found to provide strong indication of relevant information; their inclusion in retrieval functions yields performance improvements. This result motivates the development of formal models of relationships between medical concepts for retrieval purposes.
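
The subsumption idea in this abstract can be sketched as a simple query expansion. The toy 'is-a' hierarchy, the concept names, the `child_weight` parameter and the additive scoring below are all hypothetical; the abstract says several weight-combination methods were investigated but does not describe them, so this shows only one naive possibility.

```python
def subsumed_concepts(concept, children):
    """Collect every descendant of `concept` in a toy is-a hierarchy."""
    found, stack = set(), list(children.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def expand_query(query_concepts, children, child_weight=0.5):
    """Give query concepts weight 1.0 and subsumed concepts a lower weight.

    `child_weight` is an assumed, hand-picked constant.
    """
    weights = {concept: 1.0 for concept in query_concepts}
    for concept in query_concepts:
        for descendant in subsumed_concepts(concept, children):
            weights[descendant] = max(weights.get(descendant, 0.0), child_weight)
    return weights


def score(doc_concepts, weights):
    """Sum the weights of the expanded query concepts found in the document."""
    return sum(w for concept, w in weights.items() if concept in doc_concepts)


# Hypothetical hierarchy: parent concept -> directly subsumed concepts.
children = {
    "heart disease": ["myocardial infarction", "angina"],
    "angina": ["unstable angina"],
}
weights = expand_query(["heart disease"], children)
print(score({"unstable angina"}, weights))  # matched only via subsumption
```

A general query concept such as "heart disease" now matches a record that mentions only the more specific "unstable angina", which is the granularity mismatch the abstract describes.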

Relevance: 30.00%

Abstract:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevance: 30.00%

Abstract:

This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. In our graph-based concept representation, concepts are vertices in a graph built from a document, and edges represent associations between concepts. This representation naturally captures dependencies between concepts, an important requirement for interpreting medical text, and a feature lacking in bag-of-words representations. We apply existing graph-based term weighting methods to weight medical concepts. Using concepts rather than terms addresses vocabulary mismatch and encapsulates terms belonging to a single medical entity into a single concept. In addition, we extend previous graph-based approaches by injecting domain knowledge that estimates the importance of a concept within the global medical domain. Retrieval experiments on the TREC Medical Records collection show our method outperforms both term and concept baselines. More generally, this work provides a means of integrating background knowledge contained in medical ontologies into data-driven information retrieval approaches.
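
A minimal sketch of graph-based concept weighting follows. It assumes that edges link concepts co-occurring within a small window and that weights come from PageRank; the abstract does not say which graph construction or weighting method is used, and the concept labels are placeholders rather than real SNOMED CT identifiers.

```python
def build_graph(concepts, window=1):
    """Undirected graph linking concepts that co-occur within `window` positions."""
    graph = {concept: set() for concept in concepts}
    for i, concept in enumerate(concepts):
        for other in concepts[i + 1 : i + 1 + window]:
            if other != concept:
                graph[concept].add(other)
                graph[other].add(concept)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected graph."""
    n = len(graph)
    rank = {vertex: 1.0 / n for vertex in graph}
    for _ in range(iterations):
        updated = {}
        for vertex in graph:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[u] / len(graph[u]) for u in graph[vertex])
            updated[vertex] = (1 - damping) / n + damping * incoming
        rank = updated
    return rank


# Placeholder concept sequence extracted from one document.
concepts = ["A", "B", "A", "C", "A", "D"]
weights = pagerank(build_graph(concepts))
print(max(weights, key=weights.get))
```

Concept "A" is adjacent to every other concept, so it receives the highest weight, which is the kind of dependency a bag-of-words representation would not capture.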

Relevance: 30.00%

Abstract:

This paper gives an overview of the INEX 2011 Snippet Retrieval Track. The goal of the Snippet Retrieval Track is to provide a common forum for the evaluation of the effectiveness of snippets, and to investigate how best to generate snippets for search results, which should provide the user with sufficient information to determine whether the underlying document is relevant. We discuss the setup of the track, and the evaluation results.

Relevance: 30.00%

Abstract:

Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity-related information. To improve the understanding of the semantic intent of user queries, we propose a concept-based retrieval method that not only automatically identifies the semantic intent of user queries (i.e., Intent Type and Intent Modifier) but also introduces concepts represented by Wikipedia articles to user queries. We evaluate our proposed method on entity profile documents annotated by concepts from Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.

Relevance: 30.00%

Abstract:

Maternal perceptions and practices regarding child feeding have been extensively studied in the context of childhood overweight and obesity. To date, there is scant evidence on the role of fathers in child feeding. This cross-sectional study aimed to identify whether characteristics of fathers and their concerns about their children’s risk of overweight were associated with child feeding perceptions and practices. Questionnaires were used to collect data from 436 Australian fathers (mean age = 37 years, SD = 6) of a child (53% boys) aged 2-5 years (M = 3.5 years, SD = 0.9). These data included a range of demographic variables and selected subscales from the Child Feeding Questionnaire on concern about child weight, perceived responsibility for child feeding and controlling practices (pressure to eat and restriction). Multivariable linear regression was used to examine associations between demographic variables and fathers’ feeding perceptions and practices. Results indicated that fathers who were more concerned about their child becoming overweight reported higher perceived responsibility for child feeding and were more controlling of what and how much their child eats. Greater time commitment to paid work, possessing a health care card (indicative of socioeconomic disadvantage) and younger child age were associated with fathers perceiving less responsibility for feeding. Factors such as paternal BMI and education level, as well as child gender, were not associated with feeding perceptions or practices. This study contributes to the extant literature on fathers’ role in child feeding, revealing several implications for research and interventions in the child feeding field.

Relevance: 30.00%

Abstract:

Purpose: Women who experience cancer treatment-induced menopause are at risk of long-term chronic morbidity. This risk can be prevented or offset with adherence to health promotion and risk reduction guidelines. The purpose of this study was to explore health behaviours in younger female survivors of cancer and the variables (quality of life and psychological distress) believed to moderate health behaviours. Design: Cross-sectional survey of a convenience sample of women (n = 85) in southeast Queensland. Methods: Health behaviour and health status were elicited with items from the Australian Health Survey and the Behavioural Risk Factor Surveillance System. The WHO Quality of Life (Brief) measured participants’ self-reported quality of life and their satisfaction with their health. The Brief Symptom Inventory-18 measured psychological distress. Findings: Higher self-reported health status was associated with regular exercise and better quality of life. However, a substantial proportion of participants did not engage in the physical activity, dietary or cervical screening practices recommended by Australian guidelines. Conclusions: The participants require education regarding the benefits of diet, exercise, weight loss and decreased alcohol intake, as well as information on future health risks and possible comorbidities. These education sessions could be addressed by a nurse-led health promotion model of care at the time of discharge or in the community.

Relevance: 30.00%

Abstract:

This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
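
To make the scalability problem concrete, the sketch below shows the exhaustive baseline the abstract refers to: one Hamming distance calculation per signature in the collection. It is not the indexing method proposed in the paper, which the abstract does not describe; representing signatures as Python integers and the function names are assumptions for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary signatures differ."""
    return bin(a ^ b).count("1")


def exhaustive_search(query: int, signatures: list[int], k: int = 3) -> list[int]:
    """Return the indices of the k signatures closest to `query`.

    Every signature is compared with the query, so the cost grows linearly
    with the size of the collection.
    """
    ranked = sorted(
        range(len(signatures)),
        key=lambda i: hamming(query, signatures[i]),
    )
    return ranked[:k]


signatures = [0b1111, 0b0000, 0b1110, 0b0001]
print(exhaustive_search(0b1111, signatures, k=2))  # [0, 2]
```

An index that avoids visiting every signature gives up this guarantee of exact nearest neighbours, which is the speed-for-precision trade the abstract mentions.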

Relevance: 30.00%

Abstract:

Topic modelling has been widely used in fields such as information retrieval, text mining and machine learning. In this paper, we propose a novel model, the Pattern Enhanced Topic Model (PETM), which improves topic modelling by semantically representing topics with discriminative patterns. It also contributes to information filtering by using the proposed PETM to determine document relevance based on topic distributions and maximum matched patterns. Extensive experiments are conducted to evaluate the effectiveness of PETM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.

Relevance: 30.00%

Abstract:

Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users’ information needs from a collection of documents. A fundamental assumption for these approaches is that the documents in the collection are all about one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, and it has been widely utilized in fields such as machine learning and information retrieval, but its effectiveness in information filtering has not been well explored. Patterns are generally thought to be more discriminative than single terms for describing documents. However, the enormous number of discovered patterns hinders their effective and efficient use in real applications; selecting the most discriminative and representative patterns from among them therefore becomes crucial. To deal with the above-mentioned limitations and problems, this paper proposes a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM). The main distinctive features of the proposed model include: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features; and (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are proposed to estimate the document relevance to the user’s information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.

Relevance: 30.00%

Abstract:

The Quantum Probability Ranking Principle (QPRP) has been recently proposed, and accounts for interdependent document relevance when ranking. However, to be instantiated, the QPRP requires a method to approximate the "interference" between two documents. In this poster, we empirically evaluate a number of different methods of approximation on two TREC test collections for subtopic retrieval. It is shown that these approximations can lead to significantly better retrieval performance over the state of the art.
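
The following sketch shows how an interference approximation might plug into a greedy ranking. It assumes the form often given for the QPRP, in which each candidate is scored by its relevance probability plus the sum of interference terms with the documents already ranked, and it approximates interference as the negative of 2·sqrt(P(d)·P(d'))·sim(d, d') with cosine similarity. The abstract does not state which approximations were evaluated, so the similarity function, the names and the toy data are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def qprp_rank(probs, vectors):
    """Greedy ranking: relevance probability plus interference with ranked docs."""
    remaining = set(probs)
    ranking = []
    while remaining:
        def gain(doc):
            # Assumed approximation: similar documents interfere destructively.
            interference = sum(
                -2.0 * sqrt(probs[doc] * probs[seen]) * cosine(vectors[doc], vectors[seen])
                for seen in ranking
            )
            return probs[doc] + interference

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data: d1 and d2 cover the same subtopic, d3 covers a different one.
probs = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
vectors = {"d1": {"a": 1.0}, "d2": {"a": 1.0}, "d3": {"b": 1.0}}
print(qprp_rank(probs, vectors))  # ['d1', 'd3', 'd2']
```

Although d2 has a higher relevance probability than d3, it duplicates d1, so the interference term pushes it below the document that covers a new subtopic.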