154 resultados para Semantic Publishing, Linked Data, Bibliometrics, Informetrics, Data Retrieval, Citations


Relevância:

50.00% 50.00%

Publicador:

Resumo:

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and positions the file signatures model in the class of Vector Space retrieval models.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Objective: To determine whether primary care management of chronic heart failure (CHF) differed between rural and urban areas in Australia. Design: A cross-sectional survey stratified by Rural, Remote and Metropolitan Areas (RRMA) classification. The primary source of data was the Cardiac Awareness Survey and Evaluation (CASE) study. Setting: Secondary analysis of data obtained from 341 Australian general practitioners and 23 845 adults aged 60 years or more in 1998. Main outcome measures: CHF determined by criteria recommended by the World Health Organization, diagnostic practices, use of pharmacotherapy, and CHF-related hospital admissions in the 12 months before the study. Results: There was a significantly higher prevalence of CHF among general practice patients in large and small rural towns (16.1%) compared with capital city and metropolitan areas (12.4%) (P < 0.001). Echocardiography was used less often for diagnosis in rural towns compared with metropolitan areas (52.0% v 67.3%, P < 0.001). Rates of specialist referral were also significantly lower in rural towns than in metropolitan areas (59.1% v 69.6%, P < 0.001), as were prescribing rates of angiotensin-converting enzyme inhibitors (51.4% v 60.1%, P < 0.001). There was no geographical variation in prescribing rates of β-blockers (12.6% [rural] v 11.8% [metropolitan], P = 0.32). Overall, few survey participants received recommended “evidence-based practice” diagnosis and management for CHF (metropolitan, 4.6%; rural, 3.9%; and remote areas, 3.7%). Conclusions: This study found a higher prevalence of CHF, and significantly lower use of recommended diagnostic methods and pharmacological treatment among patients in rural areas.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Since manually constructing domain-specific sentiment lexicons is extremely time consuming and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain specific sentiment lexicons. More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpusbase statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2:18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

High levels of sitting have been linked with poor health outcomes. Previously a pragmatic MTI accelerometer data cut-point (100 count/min-1) has been used to estimate sitting. Data on the accuracy of this cut-point is unavailable. PURPOSE: To ascertain whether the 100 count/min-1 cut-point accurately isolates sitting from standing activities. METHODS: Participants fitted with an MTI accelerometer were observed performing a range of sitting, standing, light & moderate activities. 1-min epoch MTI data were matched to observed activities, then re-categorized as either sitting or not using the 100 count/min-1 cut-point. Self-report demographics and current physical activity were collected. Generalized estimating equation for repeated measures with a binary logistic model analyses (GEE), corrected for age, gender and BMI, were conducted to ascertain the odds of the MTI data being misclassified. RESULTS: Data were from 26 healthy subjects (8 men; 50% aged <25 years; mean BMI (SD) 22.7(3.8)m/kg2). MTI sitting and standing data mode was 0 count/min-1, with 46% of sitting activities and 21% of standing activities recording 0 count/min-1. The GEE was unable to accurately isolate sitting from standing activities using the 100 count/min-1 cut-point, since all sitting activities were incorrectly predicted as standing (p=0.05). To further explore the sensitivity of MTI data to delineate sitting from standing, the upper 95% confidence interval of the mean for the sitting activities (46 count/min-1) was used to re-categorise the data; this resulted in the GEE correctly classifying 49% of sitting, and 69% of standing activities. Using the 100 count/min-1 cut-point the data were re-categorised into a combined ‘sit/stand’ category and tested against other light activities: 88% of sit/stand and 87% of light activities were accurately predicted. Using Freedson’s moderate cut-point of 1952 count/min-1 the GEE accurately predicted 97% of light vs. 90% of moderate activities. CONCLUSION: The distributions of MTI recorded sitting and standing data overlap considerably, as such the 100 count/min -1 cut-point did not accurately isolate sitting from other static standing activities. The 100 count/min -1 cut-point more accurately predicted sit/stand vs. other movement orientated activities.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Purpose – Interactive information retrieval (IR) involves many human cognitive shifts at different information behaviour levels. Cognitive science defines a cognitive shift or shift in cognitive focus as triggered by the brain's response and change due to some external force. This paper aims to provide an explication of the concept of “cognitive shift” and then report results from a study replicating Spink's study of cognitive shifts during interactive IR. This work aims to generate promising insights into aspects of cognitive shifts during interactive IR and a new IR evaluation measure – information problem shift. Design/methodology/approach – The study participants (n=9) conducted an online search on an in-depth personal medical information problem. Data analysed included the pre- and post-search questionnaires completed by each study participant. Implications for web services and further research are discussed. Findings – Key findings replicated the results in Spink's study, including: all study participants reported some level of cognitive shift in their information problem, information seeking and personal knowledge due to their search interaction; and different study participants reported different levels of cognitive shift. Some study participants reported major cognitive shifts in various user-based variables such as information problem or information-seeking stage. Unlike Spink's study, no participant experienced a negative shift in their information problem stage or level of information problem understanding. Originality/value – This study builds on the previous study by Spink using a different dataset. The paper provides valuable insights for further research into cognitive shifts during interactive IR.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information of newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested to possibly extend its applicability to a wide variety of scenarios

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Projects funded by the Australian National Data Service(ANDS). The specific projects that were funded included: a) Greenhouse Gas Emissions Project (N2O) with Prof. Peter Grace from QUT’s Institute of Sustainable Resources. b) Q150 Project for the management of multimedia data collected at Festival events with Prof. Phil Graham from QUT’s Institute of Creative Industries. c) Bio-diversity environmental sensing with Prof. Paul Roe from the QUT Microsoft eResearch Centre. For the purposes of these projects the Eclipse Rich Client Platform (Eclipse RCP) was chosen as an appropriate software development framework within which to develop the respective software. This poster will present a brief overview of the requirements of the projects, an overview of the experiences of the project team in using Eclipse RCP, report on the advantages and disadvantages of using Eclipse and it’s perspective on Eclipse as an integrated tool for supporting future data management requirements.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques(e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Serving as a powerful tool for extracting localized variations in non-stationary signals, applications of wavelet transforms (WTs) in traffic engineering have been introduced; however, lacking in some important theoretical fundamentals. In particular, there is little guidance provided on selecting an appropriate WT across potential transport applications. This research described in this paper contributes uniquely to the literature by first describing a numerical experiment to demonstrate the shortcomings of commonly-used data processing techniques in traffic engineering (i.e., averaging, moving averaging, second-order difference, oblique cumulative curve, and short-time Fourier transform). It then mathematically describes WT’s ability to detect singularities in traffic data. Next, selecting a suitable WT for a particular research topic in traffic engineering is discussed in detail by objectively and quantitatively comparing candidate wavelets’ performances using a numerical experiment. Finally, based on several case studies using both loop detector data and vehicle trajectories, it is shown that selecting a suitable wavelet largely depends on the specific research topic, and that the Mexican hat wavelet generally gives a satisfactory performance in detecting singularities in traffic and vehicular data.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Mandatory data breach notification laws have been a significant legislative reform in response to unauthorized disclosures of personal information by public and private sector organizations. These laws originated in the state-based legislatures of the United States during the last decade and have subsequently garnered worldwide legislative interest. We contend that there are conceptual and practical concerns regarding mandatory data breach notification laws which limit the scope of their applicability, particularly in relation to existing information privacy law regimes. We outline these concerns here, in the light of recent European Union and Australian legal developments in this area.