69 results for Text similarity analysis


Relevance: 30.00%

Abstract:

This thesis addressed issues that have prevented qualitative researchers from using thematic discovery algorithms. The central hypothesis evaluated whether allowing qualitative researchers to interact with thematic discovery algorithms and incorporate domain knowledge improved their ability to address research questions and trust the derived themes. Non-negative Matrix Factorisation and Latent Dirichlet Allocation find latent themes within document collections but these algorithms are rarely used, because qualitative researchers do not trust and cannot interact with the themes that are automatically generated. The research determined the types of interactivity that qualitative researchers require and then evaluated interactive algorithms that matched these requirements. Theoretical contributions included the articulation of design guidelines for interactive thematic discovery algorithms, the development of an Evaluation Model and a Conceptual Framework for Interactive Content Analysis.

Relevance: 30.00%

Abstract:

Background: Prescription medicine samples provided by pharmaceutical companies are predominantly newer and more expensive products. The range of samples provided to practices may not represent the drugs that the doctors desire to have available. Few studies have used a qualitative design to explore the reasons behind sample use. Objective: The aim of this study was to explore the opinions of a variety of Australian key informants about prescription medicine samples, using a qualitative methodology. Methods: Twenty-three organizations involved in quality use of medicines in Australia were identified, based on the authors' previous knowledge. Each organization was invited to nominate 1 or 2 representatives to participate in semistructured interviews utilizing seeding questions. Each interview was recorded and transcribed verbatim. Leximancer v2.25 text analysis software (Leximancer Pty Ltd., Jindalee, Queensland, Australia) was used for textual analysis. The top 10 concepts from each analysis group were interrogated back to the original transcript text to determine the main emergent opinions. Results: A total of 18 key interviewees representing 16 organizations participated. Samples, patient, doctor, and medicines were the major concepts among general opinions about samples. The concept drug became more frequent and the concept companies appeared when marketing issues were discussed. The Australian Pharmaceutical Benefits Scheme and cost were more prevalent in discussions about alternative sample distribution models, indicating interviewees were cognizant of budgetary implications. Key interviewee opinions added richness to the single-word concepts extracted by Leximancer. Conclusions: Participants recognized that prescription medicine samples have an influence on quality use of medicines and play a role in the marketing of medicines. They also believed that alternative distribution systems for samples could provide benefits. The cost of a noncommercial system for distributing samples or starter packs was a concern. These data will be used to design further research investigating alternative models for distribution of samples.

Relevance: 30.00%

Abstract:

XRF spectrometry was applied to provenance studies of Iron Age pottery specimens that originated from the Mngeni river area in South Africa. Ten transition metals (Sc to Zn) were determined in 107 potsherds, excavated from four different sites. The data were subjected to a computerized mathematical technique (correspondence analysis), which was used to group the samples according to the similarity of their elemental distributions. The groupings were interpreted in terms of social or cultural interaction between the sites. © 1997 by John Wiley & Sons, Ltd.

Relevance: 30.00%

Abstract:

Provenance studies of Iron Age pottery specimens originating from the Mngeni river area in South Africa were carried out by applying XRF spectrometry. A total of sixteen major and trace elements were analysed in a batch of 107 potsherds, excavated from four different archaeological sites in the area. A multivariate statistical programme, Correspondence Analysis, was used to obtain the relevant clustering patterns according to the similarity of the elemental distributions. Differences and similarities in the clusters obtained for the major and trace elements are discussed.

Relevance: 30.00%

Abstract:

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies, the so-called Next Generation Sequencing (NGS) approaches, have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
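The idea of comparing sequences through binary signatures can be illustrated with a minimal sketch. It assumes a SimHash-style scheme over overlapping k-mers, which is one common form of locality-sensitive hashing; the abstract does not specify the signature construction actually used, and the function names, the k-mer length, and the 64-bit width are all hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def signature(seq, k=3, bits=64):
    """SimHash-style binary signature (an assumed scheme, not the paper's).

    Each k-mer is hashed to `bits` pseudo-random bits; every bit votes
    +1 or -1, and the sign of the vote total fixes the signature bit.
    """
    totals = [0] * bits
    for km in kmers(seq, k):
        digest = hashlib.sha256(km.encode()).digest()
        h = int.from_bytes(digest[:8], "big")  # 64 pseudo-random bits
        for b in range(bits):
            totals[b] += 1 if (h >> b) & 1 else -1
    sig = 0
    for b in range(bits):
        if totals[b] > 0:
            sig |= 1 << b
    return sig


def hamming_similarity(a, b, bits=64):
    """Fraction of signature bits on which two signatures agree."""
    return 1 - bin(a ^ b).count("1") / bits


# Toy protein fragments, made up for the example.
query = "MKTAYIAKQRQISFVKSHFSRQ"
other = "MKTAYIAKQRQISFVKSHFSRA"
score = hamming_similarity(signature(query), signature(other))
```

Comparing two fixed-width integers costs one XOR and a bit count, regardless of sequence length. That is the kind of saving over pairwise alignment that makes signature methods attractive on large collections.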

Relevance: 30.00%

Abstract:

Background: The purpose of this study was to estimate the incidence of fatal and non-fatal Low Speed Vehicle Run Over (LSVRO) events among children aged 0–15 years in Queensland, Australia, at a population level. Methods: Fatal and non-fatal LSVRO events that occurred in children resident in Queensland over eleven calendar years (1999–2009) were identified using ICD codes, text description, word searches and medical notes clarification, obtained from five health-related databases across the continuum of care (pre-hospital to fatality). Data were manually linked. Population data provided by the Australian Bureau of Statistics were used to calculate crude incidence rates for fatal and non-fatal LSVRO events. Results: There were 1611 LSVROs between 1999 and 2009 (IR = 16.87/100,000/annum). Incidence of non-fatal events (IR = 16.60/100,000/annum) was 61.5 times higher than that of fatal events (IR = 0.27/100,000/annum). LSVRO events were more common in boys (IR = 20.97/100,000/annum) than girls (IR = 12.55/100,000/annum), and among younger children aged 0–4 years (IR = 21.45/100,000/annum; 39% of all events) than older children (5–9 years: IR = 16.47/100,000/annum; 10–15 years: IR = 13.59/100,000/annum). A total of 896 (56.8%) children were admitted to hospital for 24 hours or more following an LSVRO event (IR = 9.38/100,000/annum). Total LSVROs increased from 1999 (IR = 14.79/100,000) to 2009 (IR = 18.56/100,000), but not significantly. Over the 11-year period, there was a slight (non-significant) increase in fatalities (IR = 0.37–0.42/100,000/annum), a significant decrease in admissions (IR = 12.39–5.36/100,000/annum), and a significant increase in non-admissions (IR = 2.02–12.77/100,000/annum). Trends over time differed by age, gender and severity. Conclusion: This is the most comprehensive, population-based epidemiological study on fatal and non-fatal LSVRO events to date. Results from this study indicate that LSVROs incur a substantial burden. Further research is required on the characteristics and risk factors associated with these events, in order to adequately inform injury prevention. Strategies are urgently required in order to prevent these events, especially among young children aged 0–4 years.
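The arithmetic behind the reported rates can be checked with a short sketch. The rates and the event count are taken from the abstract; the helper name is hypothetical, and the person-years figure is only what the quoted overall rate implies, not a number the study reports.

```python
def crude_incidence_rate(events, person_years, per=100_000):
    """Crude incidence rate: events per `per` person-years at risk."""
    return events / person_years * per


# Rates quoted in the abstract, per 100,000 children per annum.
ir_total, ir_nonfatal, ir_fatal = 16.87, 16.60, 0.27

# Ratio of non-fatal to fatal incidence (the abstract reports 61.5).
rate_ratio = ir_nonfatal / ir_fatal

# Person-years implied by 1611 events at the overall rate.
# This is an inference for illustration, not a reported figure.
implied_person_years = 1611 / ir_total * 100_000
```

The fatal and non-fatal rates sum to the overall rate (0.27 + 16.60 = 16.87), which is consistent with every event being counted in exactly one of the two categories.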

Relevance: 30.00%

Abstract:

Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and opinionated information abound on the Internet. People commonly purchase products online and post their opinions about purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining different techniques from three major areas, namely Data Mining, Natural Language Processing and Ontologies. The proposed framework sequentially mines product aspects and users’ opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users’ opinions by extracting all possible aspects and opinions from reviews using natural language, ontology, and frequent “tag” sets. The proposed framework, when compared with an existing baseline model, yielded promising results.

Relevance: 30.00%

Abstract:

Background: Maternity care reform plans have been proposed at state and national levels in Australia, but the extent to which these respond to maternity care consumers’ expressed needs is unclear. This study examines open-text survey comments to identify women’s unmet needs and priorities for maternity care. It then considers whether these needs and priorities are addressed in current reform plans. Methods: Women who had a live single or multiple birth in Queensland, Australia, in 2010 (n = 3,635) were invited to complete a retrospective self-report survey. In addition to questions about clinical and interpersonal maternity care experiences from pregnancy to postpartum, women were asked an open-ended question: “Is there anything else you’d like to tell us about having your baby?” This paper describes a detailed thematic analysis of open-ended responses from a random selection of 150 women (10% of 1,510 who responded to the question). Results: Four broad themes emerged relevant to improving women’s experiences of maternity care: quality of care (interpersonal and technical); access to choices and involvement in decision-making; unmet information needs; and dissatisfaction with the care environment. Some of these topics are reflected in current reform goals, while others provide evidence of the need for further reforms. Conclusions: The findings reinforce the importance of some existing maternity reform objectives, and describe how these might best be met. Findings affirm the importance of information provision to enable informed choices, a goal of Queensland and national reform agendas. Improvement opportunities not currently specified in reform agendas were also identified, including the quality of interpersonal relationships between women and staff, particular unmet information needs (e.g., breastfeeding), and concerns regarding the care environment (e.g., crowding and long waiting times).

Relevance: 30.00%

Abstract:

Engineers must have a deep and accurate conceptual understanding of their field, and concept inventories (CIs) are one method of assessing conceptual understanding and providing formative feedback. Current CI tests use Multiple Choice Questions (MCQ) to identify misconceptions and have undergone reliability and validity testing to assess conceptual understanding. However, they do not readily provide diagnostic information about students’ reasoning and therefore do not effectively point to specific actions that can be taken to improve student learning. We piloted the textual component of our diagnostic CI on electrical engineering students using items from the signals and systems CI. We then analysed the textual responses using automated lexical analysis software to test the effectiveness of these types of software and interviewed the students regarding their experience using the textual component. Results from the automated text analysis revealed that students held both incorrect and correct ideas for certain conceptual areas and provided indications of student misconceptions. User feedback also revealed that the inclusion of the textual component is helpful to students in assessing and reflecting on their own understanding.

Relevance: 30.00%

Abstract:

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole - term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
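The contrast between a composition-naive and a composition-aware distance can be sketched as follows. The abstract does not name the measures it evaluates; this example assumes the Aitchison distance (Euclidean distance after a centred log-ratio transform), a standard choice in Compositional Data Analysis, and all function names are hypothetical. It also assumes strictly positive parts, so zero counts would need a replacement strategy that is not shown.

```python
import math


def closure(counts):
    """Rescale positive counts so that they sum to 1 (proportions)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(parts):
    """Centred log-ratio transform; requires strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def euclidean(x, y):
    """Ordinary Euclidean distance, naive to compositional constraints."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed parts."""
    return euclidean(clr(x), clr(y))


# Two changes of the same absolute size (0.01) in term proportions.
rare_before, rare_after = [0.01, 0.49, 0.50], [0.02, 0.48, 0.50]
common_before, common_after = [0.30, 0.20, 0.50], [0.31, 0.19, 0.50]

naive_rare = euclidean(rare_before, rare_after)
naive_common = euclidean(common_before, common_after)
aware_rare = aitchison(rare_before, rare_after)
aware_common = aitchison(common_before, common_after)
```

The Euclidean distance treats both changes as equal, while the Aitchison distance is much larger for the rare term, whose proportion doubled. The log-ratio form is also unaffected by document length: raw counts and their proportions give the same distance.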

Relevance: 30.00%

Abstract:

Full-resolution 3D Ground-Penetrating Radar (GPR) data were combined with high-resolution hydraulic conductivity (K) data from vertical Direct-Push (DP) profiles to characterize a portion of the highly heterogeneous MAcro Dispersion Experiment (MADE) site. This is an important first step to better understand the influence of aquifer heterogeneities on observed anomalous transport. Statistical evaluation of DP data indicates non-normal distributions that have much higher similarity within each GPR facies than between facies. The analysis of GPR and DP data provides high-resolution estimates of the 3D geometry of hydrostratigraphic zones, which can then be populated with stochastic K fields. The lack of such estimates has been a significant limitation for testing and parameterizing a range of novel transport theories at sites where the traditional advection-dispersion model has proven inadequate.

Relevance: 30.00%

Abstract:

Efficient error-Propagating Block Chaining (EPBC) is a block cipher mode intended to simultaneously provide both confidentiality and integrity protection for messages. Mitchell’s analysis pointed out a weakness in the EPBC integrity mechanism that can be used in a forgery attack. This paper identifies and corrects a flaw in Mitchell’s analysis of EPBC, and presents other attacks on the EPBC integrity mechanism.

Relevance: 30.00%

Abstract:

Background: The requirement for dual screening of titles and abstracts to select papers to examine in full text can create a huge workload, not least when the topic is complex and a broad search strategy is required, resulting in a large number of results. An automated system to reduce this burden, while still assuring high accuracy, has the potential to provide huge efficiency savings within the review process. Objectives: To undertake a direct comparison of manual screening with a semi-automated process (priority screening) using a machine classifier. The research is being carried out as part of the current update of a population-level public health review. Methods: Authors have hand-selected studies for the review update, in duplicate, using the standard Cochrane Handbook methodology. A retrospective analysis, simulating a quasi-‘active learning’ process (whereby a classifier is repeatedly trained based on ‘manually’ labelled data), will be completed, using different starting parameters. Tests will be carried out to see how far different training sets, and the size of the training set, affect the classification performance; i.e. what percentage of papers would need to be manually screened to locate 100% of those papers included as a result of the traditional manual method. Results: From a search retrieval set of 9555 papers, authors excluded 9494 papers at title/abstract and 52 at full text, leaving 9 papers for inclusion in the review update. The ability of the machine classifier to reduce the percentage of papers that need to be manually screened to identify all the included studies, under different training conditions, will be reported. Conclusions: The findings of this study will be presented along with an estimate of any efficiency gains for the author team if the screening process can be semi-automated using text mining methodology, along with a discussion of the implications for text mining in screening papers within complex health reviews.

Relevance: 30.00%

Abstract:

This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: to what degree corpus-based word association networks (CANs) can approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time-consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations, corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well-known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.
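The difference between a symmetric and an asymmetric vector measure can be shown with a small sketch. The cosine function is standard. The projection-based function is only a hypothetical stand-in: the abstract does not give the formula of its asymmetric measure, so the coefficient below (the scalar multiple of the target vector obtained by orthogonal projection) should not be read as the authors' definition. The word vectors are made up for the example.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine of the angle between x and y; symmetric in its arguments."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def projection_coefficient(x, onto):
    """Coefficient c such that c * onto is the orthogonal projection of x.

    Hypothetical illustration of an asymmetric measure: swapping the
    arguments changes the result whenever the vector norms differ.
    """
    return dot(x, onto) / dot(onto, onto)


# Toy two-dimensional word vectors, invented for the example.
cue = [1.0, 0.0]
target = [2.0, 2.0]

sym_forward = cosine(cue, target)
sym_backward = cosine(target, cue)
asym_forward = projection_coefficient(cue, target)   # 2 / 8 = 0.25
asym_backward = projection_coefficient(target, cue)  # 2 / 1 = 2.0
```

Human free association is itself directional: one word may readily evoke another without the reverse holding. A measure that distinguishes the two directions can represent this, which cosine cannot.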

Relevance: 30.00%

Abstract:

Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn from concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).