973 results for Compressed text search
Abstract:
An increasing number of people seek health advice on the web using search engines; this poses challenging problems for current search technologies. In this paper we report an initial study of the effectiveness of current search engines in retrieving relevant information for diagnostic medical circumlocutory queries, i.e., queries issued by people seeking information about their health condition using a description of the symptoms they observe (e.g. hives all over body) rather than the medical term (e.g. urticaria). Queries of this type frequently arise when people are unfamiliar with a domain or language, and they are common among health information seekers attempting to self-diagnose or self-treat. Our analysis reveals that current search engines are not equipped to effectively satisfy such information needs; this can have potentially harmful outcomes for people’s health. Our results advocate for more research in developing information retrieval methods to support such complex information needs.
Abstract:
Background: There is evidence that family and friends influence children's decisions to smoke.
Objectives: To assess the effectiveness of interventions to help families stop children starting smoking.
Search methods: We searched 14 electronic bibliographic databases, including the Cochrane Tobacco Addiction Group specialized register, MEDLINE, EMBASE, PsycINFO, CINAHL unpublished material, and key articles' reference lists. We performed free-text internet searches and targeted searches of appropriate websites, and hand-searched key journals not available electronically. We consulted authors and experts in the field. The most recent search was 3 April 2014. There were no date or language limitations.
Selection criteria: Randomised controlled trials (RCTs) of interventions with children (aged 5-12) or adolescents (aged 13-18) and families to deter tobacco use. The primary outcome was the effect of the intervention on the smoking status of children who reported no use of tobacco at baseline. Included trials had to report outcomes measured at least six months from the start of the intervention.
Data collection and analysis: We reviewed all potentially relevant citations and retrieved the full text to determine whether the study was an RCT and matched our inclusion criteria. Two authors independently extracted study data for each RCT and assessed them for risk of bias. We pooled risk ratios using a Mantel-Haenszel fixed effect model.
Main results: Twenty-seven RCTs were included. The interventions were very heterogeneous in the components of the family intervention, the other risk behaviours targeted alongside tobacco, the age of children at baseline and the length of follow-up. Two interventions were tested by two RCTs, one was tested by three RCTs and the remaining 20 distinct interventions were tested only by one RCT. Twenty-three interventions were tested in the USA, two in Europe, one in Australia and one in India.
The control conditions fell into two main groups: no intervention or usual care; or school-based interventions provided to all participants. These two groups of studies were considered separately. Most studies had a judgement of 'unclear' for at least one risk of bias criterion, so the quality of evidence was downgraded to moderate. Although there was heterogeneity between studies there was little evidence of statistical heterogeneity in the results. We were unable to extract data from all studies in a format that allowed inclusion in a meta-analysis. There was moderate quality evidence that family-based interventions had a positive impact on preventing smoking when compared to a no intervention control. Nine studies (4810 participants) reporting smoking uptake amongst baseline non-smokers could be pooled, but eight studies with about 5000 participants could not be pooled because of insufficient data. The pooled estimate detected a significant reduction in smoking behaviour in the intervention arms (risk ratio [RR] 0.76, 95% confidence interval [CI] 0.68 to 0.84). Most of these studies used intensive interventions. Estimates for the medium and low intensity subgroups were similar but confidence intervals were wide. Two studies in which some of the 4487 participants already had smoking experience at baseline did not detect evidence of effect (RR 1.04, 95% CI 0.93 to 1.17). Eight RCTs compared a combined family plus school intervention to a school intervention only. Of the three studies with data, two RCTs with outcomes for 2301 baseline never smokers detected evidence of an effect (RR 0.85, 95% CI 0.75 to 0.96) and one study with data for 1096 participants not restricted to never users at baseline also detected a benefit (RR 0.60, 95% CI 0.38 to 0.94). The other five studies with about 18,500 participants did not report data in a format allowing meta-analysis.
One RCT also compared a family intervention to a school 'good behaviour' intervention and did not detect a difference between the two types of programme (RR 1.05, 95% CI 0.80 to 1.38, n = 388). No studies identified any adverse effects of intervention.
Authors' conclusions: There is moderate quality evidence to suggest that family-based interventions can have a positive effect on preventing children and adolescents from starting to smoke. There were more studies of high intensity programmes compared to a control group receiving no intervention, than there were for other comparisons. The evidence is therefore strongest for high intensity programmes used independently of school interventions. Programmes typically addressed family functioning, and were introduced when children were between 11 and 14 years old. Based on this moderate quality evidence a family intervention might reduce uptake or experimentation with smoking by between 16 and 32%. However, these findings should be interpreted cautiously because effect estimates could not include data from all studies. Our interpretation is that the common feature of the effective high intensity interventions was encouraging authoritative parenting (which is usually defined as showing strong interest in and care for the adolescent, often with rule setting). This is different from authoritarian parenting (do as I say) or neglectful or unsupervised parenting.
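The "between 16 and 32%" range quoted in this abstract appears to follow from the confidence interval of the pooled risk ratio (0.68 to 0.84), since the relative reduction implied by a risk ratio is 1 minus RR. A minimal Python sketch of that arithmetic; the function and variable names are illustrative and do not come from the review.

```python
def relative_reduction_pct(risk_ratio: float) -> float:
    """Relative reduction, in percent, implied by a risk ratio below 1."""
    return (1.0 - risk_ratio) * 100.0


# Pooled estimate and 95% confidence interval as reported in the abstract.
point, ci_low, ci_high = 0.76, 0.68, 0.84

# The upper CI bound gives the smallest reduction, the lower bound the largest.
smallest = relative_reduction_pct(ci_high)
largest = relative_reduction_pct(ci_low)
central = relative_reduction_pct(point)

print(f"reduction between {smallest:.0f}% and {largest:.0f}% (point estimate {central:.0f}%)")
```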
Abstract:
Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data.
Design: Systematic review.
Data sources: The electronic databases searched included PubMed, CINAHL, MEDLINE, Google Scholar, and ProQuest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique.
Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data.
Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed.
Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed.
Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field.
Abstract:
Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is a promising approach that can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization model achieves a 10-fold cross-validation (10 CV) accuracy of 0.93.
Abstract:
This thesis presents a promising boundary setting method for solving challenging issues in text classification to produce an effective text classifier. A classifier must identify the boundary between classes optimally. However, after the features are selected, the boundary is still unclear with regard to mixed positive and negative documents. A classifier combination method to boost the effectiveness of the classification model is also presented. The experiments carried out in the study demonstrate that the proposed classifier is promising.
Abstract:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: To what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well-known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.
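The contrast between a symmetric and an asymmetric association measure can be sketched in a few lines of Python. The cosine function is the standard one; `projection_strength` is only a hypothetical stand-in for a projection-based measure, since the abstract does not give the article's exact formula.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Symmetric measure: cosine(a, b) equals cosine(b, a)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def projection_strength(cue, target):
    """Hypothetical asymmetric score (an assumption, not the article's formula):
    signed length of the orthogonal projection of `target` onto `cue`,
    divided by the length of `cue`."""
    return dot(cue, target) / dot(cue, cue)


# Toy word vectors, invented for illustration.
word_a = [2.0, 0.0]
word_b = [1.0, 1.0]

sym_ab, sym_ba = cosine(word_a, word_b), cosine(word_b, word_a)
asym_ab = projection_strength(word_a, word_b)
asym_ba = projection_strength(word_b, word_a)
print(sym_ab, sym_ba, asym_ab, asym_ba)
```

Swapping the arguments leaves the cosine unchanged but changes the projection score, which is the property that lets an asymmetric measure model the fact that a cue may evoke a target more strongly than the reverse.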
Abstract:
The literacy demands of tables and graphs are different from those of prose texts such as narrative. This paper draws from part of a qualitative case study which sought to investigate strategies that scaffold and enhance the teaching and learning of varied representations in text. As indicated in the paper, the method focused on the teaching and learning of tables and graphs using Freebody and Luke's (1990) four resources model from literacy education.
Abstract:
Traditional text classification technology based on machine learning and data mining techniques has made great progress. However, it remains difficult to draw an exact decision boundary between relevant and irrelevant objects in binary classification because of the uncertainty produced by the traditional algorithms. The proposed model CTTC (Centroid Training for Text Classification) aims to build an uncertainty boundary that absorbs as many indeterminate objects as possible, so as to raise the certainty of the relevant and irrelevant groups through a centroid clustering and training process. The clustering starts from the two training subsets, labelled as relevant and irrelevant respectively, to create two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through a subsequent iterative multi-learning process, and all of them collaboratively help predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it can display the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), which is an important standard dataset in the field. The experiment results show that the proposed model has significantly improved the binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
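The initial three-way split into POS, NEG and BND can be illustrated with a rough Python sketch. It is not the CTTC algorithm itself: the use of cosine similarity, the `margin` parameter and all names are assumptions made for illustration, and the iterative centroid training is omitted.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def assign_group(doc, c_pos, c_neg, margin=0.1):
    """Assign a document to POS, NEG, or the boundary group BND.

    `margin` is a hypothetical threshold: documents whose similarity to the
    two centroids differs by no more than `margin` are treated as indeterminate.
    """
    diff = cosine(doc, c_pos) - cosine(doc, c_neg)
    if diff > margin:
        return "POS"
    if diff < -margin:
        return "NEG"
    return "BND"


# Toy training subsets, invented for illustration.
relevant = [[1.0, 0.0], [0.9, 0.1]]
irrelevant = [[0.0, 1.0], [0.1, 0.9]]
c_pos, c_neg = centroid(relevant), centroid(irrelevant)

groups = [assign_group(d, c_pos, c_neg) for d in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0])]
print(groups)
```

The document that lies equally close to both centroids lands in BND rather than being forced into one of the two classes, which is the role the abstract gives to the uncertainty boundary.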
Abstract:
Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn from concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
Abstract:
In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
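For context, nearest-neighbour search over locality-sensitive hashes is typically measured in Hamming distance between bit signatures. The Python sketch below shows only the exhaustive baseline scan that approximate methods aim to avoid; it is not the paper's approach, which the abstract does not describe, and the signatures are invented.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded signatures."""
    return bin(a ^ b).count("1")


def nearest(query: int, signatures, k: int = 1):
    """Indices of the k signatures closest to `query`, by exhaustive scan."""
    ranked = sorted(range(len(signatures)), key=lambda i: hamming(query, signatures[i]))
    return ranked[:k]


# Toy 8-bit signatures, invented for illustration.
signatures = [0b10110010, 0b10110011, 0b01001101]
query = 0b10110111

top_two = nearest(query, signatures, k=2)
print(top_two)
```

The scan touches every signature, so its cost grows linearly with the collection size; that cost is what motivates approximate search at Web scale.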
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.
Abstract:
Despite extensive literature on female mate choice, empirical evidence on women’s mating preferences in the search for a sperm donor is scarce, even though this search, by isolating a male’s genetic impact on offspring from other factors like paternal investment, offers a naturally “controlled” research setting. In this paper, we work to fill this void by examining the rapidly growing online sperm donor market, which is raising new challenges by offering women novel ways to seek out donor sperm. We not only identify individual factors that influence women’s mating preferences but find strong support for the proposition that behavioural traits (inner values) are more important in these choices than physical appearance (exterior values). We also report evidence that physical factors matter more than resources or other external cues of material success, perhaps because the relevance of good character in donor selection is part of a female psychological adaptation throughout evolutionary history. The lack of evidence on a preference for material resources, on the other hand, may indicate the ability of peer socialization and better access to resources to rapidly shape the female decision process. Overall, the paper makes useful contributions to both the literature on human behaviour and that on decision-making in extreme and highly important situations.
Abstract:
Monte-Carlo Tree Search (MCTS) is a heuristic for searching large trees. We apply it to argumentative puzzles where MCTS pursues the best argumentation with respect to a set of arguments to be argued. To make our ideas as widely applicable as possible, we integrate MCTS into an abstract setting for argumentation where the content of arguments is left unspecified. Experimental results show the pertinence of this integration for learning argumentations by comparing it with a basic reinforcement learning approach.
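The abstract does not say which selection rule the authors use. A common choice in MCTS is UCB1 (the UCT variant), which balances a child's average reward against how rarely it has been visited. A minimal Python sketch under that assumption; the function names, the exploration constant and the statistics are illustrative.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node; unvisited children are tried first."""
    if visits == 0:
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits):
    """Index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs.
    """
    return max(range(len(children)), key=lambda i: ucb1(*children[i], parent_visits))


# Invented statistics: the second child has a higher mean reward and fewer visits.
visited_only = select_child([(6.0, 10), (3.0, 4)], parent_visits=14)
with_unvisited = select_child([(6.0, 10), (3.0, 4), (0.0, 0)], parent_visits=14)
print(visited_only, with_unvisited)
```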
Abstract:
Particle swarm optimization (PSO), a new population-based algorithm, has recently been used on multi-robot systems. Although this algorithm has been applied to many optimization problems as well as multi-robot systems, it has some drawbacks when it is applied to multi-robot search systems that must find a target in a search space containing big static obstacles. One of these defects is premature convergence: when the particles of basic PSO are spread across a search space, they tend over time to converge in a small area. This shortcoming is also evident in a multi-robot search system, particularly when big static obstacles in the search space prevent the robots from finding the target easily; over time the robots converge to a small area that may not contain the target and become trapped there. Another shortcoming is that basic PSO cannot guarantee the global convergence of the algorithm. In other words, particles initially explore different areas, but in some cases they are not good at exploiting promising areas, which increases the search time. This study proposes a method based on the PSO technique for a multi-robot system to find a target in a search space containing big static obstacles. This method not only overcomes the premature convergence problem but also establishes an efficient balance between exploration and exploitation and guarantees global convergence, reducing the search time by combining PSO with a local search method such as A-star. To validate the effectiveness and usefulness of the algorithms, a simulation environment has been developed for conducting simulation-based experiments in different scenarios and for reporting experimental results. These experimental results have demonstrated that the proposed method is able to overcome the premature convergence problem and guarantee global convergence.
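The premature convergence described here can be seen in the textbook PSO update rule. The Python sketch below shows one update step of basic PSO, not the proposed method; the coefficient values and names are assumptions chosen for illustration.

```python
import random


def pso_step(position, velocity, personal_best, global_best, rng,
             inertia=0.7, c1=1.5, c2=1.5):
    """One basic PSO update for a single particle; returns (position, velocity)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # Inertia term plus attraction to the personal and global bests.
        v_next = inertia * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


rng = random.Random(0)

# A particle away from both bests is pulled towards them.
moved_pos, moved_vel = pso_step([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], rng)

# A resting particle whose position equals both bests never moves again.
stuck_pos, stuck_vel = pso_step([2.0, 2.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0], rng)
print(moved_pos, stuck_pos, stuck_vel)
```

The second call shows the entrapment: once a particle sits at its personal and global best with zero velocity, every term of the update vanishes, so it stays put even if the target lies elsewhere.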
Abstract:
This report identifies the outcomes of a program evaluation of the five-year Workplace Health and Safety Strategy (2012-2017), specifically the engagement component within the Queensland Ambulance Service (QAS). As part of the former Department of Community Safety, its objective was to work towards harmonising occupational health and safety policies and processes to improve the workplace culture. The report examines and assesses the process paths and resource inputs into the strategy, provides feedback on progress towards achieving identified goals, and identifies opportunities for improvement and barriers to progress. Consultations were held with key stakeholders within QAS and focus groups were facilitated with managers and health and safety representatives of each Local Area Service Network.