283 results for Compressed text search
Abstract:
Textual document sets have become an important and rapidly growing information source on the web. Text classification is one of the crucial technologies for information organisation and management, and it has attracted wide attention from researchers in different fields. In this paper, feature selection methods, implementation algorithms and applications of text classification are introduced first. However, because the knowledge extracted by current data-mining techniques for text classification contains much noise, considerable uncertainty arises in the classification process, from both knowledge extraction and knowledge usage. More innovative techniques and methods are therefore needed to improve the performance of text classification. Further improving the process of knowledge extraction, and effectively utilising the extracted knowledge, is a critical and challenging step. A Rough Set decision-making approach is proposed that uses Rough Set decision techniques to classify more precisely the textual documents that are difficult to separate with classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies; to demonstrate the Rough Set concepts and the decision-making approach based on Rough Set theory for building a more reliable and effective text classification framework with higher precision; to set up an innovative evaluation metric named CEI, which is very effective for the performance assessment of similar research; and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other related fields.
Abstract:
Topic modelling has been widely used in the fields of information retrieval, text mining, machine learning, etc. In this paper, we propose a novel model, the Pattern Enhanced Topic Model (PETM), which improves topic modelling by semantically representing topics with discriminative patterns, and also contributes to information filtering by using the proposed PETM to determine document relevance based on the topic distribution and the maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
A significant minority of young job-seekers remain unemployed for many months, and are at risk of developing depression. Both empirical studies and theoretical models suggest that cognitive, behavioural and social isolation factors interact to increase this risk. Thus, interventions that reduce or prevent depression in young unemployed job-seekers by boosting their resilience are required. Mobile phones may be an effective medium to deliver resilience-boosting support to young unemployed people by using SMS messages to interrupt the feedback loop of depression and social isolation. Three focus groups were conducted to explore young unemployed job-seekers’ attitudes to receiving and requesting regular SMS messages that would help them to feel supported and motivated while job-seeking. Participants reacted favourably to this proposal, and thought that it would be useful to continue to receive and request SMS messages for a few months after commencing employment as well.
Abstract:
The aim of this research is to report initial experimental results and evaluation of a clinician-driven automated method that can address the issue of misdiagnosis from unstructured radiology reports. Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and vast amounts of manual processing of unstructured information, an accurate point-of-care diagnosis is often difficult. A rule-based method that considers the occurrence of clinician-specified keywords related to radiological findings was developed to identify limb abnormalities, such as fractures. A dataset containing 99 narrative reports of radiological findings was sourced from a tertiary hospital. The rule-based method achieved an F-measure of 0.80 and an accuracy of 0.80. While our method achieves promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) techniques.
Abstract:
This paper presents the prototype of an information retrieval system for medical records that utilises visualisation techniques, namely word clouds and timelines. The system simplifies and assists information-seeking tasks within the medical domain. Access to patient medical information can be time-consuming, as it requires practitioners to review a large number of electronic medical records to find relevant information. Presenting a summary of the content of a medical document by means of a word cloud may permit information seekers to decide upon the relevance of a document to their information need in a simple and time-effective manner. We extend this intuition by mapping word clouds of electronic medical records onto a timeline, to provide temporal information to the user. This allows exploring word clouds in the context of a patient’s medical history. To enhance the presentation of word clouds, we also provide the means for calculating aggregations and differences between patients’ word clouds.
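The aggregation and difference operations mentioned above can be illustrated with term-frequency counters, which are the data a word cloud is typically drawn from. This is only a minimal sketch: the function names, the stop-word list and the sample records are assumptions made for illustration, and the abstract does not say how the prototype actually computes these operations.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "with", "on"}

def word_cloud(text):
    """Return term frequencies for one record (the data behind a word cloud)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def aggregate(clouds):
    """Sum term frequencies across several word clouds."""
    total = Counter()
    for cloud in clouds:
        total.update(cloud)
    return total

def difference(cloud_a, cloud_b):
    """Terms that are more frequent in cloud_a than in cloud_b."""
    # Counter subtraction keeps only terms with a positive remaining count.
    return cloud_a - cloud_b

# Two invented records standing in for entries on a patient timeline.
visit_1 = word_cloud("Fracture of the left wrist with swelling")
visit_2 = word_cloud("Left wrist healing, no swelling, cast removed")
combined = aggregate([visit_1, visit_2])
only_in_first = difference(visit_1, visit_2)
```

Here `combined` holds the summed counts for both visits, and `only_in_first` keeps the terms that dropped out between the first and second visit.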
Abstract:
Objective: To develop and evaluate machine learning techniques that identify limb fractures and other abnormalities (e.g. dislocations) from radiology reports. Materials and Methods: 99 free-text reports of limb radiology examinations were acquired from an Australian public hospital. Two clinicians were employed to identify fractures and abnormalities from the reports; a third senior clinician resolved disagreements. These assessors found that, of the 99 reports, 48 referred to fractures or abnormalities of limb structures. Automated methods were then used to extract features from these reports that could be useful for their automatic classification. The Naive Bayes classification algorithm and two implementations of the support vector machine algorithm were formally evaluated using cross-fold validation over the 99 reports. Results: The Naive Bayes classifier accurately identifies fractures and other abnormalities from the radiology reports. These results were achieved when extracting stemmed token bigram and negation features, as well as using these features in combination with SNOMED CT concepts related to abnormalities and disorders. The latter feature has not been used in previous works that attempted classifying free-text radiology reports. Discussion: Automated classification methods have proven effective at identifying fractures and other abnormalities from radiology reports (F-Measure up to 92.31%). Key to the success of these techniques are features such as stemmed token bigrams, negations, and SNOMED CT concepts associated with morphologic abnormalities and disorders. Conclusion: This investigation shows early promising results and future work will further validate and strengthen the proposed approaches.
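To make the feature types named above more concrete, the following sketch extracts token bigrams and simple negation features from a report. It is an illustration under stated assumptions, not the authors' pipeline: the negation cue list, the three-token negation window, the feature naming and the pluggable `stem` function (identity by default, where a real stemmer such as Porter's could be supplied) are all hypothetical, and SNOMED CT concept mapping is left out.

```python
import re

# Assumed negation cues; the abstract does not list the ones actually used.
NEGATION_CUES = {"no", "not", "without"}

def extract_features(report, stem=lambda token: token, window=3):
    """Return a set of bigram and negation features for one report.

    `stem` is a placeholder hook for a real stemmer; `window` is the
    assumed number of tokens after a cue that are treated as negated.
    """
    tokens = [stem(t) for t in re.findall(r"[a-z]+", report.lower())]
    features = set()
    # Bigram features: every pair of adjacent (stemmed) tokens.
    for first, second in zip(tokens, tokens[1:]):
        features.add(f"bigram={first}_{second}")
    # Negation features: tokens that follow a negation cue within the window.
    for i, token in enumerate(tokens):
        if token in NEGATION_CUES:
            for negated in tokens[i + 1 : i + 1 + window]:
                features.add(f"neg={negated}")
    return features

features = extract_features("No fracture seen")
```

Feature sets of this kind could then be fed to a classifier such as Naive Bayes, as the abstract describes.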
Abstract:
Review(s) of: Settling the Pop Score: Pop Texts and Identity Politics, Stan Hawkins, Aldershot, Hants.: Ashgate, 2002, ISBN 0 7546 0352 0; pb, 234pp, ill, music exx, bibl., discog., index. The scholarly study of popular music has its origins in sociology and cultural studies, disciplinary areas in which musical meaning is often attributed to aspects of economical and sociological function. Against this tradition, recent writers have offered what is now referred to as ‘popular musicology’: a method or approach that tends towards a specific engagement with ‘pop texts’ on aesthetic, and perhaps even ‘musical’ terms. Stan Hawkins uses the term popular musicology ‘at his own peril,’ clearly recognising the implicit scholarly danger in his approach, whereby ‘formalist questions of musical analysis’ are dealt with ‘alongside the more intertextual discursive theorisations of musical expression’ (p. xii). In other words, popular musicologists dare to tread that fine line between text and context. As editor of the journal Popular Musicology Online, Hawkins is a leading advocate of this practice, specifically in the application of music-analytical techniques to popular music. His methodology attests to the influence of other leading figures in the area, notably Richard Middleton, Allan F. Moore and Derek Scott (general editor of the Ashgate Popular and Folk Music Series in which this book is published).
Abstract:
Background: Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and a vast amount of manual processing of unstructured information, accurate point-of-care diagnosis is often difficult. Aims: The aim of this research is to report an initial experimental evaluation of a clinician-informed automated method for addressing initial misdiagnoses associated with delayed receipt of unstructured radiology reports. Method: A method was developed that resembles clinical reasoning for identifying limb abnormalities. The method consists of a gazetteer of keywords related to radiological findings; it classifies an X-ray report as abnormal if the report contains evidence listed in the gazetteer. A set of 99 narrative reports of radiological findings was sourced from a tertiary hospital. Reports were manually assessed by two clinicians and discrepancies were validated by a third expert ED clinician; the final manual classification generated by the expert ED clinician was used as ground truth to empirically evaluate the approach. Results: The automated method, which attempts to identify limb abnormalities by searching for keywords expressed by clinicians, achieved an F-measure of 0.80 and an accuracy of 0.80. Conclusion: While the automated clinician-driven method achieved promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) and machine learning techniques.
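The gazetteer rule and the two reported measures can be sketched as follows. This is a minimal illustration and not the published method: the keyword set, the function names and the four sample reports are invented, and the sketch deliberately ignores negation, so a report stating "no fracture" is still flagged as abnormal.

```python
import re

# Hypothetical gazetteer; the clinicians' actual keyword list is not given.
GAZETTEER = {"fracture", "dislocation", "displaced"}

def is_abnormal(report, gazetteer=GAZETTEER):
    """Classify a report as abnormal if it contains any gazetteer keyword."""
    tokens = set(re.findall(r"[a-z]+", report.lower()))
    return bool(tokens & gazetteer)

def evaluate(predictions, truth):
    """Return (F-measure, accuracy) for boolean predictions against truth."""
    pairs = list(zip(predictions, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f_measure, accuracy

# Invented reports and labels, for illustration only.
reports = [
    "Fracture of distal radius",
    "Normal alignment",
    "No fracture seen",
    "Dislocation of shoulder",
]
ground_truth = [True, False, False, True]
predictions = [is_abnormal(r) for r in reports]
f_measure, accuracy = evaluate(predictions, ground_truth)
```

The third sample report is a false positive for plain keyword matching, which is the kind of error that negation handling would be expected to reduce.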
Abstract:
This study investigates how the interaction of institutional market orientation and external search breadth influences the ability to use absorptive capacity to raise the level of corporate entrepreneurship. The findings from a sample of 331 supplier companies providing products and services to the mining industry of Australia and Iran indicate that the positive association between absorptive capacity and corporate entrepreneurship is stronger for companies with greater external knowledge search breadth. Moreover, operating in a less market-oriented institutional context, such as Iran, diminishes a firm’s ability to utilise its absorptive capacity to raise its level of corporate entrepreneurship. Yet firms operating in such contexts are able to overcome the disadvantages posed by their institutional context by engaging in a broader external search for knowledge.
Abstract:
Searching for relevant peer-reviewed material is an integral part of the work of corporate and academic researchers. Researchers collect huge amounts of information over the years and sometimes struggle to organize it. Based on a study with 30 academic researchers, we explore, in combination, different searching and archiving activities for document-based information. Based on our results, we provide several implications for design.
Abstract:
This thesis developed new search engine models that elicit the meaning behind the words found in documents and queries, rather than simply matching keywords. These new models were applied to searching medical records: an area where search is particularly challenging yet can have significant benefits to our society.