95 results for Text summarization
Abstract:
A big challenge for text classification is the noisiness of text data, which lowers classification quality. Many classification processes can be divided into two sequential steps: scoring and threshold setting (thresholding). Therefore, to deal with the noisy-data problem, it is important to describe positive features effectively in scoring and to set a suitable threshold. Most existing text classifiers do not concentrate on these two tasks. In this paper, we propose a novel text classifier with pattern-based scoring that describes positive features effectively, followed by threshold setting. The thresholding is based on the scores of the training set, making it simple to apply to other scoring methods. Experiments show that our pattern-based classifier is promising.
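As a rough illustration of the scoring-then-thresholding pipeline described above (our reading, not the paper's actual algorithm), the sketch below mines frequent term-set patterns from positive training documents, scores a document by the summed support of the patterns it contains, and sets the threshold from training-set scores. The function names, the midpoint threshold rule, and the toy data are all our assumptions.

```python
# A minimal sketch of pattern-based scoring followed by thresholding.
# Assumptions (ours, not the paper's): patterns are term subsets occurring
# in at least min_support positive documents, and a document's score is
# the summed support of the patterns it contains.
from collections import Counter
from itertools import combinations

def mine_patterns(positive_docs, min_support=2, max_len=2):
    """Count term subsets (up to max_len terms) across positive documents."""
    counts = Counter()
    for doc in positive_docs:
        terms = sorted(set(doc.split()))
        for n in range(1, max_len + 1):
            for pattern in combinations(terms, n):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def score_document(doc, patterns):
    """Sum the supports of the mined patterns contained in the document."""
    terms = set(doc.split())
    return sum(sup for pat, sup in patterns.items() if set(pat) <= terms)

def fit_threshold(docs, labels, patterns):
    """One simple thresholding rule based only on training-set scores:
    the midpoint between mean positive and mean negative scores."""
    pos = [score_document(d, patterns) for d, y in zip(docs, labels) if y]
    neg = [score_document(d, patterns) for d, y in zip(docs, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

docs = ["mining text patterns", "text patterns score", "football match result",
        "patterns in text mining", "weather forecast rain"]
labels = [1, 1, 0, 1, 0]
patterns = mine_patterns([d for d, y in zip(docs, labels) if y])
theta = fit_threshold(docs, labels, patterns)
print([score_document(d, patterns) >= theta for d in docs])
# -> [True, True, False, True, False]
```

Because the threshold depends only on training-set scores, the same thresholding step can be reused with any other scoring method, which is the portability the abstract claims.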
Abstract:
Much has been written on Michel Foucault’s reluctance to clearly delineate a research method, particularly with respect to genealogy (Harwood 2000; Meadmore, Hatcher, & McWilliam 2000; Tamboukou 1999). Foucault (1994, p. 288) himself disliked prescription, stating, “I take care not to dictate how things should be”, and wrote provocatively to disrupt equilibrium and certainty, so that “all those who speak for others or to others” no longer know what to do. It is doubtful, however, that Foucault ever intended for researchers to be stricken by that malaise to the point of being unwilling to make an intellectual commitment to methodological possibilities. Taking criticism of “Foucauldian” discourse analysis as a convenient point of departure to discuss the objectives of poststructural analyses of language, this paper develops what might be called a discursive analytic: a methodological plan to approach the analysis of discourses through the location of statements that function with constitutive effects.
Abstract:
It is a big challenge to find useful associations in databases for users' specific needs. The essential issue is how to provide efficient methods for describing meaningful associations and pruning false or meaningless discoveries. One major obstacle is the overwhelmingly large volume of discovered patterns. This paper discusses an alternative approach, called multi-tier granule mining, to improve frequent association mining. Rather than using patterns, it uses granules to represent knowledge implicitly contained in databases. It also uses multi-tier structures and association mappings to represent association rules in terms of granules. Consequently, association rules can be quickly accessed, and meaningless association rules can be justified according to the association mappings. Moreover, the proposed structure is also a precise compression of patterns that can restore the original supports. The experimental results show that the proposed approach is promising.
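To make the granule idea concrete, here is a minimal sketch under our own simplifying assumptions: a granule collects the records that share the same values on a set of attributes, a two-tier split separates condition attributes from a decision attribute, and the association mapping between the tiers yields rule supports exactly (the granules partition the data, so original supports are recoverable). The table and all names are invented for illustration, not taken from the paper.

```python
# Hedged sketch of granules and a two-tier association mapping.
from collections import defaultdict

records = [
    {"age": "young", "income": "low",  "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old",   "income": "high", "buys": "yes"},
    {"age": "old",   "income": "high", "buys": "yes"},
    {"age": "old",   "income": "low",  "buys": "no"},
]

def granules(rows, attrs):
    """Map each distinct value combination on attrs to the set of row ids."""
    g = defaultdict(set)
    for i, r in enumerate(rows):
        g[tuple(r[a] for a in attrs)].add(i)
    return g

cond = granules(records, ["age", "income"])   # condition tier
dec = granules(records, ["buys"])             # decision tier

# Association mapping: how each condition granule distributes over decision
# granules. Supports are exact because granules partition the records.
for cg, rows_c in cond.items():
    for dg, rows_d in dec.items():
        both = rows_c & rows_d
        if both:
            print(f"{cg} -> {dg}: support={len(both)}/{len(records)}, "
                  f"confidence={len(both) / len(rows_c):.2f}")
```

Rules are read directly off granule intersections rather than enumerated as patterns, which is the compression the abstract refers to.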
Abstract:
Text categorisation is challenging due to the complex structure of documents, with heterogeneous, changing topics. The performance of text categorisation relies on the quality of samples, the effectiveness of document features, and the topic coverage of categories, depending on the strategies employed: supervised or unsupervised, single-labelled or multi-labelled. To deal with these reliability issues in text categorisation, we propose an unsupervised multi-labelled text categorisation approach that maps the local knowledge in documents to global knowledge in a world ontology to optimise the categorisation result. The conceptual framework of the approach consists of three modules: pattern mining for feature extraction, feature-subject mapping for categorisation, and concept generalisation for optimised categorisation. The approach has been evaluated, with promising results, by comparison with typical text categorisation methods, based on ground truth encoded by human experts.
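The following toy sketch illustrates the feature-subject mapping and concept generalisation steps under our own assumptions: document features are matched against subject-indicative terms, and each matched subject is then lifted one level up a tiny hand-made hierarchy that stands in for the world ontology. The data, names, and one-level generalisation rule are illustrative only.

```python
# Toy sketch: map features to subjects, then generalise up a hierarchy.
parent = {                      # child -> parent in a toy subject hierarchy
    "machine learning": "computer science",
    "databases": "computer science",
    "computer science": "science",
    "genetics": "biology",
    "biology": "science",
}
subject_terms = {               # subject -> indicative feature terms
    "machine learning": {"classifier", "training", "features"},
    "databases": {"query", "index", "transaction"},
    "genetics": {"gene", "dna", "mutation"},
}

def map_features_to_subjects(features):
    """Score each subject by overlap with the document's feature terms."""
    return {s: len(terms & features) for s, terms in subject_terms.items()
            if terms & features}

def generalise(subjects):
    """Lift each matched subject one level up the hierarchy."""
    return {parent.get(s, s) for s in subjects}

features = {"classifier", "training", "query", "gene"}
matched = map_features_to_subjects(features)
print(matched)              # {'machine learning': 2, 'databases': 1, 'genetics': 1}
print(generalise(matched))  # {'computer science', 'biology'}
```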
Abstract:
In a classification problem we typically face two challenging issues: the diverse characteristics of negative documents, and the fact that many negative documents are close to positive documents. It is therefore hard for a single classifier to clearly classify incoming documents into classes. This paper proposes a novel gradual problem-solving approach that creates a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented "fine tuning" that concentrates on minimizing the number of false positive documents by applying pattern (statistical phrase) mining techniques. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experiments show that our statistical-phrase-based two-stage classifier is promising.
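The sketch below illustrates the two-stage flow under stated assumptions: stage one applies a Rocchio prototype with a deliberately permissive (recall-oriented) similarity threshold, and stage two applies a stricter, precision-oriented score to the survivors, with a simple count of known positive terms standing in for the paper's pattern mining and thresholding. Weights, thresholds, and data are our choices, not the paper's.

```python
# Minimal sketch of a recall-then-precision two-stage classifier.
import math
from collections import Counter

def tf_vector(doc):
    return Counter(doc.split())

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_centroid(pos_docs, neg_docs, alpha=1.0, beta=0.25):
    """Classic Rocchio prototype: positive centroid minus scaled negative."""
    c = Counter()
    for d in pos_docs:
        for t, f in tf_vector(d).items():
            c[t] += alpha * f / len(pos_docs)
    for d in neg_docs:
        for t, f in tf_vector(d).items():
            c[t] -= beta * f / len(neg_docs)
    return c

pos = ["mining text patterns", "text pattern scoring"]
neg = ["football results today", "rain forecast tomorrow"]
proto = rocchio_centroid(pos, neg)

incoming = ["pattern mining for text", "football and rain", "text about rain"]
# Stage 1 (recall-oriented): a low threshold removes only reliable negatives.
stage1 = [d for d in incoming if cosine(tf_vector(d), proto) > 0.05]
# Stage 2 (precision-oriented): a stricter score on the survivors; counting
# known positive terms stands in here for pattern-based scoring.
positive_terms = {"mining", "text", "pattern", "patterns", "scoring"}
stage2 = [d for d in stage1 if len(positive_terms & set(d.split())) >= 2]
print(stage1)  # ['pattern mining for text', 'text about rain']
print(stage2)  # ['pattern mining for text']
```

Note how "text about rain" survives the permissive first stage but is removed by the precision-oriented second stage, which is the gradual refinement the abstract describes.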
Abstract:
We propose a cluster ensemble method that maps corpus documents into the semantic space embedded in Wikipedia and groups them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e. document-term, document-concept, and document-category. A final clustering solution is obtained by exploiting associations between document pairs and the hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.
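One common way to realise such an ensemble, sketched here purely for illustration, is a co-association matrix: each relation type yields one base clustering, document pairs are weighted by how often they co-cluster across the views, and the final solution links pairs that agree in a majority of views. The hand-written base clusterings stand in for real clusterers, and the paper's hubness weighting is omitted from this sketch.

```python
# Co-association style ensemble over three relation types (views).
from itertools import combinations

docs = ["d0", "d1", "d2", "d3", "d4"]
base_clusterings = [
    [0, 0, 1, 1, 1],   # document-term view
    [0, 0, 1, 1, 0],   # document-concept view
    [0, 0, 0, 1, 1],   # document-category view
]

# Co-association: fraction of views in which a pair shares a cluster.
coassoc = {}
for i, j in combinations(range(len(docs)), 2):
    coassoc[(i, j)] = sum(c[i] == c[j] for c in base_clusterings) / len(base_clusterings)

# Final solution: connected components over pairs agreeing in > half the views.
parent = list(range(len(docs)))
def find(x):
    while parent[x] != x:
        x = parent[x]
    return x
for (i, j), w in coassoc.items():
    if w > 0.5:
        parent[find(i)] = find(j)
clusters = {}
for i in range(len(docs)):
    clusters.setdefault(find(i), []).append(docs[i])
print(list(clusters.values()))   # [['d0', 'd1'], ['d2', 'd3', 'd4']]
```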
Abstract:
Reliability of the performance of biometric identity verification systems remains a significant challenge. Individual biometric samples of the same person (identity class) are not identical at each presentation, and performance degradation arises from intra-class variability and inter-class similarity. These limitations lead to false accepts and false rejects that are dependent: it is difficult to reduce the rate of one type of error without increasing the other. The focus of this dissertation is to investigate a method, based on classifier fusion techniques, to better control the trade-off between the verification errors, using text-dependent speaker verification as the test platform. A sequential classifier fusion architecture that integrates multi-instance and multi-sample fusion schemes is proposed. This fusion method enables a controlled trade-off between false alarms and false rejects. For statistically independent classifier decisions, analytical expressions for each type of verification error are derived from the base classifier performances. As this assumption may not always be valid, these expressions are modified to incorporate the correlation between statistically dependent decisions from clients and impostors. The architecture is empirically evaluated for text-dependent speaker verification, using Hidden Markov Model based, digit-dependent speaker models in each stage, with multiple attempts for each digit utterance. The trade-off between the verification errors is controlled using two parameters, the number of decision stages (instances) and the number of attempts at each decision stage (samples), fine-tuned on an evaluation/tuning set. The statistical validity of the derived expressions for the error estimates is evaluated on test data. The performance of the sequential method is further shown to depend on the order in which the digits (instances) are combined and on the nature of the repeated attempts (samples). The false rejection and false acceptance rates for the proposed fusion are estimated using the base classifier performances, the variance in correlation between classifier decisions, and a sequence of classifiers with favourable dependence selected using the 'Sequential Error Ratio' criterion. The error rates are better estimated by incorporating user-dependent information (such as speaker-dependent thresholds and speaker-specific digit combinations) and class-dependent information (such as client-impostor dependent favourable combinations and class-error based threshold estimation). The proposed architecture is desirable in most speaker verification applications, such as remote authentication and telephone and internet shopping. The tuning of the parameters, the number of instances and samples, serves both the security and user-convenience requirements of speaker-specific verification. The architecture investigated here is applicable to verification using other biometric modalities such as handwriting, fingerprints and keystrokes.
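For intuition about the independence-based error expressions, the sketch below uses a standard reading of such a cascade (our assumption; the dissertation's exact fusion logic and its correlation-corrected expressions are not reproduced here): a claimant must pass every one of n decision stages, and within a stage any one of m attempts suffices. It shows how the two tuning parameters pull the false acceptance and false rejection rates in opposite directions.

```python
# Error rates for an AND-over-stages / OR-over-samples cascade, assuming
# statistically independent decisions (a textbook simplification).

def stage_errors(far, frr, m):
    """Per-stage rates with m independent attempts, any one sufficing."""
    stage_far = 1 - (1 - far) ** m   # impostor passes if any attempt passes
    stage_frr = frr ** m             # client rejected only if all attempts fail
    return stage_far, stage_frr

def cascade_errors(far, frr, n, m):
    """System rates for n independent stages, all of which must be passed."""
    s_far, s_frr = stage_errors(far, frr, m)
    system_far = s_far ** n              # impostor must slip through every stage
    system_frr = 1 - (1 - s_frr) ** n    # client fails if any stage rejects
    return system_far, system_frr

# Trade-off: more stages push FAR down and FRR up; more samples do the reverse.
base_far, base_frr = 0.05, 0.05
for n in (1, 2, 3):
    for m in (1, 2):
        far, frr = cascade_errors(base_far, base_frr, n, m)
        print(f"stages={n} samples={m}  FAR={far:.5f}  FRR={frr:.5f}")
```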
Abstract:
Managing large cohorts of undergraduate student nurses during off-campus clinical placement is complex and challenging. Clinical facilitators are required to support and assess nursing students during clinical placement, so clear communication between university academic coordinators and clinical facilitators is essential for consistency and prompt management of emerging issues. Increasing work demands require both coordinators and facilitators to have an efficient and effective mode of communication. The aim of this study was to explore the use of Short Message Service (SMS) texts, sent between mobile phones, for communication between university Unit Coordinators and off-campus Clinical Facilitators. This study used an after-only design. During a two-week clinical placement, 46 clinical facilitators working with first- and second-year Bachelor of Nursing students from a large metropolitan Australian university were regularly sent SMS texts of relevant updates and reminders from the university coordinator. A 15-item questionnaire, comprising 12 five-point Likert scale items and 3 open-ended questions, was then used to survey the clinical facilitators. The response rate was 47.8% (n = 22). Correlations were found between the approachability of the coordinator and facilitator perceptions that a) the coordinator understood issues on clinical placement (r = 0.785, p < 0.001) and b) they were part of the teaching team (r = 0.768, p < 0.001). Analysis of responses to the qualitative questions revealed three themes: connection, approachability and collaboration. Results indicate that SMS communication is convenient and appropriate in this setting. This quasi-experimental after-only study found that regular SMS communication improves a sense of connection, approachability and collaboration.
Abstract:
Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have illustrated that more than 80% of users prefer personalized search results. As a result, many studies have devoted a great deal of effort (referred to as collaborative filtering) to investigating personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary.

Researchers have increasingly proposed the utilization of ontology-based techniques to improve current mining approaches. These techniques are not only able to refine search intentions within specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles according to discovered user background knowledge. The knowledge is drawn from both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge.

This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs by a "bag of concepts" rather than "words". The concepts are gathered from a general world knowledge base named the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and learn personalized ontologies as user profiles. The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs by a set of acknowledged concepts. Along with the global and local analyses, a solid concept matching approach is carried out to address the mismatch between local information and world knowledge. Relevance features, produced by the Relevance Feature Discovery model, are used as representatives of local information. These features have been proven to be the best alternative to user queries for avoiding ambiguity, and consistently outperform the features extracted by other filtering models. The two proposed approaches are both evaluated scientifically on the standard Reuters Corpus Volume 1 test set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the deploying Pattern Taxonomy Model, and an ontology-based model. The results indicate that top precision can be improved remarkably with the proposed ontology mining approach, and that the matching approach is successful, achieving significant improvements on most information filtering measures.

This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The findings have the potential to facilitate the design of advanced preference mining models that impact people's daily lives.
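As a toy illustration of the concept matching step, the sketch below scores each global subject concept by the summed weights of the local relevance features that overlap with its indicative terms. The three headings stand in for the Library of Congress Subject Headings, and the feature weights are invented rather than produced by the Relevance Feature Discovery model.

```python
# Hedged sketch: match weighted local features to global subject concepts.
subject_headings = {
    "Data mining": {"data", "mining", "knowledge", "discovery"},
    "Information retrieval": {"information", "retrieval", "search", "query"},
    "Machine learning": {"learning", "classification", "training"},
}

# Local information: relevance features with weights (values made up here).
relevance_features = {"mining": 0.9, "patterns": 0.7, "search": 0.5, "query": 0.4}

def match_concepts(features, headings, top_k=2):
    """Rank global concepts by the total weight of overlapping features."""
    scores = {h: sum(features[t] for t in terms & set(features))
              for h, terms in headings.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(h, s) for h, s in ranked if s > 0][:top_k]

# The matched concepts would seed a personalized ontology (user profile).
print(match_concepts(relevance_features, subject_headings))
# -> [('Data mining', 0.9), ('Information retrieval', 0.9)]
```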
Abstract:
Background The prevalence of type 2 diabetes is rising internationally. Patients with diabetes have a higher risk of cardiovascular events, accounting for substantial premature morbidity and mortality, and health care expenditure. Given healthcare workforce limitations, there is a need to improve interventions that promote positive self-management behaviours enabling patients to manage their chronic conditions effectively, across different cultural contexts. Previous studies have evaluated the feasibility of including telephone and Short Message Service (SMS) follow-up in chronic disease self-management programs, but only for single diseases or in one specific population. Therefore, the aim of this study is to evaluate the feasibility and short-term efficacy of incorporating telephone and text messaging to support the care of patients with diabetes and cardiac disease, in Australia and in Taiwan. Methods/design A randomised controlled trial design will be used to evaluate a self-management program for people with diabetes and cardiac disease that incorporates the use of simple remote-access communication technologies. A sample of 180 participants from Australia and Taiwan will be recruited and randomised in a one-to-one ratio to receive either the intervention in addition to usual care (intervention) or usual care alone (control). The intervention will consist of in-hospital education as well as follow-up utilising personal telephone calls and SMS reminders. Primary short-term outcomes of interest include self-care behaviours and self-efficacy, assessed at baseline and at four weeks. Discussion If the results of this investigation substantiate the feasibility and efficacy of the telephone and SMS intervention for promoting self-management among patients with diabetes and cardiac disease in Australia and Taiwan, it will support the external validity of the intervention. It is anticipated that empirical data from this investigation will provide valuable information to inform future international collaborations, while providing a platform for further enhancements of the program, which has the potential to benefit patients internationally.
Abstract:
This article examines manual textual categorisation by human coders, with the hypothesis that the law of total probability may be violated for difficult categories. An empirical evaluation was conducted, using crowdsourcing, to compare a one-step categorisation task with a two-step categorisation task. It was found that the law of total probability was violated. Both quantum and classical probabilistic interpretations of this violation are presented. Further studies are required to resolve whether quantum models are more appropriate for this task.
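For reference, the identity under test has the standard form below, where B_1, ..., B_n are the mutually exclusive first-step categories of the two-step task and C is the final category; the reported violation is a nonzero deviation of the directly elicited one-step probability from the classical marginal. This is the textbook form of the law, not notation taken from the article.

```latex
% Law of total probability, compared across the one-step and two-step tasks:
P(C) \;=\; \sum_{i=1}^{n} P(C \mid B_i)\, P(B_i)

% The observed violation is a nonzero interference-like term:
\delta \;=\; P_{\text{one-step}}(C) \;-\; \sum_{i=1}^{n} P(C \mid B_i)\, P(B_i) \;\neq\; 0
```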
Abstract:
Purpose Contrast adaptation has been speculated to be an error signal for emmetropization, and myopic children exhibit higher contrast adaptation than emmetropic children. This study aimed to determine whether contrast adaptation varies with the type of text viewed by emmetropic and myopic young adults. Methods Baseline contrast sensitivity was determined in 25 emmetropic and 25 spectacle-corrected myopic young adults for 0.5, 1.2, 2.7, 4.4, and 6.2 cycles per degree (cpd) horizontal sine wave gratings. The adults spent periods looking at a 6.2 cpd high-contrast horizontal grating and reading lines of English and Chinese text (both texts comprised a 1.2 cpd row frequency and a 6 cpd stroke frequency). The effects of these near tasks on contrast sensitivity were determined, with decreases in sensitivity indicating contrast adaptation. Results Contrast adaptation was affected by the near task (F2,672 = 43.0; P < 0.001). Adaptation was greater for the grating task (0.13 ± 0.17 log unit, averaged across all frequencies) than for the reading tasks, but there was no significant difference between the two reading tasks (English 0.05 ± 0.13 log unit versus Chinese 0.04 ± 0.13 log unit). The myopic group showed significantly greater adaptation (by 0.04, 0.04, and 0.05 log units for the English, Chinese, and grating tasks, respectively) than the emmetropic group (F1,48 = 5.0; P = 0.03). Conclusions In young adults, reading Chinese text induced contrast adaptation similar to that induced by reading English text. Myopes exhibited greater contrast adaptation than emmetropes. Contrast adaptation, independent of text type, might be associated with myopia development.
Abstract:
Background Managing large student cohorts can be a challenge for the university academics coordinating these units. Bachelor of Nursing programmes have the added challenge of managing multiple groups of students and clinical facilitators while students complete clinical placement. Clear, time-efficient and effective communication between coordinating academics and clinical facilitators is needed to ensure consistency between student and teaching groups and prompt management of emerging issues. Methods This study used a descriptive survey to explore the use of text messaging via mobile phone, sent from coordinating academics to off-campus clinical facilitators, as an approach to providing direction and support. Results The response rate was 47.8% (n = 22). Correlations were found between the approachability of the coordinating academic and clinical facilitator perceptions that a) the coordinating academic understood issues on clinical placement (r = 0.785, p < 0.001) and b) they were part of the teaching team (r = 0.768, p < 0.001). Analysis of responses to the qualitative questions revealed three themes: connection, approachability and collaboration. Conclusions This study demonstrates that the use of regular text messages improves communication between coordinating academics and clinical facilitators. Findings suggest improved connection, approachability and collaboration between the coordinating academic and clinical facilitation staff.
Abstract:
Textual document sets have become an important and rapidly growing information source on the web, and text classification is one of the crucial technologies for information organisation and management. It has become more and more important and has attracted wide attention from researchers in different research fields. In this paper, feature selection methods, implementation algorithms, and applications of text classification are first introduced. However, because there is much noise in the knowledge extracted by current data-mining techniques for text classification, much uncertainty arises in the classification process, from both knowledge extraction and knowledge usage; therefore, more innovative techniques and methods are needed to improve the performance of text classification. Further improving the process of knowledge extraction and the effective utilization of the extracted knowledge remains a critical and challenging step. A Rough Set decision-making approach is proposed, which uses Rough Set decision techniques to more precisely classify textual documents that are difficult to separate with classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies; to demonstrate Rough Set concepts and the decision-making approach based on Rough Set theory for building a more reliable and effective text classification framework with higher precision; to set up an innovative evaluation metric named CEI, which is effective for performance assessment in similar research; and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other related fields.
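To ground the Rough Set terminology, here is a minimal sketch on invented data: documents that share the same coarse feature values are indiscernible, a category's lower approximation contains the documents certainly in it, the upper approximation those possibly in it, and the boundary region between the two is exactly the hard-to-separate case the paper targets.

```python
# Minimal Rough Set sketch: lower/upper approximations of a category.
from collections import defaultdict

# Each document described by coarse feature values, plus its true category.
docs = {
    "d1": (("sports", 1), "sport"),
    "d2": (("sports", 1), "sport"),
    "d3": (("sports", 0), "politics"),
    "d4": (("sports", 1), "politics"),   # indiscernible from d1/d2, other label
}

# Equivalence classes of the indiscernibility relation.
blocks = defaultdict(set)
for name, (features, _) in docs.items():
    blocks[features].add(name)

target = {n for n, (_, label) in docs.items() if label == "sport"}

# Lower approximation: blocks entirely inside the category (certain members).
lower = set().union(*(b for b in blocks.values() if b <= target))
# Upper approximation: blocks overlapping the category (possible members).
upper = set().union(*(b for b in blocks.values() if b & target))

print("lower approximation:", lower)           # certainly 'sport'
print("upper approximation:", upper)           # possibly 'sport'
print("boundary (uncertain):", upper - lower)  # needs finer features or rules
```

Here d4 is indiscernible from d1 and d2 yet carries a different label, so the lower approximation is empty and all three fall into the boundary region, the situation where Rough Set decision rules are meant to help.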