881 results for Text linguistics
Abstract:
Reflective writing is an important learning task for fostering reflective practice, but even when assessed it is rarely analysed or critically reviewed, owing to its subjective and affective nature. We propose a process for capturing subjective and affective analytics based on the identification and recontextualisation of anomalous features within reflective text. We evaluate two human-supervised trials of the process, demonstrating the potential for an automated Anomaly Recontextualisation process for Learning Analytics.
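The abstract does not detail the anomaly-identification step, so the following is only a minimal sketch, assuming a generic TF-IDF representation and an off-the-shelf outlier detector (scikit-learn's IsolationForest); the sentences are invented for illustration.

```python
# A minimal sketch of identifying anomalous features in reflective text,
# assuming a generic outlier-detection approach; the authors' actual method
# is not detailed in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

sentences = [
    "Today we covered linear regression in the lecture.",
    "I felt completely lost and frustrated during the group task.",
    "The tutorial exercises were straightforward.",
]

# Represent each sentence as a TF-IDF vector.
vectors = TfidfVectorizer().fit_transform(sentences).toarray()

# Flag sentences whose vectors are outliers relative to the rest of the text;
# flagged sentences would then be recontextualised (e.g. shown to a reviewer).
detector = IsolationForest(contamination=0.34, random_state=0)
labels = detector.fit_predict(vectors)  # -1 marks an anomaly

for sentence, label in zip(sentences, labels):
    if label == -1:
        print("anomalous:", sentence)
```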
Abstract:
Background: The requirement for dual screening of titles and abstracts to select papers to examine in full text can create a substantial workload, not least when the topic is complex and a broad search strategy is required, yielding a large number of results. An automated system that reduces this burden while still assuring high accuracy has the potential to provide considerable efficiency savings within the review process. Objectives: To undertake a direct comparison of manual screening with a semi-automated process (priority screening) using a machine classifier. The research is being carried out as part of the current update of a population-level public health review. Methods: Authors have hand-selected studies for the review update, in duplicate, using the standard Cochrane Handbook methodology. A retrospective analysis, simulating a quasi-'active learning' process (whereby a classifier is repeatedly retrained on 'manually' labelled data), will be completed using different starting parameters. Tests will be carried out to see how the choice and size of the training set affect classification performance; i.e. what percentage of papers would need to be manually screened to locate 100% of the papers included as a result of the traditional manual method. Results: From a search retrieval set of 9555 papers, authors excluded 9494 papers at title/abstract and 52 at full text, leaving 9 papers for inclusion in the review update. The ability of the machine classifier to reduce the percentage of papers that must be manually screened to identify all the included studies, under different training conditions, will be reported. Conclusions: The findings of this study will be presented along with an estimate of any efficiency gains for the author team if the screening process can be semi-automated using text-mining methodology, together with a discussion of the implications of text mining for screening papers within complex health reviews.
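As a rough illustration of the priority-screening simulation described above, the sketch below assumes TF-IDF features and a logistic-regression classifier; the study's actual classifier, seed size and batch size are not specified here, so all names and parameters are placeholders.

```python
# A minimal sketch of a simulated priority-screening ('active learning') loop.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def screened_fraction_to_find_all(abstracts, labels, seed_size=200, batch=100):
    """Simulate priority screening: repeatedly retrain on the records screened
    so far, then 'screen' the highest-ranked unscreened records next. Returns
    the fraction of records screened when the last include has been found.
    Assumes the first seed_size records contain at least one include and one
    exclude (otherwise the classifier cannot be trained)."""
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    y = np.asarray(labels)                      # 1 = included, 0 = excluded
    screened = list(range(seed_size))           # initial manually screened set
    remaining = list(range(seed_size, len(y)))
    while set(np.flatnonzero(y)) - set(screened):
        clf = LogisticRegression(max_iter=1000).fit(X[screened], y[screened])
        ranked = np.argsort(clf.predict_proba(X[remaining])[:, 1])[::-1]
        picked = {remaining[i] for i in ranked[:batch]}  # most likely includes
        screened.extend(picked)
        remaining = [i for i in remaining if i not in picked]
    return len(screened) / len(y)
```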
Abstract:
Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is promising, as it can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches, and the impact of parameter settings on the classification of a medical text dataset is discussed. With the right choice of the dimension k, the Non-negative Matrix Factorization method achieves a 10-fold cross-validation accuracy of 0.93.
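A minimal sketch of an NMF-based pipeline of the kind the abstract describes, assuming TF-IDF features and a standard supervised learner on the k-dimensional factors; the paper's learning-enhancement process is not reproduced, and `narratives`/`codes` are placeholder names for the medical text dataset.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def nmf_classifier(k):
    """Factorise TF-IDF features into k latent dimensions, then classify."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        NMF(n_components=k, init="nndsvd", random_state=0),
        LogisticRegression(max_iter=1000),
    )

# 10-fold cross-validation for a chosen dimension k, mirroring the "10-fold
# cross-validation accuracy" reported above.
# scores = cross_val_score(nmf_classifier(k=50), narratives, codes, cv=10)
```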
Abstract:
This thesis presents a promising boundary-setting method for solving challenging issues in text classification and producing an effective text classifier. A classifier must identify the boundary between classes optimally; however, even after the features are selected, the boundary remains unclear where positive and negative documents are mixed. A classifier combination method that boosts the effectiveness of the classification model is also presented. The experiments carried out in the study demonstrate that the proposed classifier is promising.
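The abstract does not specify the combination method, so the following is only a generic sketch of classifier combination via soft voting over three standard text classifiers; all estimator choices are assumptions rather than the thesis's actual design.

```python
# A minimal sketch of a classifier-combination approach for text, assuming
# soft voting; the thesis's boundary-setting method is not reproduced here.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

combined = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", SVC(probability=True)),
        ],
        voting="soft",  # average predicted probabilities across classifiers
    ),
)
# combined.fit(train_texts, train_labels); combined.predict(test_texts)
```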
Abstract:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: to what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics? Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious and time consuming, and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations, corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well-known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs approximate the FANs to an encouraging degree.
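To make the symmetric/asymmetric distinction concrete, here is a minimal sketch contrasting the cosine metric with one plausible projection-based measure; the article's exact orthogonal-projection formula may differ, so the second function should be read as an illustrative assumption.

```python
import numpy as np

def cosine(u, v):
    """Symmetric: cosine(u, v) == cosine(v, u)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def projection_strength(u, v):
    """Asymmetric (assumed form): length of u's orthogonal projection onto v,
    scaled by ||v||, so swapping the arguments generally changes the value."""
    return (u @ v) / (np.linalg.norm(v) ** 2)

u = np.array([3.0, 1.0])
v = np.array([1.0, 0.0])
print(cosine(u, v), cosine(v, u))                             # equal
print(projection_strength(u, v), projection_strength(v, u))   # differ
```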
Abstract:
This new volume, Exploring with Grammar in the Primary Years (Exley, Kervin & Mantei, 2014), follows on from Playing with Grammar in the Early Years (Exley & Kervin, 2013). We extend our thanks to the ALEA membership for their take-up of the first volume and the vibrant conversations around our first attempt at developing a pedagogy for the teaching of grammar in the early years. Your engagement at locally held ALEA events has motivated us to complete this second volume and reassert our interest in the pursuit of socially-just outcomes in the primary years. As noted in Exley and Kervin (2013), we believe that mastering a range of literacy competences includes not only the technical skills for learning, but also the resources for viewing and constructing the world (Freire and Macedo, 1987). Rather than seeing knowledge about language as the accumulation of technical skills alone, the viewpoint to which we subscribe treats knowledge about language as a dialectic that evolves from, is situated in, and contributes to active participation within a social arena (Halliday, 1978). We acknowledge that to explore is to engage in processes of discovery as we look closely and examine the opportunities before us. As such, we draw on Janks’ (2000; 2014) critical literacy theory to underpin many of the learning experiences in this text. Janks (2000) argues that effective participation in society requires knowledge about how the power of language promotes the views, beliefs and values of certain groups to the exclusion of others. Powerful language users can identify not only how readers are positioned by these views, but also the ways these views are conveyed through the design of the text, that is, the combination of vocabulary, syntax, image, movement and sound. Similarly, powerful designers of texts can make careful modal choices in written and visual design to promote certain perspectives that position readers and viewers in new ways to consider more diverse points of view. As the title of our text suggests, our activities are designed to support learners in exploring the design of texts to achieve certain purposes and to consider the potential for the sharing of their own views through text production. In Exploring with Grammar in the Primary Years, we focus on the Year 3 to Year 6 grouping in line with the Australian Curriculum, Assessment and Reporting Authority’s (hereafter ACARA) advice on the ‘nature of learners’ (ACARA, 2014). Our goal in this publication is to provide a range of highly practical strategies for scaffolding students’ learning through some of the Content Descriptions from the Australian Curriculum: English Version 7.2, hereafter AC:E (ACARA, 2014). We continue to express our belief in the power of using whole texts from a range of authentic sources, including high-quality children’s literature, the internet, and examples of community-based texts, to expose students to the richness of language. Taking time to look at language patterns within actual texts is a pathway to ‘…capture interest, stir the imagination and absorb the [child]’ into the world of language and literacy (Saxby, 1993, p. 55). It is our intention to be more overt this time and send a stronger message that our learning experiences are simply ‘sample’ activities rather than a teachers’ workbook or a program of study to be followed.
We’re hoping that teachers and students will continue to explore their bookshelves, the internet and their community for texts that provide powerful opportunities to engage with language-based learning experiences. In the following three sections, we have tried to remain faithful to our interpretation of the AC:E Content Descriptions without giving an exhaustive explanation of the grammatical terms. This recently released curriculum offers a new theoretical approach to building students’ knowledge about language. The AC:E uses selected traditional terms through an approach developed in systemic functional linguistics (see Halliday and Matthiessen, 2004) to highlight the dynamic forms and functions of multimodal language in texts. For example, the following statement, taken from the ‘Language: Knowing about the English language’ strand, states: English uses standard grammatical terminology within a contextual framework, in which language choices are seen to vary according to the topics at hand, the nature and proximity of the relationships between the language users, and the modalities or channels of communication available (ACARA, 2014). Put simply, traditional grammar terms are used within a functional framework made up of field, tenor, and mode. An understanding of genre is noted with the reference to a ‘contextual framework’. The ‘topics at hand’ concern the field or subject matter of the text. The ‘relationships between the language users’ is a description of tenor. There is reference to ‘modalities’, such as spoken, written or visual text. We posit that this innovative approach is necessary for working with contemporary multimodal and cross-cultural texts (see Exley & Mills, 2012). Other excellent tomes, such as Derewianka (2011), Humphrey, Droga and Feez (2012), and Rossbridge and Rushton (2011), provide more comprehensive explanations of this unique metalanguage, as does the AC:E Glossary. We’ve reproduced some of the AC:E Glossary at the end of this publication. We’ve also kept the same layout for our learning experiences, ensuring that our teacher notes are not only succinct but also prudent in their placement. Each learning experience is connected to a Content Description from the AC:E and contains an experience with an identified purpose, a suggested resource text and a possible sequence for the experience that always commences with an orientation to the text, followed by an examination of a particular grammatical resource. Our plans allow for focused discussion, shared exploration and opportunities to revisit the same text for the purpose of enhancing meaning making. Some learning experiences finish with deconstruction of a stimulus text, while others invite students to engage in the design of new texts. We encourage you to look for opportunities in your own classrooms to move from text deconstruction to text design. In this way, students can express not only their emerging grammatical understandings, but also the ways they might position readers or viewers through the creation of their own texts. We expect that each of these learning experiences will vary in the time taken. Some may indeed take a couple, if not a few, teaching episodes to work through, especially if students are meeting a concept or a pedagogical strategy for the first time. We hope you use as much, or as little, of each experience as is needed for your students. We do not want the teaching of grammar to slip into a crisis of irrelevance or to be seen as a series of worksheet drills with finite answers.
We firmly believe, however, that strategies for effective deconstruction and design practice have much portability. We three are very keen to hear from teachers who are adopting and adapting these learning experiences in their classrooms. Please email us at b.exley@qut.edu.au, lkervin@uow.edu.au or jessicam@uow.edu.au. We’d love to continue the conversation with you over time. Beryl Exley, Lisa Kervin & Jessica Mantei
Abstract:
Traditional text classification technology based on machine learning and data mining techniques has made significant progress. However, drawing an exact decision boundary between relevant and irrelevant objects in binary classification remains difficult because of the uncertainty produced by traditional algorithms. The proposed model, CTTC (Centroid Training for Text Classification), builds an uncertainty boundary to absorb as many indeterminate objects as possible, thereby increasing the certainty of the relevant and irrelevant groups through a centroid clustering and training process. The clustering starts from two training subsets, labelled relevant and irrelevant respectively, to create two principal centroid vectors, by which all the training samples are separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through a subsequent iterative multi-learning process, and together they predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy were chosen as the key evaluation measures; we stress F1 because it reflects the overall performance improvement of the final classifier better than Accuracy. A large number of experiments were completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), an important standard dataset in the field. The results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three influential baseline models.
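A minimal sketch of the initial CTTC-style partition described above: two class centroids split the training documents into POS, NEG and the uncertain BND group. The margin threshold is an assumption, the inputs are taken to be dense document vectors (e.g. TF-IDF rows as numpy arrays), and the iterative multi-learning optimisation is not reproduced.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cttc_partition(X_rel, X_irr, X_all, margin=0.05):
    """Split X_all into POS, NEG and BND index sets using two centroids."""
    pos_centroid = X_rel.mean(axis=0, keepdims=True)  # centroid of relevant docs
    neg_centroid = X_irr.mean(axis=0, keepdims=True)  # centroid of irrelevant docs
    diff = (cosine_similarity(X_all, pos_centroid)
            - cosine_similarity(X_all, neg_centroid)).ravel()
    pos = np.flatnonzero(diff > margin)            # confidently relevant
    neg = np.flatnonzero(diff < -margin)           # confidently irrelevant
    bnd = np.flatnonzero(np.abs(diff) <= margin)   # indeterminate: BND boundary
    return pos, neg, bnd
```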
Abstract:
This report presents the outcomes of a program evaluation of the engagement component of the five-year Workplace Health and Safety Strategy (2012-2017) within the Queensland Ambulance Service (QAS). As part of the former Department of Community Safety, the QAS's objective was to work towards harmonising occupational health and safety policies and processes to improve workplace culture. The report examines and assesses the process paths and resource inputs into the strategy, provides feedback on progress towards identified goals, and identifies opportunities for improvement and barriers to progress. Consultations were held with key stakeholders within the QAS, and focus groups were facilitated with managers and health and safety representatives of each Local Area Service Network.
Abstract:
In this paper we present a robust method to detect handwritten text in unconstrained drawings on ordinary whiteboards. Unlike printed text in documents, free-form handwritten text has no consistent size, orientation or font, and it is often mixed with other drawings such as lines and shapes. Unlike handwriting on paper, handwriting on a whiteboard cannot be scanned, so detection has to be based on photos. Our method traces straight edges in photos of the whiteboard and builds a graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. Experimental results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed on a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios.
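A minimal sketch of the geometric-filtering stage, assuming OpenCV for edge detection and connected components; the thresholds are illustrative guesses rather than the paper's values, and the graph-density and neighborhood-similarity features are omitted.

```python
import cv2

def candidate_text_boxes(photo_path):
    """Return bounding boxes of components that look like handwriting."""
    gray = cv2.imread(photo_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)  # trace strokes on the whiteboard photo
    n, _, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    boxes = []
    for i in range(1, n):             # label 0 is the background
        x, y, w, h, area = stats[i]
        aspect = w / float(h)
        density = area / float(w * h)  # edge pixels per bounding-box pixel
        # Handwriting tends to form small, dense, roughly horizontal clusters,
        # unlike long sparse lines or large box-like shapes (thresholds are
        # illustrative assumptions).
        if 0.2 < aspect < 15 and density > 0.05 and area > 30:
            boxes.append((x, y, w, h))
    return boxes
```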
Abstract:
Assessing students’ conceptual understanding of technical content is important for helping instructors and students alike to learn content and apply knowledge in various contexts. Concept inventories that identify possible misconceptions through validated multiple-choice questions are helpful in detecting that a misconception may exist, but they do not provide a meaningful assessment of why it exists or of the nature of the students’ understanding. We conducted a case study with undergraduate students in an electrical engineering course, using a validated multiple-choice concept inventory that we augmented with a component in which students provide written explanations for their multiple-choice selections. Results revealed that correct multiple-choice selections did not always reflect correct conceptual understanding of the concept a question was testing. Adding a text response to multiple-choice concept inventory questions provided an enhanced and meaningful assessment of students’ conceptual understanding and highlighted variables associated with current concept inventories and multiple-choice questions.
Abstract:
Legal translation theory brooks little interference with the source legal text. With few exceptions (Joseph 2005; Hammel 2008; Harvey 2002; Kahaner 2005; Kasirer 2001; Lawson 2006), lawyers and linguists tend to tether themselves to the pole of literalism. More a tight elastic band than an unyielding rope, this tether constrains, rather than prohibits, liberal legal translations. It can stretch to accommodate a degree of freedom by the legal translator; however, should it go too far, it snaps back to the default position of linguistic fidelity. This ‘stretch and snap’ gives legal translation a unique place in general translation theory. In the general debate over the ‘degree of freedom’ the translator enjoys in conveying the meaning of the text, legal translation theory has reached its own settlement. Passivity is the default; creativity, the ‘qualified’ exception (Hammel 2008: 275).
Abstract:
Concept mapping involves determining relevant concepts from free-text input, where the concepts are defined in an external reference ontology. This is an important process that underpins many applications in clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast as an information retrieval (IR) problem: free-text mentions are treated as queries, and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques to concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. The standard IR approaches used here are contrasted with two established benchmark methods developed specifically for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.
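Casting concept mapping as retrieval can be sketched in a few lines; the example below assumes a TF-IDF index over concept labels, with hypothetical ontology entries and a hypothetical mention rather than the benchmark methods or dataset used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# "Documents": concept labels from a reference ontology (hypothetical sample).
concepts = {
    "C001": "myocardial infarction",
    "C002": "cerebral infarction",
    "C003": "acute renal failure",
}

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(concepts.values())

# "Query": a free-text mention to be mapped onto a concept.
mention = vectorizer.transform(["acute failure of the renal system"])
scores = cosine_similarity(mention, index).ravel()
print(list(concepts)[int(np.argmax(scores))])  # -> C003
```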
Abstract:
With the explosion of information resources, there is a pressing need to understand interesting text features or topics in massive volumes of text. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance on two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.
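The thesis's own weighting model is not described in the abstract; as a baseline illustration of weighting n-gram features, the sketch below uses standard TF-IDF over unigrams and bigrams on invented documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "interest rates rise as markets react",
    "central bank holds interest rates steady",
]

# Weight unigrams and bigrams ("interest rates" becomes a single feature).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
weights = vectorizer.fit_transform(docs)

for term, idx in sorted(vectorizer.vocabulary_.items()):
    print(f"{term}: {weights[0, idx]:.3f}")  # weights in the first document
```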