904 results for expository text
Abstract:
Experience has shown that developing business applications based on text analysis normally requires considerable time and expertise in the field of computational linguistics. Several approaches to integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach that would enable building scalable and flexible text analysis applications in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experience with building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text-analytics-enabled business application that addresses a concrete business scenario. © 2010 IEEE.
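The abstract does not spell out the architecture's interfaces, so the following is only a minimal sketch of the service-oriented idea: text-analysis steps exposed behind a common interface so that business applications can reuse and recombine them. All names here (TextService, Tokenizer, Lowercaser, Pipeline) are illustrative assumptions, not the paper's actual components.

```python
# A minimal sketch of composable text-analysis services, assuming
# hypothetical component names; not the paper's actual architecture.
from typing import Protocol


class TextService(Protocol):
    def process(self, text: str) -> str: ...


class Tokenizer:
    def process(self, text: str) -> str:
        return " ".join(text.split())  # normalise whitespace


class Lowercaser:
    def process(self, text: str) -> str:
        return text.lower()


class Pipeline:
    """Chains independent services, mirroring the loose coupling of an SOA."""

    def __init__(self, services: list[TextService]):
        self.services = services

    def process(self, text: str) -> str:
        for service in self.services:
            text = service.process(text)
        return text


app = Pipeline([Tokenizer(), Lowercaser()])
print(app.process("  Invoice  OVERDUE  since March "))
```

Because every component satisfies the same interface, a business application can swap in new analysis steps without changing the surrounding code, which is the reuse property the paper emphasises.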
Abstract:
Assessing students’ conceptual understanding of technical content is important for instructors as well as students in learning content and applying knowledge in various contexts. Concept inventories that identify possible misconceptions through validated multiple-choice questions are helpful in detecting that a misconception may exist, but do not provide a meaningful assessment of why it exists or of the nature of the students’ understanding. We conducted a case study with undergraduate students in an electrical engineering course by administering a validated multiple-choice concept inventory that we augmented with a component for students to provide written explanations for their multiple-choice selections. Results revealed that correct multiple-choice selections did not always correspond to correct conceptual understanding for questions testing a specific concept. The addition of a text response to multiple-choice concept inventory questions provided an enhanced and meaningful assessment of students’ conceptual understanding and highlighted variables associated with current concept inventories and multiple-choice questions.
Abstract:
Concept mapping involves determining relevant concepts from a free-text input, where concepts are defined in an external reference ontology. This is an important process that underpins many applications for clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast into an information retrieval (IR) problem: free-text mentions are treated as queries and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques for concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. Standard IR approaches used here are contrasted with the effectiveness of two established benchmark methods specifically developed for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.
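Casting concept mapping as retrieval can be made concrete with a small sketch: each ontology concept's textual description is indexed as a document, and a free-text mention is issued as a query. The sketch below assumes the third-party rank_bm25 package; the ontology entries and the mention are toy examples, not the paper's evaluation data.

```python
# A minimal sketch of concept mapping cast as an IR problem, assuming
# the rank_bm25 package; concept ids and descriptions are illustrative.
from rank_bm25 import BM25Okapi

# Each "document" is the textual description of one ontology concept.
ontology = {
    "C0027051": "myocardial infarction heart attack",
    "C0020538": "hypertension high blood pressure",
    "C0011849": "diabetes mellitus",
}
concept_ids = list(ontology)
corpus = [ontology[c].split() for c in concept_ids]
bm25 = BM25Okapi(corpus)

# A free-text mention is treated as the query.
mention = "heart attack".split()
scores = bm25.get_scores(mention)

# Rank concepts by BM25 score, highest first.
ranked = sorted(zip(concept_ids, scores), key=lambda x: -x[1])
print(ranked[0])  # best-matching concept id and its score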
Abstract:
With the explosion of information resources, there is a pressing need to understand the interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance on two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.
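The abstract does not describe the weighting model itself, so the following is only a generic sketch of extracting n-gram features and applying a simple tf-idf-style weighting; the two-document corpus and the weighting scheme are illustrative assumptions, not the thesis's actual model.

```python
# A minimal sketch of n-gram feature extraction with tf-idf-style
# weighting; the corpus and scheme are illustrative assumptions.
from collections import Counter
from math import log

docs = [
    "text mining finds patterns in text",
    "topic models describe text collections",
]


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Term frequency of unigrams and bigrams, per document.
features = [Counter(ngrams(d.split(), 1) + ngrams(d.split(), 2)) for d in docs]

# Document frequency and inverse document frequency over the collection.
df = Counter(g for f in features for g in f)
idf = {g: log(len(docs) / df[g]) for g in df}

# Weight = tf * idf for the first document; shared terms score zero here.
weights = {g: tf * idf[g] for g, tf in features[0].items()}
print(sorted(weights.items(), key=lambda x: -x[1])[:5])
```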
Abstract:
In my master’s thesis I analyse mystical Islamic poetry in its ritual performance context, samā`, focusing on the poetry used by the Chishti Sufis. The work is based on both literary sources and ethnographic material collected in India. The central textual source is Surūd-i Rūhānī, a compilation of mystical poetry. Textual sources, however, can be understood properly only in relation to the living performance context, and therefore I also utilise interviews with Sufis and performers of mystical music, and recordings of samā` assemblies, along with the texts. The first part of the thesis concentrates on a thematic overview of the poems and the process of selecting a suitable text for performance. The poems are written in three languages, viz. Persian, Urdu and Hindi. Among the authors are both Sufis and non-Sufis. The poems, mystical and non-mystical alike, share the same poetic images, and they acquire a mystical meaning when they are set to qawwali music and performed in samā` assemblies. My work includes several translations of verses not previously translated. The latter part of the thesis analyses the musical idiom of qawwali and the ways in which the impact of the text on listeners is intensified in performance. Typically the intensification is accomplished at the level of a single poem through three different techniques: using introductory verses, inserting verses between the verses of the main poem, and repeating individual units of text. The former two techniques are tied to creating a mystical state in the listeners, while the latter aims at sustaining it. It is customary that a listener enraptured by a mystical experience offers a monetary contribution to the performers. Thus, intensifying the text’s impact aims at enabling the listeners to experience mystical states.
Abstract:
Objective: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from the certificates, an aim hampered by both the volume and the variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates.
Methods: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no-cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, a detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model.
Results: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable.
Conclusion: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
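The cascaded design can be illustrated with a small sketch: a first binary classifier gates a second, type-level classifier that is trained only on cancer-positive certificates. The sketch below uses scikit-learn with toy certificates and tf-idf n-gram features; the real system used term, n-gram and SNOMED CT features over 447,336 certificates, which this does not reproduce.

```python
# A minimal sketch of the two-level cascaded SVM idea, with toy data;
# not the paper's feature set or training collection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

certs = ["metastatic lung carcinoma", "acute myocardial infarction",
         "gastric adenocarcinoma", "pneumonia with sepsis"]
is_cancer = [1, 0, 1, 0]
cancer_type = ["C34", "C16"]  # ICD-10 codes for the cancer cases only

# Level 1: binary cancer / no-cancer over all certificates.
level1 = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
level1.fit(certs, is_cancer)

# Level 2: cancer type, trained only on cancer-positive certificates.
cancer_certs = [c for c, y in zip(certs, is_cancer) if y]
level2 = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
level2.fit(cancer_certs, cancer_type)

# The cascade: only certificates flagged as cancer reach level 2.
new_cert = "small cell lung cancer"
if level1.predict([new_cert])[0]:
    print(level2.predict([new_cert])[0])  # predicted ICD-10 cancer type
```

Gating the type classifier on the binary decision means rare-type errors cannot contaminate non-cancer certificates, which matches the architecture the abstract describes.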
Abstract:
Objective: Melanoma is on the rise, especially in Caucasian populations exposed to high ultraviolet radiation, such as in Australia. This paper examined the psychological components facilitating change in skin cancer prevention or early detection behaviours following a text message intervention.
Methods: The Queensland-based participants were 18 to 42 years old, from the Healthy Text study (N = 546). Overall, 512 (94%) participants completed the 12-month follow-up questionnaires. Following the social cognitive model, potential mediators of skin self-examination (SSE) and sun protection behaviour change were examined using stepwise logistic regression models.
Results: At 12-month follow-up, the odds of performing an SSE in the past 12 months were mediated by baseline confidence in finding time to check skin (an outcome expectation), with a change in odds ratio of 11.9% in the SSE group versus the control group when including the mediator. The odds of a greater-than-average sun protective habits index at 12-month follow-up were mediated by (a) an attempt to get a suntan at baseline (an outcome expectation) and (b) the baseline sun protective habits index, with changes in odds ratio of 10.0% and 11.8%, respectively, in the SSE group versus the control group.
Conclusions: Few of the suspected mediation pathways were confirmed, with the exception of outcome expectations and past behaviours. Future intervention programmes could use alternative theoretical models to elucidate how improvements in health behaviours can optimally be facilitated.
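The mediation logic reported here (a percentage change in the group odds ratio once a mediator is included) can be sketched with two logistic models. The sketch below uses statsmodels on simulated data; the variables and effect sizes are invented for illustration and are not the Healthy Text study's data or its stepwise procedure.

```python
# A minimal sketch of mediation assessed by comparing the group odds
# ratio with and without the mediator; data are simulated, not from
# the Healthy Text study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 2, n)                      # intervention vs control
mediator = group * 0.8 + rng.normal(size=n)        # e.g. baseline confidence
logit_p = -0.5 + 0.6 * mediator + 0.2 * group
outcome = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(float)

X0 = sm.add_constant(np.column_stack([group]))           # group only
X1 = sm.add_constant(np.column_stack([group, mediator])) # group + mediator
or_without = np.exp(sm.Logit(outcome, X0).fit(disp=0).params[1])
or_with = np.exp(sm.Logit(outcome, X1).fit(disp=0).params[1])

# A drop in the group odds ratio when the mediator enters the model
# suggests the mediator carries part of the intervention effect.
print(f"OR without mediator: {or_without:.2f}, with: {or_with:.2f}")
print(f"change: {100 * (or_without - or_with) / or_without:.1f}%")
```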
Abstract:
XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. We therefore focus on content stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content, ranging all the way from records of data fields to narrative full texts, information retrieval methods face a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate full-text from data. As a result, we are able both to reduce the size of the index by 5-6% and to improve retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. Such phrases are simple to detect in XML documents, but also easy to confuse with other inline-level text. Nonetheless, the search results improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content, including titles, captions, and references, is associated with the indexed full-text. Experimental results show that, at least for certain types of document collections, the proposed methods help us find the relevant answers. Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
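One way to make the character-content-versus-tags analysis concrete is a ratio of text characters to element count per fragment. The sketch below uses Python's standard xml.etree module; the threshold of 10 and the sample document are illustrative assumptions, not the thesis's actual heuristic.

```python
# A minimal sketch of separating full-text from data-like XML fragments
# by the ratio of character content to markup; threshold and document
# are illustrative, not the thesis's actual method.
import xml.etree.ElementTree as ET

doc = """<article>
  <meta><id>42</id><date>2010-01-01</date></meta>
  <body><p>XML documents mix <em>narrative text</em> with data.</p></body>
</article>"""


def text_ratio(elem):
    """Characters of text per element in the fragment: a rough full-text cue."""
    text = "".join(elem.itertext())
    n_elements = sum(1 for _ in elem.iter()) or 1
    return len(text) / n_elements


root = ET.fromstring(doc)
for child in root:
    kind = "full-text" if text_ratio(child) > 10 else "data"
    print(child.tag, "->", kind)
```

Data-heavy fragments like the metadata block score low (short values, many tags) and would be routed to a data index, while the narrative body scores high and would be indexed for full-text search.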
Abstract:
This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak adopts a novel approach to feature extraction using edge detection and image compression, whose output feeds an artificial neural network that recognizes the gesture. The technique also caters for blurred images. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.
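The described pipeline (edge detection, compression, then a neural network) can be sketched as follows. The sketch uses OpenCV's Canny detector with downsampling standing in for the compression step, and a scikit-learn MLP standing in for the ANN; the frames, labels, and all parameters are synthetic stand-ins, not vSpeak's actual implementation.

```python
# A minimal sketch of the edge-detection + compression + ANN pipeline,
# on synthetic frames; not vSpeak's actual features or network.
import numpy as np
import cv2
from sklearn.neural_network import MLPClassifier


def gesture_features(frame, size=(16, 16)):
    """Canny edges, downsampled (a crude stand-in for image compression)."""
    edges = cv2.Canny(frame, 100, 200)
    small = cv2.resize(edges, size, interpolation=cv2.INTER_AREA)
    return small.flatten() / 255.0


rng = np.random.default_rng(0)
frames = rng.integers(0, 256, (40, 64, 64), dtype=np.uint8)  # fake video frames
labels = rng.integers(0, 20, 40)                             # 20 gesture classes

X = np.stack([gesture_features(f) for f in frames])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
clf.fit(X, labels)
print(clf.predict(X[:1]))  # predicted word id for one frame
```

Edges survive moderate blur better than raw pixels, which is one plausible reading of why an edge-based representation "caters for blurred images".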
Abstract:
This paper describes an approach based on Zernike moments and Delaunay triangulation for the localization of handwritten text in machine-printed text documents. The Zernike moments of the image are first evaluated, and the text is classified as handwritten using a nearest neighbor classifier. These features are independent of size, slant, orientation, translation and other variations in handwritten text. We then use Delaunay triangulation to reclassify the misclassified text regions. Imposing a Delaunay triangulation on the centroid points of the connected components, we extract features based on the triangles and reclassify the text. We remove the noise components in the document as part of the preprocessing step, so the method works well on noisy documents. The success rate of the method is found to be 86%; for specific handwritten elements such as signatures or similar text, the accuracy is even higher, at 93%.
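Both stages have ready-made building blocks, so the approach can be sketched compactly: Zernike moments from the mahotas library feed a 1-NN classifier, and scipy's Delaunay triangulation is built over component centroids for the reclassification stage. The region images, labels, and centroids below are synthetic stand-ins; the triangle-based features of the second stage are only indicated, not reproduced.

```python
# A minimal sketch of the two-stage idea on synthetic data; not the
# paper's actual features or reclassification rules.
import numpy as np
import mahotas
from scipy.spatial import Delaunay
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stage 1: Zernike moments of each text-region image -> 1-NN classifier.
regions = (rng.random((20, 32, 32)) > 0.5).astype(np.uint8)  # binary regions
feats = np.stack([mahotas.features.zernike_moments(r, radius=16)
                  for r in regions])
labels = rng.integers(0, 2, 20)  # 0 = machine-printed, 1 = handwritten
knn = KNeighborsClassifier(n_neighbors=1).fit(feats, labels)
pred = knn.predict(feats)

# Stage 2: triangulate connected-component centroids; features of the
# triangles around a region would then drive its reclassification.
centroids = rng.random((20, 2)) * 100
tri = Delaunay(centroids)
print(pred[:5], tri.simplices.shape)  # stage-1 labels, triangle index array
```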
Abstract:
This paper presents an overview of the 6th ALTA shared task that ran in 2015. The task was to identify in English texts all the potential cognates from the perspective of the French language. In other words, identify all the words in the English text that would acceptably translate into a similar word in French. We present the motivations for the task, the description of the data and the results of the 4 participating teams. We discuss the results against a baseline and prior work.
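A simple baseline for this task flags an English word as a potential cognate when its French translation is orthographically similar. The sketch below uses Python's standard difflib; the toy dictionary and the 0.75 similarity threshold are illustrative assumptions, not the shared task's actual baseline.

```python
# A minimal baseline sketch for cognate identification; the dictionary
# and threshold are illustrative, not the shared task's baseline.
from difflib import SequenceMatcher

en_fr = {"table": "table", "nation": "nation", "dog": "chien",
         "important": "important", "window": "fenêtre"}


def is_cognate(en, fr, threshold=0.75):
    """Flag word pairs whose spellings are sufficiently similar."""
    return SequenceMatcher(None, en.lower(), fr.lower()).ratio() >= threshold


text = "the nation considered the important window"
for word in text.split():
    fr = en_fr.get(word)
    if fr and is_cognate(word, fr):
        print(f"{word} -> {fr} (potential cognate)")
```

Here "nation" and "important" are flagged while "window"/"fenêtre" is not, which illustrates the orthographic-similarity criterion the task definition implies.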