994 resultados para 280205 Text Processing


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Clinical text understanding (CTU) is of interest to health informatics because critical clinical information frequently represented as unconstrained text in electronic health records are extensively used by human experts to guide clinical practice, decision making, and to document delivery of care, but are largely unusable by information systems for queries and computations. Recent initiatives advocating for translational research call for generation of technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in multidisciplinary and collaborative environment envisioned by CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability, and ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that needs to be accounted for when conceptualizing implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in use of terms and documentation practices, and without commitmentto grammatical or syntactic structure of the language (e.g. Triage notes, physician and nurse notes, chief complaints, etc). This hinders performance of natural language processing technologies which typically rely heavily on the syntax of language and grammatical structure of the text. This document introduces our method to transform unconstrained clinical text found in electronic health information systems to a formal (computationally understandable) representation that is suitable for querying, integration, contextualization and reuse, and is resilient to the grammatical and syntactic irregularities of the clinical text. We present our design rationale, method, and results of evaluation in processing chief complaints and triage notes from 8 different emergency departments in Houston Texas. At the end, we will discuss significance of our contribution in enabling use of clinical text in a practical bio-surveillance setting.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper describes a module for the prediction of emotions in text chats in Spanish, oriented to its use in specific-domain text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that this system offers a performance similar to other systems described in the literature, for a more complex task than other systems (identification of emotions and emotional intensity in the chat domain).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ontology construction for any domain is a labour intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus. Results - Using mainly automated techniques, we were able to construct an 18055 term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms to add structure to the initially broad and shallow structure we generated. All outputs are available at http://thirlmere.aston.ac.uk/~kiffer/animalbehaviour/ webcite. Conclusion - We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can make a contribution both to ontology learning and maintenance. The method is useful both for the exploration of a scientific domain and as a stepping stone towards formally rigourous ontologies. The filtering of recognised terms from a heterogeneous corpus to focus upon those that are the topic of the ontology is identified to be one of the main challenges for research in ontology learning.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ontology construction for any domain is a labour intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the Animal Behaviour domain. Our objective was to see how much could be done in a simple and rapid manner using a corpus of journal papers. We used a sequence of text processing steps, and describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a hierarchy. We were able in a very short space of time to construct a 17000 term ontology with a high percentage of suitable terms. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The paper informs about the history of manuscript digitization in the National Library of the Czech Republic as well as about other issues concerning processing of manuscripts. The main consequence of the massive digitization and record and/or full text processing is a paradigm shift leading to the digital history.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The given work is devoted to development of the computer-aided system of semantic text analysis of a technical specification. The purpose of this work is to increase efficiency of software engineering based on automation of semantic text analysis of a technical specification. In work it is offered and investigated the model of the analysis of the text of the technical project is submitted, the attribute grammar of a technical specification, intended for formalization of limited Russian is constructed with the purpose of analysis of offers of text of a technical specification, style features of the technical project as class of documents are considered, recommendations on preparation of text of a technical specification for the automated processing are formulated. The computer-aided system of semantic text analysis of a technical specification is considered. This system consists of the following subsystems: preliminary text processing, the syntactic and semantic analysis and construction of software models, storage of documents and interface.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The given work is devoted to development of the computer-aided system of semantic text analysis of a technical specification. The purpose of this work is to increase efficiency of software engineering based on automation of semantic text analysis of a technical specification. In work it is offered and investigated a technique of the text analysis of a technical specification is submitted, the expanded fuzzy attribute grammar of a technical specification, intended for formalization of limited Russian language is constructed with the purpose of analysis of offers of text of a technical specification, style features of the technical specification as class of documents are considered, recommendations on preparation of text of a technical specification for the automated processing are formulated. The computer-aided system of semantic text analysis of a technical specification is considered. This system consist of the following subsystems: preliminary text processing, the syntactic and semantic analysis and construction of software models, storage of documents and interface.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this demo the basic text mining technologies by using RapidMining have been reviewed. RapidMining basic characteristics and operators of text mining have been described. Text mining example by using Navie Bayes algorithm and process modeling have been revealed.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

By examining Japanese fictional novels, this article will discuss how anaphoric devices (noun phrases (NPs), third person pronouns (TPPs), and zero anaphors) are selected and arranged in a given discourse. The traditional view of anaphora considers the co-referential relationship between anaphoric devices to be syntagmatic; that is, a pronoun, for example, refers back to its antecedent. It also declares the hierarchical order of information values between anaphoric devices; NPs are semantically the most informative, indicating an episode boundary, and pronouns less informative. Furthermore, zero anaphora is the most referentially transparent, showing the most accessibility of a topic. However, real text shows the contrary. NPs occur frequently while there is no apparent discourse boundary, and the same episode is continuous. This is because zero anaphors and TPPs (if they occur) break down readily due to the nature of a forthcoming sentence and the NP is reinstated, in order to continue the same topic in a given discourse. Therefore, the article opposes the traditional view of anaphora. Based on the concept of text processing, using ‘mental representations’, this article will determine certain occurrence patterns of the three anaphoric devices.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper introduces the metaphorism pattern of relational specification and addresses how specification following this pattern can be refined into recursive programs. Metaphorisms express input-output relationships which preserve relevant information while at the same time some intended optimization takes place. Text processing, sorting, representation changers, etc., are examples of metaphorisms. The kind of metaphorism refinement proposed in this paper is a strategy known as change of virtual data structure. It gives sufficient conditions for such implementations to be calculated using relation algebra and illustrates the strategy with the derivation of quicksort as example.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Taking into account the study of Luegi (2006), where eye movements of 20 Portuguese university students while reading text passages were analyzed, in this article we discuss some methodological issues concerning eye tracking measures to evaluate reading difficulties. Relating syntactic complexity, grammaticality and ambiguity to eye movements, we will discuss the use of many different dependent variables that indicate the immediate and delayed processes in text processing. We propose a new measure that we called Progression-Path which permits analyzing, in the critical region, what happens when the reader proceeds on the sentence instead of going backwards to solve a problem that s/he found (which is the most common expected behavior but not the only one, as is illustrated by some of our examples).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Objectives: The current study examined younger and older adults’ error detection accuracy, prediction calibration, and postdiction calibration on a proofreading task, to determine if age-related difference would be present in this type of common error detection task. Method: Participants were given text passages, and were first asked to predict the percentage of errors they would detect in the passage. They then read the passage and circled errors (which varied in complexity and locality), and made postdictions regarding their performance, before repeating this with another passage and answering a comprehension test of both passages. Results: There were no age-related differences in error detection accuracy, text comprehension, or metacognitive calibration, though participants in both age groups were overconfident overall in their metacognitive judgments. Both groups gave similar ratings of motivation to complete the task. The older adults rated the passages as more interesting than younger adults did, although this level of interest did not appear to influence error-detection performance. Discussion: The age equivalence in both proofreading ability and calibration suggests that the ability to proofread text passages and the associated metacognitive monitoring used in judging one’s own performance are maintained in aging. These age-related similarities persisted when younger adults completed the proofreading tasks on a computer screen, rather than with paper and pencil. The findings provide novel insights regarding the influence that cognitive aging may have on metacognitive accuracy and text processing in an everyday task.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analyses of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections ( LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations are necessary, and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly where it was mostly tested, that is, for mapping text sets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This text aims to elucidate the textual processing strategies that give meaning to chronicle "The miracle of the leaves", of Clarice Lispector (1920-1977). Therefore, it starts from the basic postulate that the meaning is not in the fabric or verbal imagery, but the meanings constructed from the elements of language that are there to meet with the reader (BRONCKART, 2009, p. 257). Justified this reading strategy, since the chronic Lispector reveals multiple meanings endowed with a language and dialogue. The text processing is discussed in terms of an audience potentially youthful element that assigns characteristics relevant to the understanding of the functioning of this type of text in relation to younger readers. To achieve the goal, we start from the assumption that chronic, being a literary text endowed with aesthetic validity, not only conveys a content, but recreates it, adding to it new meanings.