994 resultados para Text processing


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this demo the basic text mining technologies by using RapidMining have been reviewed. RapidMining basic characteristics and operators of text mining have been described. Text mining example by using Navie Bayes algorithm and process modeling have been revealed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

By examining Japanese fictional novels, this article will discuss how anaphoric devices (noun phrases (NPs), third person pronouns (TPPs), and zero anaphors) are selected and arranged in a given discourse. The traditional view of anaphora considers the co-referential relationship between anaphoric devices to be syntagmatic; that is, a pronoun, for example, refers back to its antecedent. It also declares the hierarchical order of information values between anaphoric devices; NPs are semantically the most informative, indicating an episode boundary, and pronouns less informative. Furthermore, zero anaphora is the most referentially transparent, showing the most accessibility of a topic. However, real text shows the contrary. NPs occur frequently while there is no apparent discourse boundary, and the same episode is continuous. This is because zero anaphors and TPPs (if they occur) break down readily due to the nature of a forthcoming sentence and the NP is reinstated, in order to continue the same topic in a given discourse. Therefore, the article opposes the traditional view of anaphora. Based on the concept of text processing, using ‘mental representations’, this article will determine certain occurrence patterns of the three anaphoric devices.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper introduces the metaphorism pattern of relational specification and addresses how specification following this pattern can be refined into recursive programs. Metaphorisms express input-output relationships which preserve relevant information while at the same time some intended optimization takes place. Text processing, sorting, representation changers, etc., are examples of metaphorisms. The kind of metaphorism refinement proposed in this paper is a strategy known as change of virtual data structure. It gives sufficient conditions for such implementations to be calculated using relation algebra and illustrates the strategy with the derivation of quicksort as example.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Taking into account the study of Luegi (2006), where eye movements of 20 Portuguese university students while reading text passages were analyzed, in this article we discuss some methodological issues concerning eye tracking measures to evaluate reading difficulties. Relating syntactic complexity, grammaticality and ambiguity to eye movements, we will discuss the use of many different dependent variables that indicate the immediate and delayed processes in text processing. We propose a new measure that we called Progression-Path which permits analyzing, in the critical region, what happens when the reader proceeds on the sentence instead of going backwards to solve a problem that s/he found (which is the most common expected behavior but not the only one, as is illustrated by some of our examples).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Objectives: The current study examined younger and older adults’ error detection accuracy, prediction calibration, and postdiction calibration on a proofreading task, to determine if age-related difference would be present in this type of common error detection task. Method: Participants were given text passages, and were first asked to predict the percentage of errors they would detect in the passage. They then read the passage and circled errors (which varied in complexity and locality), and made postdictions regarding their performance, before repeating this with another passage and answering a comprehension test of both passages. Results: There were no age-related differences in error detection accuracy, text comprehension, or metacognitive calibration, though participants in both age groups were overconfident overall in their metacognitive judgments. Both groups gave similar ratings of motivation to complete the task. The older adults rated the passages as more interesting than younger adults did, although this level of interest did not appear to influence error-detection performance. Discussion: The age equivalence in both proofreading ability and calibration suggests that the ability to proofread text passages and the associated metacognitive monitoring used in judging one’s own performance are maintained in aging. These age-related similarities persisted when younger adults completed the proofreading tasks on a computer screen, rather than with paper and pencil. The findings provide novel insights regarding the influence that cognitive aging may have on metacognitive accuracy and text processing in an everyday task.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analyses of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections ( LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations are necessary, and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly where it was mostly tested, that is, for mapping text sets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This text aims to elucidate the textual processing strategies that give meaning to chronicle "The miracle of the leaves", of Clarice Lispector (1920-1977). Therefore, it starts from the basic postulate that the meaning is not in the fabric or verbal imagery, but the meanings constructed from the elements of language that are there to meet with the reader (BRONCKART, 2009, p. 257). Justified this reading strategy, since the chronic Lispector reveals multiple meanings endowed with a language and dialogue. The text processing is discussed in terms of an audience potentially youthful element that assigns characteristics relevant to the understanding of the functioning of this type of text in relation to younger readers. To achieve the goal, we start from the assumption that chronic, being a literary text endowed with aesthetic validity, not only conveys a content, but recreates it, adding to it new meanings.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The exponential increase of subjective, user-generated content since the birth of the Social Web, has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task, by proposing a unified framework, composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, where different configurations of the framework are suggested and analyzed, in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as the framework as a whole. By achieving an improvement over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El campo de procesamiento de lenguaje natural (PLN), ha tenido un gran crecimiento en los últimos años; sus áreas de investigación incluyen: recuperación y extracción de información, minería de datos, traducción automática, sistemas de búsquedas de respuestas, generación de resúmenes automáticos, análisis de sentimientos, entre otras. En este artículo se presentan conceptos y algunas herramientas con el fin de contribuir al entendimiento del procesamiento de texto con técnicas de PLN, con el propósito de extraer información relevante que pueda ser usada en un gran rango de aplicaciones. Se pueden desarrollar clasificadores automáticos que permitan categorizar documentos y recomendar etiquetas; estos clasificadores deben ser independientes de la plataforma, fácilmente personalizables para poder ser integrados en diferentes proyectos y que sean capaces de aprender a partir de ejemplos. En el presente artículo se introducen estos algoritmos de clasificación, se analizan algunas herramientas de código abierto disponibles actualmente para llevar a cabo estas tareas y se comparan diversas implementaciones utilizando la métrica F en la evaluación de los clasificadores.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in both the GeoTime task of the 8th and 9th NTCIR workshop in the years 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture in order to add or modify features. In the development of this system, we have followed a QA-based approach as well as multi-search engines to improve the system performance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we explore the use of text-mining methods for the identification of the author of a text. We apply the support vector machine (SVM) to this problem, as it is able to cope with half a million of inputs it requires no feature selection and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM was able to reject other authors and detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs and adjectives and replaced them by grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even if the author wrote about different topics.

Relevância:

60.00% 60.00%

Publicador:

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Machine learning techniques for prediction and rule extraction from artificial neural network methods are used. The hypothesis that market sentiment and IPO specific attributes are equally responsible for first-day IPO returns in the US stock market is tested. Machine learning methods used are Bayesian classifications, support vector machines, decision tree techniques, rule learners and artificial neural networks. The outcomes of the research are predictions and rules associated With first-day returns of technology IPOs. The hypothesis that first-day returns of technology IPOs are equally determined by IPO specific and market sentiment is rejected. Instead lower yielding IPOs are determined by IPO specific and market sentiment attributes, while higher yielding IPOs are largely dependent on IPO specific attributes.