35 resultados para Natural language processing (Computer science) -- TFC

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance, the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired on neuroscience, it is proposed a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor), designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to ""predict"" thematic (semantic) roles assigned to words in a sentence context, employing biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is about the use of natural language to communicate with computers. Most researches that have pursued this goal consider only requests expressed in English. A way to facilitate the use of several languages in natural language systems is by using an interlingua. An interlingua is an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua [universal networking language (UNL)] and to execute these requests using software components. In order to achieve this goal, we propose OntoMap, an ontology-based architecture to perform the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the sa-called biologically implausible algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capturing shallow linguistic information. Complex background knowledge, such as semantic relationships, are typically either not used, or used in specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks; while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provide a way for making substantial progress in the field of WSD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by NIT tools can be distinguished from their manual counterparts by means of metrics such as in-(ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better NIT tools and automatic evaluation metrics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The amount of textual information digitally stored is growing every day. However, our capability of processing and analyzing that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes to extract relevant knowledge from textual information, such as the text mining process. One of the main and most expensive stages of the text mining process is the text pre-processing stage, where the unstructured text should be transformed to structured format such as an attribute-value table. The stemming process, i.e. linguistics normalization, is usually used to find the attributes of this table. However, the stemming process is strongly dependent on the language in which the original textual information is given. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements of the well know Porter stemming algorithm for the Portuguese language, which explore the characteristics of this language, are proposed. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of this paper is to study and propose a new technique for noise reduction used during the reconstruction of speech signals, particularly for biomedical applications. The proposed method is based on Kalman filtering in the time domain combined with spectral subtraction. Comparison with discrete Kalman filter in the frequency domain shows better performance of the proposed technique. The performance is evaluated by using the segmental signal-to-noise ratio and the Itakura-Saito`s distance. Results have shown that Kalman`s filter in time combined with spectral subtraction is more robust and efficient, improving the Itakura-Saito`s distance by up to four times. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a simple high-level programming language, endowed with resources that help encoding self-modifying programs. With this purpose, a conventional imperative language syntax (not explicitly stated in this paper) is incremented with special commands and statements forming an adaptive layer specially designed with focus on the dynamical changes to be applied to the code at run-time. The resulting language allows programmers to easily specify dynamic changes to their own program`s code. Such a language succeeds to allow programmers to effortless describe the dynamic logic of their adaptive applications. In this paper, we describe the most important aspects of the design and implementation of such a language. A small example is finally presented for illustration purposes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most post-processors for boundary element (BE) analysis use an auxiliary domain mesh to display domain results, working against the profitable modelling process of a pure boundary discretization. This paper introduces a novel visualization technique which preserves the basic properties of the boundary element methods. The proposed algorithm does not require any domain discretization and is based on the direct and automatic identification of isolines. Another critical aspect of the visualization of domain results in BE analysis is the effort required to evaluate results in interior points. In order to tackle this issue, the present article also provides a comparison between the performance of two different BE formulations (conventional and hybrid). In addition, this paper presents an overview of the most common post-processing and visualization techniques in BE analysis, such as the classical algorithms of scan line and the interpolation over a domain discretization. The results presented herein show that the proposed algorithm offers a very high performance compared with other visualization procedures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mutation testing has been used to assess the quality of test case suites by analyzing the ability in distinguishing the artifact under testing from a set of alternative artifacts, the so-called mutants. The mutants are generated from the artifact under testing by applying a set of mutant operators, which produce artifacts with simple syntactical differences. The mutant operators are usually based on typical errors that occur during the software development and can be related to a fault model. In this paper, we propose a language-named MuDeL (MUtant DEfinition Language)-for the definition of mutant operators, aiming not only at automating the mutant generation, but also at providing precision and formality to the operator definition. The proposed language is based on concepts from transformational and logical programming paradigms, as well as from context-free grammar theory. Denotational semantics formal framework is employed to define the semantics of the MuDeL language. We also describe a system-named mudelgen-developed to support the use of this language. An executable representation of the denotational semantics of the language is used to check the correctness of the implementation of mudelgen. At the very end, a mutant generator module is produced, which can be incorporated into a specific mutant tool/environment. (C) 2008 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

OWL-S is an application of OWL, the Web Ontology Language, that describes the semantics of Web Services so that their discovery, selection, invocation and composition can be automated. The research literature reports the use of UML diagrams for the automatic generation of Semantic Web Service descriptions in OWL-S. This paper demonstrates a higher level of automation by generating complete complete Web applications from OWL-S descriptions that have themselves been generated from UML. Previously, we proposed an approach for processing OWL-S descriptions in order to produce MVC-based skeletons for Web applications. The OWL-S ontology undergoes a series of transformations in order to generate a Model-View-Controller application implemented by a combination of Java Beans, JSP, and Servlets code, respectively. In this paper, we show in detail the documents produced at each processing step. We highlight the connections between OWL-S specifications and executable code in the various Java dialects and show the Web interfaces that result from this process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The relationship between thought and language and, in particular, the issue of whether and how language influences thought is still a matter of fierce debate. Here we consider a discrimination task scenario to study language acquisition in which an agent receives linguistic input from an external teacher, in addition to sensory stimuli from the objects that exemplify the overlapping categories that make up the environment. Sensory and linguistic input signals are fused using the Neural Modelling Fields (NMF) categorization algorithm. We find that the agent with language is capable of differentiating object features that it could not distinguish without language. In this sense, the linguistic stimuli prompt the agent to redefine and refine the discrimination capacity of its sensory channels. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a new module of the expert system SISTEMAT used for the prediction of the skeletons of neolignans by (13)C NMR, (1)H NMR and botanical data obtained from the literature. SISTEMAT is composed of MACRONO, SISCONST, C13MACH, H1MACH and SISOCBOT programs, each analyzing data of the neolignan in question to predict the carbon skeleton of the compound. From these results, the global probability is computed and the most probable skeleton predicted. SISTEMAT predicted the skeletons of 75% of the 20 neolignans tested, in a rapid and simple procedure demonstrating its advantage for the structural elucidation of new compounds.