23 resultados para Natural language techniques, Semantic spaces, Random projection, Documents

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance, the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired on neuroscience, it is proposed a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor), designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to ""predict"" thematic (semantic) roles assigned to words in a sentence context, employing biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is about the use of natural language to communicate with computers. Most researches that have pursued this goal consider only requests expressed in English. A way to facilitate the use of several languages in natural language systems is by using an interlingua. An interlingua is an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua [universal networking language (UNL)] and to execute these requests using software components. In order to achieve this goal, we propose OntoMap, an ontology-based architecture to perform the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capturing shallow linguistic information. Complex background knowledge, such as semantic relationships, are typically either not used, or used in specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks; while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provide a way for making substantial progress in the field of WSD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scenarios for the emergence or bootstrap of a lexicon involve the repeated interaction between at least two agents who must reach a consensus on how to name N objects using H words. Here we consider minimal models of two types of learning algorithms: cross-situational learning, in which the individuals determine the meaning of a word by looking for something in common across all observed uses of that word, and supervised operant conditioning learning, in which there is strong feedback between individuals about the intended meaning of the words. Despite the stark differences between these learning schemes, we show that they yield the same communication accuracy in the limits of large N and H, which coincides with the result of the classical occupancy problem of randomly assigning N objects to H words.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An implementation of a computational tool to generate new summaries from new source texts is presented, by means of the connectionist approach (artificial neural networks). Among other contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the sa-called biologically implausible algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using a new proposal for the ""picture lowering"" operators, we compute the tree level scattering amplitude in the minimal pure spinor formalism by performing the integration over the pure spinor space as a multidimensional Cauchy-type integral. The amplitude will be written in terms of the projective pure spinor variables, which turns out to be useful to relate rigorously the minimal and non-minimal versions of the pure spinor formalism. The natural language for relating these formalisms is the. Cech-Dolbeault isomorphism. Moreover, the Dolbeault cocycle corresponding to the tree-level scattering amplitude must be evaluated in SO(10)/SU(5) instead of the whole pure spinor space, which means that the origin is removed from this space. Also, the. Cech-Dolbeault language plays a key role for proving the invariance of the scattering amplitude under BRST, Lorentz and supersymmetry transformations, as well as the decoupling of unphysical states. We also relate the Green`s function for the massless scalar field in ten dimensions to the tree-level scattering amplitude and comment about the scattering amplitude at higher orders. In contrast with the traditional picture lowering operators, with our new proposal the tree level scattering amplitude is independent of the constant spinors introduced to define them and the BRST exact terms decouple without integrating over these constant spinors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by NIT tools can be distinguished from their manual counterparts by means of metrics such as in-(ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better NIT tools and automatic evaluation metrics.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Recent advances in the control of molecular engineering architectures have allowed unprecedented ability of molecular recognition in biosensing, with a promising impact for clinical diagnosis and environment control. The availability of large amounts of data from electrical, optical, or electrochemical measurements requires, however, sophisticated data treatment in order to optimize sensing performance. In this study, we show how an information visualization system based on projections, referred to as Projection Explorer (PEx), can be used to achieve high performance for biosensors made with nanostructured films containing immobilized antigens. As a proof of concept, various visualizations were obtained with impedance spectroscopy data from an array of sensors whose electrical response could be specific toward a given antibody (analyte) owing to molecular recognition processes. In addition to discussing the distinct methods for projection and normalization of the data, we demonstrate that an excellent distinction can be made between real samples tested positive for Chagas disease and Leishmaniasis, which could not be achieved with conventional statistical methods. Such high performance probably arose from the possibility of treating the data in the whole frequency range. Through a systematic analysis, it was inferred that Sammon`s mapping with standardization to normalize the data gives the best results, where distinction could be made of blood serum samples containing 10(-7) mg/mL of the antibody. The method inherent in PEx and the procedures for analyzing the impedance data are entirely generic and can be extended to optimize any type of sensor or biosensor.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analyses of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections ( LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations are necessary, and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly where it was mostly tested, that is, for mapping text sets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

OWL-S is an application of OWL, the Web Ontology Language, that describes the semantics of Web Services so that their discovery, selection, invocation and composition can be automated. The research literature reports the use of UML diagrams for the automatic generation of Semantic Web Service descriptions in OWL-S. This paper demonstrates a higher level of automation by generating complete complete Web applications from OWL-S descriptions that have themselves been generated from UML. Previously, we proposed an approach for processing OWL-S descriptions in order to produce MVC-based skeletons for Web applications. The OWL-S ontology undergoes a series of transformations in order to generate a Model-View-Controller application implemented by a combination of Java Beans, JSP, and Servlets code, respectively. In this paper, we show in detail the documents produced at each processing step. We highlight the connections between OWL-S specifications and executable code in the various Java dialects and show the Web interfaces that result from this process.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

TEMA: a produção da fala nas modalidades de reabilitação oral protética. OBJETIVO: verificar se o tipo de reabilitação oral interfere na produção da fala. MÉTODO: 36 idosos (média = 68 anos), divididos em 3 grupos, foram avaliados: 13 com dentes naturais (A), 13 com prótese total mucosossuportada superior e inferior (B) e 10 com prótese total mucosossuportada superior e implantossuportada inferior (C). A estabilidade das próteses foi avaliada por um dentista e amostras de fala foram analisadas por 5 fonoaudiólogos. Para determinar a freqüência de alteração dos sons da fala utilizou-se o cálculo da Porcentagem de Consoantes Corretas (PCC). RESULTADOS: observou-se poucos casos com alteração de fala, com maior freqüência no grupo C (23,08%), sendo a articulação travada presente em todos os grupos, a redução dos movimentos labiais em dois grupos (A e B) e a articulação exagerada e a falta de controle salivar em um dos grupos (C e B). Quanto à PCC, menor valor foi observado para os fones linguodentais nos grupos B e C (maior ocorrência de alteração), seguido dos fones alveolares, predominando casos sem alteração no grupo A, contrariamente aos demais grupos, sendo a projeção lingual e o ceceio as alterações mais encontradas. Não houve diferença entre os grupos e a maioria do grupo B estava com a prótese inferior insatisfatória, não havendo associação entre alteração de fala e prótese insatisfatória. CONCLUSÃO: apesar da amostra pequena, indivíduos reabilitados com prótese total apresentam alteração nos fones linguodentais e alveolares e o tipo de prótese, bem como a estabilidade desta parece não interferir na produção da fala.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective of this study is to describe preliminary results from the cross-cultural adaptation of the Quality of Life Assessment Questionnaire, used to measure health related quality of life (HRQL) in Brazilian children aged between 5 and 11 with HIV/AIDS. The cross-cultural model evaluated the Concept, Item, Semantic and Measurement Equivalences (internal consistency and intra-observer reliability). Evaluation of the conceptual, item, semantic equivalences showed that the Portuguese version is pertinent for the Brazilian context. Four of seven domains showed internal consistency above 0.70 (α: 0.76-0.90) and five of seven revealed intra-observer reliability (ricc: 0.41-0.70). This first Portuguese version of the HRQL questionnaire can be understood as a valuable tool for assessing children's HRQL, but further studies with large samples and more robust analyses are recommended before use in the Brazilian context.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work is part of a research under construction since 2000, in which the main objective is to measure small dynamic displacements by using L1 GPS receivers. A very sensible way to detect millimetric periodic displacements is based on the Phase Residual Method (PRM). This method is based on the frequency domain analysis of the phase residuals resulted from the L1 double difference static data processing of two satellites in almost orthogonal elevation angle. In this article, it is proposed to obtain the phase residuals directly from the raw phase observable collected in a short baseline during a limited time span, in lieu of obtaining the residual data file from regular GPS processing programs which not always allow the choice of the aimed satellites. In order to improve the ability to detect millimetric oscillations, two filtering techniques are introduced. One is auto-correlation which reduces the phase noise with random time behavior. The other is the running mean to separate low frequency from the high frequency phase sources. Two trials have been carried out to verify the proposed method and filtering techniques. One simulates a 2.5 millimeter vertical antenna displacement and the second uses the GPS data collected during a bridge load test. The results have shown a good consistency to detect millimetric oscillations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Due to both the widespread and multipurpose use of document images and the current availability of a high number of document images repositories, robust information retrieval mechanisms and systems have been increasingly demanded. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document images content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was experimented with document images repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents as well as among their respective document images. Considering those same document images, we ran further experiments in order to compare the performance of LinkDI when it exploits or not the LSI technique. Experimental results showed that LSI can mitigate the effects of usual OCR misrecognition, which reinforces the feasibility of LinkDI relating OCR output with high degradation.