3 results for LINGUISTICS

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance:

10.00%

Publisher:

Abstract:

The amount of textual information stored digitally is growing every day. However, our capacity to process and analyze that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes, such as text mining, that extract relevant knowledge from textual information. One of the main and most expensive stages of the text mining process is text pre-processing, in which unstructured text is transformed into a structured format such as an attribute-value table. The stemming process, i.e. linguistic normalization, is usually used to find the attributes of this table. However, stemming depends strongly on the language in which the original text is written. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, we propose several improvements to the well-known Porter stemming algorithm for the Portuguese language that exploit the characteristics of this language. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
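
As a rough illustration of the pre-processing stage described above, the following minimal sketch turns raw documents into an attribute-value table whose attributes are stems. The suffix list and the function names `toy_stem` and `attribute_value_table` are assumptions made for this example. They are neither the Porter rules for Portuguese nor the improved algorithm proposed in the work.

```python
from collections import Counter

# Hypothetical, illustrative suffix list. This is NOT the Porter rule set
# for Portuguese and NOT the algorithm proposed in the abstract.
SUFFIXES = ("mente", "ções", "ção", "ando", "endo", "indo",
            "ar", "er", "ir", "os", "as", "o", "a", "s")


def toy_stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def attribute_value_table(documents):
    """Return (attributes, rows), with one row of stem counts per document."""
    counts = [Counter(toy_stem(tok) for tok in doc.lower().split())
              for doc in documents]
    attributes = sorted(set().union(*counts))  # one column per distinct stem
    rows = [[c[a] for a in attributes] for c in counts]
    return attributes, rows


if __name__ == "__main__":
    attrs, rows = attribute_value_table(["gatos gato", "cantando cantar"])
    print(attrs)
    print(rows)
```

Because inflected forms collapse onto one stem, the table has fewer columns than a table built from raw tokens, which is the reason stemming is used to choose the attributes.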

Relevance:

10.00%

Publisher:

Abstract:

The issue of how children learn the meaning of words is fundamental to developmental psychology. Recent attempts to develop or evolve efficient communication protocols among interacting robots or virtual agents have also brought that issue to a central place in more applied research fields, such as computational linguistics and neural networks. An attractive approach to learning an object-word mapping is so-called cross-situational learning. This learning scenario is based on the intuitive notion that a learner can determine the meaning of a word by finding something in common across all observed uses of that word. Here we show how the deterministic Neural Modeling Fields (NMF) categorization mechanism can be used by the learner as an efficient algorithm to infer the correct object-word mapping. To achieve that, we first reduce the original on-line learning problem to a batch learning problem in which the inputs to the NMF mechanism are all possible object-word associations that could be inferred from the cross-situational learning scenario. Since many of those associations are incorrect, they are treated as clutter or noise and discarded automatically by a clutter detector model included in our NMF implementation. With these two key ingredients, batch learning and clutter detection, the NMF mechanism was able to infer the correct object-word mapping perfectly. (C) 2009 Elsevier Ltd. All rights reserved.
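
The intuition behind cross-situational learning, finding what is common across all observed uses of a word, can be sketched with plain set intersection. This is only an illustration of that intuition under simplifying assumptions (noise-free observations, hypothetical function name and data layout). It is not the NMF mechanism or the clutter detector described in the abstract.

```python
def cross_situational_learn(observations):
    """Intersect the objects seen with each word across all observations.

    observations: iterable of (words, objects) pairs, where each element is a
    collection of items that were present in the same situation.
    Returns a dict mapping each word to its remaining candidate objects.
    """
    candidates = {}
    for words, objects in observations:
        for word in words:
            if word in candidates:
                # Keep only objects present in every use of the word so far.
                candidates[word] &= set(objects)
            else:
                candidates[word] = set(objects)
    return candidates


if __name__ == "__main__":
    scenes = [
        ({"ball"}, {"BALL", "DOG"}),
        ({"ball"}, {"BALL", "CUP"}),
        ({"dog"}, {"DOG", "CUP"}),
        ({"dog"}, {"DOG", "BALL"}),
    ]
    print(cross_situational_learn(scenes))
```

Each single scene is ambiguous, but the ambiguity disappears once the candidate sets from several scenes are intersected.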

Relevance:

10.00%

Publisher:

Abstract:

Scenarios for the emergence or bootstrapping of a lexicon involve the repeated interaction between at least two agents who must reach a consensus on how to name N objects using H words. Here we consider minimal models of two types of learning algorithms: cross-situational learning, in which the individuals determine the meaning of a word by looking for something in common across all observed uses of that word, and supervised operant conditioning learning, in which there is strong feedback between individuals about the intended meaning of the words. Despite the stark differences between these learning schemes, we show that they yield the same communication accuracy in the limits of large N and H, which coincides with the result of the classical occupancy problem of randomly assigning N objects to H words.
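
For readers unfamiliar with the classical occupancy problem mentioned above, the sketch below computes one standard quantity from it: the expected number of words that receive at least one object when N objects are assigned to H words uniformly at random, which is H(1 - (1 - 1/H)^N). The abstract does not state how communication accuracy is derived from the occupancy problem, so this example makes no claim about that mapping. The function names and the Monte Carlo check are assumptions added for illustration.

```python
import random


def expected_occupied_words(n_objects, n_words):
    """Closed form: expected number of words assigned at least one object."""
    return n_words * (1.0 - (1.0 - 1.0 / n_words) ** n_objects)


def simulate_occupied_words(n_objects, n_words, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Assign each object to a uniformly random word; count distinct words.
        used = {rng.randrange(n_words) for _ in range(n_objects)}
        total += len(used)
    return total / trials


if __name__ == "__main__":
    print(expected_occupied_words(5, 5))
    print(simulate_occupied_words(5, 5))
```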