3 resultados para Uses of Language

em Massachusetts Institute of Technology


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper considers the problem of language change. Linguists must explain not only how languages are learned but also how and why they have evolved along certain trajectories and not others. While the language learning problem has focused on the behavior of individuals and how they acquire a particular grammar from a class of grammars ${cal G}$, here we consider a population of such learners and investigate the emergent, global population characteristics of linguistic communities over several generations. We argue that language change follows logically from specific assumptions about grammatical theories and learning paradigms. In particular, we are able to transform parameterized theories and memoryless acquisition algorithms into grammatical dynamical systems, whose evolution depicts a population's evolving linguistic composition. We investigate the linguistic and computational consequences of this model, showing that the formalization allows one to ask questions about diachronic that one otherwise could not ask, such as the effect of varying initial conditions on the resulting diachronic trajectories. From a more programmatic perspective, we give an example of how the dynamical system model for language change can serve as a way to distinguish among alternative grammatical theories, introducing a formal diachronic adequacy criterion for linguistic theories.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The central thesis of this report is that human language is NP-complete. That is, the process of comprehending and producing utterances is bounded above by the class NP, and below by NP-hardness. This constructive complexity thesis has two empirical consequences. The first is to predict that a linguistic theory outside NP is unnaturally powerful. The second is to predict that a linguistic theory easier than NP-hard is descriptively inadequate. To prove the lower bound, I show that the following three subproblems of language comprehension are all NP-hard: decide whether a given sound is possible sound of a given language; disambiguate a sequence of words; and compute the antecedents of pronouns. The proofs are based directly on the empirical facts of the language user's knowledge, under an appropriate idealization. Therefore, they are invariant across linguistic theories. (For this reason, no knowledge of linguistic theory is needed to understand the proofs, only knowledge of English.) To illustrate the usefulness of the upper bound, I show that two widely-accepted analyses of the language user's knowledge (of syntactic ellipsis and phonological dependencies) lead to complexity outside of NP (PSPACE-hard and Undecidable, respectively). Next, guided by the complexity proofs, I construct alternate linguisitic analyses that are strictly superior on descriptive grounds, as well as being less complex computationally (in NP). The report also presents a new framework for linguistic theorizing, that resolves important puzzles in generative linguistics, and guides the mathematical investigation of human language.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.