6 results for "unknown words"
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
While statistical physics methods applied to large corpora have unveiled many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., one written in an unknown alphabet) is compatible with a natural language and, if so, to which language it could belong. The approach is based on three types of statistical measurements: first-order statistics of word properties in a text, the topology of complex networks representing texts, and intermittency concepts, with the text treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese, in order to quantify how strongly each measurement depends on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include the assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also identify candidate keywords of the Voynich Manuscript that could help in the effort to decipher it. Because we identified statistical measurements that depend more on syntax than on semantics, the framework may also serve for text analysis in language-dependent applications.
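To make the network measurements above concrete, here is a minimal sketch, assuming an undirected word-adjacency network in which consecutive words are linked and edge weights count how often each pair co-occurs. Selectivity is taken here as strength (total edge weight) divided by degree, a common convention that may differ from the paper's exact definition; the shuffled baseline simply permutes the words, preserving their frequencies. All function names are hypothetical, and the intermittency measurements are not covered.

```python
import random
from collections import Counter, defaultdict


def adjacency_network(words):
    """Weighted, undirected word-adjacency network built from consecutive word pairs."""
    graph = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops such as "very very"
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree(graph, word):
    """Number of distinct neighbors of a word in the network."""
    return len(graph[word])


def selectivity(graph, word):
    """Strength (sum of edge weights) divided by degree (assumed definition)."""
    return sum(graph[word].values()) / len(graph[word])


def degree_assortativity(graph):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    xs, ys = [], []
    for u, neighbors in graph.items():
        for v in neighbors:
            # each undirected edge is visited once from each end, so xs and ys are symmetric
            xs.append(len(graph[u]))
            ys.append(len(graph[v]))
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n  # equals the mean of ys because the data are symmetric
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # var(xs) == var(ys) for the same reason
    return cov / var if var > 0 else 0.0  # undefined when all degrees are equal; report 0


def compare_with_shuffled(words, seed=0):
    """Assortativity of the real text and of a shuffled copy with the same word frequencies."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return (degree_assortativity(adjacency_network(words)),
            degree_assortativity(adjacency_network(shuffled)))


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the rug".split()
    real, shuffled = compare_with_shuffled(text)
    print(f"real: {real:.3f}  shuffled: {shuffled:.3f}")
    print("selectivity of 'the':", selectivity(adjacency_network(text), "the"))
```

A single shuffle is noisy; in practice one would average each metric over many shuffled copies before comparing it with the real text.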
Abstract:
The diagnosis of T-cell large granular lymphocytic leukemia in association with B-cell disorders is uncommon but not unknown. However, the concomitant presence of three hematological diseases is extraordinarily rare. We report an 88-year-old male patient with three simultaneous clonal disorders, namely CD4+/CD8(weak) T-cell large granular lymphocytic leukemia, monoclonal gammopathy of unknown significance and monoclonal B-cell lymphocytosis. The patient has only minimal complaints and no anemia, neutropenia or thrombocytopenia. Lymphadenopathy and hepatosplenomegaly were not present. The three disorders were characterized by flow cytometry, and the clonality of the T-cell large granular lymphocytic leukemia was confirmed by polymerase chain reaction. Interestingly, the patient has distinct B-cell clones: the plasma cells of the monoclonal gammopathy of unknown significance exhibited a kappa light-chain-restricted population, whereas the B lymphocytes of the monoclonal B-cell lymphocytosis exhibited a lambda light-chain-restricted population. This finding does not support the antigen-driven hypothesis for the development of multi-compartment diseases, but suggests that the T-cell large granular lymphocytic expansion might represent a direct antitumor immunological response to both the aberrant B-cell and plasma-cell populations, as part of the immune surveillance against malignant neoplasms.
Abstract:
In this paper we quantified the consistency of word usage in written texts represented as complex networks, with words taken as nodes, by measuring how well each node's neighborhood is preserved. Words were considered highly consistent if authors used them with the same neighboring words. When ranked by consistency of use, the words followed a log-normal distribution, in contrast to Zipf's law, which applies to frequency of use. Consistency correlated positively with familiarity and frequency of use, and negatively with ambiguity and age of acquisition. Inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition; as expected, authors of novels could be distinguished from authors of scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied to other tasks, such as emotion recognition.
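One way to picture "preservation of the node neighborhood" is sketched below. This is a hypothetical operationalization, not necessarily the paper's index: the text is split into two halves, each word's set of adjacent words is collected in each half, and consistency is the Jaccard overlap of the two sets (1.0 when the neighborhood is fully preserved, 0.0 when it is completely different). The function names are illustrative only.

```python
from collections import defaultdict


def neighbor_sets(words):
    """Map each word to the set of words adjacent to it in the sequence."""
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def consistency(words):
    """Jaccard overlap of each word's neighborhood in the first vs second half of a text.

    Only words that have neighbors in both halves receive a score (hypothetical index).
    """
    mid = len(words) // 2
    first, second = neighbor_sets(words[:mid]), neighbor_sets(words[mid:])
    scores = {}
    for word in first.keys() & second.keys():
        union = first[word] | second[word]  # never empty: each listed word has a neighbor
        scores[word] = len(first[word] & second[word]) / len(union)
    return scores


if __name__ == "__main__":
    text = "the cat sat on the mat the dog sat on the rug".split()
    for word, score in sorted(consistency(text).items()):
        print(word, round(score, 2))
```

Splitting a single text in half is only one choice; comparing neighborhoods across different books by the same author would follow the same pattern.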
Abstract:
Studies investigating factors that influence tone recognition generally use recognition tests, whereas most studies on verbal material use self-generated responses in the form of serial recall tests. In the present study we investigated whether tonal and verbal materials share the same cognitive mechanisms by presenting an experimental instrument that evaluates short-term and working memory for tones using self-generated sung responses, which can be compared with verbal tests. The paradigm follows the structure of the forward and backward digit span tests, but uses digits, pseudowords, and tones as stimuli. Amateur and professional singers were compared on forward and backward digit, pseudoword, tone, and contour spans. In addition, an absolute pitch experimental group was included in order to observe the possible use of verbal labels in tone memorization tasks. In general, musical schooling had a slight positive influence on the recall of tones but no influence on the recall of verbal material. Furthermore, the ability to reproduce melodic contours (up and down patterns) was generally higher than the ability to reproduce exact tone sequences. Backward spans were lower than forward spans for all stimuli (digits, pseudowords, tones, contours). Curiously, backward spans were disproportionately lower for tones than for verbal material; that is, the requirement to recall sequences in backward rather than forward order seems to affect tonal stimuli differentially. This difference does not vary with musical expertise.
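The difference between exact-tone and contour scoring can be illustrated with a small sketch. The scoring rules here are assumptions for illustration, not the study's protocol: tones are encoded as MIDI note numbers, a backward trial is correct when the response equals the reversed stimulus, a contour trial is correct when the response has the same length and the same up/down/repeat pattern, and the span is the longest stimulus length recalled correctly at least once. All names are hypothetical.

```python
def contour(tones):
    """Up/down pattern of a tone sequence: +1 rising, -1 falling, 0 repeated."""
    return [(b > a) - (b < a) for a, b in zip(tones, tones[1:])]


def trial_correct(stimulus, response, backward=False, contour_only=False):
    """Score one trial under the hypothetical rules described above."""
    target = list(reversed(stimulus)) if backward else list(stimulus)
    if contour_only:
        # contour scoring ignores exact pitches, keeping only the direction of each step
        return len(response) == len(target) and contour(response) == contour(target)
    return list(response) == target


def span(trials, backward=False, contour_only=False):
    """Longest stimulus length with at least one correctly recalled trial."""
    best = 0
    for stimulus, response in trials:
        if trial_correct(stimulus, response, backward, contour_only):
            best = max(best, len(stimulus))
    return best


if __name__ == "__main__":
    # sung responses: the first misses one pitch but keeps the contour
    trials = [([60, 62, 64], [60, 63, 64]), ([60, 64, 62, 67], [60, 65, 62, 67])]
    print("tone span:", span(trials), "contour span:", span(trials, contour_only=True))
```

Because contour scoring accepts responses with wrong pitches but the right direction of movement, it can never yield a shorter span than exact scoring on the same responses.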
Abstract:
The purpose of this study was to investigate the age-related shift of disfluencies from function words to content words in Brazilian Portuguese speakers who do and do not stutter. Ninety individuals who stutter and 90 controls, all native speakers of Brazilian Portuguese, were divided into three age groups (children, adolescents and adults). The occurrence of stuttering on content and function words was analyzed in spontaneous speech samples. Results indicated that children tend to be more disfluent on function words. With increasing age, adolescents and adults who stutter presented a higher number of disfluencies on content words. These findings support the current literature, indicating that disfluencies shift from function to content words with age. This shift in the disfluency pattern may reflect a more advanced form of stuttering. The study also demonstrated that disfluencies in Portuguese speakers follow the same age-related pattern of shifting from function to content words as reported for English speakers.
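The core measurement, how disfluencies are distributed between function and content words, can be sketched as follows. This assumes the speech sample has already been transcribed and each word annotated as disfluent or not; the function-word list is a small hypothetical subset of Brazilian Portuguese closed-class words, not the study's classification, and any word outside it is treated as a content word.

```python
# Hypothetical, deliberately incomplete list of Brazilian Portuguese function words
FUNCTION_WORDS = {"o", "a", "os", "as", "um", "uma", "de", "do", "da", "em", "no",
                  "na", "e", "que", "para", "com", "por", "se", "mas"}


def disfluency_distribution(tokens):
    """Share of disfluencies falling on function vs content words.

    `tokens` is a list of (word, is_disfluent) pairs from an annotated sample.
    """
    counts = {"function": 0, "content": 0}
    for word, disfluent in tokens:
        if disfluent:
            kind = "function" if word.lower() in FUNCTION_WORDS else "content"
            counts[kind] += 1
    total = sum(counts.values())
    if total == 0:
        return {"function": 0.0, "content": 0.0}
    return {kind: n / total for kind, n in counts.items()}


if __name__ == "__main__":
    sample = [("eu", False), ("quero", True), ("o", True), ("bolo", False),
              ("de", True), ("chocolate", False)]
    print(disfluency_distribution(sample))
```

Computing this distribution separately for each age group is what would reveal a shift from function to content words.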