7 results for Bag-of-words
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
In this paper we have quantified the consistency of word usage in written texts represented by complex networks, where words were taken as nodes, by measuring the degree of preservation of the node neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to the consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf's law that applies to the frequency of use. Consistency correlated positively with the familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied in other tasks, such as emotion recognition.
Abstract:
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
Abstract:
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
Abstract:
Given the multidimensional scope of cochlear implantation, there is a growing need to assess not only clinical measures of efficacy related to communication skills, but also more general aspects of treatment effectiveness, such as quality of life. OBJECTIVES: To translate and adapt an international questionnaire into Brazilian Portuguese; to analyze the correlations among quality-of-life factors; and to analyze the correlations between quality of life and clinical outcome measures. MATERIAL AND METHOD: Prospective study conducted with parents of children with cochlear implants, consisting of the application of validated instruments to assess aspects of quality of life and communication skills. RESULTS: The translation and cultural adaptation of the questionnaire were carried out satisfactorily, and this study makes the questionnaire available in a Brazilian Portuguese version. According to the data obtained, the cochlear implant had a positive effect on the quality of life of the implanted children and their families. The correlations observed for the communication variable show a direct relationship between oral communication and other quality-of-life variables. CONCLUSION: This study makes the questionnaire available in a Brazilian Portuguese version. For parents of Brazilian children who use cochlear implants, lexical ability (acquisition and use of words) is the variable with the greatest impact on their children's quality of life.
Abstract:
Background: In normal aging, the decrease in the syntactic complexity of written production is usually associated with cognitive deficits. This study aimed to analyze the quality of older adults' textual production, indicated by verbal fluency (number of words) and grammatical complexity (number of ideas), in relation to gender, age, schooling, and cognitive status. Methods: From a probabilistic sample of community-dwelling people aged 65 years and above (n = 900), 577 were selected on the basis of their responses to the Mini-Mental State Examination (MMSE) sentence-writing item, which were submitted to content analysis; 323 were excluded because they left the item blank or gave illegible or meaningless responses. Education-adjusted cut-off scores for the MMSE were used to classify the participants as cognitively impaired or unimpaired. Total and subdomain MMSE scores were computed. Results: 40.56% of the participants whose answers to the MMSE sentence item were excluded from the analyses had cognitive impairment, compared with 13.86% of those whose answers were included. The excluded participants were older and less educated. Women and those older than 80 years had the lowest MMSE scores. There was no statistically significant relationship between gender, age, schooling, and textual performance. There was a modest but significant correlation between the number of words written and the scores in the Language subdomain. Conclusions: Results suggest a strong influence of schooling and age on MMSE sentence performance. Failing to write a sentence may suggest cognitive impairment, yet the instructions for the MMSE sentence item, i.e. to produce a simple sentence, may limit its clinical interpretation.
Abstract:
Cervical cancer remains the second most common malignancy among women worldwide, responsible for 500,000 new cases annually. In Brazil alone, the estimate is 18,430 new cases in 2011. Several types of molecular markers have been studied in carcinogenesis, including proteins associated with apoptosis such as BAG-1 and PARP-1. This study aims to demonstrate the expression of BAG-1 and PARP-1 in patients with low-grade squamous intraepithelial lesions (LSILs), high-grade squamous intraepithelial lesions (HSILs) and invasive squamous cell carcinomas (SCCs) of the uterine cervix and to verify a possible association with HPV infection. Fifty samples of LSILs, 50 samples of HSILs and 50 samples of invasive SCCs of the uterine cervix were analyzed by immunohistochemistry for BAG-1 and PARP-1 expression. PCR was performed to detect and type HPV DNA. BAG-1 expression levels were significantly different between LSILs and HSILs (p = 0.014) and between LSILs and SCCs (p = 0.014). With regard to PARP-1 expression, we found significant differences between the expression levels in HSILs and SCCs (p = 0.022). No association was found between BAG-1 expression and the presence of HPV. However, a significant association was found between PARP-1 expression and HPV positivity in the HSIL group (p = 0.021). In conclusion, our research suggests that BAG-1 expression could contribute to the differentiation between LSIL and HSIL/SCC, whereas PARP-1 could be useful for differentiating between HPV-related HSIL and SCC. Further studies are needed to clarify the molecular aspects of the relationship between PARP-1 expression and HPV infection, with potential applications for cervical cancer prediction.
Abstract:
The purpose of this study was to investigate the exchange of disfluencies from function words to content words with age in Brazilian Portuguese speakers who do and do not stutter. Ninety stuttering individuals and 90 controls, native speakers of Brazilian Portuguese, were divided into three age groups (children, adolescents and adults). The study method involved analyzing the occurrence of stuttering on content and function words based on spontaneous speech samples. Results indicated that children tend to be more disfluent on function words. With the increase in age, teenagers and adults who stutter presented a higher number of disfluencies on content words. These findings support the current literature, indicating that with the aging process, there is an exchange of disfluencies from function to content words. This shift in the disfluency pattern may account for a more advanced type of stuttering. The study also demonstrated that disfluencies in Portuguese speakers follow the same pattern of shifting from function to content words with age as for English speakers.