944 results for automated text classification
Abstract:
Master's in Informatics Engineering - Specialization Area in Architectures, Systems and Networks
Abstract:
The diagnosis of idiopathic Parkinson's disease (IPD) is entirely clinical. The fact that neuronal damage begins 5-10 years before the occurrence of sub-clinical signs underlines the importance of preclinical diagnosis. A new approach for in-vivo pathophysiological assessment of IPD-related neurodegeneration was implemented based on recently developed neuroimaging methods. It is based on non-invasive magnetic resonance data sensitive to brain tissue property changes that precede macroscopic atrophy in the early stages of IPD. This research aims to determine the brain tissue property changes induced by neurodegeneration that can be linked to clinical phenotypes, which will allow us to create a predictive model for early diagnosis of IPD. We hypothesized that the degree of disease progression in IPD patients has a differential and specific impact on the brain tissue properties used to create a predictive model of motor and non-motor impairment in IPD. We studied the potential of in-vivo quantitative imaging sensitive to neurodegeneration-related brain tissue characteristics to detect changes in patients with IPD. We carried out methodological work within the well-established SPM8 framework to estimate the sensitivity of tissue probability maps for automated tissue classification in the detection of early IPD. We performed whole-brain multi-parameter mapping at high resolution, followed by voxel-based morphometric (VBM) analysis and voxel-based quantification (VBQ) comparing healthy subjects to IPD patients. We found a trend of non-significant tissue property changes in the olfactory bulb area using the MT and R1 parameters (p<0.001). Compared with the IPD patients, the healthy group presented bilaterally higher MT and R1 intensities in this specific functional region. These results did not correlate with age, or with disease severity or duration. We failed to demonstrate any changes with the R2* parameter.
We interpreted our findings as demyelination of the olfactory tract, which is clinically represented as anosmia. However, the lack of correlation with duration or severity complicates its implications in the creation of a predictive model of impairment in IPD.
Abstract:
Evidence from magnetic resonance imaging (MRI) studies shows that healthy ageing is associated with profound changes in cortical and subcortical brain structures. The reliable delineation of cortex and basal ganglia using automated computational anatomy methods based on T1-weighted images remains challenging, which results in controversies in the literature. In this study we use quantitative MRI (qMRI) to gain insight into the microstructural mechanisms underlying tissue ageing and look for potential interactions between ageing and brain tissue properties to assess their impact on automated tissue classification. To this end we acquired maps of the longitudinal relaxation rate R1, the effective transverse relaxation rate R2* and magnetization transfer (MT) from healthy subjects (n=96, aged 21-88 years) using a well-established multi-parameter mapping qMRI protocol. Within the framework of voxel-based quantification, we find higher grey matter volume in the basal ganglia, cerebellar dentate and prefrontal cortex when tissue classification is based on MT maps rather than on T1 maps. These discrepancies between grey matter volume estimates can be attributed to R2* - a surrogate marker of iron concentration - and are further modulated by an interaction between R2* and age, both in cortical and subcortical areas. We interpret our findings as direct evidence of the impact of ageing-related brain tissue property changes on automated tissue classification of brain structures using SPM12. Computational anatomy studies of ageing and neurodegeneration should acknowledge these effects, particularly when inferring underlying pathophysiology from regional cortex and basal ganglia volume changes.
Abstract:
A difficulty in the design of automated text summarization algorithms lies in objective evaluation. Viewing summarization as a trade-off between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for a broader comparison of summarization methods than traditional summary evaluation techniques. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different uses of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
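The task-based idea above (rank summarization methods by how well their summaries support a downstream task) can be sketched with a toy classifier. Everything below is an illustrative assumption: the helper names, the toy two-class corpus, and the use of a single nearest-centroid model in place of the paper's hierarchy of classifiers.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def task_score(summaries, labels, test_docs, test_labels):
    """Train a nearest-centroid classifier on one method's summaries, then
    measure accuracy on held-out full documents: higher accuracy suggests
    the summaries retained more class-discriminative content."""
    groups = {}
    for s, y in zip(summaries, labels):
        groups.setdefault(y, Counter()).update(bow(s))
    correct = 0
    for d, y in zip(test_docs, test_labels):
        pred = max(groups, key=lambda c: cosine(bow(d), groups[c]))
        correct += pred == y
    return correct / len(test_docs)

# Toy evaluation: a "good" summarizer keeps topical words, a "bad" one does not.
test_docs = ["the team scored a goal in the match",
             "flour in the oven to bake bread"]
test_labels = ["sport", "food"]
good = task_score(["goal match team", "bake oven flour"],
                  ["sport", "food"], test_docs, test_labels)
bad = task_score(["the the the", "a a a"],
                 ["sport", "food"], test_docs, test_labels)
```

On this toy data the informative summaries score higher than the degenerate ones, which is the ranking signal the evaluation technique relies on.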
Abstract:
In many data mining applications, automated retrieval of textual and image information is needed, and this becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods both to the initial matrix and to the reduced one. The algorithms run on a cluster of workstations under MPI; we present results of experiments on textual retrieval of Web documents as well as a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
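The column-sampling idea (sample documents with probability proportional to their squared column norms, then run a standard deterministic SVD on the much smaller sketch) can be illustrated as follows. This is a generic Monte Carlo sketching sketch, not the authors' algorithm; the function name and toy matrix are made up.

```python
import numpy as np

def sampled_lsi(A, k, seed=0):
    """Monte Carlo sketch of LSI: sample k columns of the term-by-document
    matrix A with probability proportional to their squared norms, rescale
    so the sketch approximates A in expectation, then take the SVD of the
    small sampled matrix with a standard deterministic method."""
    rng = np.random.default_rng(seed)
    p = (A ** 2).sum(axis=0)
    p = p / p.sum()                       # column-sampling distribution
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    S = A[:, idx] / np.sqrt(k * p[idx])   # rescaled sampled columns
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U, s                           # approximate first k left singular vectors/values

# Toy 6-term x 8-document matrix
rng = np.random.default_rng(1)
A = rng.random((6, 8))
U, s = sampled_lsi(A, k=3)
```

The sketch trades exactness for size: the deterministic SVD now runs on a 6 x 3 matrix instead of the full 6 x 8 one, which is the point of the stochastic approach at scale.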
Abstract:
Traditional content-based image retrieval (CBIR) systems use low-level features such as the colors, shapes, and textures of images. However, users make queries based on semantics, which are not easily related to such low-level characteristics. Recent work on CBIR confirms that researchers have been trying to map visual low-level characteristics to high-level semantics. The relation between low-level characteristics and the textual information of images motivated this article, which proposes a model for the automatic classification and categorization of words associated with images. The proposal considers a self-organizing neural network architecture that classifies textual information without previous learning. Experimental results compare the performance of the text-based approach to that of an image retrieval system based on low-level features. (c) 2008 Wiley Periodicals, Inc.
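A minimal 1-D self-organizing map gives the flavor of classification "without previous learning": unit weights organize around unlabeled input vectors on their own. This is a generic SOM sketch under assumed parameters and toy data, not the article's actual architecture.

```python
import numpy as np

def train_som(data, n_units=4, epochs=100, lr=0.5, radius=1.0, seed=0):
    """Minimal 1-D self-organizing map: repeatedly pull the best-matching
    unit (BMU) and its grid neighbours toward each input, with a decaying
    learning rate. No labels are used at any point."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_units, data.shape[1]))
    for t in range(epochs):
        alpha = lr * (1.0 - t / epochs)            # decaying learning rate
        for x in data:
            b = int(np.argmin(((w - x) ** 2).sum(axis=1)))
            d = np.abs(np.arange(n_units) - b)     # grid distance to BMU
            h = np.exp(-(d ** 2) / (2 * radius ** 2))
            w += alpha * h[:, None] * (x - w)
    return w

def bmu(w, x):
    """Index of the unit closest to input x."""
    return int(np.argmin(((w - x) ** 2).sum(axis=1)))

# Toy "word feature vectors": two well-separated groups
data = np.array([[0.0, 0.0]] * 10 + [[1.0, 1.0]] * 10)
w = train_som(data)
```

After training, inputs from the two groups map to different units, i.e. the map has clustered the data with no prior learning phase on labels.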
Fragments of the tracks in the landscape of São Paulo: railway brownfields and their refunctionalization
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed in three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which combines semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measure, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed good results were also obtained with semantic features.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
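To illustrate the network representation of text described above, a small sketch: build a word co-occurrence network and extract a few topological features of the kind such classifiers consume. The window size and the particular feature set are illustrative assumptions, not the metrics used in the study.

```python
from collections import defaultdict

def cooccurrence_edges(tokens, window=2):
    """Undirected co-occurrence network: link words appearing within
    `window` positions of each other."""
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                edges.add(frozenset((w, v)))
    return edges

def network_features(tokens, window=2):
    """A few topological features usable for text classification:
    node count, edge count, average degree, global clustering."""
    edges = cooccurrence_edges(tokens, window)
    adj = defaultdict(set)
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    n, m = len(adj), len(edges)
    triangles = triplets = 0
    for v, nb in adj.items():
        nb = list(nb)
        triplets += len(nb) * (len(nb) - 1) // 2
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                if nb[j] in adj[nb[i]]:
                    triangles += 1     # each triangle counted once per vertex
    return {"nodes": n, "edges": m,
            "avg_degree": 2 * m / n if n else 0.0,
            "clustering": triangles / triplets if triplets else 0.0}

feats = network_features("a b c a b c".split())
```

Feature vectors of this kind, one per text, are what distinguish e.g. different MT systems or authors in topological approaches.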
Abstract:
Objective: Evaluation of the antimicrobial effect of skin disinfection techniques is essential to avoid the transmission of infectious agents during blood transfusion. The aim of this study was to examine the effectiveness of two methods of arm skin disinfection used on blood donors at a Hemotherapy Center in Brazil, an important centre for distributing haemocomponents to many cities in the country. Methods: Two skin disinfection techniques were evaluated in 50 blood donors. On one arm, the 10% povidone-iodine/two-stage technique was used; on the opposite arm, the 0.5% chlorhexidine digluconate alcohol solution/one-stage technique was used. The swabs were seeded on three culture media: blood agar, mannitol salt agar and MacConkey agar. Automated bacterial classification based on biochemical tests/specific substrates was performed. Donor characteristics were collected using the computerised system of the Hemotherapy Center. Results: We found that microbial reduction was significantly higher for the 10% povidone-iodine technique (98.57-98.87%) than for the 0.5% chlorhexidine technique (94.38-95.06%). The species Leuconostoc mesenteroides and Staphylococcus hominis showed resistance to both disinfection techniques. We did not find statistically significant relationships between donor characteristics and microbial reduction. Conclusions: Arm skin disinfection with 10% povidone-iodine produced better antimicrobial activity. We must acknowledge that the 10% povidone-iodine technique has the limitation of being a two-stage method. However, prevention of adverse events due to bacterial contamination and transfusion reactions should be prioritised. The production of hypoallergenic and stronger antiseptics that allow a safe one-stage disinfection technique should be encouraged in health systems, not only in Brazil but also around the world.
Abstract:
The geographic focus of a document identifies the place or places on which the content of the text is centred. This work presents a corpus-based approach for detecting the geographic focus of a text. In contrast to other approaches that rely purely on geographic information to detect the focus, our proposal uses all the textual information present in the documents of the working corpus, starting from the hypothesis that the occurrence of certain people, events, dates and even common terms can be fundamental for this task. To validate our hypothesis, we conducted a study on a corpus of geolocated news items from the years 2008 to 2011. This temporal distribution also allowed us to analyse how the classifier's performance and the most representative terms of different localities evolve over time.
Abstract:
"November 1979."
Abstract:
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods, despite using no labeled documents.
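A stripped-down sketch of the lexicon-seeded self-training loop described above: score documents with a small lexicon, keep confidently scored ones as pseudo-labeled examples, and estimate word-class counts from them so that words absent from the lexicon also become informative. The tiny lexicon and the plain naive-Bayes-style scoring are illustrative assumptions; the paper's generalized expectation criteria are not implemented here.

```python
from collections import Counter
import math

# Hypothetical miniature sentiment lexicon.
POS = {"good", "great", "excellent", "love"}
NEG = {"bad", "terrible", "awful", "hate"}

def lexicon_score(tokens):
    """Initial weak labeling: +1 per positive word, -1 per negative word."""
    return sum((t in POS) - (t in NEG) for t in tokens)

def self_train(docs, threshold=1):
    """Docs with |lexicon score| >= threshold become pseudo-labeled
    examples; word-class counts are estimated from them."""
    counts = {"pos": Counter(), "neg": Counter()}
    for d in docs:
        toks = d.lower().split()
        s = lexicon_score(toks)
        if s >= threshold:
            counts["pos"].update(toks)
        elif s <= -threshold:
            counts["neg"].update(toks)
    return counts

def classify(counts, doc):
    """Naive-Bayes-style scoring with Laplace smoothing over the
    self-learned word-class counts."""
    toks = doc.lower().split()
    vocab = set(counts["pos"]) | set(counts["neg"])
    best, best_ll = None, -math.inf
    for c in ("pos", "neg"):
        total = sum(counts[c].values()) + len(vocab)
        ll = sum(math.log((counts[c][t] + 1) / total) for t in toks)
        if ll > best_ll:
            best, best_ll = c, ll
    return best

counts = self_train(["great plot and great acting",
                     "awful script and terrible pacing"])
```

Note that "acting" and "script" carry no lexicon polarity, yet after self-training they steer classification, which is the domain-adaptation effect the framework exploits.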
Abstract:
Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task, even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for the automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
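The use of labelled features as side information can be loosely imitated in a far simpler model: a naive-Bayes-style page classifier in which hand-labelled feature words receive extra pseudo-counts for the API class. Everything here (the feature list, the prior weight, the toy pages) is an illustrative assumption; this is a crude stand-in, not the feaLDA model.

```python
from collections import Counter
import math

# Hypothetical labelled features assumed indicative of Web API documentation.
API_FEATURES = {"endpoint", "request", "response", "json", "authentication"}

def train(pages, labels, feature_prior=3):
    """Word counts per class, with labelled features given extra
    pseudo-counts in the 'api' class to mimic injected side information."""
    counts = {"api": Counter(), "other": Counter()}
    for page, y in zip(pages, labels):
        counts[y].update(page.lower().split())
    for f in API_FEATURES:
        counts["api"][f] += feature_prior
    return counts

def predict(counts, page):
    """Naive-Bayes-style scoring with Laplace smoothing."""
    toks = page.lower().split()
    vocab = set(counts["api"]) | set(counts["other"])
    best, best_ll = None, -math.inf
    for c in counts:
        total = sum(counts[c].values()) + len(vocab)
        ll = sum(math.log((counts[c][t] + 1) / total) for t in toks)
        if ll > best_ll:
            best, best_ll = c, ll
    return best

counts = train(["this endpoint returns a json response",
                "welcome to my cooking blog recipes"],
               ["api", "other"])
```

The pseudo-counts bias the classifier toward the API class whenever the labelled feature words appear, even with very little training data, which is the intuition behind incorporating labelled features.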