7 resultados para Machine Learning,Natural Language Processing,Descriptive Text Mining,POIROT,Transformer
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
Surveillance Levels (SLs) are categories for medical patients (used in Brazil) that represent different types of medical recommendations. SLs are defined according to risk factors and the medical and developmental history of patients. Each SL is associated with specific educational and clinical measures. The objective of the present paper was to verify computer-aided, automatic assignment of SLs. The present paper proposes a computer-aided approach for automatic recommendation of SLs. The approach is based on the classification of information from patient electronic records. For this purpose, a software architecture composed of three layers was developed. The architecture is formed by a classification layer that includes a linguistic module and machine learning classification modules. The classification layer allows for the use of different classification methods, including the use of preprocessed, normalized language data drawn from the linguistic module. We report the verification and validation of the software architecture in a Brazilian pediatric healthcare institution. The results indicate that selection of attributes can have a great effect on the performance of the system. Nonetheless, our automatic recommendation of surveillance level can still benefit from improvements in processing procedures when the linguistic module is applied prior to classification. Results from our efforts can be applied to different types of medical systems. The results of systems supported by the framework presented in this paper may be used by healthcare and governmental institutions to improve healthcare services in terms of establishing preventive measures and alerting authorities about the possibility of an epidemic.
Resumo:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Several recent studies in literature have identified brain morphological alterations associated to Borderline Personality Disorder (BPD) patients. These findings are reported by studies based on voxel-based-morphometry analysis of structural MRI data, comparing mean gray-matter concentration between groups of BPD patients and healthy controls. On the other hand, mean differences between groups are not informative about the discriminative value of neuroimaging data to predict the group of individual subjects. In this paper, we go beyond mean differences analyses, and explore to what extent individual BPD patients can be differentiated from controls (25 subjects in each group), using a combination of automated-morphometric tools for regional cortical thickness/volumetric estimation and Support Vector Machine classifier. The approach included a feature selection step in order to identify the regions containing most discriminative information. The accuracy of this classifier was evaluated using the leave-one-subject-out procedure. The brain regions indicated as containing relevant information to discriminate groups were the orbitofrontal, rostral anterior cingulate, posterior cingulate, middle temporal cortices, among others. These areas, which are distinctively involved in emotional and affect regulation of BPD patients, were the most informative regions to achieve both sensitivity and specificity values of 80% in SVM classification. The findings suggest that this new methodology can add clinical and potential diagnostic value to neuroimaging of psychiatric disorders. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Background: The integration of sequencing and gene interaction data and subsequent generation of pathways and networks contained in databases such as KEGG Pathway is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, important information for knowledge application with relation to other organisms. Results: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. Conclusions: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as “seeds” for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes together with determination of the ancestry leads to a better understanding of the evolution of such process.
Resumo:
This study investigated whether there are differences in the Speech-Evoked Auditory Brainstem Response among children with Typical Development (TD), (Central) Auditory Processing Disorder (C) APD, and Language Impairment (LI). The speech-evoked Auditory Brainstem Response was tested in 57 children (ages 6-12). The children were placed into three groups: TD (n = 18), (C)APD (n = 18) and LI (n = 21). Speech-evoked ABR were elicited using the five-formant syllable/da/. Three dimensions were defined for analysis, including timing, harmonics, and pitch. A comparative analysis of the responses between the typical development children and children with (C)APD and LI revealed abnormal encoding of the speech acoustic features that are characteristics of speech perception in children with (C)APD and LI, although the two groups differed in their abnormalities. While the children with (C)APD might had a greater difficulty distinguishing stimuli based on timing cues, the children with LI had the additional difficulty of distinguishing speech harmonics, which are important to the identification of speech sounds. These data suggested that an inefficient representation of crucial components of speech sounds may contribute to the difficulties with language processing found in children with LI. Furthermore, these findings may indicate that the neural processes mediated by the auditory brainstem differ among children with auditory processing and speech-language disorders. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.
Resumo:
Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.