938 results for Naive Bayes
Abstract:
The limited ability of computers to process text documents, and the consequent difficulty of extracting information from them, stems from the difficulty of processing unstructured information. To reduce this limitation, it is necessary to increase the structure of the documents that computers work with. This work proposes a document classification model based on a process of successive refinement of information. At each iteration, the information present in the document is better characterized by applying an appropriate classifier. The classification process draws on statistical information, using the Bayes classification model, over documents or document fragments. It also draws on techniques for specifying text patterns, using regular expressions to extract information that exhibits a known pattern. The information obtained is stored in XML, which allows document collections to be queried automatically (using databases with native XML support). XML is also used to transform the original information into other formats, such as HTML. This format can be used to summarize the information so as to improve its presentation.
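The pattern-extraction and XML-storage steps described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the two regular expressions, the element names, and the `annotate` helper are all hypothetical, and the class label is assumed to come from a separate Bayes classifier.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns; the abstract does not list the ones actually used.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def annotate(text: str, label: str) -> ET.Element:
    """Wrap a text fragment in XML, adding one child element per regex match."""
    fragment = ET.Element("fragment", {"class": label})
    ET.SubElement(fragment, "text").text = text
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            ET.SubElement(fragment, name).text = match.group(0)
    return fragment

# The label "invitation" stands in for the output of a classifier.
fragment = annotate("Meeting on 2009-03-14, contact ana@example.org.", "invitation")
xml_string = ET.tostring(fragment, encoding="unicode")
```

Once fragments are stored this way, a query such as `fragment.findall("date")` retrieves the extracted values without re-parsing the raw text.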
Abstract:
The road transport system is usually assessed against four performance parameters: accessibility, mobility, economy, and environment. Given the scale of the problem that road accidents currently represent, in social and economic terms, it is essential that road engineering be able to assess road safety objectively. But how can the "Supply of Road Safety" be measured? This dissertation presents a proposed methodological approach to that question, built on the Empirical Bayes Approach (Aproximação Empírica de Bayes, AEB) and structured on the assumption that it will become a component of a global system for monitoring and maximizing road safety. This methodological approach could be applied very successfully to road safety management procedures and could consequently also help, at the strategic level, to stabilize the macroscopic variables relevant to the overall assessment of performance in road safety and, consequently, of the road system. The proposed methodology was successfully tested through a case study on the IC1. This route was divided into 43 homogeneous sections (with respect to the road environment), in which the volume of exposure to risk and the observed accident frequency were analyzed over a 5-year period (2003-2007).
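The abstract does not give the estimator used, so the following is only a sketch of one textbook form of empirical Bayes for accident counts: a Poisson count combined with a gamma prior. The function name, the `dispersion` parameter, and the example numbers are assumptions made for illustration.

```python
def eb_estimate(observed: float, expected: float, dispersion: float) -> float:
    """Posterior mean of a Poisson rate under a gamma prior.

    expected   -- prior mean, e.g. from a model of similar road sections
    dispersion -- gamma shape parameter (prior variance = expected**2 / dispersion)
    """
    weight = dispersion / (dispersion + expected)  # trust placed in the prior
    return weight * expected + (1 - weight) * observed

# Hypothetical section: 6 accidents observed where 2 were expected.
estimate = eb_estimate(observed=6, expected=2.0, dispersion=2.0)  # 4.0
```

The estimate always lies between the observed count and the prior expectation, which is what damps the effect of random fluctuation in short observation periods.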
Abstract:
INTRODUCTION: As part of a research program on family violence and severe childhood malnutrition, specifically a case-control study, the reliability of the instruments used in the data collection process was assessed. The reliability of four components of the main instrument was studied: (a) the Conflict Tactics Scales (CTS), which measure the degree of family conflict/violence; (b) the CAGE instrument (Cut-down; Annoyed; Guilty & Eye-opener), used to indicate suspected alcoholism; (c) the NSDUQ (Non-student Drugs Use Questionnaire), which aims to assess the use of illicit drugs; and (d) the anthropometric measurement of length. METHOD: For the first three components, stability (intra-observer or test-retest reliability) and equivalence (inter-observer reliability) were assessed using the first 50 subjects recruited into the underlying case-control study. The analysis used the Kappa index (k) with a (pseudo-Bayes) adjustment to deal with estimability problems. For component "d", only equivalence was studied (n=73), using the Intra-class Correlation Coefficient (ICC) as the estimator. RESULTS: All components showed acceptable stability and equivalence. For the stability of the CTS, CAGE, and NSDUQ, the estimates of k were around 0.70, 0.78, and 0.85, respectively. For equivalence, values of 1.0 were found for the CTS and NSDUQ and 0.75 for CAGE. The equivalence estimated through the ICC for length was 0.99. Some deviant situations were observed. The results point to adequate standardization of the observers and reflect the good quality of the measurement process in the underlying study, encouraging the research team to proceed with greater confidence.
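For readers unfamiliar with the Kappa index, a minimal sketch of the unadjusted statistic follows. The pseudo-Bayes adjustment mentioned in the abstract is not reproduced here, and the agreement table is invented for illustration.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rating 1, columns: rating 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two ratings were independent.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Made-up 2x2 table: 35 of 50 ratings agree, 25 would agree by chance.
table = [[20, 5],
         [10, 15]]
kappa = cohen_kappa(table)
```

When one cell of the table is empty or the marginals are extreme, the plain statistic becomes unstable or undefined, which is the kind of estimability problem the adjustment is meant to address.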
Abstract:
OBJECTIVE: To analyze the spatial distribution of leprosy, identify areas of possible under-registration of cases or of probable high transmission (risk), and verify the association of this distribution with the existence of cases of multibacillary forms. METHODS: The study was carried out in Recife, PE, across the 94 neighborhoods analyzed. The data source was the Sistema de Informações sobre Agravos de Notificação (Notifiable Diseases Information System) of the Ministry of Health. An ecological approach was adopted, using the empirical Bayes method for local smoothing of rates, based on information from adjacent neighborhoods. RESULTS: The mean annual occurrence was 17.3% of new cases in those under 15 years of age (28.3% of multibacillary forms), indicating a process of intense disease transmission. Analysis of the spatial distribution of leprosy pointed to three areas concentrating neighborhoods with high detection rates and poor living conditions. CONCLUSIONS: The use of the Bayesian model, based on information from neighboring spatial units, made it possible to re-estimate epidemiological indicators. It was possible to identify priority areas for the leprosy control program in the municipality, both because of the high number of occurrences correlated with the presence of multibacillary forms of the disease in those under 15 and because of the existence of under-reporting.
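Local empirical Bayes smoothing can be sketched as below. The abstract does not state which estimator was used; this sketch assumes a common moment-based form in which each area's raw rate is shrunk toward the mean rate of the area and its adjacent neighbors. The function name and the three-area data set are hypothetical.

```python
def eb_smooth(cases, population, neighbors):
    """Shrink each area's raw rate toward the rate of its local neighborhood."""
    smoothed = []
    for i in range(len(cases)):
        idx = [i] + list(neighbors[i])  # the area plus its adjacent areas
        total_pop = sum(population[j] for j in idx)
        local_mean = sum(cases[j] for j in idx) / total_pop
        # Population-weighted variance of the raw rates around the local mean.
        variance = sum(
            population[j] * (cases[j] / population[j] - local_mean) ** 2 for j in idx
        ) / total_pop
        mean_pop = total_pop / len(idx)
        prior_var = max(variance - local_mean / mean_pop, 0.0)
        denom = prior_var + local_mean / population[i]
        weight = prior_var / denom if denom > 0 else 0.0
        raw = cases[i] / population[i]
        smoothed.append(local_mean + weight * (raw - local_mean))
    return smoothed

# Hypothetical data: a small area with an extreme raw rate and two large neighbors.
cases = [5, 10, 10]
population = [100, 10000, 10000]
neighbors = [[1, 2], [0, 2], [0, 1]]
rates = eb_smooth(cases, population, neighbors)
```

The small area's raw rate of 0.05 is pulled well toward the local mean, while the large areas change very little, because the weight grows with population size.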
Abstract:
Liver steatosis is a common disease usually associated with social and genetic factors. Early detection and quantification are important since it can evolve to cirrhosis. In this paper, a new computer-aided diagnosis (CAD) system for steatosis classification, on a local and global basis, is presented. The Bayes factor is computed from objective ultrasound textural features extracted from the liver parenchyma. The goal is to develop a CAD screening tool to help in steatosis detection. Results showed an accuracy of 93.33%, with a sensitivity of 94.59% and a specificity of 92.11%, using the Bayes classifier. The proposed CAD system provides a suitable graphical display for steatosis classification.
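As a rough illustration of what a Bayes factor is in a two-class setting, the sketch below computes the likelihood ratio of a single scalar feature under two Gaussian class models. The Gaussian assumption, the single feature, and the class parameters are all hypothetical; the abstract does not describe the actual feature model.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_factor(x, steatosis, normal):
    """Likelihood ratio p(x | steatosis) / p(x | normal) for one feature value."""
    return gaussian_pdf(x, *steatosis) / gaussian_pdf(x, *normal)

# Hypothetical (mean, std) of a textural feature in each class.
STEATOSIS = (0.8, 0.1)
NORMAL = (0.5, 0.1)

bf_midpoint = bayes_factor(0.65, STEATOSIS, NORMAL)  # halfway between the class means
bf_high = bayes_factor(0.80, STEATOSIS, NORMAL)      # typical of the steatosis class
```

Values above 1 favor steatosis and values below 1 favor the normal class. Evaluating the factor per image region would give a local map, and evaluating it over the whole liver would give a global score.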
Abstract:
Chronic liver disease (CLD) is most of the time an asymptomatic, progressive, and ultimately potentially fatal disease. In this study, an automatic hierarchical procedure to stage CLD using ultrasound images, laboratory tests, and clinical records is described. The first stage of the proposed method, called the clinical based classifier (CBC), discriminates healthy from pathologic conditions. When nonhealthy conditions are detected, the method refines the results into three exclusive pathologies on a hierarchical basis: 1) chronic hepatitis; 2) compensated cirrhosis; and 3) decompensated cirrhosis. The features used, as well as the classifiers (Bayes, Parzen, support vector machine, and k-nearest neighbor), are optimally selected for each stage. A large multimodal feature database was built specifically for this study, containing 30 chronic hepatitis cases, 34 compensated cirrhosis cases, and 36 decompensated cirrhosis cases, all validated after histopathologic analysis by liver biopsy. The CBC classification scheme outperformed the nonhierarchical one-against-all scheme, achieving an overall accuracy of 98.67% for the normal detector, 87.45% for the chronic hepatitis detector, and 95.71% for the cirrhosis detector.
Abstract:
Liver steatosis is a common disease usually associated with social and genetic factors. Early detection and quantification are important since it can evolve to cirrhosis. Steatosis is usually a diffuse liver disease, since the liver is globally affected. However, steatosis can also be focal, affecting only some foci that are difficult to discriminate. In both cases, steatosis is detected by laboratory analysis and visual inspection of ultrasound images of the hepatic parenchyma. Liver biopsy is the most accurate diagnostic method, but its invasive nature suggests the use of non-invasive methods, while visual inspection of the ultrasound images is subjective and prone to error. In this paper a new Computer Aided Diagnosis (CAD) system for steatosis classification and analysis is presented, where the Bayes factor, obtained from objective intensity and textural features extracted from ultrasound (US) images of the liver, is computed on a local or global basis. The main goal is to provide the physician with an application that makes the diagnosis and quantification of steatosis faster and more accurate, namely in a screening approach. The results showed an overall accuracy of 93.54%, with a sensitivity of 95.83% and 85.71% for the normal and steatosis classes, respectively. The proposed CAD system seemed suitable as a graphical display for steatosis classification, and a comparison with some of the most recent works in the literature is also presented.
Abstract:
Steatosis, also known as fatty liver, corresponds to an abnormal retention of lipids within the hepatic cells and reflects an impairment of the normal processes of synthesis and elimination of fat. Several causes may lead to this condition, namely obesity, diabetes, or alcoholism. In this paper an automatic classification algorithm is proposed for the diagnosis of liver steatosis from ultrasound images. The features are selected to capture the same characteristics that physicians use when diagnosing the disease by visual inspection of the ultrasound images. The algorithm, designed in a Bayesian framework, computes two images: i) a despeckled one, containing the anatomic and echogenic information of the liver, and ii) an image containing only the speckle, used to compute the textural features. These images are computed from the estimated RF signal generated by the ultrasound probe, taking into account the dynamic range compression performed by the equipment. A Bayes classifier, trained with data manually classified by expert clinicians and used as ground truth, reaches an overall accuracy of 95% and a sensitivity of 100%. The main novelties of the method are the estimations of the RF and speckle images, which make it possible to accurately compute textural features of the liver parenchyma relevant for the diagnosis.
Abstract:
Introduction: A major focus of the data mining process, and especially of machine learning research, is to learn automatically to recognize complex patterns and to help make adequate decisions based strictly on the acquired data. Since imaging techniques such as Myocardial Perfusion Imaging (MPI) in Nuclear Cardiology can take up a large part of the daily workflow and generate gigabytes of data, computerized analysis of data could have advantages over human analysis: shorter time, homogeneity and consistency, automatic recording of analysis results, relatively low cost, etc. Objectives: This study aims to evaluate the efficacy of this methodology in the evaluation of MPI Stress studies and in the decision on whether or not to continue the evaluation of each patient. The objective pursued was to classify a patient test automatically into one of three groups: “Positive”, “Negative” and “Indeterminate”. “Positive” cases would proceed directly to the Rest part of the exam, “Negative” cases would be directly exempted from continuation, and only the “Indeterminate” group would require the clinician’s analysis, thus saving clinician effort, improving workflow fluidity at the technologist’s level, and probably saving patients time. Methods: The open source software WEKA v3.6.2 was used to make a comparative analysis of three WEKA algorithms (“OneR”, “J48” and “Naïve Bayes”) in a retrospective study on the “SPECT Heart Dataset”, available at the University of California, Irvine, Machine Learning Repository, using as reference the corresponding clinical results signed by expert nuclear cardiologists. For evaluation purposes, criteria such as “Precision”, “Incorrectly Classified Instances” and “Receiver Operating Characteristics (ROC) Areas” were considered.
Results: The interpretation of the data suggests that the Naïve Bayes algorithm performs best among the three selected algorithms. Conclusions: It is believed, and apparently supported by the findings, that machine learning algorithms could significantly assist, at an intermediary level, in the analysis of scintigraphic data obtained in MPI, namely after Stress acquisition, eventually increasing the efficiency of the entire system and potentially easing the roles of both Technologists and Nuclear Cardiologists. In the continuation of this study, it is planned to use more patient information and to significantly increase the population under study, in order to improve system accuracy.
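To make the Naïve Bayes idea concrete, here is a small from-scratch sketch for binary features. It is not WEKA's implementation and does not use the SPECT Heart data: the six training rows, the labels, and the function names are invented for illustration, and Laplace smoothing with `alpha=1.0` is an assumed choice.

```python
import math

def train(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for label in set(labels):
        members = [r for r, t in zip(rows, labels) if t == label]
        prior = len(members) / len(rows)
        # P(feature = 1 | label), estimated independently for each feature.
        probs = [(sum(col) + alpha) / (len(members) + 2 * alpha) for col in zip(*members)]
        model[label] = (prior, probs)
    return model

def predict(model, row):
    """Return the label with the highest log posterior for one binary row."""
    def log_posterior(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if value else 1 - p) for value, p in zip(row, probs)
        )
    return max(model, key=log_posterior)

# Invented toy data: three binary features per case.
rows = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
labels = ["abnormal", "abnormal", "abnormal", "normal", "normal", "normal"]
model = train(rows, labels)
prediction = predict(model, [1, 1, 0])
```

The "naive" part is the assumption that features are independent given the class, which is why the per-feature log probabilities are simply summed.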
Abstract:
OBJECTIVE: To develop a statistical model based on Bayesian methods to estimate the risk of tuberculous infection in studies with losses to follow-up, comparing it with a classical deterministic model. METHODS: The proposed stochastic model is based on a Gibbs sampling algorithm, using the information on losses to follow-up at the end of a longitudinal study. To simulate the unknown number of individuals who were reactors at the end of the study and lost to follow-up, but were non-reactors at the initial time, a latent variable was introduced into the new model. An application exercise with both models is presented to compare the estimates generated. RESULTS: The point estimates provided by the two models are close, but the Bayesian model had the advantage of providing credible intervals as measures of the sampling variability of the estimated parameters. CONCLUSIONS: The Bayesian model may be useful in longitudinal studies with low adherence to follow-up.
Abstract:
Liver steatosis is mainly a textural abnormality of the hepatic parenchyma due to fat accumulation in the hepatic vesicles. Today, the assessment is performed subjectively by visual inspection. Here a classifier based on features extracted from ultrasound (US) images is described for the automatic diagnosis of this pathology. The proposed algorithm estimates the original ultrasound radio-frequency (RF) envelope signal, from which the noiseless anatomic information and the textural information encoded in the speckle noise are extracted. The features characterizing the textural information are the coefficients of the first-order autoregressive model that describes the speckle field. A binary Bayesian classifier was implemented and the Bayes factor was calculated. The classification revealed an overall accuracy of 100%. The Bayes factor could be helpful in the graphical display of the quantitative results for diagnostic purposes.
Abstract:
In this paper an automatic classification algorithm is proposed for the diagnosis of liver steatosis, also known as fatty liver, from ultrasound images. The features used by the classifier, automatically extracted from the ultrasound images, are essentially the ones physicians use when diagnosing the disease by visual inspection of the ultrasound images. The main novelty of the method is the use of the speckle noise that corrupts the ultrasound images to compute textural features of the liver parenchyma relevant for the diagnosis. The algorithm uses the Bayesian framework to compute a noiseless image, containing anatomic and echogenic information of the liver, and a second image containing only the speckle noise, used to compute the textural features. The classification results, obtained with the Bayes classifier using manually classified data as ground truth, show that the automatic classifier reaches an accuracy of 95% and a sensitivity of 100%.
Abstract:
OBJECTIVE To analyze the spatial distribution of risk for tuberculosis and its socioeconomic determinants in the city of Rio de Janeiro, Brazil. METHODS An ecological study on the association between the mean incidence rate of tuberculosis from 2004 to 2006 and socioeconomic indicators of the Censo Demográfico (Demographic Census) of 2000. The unit of analysis was the home district registered in the Sistema de Informação de Agravos de Notificação (Notifiable Diseases Information System) of Rio de Janeiro, Southeastern Brazil. The rates were standardized by sex and age group, and smoothed by the empirical Bayes method. Spatial autocorrelation was evaluated by Moran’s I. Multiple linear regression models were studied and the appropriateness of incorporating the spatial component in modeling was evaluated. RESULTS We observed a higher risk of the disease in some neighborhoods of the port and north regions, as well as a high incidence in the slums of Rocinha and Vidigal, in the south region, and Cidade de Deus, in the west. The final model identified a positive association for the variables: percentage of permanent private households in which the head of the house earns three to five minimum wages; percentage of individual residents in the neighborhood; and percentage of people living in homes with more than two people per bedroom. CONCLUSIONS The spatial analysis identified areas of risk of tuberculosis incidence in the neighborhoods of the city of Rio de Janeiro and also found spatial dependence for the incidence of tuberculosis and some socioeconomic variables. However, the inclusion of the space component in the final model was not required during the modeling process.
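Moran's I, the spatial autocorrelation statistic named above, can be computed as in the following sketch. The four-area chain and its values are invented, and a binary adjacency matrix is assumed as the spatial weights; the study's actual weighting scheme is not stated in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / weight_sum) * cross / sum(d * d for d in dev)

# Four hypothetical areas in a row; each is adjacent to its immediate neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], chain)          # similar values sit next to each other
alternating = morans_i([1, -1, 1, -1], chain)  # neighbors are always dissimilar
```

Positive values indicate that neighboring areas tend to have similar rates, and negative values indicate that neighbors tend to differ.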
Abstract:
OBJECTIVE To describe the spatial distribution of avoidable hospitalizations due to tuberculosis in the municipality of Ribeirão Preto, SP, Brazil, and to identify spatial and space-time clusters for the risk of occurrence of these events. METHODS This is a descriptive, ecological study that considered the hospitalization records of the Hospital Information System for residents of Ribeirão Preto, SP, Southeastern Brazil, from 2006 to 2012. Only the cases with recorded addresses were considered for the spatial analyses, and they were also geocoded. We used kernel density estimation to identify the densest areas, the local empirical Bayes rate as the method for smoothing the incidence rates of hospital admissions, and the scan statistic for identifying clusters of risk. The software packages ArcGIS 10.2, TerraView 4.2.2, and SaTScan were used in the analysis. RESULTS We identified 169 hospitalizations due to tuberculosis. Most patients were men (n = 134; 79.2%), with a mean age of 48 years (SD = 16.2). The predominant clinical form was pulmonary, which was confirmed through microscopic examination of expectorated sputum (n = 66; 39.0%). We geocoded 159 cases (94.0%). We observed a non-random spatial distribution of avoidable hospitalizations due to tuberculosis, concentrated in the northern and western regions of the municipality. Through the scan statistic, three spatial clusters for risk of hospitalizations due to tuberculosis were identified: one in the northern region of the municipality (relative risk [RR] = 3.4; 95%CI 2.7–4.4); the second in the central region, where there is a prison unit (RR = 28.6; 95%CI 22.4–36.6); and the last one in the southern region, an area of protection against hospitalizations (RR = 0.2; 95%CI 0.2–0.3). We did not identify any space-time clusters.
CONCLUSIONS The investigation showed priority areas for the control and surveillance of tuberculosis, as well as the profile of the affected population, which shows important aspects to be considered in terms of management and organization of health care services targeting effectiveness in primary health care.
Abstract:
Price forecasting is a matter of concern for all participants in electricity markets, from suppliers to consumers and policy makers, who are interested in accurate forecasts of day-ahead electricity prices either for better decision making or for an improved evaluation of the effectiveness of market rules and structure. This paper describes a methodology to forecast market prices in an electricity market using an ARIMA model applied to the conjectural variations of the firms acting in that market. This methodology is applied to the Iberian electricity market to forecast market prices in the 24 hours of a working day. The methodology was then compared with two other methodologies, one called naive and the other a direct forecast of market prices that also uses an ARIMA model. Results show that the conjectural variations price forecast performs better than the naive one and slightly better than the direct price forecast.