838 results for text and data mining
Abstract:
OBJECTIVE: To evaluate published articles that assess the positive predictive value of categories 3, 4 and 5 of the Breast Imaging Reporting and Data System (BI-RADS®). MATERIALS AND METHODS: A search of the Medline database was performed using the terms "predictive value" and "BI-RADS". Eleven articles were included in this review. RESULTS: The positive predictive value of categories 3, 4 and 5 ranged from 0% to 8%, 4% to 62%, and 54% to 100%, respectively. Three articles also evaluated which morphological criteria of the lesions had the highest positive predictive value on mammography; spiculated mass was the criterion with the highest positive predictive value. CONCLUSION: The positive predictive value of BI-RADS® categories 3, 4 and 5 varied widely across all studies, but methodological differences were identified that limited comparison among them.
Abstract:
OBJECTIVE: To evaluate BI-RADS® as a predictor of suspicion of malignancy in nonpalpable breast lesions in categories 3, 4 and 5, correlating mammograms with histopathological results by calculating the positive predictive value of mammography. MATERIALS AND METHODS: The examinations were reviewed for 371 patients referred to a cancer treatment reference center in Teresina, PI, for breast histopathological examination between July 2005 and March 2008 because of category 3, 4 or 5 mammograms. Of the 371 patients, 265 underwent core needle biopsy and 106 underwent preoperative localization. RESULTS: Of the mammograms, 11.32% were classified as category 3, 76.28% as category 4 and 12.4% as category 5. Histological results showed that 24% of the examinations were positive for malignancy. The positive predictive values of categories 3, 4 and 5 were 7.14%, 16.96% and 82.61%, respectively. Positive predictive values were also calculated separately for percutaneous biopsies (7.14%, 15.76%, 76.47%) and for preoperative localizations (7.14%, 20%, 100%). CONCLUSION: Malignant findings were underestimated by the radiology report and benign findings were overestimated, which resulted in some invasive procedures being performed unnecessarily.
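The per-category positive predictive values in this abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the abstract reports percentages, not raw counts, so the counts used here (42, 283 and 46 patients per category, with 3, 48 and 38 malignant results) are back-calculated assumptions chosen to be consistent with the reported figures and the total of 371 patients.

```python
def ppv(malignant: int, examined: int) -> float:
    """Positive predictive value in percent: malignant results / lesions examined."""
    if examined <= 0:
        raise ValueError("examined must be positive")
    return 100.0 * malignant / examined


# ASSUMPTION: counts back-calculated from the reported percentages;
# the abstract itself does not give them.
counts_by_category = {
    3: (3, 42),     # (malignant, examined)
    4: (48, 283),
    5: (38, 46),
}

for category, (malignant, examined) in counts_by_category.items():
    print(f"BI-RADS {category}: PPV = {ppv(malignant, examined):.2f}%")

total_malignant = sum(m for m, _ in counts_by_category.values())
total_examined = sum(n for _, n in counts_by_category.values())
print(f"Overall malignancy rate: {ppv(total_malignant, total_examined):.0f}%")
```

Under these assumed counts the category totals sum to 371 and the overall malignancy rate rounds to 24%, matching the abstract.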
Abstract:
Objective To evaluate BI-RADS as a predictive factor of suspicion for malignancy in breast lesions by correlating radiological with histological results and calculating the positive predictive value for categories 3, 4 and 5 in a breast cancer reference center in the city of São Paulo. Materials and Methods Retrospective, analytical and cross-sectional study including 725 patients with mammographic and/or sonographic findings classified as BI-RADS categories 3, 4 and 5 who were referred to the authors' institution to undergo percutaneous biopsy. The test results were reviewed and the positive predictive value was calculated by means of a specific mathematical equation. Results Positive predictive values found for categories 3, 4 and 5 were, respectively, 0.74%, 33.08% and 92.95% for cases submitted to ultrasound-guided biopsy, and 0.00%, 14.90% and 100% for cases submitted to stereotactic biopsy. Conclusion The present study demonstrated high suspicion for malignancy in lesions classified as category 5 and low risk for category 3. As regards category 4, the need for systematic biopsies was observed.
Abstract:
In general, laboratory activities are costly in terms of time, space, and money. As such, the ability to provide realistically simulated laboratory data that enables students to practice data analysis techniques as a complementary activity would be expected to reduce these costs while opening up very interesting possibilities. In the present work, a novel methodology is presented for the design of analytical chemistry instrumental analysis exercises that can be automatically personalized for each student and the results evaluated immediately. The proposed system provides each student with a different set of experimental data generated randomly while satisfying a set of constraints, rather than using data obtained from actual laboratory work. This allows the instructor to provide students with a set of practical problems to complement their regular laboratory work along with the corresponding feedback provided by the system's automatic evaluation process. To this end, the Goodle Grading Management System (GMS), an innovative web-based educational tool for automating the collection and assessment of practical exercises for engineering and scientific courses, was developed. The proposed methodology takes full advantage of the Goodle GMS fusion code architecture. The design of a particular exercise is provided ad hoc by the instructor and requires basic Matlab knowledge. The system has been employed with satisfactory results in several university courses. To demonstrate the automatic evaluation process, three exercises are presented in detail. The first exercise involves a linear regression analysis of data and the calculation of the quality parameters of an instrumental analysis method. The second and third exercises address two different comparison tests, a comparison test of the mean and a paired t-test.
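The idea of personalized, automatically graded exercises can be sketched in a few lines. This is not the Goodle GMS implementation, and it uses Python rather than the Matlab the abstract mentions; the function names, parameter ranges, noise level and tolerance are all hypothetical. It assumes each student's data set is derived from a per-student random seed, so the grader can regenerate the same data and compare the submitted slope with a reference least-squares fit.

```python
import random


def make_calibration_data(student_id: int, n_points: int = 6):
    """Generate a reproducible, student-specific calibration line (hypothetical ranges)."""
    rng = random.Random(student_id)          # per-student seed -> same data every time
    slope = rng.uniform(0.5, 2.0)            # constraint: plausible sensitivity
    intercept = rng.uniform(0.0, 0.1)        # constraint: small blank signal
    conc = [float(i) for i in range(1, n_points + 1)]
    signal = [slope * c + intercept + rng.gauss(0.0, 0.01) for c in conc]
    return conc, signal


def least_squares(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def grade_slope(student_id: int, submitted_slope: float, rel_tol: float = 0.01) -> bool:
    """Regenerate the student's data and accept slopes within rel_tol of the reference."""
    conc, signal = make_calibration_data(student_id)
    reference_slope, _ = least_squares(conc, signal)
    return abs(submitted_slope - reference_slope) <= rel_tol * abs(reference_slope)
```

Seeding by student identifier is one simple design choice: nothing has to be stored per student, because the grader can always rebuild the exact data set the student received.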
Abstract:
Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. 
The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.
Abstract:
OBJECTIVE: to evaluate the accuracy of mammography for the diagnosis of suspicious breast microcalcifications using the Breast Imaging Reporting and Data System (BI-RADS™) and Le Gal classifications, compared with the histopathological result as the gold standard. METHODS: 130 operated cases whose mammograms showed only breast microcalcifications, initially classified as suspicious and with no lesions detectable on clinical examination, were selected from the surgical block archives. The mammograms were reclassified by two examiners using the Le Gal and BI-RADS™ classifications, and a consensus diagnosis was reached. The biopsies were reviewed by two pathologists, and a consensus diagnosis was reached. Mammogram reading and slide review were double-blinded. The statistical analyses used in this study were the chi-square test, the Fleiss quadratic model for PPV, and the Epi-Info 6.0 program. RESULTS: correlating the histopathological and mammographic analyses, BI-RADS™ and Le Gal showed the same sensitivity of 96.4%, specificity of 55.9% and 30.3%, positive predictive value (PPV) of 37.5% and 27.5%, and accuracy of 64.6% and 44.6%, respectively. By BI-RADS™ category, the PPVs were: category 2, 0%; category 3, 1.8%; category 4, 31.6%; and category 5, 60%. The PPVs by the Le Gal classification were: category 2, 3.1%; category 3, 18.1%; category 4, 26.4%; category 5, 66.7%; and unclassifiable, 5.2%. CONCLUSIONS: the BI-RADS™ classification was more accurate, but it did not reduce the ambiguity in the evaluation of breast microcalcifications.
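The four summary measures in this abstract all derive from one 2×2 table of test result against histopathology. The sketch below shows the relationships; the cell counts are an assumption, back-calculated so that they sum to the 130 cases and reproduce the BI-RADS figures (the abstract does not report the table itself).

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and accuracy (all in percent) from a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100.0 * tp / (tp + fn),   # malignant cases flagged as suspicious
        "specificity": 100.0 * tn / (tn + fp),   # benign cases correctly not flagged
        "ppv": 100.0 * tp / (tp + fp),           # flagged cases that are malignant
        "accuracy": 100.0 * (tp + tn) / total,   # all correct calls
    }


# ASSUMPTION: counts inferred from the reported percentages, not taken from the study.
birads = diagnostic_metrics(tp=27, fp=45, fn=1, tn=57)
for name, value in birads.items():
    print(f"{name}: {value:.1f}%")
```

The same table explains why PPV is low despite high sensitivity: with few malignant cases in the sample, even a moderate number of false positives dominates the flagged group.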
Abstract:
This thesis introduces heat demand forecasting models generated using data mining algorithms. The forecast spans one full day and can be used to regulate the heat consumption of buildings. To train the data mining models, two years of heat consumption data from a case building and weather measurement data from the Finnish Meteorological Institute are used. The thesis uses the Microsoft SQL Server Analysis Services data mining tools to generate the models and the CRISP-DM process framework to carry out the research. Results show that, at best, the models can predict heat demand with mean average percentage errors of 3.8% for the 24-h profile and 5.9% for the full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
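Assuming the "mean average percentage error" reported here corresponds to the usual mean absolute percentage error (MAPE), the metric can be computed as in the sketch below. The hourly values in the usage example are invented for illustration and are not from the thesis.

```python
def mape(actual, forecast) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical hourly heat demand (kW) and forecasts, for illustration only.
measured = [100.0, 200.0]
predicted = [90.0, 210.0]
print(f"MAPE = {mape(measured, predicted):.1f}%")
```

Because each error is divided by the measured value, hours with very low demand can dominate the average, which is worth keeping in mind when comparing an hourly profile error with a full-day error.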
Abstract:
For years, choosing the right career by monitoring the trends and scope of different career paths has been a necessity for young people all over the world. In this paper we provide a scientific, data mining based method for predicting the job absorption rate and the waiting time needed for 100% placement for different engineering courses in India. This will greatly help students in India to choose the discipline that offers them the best prospects. Information about graduates is obtained from the NTMIS (National Technical Manpower Information System) nodal center in Kochi, India, located at Cochin University of Science and Technology.
Abstract:
In the current study, an epidemiological study is carried out by means of a literature survey, both in groups identified as being at higher risk of drug-drug interactions (DDIs) and in other cases, to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in mining it for drug-drug interactions. The necessary pre-processing algorithms are developed based on this analysis, and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared against a standard drug interaction database for validation. Of the associations obtained, 31% were identified as new and 69% matched existing interactions. This match indicates the validity of the methodology and its applicability to similar databases. Formulating the results using generic names extended their relevance to a global scale. This global applicability helps health care professionals worldwide to exercise caution during the various stages of drug administration, thus considerably enhancing pharmacovigilance.
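For orientation, the standard (unmodified) Apriori algorithm can be sketched as below. This is a textbook version, not the modified algorithm the abstract describes, and the report contents (`drug_A`, `drug_B`, `event_X`, `event_Y`) are placeholder names in invented toy data rather than FAERS records.

```python
from itertools import combinations


def apriori(transactions, min_support: int) -> dict:
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical adverse-event reports: each set lists items reported together.
reports = [
    {"drug_A", "drug_B", "event_X"},
    {"drug_A", "drug_B", "event_X"},
    {"drug_A", "event_X"},
    {"drug_B", "event_Y"},
    {"drug_A", "drug_B"},
]
frequent_itemsets = apriori(reports, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

An itemset such as two drugs plus one event that stays frequent is only a candidate association; as the abstract notes, validation against a reference interaction database is a separate step.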
Abstract:
Data mining means summarizing information from large amounts of raw data. It is one of the key technologies in many areas of the economy, science, administration and the internet. In this report we introduce an approach for using evolutionary algorithms to breed fuzzy classifier systems. This approach was applied as part of a structured procedure by the students Achler, Göb and Voigtmann as a contribution to the 2006 Data-Mining-Cup contest, yielding encouragingly positive results.
Abstract:
We present a new algorithm called TITANIC for computing concept lattices. It is based on data mining techniques for computing frequent itemsets. The algorithm is experimentally evaluated and compared with B. Ganter's Next-Closure algorithm.
Abstract:
In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin.
Abstract:
Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
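The link between frequent itemsets and iceberg concept lattices can be illustrated with a brute-force sketch. This is not the TITANIC algorithm, which avoids enumerating all itemsets; it only shows that the intents of an iceberg lattice are the closed frequent itemsets, where the closure of an itemset is the intersection of all transactions containing it. The item names and the support threshold are made up.

```python
from itertools import combinations


def closure(itemset: frozenset, transactions) -> frozenset:
    """Largest itemset shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering)  # caller guarantees covering is non-empty


def iceberg_intents(transactions, min_support: int) -> dict:
    """Map each closed frequent itemset (concept intent) to its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    intents = {}
    # Brute force over all subsets: fine for a toy example, exponential in general.
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support and support > 0:
                # An itemset and its closure always have the same support.
                intents[closure(candidate, transactions)] = support
    return intents


# Hypothetical transaction database.
database = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
lattice_intents = iceberg_intents(database, min_support=2)
for intent, support in sorted(lattice_intents.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(intent), support)
```

In this toy database `{c}` is frequent but not closed, since every transaction containing `c` also contains `a`; only `{a, c}` appears as an intent. Collapsing such itemsets is what makes the iceberg lattice a condensed representation.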
Abstract:
Although the aim of reducing occupational accidents is frequently cited to justify preventive drug and alcohol testing at work, there is little statistically significant evidence of the assumed causality and negative correlation between exposure to testing and subsequent accidents. Data on the tests and accidents involving employees of a nationwide Portuguese railway transportation company in recent years are now beginning to be mined in search of relationships between these and other biographical variables.