3 results for Pancreatitis -- pathology


Relevance: 20.00%

Abstract:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports by their general layout as either ‘semi-structured’ or ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed to predict which chunks of the report text contain information on the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained on 635 and tested on 163 manually classified or annotated reports from the Northern Ireland Cancer Registry.
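
The layout classification step can be illustrated with a minimal sketch in plain Python. It is not the RapidMiner process used in the study: it only mimics the configuration named in the results (k-nearest-neighbour voting over binary term occurrence vectors, with a stopword filter and pruning). The stopword list, the pruning threshold `min_df`, the value of `k`, the Jaccard distance and the toy reports are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; the study's actual filter is not specified.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "was", "with"}

# Toy training reports invented for this sketch (not registry data).
TRAIN_REPORTS = [
    ("Specimen: left breast. Diagnosis: invasive ductal carcinoma. "
     "Tumour size: 22 mm. Nodes: 2 positive.", "semi-structured"),
    ("Specimen: right breast. Diagnosis: lobular carcinoma. "
     "Tumour size: 15 mm. Nodes: 0 positive.", "semi-structured"),
    ("Specimen: left breast. Diagnosis: ductal carcinoma in situ. "
     "Tumour size: 8 mm. Nodes: 1 positive.", "semi-structured"),
    ("Sections show an invasive ductal carcinoma measuring approximately "
     "22 mm which extends close to the margin.", "unstructured"),
    ("Sections show breast tissue containing a lobular carcinoma which "
     "measures approximately 15 mm.", "unstructured"),
    ("Sections show a tumour which extends into fat and measures "
     "approximately 30 mm.", "unstructured"),
]


def tokenize(text):
    """Lowercase, strip surrounding punctuation, drop stopwords."""
    tokens = (t.strip(".,:;()") for t in text.lower().split())
    return {t for t in tokens if t and t not in STOPWORDS}


def build_vocabulary(texts, min_df=2):
    """Pruning: keep only terms found in at least min_df documents."""
    doc_freq = Counter(term for text in texts for term in tokenize(text))
    return {term for term, n in doc_freq.items() if n >= min_df}


def vectorize(text, vocab):
    """Binary term occurrence: the set of vocabulary terms present."""
    return tokenize(text) & vocab


def jaccard_distance(a, b):
    """Distance between two binary vectors stored as sets (assumed metric)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(text, train, vocab, k=3):
    """Majority vote among the k training reports closest to the query."""
    query = vectorize(text, vocab)
    nearest = sorted(train, key=lambda item: jaccard_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


vocab = build_vocabulary([text for text, _ in TRAIN_REPORTS])
train = [(vectorize(text, vocab), label) for text, label in TRAIN_REPORTS]

print(knn_predict("Specimen: right breast. Diagnosis: invasive carcinoma. "
                  "Tumour size: 12 mm. Nodes: 3 positive.", train, vocab))
print(knn_predict("Sections show a carcinoma which measures approximately "
                  "18 mm and extends into fat.", train, vocab))
```

Storing each binary vector as a set of terms keeps the sketch short. A real pipeline would more likely use sparse matrices and a tuned distance measure.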

Results: The layout classifier performed best with the k-nearest neighbour algorithm, using the binary term occurrence word vector type with stopword filter and pruning: it reached 99.4% accuracy, with only one semi-structured report predicted as unstructured. For chunk recognition, the best results were obtained with the PAUM algorithm, using the same parameters for all cases except the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83; for unstructured reports, precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Results were poor when the classifier was trained on semi-structured reports but tested on unstructured ones.
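
For readers less familiar with the reported metrics, the short sketch below shows how accuracy, precision and recall are computed. The accuracy check assumes that the single misclassified report was the only error among the 163 test reports, which is consistent with the reported 99.4%. The chunk counts passed to `precision` and `recall` are hypothetical and are not taken from the study.

```python
def accuracy(correct, total):
    """Share of test items that were classified correctly."""
    return correct / total


def precision(true_pos, false_pos):
    """Share of predicted chunks that were truly relevant."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos, false_neg):
    """Share of truly relevant chunks that were predicted."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


# Layout classifier: 163 test reports, one assumed error in total.
print(f"{accuracy(162, 163):.1%}")  # 99.4%

# Hypothetical chunk counts, for illustration only.
print(round(precision(true_pos=8, false_pos=2), 2))  # 0.8
print(round(recall(true_pos=8, false_neg=4), 2))     # 0.67
```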

Conclusions: These results show that it is possible and beneficial to predict the layout of reports, and that the accuracy of predicting which segments of a report contain a given type of information is sensitive to the report layout and the type of information sought.

Relevance: 20.00%

Abstract:

Repositories containing high-quality human biospecimens linked with robust and relevant clinical and pathological information are required for the discovery and validation of biomarkers for disease diagnosis, progression and response to treatment. Current molecular-based discovery projects, whether using low- or high-throughput technologies, rely heavily on ready access to such sample collections. It is imperative that modern biobanks align with molecular diagnostic pathology practices, not only to provide the type of samples needed for discovery projects but also to ensure that the requirements of ongoing sample collections and the future needs of researchers are adequately addressed. Biobanks within comprehensive molecular pathology programmes are perfectly positioned to offer more than just tumour-derived biospecimens. For example, they can help researchers gain access to sample metadata such as digitised scans of tissue samples annotated prior to macrodissection for molecular diagnostics, pseudoanonymised clinical outcome data, or research results retrieved from other users working with the same or overlapping cohorts of samples. Furthermore, biobanks can work with molecular diagnostic laboratories to develop standardised methodologies for the acquisition and storage of samples required for new approaches to research such as ‘liquid biopsies’. These methodologies will ultimately feed into the test validations required in large prospective clinical studies before liquid biopsy approaches can be implemented in routine clinical practice. We draw on our experience in Northern Ireland to discuss how this harmonised approach, in which biobanks work synergistically with molecular pathology programmes, is key to the future success of precision medicine.