3 resultados para data analysis: algorithms and implementation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There has been an increasing interest in the development of new methods using Pareto optimality to deal with multi-objective criteria (for example, accuracy and time complexity). Once one has developed an approach to a problem of interest, the problem is then how to compare it with the state of art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Standard tests used for this purpose are able to consider jointly neither performance measures nor multiple competitors at once. The aim of this paper is to resolve these issues by developing statistical procedures that are able to account for multiple competing measures at the same time and to compare multiple algorithms altogether. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, as usually the number of studied cases is very reduced in such comparisons. Data from a comparison among general purpose classifiers is used to show a practical application of our tests.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objectives This paper describes the methods used in the International Cancer Benchmarking Partnership Module 4 Survey (ICBPM4) which examines time intervals and routes to cancer diagnosis in 10 jurisdictions. We present the study design with defining and measuring time intervals, identifying patients with cancer, questionnaire development, data management and analyses.
Design and setting Recruitment of participants to the ICBPM4 survey is based on cancer registries in each jurisdiction. Questionnaires draw on previous instruments and have been through a process of cognitive testing and piloting in three jurisdictions followed by standardised translation and adaptation. Data analysis focuses on comparing differences in time intervals and routes to diagnosis in the jurisdictions.
Participants Our target is 200 patients with symptomatic breast, lung, colorectal and ovarian cancer in each jurisdiction. Patients are approached directly or via their primary care physician (PCP). Patients’ PCPs and cancer treatment specialists (CTSs) are surveyed, anddata rules’ are applied to combine and reconcile conflicting information. Where CTS information is unavailable, audit information is sought from treatment records and databases.
Main outcomes Reliability testing of the patient questionnaire showed that agreement was complete (κ=1) in four items and substantial (κ=0.8, 95% CI 0.333 to 1) in one item. The identification of eligible patients is sufficient to meet the targets for breast, lung and colorectal cancer. Initial patient and PCP survey response rates from the UK and Sweden are comparable with similar published surveys. Data collection was completed in early 2016 for all cancer types.
Conclusion An international questionnaire-based survey of patients with cancer, PCPs and CTSs has been developed and launched in 10 jurisdictions. ICBPM4 will help to further understand international differences in cancer survival by comparing time intervals and routes to cancer diagnosis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to calibration of geochemical data and multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw-format’, requiring data processing in order to remove instrument bias, as well as informed sequence interpretation. The applicability of these conventional calibration equations to core-scanning XRF data are further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences focusing mainly on energy dispersive-XRF (ED-XRF) core-scanning. This study has applied high resolution core-scanning XRF to Holocene sedimentary sequences from the tidal-dominated Indian Sundarbans, (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.