19 results for statistical learning mechanisms


Relevance:

30.00%

Publisher:

Abstract:

Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that, by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
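The trade-off quoted in this abstract (a missed detection rate read off at a fixed false positive rate) can be illustrated with a minimal sketch. It is not the authors' code: the function name `missed_detection_rate`, its arguments and the convention that a higher score means "more likely real" are assumptions made for illustration only.

```python
def missed_detection_rate(scores, labels, target_fpr=0.01):
    """Return (threshold, fpr, mdr) at the loosest cut with FPR <= target_fpr.

    scores: classifier outputs; higher is assumed to mean "more likely real".
    labels: 1 for a real transient, 0 for a bogus detection.
    A candidate is accepted as real when its score is strictly above threshold.
    """
    bogus = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not bogus or not real:
        raise ValueError("need at least one real and one bogus example")

    # Number of bogus detections we are willing to let through.
    allowed = int(target_fpr * len(bogus))
    if allowed >= len(bogus):
        threshold = float("-inf")  # every candidate is accepted
    else:
        # Cutting at the (allowed + 1)-th highest bogus score admits
        # at most `allowed` bogus detections, even when scores tie.
        threshold = bogus[allowed]

    fpr = sum(s > threshold for s in bogus) / len(bogus)
    mdr = sum(s <= threshold for s in real) / len(real)
    return threshold, fpr, mdr


if __name__ == "__main__":
    # Tiny hand-made example; the scores are illustrative, not survey data.
    scores = [0.9, 0.4, 0.2, 0.1, 0.95, 0.8, 0.5, 0.3]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(missed_detection_rate(scores, labels, target_fpr=0.25))
```

In practice the scores would come from a classifier evaluated on the held-out subset, and the threshold chosen this way would then be fixed before looking at new data.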

Relevance:

30.00%

Publisher:

Abstract:

Recently there has been an increasing interest in the development of new methods that use Pareto optimality to deal with multi-objective criteria (for example, accuracy and architectural complexity). Once a model has been learned with such a method, the problem is how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose are not able to consider several performance measures jointly. The aim of this paper is to resolve this issue by developing statistical procedures that are able to account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, as the number of studied cases is usually very small in such comparisons. Real data from a comparison among general-purpose classifiers are used to show a practical application of our tests.
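As a rough illustration of the multinomial-Dirichlet idea mentioned in this abstract, the sketch below applies a generic conjugate update to outcome counts and estimates a posterior probability by Monte Carlo. It is not the paper's procedure: the four outcome categories, the symmetric prior and all function names are assumptions made for illustration.

```python
import random


def dirichlet_posterior(counts, prior=1.0):
    """Conjugate update: Dirichlet(prior) prior plus multinomial counts."""
    return [prior + c for c in counts]


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def prob_category_is_largest(counts, index, prior=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(theta[index] is the largest | counts)."""
    rng = random.Random(seed)
    alpha = dirichlet_posterior(counts, prior)
    hits = 0
    for _ in range(n_samples):
        theta = sample_dirichlet(alpha, rng)
        if theta[index] == max(theta):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical counts over data sets, comparing classifiers A and B on two
    # measures: [A better on both, A better on measure 1 only,
    #            A better on measure 2 only, B better on both].
    counts = [12, 3, 2, 3]
    print(prob_category_is_largest(counts, index=0))
```

With few data sets the prior matters, which is one reason a reduction in the number of parameters, as the abstract describes, would be attractive.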

Relevance:

30.00%

Publisher:

Abstract:

We have used whole-exome sequencing to compare a group of presentation t(4;14) cases with t(11;14) cases of myeloma to define the mutational landscape. Each case was characterized by a median of 24.5 exonic nonsynonymous single-nucleotide variations, and there was a consistently higher number of mutations in the t(4;14) group, but this difference did not reach statistical significance. We show that the transition and transversion rates in the 2 subgroups are similar, suggesting that no specific mutational mechanism differentiates the 2 groups. Only 3% of mutations were seen in both groups, and recurrently mutated genes include NRAS, KRAS, BRAF, and DIS3 as well as DNAH5, a member of the axonemal dynein family. The pattern of mutation in each group was distinct, with the t(4;14) group being characterized by deregulation of chromatin organization, actin filament, and microfilament movement. Recurrent RAS pathway mutations identified subclonal heterogeneity at a mutational level in both groups, with mutations being present as either dominant or minor subclones. The presence of subclonal diversity was confirmed at a single-cell level using other tumor-acquired mutations. These results are consistent with a distinct molecular pathogenesis underlying each subgroup and have important implications for targeted treatment strategies. The Medical Research Council Myeloma IX trial is registered under ISRCTN68454111.
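For readers unfamiliar with the transition/transversion comparison mentioned above, a minimal sketch follows. A transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); any other single-base change is a transversion. The function names and the example variants are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Label a single-nucleotide variant as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid single-nucleotide change: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    kinds = [classify_snv(ref, alt) for ref, alt in snvs]
    ti = kinds.count("transition")
    tv = kinds.count("transversion")
    if tv == 0:
        raise ValueError("no transversions: ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    # Illustrative variants only.
    example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
    print(ti_tv_ratio(example))
```

Computing this ratio separately for each subgroup is one simple way to compare their mutation spectra.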

Relevance:

30.00%

Publisher:

Abstract:

This study examines whether virtual reality (VR) is superior to paper-based instructions in increasing the speed at which individuals learn a new assembly task. Specifically, the work seeks to quantify any learning benefits when individuals have been given the opportunity to pre-learn the task, and compares the performance of two groups using virtual and hard copy media types. A build experiment based on multiple builds of an aircraft panel showed that a group of people who pre-learned the assembly task using a VR environment completed their builds faster (average build time 29.5% lower). The VR group also made fewer references to instructional materials (average number of references 38% lower) and made fewer errors than a group using more traditional, hard copy instructions. These outcomes were more pronounced during build one with differences in build time and number of references showing limited statistical differences.