Biblioteca Digital

998 resultados para SCORE TESTS

Hidden interlocutor misidentification in practical Turing Tests

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Based on insufficient evidence, and inadequate research, Floridi and his students report inaccuracies and draw false conclusions in their Minds and Machines evaluation, which this paper aims to clarify. Acting as invited judges, Floridi et al. participated in nine, of the ninety-six, Turing tests staged in the finals of the 18th Loebner Prize for Artificial Intelligence in October 2008. From the transcripts it appears that they used power over solidarity as an interrogation technique. As a result, they were fooled on several occasions into believing that a machine was a human and that a human was a machine. Worse still, they did not realise their mistake. This resulted in a combined correct identification rate of less than 56%. In their paper they assumed that they had made correct identifications when they in fact had been incorrect.

Cognitive tests used in chronic adult human randomised controlled trial micronutrient and phytochemical intervention studies

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years there has been a rapid growth of interest in exploring the relationship between nutritional therapies and the maintenance of cognitive function in adulthood. Emerging evidence reveals an increasingly complex picture with respect to the benefits of various food constituents on learning, memory and psychomotor function in adults. However, to date, there has been little consensus in human studies on the range of cognitive domains to be tested or the particular tests to be employed. To illustrate the potential difficulties that this poses, we conducted a systematic review of existing human adult randomised controlled trial (RCT) studies that have investigated the effects of 24 d to 36 months of supplementation with flavonoids and micronutrients on cognitive performance. There were thirty-nine studies employing a total of 121 different cognitive tasks that met the criteria for inclusion. Results showed that less than half of these studies reported positive effects of treatment, with some important cognitive domains either under-represented or not explored at all. Although there was some evidence of sensitivity to nutritional supplementation in a number of domains (for example, executive function, spatial working memory), interpretation is currently difficult given the prevailing 'scattergun approach' for selecting cognitive tests. Specifically, the practice means that it is often difficult to distinguish between a boundary condition for a particular nutrient and a lack of task sensitivity. We argue that for significant future progress to be made, researchers need to pay much closer attention to existing human RCT and animal data, as well as to more basic issues surrounding task sensitivity, statistical power and type I error.

Tests on the accuracy of Basel II

Relevância:

20.00% 20.00%

Publicador:

The unwritten score of corporate governance. Some ontological foundations for a study of the tacit rules of boards of directors

Relevância:

20.00% 20.00%

Publicador:

Reduction of invasive tests in chagasic patients with a modified self-organizing map

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The applicability of AI methods to the Chagas' disease diagnosis is carried out by the use of Kohonen's self-organizing feature maps. Electrodiagnosis indicators calculated from ECG records are used as features in input vectors to train the network. Cross-validation results are used to modify the maps, providing an outstanding improvement to the interpretation of the resulting output. As a result, the map might be used to reduce the need for invasive explorations in chronic Chagas' disease.

Deception-detection and machine intelligence in practical Turing tests

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Deception-detection is the crux of Turing’s experiment to examine machine thinking conveyed through a capacity to respond with sustained and satisfactory answers to unrestricted questions put by a human interrogator. However, in 60 years to the month since the publication of Computing Machinery and Intelligence little agreement exists for a canonical format for Turing’s textual game of imitation, deception and machine intelligence. This research raises from the trapped mine of philosophical claims, counter-claims and rebuttals Turing’s own distinct five minutes question-answer imitation game, which he envisioned practicalised in two different ways: a) A two-participant, interrogator-witness viva voce, b) A three-participant, comparison of a machine with a human both questioned simultaneously by a human interrogator. Using Loebner’s 18th Prize for Artificial Intelligence contest, and Colby et al.’s 1972 transcript analysis paradigm, this research practicalised Turing’s imitation game with over 400 human participants and 13 machines across three original experiments. Results show that, at the current state of technology, a deception rate of 8.33% was achieved by machines in 60 human-machine simultaneous comparison tests. Results also show more than 1 in 3 Reviewers succumbed to hidden interlocutor misidentification after reading transcripts from experiment 2. Deception-detection is essential to uncover the increasing number of malfeasant programmes, such as CyberLover, developed to steal identity and financially defraud users in chatrooms across the Internet. Practicalising Turing’s two tests can assist in understanding natural dialogue and mitigate the risk from cybercrime.

A method for the economic valuation of animal welfare benefits using a single welfare score

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Unless the benefits to society of measures to protect and improve the welfare of animals are made transparent by means of their valuation they are likely to go unrecognised and cannot easily be weighed against the costs of such measures as required, for example, by policy-makers. A simple single measure scoring system, based on the Welfare Quality® index, is used, together with a choice experiment economic valuation method, to estimate the value that people place on improvements to the welfare of different farm animal species measured on a continuous (0-100) scale. Results from using the method on a survey sample of some 300 people show that it is able to elicit apparently credible values. The survey found that 96% of respondents thought that we have a moral obligation to safeguard the welfare of animals and that over 72% were concerned about the way farm animals are treated. Estimated mean annual willingness to pay for meat from animals with improved welfare of just one point on the scale was £5.24 for beef cattle, £4.57 for pigs and £5.10 for meat chickens. Further development of the method is required to capture the total economic value of animal welfare benefits. Despite this, the method is considered a practical means for obtaining economic values that can be used in the cost-benefit appraisal of policy measures intended to improve the welfare of animals.

A reappraisal of blood clotting response tests for anticoagulant resistance and a proposal for a standardised BCR test methodology

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Blood clotting response (BCR) resistance tests are available for a number of anticoagulant rodenticides. However, during the development of these tests many of the test parameters have been changed, making meaningful comparisons between results difficult. It was recognised that a standard methodology was urgently required for future BCR resistance tests and, accordingly, this document presents a reappraisal of published tests, and proposes a standard protocol for future use (see Appendix). The protocol can be used to provide information on the incidence and degree of resistance in a particular rodent population; to provide a simple comparison of resistance factors between active ingredients, thus giving clear information about cross-resistance for any given strain; and to provide comparisons of susceptibility or resistance between different populations. The methodology has a sound statistical basis in being based on the ED50 response, and requires many fewer animals than the resistance tests in current use. Most importantly, tests can be used to give a clear indication of the likely practical impact of the resistance on field efficacy. The present study was commissioned and funded by the Rodenticide Resistance Action Committee (RRAC) of CropLife International.

Estimating reliability and resolution of probability forecasts through decomposition of the empirical score

Relevância:

20.00% 20.00%

Publicador:

Resumo:

References (20)Cited By (1)Export CitationAboutAbstract Proper scoring rules provide a useful means to evaluate probabilistic forecasts. Independent from scoring rules, it has been argued that reliability and resolution are desirable forecast attributes. The mathematical expectation value of the score allows for a decomposition into reliability and resolution related terms, demonstrating a relationship between scoring rules and reliability/resolution. A similar decomposition holds for the empirical (i.e. sample average) score over an archive of forecast–observation pairs. This empirical decomposition though provides a too optimistic estimate of the potential score (i.e. the optimum score which could be obtained through recalibration), showing that a forecast assessment based solely on the empirical resolution and reliability terms will be misleading. The differences between the theoretical and empirical decomposition are investigated, and specific recommendations are given how to obtain better estimators of reliability and resolution in the case of the Brier and Ignorance scoring rule.

Evaluating raw ensembles with the continuous ranked probability score

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The continuous ranked probability score (CRPS) is a frequently used scoring rule. In contrast with many other scoring rules, the CRPS evaluates cumulative distribution functions. An ensemble of forecasts can easily be converted into a piecewise constant cumulative distribution function with steps at the ensemble members. This renders the CRPS a convenient scoring rule for the evaluation of ‘raw’ ensembles, obviating the need for sophisticated ensemble model output statistics or dressing methods prior to evaluation. In this article, a relation between the CRPS score and the quantile score is established. The evaluation of ‘raw’ ensembles using the CRPS is discussed in this light. It is shown that latent in this evaluation is an interpretation of the ensemble as quantiles but with non-uniform levels. This needs to be taken into account if the ensemble is evaluated further, for example with rank histograms.

Reviewing the sustainability/stationarity of current account imbalances with tests for bounded integration

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We investigate for 26 OECD economies whether their current account imbalances to GDP are driven by stochastic trends. Regarding bounded stationarity as the more natural counterpart of sustainability, results from Phillips–Perron tests for unit root and bounded unit root processes are contrasted. While the former hint at stationarity of current account imbalances for 12 economies, the latter indicate bounded stationarity for only six economies. Through panel-based test statistics, current account imbalances are diagnosed as bounded non-stationary. Thus, (spurious) rejections of the unit root hypothesis might be due to the existence of bounds reflecting hidden policy controls or financial crises.

Some implications of a sample of practical Turing tests

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A series of imitation games involving 3-participant (simultaneous comparison of two hidden entities) and 2-participant (direct interrogation of a hidden entity) were conducted at Bletchley Park on the 100th anniversary of Alan Turing’s birth: 23 June 2012. From the ongoing analysis of over 150 games involving (expert and non-expert, males and females, adults and child) judges, machines and hidden humans (foils for the machines), we present six particular conversations that took place between human judges and a hidden entity that produced unexpected results. From this sample we focus on features of Turing’s machine intelligence test that the mathematician/code breaker did not consider in his examination for machine thinking: the subjective nature of attributing intelligence to another mind.

Forecast encompassing tests and probability forecasts

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider tests of forecast encompassing for probability forecasts, for both quadratic and logarithmic scoring rules. We propose test statistics for the null of forecast encompassing, present the limiting distributions of the test statistics, and investigate the impact of estimating the forecasting models' parameters on these distributions. The small-sample performance is investigated, in terms of small numbers of forecasts and model estimation sample sizes. We show the usefulness of the tests for the evaluation of recession probability forecasts from logit models with different leading indicators as explanatory variables, and for evaluating survey-based probability forecasts.

'Score it upon my taille': the use (and abuse) of tallies by the medieval Exchequer

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper traces the evolution of the tally from a receipt for cash payments into the treasury, to proof of payments made by royal officials outside of the treasury and finally to an assignment of revenue to be paid out by royal officials. Each of these processes is illustrated by examples drawn from the Exchequer records and explains their significance for royal finance and for historians working on the Exchequer records.

On the transfer of prior tests or study events to subsequent study

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tests, as learning events, are often more effective than are additional study opportunities, especially when recall is tested after a long retention interval. To what degree, though, do prior test or study events support subsequent study activities? We set out to test an implication of Bjork and Bjork’s (1992) new theory of disuse—that, under some circumstances, prior study may facilitate subsequent study more than does prior testing. Participants learned English–Swahili translations and then underwent a practice phase during which some items were tested (without feedback) and other items were restudied. Although tested items were better recalled after a 1-week delay than were restudied items, this benefit did not persist after participants had the opportunity to study the items again via feedback. In fact, after this additional study opportunity, items that had been restudied earlier were better recalled than were items that had been tested earlier. These results suggest that measuring the memorial consequences of testing requires more than a single test of retention and, theoretically, a consideration of the differing status of initially recallable and nonrecallable items.

«
1
2
...
59
60
61
62
63
64
65
66
67
»