2 results for Human-computer interaction -- Evaluation
in Glasgow Theses Service
Abstract:
Technologies such as automobiles or mobile phones allow us to perform beyond our physical capabilities and travel faster or communicate over long distances. Technologies such as computers and calculators can also help us perform beyond our mental capabilities by storing and manipulating information that we would be unable to process or remember. In recent years there has been growing interest in assistive technology for cognition (ATC), which can help people compensate for cognitive impairments. The aim of this thesis was to investigate ATC for memory, to help people with memory difficulties that impact independent functioning in everyday life. Chapter one argues that drawing on both neuropsychological and human-computer interaction theory and approaches is crucial when developing and researching ATC. Chapter two describes a systematic review and meta-analysis of studies that tested technology to aid memory in groups with acquired brain injury (ABI), stroke or degenerative disease. Good evidence was found supporting the efficacy of prompting devices that remind the user about a future intention at a set time. Chapter three examines the prevalence of technologies and memory aids in current use by people with ABI and dementia, and the factors that predicted this use. Pre-morbid use of technology, current use of non-tech aids and strategies, and age (ABI group only) were the best predictors of this use. Based on these results, chapter four focuses on mobile phone-based reminders for people with ABI. Focus groups were held with people with memory impairments after ABI and with ABI caregivers (N=12) to discuss the barriers to uptake of mobile phone-based reminding. Thematic analysis revealed six key themes that impact uptake of reminder apps: Perceived Need, Social Acceptability, Experience/Expectation, Desired Content and Functions, Cognitive Accessibility and Sensory/Motor Accessibility. The Perceived Need theme described the difficulties with insight, motivation and memory that can prevent people from initially setting reminders on a smartphone. Chapter five investigates the efficacy and acceptability of unsolicited prompts (UPs) from a smartphone app (ForgetMeNot) designed to encourage people with ABI to set reminders. A single-case experimental design study evaluated use of the app over four weeks by three people with severe ABI living in a post-acute rehabilitation hospital. When six UPs were presented through the day by ForgetMeNot, daily reminder-setting and daily memory task completion increased compared with using the app without the UPs. Chapter six investigates another barrier identified in chapter four: cognitive and sensory accessibility. A study is reported showing that an app with a ‘decision tree’ interface (ApplTree) leads to more accurate reminder setting, with no compromise of speed or independence (amount of guidance required), for people with ABI (n=14) compared with a calendar-based interface. Chapter seven investigates the efficacy of a wearable device (a smartwatch) as a tool for delivering reminders set on a smartphone. Four community-dwelling participants with memory difficulties following ABI took part in an ABA single-case experimental design study. Three of the participants used the smartwatch successfully throughout the intervention weeks and gave positive usability ratings. Two participants showed improved memory performance when using the smartwatch, and all participants showed a marked decline in memory performance when the technology was removed.
Chapter eight is a general discussion highlighting the implications of these results for clinicians, researchers and designers.
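As a concrete illustration of the unsolicited-prompting idea evaluated in chapter five, the sketch below registers a fixed set of daily prompts that invite the user to set a reminder. The prompt times, wording and the `notifier` interface are assumptions made for illustration; the abstract states only that six UPs were delivered through the day, not how ForgetMeNot implements them.

```python
from datetime import time

# Illustrative only: six fixed daily prompt times spread across waking hours.
# The actual ForgetMeNot schedule and wording are not given in the abstract.
UNSOLICITED_PROMPT_TIMES = [
    time(9, 0), time(11, 0), time(13, 0),
    time(15, 0), time(17, 0), time(19, 0),
]

PROMPT_TEXT = "Is there anything you need to remember to do today?"


def schedule_unsolicited_prompts(notifier):
    """Register one unsolicited prompt per time slot with a platform notifier.

    `notifier` is a hypothetical object exposing schedule(at, message); on a
    real device this role would be played by the OS alarm/notification API.
    """
    for t in UNSOLICITED_PROMPT_TIMES:
        notifier.schedule(at=t, message=PROMPT_TEXT)
```

The point of such prompts, as the study describes, is to work around the Perceived Need barrier: the user does not have to remember to open the app, because the app asks first.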
Abstract:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. We consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. We show that historical user interaction data can improve the accuracy or efficiency of each of these steps and, as a result, the overall efficiency of the entire evaluation pipeline.

Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represent the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained on user interactions recorded in the search engine’s query logs. In our experimental study, we observe that the proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes that pass the offline evaluation step are later rejected at the online evaluation step, which increases the efficiency of the entire pipeline.

Secondly, we formulate the problem of optimised scheduling of online experiments. We tackle this problem with a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of each experiment. This predictor is trained on a set of past online experiments and uses a diverse set of features to represent an experiment. Our study demonstrates that deploying such a scheduler at the second step of the evaluation pipeline yields a higher number of successful experiments per unit of time, again increasing the efficiency of the pipeline.

Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft treats both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, for example in domains with a grid-based representation such as image search. Our study, using datasets of interleaving experiments performed in both the document and image search domains, demonstrates that Generalised Team Draft achieves the highest sensitivity among the compared methods. A higher sensitivity means that interleaving experiments can be deployed for a shorter period of time or on a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters with respect to historical interaction data recorded in interleaving experiments.
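To make the effort-based view of query auto-completion evaluation concrete, here is a minimal sketch of such an offline metric. The `suggest` callable, the cost weights and the stopping behaviour are illustrative assumptions; the thesis fits its metric parameters to logged interactions rather than fixing them by hand.

```python
def qac_effort(target_query, suggest, keystroke_cost=1.0, examine_cost=0.5):
    """Effort-style offline score for a query auto-completion mechanism.

    `suggest(prefix)` is assumed to return a ranked list of completions.
    The score sums the cost of each character typed plus the cost of
    scanning the suggestion list down to the target query, stopping as
    soon as the target can be selected.  Lower scores mean less effort.
    """
    effort = 0.0
    for i in range(1, len(target_query) + 1):
        prefix = target_query[:i]
        effort += keystroke_cost                  # one more character typed
        suggestions = suggest(prefix)
        if target_query in suggestions:
            rank = suggestions.index(target_query) + 1
            return effort + examine_cost * rank   # scan the list, then select
    return effort                                 # query had to be typed in full
```

For example, if a suggester surfaces the target query at rank 2 after three typed characters, the score is 3 keystroke costs plus 2 examination costs; a metric of this shape is what the abstract describes being correlated against an online satisfaction indicator.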
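The greedy scheduling step can likewise be sketched in a few lines. The function and predictor names are placeholders; the thesis trains the success predictor on features of past online experiments.

```python
def greedy_schedule(queued_experiments, predict_success_probability):
    """Greedy prioritisation of an online-experiment queue.

    `queued_experiments` holds feature representations of candidate
    experiments; `predict_success_probability` is a model trained on
    past experiments that returns an estimated probability of success.
    """
    # Run the experiments predicted most likely to succeed first.
    return sorted(queued_experiments,
                  key=predict_success_probability,
                  reverse=True)
```

Under a fixed experimentation budget, running the most promising candidates first raises the expected number of successful experiments per unit of time, which is the gain the abstract reports.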
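For readers unfamiliar with interleaving, the sketch below implements classic Team Draft interleaving, the method that Generalised Team Draft extends; the data-driven optimisation of the interleaving policy and click weights described in the abstract is not shown.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Classic Team Draft interleaving of two rankings.

    Returns the combined ranking and a document-to-team map used to
    credit clicks to the ranker that contributed each result.
    """
    interleaved, team_of = [], {}
    ia = ib = 0
    while ia < len(ranking_a) or ib < len(ranking_b):
        # A coin flip decides which ranker picks first in this round.
        order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        for team in order:
            ranking, idx = (ranking_a, ia) if team == "A" else (ranking_b, ib)
            # Skip documents already placed by either team.
            while idx < len(ranking) and ranking[idx] in team_of:
                idx += 1
            if idx < len(ranking):
                doc = ranking[idx]
                interleaved.append(doc)
                team_of[doc] = team
                idx += 1
            if team == "A":
                ia = idx
            else:
                ib = idx
    return interleaved, team_of
```

At evaluation time, each click is credited to the team that contributed the clicked document, and the ranker with more credit across sessions is inferred to be better. Generalised Team Draft replaces the uniform coin flip and unit click credit with parameters learned from historical interleaving data so as to maximise sensitivity.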
Finally, we propose applying sequential testing methods to reduce the mean deployment time of interleaving experiments. We adapt two sequential tests to interleaving experimentation and demonstrate that they yield a significant decrease in experiment duration. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. A further experimental study demonstrates that cumulative gains in online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, with the sequential testing approaches.

Overall, the central contributions of this thesis are approaches that improve the accuracy or efficiency of the steps of the evaluation pipeline: an offline evaluation framework for query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for efficient online interleaving evaluation, and a sequential testing approach for online search evaluation. The experiments in this thesis are based on large-scale real-life datasets obtained from Yandex, a leading commercial search engine, and demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
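As a generic illustration of sequential stopping in interleaving experiments (not the specific tests adapted in the thesis, which the abstract does not name), the sketch below applies Wald's sequential probability ratio test to per-impression interleaving wins; the alternative win rate `p1` and error levels are placeholder assumptions.

```python
import math


def sprt(outcomes, p1=0.55, alpha=0.05, beta=0.05):
    """Sequential probability ratio test on binary interleaving outcomes.

    `outcomes` yields 1 when ranker B is credited the win for an impression
    and 0 when ranker A is.  H0: win rate = 0.5 (no difference);
    H1: win rate = p1.  Returns a decision and the number of observations
    consumed, so the experiment can stop as soon as a boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr, n = 0.0, 0
    for x in outcomes:
        n += 1
        p_h1 = p1 if x == 1 else 1 - p1
        llr += math.log(p_h1 / 0.5)        # H0 assigns 0.5 to either outcome
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n
```

Stopping as soon as a boundary is crossed is what shortens the mean deployment time; tuning the boundaries on diagnostic (A/A-style) experiments, as the abstract describes, is what controls the error rates in practice.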