2 resultados para Team Evaluation Models
em Glasgow Theses Service
Resumo:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
Resumo:
This study focuses on the learning and teaching of Reading in English as a Foreign Language (REFL), in Libya. The study draws on an action research process in which I sought to look critically at students and teachers of English as a Foreign Language (EFL) in Libya as they learned and taught REFL in four Libyan research sites. The Libyan EFL educational system is influenced by two main factors: the method of teaching the Holy-Quran and the long-time ban on teaching EFL by the former Libyan regime under Muammar Gaddafi. Both of these factors have affected the learning and teaching of REFL and I outline these contextual factors in the first chapter of the thesis. This investigation, and the exploration of the challenges that Libyan university students encounter in their REFL, is supported by attention to reading models. These models helped to provide an analytical framework and starting point for understanding the many processes involved in reading for meaning and in reading to satisfy teacher instructions. The theoretical framework I adopted was based, mainly and initially, on top-down, bottom-up, interactive and compensatory interactive models. I drew on these models with a view to understanding whether and how the processes of reading described in the models could be applied to the reading of EFL students and whether these models could help me to better understand what was going on in REFL. The diagnosis stage of the study provided initial data collected from four Libyan research sites with research tools including video-recorded classroom observations, semi-structured interviews with teachers before and after lesson observation, and think-aloud protocols (TAPs) with 24 students (six from each university) in which I examined their REFL reading behaviours and strategies. This stage indicated that the majority of students shared behaviours such as reading aloud, reading each word in the text, articulating the phonemes and syllables of words, or skipping words if they could not pronounce them. Overall this first stage indicated that alternative methods of teaching REFL were needed in order to encourage ‘reading for meaning’ that might be based on strategies related to eventual interactive reading models adapted for REFL. The second phase of this research project was an Intervention Phase involving two team-teaching sessions in one of the four stage one universities. In each session, I worked with the teacher of one group to introduce an alternative method of REFL. This method was based on teaching different reading strategies to encourage the students to work towards an eventual interactive way of reading for meaning. A focus group discussion and TAPs followed the lessons with six students in order to discuss the 'new' method. Next were two video-recorded classroom observations which were followed by an audio-recorded discussion with the teacher about these methods. Finally, I conducted a Skype interview with the class teacher at the end of the semester to discuss any changes he had made in his teaching or had observed in his students' reading with respect to reading behaviour strategies, and reactions and performance of the students as he continued to use the 'new' method. The results of the intervention stage indicate that the teacher, perhaps not surprisingly, can play an important role in adding to students’ knowledge and confidence and in improving their REFL strategies. For example, after the intervention stage, students began to think about the title, and to use their own background knowledge to comprehend the text. The students employed, also, linguistic strategies such as decoding and, above all, the students abandoned the behaviour of reading for pronunciation in favour of reading for meaning. Despite the apparent efficacy of the alternative method, there are, inevitably, limitations related to the small-scale nature of the study and the time I had available to conduct the research. There are challenges, too, related to the students’ first language, the idiosyncrasies of the English language, the teacher training and continuing professional development of teachers, and the continuing political instability of Libya. The students’ lack of vocabulary and their difficulties with grammatical functions such as phrasal and prepositional verbs, forms which do not exist in Arabic, mean that REFL will always be challenging. Given such constraints, the ‘new’ methods I trialled and propose for adoption can only go so far in addressing students’ difficulties in REFL. Overall, the study indicates that the Libyan educational system is underdeveloped and under resourced with respect to REFL. My data indicates that the teacher participants have received little to no professional developmental that could help them improve their teaching in REFL and skills in teaching EFL. These circumstances, along with the perennial problem of large but varying class sizes; student, teacher and assessment expectations; and limited and often poor quality resources, affect the way EFL students learn to read in English. Against this background, the thesis concludes by offering tentative conclusions; reflections on the study, including a discussion of its limitations, and possible recommendations designed to improve REFL learning and teaching in Libyan universities.