3 resultados para hidden semi markov models
em Glasgow Theses Service
Resumo:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we also test a number of different versions of the SABRE method, compare conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior; the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residue and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method; block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, leading it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experiment investigation.
Resumo:
This study focuses on the learning and teaching of Reading in English as a Foreign Language (REFL), in Libya. The study draws on an action research process in which I sought to look critically at students and teachers of English as a Foreign Language (EFL) in Libya as they learned and taught REFL in four Libyan research sites. The Libyan EFL educational system is influenced by two main factors: the method of teaching the Holy-Quran and the long-time ban on teaching EFL by the former Libyan regime under Muammar Gaddafi. Both of these factors have affected the learning and teaching of REFL and I outline these contextual factors in the first chapter of the thesis. This investigation, and the exploration of the challenges that Libyan university students encounter in their REFL, is supported by attention to reading models. These models helped to provide an analytical framework and starting point for understanding the many processes involved in reading for meaning and in reading to satisfy teacher instructions. The theoretical framework I adopted was based, mainly and initially, on top-down, bottom-up, interactive and compensatory interactive models. I drew on these models with a view to understanding whether and how the processes of reading described in the models could be applied to the reading of EFL students and whether these models could help me to better understand what was going on in REFL. The diagnosis stage of the study provided initial data collected from four Libyan research sites with research tools including video-recorded classroom observations, semi-structured interviews with teachers before and after lesson observation, and think-aloud protocols (TAPs) with 24 students (six from each university) in which I examined their REFL reading behaviours and strategies. This stage indicated that the majority of students shared behaviours such as reading aloud, reading each word in the text, articulating the phonemes and syllables of words, or skipping words if they could not pronounce them. Overall this first stage indicated that alternative methods of teaching REFL were needed in order to encourage ‘reading for meaning’ that might be based on strategies related to eventual interactive reading models adapted for REFL. The second phase of this research project was an Intervention Phase involving two team-teaching sessions in one of the four stage one universities. In each session, I worked with the teacher of one group to introduce an alternative method of REFL. This method was based on teaching different reading strategies to encourage the students to work towards an eventual interactive way of reading for meaning. A focus group discussion and TAPs followed the lessons with six students in order to discuss the 'new' method. Next were two video-recorded classroom observations which were followed by an audio-recorded discussion with the teacher about these methods. Finally, I conducted a Skype interview with the class teacher at the end of the semester to discuss any changes he had made in his teaching or had observed in his students' reading with respect to reading behaviour strategies, and reactions and performance of the students as he continued to use the 'new' method. The results of the intervention stage indicate that the teacher, perhaps not surprisingly, can play an important role in adding to students’ knowledge and confidence and in improving their REFL strategies. For example, after the intervention stage, students began to think about the title, and to use their own background knowledge to comprehend the text. The students employed, also, linguistic strategies such as decoding and, above all, the students abandoned the behaviour of reading for pronunciation in favour of reading for meaning. Despite the apparent efficacy of the alternative method, there are, inevitably, limitations related to the small-scale nature of the study and the time I had available to conduct the research. There are challenges, too, related to the students’ first language, the idiosyncrasies of the English language, the teacher training and continuing professional development of teachers, and the continuing political instability of Libya. The students’ lack of vocabulary and their difficulties with grammatical functions such as phrasal and prepositional verbs, forms which do not exist in Arabic, mean that REFL will always be challenging. Given such constraints, the ‘new’ methods I trialled and propose for adoption can only go so far in addressing students’ difficulties in REFL. Overall, the study indicates that the Libyan educational system is underdeveloped and under resourced with respect to REFL. My data indicates that the teacher participants have received little to no professional developmental that could help them improve their teaching in REFL and skills in teaching EFL. These circumstances, along with the perennial problem of large but varying class sizes; student, teacher and assessment expectations; and limited and often poor quality resources, affect the way EFL students learn to read in English. Against this background, the thesis concludes by offering tentative conclusions; reflections on the study, including a discussion of its limitations, and possible recommendations designed to improve REFL learning and teaching in Libyan universities.
Resumo:
Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine – from crawling and indexing documents, to query processing – across the network of users (or, peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of a P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging result among participating peers. The query is propagated through disseminated nodes to hit the peers that are most likely to contain relevant documents, then the retrieved result lists are merged at different points along the path from the relevant peers to the query initializer (or namely, customer). However, query routing in P2P-IR networks is considered as one of the major challenges and critical part in P2P-IR networks; as the relevant peers might be lost in low-quality peer selection while executing the query routing, and inevitably lead to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into similar semantic clusters where each such semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level as content representations gathered from cooperative peers to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of the centralised corpus to organise the statistics of peers’ terms. The results show a competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of using the conventional Information Retrieval models as peer selection approaches where each peer can be considered as a big document of documents. The experimental evaluation shows comparative and significant results and explains that document retrieval methods are very effective for peer selection that brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results with state-of-the-art resource selection methods and competitive results to corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in the social community networks and manage it for future decision-making. The system monitors users’ behaviours when they click or download documents from the final ranked list as implicit feedback and mines the given information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments to cover various scenarios including noisy feedback information (i.e, providing positive feedback on non-relevant documents) to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics with approximate improvement more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose one technique, reputation-based approaches are clearly the natural choices which also can be deployed on any P2P network.