2 resultados para Semi-structured surveys

em Glasgow Theses Service


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine – from crawling and indexing documents, to query processing – across the network of users (or, peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of a P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging result among participating peers. The query is propagated through disseminated nodes to hit the peers that are most likely to contain relevant documents, then the retrieved result lists are merged at different points along the path from the relevant peers to the query initializer (or namely, customer). However, query routing in P2P-IR networks is considered as one of the major challenges and critical part in P2P-IR networks; as the relevant peers might be lost in low-quality peer selection while executing the query routing, and inevitably lead to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into similar semantic clusters where each such semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level as content representations gathered from cooperative peers to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of the centralised corpus to organise the statistics of peers’ terms. The results show a competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of using the conventional Information Retrieval models as peer selection approaches where each peer can be considered as a big document of documents. The experimental evaluation shows comparative and significant results and explains that document retrieval methods are very effective for peer selection that brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results with state-of-the-art resource selection methods and competitive results to corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in the social community networks and manage it for future decision-making. The system monitors users’ behaviours when they click or download documents from the final ranked list as implicit feedback and mines the given information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments to cover various scenarios including noisy feedback information (i.e, providing positive feedback on non-relevant documents) to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics with approximate improvement more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose one technique, reputation-based approaches are clearly the natural choices which also can be deployed on any P2P network.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Undoubtedly, statistics has become one of the most important subjects in the modern world, where its applications are ubiquitous. The importance of statistics is not limited to statisticians, but also impacts upon non-statisticians who have to use statistics within their own disciplines. Several studies have indicated that most of the academic departments around the world have realized the importance of statistics to non-specialist students. Therefore, the number of students enrolled in statistics courses has vastly increased, coming from a variety of disciplines. Consequently, research within the scope of statistics education has been able to develop throughout the last few years. One important issue is how statistics is best taught to, and learned by, non-specialist students. This issue is controlled by several factors that affect the learning and teaching of statistics to non-specialist students, such as the use of technology, the role of the English language (especially for those whose first language is not English), the effectiveness of statistics teachers and their approach towards teaching statistics courses, students’ motivation to learn statistics and the relevance of statistics courses to the main subjects of non-specialist students. Several studies, focused on aspects of learning and teaching statistics, have been conducted in different countries around the world, particularly in Western countries. Conversely, the situation in Arab countries, especially in Saudi Arabia, is different; here, there is very little research in this scope, and what there is does not meet the needs of those countries towards the development of learning and teaching statistics to non-specialist students. This research was instituted in order to develop the field of statistics education. The purpose of this mixed methods study was to generate new insights into this subject by investigating how statistics courses are currently taught to non-specialist students in Saudi universities. Hence, this study will contribute towards filling the knowledge gap that exists in Saudi Arabia. This study used multiple data collection approaches, including questionnaire surveys from 1053 non-specialist students who had completed at least one statistics course in different colleges of the universities in Saudi Arabia. These surveys were followed up with qualitative data collected via semi-structured interviews with 16 teachers of statistics from colleges within all six universities where statistics is taught to non-specialist students in Saudi Arabia’s Eastern Region. The data from questionnaires included several types, so different techniques were used in analysis. Descriptive statistics were used to identify the demographic characteristics of the participants. The chi-square test was used to determine associations between variables. Based on the main issues that are raised from literature review, the questions (items scales) were grouped and five key groups of questions were obtained which are: 1) Effectiveness of Teachers; 2) English Language; 3) Relevance of Course; 4) Student Engagement; 5) Using Technology. Exploratory data analysis was used to explore these issues in more detail. Furthermore, with the existence of clustering in the data (students within departments within colleges, within universities), multilevel generalized linear models for dichotomous analysis have been used to clarify the effects of clustering at those levels. Factor analysis was conducted confirming the dimension reduction of variables (items scales). The data from teachers’ interviews were analysed on an individual basis. The responses were assigned to one of the eight themes that emerged from within the data: 1) the lack of students’ motivation to learn statistics; 2) students' participation; 3) students’ assessment; 4) the effective use of technology; 5) the level of previous mathematical and statistical skills of non-specialist students; 6) the English language ability of non-specialist students; 7) the need for extra time for teaching and learning statistics; and 8) the role of administrators. All the data from students and teachers indicated that the situation of learning and teaching statistics to non-specialist students in Saudi universities needs to be improved in order to meet the needs of those students. The findings of this study suggested a weakness in the use of statistical software applications in these courses. This study showed that there is lack of application of technology such as statistical software programs in these courses, which would allow non-specialist students to consolidate their knowledge. The results also indicated that English language is considered one of the main challenges in learning and teaching statistics, particularly in institutions where English is not used as the main language. Moreover, the weakness of mathematical skills of students is considered another major challenge. Additionally, the results indicated that there was a need to tailor statistics courses to the needs of non-specialist students based on their main subjects. The findings indicate that statistics teachers need to choose appropriate methods when teaching statistics courses.