959 resultados para distributed learning
Resumo:
Distributed Computing frameworks belong to a class of programming models that allow developers to
launch workloads on large clusters of machines. Due to the dramatic increase in the volume of
data gathered by ubiquitous computing devices, data analytic workloads have become a common
case among distributed computing applications, making Data Science an entire field of
Computer Science. We argue that Data Scientist's concern lays in three main components: a dataset,
a sequence of operations they wish to apply on this dataset, and some constraint they may have
related to their work (performances, QoS, budget, etc). However, it is actually extremely
difficult, without domain expertise, to perform data science. One need to select the right amount
and type of resources, pick up a framework, and configure it. Also, users are often running their
application in shared environments, ruled by schedulers expecting them to specify precisely their resource
needs. Inherent to the distributed and concurrent nature of the cited frameworks, monitoring and
profiling are hard, high dimensional problems that block users from making the right
configuration choices and determining the right amount of resources they need. Paradoxically, the
system is gathering a large amount of monitoring data at runtime, which remains unused.
In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit
monitoring data to learn about workloads, and process user requests into a tailored execution
context. In this work, we study different techniques that have been used to make steps toward
such system awareness, and explore a new way to do so by implementing machine learning
techniques to recommend a specific subset of system configurations for Apache Spark applications.
Furthermore, we present an in depth study of Apache Spark executors configuration, which highlight
the complexity in choosing the best one for a given workload.
Resumo:
Background: Healthcare worldwide needs translation of basic ideas from engineering into the clinic. Consequently, there is increasing demand for graduates equipped with the knowledge and skills to apply interdisciplinary medicine/engineering approaches to the development of novel solutions for healthcare. The literature provides little guidance regarding barriers to, and facilitators of, effective interdisciplinary learning for engineering and medical students in a team-based project context. Methods: A quantitative survey was distributed to engineering and medical students and staff in two universities, one in Ireland and one in Belgium, to chart knowledge and practice in interdisciplinary learning and teaching, and of the teaching of innovation. Results: We report important differences for staff and students between the disciplines regarding attitudes towards, and perceptions of, the relevance of interdisciplinary learning opportunities, and the role of creativity and innovation. There was agreement across groups concerning preferred learning, instructional styles, and module content. Medical students showed greater resistance to the use of structured creativity tools and interdisciplinary teams. Conclusions: The results of this international survey will help to define the optimal learning conditions under which undergraduate engineering and medicine students can learn to consider the diverse factors which determine the success or failure of a healthcare engineering solution.
Resumo:
Intelligent Tutoring Systems (ITSs) are computerized systems for learning-by-doing. These systems provide students with immediate and customized feedback on learning tasks. An ITS typically consists of several modules that are connected to each other. This research focuses on the distribution of the ITS module that provides expert knowledge services. For the distribution of such an expert knowledge module we need to use an architectural style because this gives a standard interface, which increases the reusability and operability of the expert knowledge module. To provide expert knowledge modules in a distributed way we need to answer the research question: ‘How can we compare and evaluate REST, Web services and Plug-in architectural styles for the distribution of the expert knowledge module in an intelligent tutoring system?’. We present an assessment method for selecting an architectural style. Using the assessment method on three architectural styles, we selected the REST architectural style as the style that best supports the distribution of expert knowledge modules. With this assessment method we also analyzed the trade-offs that come with selecting REST. We present a prototype and architectural views based on REST to demonstrate that the assessment method correctly scores REST as an appropriate architectural style for the distribution of expert knowledge modules.
Resumo:
Our key contribution is a flexible, automated marking system that adds desirable functionality to existing E-Assessment systems. In our approach, any given E-Assessment system is relegated to a data-collection mechanism, whereas marking and the generation and distribution of personalised per-student feedback is handled separately by our own system. This allows content-rich Microsoft Word feedback documents to be generated and distributed to every student simultaneously according to a per-assessment schedule.
The feedback is adaptive in that it corresponds to the answers given by the student and provides guidance on where they may have gone wrong. It is not limited to simple multiple choice which are the most prescriptive question type offered by most E-Assessment Systems and as such most straightforward to mark consistently and provide individual per-alternative feedback strings. It is also better equipped to handle the use of mathematical symbols and images within the feedback documents which is more flexible than existing E-Assessment systems, which can only handle simple text strings.
As well as MCQs the system reliably and robustly handles Multiple Response, Text Matching and Numeric style questions in a more flexible manner than Questionmark: Perception and other E-Assessment Systems. It can also reliably handle multi-part questions where the response to an earlier question influences the answer to a later one and can adjust both scoring and feedback appropriately.
New question formats can be added at any time provided a corresponding marking method conforming to certain templates can also be programmed. Indeed, any question type for which a programmatic method of marking can be devised may be supported by our system. Furthermore, since the student’s response to each is question is marked programmatically, our system can be set to allow for minor deviations from the correct answer, and if appropriate award partial marks.
Resumo:
We propose three research problems to explore the relations between trust and security in the setting of distributed computation. In the first problem, we study trust-based adversary detection in distributed consensus computation. The adversaries we consider behave arbitrarily disobeying the consensus protocol. We propose a trust-based consensus algorithm with local and global trust evaluations. The algorithm can be abstracted using a two-layer structure with the top layer running a trust-based consensus algorithm and the bottom layer as a subroutine executing a global trust update scheme. We utilize a set of pre-trusted nodes, headers, to propagate local trust opinions throughout the network. This two-layer framework is flexible in that it can be easily extensible to contain more complicated decision rules, and global trust schemes. The first problem assumes that normal nodes are homogeneous, i.e. it is guaranteed that a normal node always behaves as it is programmed. In the second and third problems however, we assume that nodes are heterogeneous, i.e, given a task, the probability that a node generates a correct answer varies from node to node. The adversaries considered in these two problems are workers from the open crowd who are either investing little efforts in the tasks assigned to them or intentionally give wrong answers to questions. In the second part of the thesis, we consider a typical crowdsourcing task that aggregates input from multiple workers as a problem in information fusion. To cope with the issue of noisy and sometimes malicious input from workers, trust is used to model workers' expertise. In a multi-domain knowledge learning task, however, using scalar-valued trust to model a worker's performance is not sufficient to reflect the worker's trustworthiness in each of the domains. To address this issue, we propose a probabilistic model to jointly infer multi-dimensional trust of workers, multi-domain properties of questions, and true labels of questions. Our model is very flexible and extensible to incorporate metadata associated with questions. To show that, we further propose two extended models, one of which handles input tasks with real-valued features and the other handles tasks with text features by incorporating topic models. Our models can effectively recover trust vectors of workers, which can be very useful in task assignment adaptive to workers' trust in the future. These results can be applied for fusion of information from multiple data sources like sensors, human input, machine learning results, or a hybrid of them. In the second subproblem, we address crowdsourcing with adversaries under logical constraints. We observe that questions are often not independent in real life applications. Instead, there are logical relations between them. Similarly, workers that provide answers are not independent of each other either. Answers given by workers with similar attributes tend to be correlated. Therefore, we propose a novel unified graphical model consisting of two layers. The top layer encodes domain knowledge which allows users to express logical relations using first-order logic rules and the bottom layer encodes a traditional crowdsourcing graphical model. Our model can be seen as a generalized probabilistic soft logic framework that encodes both logical relations and probabilistic dependencies. To solve the collective inference problem efficiently, we have devised a scalable joint inference algorithm based on the alternating direction method of multipliers. The third part of the thesis considers the problem of optimal assignment under budget constraints when workers are unreliable and sometimes malicious. In a real crowdsourcing market, each answer obtained from a worker incurs cost. The cost is associated with both the level of trustworthiness of workers and the difficulty of tasks. Typically, access to expert-level (more trustworthy) workers is more expensive than to average crowd and completion of a challenging task is more costly than a click-away question. In this problem, we address the problem of optimal assignment of heterogeneous tasks to workers of varying trust levels with budget constraints. Specifically, we design a trust-aware task allocation algorithm that takes as inputs the estimated trust of workers and pre-set budget, and outputs the optimal assignment of tasks to workers. We derive the bound of total error probability that relates to budget, trustworthiness of crowds, and costs of obtaining labels from crowds naturally. Higher budget, more trustworthy crowds, and less costly jobs result in a lower theoretical bound. Our allocation scheme does not depend on the specific design of the trust evaluation component. Therefore, it can be combined with generic trust evaluation algorithms.
Resumo:
In this thesis, I studied self-efficacy in the learning of English and Swedish in Finland. The theory of self-efficacy, which was created by Albert Bandura, suggests that the beliefs a person has of his or her capabilities in a certain task affect the person’s performance in the task. My aim was to study whether there are differences in self-efficacy beliefs between the learners of English and Swedish, and whether these beliefs correlate with the performance in the language in question. My hypotheses were that the learners of English have higher self-efficacy beliefs than the learners of Swedish and that self-efficacy beliefs correlate with language performance. The study was quantitative, and it consisted of a self-efficacy questionnaire and a language test which were distributed to students of English and Swedish in an upper secondary school in Rovaniemi. The study was answered by 137 students, of whom 93 were learners of English and 44 were learners of Swedish. The results indicated that the learners of English had a higher sense of efficacy than the learners of Swedish. The analysis proved that there was a significant correlation between English students’ self-efficacy and their performance in the language measured by the test and the grades. In addition, a significant correlation existed between Swedish students’ self-efficacy and their grades. However, there was no correlation between the Swedish students’ self-efficacy and their test results. The difference in the self-efficacy beliefs of the two language groups indicates that people in Finland are more confident in using English than Swedish, which also implies that English is more valued in Finnish society than Swedish. It is important to acknowledge the lower self-efficacy beliefs in Swedish because various studies have proven that self-efficacy affects academic achievement. As a suggestion for further research, the self-efficacy beliefs of different language groups could be compared in a qualitative study in order to understand the development of self-efficacy more profoundly.
Resumo:
The size of online image datasets is constantly increasing. Considering an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encodes high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage capacity. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes by an imbalance penalty to obtain higher quality binary codes. We learn hash functions by an efficient algorithm where the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications like adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions. Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
Resumo:
Discovery of microRNAs (miRNAs) relies on predictive models for characteristic features from miRNA precursors (pre-miRNAs). The short length of miRNA genes and the lack of pronounced sequence features complicate this task. To accommodate the peculiarities of plant and animal miRNAs systems, tools for both systems have evolved differently. However, these tools are biased towards the species for which they were primarily developed and, consequently, their predictive performance on data sets from other species of the same kingdom might be lower. While these biases are intrinsic to the species, their characterization can lead to computational approaches capable of diminishing their negative effect on the accuracy of pre-miRNAs predictive models. We investigate in this study how 45 predictive models induced for data sets from 45 species, distributed in eight subphyla/classes, perform when applied to a species different from the species used in its induction. Results: Our computational experiments show that the separability of pre-miRNAs and pseudo pre-miRNAs instances is species-dependent and no feature set performs well for all species, even within the same subphylum/class. Mitigating this species dependency, we show that an ensemble of classifiers reduced the classification errors for all 45 species. As the ensemble members were obtained using meaningful, and yet computationally viable feature sets, the ensembles also have a lower computational cost than individual classifiers that rely on energy stability parameters, which are of prohibitive computational cost in large scale applications. Conclusion: In this study, the combination of multiple pre-miRNAs feature sets and multiple learning biases enhanced the predictive accuracy of pre-miRNAs classifiers of 45 species. This is certainly a promising approach to be incorporated in miRNA discovery tools towards more accurate and less species-dependent tools.
Resumo:
Objective: to identify aspects of improvement of the quality of the teaching-learning process through the analysis of tools that evaluated the acquisition of skills by undergraduate students of Nursing. Method: prospective longitudinal study conducted in a population of 60 second-year Nursing students based on registration data, from which quality indicators that evaluate the acquisition of skills were obtained, with descriptive and inferential analysis. Results: nine items were identified and nine learning activities included in the assessment tools that did not reach the established quality indicators (p<0.05). There are statistically significant differences depending on the hospital and clinical practices unit (p<0.05). Conclusion: the analysis of the evaluation tools used in the article "Nursing Care in Welfare Processes" of the analyzed university undergraduate course enabled the detection of the areas for improvement in the teaching-learning process. The challenge of education in nursing is to reach the best clinical research and educational results, in order to provide improvements to the quality of education and health care.
Resumo:
This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.
Resumo:
This study examined whether instrumental and normative learning contexts differentially influence 4- to 7-year-old children’s social learning strategies; specifically, their dispositions to copy an expert versus a majority consensus. Experiment 1 (N = 44) established that children copied a relatively competent “expert” individual over an incompetent individual in both kinds of learning context. In experiment 2 (N = 80) we then tested whether children would copy a competent individual versus a majority, in each of the two different learning contexts. Results showed that individual children differed in strategy, preferring with significant consistency across two different test trials to copy either the competent individual or the majority. This study is the first to show that children prefer to copy more competent individuals when shown competing methods of achieving an instrumental goal (Experiment 1) and provides new evidence that children, at least in our “individualist” culture, may consistently express either a competency or majority bias in learning both instrumental and normative information (Experiment 2). This effect was similar in the instrumental and normative learning contexts we applied.
Resumo:
This paper presents a distributed hierarchical multiagent architecture for detecting SQL injection attacks against databases. It uses a novel strategy, which is supported by a Case-Based Reasoning mechanism, which provides to the classifier agents with a great capacity of learning and adaptation to face this type of attack. The architecture combines strategies of intrusion detection systems such as misuse detection and anomaly detection. It has been tested and the results are presented in this paper.
Resumo:
Data sources are often dispersed geographically in real life applications. Finding a knowledge model may require to join all the data sources and to run a machine learning algorithm on the joint set. We present an alternative based on a Multi Agent System (MAS): an agent mines one data source in order to extract a local theory (knowledge model) and then merges it with the previous MAS theory using a knowledge fusion technique. This way, we obtain a global theory that summarizes the distributed knowledge without spending resources and time in joining data sources. New experiments have been executed including statistical significance analysis. The results show that, as a result of knowledge fusion, the accuracy of initial theories is significantly improved as well as the accuracy of the monolithic solution.