959 results for Sequence learning
Abstract:
Many schools do not begin to introduce college students to software engineering until they have had at least one semester of programming. Since software engineering is a large, complex, and abstract subject, it is difficult to construct active learning exercises that build on the students' elementary knowledge of programming and still teach basic software engineering principles. It is also the case that beginning students typically know how to construct small programs, but they have little experience with the techniques necessary to produce reliable and long-term maintainable modules. I have addressed these two concerns by defining a local standard (Montana Tech Method (MTM) Software Development Standard for Small Modules Template) that step-by-step directs students toward the construction of highly reliable small modules using well-known, best-practice software engineering techniques. "Small module" is here defined as a coherent development task that can be unit tested and can be carried out by a single (or a pair of) software engineer(s) in at most a few weeks. The standard describes the process to be used and also provides a template for the top-level documentation. The instructional module's sequence of mini-lectures and exercises associated with the use of this (and other) local standards is used throughout the course, which perforce covers more abstract software engineering material using traditional reading and writing assignments. The sequence of mini-lectures and hands-on assignments (many of which are done in small groups) constitutes an instructional module that can be used in any similar software engineering course.
Abstract:
The main objective of this article is the analysis of teaching techniques, ranging from the use of blackboard and chalk in old traditional classes, through slides and overhead projectors in the eighties and presentation software in the nineties, to the video, electronic board and network resources used nowadays. Furthermore, all of the aforementioned is viewed in light of the different mentalities with which the teacher conditions the student through the new teaching technique, improving soft skills but perhaps leading either to encouragement or to disinterest, and sometimes leaving educational knowledge poorly consolidated at the scientific, technological and subject-specific levels. In the same way, we study the process of adaptation required of teachers, the differences in the processes of information transfer and education towards the student, and even the existence of teachers who no longer find their work appealing because it has become much simpler thanks to new technologies and to the greater ease of preparing classes under the criteria described in the new degree Programs adopted by the European Higher Education Area. Moreover, it is also intended to understand the evolution of students' profiles, from the eighties to the present time, in order to understand certain attitudes, behaviours, accomplishments and acknowledgements acquired over the semesters within the degree Programs. As an Educational Innovation Group, another key question also arises: what will the learning techniques of the future be? How will these evolving matters affect, both positively and negatively, the mentality, attitude, behaviour, learning, achievement of goals and satisfaction levels of all elements involved in university education? Clearly, this evolution from chalk to the electronic board, and the three-dimensional view of our work and its sequence, greatly facilitates later understanding of and adaptation to the business world, but it does not answer the unknowns regarding knowledge and the full development of achievement indicators for the basic skills of a degree. This is the underlying question at the root of the presented research.
Abstract:
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
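A minimal sketch of the classifier-chain idea described above, using a naive Monte Carlo search over label orders with scikit-learn's ClassifierChain rather than the authors' schemes; the synthetic dataset, base learner, and number of sampled orders are illustrative assumptions:

# Sketch only: randomly sample chain orders and keep the one that scores
# best on a held-out split (a crude stand-in for the paper's Monte Carlo
# chain-sequence search).
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain
from sklearn.metrics import jaccard_score

X, Y = make_multilabel_classification(n_samples=500, n_classes=6,
                                       n_labels=3, random_state=0)
X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.3,
                                             random_state=0)

rng = np.random.default_rng(0)
best_order, best_score = None, -np.inf
for _ in range(20):                          # Monte Carlo samples of chain orders
    order = rng.permutation(Y.shape[1])
    cc = ClassifierChain(LogisticRegression(max_iter=1000), order=order)
    cc.fit(X_tr, Y_tr)
    score = jaccard_score(Y_val, cc.predict(X_val), average="samples")
    if score > best_score:
        best_order, best_score = order, score

print("best chain order:", best_order, "validation Jaccard:", round(best_score, 3))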
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
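A minimal sketch of the kind of comparison described above: a toy order-dependent task (does a short motif occur in the sequence?) on which a small recurrent network can be compared against a feed-forward network of similar size. The task, layer sizes and epoch count are arbitrary assumptions, not the paper's experiments.

# Sketch only: compare a SimpleRNN against a flattened Dense network on a
# synthetic motif-detection task.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 20, 1)).astype("float32")

def has_motif(seq, motif=(1, 0, 1)):
    s = seq.ravel().astype(int)
    return any(tuple(s[i:i + 3]) == motif for i in range(len(s) - 2))

y = np.array([has_motif(x) for x in X], dtype="float32")

def build(recurrent):
    layers = [tf.keras.Input(shape=(20, 1))]
    if recurrent:
        layers.append(tf.keras.layers.SimpleRNN(16))
    else:
        layers += [tf.keras.layers.Flatten(),
                   tf.keras.layers.Dense(16, activation="relu")]
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

for name, rec in [("recurrent", True), ("feed-forward", False)]:
    model = build(rec)
    hist = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
    print(name, "val accuracy:", round(hist.history["val_accuracy"][-1], 3))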
Abstract:
Background: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results: We present STAR, the Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination for each position in an amino acid sequence. Example predictions, contrasted with those of alternative tools, illustrate STAR's utility in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion: STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.
Abstract:
In this paper we demonstrate that it is possible to gradually improve the performance of support vector machine (SVM) classifiers by using a genetic algorithm to select a sequence of training subsets from the available data. Performance improvement is possible because the SVM solution generally lies some distance away from the Bayes optimal in the space of learning parameters. We illustrate performance improvements on a number of benchmark data sets.
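A minimal sketch of the general idea described above, not the authors' implementation: a toy genetic algorithm evolves boolean masks over the training pool and scores each candidate subset by the validation accuracy of an SVM trained on it. Population size, mutation rate and number of generations are assumptions.

# Sketch only: GA over training-subset masks, fitness = SVM validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_pool, X_val, y_pool, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
pop = rng.random((20, len(X_pool))) < 0.5            # 20 random subset masks

def fitness(mask):
    # guard against degenerate subsets (too small or single-class)
    if mask.sum() < 10 or len(set(y_pool[mask])) < 2:
        return 0.0
    clf = SVC(kernel="rbf").fit(X_pool[mask], y_pool[mask])
    return clf.score(X_val, y_val)

for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep the best half
    children = []
    for _ in range(10):                              # uniform crossover + mutation
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        child = np.where(rng.random(len(a)) < 0.5, a, b)
        child ^= rng.random(len(child)) < 0.01       # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents] + children)
    print(f"generation {gen}: best validation accuracy {scores.max():.3f}")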
Abstract:
Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier based on the Support Vector Machine produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy for most tested models.
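A minimal sketch of one way to exploit such dependencies, not the paper's model: the C-terminal window (conserved tripeptide plus preceding residues) is one-hot encoded as a single feature vector, so an SVM can learn interactions between them jointly. The toy sequences, labels and window length are invented for illustration.

# Sketch only: jointly encode the last K residues and train an SVM.
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = 12                                               # C-terminal window length (assumption)

def encode_cterm(seq, k=K):
    """One-hot encode the last k residues of an amino acid sequence."""
    tail = seq[-k:].rjust(k, "X")                    # pad short sequences with X (all-zero column)
    vec = np.zeros((k, len(AMINO_ACIDS)))
    for i, aa in enumerate(tail):
        if aa in AMINO_ACIDS:
            vec[i, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()

# Hypothetical labelled examples: 1 = PTS1-like C-terminus, 0 = not.
train_seqs = ["MSTAVLENPGLGRKLSKL", "MKWVTFISLLLLFSSAYSAKL",
              "MDDDIAALVVDNGSGMCKAG", "MGLSDGEWQLVLNVWGKVEA"]
train_labels = [1, 1, 0, 0]

X = np.array([encode_cterm(s) for s in train_seqs])
clf = SVC(kernel="rbf").fit(X, train_labels)
print(clf.predict([encode_cterm("MAAATTTSSSPAGRRSSSKL")]))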
Abstract:
Explicit (aware) learning has been shown to evidence certain characteristics, such as extinction, blocking, occasion setting, and reliance on context. These characteristics have not been assessed in implicit (unaware) learning. The current study investigated whether implicit learning is subject to blocking. Participants completed a cued reaction time task, where they watched rapid presentations of a random sequence of 8 pairs of shapes, and responded to two target shapes. One target was always preceded by a cue. The experimental group completed a pretraining phase where half the cue, one shape, was followed by the target. Both experimental and control groups completed a training phase where both elements of the cue, two shapes, were followed by the target. Both aware and unaware participants evidenced learning, whereby responding was faster for cued than uncued targets. Aware participants in the experimental group responded faster to targets preceded by the pretrained element than by the other element of the cue. Control and unaware experimental participants were faster to respond to targets preceded by either element of the cue. As blocking was only evident in aware participants, but implicit learning was observed in all participants, it is concluded that implicit learning is not subject to blocking.
Abstract:
The G-protein coupled receptors, or GPCRs, comprise simultaneously one of the largest and one of the most multi-functional protein families known to modern-day molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs constitute one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to efficiently and accurately classify the GPCRs. These range from motif-based techniques to machine learning, as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family and the low sequence similarity to other family members evinced by many newly revealed members of the family.
Abstract:
Background - The literature is not univocal about the effects of Peer Review (PR) within the context of constructivist learning. Due to the predominant focus on using PR as an assessment tool, rather than a constructivist learning activity, and because most studies implicitly assume that the benefits of PR are limited to the reviewee, little is known about the effects upon students who are required to review their peers. Much of the theoretical debate in the literature is focused on explaining how and why constructivist learning is beneficial. At the same time, these discussions are marked by an underlying presupposition of a causal relationship between reviewing and deep learning. Objectives - The purpose of the study is to investigate whether the writing of PR feedback causes students to benefit in terms of: perceived utility of statistics, actual use of statistics, better understanding of statistical concepts and associated methods, changed attitudes towards market risks, and outcomes of decisions that were made. Methods - We conducted a randomized experiment, assigning students randomly to receive PR or non-PR treatments, and used two cohorts with different time spans. The paper discusses the experimental design and all the software components that we used to support the learning process: Reproducible Computing technology, which allows students to reproduce or re-use statistical results from peers, Collaborative PR, and an AI-enhanced Stock Market Engine. Results - The results establish that the writing of PR feedback messages causes students to experience benefits in terms of Behavior, Non-Rote Learning, and Attitudes, provided the sequence of PR activities is maintained for a period that is sufficiently long.
Abstract:
Ontology construction for any domain is a labour intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus. Results - Using mainly automated techniques, we were able to construct an 18,055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms to add structure to the initially broad and shallow structure we generated. All outputs are available at http://thirlmere.aston.ac.uk/~kiffer/animalbehaviour/. Conclusion - We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can make a contribution both to ontology learning and maintenance. The method is useful both for the exploration of a scientific domain and as a stepping stone towards formally rigorous ontologies. The filtering of recognised terms from a heterogeneous corpus to focus upon those that are the topic of the ontology is identified to be one of the main challenges for research in ontology learning.
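A minimal sketch of lexico-syntactic filtering in general, not the authors' pipeline: Hearst-style patterns such as "X such as Y" are matched against corpus text to validate candidate terms and suggest subsumption (is-a) links between them. The corpus snippet and term list are made up for illustration.

# Sketch only: propose is-a links from "X such as Y, Z and W" patterns.
import re

corpus = ("The study recorded agonistic behaviours such as threat displays, "
          "chasing and biting. It also noted foraging behaviours such as grazing.")
candidate_terms = {"agonistic behaviour", "threat display", "chasing",
                   "foraging behaviour", "grazing"}

PATTERN = re.compile(r"(\w+ \w+) such as ([^.]+)\.")

def singular(term):
    return term.strip().lower().rstrip("s")          # crude depluralisation

subsumptions = []
for hyper, hypo_list in PATTERN.findall(corpus):
    hyper = singular(hyper)
    for hypo in re.split(r",\s*|\s+and\s+", hypo_list):
        hypo = singular(hypo)
        # keep only pairs where both terms survive the candidate-term filter
        if hyper in candidate_terms and hypo in candidate_terms:
            subsumptions.append((hypo, "is-a", hyper))

print(subsumptions)
# [('threat display', 'is-a', 'agonistic behaviour'),
#  ('chasing', 'is-a', 'agonistic behaviour'),
#  ('grazing', 'is-a', 'foraging behaviour')]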
Abstract:
Ontology construction for any domain is a labour intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the Animal Behaviour domain. Our objective was to see how much could be done in a simple and rapid manner using a corpus of journal papers. We used a sequence of text processing steps, and describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a hierarchy. We were able in a very short space of time to construct a 17000 term ontology with a high percentage of suitable terms. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus.
Abstract:
Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition by the induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
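A minimal sketch of a generic stochastic L-system, not the rules induced in the paper: every symbol of the string is rewritten in parallel at each step, with the replacement drawn from a probability distribution over productions. The toy nonterminals and probabilities below are invented for illustration.

# Sketch only: parallel stochastic rewriting to generate artificial sequences.
import random

random.seed(0)

# production rules: symbol -> list of (replacement, probability)
RULES = {
    "P": [("TATAN", 0.5), ("NPN", 0.3), ("GCGC", 0.2)],   # P: promoter-like nonterminal
    "N": [("A", 0.3), ("C", 0.2), ("G", 0.2), ("T", 0.3)],  # N: random nucleotide
}

def rewrite(symbol):
    """Pick a replacement for one symbol according to the rule probabilities."""
    if symbol not in RULES:
        return symbol                        # terminals rewrite to themselves
    replacements, probs = zip(*RULES[symbol])
    return random.choices(replacements, weights=probs, k=1)[0]

def derive(axiom="P", steps=5):
    """Apply the stochastic L-grammar: rewrite all symbols in parallel each step."""
    string = axiom
    for _ in range(steps):
        string = "".join(rewrite(s) for s in string)
    return string

# Note: nonterminals may remain if the derivation is too shallow.
print(derive())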
Abstract:
This paper consolidates evidence and material from a range of specialist and disciplinary fields to provide an evidence-based review and synthesis on the design and use of serious games in higher education. Search terms identified 165 papers reporting conceptual and empirical evidence on how learning attributes and game mechanics may be planned, designed and implemented by university teachers interested in using games, which are integrated into lesson plans and orchestrated as part of a learning sequence at any scale. The findings outline the potential of classifying the links between learning attributes and game mechanics as a means to scaffold teachers’ understanding of how to perpetuate learning in optimal ways while enhancing the in-game learning experience. The findings of this paper provide a foundation for describing methods, frames and discourse around experiences of design and use of serious games, linked to methodological limitations and recommendations for further research in this area.