215 resultados para Machine learning approaches
Resumo:
The use of online tools to support teaching and learning is now commonplace within educational institutions, with many of these institutions mandating or strongly encouraging the use of a blended learning approach to teaching and learning. Consequently, these institutions generally adopt a learning management system (LMS), with a fixed set of collaborative tools, in the belief that effective teaching and learning approaches will be used, to allow students to build knowledge. While some studies into the use of an LMS’s still identify continued didactic approaches to teaching and learning, the focus of this paper is on the ability of collaborative tools such as discussion forums, to build knowledge. In the context of science education, argumentation is touted as playing an important role in this process of knowledge building. However, there is limited research into argumentation in other domains using online discussion and a blended learning approach. This paper describes a study, using design research, which adapts a framework for argumentation that can be applied to other domains. In particular it will focus on an adapted social argumentation schema to identify argument in a discussion forum of N=16 participants in a secondary High School.
Resumo:
Raven and Song Scope are two automated sound anal-ysis tools based on machine learning technique for en-vironmental monitoring. Many research works have been conducted upon them, however, no or rare explo-ration mentions about the performance and comparison between them. This paper investigates the comparisons from six aspects: theory, software interface, ease of use, detection targets, detection accuracy, and potential application. Through deep exploration one critical gap is identified that there is a lack of approach to detect both syllables and call structures, since Raven only aims to detect syllables while Song Scope targets call structures. Therefore, a Timed Probabilistic Automata (TPA) system is proposed which separates syllables first and clusters them into complex structures after.
Resumo:
The aim of this project was to gain the voice of the early adolescent (aged between 11 and 13 years) about the things that are genuinely important to them in their lives. Eight participants were asked to record a private video diary entry each night for one week. A number of thematic topics were identified including: their experiences and perspectives on school curriculum and assessment, opinions about schooling structures, and importance of friendship and family. Giving young adolescents the opportunity to voice their opinions has been valuable in gaining insight to the relative impacts of teaching and learning approaches in their school contexts and the issues they consider as the most important in their lives.
Resumo:
Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, which has been widely utilized in the fields of machine learning and information retrieval, etc. But its effectiveness in information filtering is rarely known. Patterns are always thought to be more representative than single terms for representing documents. In this paper, a novel information filtering model, Pattern-based Topic Model(PBTM) , is proposed to represent the text documents not only using the topic distributions at general level but also using semantic pattern representations at detailed specific level, both of which contribute to the accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.
Resumo:
Textual document set has become an important and rapidly growing information source in the web. Text classification is one of the crucial technologies for information organisation and management. Text classification has become more and more important and attracted wide attention of researchers from different research fields. In this paper, many feature selection methods, the implement algorithms and applications of text classification are introduced firstly. However, because there are much noise in the knowledge extracted by current data-mining techniques for text classification, it leads to much uncertainty in the process of text classification which is produced from both the knowledge extraction and knowledge usage, therefore, more innovative techniques and methods are needed to improve the performance of text classification. It has been a critical step with great challenge to further improve the process of knowledge extraction and effectively utilization of the extracted knowledge. Rough Set decision making approach is proposed to use Rough Set decision techniques to more precisely classify the textual documents which are difficult to separate by the classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies, to demonstrate the Rough Set concepts and the decision making approach based on Rough Set theory for building more reliable and effective text classification framework with higher precision, to set up an innovative evaluation metric named CEI which is very effective for the performance assessment of the similar research, and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other relative fields.
Resumo:
Topic modelling has been widely used in the fields of information retrieval, text mining, machine learning, etc. In this paper, we propose a novel model, Pattern Enhanced Topic Model (PETM), which makes improvements to topic modelling by semantically representing topics with discriminative patterns, and also makes innovative contributions to information filtering by utilising the proposed PETM to determine document relevance based on topics distribution and maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
Resumo:
We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on different type and quality of data by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those contained in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations of training size, data type and quality in presence of sufficient training data.
Resumo:
CSCW researchers have increasingly come to realize that the material work setting and its population of artefacts play a crucial part in coordination of distributed or co-located work. This paper uses the notion of physicality as a basis to understand cooperative work. Using examples from an ongoing fieldwork on cooperative design practices, it provides a conceptual understanding of physicality and shows that material settings and co-workers’ working practices play an important role in understanding the physicality of cooperative design.
Resumo:
An important responsibility of the Environment Protection Authority, Victoria, is to set objectives for levels of environmental contaminants. To support the development of environmental objectives for water quality, a need has been identified to understand the dual impacts of concentration and duration of a contaminant on biota in freshwater streams. For suspended solids contamination, information reported by Newcombe and Jensen [ North American Journal of Fisheries Management , 16(4):693--727, 1996] study of freshwater fish and the daily suspended solids data from the United States Geological Survey stream monitoring network is utilised. The study group was requested to examine both the utility of the Newcombe and Jensen and the USA data, as well as the formulation of a procedure for use by the Environment Protection Authority Victoria that takes concentration and duration of harmful episodes into account when assessing water quality. The extent to which the impact of a toxic event on fish health could be modelled deterministically was also considered. It was found that concentration and exposure duration were the main compounding factors on the severity of effects of suspended solids on freshwater fish. A protocol for assessing the cumulative effect on fish health and a simple deterministic model, based on the biology of gill harm and recovery, was proposed. References D. W. T. Au, C. A. Pollino, R. S. S Wu, P. K. S. Shin, S. T. F. Lau, and J. Y. M. Tang. Chronic effects of suspended solids on gill structure, osmoregulation, growth, and triiodothyronine in juvenile green grouper epinephelus coioides . Marine Ecology Press Series , 266:255--264, 2004. J.C. Bezdek, S.K. Chuah, and D. Leep. Generalized k-nearest neighbor rules. Fuzzy Sets and Systems , 18:237--26, 1986. E. T. Champagne, K. L. Bett-Garber, A. M. McClung, and C. Bergman. {Sensory characteristics of diverse rice cultivars as influenced by genetic and environmental factors}. Cereal Chem. , {81}:{237--243}, {2004}. S. G. Cheung and P. K. S. Shin. Size effects of suspended particles on gill damage in green-lipped mussel perna viridis. Marine Pollution Bulletin , 51(8--12):801--810, 2005. D. H. Evans. The fish gill: site of action and model for toxic effects of environmental pollutants. Environmental Health Perspectives , 71:44--58, 1987. G. C. Grigg. The failure of oxygen transport in a fish at low levels of ambient oxygen. Comp. Biochem. Physiol. , 29:1253--1257, 1969. G. Holmes, A. Donkin, and I.H. Witten. {Weka: A machine learning workbench}. In Proceedings of the Second Australia and New Zealand Conference on Intelligent Information Systems , volume {24}, pages {357--361}, {Brisbane, Australia}, {1994}. {IEEE Computer Society}. D. D. Macdonald and C. P. Newcombe. Utility of the stress index for predicting suspended sediment effects: response to comments. North American Journal of Fisheries Management , 13:873--876, 1993. C. P. Newcombe. Suspended sediment in aquatic ecosystems: ill effects as a function of concentration and duration of exposure. Technical report, British Columbia Ministry of Environment, Lands and Parks, Habitat Protection branch, Victoria, 1994. C. P. Newcombe and J. O. T. Jensen. Channel suspended sediment and fisheries: A synthesis for quantitative assessment of risk and impact. North American Journal of Fisheries Management , 16(4):693--727, 1996. C. P. Newcombe and D. D. Macdonald. Effects of suspended sediments on aquatic ecosystems. North American Journal of Fisheries Management , 11(1):72--82, 1991. K. Schmidt-Nielsen. Scaling. Why is animal size so important? Cambridge University Press, NY, 1984. J. S. Schwartz, A. Simon, and L. Klimetz. Use of fish functional traits to associate in-stream suspended sediment transport metrics with biological impairment. Environmental Monitoring and Assessment , 179(1--4):347--369, 2011. E. Al Shaw and J. S. Richardson. Direct and indirect effects of sediment pulse duration on stream invertebrate assemb ages and rainbow trout ( Oncorhynchus mykiss ) growth and survival. Canadian Journal of Fish and Aquatic Science , 58:2213--2221, 2001. P. Tiwari and H. Hasegawa. {Demand for housing in Tokyo: A discrete choice analysis}. Regional Studies , {38}:{27--42}, {2004}. Y. Tramblay, A. Saint-Hilaire, T. B. M. J. Ouarda, F. Moatar, and B Hecht. Estimation of local extreme suspended sediment concentrations in california rivers. Science of the Total Environment , 408:4221--
Resumo:
Objective Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and Materials The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and their quality, with one of the datasets containing optical character recognition errors. Results Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion Findings show that Anonym compares to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations of training size, data type and quality in presence of sufficient training data.
Resumo:
Background Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and a vast amount of manual processing of unstructured information, accurate point-of-care diagnosis is often difficult. Aims The aim of this research is to report initial experimental evaluation of a clinician-informed automated method for the issue of initial misdiagnoses associated with delayed receipt of unstructured radiology reports. Method A method was developed that resembles clinical reasoning for identifying limb abnormalities. The method consists of a gazetteer of keywords related to radiological findings; the method classifies an X-ray report as abnormal if it contains evidence contained in the gazetteer. A set of 99 narrative reports of radiological findings was sourced from a tertiary hospital. Reports were manually assessed by two clinicians and discrepancies were validated by a third expert ED clinician; the final manual classification generated by the expert ED clinician was used as ground truth to empirically evaluate the approach. Results The automated method that attempts to individuate limb abnormalities by searching for keywords expressed by clinicians achieved an F-measure of 0.80 and an accuracy of 0.80. Conclusion While the automated clinician-driven method achieved promising performances, a number of avenues for improvement were identified using advanced natural language processing (NLP) and machine learning techniques.
Resumo:
In attempting to build intelligent litigation support tools, we have moved beyond first generation, production rule legal expert systems. Our work supplements rule-based reasoning with case based reasoning and intelligent information retrieval. This research, specifies an approach to the case based retrieval problem which relies heavily on an extended object-oriented / rule-based system architecture that is supplemented with causal background information. Machine learning techniques and a distributed agent architecture are used to help simulate the reasoning process of lawyers. In this paper, we outline our implementation of the hybrid IKBALS II Rule Based Reasoning / Case Based Reasoning system. It makes extensive use of an automated case representation editor and background information.
Resumo:
We study two problems of online learning under restricted information access. In the first problem, prediction with limited advice, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves O(√(N/M)TlnN ) regret on T rounds of this game. The second problem, the multiarmed bandit with paid observations, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves O((cNlnN) 1/3 T2/3+√TlnN ) regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
Resumo:
Accurate and detailed measurement of an individual's physical activity is a key requirement for helping researchers understand the relationship between physical activity and health. Accelerometers have become the method of choice for measuring physical activity due to their small size, low cost, convenience and their ability to provide objective information about physical activity. However, interpreting accelerometer data once it has been collected can be challenging. In this work, we applied machine learning algorithms to the task of physical activity recognition from triaxial accelerometer data. We employed a simple but effective approach of dividing the accelerometer data into short non-overlapping windows, converting each window into a feature vector, and treating each feature vector as an i.i.d training instance for a supervised learning algorithm. In addition, we improved on this simple approach with a multi-scale ensemble method that did not need to commit to a single window size and was able to leverage the fact that physical activities produced time series with repetitive patterns and discriminative features for physical activity occurred at different temporal scales.