789 resultados para Data-driven knowledge acquisition
Resumo:
The creation of Causal Loop Diagrams (CLDs) is a major phase in the System Dynamics (SD) life-cycle, since the created CLDs express dependencies and feedback in the system under study, as well as, guide modellers in building meaningful simulation models. The cre-ation of CLDs is still subject to the modeller's domain expertise (mental model) and her ability to abstract the system, because of the strong de-pendency on semantic knowledge. Since the beginning of SD, available system data sources (written and numerical models) have always been sparsely available, very limited and imperfect and thus of little benefit to the whole modelling process. However, in recent years, we have seen an explosion in generated data, especially in all business related domains that are analysed via Business Dynamics (BD). In this paper, we intro-duce a systematic tool supported CLD creation approach, which analyses and utilises available disparate data sources within the business domain. We demonstrate the application of our methodology on a given business use-case and evaluate the resulting CLD. Finally, we propose directions for future research to further push the automation in the CLD creation and increase confidence in the generated CLDs.
Resumo:
Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS derived t-values, indicate that this data-driven approach offers improvement in terms of parcellation accuracy over GLM based techniques.
Resumo:
To analyze the characteristics and predict the dynamic behaviors of complex systems over time, comprehensive research to enable the development of systems that can intelligently adapt to the evolving conditions and infer new knowledge with algorithms that are not predesigned is crucially needed. This dissertation research studies the integration of the techniques and methodologies resulted from the fields of pattern recognition, intelligent agents, artificial immune systems, and distributed computing platforms, to create technologies that can more accurately describe and control the dynamics of real-world complex systems. The need for such technologies is emerging in manufacturing, transportation, hazard mitigation, weather and climate prediction, homeland security, and emergency response. Motivated by the ability of mobile agents to dynamically incorporate additional computational and control algorithms into executing applications, mobile agent technology is employed in this research for the adaptive sensing and monitoring in a wireless sensor network. Mobile agents are software components that can travel from one computing platform to another in a network and carry programs and data states that are needed for performing the assigned tasks. To support the generation, migration, communication, and management of mobile monitoring agents, an embeddable mobile agent system (Mobile-C) is integrated with sensor nodes. Mobile monitoring agents visit distributed sensor nodes, read real-time sensor data, and perform anomaly detection using the equipped pattern recognition algorithms. The optimal control of agents is achieved by mimicking the adaptive immune response and the application of multi-objective optimization algorithms. The mobile agent approach provides potential to reduce the communication load and energy consumption in monitoring networks. The major research work of this dissertation project includes: (1) studying effective feature extraction methods for time series measurement data; (2) investigating the impact of the feature extraction methods and dissimilarity measures on the performance of pattern recognition; (3) researching the effects of environmental factors on the performance of pattern recognition; (4) integrating an embeddable mobile agent system with wireless sensor nodes; (5) optimizing agent generation and distribution using artificial immune system concept and multi-objective algorithms; (6) applying mobile agent technology and pattern recognition algorithms for adaptive structural health monitoring and driving cycle pattern recognition; (7) developing a web-based monitoring network to enable the visualization and analysis of real-time sensor data remotely. Techniques and algorithms developed in this dissertation project will contribute to research advances in networked distributed systems operating under changing environments.
Resumo:
A data-driven background dataset refinement technique was recently proposed for SVM based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
Resumo:
The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly-informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mis-matched to the evaluation conditions.
Resumo:
This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using alternate SVM feature sets to the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends on the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from varying features spaces to provide added robustness.
Resumo:
Throughout the world, state and nation standardised testing of children, has become a "huge industry" (English, 2002). Although English is referring to the American system which has been involved in standardised testing for over half a century, the same could be said of many other countries, including Australia. It has been only in recent years that Australia has embraced national testing as part of a wider reform effort to bring about increased accountability in schooling. The results of high-stakes tests in Australia are now published in newspapers and electronically on the Australian federal government's MySchool website (www.myschoold.edu.au). MySchool provides results on the National Assessment Program - Literacy and Numeracy (NAPLAN) for students in Years 3,5, 7 and 9. Data are available that compare schools to statistically similar schools. This more recent publication of national testing results in Australia is a visible example of "contractual accountability", described by Mulford, Edmunds, Kendall, Kendall and Bishop (2008) as " the degree to which [actors] are fulfilling the expectations of particular audiences in terms of standards, outcomes and results" (p.20).
Resumo:
A commitment in 2010 by the Australian Federal Government to spend $466.7 million dollars on the implementation of personally controlled electronic health records (PCEHR) heralded a shift to a more effective and safer patient centric eHealth system. However, deployment of the PCEHR has met with much criticism, emphasised by poor adoption rates over the first 12 months of operation. An indifferent response by the public and healthcare providers largely sceptical of its utility and safety speaks to the complex sociotechnical drivers and obstacles inherent in the embedding of large (national) scale eHealth projects. With government efforts to inflate consumer and practitioner engagement numbers giving rise to further consumer disillusionment, broader utilitarian opportunities available with the PCEHR are at risk. This paper discusses the implications of establishing the PCEHR as the cornerstone of a holistic eHealth strategy for the aggregation of longitudinal patient information. A viewpoint is offered that the real value in patient data lies not just in the collection of data but in the integration of this information into clinical processes within the framework of a commoditised data-driven approach. Consideration is given to the eHealth-as-a-Service (eHaaS) construct as a disruptive next step for co-ordinated individualised healthcare in the Australian context.
Resumo:
While past knowledge-based approaches to service innovation have emphasized the role of knowledge integration in the delivery of customer-focused solutions, these approaches do not adequately address the complexities inherent in knowledge acquisition and integration in project-oriented firms. Adopting a dynamic capability framework and building on knowledge-based approaches to innovation, the current study examines how the interplay of learning capabilities and knowledge integration capability impacts service innovation and sustained competitive advantage. This two-stage multi-sample study finds that entrepreneurial project-oriented service firms in their quest for competitive advantage through greater innovation invest in knowledge acquisition and integration capabilities. Implications for theory and practice are discussed and directions for future research provided.
Resumo:
Decision-making is such an integral aspect in health care routine that the ability to make the right decisions at crucial moments can lead to patient health improvements. Evidence-based practice, the paradigm used to make those informed decisions, relies on the use of current best evidence from systematic research such as randomized controlled trials. Limitations of the outcomes from randomized controlled trials (RCT), such as “quantity” and “quality” of evidence generated, has lowered healthcare professionals’ confidence in using EBP. An alternate paradigm of Practice-Based Evidence has evolved with the key being evidence drawn from practice settings. Through the use of health information technology, electronic health records (EHR) capture relevant clinical practice “evidence”. A data-driven approach is proposed to capitalize on the benefits of EHR. The issues of data privacy, security and integrity are diminished by an information accountability concept. Data warehouse architecture completes the data-driven approach by integrating health data from multi-source systems, unique within the healthcare environment.
Resumo:
What potential do artists working with environmental data in public space have for producing new forms of engagement with local environmental conditions? Operating on the edge of heavy bureaucracy, these types of data-driven artistic experiments probe the politics of environmental metrics and explore methods of engaging audiences with issues of environmental health. This discussion considers a small collection of cases studies representative of this growing field of practice. These are works by Natalie Jeremijenko and The Living, Tega Brain and Keith Deverell. The case studies considered are examples of strategic design, works that soften, reveal and potentially shift existing regulations and bureaucratic norms. In doing so they open up new possibilities and questions as to what the smart city is and how it might be realised.
Resumo:
Large sized power transformers are important parts of the power supply chain. These very critical networks of engineering assets are an essential base of a nation’s energy resource infrastructure. This research identifies the key factors influencing transformer normal operating conditions and predicts the asset management lifespan. Engineering asset research has developed few lifespan forecasting methods combining real-time monitoring solutions for transformer maintenance and replacement. Utilizing the rich data source from a remote terminal unit (RTU) system for sensor-data driven analysis, this research develops an innovative real-time lifespan forecasting approach applying logistic regression based on the Weibull distribution. The methodology and the implementation prototype are verified using a data series from 161 kV transformers to evaluate the efficiency and accuracy for energy sector applications. The asset stakeholders and suppliers significantly benefit from the real-time power transformer lifespan evaluation for maintenance and replacement decision support.
Resumo:
Data generated via user activity on social media platforms is routinely used for research across a wide range of social sciences and humanities disciplines. The availability of data through the Twitter APIs in particular has afforded new modes of research, including in media and communication studies; however, there are practical and political issues with gaining access to such data, and with the consequences of how that access is controlled. In their paper ‘Easy Data, Hard Data’, Burgess and Bruns (2015) discuss both the practical and political aspects of Twitter data as they relate to academic research, describing how communication research has been enabled, shaped and constrained by Twitter’s “regimes of access” to data, the politics of data use, and emerging economies of data exchange. This conceptual model, including the ‘easy data, hard data’ formulation, can also be applied to Sina Weibo. In this paper, we build on this model to explore the practical and political challenges and opportunities associated with the ‘regimes of access’ to Weibo data, and their consequences for digital media and communication studies. We argue that in the Chinese context, the politics of data access can be even more complicated than in the case of Twitter, which makes scientific research relying on large social data from this platform more challenging in some ways, but potentially richer and more rewarding in others.