315 resultados para Data-driven modelling


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A data-driven background dataset refinement technique was recently proposed for SVM based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly-informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mis-matched to the evaluation conditions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using alternate SVM feature sets to the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends on the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from varying features spaces to provide added robustness.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Throughout the world, state and nation standardised testing of children, has become a "huge industry" (English, 2002). Although English is referring to the American system which has been involved in standardised testing for over half a century, the same could be said of many other countries, including Australia. It has been only in recent years that Australia has embraced national testing as part of a wider reform effort to bring about increased accountability in schooling. The results of high-stakes tests in Australia are now published in newspapers and electronically on the Australian federal government's MySchool website (www.myschoold.edu.au). MySchool provides results on the National Assessment Program - Literacy and Numeracy (NAPLAN) for students in Years 3,5, 7 and 9. Data are available that compare schools to statistically similar schools. This more recent publication of national testing results in Australia is a visible example of "contractual accountability", described by Mulford, Edmunds, Kendall, Kendall and Bishop (2008) as " the degree to which [actors] are fulfilling the expectations of particular audiences in terms of standards, outcomes and results" (p.20).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A commitment in 2010 by the Australian Federal Government to spend $466.7 million dollars on the implementation of personally controlled electronic health records (PCEHR) heralded a shift to a more effective and safer patient centric eHealth system. However, deployment of the PCEHR has met with much criticism, emphasised by poor adoption rates over the first 12 months of operation. An indifferent response by the public and healthcare providers largely sceptical of its utility and safety speaks to the complex sociotechnical drivers and obstacles inherent in the embedding of large (national) scale eHealth projects. With government efforts to inflate consumer and practitioner engagement numbers giving rise to further consumer disillusionment, broader utilitarian opportunities available with the PCEHR are at risk. This paper discusses the implications of establishing the PCEHR as the cornerstone of a holistic eHealth strategy for the aggregation of longitudinal patient information. A viewpoint is offered that the real value in patient data lies not just in the collection of data but in the integration of this information into clinical processes within the framework of a commoditised data-driven approach. Consideration is given to the eHealth-as-a-Service (eHaaS) construct as a disruptive next step for co-ordinated individualised healthcare in the Australian context.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision-making is such an integral aspect in health care routine that the ability to make the right decisions at crucial moments can lead to patient health improvements. Evidence-based practice, the paradigm used to make those informed decisions, relies on the use of current best evidence from systematic research such as randomized controlled trials. Limitations of the outcomes from randomized controlled trials (RCT), such as “quantity” and “quality” of evidence generated, has lowered healthcare professionals’ confidence in using EBP. An alternate paradigm of Practice-Based Evidence has evolved with the key being evidence drawn from practice settings. Through the use of health information technology, electronic health records (EHR) capture relevant clinical practice “evidence”. A data-driven approach is proposed to capitalize on the benefits of EHR. The issues of data privacy, security and integrity are diminished by an information accountability concept. Data warehouse architecture completes the data-driven approach by integrating health data from multi-source systems, unique within the healthcare environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

What potential do artists working with environmental data in public space have for producing new forms of engagement with local environmental conditions? Operating on the edge of heavy bureaucracy, these types of data-driven artistic experiments probe the politics of environmental metrics and explore methods of engaging audiences with issues of environmental health. This discussion considers a small collection of cases studies representative of this growing field of practice. These are works by Natalie Jeremijenko and The Living, Tega Brain and Keith Deverell. The case studies considered are examples of strategic design, works that soften, reveal and potentially shift existing regulations and bureaucratic norms. In doing so they open up new possibilities and questions as to what the smart city is and how it might be realised.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Large sized power transformers are important parts of the power supply chain. These very critical networks of engineering assets are an essential base of a nation’s energy resource infrastructure. This research identifies the key factors influencing transformer normal operating conditions and predicts the asset management lifespan. Engineering asset research has developed few lifespan forecasting methods combining real-time monitoring solutions for transformer maintenance and replacement. Utilizing the rich data source from a remote terminal unit (RTU) system for sensor-data driven analysis, this research develops an innovative real-time lifespan forecasting approach applying logistic regression based on the Weibull distribution. The methodology and the implementation prototype are verified using a data series from 161 kV transformers to evaluate the efficiency and accuracy for energy sector applications. The asset stakeholders and suppliers significantly benefit from the real-time power transformer lifespan evaluation for maintenance and replacement decision support.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data generated via user activity on social media platforms is routinely used for research across a wide range of social sciences and humanities disciplines. The availability of data through the Twitter APIs in particular has afforded new modes of research, including in media and communication studies; however, there are practical and political issues with gaining access to such data, and with the consequences of how that access is controlled. In their paper ‘Easy Data, Hard Data’, Burgess and Bruns (2015) discuss both the practical and political aspects of Twitter data as they relate to academic research, describing how communication research has been enabled, shaped and constrained by Twitter’s “regimes of access” to data, the politics of data use, and emerging economies of data exchange. This conceptual model, including the ‘easy data, hard data’ formulation, can also be applied to Sina Weibo. In this paper, we build on this model to explore the practical and political challenges and opportunities associated with the ‘regimes of access’ to Weibo data, and their consequences for digital media and communication studies. We argue that in the Chinese context, the politics of data access can be even more complicated than in the case of Twitter, which makes scientific research relying on large social data from this platform more challenging in some ways, but potentially richer and more rewarding in others.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computational optimisation of clinically important electrocardiogram signal features, within a single heart beat, using a Markov-chain Monte Carlo (MCMC) method is undertaken. A detailed, efficient data-driven software implementation of an MCMC algorithm has been shown. Initially software parallelisation is explored and has been shown that despite the large amount of model parameter inter-dependency that parallelisation is possible. Also, an initial reconfigurable hardware approach is explored for future applicability to real-time computation on a portable ECG device, under continuous extended use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digital technology offers enormous benefits (economic, quality of design and efficiency in use) if adopted to implement integrated ways of representing the physical world in a digital form. When applied across the full extent of the built and natural world, it is referred to as the Digital Built Environment (DBE) and encompasses a wide range of approaches and technology initiatives, all aimed at the same end goal: the development of a virtual world that sufficiently mirrors the real world to form the basis for the smart cities of the present and future, enable efficient infrastructure design and programmed maintenance, and create a new foundation for economic growth and social well-being through evidence-based analysis. The creation of a National Data Policy for the DBE will facilitate the creation of additional high technology industries in Australia; provide Governments, industries and citizens with greater knowledge of the environments they occupy and plan; and offer citizen-driven innovations for the future. Australia has slipped behind other nations in the adoption and execution of Building Information Modelling (BIM) and the principal concern is that the gap is widening. Data driven innovation added $67 billion to the Australian economy in 20131. Strong open data policy equates to $16 billion in new value2. Australian Government initiatives such as the Digital Earth inspired “National Map” offer a platform and pathway to embrace the concept of a “BIM Globe”, while also leveraging unprecedented growth in open source / open data collaboration. Australia must address the challenges by learning from international experiences—most notably the UK and NZ—and mandate the use of BIM across Government, extending the Framework for Spatial Data Foundation to include the Built Environment as a theme and engaging collaboration through a “BIM globe” metaphor. This proposed DBE strategy will modernise the Australian urban planning and the construction industry. It will change the way we develop our cities by fundamentally altering the dynamics and behaviours of the supply chains and unlocking new and more efficient ways of collaborating at all stages of the project life-cycle. There are currently two major modelling approaches that contribute to the challenge of delivering the DBE. Though these collectively encompass many (often competing) approaches or proprietary software systems, all can be categorised as either: a spatial modelling approach, where the focus is generally on representing the elements that make up the world within their geographic context; and a construction modelling approach, where the focus is on models that support the life cycle management of the built environment. These two approaches have tended to evolve independently, addressing two broad industry sectors: the one concerned with understanding and managing global and regional aspects of the world that we inhabit, including disciplines concerned with climate, earth sciences, land ownership, urban and regional planning and infrastructure management; the other is concerned with planning, design, construction and operation of built facilities and includes architectural and engineering design, product manufacturing, construction, facility management and related disciplines (a process/technology commonly known as Building Information Modelling, BIM). The spatial industries have a strong voice in the development of public policy in Australia, while the construction sector, which in 2014 accounted for around 8.5% of Australia’s GDP3, has no single voice and because of its diversity, is struggling to adapt to and take advantage of the opportunity presented by these digital technologies. The experience in the UK over the past few years has demonstrated that government leadership is very effective in stimulating industry adoption of digital technologies by, on the one hand, mandating the use of BIM on public procurement projects while at the same time, providing comparatively modest funding to address the common issues that confront the industry in adopting that way of working across the supply chain. The reported result has been savings of £840m in construction costs in 2013/14 according to UK Cabinet Office figures4. There is worldwide recognition of the value of bringing these two modelling technologies together. Australia has the expertise to exercise leadership in this work, but it requires a commitment by government to recognise the importance of BIM as a companion methodology to the spatial technologies so that these two disciplinary domains can cooperate in the development of data policies and information exchange standards to smooth out common workflows. buildingSMART Australasia, SIBA and their academic partners have initiated this dialogue in Australia and wish to work collaboratively, with government support and leadership, to explore the opportunities open to us as we develop an Australasian Digital Built Environment. As part of that programme, we must develop and implement a strategy to accelerate the adoption of BIM processes across the Australian construction sector while at the same time, developing an integrated approach in concert with the spatial sector that will position Australia at the forefront of international best practice in this area. Australia and New Zealand cannot afford to be on the back foot as we face the challenges of rapid urbanisation and change in the global environment. Although we can identify some exemplary initiatives in this area, particularly in New Zealand in response to the need for more resilient urban development in the face of earthquake threats, there is still much that needs to be done. We are well situated in the Asian region to take a lead in this challenge, but we are at imminent risk of losing the initiative if we do not take action now. Strategic collaboration between Governments, Industry and Academia will create new jobs and wealth, with the potential, for example, to save around 20% on the delivery costs of new built assets, based on recent UK estimates.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The ability to forecast machinery failure is vital to reducing maintenance costs, operation downtime and safety hazards. Recent advances in condition monitoring technologies have given rise to a number of prognostic models for forecasting machinery health based on condition data. Although these models have aided the advancement of the discipline, they have made only a limited contribution to developing an effective machinery health prognostic system. The literature review indicates that there is not yet a prognostic model that directly models and fully utilises suspended condition histories (which are very common in practice since organisations rarely allow their assets to run to failure); that effectively integrates population characteristics into prognostics for longer-range prediction in a probabilistic sense; which deduces the non-linear relationship between measured condition data and actual asset health; and which involves minimal assumptions and requirements. This work presents a novel approach to addressing the above-mentioned challenges. The proposed model consists of a feed-forward neural network, the training targets of which are asset survival probabilities estimated using a variation of the Kaplan-Meier estimator and a degradation-based failure probability density estimator. The adapted Kaplan-Meier estimator is able to model the actual survival status of individual failed units and estimate the survival probability of individual suspended units. The degradation-based failure probability density estimator, on the other hand, extracts population characteristics and computes conditional reliability from available condition histories instead of from reliability data. The estimated survival probability and the relevant condition histories are respectively presented as “training target” and “training input” to the neural network. The trained network is capable of estimating the future survival curve of a unit when a series of condition indices are inputted. Although the concept proposed may be applied to the prognosis of various machine components, rolling element bearings were chosen as the research object because rolling element bearing failure is one of the foremost causes of machinery breakdowns. Computer simulated and industry case study data were used to compare the prognostic performance of the proposed model and four control models, namely: two feed-forward neural networks with the same training function and structure as the proposed model, but neglected suspended histories; a time series prediction recurrent neural network; and a traditional Weibull distribution model. The results support the assertion that the proposed model performs better than the other four models and that it produces adaptive prediction outputs with useful representation of survival probabilities. This work presents a compelling concept for non-parametric data-driven prognosis, and for utilising available asset condition information more fully and accurately. It demonstrates that machinery health can indeed be forecasted. The proposed prognostic technique, together with ongoing advances in sensors and data-fusion techniques, and increasingly comprehensive databases of asset condition data, holds the promise for increased asset availability, maintenance cost effectiveness, operational safety and – ultimately – organisation competitiveness.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.