863 resultados para Data-driven modelling
Resumo:
Performance modelling is a useful tool in the lifeycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is had in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. The experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow model as an example.
Resumo:
The global vegetation response to climate and atmospheric CO2 changes between the last glacial maximum and recent times is examined using an equilibrium vegetation model (BIOME4), driven by output from 17 climate simulations from the Palaeoclimate Modelling Intercomparison Project. Features common to all of the simulations include expansion of treeless vegetation in high northern latitudes; southward displacement and fragmentation of boreal and temperate forests; and expansion of drought-tolerant biomes in the tropics. These features are broadly consistent with pollen-based reconstructions of vegetation distribution at the last glacial maximum. Glacial vegetation in high latitudes reflects cold and dry conditions due to the low CO2 concentration and the presence of large continental ice sheets. The extent of drought-tolerant vegetation in tropical and subtropical latitudes reflects a generally drier low-latitude climate. Comparisons of the observations with BIOME4 simulations, with and without consideration of the direct physiological effect of CO2 concentration on C3 photosynthesis, suggest an important additional role of low CO2 concentration in restricting the extent of forests, especially in the tropics. Global forest cover was overestimated by all models when climate change alone was used to drive BIOME4, and estimated more accurately when physiological effects of CO2 concentration were included. This result suggests that both CO2 effects and climate effects were important in determining glacial-interglacial changes in vegetation. More realistic simulations of glacial vegetation and climate will need to take into account the feedback effects of these structural and physiological changes on the climate.
Resumo:
The creation of Causal Loop Diagrams (CLDs) is a major phase in the System Dynamics (SD) life-cycle, since the created CLDs express dependencies and feedback in the system under study, as well as, guide modellers in building meaningful simulation models. The cre-ation of CLDs is still subject to the modeller's domain expertise (mental model) and her ability to abstract the system, because of the strong de-pendency on semantic knowledge. Since the beginning of SD, available system data sources (written and numerical models) have always been sparsely available, very limited and imperfect and thus of little benefit to the whole modelling process. However, in recent years, we have seen an explosion in generated data, especially in all business related domains that are analysed via Business Dynamics (BD). In this paper, we intro-duce a systematic tool supported CLD creation approach, which analyses and utilises available disparate data sources within the business domain. We demonstrate the application of our methodology on a given business use-case and evaluate the resulting CLD. Finally, we propose directions for future research to further push the automation in the CLD creation and increase confidence in the generated CLDs.
Resumo:
A data-driven background dataset refinement technique was recently proposed for SVM based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
Resumo:
The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly-informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mis-matched to the evaluation conditions.
Resumo:
This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using alternate SVM feature sets to the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends on the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from varying features spaces to provide added robustness.
Resumo:
Throughout the world, state and nation standardised testing of children, has become a "huge industry" (English, 2002). Although English is referring to the American system which has been involved in standardised testing for over half a century, the same could be said of many other countries, including Australia. It has been only in recent years that Australia has embraced national testing as part of a wider reform effort to bring about increased accountability in schooling. The results of high-stakes tests in Australia are now published in newspapers and electronically on the Australian federal government's MySchool website (www.myschoold.edu.au). MySchool provides results on the National Assessment Program - Literacy and Numeracy (NAPLAN) for students in Years 3,5, 7 and 9. Data are available that compare schools to statistically similar schools. This more recent publication of national testing results in Australia is a visible example of "contractual accountability", described by Mulford, Edmunds, Kendall, Kendall and Bishop (2008) as " the degree to which [actors] are fulfilling the expectations of particular audiences in terms of standards, outcomes and results" (p.20).
Resumo:
A commitment in 2010 by the Australian Federal Government to spend $466.7 million dollars on the implementation of personally controlled electronic health records (PCEHR) heralded a shift to a more effective and safer patient centric eHealth system. However, deployment of the PCEHR has met with much criticism, emphasised by poor adoption rates over the first 12 months of operation. An indifferent response by the public and healthcare providers largely sceptical of its utility and safety speaks to the complex sociotechnical drivers and obstacles inherent in the embedding of large (national) scale eHealth projects. With government efforts to inflate consumer and practitioner engagement numbers giving rise to further consumer disillusionment, broader utilitarian opportunities available with the PCEHR are at risk. This paper discusses the implications of establishing the PCEHR as the cornerstone of a holistic eHealth strategy for the aggregation of longitudinal patient information. A viewpoint is offered that the real value in patient data lies not just in the collection of data but in the integration of this information into clinical processes within the framework of a commoditised data-driven approach. Consideration is given to the eHealth-as-a-Service (eHaaS) construct as a disruptive next step for co-ordinated individualised healthcare in the Australian context.
Resumo:
Decision-making is such an integral aspect in health care routine that the ability to make the right decisions at crucial moments can lead to patient health improvements. Evidence-based practice, the paradigm used to make those informed decisions, relies on the use of current best evidence from systematic research such as randomized controlled trials. Limitations of the outcomes from randomized controlled trials (RCT), such as “quantity” and “quality” of evidence generated, has lowered healthcare professionals’ confidence in using EBP. An alternate paradigm of Practice-Based Evidence has evolved with the key being evidence drawn from practice settings. Through the use of health information technology, electronic health records (EHR) capture relevant clinical practice “evidence”. A data-driven approach is proposed to capitalize on the benefits of EHR. The issues of data privacy, security and integrity are diminished by an information accountability concept. Data warehouse architecture completes the data-driven approach by integrating health data from multi-source systems, unique within the healthcare environment.
Resumo:
What potential do artists working with environmental data in public space have for producing new forms of engagement with local environmental conditions? Operating on the edge of heavy bureaucracy, these types of data-driven artistic experiments probe the politics of environmental metrics and explore methods of engaging audiences with issues of environmental health. This discussion considers a small collection of cases studies representative of this growing field of practice. These are works by Natalie Jeremijenko and The Living, Tega Brain and Keith Deverell. The case studies considered are examples of strategic design, works that soften, reveal and potentially shift existing regulations and bureaucratic norms. In doing so they open up new possibilities and questions as to what the smart city is and how it might be realised.
Resumo:
Large sized power transformers are important parts of the power supply chain. These very critical networks of engineering assets are an essential base of a nation’s energy resource infrastructure. This research identifies the key factors influencing transformer normal operating conditions and predicts the asset management lifespan. Engineering asset research has developed few lifespan forecasting methods combining real-time monitoring solutions for transformer maintenance and replacement. Utilizing the rich data source from a remote terminal unit (RTU) system for sensor-data driven analysis, this research develops an innovative real-time lifespan forecasting approach applying logistic regression based on the Weibull distribution. The methodology and the implementation prototype are verified using a data series from 161 kV transformers to evaluate the efficiency and accuracy for energy sector applications. The asset stakeholders and suppliers significantly benefit from the real-time power transformer lifespan evaluation for maintenance and replacement decision support.
Resumo:
Data generated via user activity on social media platforms is routinely used for research across a wide range of social sciences and humanities disciplines. The availability of data through the Twitter APIs in particular has afforded new modes of research, including in media and communication studies; however, there are practical and political issues with gaining access to such data, and with the consequences of how that access is controlled. In their paper ‘Easy Data, Hard Data’, Burgess and Bruns (2015) discuss both the practical and political aspects of Twitter data as they relate to academic research, describing how communication research has been enabled, shaped and constrained by Twitter’s “regimes of access” to data, the politics of data use, and emerging economies of data exchange. This conceptual model, including the ‘easy data, hard data’ formulation, can also be applied to Sina Weibo. In this paper, we build on this model to explore the practical and political challenges and opportunities associated with the ‘regimes of access’ to Weibo data, and their consequences for digital media and communication studies. We argue that in the Chinese context, the politics of data access can be even more complicated than in the case of Twitter, which makes scientific research relying on large social data from this platform more challenging in some ways, but potentially richer and more rewarding in others.
Resumo:
Streamflow forecasts at daily time scale are necessary for effective management of water resources systems. Typical applications include flood control, water quality management, water supply to multiple stakeholders, hydropower and irrigation systems. Conventionally physically based conceptual models and data-driven models are used for forecasting streamflows. Conceptual models require detailed understanding of physical processes governing the system being modeled. Major constraints in developing effective conceptual models are sparse hydrometric gauge network and short historical records that limit our understanding of physical processes. On the other hand, data-driven models rely solely on previous hydrological and meteorological data without directly taking into account the underlying physical processes. Among various data driven models Auto Regressive Integrated Moving Average (ARIMA), Artificial Neural Networks (ANNs) are most widely used techniques. The present study assesses performance of ARIMA and ANNs methods in arriving at one-to seven-day ahead forecast of daily streamflows at Basantpur streamgauge site that is situated at upstream of Hirakud Dam in Mahanadi river basin, India. The ANNs considered include Feed-Forward back propagation Neural Network (FFNN) and Radial Basis Neural Network (RBNN). Daily streamflow forecasts at Basantpur site find use in management of water from Hirakud reservoir. (C) 2015 The Authors. Published by Elsevier B.V.