906 results for Data Interpretation, Statistical
Abstract:
Electricity network investment and asset management require accurate estimation of future demand in energy consumption within specified service areas. For this purpose, simple models are typically developed to predict future trends in electricity consumption using various methods and assumptions. This paper presents a statistical model to predict electricity consumption in the residential sector at the Census Collection District (CCD) level over the state of New South Wales, Australia, based on spatial building and household characteristics. Residential household demographic and building data from the Australian Bureau of Statistics (ABS) and actual electricity consumption data from electricity companies are merged for 74% of the 12,000 CCDs in the state. Eighty percent of the merged dataset is randomly set aside to establish the model using regression analysis, and the remaining 20% is used to independently test the accuracy of model prediction against actual consumption. In 90% of the cases, the predicted consumption is shown to be within 5 kWh per dwelling per day of actual values, with an overall state accuracy of -1.15%. Given a future scenario with a shift in climate zone and a growth in population, the model is used to identify the geographical or service areas that are most likely to have increased electricity consumption. Such geographical representation can be of great benefit when assessing alternatives to the centralised generation of energy; having such a model gives a quantifiable method for selecting the most appropriate system when a review or upgrade of the network infrastructure is required.
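A minimal sketch of the validation workflow described above: fit a regression model on a random 80% split and check how often held-out predictions fall within 5 kWh per dwelling per day. The features and data below are synthetic placeholders, not the ABS/CCD dataset used in the paper.

```python
# Sketch only: synthetic stand-ins for the household/building covariates.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 9000                                   # roughly 74% of 12,000 CCDs
X = rng.normal(size=(n, 4))                # e.g. household size, dwelling type, income, climate zone
y = 15 + X @ np.array([2.0, 1.5, -1.0, 0.8]) + rng.normal(scale=2.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

within_5 = np.mean(np.abs(pred - y_test) < 5.0)            # share of CCDs within 5 kWh/dwelling/day
overall_bias = (pred.sum() - y_test.sum()) / y_test.sum()   # aggregate over/under-prediction
print(f"within 5 kWh/dwelling/day: {within_5:.1%}, overall bias: {overall_bias:+.2%}")
```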
Abstract:
An important aspect of decision support systems involves applying sophisticated and flexible statistical models to real datasets and communicating the results to decision makers in interpretable ways. An important class of problem is the modelling of incidence, such as fire or disease. Models of incidence known as point processes or Cox processes are particularly challenging because they are ‘doubly stochastic’, i.e. obtaining the probability mass function of incidents requires two integrals to be evaluated. Existing approaches either use simple models that obtain predictions from plug-in point estimates and do not distinguish between Cox processes and density estimation, yet do use sophisticated 3D visualization for interpretation; or they employ sophisticated non-parametric Bayesian Cox process models but do not use visualization to render complex spatio-temporal forecasts interpretable. The contribution here is to fill this gap by inferring predictive distributions of log Gaussian Cox processes and rendering them using state-of-the-art 3D visualization techniques. This requires performing inference on an approximation of the model on a large-scale discretized grid and adapting an existing spatial-diurnal kernel to the log Gaussian Cox process context.
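An illustrative sketch of the discretized log Gaussian Cox process structure the abstract refers to: a latent Gaussian field on a grid sets the log-intensity, and cell counts are conditionally Poisson. The grid size and kernel here are arbitrary choices for illustration, not the paper's configuration or inference scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                                            # 20 x 20 spatial grid
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# Squared-exponential covariance for the latent Gaussian field.
d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
K = 1.0 * np.exp(-d2 / (2 * 3.0**2)) + 1e-6 * np.eye(n * n)

latent = rng.multivariate_normal(mean=np.full(n * n, -1.0), cov=K)
intensity = np.exp(latent)                        # cell-wise intensity (doubly stochastic)
counts = rng.poisson(intensity)                   # observed incident counts per cell
print("total simulated incidents:", counts.sum())
```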
Abstract:
This paper contributes to critical policy research by theorising one aspect of policy enactment, the meaning-making work of a cohort of mid-level policy actors. Specifically, we propose that Basil Bernstein’s work on the structuring of pedagogic discourse, in particular the concept of recontextualisation, may add to understandings of the policy work of interpretation and translation. Recontextualisation refers to the relational processes of selecting and moving knowledge from one context to another, as well as to the distinctive re-organisation of knowledge as an instructional and regulative or moral discourse. Processes of recontextualisation necessitate an analysis of power and control relations, and therefore add to the Foucauldian theorisations of power that currently dominate the critical policy literature. A process of code elaboration (decoding and recoding) takes place in various recontextualising agencies responsible for the production of professional development materials, teaching guidelines and curriculum resources. We propose that mid-level policy actors are crucial to the work of policy interpretation and translation because they are engaged in elaborating the condensed codes of policy texts to an imagined logic of teachers’ practical work. To illustrate our theoretical points we draw on data collected for an Australian research project: the accounts of mid-level policy actors responsible for the interpretation of child protection and safety policies for staff in Queensland schools.
Abstract:
Enterprise resource planning (ERP) systems are rapidly being combined with “big data” analytics processes and publicly available “open data sets”, which are usually outside the arena of the enterprise, to expand activity through better service to current clients as well as identifying new opportunities. Moreover, these activities are now largely based around relevant software systems hosted in a “cloud computing” environment. The over 50-year-old phrase related to mistrust in computer systems, “garbage in, garbage out” or “GIGO”, is used to describe problems of unqualified and unquestioning dependency on information systems. A more relevant GIGO interpretation arose some time later, namely “garbage in, gospel out”, signifying that with large-scale information systems based around ERP and open datasets as well as “big data” analytics, particularly in a cloud environment, the ability to verify the authenticity and integrity of the data sets used may be almost impossible. In turn, this may easily result in decision making based upon questionable results which are unverifiable. Illicit “impersonation” of, and modifications to, legitimate data sets may become a reality, while at the same time the ability to audit any derived results of analysis may be an important requirement, particularly in the public sector. The pressing need for enhancement of identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment is discussed in this paper. Some appropriate technologies currently being offered are also examined. However, severe limitations in addressing the problems identified are found, and the paper proposes further necessary research work for the area. (Note: This paper is based on an earlier unpublished paper/presentation “Identity, Addressing, Authenticity and Audit Requirements for Trust in ERP, Analytics and Big/Open Data in a ‘Cloud’ Computing Environment: A Review and Proposal” presented to the Department of Accounting and IT, College of Management, National Chung Chen University, 20 November 2013.)
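A minimal sketch of one building block for the data-integrity concern raised above: publishing and re-checking a cryptographic digest of a dataset so downstream analytics can detect tampering. This is an illustration only; the paper discusses much broader identity, naming, addressing and audit services.

```python
import hashlib

def sha256_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, published_digest: str) -> bool:
    """Compare a freshly computed digest against the digest the data publisher released."""
    return sha256_digest(path) == published_digest
```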
Abstract:
A method is proposed to describe force or compound muscle action potential (CMAP) trace data collected in an electromyography study for motor unit number estimation (MUNE). Experimental data were collected using incremental stimulation at multiple durations. However, stimulus information, vital for alternative MUNE methods, is not comparable across multiple-duration data, and therefore previous methods of MUNE (Ridall et al., 2006, 2007) cannot be used with any reliability. Hypothesised firing combinations of motor units are modelled using a multiplicative factor and a Bayesian P-spline formulation. The model describes the process for force and CMAP in a meaningful way.
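A minimal P-spline sketch: a B-spline basis with a second-order difference penalty on the coefficients, which is the frequentist analogue of the random-walk prior used in Bayesian P-spline formulations. The data below are synthetic; this is not the authors' MUNE model, only the smoothing machinery it builds on.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)     # stand-in trace data

k = 3                                                # cubic B-splines
interior = np.linspace(0, 1, 20)[1:-1]
t = np.r_[np.repeat(0.0, k + 1), interior, np.repeat(1.0, k + 1)]  # clamped knot vector
B = BSpline.design_matrix(x, t, k).toarray()         # n x m basis matrix

m = B.shape[1]
D = np.diff(np.eye(m), n=2, axis=0)                  # second-order difference matrix
lam = 1.0                                            # smoothing parameter
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fitted = B @ coef
print("residual SD:", np.std(y - fitted))
```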
Abstract:
This study was a step forward in improving the performance of discovering useful knowledge, in this case association rules, in databases. The thesis proposed an approach that uses granules instead of patterns to represent knowledge implicitly contained in relational databases, and a multi-tier structure to interpret association rules in terms of granules. Association mappings were proposed for the construction of the multi-tier structure. With these tools, association rules can be quickly assessed and meaningless association rules can be justified according to the association mappings. The experimental results indicated that the proposed approach is promising.
Abstract:
Dealing with the large amount of data resulting from association rule mining is a big challenge. The essential issue is how to provide efficient methods for summarizing and representing meaningful knowledge discovered from databases. This paper presents a new approach called multi-tier granule mining to improve the performance of association rule mining. Rather than using patterns, it uses granules to represent knowledge that is implicitly contained in relational databases. This approach also uses multi-tier structures and association mappings to interpret association rules in terms of granules. Consequently, association rules can be quickly assessed and meaningless association rules can be justified according to these association mappings. The experimental results indicate that the proposed approach is promising.
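An illustrative sketch of the granule idea: group rows of a relational table by attribute-value combinations ("granules") and derive association-rule support and confidence from granule counts rather than re-scanning raw records. This follows the general idea described above, not the paper's exact multi-tier formalism; the table and rule are made up for illustration.

```python
from collections import Counter

rows = [
    {"age": "young", "income": "low",  "buys": "yes"},
    {"age": "young", "income": "low",  "buys": "yes"},
    {"age": "old",   "income": "high", "buys": "no"},
    {"age": "old",   "income": "low",  "buys": "yes"},
]

def granules(rows, attrs):
    """Count rows per combination of values on the chosen attributes."""
    return Counter(tuple(r[a] for a in attrs) for r in rows)

# Support/confidence of the rule  age=young -> buys=yes  from two granule tiers.
tier_cond = granules(rows, ["age"])
tier_rule = granules(rows, ["age", "buys"])
support = tier_rule[("young", "yes")] / len(rows)
confidence = tier_rule[("young", "yes")] / tier_cond[("young",)]
print(f"support={support:.2f}, confidence={confidence:.2f}")
```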
Abstract:
Mortality following hip arthroplasty is affected by a large number of confounding variables, each of which must be considered to enable valid interpretation. Relevant variables available from the 2011 NJR data set were included in the Cox model. Mortality rates in hip arthroplasty patients were lower than in the age-matched population across all hip types. Age at surgery, ASA grade, diagnosis, gender, provider type, hip type and lead surgeon grade all had a significant effect on mortality. Schemper's statistic showed that only 18.98% of the variation in mortality was explained by the variables available in the NJR data set. It is inappropriate to use NJR data to study an outcome affected by a multitude of confounding variables when these cannot be adequately accounted for in the available data set.
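A minimal sketch of a Cox proportional hazards fit of the kind described above, using synthetic survival data and two placeholder covariates (age and a simplified ASA grade); the actual NJR analysis involves many more confounders and registry-specific variables than shown here.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 1000
age = rng.normal(70, 8, n)
asa = rng.integers(1, 4, n)                       # simplified ASA grade 1-3
hazard = np.exp(0.03 * (age - 70) + 0.5 * (asa - 2))
time = rng.exponential(5 / hazard)                # survival times (years)
event = (time < 8).astype(int)                    # administrative censoring at 8 years
df = pd.DataFrame({"T": np.minimum(time, 8), "E": event, "age": age, "asa": asa})

cph = CoxPHFitter()
cph.fit(df, duration_col="T", event_col="E")
print(cph.params_)                                # log hazard ratios for age and ASA grade
```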
Abstract:
The cotton strip assay (CSA) is an established technique for measuring soil microbial activity. The technique involves burying cotton strips and measuring their tensile strength after a certain time. This gives a measure of the rotting rate, R, of the cotton strips; R is then a measure of soil microbial activity. This paper examines properties of the technique and indicates how the assay can be optimised. Humidity conditioning of the cotton strips before measuring their tensile strength reduced the within- and between-day variance and enabled the distribution of the tensile strength measurements to approximate normality. The test data came from a three-way factorial experiment (two soils, two temperatures, three moisture levels). The cotton strips were buried in the soil for intervals of time ranging up to 6 weeks. This enabled the rate of loss of cotton tensile strength with time to be studied under a range of conditions. An inverse cubic model accounted for greater than 90% of the total variation within each treatment combination. This offers support for summarising the decomposition process by a single parameter, R. The approximate variance of the decomposition rate was estimated from a function incorporating the variance of tensile strength and the derivative of the decomposition rate, R, with respect to tensile strength. This variance function has a minimum when the measured strength is approximately 2/3 of the original strength. The estimates of R are almost unbiased and relatively robust against the cotton strips being left in the soil for more or less than the optimal time. We conclude that the rotting rate R should be measured using the inverse cubic equation, and that the cotton strips should be left in the soil until their strength has been reduced to about 2/3 of the original.
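A sketch of the variance calculation described above: the variance of the decomposition rate R is approximated from the variance of the tensile strength measurement S via the delta method, Var(R) ≈ (dR/dS)² Var(S). The rate function below is a generic placeholder, not the paper's exact inverse cubic equation, so the numbers are purely illustrative.

```python
import numpy as np

def rate_from_strength(s, s0, t):
    """Placeholder decomposition rate from relative strength loss over burial time t."""
    return ((s0 / s) ** 3 - 1.0) / t

def var_rate(s, s0, t, var_s, eps=1e-4):
    """Delta-method variance of R given the variance of the strength measurement."""
    dRds = (rate_from_strength(s + eps, s0, t) - rate_from_strength(s - eps, s0, t)) / (2 * eps)
    return dRds**2 * var_s

s0, t, var_s = 100.0, 4.0, 9.0                 # original strength, burial time (weeks), Var(S)
for s in (80.0, 67.0, 50.0):                   # measured strengths, incl. roughly 2/3 of original
    print(f"S={s:.0f}: R={rate_from_strength(s, s0, t):.3f}, Var(R)={var_rate(s, s0, t, var_s):.4f}")
```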
Abstract:
As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely ordered multicategory and quantitative data. While various pattern analysis techniques have been identified as suitable for analysis of the mixed data types which occur in germplasm collections, the clustering and ordination methods used often cannot deal explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11,436 accessions of groundnut (Arachis hypogaea L.) from the International Crops Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be distinguished into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties. This implies that within each of the habit types there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.
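A minimal sketch of the two analysis tools named above: principal component ordination followed by mixture-model (maximum likelihood) clustering. The data are synthetic stand-ins for germplasm descriptors; the mixed ordinal/quantitative handling and missing-data issues discussed in the paper are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 9))                     # 1000 accessions x 9 quantitative descriptors

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
labels = GaussianMixture(n_components=4, random_state=0).fit_predict(scores)
print("accessions per group:", np.bincount(labels))
```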
Abstract:
In studies using macroinvertebrates as indicators for monitoring rivers and streams, species level identifications can have greater information content than lower resolution identifications, resulting in more reliable site classifications and a better capacity to discriminate between sites; yet many such programmes identify specimens to the resolution of family rather than species, often because it is cheaper to obtain family level data than species level data. The choice of appropriate taxonomic resolution is a compromise between the cost of obtaining data at high taxonomic resolutions and the loss of information at lower resolutions. Optimum taxonomic resolution should be determined by the information required to address programme objectives. Costs saved by identifying macroinvertebrates to family level may not be justified if family level data cannot give the answers required, and expending the extra cost to obtain species level data may not be warranted if cheaper family level data retain sufficient information to meet objectives. We investigated the influence of taxonomic resolution and sample quantification (abundance vs. presence/absence) on the representation of aquatic macroinvertebrate species assemblage patterns and species richness estimates. The study was conducted in a physically harsh dryland river system (the Condamine-Balonne River system in south-western Queensland, Australia), characterised by low macroinvertebrate diversity. Our 29 study sites covered a wide geographic range and a diversity of lotic conditions, and this was reflected by differences between sites in macroinvertebrate assemblage composition and richness. The usefulness of expending the extra cost necessary to identify macroinvertebrates to species was quantified via the benefits this higher resolution data offered in its capacity to discriminate between sites and give accurate estimates of site species richness. We found that very little information (<6%) was lost by identifying taxa to family (or genus), as opposed to species, and that quantifying the abundance of taxa provided greater resolution for pattern interpretation than simply noting their presence/absence. Species richness was very well represented by genus, family and order richness, so that each of these could be used as surrogates of species richness if, for example, surveying to identify diversity hot-spots. It is suggested that the sharing of common ecological responses among species within higher taxonomic units is the most plausible mechanism for these results. Based on a cost/benefit analysis, family level abundance data are recommended as the best resolution for resolving patterns in macroinvertebrate assemblages in this system. The relevance of these findings is discussed in the context of other low diversity, harsh, dryland river systems.
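An illustrative sketch of the kind of resolution comparison described above: aggregate a site-by-species abundance table to family level and compare the between-site dissimilarity patterns at the two resolutions. The data and the species-to-family mapping are synthetic; the paper's analyses are considerably richer.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
n_sites, n_species, n_families = 29, 60, 15
abund = rng.poisson(2.0, size=(n_sites, n_species))            # site x species abundances
family_of = rng.integers(0, n_families, size=n_species)        # random species -> family map

fam_abund = np.zeros((n_sites, n_families))
for sp, fam in enumerate(family_of):
    fam_abund[:, fam] += abund[:, sp]                          # roll species up to families

d_species = pdist(abund, metric="braycurtis")
d_family = pdist(fam_abund, metric="braycurtis")
r = np.corrcoef(d_species, d_family)[0, 1]                     # agreement of site patterns
print(f"correlation of dissimilarity matrices: {r:.2f}")
```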
Abstract:
The use of Wireless Sensor Networks (WSNs) for vibration-based Structural Health Monitoring (SHM) has become a promising approach due to advantages such as low cost and fast, flexible deployment. However, inherent technical issues such as data asynchronicity and data loss have prevented these systems from being extensively used. Recently, several SHM-oriented WSNs have been proposed and are believed to be able to overcome a large number of technical uncertainties. Nevertheless, there is limited research verifying the applicability of those WSNs to demanding SHM applications such as modal analysis and damage identification. Based on a brief review, this paper first reveals that Data Synchronization Error (DSE) is the most inherent factor amongst the uncertainties of SHM-oriented WSNs. The effects of this factor are then investigated on the outcomes and performance of the most robust Output-only Modal Analysis (OMA) techniques when merging data from multiple sensor setups. The two OMA families selected for this investigation are Frequency Domain Decomposition (FDD) and data-driven Stochastic Subspace Identification (SSI-data), as both have been widely applied in the past decade. Accelerations collected by a wired sensory system on a large-scale laboratory bridge model are initially used as benchmark data, after a certain level of noise is added to account for the higher presence of this factor in SHM-oriented WSNs. From this source, a large number of simulations have been made to generate multiple DSE-corrupted datasets to facilitate statistical analyses. The results of this study show the robustness of FDD and the precautions needed for the SSI-data family when dealing with DSE, even at a relaxed level. Finally, the combination of preferred OMA techniques and the use of channel projection for the time-domain OMA technique are recommended to cope with DSE.
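A minimal sketch of how a Data Synchronization Error (DSE) can be injected into benchmark acceleration records for the kind of simulation study described above: one channel is delayed by a few samples and measurement noise is added. The signal content, sampling rate and offset are arbitrary placeholders, not the laboratory bridge data.

```python
import numpy as np

rng = np.random.default_rng(6)
fs = 200.0                                      # sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
mode = np.sin(2 * np.pi * 3.2 * t)              # a single structural mode, for illustration
ch_ref = mode + 0.05 * rng.normal(size=t.size)  # reference channel
ch_rov = 0.8 * mode + 0.05 * rng.normal(size=t.size)

dse_samples = 3                                 # synchronization error of 3 samples (15 ms)
ch_rov_dse = np.roll(ch_rov, dse_samples)       # DSE-corrupted roving channel

# The induced phase error grows with frequency: 2*pi*f*delay radians.
delay = dse_samples / fs
print(f"phase error at 3.2 Hz: {np.degrees(2 * np.pi * 3.2 * delay):.1f} degrees")
```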
Abstract:
In this paper, the renormalization group (RG) method [Phys. Rev. E 54, 376 (1996)] is related to the well-known Rytov and Born approximations used in wave propagation in deterministic and random media. Certain problems in linear and nonlinear media are examined from the viewpoint of RG and compared with the literature on the Born and Rytov approximations. It is found that the Rytov approximation forms a special case of the asymptotic expansion generated by the RG, and as such it gives a superior approximation to the exact solution compared with its Born counterpart. Analogous conclusions are reached for nonlinear equations with an intensity-dependent index of refraction, where the RG recovers the exact solution. © 2008 Optical Society of America.
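For reference, the two classical approximations compared above can be written schematically in their standard textbook forms (not equations reproduced from the paper), with u_0 the unperturbed field and u_1 the first-order scattered field:

```latex
\begin{align}
  \text{Born:}  \quad & u(\mathbf{r}) \approx u_0(\mathbf{r}) + u_1(\mathbf{r}), \\
  \text{Rytov:} \quad & u(\mathbf{r}) \approx u_0(\mathbf{r})\,
                        \exp\bigl(\psi_1(\mathbf{r})\bigr),
                        \qquad \psi_1(\mathbf{r}) = \frac{u_1(\mathbf{r})}{u_0(\mathbf{r})}.
\end{align}
```

The Born form adds the first-order correction to the incident field, while the Rytov form places the same first-order quantity in a complex exponential phase, which is why the two behave differently for extended or strongly phase-perturbing media.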
Abstract:
Land-use change, particularly clearing of forests for agriculture, has contributed significantly to the observed rise in atmospheric carbon dioxide concentration. Concern about the impacts on climate has led to efforts to monitor and curtail the rapid increase in concentrations of carbon dioxide and other greenhouse gases in the atmosphere. Internationally, much of the current focus is on the Kyoto Protocol to the United Nations Framework Convention on Climate Change (UNFCCC). Although electing not to ratify the Protocol, Australia, as a party to the UNFCCC, reports on national greenhouse gas emissions, trends in emissions and abatement measures. In this paper we review the complex accounting rules for human activities affecting greenhouse gas fluxes in the terrestrial biosphere and explore implications and potential opportunities for managing carbon in the savanna ecosystems of northern Australia. Savannas in Australia are managed for grazing as well as for cultural and environmental values against a background of extreme climate variability and disturbance, notably fire. Methane from livestock and non-CO2 emissions from burning are important components of the total greenhouse gas emissions associated with management of savannas. International developments in carbon accounting for the terrestrial biosphere bring a requirement for better attribution of change in carbon stocks and more detailed and spatially explicit data on such characteristics of savanna ecosystems as fire regimes, production and type of fuel for burning, drivers of woody encroachment, rates of woody regrowth, stocking rates and grazing impacts. The benefits of improved biophysical information and of understanding the impacts on ecosystem function of natural factors and management options will extend beyond greenhouse accounting to better land management for multiple objectives.
Abstract:
Realistic virtual models of leaf surfaces are important for a number of applications in the plant sciences, such as modelling agrichemical spray droplet movement and spreading on the surface. In this context, the virtual surfaces are required to be sufficiently smooth to facilitate the use of the mathematical equations that govern the motion of the droplet. While an effective approach is to apply discrete smoothing D2-spline algorithms to reconstruct the leaf surfaces from three-dimensional scanned data, difficulties arise when dealing with wheat leaves that tend to twist and bend. To overcome this topological difficulty, we develop a parameterisation technique that rotates and translates the original data, allowing the surface to be fitted using the discrete smoothing D2-spline methods in the new parameter space. Our algorithm uses finite element methods to represent the surface as a linear combination of compactly supported shape functions. Numerical results confirm that the parameterisation, along with the use of discrete smoothing D2-spline techniques, produces realistic virtual representations of wheat leaves.
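A sketch of the parameterisation step described above: rotate and translate the scanned points into an aligned coordinate frame (here via a principal-axes/SVD alignment of the point cloud) so the surface becomes a single-valued height field, then fit a smoothing surface. SmoothBivariateSpline is used here only as a stand-in for the discrete smoothing D2-spline / finite element fit used in the paper, and the point cloud is synthetic.

```python
import numpy as np
from scipy.interpolate import SmoothBivariateSpline

rng = np.random.default_rng(7)
u = rng.uniform(0, 10, 2000)
v = rng.uniform(-1, 1, 2000)
pts = np.column_stack([u, v, 0.1 * np.sin(u) + 0.02 * rng.normal(size=u.size)])

# Rotate/translate: centre the cloud and align axes with its principal directions.
centred = pts - pts.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
aligned = centred @ Vt.T                      # columns ordered by decreasing spread

x, y, z = aligned[:, 0], aligned[:, 1], aligned[:, 2]
surf = SmoothBivariateSpline(x, y, z, s=len(x) * 0.001)
print("fitted height near the leaf centre:", surf.ev(0.0, 0.0))
```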