922 results for Data Streams Distribution
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
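For orientation, the path sampling idea referred to in the abstract above can be written down for an exponential-family model. The notation here is an assumption made for illustration and is not taken from the thesis: $s(x)$ is the vector of canonical statistics, $Z(\theta)$ the normalization constant, and $\theta(t)$ any smooth path from $\theta_0$ to $\theta_1$.

```latex
% Assumed model: p(x \mid \theta) = \exp\{\theta^\top s(x)\} / Z(\theta)
\[
  \frac{\partial}{\partial \theta} \log Z(\theta)
    = \mathbb{E}_{\theta}\!\left[ s(X) \right]
\]
% Integrating along a path \theta(t), t \in [0,1], with \theta(0)=\theta_0, \theta(1)=\theta_1:
\[
  \log \frac{Z(\theta_1)}{Z(\theta_0)}
    = \int_0^1 \mathbb{E}_{\theta(t)}\!\left[ s(X) \right]^{\top} \dot{\theta}(t) \, dt
\]
```

Under this reading, the log ratio of normalization constants is an integral of mean canonical statistics, each of which could be estimated by MCMC at a grid of points along the path. Whether the thesis discretises the integral in this way is not stated in the abstract.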
Abstract:
Estimates of potential and actual C sequestration require areal information about various types of management activities. Forest surveys, land use data, and agricultural statistics contribute information enabling calculation of the impacts of current and historical land management on C sequestration in biomass (in forests) or in soil (in agricultural systems). Unfortunately, little information exists on the distribution of various management activities that can impact soil C content in grassland systems. Limited information of this type restricts our ability to carry out bottom-up estimates of the current C balance of grasslands or to assess the potential for grasslands to act as C sinks with changes in management. Here we review currently available information about grassland management, how that information could be related to information about the impacts of management on soil C stocks, information that may be available in the future, and needs that remain to be filled before in-depth assessments may be carried out. We also evaluate constraints induced by variability in information sources within and between countries. It is readily apparent that activity data for grassland management is collected less frequently and on a coarser scale than data for forest or agricultural inventories and that grassland activity data cannot be directly translated into IPCC-type factors as is done for IPCC inventories of agricultural soils. However, those management data that are available can serve to delineate broad-scale differences in management activities within regions in which soil C is likely to change in response to changes in management. This, coupled with the distinct possibility of more intensive surveys planned in the future, may enable more accurate assessments of grassland C dynamics with higher resolution both spatially and in the number of management activities.
Abstract:
Maintenance activities in a large-scale engineering system are usually scheduled according to the lifetimes of various components in order to ensure the overall reliability of the system. Lifetimes of components can be deduced from the corresponding probability distributions, with parameters estimated from past failure data. Because failure data for the components are not always readily available, engineers often have to plan maintenance activities using only basic information from the manufacturers, such as the mean and standard deviation of lifetime. In this paper, the moment-based piecewise polynomial model (MPPM) is proposed to estimate the parameters of the reliability probability distribution of the products when only the mean and standard deviation of the product lifetime are known. This method employs a group of polynomial functions to estimate the two parameters of the Weibull distribution according to the mathematical relationship between the shape parameter of the two-parameter Weibull distribution and the ratio of mean and standard deviation. Tests are carried out to evaluate the validity and accuracy of the proposed method, and its suitability for applications is discussed. The proposed method is particularly useful for reliability-critical systems, such as railway and power systems, in which the maintenance activities are scheduled according to the expected lifetimes of the system components.
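The relationship the abstract above relies on can be illustrated numerically. For a two-parameter Weibull distribution, the ratio of standard deviation to mean depends only on the shape parameter, so the shape can be recovered from that ratio and the scale then follows from the mean. The sketch below is not the paper's MPPM: it swaps the piecewise polynomial approximation for plain bisection, and the function names and search bounds are assumptions chosen for illustration.

```python
import math


def weibull_cv(shape):
    """Coefficient of variation (std / mean) of a two-parameter Weibull."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


def weibull_from_moments(mean, std, lo=0.1, hi=50.0, tol=1e-10):
    """Return (shape, scale) matching the given mean and standard deviation.

    Uses bisection on the shape parameter (hypothetical bounds lo..hi);
    the coefficient of variation decreases monotonically as shape grows.
    """
    target = std / mean
    if not (weibull_cv(hi) <= target <= weibull_cv(lo)):
        raise ValueError("std/mean ratio is outside the searchable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > target:
            lo = mid  # CV still too large, so the shape must be larger
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


if __name__ == "__main__":
    # Hypothetical manufacturer figures: mean life 8000 h, std 2500 h.
    shape, scale = weibull_from_moments(8000.0, 2500.0)
    print(f"shape={shape:.4f}, scale={scale:.1f}")
```

A polynomial fit of shape against the mean-to-standard-deviation ratio, as the abstract describes, would avoid the iteration at the cost of a small approximation error; bisection is used here only because it needs no fitted coefficients.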
Abstract:
Mandatory data breach notification has become a matter of increasing concern for law reformers. In Australia, this issue was recently addressed as part of a comprehensive review of privacy law conducted by the Australian Law Reform Commission (ALRC) which recommended a uniform national regime for protecting personal information applicable to both the public and private sectors. As in all federal systems, the distribution of powers between central and state governments poses problems for national consistency. In the authors’ view, a uniform approach to mandatory data breach notification has greater merit than a ‘jurisdiction specific’ approach epitomized by US state-based laws. The US response has given rise to unnecessary overlaps and inefficiencies as demonstrated by a review of different notification triggers and encryption safe harbors. Reviewing the US response, the authors conclude that a uniform approach to data breach notification is inherently more efficient.
Abstract:
Before 2001, most Africans immigrating to Australia were white South Africans and Zimbabweans who arrived as economic and family-reunion migrants (Cox, Cooper & Adepoju, 1999). Black African communities are a more recent addition to the Australian landscape, with most entering Australia as refugees after 2001. African refugees are a particularly disadvantaged immigrant group, which the Department of Immigration and Multicultural Affairs (in the Community Relations Commission of New South Wales, 2006) suggests require high levels of settlement support (p.23). Decision makers and settlement service providers need to have settlement data on the communities so that they can be effective in planning, budgeting and delivering support where it is most needed. Settlement data are also useful for determining the challenges that these communities face in trying to establish themselves in resettlement. There has been no verification of existing secondary data sources, however, or previous formal study of African refugee settlement geography in Southeast Queensland. This research addresses the knowledge gap by using a mixed-method approach to identify and describe the distribution and population size of eight African communities in Southeast Queensland, examine secondary migration patterns in these communities and assess the relationship between these geographic features and housing, a critical factor in successful settlement. Significant discrepancies exist between the primary data gathered in the study and existing secondary data relating to population size and distribution of the communities. Results also reveal a tension between the socio-cultural forces and the housing and economic imperatives driving secondary migration in the communities, and a general lack of engagement by African refugees with structured support networks. These findings have a wide range of implications for policy and for groups that provide settlement support to these communities.
Abstract:
This paper investigates how to interface the wireless application protocol (WAP) architecture to a supervisory control and data acquisition (SCADA) system running the distributed network protocol (DNP) in a power process plant. DNP is a well-developed protocol for SCADA systems, but the system control centre and remote terminal units (RTUs) are presently connected through a local area network. The conditions in a process plant are harsh and the site is remote. Resources for data communication are difficult to obtain under these conditions; thus, wireless communication through a mobile phone is practical and efficient in a process plant environment. The mobile communication industries and the public have a strong interest in the application of WAP technology in mobile phone networks, and the WAP application programming interface (API) in power industry applications is one area that requires extensive investigation.
Abstract:
Background: Efforts to prevent the development of overweight and obesity have increasingly focused early in the life course as we recognise that both metabolic and behavioural patterns are often established within the first few years of life. Randomised controlled trials (RCTs) of interventions are even more powerful when, with forethought, they are synthesised into an individual patient data (IPD) prospective meta-analysis (PMA). An IPD PMA is a unique research design where several trials are identified for inclusion in an analysis before any of the individual trial results become known and the data are provided for each randomised patient. This methodology minimises the publication and selection bias often associated with a retrospective meta-analysis by allowing hypotheses, analysis methods and selection criteria to be specified a priori. Methods/Design: The Early Prevention of Obesity in CHildren (EPOCH) Collaboration was formed in 2009. The main objective of the EPOCH Collaboration is to determine if early intervention for childhood obesity impacts on body mass index (BMI) z scores at age 18-24 months. Additional research questions will focus on whether early intervention has an impact on children’s dietary quality, TV viewing time, duration of breastfeeding and parenting styles. This protocol includes the hypotheses, inclusion criteria and outcome measures to be used in the IPD PMA. The sample size of the combined dataset at final outcome assessment (approximately 1800 infants) will allow greater precision when exploring differences in the effect of early intervention with respect to pre-specified participant- and intervention-level characteristics. Discussion: Finalisation of the data collection procedures and analysis plans will be complete by the end of 2010. Data collection and analysis will occur during 2011-2012 and results should be available by 2013. Trial registration number: ACTRN12610000789066
Abstract:
Much debate in media and communication studies is based on exaggerated opposition between the digital sublime and the digital abject: overly enthusiastic optimism versus determined pessimism over the potential of new technologies. This inhibits the discipline's claims to provide rigorous insight into industry and social change which is, after all, continuous. Instead of having to decide one way or the other, we need to ask how we study the process of change. This article examines the impact of online distribution in the film industry, particularly addressing the question of rates of change. Are there genuinely new players disrupting the established oligopoly, and if so with what effect? Is there evidence of disruption to, and innovation in, business models? Has cultural change been forced on the incumbents? Outside mainstream Hollywood, where are the new opportunities and the new players? What is the situation in Australia?
Abstract:
Background: There has been a lack of investigation into the spatial distribution and clustering of suicide in Australia, where the population density is lower than many countries and varies dramatically among urban, rural and remote areas. This study aims to examine the spatial distribution of suicide at a Local Governmental Area (LGA) level and identify the LGAs with a high relative risk of suicide in Queensland, Australia, using geographical information system (GIS) techniques. Methods: Data on suicide and demographic variables in each LGA between 1999 and 2003 were acquired from the Australian Bureau of Statistics. An age standardised mortality (ASM) rate for suicide was calculated at the LGA level. GIS techniques were used to examine the geographical difference of suicide across different areas. Results: Far north and north-eastern Queensland (i.e., Cook and Mornington Shires) had the highest suicide incidence in both genders, while the south-western areas (i.e., Barcoo and Bauhinia Shires) had the lowest incidence in both genders. In different age groups (≤24 years, 25 to 44 years, 45 to 64 years, and ≥65 years), ASM rates of suicide varied with gender at the LGA level. Mornington and six other LGAs with low socioeconomic status in the upper Southeast had significant spatial clusters of high suicide risk. Conclusions: There was a notable difference in ASM rates of suicide at the LGA level in Queensland. Some LGAs had significant spatial clusters of high suicide risk. The determinants of the geographical difference of suicide should be addressed in future research.
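As an illustration of the age standardised mortality rate mentioned in the abstract above, the following sketch applies direct standardisation: each age-specific rate is weighted by the share of that age group in a reference population. The abstract does not say which standardisation method or reference population the study used, so the direct method, the function name and all counts below are assumptions.

```python
def age_standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly age-standardised rate per `per` persons.

    deaths, population: counts per age group in the area of interest.
    standard_population: counts per age group in the reference population.
    """
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs need one entry per age group")
    total_standard = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(deaths, population, standard_population):
        # Age-specific rate weighted by that group's share of the standard.
        rate += (d / n) * (w / total_standard)
    return rate * per


if __name__ == "__main__":
    # Hypothetical counts for one area, using the four age groups
    # named in the abstract: <=24, 25-44, 45-64, >=65 years.
    deaths = [3, 6, 4, 2]
    population = [12_000, 10_000, 8_000, 5_000]
    standard = [34_000, 29_000, 24_000, 13_000]
    print(age_standardised_rate(deaths, population, standard))
```

Standardising in this way lets areas with different age structures be compared on a common footing, which matters when rates are mapped across areas as small as LGAs.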
Abstract:
Australia’s Arts and Entertainment Sector underpins cultural and social innovation, improves the quality of community life, is essential to maintaining our cities as world class attractors of talent and investment, and helps create ‘Brand Australia’ in the global marketplace of ideas (QUT Creative Industries Faculty 2010). The sector makes a significant contribution to the Australian economy. So what is the size and nature of this contribution? The Creative Industries Faculty at Queensland University of Technology recently conducted an exercise to source and present statistics in order to produce a data picture of Australia’s Arts and Entertainment Sector. The exercise involved gathering the latest statistics on broadcasting, new media, performing arts, and music composition, distribution and publishing as well as Australia’s performance in world markets.
Abstract:
Background: International data on child maltreatment are largely derived from child protection agencies, and predominantly report only substantiated cases of child maltreatment. This approach underestimates the incidence of maltreatment and makes inter-jurisdictional comparisons difficult. There has been a growing recognition of the importance of health professionals in identifying, documenting and reporting suspected child maltreatment. This study aimed to describe the issues around case identification using coded morbidity data, outline methods for selecting and grouping relevant codes, and illustrate patterns of maltreatment identified. Methods: A comprehensive review of the ICD-10-AM classification system was undertaken, including review of index terms, a free text search of tabular volumes, and a review of coding standards pertaining to child maltreatment coding. Identified codes were further categorised into maltreatment types including physical abuse, sexual abuse, emotional or psychological abuse, and neglect. Using these code groupings, one year of Australian hospitalisation data for children under 18 years of age was examined to quantify the proportion of patients identified and to explore the characteristics of cases assigned maltreatment-related codes. Results: Less than 0.5% of children hospitalised in Australia between 2005 and 2006 had a maltreatment code assigned; almost 4% of children with a principal diagnosis of a mental and behavioural disorder and over 1% of children with an injury or poisoning as the principal diagnosis had a maltreatment code assigned. The patterns of children assigned with definitive T74 codes varied by sex and age group. For males selected as having a maltreatment-related presentation, physical abuse was most commonly coded (62.6% of maltreatment cases) while for females selected as having a maltreatment-related presentation, sexual abuse was the most commonly assigned form of maltreatment (52.9% of maltreatment cases).
Conclusion: This study has demonstrated that hospital data could provide valuable information for routine monitoring and surveillance of child maltreatment, even in the absence of population-based linked data sources. With national and international calls for a public health response to child maltreatment, better understanding of, investment in and utilisation of our core national routinely collected data sources will enhance the evidence-base needed to support an appropriate response to children at risk.
Abstract:
Background: Internationally, research on child maltreatment-related injuries has been hampered by a lack of available routinely collected health data to identify cases, examine causes, identify risk factors and explore health outcomes. Routinely collected hospital separation data coded using the International Classification of Diseases and Related Health Problems (ICD) system provide an internationally standardised data source for classifying and aggregating diseases, injuries, causes of injuries and related health conditions for statistical purposes. However, there has been limited research to examine the reliability of these data for child maltreatment surveillance purposes. This study examined the reliability of coding of child maltreatment in Queensland, Australia. Methods: A retrospective medical record review and recoding methodology was used to assess the reliability of coding of child maltreatment. A stratified sample of hospitals across Queensland was selected for this study, and a stratified random sample of cases was selected from within those hospitals. Results: In 3.6% of cases the coders disagreed on whether any maltreatment code could be assigned (definite or possible) versus no maltreatment being assigned (unintentional injury), giving a sensitivity of 0.982 and specificity of 0.948. The review of these cases where discrepancies existed revealed that all cases had some indications of risk documented in the records. 15.5% of cases originally assigned a definite or possible maltreatment code were recoded to a more or less definite stratum. In terms of the number and type of maltreatment codes assigned, the auditor assigned a greater number of maltreatment types based on the medical documentation than the original coder assigned (22% of the auditor coded cases had more than one maltreatment type assigned compared to only 6% of the original coded data).
The maltreatment types which were the most ‘under-coded’ by the original coder were psychological abuse and neglect. Cases coded with a sexual abuse code showed the highest level of reliability. Conclusion: Given the increasing international attention being given to improving the uniformity of reporting of child-maltreatment related injuries and the emphasis on the better utilisation of routinely collected health data, this study provides an estimate of the reliability of maltreatment-specific ICD-10-AM codes assigned in an inpatient setting.
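The sensitivity and specificity quoted in the abstract above summarise a two-by-two agreement table between the original coder and the auditor. The sketch below shows the calculation, assuming the auditor's decision is treated as the reference standard (the abstract does not state this explicitly). The counts are hypothetical and are not the study's data.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity from a 2x2 agreement table.

    tp: both auditor and original coder assigned a maltreatment code
    fn: auditor assigned one, original coder did not
    fp: original coder assigned one, auditor did not
    tn: neither assigned one
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sens, spec = sensitivity_specificity(tp=180, fn=20, fp=10, tn=190)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```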
Abstract:
The aim of this paper is to demonstrate the validity of using Gaussian mixture models (GMM) for representing probabilistic distributions in a decentralised data fusion (DDF) framework. GMMs are a powerful and compact stochastic representation allowing efficient communication of feature properties in large scale decentralised sensor networks. It will be shown that GMMs provide a basis for analytical solutions to the update and prediction operations for general Bayesian filtering. Furthermore, a variant on the Covariance Intersect algorithm for Gaussian mixtures will be presented ensuring a conservative update for the fusion of correlated information between two nodes in the network. In addition, purely visual sensory data will be used to show that decentralised data fusion and tracking of non-Gaussian states observed by multiple autonomous vehicles is feasible.
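To make the claim about analytical update operations in the abstract above concrete, the sketch below performs a Bayesian measurement update on a one-dimensional Gaussian mixture. It assumes a linear observation model z = x + noise with Gaussian noise, which is a simplification chosen for illustration; the paper's own models, dimensions and its Covariance Intersect variant are not reproduced here, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_update(components, z, noise_var):
    """Posterior mixture after observing z = x + noise, noise ~ N(0, noise_var).

    components: list of (weight, mean, variance) tuples describing the prior.
    Each component gets a Kalman-style update, and its weight is rescaled
    by how well that component predicted the observation.
    """
    updated = []
    for w, m, v in components:
        s = v + noise_var              # predicted observation variance
        k = v / s                      # Kalman gain
        likelihood = gaussian_pdf(z, m, s)
        updated.append((w * likelihood, m + k * (z - m), (1.0 - k) * v))
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, v) for w, m, v in updated]


if __name__ == "__main__":
    prior = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]   # bimodal prior
    for weight, mean, var in gmm_update(prior, z=0.0, noise_var=1.0):
        print(weight, mean, var)
```

Because the posterior is again a Gaussian mixture with the same number of components, a node only needs to communicate a short list of weights, means and variances, which is the compactness the abstract refers to.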
Abstract:
Freeways are divided roadways designed to facilitate the uninterrupted movement of motor vehicles. However, many freeways now experience demand flows in excess of capacity, leading to recurrent congestion. The Highway Capacity Manual (TRB, 1994) uses empirical macroscopic relationships between speed, flow and density to quantify freeway operations and performance. Capacity may be predicted as the maximum uncongested flow achievable. Although they are effective tools for design and analysis, macroscopic models lack an understanding of the nature of processes taking place in the system. Szwed and Smith (1972, 1974) and Makigami and Matsuo (1990) have shown that microscopic modelling is also applicable to freeway operations. Such models facilitate an understanding of the processes whilst providing for the assessment of performance, through measures of capacity and delay. However, these models are limited to only a few circumstances. The aim of this study was to produce more comprehensive and practical microscopic models. These models were required to accurately portray the mechanisms of freeway operations at the specific locations under consideration. The models needed to be able to be calibrated using data acquired at these locations. The output of the models needed to be able to be validated with data acquired at these sites. Therefore, the outputs should be truly descriptive of the performance of the facility. A theoretical basis needed to underlie the form of these models, rather than empiricism, which is the case for the macroscopic models currently used. And the models needed to be adaptable to variable operating conditions, so that they may be applied, where possible, to other similar systems and facilities. It was not possible to produce a stand-alone model which is applicable to all facilities and locations in this single study; however, the scene has been set for the application of the models to a much broader range of operating conditions.
Opportunities for further development of the models were identified, and procedures provided for the calibration and validation of the models to a wide range of conditions. The models developed do, however, have limitations in their applicability. Only uncongested operations were studied and represented. Driver behaviour in Brisbane was applied to the models. Different mechanisms are likely in other locations due to variability in road rules and driving cultures. Not all manoeuvres evident were modelled. Some unusual manoeuvres were considered unwarranted to model. However, the models developed contain the principal processes of freeway operations, merging and lane changing. Gap acceptance theory was applied to these critical operations to assess freeway performance. Gap acceptance theory was found to be applicable to merging; however, the major stream, the kerb lane traffic, exercises only a limited priority over the minor stream, the on-ramp traffic. Theory was established to account for this activity. Kerb lane drivers were also found to change to the median lane where possible, to assist coincident mergers. The net limited priority model accounts for this by predicting a reduced major stream flow rate, which excludes lane changers. Cowan's M3 model was calibrated for both streams. On-ramp and total upstream flow are required as input. Relationships between proportion of headways greater than 1 s and flow differed for on-ramps where traffic leaves signalised intersections and unsignalised intersections. Constant departure on-ramp metering was also modelled. Minimum follow-on times of 1 to 1.2 s were calibrated. Critical gaps were shown to lie between the minimum follow-on time, and the sum of the minimum follow-on time and the 1 s minimum headway. Limited priority capacity and other boundary relationships were established by Troutbeck (1995).
The minimum average minor stream delay and corresponding proportion of drivers delayed were quantified theoretically in this study. A simulation model was constructed to predict intermediate minor and major stream delays across all minor and major stream flows. Pseudo-empirical relationships were established to predict average delays. Major stream average delays are limited to 0.5 s, insignificant compared with minor stream delays, which approach infinity at capacity. Minor stream delays were shown to be less when unsignalised intersections are located upstream of on-ramps than signalised intersections, and less still when ramp metering is installed. Smaller delays correspond to improved merge area performance. A more tangible performance measure, the distribution of distances required to merge, was established by including design speeds. This distribution can be measured to validate the model. Merging probabilities can be predicted for given taper lengths, a most useful performance measure. This model was also shown to be applicable to lane changing. Tolerable limits to merging probabilities require calibration. From these, practical capacities can be estimated. Further calibration is required of traffic inputs, critical gap and minimum follow-on time, for both merging and lane changing. A general relationship to predict proportion of drivers delayed requires development. These models can then be used to complement existing macroscopic models to assess performance, and provide further insight into the nature of operations.
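The Cowan M3 headway model named in the abstract above can be sketched briefly. In its usual form, a proportion alpha of vehicles are free (headway greater than the minimum headway delta), the rest travel in bunches at exactly delta, and free headways are shifted-exponential with a decay rate fixed by the flow. The parameter names and the example values below are assumptions for illustration; the calibrated relationships from the thesis are not reproduced.

```python
import math


def cowan_m3_decay(flow, alpha, delta):
    """Decay rate lambda = alpha * q / (1 - delta * q).

    flow: arrival flow q in vehicles per second
    alpha: proportion of free (unbunched) vehicles
    delta: minimum headway in seconds
    """
    if flow * delta >= 1.0:
        raise ValueError("flow exceeds the maximum implied by the minimum headway")
    return alpha * flow / (1.0 - delta * flow)


def cowan_m3_survival(t, flow, alpha, delta):
    """Probability that a headway exceeds t seconds."""
    if t < delta:
        return 1.0
    lam = cowan_m3_decay(flow, alpha, delta)
    return alpha * math.exp(-lam * (t - delta))


if __name__ == "__main__":
    # Hypothetical kerb-lane stream: 1800 veh/h, 60% free vehicles,
    # 1 s minimum headway (the value quoted in the abstract).
    q = 1800.0 / 3600.0
    for t in (1.0, 2.0, 4.0):
        print(t, cowan_m3_survival(t, q, alpha=0.6, delta=1.0))
```

The choice of lambda makes the mean headway equal to 1/q, so the model stays consistent with the measured flow whatever value of alpha is calibrated.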
Abstract:
The potential restriction to effective dispersal and gene flow caused by habitat fragmentation can apply to multiple levels of evolutionary scale; from the fragmentation of ancient supercontinents driving diversification and speciation on disjunct landmasses, to the isolation of proximate populations as a result of their inability to cross intervening unsuitable habitat. Investigating the role of habitat fragmentation in driving diversity within and among taxa can thus include inferences of phylogenetic relationships among taxa, assessments of intraspecific phylogeographic structure and analyses of gene flow among neighbouring populations. The proposed Gondwanan clade within the chironomid (non-biting midge) subfamily Orthocladiinae (Diptera: Chironomidae) represents a model system for investigating the role that population fragmentation and isolation has played at different evolutionary scales. A pilot study by Krosch et al (2009) identified several highly divergent lineages restricted to ancient rainforest refugia and limited gene flow among proximate sites within a refuge for one member of this clade, Echinocladius martini Cranston. This study provided a framework for investigating the evolutionary history of this taxon and its relatives more thoroughly. Populations of E. martini were sampled in the Paluma bioregion of northeast Queensland to investigate patterns of fine-scale within- and among-stream dispersal and gene flow within a refuge more rigorously. Data were incorporated from Krosch et al (2009) and additional sites were sampled up- and downstream of the original sites. Analyses of genetic structure revealed strong natal site fidelity and high genetic structure among geographically proximate streams. Little evidence was found for regular headwater exchange among upstream sites, but there was distinct evidence for rare adult flight among sites on separate stream reaches.
Overall, however, the distribution of shared haplotypes implied that both larval and adult dispersal was largely limited to the natal stream channel. Patterns of regional phylogeographic structure were examined in two related austral orthoclad taxa – Naonella forsythi Boothroyd from New Zealand and Ferringtonia patagonica Sæther and Andersen from southern South America – to provide a comparison with patterns revealed in their close relative E. martini. Both taxa inhabit tectonically active areas of the southern hemisphere that have also experienced several glaciation events throughout the Plio-Pleistocene that are thought to have affected population structure dramatically in many taxa. Four highly divergent lineages estimated to have diverged since the late Miocene were revealed in each taxon, mirroring patterns in E. martini; however, there was no evidence for local geographical endemism, implying substantial range expansion post-diversification. The differences in pattern evident among the three related taxa were suggested to have been influenced by variation in the responses of closed forest habitat to climatic fluctuations during interglacial periods across the three landmasses. Phylogeographic structure in E. martini was resolved at a continental scale by expanding upon the sampling design of Krosch et al (2009) to encompass populations in southeast Queensland, New South Wales and Victoria. Patterns of phylogeographic structure were consistent with expectations and several previously unrecognised lineages were revealed from central and southern Australia that were geographically endemic to closed forest refugia. Estimated divergence times were congruent with the timing of Plio-Pleistocene rainforest contractions across the east coast of Australia. This suggested that dispersal and gene flow of E. martini among isolated refugia was highly restricted and that this taxon was susceptible to the impacts of habitat change.
Broader phylogenetic relationships among taxa considered to be members of this Gondwanan orthoclad group were resolved in order to test expected patterns of evolutionary affinities across the austral continents. The inferred phylogeny and estimated divergence times did not accord with expected patterns based on the geological sequence of break-up of the Gondwanan supercontinent and implied instead several transoceanic dispersal events post-vicariance. Difficulties in appropriate taxonomic sampling and accurate calibration of molecular phylogenies notwithstanding, the sampling regime implemented in the current study has been the most intensive yet performed for austral members of the Orthocladiinae and unsurprisingly has revealed both novel taxa and phylogenetic relationships within and among described genera. Several novel associations between life stages are made here for both described and previously unknown taxa. Investigating evolutionary relationships within and among members of this clade of proposed Gondwanan orthoclad taxa has demonstrated that a complex interaction between historical population fragmentation and dispersal at several levels of evolutionary scale has been important in driving diversification in this group. While interruptions to migration, colonisation and gene flow driven by population fragmentation have clearly contributed to the development and maintenance of much of the diversity present in this group, long-distance dispersal has also played a role in influencing diversification of continental biotas and facilitating gene flow among disjunct populations.