153 resultados para correlation analysis
Resumo:
Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotical distribution of the correlation information criterion (CIC) is derived and a new method for selecting a working ICS is proposed by standardizing the selection criterion as the p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS structures, while with respect to the working ICS selection, the standardized CIC test is also shown to have satisfactory performance. Some simulation studies and applications to two real longitudinal datasets are made to illustrate how these criteria and tests might be useful.
Resumo:
A modeling paradigm is proposed for covariate, variance and working correlation structure selection for longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria find a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite sample properties is reported. Two longitudinal studies are used for illustration.
Resumo:
Images from cell biology experiments often indicate the presence of cell clustering, which can provide insight into the mechanisms driving the collective cell behaviour. Pair-correlation functions provide quantitative information about the presence, or absence, of clustering in a spatial distribution of cells. This is because the pair-correlation function describes the ratio of the abundance of pairs of cells, separated by a particular distance, relative to a randomly distributed reference population. Pair-correlation functions are often presented as a kernel density estimate where the frequency of pairs of objects are grouped using a particular bandwidth (or bin width), Δ>0. The choice of bandwidth has a dramatic impact: choosing Δ too large produces a pair-correlation function that contains insufficient information, whereas choosing Δ too small produces a pair-correlation signal dominated by fluctuations. Presently, there is little guidance available regarding how to make an objective choice of Δ. We present a new technique to choose Δ by analysing the power spectrum of the discrete Fourier transform of the pair-correlation function. Using synthetic simulation data, we confirm that our approach allows us to objectively choose Δ such that the appropriately binned pair-correlation function captures known features in uniform and clustered synthetic images. We also apply our technique to images from two different cell biology assays. The first assay corresponds to an approximately uniform distribution of cells, while the second assay involves a time series of images of a cell population which forms aggregates over time. The appropriately binned pair-correlation function allows us to make quantitative inferences about the average aggregate size, as well as quantifying how the average aggregate size changes with time.
Resumo:
Correlations between oil and agricultural commodities have varied over previous decades, impacted by renewable fuels policy and turbulent economic conditions. We estimate smooth transition conditional correlation models for 12 agricultural commodities and WTI crude oil. While a structural change in correlations occurred concurrently with the introduction of biofuel policy, oil and food price levels are also key influences. High correlation between biofuel feedstocks and oil is more likely to occur when food and oil price levels are high. Correlation with oil returns is strong for biofuel feedstocks, unlike with other agricultural futures, suggesting limited contagion from energy to food markets.
Resumo:
In the design of tissue engineering scaffolds, design parameters including pore size, shape and interconnectivity, mechanical properties and transport properties should be optimized to maximize successful inducement of bone ingrowth. In this paper we describe a 3D micro-CT and pore partitioning study to derive pore scale parameters including pore radius distribution, accessible radius, throat radius, and connectivity over the pore space of the tissue engineered constructs. These pore scale descriptors are correlated to bone ingrowth into the scaffolds. Quantitative and visual comparisons show a strong correlation between the local accessible pore radius and bone ingrowth; for well connected samples a cutoff accessible pore radius of approximately 100 microM is observed for ingrowth. The elastic properties of different types of scaffolds are simulated and can be described by standard cellular solids theory: (E/E(0))=(rho/rho(s))(n). Hydraulic conductance and diffusive properties are calculated; results are consistent with the concept of a threshold conductance for bone ingrowth. Simple simulations of local flow velocity and local shear stress show no correlation to in vivo bone ingrowth patterns. These results demonstrate a potential for 3D imaging and analysis to define relevant pore scale morphological and physical properties within scaffolds and to provide evidence for correlations between pore scale descriptors, physical properties and bone ingrowth.
Resumo:
Monitoring unused or dark IP addresses offers opportunities to extract useful information about both on-going and new attack patterns. In recent years, different techniques have been used to analyze such traffic including sequential analysis where a change in traffic behavior, for example change in mean, is used as an indication of malicious activity. Change points themselves say little about detected change; further data processing is necessary for the extraction of useful information and to identify the exact cause of the detected change which is limited due to the size and nature of observed traffic. In this paper, we address the problem of analyzing a large volume of such traffic by correlating change points identified in different traffic parameters. The significance of the proposed technique is two-fold. Firstly, automatic extraction of information related to change points by correlating change points detected across multiple traffic parameters. Secondly, validation of the detected change point by the simultaneous presence of another change point in a different parameter. Using a real network trace collected from unused IP addresses, we demonstrate that the proposed technique enables us to not only validate the change point but also extract useful information about the causes of change points.
Resumo:
The composition of many professional services firms in the Urban Development area has moved away from a discipline specific ‘silo’ structure to a more multidisciplinary environment. The benefits of multidisciplinarity have been seen in industry by providing synergies across many of the related disciplines. Similarly, the Queensland University of Technology, Bachelor of Urban Development degree has sought to broaden the knowledge base of students and achieve a greater level of synergy between related urban development disciplines through the introduction of generic and multidisciplinary units. This study aims to evaluate the effectiveness of delivering core property units in a multidisciplinary context. A comparative analysis has been undertaken between core property units and more generic units offered in a multidisciplinary context from introductory, intermediate and advanced years within the property program. This analysis was based on data collected from course performance surveys, student performance results, a student focus group and was informed by a reflective process from the student perspective and lecturer/ tutor feedback. The study showed that there are many benefits associated with multidisciplinary unit offerings across the QUT Urban Development program particularly in the more generic units. However, these units require a greater degree of management. It is more difficult to organise, teach and coordinate multidisciplinary student cohorts due to a difference in prior knowledge and experience between each of the discipline groups. In addition, the interaction between lecturers/ tutors and the students frequently becomes more limited. A perception exists within the student body that this more limited face to face contact with academic staff is not valuable which may be exacerbated by the quality of complimentary online teaching materials. For many academics, non-attendance at lectures was coupled with an increase in email communication. From the limited data collected during the study there appears to be no clear correlation between large multidisciplinary student classes and student academic performance or satisfaction.
Resumo:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and 2 avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because some data of 2 genomes in their work are now not available in GenBank, in this work, we analyze the phylogenetic relationship of the remaining 70 complete genomes corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polymaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenera, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Resumo:
Introduction Polybrominated diphenyl ethers (PBDEs) are considered to be a cost effective and efficient way to reduce the possibility of product ignition and inhibit the spread of fire, thereby limiting harm caused by fires. PBDEs are incorporated into a wide variety of manufactured products and are now considered an ubiquitous contaminant found worldwide in biological and environmental samples . In comparison to “traditional” persistent organic pollutants (POPs), the exposure modes of PBDEs in humans are less well defined, although dietary sources, inhalation (air/particulate matter) and dust ingestion have been reported 2-4. Limited investigations of population specific factors such as age or gender and PBDE concentrations report: no conclusive correlation by age in adults ; higher concentrations in children ; similar concentrations in maternal and cord blood ; and no gender differences . After preliminary findings of higher PBDE concentrations in children than in adults in Australia11 we sought to investigate at what age the PBDE concentrations peaked in an effort to focus exposure studies. This investigation involved the collection of blood samples from young age groups and the development of a simple model to predict PBDE concentrations by age in Australia.
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
In this paper, we propose a multivariate GARCH model with a time-varying conditional correlation structure. The new double smooth transition conditional correlation (DSTCC) GARCH model extends the smooth transition conditional correlation (STCC) GARCH model of Silvennoinen and Teräsvirta (2005) by including another variable according to which the correlations change smoothly between states of constant correlations. A Lagrange multiplier test is derived to test the constancy of correlations against the DSTCC-GARCH model, and another one to test for another transition in the STCC-GARCH framework. In addition, other specification tests, with the aim of aiding the model building procedure, are considered. Analytical expressions for the test statistics and the required derivatives are provided. Applying the model to the stock and bond futures data, we discover that the correlation pattern between them has dramatically changed around the turn of the century. The model is also applied to a selection of world stock indices, and we find evidence for an increasing degree of integration in the capital markets.
Resumo:
Today’s evolving networks are experiencing a large number of different attacks ranging from system break-ins, infection from automatic attack tools such as worms, viruses, trojan horses and denial of service (DoS). One important aspect of such attacks is that they are often indiscriminate and target Internet addresses without regard to whether they are bona fide allocated or not. Due to the absence of any advertised host services the traffic observed on unused IP addresses is by definition unsolicited and likely to be either opportunistic or malicious. The analysis of large repositories of such traffic can be used to extract useful information about both ongoing and new attack patterns and unearth unusual attack behaviors. However, such an analysis is difficult due to the size and nature of the collected traffic on unused address spaces. In this dissertation, we present a network traffic analysis technique which uses traffic collected from unused address spaces and relies on the statistical properties of the collected traffic, in order to accurately and quickly detect new and ongoing network anomalies. Detection of network anomalies is based on the concept that an anomalous activity usually transforms the network parameters in such a way that their statistical properties no longer remain constant, resulting in abrupt changes. In this dissertation, we use sequential analysis techniques to identify changes in the behavior of network traffic targeting unused address spaces to unveil both ongoing and new attack patterns. Specifically, we have developed a dynamic sliding window based non-parametric cumulative sum change detection techniques for identification of changes in network traffic. Furthermore we have introduced dynamic thresholds to detect changes in network traffic behavior and also detect when a particular change has ended. Experimental results are presented that demonstrate the operational effectiveness and efficiency of the proposed approach, using both synthetically generated datasets and real network traces collected from a dedicated block of unused IP addresses.
Resumo:
Transport regulators consider that, with respect to pavement damage, heavy vehicles (HVs) are the riskiest vehicles on the road network. That HV suspension design contributes to road and bridge damage has been recognised for some decades. This thesis deals with some aspects of HV suspension characteristics, particularly (but not exclusively) air suspensions. This is in the areas of developing low-cost in-service heavy vehicle (HV) suspension testing, the effects of larger-than-industry-standard longitudinal air lines and the characteristics of on-board mass (OBM) systems for HVs. All these areas, whilst seemingly disparate, seek to inform the management of HVs, reduce of their impact on the network asset and/or provide a measurement mechanism for worn HV suspensions. A number of project management groups at the State and National level in Australia have been, and will be, presented with the results of the project that resulted in this thesis. This should serve to inform their activities applicable to this research. A number of HVs were tested for various characteristics. These tests were used to form a number of conclusions about HV suspension behaviours. Wheel forces from road test data were analysed. A “novel roughness” measure was developed and applied to the road test data to determine dynamic load sharing, amongst other research outcomes. Further, it was proposed that this approach could inform future development of pavement models incorporating roughness and peak wheel forces. Left/right variations in wheel forces and wheel force variations for different speeds were also presented. This led on to some conclusions regarding suspension and wheel force frequencies, their transmission to the pavement and repetitive wheel loads in the spatial domain. An improved method of determining dynamic load sharing was developed and presented. It used the correlation coefficient between two elements of a HV to determine dynamic load sharing. This was validated against a mature dynamic loadsharing metric, the dynamic load sharing coefficient (de Pont, 1997). This was the first time that the technique of measuring correlation between elements on a HV has been used for a test case vs. a control case for two different sized air lines. That dynamic load sharing was improved at the air springs was shown for the test case of the large longitudinal air lines. The statistically significant improvement in dynamic load sharing at the air springs from larger longitudinal air lines varied from approximately 30 percent to 80 percent. Dynamic load sharing at the wheels was improved only for low air line flow events for the test case of larger longitudinal air lines. Statistically significant improvements to some suspension metrics across the range of test speeds and “novel roughness” values were evident from the use of larger longitudinal air lines, but these were not uniform. Of note were improvements to suspension metrics involving peak dynamic forces ranging from below the error margin to approximately 24 percent. Abstract models of HV suspensions were developed from the results of some of the tests. Those models were used to propose further development of, and future directions of research into, further gains in HV dynamic load sharing. This was from alterations to currently available damping characteristics combined with implementation of large longitudinal air lines. In-service testing of HV suspensions was found to be possible within a documented range from below the error margin to an error of approximately 16 percent. These results were in comparison with either the manufacturer’s certified data or test results replicating the Australian standard for “road-friendly” HV suspensions, Vehicle Standards Bulletin 11. OBM accuracy testing and development of tamper evidence from OBM data were detailed for over 2000 individual data points across twelve test and control OBM systems from eight suppliers installed on eleven HVs. The results indicated that 95 percent of contemporary OBM systems available in Australia are accurate to +/- 500 kg. The total variation in OBM linearity, after three outliers in the data were removed, was 0.5 percent. A tamper indicator and other OBM metrics that could be used by jurisdictions to determine tamper events were developed and documented. That OBM systems could be used as one vector for in-service testing of HV suspensions was one of a number of synergies between the seemingly disparate streams of this project.