851 results for statistical methods
Abstract:
Objectives: To demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understanding structure in missing data. Setting: Data taken from employees at three different industry sites in Australia. Participants: 7915 observations were included. Materials and Methods: The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site at which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured than for structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data and for selecting variables important for predicting missingness. BRT models can show how the values of other variables influence missingness, which may prove useful to researchers. Conclusion: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
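The general idea is easy to sketch. Below is a minimal Python illustration (not the authors' 'rpart'/'gbm' code, and with hypothetical file and column names): build an indicator for whether one variable is missing, then fit a classification tree that predicts that indicator from the remaining variables, so the printed splits expose any missingness structure.

    # Sketch: explore missingness structure with a classification tree.
    # File and column names are hypothetical; the study used R's 'rpart'
    # and 'gbm' packages rather than scikit-learn.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("occupational_health.csv")

    # Indicator: 1 where the medical-test result is missing, 0 otherwise.
    y = df["blood_lead"].isna().astype(int)

    # Predict missingness from the remaining variables; encode categoricals
    # and fill other gaps with a sentinel so the tree can use every row.
    X = pd.get_dummies(df.drop(columns=["blood_lead"]), dummy_na=True).fillna(-999)

    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(X, y)

    # Splits on, e.g., site or number of visits show which variables and
    # values are associated with missingness.
    print(export_text(tree, feature_names=list(X.columns)))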
Abstract:
In an ever-changing and globalised world there is a need for higher education to adapt and evolve its models of learning and teaching. The old industrial model has lost traction, and new patterns of creative engagement are required. These new models potentially increase relevancy and better equip students for the future. Although creativity is recognised as an attribute that can contribute much to the development of these pedagogies, and is valued by universities as a graduate capability, some educators understandably struggle to translate this vision into practice. This paper reports on selected survey findings from a mixed methods research project which aimed to shed light on how creativity can be designed for in higher education learning and teaching settings. A social constructivist epistemology underpinned the research, and data were gathered using survey and case study methods. Descriptive statistical methods and informed grounded theory were employed for the analysis reported here. The findings confirm that creativity is valued for its contribution to the development of students’ academic work, employment opportunities and life in general; however, tensions arise between individual educators’ creative pedagogical goals and the provision of institutional support for implementing those objectives. Designing for creativity becomes, paradoxically, a matter of navigating and limiting complexity and uncertainty, while simultaneously designing for those same states or qualities.
Abstract:
Pedestrian safety is a critical issue in Ethiopia: reports show that 50 to 60% of traffic fatality victims in the country are pedestrians. The primary aim of this research was to examine the possible causes of, and contributing factors to, pedestrian crashes in Ethiopia, and to improve pedestrian safety by recommending possible countermeasures. The secondary aim was to develop appropriate pedestrian crash models for two-way two-lane rural roads and for roundabouts in the capital city of Ethiopia. The research used quantitative methods throughout the investigation, applying a range of statistical techniques. The results support the idea that geometric and operational features have a significant influence on pedestrian safety and crashes. Accordingly, policies and strategies are needed to safeguard pedestrians in Ethiopia.
Abstract:
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants, which have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time consuming. 'Population-based linkage analysis' (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10⁻⁶). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than of controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allows identification of rare variants in a large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down regions of interest for sequencing priority.
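As a quick consistency check, the reported LOD score and p-value agree under the usual conversion for linkage statistics, χ² = 2 ln(10) × LOD with a one-sided (half χ², 1 df) tail probability. A small sketch, assuming that convention:

    # LOD = 4.65 should give p ≈ 1.9e-6 under the standard conversion
    # chi2 = 2*ln(10)*LOD with a one-sided (half chi-square, 1 df) tail.
    import math
    from scipy.stats import chi2

    lod = 4.65
    stat = 2 * math.log(10) * lod            # ≈ 21.41
    p = 0.5 * chi2.sf(stat, df=1)            # one-sided tail probability
    print(f"chi2 = {stat:.2f}, p = {p:.1e}") # -> p ≈ 1.9e-06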
Abstract:
This project was a step forward in applying statistical methods and models to provide new insights for more informed decision-making at large spatial scales. The model was designed to address the complicated effects of ecological processes that govern the state of populations, and the uncertainties inherent in large spatio-temporal datasets. Specifically, the thesis contributes to better understanding and management of the Great Barrier Reef.
Abstract:
Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods such as t tests or analysis of variance are usually used to analyze ordinal data when comparing two or more groups. However, the underlying assumptions, such as normality and homogeneous variances, are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing ordinal data. The rank-based methods are essentially based on pairwise comparisons and can therefore deal with qualitative data naturally. They require neither a normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, rank regression can also incorporate covariate effects in the same way as ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.
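A minimal sketch of the idea behind rank regression, using Jaeckel's rank-dispersion criterion with Wilcoxon scores on simulated heavy-tailed data (the study itself would have used dedicated statistical software; the single-covariate setup here is an illustrative assumption):

    # Rank (Wilcoxon-score) regression sketch: minimise Jaeckel's
    # dispersion of residuals instead of the sum of squares.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import rankdata

    rng = np.random.default_rng(1)
    n = 120
    x = rng.normal(size=n)
    y = 2.0 + 1.5 * x + rng.standard_t(df=2, size=n)   # heavy-tailed errors

    def dispersion(b):
        e = y - b * x
        scores = np.sqrt(12) * (rankdata(e) / (n + 1) - 0.5)  # Wilcoxon scores
        return float(np.sum(scores * e))

    slope = minimize_scalar(dispersion, bounds=(-10, 10), method="bounded").x
    intercept = np.median(y - slope * x)   # location step from residual median
    print(f"rank fit: y = {intercept:.2f} + {slope:.2f} x")  # robust to outliers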
Abstract:
So far, most Phase II trials have been designed and analysed under a frequentist framework. Under this framework, a trial is designed so that the overall Type I and Type II errors of the trial are controlled at some desired levels. Recently, a number of articles have advocated the use of Bayesian designs in practice. Under a Bayesian framework, a trial is designed so that it stops when the posterior probability for the treatment falls within certain prespecified thresholds. In this article, we argue that trials under a Bayesian framework can also be designed to control frequentist error rates. We introduce a Bayesian version of Simon's well-known two-stage design to achieve this goal. We also consider two other errors, called Bayesian errors in this article because of their similarity to posterior probabilities. We show that our method can also control these Bayesian-type errors. We compare our method with other recent Bayesian designs in a numerical study and discuss the implications of different designs for error rates. An example of a clinical trial for patients with nasopharyngeal carcinoma is used to illustrate the differences between the designs.
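To make the interplay concrete, here is a small sketch (not the authors' design; the prior, thresholds and stage sizes are illustrative assumptions) of a two-stage rule that stops on posterior probabilities, with its frequentist Type I error estimated by simulation:

    # Two-stage Bayesian stopping rule with a simulated frequentist check.
    # All design parameters below are assumptions for illustration.
    import numpy as np
    from scipy.stats import beta

    p0 = 0.20                            # null response rate
    n1, n_total = 15, 46                 # stage-1 and total sample sizes
    a, b = 0.5, 0.5                      # Jeffreys prior on the response rate
    theta_fut, theta_eff = 0.05, 0.90    # posterior stopping thresholds

    def rejects_null(p_true, rng):
        x1 = rng.binomial(n1, p_true)
        post1 = 1 - beta.cdf(p0, a + x1, b + n1 - x1)   # Pr(p > p0 | stage 1)
        if post1 < theta_fut:
            return False                                # stop for futility
        x = x1 + rng.binomial(n_total - n1, p_true)
        post2 = 1 - beta.cdf(p0, a + x, b + n_total - x)
        return post2 > theta_eff                        # declare efficacy

    rng = np.random.default_rng(0)
    type1 = np.mean([rejects_null(p0, rng) for _ in range(20000)])
    print(f"simulated Type I error: {type1:.3f}")       # tune thresholds to target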
Abstract:
Statistical methods are often used to analyse commercial catch and effort data to provide standardised fishing effort and/or a relative index of fish abundance for input into stock assessment models. Achieving reliable results has proved difficult in Australia's Northern Prawn Fishery (NPF), due to a combination of factors such as the biological characteristics of the animals, aspects of the fleet dynamics, and changes in fishing technology. For this set of data, we compared four modelling approaches (linear models, mixed models, generalised estimating equations, and generalised linear models) with respect to the resulting standardised fishing effort or relative index of abundance. We also varied the number and form of vessel covariates in the models. Within a subset of data from this fishery, modelling correlation structures did not alter the conclusions from simpler statistical models. The random-effects models also yielded similar results. This is because the estimators are all consistent even if the correlation structure is mis-specified, and the data set is very large. However, the standard errors from the different models differed, suggesting that the methods differ in statistical efficiency. We suggest that there is value in modelling the variance function and the correlation structure, to make valid and efficient statistical inferences and to gain insight into the data. We found that fishing power was separable from the indices of prawn abundance only when we offset the impact of vessel characteristics at assumed values from external sources. This may be due to the large degree of confounding within the data, and the extreme temporal changes in certain aspects of individual vessels, the fleet and the fleet dynamics.
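A sketch of this kind of comparison in Python (hypothetical file and column names; the mean model and link are assumptions) fits the same catch-rate model as an ordinary GLM and as a GEE with within-vessel correlation, so that coefficients and standard errors can be compared:

    # Compare an independence GLM with a GEE that models correlation
    # within vessels; file and column names are hypothetical.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    d = pd.read_csv("catch_effort.csv")

    formula = "cpue ~ C(year) + C(month) + engine_power"
    fam = sm.families.Gaussian(sm.families.links.Log())

    glm = smf.glm(formula, data=d, family=fam).fit()
    gee = smf.gee(formula, groups="vessel_id", data=d, family=fam,
                  cov_struct=sm.cov_struct.Exchangeable()).fit()

    # Point estimates are typically similar (consistency); the standard
    # errors can differ, reflecting differences in statistical efficiency.
    print(glm.bse.head())
    print(gee.bse.head())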
Abstract:
Many statistical forecast systems are available to interested users. In order to be useful for decision-making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and their statistical manifestation have been firmly established, the forecasts must also provide some quantitative evidence of ‘quality’. However, the quality of statistical climate forecast systems (forecast quality) is an ill-defined and frequently misunderstood property. Often, providers and users of such forecast systems are unclear about what ‘quality’ entails and how to measure it, leading to confusion and misinformation. Here we present a generic framework to quantify aspects of forecast quality using an inferential approach to calculate nominal significance levels (p-values), which can be obtained either by directly applying non-parametric statistical tests such as Kruskal-Wallis (KW) or Kolmogorov-Smirnov (KS), or by using Monte Carlo methods (in the case of forecast skill scores). Once converted to p-values, these forecast quality measures provide a means to objectively evaluate and compare temporal and spatial patterns of forecast quality across datasets and forecast systems. Our analysis demonstrates the importance of providing p-values rather than adopting arbitrarily chosen significance levels such as p < 0.05 or p < 0.01, which is still common practice. This is illustrated by applying non-parametric tests (such as KW and KS) and skill scoring methods (LEPS and RPSS) to the 5-phase Southern Oscillation Index classification system using historical rainfall data from Australia, the Republic of South Africa and India. The selection of quality measures is based solely on their common use and does not constitute endorsement. We found that non-parametric statistical tests can be adequate proxies for skill measures such as LEPS or RPSS. The framework can be implemented anywhere, regardless of dataset, forecast system or quality measure. Eventually such inferential evidence should be complemented by descriptive statistical methods in order to fully assist in operational risk management.
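A small sketch of the two routes to a p-value described above, on invented rainfall series grouped into five phases (the data and the stand-in skill score are assumptions, not the paper's LEPS/RPSS implementation):

    # Route 1: a direct non-parametric test (Kruskal-Wallis) across phases.
    # Route 2: a Monte Carlo p-value for a skill score, by permuting labels.
    import numpy as np
    from scipy.stats import kruskal

    rng = np.random.default_rng(42)
    # Hypothetical seasonal rainfall (mm) for the 5 SOI phases.
    phases = [rng.gamma(2.0, 60 + 10 * k, size=40) for k in range(5)]

    stat, p = kruskal(*phases)
    print(f"Kruskal-Wallis p = {p:.4f}")    # report p itself, not "p < 0.05"

    def score(groups):                      # illustrative stand-in skill score
        return np.var([np.median(g) for g in groups])

    pooled, sizes = np.concatenate(phases), [len(g) for g in phases]
    obs = score(phases)
    null = [score(np.split(rng.permutation(pooled), np.cumsum(sizes)[:-1]))
            for _ in range(2000)]
    print(f"Monte Carlo p = {np.mean(np.array(null) >= obs):.4f}")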
Abstract:
The development of innovative methods of stock assessment is a priority for State and Commonwealth fisheries agencies. It is driven by the need to facilitate sustainable exploitation of naturally occurring fisheries resources for the current and future economic, social and environmental well-being of Australia. This project was initiated in this context and took advantage of considerable recent achievements in genomics that are shaping our comprehension of the DNA of humans and animals. The basic idea behind this project was that genetic estimates of effective population size, which can be made from empirical measurements of genetic drift, are equivalent to estimates of the successful number of spawners, an important parameter in the process of fisheries stock assessment. The broad objectives of this study were to:
1. critically evaluate a variety of mathematical methods of calculating effective spawner numbers (Ne), by (a) conducting comprehensive computer simulations and (b) analysing empirical data collected from the Moreton Bay population of tiger prawns (P. esculentus);
2. lay the groundwork for the application of the technology in the northern prawn fishery (NPF);
3. produce software for the calculation of Ne, and make it widely available.
The project pulled together a range of mathematical models for estimating current effective population size from diverse sources. Some of them had been recently implemented with the latest statistical methods (e.g. in a Bayesian framework; Berthier, Beaumont et al. 2002), while others had lower profiles (e.g. Pudovkin, Zaykin et al. 1996; Rousset and Raymond 1995). Computer code, and later software with a user-friendly interface (NeEstimator), was produced to implement the methods. This was used as a basis for simulation experiments to evaluate the performance of the methods with an individual-based model of a prawn population. Following the guidelines suggested by the computer simulations, the tiger prawn population in Moreton Bay (south-east Queensland) was sampled for genetic analysis with eight microsatellite loci in three successive spring spawning seasons in 2001, 2002 and 2003. As predicted by the simulations, the estimates had non-infinite upper confidence limits, which is a major achievement for the application of the method to a naturally occurring, short-generation, highly fecund invertebrate species. The genetic estimate of the number of successful spawners was around 1000 individuals in two consecutive years. This contrasts with about 500,000 prawns participating in spawning. It is not possible to distinguish successful from non-successful spawners, so we suggest a high level of protection for the entire spawning population. We interpret the difference in numbers between successful and non-successful spawners as a large variation in the number of offspring per family that survive – a large number of families have no surviving offspring, while a few have a large number. We explored various ways in which Ne can be useful in fisheries management. It can be a surrogate for spawning population size, assuming the ratio between Ne and spawning population size has been previously calculated for that species. Alternatively, it can be a surrogate for recruitment, again assuming that the ratio between Ne and recruitment has been previously determined. The number of species that can be analysed in this way, however, is likely to be small because of species-specific life history requirements that need to be satisfied for accuracy.
The most universal approach would be to integrate Ne with spawning stock-recruitment models, so that these models are more accurate when applied to fisheries populations. A pathway to achieve this was established in this project, which we predict will significantly improve fisheries sustainability in the future. Regardless of the success of integrating Ne into spawning stock-recruitment models, Ne could be used as a fisheries monitoring tool: declines in spawning stock size or increases in natural or harvest mortality would be reflected by a decline in Ne. This would be valuable for data-poor fisheries and provides fishery-independent information; however, we suggest a species-by-species approach, as some species may be too numerous, or experience too much migration, for the method to work. During the project two important theoretical studies of the simultaneous estimation of effective population size and migration were published (Vitalis and Couvet 2001b; Wang and Whitlock 2003). These methods, combined with the collection of preliminary genetic data from the tiger prawn population in the southern Gulf of Carpentaria and a computer simulation study that evaluated the effect of differing reproductive strategies on genetic estimates, suggest that this technology could make an important contribution to the stock assessment process in the northern prawn fishery (NPF). Advances in genomics are rapid, and a cheaper, more reliable substitute for microsatellite loci is already available: digital data from single nucleotide polymorphisms (SNPs) are likely to supersede 'analogue' microsatellite data, making it cheaper and easier to apply the method to species with large population sizes.
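The drift-based estimate at the heart of this approach can be sketched compactly. Below is a back-of-envelope version of the temporal method (Nei-Tajima Fc with a sampling-bias correction, implemented properly in software such as NeEstimator); the allele frequencies and sample sizes are invented for illustration:

    # Temporal-method sketch: estimate Ne from allele-frequency drift
    # between two samples taken t generations apart.
    import numpy as np

    def temporal_ne(p0, pt, s0, st, t):
        """Ne from allele frequencies p0, pt with sample sizes s0, st."""
        fc = np.mean((p0 - pt) ** 2 / ((p0 + pt) / 2 - p0 * pt))  # Nei-Tajima Fc
        drift = fc - 1 / (2 * s0) - 1 / (2 * st)   # subtract sampling noise
        return t / (2 * drift)

    p0 = np.array([0.31, 0.55, 0.12, 0.48])   # invented generation-0 frequencies
    pt = np.array([0.35, 0.49, 0.17, 0.42])   # one generation later
    print(f"Ne ~ {temporal_ne(p0, pt, s0=400, st=400, t=1):.0f}")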
Abstract:
Recurring water stresses are a major risk factor for rainfed maize cropping across the highly diverse agro-ecological environments of Queensland (Qld) and northern New South Wales (NNSW). Enhanced understanding of such agro-ecological diversity is necessary to more consistently sample target production environments for testing and targeting the release of improved germplasm, and to improve the efficiency of the maize pre-breeding and breeding programs of Qld and New South Wales. Here, we used the Agricultural Production Systems Simulator (APSIM) – a well-validated maize crop model – to characterize the key distinctive water stress patterns and risks to production across the main maize growing regions of Qld and NNSW, located between 15.8° and 31.5°S and between 144.5° and 151.8°E. APSIM was configured to simulate daily water supply-demand ratios (SDRs) around anthesis, as an indicator of the degree of water stress, and the final grain yield. Simulations were performed using daily climatic records for the period 1890 to 2010 for 32 site-soil combinations in the target production regions. The runs assumed adequate nitrogen supply for the mid-season maize hybrid Pioneer 3153. Hierarchical complete-linkage cluster analyses of the simulated yields resulted in five major clusters with distinct probability distributions of expected yield and distinct geographic patterns. The drought stress patterns and their frequencies were quantified from the SDRs using multivariate statistical methods. The identified stress patterns included no stress, mid-season (flowering) stress, and three terminal stresses differing in severity. The combined frequency of flowering and terminal stresses was highest (82.9%), mainly in site-soil combinations in the west of Qld and NNSW. Yield variability across the different site-soil combinations was significantly related to the variability in the frequencies of water stresses. Frequencies of water stresses within each yield cluster tended to be similar, but differed across clusters. Site-soil combinations falling within each yield cluster could therefore be treated as distinct maize production environments for testing and targeting newly developed maize cultivars and hybrids for adaptation to the water stress patterns most common in those environments.
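The environment-classification step can be sketched briefly: cluster the simulated yield distributions of the site-soil combinations with complete linkage and cut the tree into five groups, as the study did. The yield matrix below is simulated stand-in data, not APSIM output:

    # Complete-linkage clustering of site-soil yield distributions.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(7)
    # Rows: 32 site-soil combinations; columns: 121 simulated seasons.
    yields = rng.gamma(shape=3.0, scale=1500.0, size=(32, 121))
    yields.sort(axis=1)            # compare empirical yield quantiles

    Z = linkage(yields, method="complete", metric="euclidean")
    clusters = fcluster(Z, t=5, criterion="maxclust")   # five yield clusters
    for k in range(1, 6):
        print(f"cluster {k}: {np.sum(clusters == k)} site-soil combinations")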
Abstract:
Curves are a common feature of road infrastructure; however, crashes on road curves are associated with increased risk of injury and fatality to vehicle occupants. Countermeasures require the identification of contributing factors, yet current approaches to identifying contributors use traditional statistical methods and have not used self-reported narrative claims to identify factors related to the driver, vehicle and environment in a systemic way. Text mining of 3434 road-curve crash claim records filed between 1 January 2003 and 31 December 2005 at a major insurer in Queensland, Australia, was undertaken to identify risk levels and contributing factors. Rough set analysis was used on insurance claim narratives to identify significant contributing factors to crashes and their associated severity. New contributing factors unique to curve crashes were identified (e.g., tree, phone, over-steer), in addition to those previously identified via traditional statistical analysis of Police and licensing authority records. Text mining is a novel methodology for improving knowledge of risk and of contributing factors to road-curve crash severity. Future road-curve crash countermeasures should more fully consider the interrelationships between the environment, the road, the driver and the vehicle, and education campaigns in particular could highlight the increased risk of crashes on road curves.
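The text-mining front end of such a pipeline is easy to illustrate (the rough set analysis itself is not shown here, and the narratives below are invented): extract candidate contributing-factor terms from claim narratives with simple term counts.

    # Extract candidate contributing-factor terms from claim narratives.
    from sklearn.feature_extraction.text import CountVectorizer

    narratives = [
        "vehicle over-steered on wet curve and hit tree",
        "driver using phone drifted across centre line on curve",
        "lost control on gravel curve and rolled into embankment",
    ]

    vec = CountVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(narratives)

    # Rank terms by total frequency; terms such as 'tree', 'phone' and
    # 'steered' surface as candidates for the subsequent rough set step.
    counts = X.sum(axis=0).A1
    for count, term in sorted(zip(counts, vec.get_feature_names_out()),
                              reverse=True)[:10]:
        print(count, term)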
Abstract:
Fluorinated surfactant-based aqueous film-forming foams (AFFFs) are made up of per- and polyfluoroalkyl substances (PFAS) and are used to extinguish fires involving highly flammable liquids. The use of perfluorooctanesulfonic acid (PFOS) and other perfluoroalkyl acids (PFAAs) in some AFFF formulations has been linked to substantial environmental contamination. Recent studies have identified a large number of novel and infrequently reported fluorinated surfactants in different AFFF formulations. In this study, a strategy based on a case-control approach, using quadrupole time-of-flight tandem mass spectrometry (QTOF-MS/MS) and advanced statistical methods, was used to extract and identify known and unknown PFAS in serum from AFFF-exposed firefighters. Two target sulfonic acids [PFOS and perfluorohexanesulfonic acid (PFHxS)], three non-target acids [perfluoropentanesulfonic acid (PFPeS), perfluoroheptanesulfonic acid (PFHpS), and perfluorononanesulfonic acid (PFNS)], and four unknown sulfonic acids (Cl-PFOS, ketone-PFOS, ether-PFHxS, and Cl-PFHxS) were exclusively detected, or detected significantly more frequently and at higher levels, in firefighters compared to controls. The application of this strategy allowed previously unreported fluorinated chemicals to be identified in a timely and cost-efficient way.
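The statistical screening step in such a case-control design can be sketched as follows (the feature table is simulated; in practice intensities would come from QTOF-MS peak picking, and the test and correction used here are assumptions rather than the study's exact methods):

    # Case-control screen: flag MS features elevated in firefighters.
    import numpy as np
    from scipy.stats import mannwhitneyu
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(3)
    n_features = 500
    cases = rng.lognormal(sigma=1.0, size=(60, n_features))      # firefighters
    controls = rng.lognormal(sigma=1.0, size=(60, n_features))
    cases[:, :5] *= 8.0           # a few features elevated in cases

    pvals = np.array([
        mannwhitneyu(cases[:, j], controls[:, j], alternative="greater").pvalue
        for j in range(n_features)
    ])
    reject, p_adj, *_ = multipletests(pvals, alpha=0.05, method="fdr_bh")
    print("features flagged for identification:", np.flatnonzero(reject))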
Abstract:
It has been known for decades that particles deposited within the respiratory system can cause adverse health effects. Atmospheric aerosol particles influence climate by scattering solar radiation, but they also act as the nuclei around which cloud droplets form. The principal objectives of this thesis were to investigate the chemical composition and the sources of fine particles in different environments (traffic, urban background, remote) as well as during some specific air pollution situations. Quantifying the climate and health effects of atmospheric aerosols is not possible without detailed information on their chemical composition. Aerosol measurements were carried out at nine sites in six countries (Finland, Germany, the Czech Republic, the Netherlands, Greece and Italy). Several different instruments were used to measure both particulate matter (PM) mass and its chemical composition. In the off-line measurements the samples were first collected on a substrate or filter, and gravimetric and chemical analyses were conducted in the laboratory. In the on-line measurements the sampling and analysis were either combined or performed successively within the same instrument. Results from the impactor samples were analyzed by statistical methods. This thesis also includes the development of a method for determining the size distribution of carbonaceous matter using a multistage impactor. It was found that the chemistry of PM usually shows strong spatial, temporal and size-dependent variability. At the Finnish sites most of the fine PM consisted of organic matter; in Greece, however, sulfate dominated the fine PM, and in Italy nitrate made the largest contribution. Regarding the size-dependent chemical composition, organic components were likely to be enriched in smaller particles than inorganic ions. Data analysis showed that organic carbon (OC) had four major sources in Helsinki. Secondary production was the major source in spring, summer and fall, whereas in winter biomass combustion dominated OC. A significant impact of biomass combustion on OC concentrations was also observed in the measurements performed in Central Europe. In this thesis aerosol samples were collected mainly by conventional filter and impactor methods, which suffer from long integration times. However, chemical mass closure was achieved accurately with the filter and impactor measurements, and simple filter sampling was found to be useful for explaining the sources of PM on a seasonal basis. The on-line instruments gave additional information on the temporal variation of the sources and on atmospheric mixing conditions.
Abstract:
Determination of the environmental factors controlling earth surface processes and landform patterns is one of the central themes in physical geography. However, the identification of the main drivers of geomorphological phenomena is often challenging. Novel spatial analysis and modelling methods could provide new insights into process-environment relationships. The objective of this research was to map and quantitatively analyse the occurrence of cryogenic phenomena in subarctic Finland. More precisely, utilising a grid-based approach, the distribution and abundance of periglacial landforms were modelled to identify important landscape-scale environmental factors. The study was performed using a comprehensive empirical data set of periglacial landforms from an area of 600 km² at a 25-ha resolution. The statistical methods utilised were generalized linear modelling (GLM) and hierarchical partitioning (HP). GLMs were used to produce distribution and abundance models, and HP to reveal independently the most likely causal variables. The GLM models were assessed utilising statistical evaluation measures, prediction maps, field observations and the results of the HP analyses. A total of 40 different landform types and subtypes were identified. Topographical, soil property and vegetation variables were the primary correlates of the occurrence and cover of active periglacial landforms on the landscape scale. In the model evaluation, most of the GLMs were shown to be robust, although the explanatory power, predictive ability and selected explanatory variables varied between the models. This study demonstrated the great potential of combining a spatial grid system, terrain data and novel statistical techniques to map the occurrence of periglacial landforms. GLM proved to be a useful modelling framework for testing the shapes of the response functions and the significance of the environmental variables, and the HP method helped to make better deductions about the important drivers of earth surface processes. Hence, the numerical approach presented in this study can be a useful addition to the current range of techniques available to researchers for mapping and monitoring different geographical phenomena.
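As a minimal sketch of the distribution-modelling step (file and column names are hypothetical, and the predictor set is an assumption), a binomial GLM for landform presence/absence in grid cells, with a quadratic term to allow the curved response shapes such studies test:

    # Logistic GLM for periglacial landform occurrence on a 25-ha grid.
    import pandas as pd
    import statsmodels.formula.api as smf

    grid = pd.read_csv("periglacial_grid.csv")   # one row per grid cell

    m = smf.logit(
        "landform_present ~ elevation + I(elevation**2) + slope + soil_moisture",
        data=grid,
    ).fit()
    print(m.summary())                           # response shapes and significance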