763 resultados para information criteria
Resumo:
1. Ecological data sets often use clustered measurements or use repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are: the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and using a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using DIC for selecting the correct covariance structure.
Resumo:
Modelling of interferometric signals related to tear film surface quality is considered. In the context of tear film surface quality estimation in normal healthy eyes, two clinical parameters are of interest: the build-up time, and the average interblink surface quality. The former is closely related to the signal derivative while the latter to the signal itself. Polynomial signal models, chosen for a particular set of noisy interferometric measurements, can be optimally selected, in some sense, with a range of information criteria such as AIC, MDL, Cp, and CME. Those criteria, however, do not always guarantee that the true derivative of the signal is accurately represented and they often overestimate it. Here, a practical method for judicious selection of model order in a polynomial fitting to a signal is proposed so that the derivative of the signal is adequately represented. The paper highlights the importance of context-based signal modelling in model order selection.
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences of expected crash counts are likely to be different than factors that might help to explain unaccounted for variation in crashes across sites
Resumo:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paves the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we firstly investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus is turned to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extended the model for detecting the interaction effects for population based case-control studies. In this part of study, we also develop a machine learning algorithm to cope with the large scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Resumo:
Spatial data analysis has become more and more important in the studies of ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. However, in general, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our target is to find a good strategy to identify the best model from the candidate set using model selection criteria. This paper is to evaluate the ability of some information criteria (corrected Akaike information criterion, Bayesian information criterion (BIC) and residual information criterion (RIC)) for choosing the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matern and rational quadratic) are used in simulation studies. With the summary in simulation results, we find that the misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fitting which can be indicated by the average adjusted R square ( [GRAPHICS] ), and overall RIC performs well.
Resumo:
This paper is concerned with the study of the equilibrium exchange of ammonium ions with two natural zeolite samples sourced in Australia from Castle Mountain Zeolites and Zeolite Australia. A range of sorption models including Langmuir Vageler, Competitive Langmuir, Freundlich, Temkin, Dubinin Astakhov and Brouers–Sotolongo were applied in order to gain an insight as to the exchange process. In contrast to most previous studies, non-linear regression was used in all instances to determine the best fit of the experimental data. Castle Mountain natural zeolite was found to exhibit higher ammonium capacity than Zeolite Australia material when in the freshly received state, and this behavior was related to the greater amount of sodium ions present relative to calcium ions on the zeolite exchange sites. The zeolite capacity for ammonium ions was also found to be dependent on the solution normality, with 35–60% increase inuptake noted when increasing the ammonium concentration from 250 to 1000 mg/L. The optimal fit ofthe equilibrium data was achieved by the Freundlich expression as confirmed by use of Akaikes Information Criteria. It was emphasized that the bottle-point method chosen influenced the isotherm profile in several ways, and could lead to misleading interpretation of experiments, especially if the constant zeolite mass approach was followed. Pre-treatment of natural zeolite with acid and subsequently sodium hydroxide promoted the uptake of ammonium species by at least 90%. This paper highlighted the factors which should be taken into account when investigating ammonium ion exchange with natural zeolites.
Resumo:
The interdependence of Greece and other European stock markets and the subsequent portfolio implications are examined in wavelet and variational mode decomposition domain. In applying the decomposition techniques, we analyze the structural properties of data and distinguish between short and long term dynamics of stock market returns. First, the GARCH-type models are fitted to obtain the standardized residuals. Next, different copula functions are evaluated, and based on the conventional information criteria and time varying parameter, Joe-Clayton copula is chosen to model the tail dependence between the stock markets. The short-run lower tail dependence time paths show a sudden increase in comovement during the global financial crises. The results of the long-run dependence suggest that European stock markets have higher interdependence with Greece stock market. Individual country’s Value at Risk (VaR) separates the countries into two distinct groups. Finally, the two-asset portfolio VaR measures provide potential markets for Greece stock market investment diversification.
Resumo:
Ecosystem based management requires the integration of various types of assessment indicators. Understanding stakeholders' information preferences is important, in selecting those indicators that best support management and policy. Both the preferences of decision-makers and the general public may matter, in democratic participatory management institutions. This paper presents a multi-criteria analysis aimed at quantifying the relative importance to these groups of economic, ecological and socio-economic indicators usually considered when managing ecosystem services in a coastal development context. The Analytic Hierarchy Process (AHP) is applied within two nationwide surveys in Australia, and preferences of both the general public and decision-makers for these indicators are elicited and compared. Results show that, on average across both groups, the priority in assessing a generic coastal development project is for the ecological assessment of its impacts on marine biodiversity. Ecological assessment indicators are globally preferred to both economic and socio-economic indicators regardless of the nature of the impacts studied. These results are observed for a significantly larger proportion of decision-maker than general public respondents, questioning the extent to which the general public's preferences are well reflected in decision-making processes.
Resumo:
Research and innovation in the built environment is increasingly taking on an inter-disciplinary nature. The built environment industry and professional practice have long adopted multi and inter-disciplinary practices. The application of IT in Construction is moving beyond the automation and replication of discrete mono and multi-disciplinary tasks to replicate and model the improved inter-disciplinary processes of modern design and construction practice. A major long-term research project underway at the University of Salford seeks to develop IT modelling capability to support the design of buildings and facilities that are buildable, maintainable, operable, sustainable, accessible, and have properties of acoustic, thermal and business support performance that are of a high standard. Such an IT modelling tool has been the dream of the research community for a long time. Recent advances in technology are beginning to make such a modelling tool feasible.----- Some of the key problems with its further research and development, and with its ultimate implementation, will be the challenges of multiple research and built environment stakeholders sharing a common vision, language and sense of trust. This paper explores these challenges as a set of research issues that underpin the development of appropriate technology to support realisable advances in construction process improvements.
Resumo:
This paper reports the application of multicriteria decision making techniques, PROMETHEE and GAIA, and receptor models, PCA/APCS and PMF, to data from an air monitoring site located on the campus of Queensland University of Technology in Brisbane, Australia and operated by Queensland Environmental Protection Agency (QEPA). The data consisted of the concentrations of 21 chemical species and meteorological data collected between 1995 and 2003. PROMETHEE/GAIA separated the samples into those collected when leaded and unleaded petrol were used to power vehicles in the region. The number and source profiles of the factors obtained from PCA/APCS and PMF analyses were compared. There are noticeable differences in the outcomes possibly because of the non-negative constraints imposed on the PMF analysis. While PCA/APCS identified 6 sources, PMF reduced the data to 9 factors. Each factor had distinctive compositions that suggested that motor vehicle emissions, controlled burning of forests, secondary sulphate, sea salt and road dust/soil were the most important sources of fine particulate matter at the site. The most plausible locations of the sources were identified by combining the results obtained from the receptor models with meteorological data. The study demonstrated the potential benefits of combining results from multi-criteria decision making analysis with those from receptor models in order to gain insights into information that could enhance the development of air pollution control measures.
Resumo:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).
Resumo:
Objective: To systematically review the published evidence of the impact of health information technology (HIT) on the quality of medical and health care specifically clinicians’ adherence to evidence-based guidelines and the corresponding impact this had on patient clinical outcomes. In order to be as inclusive as possible the research examined literature discussing the use of health information technologies and systems in both medical care such as clinical and surgical, and other health care such as allied health and preventive services.----- Design: Systematic review----- Data Sources: Relevant literature was systematically searched on English language studies indexed in MEDLINE and CINAHL(1998 to 2008), Cochrane Library, PubMed, Database of Abstracts of Review of Effectiveness (DARE), Google scholar and other relevant electronic databases. A search for eligible studies (matching the inclusion criteria) was also performed by searching relevant conference proceedings available through internet and electronic databases, as well as using reference lists identified from cited papers.----- Selection criteria: Studies were included in the review if they examined the impact of Electronic Health Record (EHR), Computerised Provider Order-Entry (CPOE), or Decision Support System (DS); and if the primary outcomes of the studies were focused on the level of compliance with evidence-based guidelines among clinicians. Measures could be either changes in clinical processes resulting from a change of the providers’ behaviour or specific patient outcomes that demonstrated the effectiveness of a particular treatment given by providers. ----- Methods: Studies were reviewed and summarised in tabular and text form. Due to heterogeneity between studies, meta-analysis was not performed.----- Results: Out of 17 studies that assessed the impact of health information technology on health care practitioners’ performance, 14 studies revealed a positive improvement in relation to their compliance with evidence-based guidelines. The primary domain of improvement was evident from preventive care and drug ordering studies. Results from the studies that included an assessment for patient outcomes however, were insufficient to detect either clinically or statistically important improvements as only a small proportion of these studies found benefits. For instance, only 3 studies had shown positive improvement, while 5 studies revealed either no change or adverse outcomes.----- Conclusion: Although the number of included studies was relatively small for reaching a conclusive statement about the effectiveness of health information technologies and systems on clinical care, the results demonstrated consistency with other systematic reviews previously undertaken. Widescale use of HIT has been shown to increase clinician’s adherence to guidelines in this review. Therefore, it presents ongoing opportunities to maximise the uptake of research evidence into practice for health care organisations, policy makers and stakeholders.
Resumo:
The multi-criteria decision making methods, Preference METHods for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA), and the two-way Positive Matrix Factorization (PMF) receptor model were applied to airborne fine particle compositional data collected at three sites in Hong Kong during two monitoring campaigns held from November 2000 to October 2001 and November 2004 to October 2005. PROMETHEE/GAIA indicated that the three sites were worse during the later monitoring campaign, and that the order of the air quality at the sites during each campaign was: rural site > urban site > roadside site. The PMF analysis on the other hand, identified 6 common sources at all of the sites (diesel vehicle, fresh sea salt, secondary sulphate, soil, aged sea salt and oil combustion) which accounted for approximately 68.8 ± 8.7% of the fine particle mass at the sites. In addition, road dust, gasoline vehicle, biomass burning, secondary nitrate, and metal processing were identified at some of the sites. Secondary sulphate was found to be the highest contributor to the fine particle mass at the rural and urban sites with vehicle emission as a high contributor to the roadside site. The PMF results are broadly similar to those obtained in a previous analysis by PCA/APCS. However, the PMF analysis resolved more factors at each site than the PCA/APCS. In addition, the study demonstrated that combined results from multi-criteria decision making analysis and receptor modelling can provide more detailed information that can be used to formulate the scientific basis for mitigating air pollution in the region.
Resumo:
The traditional searching method for model-order selection in linear regression is a nested full-parameters-set searching procedure over the desired orders, which we call full-model order selection. On the other hand, a method for model-selection searches for the best sub-model within each order. In this paper, we propose using the model-selection searching method for model-order selection, which we call partial-model order selection. We show by simulations that the proposed searching method gives better accuracies than the traditional one, especially for low signal-to-noise ratios over a wide range of model-order selection criteria (both information theoretic based and bootstrap-based). Also, we show that for some models the performance of the bootstrap-based criterion improves significantly by using the proposed partial-model selection searching method. Index Terms— Model order estimation, model selection, information theoretic criteria, bootstrap 1. INTRODUCTION Several model-order selection criteria can be applied to find the optimal order. Some of the more commonly used information theoretic-based procedures include Akaike’s information criterion (AIC) [1], corrected Akaike (AICc) [2], minimum description length (MDL) [3], normalized maximum likelihood (NML) [4], Hannan-Quinn criterion (HQC) [5], conditional model-order estimation (CME) [6], and the efficient detection criterion (EDC) [7]. From a practical point of view, it is difficult to decide which model order selection criterion to use. Many of them perform reasonably well when the signal-to-noise ratio (SNR) is high. The discrepancies in their performance, however, become more evident when the SNR is low. In those situations, the performance of the given technique is not only determined by the model structure (say a polynomial trend versus a Fourier series) but, more importantly, by the relative values of the parameters within the model. This makes the comparison between the model-order selection algorithms difficult as within the same model with a given order one could find an example for which one of the methods performs favourably well or fails [6, 8]. Our aim is to improve the performance of the model order selection criteria in cases where the SNR is low by considering a model-selection searching procedure that takes into account not only the full-model order search but also a partial model order search within the given model order. Understandably, the improvement in the performance of the model order estimation is at the expense of additional computational complexity.