950 results for akaike information criterion
Abstract:
In this study, the Schwarz Information Criterion (SIC) is applied to detect change-points in time series of surface water quality variables. The change-point analysis detected change-points in both the mean and the variance of the series under study. Time variations in environmental data are complex, and they can hinder the identification of change-points when traditional models are applied to this type of problem. The assumptions of normality and of uncorrelated observations do not hold for some of the time series, so a simulation study is carried out to evaluate the methodology's performance when applied to non-normal and/or temporally correlated data.
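As a rough sketch of the single change-point-in-mean case described above (assuming a Gaussian series, a pooled-MLE variance and the standard SIC penalty; the simulated data and the parameter counts are illustrative only, not the study's formulation):

```python
import numpy as np

def neg2ll_gauss(segments):
    """-2 * Gaussian log-likelihood with a separate mean per segment and a
    common pooled-MLE variance."""
    resid = np.concatenate([s - s.mean() for s in segments])
    n = resid.size
    sigma2 = np.mean(resid ** 2)
    return n * np.log(2.0 * np.pi * sigma2) + n

def sic(neg2ll, p, n):
    """Schwarz Information Criterion: -2 log L_hat + p * log(n)."""
    return neg2ll + p * np.log(n)

def detect_mean_change(x, min_seg=5):
    """Single change-point in the mean, chosen by minimising SIC.
    Returns (k, sic_at_k) or (None, sic_null) if no change is supported."""
    n = len(x)
    sic_null = sic(neg2ll_gauss([x]), p=2, n=n)               # one mean + one variance
    sic_k = {k: sic(neg2ll_gauss([x[:k], x[k:]]), p=3, n=n)   # two means + one variance
             for k in range(min_seg, n - min_seg)}
    k_best = min(sic_k, key=sic_k.get)
    if sic_k[k_best] < sic_null:
        return k_best, sic_k[k_best]
    return None, sic_null

# Illustrative use on simulated data with a shift in mean at t = 120.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.5, 1.0, 80)])
print(detect_mean_change(x))
```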
Abstract:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To predict cross-protection we must understand the antigenic variability within a virus serotype and its distinct lineages, and identify the antigenic residues and evolutionary changes that cause this variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein that are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or l1 regularisation) on both synthetic and viral datasets. We also test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications as well as an alternative to the spike and slab prior, the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence relative to the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we also provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method further takes into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and improves the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate that biWAIC performs comparably to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, the eSABRE method offers a computational improvement that allows it to be used on these datasets.
The results of the eSABRE method show that it can be used in a fully automatic manner to identify a large number of antigenic residues across a variety of antigenic sites in two Influenza serotypes, as well as to predict a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
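biWAIC itself is specific to the thesis; as a point of reference, a minimal sketch of the standard WAIC computed from a matrix of pointwise posterior log-likelihoods is given below (the array shape and names are assumptions, not the thesis code). Each candidate set of random-effects factors would yield such an array, and the set with the smallest value would be preferred.

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """Standard WAIC from an (S, N) array of pointwise log-likelihoods:
    S posterior draws by N observations."""
    S = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood.
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
    # Effective number of parameters: posterior variance of the pointwise log-likelihood.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```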
Abstract:
Introduction: Prediction scoring systems have been developed to measure disease severity and patient prognosis in the intensive care unit. These measures are useful for clinical decision-making, standardization of research, and comparison of the quality of care of the critically ill patient. Materials and methods: An analytical observational cohort study in which the clinical records of 283 oncology patients admitted to the intensive care unit (ICU) from January 2014 to January 2016 were reviewed. The probability of mortality was estimated for each patient with the APACHE IV and MPM II prognostic scores; logistic regression was performed with the predictor variables from which each model was derived in its original study, calibration and discrimination were determined, and the Akaike (AIC) and Bayesian (BIC) information criteria were calculated. Results: In the performance evaluation of the prognostic scores, APACHE IV showed greater predictive ability (AUC = 0.95) than MPM II (AUC = 0.78); both models showed adequate calibration by the Hosmer-Lemeshow statistic, for APACHE IV (p = 0.39) and for MPM II (p = 0.99). The ΔBIC was 2.9, which is positive evidence against APACHE IV. The AIC statistic was lower for APACHE IV, indicating that it is the model with the best fit to the data. Conclusions: APACHE IV performs well in predicting the mortality of critically ill patients, including oncology patients. It is therefore a useful tool for the clinician in daily practice, allowing patients with a high probability of mortality to be identified.
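By way of illustration only (simulated data, not the study's cohort or predictor variables), AIC and BIC for two competing logistic regression mortality models can be compared as follows, using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 283                                    # cohort size from the abstract; the data here are simulated
age   = rng.normal(60, 12, n)
score = rng.normal(70, 15, n)              # stand-in for a severity score
logit = -8 + 0.05 * age + 0.06 * score
died  = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_a = sm.add_constant(np.column_stack([age, score]))   # "model A" predictors
X_b = sm.add_constant(score)                            # "model B" predictors
fit_a = sm.Logit(died, X_a).fit(disp=0)
fit_b = sm.Logit(died, X_b).fit(disp=0)

# statsmodels reports AIC = 2k - 2*llf and BIC = k*log(n) - 2*llf directly.
print("Model A:", fit_a.aic, fit_a.bic)
print("Model B:", fit_b.aic, fit_b.bic)
print("Delta BIC:", abs(fit_a.bic - fit_b.bic))   # a difference of 2-6 is conventionally "positive" evidence
```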
Abstract:
Purpose – The purpose of this paper is to examine the use of bid information, including both price and non-price factors, in predicting the bidder's performance. Design/methodology/approach – The practice of the industry was first reviewed. Data on bid evaluation and performance records of the successful bids were then obtained from the Hong Kong Housing Department, the largest housing provider in Hong Kong. This was followed by the development of a radial basis function (RBF) neural network based performance prediction model. Findings – It is found that public clients are more conscientious and include non-price factors in their bid evaluation equations. The input variables used are items of information available at the time of the bid, and the output variable is the project performance score achieved by the successful bidder, recorded while work is in progress. Past project performance score was found to be the most sensitive input variable in predicting future performance. Research limitations/implications – The paper shows the inadequacy of using price alone as the bid award criterion. The need for systematic performance evaluation is also highlighted, as this information is highly instrumental for subsequent bid evaluations. The caveat for this study is that the prediction model was developed from data obtained from a single source. Originality/value – The value of the paper lies in the use of an RBF neural network as the prediction tool, because it can model non-linear functions. This capability avoids the tedious "trial and error" involved in deciding the number of hidden layers to be used in the network model. Keywords: Hong Kong, Construction industry, Neural nets, Modelling, Bid offer spreads. Paper type: Research paper
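A minimal sketch of an RBF-network-style predictor (Gaussian basis functions centred by k-means, followed by a linear readout); the input features and data below are hypothetical stand-ins, not the Housing Department variables or the paper's model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

class SimpleRBFNet:
    """Radial basis function network: k-means centres + Gaussian features + linear output."""
    def __init__(self, n_centres=10, gamma=1.0, alpha=1e-3):
        self.n_centres, self.gamma, self.alpha = n_centres, gamma, alpha

    def _features(self, X):
        # Squared distances to each centre, passed through a Gaussian kernel.
        d2 = ((X[:, None, :] - self.centres_[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        self.centres_ = KMeans(self.n_centres, n_init=10, random_state=0).fit(X).cluster_centers_
        self.readout_ = Ridge(alpha=self.alpha).fit(self._features(X), y)
        return self

    def predict(self, X):
        return self.readout_.predict(self._features(X))

# Hypothetical inputs: bid price ratio, past performance score, tender interval, workload.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 70 + 5 * X[:, 1] - 2 * X[:, 0] + rng.normal(0, 1, 200)   # past performance dominates
model = SimpleRBFNet(n_centres=15, gamma=0.5).fit(X, y)
print(model.predict(X[:3]))
```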
Abstract:
This paper first presents an extended ambiguity resolution model that deals with an ill-posed problem and with constraints among the estimated parameters. In the extended model, a regularization criterion is used instead of traditional least squares in order to better estimate the float ambiguities; the existing models can be derived from this general model. Second, the paper examines the existing ambiguity search methods from four aspects: exclusion of nuisance integer candidates based on the available integer constraints, integer rounding, integer bootstrapping, and integer least-squares estimation. Finally, the paper systematically addresses the similarities and differences between the generalized TCAR and decorrelation methods from both theoretical and practical aspects.
Abstract:
Corneal-height data are typically measured with videokeratoscopes and modeled using a set of orthogonal Zernike polynomials. We address the estimation of the number of Zernike polynomials, which is formalized as a model-order selection problem in linear regression. Classical information-theoretic criteria tend to overestimate the order of the corneal surface model because of the weakness of their penalty functions, while bootstrap-based techniques tend to underestimate it or require extensive processing. In this paper, we propose the efficient detection criterion (EDC), which has the same general form as information-theoretic criteria, as an alternative for estimating the optimal number of Zernike polynomials. We first show, via simulations, that the EDC outperforms a large number of information-theoretic criteria and resampling-based techniques. We then illustrate that using the EDC for real corneas results in models that are in closer agreement with clinical expectations and provides a means for distinguishing normal corneal surfaces from astigmatic and keratoconic surfaces.
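A generic sketch of model-order selection with criteria of the form −2 log L̂ + k·C_n; the EDC family only requires C_n/n → 0 and C_n/log log n → ∞, so the particular penalty C_n = √(n log n) used below, the one-dimensional polynomial basis standing in for the Zernike expansion, and the simulated data are all assumptions:

```python
import numpy as np

def neg2_loglik_ls(y, X):
    """-2 * Gaussian log-likelihood of a least-squares fit (MLE variance)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return n * (np.log(2 * np.pi * sigma2) + 1.0)

def select_order(y, t, max_order, penalty):
    """Pick the model order k minimising -2 log L_hat + (#coefficients) * penalty(n)."""
    n = len(y)
    scores = {}
    for k in range(1, max_order + 1):
        X = np.vander(t, k + 1, increasing=True)   # stand-in basis; Zernike terms in the paper
        scores[k] = neg2_loglik_ls(y, X) + (k + 1) * penalty(n)
    return min(scores, key=scores.get), scores

# Illustrative penalties: AIC-like, BIC-like, and one EDC-admissible choice.
aic_pen = lambda n: 2.0
bic_pen = lambda n: np.log(n)
edc_pen = lambda n: np.sqrt(n * np.log(n))   # satisfies C_n/n -> 0 and C_n/loglog(n) -> inf

rng = np.random.default_rng(3)
t = np.linspace(-1, 1, 300)
y = 1.0 + 0.5 * t - 0.8 * t**3 + rng.normal(0, 0.05, t.size)
print(select_order(y, t, 10, edc_pen)[0])
```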
Abstract:
Background: In the last decade, there has been increasing interest in the health effects of sedentary behavior, which is often assessed using self-report sitting-time questions. The aim of this qualitative study was to document older adults' understanding of sitting-time questions from the International Physical Activity (PA) Questionnaire (IPAQ) and the PA Scale for the Elderly (PASE). Methods: Australian community-dwelling adults aged 65+ years answered the IPAQ and PASE sitting questions in face-to-face semi-structured interviews. IPAQ uses one open-ended question to assess sitting on a weekday in the last 7 days 'at work, at home, while doing coursework and during leisure time'; PASE uses a three-part closed question about daily leisure-time sitting in the last 7 days. Participants expressed their thoughts out loud while answering each question. They were then probed about their responses. Interviews were recorded, transcribed and coded into themes. Results: The mean age of the 28 male and 27 female participants was 73 years (range 65-89). The most frequently reported activity was watching TV. For both questionnaires, many participants had difficulties understanding what activities to report. Some had difficulty understanding what activities should be classified as 'leisure-time sitting'. Some assumed they were being asked to report only the activities provided as examples. Most reported activities they normally do, rather than those performed on a day in the previous week. Participants used a variety of strategies to select 'a day' for which they reported their sitting activities and to calculate sitting time on that day. Therefore, many different ways of estimating sitting time were used. Participants had particular difficulty reporting their daily sitting time when their schedules were not consistent across days. Some participants declared the IPAQ sitting question too difficult to answer. Conclusion: The accuracy of older adults' self-reported sitting time is questionable given the challenges they have in answering sitting-time questions. Their responses to sitting-time questions may be more accurate if our recommendations for clarifying the sitting domains, providing examples relevant to older adults and suggesting strategies for formulating responses are incorporated. Future quantitative studies should include objective criterion measures to assess the validity and reliability of these questions.
Abstract:
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we provide results for average reward BMDPs. We establish a fundamental relationship between the discounted and the average reward problems, prove the existence of Blackwell optimal policies and, for both notions of optimality, derive algorithms that converge to the optimal value function.
Abstract:
Evaluating the validity of formative variables has presented ongoing challenges for researchers. In this paper we use global criterion measures to compare and critically evaluate two alternative formative measures of System Quality. One model is based on the ISO-9126 software quality standard, and the other is based on a leading information systems research model. We find that despite both models having a strong provenance, many of the items appear to be non-significant in our study. We examine the implications of this by evaluating the quality of the criterion variables we used, and the performance of PLS when evaluating formative models with a large number of items. We find that our respondents had difficulty distinguishing between global criterion variables measuring different aspects of overall System Quality. Also, because formative indicators "compete with one another" in PLS, it may be difficult to develop a set of measures that are all significant for a complex formative construct with a broad scope and a large number of items. Overall, we cautiously suggest that both sets of measures are valid and largely equivalent, although questions remain about the measures, the use of criterion variables, and the use of PLS for this type of model evaluation.
Abstract:
The rapid increase in genome sequence information has necessitated the annotation of functional elements, particularly those occurring in the non-coding regions, in their genomic context. The promoter region is the key regulatory region that enables a gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence, in silico identification of promoters is crucial in order to guide experimental work and to pinpoint the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free-energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using the in-house developed tool 'PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivities of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited dataset of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and a precision of 49% were obtained.
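A schematic of the underlying idea of free-energy-based promoter prediction, not PromPredict itself; the window width, step size, ΔG values and threshold below are placeholders rather than the published parameters:

```python
# Schematic only: the dinucleotide free-energy values are placeholders, not the
# nearest-neighbour thermodynamic parameters used by PromPredict.
DG = {dinuc: -1.0 for dinuc in
      ("AA", "AT", "AG", "AC", "TA", "TT", "TG", "TC",
       "GA", "GT", "GG", "GC", "CA", "CT", "CG", "CC")}
DG.update({"GC": -2.3, "CG": -2.2, "GG": -1.8, "CC": -1.8})  # GC-rich steps more stable (illustrative)

def window_energy(seq, start, width=100):
    """Average dinucleotide free energy over a window (more negative = more stable)."""
    window = seq[start:start + width]
    steps = [window[i:i + 2] for i in range(len(window) - 1)]
    return sum(DG.get(s, -1.0) for s in steps) / len(steps)  # unknown bases get a neutral value

def predict_promoters(seq, threshold, width=100, step=10):
    """Flag window starts whose average energy lies above (is less stable than)
    a GC-content-dependent threshold."""
    hits = []
    for start in range(0, len(seq) - width, step):
        if window_energy(seq, start, width) > threshold:
            hits.append(start)
    return hits
```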
Abstract:
Guo and Nixon proposed a feature selection method based on maximizing I(x; Y), the multidimensional mutual information between the feature vector x and the class variable Y. Because computing I(x; Y) can be difficult in practice, Guo and Nixon proposed an approximation of I(x; Y) as the criterion for feature selection. We show that Guo and Nixon's criterion originates from approximating the joint probability distributions in I(x; Y) by second-order product distributions. We remark on the limitations of this approximation and discuss computationally attractive alternatives for computing I(x; Y).
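For concreteness, the exact empirical computation of I(x; Y) for a small set of discrete features is sketched below (the second-order product-distribution approximation itself is not reproduced); it illustrates why the exact computation becomes impractical as the number of features grows:

```python
import numpy as np
from collections import Counter

def joint_mutual_information(X, y):
    """Exact empirical I(x; Y) in bits for discrete features: each row of X is
    treated as one joint symbol. Feasible only for a handful of features, since
    the number of joint cells grows exponentially with dimension."""
    n = len(y)
    x_sym = [tuple(row) for row in X]
    p_x, p_y, p_xy = Counter(x_sym), Counter(y), Counter(zip(x_sym, y))
    mi = 0.0
    for (xs, ys), c in p_xy.items():
        p_joint = c / n
        # p_joint / (p(x) * p(y)) written with raw counts to avoid rounding twice.
        mi += p_joint * np.log2(p_joint * n * n / (p_x[xs] * p_y[ys]))
    return mi

# Illustrative use: one noisy copy of the class label plus one irrelevant feature.
rng = np.random.default_rng(4)
y = rng.integers(0, 2, 500)
noisy = np.where(rng.random(500) < 0.1, 1 - y, y)
X = np.column_stack([noisy, rng.integers(0, 3, 500)])
print(joint_mutual_information(X, y))
```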
Abstract:
The research question of this thesis was how knowledge can be managed with information systems. Information systems can support, but not replace, knowledge management. Systems can mainly store epistemic organisational knowledge included in content, and process data and information. Some additional value can be achieved by adding communication technology to the systems; not all communication, however, can be managed. A new layer between communication and manageable information was named knowformation. The knowledge management literature was surveyed, together with species of information from philosophy, physics, communication theory, and information systems science. Positivism, post-positivism, and critical theory were studied, but knowformation in extended organisational memory appeared to be socially constructed. A memory management model of an extended enterprise (M3.exe) and the knowformation concept were the findings of iterative case studies covering data, information and knowledge management systems. The cases ranged from groups to the extended organisation. Systems were investigated, and administrators, users (knowledge workers) and managers were interviewed. The model building required alternative sets of data, information and knowledge instead of the traditional pyramid, and the explicit-tacit dichotomy was also reconsidered. As human knowledge is the final aim of all data and information in the systems, the distinction between management of information and management of people was harmonised. Information systems were classified as the core of organisational memory. In practice, the content of the systems lies between communication and presentation. Firstly, the epistemic criterion of knowledge is required neither in the knowledge management literature nor of the content of the systems. Secondly, systems deal mostly with containers, whereas the knowledge management literature deals mostly with applied knowledge. The construction of reality based on system content and communication also supports the knowformation concept. Knowformation belongs to the memory management model of an extended enterprise (M3.exe), which is divided into horizontal and vertical key dimensions. Vertically, processes deal with content, which can be managed, whereas communication can only be supported, mainly by infrastructure. Horizontally, the right-hand side of the model contains systems and the left-hand side content, which should be independent of each other. A strategy based on the model was defined.
Abstract:
Recently, Guo and Xia gave sufficient conditions for an STBC to achieve full diversity when a PIC (Partial Interference Cancellation) or a PIC-SIC (PIC with Successive Interference Cancellation) decoder is used at the receiver. In this paper, we give alternative conditions for an STBC to achieve full diversity with PIC and PIC-SIC decoders, which are equivalent to Guo and Xia's conditions but are much easier to check. Using these conditions, we construct a new class of full-diversity PIC-SIC decodable codes, which contain the Toeplitz codes and a family of codes recently proposed by Zhang, Xu et al. as proper subclasses. With the help of the new criteria, we also show that a class of PIC-SIC decodable codes recently proposed by Zhang, Shi et al. can be decoded with much lower complexity than reported, without compromising full diversity.
Abstract:
For any n_t transmit, n_r receive antenna (n_t x n_r) multiple-input multiple-output (MIMO) system in a quasi-static Rayleigh fading environment, it was shown by Elia et al. that linear space-time block code schemes (LSTBC schemes) that have the nonvanishing determinant (NVD) property are diversity-multiplexing gain tradeoff (DMT)-optimal for arbitrary values of n_r if they have a code rate of n_t complex dimensions per channel use. However, for asymmetric MIMO systems (where n_r < n_t), with the exception of a few LSTBC schemes, it is unknown whether general LSTBC schemes with NVD and a code rate of n_r complex dimensions per channel use are DMT-optimal. In this paper, an enhanced sufficient criterion for any STBC scheme to be DMT-optimal is obtained, and using this criterion, it is established that any LSTBC scheme with NVD and a code rate of min{n_t, n_r} complex dimensions per channel use is DMT-optimal. This result settles the DMT optimality of several well-known, low-ML-decoding-complexity LSTBC schemes for certain asymmetric MIMO systems.
Abstract:
We consider a discrete-time partially observable zero-sum stochastic game with the average payoff criterion. We study the game using an equivalent completely observable game. We show that the game has a value and present a pair of optimal strategies for the two players.