24 results for Bootstrap paramétrique
in University of Queensland eSpace - Australia
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
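The difference between internal and external cross-validation described in this abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are pure noise (so the true error rate is 50%), genes are ranked by the absolute difference in class means, and the prediction rule is a nearest-centroid classifier. The function names are hypothetical.

```python
import random


def select_genes(X, y, rows, k):
    """Rank genes by |difference in class means| over `rows`; return the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [X[i][g] for i in rows if y[i] == 0]
        b = [X[i][g] for i in rows if y[i] == 1]
        scores.append((abs(sum(a) / len(a) - sum(b) / len(b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def centroid_predict(X, y, train, genes, i):
    """Nearest-centroid rule on `genes`, trained on `train`, applied to sample i."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r in train if y[r] == label]
        dist = 0.0
        for g in genes:
            centre = sum(X[r][g] for r in members) / len(members)
            dist += (X[i][g] - centre) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_error(X, y, k_genes, n_folds, external, rng):
    """Cross-validated error; gene selection is redone per fold only if `external`."""
    all_rows = list(range(len(y)))
    order = all_rows[:]
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    genes_all = select_genes(X, y, all_rows, k_genes)  # biased: sees every sample
    errors = 0
    for test in folds:
        train = [i for i in all_rows if i not in test]
        genes = select_genes(X, y, train, k_genes) if external else genes_all
        errors += sum(centroid_predict(X, y, train, genes, i) != y[i] for i in test)
    return errors / len(y)


# Hypothetical data: 20 samples, 1000 noise "genes", labels unrelated to expression.
data_rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(0, 1) for _ in range(1000)] for _ in range(20)]

internal_error = cv_error(X, y, 10, 10, external=False, rng=random.Random(1))
external_error = cv_error(X, y, 10, 10, external=True, rng=random.Random(1))
print("internal CV error:", internal_error)
print("external CV error:", external_error)
```

Because the labels carry no signal, an honest estimate should be near 0.5; the internal estimate is typically far lower, which is the selection bias the abstract refers to.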
Abstract:
Most populations and some species of ticks of the genera Boophilus (5 spp.) and Rhipicephalus (ca. 75 spp.) cannot be distinguished phenotypically. Moreover, there is doubt about the validity of species in these genera. I studied the entire second internal transcribed spacer (ITS 2) rRNA of 16 populations of rhipicephaline ticks to address these problems: Boophilus microplus from Australia, Kenya, South Africa and Brazil (4 populations); Boophilus decoloratus from Kenya; Rhipicephalus appendiculatus from Kenya, Zimbabwe and Zambia (7 populations); Rhipicephalus zambeziensis from Zimbabwe (3 populations); and Rhipicephalus evertsi from Kenya. Each of the 16 populations had a unique ITS 2, but most of the nucleotide variation occurred among species and genera. ITS 2 rRNA can be used to distinguish the populations and species of Boophilus and Rhipicephalus studied here. Little support was found for the hypothesis that B. microplus from Australia and South Africa are different species. ITS 2 appears useful for phylogenetic inference in the Rhipicephalinae because in genetic distance, maximum likelihood, and maximum parsimony analyses, most branches leading to species had >95% bootstrap support. Rhipicephalus appendiculatus and R. zambeziensis are closely related, yet their ITS 2 sequences could be distinguished unambiguously. This lends weight to a previous proposal that Rhipicephalus sanguineus and Rhipicephalus turanicus, and Rhipicephalus pumilio and Rhipicephalus camicasi, respectively, are conspecific, because each of these pairs of species had identical sequences for ca. 250 bp of ITS 2 rRNA.
Abstract:
There is no morphological synapomorphy for the disparate digeneans, the Fellodistomidae Nicoll, 1909. Although all known life-cycles of the group include bivalves as first intermediate hosts, there is no convincing morphological synapomorphy that can be used to unite the group. Sequences from the V4 region of small subunit (18S) rRNA genes were used to infer phylogenetic relationships among 13 species of Fellodistomidae from four subfamilies and eight species from seven other digenean families: Bivesiculidae; Brachylaimidae; Bucephalidae; Gorgoderidae; Gymnophallidae; Opecoelidae; and Zoogonidae. Outgroup comparison was made initially with an aspidogastrean. Various species from the other digenean families were used as outgroups in subsequent analyses. Three methods of analysis indicated polyphyly of the Fellodistomidae and at least two independent radiations of the subfamilies, such that they were more closely associated with other digeneans than to each other. The Tandanicolinae was monophyletic (100% bootstrap support) and was weakly associated with the Gymnophallidae (<50-55% bootstrap support). Monophyly of the Baccigerinae was supported with 78-87% bootstrap support, and monophyly of the Zoogonidae + Baccigerinae received 77-86% support. The remaining fellodistomid species, Fellodistomum fellis, F. agnotum and Coomera brayi (Fellodistominae) plus Proctoeces maculatus and Complexobursa sp. (Proctoecinae), formed a separate clade with 74-92% bootstrap support. On the basis of molecular, morphological and life-cycle evidence, the subfamilies Baccigerinae and Tandanicolinae are removed from the Fellodistomidae and promoted to familial status. The Baccigerinae is promoted under the senior synonym Faustulidae Poche, 1926, and the Echinobrevicecinae Dronen, Blend & McEachran, 1994 is synonymised with the Faustulidae. Consequently, species that were formerly in the Fellodistomidae are now distributed in three families: Fellodistomidae; Faustulidae (syn. Baccigerinae Yamaguti, 1954); and Tandanicolidae Johnston, 1927. We infer that the use of bivalves as intermediate hosts by this broad range of families indicates multiple host-switching events within the radiation of the Digenea.
Abstract:
Hemichordates were traditionally allied to the chordates, but recent molecular analyses have suggested that hemichordates are a sister group to the echinoderms, a relationship that has important consequences for the interpretation of the evolution of deuterostome body plans. However, the molecular phylogenetic analyses to date have not provided robust support for the hemichordate + echinoderm clade. We use a maximum likelihood framework, including the parametric bootstrap, to reanalyze DNA data from complete mitochondrial genomes and nuclear 18S rRNA. This approach provides the first statistically significant support for the hemichordate + echinoderm clade from molecular data. This grouping implies that the ancestral deuterostome had features that included an adult with a pharynx and a dorsal nerve cord and an indirectly developing dipleurula-like larva.
Abstract:
Matrix population models, elasticity analysis and loop analysis can potentially provide powerful techniques for the analysis of life histories. Data from a capture-recapture study on a population of southern highland water skinks (Eulamprus tympanum) were used to construct a matrix population model. Errors in elasticities were calculated by using the parametric bootstrap technique. Elasticity and loop analyses were then conducted to identify the life history stages most important to fitness. The same techniques were used to investigate the relative importance of fast versus slow growth, and rapid versus delayed reproduction. Mature water skinks were long-lived, but there was high immature mortality. The most sensitive life history stage was the subadult stage. It is suggested that life history evolution in E. tympanum may be strongly affected by predation, particularly by birds. Because our population declined over the study, slow growth and delayed reproduction were the optimal life history strategies over this period. Although the techniques of evolutionary demography provide a powerful approach for the analysis of life histories, there are formidable logistical obstacles in gathering enough high-quality data for robust estimates of the critical parameters.
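As a rough illustration of how a parametric bootstrap can attach standard errors to elasticities, the sketch below uses a hypothetical two-stage matrix model (juveniles and adults), not the model fitted in the study. The survival counts, the fecundity value (treated as known), and all function names are assumptions. Survival counts are redrawn from binomial distributions with the fitted probabilities, and the elasticities are recomputed for each draw.

```python
import math
import random
import statistics


def growth_rate_and_elasticities(s_juv, s_ad, fecundity):
    """Dominant eigenvalue and elasticities of the matrix [[0, F], [s_juv, s_ad]]."""
    lam = (s_ad + math.sqrt(s_ad ** 2 + 4 * fecundity * s_juv)) / 2
    # Right eigenvector (F, lam), left eigenvector (s_juv, lam); their inner product:
    denom = fecundity * s_juv + lam ** 2
    elasticities = {
        "fecundity": fecundity * s_juv / denom,
        "juvenile_survival": fecundity * s_juv / denom,
        "adult_survival": s_ad * lam / denom,
    }
    return lam, elasticities


def draw_binomial(n, p, rng):
    """One binomial(n, p) draw as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def parametric_bootstrap_se(n_juv, x_juv, n_ad, x_ad, fecundity, n_boot, rng):
    """Standard errors of elasticities from binomial resampling of survival counts."""
    p_juv, p_ad = x_juv / n_juv, x_ad / n_ad
    draws = {"fecundity": [], "juvenile_survival": [], "adult_survival": []}
    for _ in range(n_boot):
        s_juv = draw_binomial(n_juv, p_juv, rng) / n_juv
        s_ad = draw_binomial(n_ad, p_ad, rng) / n_ad
        _, e = growth_rate_and_elasticities(s_juv, s_ad, fecundity)
        for name, value in e.items():
            draws[name].append(value)
    return {name: statistics.stdev(values) for name, values in draws.items()}


# Hypothetical capture-recapture summary: 36 of 120 juveniles and 64 of 80 adults survive.
lam_hat, elasticity_hat = growth_rate_and_elasticities(36 / 120, 64 / 80, 1.5)
elasticity_se = parametric_bootstrap_se(120, 36, 80, 64, 1.5, 500, random.Random(0))
for name in elasticity_hat:
    print(name, round(elasticity_hat[name], 3), "+/-", round(elasticity_se[name], 3))
```

The elasticities of a projection matrix sum to one, which gives a quick sanity check on the point estimates.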
Abstract:
Giardia isolates from eight horses from New York State (NY), USA and two horses from Western Australia (WA) were genetically characterized at the SSU-rDNA and triose-phosphate isomerase (TPI) genes. Phylogenetic analysis of the TPI gene provided strong support for the placement of both isolates of Giardia from horses in WA and a single isolate from a horse in NY within the assemblage AI genotype of G. duodenalis. Another two isolates from horses in NY placed within the assemblage AII genotype of G. duodenalis. Phylogenetic analysis of the TPI gene also provided strong bootstrap support for the placement of four G. duodenalis isolates from horses in NY into a potentially host-specific sub-assemblage of assemblage BIV. The results of this study are consistent with previous studies showing that assemblages AI and AII of G. duodenalis provide the greatest potential zoonotic risk to humans. Horses may therefore constitute a potential source for human infection with Giardia either directly or via watersheds. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
In this paper we consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimation of the standard errors are considered: the standard information-based method and the computationally-intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.
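The bootstrap route to standard errors can be sketched for the simplest case, a two-component univariate normal mixture. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the data are simulated, resampling is nonparametric (observations drawn with replacement), EM runs for a fixed number of iterations, and label switching is handled crudely by sorting the fitted means. All function names are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_component_em(data, n_iter=25):
    """Fit a two-component normal mixture by EM; return the component means, sorted."""
    ordered = sorted(data)
    half = len(ordered) // 2
    mu = [statistics.fmean(ordered[:half]), statistics.fmean(ordered[half:])]
    sigma = [statistics.pstdev(data)] * 2
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing proportions, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return sorted(mu)


def bootstrap_se_of_means(data, n_boot, rng):
    """Refit the mixture on resampled data; the spread of the refits estimates the SE."""
    refits = [fit_two_component_em(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return [statistics.stdev([r[k] for r in refits]) for k in range(2)]


# Simulated data (an assumption): 100 draws from N(0, 1) and 100 from N(5, 1).
rng = random.Random(0)
sample = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
mean_estimates = fit_two_component_em(sample)
mean_se = bootstrap_se_of_means(sample, 60, rng)
print("means:", mean_estimates)
print("bootstrap standard errors:", mean_se)
```

A parametric variant would instead simulate each bootstrap sample from the fitted mixture; the refitting and summary steps stay the same.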
Abstract:
The current classification of the Monocotylidae (Monogenea) is based on a phylogeny generated from morphological characters. The present study tests the morphological phylogenetic hypothesis using molecular methods. Sequences from domains C2 and D1 and the partial domains C1 and D2 from the 28S rDNA gene for 26 species of monocotylids from six of the seven subfamilies were used. Trees were generated using maximum parsimony, neighbour joining and maximum likelihood algorithms. The maximum parsimony tree, with branches showing less than 70% bootstrap support collapsed, had a topology identical to that obtained using the maximum likelihood analysis. The neighbour joining tree, with branches showing less than 70% support collapsed, differed only in its placement of Heterocotyle capricornensis as the sister group to the Decacotylinae clade. The molecular tree largely supports the subfamilies established using morphological characters. Differences are primarily in how the subfamilies are related to each other. The monophyly of the Calicotylinae and Merizocotylinae and their sister group relationship is supported by high bootstrap values in all three methods, but relationships within the Merizocotylinae are unclear. Merizocotyle is paraphyletic and our data suggest that Mycteronastes and Thaumatocotyle, which were synonymized with Merizocotyle after the morphological cladistic analysis, should perhaps be resurrected as valid genera. The monophyly of the Monocotylinae and Decacotylinae is also supported by high bootstrap values. The Decacotylinae, which was considered previously to be the sister group to the Calicotylinae plus Merizocotylinae, is grouped in an unresolved polychotomy with the Monocotylinae and members of the Heterocotylinae. According to our molecular data, the Heterocotylinae is paraphyletic. Molecular data support a sister group relationship between Troglocephalus rhinobatidis and Neoheterocotyle rhinobatidis to the exclusion of the other species of Neoheterocotyle, and recognition of Troglocephalus renders Neoheterocotyle paraphyletic. We propose Troglocephalus incertae sedis. An updated classification and full species list of the Monocotylidae is provided. (C) 2001 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.
Abstract:
Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
Abstract:
Most of the modern developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitute the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates (although the methods are not restricted to binary response data).
Abstract:
Background: Reliability or validity studies are important for the evaluation of measurement error in dietary assessment methods. An approach to validation known as the method of triads uses triangulation techniques to calculate the validity coefficient of a food-frequency questionnaire (FFQ). Objective: To assess the validity of FFQ estimates of carotenoid and vitamin E intake against serum biomarker measurements and weighed food records (WFRs), by applying the method of triads. Design: The study population was a sub-sample of adult participants in a randomised controlled trial of beta-carotene and sunscreen in the prevention of skin cancer. Dietary intake was assessed by a self-administered FFQ and a WFR. Nonfasting blood samples were collected and plasma analysed for five carotenoids (alpha-carotene, beta-carotene, beta-cryptoxanthin, lutein, lycopene) and vitamin E. Correlation coefficients were calculated between each of the dietary methods and the validity coefficient was calculated using the method of triads. The 95% confidence intervals for the validity coefficients were estimated using bootstrap sampling. Results: The validity coefficients of the FFQ were highest for alpha-carotene (0.85) and lycopene (0.62), followed by beta-carotene (0.55) and total carotenoids (0.55), while the lowest validity coefficient was for lutein (0.19). The method of triads could not be used for beta-cryptoxanthin and vitamin E, as one of the three underlying correlations was negative. Conclusions: Results were similar to other studies of validity using biomarkers and the method of triads. For many dietary factors, the upper limit of the validity coefficients was less than 0.5 and therefore only strong relationships between dietary exposure and disease will be detected.
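A minimal sketch of the calculation follows, assuming the usual form of the method of triads, in which the FFQ validity coefficient is sqrt(r_QR * r_QB / r_RB), where Q is the FFQ, R the weighed food record and B the biomarker. The data are simulated with independent measurement errors (an assumption of the method), the confidence interval is a simple percentile bootstrap over participants, and all names are hypothetical. The function returns None when any correlation is non-positive, the situation the abstract reports for two nutrients.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ffq_validity(rows):
    """Validity coefficient of the FFQ from (ffq, record, biomarker) triples."""
    q, r, b = zip(*rows)
    r_qr, r_qb, r_rb = pearson(q, r), pearson(q, b), pearson(r, b)
    if min(r_qr, r_qb, r_rb) <= 0:
        return None  # the method of triads is not applicable
    return math.sqrt(r_qr * r_qb / r_rb)


def bootstrap_ci(rows, n_boot, rng, level=0.95):
    """Percentile interval from resampling participants with replacement."""
    values = []
    for _ in range(n_boot):
        v = ffq_validity(rng.choices(rows, k=len(rows)))
        if v is not None:
            values.append(v)
    values.sort()
    lower = values[int((1 - level) / 2 * len(values))]
    upper = values[int((1 + level) / 2 * len(values)) - 1]
    return lower, upper


# Simulated participants (an assumption): each method measures true intake with noise.
rng = random.Random(0)
rows = []
for _ in range(150):
    truth = rng.gauss(0, 1)
    rows.append((truth + rng.gauss(0, 0.75), truth + rng.gauss(0, 0.5), truth + rng.gauss(0, 1)))

validity = ffq_validity(rows)
ci_lower, ci_upper = bootstrap_ci(rows, 500, rng)
print("validity coefficient:", validity, "95% CI:", (ci_lower, ci_upper))
```

Estimates above 1 can occur by sampling variation; they are often truncated at 1 when reported, which this sketch does not do.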
Abstract:
An investigation was conducted to evaluate the impact of experimental designs and spatial analyses (single-trial models) of the response to selection for grain yield in the northern grains region of Australia (Queensland and northern New South Wales). Two sets of multi-environment experiments were considered. One set, based on 33 trials conducted from 1994 to 1996, was used to represent the testing system of the wheat breeding program and is referred to as the multi-environment trial (MET). The second set, based on 47 trials conducted from 1986 to 1993, sampled a more diverse set of years and management regimes and was used to represent the target population of environments (TPE). There were 18 genotypes in common between the MET and TPE sets of trials. From indirect selection theory, the phenotypic correlation coefficient between the MET and TPE single-trial adjusted genotype means [r(p(MT))] was used to determine the effect of the single-trial model on the expected indirect response to selection for grain yield in the TPE based on selection in the MET. Five single-trial models were considered: randomised complete block (RCB), incomplete block (IB), spatial analysis (SS), spatial analysis with a measurement error (SSM) and a combination of spatial analysis and experimental design information to identify the preferred (PF) model. Bootstrap-resampling methodology was used to construct multiple MET data sets, ranging in size from 2 to 20 environments per MET sample. The size and environmental composition of the MET and the single-trial model influenced the r(p(MT)). On average, the PF model resulted in a higher r(p(MT)) than the IB, SS and SSM models, which were in turn superior to the RCB model for MET sizes based on fewer than ten environments. For METs based on ten or more environments, the r(p(MT)) was similar for all single-trial models.
Abstract:
The bispectrum and third-order moment can be viewed as equivalent tools for testing for the presence of nonlinearity in stationary time series. This is because the bispectrum is the Fourier transform of the third-order moment. An advantage of the bispectrum is that its estimator comprises terms that are asymptotically independent at distinct bifrequencies under the null hypothesis of linearity. An advantage of the third-order moment is that its values in any subset of joint lags can be used in the test, whereas when using the bispectrum the entire (or truncated) third-order moment is required to construct the Fourier transform. In this paper, we propose a test for nonlinearity based upon the estimated third-order moment. We use the phase scrambling bootstrap method to give a nonparametric estimate of the variance of our test statistic under the null hypothesis. Using a simulation study, we demonstrate that the test obtains its target significance level, with large power, when compared to an existing standard parametric test that uses the bispectrum. Further we show how the proposed test can be used to identify the source of nonlinearity due to interactions at specific frequencies. We also investigate implications for heuristic diagnosis of nonstationarity.
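The phase scrambling idea can be sketched as follows: the Fourier amplitudes of the series are kept, the phases are replaced with random ones (conjugate-symmetric so the result is real), and the inverse transform gives a surrogate series with the same power spectrum but no phase structure. The spread of a third-order moment over many surrogates then estimates its variability under the null. This is an illustration under assumptions rather than the authors' test: the AR(1) series, the single lag pair (1, 1), the naive O(n^2) transform and the function names are all hypothetical choices.

```python
import cmath
import math
import random
import statistics


def dft(x):
    """Naive discrete Fourier transform, adequate for short series."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_scramble(spectrum, rng):
    """Surrogate series with the original amplitudes and uniformly random phases."""
    n = len(spectrum)
    scrambled = list(spectrum)  # index 0 (and n/2 for even n) are left unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0, 2 * math.pi)
        scrambled[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        scrambled[n - k] = scrambled[k].conjugate()  # keeps the series real
    return [value.real for value in inverse_dft(scrambled)]


def third_moment(x, r, s):
    """Sample third-order moment at non-negative lags (r, s)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + r] - mean) * (x[t + s] - mean)
               for t in range(n - max(r, s))) / n


# Hypothetical linear series: a Gaussian AR(1) process of length 64.
rng = random.Random(0)
series = [0.0]
for _ in range(63):
    series.append(0.5 * series[-1] + rng.gauss(0, 1))

spectrum = dft(series)
surrogates = [phase_scramble(spectrum, rng) for _ in range(100)]
null_sd = statistics.stdev(third_moment(s, 1, 1) for s in surrogates)
z_score = third_moment(series, 1, 1) / null_sd  # moment is zero under a Gaussian linear null
print("standardised third-order moment at lags (1, 1):", z_score)
```

Because only the phases change, each surrogate has the same mean and the same total power as the original series.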