937 results for likelihood-based inference
Abstract:
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
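The role of the hidden Markov model in the abstract above can be illustrated with a small sketch. It is not the authors' Metropolis-within-Gibbs sampler: it holds all parameters fixed and uses the standard forward-backward recursions to get per-probe posterior probabilities of loss, neutral, or gain. The emission means, standard deviation, and transition probability (`MEANS`, `SD`, `STAY`) are hypothetical values chosen only for illustration.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)  # hypothetical mean log2 ratio for each state
SD = 0.2                  # hypothetical shared emission standard deviation
STAY = 0.9                # hypothetical probability of keeping the same state


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def state_posteriors(ratios):
    """For each probe, return P(state | all ratios) by scaled forward-backward."""
    k, n = len(STATES), len(ratios)
    trans = [[STAY if i == j else (1.0 - STAY) / (k - 1) for j in range(k)]
             for i in range(k)]
    emit = [[normal_pdf(x, MEANS[s], SD) for s in range(k)] for x in ratios]

    # Forward pass; each step is normalised to avoid numerical underflow.
    fwd = []
    prev = [1.0 / k] * k  # uniform initial distribution over states
    for t in range(n):
        if t == 0:
            cur = [prev[s] * emit[0][s] for s in range(k)]
        else:
            cur = [emit[t][s] * sum(prev[r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)

    # Backward pass, normalised in the same way.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in range(k))
               for s in range(k)]
        total = sum(raw)
        bwd[t] = [v / total for v in raw]

    # Combine both passes into a posterior distribution per probe.
    post = []
    for t in range(n):
        raw = [fwd[t][s] * bwd[t][s] for s in range(k)]
        total = sum(raw)
        post.append([v / total for v in raw])
    return post


ratios = [0.0, 0.02, 0.55, 0.48, 0.5, -0.03]  # made-up log2 intensity ratios
posteriors = state_posteriors(ratios)
calls = [STATES[p.index(max(p))] for p in posteriors]
```

A fully Bayesian treatment, as described in the abstract, would also place priors on the means, variances, and transition probabilities and sample them rather than fixing them.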
Abstract:
Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to those of the copula-based models proposed here. These general copula models of underlying latent threshold random variables yield likelihood-based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples is given to illustrate concepts.
Abstract:
A likelihood-based discriminant for the identification of quark- and gluon-initiated jets is built and validated using 4.7 fb⁻¹ of proton–proton collision data at √s = 7 TeV collected with the ATLAS detector at the LHC. Data samples with enriched quark or gluon content are used in the construction and validation of templates of jet properties that are the input to the likelihood-based discriminant. The discriminating power of the jet tagger is established in both data and Monte Carlo samples within a systematic uncertainty of ≈ 10–20%. In data, light-quark jets can be tagged with an efficiency of ≈ 50% while achieving a gluon-jet mis-tag rate of ≈ 25% in a pT range between 40 GeV and 360 GeV for jets in the acceptance of the tracker. The rejection of gluon-jets found in the data is significantly below what is attainable using a Pythia 6 Monte Carlo simulation, where gluon-jet mis-tag rates of 10% can be reached for a 50% selection efficiency of light-quark jets using the same jet properties.
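A template-based likelihood discriminant of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the ATLAS implementation: the property names (`n_trk`, `width`), the bin edges, and the template probabilities are all made up, and the jet properties are treated as independent so that the per-property template values can simply be multiplied.

```python
def bin_index(value, edges):
    """Index of the histogram bin containing value, clamped to the outer bins."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def quark_likelihood(jet, templates):
    """Return L = prod(q_i) / (prod(q_i) + prod(g_i)) over the jet properties."""
    q_prod, g_prod = 1.0, 1.0
    for name, value in jet.items():
        edges, q_pdf, g_pdf = templates[name]
        i = bin_index(value, edges)
        q_prod *= q_pdf[i]  # template probability under the quark hypothesis
        g_prod *= g_pdf[i]  # template probability under the gluon hypothesis
    return q_prod / (q_prod + g_prod)


# Hypothetical templates: (bin edges, quark probabilities, gluon probabilities).
TEMPLATES = {
    "n_trk": ([0, 5, 10, 20, 50], [0.40, 0.40, 0.15, 0.05], [0.10, 0.35, 0.40, 0.15]),
    "width": ([0.0, 0.05, 0.1, 0.2, 0.4], [0.50, 0.30, 0.15, 0.05], [0.20, 0.30, 0.30, 0.20]),
}

narrow_jet = quark_likelihood({"n_trk": 3, "width": 0.03}, TEMPLATES)
broad_jet = quark_likelihood({"n_trk": 15, "width": 0.15}, TEMPLATES)
```

Values of L close to 1 favour the quark hypothesis and values close to 0 favour the gluon hypothesis. A tagger then cuts on L at a threshold chosen for the desired quark efficiency.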
Abstract:
Linear- and unimodal-based inference models for mean summer temperatures (partial least squares, weighted averaging, and weighted averaging partial least squares models) were applied to a high-resolution pollen and cladoceran stratigraphy from Gerzensee, Switzerland. The time-window of investigation included the Allerød, the Younger Dryas, and the Preboreal. Characteristic major and minor oscillations in the oxygen-isotope stratigraphy, such as the Gerzensee oscillation, the onset and end of the Younger Dryas stadial, and the Preboreal oscillation, were identified by isotope analysis of bulk-sediment carbonates of the same core and were used as independent indicators for hemispheric or global scale climatic change. In general, the pollen-inferred mean summer temperature reconstruction using all three inference models follows the oxygen-isotope curve more closely than the cladoceran curve. The cladoceran-inferred reconstruction suggests generally warmer summers than the pollen-based reconstructions, which may be an effect of terrestrial vegetation not being in equilibrium with climate due to migrational lags during the Late Glacial and early Holocene. Allerød summer temperatures range between 11 and 12°C based on pollen, whereas the cladoceran-inferred temperatures lie between 11 and 13°C. Pollen and cladocera-inferred reconstructions both suggest a drop to 9–10°C at the beginning of the Younger Dryas. Although the Allerød–Younger Dryas transition lasted 150–160 years in the oxygen-isotope stratigraphy, the pollen-inferred cooling took 180–190 years and the cladoceran-inferred cooling lasted 250–260 years. The pollen-inferred summer temperature rise to 11.5–12°C at the transition from the Younger Dryas to the Preboreal preceded the oxygen-isotope signal by several decades, whereas the cladoceran-inferred warming lagged. 
Major discrepancies between the pollen- and cladoceran-inference models are observed for the Preboreal, where the cladoceran-inference model suggests mean summer temperatures of up to 14–15°C. Both pollen- and cladoceran-inferred reconstructions suggest a cooling that may be related to the Gerzensee oscillation, but there is no evidence for a cooling synchronous with the Preboreal oscillation as recorded in the oxygen-isotope record. For the Gerzensee oscillation the inferred cooling was ca. 1 and 0.5°C based on pollen and cladocera, respectively, which lies well within the inherent prediction errors of the inference models.
Abstract:
Introduction: Many marine planktonic crustaceans such as copepods have been considered as widespread organisms. However, the growing evidence for cryptic and pseudo-cryptic speciation has emphasized the need to re-evaluate the status of copepod species complexes in molecular and morphological studies to get a clearer picture of pelagic marine species as evolutionary units and of their distributions. This study analyses the molecular diversity of the ecologically important Paracalanus parvus species complex. Its seven currently recognized species are abundant and often dominant in marine coastal regions worldwide from temperate to tropical oceans. Results: COI and Cytochrome b sequences of 160 specimens of the Paracalanus parvus complex from all oceans were obtained. Furthermore, 42 COI sequences from GenBank were added for the genetic analyses. Thirteen distinct molecular operational taxonomic units (MOTU) and two single sequences were revealed with cladistic analyses (Maximum Likelihood, Bayesian Inference), of which seven were identical with results from species delimitation methods (barcode gaps, ABGD, GMYC, Rosenberg's P(AB)). In total, 10 to 12 putative species were detected and could be placed in three categories: (1) temperate geographically isolated, (2) warm-temperate to tropical wider spread and (3) circumglobal warm-water species. Conclusions: The present study provides evidence of cryptic or pseudocryptic speciation in the Paracalanus parvus complex. One major insight is that the species Paracalanus parvus s.s. is not panmictic, but may be restricted in its distribution to the northeastern Atlantic.
Abstract:
Invasive alien species are among the primary causes of biodiversity change globally, with the risks thereof broadly understood for most regions of the world. They are similarly thought to be among the most significant conservation threats to Antarctica, especially as climate change proceeds in the region. However, no comprehensive, continent-wide evaluation of the risks to Antarctica posed by such species has been undertaken. Here we do so by sampling, identifying, and mapping the vascular plant propagules carried by all categories of visitors to Antarctica during the International Polar Year's first season (2007-2008) and assessing propagule establishment likelihood based on their identity and origins and on spatial variation in Antarctica's climate. For an evaluation of the situation in 2100, we use modeled climates based on the Intergovernmental Panel on Climate Change's Special Report on Emissions Scenarios Scenario A1B [Nakicenovic N, Swart R, eds (2000) Special Report on Emissions Scenarios: A Special Report of Working Group III of the Intergovernmental Panel on Climate Change (Cambridge University Press, Cambridge, UK)]. Visitors carrying seeds average 9.5 seeds per person, although as vectors, scientists carry greater propagule loads than tourists. Annual tourist numbers (~33,054) are higher than those of scientists (~7,085), thus tempering these differences in propagule load. Alien species establishment is currently most likely for the Western Antarctic Peninsula. Recent founder populations of several alien species in this area corroborate these findings. With climate change, risks will grow in the Antarctic Peninsula, Ross Sea, and East Antarctic coastal regions. Our evidence-based assessment demonstrates which parts of Antarctica are at growing risk from alien species that may become invasive and provides the means to mitigate this threat now and into the future as the continent's climate changes.
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
This paper presents a greedy Bayesian experimental design criterion for heteroscedastic Gaussian process models. The criterion is based on the Fisher information and is optimal in the sense of minimizing parameter uncertainty for likelihood-based estimators. We demonstrate the validity of the criterion under different noise regimes and present experimental results from a rabies simulator to demonstrate the effectiveness of the resulting approximately optimal designs.
Abstract:
The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.
Abstract:
With the ability to collect and store increasingly large datasets on modern computers comes the need to process the data in a way that is useful to a geostatistician or application scientist. Although the storage requirements scale only linearly with the number of observations in the dataset, the computational complexity of likelihood-based geostatistics scales quadratically in memory and cubically in time. Various methods have been proposed and are extensively used in an attempt to overcome these complexity issues. This thesis introduces a number of principled techniques for treating large datasets with an emphasis on three main areas: reduced complexity covariance matrices, sparsity in the covariance matrix and parallel algorithms for distributed computation. These techniques are presented individually, but it is also shown how they can be combined to produce techniques for further improving computational efficiency.
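The quadratic memory and cubic time costs mentioned above come from storing and factorising the full n × n covariance matrix. The sketch below makes this explicit with a plain-Python Cholesky factorisation and the resulting zero-mean Gaussian log-likelihood. The exponential covariance function, its parameters, and the sample data are hypothetical choices for illustration and are not taken from the thesis.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]  # n x n storage: quadratic memory
    for i in range(n):                   # three nested loops: cubic time
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_loglik(y, cov):
    """Log-density of y under a zero-mean Gaussian with covariance cov."""
    n = len(y)
    low = cholesky(cov)
    # Forward substitution solves low z = y, so z.z equals y^T cov^{-1} y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))


def exp_cov(xs, variance=1.0, length=1.0, nugget=1e-6):
    """Hypothetical exponential covariance on 1-D locations with a small nugget."""
    return [[variance * math.exp(-abs(a - b) / length) + (nugget if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]


locations = [0.0, 1.0, 2.5]      # made-up sampling locations
observations = [0.3, -0.1, 0.4]  # made-up observations
loglik = gaussian_loglik(observations, exp_cov(locations))
```

Doubling the number of observations quadruples the size of `low` and multiplies the work in `cholesky` by roughly eight, which is the motivation for the reduced-complexity, sparse, and parallel techniques listed in the abstract.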
Abstract:
Background: HIV is known for its ability to exploit numerous genetic and evolutionary mechanisms to ensure its proliferation, among them high replication, mutation and recombination rates. Sliding MinPD, a recently introduced computational method [1], was used to investigate the patterns of evolution of serially-sampled HIV-1 sequence data from eight patients with a special focus on the emergence of X4 strains. Unlike other phylogenetic methods, Sliding MinPD combines distance-based inference with a nonparametric bootstrap procedure and automated recombination detection to reconstruct the evolutionary history of longitudinal sequence data. We present serial evolutionary networks as a longitudinal representation of the mutational pathways of a viral population in a within-host environment. The longitudinal representation of the evolutionary networks was complemented with charts of clinical markers to facilitate correlation analysis between pertinent clinical information and the evolutionary relationships. Results: Analysis based on the predicted networks suggests the following: significantly stronger recombination signals (p = 0.003) for the inferred ancestors of the X4 strains; recombination events between different lineages and recombination events between putative reservoir virus and those from a later population; and an early star-like topology observed for four of the patients who died of AIDS. A significantly higher number of recombinants were predicted at sampling points that corresponded to peaks in the viral load levels (p = 0.0042). Conclusion: Our results indicate that serial evolutionary networks of HIV sequences enable systematic statistical analysis of the implicit relations embedded in the topology of the structure and can greatly facilitate identification of patterns of evolution that can lead to specific hypotheses and new insights.
The conclusions of applying our method to empirical HIV data support the conventional wisdom of the new generation HIV treatments, that in order to keep the virus in check, viral loads need to be suppressed to almost undetectable levels.
Abstract:
Many coastal wetland communities of south Florida have been cut off from freshwater sheet flow for decades and are migrating landward due to salt-water encroachment. A paleoecological study using mollusks was conducted to assess the rates and effects of salt-water encroachment due to freshwater diversion and sea level rise on coastal wetland basins in Biscayne National Park. Modern mollusk distributions taken from 226 surface sites were used to determine local habitat affinities, which were applied to infer past environments from mollusk distributions found in soil cores. Mollusk species compositions were found to be strongly correlated to habitat and salinity, providing reliable predictions. Wetland soils were cored to bedrock at 36 locations. Mollusks were abundant throughout the cores and 15 of the 20 most abundant taxa served as bioindicators of salinity and habitat. Historic accounts coupled with mollusk-based inference models indicate (1) increasing salinity levels along the coast and encroaching into the interior, with mangrove communities currently migrating westward, (2) replacement of a mixed graminoid-mangrove zone by a dense monoculture of dwarf mangroves, and (3) a confinement of freshwater and freshwater graminoid marsh to landward areas between urban developments and drainage canals.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Aim: Recently developed parametric methods in historical biogeography allow researchers to integrate temporal and palaeogeographical information into the reconstruction of biogeographical scenarios, thus overcoming a known bias of parsimony-based approaches. Here, we compare a parametric method, dispersal-extinction-cladogenesis (DEC), against a parsimony-based method, dispersal-vicariance analysis (DIVA), which does not incorporate branch lengths but accounts for phylogenetic uncertainty through a Bayesian empirical approach (Bayes-DIVA). We analyse the benefits and limitations of each method using the cosmopolitan plant family Sapindaceae as a case study. Location: World-wide. Methods: Phylogenetic relationships were estimated by Bayesian inference on a large dataset representing generic diversity within Sapindaceae. Lineage divergence times were estimated by penalized likelihood over a sample of trees from the posterior distribution of the phylogeny to account for dating uncertainty in biogeographical reconstructions. We compared biogeographical scenarios between Bayes-DIVA and two different DEC models: one with no geological constraints and another that employed a stratified palaeogeographical model in which dispersal rates were scaled according to area connectivity across four time slices, reflecting the changing continental configuration over the last 110 million years. Results: Despite differences in the underlying biogeographical model, Bayes-DIVA and DEC inferred similar biogeographical scenarios. The main differences were: (1) in the timing of dispersal events, which in Bayes-DIVA sometimes conflicts with palaeogeographical information, and (2) in the lower frequency of terminal dispersal events inferred by DEC.
Uncertainty in divergence time estimations influenced both the inference of ancestral ranges and the decisiveness with which an area can be assigned to a node. Main conclusions: By considering lineage divergence times, the DEC method gives more accurate reconstructions that are in agreement with palaeogeographical evidence. In contrast, Bayes-DIVA showed the highest decisiveness in unequivocally reconstructing ancestral ranges, probably reflecting its ability to integrate phylogenetic uncertainty. Care should be taken in defining the palaeogeographical model in DEC because of the possibility of overestimating the frequency of extinction events, or of inferring ancestral ranges that are outside the extant species ranges, owing to dispersal constraints enforced by the model. The wide-spanning spatial and temporal model proposed here could prove useful for testing large-scale biogeographical patterns in plants.
Abstract:
Standard indirect inference (II) estimators take a given finite-dimensional statistic, Z_{n}, and then estimate the parameters by matching the sample statistic with the model-implied population moment. We here propose a novel estimation method that utilizes all available information contained in the distribution of Z_{n}, not just its first moment. This is done by computing the likelihood of Z_{n}, and then estimating the parameters by either maximizing the likelihood or computing the posterior mean for a given prior of the parameters. These are referred to as the maximum indirect likelihood (MIL) and Bayesian indirect likelihood (BIL) estimators, respectively. We show that the IL estimators are first-order equivalent to the corresponding moment-based II estimator that employs the optimal weighting matrix. However, due to higher-order features of Z_{n}, the IL estimators are higher-order efficient relative to the standard II estimator. The likelihood of Z_{n} will in general be unknown and so simulated versions of IL estimators are developed. Monte Carlo results for a structural auction model and a DSGE model show that the proposed estimators indeed have attractive finite sample properties.
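The three estimators described in this abstract can be written out as below. The notation is an assumption introduced here for illustration, not taken from the paper: $f_n(\cdot \mid \theta)$ is the density of $Z_n$ under parameter $\theta$, $\pi$ is the prior, $\mu_n(\theta) = E_\theta[Z_n]$ is the model-implied moment, and $W$ is a weighting matrix.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{II}}
  &= \arg\min_{\theta \in \Theta}\;
     \bigl(Z_n - \mu_n(\theta)\bigr)^{\top} W \bigl(Z_n - \mu_n(\theta)\bigr), \\
\hat{\theta}_{\mathrm{MIL}}
  &= \arg\max_{\theta \in \Theta}\; \log f_n(Z_n \mid \theta), \\
\hat{\theta}_{\mathrm{BIL}}
  &= \frac{\int_{\Theta} \theta \, f_n(Z_n \mid \theta)\, \pi(\theta)\, d\theta}
          {\int_{\Theta} f_n(Z_n \mid \theta)\, \pi(\theta)\, d\theta}.
\end{align*}
```

The II criterion depends on the distribution of $Z_n$ only through its mean, whereas MIL and BIL use the whole density $f_n$, which is the sense in which they exploit information beyond the first moment.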