888 resultados para likelihood-based inference
Resumo:
Invasive alien species are among the primary causes of biodiversity change globally, with the risks thereof broadly understood for most regions of the world. They are similarly thought to be among the most significant conservation threats to Antarctica, especially as climate change proceeds in the region. However, no comprehensive, continent-wide evaluation of the risks to Antarctica posed by such species has been undertaken. Here we do so by sampling, identifying, and mapping the vascular plant propagules carried by all categories of visitors to Antarctica during the International Polar Year's first season (2007-2008) and assessing propagule establishment likelihood based on their identity and origins and on spatial variation in Antarctica's climate. For an evaluation of the situation in 2100, we use modeled climates based on the Intergovernmental Panel on Climate Change's Special Report on Emissions Scenarios Scenario A1B [Nakicenovic N, Swart R, eds (2000) Special Report on Emissions Scenarios: A Special Report of Working Group III of the Intergovernmental Panel on Climate Change (Cambridge University Press, Cambridge, UK)]. Visitors carrying seeds average 9.5 seeds per person, although as vectors, scientists carry greater propagule loads than tourists. Annual tourist numbers (~33,054) are higher than those of scientists (~7,085), thus tempering these differences in propagule load. Alien species establishment is currently most likely for the Western Antarctic Peninsula. Recent founder populations of several alien species in this area corroborate these findings. With climate change, risks will grow in the Antarctic Peninsula, Ross Sea, and East Antarctic coastal regions. Our evidence-based assessment demonstrates which parts of Antarctica are at growing risk from alien species that may become invasive and provides the means to mitigate this threat now and into the future as the continent's climate changes.
Resumo:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
This paper presents a greedy Bayesian experimental design criterion for heteroscedastic Gaussian process models. The criterion is based on the Fisher information and is optimal in the sense of minimizing parameter uncertainty for likelihood based estimators. We demonstrate the validity of the criterion under different noise regimes and present experimental results from a rabies simulator to demonstrate the effectiveness of the resulting approximately optimal designs.
Resumo:
The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.
Resumo:
With the ability to collect and store increasingly large datasets on modern computers comes the need to be able to process the data in a way that can be useful to a Geostatistician or application scientist. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed, scale quadratically and cubically respectively for likelihood-based Geostatistics. Various methods have been proposed and are extensively used in an attempt to overcome these complexity issues. This thesis introduces a number of principled techniques for treating large datasets with an emphasis on three main areas: reduced complexity covariance matrices, sparsity in the covariance matrix and parallel algorithms for distributed computation. These techniques are presented individually, but it is also shown how they can be combined to produce techniques for further improving computational efficiency.
Resumo:
Background The HIV virus is known for its ability to exploit numerous genetic and evolutionary mechanisms to ensure its proliferation, among them, high replication, mutation and recombination rates. Sliding MinPD, a recently introduced computational method [1], was used to investigate the patterns of evolution of serially-sampled HIV-1 sequence data from eight patients with a special focus on the emergence of X4 strains. Unlike other phylogenetic methods, Sliding MinPD combines distance-based inference with a nonparametric bootstrap procedure and automated recombination detection to reconstruct the evolutionary history of longitudinal sequence data. We present serial evolutionary networks as a longitudinal representation of the mutational pathways of a viral population in a within-host environment. The longitudinal representation of the evolutionary networks was complemented with charts of clinical markers to facilitate correlation analysis between pertinent clinical information and the evolutionary relationships. Results Analysis based on the predicted networks suggests the following:: significantly stronger recombination signals (p = 0.003) for the inferred ancestors of the X4 strains, recombination events between different lineages and recombination events between putative reservoir virus and those from a later population, an early star-like topology observed for four of the patients who died of AIDS. A significantly higher number of recombinants were predicted at sampling points that corresponded to peaks in the viral load levels (p = 0.0042). Conclusion Our results indicate that serial evolutionary networks of HIV sequences enable systematic statistical analysis of the implicit relations embedded in the topology of the structure and can greatly facilitate identification of patterns of evolution that can lead to specific hypotheses and new insights. The conclusions of applying our method to empirical HIV data support the conventional wisdom of the new generation HIV treatments, that in order to keep the virus in check, viral loads need to be suppressed to almost undetectable levels.
Resumo:
Many coastal wetland communities of south Florida have been cut off from freshwater sheet flow for decades and are migrating landward due to salt-water encroachment. A paleoecological study using mollusks was conducted to assess the rates and effects of salt-water encroachment due to freshwater diversion and sea level rise on coastal wetland basins in Biscayne National Park. Modem mollusk distributions taken from 226 surface sites were used to determine local habitat affinities which were applied to infer past environments from mollusk distributions found in soil cores. Mollusks species compositions were found to be strongly correlated to habitat and salinity, providing reliable predictions. Wetland soils were cored to bedrock at 36locations. Mollusks were abundant throughout the cores and 15 of the 20 most abundant taxa served as bioindicators of salinity and habitat. Historic accounts coupled with mollusk based inference models indicate (1) increasing salinity levels along the coast and encroaching into the interior with mangroves communities currently migrating westward, (2) replacement of a mixed graminoid-mangrove zone by a dense monoculture of dwarf mangroves, and (3) a confinement of freshwater and freshwater graminoid marsh to landward areas between urban developments and drainage canals.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.
Resumo:
We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a linear spline model is assumed on the baseline hazard. We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerative distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicate for the unobserved covariates increases efficiency and reduces bias. We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.
Resumo:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and 2 avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because some data of 2 genomes in their work are now not available in GenBank, in this work, we analyze the phylogenetic relationship of the remaining 70 complete genomes corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polymaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenera, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Resumo:
Accurate road lane information is crucial for advanced vehicle navigation and safety applications. With the increasing of very high resolution (VHR) imagery of astonishing quality provided by digital airborne sources, it will greatly facilitate the data acquisition and also significantly reduce the cost of data collection and updates if the road details can be automatically extracted from the aerial images. In this paper, we proposed an effective approach to detect road lanes from aerial images with employment of the image analysis procedures. This algorithm starts with constructing the (Digital Surface Model) DSM and true orthophotos from the stereo images. Next, a maximum likelihood clustering algorithm is used to separate road from other ground objects. After the detection of road surface, the road traffic and lane lines are further detected using texture enhancement and morphological operations. Finally, the generated road network is evaluated to test the performance of the proposed approach, in which the datasets provided by Queensland department of Main Roads are used. The experiment result proves the effectiveness of our approach.
Resumo:
We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference. ABC methods are useful for posterior inference in the presence of an intractable likelihood function. In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning. This methodological development was motivated by an application involving data on macroparasite population evolution modelled by a trivariate stochastic process for which there is no tractable likelihood function. The auxiliary model here is based on a beta–binomial distribution. The main objective of the analysis is to determine which parameters of the stochastic model are estimable from the observed data on mature parasite worms.
Resumo:
This paper presents a fault diagnosis method based on adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART) which is one of the decision tree methods is used as a feature selection procedure to select pertinent features from data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of ANFIS classifier. The hybrid of back-propagation and least squares algorithm are utilized to tune the parameters of the membership functions. In order to evaluate the proposed algorithm, the data sets obtained from vibration signals and current signals of the induction motors are used. The results indicate that the CART–ANFIS model has potential for fault diagnosis of induction motors.
Resumo:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.