901 results for Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration


Relevance: 100.00%

Abstract:

Environmental impacts of wind energy facilities increasingly cause concern, a central issue being bats and birds killed by rotor blades. Two approaches have been employed to assess collision rates: carcass searches and surveys of animals prone to collisions. Carcass searches can provide an estimate of the actual number of animals killed, but they offer little information on the relation between collision rates and, for example, weather parameters, because the time of death is not known precisely. In contrast, a density index of animals exposed to collision is sufficient to analyse the parameters influencing the collision rate. However, quantification of the collision rate from animal density indices (e.g. acoustic bat activity or bird migration traffic rates) remains difficult. We combine carcass search data with animal density indices in a mixture model to investigate collision rates. In a simulation study we show that the collision rates estimated by our model were at least as precise as conventional estimates based solely on carcass search data. Furthermore, if certain conditions are met, the model can be used to predict the collision rate from density indices alone, without data from carcass searches. This can reduce the time and effort required to estimate collision rates. We applied the model to bat carcass search data obtained at 30 wind turbines in 15 wind facilities in Germany. We used acoustic bat activity and wind speed as predictors for the collision rate. The model estimates correlated well with conventional estimators. Our model can be used to predict the average collision rate. It enables an analysis of the effect of parameters such as rotor diameter or turbine type on the collision rate. The model can also be used in turbine-specific curtailment algorithms that predict the collision rate and reduce it with minimal loss of energy production.
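
The core of the approach lends itself to a compact illustration. Below is a minimal sketch, assuming a Poisson model in which expected carcass counts at each turbine follow a log-linear function of acoustic activity and wind speed; the variable names and simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_turbines = 30
activity = rng.gamma(2.0, 2.0, n_turbines)   # acoustic bat activity index
wind = rng.uniform(2.0, 10.0, n_turbines)    # mean wind speed (m/s)

true_beta = np.array([-1.0, 0.4, -0.2])      # intercept, activity, wind
lam = np.exp(true_beta[0] + true_beta[1] * activity + true_beta[2] * wind)
carcasses = rng.poisson(lam)                 # observed carcass counts

def neg_log_lik(beta):
    rate = np.exp(beta[0] + beta[1] * activity + beta[2] * wind)
    return np.sum(rate - carcasses * np.log(rate))

fit = minimize(neg_log_lik, x0=np.zeros(3), method="BFGS")
print("estimated coefficients:", fit.x)
# Once fitted, the collision rate at a new turbine can be predicted from
# activity and wind speed alone, without further carcass searches.
```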

Relevance: 100.00%

Abstract:

Survival models are widely applied in engineering to model time-to-event data, since censoring is a common issue in this field. Whether parametric or not, such models may not always fit heterogeneous data well. The present study relies on survival data from critical pumps, for which traditional parametric regression can be improved upon. Accounting for censoring, we use an empirical method to split the data into two subgroups, making it possible to fit separate models, and we combine two distinct distributions following a mixture-models approach. We conclude that this is a good method for fitting data that do not follow a single standard parametric distribution, and that it yields reliable parameter estimates. A constant cumulative hazard rate policy was also applied to the fitted mixture model to determine optimal inspection times, which can be compared with current maintenance policies to decide whether changes should be introduced.
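
As a sketch of this kind of fit, the snippet below estimates a two-component Weibull mixture from right-censored lifetimes by direct likelihood maximization; the synthetic data, parameterization, and censoring scheme are assumptions for illustration, not the study's pump data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(7)
# Two latent subgroups of pump lifetimes (hours), with right censoring.
t1 = weibull_min.rvs(1.5, scale=500.0, size=60, random_state=rng)
t2 = weibull_min.rvs(3.0, scale=2000.0, size=40, random_state=rng)
times = np.concatenate([t1, t2])
cens = 1500.0
event = times < cens                      # False means right-censored
obs = np.minimum(times, cens)

def neg_log_lik(p):
    w = 1.0 / (1.0 + np.exp(-p[0]))       # mixing weight kept in (0, 1)
    k1, s1, k2, s2 = np.exp(p[1:])        # shapes and scales kept positive
    f = w * weibull_min.pdf(obs, k1, scale=s1) + (1 - w) * weibull_min.pdf(obs, k2, scale=s2)
    S = w * weibull_min.sf(obs, k1, scale=s1) + (1 - w) * weibull_min.sf(obs, k2, scale=s2)
    return -np.sum(np.where(event, np.log(f), np.log(S)))

fit = minimize(neg_log_lik, x0=[0.0, 0.5, 6.0, 1.0, 7.5], method="Nelder-Mead")
print("fitted parameters:", fit.x)
# Inspection times under a constant cumulative hazard policy can then be
# found by solving -log S(t) = c for successive hazard increments c.
```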

Relevance: 100.00%

Abstract:

The issue of using informative priors for the estimation of mixtures at multiple time points is examined. Several different informative priors and an independent prior are compared using samples of actual and simulated aerosol particle size distribution (PSD) data. Measurements of aerosol PSDs record the concentration of aerosol particles as a function of their size; these distributions are typically multimodal and are collected at frequent time intervals. The use of informative priors is found to better identify component parameters at each time point and to establish patterns in the parameters over time more clearly. Some caveats to this finding are discussed.
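
A minimal sketch of the underlying idea follows: the posterior for a component parameter at one time point becomes, after mild inflation, the informative prior at the next. It is shown for a single component mean in a conjugate normal model with known variance; the drifting means and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.30, 0.32, 0.35]       # a slowly drifting PSD mode (illustrative)
sigma2 = 0.05 ** 2                    # known observation variance
prior_mean, prior_var = 0.0, 10.0     # vague (independent-style) prior at t = 0

for t, mu_t in enumerate(true_means):
    x = rng.normal(mu_t, np.sqrt(sigma2), size=50)
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + x.sum() / sigma2)
    print(f"t={t}: posterior mean {post_mean:.3f}")
    # Informative prior for the next time point: last posterior, slightly
    # inflated so the component can drift between measurements.
    prior_mean, prior_var = post_mean, post_var + 0.01 ** 2
```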

Relevance: 100.00%

Abstract:

The aim of automatic pathological voice detection systems is to serve as tools for medical specialists, enabling a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for such systems includes the use of an optimized representation of the spectral envelope for characterization, based on cepstral coefficients either from the mel-scaled Fourier spectral envelope (Mel-Frequency Cepstral Coefficients) or from an all-pole estimation (Linear Prediction Coding Cepstral Coefficients), together with Gaussian Mixture Models for posterior classification. However, recently proposed GMM-based classifiers and nuisance mitigation techniques, such as those employed in speaker recognition, have not been widely considered in pathology detection tasks. The present work tests whether the use of such speaker recognition tools can improve the performance of pathology detection systems, specifically in the automatic detection of Obstructive Sleep Apnea. The testing procedure employs an Obstructive Sleep Apnea database in conjunction with GMM-based classifiers. The results show that improved performance can be obtained with this approach.
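
A hedged sketch of the baseline pipeline described above, MFCCs plus one GMM per class with a log-likelihood-ratio decision, is given below; healthy_paths and apnoea_paths are hypothetical file lists, and all hyperparameters are placeholders rather than the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

# Placeholder file lists; in practice these come from the OSA database.
healthy_paths = ["recordings/healthy_01.wav", "recordings/healthy_02.wav"]
apnoea_paths = ["recordings/apnoea_01.wav", "recordings/apnoea_02.wav"]

def mfcc_frames(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # frames x coeffs

# One GMM per class, trained on pooled frames from that class's recordings.
gmm_h = GaussianMixture(n_components=16, covariance_type="diag").fit(
    np.vstack([mfcc_frames(p) for p in healthy_paths]))
gmm_a = GaussianMixture(n_components=16, covariance_type="diag").fit(
    np.vstack([mfcc_frames(p) for p in apnoea_paths]))

def classify(path):
    x = mfcc_frames(path)
    llr = gmm_a.score(x) - gmm_h.score(x)   # mean per-frame log-likelihood ratio
    return "apnoea" if llr > 0 else "healthy"
```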

Relevance: 100.00%

Abstract:

We find that at a mole fraction of 0.05 of DMSO (x(DMSO) = 0.05) in aqueous solution, a linear hydrocarbon chain of intermediate length (n = 30-40) adopts the collapsed conformation as its most stable state. In pure water, the same chain exhibits an intermittent oscillation between the collapsed and the extended coiled conformations. Even when the mole fraction of DMSO in the bulk is 0.05, the concentration of DMSO in the first hydration layer around the hydrocarbon chain of length 30 (n = 30) is as large as 17%. The formation of such a hydrophobic environment around the hydrocarbon chain may be viewed as the reason the collapsed conformation gains additional stability. We find a second anomalous behavior emerging near x(DMSO) = 0.15, due to a chain-like aggregation of the methyl groups of DMSO in water that lowers the relative concentration of DMSO molecules in the hydration layer. We further find that as the concentration of DMSO is gradually increased, the chain progressively attains the extended coiled structure as its stable conformation. Although Flory-Huggins theory (for a binary solvent mixture) fails to predict the anomaly at x(DMSO) = 0.05, it seems to capture the essence of the anomaly at 0.15.

Relevance: 100.00%

Abstract:

Binary mixtures have a strong influence on the activities of polymers and biopolymers even at low cosolvent concentration. Among the several aqueous binary mixtures studied, water-DMSO stands out for its unusual behavior at certain specific concentrations of DMSO. In the present work, we study the effect of the water-DMSO binary mixture on polymers and biopolymers by taking a simple linear hydrocarbon chain of intermediate length (n = 30) and the protein lysozyme, respectively. We find that at a mole fraction of 0.05 of DMSO (x(DMSO) = 0.05) in aqueous solution, the hydrocarbon chain adopts the collapsed conformation as its most stable and rigid state. Even at this bulk mole fraction of 0.05, the DMSO concentration in the first hydration layer around the polymer is found to be as large as 17%. The formation of such a hydrophobic environment around the polymer is the reason the collapsed state gains so much stability. Interestingly, a similar quenching of conformational fluctuations is also observed for the protein investigated. In the case of alkane polymer chains, long-wavelength fluctuations are easily quenched, the polymer being purely hydrophobic. In the case of the protein, however, quenching of fluctuations is prominent only at the hydrophobic surface, and quenching of long-wavelength fluctuations becomes insignificant for the full protein. As the protein contains both hydrophobic and hydrophilic moieties, the extent of quenching of conformational fluctuations relative to pure water is only about half as large for the biopolymer (16.83%) as for the purely hydrophobic polymer chain (32.43%).

Relevance: 100.00%

Abstract:

Marine brachyuran and anomuran crustaceans are completely absent from the extremely cold (-1.8 °C) Antarctic continental shelf, but caridean shrimps are abundant. This has at least partly been attributed to low capacities for magnesium excretion in brachyuran and anomuran lithodid crabs ([Mg2+]HL = 20-50 mmol/L) compared to caridean shrimp species ([Mg2+]HL = 5-12 mmol/L). Magnesium has an anaesthetizing effect and reduces cold tolerance and activity of adult brachyuran crabs. We investigated whether the capacity for magnesium regulation is a factor that influences temperature-dependent activity of early ontogenetic stages of the Sub-Antarctic lithodid crab Paralomis granulosa. Ion composition (Na+, Mg2+, Ca2+, Cl-, [SO4]2-) was measured in haemolymph withdrawn from larval stages, the first and second juvenile instars (crabs I and II) and adult males and females. Magnesium excretion improved during ontogeny, but haemolymph sulphate concentration was lowest in the zoeal stages. Neither haemolymph magnesium concentrations nor Ca2+:Mg2+ ratios paralleled activity levels of the life stages. Long-term (3 week) cold exposure of crab I to 1 °C caused a significant rise of haemolymph sulphate concentration and a decrease in magnesium and calcium concentrations compared to control temperature (9 °C). Spontaneous swimming activity of the zoeal stages was determined at 1, 4 and 9 °C in natural sea water (NSW, [Mg2+] = 51 mmol/L) and in sea water enriched with magnesium (NSW + Mg2+, [Mg2+] = 97 mmol/L). It declined significantly with temperature but only insignificantly with increased magnesium concentration. Spontaneous velocities were low, reflecting the demersal life style of the zoeae. Heart rate, scaphognathite beat rate and forced swimming activity (maxilliped beat rate, zoea I) or antennule beat rate (crab I) were investigated in response to acute temperature change (9, 6, 3, 1, -1 °C) in NSW or NSW + Mg2+. High magnesium concentration reduced heart rates in both stages. The temperature-frequency curve of the maxilliped beat (maximum: 9.6 beats/s at 6.6 °C in NSW) of zoea I was depressed and shifted towards warmer temperatures by 2 °C in NSW + Mg2+, but antennule beat rate of crab I was not affected. Magnesium may therefore influence cold tolerance of highly active larvae, but it remains questionable whether the slow-moving lithodid crabs with demersal larvae would benefit from an enhanced magnesium excretion in nature.

Relevance: 100.00%

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals' seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it has become habitual and involves minimal intention or decision making. Key variables investigated are the timestamp at which an activity is initiated and the cell-tower location, as well as the activity type and usage quantity (e.g., a voice call with its duration in seconds); the research focuses on customers' spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted using the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals' highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this essentially reflects their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
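
For flavour, here is a minimal off-the-shelf illustration of VB fitting of a GMM on spiky two-dimensional "location" data; scikit-learn's BayesianGaussianMixture prunes surplus components through its Dirichlet-process prior, which is a different mechanism from the component-splitting extension developed in the thesis.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Spiky synthetic "locations": a few tight clusters with very different weights.
X = np.vstack([
    rng.normal([0, 0], 0.05, size=(500, 2)),   # home cell: high probability
    rng.normal([5, 1], 0.05, size=(100, 2)),   # workplace
    rng.normal([2, 4], 0.30, size=(20, 2)),    # occasional trips
])

vbgmm = BayesianGaussianMixture(
    n_components=10,                    # deliberately too many
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
).fit(X)
active = vbgmm.weights_ > 0.01          # components VB actually used
print("effective number of components:", active.sum())
```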

Relevance: 100.00%

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the variation in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research, demonstrating superiority over standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can exert their full advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three parts: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending two Bayesian statistical methods developed for gene-environment interaction studies to related problems such as adaptively borrowing historical data.

We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-gene interactions (epistasis) and gene-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in a linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or at least one, respectively, of the main effects of interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose the hierarchical constraint, and we observe their superior performance in most of the situations considered. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of being able to incorporate useful prior information into the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases. Our proposed models enforce the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully identifying the reported associations. This is practically appealing for investigating causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions.

The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimated effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed its advantages for detecting true main effects and interactions compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these models, we develop two novel statistical methods that handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the gene-environment interaction methods in that they balance statistical efficiency and bias in a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
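
The hierarchical constraint itself is easy to state in code. The sketch below is an illustrative assumption rather than the dissertation's implementation: it draws inclusion indicators so that an interaction can enter the model only when both of its parent main effects do (strong hierarchy), with the weak-hierarchy variant noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 6                                    # candidate main effects (genes/exposures)
pi_main, pi_int = 0.3, 0.5               # prior inclusion probabilities

gamma_main = rng.random(p) < pi_main     # main-effect inclusion indicators
gamma_int = np.zeros((p, p), dtype=bool)
for i in range(p):
    for j in range(i + 1, p):
        if gamma_main[i] and gamma_main[j]:        # strong hierarchy
            gamma_int[i, j] = rng.random() < pi_int
        # Weak hierarchy would instead require gamma_main[i] OR gamma_main[j].

print("included main effects:", np.flatnonzero(gamma_main))
print("included interactions:", np.argwhere(gamma_int).tolist())
```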

Relevance: 100.00%

Abstract:

In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains N particles by repeating the simulation until N+1 exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.
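
The exact-matching device is simple enough to sketch. Below, a toy "alive"-style particle filter for a low-valued count series keeps resimulating until N+1 particles reproduce the observed count, and uses N/(n_t - 1) as the incremental likelihood estimate; the latent log-AR(1)/Poisson model and all parameter values are illustrative assumptions, not the paper's application.

```python
import numpy as np

rng = np.random.default_rng(11)
y_obs = np.array([1, 0, 2, 1, 1, 3, 0, 1])     # low-valued count series
N = 100                                         # particles kept per step
phi, sigma = 0.8, 0.3                           # assumed AR(1) parameters

def log_lik_estimate():
    particles = rng.normal(0.0, 1.0, N)         # initial latent log-intensities
    log_lik = 0.0
    for y in y_obs:
        matched, n_sims = [], 0
        while len(matched) < N + 1:             # run until N+1 exact matches
            z = phi * particles[rng.integers(N)] + sigma * rng.normal()
            n_sims += 1
            if rng.poisson(np.exp(z)) == y:     # match the observation exactly
                matched.append(z)
        particles = np.array(matched[:N])       # the (N+1)-th match is dropped
        log_lik += np.log(N / (n_sims - 1))     # unbiased increment estimate
    return log_lik

print("estimated log-likelihood:", log_lik_estimate())
```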

Relevance: 100.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even given the huge increases in the value of n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is thus of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
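
To make the latent-structure notion of dimensionality reduction concrete, the sketch below builds the joint pmf of several categorical variables from a rank-k latent class (PARAFAC) factorization; the dimensions and Dirichlet draws are illustrative assumptions, not the chapter's collapsed Tucker construction.

```python
import numpy as np

rng = np.random.default_rng(2)
k, p, d = 3, 4, 2                       # latent classes, variables, categories
nu = rng.dirichlet(np.ones(k))          # latent class weights
psi = rng.dirichlet(np.ones(d), size=(p, k))   # psi[j, h]: pmf of variable j in class h

# Joint pmf tensor: P[y_1, ..., y_p] = sum_h nu[h] * prod_j psi[j, h, y_j],
# i.e. a nonnegative rank-k (PARAFAC) factorization of the probability tensor.
P = np.zeros((d,) * p)
for idx in np.ndindex(*P.shape):
    P[idx] = sum(nu[h] * np.prod([psi[j, h, idx[j]] for j in range(p)])
                 for h in range(k))
print("tensor entries sum to:", P.sum())   # a valid pmf: sums to 1
```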

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
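
As a rough companion to this chapter, the sketch below computes the classical Laplace (mode-plus-inverse-Hessian) Gaussian approximation for a toy Poisson log-linear model with a Gaussian prior; note this is a stand-in assumption, not the chapter's KL-optimal Gaussian approximation under Diaconis-Ylvisaker priors.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

def neg_log_post(b):                   # Poisson GLM with a N(0, 10 I) prior
    eta = X @ b
    return np.sum(np.exp(eta) - y * eta) + b @ b / 20.0

fit = minimize(neg_log_post, np.zeros(3), method="BFGS")
mode = fit.x
# Hessian of the negative log posterior at the mode: X' diag(exp(eta)) X + I/10.
H = X.T @ (np.exp(X @ mode)[:, None] * X) + np.eye(3) / 10.0
cov = np.linalg.inv(H)
print("Gaussian approximation: mean", mode, "\ncov diagonal", np.diag(cov))
```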

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
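
For reference, the truncated-normal data augmentation sampler analysed here has a compact standard form. The sketch below is a minimal version of that construction (an assumption of the textbook Albert-Chib algorithm with a flat prior, not the chapter's code); with the rare-event intercept used, its mixing is exactly the regime the chapter shows to be slow.

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(6)
n, p = 1000, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-2.5, 0.5])            # large negative intercept: rare successes
y = rng.random(n) < norm.cdf(X @ beta_true)

V = np.linalg.inv(X.T @ X)                   # posterior covariance under a flat prior
L = np.linalg.cholesky(V)
beta = np.zeros(p)
draws = []
for _ in range(2000):
    # Step 1: latent utilities z_i | beta, y ~ N(x_i'beta, 1) truncated at 0.
    mu = X @ beta
    lo = np.where(y, -mu, -np.inf)           # standardized truncation bounds
    hi = np.where(y, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # Step 2: beta | z from the usual Gaussian linear-model update.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    draws.append(beta[0])
# With rare successes, the intercept chain shows very high autocorrelation.
```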

Relevance: 100.00%

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevance: 100.00%

Abstract:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantages of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
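
To ground the decision criterion, here is a hedged sketch of a UBM-normalised cross likelihood ratio between two clusters of feature frames, each modelled by a GMM; the exact normalisation in the paper may differ, and the eigenvoice and Bayesian refinements described above are not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(X, k=8):
    return GaussianMixture(n_components=k, covariance_type="diag").fit(X)

def clr(X_i, X_j, ubm):
    """Cross likelihood ratio between two clusters, UBM-normalised."""
    m_i, m_j = fit_gmm(X_i), fit_gmm(X_j)
    # score() returns the mean per-frame log-likelihood, so segment
    # lengths are normalised away.
    return (m_j.score(X_i) - ubm.score(X_i)) + (m_i.score(X_j) - ubm.score(X_j))

rng = np.random.default_rng(8)
spk_a = rng.normal(0.0, 1.0, size=(300, 12))     # stand-in acoustic features
spk_b = rng.normal(1.5, 1.0, size=(300, 12))
ubm = fit_gmm(np.vstack([spk_a, spk_b]), k=16)
print("CLR, same speaker:      ", clr(spk_a[:150], spk_a[150:], ubm))
print("CLR, different speakers:", clr(spk_a, spk_b, ubm))
# Agglomerative clustering merges the pair with the highest CLR and stops
# once the best CLR falls below a threshold.
```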

Relevance: 100.00%

Abstract:

Spreading cell fronts play an essential role in many physiological processes. Classically, models of this process are based on the Fisher-Kolmogorov equation; however, such continuum representations are not always suitable, as they do not explicitly represent behaviour at the level of individual cells. Additionally, many models examine only the large-time asymptotic behaviour, where a travelling wave front with a constant speed has been established. Many experiments, such as a scratch assay, never display this asymptotic behaviour, and in these cases the transient behaviour must be taken into account. We examine the transient and asymptotic behaviour of moving cell fronts using techniques that go beyond the continuum approximation, via a volume-excluding birth-migration process on a regular one-dimensional lattice. We approximate the averaged discrete results using three methods: (i) mean-field, (ii) pair-wise, and (iii) one-hole approximations. We discuss the performance of these methods, in comparison with the averaged discrete results, across a range of parameter space, examining both the transient and asymptotic behaviour. The one-hole approximation, based on techniques from statistical physics, is not capable of predicting transient behaviour but provides excellent agreement with the asymptotic behaviour of the averaged discrete results, provided that cells are proliferating fast enough relative to their rate of migration. The mean-field and pair-wise approximations give indistinguishable asymptotic results, which agree with the averaged discrete results when cells are migrating much more rapidly than they are proliferating. The pair-wise approximation performs better in the transient region than the mean-field, despite having the same asymptotic behaviour. Our results show that each approximation works only in specific situations; we must therefore be careful to use a suitable approximation for a given system, as otherwise inaccurate predictions could be made.
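
To indicate what going beyond the continuum approximation involves, here is an illustrative (assumed) implementation of a volume-excluding birth-migration process on a one-dimensional lattice, with the mean-field logistic counterpart noted in a comment; rates, lattice size, and initial condition are placeholders.

```python
import numpy as np

rng = np.random.default_rng(9)
L_sites, Pm, Pp, steps = 200, 1.0, 0.05, 500   # sites, move/birth probabilities
occ = np.zeros(L_sites, dtype=bool)
occ[:20] = True                                # initially occupied region

for _ in range(steps):
    for i in rng.permutation(np.flatnonzero(occ)):
        if rng.random() < Pm:                  # migration attempt
            j = i + rng.choice([-1, 1])        # random nearest neighbour
            if 0 <= j < L_sites and not occ[j]:
                occ[i], occ[j] = False, True   # volume exclusion enforced
                i = j
        if rng.random() < Pp:                  # proliferation attempt
            j = i + rng.choice([-1, 1])
            if 0 <= j < L_sites and not occ[j]:
                occ[j] = True                  # daughter placed on empty site

# Mean-field counterpart: per-site occupancy C obeys dC/dt = Pp * C * (1 - C),
# whose continuum limit is the Fisher-Kolmogorov travelling-wave equation.
print("final occupied fraction:", occ.mean())
```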