352 results for maximum pseudolikelihood (MPL) estimation
Abstract:
Gene expression is arguably the most important indicator of biological function. Thus, identifying differentially expressed genes is one of the main aims of high-throughput studies that use microarray and RNA-seq platforms to study deregulated cellular pathways. There are many tools for analysing differential gene expression from transcriptomic datasets. The major challenge in this area is estimating gene expression variance, owing to the large amount of 'background noise' generated by biological equipment and the lack of biological replicates. Bayesian inference has been widely used in the bioinformatics field. In this work, we show that the prior knowledge employed in the Bayesian framework also helps to improve the accuracy of differential gene expression analysis when using a small number of replicates. We have developed a differential analysis tool that uses Bayesian estimation of the variance of gene expression for use with small numbers of biological replicates. Our method is more consistent than the widely used Cyber-T tool, which successfully introduced the Bayesian framework to differential analysis. We also provide a user-friendly web-based graphical user interface for biologists to use with microarray and RNA-seq data. Bayesian inference can compensate for the instability of variance estimates caused by small numbers of biological replicates by using pseudo-replicates as prior knowledge. We also show that our new strategy for selecting pseudo-replicates improves the performance of the analysis.
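A minimal sketch of the kind of variance shrinkage this describes: combine a prior variance derived from pseudo-replicates with the observed per-gene variance, in the spirit of Cyber-T's Bayesian t-statistic. The weighting formula and the names `bayes_var`/`bayes_t` are illustrative assumptions, not the tool's exact implementation.

```python
import numpy as np

def bayes_var(x, s0_sq, n0):
    """Shrink each gene's sample variance toward a prior variance s0_sq
    (derived from pseudo-replicates) with prior weight n0 pseudo-counts.
    This is a Cyber-T-style weighted combination, shown as a sketch."""
    n = x.shape[-1]                       # number of biological replicates
    s2 = x.var(axis=-1, ddof=1)           # per-gene sample variance
    return (n0 * s0_sq + (n - 1) * s2) / (n0 + n - 2)

def bayes_t(a, b, s0_sq, n0):
    """Welch-style t-statistic for two expression matrices (genes x reps)
    using the shrunken variance in place of the raw sample variance."""
    va, vb = bayes_var(a, s0_sq, n0), bayes_var(b, s0_sq, n0)
    na, nb = a.shape[-1], b.shape[-1]
    return (a.mean(axis=-1) - b.mean(axis=-1)) / np.sqrt(va / na + vb / nb)
```

With only two replicates the raw variance is extremely unstable; the pseudo-count weight `n0` pulls it toward the prior, stabilizing the denominator of the t-statistic.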
Abstract:
The elastic properties of the arterial wall have been the subject of physiological, clinical and biomedical research for many years. There is convincing evidence that the elastic properties of the large arteries are seriously impaired in the presence of cardiovascular disease (CVD), due to alterations in the intrinsic structural and functional characteristics of vessels [1]. Early detection of changes in the elastic modulus of arteries would provide a powerful tool for both monitoring patients at high cardiovascular risk and testing the effects of pharmaceuticals aimed at stabilizing existing plaques by stiffening them or lowering the lipids.
Abstract:
The acceptance of broadband ultrasound attenuation (BUA) for the assessment of osteoporosis suffers from a limited understanding of both ultrasound wave propagation through cancellous bone and its exact dependence upon the material and structural properties. It has recently been proposed that ultrasound wave propagation in cancellous bone may be described by a concept of parallel sonic rays, with the transit time of each ray defined by the proportions of bone and marrow through which it propagates. A transit time spectrum (TTS) describes the proportion of sonic rays having a particular transit time, effectively describing the lateral inhomogeneity of transit times over the surface aperture of the receiving ultrasound transducer. The aim of this study was to test the hypothesis that the solid volume fraction (SVF) of simplified bone:marrow replica models may be reliably estimated from the corresponding ultrasound transit time spectrum. Transit time spectra were derived via digital deconvolution of the experimentally measured input and output ultrasonic signals and compared with the TTS predicted from the parallel sonic ray concept, demonstrating agreement in both the position and the amplitude of spectral peaks. Solid volume fraction was calculated from the TTS; agreement of the true (geometric calculation) values with the predicted (computer simulation) and experimentally derived values was R² = 99.9% and R² = 97.3%, respectively. It is therefore envisaged that ultrasound transit time spectroscopy (UTTS) offers the potential to reliably estimate bone mineral density and hence the established T-score parameter for clinical osteoporosis assessment.
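The digital deconvolution step can be sketched as a regularized frequency-domain division of the received signal by the transmitted one; the Tikhonov/Wiener-style `eps` stabilizer is an assumption for numerical robustness, not necessarily the study's exact algorithm.

```python
import numpy as np

def transit_time_spectrum(inp, out, eps=1e-3):
    """Estimate a transit time spectrum by deconvolving the received
    signal `out` by the transmitted signal `inp` in the frequency domain.
    `eps` is a Tikhonov/Wiener-style regularizer (an assumption here)."""
    n = len(inp) + len(out) - 1           # full linear-convolution length
    I = np.fft.rfft(inp, n)
    O = np.fft.rfft(out, n)
    H = O * np.conj(I) / (np.abs(I) ** 2 + eps)   # regularized division
    return np.fft.irfft(H, n)             # impulse response = TTS estimate
```

A received signal that is a delayed, attenuated copy of the input yields a spectrum with a single peak at the corresponding transit time.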
Abstract:
We consider the development of statistical models for prediction of the constituent concentration of riverine pollutants, which is a key step in load estimation from frequent flow-rate data and less frequently collected concentration data. We consider how to capture the impacts of past flow patterns via the average discounted flow (ADF), which discounts past flux according to the time elapsed: more recent fluxes are given more weight. However, the effectiveness of the ADF depends critically on the choice of the discount factor, which reflects the unknown environmental accumulation process of the concentration compounds. We propose choosing the discount factor by maximizing the adjusted R² value or the Nash-Sutcliffe model efficiency coefficient; the R² values are adjusted to account for the number of parameters in the model fit. The resulting optimal discount factor can be interpreted as a measure of the constituent exhaustion rate during flood events. To evaluate the performance of the proposed regression estimators, we examine two different sampling scenarios by resampling fortnightly and opportunistically from two real daily datasets, which come from two United States Geological Survey (USGS) gaging stations located in the Des Plaines River and Illinois River basins. The generalized rating-curve approach produces biased estimates of the total sediment loads, by -30% to 83%, whereas the new approaches produce much lower biases, ranging from -24% to 35%. This substantial improvement in the estimates of the total load is due to the fact that the predictability of concentration is greatly improved by the additional predictors.
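The average discounted flow and the grid search over the discount factor can be sketched as follows; a one-predictor linear fit with plain R² stands in for the paper's fuller regression with adjusted R² or the Nash-Sutcliffe criterion.

```python
import numpy as np

def average_discounted_flow(flow, delta):
    """Average discounted flow (ADF): an exponentially discounted average
    of current and past flows, with more recent flows weighted more."""
    adf = np.empty(len(flow))
    s = w = 0.0
    for t, q in enumerate(flow):
        s = delta * s + q                 # discounted sum of flows
        w = delta * w + 1.0               # discounted sum of weights
        adf[t] = s / w
    return adf

def choose_discount(flow, conc, deltas):
    """Grid search for the discount factor maximizing R^2 of a simple
    regression of concentration on ADF (sketch of the selection idea)."""
    best, best_r2 = None, -np.inf
    for d in deltas:
        r = np.corrcoef(average_discounted_flow(flow, d), conc)[0, 1]
        if r * r > best_r2:
            best, best_r2 = d, r * r
    return best, best_r2
```

When concentration really is driven by a discounted flow with a particular factor, the grid search recovers that factor.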
Abstract:
Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.
Abstract:
We consider estimating the total load from frequent flow data but less frequent concentration data. There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large statistical biases, and their associated uncertainties are often not reported. This makes interpretation difficult and makes the estimation of trends or the determination of optimal sampling regimes impossible to assess. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates that minimize the biases and make use of informative predictive variables. The key step in this approach is the development of an appropriate predictive model for concentration. This is achieved using a generalized rating-curve approach with additional predictors that capture unique features in the flow data, such as the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and the discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. The method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach for two rivers delivering to the Great Barrier Reef, Queensland, Australia. One dataset, from the Burdekin River, consists of total suspended sediment (TSS), nitrogen oxides (NOx) and gauged flow for 1997. The other dataset is from the Tully River, for the period July 2000 to June 2008.
For NOx in the Burdekin, the new estimates are very similar to the ratio estimates even when there is no relationship between the concentration and the flow. However, for the Tully dataset, incorporating the additional predictive variables, namely the discounted flow and the flow phase (rising or falling), substantially improved the model fit, and thus the certainty with which the load is estimated.
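A sketch of the rating-curve idea with two of the extra predictors described above, a discounted past flow and a rising/falling-limb indicator. The predictor set, variable names and `delta` are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def fit_rating_curve(flow, conc, rising, delta=0.9):
    """Regress log-concentration on log-flow plus an exponentially
    discounted past flow and a rising/falling-limb indicator (sketch)."""
    dflow = np.empty(len(flow))
    s = 0.0
    for t, q in enumerate(flow):
        s = delta * s + q                 # exponentially discounted flow
        dflow[t] = s
    X = np.column_stack([np.ones(len(flow)), np.log(flow),
                         np.log(dflow), rising.astype(float)])
    beta, *_ = np.linalg.lstsq(X, np.log(conc), rcond=None)
    return beta, X

def estimate_load(flow, conc):
    """Total load as the flow-weighted sum of concentrations."""
    return float(np.sum(conc * flow))
```

Predicted concentrations from the fitted model, multiplied by flow and summed over time, give the load estimate.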
Abstract:
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
Abstract:
Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.
Abstract:
The extended recruitment season for short-lived species such as prawns biases the estimation of growth parameters from length-frequency data when conventional methods are used. We propose a simple method for overcoming this bias given a time series of length-frequency data. The difficulties arising from extended recruitment are eliminated by predicting the growth of the succeeding samples and the length increments of the recruits in previous samples. This method requires that some maximum size at recruitment can be specified. The advantages of this multiple length-frequency method are: it is simple to use; it requires only three parameters; no specific distributions need to be assumed; and the actual seasonal recruitment pattern does not have to be specified. We illustrate the new method with length-frequency data on the tiger prawn Penaeus esculentus from the north-western Gulf of Carpentaria, Australia.
Abstract:
We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for the overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is referred to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that, in the presence of extra variation in the data, the confidence intervals of previous models (Leslie-DeLury, Moran) are substantially too narrow, and that the new model provides more reliable confidence intervals. The performance of these methods is also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
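The constant-catchability multinomial baseline that the beta-multinomial model generalizes can be sketched with a sequential-binomial log-likelihood and a crude grid-search MLE; the beta mixing over catchability, which supplies the overdispersion, is deliberately omitted here.

```python
from math import lgamma, log

def removal_loglik(N, p, catches):
    """Sequential-binomial (equivalently multinomial) log-likelihood for
    a removal experiment: population N, constant catchability p per pass."""
    ll, remaining = 0.0, N
    for c in catches:
        if c > remaining:
            return float("-inf")
        ll += (lgamma(remaining + 1) - lgamma(c + 1)
               - lgamma(remaining - c + 1)
               + c * log(p) + (remaining - c) * log(1.0 - p))
        remaining -= c
    return ll

def removal_mle(catches, n_max=2000):
    """Crude grid-search MLE of (N, p). The beta-multinomial model places
    a beta distribution on p to absorb overdispersion; that mixing step is
    omitted in this constant-catchability baseline."""
    best = (None, None, float("-inf"))
    for N in range(sum(catches), n_max + 1):
        for p in (i / 100.0 for i in range(1, 100)):
            ll = removal_loglik(N, p, catches)
            if ll > best[2]:
                best = (N, p, ll)
    return best
```

For two passes the closed-form MLE is N = c1²/(c1 - c2) and p = 1 - c2/c1, which the grid search approximates.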
Abstract:
We propose a simple method of constructing quasi-likelihood functions for dependent data based on conditional mean-variance relationships, and apply the method to estimating the fractal dimension from box-counting data. Simulation studies were carried out to compare this method with traditional methods. We also applied the technique to real data from fishing grounds in the Gulf of Carpentaria, Australia.
Abstract:
Robust estimation often relies on a dispersion function that varies more slowly at large values than the square function. However, the choice of tuning constant in the dispersion function can substantially affect the estimation efficiency. For a given family of dispersion functions, such as the Huber family, we suggest obtaining the "best" tuning constant from the data so that the asymptotic efficiency is maximized. This data-driven approach can automatically adjust the value of the tuning constant to provide the necessary resistance against outliers. Simulation studies show that substantial efficiency gains can be achieved by this data-dependent approach compared with the traditional approach in which the tuning constant is fixed. We briefly illustrate the proposed method using two datasets.
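The data-driven idea can be sketched for the location case: estimate the asymptotic variance E[ψ²]/(E[ψ'])² of the Huber M-estimator on standardized residuals and pick the grid value of c that minimizes it. The MAD-based standardization and the grid are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def best_tuning_constant(residuals, grid):
    """Pick the Huber tuning constant c minimizing the estimated
    asymptotic variance E[psi^2] / (E[psi'])^2 of the location
    M-estimator; psi(r) = clip(r, -c, c), so psi'(r) = 1{|r| <= c}."""
    r = residuals / (1.4826 * np.median(np.abs(residuals)))  # robust scale
    best_c, best_v = None, np.inf
    for c in grid:
        psi = np.clip(r, -c, c)
        dpsi = (np.abs(r) <= c).mean()    # empirical E[psi']
        v = (psi ** 2).mean() / dpsi ** 2
        if v < best_v:
            best_c, best_v = c, v
    return best_c
```

For heavy-tailed data the estimated variance blows up at large c, so a small tuning constant is selected automatically; for clean Gaussian data a larger c wins.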
Abstract:
The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows specification of a working correlation matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impact of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function affects estimators for within-cluster covariates much more than estimators for cluster-level covariates; and (3) if the variance function is misspecified, the correct choice of correlation structure may not necessarily improve estimation efficiency. We illustrate the impact of different variance functions using a real dataset on cow growth.
Abstract:
The Macroscopic Fundamental Diagram (MFD) relates space-mean density and flow. Since the MFD represents area-wide network traffic performance, studies on perimeter control strategies and network-wide traffic state estimation utilising the MFD concept have been reported. Most previous works have utilised data from fixed sensors, such as inductive loops, to estimate the MFD, which can cause biased estimation in urban networks due to queue spillovers at intersections. To overcome this limitation, recent literature reports the use of trajectory data obtained from probe vehicles. However, these studies have been conducted using simulated datasets; few works have discussed the limitations of real datasets and their impact on the estimation of the variables. This study compares two methods for estimating the traffic state variables of signalised arterial sections: one based on cumulative vehicle counts (CUPRITE), and one based on vehicle trajectories from taxi Global Positioning System (GPS) logs. The comparisons reveal some characteristics of the taxi trajectory data available in Brisbane, Australia. The current trajectory data are limited in quantity (i.e., in penetration rate), due to which the traffic state variables tend to be underestimated. Nevertheless, the trajectory-based method successfully captures the features of the traffic states, which suggests that taxi trajectories can be a good estimator of network-wide traffic states.
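Trajectory-based state estimation of this kind commonly rests on Edie's generalized definitions of flow and density over a space-time region; a minimal sketch follows. With only a fraction of vehicles observed as probes, both totals (and hence both estimates) scale down with the penetration rate, which is the underestimation noted above.

```python
def edie_traffic_state(trajectories, section_length, period):
    """Edie's generalized definitions over a space-time region of size
    section_length x period: flow = total distance travelled / area,
    density = total time spent / area. Each trajectory is a list of
    (time, position) samples within the region."""
    area = section_length * period
    dist = dwell = 0.0
    for traj in trajectories:
        for (t0, x0), (t1, x1) in zip(traj, traj[1:]):
            dist += x1 - x0               # distance travelled in region
            dwell += t1 - t0              # time spent in region
    return dist / area, dwell / area      # (flow, density)
```

The ratio flow/density recovers the space-mean speed, which is the quantity the MFD relates across the network.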
Abstract:
We propose an iterative estimating equations procedure for analysis of longitudinal data. We show that, under very mild conditions, the probability that the procedure converges at an exponential rate tends to one as the sample size increases to infinity. Furthermore, we show that the limiting estimator is consistent and asymptotically efficient, as expected. The method applies to semiparametric regression models with unspecified covariances among the observations. In the special case of linear models, the procedure reduces to iterative reweighted least squares. Finite sample performance of the procedure is studied by simulations, and compared with other methods. A numerical example from a medical study is considered to illustrate the application of the method.
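In the linear special case the procedure reduces to iteratively reweighted least squares; here is a sketch with a simple working model in which the error variance is constant within each cluster (the grouped-variance working model is an illustrative assumption, not the paper's general semiparametric setting).

```python
import numpy as np

def irls_grouped(X, y, groups, n_iter=10):
    """Iteratively reweighted least squares: alternate between a weighted
    least-squares step for beta and re-estimating a per-group residual
    variance, whose inverse supplies the next round of weights."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.empty_like(y)
        for g in np.unique(groups):
            m = groups == g
            w[m] = 1.0 / max(np.mean(r[m] ** 2), 1e-12)  # inverse variance
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)       # WLS update
    return beta
```

With one low-noise and one high-noise cluster, the weights let the low-noise observations dominate, giving a far more efficient estimate than OLS.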