982 results for Rank-size rule
Abstract:
Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.
Abstract:
For clustered survival data, the traditional Gehan-type estimator is asymptotically equivalent to using only the between-cluster ranks, and the within-cluster ranks are ignored. The contribution of this paper is twofold: (i) incorporating within-cluster ranks in censored data analysis, and (ii) applying the induced smoothing of Brown and Wang (2005, Biometrika) for computational convenience. Asymptotic properties of the resulting estimating functions are given. We also carry out numerical studies to assess the performance of the proposed approach and conclude that it can lead to much improved estimators when strong clustering effects exist. A dataset from a litter-matched tumorigenesis experiment is used for illustration.
Abstract:
With a growing population and fast urbanization in Australia, maintaining our water quality is a challenging task. It is essential to develop an appropriate statistical methodology for analyzing water quality data in order to draw valid conclusions and hence provide useful advice for water management. This paper develops robust rank-based procedures for analyzing non-normally distributed data collected over time at different sites. To take account of temporal correlations of the observations within sites, we consider the optimally combined estimating functions proposed by Wang and Zhu (Biometrika, 93:459-464, 2006), which lead to more efficient parameter estimation. Furthermore, we apply the induced smoothing method to reduce the computational burden. Smoothing leads to easy calculation of the parameter estimates and their variance-covariance matrix. Analysis of water quality data on Total Iron and Total Cyanophytes shows the differences between the traditional generalized linear mixed models and rank regression models. Our analysis also demonstrates the advantages of the rank regression models for analyzing non-normal data.
Abstract:
Environmental data, such as water quality data, usually include measurements that fall below detection limits because of limitations of the instruments or of the analytical methods used. The fact that some responses are not detected needs to be properly taken into account in the statistical analysis of such data. However, it is well known that analyzing a data set with detection limits is challenging, and we often have to rely on traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justification of distributions is often not possible when the data are correlated and a large proportion of the data is below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
Abstract:
We consider rank regression for clustered data analysis and investigate the induced smoothing method for obtaining the asymptotic covariance matrices of the parameter estimators. We prove that the induced estimating functions are asymptotically unbiased and the resulting estimators are strongly consistent and asymptotically normal. The induced smoothing approach provides an effective way of obtaining asymptotic covariance matrices for between- and within-cluster estimators and for a combined estimator that takes account of within-cluster correlations. We also carry out extensive simulation studies to assess the performance of different estimators. The proposed methodology is substantially faster in computation and more stable in numerical results than the existing methods. We apply the proposed methodology to a dataset from a randomized clinical trial.
Abstract:
Power calculation and sample size determination are critical in designing environmental monitoring programs. The traditional approach based on comparing the mean values may become statistically inappropriate and even invalid when substantial proportions of the response values are below the detection limits or censored because strong distributional assumptions have to be made on the censored observations when implementing the traditional procedures. In this paper, we propose a quantile methodology that is robust to outliers and can also handle data with a substantial proportion of below-detection-limit observations without the need of imputing the censored values. As a demonstration, we applied the methods to a nutrient monitoring project, which is a part of the Perth Long-Term Ocean Outlet Monitoring Program. In this example, the sample size required by our quantile methodology is, in fact, smaller than that by the traditional t-test, illustrating the merit of our method.
Abstract:
This paper describes the design and implementation of a high-level query language called Generalized Query-By-Rule (GQBR) which supports retrieval, insertion, deletion and update operations. This language, based on the formalism of database logic, enables the users to access each database in a distributed heterogeneous environment, without having to learn all the different data manipulation languages. The compiler has been implemented on a DEC 1090 system in Pascal.
Abstract:
We consider rank-based regression models for repeated measures. To account for possible within-subject correlations, we decompose the total ranks into between- and within-subject ranks and obtain two different estimators based on between- and within-subject ranks. A simple perturbation method is then introduced to generate bootstrap replicates of the estimating functions and the parameter estimates. This provides a convenient way of combining the corresponding two types of estimating function for more efficient estimation.
Abstract:
We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, as compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is referred to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that in the presence of extravariation in the data, the confidence intervals have been substantially underestimated in previous models (Leslie-DeLury, Moran) and that the new model provides more reliable confidence intervals. The performance of these methods was also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
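The abstract above does not give the likelihood itself. As a rough sketch under one reading of it, assume each sampling removes a binomial number of the animals still present, with a catchability drawn independently from a Beta(a, b) distribution at each sampling; the log-likelihood is then a sum of beta-binomial terms. The function names, the catch numbers and the fixed values of a and b below are illustrative assumptions, not taken from the paper.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(c, n, a, b):
    """Log-probability of catching c out of n animals when the
    catchability is Beta(a, b) distributed (beta-binomial pmf)."""
    if c < 0 or c > n:
        return -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(c + 1) - math.lgamma(n - c + 1)
    return log_choose + log_beta(c + a, n - c + b) - log_beta(a, b)

def removal_loglik(N, catches, a, b):
    """Log-likelihood of successive removal catches for population size N,
    assuming an independent Beta(a, b) catchability at each sampling."""
    remaining = N
    total = 0.0
    for c in catches:
        total += beta_binomial_logpmf(c, remaining, a, b)
        if total == -math.inf:
            return total  # N is too small to explain the catches
        remaining -= c
    return total

# Hypothetical catches from three successive samplings.
catches = [40, 25, 15]
# Scan integer population sizes for fixed (a, b); a full fit would also
# maximise over a and b.
best_N = max(range(sum(catches), 500),
             key=lambda N: removal_loglik(N, catches, 2.0, 4.0))
```

Scanning N on a grid keeps the sketch short and dependency-free; in practice all three parameters would be estimated jointly.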
Abstract:
Natural mortality of marine invertebrates is often very high in the early life history stages and decreases in later stages. The possible size-dependent mortality of juvenile banana prawns, P. merguiensis (2-15 mm carapace length), in the Gulf of Carpentaria was investigated. The analysis was based on data collected at 2-weekly intervals by beam trawls at four sites over a period of six years (between September 1986 and March 1992). It was assumed that mortality was a parametric function of size, rather than a constant. Another complication in estimating mortality for juvenile banana prawns is that a significant proportion of the population emigrates from the study area each year. This effect was accounted for by incorporating the size-frequency pattern of the emigrants in the analysis. Both the extra parameter in the model required to describe the size dependence of mortality and that used to account for emigration were found to be significantly different from zero, and the instantaneous mortality rate declined from 0.89 per week for 2 mm prawns to 0.02 per week for 15 mm prawns.
Abstract:
Adaptations of weighted rank regression to the accelerated failure time model for censored survival data have been successful in yielding asymptotically normal estimates and flexible weighting schemes to increase statistical efficiencies. However, for only one simple weighting scheme, Gehan or Wilcoxon weights, are estimating equations guaranteed to be monotone in parameter components, and even in this case they are step functions, requiring the equivalent of linear programming for computation. The lack of smoothness makes standard error or covariance matrix estimation even more difficult. An induced smoothing technique overcame these difficulties in various problems involving monotone but pure jump estimating equations, including conventional rank regression. The present paper applies induced smoothing to the Gehan-Wilcoxon weighted rank regression for the accelerated failure time model, for the more difficult case of survival time data subject to censoring, where the inapplicability of permutation arguments necessitates a new method of estimating the null variance of estimating functions. Smooth monotone parameter estimation and rapid, reliable standard error or covariance matrix estimation are obtained.
Abstract:
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.
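To make the idea of induced smoothing concrete, here is a minimal sketch for a single-slope Wilcoxon-type rank regression. The step function sign(e_i - e_j) in the rank estimating function is replaced by its expectation under a normal perturbation of the slope, 2Φ((e_i - e_j)/r_ij) - 1, which is smooth and still monotone in the slope. The fixed smoothing variance `gamma`, the function names and the toy data are assumptions made for illustration; in the method described above the smoothing scale is tied to the estimator's own covariance and updated iteratively.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_rank_score(b, x, y, gamma):
    """Smoothed Wilcoxon-type estimating function for a single slope b.
    gamma is an assumed variance for the perturbation of the slope."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            if dx == 0.0:
                continue  # tied covariates carry no information about the slope
            de = (y[i] - y[j]) - b * dx      # residual difference e_i - e_j
            r = abs(dx) * math.sqrt(gamma)   # scale of the induced perturbation
            total += dx * (2.0 * normal_cdf(de / r) - 1.0)
    return total

def solve_slope(x, y, gamma, lo=-100.0, hi=100.0, tol=1e-10):
    """Root of the smoothed score by bisection; the score is
    non-increasing in b, so the bracket [lo, hi] shrinks onto the root."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_rank_score(mid, x, y, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data on the line y = 1 + 2x, plus a copy with one gross outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]
y_outlier = y[:-1] + [100.0]

slope_clean = solve_slope(x, y, gamma=1.0 / len(x))
slope_outlier = solve_slope(x, y_outlier, gamma=1.0 / len(x))
```

On the clean data the root is the true slope of 2, and with the outlier the estimate stays within 1 of it, which illustrates that smoothing keeps the robustness of the rank estimator. Because the smoothed score is differentiable, its slope at the root could also be used in a sandwich-type standard error, which is the computational benefit the abstract refers to.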
Abstract:
Although subsampling is a common method for describing the composition of large and diverse trawl catches, the accuracy of these techniques is often unknown. We determined the sampling errors generated from estimating the percentage of the total number of species recorded in catches, as well as the abundance of each species, at each increase in the proportion of the sorted catch. We completely partitioned twenty prawn trawl catches from tropical northern Australia into subsamples of about 10 kg each. All subsamples were then sorted, and species numbers recorded. Catch weights ranged from 71 to 445 kg, and the number of fish species in trawls ranged from 60 to 138, and invertebrate species from 18 to 63. Almost 70% of the species recorded in catches were "rare" in subsamples (less than one individual per 10 kg subsample or less than one in every 389 individuals). A matrix was used to show the increase in the total number of species that were recorded in each catch as the percentage of the sorted catch increased. Simulation modelling showed that sorting small subsamples (about 10% of catch weights) identified about 50% of the total number of species caught in a trawl. Larger subsamples (50% of catch weight on average) identified about 80% of the total species caught in a trawl. The accuracy of estimating the abundance of each species also increased with increasing subsample size. For the "rare" species, sampling error was around 80% after sorting 10% of catch weight and was just less than 50% after 40% of catch weight had been sorted. For the "abundant" species (five or more individuals per 10 kg subsample or five or more in every 389 individuals), sampling error was around 25% after sorting 10% of catch weight, but was reduced to around 10% after 40% of catch weight had been sorted.
Abstract:
The topic of this study is the most renowned anthology of essays written in Literary Chinese, Guwen guanzhi, compiled and edited by Wu Chengquan (Chucai) and Wu Dazhi (Diaohou), and first published during the Qing dynasty, in 1695. Because of the low social standing of the compilers, their anthology remained outside the recommended study materials produced by members of the established literati and used for preparing students for the imperial civil-service examinations. However, since the end of the imperial era, Guwen guanzhi has risen to a position as the classical anthology par excellence. Today it is widely used as required or supplementary reading material for Literary Chinese in middle schools both in Mainland China and in Taiwan. The goal of this study is to explain the longevity of the anthology. So far, Guwen guanzhi has not been the topic of any published academic study, and the opinions expressed on it in various sources are widely discrepant. Through a comparative study with a dozen classical Chinese anthologies in use during the early Qing dynasty, this study reveals the extent to which the compilers of Guwen guanzhi modelled their work after other selections. Altogether 86% of the texts in Guwen guanzhi originate from another Qing era anthology, Guwen xiyi, often copied character by character. However, the notes and commentaries are all different. Concentrating on the characteristics unique to Guwen guanzhi (the commentaries and certain peculiarities in the selection of texts), this study then discusses the possible reasons for the popularity of Guwen guanzhi over the competing readers during the Qing era. Most remarkably, Guwen guanzhi put into practice the equalitarian, educational ideals of the Ming philosopher Wang Shouren (Yangming). Thus Guwen guanzhi suited the self-enlightenment needs of the “subordinate classes”, in particular the rising middle class comprised mainly of merchants.
The lack of moral teleology, together with the compact size, relative comprehensiveness of the selection and good notes and comments, has made Guwen guanzhi well suited for the new society since the abolition of the imperial examination system. Through a content analysis, based on a sample of the texts, this study measures the relative emphasis on centralism and localism (both in concrete and spiritual terms) expressed in the texts of Guwen guanzhi. The analysis shows that the texts manifest some bias towards emphasising innate virtue at the expense of state-defined morality. This may reflect hidden critique of intellectual oppression by the centralised imperial rule. During the early decades of the Qing era, such critique was often linked to Ming loyalism. Finally, this study concludes that the kind of “spiritual localism” that Guwen guanzhi manifests gives it the potential to undermine monolithic orthodoxy even in today’s Chinese societies. This study has progressed hand in hand with the translation of a selection of texts from Guwen guanzhi into Finnish, published by Gaudeamus Helsinki University Press: Jadekasvot – Valittuja tarinoita Kiinan muinaisajoilta (2005), Jadelähde – Valittuja kirjoituksia Kiinan keskiajalta (2007) and Jadepeili – Valittuja kirjoituksia keisarillisen Kiinan kulta-ajoilta (2008). All translations are critical editions, complete with extensive notation. The trilogy is the first comprehensive translation based on Guwen guanzhi in a European language.
Abstract:
Stallard (1998, Biometrics 54, 279-294) recently used Bayesian decision theory for sample-size determination in phase II trials. His design maximizes the expected financial gains in the development of a new treatment. However, it results in a very high probability (0.65) of recommending an ineffective treatment for phase III testing. On the other hand, the expected gain using his design is more than 10 times that of a design that tightly controls the false positive error (Thall and Simon, 1994, Biometrics 50, 337-349). Stallard's design maximizes the expected gain per phase II trial, but it does not maximize the rate of gain or total gain for a fixed length of time because the rate of gain depends on the proportion of treatments forwarded to the phase III study. We suggest maximizing the rate of gain, and the resulting optimal one-stage design becomes twice as efficient as Stallard's one-stage design. Furthermore, the new design has a probability of only 0.12 of passing an ineffective treatment to the phase III study.