220 results for Robust estimates
Abstract:
Genome-wide association studies (GWASs) have been successful at identifying single-nucleotide polymorphisms (SNPs) highly associated with common traits; however, a great deal of the heritable variation associated with common traits remains unaccounted for within the genome. Genome-wide complex trait analysis (GCTA) is a statistical method that applies a linear mixed model to estimate the phenotypic variance of complex traits explained by genome-wide SNPs, including those not associated with the trait in a GWAS. We applied GCTA to 8 cohorts containing 7096 cases and 19 455 controls of European ancestry in order to examine the missing heritability of Parkinson's disease (PD). We meta-analyzed our initial results to produce robust heritability estimates for PD types across cohorts. Our results identify 27% (95% CI 17-38, P = 8.08E-08) of phenotypic variance associated with all types of PD, 15% (95% CI -0.2 to 33, P = 0.09) associated with early-onset PD and 31% (95% CI 17-44, P = 1.34E-05) associated with late-onset PD. This is a substantial increase over the genetic variance identified by top GWAS hits alone (between 3 and 5%) and indicates that there are substantially more risk loci to be identified. Our results suggest that although GWASs are a useful tool for identifying the most common variants associated with complex disease, many common variants of small effect remain to be discovered. © Published by Oxford University Press 2012.
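GCTA itself fits the linear mixed model by restricted maximum likelihood (GREML); as a minimal illustrative stand-in, the sketch below estimates SNP heritability from a genetic relationship matrix (GRM) by Haseman-Elston regression on simulated data. All sizes and values are synthetic; this is not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 2000           # individuals, SNPs (toy sizes)
h2_true = 0.3              # simulated SNP heritability

# Simulate standardized genotypes and a polygenic phenotype.
G = rng.standard_normal((n, m))
beta = rng.normal(0.0, np.sqrt(h2_true / m), m)
y = G @ beta + rng.normal(0.0, np.sqrt(1 - h2_true), n)
y = (y - y.mean()) / y.std()

# Genetic relationship matrix (GRM) from standardized SNPs.
A = (G @ G.T) / m

# Haseman-Elston regression: regress phenotype cross-products y_i * y_j
# on GRM entries A_ij over all pairs i < j; the slope estimates h2.
iu = np.triu_indices(n, k=1)
x, z = A[iu], np.outer(y, y)[iu]
h2_hat = (x @ z) / (x @ x)
print(f"estimated SNP heritability: {h2_hat:.2f}")
```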
Abstract:
This thesis investigated how to cluster a large number of faces within a multimedia corpus in the presence of large session variation. Quality metrics are used to select the best faces to represent a sequence of faces, and session variation modelling improves clustering performance in the presence of wide variations across videos. Findings from this thesis contribute to improving the performance of both face verification systems and the fully automated clustering of faces from a large video corpus.
Abstract:
Commercial environments may receive only a fraction of the genetic gains for growth rate predicted from the selection environment. This fraction is the result of undesirable genotype-by-environment interactions (G × E) and is measured by the genetic correlation (r(g)) of growth between environments. Genetic correlations obtained in a single generation are notoriously difficult to estimate with precision. A new design is proposed in which genetic correlations can be estimated by utilising artificial mating from cryopreserved semen and unfertilised eggs stripped from a single female. We compare a traditional phenotype analysis of growth to a threshold model in which only the largest fish are genotyped for sire identification. The threshold model was robust to family mortality rates differing by up to 30%. The design is unique in that it negates potential re-ranking of families caused by an interaction between common maternal environmental effects and the growing environment. The design is suitable for rapid assessment of G × E over one generation, with a true genetic correlation of 0.70 yielding standard errors as low as 0.07. Different design scenarios were tested for bias and accuracy across a range of heritability values, numbers of half-sib families created, numbers of progeny within each full-sib family, numbers of fish genotyped, numbers of fish stocked, differing family survival rates, and various simulated genetic correlation levels.
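As a hedged sketch of the quantity being estimated (not the authors' threshold model), the simulation below generates paternal half-sib families reared in two environments with a true r(g) of 0.70 and recovers the genetic correlation from the cross-environment covariance of family means; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sires, n_prog = 100, 40       # half-sib families, progeny per family per environment
r_g, h2 = 0.70, 0.3
sigma_s = h2 / 4                # sire variance = additive variance / 4 (phenotypic var = 1)

# Sire effects in the two environments, correlated at the true r_g.
cov = sigma_s * np.array([[1, r_g], [r_g, 1]])
s = rng.multivariate_normal([0, 0], cov, size=n_sires)

# Progeny phenotypes: sire effect + residual, with different fish in each environment.
resid_sd = np.sqrt(1 - sigma_s)
p1 = s[:, 0][:, None] + rng.normal(0, resid_sd, (n_sires, n_prog))
p2 = s[:, 1][:, None] + rng.normal(0, resid_sd, (n_sires, n_prog))

m1, m2 = p1.mean(axis=1), p2.mean(axis=1)
# The cross-environment covariance of family means estimates the sire covariance;
# within-environment sire variances come from the between-family variance minus
# the sampling variance of the family means.
cov_s = np.cov(m1, m2)[0, 1]
var_s1 = m1.var(ddof=1) - p1.var(ddof=1, axis=1).mean() / n_prog
var_s2 = m2.var(ddof=1) - p2.var(ddof=1, axis=1).mean() / n_prog
print(f"estimated r_g: {cov_s / np.sqrt(var_s1 * var_s2):.2f}")
```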
Abstract:
With a growing population and fast urbanization in Australia, maintaining water quality is a challenging task. It is essential to develop appropriate statistical methodology for analyzing water quality data in order to draw valid conclusions and hence provide useful advice for water management. This paper develops robust rank-based procedures for analyzing non-normally distributed data collected over time at different sites. To account for temporal correlations of the observations within sites, we consider the optimally combined estimating functions proposed by Wang and Zhu (Biometrika, 93:459-464, 2006), which lead to more efficient parameter estimation. Furthermore, we apply the induced smoothing method to reduce the computational burden. Smoothing leads to easy calculation of the parameter estimates and their variance-covariance matrix. Analysis of water quality data on Total Iron and Total Cyanophytes shows the differences between traditional generalized linear mixed models and rank regression models. Our analysis also demonstrates the advantages of rank regression models for analyzing non-normal data.
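The rank-regression core of such procedures can be sketched as minimizing Jaeckel's dispersion with Wilcoxon scores; the toy below does this for a single covariate with heavy-tailed errors. It omits the paper's combined estimating functions and induced smoothing, and all data are simulated.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def jaeckel_dispersion(b, x, y):
    """Jaeckel's rank dispersion with Wilcoxon scores; minimizing it over
    the slope gives the rank-regression estimate (the intercept drops out
    because ranks are unchanged by a constant shift)."""
    e = y - b * x
    n = len(e)
    scores = np.sqrt(12) * (rankdata(e) / (n + 1) - 0.5)
    return np.sum(scores * e)

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)
# Heavy-tailed (non-normal) errors, as in the water-quality setting.
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)

res = minimize_scalar(jaeckel_dispersion, args=(x, y), bounds=(-10, 10), method="bounded")
b_hat = res.x
a_hat = np.median(y - b_hat * x)   # robust location estimate for the intercept
print(f"intercept: {a_hat:.2f}, slope: {b_hat:.2f}")
```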
Abstract:
Previous studies have shown that the external growth records of the posterior adductor muscle scar (PAMS) of the bivalve Pinna nobilis are incomplete and do not produce accurate age estimations. We have developed a new methodology to study age and growth using the inner record of the PAMS, which avoids the need for costly in situ shell measurements or isotopic studies. Using the inner record we identified the positions of PAMS previously obscured by nacre and estimated the number of missing records in adult specimens with strong abrasion of the calcite layer in the anterior portion of the shell. The study of the PAMS and inner record of two shells that were 6 years old when collected showed that only 2 and 3 PAMS were observed, whereas 6 inner records could be counted, thus confirming our methodology. Growth parameters of a P. nobilis population located in Moraira, Spain (western Mediterranean) were estimated with the new methodology and compared to those obtained using PAMS data and in situ measurements. For the comparisons, we applied different models, treating the data alternatively as length-at-age (LA) and tag-recapture (TR). Among the methods we tested for fitting the von Bertalanffy growth model, fitting LA data from the inner record with non-linear mixed effects, together with estimation of missing records from the calcite width, was the most appropriate. The equation obtained with this method, L = 573(1 − e^(−0.16(t − 0.02))), is very similar to that calculated previously from in situ measurements for the same population.
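A minimal sketch of fitting the von Bertalanffy curve to length-at-age data with ordinary non-linear least squares (the paper's preferred fit also uses mixed effects, omitted here). The length-at-age values below are synthetic, generated to resemble the reported curve.

```python
import numpy as np
from scipy.optimize import curve_fit

def von_bertalanffy(t, L_inf, k, t0):
    """Von Bertalanffy growth: length as a function of age."""
    return L_inf * (1 - np.exp(-k * (t - t0)))

# Hypothetical length-at-age pairs, e.g. read off inner-record counts.
age = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
length = np.array([83, 156, 217, 270, 315, 353, 385, 413], dtype=float)

params, _ = curve_fit(von_bertalanffy, age, length, p0=[600, 0.2, 0.0])
L_inf, k, t0 = params
print(f"L = {L_inf:.0f}*(1 - exp(-{k:.2f}(t - {t0:.2f})))")
```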
Abstract:
We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is referred to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that in the presence of extra variation in the data, the confidence intervals have been substantially underestimated by previous models (Leslie-DeLury, Moran) and that the new model provides more reliable confidence intervals. The performance of these methods was also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
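For orientation, the sketch below fits the classical constant-catchability multinomial removal model (the Moran-type baseline that the paper generalizes by letting catchability vary as a Beta across samplings) by maximum likelihood. The catch numbers are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

catches = np.array([142, 79, 51, 28])   # hypothetical successive removals

def neg_log_lik(theta, c):
    """Multinomial removal likelihood with constant catchability p;
    capture probability in removal i is p(1-p)^(i-1), and (1-p)^k
    is the probability of never being caught."""
    N, p = theta
    k, T = len(c), c.sum()
    if N <= T or not (0 < p < 1):
        return np.inf
    pi = p * (1 - p) ** np.arange(k)
    ll = (gammaln(N + 1) - gammaln(N - T + 1) - gammaln(c + 1).sum()
          + (c * np.log(pi)).sum() + (N - T) * k * np.log(1 - p))
    return -ll

fit = minimize(neg_log_lik, x0=[2 * catches.sum(), 0.3], args=(catches,),
               method="Nelder-Mead")
print(f"N-hat = {fit.x[0]:.0f}, p-hat = {fit.x[1]:.2f}")
```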
Abstract:
Robust estimation often relies on a dispersion function that varies more slowly at large values than the square function. However, the choice of tuning constant in the dispersion function can affect estimation efficiency to a great extent. For a given family of dispersion functions, such as the Huber family, we suggest obtaining the "best" tuning constant from the data so that the asymptotic efficiency is maximized. This data-driven approach automatically adjusts the value of the tuning constant to provide the necessary resistance against outliers. Simulation studies show that substantial efficiency can be gained by this data-dependent approach compared with the traditional approach in which the tuning constant is fixed. We briefly illustrate the proposed method using two datasets.
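A minimal sketch of the idea, assuming a location model: the asymptotic variance of a Huber M-estimator is V(c) = E[ψ_c(e)²] / (E[ψ'_c(e)])², so one can estimate V(c) empirically from standardized residuals and pick the c that minimizes it. Contamination level and grid are illustrative, not from the paper.

```python
import numpy as np

def huber_psi(e, c):
    return np.clip(e, -c, c)

def asymptotic_variance(e, c):
    """Empirical estimate of the M-estimator's asymptotic variance
    V(c) = E[psi_c(e)^2] / (E[psi_c'(e)])^2."""
    psi = huber_psi(e, c)
    dpsi = (np.abs(e) <= c).astype(float)   # derivative of the Huber psi
    return np.mean(psi ** 2) / np.mean(dpsi) ** 2

rng = np.random.default_rng(3)
# Residuals contaminated with outliers, standardized by the MAD.
e = np.concatenate([rng.standard_normal(950), rng.normal(0, 10, 50)])
e = e / (1.4826 * np.median(np.abs(e - np.median(e))))

grid = np.linspace(0.5, 3.0, 26)
c_best = grid[np.argmin([asymptotic_variance(e, c) for c in grid])]
print(f"data-chosen tuning constant: c = {c_best:.1f}")
```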
Abstract:
Robust methods are useful for making reliable statistical inferences when there are small deviations from the model assumptions. The widely used method of generalized estimating equations can be "robustified" by replacing the standardized residuals with M-residuals. However, while the Pearson residuals are unbiased from zero, parameter estimators from the robust approach are asymptotically biased when error distributions are not symmetric. We propose a distribution-free method for correcting this bias. Our extensive numerical studies show that the proposed method can reduce the bias substantially. Examples are given for illustration.
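One schematic form such a distribution-free correction can take (a single-covariate sketch without the GEE clustering machinery, not necessarily the authors' exact construction): center the ψ-residuals by their empirical mean inside the estimating equation, so that an asymmetric error distribution no longer shifts the solution. Data and model below are synthetic.

```python
import numpy as np
from scipy.optimize import brentq

def huber_psi(e, c=1.345):
    return np.clip(e, -c, c)

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 3, n)                  # covariate with non-zero mean
err = rng.exponential(1.0, n) - 1.0       # asymmetric errors with mean zero
y = 2.0 * x + err

def uncorrected(b):
    # Standard robust estimating equation: sum x_i * psi(r_i) = 0.
    return np.sum(x * huber_psi(y - b * x))

def corrected(b):
    # Distribution-free correction: subtract the empirical mean of psi,
    # removing the asymptotic bias caused by E[psi(e)] != 0.
    psi = huber_psi(y - b * x)
    return np.sum(x * (psi - psi.mean()))

print(f"uncorrected slope: {brentq(uncorrected, 0, 4):.3f}")   # biased below 2
print(f"corrected slope:   {brentq(corrected, 0, 4):.3f}")     # close to 2
```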
Abstract:
We consider estimation of mortality rates and growth parameters from length-frequency data of a fish stock and derive the underlying length distribution of the population and the catch when there is individual variability in the von Bertalanffy growth parameter L-infinity. The model is flexible enough to accommodate (1) any recruitment pattern as a function of both time and length, (2) length-specific selectivity, and (3) varying fishing effort over time. The maximum likelihood method gives consistent estimates, provided the underlying distribution for individual variation in growth is correctly specified. Simulation results indicate that our method is reasonably robust to violations of the assumptions. The method is applied to tiger prawn (Penaeus semisulcatus) data to obtain estimates of natural and fishing mortality.
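The following toy simulation illustrates the model ingredient that drives this approach (not the full likelihood): under constant recruitment and exponential total mortality, each individual grows toward its own L-infinity, producing the population length-frequency that the estimation is built on. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
Z, k, t0 = 1.5, 0.8, 0.0            # total mortality and growth parameters
mu_Linf, sd_Linf = 35.0, 4.0        # individual variability in L-infinity

# Ages of survivors under constant recruitment and exponential mortality.
n = 100_000
age = rng.exponential(1 / Z, n)

# Each individual grows toward its own L-infinity.
L_inf = rng.normal(mu_Linf, sd_Linf, n)
length = L_inf * (1 - np.exp(-k * (age - t0)))

# The resulting length-frequency is what the likelihood is fitted to;
# ignoring sd_Linf would wrongly attribute the spread near mu_Linf to mortality.
hist, edges = np.histogram(length, bins=np.arange(0, 55, 5))
for lo, cnt in zip(edges, hist):
    print(f"{lo:>4.0f}-{lo + 5:.0f} cm: {cnt}")
```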
Abstract:
Although subsampling is a common method for describing the composition of large and diverse trawl catches, the accuracy of these techniques is often unknown. We determined the sampling errors generated from estimating the percentage of the total number of species recorded in catches, as well as the abundance of each species, at each increase in the proportion of the sorted catch. We completely partitioned twenty prawn trawl catches from tropical northern Australia into subsamples of about 10 kg each. All subsamples were then sorted, and species numbers recorded. Catch weights ranged from 71 to 445 kg, and the number of fish species in trawls ranged from 60 to 138, and invertebrate species from 18 to 63. Almost 70% of the species recorded in catches were "rare" in subsamples (less than one individual per 10 kg subsample or less than one in every 389 individuals). A matrix was used to show the increase in the total number of species that were recorded in each catch as the percentage of the sorted catch increased. Simulation modelling showed that sorting small subsamples (about 10% of catch weights) identified about 50% of the total number of species caught in a trawl. Larger subsamples (50% of catch weight on average) identified about 80% of the total species caught in a trawl. The accuracy of estimating the abundance of each species also increased with increasing subsample size. For the "rare" species, sampling error was around 80% after sorting 10% of catch weight and was just less than 50% after 40% of catch weight had been sorted. For the "abundant" species (five or more individuals per 10 kg subsample or five or more in every 389 individuals), sampling error was around 25% after sorting 10% of catch weight, but was reduced to around 10% after 40% of catch weight had been sorted.
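A hedged sketch of the kind of simulation involved (not the authors' model): partition a synthetic catch with skewed species abundances into equal "10 kg" subsamples, sort them one by one, and track the cumulative fraction of species recorded. Abundance distribution and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(6)
n_species = 100
# Skewed abundances: a few dominant species, many rare ones.
abundance = rng.pareto(1.0, n_species).astype(int) + 1
individuals = np.repeat(np.arange(n_species), abundance)
rng.shuffle(individuals)

# Partition the catch into equal "10 kg" subsamples and accumulate species.
n_sub = 20
subsamples = np.array_split(individuals, n_sub)
seen = set()
for i, sub in enumerate(subsamples, start=1):
    seen.update(sub.tolist())
    frac = i / n_sub
    if frac in (0.1, 0.5, 1.0):
        print(f"{frac:>4.0%} of catch sorted: "
              f"{len(seen) / n_species:.0%} of species recorded")
```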
Abstract:
The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.
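As an illustrative sketch of the modelling strategy (using statsmodels' GEE as a stand-in; the data, the "tech" indicator, and the effect size are all synthetic), the code below fits the same model under two cluster definitions: vessel-year-area (smaller clusters, the preferred definition) versus vessel across all years and areas (larger clusters).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({
    "vessel": rng.integers(0, 20, n),
    "year": rng.integers(1988, 1997, n),
    "area": rng.integers(0, 5, n),
    "tech": rng.integers(0, 2, n),      # hypothetical technology indicator
})
df["log_catch"] = 0.3 * df["tech"] + rng.standard_normal(n)

X = sm.add_constant(df[["tech"]])

# Cluster definition 1: vessel-year-area (smaller clusters).
g1 = df.groupby(["vessel", "year", "area"]).ngroup()
# Cluster definition 2: vessel across all years and areas (larger clusters,
# equivalent in spirit to a random effect for vessel).
g2 = df["vessel"]

for name, g in [("vessel-year-area", g1), ("vessel", g2)]:
    m = sm.GEE(df["log_catch"], X, groups=g,
               cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(f"{name:>16}: tech = {m.params['tech']:.3f} (se {m.bse['tech']:.3f})")
```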
Abstract:
We consider estimation of mortality rates and growth parameters from length-frequency data of a fish stock when there is individual variability in the von Bertalanffy growth parameter L-infinity, and investigate the possible bias in the estimates when this individual variability is ignored. Three methods are examined: (i) the regression method based on Beverton and Holt's (1956, Rapp. P.-V. Réun. Cons. Int. Explor. Mer, 140: 67-83) equation; (ii) the moment method of Powell (1979, Rapp. P.-V. Réun. Cons. Int. Explor. Mer, 175: 167-169); and (iii) a generalization of Powell's method that estimates the individual variability and incorporates it into the estimation. We find that the biases in the estimates from the existing methods are, in general, substantial, even when individual variability in growth is small and recruitment is uniform; the generalized method performs better in terms of bias but is subject to larger variation. There is a need to develop robust and flexible methods to deal with individual variability in the analysis of length-frequency data.
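The Beverton-Holt mean-length estimator referenced in (i) is Z = K(L∞ − L̄)/(L̄ − L'), where L' is the smallest fully selected length and L̄ is the mean length above L'. The sketch below applies it to synthetic lengths generated with individual variation in L-infinity, which is deliberately ignored in the estimator — the source of bias the paper examines. All parameter values are invented.

```python
import numpy as np

def beverton_holt_Z(lengths, L_inf, K, L_prime):
    """Beverton and Holt (1956) mortality estimate from mean length:
    Z = K (L_inf - Lbar) / (Lbar - L')."""
    Lbar = lengths[lengths >= L_prime].mean()
    return K * (L_inf - Lbar) / (Lbar - L_prime)

rng = np.random.default_rng(8)
# Synthetic catch: true Z = 1.2, but each fish has its own L-infinity.
age = rng.exponential(1 / 1.2, 5000)
L_inf_i = rng.normal(50, 5, 5000)
lengths = L_inf_i * (1 - np.exp(-0.5 * age))

# The estimator assumes a single L_inf = 50, so the result is biased.
print(f"Z-hat = {beverton_holt_Z(lengths, L_inf=50, K=0.5, L_prime=20):.2f}")
```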
Abstract:
Over the past few decades, practitioners and researchers have focused on developing efficient methods to solve dynamic facility layout problems. In particular, meta-heuristic algorithms, especially the genetic algorithm, have proven increasingly helpful for generating good sub-optimal solutions to large-scale dynamic facility layout problems. Nevertheless, the uncertainty of the manufacturing factors, in addition to the scale of the layout problem, calls for a combined genetic algorithm–robust optimisation approach that can provide a single layout design valid across all periods. The present research devises a customized permutation-based robust genetic algorithm for dynamic manufacturing environments that generates a single robust layout for all manufacturing periods. The numerical results of the proposed robust genetic algorithm indicate significant cost improvements compared with conventional genetic algorithm methods and a selection of other heuristic and meta-heuristic techniques.
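A hedged sketch of the core idea (not the authors' algorithm): a permutation-encoded genetic algorithm that evaluates each candidate layout against the flow matrices of every period, so the winning permutation is one fixed layout that is robust across periods. Flows, distances, and GA settings below are all invented.

```python
import numpy as np

rng = np.random.default_rng(9)
n_fac, n_periods, pop_size, n_gens = 8, 4, 60, 200

# Flow between facilities in each period, and distances between locations.
flows = rng.integers(0, 10, (n_periods, n_fac, n_fac))
loc = np.arange(n_fac)
dist = np.abs(loc[:, None] - loc[None, :])   # single-row layout distances

def robust_cost(perm):
    """Material-handling cost of one fixed layout, summed over all periods:
    flow between facilities times the distance between their locations."""
    total = 0
    for t in range(n_periods):
        F = flows[t][np.ix_(perm, perm)]     # flow reindexed by location
        total += int((F * dist).sum())
    return total

def order_crossover(p1, p2):
    """Order-based crossover: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(rng.choice(n_fac, 2, replace=False))
    child = -np.ones(n_fac, dtype=int)
    child[a:b] = p1[a:b]
    rest = [g for g in p2 if g not in child[a:b]]
    child[np.where(child < 0)[0]] = rest
    return child

pop = [rng.permutation(n_fac) for _ in range(pop_size)]
for _ in range(n_gens):
    pop.sort(key=robust_cost)                # elitist selection
    elite = pop[: pop_size // 2]
    children = []
    for _ in range(pop_size - len(elite)):
        i, j = rng.choice(len(elite), 2, replace=False)
        child = order_crossover(elite[i], elite[j])
        if rng.random() < 0.2:               # swap mutation
            a, b = rng.choice(n_fac, 2, replace=False)
            child[a], child[b] = child[b], child[a]
        children.append(child)
    pop = elite + children

best = min(pop, key=robust_cost)
print("robust layout:", best, "cost:", robust_cost(best))
```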
Abstract:
This project was a step towards improving the voltage profile of traditional low-voltage distribution networks with high photovoltaic generation or high peak demand. As a practical and economical solution, the developed methods use a Dynamic Voltage Restorer (DVR), a series voltage compensator, for continuous and communication-less power quality enhancement. The placement of the DVR in the network is optimised in order to minimise its power rating and cost. In addition, new approaches were developed for grid synchronisation and control of the DVR, which are integrated with the voltage quality improvement algorithm for stable operation.
Abstract:
Purpose – Preliminary cost estimates for construction projects are often the basis of financial feasibility and budgeting decisions in the early stages of planning and for effective project control, monitoring and execution. The purpose of this paper is to identify and better understand the cost drivers and factors that contribute to the accuracy of estimates in residential construction projects from the developers' perspective. Design/methodology/approach – The paper uses a literature review to determine the drivers that affect the accuracy of developers' early stage cost estimates and the factors influencing the construction costs of residential construction projects. It uses cost variance data and other supporting documentation collected from two case study projects in South East Queensland, Australia, along with semi-structured interviews conducted with the practitioners involved. Findings – Many cost drivers and factors of cost uncertainty identified in the literature for large-scale projects are found to be less apparent and relevant for developers' small-scale residential construction projects. Specifically, the certainty and completeness of project-specific information, the suitability of historical cost data, contingency allowances, the method of estimating and the estimator's level of experience significantly affect the accuracy of cost estimates. Developers of small-scale residential projects use pre-established and suitably priced bills of quantities as the prime estimating method, which is considered the most efficient and accurate method for standard house designs. However, this method needs to be backed by the expertise and experience of the estimator. Originality/value – There is a lack of research on the accuracy of developers' early stage cost estimates and on the relevance and applicability of cost drivers and factors in residential construction projects. This research has practical significance for improving the accuracy of such preliminary cost estimates.