Biblioteca Digital

893 results for Bias, Error Rates, Genetic Modelling

Classification Based upon Gene Expression Data: Bias and Precision of Error Rates

Improving precision and reducing bias in biological surveys: estimating false-negative error rates

Abstract:

The use of presence/absence data in wildlife management and biological surveys is widespread. There is a growing interest in quantifying the sources of error associated with these data. Using simulated data, we show that false-negative errors (failure to record a species when in fact it is present) can have a significant impact on statistical estimation of habitat models. Then we introduce an extension of logistic modeling, the zero-inflated binomial (ZIB) model, that permits the estimation of the rate of false-negative errors and the correction of estimates of the probability of occurrence for false-negative errors by using repeated visits to the same site. Our simulations show that even relatively low rates of false negatives bias statistical estimates of habitat effects. The method with three repeated visits eliminates the bias, but estimates are relatively imprecise. Six repeated visits improve precision of estimates to levels comparable to that achieved with conventional statistics in the absence of false-negative errors. In general, when error rates are less than or equal to 50%, greater efficiency is gained by adding more sites, whereas when error rates are >50% it is better to increase the number of repeated visits. We highlight the flexibility of the method with three case studies, clearly demonstrating the effect of false-negative errors for a range of commonly used survey methods.
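
The zero-inflated binomial idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the common covariate-free formulation in which each site is occupied with probability `psi` and, if occupied, the species is detected on each visit independently with probability `p` (so the false-negative rate per visit is `1 - p`). The function names, the grid-search fit and the simulated parameter values are all hypothetical.

```python
import math
import random
from collections import Counter


def zib_log_likelihood(detections, visits, psi, p):
    """Log-likelihood of a zero-inflated binomial model (no covariates).

    detections: number of visits with a detection, one entry per site
    visits:     number of repeated visits made to every site
    psi:        probability that a site is occupied
    p:          per-visit detection probability at an occupied site
    """
    total = 0.0
    # Sites with the same detection count contribute identical terms.
    for y, n_sites in Counter(detections).items():
        site_lik = psi * math.comb(visits, y) * p**y * (1 - p) ** (visits - y)
        if y == 0:
            # An all-zero history can also come from an unoccupied site.
            site_lik += 1 - psi
        total += n_sites * math.log(site_lik)
    return total


def simulate_sites(n_sites, visits, psi, p, rng):
    """Simulate detection counts with false negatives and no false positives."""
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        counts.append(sum(rng.random() < p for _ in range(visits)) if occupied else 0)
    return counts


def fit_by_grid(detections, visits, step=0.01):
    """Crude maximum-likelihood fit by grid search over (psi, p)."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: zib_log_likelihood(detections, visits, *pair),
    )


# Hypothetical scenario: 500 sites, 6 visits, arbitrary true parameters.
rng = random.Random(1)
counts = simulate_sites(n_sites=500, visits=6, psi=0.6, p=0.5, rng=rng)
naive_occupancy = sum(c > 0 for c in counts) / len(counts)
psi_hat, p_hat = fit_by_grid(counts, visits=6)
print(f"naive occupancy: {naive_occupancy:.2f}")
print(f"ZIB estimate:    psi={psi_hat:.2f}, p={p_hat:.2f}")
```

The naive occupancy is the share of sites with at least one detection, which ignores false negatives. Comparing it with the fitted `psi` shows the kind of correction the abstract describes.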

Maximum Margin Classifiers with Specified False Positive and False Negative Error Rates

Abstract:

This paper addresses the problem of maximum margin classification given the moments of class conditional densities and the false positive and false negative error rates. Using Chebyshev inequalities, the problem can be posed as a second order cone programming problem. The dual of the formulation leads to a geometric optimization problem, that of computing the distance between two ellipsoids, which is solved by an iterative algorithm. The formulation is extended to non-linear classifiers using kernel methods. The resultant classifiers are applied to the case of classification of unbalanced datasets with asymmetric costs for misclassification. Experimental results on benchmark datasets show the efficacy of the proposed method.

Eng-genes: A new genetic modelling approach for nonlinear dynamic systems

Next-generation sequencing of HIV-1 RNA genomes: determination of error rates and minimizing artificial recombination.

Abstract:

Next-generation sequencing (NGS) is a valuable tool for the detection and quantification of HIV-1 variants in vivo. However, these technologies require detailed characterization and control of artificially induced errors to be applicable for accurate haplotype reconstruction. To investigate the occurrence of substitutions, insertions, and deletions at the individual steps of RT-PCR and NGS, 454 pyrosequencing was performed on amplified and non-amplified HIV-1 genomes. Artificial recombination was explored by mixing five different HIV-1 clonal strains (5-virus-mix) and applying different RT-PCR conditions followed by 454 pyrosequencing. Error rates ranged from 0.04-0.66% and were similar in amplified and non-amplified samples. Discrepancies were observed between forward and reverse reads, indicating that most errors were introduced during the pyrosequencing step. Using the 5-virus-mix, non-optimized, standard RT-PCR conditions introduced artificial recombinants in a fraction of at least 30% of the reads that subsequently led to an underestimation of true haplotype frequencies. We minimized the fraction of recombinants down to 0.9-2.6% by optimized, artifact-reducing RT-PCR conditions. This approach enabled correct haplotype reconstruction and frequency estimations consistent with reference data obtained by single genome amplification. RT-PCR conditions are crucial for correct frequency estimation and analysis of haplotypes in heterogeneous virus populations. We developed an RT-PCR procedure to generate NGS data useful for reliable haplotype reconstruction and quantification.

Towards model based prediction of human error rates in interactive systems

Error and uncertainty of adult age estimation of the pubic symphysis in an Australian sub-population using computed tomography

Abstract:

After attending this presentation, attendees will gain awareness of: (1) the error and uncertainty associated with the application of the Suchey-Brooks (S-B) method of age estimation of the pubic symphysis to a contemporary Australian population; (2) the implications of sexual dimorphism and bilateral asymmetry of the pubic symphysis through preliminary geometric morphometric assessment; and (3) the value of three-dimensional (3D) autopsy data acquisition for creating forensic anthropological standards. This presentation will impact the forensic science community by demonstrating that, in the absence of demographically sound skeletal collections, post-mortem autopsy data provide an exciting platform for the construction of large contemporary ‘virtual osteological libraries’ on which forensic anthropological research can be conducted for Australian individuals. More specifically, this study assesses the applicability and accuracy of the S-B method in a contemporary adult population in Queensland, Australia, and, using a geometric morphometric approach, provides an insight into the age-related degeneration of the pubic symphysis. Despite the prominent use of the Suchey-Brooks (1990) method of age estimation in forensic anthropological practice, it is subject to intrinsic limitations, with reports of differential inter-population error rates between geographical locations [1-4]. Australian forensic anthropology is constrained by a paucity of population-specific standards due to a lack of repositories of documented skeletons. Consequently, in Australian casework proceedings, standards constructed from predominately American reference samples are applied to establish a biological profile. In the global era of terrorism and natural disasters, more specific population standards are required to improve the efficiency of medico-legal death investigation in Queensland.
The sample comprises multi-slice computed tomography (MSCT) scans of the pubic symphysis (slice thickness: 0.5 mm, overlap: 0.1 mm) of 195 individuals of Caucasian ethnicity aged 15-70 years. Volume rendering reconstruction of the symphyseal surface was conducted in Amira® (v.4.1) and quantitative analyses in Rapidform® XOS. The sample was divided into ten-year age subsets (e.g. 15-24) with a final subset of 65-70 years. Errors with respect to the method’s assigned means were analysed on the basis of bias (directionality of error), inaccuracy (magnitude of error) and percentage correct classification of left and right symphyseal surfaces. Morphometric variables including surface area, circumference, maximum height and width of the symphyseal surface and micro-architectural assessment of cortical and trabecular bone composition were quantified using novel automated engineering software capabilities. Using the mean and standard deviation of each phase of the S-B method, correct age classification was 80.02% and 86.18% in Australian males and females, respectively. Application of the S-B method resulted in positive biases and mean inaccuracies of 7.24 (±6.56) years for individuals less than 55 years of age, compared to negative biases and mean inaccuracies of 5.89 (±3.90) years for individuals greater than 55 years of age. Statistically significant differences between chronological and S-B mean age were demonstrated in 83.33% and 50% of the six age subsets in males and females, respectively. Asymmetry of the pubic symphysis was a frequent phenomenon, with 53.33% of the Queensland population exhibiting statistically significant (χ², p<0.01) differential phase classification of left and right surfaces of the same individual. Directionality was found in bilateral asymmetry, with the right symphyseal faces being slightly older on average and providing more accurate estimates using the S-B method [5].
Morphometric analysis verified these findings, with the left surface exhibiting significantly greater circumference and surface area than the right (p<0.05). Morphometric analysis demonstrated an increase in maximum height and width of the surface with age, with the most significant changes (p<0.05) occurring between the 25-34 and 55-64 year age subsets. These differences may be attributed to hormonal components linked to menopause in females and a reduction in testosterone in males. Micro-architectural analysis demonstrated degradation of cortical composition with age, with differential bone resorption between the medial, ventral and dorsal surfaces of the pubic symphysis. This study recommends that the S-B method be applied with caution in medico-legal death investigations of unknown skeletal remains in Queensland. Age estimation will always be accompanied by error; therefore this study demonstrates the potential for quantitative morphometric modelling of age-related changes of the pubic symphysis as a tool for methodological refinement, providing a rigorous and robust assessment to remove the subjectivity associated with current pelvic aging methods.
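
The abstract above distinguishes bias (directionality of error) from inaccuracy (magnitude of error). A minimal sketch of the two measures follows, assuming the conventional definitions as the mean signed and the mean absolute difference between estimated and chronological age. The function names and the ages are hypothetical and are not data from the study.

```python
def bias(estimated, actual):
    """Mean signed error: positive values mean ages are overestimated."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


def inaccuracy(estimated, actual):
    """Mean absolute error, regardless of direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical ages in years, for illustration only.
chronological = [22, 31, 45, 58, 66]
estimated = [28, 35, 52, 55, 61]

print("bias:      ", bias(estimated, chronological))
print("inaccuracy:", inaccuracy(estimated, chronological))
```

Over- and underestimates cancel in the bias but not in the inaccuracy, which is why both measures are reported.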

Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference.

Abstract:

Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
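
The replicate-based idea in the abstract above, that two runs of the same sample should yield identical genotypes, can be sketched as a simple mismatch rate. This is an illustrative assumption about how such a rate might be computed, not the procedure used in the paper or in Stacks. The function name, data layout and genotype calls are hypothetical.

```python
def replicate_error_rate(rep_a, rep_b):
    """Share of loci whose genotype calls differ between two replicates.

    rep_a, rep_b: dicts mapping locus name to a pair of alleles, or to None
    when the locus was not genotyped. Loci missing in either replicate are
    skipped. Returns (error_rate, number_of_loci_compared).
    """
    compared = 0
    mismatched = 0
    for locus, call_a in rep_a.items():
        call_b = rep_b.get(locus)
        if call_a is None or call_b is None:
            continue
        compared += 1
        # Genotypes are unordered allele pairs, so A/G equals G/A.
        if sorted(call_a) != sorted(call_b):
            mismatched += 1
    if compared == 0:
        raise ValueError("no loci genotyped in both replicates")
    return mismatched / compared, compared


# Hypothetical genotype calls for one sample sequenced twice.
replicate_1 = {
    "locus1": ("A", "G"),
    "locus2": ("C", "C"),
    "locus3": None,
    "locus4": ("T", "T"),
    "locus5": ("G", "A"),
}
replicate_2 = {
    "locus1": ("G", "A"),
    "locus2": ("C", "T"),
    "locus3": ("A", "A"),
    "locus4": ("T", "T"),
}

rate, n_loci = replicate_error_rate(replicate_1, replicate_2)
print(f"{n_loci} loci compared, mismatch rate {rate:.2f}")
```

Reporting the number of loci compared alongside the rate matters because assembly parameters change how many loci are shared between replicates as well as how often they disagree.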

The v-MFG test: Investigating maternal, offspring and maternal-fetal genetic incompatibility effects on disease and viability

Abstract:

The MFG test is a family-based association test that detects genetic effects contributing to disease in offspring, including offspring allelic effects, maternal allelic effects and MFG incompatibility effects. Like many other family-based association tests, it assumes that offspring survival and the offspring-parent genotypes are conditionally independent given that the offspring is affected. However, when the putative disease-increasing locus can affect another competing phenotype, for example, offspring viability, the conditional independence assumption fails and these tests could lead to incorrect conclusions regarding the role of the gene in disease. We propose the v-MFG test to adjust for the genetic effects on one phenotype, e.g., viability, when testing the effects of that locus on another phenotype, e.g., disease. Using genotype data from nuclear families containing parents and at least one affected offspring, the v-MFG test models the distribution of family genotypes conditional on offspring phenotypes. It simultaneously estimates genetic effects on two phenotypes, viability and disease. Simulations show that the v-MFG test produces accurate genetic effect estimates on disease as well as on viability under several different scenarios. It generates accurate type-I error rates and provides adequate power with moderate sample sizes to detect genetic effects on disease risk when viability is reduced. We demonstrate the v-MFG test with HLA-DRB1 data from study participants with rheumatoid arthritis (RA) and their parents, and show that it successfully detects an MFG incompatibility effect on RA while simultaneously adjusting for a possible viability loss.

Modelling and simulation of the second-order Doppler error of a laser dual-frequency interferometer

Abstract:

Only the first-order Doppler frequency shift is considered in current laser dual-frequency interferometers; however, the second-order Doppler frequency shift should be considered when the measurement corner cube (MCC) moves at high velocity or variable velocity because it can cause considerable error. The influence of the second-order Doppler frequency shift on interferometer error is studied in this paper, and a model of the second-order Doppler error is put forward. Moreover, the model has been simulated with both high-velocity and variable-velocity motion. The simulated results show that the second-order Doppler error is proportional to the velocity of the MCC when it moves with uniform motion and the measured displacement is fixed. When the MCC moves with variable motion, the second-order Doppler error depends not only on velocity but also on acceleration. When the initial velocity is zero, the second-order Doppler error caused by an acceleration of 0.6g can be up to 2.5 nm in 0.4 s, which is not negligible in nanometric measurement. Moreover, when the initial velocity is nonzero, accelerated motion may result in a greater error and decelerated motion may result in a smaller error.

Outperformance in exchange-traded fund pricing deviations: Generalized control of data snooping bias

Abstract:

An investigation into exchange-traded fund (ETF) outperformance during the period 2008-2012 is undertaken utilizing a data set of 288 U.S.-traded securities. ETFs are tested for net asset value (NAV) premium, underlying index and market benchmark outperformance, with Sharpe, Treynor, and Sortino ratios employed as risk-adjusted performance measures. A key contribution is the application of an innovative generalized stepdown procedure in controlling for data snooping bias. We find that a large proportion of optimized replication and debt asset class ETFs display risk-adjusted premiums, with energy and precious metals focused funds outperforming the S&P 500 market benchmark.

Hui and Walter's latent-class model extended to estimate diagnostic test properties from surveillance data: a latent model for latent data

Abstract:

Diagnostic test sensitivity and specificity are probabilistic estimates with far-reaching implications for disease control, management and genetic studies. In the absence of 'gold standard' tests, traditional Bayesian latent class models may be used to assess diagnostic test accuracies through the comparison of two or more tests performed on the same groups of individuals. The aim of this study was to extend such models to estimate diagnostic test parameters and true cohort-specific prevalence, using disease surveillance data. The traditional Hui-Walter latent class methodology was extended to allow for features seen in such data, including (i) unrecorded data (i.e. data for a second test available only on a subset of the sampled population) and (ii) cohort-specific sensitivities and specificities. The model was applied with and without the modelling of conditional dependence between tests. The utility of the extended model was demonstrated through application to bovine tuberculosis surveillance data from Northern Ireland and the Republic of Ireland. Simulation coupled with re-sampling techniques demonstrated that the extended model has good predictive power to estimate the diagnostic parameters and true herd-level prevalence from surveillance data. Our methodology can aid in the interpretation of disease surveillance data, and the results can potentially refine disease control strategies.

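Study of the premotor and prefrontal cortex during decision making with temporal integration of information [Étude du cortex prémoteur et préfrontal lors de la prise de décision pendant l'intégration temporelle des informations]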

Abstract:

A variety of models of the decision-making process in various contexts assume that subjects accumulate sensory evidence, continuously sampling and integrating signals for and against alternative hypotheses. Integration continues until the evidence in favour of one of the hypotheses exceeds a decision criterion threshold (the level of evidence required to make a decision). Newer models suggest that this decision process is instead dynamic: its parameters can vary between trials and even within a trial, rather than being a static process whose parameters change only between blocks of trials. This doctoral project aims to demonstrate that decisions about reaching movements involve a mechanism of temporal accumulation of sensory information leading to a decision threshold. To do so, we developed a decision-making paradigm based on an ambiguous stimulus in order to see whether neurons in the primary motor cortex (M1), dorsal premotor cortex (PMd) and prefrontal cortex (DLPFc) show neural correlates of this temporal accumulation process. We first tested different versions of the task with human subjects in order to develop a task in which the subjects' behaviour is suitable for testing the working hypothesis. Behavioural data from humans and monkeys show that reaction times and error rates increase systematically as stimulus ambiguity increases. These results are consistent with the predictions of diffusion models, as confirmed by computational modelling of the data. We then recorded cells in M1, PMd and DLPFc of two monkeys while they performed the task.
M1 neurons do not appear to be influenced by stimulus ambiguity; instead, they discharge in correlation with the executed movement. PMd neurons encode the direction of the movement chosen by the monkeys fairly soon after stimulus presentation. Moreover, the activation of many PMd cells is slower when stimulus ambiguity increases, and these cells take longer to signal the movement direction. The activity of PMd neurons reflects the animal's choice, regardless of whether it is a correct response or an error. This supports a role for PMd in decision making about reaching movements. Finally, we began recordings in the prefrontal cortex, and the results presented are preliminary. DLPFc neurons appear to be much more influenced by combinations of colour and spatial position factors than PMd neurons. Our conclusion is that PMd is involved in evaluating the evidence for or against the spatial position of different potential targets, but fairly independently of their colour. DLPFc would instead be responsible for processing the combined colour and position information of the spatial targets and of the ambiguous stimulus, which is needed to link the ambiguous stimulus to the corresponding target.
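
The accumulation-to-threshold account in the abstract above can be illustrated with a toy diffusion simulation. This is a generic sketch, not the computational model fitted in the thesis: it assumes a single accumulator with symmetric bounds, represents stimulus ambiguity as a lower drift rate, and uses arbitrary parameter values. All names are hypothetical.

```python
import math
import random


def simulate_trial(drift, threshold, noise, dt, rng, max_time=10.0):
    """Accumulate noisy evidence until it crosses +threshold or -threshold.

    Returns (decision_time, correct). The upper bound counts as the correct
    choice for a positive drift. Trials that reach max_time are scored by the
    sign of the evidence at that point.
    """
    evidence = 0.0
    elapsed = 0.0
    while abs(evidence) < threshold and elapsed < max_time:
        evidence += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        elapsed += dt
    return elapsed, evidence > 0


def summarise(drift, n_trials, rng, threshold=1.0, noise=1.0, dt=0.01):
    """Mean decision time and error rate over n_trials simulated trials."""
    trials = [simulate_trial(drift, threshold, noise, dt, rng) for _ in range(n_trials)]
    mean_time = sum(t for t, _ in trials) / n_trials
    error_rate = sum(not ok for _, ok in trials) / n_trials
    return mean_time, error_rate


rng = random.Random(0)
clear = summarise(drift=2.0, n_trials=500, rng=rng)      # low ambiguity
ambiguous = summarise(drift=0.2, n_trials=500, rng=rng)  # high ambiguity
print("clear stimulus:     mean time %.2f s, error rate %.2f" % clear)
print("ambiguous stimulus: mean time %.2f s, error rate %.2f" % ambiguous)
```

A weaker drift takes longer to reach a bound and reaches the wrong bound more often, which is the qualitative pattern of reaction times and error rates reported in the behavioural data.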

A note on the use of the generalized odds ratio in meta-analysis of association studies involving bi- and tri-allelic polymorphisms

Abstract:

Background: The generalized odds ratio (GOR) was recently suggested as a genetic model-free measure for association studies. However, its properties were not extensively investigated. We used Monte Carlo simulations to investigate type-I error rates, power and bias in both effect size and between-study variance estimates of meta-analyses using the GOR as a summary effect, and compared these results to those obtained by usual approaches of model specification. We further applied the GOR in a real meta-analysis of three genome-wide association studies in Alzheimer's disease.

Findings: For bi-allelic polymorphisms, the GOR performs virtually identically to a standard multiplicative model of analysis (e.g. per-allele odds ratio) for variants acting multiplicatively, but slightly increases the power to detect variants with a dominant mode of action, while reducing the probability of detecting recessive variants. Although there were differences between the GOR and usual approaches in terms of bias and type-I error rates, both simulation- and real data-based results provided little indication that these differences will be substantial in practice for meta-analyses involving bi-allelic polymorphisms. However, the use of the GOR may be slightly more powerful for the synthesis of data from tri-allelic variants, particularly when susceptibility alleles are less common in the populations (≤10%). This gain in power may depend on knowledge of the direction of the effects.

Conclusions: For the synthesis of data from bi-allelic variants, the GOR may be regarded as a multiplicative-like model of analysis. The use of the GOR may be slightly more powerful in the tri-allelic case, particularly when susceptibility alleles are less common in the populations.
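
As a rough illustration of a model-free summary such as the GOR, the sketch below assumes the usual definition: among all case-control pairs, the number of pairs in which the case carries the higher mutational load divided by the number in which the control does. The abstract does not spell out this definition, so treat it as an assumption. The function name and the genotype counts are hypothetical.

```python
def generalized_odds_ratio(case_counts, control_counts):
    """GOR from genotype counts ordered by increasing mutational load.

    case_counts[i] and control_counts[i] are the numbers of cases and
    controls in genotype category i (for example aa, aA, AA).
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("cases and controls need the same genotype categories")
    k = len(case_counts)
    # Pairs in which the case has the higher mutational load (i > j).
    case_higher = sum(
        case_counts[i] * control_counts[j] for i in range(k) for j in range(i)
    )
    # Pairs in which the control has the higher mutational load (j > i).
    control_higher = sum(
        case_counts[i] * control_counts[j] for i in range(k) for j in range(i + 1, k)
    )
    return case_higher / control_higher


# Hypothetical counts for genotypes aa, aA, AA.
cases = [30, 50, 20]
controls = [50, 40, 10]
print("GOR:", generalized_odds_ratio(cases, controls))

# With only two categories the same formula is the ordinary odds ratio.
print("2x2:", generalized_odds_ratio([10, 30], [20, 20]))
```

No dominant, recessive or additive coding is chosen anywhere in the calculation, which is the sense in which the measure is model-free.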

Genetic association studies under the population stratification, family pedigree and application to genome-wide association studies

Abstract:

This dissertation has three separate parts: the first part deals with general pedigree association testing incorporating continuous covariates; the second part deals with association tests under population stratification using conditional likelihood tests; the third part deals with genome-wide association studies based on the real rheumatoid arthritis (RA) disease data sets from Genetic Analysis Workshop 16 (GAW16) problem 1. Many statistical tests have been developed to test linkage and association in family data using either case-control status or phenotype covariates, but separately. Such univariate analyses might not use all the information coming from the family members in practical studies. Moreover, complex human diseases do not have a clear inheritance pattern; genes may interact or act independently. In part I, the newly proposed approach, MPDT, focuses on how to use both the case-control information and the phenotype covariates. This approach can be applied to detect multiple marker effects. Based on two existing popular statistics in family studies, for case-control and quantitative traits respectively, the new approach can be used with simple family structure data sets as well as general pedigree structures. The combined statistic is calculated from the two statistics, and a permutation procedure is applied to assess the p-value, with a Bonferroni adjustment for the multiple markers. We use simulation studies to evaluate the type I error rates and the power of the proposed approach. Our results show that the combined test using both case-control information and phenotype covariates not only has the correct type I error rates but also is more powerful than the other existing methods. For multiple marker interactions, our proposed method is also very powerful.
Selective genotyping is an economical strategy for detecting and mapping quantitative trait loci in the genetic dissection of complex disease. When the samples arise from different ethnic groups or an admixed population, all the existing selective genotyping methods may result in spurious association due to different ancestry distributions. The problem can be more serious when the sample size is large, a general requirement to obtain sufficient power to detect modest genetic effects for most complex traits. In part II, I describe a useful strategy for selective genotyping when population stratification is present. Our procedure uses a principal component based approach to eliminate any effect of population stratification. The paper evaluates the performance of our procedure using both simulated data from an earlier study and the HapMap data sets in a variety of population admixture models generated from empirical data. There are one binary trait and two continuous traits in the rheumatoid arthritis dataset of Problem 1 in the Genetic Analysis Workshop 16 (GAW16): RA status, AntiCCP and IgM. To allow for multiple traits, we suggest a set of SNP-level F statistics, based on the concept of multiple correlation, to measure the genetic association between multiple trait values and SNP-specific genotypic scores, and we obtain their null distributions. We then perform 6 genome-wide association analyses using the novel one- and two-stage approaches, which are based on single, double and triple traits. Incorporating all 6 analyses, we successfully validate the SNPs which have been identified in the literature as responsible for rheumatoid arthritis and detect more disease susceptibility SNPs for follow-up studies in the future. Except for chromosomes 13 and 18, every chromosome is found to harbour susceptibility regions for rheumatoid arthritis or related diseases, such as lupus erythematosus. This topic is discussed in part III.
