998 resultados para Resampling methods


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Resources created at the University of Southampton for the module Remote Sensing for Earth Observation

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F-0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F-0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (D-LR) appeared to be an effective way to predict whether F-0 immigrants could be identified for a particular pair of populations using a given set of markers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: Mites (Acari) have traditionally been treated as monophyletic, albeit composed of two major lineages: Acariformes and Parasitiformes. Yet recent studies based on morphology, molecular data, or combinations thereof, have increasingly drawn their monophyly into question. Furthermore, the usually basal (molecular) position of one or both mite lineages among the chelicerates is in conflict to their morphology, and to the widely accepted view that mites are close relatives of Ricinulei. Results: The phylogenetic position of the acariform mites is examined through employing SSU, partial LSU sequences, and morphology from 91 chelicerate extant terminals (forty Acariformes). In a static homology framework, molecular sequences were aligned using their secondary structure as guide, whereby regions of ambiguous alignment were discarded, and pre-aligned sequences analyzed under parsimony and different mixed models in a Bayesian inference. Parsimony and Bayesian analyses led to trees largely congruent concerning infraordinal, well-supported branches, but with low support for inter-ordinal relationships. An exception is Solifugae + Acariformes (P. P = 100%, J. = 0.91). In a dynamic homology framework, two analyses were run: a standard POY analysis and an analysis constrained by secondary structure. Both analyses led to largely congruent trees; supporting a (Palpigradi (Solifugae Acariformes)) clade and Ricinulei as sister group of Tetrapulmonata with the topology (Ricinulei (Amblypygi (Uropygi Araneae))). Combined analysis with two different morphological data matrices were run in order to evaluate the impact of constraining the analysis on the recovered topology when employing secondary structure as a guide for homology establishment. The constrained combined analysis yielded two topologies similar to the exclusively molecular analysis for both morphological matrices, except for the recovery of Pedipalpi instead of the (Uropygi Araneae) clade. The standard (direct optimization) POY analysis, however, led to the recovery of trees differing in the absence of the otherwise well-supported group Solifugae + Acariformes. Conclusions: Previous studies combining ribosomal sequences and morphology often recovered topologies similar to purely morphological analyses of Chelicerata. The apparent stability of certain clades not recovered here, like Haplocnemata and Acari, is regarded as a byproduct of the way the molecular homology was previously established using the instrumentalist approach implemented in POY. Constraining the analysis by a priori homology assessment is defended here as a way of maintaining the severity of the test when adding new data to the analysis. Although the strength of the method advocated here is keeping phylogenetic information from regions usually discarded in an exclusively static homology framework; it still has the inconvenience of being uninformative on the effect of alignment ambiguity on resampling methods of clade support estimation. Finally, putative morphological apomorphies of Solifugae + Acariformes are the reduction of the proximal cheliceral podomere, medial abutting of the leg coxae, loss of sperm nuclear membrane, and presence of differentiated germinative and secretory regions in the testis delivering their products into a common lumen.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we propose a subsampling estimator for the distribution ofstatistics diverging at either known rates when the underlying timeseries in strictly stationary abd strong mixing. Based on our results weprovide a detailed discussion how to estimate extreme order statisticswith dependent data and present two applications to assessing financialmarket risk. Our method performs well in estimating Value at Risk andprovides a superior alternative to Hill's estimator in operationalizingSafety First portofolio selection.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The objective of this paper is to compare the performance of twopredictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L'imputation est souvent utilisée dans les enquêtes pour traiter la non-réponse partielle. Il est bien connu que traiter les valeurs imputées comme des valeurs observées entraîne une sous-estimation importante de la variance des estimateurs ponctuels. Pour remédier à ce problème, plusieurs méthodes d'estimation de la variance ont été proposées dans la littérature, dont des méthodes adaptées de rééchantillonnage telles que le Bootstrap et le Jackknife. Nous définissons le concept de double-robustesse pour l'estimation ponctuelle et de variance sous l'approche par modèle de non-réponse et l'approche par modèle d'imputation. Nous mettons l'emphase sur l'estimation de la variance à l'aide du Jackknife qui est souvent utilisé dans la pratique. Nous étudions les propriétés de différents estimateurs de la variance à l'aide du Jackknife pour l'imputation par la régression déterministe ainsi qu'aléatoire. Nous nous penchons d'abord sur le cas de l'échantillon aléatoire simple. Les cas de l'échantillonnage stratifié et à probabilités inégales seront aussi étudiés. Une étude de simulation compare plusieurs méthodes d'estimation de variance à l'aide du Jackknife en terme de biais et de stabilité relative quand la fraction de sondage n'est pas négligeable. Finalement, nous établissons la normalité asymptotique des estimateurs imputés pour l'imputation par régression déterministe et aléatoire.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Les documents publiés par des entreprises, tels les communiqués de presse, contiennent une foule d’informations sur diverses activités des entreprises. C’est une source précieuse pour des analyses en intelligence d’affaire. Cependant, il est nécessaire de développer des outils pour permettre d’exploiter cette source automatiquement, étant donné son grand volume. Ce mémoire décrit un travail qui s’inscrit dans un volet d’intelligence d’affaire, à savoir la détection de relations d’affaire entre les entreprises décrites dans des communiqués de presse. Dans ce mémoire, nous proposons une approche basée sur la classification. Les méthodes de classifications existantes ne nous permettent pas d’obtenir une performance satisfaisante. Ceci est notamment dû à deux problèmes : la représentation du texte par tous les mots, qui n’aide pas nécessairement à spécifier une relation d’affaire, et le déséquilibre entre les classes. Pour traiter le premier problème, nous proposons une approche de représentation basée sur des mots pivots c’est-à-dire les noms d’entreprises concernées, afin de mieux cerner des mots susceptibles de les décrire. Pour le deuxième problème, nous proposons une classification à deux étapes. Cette méthode s’avère plus appropriée que les méthodes traditionnelles de ré-échantillonnage. Nous avons testé nos approches sur une collection de communiqués de presse dans le domaine automobile. Nos expérimentations montrent que les approches proposées peuvent améliorer la performance de classification. Notamment, la représentation du document basée sur les mots pivots nous permet de mieux centrer sur les mots utiles pour la détection de relations d’affaire. La classification en deux étapes apporte une solution efficace au problème de déséquilibre entre les classes. Ce travail montre que la détection automatique des relations d’affaire est une tâche faisable. Le résultat de cette détection pourrait être utilisé dans une analyse d’intelligence d’affaire.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Following up genetic linkage studies to identify the underlying susceptibility gene(s) for complex disease traits is an arduous yet biologically and clinically important task. Complex traits, such as hypertension, are considered polygenic with many genes influencing risk, each with small effects. Chromosome 2 has been consistently identified as a genomic region with genetic linkage evidence suggesting that one or more loci contribute to blood pressure levels and hypertension status. Using combined positional candidate gene methods, the Family Blood Pressure Program has concentrated efforts in investigating this region of chromosome 2 in an effort to identify underlying candidate hypertension susceptibility gene(s). Initial informatics efforts identified the boundaries of the region and the known genes within it. A total of 82 polymorphic sites in eight positional candidate genes were genotyped in a large hypothesis-generating sample consisting of 1640 African Americans, 1339 whites, and 1616 Mexican Americans. To adjust for multiple comparisons, resampling-based false discovery adjustment was applied, extending traditional resampling methods to sibship samples. Following this adjustment for multiple comparisons, SLC4A5, a sodium bicarbonate transporter, was identified as a primary candidate gene for hypertension. Polymorphisms in SLC4A5 were subsequently genotyped and analyzed for validation in two populations of African Americans (N = 461; N = 778) and two of whites (N = 550; N = 967). Again, SNPs within SLC4A5 were significantly associated with blood pressure levels and hypertension status. While not identifying a single causal DNA sequence variation that is significantly associated with blood pressure levels and hypertension status across all samples, the results further implicate SLC4A5 as a candidate hypertension susceptibility gene, validating previous evidence for one or more genes on chromosome 2 that influence hypertension related phenotypes in the population-at-large. The methodology and results reported provide a case study of one approach for following up the results of genetic linkage analyses to identify genes influencing complex traits. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In simultaneous analyses of multiple data partitions, the trees relevant when measuring support for a clade are the optimal tree, and the best tree lacking the clade (i.e., the most reasonable alternative). The parsimony-based method of partitioned branch support (PBS) forces each data set to arbitrate between the two relevant trees. This value is the amount each data set contributes to clade support in the combined analysis, and can be very different to support apparent in separate analyses. The approach used in PBS can also be employed in likelihood: a simultaneous analysis of all data retrieves the maximum likelihood tree, and the best tree without the clade of interest is also found. Each data set is fitted to the two trees and the log-likelihood difference calculated, giving partitioned likelihood support (PLS) for each data set. These calculations can be performed regardless of the complexity of the ML model adopted. The significance of PLS can be evaluated using a variety of resampling methods, such as the Kishino-Hasegawa test, the Shimodiara-Hasegawa test, or likelihood weights, although the appropriateness and assumptions of these tests remains debated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Static state estimators currently in use in power systems are prone to masking by multiple bad data. This is mainly because the power system regression model contains many leverage points; typically they have a cluster pattern. As reported recently in the statistical literature, only high breakdown point estimators are robust enough to cope with gross errors corrupting such a model. This paper deals with one such estimator, the least median of squares estimator, developed by Rousseeuw in 1984. The robustness of this method is assessed while applying it to power systems. Resampling methods are developed, and simulation results for IEEE test systems discussed. © 1991 IEEE.