1000 results for resampling method
Abstract:
INTRODUCTION/OBJECTIVES: Detection rates for adenoma and early colorectal cancer (CRC) are insufficient due to low compliance with invasive screening procedures such as colonoscopy. Available non-invasive screening tests unfortunately have low sensitivity and specificity. There is therefore a large unmet need for a cost-effective, reliable and non-invasive test to screen for early neoplastic and pre-neoplastic lesions. AIMS & METHODS: The objective is to develop a screening test able to detect early CRCs and adenomas. The test is based on a nucleic acid multi-gene assay performed on peripheral blood mononuclear cells (PBMCs). A colonoscopy-controlled feasibility study was conducted on 179 subjects. The first 92 subjects were used as a training set to generate a statistically significant signature. Colonoscopy revealed 21 subjects with CRC, 30 with adenomas larger than 1 cm, and 41 with no neoplastic or inflammatory lesions. The second group of 48 subjects (controls, CRC and polyps) was used as a test set and was kept blinded for the entire data analysis. To determine organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with cancers other than CRC (OC). Blood samples were taken from each patient on the day of the colonoscopy and PBMCs were purified.
Total RNA was extracted following standard procedures. Multiplex RT-qPCR was applied to 92 different candidate biomarkers. Different univariate and multivariate statistical methods were applied to these candidates, and 60 biomarkers with significant p-values (<0.01) were selected. These biomarkers are involved in several different biological functions, such as cellular movement, cell signaling and interaction, tissue and cellular development, cancer, and cell growth and proliferation. Two distinct biomarker signatures are used to separate patients without lesions from those with cancer or with adenoma, named COLOX CRC and COLOX POL respectively. COLOX performance was validated using a random resampling method, the bootstrap. RESULTS: The COLOX CRC and POL tests successfully separated patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenomas larger than 1 cm (Se 63%, Sp 83%, AUC 0.77), respectively. 6/24 patients in the IBD group and 1/14 patients in the OC group had a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated high sensitivity and specificity for detecting CRCs and adenomas larger than 1 cm. A prospective, multicenter, pivotal study is underway to confirm these promising results in a larger cohort.
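The bootstrap validation of a diagnostic signature can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the labels and scores are synthetic, the AUC is computed via the Mann-Whitney identity, and the percentile interval stands in for whatever performance summaries COLOX used.

```python
import random

def auc(labels, scores):
    """Mann-Whitney AUC: probability a random positive scores above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a performance statistic."""
    rng = random.Random(seed)
    n = len(labels)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bl = [labels[i] for i in idx]
        bs = [scores[i] for i in idx]
        if len(set(bl)) < 2:   # a resample must contain both classes for an AUC
            continue
        reps.append(stat(bl, bs))
    reps.sort()
    lo = reps[int(alpha / 2 * len(reps))]
    hi = reps[int((1 - alpha / 2) * len(reps)) - 1]
    return lo, hi

# synthetic data: 20 "lesion" subjects, 40 controls; higher score = more suspicious
labels = [1] * 20 + [0] * 40
scores = [0.6 + 0.3 * random.Random(i).random() for i in range(20)] + \
         [0.2 + 0.4 * random.Random(100 + i).random() for i in range(40)]
lo, hi = bootstrap_ci(labels, scores, auc)
```

Resampling subjects (rather than, say, residuals) mimics the sampling variability of the cohort itself, which is the quantity of interest when quoting Se/Sp/AUC for a screening test.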
Abstract:
BACKGROUND: The objective is to develop a cost-effective, reliable and non-invasive screening test able to detect early CRCs and adenomas. The test is based on a nucleic acid multigene assay performed on peripheral blood mononuclear cells (PBMCs). METHODS: A colonoscopy-controlled study was conducted on 179 subjects. 92 subjects (21 CRC, 30 adenoma >1 cm and 41 controls) were used as a training set to generate a signature. Another 48 subjects, kept blinded (controls, CRC and polyps), were used as a test set. To determine organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with other cancers (OC). Blood samples were taken and PBMCs were purified. After RNA extraction, multiplex RT-qPCR was applied to 92 different candidate biomarkers. After different univariate and multivariate analyses, 60 biomarkers with significant p-values (<0.01) were selected. Two distinct biomarker signatures, named COLOX CRC and COLOX POL, are used to separate patients without lesions from those with CRC or with adenoma. COLOX performance was validated using a random resampling method, the bootstrap. RESULTS: The COLOX CRC and POL tests successfully separated patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenoma >1 cm (Se 63%, Sp 83%, AUC 0.77). 6/24 patients in the IBD group and 1/14 patients in the OC group had a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated high Se and Sp for detecting CRCs and adenomas >1 cm. A prospective, multicenter, pivotal study is underway to confirm these promising results in a larger cohort.
Abstract:
The objective of this study was to determine the minimum number of plants per plot that must be sampled in experiments with sugarcane (Saccharum officinarum) full-sib families in order to provide an effective estimation of genetic and phenotypic parameters of yield-related traits. The data were collected in a randomized complete block design with 18 sugarcane full-sib families and 6 replicates, with 20 plants per plot. The sample size was determined using resampling techniques with replacement, followed by an estimation of genetic and phenotypic parameters. Sample-size estimates varied according to the evaluated parameter and trait. The resampling method permits an efficient comparison of the effects of sample size on the estimation of genetic and phenotypic parameters. A sample of 16 plants per plot, or 96 individuals per family, was sufficient to obtain good estimates for all the traits evaluated. However, if separate sampling by trait were possible, ten plants per plot would give an efficient estimate for Brix.
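The resampling-with-replacement idea behind the sample-size comparison can be sketched as follows. The Brix readings and plot size are hypothetical, and the precision measure (standard deviation of the resampled plot mean) is only one of the genetic and phenotypic parameters such a study would actually track.

```python
import random
import statistics

def resample_mean_sd(values, k, n_rep=500, seed=1):
    """SD of the plot mean when estimated from subsamples of size k
    drawn with replacement from the fully measured plot."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.choices(values, k=k)) for _ in range(n_rep)]
    return statistics.stdev(means)

# hypothetical Brix readings for one 20-plant plot
plot = [18.2, 19.1, 17.8, 18.5, 20.0, 18.9, 17.5, 19.4, 18.1, 18.7,
        19.8, 17.9, 18.4, 19.0, 18.8, 18.3, 19.5, 17.7, 18.6, 19.2]

# precision of the estimate as a function of the number of plants sampled
precision = {k: resample_mean_sd(plot, k) for k in (4, 8, 12, 16, 20)}
```

Plotting (or tabulating) `precision` against `k` shows where the curve flattens, which is the usual criterion for declaring a sample size sufficient.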
Abstract:
Documents published by companies, such as press releases, contain a wealth of information about their various activities and are a valuable source for business intelligence analyses. However, given its large volume, tools must be developed to exploit this source automatically. This master's thesis describes work in one facet of business intelligence, namely the detection of business relations between the companies described in press releases. We propose a classification-based approach. Existing classification methods did not allow us to obtain satisfactory performance, mainly because of two problems: representing the text by all of its words, which does not necessarily help characterize a business relation, and the imbalance between classes. To address the first problem, we propose a representation based on pivot words, namely the names of the companies involved, in order to better identify the words likely to describe their relation. For the second problem, we propose a two-stage classification, which proves more appropriate than traditional resampling methods. We tested our approaches on a collection of press releases from the automotive domain. Our experiments show that the proposed approaches can improve classification performance. In particular, the pivot-word document representation allows us to focus on the words that are useful for detecting business relations, and the two-stage classification provides an effective solution to the class imbalance problem. This work shows that automatic detection of business relations is a feasible task, and the results of such detection could be used in business intelligence analysis.
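The two-stage classification described above can be caricatured as a cheap, high-recall filter followed by a stricter decision rule applied only to the survivors. Everything here is hypothetical (the cue words, the thresholds, the toy documents); the thesis itself uses learned classifiers, not hand-written rules, but the cascade structure is the point.

```python
# Hypothetical relation cue words; a real system would learn these.
RELATION_CUES = {"partnership", "agreement", "joint", "acquisition", "supplier"}

def stage1_filter(tokens):
    """Stage 1, tuned for recall: keep any document containing a relation cue.
    The dominant negative class is mostly discarded here."""
    return any(t in RELATION_CUES for t in tokens)

def stage2_classify(tokens, pivot_companies):
    """Stage 2, tuned for precision: require both pivot company names
    plus at least two cue words before declaring a business relation."""
    has_both = all(c in tokens for c in pivot_companies)
    has_cues = sum(t in RELATION_CUES for t in tokens) >= 2
    return has_both and has_cues

def detect_relation(tokens, pivots):
    return stage1_filter(tokens) and stage2_classify(tokens, pivots)

doc = "toyota and denso signed a supplier agreement extending their partnership".split()
```

Because stage 1 rebalances the data reaching stage 2, the second classifier trains and decides on a far less skewed distribution, which is the claimed advantage over resampling the training set.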
Abstract:
The goal of this thesis is to extend bootstrap theory to panel data models. Panel data are obtained by observing several statistical units over several time periods. Their double dimension, individual and temporal, makes it possible to control for unobservable heterogeneity across individuals and across time periods, and thus to carry out richer studies than with time series or cross-sectional data. The advantage of the bootstrap is that it allows more precise inference than classical asymptotic theory, or inference that would otherwise be impossible in the presence of nuisance parameters. The method consists of drawing random samples that resemble the analysis sample as closely as possible; the statistical object of interest is estimated on each of these random samples, and the set of estimated values is used for inference. The literature contains some applications of the bootstrap to panel data, but without rigorous theoretical justification or under strong assumptions. This thesis proposes a bootstrap method better suited to panel data. Its three chapters analyse the method's validity and application. The first chapter posits a simple model with a single parameter and tackles the theoretical properties of the estimator of the mean. We show that the double resampling we propose, which accounts for both the individual and the temporal dimension, is valid in these models. Resampling only in the individual dimension is not valid in the presence of temporal heterogeneity, and resampling only in the temporal dimension is not valid in the presence of individual heterogeneity. The second chapter extends the first to the linear panel regression model.
Three types of regressors are considered: individual characteristics, temporal characteristics, and regressors that vary over both time and individuals. Using a two-way error components model, the ordinary least squares estimator and the residual bootstrap, we show that resampling in the individual dimension alone is valid for inference on the coefficients associated with regressors that vary only across individuals. Resampling in the temporal dimension alone is valid only for the subvector of parameters associated with regressors that vary only over time. Double resampling, for its part, is valid for inference on the full parameter vector. The third chapter re-examines the difference-in-differences exercise of Bertrand, Duflo and Mullainathan (2004). This estimator is commonly used in the literature to evaluate the impact of public policies. The empirical exercise uses panel data from the Current Population Survey on women's wages in the 50 states of the United States from 1979 to 1999. Pseudo-intervention variables are generated at the state level, and the tests are expected to conclude that these placebo policies have no effect on women's wages. Bertrand, Duflo and Mullainathan (2004) show that failing to account for heterogeneity and temporal dependence causes severe size distortions of the tests when evaluating the impact of public policies using panel data. One recommended solution is to use the bootstrap. The double resampling method developed in this thesis corrects the test size problem and thus allows the impact of public policies to be evaluated correctly.
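A double resampling scheme of the kind discussed above can be sketched as follows: individuals and time periods are drawn with replacement independently, and the bootstrap panel is built from the cross-product of the two draws. The panel values are made up, and the statistic (the overall mean) matches only the simplest single-parameter setting; this is an illustration of the resampling mechanics, not the thesis's formal procedure.

```python
import random

def double_resample(panel, seed=0):
    """Draw N individuals and T time periods independently with replacement,
    then assemble the bootstrap panel from the selected cells of the
    original N x T panel (rows = individuals, columns = periods)."""
    rng = random.Random(seed)
    n, t = len(panel), len(panel[0])
    ids = [rng.randrange(n) for _ in range(n)]   # resampled individuals
    ts = [rng.randrange(t) for _ in range(t)]    # resampled periods
    return [[panel[i][s] for s in ts] for i in ids]

def panel_mean(panel):
    return sum(sum(row) for row in panel) / (len(panel) * len(panel[0]))

# hypothetical 3-individual x 4-period panel of outcomes
panel = [[1.0, 1.2, 0.9, 1.1],
         [2.0, 2.1, 1.8, 2.2],
         [0.5, 0.4, 0.6, 0.5]]

boot_means = [panel_mean(double_resample(panel, seed=b)) for b in range(200)]
```

Resampling only `ids` (or only `ts`) reduces this to the one-dimensional schemes that the first chapter shows to fail under heterogeneity in the other dimension.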
Abstract:
We discuss the estimation of the expected value of quality-adjusted survival, based on multistate models. We generalize earlier work by considering that the sojourn times in the health states are not identically distributed, for a given vector of covariates. Approaches based on semiparametric and parametric (exponential and Weibull distributions) methodologies are considered. A simulation study is conducted to evaluate the performance of the proposed estimator, and the jackknife resampling method is used to estimate its variance. An application to a real data set is also included.
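The jackknife variance estimator used in such simulation studies has a simple generic form: leave one observation out, recompute the statistic, and combine the leave-one-out values. The quality-adjusted survival times below are hypothetical, and the statistic shown is the plain sample mean rather than the paper's multistate estimator; for the mean, the jackknife variance reduces exactly to s²/n.

```python
import statistics

def jackknife_variance(data, stat):
    """Leave-one-out jackknife estimate of the variance of stat(data):
    (n-1)/n * sum over i of (stat(data without i) - their average)^2."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)

# hypothetical quality-adjusted survival times (years)
qas = [1.2, 0.8, 2.5, 1.9, 3.1, 0.6, 2.2, 1.4]
var_mean = jackknife_variance(qas, statistics.mean)
```

The same `jackknife_variance` call works unchanged for any plug-in statistic, which is why the jackknife is convenient when the estimator (here, expected quality-adjusted survival) has no tractable closed-form variance.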
Abstract:
In clinical trials, it may be of interest to take into account physical and emotional well-being in addition to survival when comparing treatments. Quality-adjusted survival time has the advantage of incorporating information about both survival time and quality of life. In this paper, we discuss the estimation of the expected value of quality-adjusted survival, based on multistate models for the sojourn times in health states. Semiparametric and parametric (exponential distribution) approaches are considered. A simulation study is presented to evaluate the performance of the proposed estimator, and the jackknife resampling method is used to compute the bias and variance of the estimator.
Abstract:
Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are illustrated with two examples and their validity for cases with practical sample sizes is demonstrated via a simulation study.
Abstract:
The construction of a reliable, practically useful prediction rule for future response is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the misclassification error for a binary outcome. We show that the distributions of the apparent error and its cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is "unsmooth", the variance of the above normal distribution can be estimated well via a perturbation-resampling method. We also show how to approximate the distribution of the difference of the estimated prediction errors from two competing models. With two real examples, we demonstrate that the resulting interval estimates for prediction errors provide much more information about model adequacy than the point estimates alone.
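Perturbation resampling can be sketched as follows: instead of redrawing observations, each observation's contribution to the apparent error is reweighted by an i.i.d. positive random weight (Exp(1) here) and the statistic is recomputed. The absolute prediction errors below are made up, and the article's actual procedure perturbs the full estimating equations rather than just the error average; this shows only the reweighting mechanism.

```python
import random

def perturbed_errors(abs_errors, n_rep=500, seed=0):
    """Perturbation resampling for the apparent absolute prediction error:
    reweight each |error| by an i.i.d. Exp(1) weight and recompute the
    weighted mean, yielding a resampled distribution for the statistic."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        w = [rng.expovariate(1.0) for _ in abs_errors]
        reps.append(sum(wi * e for wi, e in zip(w, abs_errors)) / sum(w))
    return reps

# hypothetical absolute prediction errors from a fitted model
errors = [0.3, 1.1, 0.7, 0.2, 0.9, 0.5, 1.4, 0.6, 0.8, 0.4]
reps = perturbed_errors(errors)
point = sum(errors) / len(errors)   # the apparent (point) error estimate
```

The spread of `reps` around `point` gives the interval estimate; because the data are never re-indexed, the scheme behaves well even when the prediction rule is unsmooth in the observations.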
Abstract:
This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald-type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.
Abstract:
Standardization is a common method for adjusting for confounding factors when comparing two or more exposure categories to assess excess risk. An arbitrary choice of standard population in standardization introduces selection bias due to the healthy worker effect. Small samples in specific groups also pose problems in estimating relative risk and in assessing statistical significance. As an alternative, statistical models have been proposed to overcome such limitations and find adjusted rates. In this dissertation, a multiplicative model is considered to address the issues related to standardized indices, namely the Standardized Mortality Ratio (SMR) and the Comparative Mortality Factor (CMF). The model provides an alternative to the conventional standardization technique. Maximum likelihood estimates of the model parameters are used to construct an index similar to the SMR for estimating the relative risk of the exposure groups under comparison. A parametric bootstrap resampling method is used to evaluate the goodness of fit of the model, the behavior of the estimated parameters, and the variability in relative risk on generated samples. The model provides an alternative to both direct and indirect standardization methods.
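A parametric bootstrap for an SMR-type index can be sketched as follows, assuming the simplest possible fitted model: a single Poisson death count with mean equal to the observed count. The dissertation's multiplicative model is richer than this, and the observed/expected counts here are hypothetical; the sketch shows only the "simulate from the fitted model, recompute the index" loop.

```python
import math
import random

def parametric_bootstrap_smr(observed, expected, n_boot=2000, seed=0):
    """Parametric bootstrap for the SMR = observed/expected: fit a Poisson
    model to the observed death count, regenerate counts from that fitted
    model, and return the resampled SMR distribution."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication algorithm; adequate for moderate lam
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    return [poisson(observed) / expected for _ in range(n_boot)]

# hypothetical cohort: 30 observed deaths vs 24 expected from reference rates
smrs = parametric_bootstrap_smr(observed=30, expected=24.0)
```

Comparing the observed index (or a goodness-of-fit statistic) with the spread of `smrs` is the usual way such generated samples are used to judge model fit and the variability of the relative risk.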
Abstract:
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F-0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F-0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (D-LR) appeared to be an effective way to predict whether F-0 immigrants could be identified for a particular pair of populations using a given set of markers.
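The Monte Carlo resampling behind such critical values can be sketched in a deliberately simplified form: simulate residents from the home population's allele frequencies under Hardy-Weinberg equilibrium, and take a lower quantile of their genotype log-likelihoods as the threshold below which an individual is flagged as a putative F-0 immigrant. The allele frequencies are hypothetical, and the refinements the study emphasizes (preserving linkage disequilibrium from recent immigrants, reflecting the data set's sampling variance) are not reproduced here.

```python
import math
import random

def genotype_loglik(genotype, freqs):
    """Log-likelihood of a multilocus genotype under Hardy-Weinberg,
    given per-locus frequencies p of a reference allele; genotype entries
    count copies (0, 1 or 2) of that allele."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        probs = {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}
        ll += math.log(probs[g])
    return ll

def simulate_resident(freqs, rng):
    # draw two alleles per locus from the home-population frequencies
    return [sum(rng.random() < p for _ in range(2)) for p in freqs]

def mc_threshold(freqs, alpha=0.01, n_sim=5000, seed=0):
    """Monte Carlo critical value: the alpha quantile of log-likelihoods of
    simulated residents; individuals scoring below it are flagged."""
    rng = random.Random(seed)
    lls = sorted(genotype_loglik(simulate_resident(freqs, rng), freqs)
                 for _ in range(n_sim))
    return lls[int(alpha * n_sim)]

freqs = [0.7, 0.4, 0.9, 0.5, 0.6]   # hypothetical frequencies at 5 loci
crit = mc_threshold(freqs)
```

The choice of `alpha` fixes the expected rate at which true residents are wrongly flagged, which is exactly the error the paper's power analysis trades off against immigrant detection.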