938 results for PARTIAL LIKELIHOOD
Abstract:
When we study the variables that affect survival time, we usually estimate their effects with the Cox regression model. In biomedical research, the effects of covariates are often modified by a biomarker variable, which leads to covariate-biomarker interactions. Here a biomarker is an objective measurement of patient characteristics at baseline. Liu et al. (2015) built a local partial likelihood bootstrap model to estimate and test this interaction effect between covariates and a biomarker, but the R code developed by Liu et al. (2015) can handle only one variable and one interaction term and cannot fit the model with adjustment for nuisance variables. In this project, we extend the model to allow adjustment for nuisance variables, extend the R code to accept any chosen interaction terms, and expose many parameters so that users can customize their analyses. We also build an R package called "lplb" that integrates the complex computations into a simple interface. We conduct numerical simulations to show that the new method has excellent finite-sample properties under both the null and alternative hypotheses. We also apply the method to data from a prostate cancer clinical trial with the acid phosphatase (AP) biomarker.
Abstract:
Ties among event times are often recorded in survival studies. For example, in a two-week laboratory study where event times are measured in days, ties are very likely to occur. The proportional hazards model might be used in this setting with an approximated partial likelihood function. This approximation works well when the number of ties is small. On the other hand, discrete regression models are suggested when the data are heavily tied. However, in many situations it is not clear which approach should be used in practice. In this work, empirical guidelines based on Monte Carlo simulations are provided. These recommendations are based on a measure of the amount of tied data present and on the mean square error. An example illustrates the proposed criterion.
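As a minimal sketch of the approximated partial likelihood mentioned in this abstract, the following Python function evaluates the log partial likelihood for a single covariate under two common tie-handling approximations, Breslow and Efron. The abstract does not say which approximation the authors studied, so both options, the function name `partial_loglik`, and the toy data in the usage note are assumptions for illustration only.

```python
import math


def partial_loglik(times, events, x, beta, ties="breslow"):
    """Cox log partial likelihood for one covariate, with tied event times.

    times  : observed times (event or censoring)
    events : 1 if the time is an event, 0 if censored
    x      : covariate values
    beta   : regression coefficient at which to evaluate
    ties   : "breslow" or "efron" (assumed options, not taken from the abstract)
    """
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")

    distinct_event_times = sorted({t for t, d in zip(times, events) if d == 1})
    loglik = 0.0
    for t in distinct_event_times:
        # Sum of exp(beta * x) over subjects still at risk at time t.
        risk = sum(math.exp(beta * xi) for ti, xi in zip(times, x) if ti >= t)
        # Covariates of the subjects whose event occurs exactly at t.
        tied = [xi for ti, di, xi in zip(times, events, x) if ti == t and di == 1]
        d = len(tied)
        loglik += beta * sum(tied)
        if ties == "breslow":
            # Breslow: every tied event uses the full risk set.
            loglik -= d * math.log(risk)
        else:
            # Efron: remove a growing fraction of the tied subjects' risk.
            tied_risk = sum(math.exp(beta * xi) for xi in tied)
            for k in range(d):
                loglik -= math.log(risk - (k / d) * tied_risk)
    return loglik
```

For example, `partial_loglik([1, 1, 2], [1, 1, 1], [0, 1, 0], 0.0, "breslow")` and the same call with `"efron"` give different values because two events share time 1. With no ties the two approximations coincide.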
Abstract:
Survival models involving frailties are commonly applied in studies where correlated event time data arise due to natural or artificial clustering. In this paper we present an application of such models in the animal breeding field. Specifically, a mixed survival model with a multivariate correlated frailty term is proposed for the analysis of data from over 3611 Brazilian Nellore cattle. The primary aim is to evaluate parental genetic effects on the trait defined as the length of time, in days, that their progeny need to achieve a commercially specified standard weight gain. This trait is not measured directly but can be estimated from growth data. Results point to the importance of genetic effects and suggest that these models constitute a valuable data analysis tool for beef cattle breeding.
Abstract:
Background/Aims: Statistical analysis of age-at-onset involving family data is particularly complicated because there is a correlation pattern that needs to be modeled and also because some measurements are censored. In this paper, our main purpose was to evaluate the effect of genetic and shared family environmental factors on age-at-onset of three cardiovascular risk factors: hypertension, diabetes and high cholesterol. Methods: The mixed-effects Cox model proposed by Pankratz et al. [2005] was used to analyze the data from 81 families, involving 1,675 individuals from the village of Baependi, in the state of Minas Gerais, Brazil. Results: The analyses showed that the polygenic effect plays a greater role than the shared family environmental effect in explaining the variability of the age-at-onset of hypertension, diabetes and high cholesterol. The model that simultaneously evaluated both effects indicated that there are individuals who may have a risk of hypertension due to polygenic effects that is 130% higher than the overall average risk for the entire sample. For diabetes and high cholesterol, the risks of some individuals were 115 and 45% higher, respectively, than the overall average risk for the entire population. Conclusions: Results showed evidence of significant polygenic effects, indicating that age-at-onset is a useful trait for gene mapping of the common complex diseases analyzed. In addition, we found that the polygenic random component might absorb the effects of some covariates usually considered in the risk evaluation, such as gender, age and BMI. Copyright (C) 2008 S. Karger AG, Basel
Abstract:
Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event, and correlated lifetime data, when an individual is followed for the occurrence of two or more types of events or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models. The well-known Cox proportional hazards model and its variations, which use the marginal hazard functions employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2, we introduced a bivariate proportional hazards model using the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations for studying the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used for the estimation of the parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a superior concept to the hazard function. Motivated by this, in Chapter 4, we considered a new semi-parametric model, the bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap times of recurrent events.
The counting process approach is used for the inference procedures for the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of the different models developed in previous chapters were studied. The proposed models were applied to various real-life situations.
Abstract:
We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a linear spline model is assumed on the baseline hazard. We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerate distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicates for the unobserved covariates increases efficiency and reduces bias. We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.
Abstract:
In traditional criminal investigation, uncertainties are often dealt with using a combination of common sense, practical considerations and experience, but rarely with tailored statistical models. For example, in some countries, in order to search for a given profile in the national DNA database, it must have allelic information for six or more of the ten SGM Plus loci for a simple trace. If the profile does not have this amount of information then it cannot be searched in the national DNA database (NDNAD). This requirement (of a result at six or more loci) is not based on a statistical approach, but rather on the feeling that six or more would be sufficient. A statistical approach, however, could be more rigorous and objective and would take into consideration factors such as the probability of adventitious matches relative to the actual database size and/or investigator's requirements in a sensible way. Therefore, this research was undertaken to establish scientific foundations pertaining to the use of partial SGM Plus loci profiles (or similar) for investigation.
Abstract:
In the field of fingerprints, the rise of computer tools has made it possible to create powerful automated search algorithms. These algorithms make it possible, inter alia, to compare a fingermark to a fingerprint database and therefore to establish a link between the mark and a known source. With the growth of the capacities of these systems and of data storage, as well as increasing collaboration between police services at the international level, the size of these databases increases. The current challenge for the field of fingerprint identification lies in the growth of these databases, which makes it possible to find impressions that are very similar but come from distinct fingers. At the same time, however, these data and systems allow a description of the variability between different impressions from the same finger and between impressions from different fingers. This statistical description of the within- and between-finger variabilities, computed on the basis of minutiae and their relative positions, can then be used in a statistical approach to interpretation. The computation of a likelihood ratio, employing simultaneously the comparison between the mark and the print of the case, the within-variability of the suspect's finger and the between-variability of the mark with respect to a database, can then be based on representative data. Thus, these data allow an evaluation that may be more detailed than that obtained by applying rules established long before the advent of these large databases or by relying on the specialist's experience. The goal of the present thesis is to evaluate likelihood ratios computed from the scores of an automated fingerprint identification system (AFIS) when the source of the tested and compared marks is known. These ratios must support the hypothesis that is known to be true. Moreover, they should support this hypothesis more and more strongly with the addition of information in the form of additional minutiae.
For the modeling of within- and between-variability, the necessary data were defined and acquired for one finger of a first donor and two fingers of a second donor. The database used for between-variability includes approximately 600,000 inked prints. The minimal number of observations necessary for a robust estimation was determined for the two distributions used. Factors that influence these distributions were also analyzed: the number of minutiae included in the configuration and the configuration as such for both distributions, as well as the finger number and the general pattern for between-variability, and the orientation of the minutiae for within-variability. In the present study, the only factor for which no influence has been shown is the orientation of minutiae. The results show that the likelihood ratios resulting from the use of the scores of an AFIS can be used for evaluation. Relatively low rates of likelihood ratios supporting the hypothesis known to be false have been obtained. The maximum rate of likelihood ratios supporting the hypothesis that the two impressions were left by the same finger, when the impressions in fact came from different fingers, is 5.2% for a configuration of 6 minutiae. When a 7th and then an 8th minutia are added, this rate drops to 3.2% and then to 0.8%. In parallel, for these same configurations, the likelihood ratios obtained are on average of the order of 100, 1000, and 10000 for 6, 7 and 8 minutiae when the two impressions come from the same finger. These likelihood ratios can therefore be an important aid for decision making. Both positive trends linked to the addition of minutiae (a drop in the rates of likelihood ratios that can lead to an erroneous decision and an increase in the value of the likelihood ratio) were observed systematically within the framework of the study.
Approximations based on 3 scores for within-variability and on 10 scores for between-variability were found, and showed satisfactory results.
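To make the idea of a score-based likelihood ratio concrete, here is a minimal sketch in Python. It divides an estimated density of the comparison score under the within-finger distribution by the density under the between-finger distribution. The abstract does not state which distributions the thesis fitted, so the Gaussian kernel density estimate, the helper names `kernel_density` and `score_likelihood_ratio`, the bandwidth, and the example scores are all hypothetical.

```python
import math


def kernel_density(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`.

    This estimator is an assumption for illustration; the thesis may have
    used a different model for the score distributions.
    """
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def score_likelihood_ratio(score, within_scores, between_scores, bandwidth):
    """Likelihood ratio of a comparison score.

    Numerator: density of the score among same-finger comparisons.
    Denominator: density of the score among different-finger comparisons.
    """
    numerator = kernel_density(within_scores, bandwidth)(score)
    denominator = kernel_density(between_scores, bandwidth)(score)
    return numerator / denominator
```

With made-up scores such as `within_scores = [90, 100, 110]` and `between_scores = [10, 20, 30]`, a score near 95 gives a ratio above 1 (support for the same-finger hypothesis) and a score near 25 gives a ratio below 1.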
Abstract:
This study determined whether clinical salt-sensitive hypertension (cSSHT) results from the interaction between partial arterial baroreceptor impairment and a high-sodium (HNa) diet. In three series (S-I, S-II, S-III), mean arterial pressure (MAP) of conscious male Wistar ChR003 rats was measured once before (pdMAP) and twice after either sham (SHM) or bilateral aortic denervation (AD), following 7 days on a low-sodium (LNa) diet (LNaMAP) and then 21 days on a HNa diet (HNaMAP). The roles of plasma nitric oxide bioavailability (pNOB), renal medullary superoxide anion production (RMSAP), and mRNA expression of NAD(P)H oxidase and superoxide dismutase were also assessed. In SHM (n=11) and AD (n=15) groups of S-I, LNaMAP-pdMAP was 10.5±2.1 vs 23±2.1 mmHg (P<0.001), and the salt-sensitivity index (SSi; HNaMAP−LNaMAP) was 6.0±1.9 vs 12.7±1.9 mmHg (P=0.03), respectively. In the SHM group, all rats were normotensive, and 36% were salt sensitive (SSi≥10 mmHg), whereas in the AD group ∼50% showed cSSHT. A 45% reduction in pNOB (P≤0.004) was observed in both groups in dietary transit. RMSAP increased in the AD group on both diets but more so on the HNa diet (S-II, P<0.03) than on the LNa diet (S-III, P<0.04). MAP modeling in rats without a renal hypertensive genotype indicated that the AD*HNa diet interaction (P=0.008) increases the likelihood of developing cSSHT. Translationally, these findings help to explain why subjects with clinical salt-sensitive normotension may transition to cSSHT.
Abstract:
Data available on continuous-time diffusions are always sampled discretely in time. In most cases, the likelihood function of the observations is not directly computable. This survey covers a sample of the statistical methods that have been developed to solve this problem. We concentrate on some recent contributions to the literature based on three different approaches to the problem: an improvement of the Euler-Maruyama discretization scheme, the employment of Martingale Estimating Functions, and the application of the Generalized Method of Moments (GMM).
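As a rough illustration of the basic Euler-Maruyama idea (not the improved scheme the survey discusses), the sketch below replaces the unknown transition density of a diffusion with the Gaussian density implied by one Euler step and sums the resulting log densities over the discrete observations. The Ornstein-Uhlenbeck model dX = -theta * X dt + sigma dW, the function name `euler_loglik`, and the parameter names are assumptions chosen for the example.

```python
import math


def euler_loglik(xs, dt, theta, sigma):
    """Euler pseudo-log-likelihood for dX = -theta * X dt + sigma dW.

    xs    : observations at equally spaced times
    dt    : time between consecutive observations
    theta : mean-reversion rate (assumed model parameter)
    sigma : diffusion coefficient (assumed model parameter)

    One Euler step gives X[i+1] | X[i] ~ Normal(X[i] - theta * X[i] * dt,
    sigma**2 * dt), which stands in for the exact transition density.
    """
    variance = sigma ** 2 * dt
    loglik = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        mean = x0 - theta * x0 * dt
        loglik += -0.5 * math.log(2 * math.pi * variance)
        loglik -= (x1 - mean) ** 2 / (2 * variance)
    return loglik
```

Maximizing this function over `theta` and `sigma` gives an approximate estimator. The approximation degrades as `dt` grows, which is the kind of discretization bias that motivates refinements of the scheme.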
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
Abstract:
Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a resolving power of ca. 500,000 and a mass accuracy better than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H](-) ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g(-1). To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the number of variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g(-1). By reducing the size of the data, it was possible to relate the selected variables to their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
Abstract:
The aim of this work was to characterize the effects of partial inhibition of respiratory complex I by rotenone on H2O2 production by isolated rat brain mitochondria in different respiratory states. Flow cytometric analysis of membrane potential in isolated mitochondria indicated that rotenone leads to uniform respiratory inhibition when added to a suspension of mitochondria. When mitochondria were incubated in the presence of a low concentration of rotenone (10 nM) and NADH-linked substrates, oxygen consumption was reduced from 45.9 ± 1.0 to 26.4 ± 2.6 nmol O2 mg(-1) min(-1) and from 7.8 ± 0.3 to 6.3 ± 0.3 nmol O2 mg(-1) min(-1) in respiratory states 3 (ADP-stimulated respiration) and 4 (resting respiration), respectively. Under these conditions, mitochondrial H2O2 production was stimulated from 12.2 ± 1.1 to 21.0 ± 1.2 pmol H2O2 mg(-1) min(-1) and from 56.5 ± 4.7 to 95.0 ± 11.1 pmol H2O2 mg(-1) min(-1) in respiratory states 3 and 4, respectively. Similar results were observed when comparing mitochondrial preparations enriched with synaptic or nonsynaptic mitochondria or when 1-methyl-4-phenylpyridinium ion (MPP(+)) was used as a respiratory complex I inhibitor. Rotenone-stimulated H2O2 production in respiratory states 3 and 4 was associated with a high reduction state of endogenous nicotinamide nucleotides. In succinate-supported mitochondrial respiration, where most of the mitochondrial H2O2 production relies on electron backflow from complex II to complex I, low rotenone concentrations inhibited H2O2 production. Rotenone had no effect on mitochondrial elimination of micromolar concentrations of H2O2. The present results support the conclusion that partial complex I inhibition may result in mitochondrial energy crisis and oxidative stress, the former being predominant under oxidative phosphorylation and the latter under resting respiration conditions.