938 results for Non-parametric regression methods
Abstract:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
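As a hedged illustration of the setup above (not the paper's MCMC scheme), scikit-learn's GaussianProcessRegressor accepts a per-training-point noise variance through its `alpha` argument, which can mimic a known input-dependent noise rate; the data and the noise-variance function below are synthetic assumptions.

```python
# Sketch: GP regression with input-dependent (heteroscedastic) noise.
# The paper infers the noise-rate process with a second GP and MCMC;
# here we simply pass a *known* per-point noise variance via `alpha`
# to illustrate the effect of input-dependent noise on the posterior.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100)[:, None]
noise_var = 0.01 + 0.2 * (X.ravel() / 10) ** 2      # variance grows with x
y = np.sin(X).ravel() + rng.normal(0.0, np.sqrt(noise_var))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=noise_var, normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict(X, return_std=True)          # posterior mean and std
```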
Abstract:
1. Pearson's correlation coefficient only tests whether the data fit a linear model. With large numbers of observations, quite small values of r become significant and the X variable may only account for a minute proportion of the variance in Y. Hence, the value of r squared should always be calculated and included in a discussion of the significance of r. 2. The use of r assumes that a bivariate normal distribution is present and this assumption should be examined prior to the study. If Pearson's r is not appropriate, then a non-parametric correlation coefficient such as Spearman's rs may be used. 3. A significant correlation should not be interpreted as indicating causation, especially in observational studies in which there is a high probability that the two variables are correlated because of their mutual correlations with other variables. 4. In studies of measurement error, there are problems in using r as a test of reliability and the ‘intra-class correlation coefficient’ should be used as an alternative. A correlation test provides only limited information as to the relationship between two variables. Fitting a regression line to the data using the method known as ‘least squares’ provides much more information, and the methods of regression and their application in optometry will be discussed in the next article.
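Points 1 and 2 above can be sketched numerically: with n = 1000 observations even a weak linear relationship yields a "small" r whose square shows how little variance is explained, and Spearman's rs is the rank-based alternative. The data below are simulated purely for illustration.

```python
# Sketch: small r can still look impressive with large n, so report r^2;
# Spearman's rs is the non-parametric (rank-based) alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)       # weak linear signal buried in noise

r, p = stats.pearsonr(x, y)
rs, ps = stats.spearmanr(x, y)
r_squared = r ** 2                     # proportion of variance in y explained by x
```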
Abstract:
Ten common doubts of chemistry students and professionals about their statistical applications are discussed. The use of the N-1 denominator instead of N is described for the standard deviation. The statistical meaning of the denominators of the root mean square error of calibration (RMSEC) and root mean square error of validation (RMSEV) are given for researchers using multivariate calibration methods. The reason why scientists and engineers use the average instead of the median is explained. Several problematic aspects about regression and correlation are treated. The popular use of triplicate experiments in teaching and research laboratories is seen to have its origin in statistical confidence intervals. Nonparametric statistics and bootstrapping methods round out the discussion.
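Two of the points discussed above can be sketched in a few lines: NumPy's standard deviation divides by N by default, so the sample (N-1) version needs `ddof=1`, and the median resists outliers that drag the average.

```python
# Sketch: N vs N-1 denominators for the standard deviation, and
# mean vs median robustness. Values are illustrative measurements.
import numpy as np

x = np.array([10.1, 10.3, 9.8, 10.0, 10.2])
pop_sd = np.std(x)             # divides by N (population form)
sample_sd = np.std(x, ddof=1)  # divides by N-1 (the usual sample estimate)

with_outlier = np.append(x, 25.0)
m = np.mean(with_outlier)      # dragged upward by the outlier
md = np.median(with_outlier)   # barely moves
```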
Abstract:
The zero-inflated negative binomial model is used to account for overdispersion detected in data that are initially analyzed under the zero-inflated Poisson model. A frequentist analysis, a jackknife estimator, and a non-parametric bootstrap for parameter estimation of zero-inflated negative binomial regression models are considered. In addition, an EM-type algorithm is developed for performing maximum likelihood estimation. The appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes are then derived, along with some ways to perform global influence analysis. In order to study departures from the error assumption as well as the presence of outliers, residual analysis based on the standardized Pearson residuals is discussed. The relevance of the approach is illustrated with a real data set, where it is shown that zero-inflated negative binomial regression models seem to fit the data better than the Poisson counterpart. (C) 2010 Elsevier B.V. All rights reserved.
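The jackknife and non-parametric bootstrap mentioned above are generic resampling schemes; as a minimal sketch they are applied here to a simple statistic (the mean of an overdispersed count sample) rather than to the full ZINB fit, which the paper handles with an EM-type algorithm.

```python
# Sketch: jackknife standard error and non-parametric bootstrap CI
# for a simple estimator on overdispersed (negative binomial) counts.
import numpy as np

rng = np.random.default_rng(2)
counts = rng.negative_binomial(n=2, p=0.3, size=200)   # overdispersed counts
n = len(counts)

# Jackknife: recompute the statistic leaving one observation out at a time.
jack = np.array([np.delete(counts, i).mean() for i in range(n)])
jack_se = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))

# Non-parametric bootstrap: resample the data with replacement.
boot = np.array([rng.choice(counts, size=n, replace=True).mean()
                 for _ in range(1000)])
ci = np.percentile(boot, [2.5, 97.5])                  # percentile 95% CI
```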
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
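The two-step idea above (non-parametric screening, then a parametric predictive model) can be sketched as follows; the synthetic data stand in for the cardiac dataset, and the depth/selection choices are illustrative assumptions.

```python
# Sketch: a decision tree screens for important variables, then a
# parametric (linear) model is fitted on the selected subset.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(size=500)   # only 2 variables matter

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
important = np.argsort(tree.feature_importances_)[::-1][:2]  # top variables

final = LinearRegression().fit(X[:, important], y)     # parsimonious final model
```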
Abstract:
There has been a resurgence of interest in the mean trace length estimator of Pahl for window sampling of traces. The estimator has been dealt with by Mauldon and Zhang and Einstein in recent publications. The estimator is a very useful one in that it is non-parametric. However, despite some discussion regarding the statistical distribution of the estimator, none of the recent works or the original work by Pahl provides a rigorous basis for the determination of a confidence interval for the estimator or a confidence region for the estimator and the corresponding estimator of trace spatial intensity in the sampling window. This paper shows, by consideration of a simplified version of the problem but without loss of generality, that the estimator is in fact the maximum likelihood estimator (MLE) and that it can be considered essentially unbiased. As the MLE, it possesses the least variance of all estimators, and confidence intervals or regions should therefore be available through application of classical ML theory. It is shown that valid confidence intervals can in fact be determined. The results of the work and the calculations of the confidence intervals are illustrated by example. (C) 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
Objective: To compare measurements of the upper arm cross-sectional areas (total arm area, arm muscle area, and arm fat area of healthy neonates) as calculated using anthropometry with the values obtained by ultrasonography. Materials and methods: This study was performed on 60 consecutively born healthy neonates: gestational age (mean±SD) 39.6±1.2 weeks, birth weight 3287.1±307.7 g, 27 males (45%) and 33 females (55%). Mid-arm circumference and tricipital skinfold thickness measurements were taken on the left upper mid-arm according to the conventional anthropometric method to calculate total arm area, arm muscle area and arm fat area. The ultrasound evaluation was performed at the same arm location using a Toshiba Sonolayer SSA-250A®, which allows the calculation of the total arm area, arm muscle area and arm fat area by the number of pixels enclosed in the plotted areas. Statistical analysis: whenever appropriate, parametric and non-parametric tests were used in order to compare measurements of paired samples and of groups of samples. Results: No significant differences between males and females were found in any evaluated measurements, estimated either by anthropometry or by ultrasound. Also, the median of total arm area did not differ significantly with either method (P=0.337). Although there is evidence of concordance of the total arm area measurements (r=0.68, 95% CI: 0.55-0.77), the two methods of measurement differed for arm muscle area and arm fat area. The estimated medians of the ultrasound measurements for arm muscle area were significantly lower than those estimated by the anthropometric method, which differed by as much as 111% (P<0.001). The estimated median ultrasound measurement of the arm fat area was higher than the anthropometric arm fat area by as much as 31% (P<0.001).
Conclusion: Compared with ultrasound measurements, using skinfold measurements and mid-arm circumference without further correction may lead to overestimation of the cross-sectional area of muscle and underestimation of the cross-sectional fat area. The correlation between the two methods could be interpreted as an indication for a further search for correction factors in the equations.
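The conventional anthropometric equations behind the abstract's arm areas are not spelled out there; as a sketch, these are the standard circumference-and-skinfold formulas (C is mid-arm circumference, T is tricipital skinfold, both in the same unit, e.g. cm).

```python
# Sketch of the conventional anthropometric arm-area formulas:
# total arm area      TAA = C^2 / (4*pi)
# arm muscle area     AMA = (C - pi*T)^2 / (4*pi)
# arm fat area        AFA = TAA - AMA
import math

def total_arm_area(c):
    return c ** 2 / (4 * math.pi)

def arm_muscle_area(c, t):
    return (c - math.pi * t) ** 2 / (4 * math.pi)

def arm_fat_area(c, t):
    return total_arm_area(c) - arm_muscle_area(c, t)

# Illustrative neonatal values: circumference 10.5 cm, skinfold 0.5 cm.
taa = total_arm_area(10.5)
ama = arm_muscle_area(10.5, 0.5)
afa = arm_fat_area(10.5, 0.5)
```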
Abstract:
ABSTRACT - Background: Brucellosis is an anthropozoonosis that is prevalent worldwide and one of the most neglected. Its transmission to humans is both direct and indirect, occurring through contact with infected animals, consumption of unpasteurized milk and dairy products, and failure to use personal and collective protective equipment, among other factors. Knowledge of the prevalence and incidence of animal and human brucellosis in Namibe, a province of Angola, is very scarce, and few studies have examined this disease among exposed livestock professionals: slaughterhouse workers, veterinarians and cattle farmers. It is therefore pertinent to characterize this situation on the basis of specific scientific studies. Objectives: To characterize the professionals' working environments (slaughterhouse, butcher shops, municipal slaughter rooms and farms); to estimate the seroprevalence of human brucellosis among livestock professionals (slaughterhouse workers and cattle farmers) in Namibe province, Angola, in 2012; to determine the association of human brucellosis with socio-demographic, knowledge, practice and farm-characteristic variables; to determine the prevalence of brucellosis in animals and on farms; to characterize the factors associated with the presence of brucellosis on cattle farms; to characterize livestock professionals' knowledge and practices regarding brucellosis; and to analyse the relationship between prevalence on farms (infected versus uninfected) and among farmers (infected versus uninfected). Methods and materials: observational, cross-sectional seroepidemiological studies of 131 workers from butcher shops, slaughter rooms and the slaughterhouse, and of 192 farmers, randomly sampled across Namibe province. Data were obtained through blood collection and the administration of a questionnaire. The laboratory tests used were the Rose Bengal test (RBT) and the slow agglutination test (SAT).
The knowledge study centred mainly on the question "Have you ever heard of brucellosis?" and on questions about the level of knowledge and practices (indicators based on the percentages of correct answers or adequate practices) regarding brucellosis risk factors. In addition, 1,344 animals (on 192 farms) were investigated using the RBT laboratory diagnostic method on blood serum, and a complementary questionnaire was administered to the respective farmers. For the statistical analysis, in addition to a descriptive approach, the chi-square test of independence, Fisher's exact test, the non-parametric Mann-Whitney test and Spearman's correlation test were used. Additionally, based on logistic regression models, odds ratios and their respective confidence intervals were determined using a 5% significance level. Results: the professionals' working environments (slaughterhouse, butcher shops, municipal slaughter rooms and farms) did not meet internationally defined hygiene and sanitary standards. Among professionals, the overall weighted brucellosis infection rate was 15.56% (95% CI: 13.61-17.50): 5.34% in workers and 16.66% (95% CI: 11.39-21.93) in farmers. Statistically significant associations were observed between human seroprevalence and professional category (worker versus farmer) (p < 0.001) and level of education (p = 0.032); start of activity (p = 0.079) and place of work (p = 0.055) were borderline. In a multivariate context, the factor positively associated with brucellosis in professionals was professional category (OR = 3.54, 95% CI: 1.57-8.30, for farmers relative to workers). The overall apparent prevalence rates in animals and on farms were 14.96% (95% CI: 12.97-17.19) and 40.10% (95% CI: 32.75-47.93), respectively. A moderate positive correlation was found between the number of infected animals per farm and the mean number of abortions on the farm (correlation coefficient = 0.531, p < 0.001).
On average, professionals had very insufficient overall knowledge (16.1%), with workers scoring higher than farmers (20.2% versus 13.8%), a difference that was not statistically significant (p = 0.170). The questions "Is raw milk boiled before human consumption?", "Contact with animal foetal material?", "Contact with aerosols in the workplace?" and "Have you ever been tested for human brucellosis?" (relating to practices), and the questions "Have you ever heard of brucellosis?", "Is brucellosis a zoonotic / animal-only / human-only disease?" and "How is brucellosis transmitted to humans?", showed mean levels of adequate practices and correct knowledge below 20%. On infected farms, 39% of farmers were positive (infected), compared with only 1.7% on uninfected farms. The risk of a farmer being infected when working on an infected farm was significantly higher (OR = 36, 95% CI: 8.28-157.04). Conclusions: the professionals' environments (slaughterhouses, municipal slaughter rooms, butcher shops and farms) pose a risk of brucellosis. The study shows that human brucellosis among livestock professionals and animal brucellosis are prevalent in Namibe province. The seroprevalence levels detected are high compared with those found in some African localities with conditions similar to Namibe's. Nearly two in five farms (40.10%) are infected with this disease. The (mean) number of abortions is clearly related to infected farms. Livestock professionals' general knowledge of brucellosis is very insufficient; workers showed greater knowledge than farmers, but both at alarming levels. Infected farmers are associated with infected farms.
There is a need to control the disease and to inform and educate professionals about brucellosis, and it is essential that the provincial veterinary services strengthen outreach and inspection activities.
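The odds-ratio-with-confidence-interval computation used throughout this study can be sketched on a 2x2 table; the counts below are hypothetical placeholders, not the study's data.

```python
# Sketch: odds ratio and Woolf (log) 95% confidence interval from a
# 2x2 table (exposure: working on an infected farm; outcome: infection).
import math

a, b = 30, 47   # infected / not infected, exposed group (illustrative)
c, d = 1, 58    # infected / not infected, unexposed group (illustrative)

odds_ratio = (a * d) / (b * c)
se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)     # Woolf's method
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log)
```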
Abstract:
The Electrohysterogram (EHG) is a new instrument for pregnancy monitoring. It measures the uterine muscle electrical signal, which is closely related to uterine contractions. The EHG is described as a viable alternative to, and a more precise instrument than, the currently most widely used method for describing uterine contractions: the external tocogram. The EHG has also been indicated as a promising tool in the assessment of preterm delivery risk. This work intends to contribute towards EHG characterization through an inventory of its components, which are:
• Contractions;
• Labor contractions;
• Alvarez waves;
• Fetal movements;
• Long Duration Low Frequency Waves.
The instruments used for cataloguing were parametric and non-parametric spectral analysis, energy estimators, time-frequency methods, and the tocogram annotated by expert physicians. The EHG and respective tocograms were obtained from the Icelandic 16-electrode Electrohysterogram Database. 288 components were classified; no component database of this type was previously available for consultation. The spectral analysis and power estimation module was added to Uterine Explorer, an EHG analysis software package developed at FCT-UNL. The importance of this component database lies in the need to improve the understanding of the EHG, which is a relatively complex signal, as well as in contributing towards the detection of preterm birth. Preterm birth accounts for 10% of all births and is one of the most relevant obstetric conditions. Despite the technological and scientific advances in perinatal medicine, in developed countries prematurity is the major cause of neonatal death. Although various risk factors such as previous preterm births, infection, uterine malformations, multiple gestation and a short uterine cervix in the second trimester have been associated with this condition, its etiology remains unknown [1][2][3].
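A minimal sketch of the non-parametric spectral analysis mentioned above: a Welch periodogram of a synthetic slow oscillation standing in for an EHG component (the sampling rate, frequency and noise level below are illustrative assumptions, not database parameters).

```python
# Sketch: non-parametric power spectral density estimate (Welch method)
# of a synthetic 0.4 Hz oscillation standing in for an EHG component.
import numpy as np
from scipy.signal import welch

fs = 20.0                                   # Hz, assumed sampling rate
t = np.arange(0, 60, 1 / fs)                # 60 s of signal
rng = np.random.default_rng(4)
signal = np.sin(2 * np.pi * 0.4 * t) + 0.3 * rng.normal(size=t.size)

f, psd = welch(signal, fs=fs, nperseg=256)  # averaged periodogram
peak = f[np.argmax(psd)]                    # dominant frequency
```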
Association between Angiotensin-Converting Enzyme Inhibitors and Troponin in Acute Coronary Syndrome
Abstract:
Background: Cardiovascular disease is the leading cause of mortality in the western world and its treatment should be optimized to decrease severe adverse events. Objective: To determine the effect of previous use of angiotensin-converting enzyme inhibitors on cardiac troponin I measurement in patients with acute coronary syndrome without ST-segment elevation, and to evaluate clinical outcomes at 180 days. Methods: Prospective, observational study, carried out in a tertiary center, in patients with acute coronary syndrome without ST-segment elevation. Clinical, electrocardiographic and laboratory variables were analyzed, with emphasis on previous use of angiotensin-converting enzyme inhibitors and cardiac troponin I. The Pearson chi-square test (Pereira) or Fisher's exact test (Armitage) was used, as well as the non-parametric Mann-Whitney test. Variables with significance levels of <10% were entered into a multiple logistic regression model. Results: A total of 457 patients with a mean age of 62.1 years, of whom 63.7% were males, were included. Risk factors such as hypertension (85.3%) and dyslipidemia (75.9%) were the most prevalent, and 35% of patients were diabetic. In the evaluation of events at 180 days, there were 28 deaths (6.2%). The statistical analysis showed that the variables associated with troponin elevation (>0.5 ng/mL) were high blood glucose at admission (p = 0.0034) and ST-segment depression ≥0.5 mm in one or more leads (p = 0.0016). The use of angiotensin-converting enzyme inhibitors prior to hospitalization was associated with troponin ≤0.5 ng/mL (p = 0.0482). The C-statistic for this model was 0.77. Conclusion: This study showed an association between prior use of angiotensin-converting enzyme inhibitors and a reduction in the myocardial necrosis marker troponin I in patients admitted for acute coronary syndrome without ST-segment elevation.
However, there are no data available yet to state that this reduction could lead to fewer severe clinical events such as death and re-infarction at 180 days.
Abstract:
This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The non-parametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel-based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness-of-fit analysis and, importantly, statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code used in the analysis, so that readers, both students and educators, can fully explore the techniques described.
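The log-transform-then-KDE idea above is straightforward to sketch; the paper works in R, so the following Python equivalent (with synthetic lognormal severities) is only an illustration, and the back-transform step uses the standard change-of-variables identity f_X(x) = f_logX(log x) / x.

```python
# Sketch: kernel density estimation of right-skewed claim severities
# on the log scale, with the density mapped back to the original scale.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
claims = rng.lognormal(mean=8.0, sigma=1.2, size=500)  # synthetic severities

kde_log = gaussian_kde(np.log(claims))                 # KDE on the log scale

def density(x):
    # change of variables: f_X(x) = f_logX(log x) / x for x > 0
    x = np.asarray(x, dtype=float)
    return kde_log(np.log(x)) / x

grid = np.linspace(claims.min(), claims.max(), 200)
dens = density(grid)
```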
Abstract:
OBJECTIVES: To analyse the prevalence of lifetime recourse to prostitution (LRP) among men in the general population of Switzerland from a trend and cohort perspective. METHODS: Using nine repeated representative cross-sectional surveys from 1987 to 2000, age-specific estimates of LRP were computed. Trends and period effect were analysed as the evolution of cross-sectional population estimates within age groups and overall. Cohort analysis relied on cohorts constructed from the 1989 survey and followed in subsequent waves. Age and cohort effects were modelled using logistic regression and non-parametric monotone regression. RESULTS: Whereas prevalence for the younger groups was found to be logically lower, there was no consistent increasing or decreasing trend over the years; there was no significant period effect. For the 17-30 year age group, the mean estimate over 1987-2000 was 11.5% (range 8.3 to 12.7%); for the 31-45 year group, the mean was 21.5% (range over 1989-2000 20.3 to 23.0%). Regarding cohort analysis, the prevalence of LRP was found to increase steeply in the youngest ages before reaching a plateau near the age of 40 years. At the age of 43 years, the prevalence was estimated to be 22.6% (95% CI 21.1% to 24.1%). CONCLUSIONS: The steep increase in the cohort-wise prevalence of LRP in younger ages calls for a concentration of prevention activities in young people. If the plateauing at approximately 40 years of age is not followed by a further increase later in life, which is not known, then consumers of paid sex would be repeat buyers only, a fact that should be taken into account by prevention.
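The non-parametric monotone regression used in the cohort analysis above can be sketched with isotonic regression: prevalence modelled as a non-decreasing function of age. The age/prevalence values below are illustrative, not the survey estimates.

```python
# Sketch: non-parametric monotone (isotonic) regression of prevalence
# on age, enforcing the non-decreasing shape seen in the cohort data.
import numpy as np
from sklearn.isotonic import IsotonicRegression

age = np.array([18, 22, 26, 30, 34, 38, 42, 46])
prev = np.array([0.03, 0.08, 0.11, 0.16, 0.19, 0.22, 0.21, 0.23])  # noisy plateau

iso = IsotonicRegression(increasing=True)
fitted = iso.fit_transform(age, prev)   # monotone fit smooths the dip at 42
```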
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can either be estimated through repeated measurements of the pollutant concentration in air, expert judgment or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte-Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
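The level-2 structure described above can be sketched at a very high level: Monte-Carlo samples from prior distributions of exposure determinants feed two models whose outputs are combined with expert-assigned weights. All distributions, model forms and weights below are hypothetical placeholders, not the report's actual two-compartment or neural-network models.

```python
# High-level sketch: priors on exposure determinants -> two stand-in
# level-2 models -> expert-weighted predictive exposure distribution.
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
emission = rng.lognormal(mean=0.0, sigma=0.5, size=n)     # prior: emission rate
ventilation = rng.lognormal(mean=1.0, sigma=0.3, size=n)  # prior: air exchange

physical = emission / ventilation        # placeholder "physical" model
empirical = 0.8 * physical + 0.1         # placeholder "trained" model

w_phys, w_emp = 0.6, 0.4                 # expert relevance weights (assumed)
predicted = w_phys * physical + w_emp * empirical

gm = np.exp(np.mean(np.log(predicted)))  # long-term geometric mean
gsd = np.exp(np.std(np.log(predicted)))  # geometric standard deviation
```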
Abstract:
The variation with latitude of incidence and mortality for cutaneous malignant melanoma (CMM) in the non-Maori population of New Zealand was assessed. For those aged 20 to 74 years, the effects of age, time period, birth-cohort, gender, and region (latitude), and some interactions between them were evaluated by log-linear regression methods. Increasing age-standardized incidence and mortality rates with increasing proximity to the equator were found for men and women. These latitude gradients were greater for males than females. The relative risk of melanoma in the most southern part of New Zealand (latitude 44 degrees S) compared with the most northern region (latitude 36 degrees S) was 0.63 (95 percent confidence interval [CI] = 0.60-0.67) for incidence and 0.76 (CI = 0.68-0.86) for mortality, both genders combined. The mean percentage change in CMM rates per degree of latitude for males was greater than those reported in other published studies. Differences between men and women in melanoma risk with latitude suggest that regional sun-behavior patterns or other risk factors may contribute to the latitude gradient observed.
Abstract:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
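The high-level recipe above (split the sample, produce one candidate per model class, pick the candidate minimizing empirical risk plus a complexity term) can be sketched under strong simplifications; the model classes here are polynomial degrees, and the penalty below is a crude placeholder, not the paper's empirical-cover complexity.

```python
# Sketch: sample-split, complexity-penalized model selection over a
# sequence of model classes F_1..F_8 (polynomial degrees 1..8).
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 200)
y = x - 0.5 * x ** 3 + 0.1 * rng.normal(size=200)

x1, y1 = x[:100], y[:100]          # part 1: fit one candidate per class
x2, y2 = x[100:], y[100:]          # part 2: evaluate empirical risk

best_deg, best_score = None, np.inf
for deg in range(1, 9):
    coef = np.polyfit(x1, y1, deg)                       # candidate rule
    risk = np.mean((np.polyval(coef, x2) - y2) ** 2)     # empirical risk
    penalty = 0.01 * deg / len(x2)                       # placeholder complexity
    if risk + penalty < best_score:
        best_deg, best_score = deg, risk + penalty
```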