952 results for Linear models (Statistics)
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Background - The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is a key event in the cellular immune response. Accurate prediction of the binding between short peptides and MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders", or as "strong binders" and "weak binders", recent methods seek to predict precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that outperformed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC outperformed several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with considerable future potential.
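As an illustration of the kind of encoding-based nonlinear regression the abstract describes, the sketch below encodes short peptides with numeric physicochemical factors per residue and fits a kernel ridge regressor to toy binding affinities. This is only a simplified stand-in: the paper's method is SVR with an "11-factor encoding", whereas here a hypothetical 2-factor encoding and made-up affinity values are used.

```python
import numpy as np

# Hypothetical 2-factor encoding per residue (e.g. hydrophobicity, volume);
# the paper's actual 11-factor encoding is not reproduced here.
FACTORS = {"A": (1.8, 0.31), "L": (3.8, 0.70), "K": (-3.9, 0.79),
           "S": (-0.8, 0.33), "F": (2.8, 0.77)}

def encode(peptide):
    # concatenate per-residue factors into one numeric feature vector
    return np.concatenate([FACTORS[a] for a in peptide])

peptides = ["ALKSF", "LLKSA", "KKSAF", "SSALF", "FLAKS", "AKSLF"]
y = np.array([6.1, 5.8, 4.2, 4.9, 6.5, 5.5])   # toy -log10(IC50) affinities
X = np.array([encode(p) for p in peptides])

def rbf(A, B, gamma=0.1):
    # RBF kernel between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-3                                      # ridge penalty
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

pred = rbf(X, X) @ alpha                        # in-sample predictions
print(np.round(pred, 2))
```

A support vector regressor with an epsilon-insensitive loss would replace the ridge solve in the real method; the encoding step, which is what lets the model see similarities between amino acids, is the part this sketch is meant to show.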
Abstract:
Prognostic procedures can be based on ranked linear models. Ranked regression-type models are designed on the basis of feature vectors combined with a set of relations defined on selected pairs of these vectors. Feature vectors are composed of numerical results of measurements on particular objects or events. Ranked relations defined on selected pairs of feature vectors represent additional knowledge and can reflect experts' opinions about the objects considered. Ranked models take the form of linear transformations of feature vectors onto a line that preserve a given set of relations as well as possible. Ranked models can be designed through the minimization of a special type of convex and piecewise-linear (CPL) criterion function. Some sets of ranked relations cannot be well represented by a single ranked model; decomposing the global model into a family of local ranked models can improve the representation. A procedure for decomposing ranked models is described in this paper.
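A minimal sketch of the ranked-model idea: find a direction w whose linear transformation y = w·x preserves a set of ranked relations "object i should score above object j" as well as possible, by minimising a convex piecewise-linear (CPL) hinge criterion sum_p max(0, 1 - w·(x_i - x_j)). The criterion here is an illustration of the CPL idea, not the paper's exact criterion function; the data and the expert ranking are synthetic.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
n, d = 30, 3
X = rng.normal(size=(n, d))                     # feature vectors
true_scores = X @ np.array([1.0, -0.5, 2.0])    # hidden "expert" ordering

# ranked relations on selected pairs: i ranked above j with a margin
pairs = [(i, j) for i in range(n) for j in range(n)
         if true_scores[i] > true_scores[j] + 0.5][:100]
D = np.array([X[i] - X[j] for i, j in pairs])   # (P, d) difference vectors
P = len(pairs)

# Variables: [w (free), xi (>= 0)]; minimise sum(xi), the CPL hinge
# criterion, subject to  w.(x_i - x_j) + xi_p >= 1  for every pair p.
c = np.concatenate([np.zeros(d), np.ones(P)])
A_ub = np.hstack([-D, -np.eye(P)])
b_ub = -np.ones(P)
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * d + [(0, None)] * P)
w = res.x[:d]
kept = int((D @ w > 0).sum())
print(f"relations preserved: {kept}/{P}")
```

Because the hinge criterion is convex and piecewise linear, it can be minimised exactly as a linear program with one slack variable per ranked pair, which is the property the abstract's CPL framework exploits.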
Abstract:
2000 Mathematics Subject Classification: 62H12, 62P99
Abstract:
Analysis of risk measures associated with price-series movements and their prediction is of strategic importance in the financial markets, as well as to policy makers, in particular for short- and long-term planning when setting economic growth targets. For example, oil-price risk management focuses primarily on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly practised instrument to measure risk and is evaluated by analysing the negative/positive tail of the probability distributions of the returns (profit or loss). In modelling applications, least-squares estimation (LSE)-based linear regression models are often employed for modelling and analysing correlated data. These linear models are optimal and perform relatively well under conditions such as the errors following normal or approximately normal distributions, being free of large outliers and satisfying the Gauss-Markov assumptions. However, in practical situations the LSE-based linear regression models often fail to provide optimal results, for instance in non-Gaussian situations, especially when the errors follow fat-tailed distributions and may not possess a finite variance. This is the situation in risk analysis, which involves analysing tail distributions. Thus, applications of LSE-based regression models may be questioned for appropriateness and may have limited applicability. We have carried out a risk analysis of Iranian crude oil price data based on Lp-norm regression models and have noted that the LSE-based models do not always perform best. We discuss results from the L1, L2 and L∞-norm based linear regression models. ACM Computing Classification System (1998): B.1.2, F.1.3, F.2.3, G.3, J.2.
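A minimal sketch of Lp-norm linear regression as compared in the abstract: fit y = a + b·x by minimising the L1, L2 and L-infinity norms of the residuals. The data are synthetic, with one large outlier standing in for the fat-tailed behaviour of oil-price returns.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, 40)
y[5] += 4.0                                   # fat-tail style outlier

def fit(norm):
    def loss(beta):
        r = y - (beta[0] + beta[1] * x)
        if norm == "L1":
            return np.abs(r).sum()            # least absolute deviations
        if norm == "L2":
            return (r ** 2).sum()             # ordinary least squares
        return np.abs(r).max()                # L-infinity (minimax)
    return minimize(loss, x0=[1.0, 1.0], method="Nelder-Mead").x

for norm in ("L1", "L2", "Linf"):
    a, b = fit(norm)
    print(f"{norm}: intercept = {a:.2f}, slope = {b:.2f}")
```

With the outlier present, the L1 fit stays close to the bulk of the data while the L2 and L-infinity fits are pulled toward the outlier, which is the sensitivity to fat tails that motivates comparing the norms.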
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even after the huge increases in the value of n typically seen in many fields. Thus the tendency in some areas of industry to dispense with traditional statistical analysis on the grounds that "n = all" has little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
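The latent structure view of a contingency table mentioned above can be sketched concretely: a two-way table's probability mass function is modelled as a nonnegative rank-K factorisation P(x1, x2) = sum_k w_k a_k(x1) b_k(x2) and fitted by EM. This is the classical latent class (PARAFAC-type) model only, not the collapsed Tucker decomposition the chapter proposes; the table is synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic 4x4 contingency table drawn from a true 2-class model.
true_w = np.array([0.6, 0.4])
true_a = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])
true_b = np.array([[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.3, 0.5]])
P_true = np.einsum("k,ki,kj->ij", true_w, true_a, true_b)
counts = rng.multinomial(5000, P_true.ravel()).reshape(4, 4)

K = 2
w = np.full(K, 1.0 / K)
a = rng.dirichlet(np.ones(4), K)
b = rng.dirichlet(np.ones(4), K)

for _ in range(200):
    # E-step: responsibility of class k for cell (i, j)
    q = w[:, None, None] * a[:, :, None] * b[:, None, :]
    q /= q.sum(0, keepdims=True)
    # M-step: reweighted cell counts
    nk = q * counts[None, :, :]
    w = nk.sum((1, 2)); w /= w.sum()
    a = nk.sum(2); a /= a.sum(1, keepdims=True)
    b = nk.sum(1); b /= b.sum(1, keepdims=True)

P_hat = np.einsum("k,ki,kj->ij", w, a, b)
emp = counts / counts.sum()
print("total variation distance to table:",
      round(0.5 * np.abs(P_hat - emp).sum(), 4))
```

The number of components K here is exactly the nonnegative rank of the fitted probability tensor, which is the quantity the chapter relates to the support of a log-linear model.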
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. Markov chain Monte Carlo (MCMC) is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
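The slow-mixing phenomenon described above can be reproduced in a few lines. The sketch below is the Albert-Chib data-augmentation Gibbs sampler for an intercept-only probit model, run in the rare-event regime: many observations, very few successes. With a flat prior on the intercept beta, the two Gibbs steps are z_i | beta ~ N(beta, 1) truncated to (0, inf) if y_i = 1 and to (-inf, 0) if y_i = 0, then beta | z ~ N(mean(z), 1/n). The data are simulated, and this simplified model is only an illustration of the class of samplers the chapter analyses.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
n, successes = 1000, 5                     # large n, few successes
y = np.zeros(n, dtype=bool)
y[:successes] = True

beta, draws = 0.0, []
for _ in range(1000):
    # scipy's truncnorm takes bounds standardised by (loc, scale)
    a = np.where(y, -beta, -np.inf)
    b = np.where(y, np.inf, -beta)
    z = truncnorm.rvs(a, b, loc=beta, scale=1.0, random_state=rng)
    beta = rng.normal(z.mean(), 1.0 / np.sqrt(n))
    draws.append(beta)

draws = np.array(draws)[300:]              # drop burn-in
lag1 = np.corrcoef(draws[:-1], draws[1:])[0, 1]
print(f"posterior mean of beta: {draws.mean():.2f}, "
      f"lag-1 autocorrelation: {lag1:.2f}")
```

The conditional standard deviation of beta given z shrinks like 1/sqrt(n) while the posterior standard deviation does not, so the chain takes tiny steps through a comparatively wide posterior and the lag-1 autocorrelation comes out close to 1, illustrating the vanishing spectral gap the chapter quantifies.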
Abstract:
BACKGROUND: Regional differences in physician supply can be found in many health care systems, regardless of their organizational and financial structure. A theoretical model is developed for physicians' decisions on office location, covering demand-side factors and a consumption time function. METHODS: To test the propositions following from the theoretical model, generalized linear models were estimated to explain differences across 412 German districts. Various factors found in the literature were included to control for physicians' regional preferences. RESULTS: Evidence in favor of the first three propositions of the theoretical model was found. Specialists show a stronger association with more highly populated districts than GPs. Although indicators for regional preferences are significantly correlated with physician density, their coefficients are not as high as that of population density. CONCLUSIONS: If regional disparities are to be addressed by political action, the focus should be on counteracting those parameters representing physicians' preferences in over- and undersupplied regions.
Abstract:
The long-term adverse effects on health associated with air pollution exposure can be estimated using either cohort or spatio-temporal ecological designs. In a cohort study, the health status of a cohort of people is assessed periodically over a number of years, and then related to estimated ambient pollution concentrations in the cities in which they live. However, such cohort studies are expensive and time-consuming to implement, due to the long-term follow-up required for the cohort. Therefore, spatio-temporal ecological studies are also being used to estimate the long-term health effects of air pollution, as they are easy to implement due to the routine availability of the required data. Spatio-temporal ecological studies estimate the health impact of air pollution by utilising geographical and temporal contrasts in air pollution and disease risk across $n$ contiguous small areas, such as census tracts or electoral wards, for multiple time periods. The disease data are counts of the numbers of disease cases occurring in each areal unit and time period, and thus Poisson log-linear models are typically used for the analysis. The linear predictor includes pollutant concentrations and known confounders such as socio-economic deprivation. However, as the disease data typically contain residual spatial or spatio-temporal autocorrelation after the covariate effects have been accounted for, these known covariates are augmented by a set of random effects. One key problem in these studies is estimating spatially representative pollution concentrations in each areal unit; these are typically estimated by applying Kriging to data from a sparse monitoring network, or by computing averages over modelled concentrations (at grid level) from an atmospheric dispersion model. The aim of this thesis is to investigate the health effects of long-term exposure to nitrogen dioxide (NO2) and particulate matter (PM10) in mainland Scotland, UK.
In order to gain an initial impression of the air pollution health effects in mainland Scotland, chapter 3 presents a standard epidemiological study using a benchmark method. The remaining main chapters (4, 5, 6) cover the main methodological focus of this thesis, which is threefold: (i) how to better estimate pollution by developing a multivariate spatio-temporal fusion model that relates monitored and modelled pollution data over space, time and pollutant; (ii) how to simultaneously estimate the joint effects of multiple pollutants; and (iii) how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects. Specifically, chapters 4 and 5 are developed to achieve (i), while chapter 6 focuses on (ii) and (iii). In chapter 4, I propose an integrated model for estimating the long-term health effects of NO2 that fuses modelled and measured pollution data to provide improved predictions of areal-level pollution concentrations and hence health effects. The proposed fusion model is a Bayesian space-time linear regression model relating the measured concentrations to the modelled concentrations for a single pollutant, whilst allowing for additional covariate information such as site type (e.g. roadside, rural, etc.) and temperature. However, it is known that some pollutants can be correlated because they may be generated by common processes or driven by similar factors such as meteorology. The correlation between pollutants can help to predict one pollutant by borrowing strength from the others. Therefore, in chapter 5, I propose a multi-pollutant model: a multivariate spatio-temporal fusion model that extends the single-pollutant model of chapter 4 and relates monitored and modelled pollution data over space, time and pollutant to predict pollution across mainland Scotland.
Considering that the air we breathe contains a complex mixture of particle- and gas-phase pollutants, and that we are therefore exposed to multiple pollutants simultaneously, the health effects of exposure to multiple pollutants are investigated in chapter 6; this is a natural extension of the single-pollutant health effects analysis of chapter 4. Given that NO2 and PM10 are highly correlated (a multicollinearity issue) in my data, I first propose a temporally-varying linear model to regress one pollutant (e.g. NO2) against another (e.g. PM10), and then use the residuals in the disease model alongside PM10, thus investigating the health effects of exposure to both pollutants simultaneously. Another issue considered in chapter 6 is allowing for the uncertainty in the estimated pollution concentrations when estimating their health effects; in total, four approaches are developed to adjust for exposure uncertainty. Finally, chapter 7 summarises the work contained within this thesis and discusses the implications for future research.
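The residual-regression device described in the abstract can be sketched in a few lines of numpy: regress one pollutant on the other and enter the residuals, together with the second pollutant, into the disease model, so the two health-model covariates are orthogonal by construction. All data here are synthetic and the simple static regression stands in for the abstract's temporally-varying version.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
pm10 = rng.normal(20.0, 5.0, n)
no2 = 0.8 * pm10 + rng.normal(0.0, 2.0, n)    # strongly correlated exposures
print(f"corr(NO2, PM10) = {np.corrcoef(no2, pm10)[0, 1]:.2f}")

# Stage 1: regress NO2 on PM10 and keep the residuals.
A = np.column_stack([np.ones(n), pm10])
coef, *_ = np.linalg.lstsq(A, no2, rcond=None)
resid = no2 - A @ coef

# By least-squares orthogonality the residuals are uncorrelated with
# PM10, so PM10 and resid can sit side by side in the linear predictor
# of a Poisson log-linear disease model without multicollinearity.
print(f"corr(residuals, PM10) = {np.corrcoef(resid, pm10)[0, 1]:.2e}")
```

The residual column keeps the part of NO2 not explained by PM10, so its coefficient in the disease model is interpretable as the effect of NO2 over and above PM10.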
Abstract:
Species occurrence and abundance models are important tools that can be used in biodiversity conservation, and can be applied to predict or plan actions needed to mitigate the environmental impacts of hydropower dams. In this study our objectives were: (i) to model the occurrence and abundance of threatened plant species, (ii) to verify the relationship between predicted occurrence and true abundance, and (iii) to assess whether models based on abundance are more effective in predicting species occurrence than those based on presence–absence data. Individual representatives of nine species were counted within 388 randomly georeferenced plots (10 m × 50 m) around the Barra Grande hydropower dam reservoir in southern Brazil. We modelled their relationship with 15 environmental variables using both occurrence (Generalised Linear Models) and abundance data (Hurdle and Zero-Inflated models). Overall, occurrence models were more accurate than abundance models. For all species, observed abundance was significantly, although not strongly, correlated with the probability of occurrence. This correlation lost significance when zero-abundance (absence) sites were excluded from analysis, but only when this entailed a substantial drop in sample size. The same occurred when analysing relationships between abundance and probability of occurrence from previously published studies on a range of different species, suggesting that future studies could potentially use probability of occurrence as an approximate indicator of abundance when the latter is not possible to obtain. This possibility might, however, depend on life history traits of the species in question, with some traits favouring a relationship between occurrence and abundance. 
Reconstructing species abundance patterns from occurrence could be an important tool for conservation planning and the management of threatened species, allowing scientists to indicate the best areas for collection and reintroduction of plant germplasm or choose conservation areas most likely to maintain viable populations.
Abstract:
Species distribution and ecological niche models are increasingly used in biodiversity management and conservation. However, an important but rarely performed task is to follow up on the predictive performance of these models over time, to check whether their predictions are fulfilled and maintain their accuracy, or whether they apply only to the data set from which they were produced. In 2003, a distribution model of the Eurasian otter (Lutra lutra) in Spain was published, based on the results of a country-wide otter survey published in 1998. This model was built with logistic regression of otter presence-absence in 10 × 10 km UTM cells on a diverse set of environmental, human and spatial variables, selected according to statistical criteria. Here we evaluate this model against the results of the most recent otter survey, carried out a decade later and after a significant expansion of the otter distribution area in this country. Despite the time elapsed and the evident changes in this species' distribution, the model maintained a good predictive capacity, considering both discrimination and calibration measures. Otter distribution did not expand randomly or simply towards adjacent areas, but specifically towards the areas predicted as most favourable by the model based on data from 10 years before. This corroborates the utility of predictive distribution models, at least in the medium term and when they are made with robust methods and relevant predictor variables.
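A hedged sketch of the modelling-and-evaluation loop in the abstract: logistic regression of presence-absence on environmental predictors, with discrimination assessed by AUC. The data and predictor names below are synthetic illustrations, not the variables of the otter study.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 400
river_density = rng.normal(0, 1, n)       # hypothetical predictors
human_pressure = rng.normal(0, 1, n)
logit = 0.5 + 1.5 * river_density - 1.0 * human_pressure
presence = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([np.ones(n), river_density, human_pressure])

def nll(beta):
    # negative log-likelihood of the logistic regression
    eta = X @ beta
    return np.logaddexp(0, eta).sum() - eta[presence].sum()

beta = minimize(nll, np.zeros(3), method="BFGS").x

def auc(scores, labels):
    # probability a random presence cell outscores a random absence cell
    pos, neg = scores[labels], scores[~labels]
    return (pos[:, None] > neg[None, :]).mean()

print("coefficients:", np.round(beta, 2))
print("AUC:", round(auc(X @ beta, presence), 3))
```

In the follow-up study, the same fitted scores would be compared against the presence-absence pattern observed a decade later, so the AUC (a discrimination measure) and a calibration check are computed on the new survey rather than the training data.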
Abstract:
Epidemiological studies are statistical studies that seek to relate occurrences of health events to one or more specific causes. The importance that epidemiological models hold today in the study of oncological diseases, particularly in establishing their etiologies, is inescapable. According to Ogden, J. (1999), cancer is "an uncontrollable growth of abnormal cells that produce tumours called neoplasms". These tumours may be benign (they do not spread through the body) or malignant (they metastasize to other organs). Cancer is a present-day disease with a high incidence rate in Portugal compared with other diseases (Instituto Nacional de Estatística - INE, 2009), a rate that increases with age, as noted by Marques, L. (2003), although the disease can be diagnosed at any age. According to INE (2000), cancer is among the three leading causes of death in Portugal, with a progressive increase in its proportional weight; breast cancer is the most common type of cancer among women and one of the diseases with the greatest impact on our society. The main objective of this work is the estimation and modelling of the risk of contracting a rare, non-contagious disease (in this case, breast cancer), using data from the Alentejo region. It surveys the methodologies most commonly employed in this area and applies them in practice, with emphasis on case-control studies and generalized linear models (GLMs), specifically logistic regression. Case-control studies are used to identify factors that may contribute to a medical condition by comparing individuals who have the condition (cases) with patients who do not have the condition but are otherwise similar (controls). In this work, this methodology was used to study the association between living in a rural or urban environment and breast cancer.
Given that the main objective of this study concerns the relationship between variables, specifically the analysis of the influence that one or more explanatory variables have on a response variable of interest, generalized linear models (GLMs), first unified in a common theoretical framework by Nelder & Wedderburn (1972), are studied and subsequently applied to the data set on breast cancer in the Alentejo region. This work thus intends to be a contribution to the identification of risk factors for breast cancer in the Alentejo region.
Abstract:
This is an ecological, analytical and retrospective study comprising the 645 municipalities in the State of São Paulo, the scope of which was to determine the relationship between socioeconomic and demographic variables and the model of care with respect to infant mortality rates in the period from 1998 to 2008. The average annual rate of change of each indicator was calculated per coverage stratum. Infant mortality was analyzed according to a model for repeated measures over time, adjusted for the following correction variables: the city's population, proportion of Family Health Programs (PSFs) deployed, proportion of Growth Acceleration Programs (PACs) deployed, per capita GDP and the SPSRI (São Paulo social responsibility index). The analysis was performed with generalized linear models, assuming a gamma distribution. Multiple comparisons were performed with the likelihood ratio test with approximate chi-square distribution, considering a significance level of 5%. There was a decrease in infant mortality over the years (p < 0.05), with no significant difference from 2004 to 2008 (p > 0.05). The proportion of PSFs deployed (p < 0.0001) and per capita GDP (p < 0.0001) were significant in the model. The decline in infant mortality in this period was influenced by the growth of per capita GDP and of PSFs.
Abstract:
OBJECTIVES: To assess risk and protective factors for chronic noncommunicable diseases (CNCD) and to identify social inequalities in their distribution among Brazilian adults. METHODS: The data used were collected in 2007 through VIGITEL, an ongoing population-based telephone survey. This surveillance system was implemented in all of the Brazilian state capitals, and over 54,000 interviews were analyzed. Age-adjusted prevalence ratios for trends across schooling levels were calculated using Poisson regression with linear models. RESULTS: These analyses showed differences in the prevalence of risk and protective factors for CNCD by gender and schooling. Among men, the prevalences of overweight, consumption of meat with visible fat, and dyslipidemia were higher among those with more schooling, while tobacco use, sedentary lifestyle, and high blood pressure were lower. Among women, tobacco use, overweight, obesity, high blood pressure and diabetes were lower among those with more schooling, while consumption of meat with visible fat and sedentary lifestyle were higher. As for protective factors, fruit and vegetable intake and physical activity were higher in both men and women with more schooling. CONCLUSION: Gender and schooling influence risk and protective factors for CNCD, with values less favorable among men. VIGITEL is a useful tool for monitoring these factors among the Brazilian population.
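A minimal numpy sketch of Poisson regression fitted by iteratively reweighted least squares (IRLS), the kind of model the abstract uses to obtain schooling trends: exp(coefficient) of a binary indicator is the adjusted prevalence (rate) ratio. The variables and data below are synthetic, not VIGITEL data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
schooling_high = rng.random(n) < 0.4        # indicator for more schooling
age = rng.normal(45, 10, n)
lam = np.exp(-2.0 + 0.4 * schooling_high + 0.01 * (age - 45))
y = rng.poisson(lam)                        # synthetic outcome counts

X = np.column_stack([np.ones(n), schooling_high, age - 45])

def poisson_irls(X, y, iters=25):
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)               # current fitted means
        z = X @ beta + (y - mu) / mu        # working response
        WX = X * mu[:, None]                # Poisson IRLS weights W = mu
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)
    return beta

beta = poisson_irls(X, y)
print("age-adjusted prevalence ratio (high vs low schooling):",
      round(np.exp(beta[1]), 2))
```

Poisson regression with a robust variance is a common way to estimate prevalence ratios directly for common binary outcomes, avoiding the odds-ratio overstatement of logistic regression; the IRLS loop above is the fitting algorithm GLM software uses internally.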
Abstract:
OBJECTIVES: To assess risk factors for chronic noncommunicable diseases (CNCD) and to identify social inequalities related to their distribution in the Brazilian adult population. METHODS: CNCD risk factors (including tobacco use, overweight and obesity, low fruit and vegetable intake (LFVI), insufficient leisure-time physical activity (ILTPA), sedentary lifestyle, and excessive alcohol consumption) were studied in a probabilistic sample of 54,369 adults from the 26 Brazilian state capitals and the Federal District in 2006. The Surveillance System for Protective and Risk Factors for Chronic Noncommunicable Diseases by Telephone Interview (VIGITEL), a computer-assisted telephone survey system, was used, and age-adjusted prevalences for trends by educational level were calculated using Poisson regression with linear models. RESULTS: Men reported more tobacco use, overweight, LFVI, sedentary lifestyle, and excessive alcohol consumption than women, but less ILTPA. Among men, education was associated with greater overweight and a more sedentary lifestyle, but with less tobacco use, LFVI, and ILTPA. Among women, education was associated with less tobacco use, overweight, obesity, LFVI, and ILTPA, but with a more sedentary lifestyle. CONCLUSIONS: In Brazil, the prevalence of CNCD risk factors (except ILTPA) is higher among men than among women. In both sexes, educational level influences the prevalence of CNCD risk factors.
Abstract:
Background: Large inequalities in mortality from most cancers in general, and from mouth and pharynx cancer in particular, have been associated with behavioural and geopolitical factors. The assessment of socioeconomic covariates of cancer mortality may be relevant to a full comprehension of the distal determinants of the disease, and to appraising opportune interventions. The objective of this study was to compare socioeconomic inequalities in male mortality from oral and pharyngeal cancer in two major cities of Europe and South America. Methods: The official mortality information system of each city provided data on deaths; general censuses provided population data. Age-adjusted death rates from oral and pharyngeal cancer for men were independently assessed for neighbourhoods of Barcelona, Spain, and São Paulo, Brazil, from 1995 to 2003. Uniform methodological criteria guided the comparative assessment of the magnitude, trends and spatial distribution of mortality. General linear models assessed ecologic correlations between death rates and socioeconomic indices (unemployment, schooling levels and the human development index) at the inner-city area level. Results obtained for each city were subsequently compared. Results: Mortality of men from oral and pharyngeal cancer ranked higher in Barcelona (9.45 yearly deaths per 100,000 male inhabitants) than in Spain and Europe as a whole; rates were decreasing. São Paulo presented a poorer profile, with higher magnitude (11.86) and a stationary trend. The appraisal of ecologic correlations indicated an unequal and inequitably distributed burden of disease in both cities, with poorer areas tending to present higher mortality. Barcelona had a larger gradient of mortality than São Paulo, indicating a higher inequality of cancer deaths across its neighbourhoods.
Conclusion: The quantitative monitoring of inequalities in health may contribute to the formulation of redistributive policies aimed at the concurrent promotion of wellbeing and social justice. The assessment of groups experiencing a higher burden of disease can instruct health services to provide additional resources for expanding preventive actions and facilities aimed at early diagnosis, standardized treatments and rehabilitation.