991 results for Random variables
Abstract:
Dissertation (master's), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Estatística, 2015.
Abstract:
International audience
Abstract:
The Bahadur representation and its applications have attracted a large number of publications and presentations on a wide variety of problems. Mixing dependence is weak enough to describe the dependence structure of random variables, including observations in time series and longitudinal studies. This note proves the Bahadur representation of sample quantiles for strongly mixing random variables (including ρ-mixing and φ-mixing) under very weak conditions on the mixing coefficients. As an application, asymptotic normality is derived. These results greatly improve on those recently reported in the literature.
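For orientation, the generic form of a Bahadur representation can be written as below. This is a sketch in assumed notation: the symbols, the regularity conditions and the order given for the remainder are illustrative and are not taken from the note, whose exact rate under mixing is not stated in the abstract.

```latex
% Assumed notation (not fixed by the abstract):
%   F                 common marginal distribution function, with density f
%   \xi_p = F^{-1}(p) population quantile, 0 < p < 1, with f(\xi_p) > 0
%   F_n               empirical distribution function of X_1, ..., X_n
%   \hat{\xi}_{p,n}   sample quantile
\[
  \hat{\xi}_{p,n} \;=\; \xi_p \;+\; \frac{p - F_n(\xi_p)}{f(\xi_p)} \;+\; R_n ,
  \qquad R_n = o\!\left(n^{-1/2}\right) \text{ almost surely.}
\]
% The leading term is an average of indicators, so a central limit theorem
% for the dependent sequence yields asymptotic normality:
\[
  \sqrt{n}\,\bigl(\hat{\xi}_{p,n} - \xi_p\bigr) \;\xrightarrow{d}\;
  N\!\left(0,\; \frac{\sigma^2}{f(\xi_p)^2}\right),
\]
% where \sigma^2 is the long-run variance of the indicators 1\{X_i \le \xi_p\},
% which reduces to p(1-p) when the observations are independent.
```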
Abstract:
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independent and identically distributed) hypothesis on the noise elements and the finite rank regime. The first has been present since the earliest days of spin-glasses. The second instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establishing the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with an inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 we instead study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.
Abstract:
The main topic of this thesis is confounding in linear regression models. It arises when a relationship between an observed process, the covariate, and an outcome process, the response, is influenced by an unmeasured process, the confounder, associated with both. Consequently, the estimators for the regression coefficients of the measured covariates might be severely biased, less efficient and characterized by misleading interpretations. Confounding is an issue when the primary target of the work is the estimation of the regression parameters. The central point of the dissertation is the evaluation of the sampling properties of parameter estimators. This work aims to extend the spatial confounding framework to general structured settings and to understand the behaviour of confounding as a function of the data generating process structure parameters in several scenarios focusing on the joint covariate-confounder structure. In line with the spatial statistics literature, our purpose is to quantify the sampling properties of the regression coefficient estimators and, in turn, to identify the most prominent quantities depending on the generative mechanism impacting confounding. Once the sampling properties of the estimator conditionally on the covariate process are derived as ratios of dependent quadratic forms in Gaussian random variables, we provide an analytic expression of the marginal sampling properties of the estimator using Carlson’s R function. Additionally, we propose a representative quantity for the magnitude of confounding as a proxy of the bias, its first-order Laplace approximation. To conclude, we work under several frameworks considering spatial and temporal data with specific assumptions regarding the covariance and cross-covariance functions used to generate the processes involved. 
This study allows us to claim that the variability of the confounder-covariate interaction and of the covariate plays the most relevant role in determining the principal marker of the magnitude of confounding.
Abstract:
In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are random variables defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data coming from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions. 
Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.
Abstract:
We study a class of models of correlated random networks in which vertices are characterized by hidden variables controlling the establishment of edges between pairs of vertices. We find analytical expressions for the main topological properties of these models as a function of the distribution of hidden variables and the probability of connecting vertices. The expressions obtained are checked by means of numerical simulations in a particular example. The general model is extended to describe a practical algorithm to generate random networks with an a priori specified correlation structure. We also present an extension of the class, to map nonequilibrium growing networks to networks with hidden variables that represent the time at which each vertex was introduced in the system.
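The construction described in this abstract can be sketched as follows: each vertex draws a hidden variable, and each pair of vertices is linked with a probability that depends only on their two hidden variables. The function name, its parameters, and the exponential distribution and product-form connection probability in the usage lines are hypothetical choices for illustration, not the paper's.

```python
import random
from itertools import combinations


def hidden_variable_network(n, draw_hidden, connect_prob, seed=None):
    """Generate an undirected random network with hidden variables.

    n            -- number of vertices
    draw_hidden  -- callable(rng) returning one hidden variable (assumed interface)
    connect_prob -- callable(h_i, h_j) returning a probability in [0, 1]
    seed         -- optional seed for reproducibility
    """
    rng = random.Random(seed)
    # Step 1: assign an independent hidden variable to every vertex.
    hidden = [draw_hidden(rng) for _ in range(n)]
    # Step 2: link each pair independently with probability connect_prob(h_i, h_j).
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if rng.random() < connect_prob(hidden[i], hidden[j])
    ]
    return hidden, edges


# Illustrative usage: exponential hidden variables and a product-form
# connection probability capped at 1 (both are assumptions).
n_vertices = 200
hidden, edges = hidden_variable_network(
    n_vertices,
    draw_hidden=lambda rng: rng.expovariate(1.0),
    connect_prob=lambda h1, h2: min(1.0, h1 * h2 / n_vertices),
    seed=1,
)
print(len(hidden), "vertices,", len(edges), "edges")
```

Correlations between the degrees of connected vertices then arise from the form of the connection probability, which is why the abstract expresses topological properties in terms of the hidden-variable distribution and that probability.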
Abstract:
In this paper we study the accumulated claim in some fixed time period, dropping the classical assumption of mutual independence between the variables involved. Two basic models are considered: Model 1 assumes that any pair of claims is equally correlated, which means that the corresponding square-integrable sequence is exchangeable. Model 2 states that the correlations between adjacent claims are the same. Recurrence and explicit expressions for the joint probability generating function are derived, and the impact of the dependence parameter (correlation coefficient) in both models is examined. The Markov binomial distribution is obtained as a particular case under the assumptions of Model 2. (C) 2007 Elsevier B.V. All rights reserved.
Abstract:
Species distribution modeling has relevant implications for biodiversity studies, conservation decision making and knowledge of the ecological requirements of species. The aim of this study was to evaluate whether the use of forest inventories can improve the estimation of occurrence probability and identify the limits of the potential distribution and the habitat preference of a group of timber tree species. The environmental predictor variables were elevation, slope, aspect, normalized difference vegetation index (NDVI) and height above the nearest drainage (HAND). To estimate the distribution of species we used the maximum entropy method (Maxent). In comparison with a random distribution, using topographic variables and the vegetation index as features, the Maxent method predicted the geographical distribution of the studied species with an average accuracy of 86%. Elevation and NDVI were the most important variables. There were limitations to the interpolation of the models for non-sampled locations that are outside the elevation gradient associated with the occurrence data, in approximately 7% of the basin area. Ceiba pentandra (samaúma), Castilla ulei (caucho) and Hura crepitans (assacu) are more likely to occur in areas near watercourses. Clarisia racemosa (guariúba), Amburana acreana (cerejeira), Aspidosperma macrocarpon (pereiro), Apuleia leiocarpa (cumaru cetim), Aspidosperma parvifolium (amarelão) and Astronium lecointei (aroeira) can also occur in upland forest and well-drained soils. This modeling approach has potential for application to other, less studied tropical species, especially those that are under pressure from logging.
Abstract:
This article empirically analyzes the main existing theories on city income and population growth: increasing returns to scale, locational fundamentals and random growth. To do this we implement a threshold nonlinearity test that extends standard linear growth regression models, applied to a dataset of urban, climatological and macroeconomic variables for 1,175 U.S. cities. Our analysis reveals the existence of increasing returns when per-capita income levels are beyond $19,264. Despite this, income growth is mostly explained by social and locational fundamentals. Population growth also exhibits two distinct equilibria determined by a threshold value of 116,300 inhabitants, beyond which city population grows at a higher rate. Income and population growth do not go hand in hand, implying an optimal level of population beyond which income growth stagnates or deteriorates.
Predicting random level and seasonality of hotel prices. A structural equation growth curve approach
Abstract:
This article examines the effect on price of different characteristics of holiday hotels in the sun-and-beach segment, from the hedonic function perspective. Monthly prices of the majority of hotels on the Spanish continental Mediterranean coast are gathered from May to October 1999 from tour operator catalogues. Hedonic functions are specified as random-effect models and parametrized as structural equation models with two latent variables, a random peak season price and a random width of seasonal fluctuations. Characteristics of the hotel and of the region where it is located are used as predictors of both latent variables. Besides hotel category, region, distance to the beach, availability of parking and room equipment have an effect on peak price and also on seasonality. 3-star hotels have the highest seasonality and hotels located in the southern regions the lowest, which could be explained by a warmer climate in autumn.
Abstract:
The aim of the study was to determine the prevalence of, and variables associated with, the pattern of risky health behavior (PRHB) among adolescent students in Cartagena, Colombia. A cross-sectional study was designed to investigate PRHB in a random cluster sample of students from middle and high schools. The associations were adjusted by logistic regression. A total of 2,625 students participated in this research, with ages from 10 to 20 years, mean=13.8 years (SD=2.0), and 54.3% were women. A total of 332 students reported PRHB (12.7%, 95%CI 11.4–14.0). Age over 15 years (OR=2.19, 95%CI 1.72–2.79), not being heterosexual (OR=1.98, 95%CI 1.36–2.87), poor/mediocre academic performance (OR=1.87, 95%CI 1.47–2.38), family dysfunction (OR=1.78, 95%CI 1.40–2.28) and male gender (OR=1.58, 95%CI 1.24–2.01) were associated with PRHB. One in every eight students presented a PRHB. It is important to pay greater attention to students who are over 15 years of age, male, not heterosexual, with a poor/mediocre academic performance and a dysfunctional family.
Abstract:
OBJECTIVE: To determine if there is a relationship between adherence to nutritional recommendations and sociodemographic variables in Brazilian patients with type 2 diabetes mellitus. METHODS: Cross-sectional observational study using a stratified random sample of 423 individuals. The Food Frequency Questionnaire (FFQ) was used, and Fisher's exact test was applied with a 95% confidence interval (p<0.05). RESULTS: Of the 423 subjects, 66.7% were women; the mean age was 62.4 years (SD = 11.8), mean schooling was 4.3 years (SD = 3.6) and family income was less than two minimum wages. There was an association between female gender and adherence to a diet with adequate cholesterol content (OR: 2.03; CI: 1.23, 3.34), between four or more years of education and adherence to fractionation of meals (OR: 1.92; CI: 1.19, 3.10), and between income of less than two minimum wages and adherence to a diet with adequate cholesterol content (OR: 1.74; CI: 1.03, 2.95). CONCLUSION: Adherence to nutritional recommendations was associated with female gender, more than four years of education and family income of less than two minimum wages.
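A quick consistency check on the odds ratios reported above. It assumes the confidence intervals are symmetric on the log scale (Wald-type), which the abstract does not state. Under that assumption the point estimate is the geometric mean of the two bounds. The function name and dictionary labels are illustrative only.

```python
import math


def or_from_ci(lower, upper):
    """Odds ratio implied by a log-symmetric interval: sqrt(lower * upper)."""
    return math.sqrt(lower * upper)


# Confidence bounds as reported in the abstract.
reported_ci = {
    "female gender / cholesterol": (1.23, 3.34),
    "schooling / meal fractionation": (1.19, 3.10),
    "income / cholesterol": (1.03, 2.95),
}

implied = {
    label: round(or_from_ci(lo, hi), 2) for label, (lo, hi) in reported_ci.items()
}
for label, value in implied.items():
    print(f"{label}: implied OR = {value}")
```

The implied values, 2.03, 1.92 and 1.74, agree with the first and third odds ratios as reported, and support reading the second as 1.92.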
Abstract:
Objective: To describe the methodology of Confirmatory Factor Analysis for categorical items and to apply this methodology to evaluate the factor structure and invariance of the WHO-Disability Assessment Schedule (WHODAS-II) questionnaire, developed by the World Health Organization. Methods: Data used for the analysis come from the European Study of Mental Disorders (ESEMeD), a cross-sectional interview of a representative sample of the general population of 6 European countries (n=8796). Respondents were administered a modified version of the WHODAS-II, which measures functional disability in the previous 30 days in 6 different dimensions: Understanding and Communicating; Self-Care; Getting Around; Getting Along with Others; Life Activities; and Participation. The questionnaire includes two types of items: 22 severity items (5-point Likert) and 8 frequency items (continuous). An Exploratory Factor Analysis (EFA) with promax rotation was conducted on a random 50% of the sample. The remaining half of the sample was used to perform a Confirmatory Factor Analysis (CFA) in order to compare three different models: (a) the model suggested by the results obtained in the EFA; (b) the theoretical model suggested by the WHO with 6 dimensions; (c) a reduced model equivalent to model b where 4 of the frequency items are excluded. Moreover, a second order factor was also evaluated. Finally, a CFA with covariates was estimated in order to evaluate measurement invariance of the items between Mediterranean and non-Mediterranean countries. Results: The solution that provided the best results in the EFA was the one containing 7 factors. Two of the frequency items presented high factor loadings on the same factor, and one of them presented factor loadings smaller than 0.3 on all the factors. With regard to the CFA, the reduced model (model c) presented the best goodness of fit results (CFI=0.992, TLI=0.996, RMSEA=0.024).
The second order factor structure presented adequate goodness of fit (CFI=0.987, TLI=0.991, RMSEA=0.036). Measurement non-invariance was detected for one of the items of the questionnaire (FD20, Embarrassment due to health problems). Conclusions: The CFA confirmed the initial hypothesis about the factorial structure of the WHODAS-II in 6 factors. The second order factor supports the existence of a global dimension of disability. The use of 4 of the frequency items is not recommended in the scoring of the corresponding dimensions.
Abstract:
The aim of the present study is to characterize the flight time (Tv) of the aerial phase of the snatch in weightlifting. Its behaviour is described as a function of progressively increasing load and in relation to biomechanical variables of the pull, together with its evolution over a training cycle. A maximal progressive-load test was carried out with seven competitive weightlifters (n = 7). Using the Musclelab and Chronojump assessment systems, the following values were recorded: force (F), power (P), velocity (V), peak velocity (pV) and relative height (Hrel) of the bar during the pull, along with the Tv of the displacement of the lifter's feet when moving under the bar. A moderate negative correlation (r = −0.561; p < 0.01) was observed between Tv and the maximum load of the test (%1RMT). No significant correlations were found between Tv and the remaining variables analysed. Tv decreased as load increased in the submaximal range, and was random in nature when maximal loads were used. In a subgroup of the sample (n = 4) the same variables were assessed after eight weeks. Tv, Pmax and pV appear to be sufficiently sensitive variables for monitoring the changes produced by training over eight weeks, although the small sample size did not allow significant differences to be reached. These results highlight the possibility of considering Tv and P as control measures in the training of weightlifters, preferably with submaximal loads.