41 results for random regression model
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
In CoDaWork’05, we presented an application of discriminant function analysis (DFA) to 4 different compositional datasets and modelled the first canonical variable using a segmented regression model solely based on an observation about the scatter plots. In this paper, multiple linear regressions are applied to different datasets to confirm the validity of our proposed model. In addition to dating the unknown tephras by calibration as discussed previously, another method of mapping the unknown tephras into samples of the reference set or missing samples in between consecutive reference samples is proposed. The application of these methodologies is demonstrated with both simulated and real datasets. This new proposed methodology provides an alternative, more acceptable approach for geologists as their focus is on mapping the unknown tephra with relevant eruptive events rather than estimating the age of unknown tephra. Key words: Tephrochronology; Segmented regression
Abstract:
This paper generalizes the original random matching model of money by Kiyotaki and Wright (1989) (KW) in two aspects: first, the economy is characterized by an arbitrary distribution of agents who specialize in producing a particular consumption good; and second, these agents have preferences such that they want to consume any good with some probability. The results depend crucially on the size of the fraction of producers of each good and the probability with which different agents want to consume each good. KW and other related models are shown to be parameterizations of this more general one.
Abstract:
We apply the formalism of the continuous-time random walk to the study of financial data. The entire distribution of prices can be obtained once two auxiliary densities are known. These are the probability density for the pausing time between successive jumps and the corresponding probability density for the magnitude of a jump. We have applied the formalism to data on the U.S. dollar-deutsche mark future exchange, finding good agreement between theory and the observed data.
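The two auxiliary densities described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential pausing-time density, the Gaussian jump density, and the names `simulate_ctrw`, `mean_wait` and `jump_sd` are illustrative choices, not taken from the paper.

```python
import random


def simulate_ctrw(horizon, mean_wait=1.0, jump_sd=1.0, seed=None):
    """Simulate one continuous-time random walk path up to time `horizon`.

    Assumptions (illustrative only): pausing times are exponential with
    mean `mean_wait`, and jump magnitudes are Gaussian with standard
    deviation `jump_sd`.
    Returns a list of (time, position) pairs, starting at (0.0, 0.0).
    """
    rng = random.Random(seed)
    t, x = 0.0, 0.0
    path = [(t, x)]
    while True:
        # Draw the pausing time before the next jump.
        t += rng.expovariate(1.0 / mean_wait)
        if t > horizon:
            break
        # Draw the magnitude of the jump and move the walker.
        x += rng.gauss(0.0, jump_sd)
        path.append((t, x))
    return path


if __name__ == "__main__":
    for time, position in simulate_ctrw(5.0, seed=42):
        print(f"{time:.3f}\t{position:.3f}")
```

Swapping in other pausing-time or jump densities only requires changing the two sampling lines inside the loop.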
Abstract:
Logistic regression is among the analysis techniques that are valid for observational methodology. However, its presence at the heart of this methodology, and more specifically in physical activity and sports studies, is scarce. With a view to highlighting the possibilities this technique offers within the scope of observational methodology applied to physical activity and sports, an application of the logistic regression model is presented. The model is applied in the context of an observational design which aims to determine, from the analysis of use of the playing area, which football discipline (7-a-side football, 9-a-side football or 11-a-side football) is best adapted to the child's possibilities. A multiple logistic regression model can provide an effective prognosis regarding the probability of a move being successful (reaching the opposing goal area) depending on the sector in which the move commenced and the football discipline which is being played.
Abstract:
We present a model for transport in multiply scattering media based on a three-dimensional generalization of the persistent random walk. The model assumes that photons move along directions that are parallel to the axes. Although this hypothesis is not realistic, it allows us to solve exactly the problem of multiple scattering propagation in a thin slab. Among other quantities, the transmission probability and the mean transmission time can be calculated exactly. Besides being completely solvable, the model could be used as a benchmark for approximation schemes to multiple light scattering.
Abstract:
In automobile insurance, it is useful to achieve a priori ratemaking by resorting to generalized linear models, and here the Poisson regression model constitutes the most widely accepted basis. However, insurance companies distinguish between claims with or without bodily injuries, or claims with full or partial liability of the insured driver. This paper examines an a priori ratemaking procedure when including two different types of claim. When assuming independence between claim types, the premium can be obtained by summing the premiums for each type of guarantee and is dependent on the rating factors chosen. If the independence assumption is relaxed, then it is unclear as to how the tariff system might be affected. In order to answer this question, bivariate Poisson regression models, suitable for paired count data exhibiting correlation, are introduced. It is shown that the usual independence assumption is unrealistic here. These models are applied to an automobile insurance claims database containing 80,994 contracts belonging to a Spanish insurance company. Finally, the consequences for pure and loaded premiums when the independence assumption is relaxed by using a bivariate Poisson regression model are analysed.
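The independence case described above can be sketched as follows. This is a minimal illustration, assuming a log link for each claim type; the coefficient values, the rating factors, and the function names are hypothetical, and the sketch covers only expected claim counts (a premium would also need claim costs, which the abstract does not detail).

```python
import math


def expected_claims(coefs, covariates):
    """Expected claim count under a Poisson regression with log link.

    `coefs` and `covariates` are equal-length sequences; the first
    covariate is typically 1.0 for the intercept.
    """
    eta = sum(b * x for b, x in zip(coefs, covariates))
    return math.exp(eta)


def total_expected_claims(coefs_by_type, covariates):
    """Sum expected counts over claim types.

    Under the independence assumption, each claim type is modelled
    separately and the contributions are simply added.
    """
    return sum(expected_claims(c, covariates) for c in coefs_by_type.values())


if __name__ == "__main__":
    # Hypothetical coefficients for two claim types (illustrative only).
    coefs = {
        "bodily_injury": [-3.0, 0.4],
        "property_damage": [-2.0, 0.2],
    }
    driver = [1.0, 1.0]  # intercept term and one hypothetical rating factor
    print(total_expected_claims(coefs, driver))
```

A bivariate Poisson model, as introduced in the paper, would replace the simple sum with a joint model that allows correlation between the two counts.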
Abstract:
The context in which the university admissions exams are performed is presented, and the main concerns about these exams are outlined and discussed from a statistical point of view. The paper offers an illustration of the use of random coefficient models in the study of educational data. The association between two individual scores (one internal and the other external to the school) and the effect of the school in the external exam is analyzed by a regression model with random intercept and fixed slope. A variance component model for the analysis of the grading process is also presented. The paper ends with an outline of the main findings and the presentation of some specific proposals to improve and control the equity of the system. Some pedagogic reflections are also included.
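A regression model with random intercept and fixed slope is commonly written as below. The notation is an assumption for illustration (student $i$ in school $j$, external score $y_{ij}$, internal score $x_{ij}$) and is not taken from the paper itself.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
```

Here $u_j$ is the school-level random intercept, $\beta_1$ is the slope shared by all schools, and $u_j$ and $\varepsilon_{ij}$ are assumed independent.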
Abstract:
In the fixed design regression model, additional weights are considered for the Nadaraya-Watson and Gasser-Müller kernel estimators. We study their asymptotic behavior and the relationships between new and classical estimators. For a simple family of weights, and considering the IMSE as global loss criterion, we show some possible theoretical advantages. An empirical study illustrates the performance of the weighted estimators in finite samples.
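For reference, the classical Nadaraya-Watson estimator can be sketched as a kernel-weighted average of the responses. This sketch covers only the classical estimator with a Gaussian kernel (an assumption); the additional weights studied in the paper are not specified in the abstract and are not reproduced here.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Classical Nadaraya-Watson estimate of the regression function at x0.

    Uses a Gaussian kernel (an illustrative choice). `xs` are the design
    points, `ys` the observed responses, `bandwidth` the smoothing parameter.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    # Note: if x0 is very far from every design point, `total` may
    # underflow to zero and the division below will fail.
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    design = [0.0, 0.25, 0.5, 0.75, 1.0]
    response = [0.1, 0.4, 0.5, 0.9, 1.0]
    print(nadaraya_watson(0.5, design, response, bandwidth=0.2))
```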
Abstract:
Most methods for small-area estimation are based on composite estimators derived from design- or model-based methods. A composite estimator is a linear combination of a direct and an indirect estimator with weights that usually depend on unknown parameters which need to be estimated. Although model-based small-area estimators are usually based on random-effects models, the assumption of fixed effects is at face value more appropriate. Model-based estimators are justified by the assumption of random (interchangeable) area effects; in practice, however, areas are not interchangeable. In the present paper we empirically assess the quality of several small-area estimators in the setting in which the area effects are treated as fixed. We consider two settings: one that draws samples from a theoretical population, and another that draws samples from an empirical population of a labor force register maintained by the National Institute of Social Security (NISS) of Catalonia. We distinguish two types of composite estimators: a) those that use weights that involve area specific estimates of bias and variance; and, b) those that use weights that involve a common variance and a common squared bias estimate for all the areas. We assess their precision and discuss alternatives to optimizing composite estimation in applications.
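The linear combination described above can be sketched as follows. The weight formula is the textbook MSE-minimizing choice under simplifying assumptions that are mine, not the paper's: the direct estimator is unbiased, the indirect estimator has negligible variance, and the two are uncorrelated. The function names are hypothetical.

```python
def composite_weight(direct_var, indirect_sq_bias):
    """Weight on the direct estimator that minimizes the composite MSE.

    Assumes an unbiased direct estimator with variance `direct_var`, an
    indirect estimator with squared bias `indirect_sq_bias` and negligible
    variance, and no correlation between the two.
    """
    return indirect_sq_bias / (direct_var + indirect_sq_bias)


def composite_estimate(direct, indirect, direct_var, indirect_sq_bias):
    """Linear combination of a direct and an indirect estimate."""
    w = composite_weight(direct_var, indirect_sq_bias)
    return w * direct + (1.0 - w) * indirect


if __name__ == "__main__":
    # A noisy direct estimate is pulled toward the indirect one.
    print(composite_estimate(direct=10.0, indirect=20.0,
                             direct_var=4.0, indirect_sq_bias=1.0))
```

In the two types distinguished in the abstract, type a) would pass area-specific values of `direct_var` and `indirect_sq_bias`, while type b) would pass the same pair for every area.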
Abstract:
This paper shows how recently developed regression-based methods for the decomposition of health inequality can be extended to incorporate individual heterogeneity in the responses of health to the explanatory variables. We illustrate our method with an application to the Canadian NPHS of 1994. Our strategy for the estimation of heterogeneous responses is based on the quantile regression model. The results suggest that there is an important degree of heterogeneity in the association of health to explanatory variables which, in turn, accounts for a substantial percentage of inequality in observed health. A particularly interesting finding is that the marginal response of health to income is zero for healthy individuals but positive and significant for unhealthy individuals. The heterogeneity in the income response reduces both overall health inequality and income related health inequality.
Abstract:
We consider an infinite number of noninteracting lattice random walkers with the goal of determining statistical properties of the time, out of a total time T, that a single site has been occupied by n random walkers. Initially the random walkers are assumed uniformly distributed on the lattice except for the target site at the origin, which is unoccupied. The random-walk model is taken to be a continuous-time random walk and the pausing-time density at the target site is allowed to differ from the pausing-time density at other sites. We calculate the dependence of the mean time of occupancy by n random walkers as a function of n and the observation time T. We also find the variance for the cumulative time during which the site is unoccupied. The large-T behavior of the variance differs according as the random walk is transient or recurrent. It is shown that the variance is proportional to T at large T in three or more dimensions, it is proportional to T^(3/2) in one dimension and to T ln T in two dimensions.
Abstract:
This paper tries to resolve some of the main shortcomings in the empirical literature of location decisions for new plants, i.e. spatial effects and overdispersion. Spatial effects are omnipresent, being a source of overdispersion in the data as well as a factor shaping the functional relationship between the variables that explain a firm’s location decisions. Using Count Data models, empirical researchers have dealt with overdispersion and excess zeros by developments of the Poisson regression model. This study aims to take this a step further, by adopting Bayesian methods and models in order to tackle the excess of zeros, spatial and non-spatial overdispersion and spatial dependence simultaneously. Data for Catalonia is used and location determinants are analysed to that end. The results show that spatial effects are determinant. Additionally, overdispersion is decomposed into an unstructured iid effect and a spatially structured effect. Keywords: Bayesian Analysis, Spatial Models, Firm Location. JEL Classification: C11, C21, R30.
Abstract:
Grade retention practices are at the forefront of the educational debate. In this paper, we use PISA 2009 data for Spain to measure the effect of grade retention on students' achievement. One important problem when analyzing this question is that school outcomes and the propensity to repeat a grade are likely to be determined simultaneously. We address this problem by estimating a Switching Regression Model. We find that grade retention has a negative impact on educational outcomes, but we confirm the importance of endogenous selection, which makes observed differences between repeaters and non-repeaters appear 14.6% lower than they actually are. The effect on PISA scores of repeating is much smaller (-10% of non-repeaters' average) than the counterfactual reduction that non-repeaters would suffer had they been retained as repeaters (-24% of their average). Furthermore, those who repeated a grade during primary education suffered more than those who repeated a grade of secondary school, although the effect of repeating at both times is, as expected, much larger.
Abstract:
Early detection of breast cancer (BC) with mammography may cause overdiagnosis and overtreatment, detecting tumors which would remain undiagnosed during a lifetime. The aims of this study were: first, to model invasive BC incidence trends in Catalonia (Spain) taking into account reproductive and screening data; and second, to quantify the extent of BC overdiagnosis. We modeled the incidence of invasive BC using a Poisson regression model. Explanatory variables were: age at diagnosis and cohort characteristics (completed fertility rate, percentage of women that use mammography at age 50, and year of birth). This model also was used to estimate the background incidence in the absence of screening. We used a probabilistic model to estimate the expected BC incidence if women in the population used mammography as reported in health surveys. The difference between the observed and expected cumulative incidences provided an estimate of overdiagnosis. Incidence of invasive BC increased, especially in cohorts born from 1940 to 1955. The biggest increase was observed in these cohorts between the ages of 50 to 65 years, where the final BC incidence rates more than doubled the initial ones. Dissemination of mammography was significantly associated with BC incidence and overdiagnosis. Our estimates of overdiagnosis ranged from 0.4% to 46.6%, for women born around 1935 and 1950, respectively. Our results support the existence of overdiagnosis in Catalonia attributed to mammography usage, and the limited malignant potential of some tumors may play an important role. Women should be better informed about this risk. Research should be oriented towards personalized screening and risk assessment tools.
Abstract:
In this article we compare regression models obtained to predict PhD students’ academic performance in the universities of Girona (Spain) and Slovenia. Explanatory variables are characteristics of the PhD student’s research group understood as an egocentered social network, background and attitudinal characteristics of the PhD students and some characteristics of the supervisors. Academic performance was measured by the weighted number of publications. Two web questionnaires were designed, one for PhD students and one for their supervisors and other research group members. Most of the variables were easily comparable across universities due to the careful translation procedure and pre-tests. When direct comparison was not possible we created comparable indicators. We used a regression model in which the country was introduced as a dummy coded variable including all possible interaction effects. The optimal transformations of the main and interaction variables are discussed. Some differences between Slovenian and Girona universities emerge. Some variables like supervisor’s performance and motivation for autonomy prior to starting the PhD have the same positive effect on the PhD student’s performance in both countries. On the other hand, variables like too close supervision by the supervisor and having children have a negative influence in both countries. However, we find differences between countries when we observe the motivation for research prior to starting the PhD, which increases performance in Slovenia but not in Girona. As regards network variables, frequency of supervisor advice increases performance in Slovenia and decreases it in Girona. The negative effect in Girona could be explained by the fact that additional contacts of the PhD student with his/her supervisor might indicate a higher workload in addition to or instead of better advice about the dissertation. The number of the student’s external advice relationships and social support mean contact intensity are not significant in Girona, but they have a negative effect in Slovenia. We might explain the negative effect of external advice relationships in Slovenia by saying that a lot of external advice may actually result from a lack of the more relevant internal advice.