986 results for Instrumental variable regression
Abstract:
The literature on local services has focused on the effects of privatization and, if anything, has compared the effects of private and mixed public-private systems versus public provision. However, alternative forms of provision such as cooperatives, which can be very prevalent in many developing countries, have been completely ignored. In this paper, we investigate the effects of communal water provision (Comités Vecinales and Juntas Administrativas de Servicios de Saneamiento) on child health in Peru. Using detailed survey data at the household and child level for the years 2006-2010, we exploit the cross-sectional variability to assess the differential impact of this form of provision. Despite controlling for a wide range of household and local characteristics, the municipalities served by communal organizations are more likely to have poorer health indicators, which would result in a downward bias on the absolute magnitude of the effect of cooperatives. We rely on an instrumental variable strategy to deal with this potential endogeneity problem, and use the personnel resources and the administrative urban/rural classification of the municipalities as instruments for the provision type. The results show a negative and significant effect of communal water provision on diarrhea among children under five years old. Keywords: water utilities, cooperatives, child health, regulation, Peru. JEL Classification Numbers: L33; L50; L95
Abstract:
Using historical data for all Swiss cantons from 1890 to 2000, we estimate the causal effect of direct democracy on government spending. The main innovation in this paper is that we use fixed effects to control for unobserved heterogeneity and instrumental variables to address the potential endogeneity of institutions. We find that the budget referendum and lower costs to launch a voter initiative are effective tools in reducing canton-level spending. However, we find no evidence that the budget referendum results in more decentralized government or a larger local government. Our instrumental variable estimates suggest that a mandatory budget referendum reduces the size of canton spending by between 13 and 19 percent. A 1 percent lower signature requirement for the initiative reduces canton spending by up to 2 percent.
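Illustrative sketch (not taken from the paper): the instrumental variable idea mentioned in the abstract above can be shown with a minimal two-stage least squares (2SLS) estimator on simulated data. The data-generating process, the variable names, and the function `two_stage_least_squares` are assumptions made for illustration only; the paper's fixed-effects panel specification is not reproduced here.

```python
import numpy as np


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z.

    An intercept is added to both stages. Returns [intercept, slope].
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])   # instruments (with constant)
    X = np.column_stack([np.ones(n), x])   # regressors (with constant)
    # First stage: fitted values of the regressors given the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Hypothetical simulated data: u is an unobserved confounder that
# drives both x and y, so plain OLS is biased.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 - 0.5 * x + 2.0 * u + rng.normal(size=n)   # true slope is -0.5

ols_slope = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
iv_slope = two_stage_least_squares(y, x, z)[1]
print("OLS slope:", ols_slope)
print("2SLS slope:", iv_slope)
```

In this simulated setting the OLS slope is pulled away from the true value by the confounder, while the 2SLS slope should land close to it, because the instrument affects the outcome only through x.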
Abstract:
This paper applies the theoretical literature on nonparametric bounds on treatment effects to the estimation of how limited English proficiency (LEP) affects wages and employment opportunities for Hispanic workers in the United States. I analyze the identifying power of several weak assumptions on treatment response and selection, and stress the interactions between LEP and education, occupation and immigration status. I show that the combination of two weak but credible assumptions provides informative upper bounds on the returns to language skills for certain subgroups of the population. Adding age at arrival as a monotone instrumental variable also provides informative lower bounds.
Abstract:
The World Health Organization estimates that 300 million clinical cases of malaria occur annually and has observed that its incidence increased during the 1980s and part of the 1990s. In this paper we explore the influence of refugees from civil wars on the incidence of malaria in the refugee-receiving countries. Using civil wars as an instrumental variable, we show that for each 1,000 refugees there are between 2,000 and 2,700 cases of malaria in the refugee-receiving country. On average, 13% of the cases of malaria reported by the WHO are caused by forced migration as a consequence of civil wars.
Abstract:
In this paper we study, taking the economic model of crime (Becker, 1968; Ehrlich, 1973) as our theoretical reference, the socioeconomic and demographic determinants of crime in Spain, paying attention to the role of provincial peculiarities. We estimate a crime equation using a panel dataset of Spanish provinces (NUTS3) for the period 1993 to 1999, employing the system GMM estimator. Empirical results suggest that the lagged crime rate and the clear-up rate are correlated with all the types of crime rate considered. Property crimes are better explained by socioeconomic variables (GDP per capita, GDP growth rate and percentage of population with high school and university degrees), while demographic factors have important and significant influences, in particular for crimes against the person. These results are obtained using an instrumental variable approach that takes advantage of the dynamic properties of our dataset to control for both measurement errors in crime data and joint endogeneity of the explanatory variables.
Abstract:
This report describes a statewide study conducted to develop main-channel slope (MCS) curves for 138 selected streams in Iowa with drainage areas greater than 100 square miles. MCS values determined from the curves can be used in regression equations for estimating flood-frequency discharges. Multi-variable regression equations previously developed for two of the three hydrologic regions defined for Iowa require the measurement of MCS. Main-channel slope is a difficult measurement to obtain for large streams using 1:24,000-scale topographic maps. The curves developed in this report provide a simplified method for determining MCS values for sites located along large streams in Iowa within hydrologic Regions 2 and 3. The curves were developed using MCS values quantified for 2,058 selected sites along 138 selected streams in Iowa. A geographic information system (GIS) technique and 1:24,000-scale topographic data were used to quantify MCS values for the stream sites. The sites were selected at about 5-mile intervals along the streams. River miles were quantified for each stream site using a GIS program. Data points for river-mile and MCS values were plotted and a best-fit curve was developed for each stream. An adjustment was applied to all 138 curves to compensate for differences in MCS values between manual measurements and GIS quantification. The multi-variable equations for Regions 2 and 3 were developed using manual measurements of MCS. A comparison of manual measurements and GIS quantification of MCS indicates that manual measurements typically produce greater values of MCS compared to GIS quantification. Median differences between manual measurements and GIS quantification of MCS are 14.8 and 17.7 percent for Regions 2 and 3, respectively.
Comparisons of percentage differences between flood-frequency discharges calculated using MCS values of manual measurements and GIS quantification indicate that use of GIS values of MCS for Region 3 substantially underestimates flood discharges. Mean and median percentage differences for 2- to 500-year recurrence-interval flood discharges ranged from 5.0 to 5.3 and 4.3 to 4.5 percent, respectively, for Region 2 and ranged from 18.3 to 27.1 and 12.3 to 17.3 percent for Region 3. The MCS curves developed from GIS quantification were adjusted by 14.8 percent for streams located in Region 2 and by 17.7 percent for streams located in Region 3. Comparisons of percentage differences between flood discharges calculated using MCS values of manual measurements and adjusted-GIS quantification for Regions 2 and 3 indicate that the flood-discharge estimates are comparable. For Region 2, mean percentage differences for 2- to 500-year recurrence-interval flood discharges ranged between 0.6 and 0.8 percent and median differences were 0.0 percent. For Region 3, mean and median differences ranged from 5.4 to 8.4 and 0.0 to 0.3 percent, respectively. A list of selected stream sites presented with each curve provides information about the sites including river miles, drainage areas, the location of U.S. Geological Survey streamflow-gaging stations, and the location of streams crossing hydrologic region boundaries or the Des Moines Lobe landforms region boundary. Two examples are presented for determining river-mile and MCS values, and two techniques are presented for computing flood-frequency discharges.
Abstract:
The instrumental variable method (referred to as Mendelian randomization when the instrument is a genetic variant) was initially developed to infer the causal effect of a risk factor on some outcome of interest in a linear model. Adapting this method to nonlinear models, however, is known to be problematic. In this paper, we consider the simple case when the genetic instrument, the risk factor, and the outcome are all binary. We compare via simulations the usual two-stage estimate of a causal odds ratio and its adjusted version with an estimate recently proposed in the context of a clinical trial with noncompliance. In contrast to the former two, we confirm that the latter is (under some conditions) a valid estimate of a causal odds ratio defined in the subpopulation of compliers, and we propose its use in the context of Mendelian randomization. By analogy with a clinical trial with noncompliance, compliers are those individuals for whom the presence/absence of the risk factor X is determined by the presence/absence of the genetic variant Z (i.e., for whom we would observe X = Z whatever the alleles randomly received at conception). We also recall and illustrate the huge variability of instrumental variable estimates when the instrument is weak (i.e., with a low percentage of compliers, as is typically the case with genetic instruments, for which this proportion is frequently smaller than 10%): the inter-quartile range of our simulated estimates was up to 18 times higher than with a conventional (e.g., intention-to-treat) approach. We thus conclude that the need to find stronger instruments is probably as important as the need to develop a methodology that allows consistent estimation of a causal odds ratio.
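Illustrative sketch (not the authors' simulation design): the variability of an instrumental variable estimate under a weak binary instrument, as described in the abstract above, can be illustrated with a small Monte Carlo experiment. The sketch assumes a risk-difference (Wald ratio) estimator rather than the odds-ratio estimators compared in the paper, and the compliance shares, outcome probabilities, sample size, and function names are all hypothetical choices.

```python
import numpy as np


def simulate_once(rng, n, p_complier):
    """One simulated study with binary instrument z, exposure x, outcome y.

    Returns the intention-to-treat (ITT) contrast and the Wald ratio
    (ITT divided by the first-stage contrast).
    """
    z = rng.integers(0, 2, size=n)
    # Latent type: 0 = complier, 1 = always-exposed, 2 = never-exposed.
    p_rest = (1.0 - p_complier) / 2.0
    kind = rng.choice(3, size=n, p=[p_complier, p_rest, p_rest])
    x = np.where(kind == 0, z, (kind == 1).astype(int))   # compliers have x = z
    y = rng.random(n) < 0.2 + 0.2 * x                      # assumed outcome model
    itt = y[z == 1].mean() - y[z == 0].mean()
    first_stage = x[z == 1].mean() - x[z == 0].mean()
    wald = itt / first_stage if first_stage != 0 else np.nan
    return itt, wald


def iqr(values):
    """Inter-quartile range, ignoring NaN entries."""
    return np.nanpercentile(values, 75) - np.nanpercentile(values, 25)


rng = np.random.default_rng(1)
draws = np.array([simulate_once(rng, n=2000, p_complier=0.10) for _ in range(300)])
iqr_itt = iqr(draws[:, 0])
iqr_wald = iqr(draws[:, 1])
print("IQR of ITT estimates: ", iqr_itt)
print("IQR of Wald estimates:", iqr_wald)
```

Because the Wald ratio divides by a first-stage contrast that is small when few individuals are compliers, its spread across simulated studies is much larger than that of the ITT contrast. The exact ratio depends on the assumed design and is not meant to match the figure reported in the abstract.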
Abstract:
A statewide study was conducted to develop regression equations for estimating flood-frequency discharges for ungaged stream sites in Iowa. Thirty-eight selected basin characteristics were quantified and flood-frequency analyses were computed for 291 streamflow-gaging stations in Iowa and adjacent States. A generalized-skew-coefficient analysis was conducted to determine whether generalized skew coefficients could be improved for Iowa. Station skew coefficients were computed for 239 gaging stations in Iowa and adjacent States, and an isoline map of generalized-skew-coefficient values was developed for Iowa using variogram modeling and kriging methods. The skew map provided the lowest mean square error for the generalized-skew-coefficient analysis and was used to revise generalized skew coefficients for flood-frequency analyses for gaging stations in Iowa. Regional regression analysis, using generalized least-squares regression and data from 241 gaging stations, was used to develop equations for three hydrologic regions defined for the State. The regression equations can be used to estimate flood discharges that have recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years for ungaged stream sites in Iowa. One-variable equations were developed for each of the three regions and multi-variable equations were developed for two of the regions. Two sets of equations are presented for two of the regions because one-variable equations are considered easy for users to apply and the predictive accuracies of multi-variable equations are greater. Standard error of prediction for the one-variable equations ranges from about 34 to 45 percent and for the multi-variable equations ranges from about 31 to 42 percent. A region-of-influence regression method was also investigated for estimating flood-frequency discharges for ungaged stream sites in Iowa.
A comparison of regional and region-of-influence regression methods, based on ease of application and root mean square errors, determined the regional regression method to be the better estimation method for Iowa. Techniques for estimating flood-frequency discharges for streams in Iowa are presented for determining (1) regional regression estimates for ungaged sites on ungaged streams; (2) weighted estimates for gaged sites; and (3) weighted estimates for ungaged sites on gaged streams. The technique for determining regional regression estimates for ungaged sites on ungaged streams requires determining which of four possible examples applies to the location of the stream site and its basin. Illustrations for determining which example applies to an ungaged stream site and for applying both the one-variable and multi-variable regression equations are provided for the estimation techniques.
Abstract:
Estimating the impact of population size on the probability of civil conflict is complicated by endogeneity bias and omitted variables. This article addresses the causality problem using instrumental variable methods in a panel of 37 Sub-Saharan African countries over the period 1981-2004. We find that a 1% increase in population raises the probability of civil conflict by 5.2%.
Abstract:
Several authors have recently discussed the limited dependent variable regression model with serial correlation between residuals. The pseudo-maximum likelihood estimators obtained by ignoring serial correlation altogether have been shown to be consistent. We present alternative pseudo-maximum likelihood estimators which are obtained by ignoring serial correlation only selectively. Monte Carlo experiments on a model with first-order serial correlation suggest that our alternative estimators have substantially lower mean-squared errors in medium-size and small samples, especially when the serial correlation coefficient is high. The same experiments also suggest that the true level of the confidence intervals established with our estimators by assuming asymptotic normality is somewhat lower than the intended level. Although the paper focuses on models with only first-order serial correlation, the generalization of the proposed approach to serial correlation of higher order is also discussed briefly.
Abstract:
It is well known that standard asymptotic theory is not valid or is extremely unreliable in models with identification problems or weak instruments [Dufour (1997, Econometrica), Staiger and Stock (1997, Econometrica), Wang and Zivot (1998, Econometrica), Stock and Wright (2000, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. One possible way out consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows one to build exact tests and confidence sets only for the full vector of the coefficients of the endogenous explanatory variables in a structural equation, which in general does not allow for individual coefficients. This problem may in principle be overcome by using projection techniques [Dufour (1997, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. AR-types are emphasized because they are robust to both weak instruments and instrument exclusion. However, these techniques can be implemented only by using costly numerical techniques. In this paper, we provide a complete analytic solution to the problem of building projection-based confidence sets from Anderson-Rubin-type confidence sets. The latter involves the geometric properties of “quadrics” and can be viewed as an extension of usual confidence intervals and ellipsoids. Only least squares techniques are required for building the confidence intervals. We also study by simulation how “conservative” projection-based confidence sets are. Finally, we illustrate the methods proposed by applying them to three different examples: the relationship between trade and growth in a cross-section of countries, returns to education, and a study of production functions in the U.S. economy.
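Illustrative sketch (not the paper's analytic solution): the Anderson-Rubin idea referred to in the abstract above can be shown in its simplest form, with one endogenous regressor, one instrument, and a confidence set obtained by inverting the test over a grid. Only least-squares projections are used. The simulated data, the grid, the function name `ar_statistic`, and the use of the asymptotic chi-square critical value (3.841 for one degree of freedom at the 95% level) are assumptions for illustration; the projection-based sets for individual coefficients studied in the paper are not implemented here.

```python
import numpy as np

CHI2_1_95 = 3.841  # 95% critical value, chi-square with 1 degree of freedom


def ar_statistic(y, x, z, beta0):
    """Anderson-Rubin statistic for H0: beta = beta0 (single instrument z).

    The residual y - beta0 * x is regressed on z (all variables demeaned);
    under H0 the instrument should have no explanatory power for it.
    """
    n = len(y)
    e = (y - y.mean()) - beta0 * (x - x.mean())
    zc = z - z.mean()
    fitted = zc * (zc @ e) / (zc @ zc)      # least-squares projection of e on z
    resid = e - fitted
    return (fitted @ fitted) / ((resid @ resid) / (n - 2))


# Hypothetical simulated data with an endogenous regressor (true beta = 1).
rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)            # error correlated with x through v
x = 0.5 * z + v
y = 1.0 * x + u

# Simple IV estimate, for reference.
zc = z - z.mean()
beta_iv = (zc @ (y - y.mean())) / (zc @ (x - x.mean()))

# Confidence set: all grid values of beta0 that the AR test does not reject.
grid = np.linspace(-3.0, 3.0, 601)
ar_values = np.array([ar_statistic(y, x, z, b) for b in grid])
accepted = grid[ar_values < CHI2_1_95]
print("IV estimate:", beta_iv)
print("AR confidence set (grid range):", accepted.min(), accepted.max())
```

In this just-identified case the statistic is zero at the IV estimate, so the set always contains it. With a weak instrument the accepted region can become very wide or unbounded, which a fixed grid would truncate.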
Abstract:
We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions —a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria assessing alternative procedures.
Abstract:
International student migration is a costly investment for families in many developing countries. However, this investment is likely to generate relatively large financial and social benefits for the investors, as well as externalities for other family members. This thesis addresses three important aspects of international student migration: (i) Who leaves? What are the determinants of the probability of migration? (ii) Who pays? How does the family organize itself to cover the costs of migration? (iii) Who benefits? Does this migratory flow benefit the country of origin? Undertaking such a study confronts the researcher with significant challenges, notably the absence of complete and reliable data, the primary cause being the geographic dispersion of migrant students. The first important contribution of this work is the development of a “snowball” sampling method for hard-to-reach populations, together with estimators that correct for possible selection bias. Using this methodology, I collected data covering both migrant and non-migrant students from Cameroon through an internet platform. A second, relatively well-documented challenge is the endogeneity of the education choice. We take advantage of recent theoretical developments in the treatment of identification problems in discrete choice models to resolve this difficulty while keeping the required assumptions simple. This work is one of the first applications of this methodology to development questions. The first chapter of the thesis studies the family's decision to invest in student migration.
It proposes an empirical structural discrete choice model that reflects both the gross return to migration and the budget constraint tied to the agents' choice problem. Our results show that the choice of final education level, academic results, and family support are important determinants of the probability of emigrating, unlike gender, which does not appear to affect the family decision very significantly. The second chapter seeks to understand how agents decide on their participation in the migration decision and how the family shares the gains and discourages free-riding. Other results in the partial identification literature allow us to consider strategic behavior within the family unit. Preliminary estimates suggest that the “unitary” model, in which a representative agent maximizes family utility, is suitable only for families composed of the parents and the child. Outside helpers incur a strictly positive cost for participating, which discourages their involvement. Family and social obligations appear to explain a helper's participation better than possible altruism on the helper's part. Finally, the third chapter presents the more general theoretical framework within which the models developed in the preceding chapters fit. The identification and inference methods presented are specialized to finite games with complete information. With my co-authors, we propose in particular a combinatorial procedure for an efficient implementation of the bootstrap for inference in the models mentioned above. We apply it to the determinants of families' choice of long-term care for elderly parents.
Abstract:
My thesis consists of three essays on bootstrap inference in both panel data models and models with many instrumental variables (IV), many of which may be weak. Since asymptotic theory is not always a good approximation to the sampling distribution of estimators and test statistics, I consider the bootstrap as an alternative. These essays study the asymptotic validity of existing bootstrap procedures and, where they are invalid, propose new valid bootstrap methods. The first chapter (co-written with Sílvia Gonçalves) studies the validity of the bootstrap for inference in a linear, dynamic, stationary panel data model with fixed effects. We consider three bootstrap methods: the recursive-design bootstrap, the fixed-design bootstrap, and the pairs bootstrap. These methods are natural generalizations to the panel context of the bootstrap methods considered by Gonçalves and Kilian (2004) in autoregressive time series models. We show that the OLS estimator obtained by the recursive-design bootstrap contains a built-in term that mimics the bias of the original estimator. This is in contrast with the fixed-design bootstrap and the pairs bootstrap, whose distributions are incorrectly centered at zero. However, the recursive-design bootstrap and the pairs bootstrap are asymptotically valid when applied to the bias-corrected estimator, unlike the fixed-design bootstrap. In simulations, the recursive-design bootstrap is the method that produces the best results. The second chapter extends the pairs bootstrap results to nonlinear dynamic panel models with fixed effects. These models are often estimated by the maximum likelihood estimator (MLE), which also suffers from a bias. Recently, Dhaene and Jochmans (2014) proposed the split-jackknife estimation method.
Although these estimators have normal asymptotic approximations centered at the true parameter, serious distortions remain in finite samples. Dhaene and Jochmans (2014) proposed the pairs bootstrap as an alternative in this context without any theoretical justification. To fill this gap, I show that this method is asymptotically valid when used to estimate the distribution of the split-jackknife estimator, although it cannot estimate the distribution of the MLE. Monte Carlo simulations show that bootstrap confidence intervals based on the split-jackknife estimator greatly help to reduce the distortions associated with the normal approximation in finite samples. In addition, I apply this bootstrap method to a model of female labor force participation to construct valid confidence intervals. In the last chapter (co-written with Wenjie Wang), we study the asymptotic validity of bootstrap procedures for models with many instrumental variables (IV), many of which may be weak. We show analytically that a standard residual-based bootstrap and the restricted efficient (RE) bootstrap of Davidson and MacKinnon (2008, 2010, 2014) cannot estimate the limiting distribution of the limited information maximum likelihood (LIML) estimator. The main reason is that they fail to mimic well the parameter that characterizes the strength of identification in the sample. We therefore propose a modified bootstrap method that consistently estimates this limiting distribution. Our simulations show that the modified bootstrap method considerably reduces the distortions of asymptotic Wald-type ($t$) tests in finite samples, especially when the degree of endogeneity is high.