907 resultados para Mixed model under selection
Resumo:
La dernière décennie a connu un intérêt croissant pour les problèmes posés par les variables instrumentales faibles dans la littérature économétrique, c’est-à-dire les situations où les variables instrumentales sont faiblement corrélées avec la variable à instrumenter. En effet, il est bien connu que lorsque les instruments sont faibles, les distributions des statistiques de Student, de Wald, du ratio de vraisemblance et du multiplicateur de Lagrange ne sont plus standard et dépendent souvent de paramètres de nuisance. Plusieurs études empiriques portant notamment sur les modèles de rendements à l’éducation [Angrist et Krueger (1991, 1995), Angrist et al. (1999), Bound et al. (1995), Dufour et Taamouti (2007)] et d’évaluation des actifs financiers (C-CAPM) [Hansen et Singleton (1982,1983), Stock et Wright (2000)], où les variables instrumentales sont faiblement corrélées avec la variable à instrumenter, ont montré que l’utilisation de ces statistiques conduit souvent à des résultats peu fiables. Un remède à ce problème est l’utilisation de tests robustes à l’identification [Anderson et Rubin (1949), Moreira (2002), Kleibergen (2003), Dufour et Taamouti (2007)]. Cependant, il n’existe aucune littérature économétrique sur la qualité des procédures robustes à l’identification lorsque les instruments disponibles sont endogènes ou à la fois endogènes et faibles. Cela soulève la question de savoir ce qui arrive aux procédures d’inférence robustes à l’identification lorsque certaines variables instrumentales supposées exogènes ne le sont pas effectivement. Plus précisément, qu’arrive-t-il si une variable instrumentale invalide est ajoutée à un ensemble d’instruments valides? Ces procédures se comportent-elles différemment? Et si l’endogénéité des variables instrumentales pose des difficultés majeures à l’inférence statistique, peut-on proposer des procédures de tests qui sélectionnent les instruments lorsqu’ils sont à la fois forts et valides? Est-il possible de proposer les proédures de sélection d’instruments qui demeurent valides même en présence d’identification faible? Cette thèse se focalise sur les modèles structurels (modèles à équations simultanées) et apporte des réponses à ces questions à travers quatre essais. Le premier essai est publié dans Journal of Statistical Planning and Inference 138 (2008) 2649 – 2661. Dans cet essai, nous analysons les effets de l’endogénéité des instruments sur deux statistiques de test robustes à l’identification: la statistique d’Anderson et Rubin (AR, 1949) et la statistique de Kleibergen (K, 2003), avec ou sans instruments faibles. D’abord, lorsque le paramètre qui contrôle l’endogénéité des instruments est fixe (ne dépend pas de la taille de l’échantillon), nous montrons que toutes ces procédures sont en général convergentes contre la présence d’instruments invalides (c’est-à-dire détectent la présence d’instruments invalides) indépendamment de leur qualité (forts ou faibles). Nous décrivons aussi des cas où cette convergence peut ne pas tenir, mais la distribution asymptotique est modifiée d’une manière qui pourrait conduire à des distorsions de niveau même pour de grands échantillons. Ceci inclut, en particulier, les cas où l’estimateur des double moindres carrés demeure convergent, mais les tests sont asymptotiquement invalides. Ensuite, lorsque les instruments sont localement exogènes (c’est-à-dire le paramètre d’endogénéité converge vers zéro lorsque la taille de l’échantillon augmente), nous montrons que ces tests convergent vers des distributions chi-carré non centrées, que les instruments soient forts ou faibles. Nous caractérisons aussi les situations où le paramètre de non centralité est nul et la distribution asymptotique des statistiques demeure la même que dans le cas des instruments valides (malgré la présence des instruments invalides). Le deuxième essai étudie l’impact des instruments faibles sur les tests de spécification du type Durbin-Wu-Hausman (DWH) ainsi que le test de Revankar et Hartley (1973). Nous proposons une analyse en petit et grand échantillon de la distribution de ces tests sous l’hypothèse nulle (niveau) et l’alternative (puissance), incluant les cas où l’identification est déficiente ou faible (instruments faibles). Notre analyse en petit échantillon founit plusieurs perspectives ainsi que des extensions des précédentes procédures. En effet, la caractérisation de la distribution de ces statistiques en petit échantillon permet la construction des tests de Monte Carlo exacts pour l’exogénéité même avec les erreurs non Gaussiens. Nous montrons que ces tests sont typiquement robustes aux intruments faibles (le niveau est contrôlé). De plus, nous fournissons une caractérisation de la puissance des tests, qui exhibe clairement les facteurs qui déterminent la puissance. Nous montrons que les tests n’ont pas de puissance lorsque tous les instruments sont faibles [similaire à Guggenberger(2008)]. Cependant, la puissance existe tant qu’au moins un seul instruments est fort. La conclusion de Guggenberger (2008) concerne le cas où tous les instruments sont faibles (un cas d’intérêt mineur en pratique). Notre théorie asymptotique sous les hypothèses affaiblies confirme la théorie en échantillon fini. Par ailleurs, nous présentons une analyse de Monte Carlo indiquant que: (1) l’estimateur des moindres carrés ordinaires est plus efficace que celui des doubles moindres carrés lorsque les instruments sont faibles et l’endogenéité modérée [conclusion similaire à celle de Kiviet and Niemczyk (2007)]; (2) les estimateurs pré-test basés sur les tests d’exogenété ont une excellente performance par rapport aux doubles moindres carrés. Ceci suggère que la méthode des variables instrumentales ne devrait être appliquée que si l’on a la certitude d’avoir des instruments forts. Donc, les conclusions de Guggenberger (2008) sont mitigées et pourraient être trompeuses. Nous illustrons nos résultats théoriques à travers des expériences de simulation et deux applications empiriques: la relation entre le taux d’ouverture et la croissance économique et le problème bien connu du rendement à l’éducation. Le troisième essai étend le test d’exogénéité du type Wald proposé par Dufour (1987) aux cas où les erreurs de la régression ont une distribution non-normale. Nous proposons une nouvelle version du précédent test qui est valide même en présence d’erreurs non-Gaussiens. Contrairement aux procédures de test d’exogénéité usuelles (tests de Durbin-Wu-Hausman et de Rvankar- Hartley), le test de Wald permet de résoudre un problème courant dans les travaux empiriques qui consiste à tester l’exogénéité partielle d’un sous ensemble de variables. Nous proposons deux nouveaux estimateurs pré-test basés sur le test de Wald qui performent mieux (en terme d’erreur quadratique moyenne) que l’estimateur IV usuel lorsque les variables instrumentales sont faibles et l’endogénéité modérée. Nous montrons également que ce test peut servir de procédure de sélection de variables instrumentales. Nous illustrons les résultats théoriques par deux applications empiriques: le modèle bien connu d’équation du salaire [Angist et Krueger (1991, 1999)] et les rendements d’échelle [Nerlove (1963)]. Nos résultats suggèrent que l’éducation de la mère expliquerait le décrochage de son fils, que l’output est une variable endogène dans l’estimation du coût de la firme et que le prix du fuel en est un instrument valide pour l’output. Le quatrième essai résout deux problèmes très importants dans la littérature économétrique. D’abord, bien que le test de Wald initial ou étendu permette de construire les régions de confiance et de tester les restrictions linéaires sur les covariances, il suppose que les paramètres du modèle sont identifiés. Lorsque l’identification est faible (instruments faiblement corrélés avec la variable à instrumenter), ce test n’est en général plus valide. Cet essai développe une procédure d’inférence robuste à l’identification (instruments faibles) qui permet de construire des régions de confiance pour la matrices de covariances entre les erreurs de la régression et les variables explicatives (possiblement endogènes). Nous fournissons les expressions analytiques des régions de confiance et caractérisons les conditions nécessaires et suffisantes sous lesquelles ils sont bornés. La procédure proposée demeure valide même pour de petits échantillons et elle est aussi asymptotiquement robuste à l’hétéroscédasticité et l’autocorrélation des erreurs. Ensuite, les résultats sont utilisés pour développer les tests d’exogénéité partielle robustes à l’identification. Les simulations Monte Carlo indiquent que ces tests contrôlent le niveau et ont de la puissance même si les instruments sont faibles. Ceci nous permet de proposer une procédure valide de sélection de variables instrumentales même s’il y a un problème d’identification. La procédure de sélection des instruments est basée sur deux nouveaux estimateurs pré-test qui combinent l’estimateur IV usuel et les estimateurs IV partiels. Nos simulations montrent que: (1) tout comme l’estimateur des moindres carrés ordinaires, les estimateurs IV partiels sont plus efficaces que l’estimateur IV usuel lorsque les instruments sont faibles et l’endogénéité modérée; (2) les estimateurs pré-test ont globalement une excellente performance comparés à l’estimateur IV usuel. Nous illustrons nos résultats théoriques par deux applications empiriques: la relation entre le taux d’ouverture et la croissance économique et le modèle de rendements à l’éducation. Dans la première application, les études antérieures ont conclu que les instruments n’étaient pas trop faibles [Dufour et Taamouti (2007)] alors qu’ils le sont fortement dans la seconde [Bound (1995), Doko et Dufour (2009)]. Conformément à nos résultats théoriques, nous trouvons les régions de confiance non bornées pour la covariance dans le cas où les instruments sont assez faibles.
Resumo:
A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
Resumo:
In this paper a novel rank estimation technique for trajectories motion segmentation within the Local Subspace Affinity (LSA) framework is presented. This technique, called Enhanced Model Selection (EMS), is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built by LSA. The results on synthetic and real data show that without any a priori knowledge, EMS automatically provides an accurate and robust rank estimation, improving the accuracy of the final motion segmentation
Resumo:
Considering the difficulty in the insulin dosage selection and the problem of hyper- and hypoglycaemia episodes in type 1 diabetes, dosage-aid systems appear as tremendously helpful for these patients. A model-based approach to this problem must unavoidably consider uncertainty sources such as the large intra-patient variability and food intake. This work addresses the prediction of glycaemia for a given insulin therapy face to parametric and input uncertainty, by means of modal interval analysis. As result, a band containing all possible glucose excursions suffered by the patient for the given uncertainty is obtained. From it, a safer prediction of possible hyper- and hypoglycaemia episodes can be calculated
Resumo:
The intraseasonal variability (ISV) of the Indian summer monsoon is dominated by a 30–50 day oscillation between “active” and “break” events of enhanced and reduced rainfall over the subcontinent, respectively. These organized convective events form in the equatorial Indian Ocean and propagate north to India. Atmosphere–ocean coupled processes are thought to play a key role the intensity and propagation of these events. A high-resolution, coupled atmosphere–mixed-layer-oceanmodel is assembled: HadKPP. HadKPP comprises the Hadley Centre Atmospheric Model (HadAM3) and the K Profile Parameterization (KPP) mixed-layer ocean model. Following studies that upper-ocean vertical resolution and sub-diurnal coupling frequencies improve the simulation of ISV in SSTs, KPP is run at 1 m vertical resolution near the surface; the atmosphere and ocean are coupled every three hours. HadKPP accurately simulates the 30–50 day ISV in rainfall and SSTs over India and the Bay of Bengal, respectively, but suffers from low ISV on the equator. This is due to the HadAM3 convection scheme producing limited ISV in surface fluxes. HadKPP demonstrates little of the observed northward propagation of intraseasonal events, producing instead a standing oscillation. The lack of equatorial ISV in convection in HadAM3 constrains the ability of KPP to produce equatorial SST anomalies, which further weakens the ISV of convection. It is concluded that while atmosphere–ocean interactions are undoubtedly essential to an accurate simulation of ISV, they are not a panacea for model deficiencies. In regions where the atmospheric forcing is adequate, such as the Bay of Bengal, KPP produces SST anomalies that are comparable to the Tropical Rainfall Measuring Mission Microwave Imager (TMI) SST analyses in both their magnitude and their timing with respect to rainfall anomalies over India. HadKPP also displays a much-improved phase relationship between rainfall and SSTs over a HadAM3 ensemble forced by observed SSTs, when both are compared to observations. Coupling to mixed-layer models such as KPP has the potential to improve operational predictions of ISV, particularly when the persistence time of SST anomalies is shorter than the forecast lead time.
Resumo:
Using the Met Office large-eddy model (LEM) we simulate a mixed-phase altocumulus cloud that was observed from Chilbolton in southern England by a 94 GHz Doppler radar, a 905 nm lidar, a dual-wavelength microwave radiometer and also by four radiosondes. It is important to test and evaluate such simulations with observations, since there are significant differences between results from different cloud-resolving models for ice clouds. Simulating the Doppler radar and lidar data within the LEM allows us to compare observed and modelled quantities directly, and allows us to explore the relationships between observed and unobserved variables. For general-circulation models, which currently tend to give poor representations of mixed-phase clouds, the case shows the importance of using: (i) separate prognostic ice and liquid water, (ii) a vertical resolution that captures the thin layers of liquid water, and (iii) an accurate representation the subgrid vertical velocities that allow liquid water to form. It is shown that large-scale ascents and descents are significant for this case, and so the horizontally averaged LEM profiles are relaxed towards observed profiles to account for these. The LEM simulation then gives a reasonable. cloud, with an ice-water path approximately two thirds of that observed, with liquid water at the cloud top, as observed. However, the liquid-water cells that form in the updraughts at cloud top in the LEM have liquid-water paths (LWPs) up to half those observed, and there are too few cells, giving a mean LWP five to ten times smaller than observed. In reality, ice nucleation and fallout may deplete ice-nuclei concentrations at the cloud top, allowing more liquid water to form there, but this process is not represented in the model. Decreasing the heterogeneous nucleation rate in the LEM increased the LWP, which supports this hypothesis. The LEM captures the increase in the standard deviation in Doppler velocities (and so vertical winds) with height, but values are 1.5 to 4 times smaller than observed (although values are larger in an unforced model run, this only increases the modelled LWP by a factor of approximately two). The LEM data show that, for values larger than approximately 12 cm s(-1), the standard deviation in Doppler velocities provides an almost unbiased estimate of the standard deviation in vertical winds, but provides an overestimate for smaller values. Time-smoothing the observed Doppler velocities and modelled mass-squared-weighted fallspeeds shows that observed fallspeeds are approximately two-thirds of the modelled values. Decreasing the modelled fallspeeds to those observed increases the modelled IWC, giving an IWP 1.6 times that observed.
Resumo:
Using mixed logit models to analyse choice data is common but requires ex ante specification of the functional forms of preference distributions. We make the case for greater use of bounded functional forms and propose the use of the Marginal Likelihood, calculated using Bayesian techniques, as a single measure of model performance across non nested mixed logit specifications. Using this measure leads to very different rankings of model specifications compared to alternative rule of thumb measures. The approach is illustrated using data from a choice experiment regarding GM food types which provides insights regarding the recent WTO dispute between the EU and the US, Canada and Argentina and whether labelling and trade regimes should be based on the production process or product composition.
Resumo:
The effects of varying doses of fungicides, alone or in mixtures, on selection for triazole resistance were examined under field conditions. Two experiments were conducted using the triazole fungicide fluquinconazole with the strobilurin fungicide azoxystrobin as a mixture partner. Inoculated wheat plots with a known ratio of more sensitive to less sensitive isolates of the leaf blotch fungus Mycosphaerella graminicola were sprayed with fungicide and sampled once symptoms had appeared. Selection for fluquinconazole resistance increased in proportion to the dose, up to one-half of the full dose (the maximum tested) in both experiments. At the higher doses of fluquinconazole, the addition of azoxystrobin was associated with a decrease in selection (nonsignificant in the first experiment) for triazole resistance. Control by low doses of fluquinconazole was increased by mixture with azoxystrobin, but at higher doses mixture with azoxystrobin sometimes decreased control, so that reduced selection was obtained at the cost of some reduction in control. The effects on resistance are not necessarily general consequences of mixing fungicides, and suggest that the properties of any specific mixture may need to be demonstrated experimentally. Selection was inversely related to control in the unmixed treatments in both experiments, but the relationship was weaker in the mixtures with azoxystrobin.
Resumo:
Presented herein is an experimental design that allows the effects of several radiative forcing factors on climate to be estimated as precisely as possible from a limited suite of atmosphere-only general circulation model (GCM) integrations. The forcings include the combined effect of observed changes in sea surface temperatures, sea ice extent, stratospheric (volcanic) aerosols, and solar output, plus the individual effects of several anthropogenic forcings. A single linear statistical model is used to estimate the forcing effects, each of which is represented by its global mean radiative forcing. The strong colinearity in time between the various anthropogenic forcings provides a technical problem that is overcome through the design of the experiment. This design uses every combination of anthropogenic forcing rather than having a few highly replicated ensembles, which is more commonly used in climate studies. Not only is this design highly efficient for a given number of integrations, but it also allows the estimation of (nonadditive) interactions between pairs of anthropogenic forcings. The simulated land surface air temperature changes since 1871 have been analyzed. The changes in natural and oceanic forcing, which itself contains some forcing from anthropogenic and natural influences, have the most influence. For the global mean, increasing greenhouse gases and the indirect aerosol effect had the largest anthropogenic effects. It was also found that an interaction between these two anthropogenic effects in the atmosphere-only GCM exists. This interaction is similar in magnitude to the individual effects of changing tropospheric and stratospheric ozone concentrations or to the direct (sulfate) aerosol effect. Various diagnostics are used to evaluate the fit of the statistical model. For the global mean, this shows that the land temperature response is proportional to the global mean radiative forcing, reinforcing the use of radiative forcing as a measure of climate change. The diagnostic tests also show that the linear model was suitable for analyses of land surface air temperature at each GCM grid point. Therefore, the linear model provides precise estimates of the space time signals for all forcing factors under consideration. For simulated 50-hPa temperatures, results show that tropospheric ozone increases have contributed to stratospheric cooling over the twentieth century almost as much as changes in well-mixed greenhouse gases.
Resumo:
Commercial dodecylbenzene cable fluid was aged at temperatures of 105 and 135 degrees C in dry oxygen-free nitrogen. In addition, selected samples were aged at 135 degrees C under sealed conditions where air was excluded from the headspace above the oil. A variety of analytical techniques, such as ultra-violet visible and infra-red spectroscopy, acid number and water content measurements, were then used to characterize the aged oils. In addition, their electrical properties were assessed by dielectric spectroscopy. Compared with ageing in air, the ageing rate was reduced significantly and, as expected, no major oxidation peaks were detected in the infrared spectrometer. Significantly, very little absorbance at 680 nm ("red absorbers") was detected in samples aged with copper and, consequentially, no large increases in dielectric loss were recorded within the ageing times considered here. This study compliments previous investigations on cable fluid and 1-phenyldodecane aged in air and show that the same ageing indicators are valid in oils aged under conditions which more closely resemble those found in high voltage plant.
Resumo:
Stirred, pH controlled batch cultures were carried out with faecal inocula and various chitosans to investigate the fermentation of chitosan derivatives by the human gut flora. Changes in bacterial levels and short chain fatty acids were measured over time. Low, medium and high molecular weight chitosan caused a decrease in bacteroides, bifidobacteria, clostridia and lactobacilli. A similar pattern was seen with chitosan oligosaccharide (COS). Butyrate levels also decreased. A three-stage fermentation model of the human colon was used for investigation of the metabolism of COS. In a region representing the proximal colon, clostridia decreased while lactobacilli increased. In the region representing the transverse colon, bacteroides and clostridia increased. Distally a small increase in bacteroides occurred. Butyrate levels increased. Under the highly competitive conditions of the human colon, many members of the microflora, are unable to compete for chitosans of low, medium or high molecular weight. COS were more easily utilised and when added to an in vitro colonic model led to increased production of butyrate, but some populations of potentially detrimental bacteria also increased. (c) 2005 Elsevier Ltd. All rights reserved.
Resumo:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities have been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts in achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and models selective criteria based on the cross validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. The developments on the convex optimisation-based model construction algorithms including the support vector regression algorithms are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.