954 results for Sample selection
Abstract:
Suggests an alternative and computationally simpler approach to the problem of non-random sampling in labour economics, where the sample reflects the observed outcome of an individual female's choice of whether or not to participate in the labour market. Concludes that there is an alternative to the Heckman two-step estimator.
Abstract:
Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.
Abstract:
This paper extends previous analyses of the choice between internal and external R&D to consider the costs of internal R&D. The Heckman two-stage estimator is used to estimate the determinants of internal R&D unit cost (i.e. cost per product innovation) allowing for sample selection effects. Theory indicates that R&D unit cost will be influenced by scale issues and by the technological opportunities faced by the firm. Transaction costs encountered in research activities are allowed for and, in addition, consideration is given to issues of market structure which influence the choice of R&D mode without affecting the unit cost of internal or external R&D. The model is tested on data from a sample of over 500 UK manufacturing plants which have engaged in product innovation. The key determinants of R&D mode are the scale of plant and R&D input, and market structure conditions. In terms of the R&D cost equation, scale factors are again important and have a non-linear relationship with R&D unit cost. Specificities in physical and human capital also affect unit cost, but have no clear impact on the choice of R&D mode. There is no evidence of technological opportunity affecting either R&D cost or the internal/external decision.
Abstract:
This paper utilizes the Survey of Work History (1981) data to examine the importance of non-random sampling in the context of a model of interfirm labour mobility. The paper adopts Heckman's two-step procedure in order to estimate a three-equation model incorporating an individual's mobility status as endogenously determined. The main conclusion is that in estimating wage equations it is important to consider the role of job mobility and to correct for the effects of sample-selection bias. The results generally accord with those reported by Osberg et al. (1986) in the only previous Canadian study of job mobility in a sample-selection context.
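For readers unfamiliar with the two-step procedure mentioned above, the following is a minimal sketch of its correction term only, not the paper's three-equation model. In the standard Heckman (1979) setup, a first-step probit yields an index for each selected observation, and the inverse Mills ratio of that index is added as an extra regressor in the second-step wage equation. The function name and the index values are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def inverse_mills_ratio(index: float) -> float:
    """Return phi(index) / Phi(index) for a first-step probit index.

    Assumes the index is not so negative that Phi(index) underflows to zero.
    """
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical first-step probit indices for three selected observations.
probit_indices = [-1.0, 0.0, 1.5]

# One correction term per observation; in the second step these would be
# appended to the regressors of the outcome (e.g. wage) equation.
correction_terms = [inverse_mills_ratio(z) for z in probit_indices]
```

The ratio falls as the index rises: observations that were very likely to be selected need little correction, while unlikely ones receive a large correction term.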
Abstract:
Immigration has played an important role in the historical development of Australia. Thus, it is no surprise that a large body of empirical work has developed, which focuses upon how migrants fare in the land of opportunity. Much of the literature is comparatively recent, i.e. the last ten years or so, encouraged by the advent of public availability of Australian cross-section micro data. Several different aspects of migrant welfare have been addressed, with major emphasis being placed upon earnings and unemployment experience. For recent examples see Haig (1980), Stromback (1984), Chiswick and Miller (1985), Tran-Nam and Nevile (1988) and Beggs and Chapman (1988). The present paper contributes to the literature by providing additional empirical evidence on the native/migrant earnings differential. The data utilised are from the rather neglected Australian Bureau of Statistics, ABS Special Supplementary Survey No. 4, 1982, otherwise known as the Family Survey. The paper also examines the importance of distinguishing between the wage and salary sector and the self-employment sector when discussing native/migrant differentials. Separate earnings equations for the two labour market groups are estimated and the native/migrant earnings differential is broken down by employment status. This is a novel application in the Australian context and provides some insight into the earnings of the self-employed, a group that despite its size (around 20 per cent of the labour force) is frequently ignored by economic research. Most previous empirical research fails to examine the effect of employment status on earnings. Stromback (1984) includes a dummy variable representing self-employment status in an earnings equation estimated over a pooled sample of paid and self-employed workers. The variable is found to be highly significant, which leads Stromback to question the efficacy of including the self-employed in the estimation sample. 
The suggestion is that, because part of self-employed earnings represents a return to non-human capital investment, i.e. investments in machinery, buildings etc., the structural determinants of earnings differ significantly from those for paid employees. Tran-Nam and Nevile (1988) deal with differences between paid employees and the self-employed by deleting the latter from their sample. However, deleting the self-employed from the estimation sample may lead to bias in the OLS estimation method (see Heckman 1979). The desirable properties of OLS are dependent upon estimation on a random sample. Thus, the Tran-Nam and Nevile results are likely to suffer from bias unless individuals are randomly allocated between self-employment and paid employment. The current analysis extends Tran-Nam and Nevile (1988) by explicitly treating the choice of paid employment versus self-employment as being endogenously determined. This allows an explicit test for the appropriateness of deleting self-employed workers from the sample. Earnings equations that are corrected for sample selection are estimated for both natives and migrants in the paid employee sector. The Heckman (1979) two-step estimator is employed. The paper is divided into five major sections. The next section presents the econometric model incorporating the specification of the earnings generating process together with an explicit model determining an individual's employment status. In Section III the data are described. Section IV draws together the main econometric results of the paper. First, the probit estimates of the labour market status equation are documented. This is followed by presentation and discussion of the Heckman two-stage estimates of the earnings specification for both native and migrant Australians. Separate earnings equations are estimated for paid employees and the self-employed. Section V documents estimates of the native/migrant earnings differential for both categories of employees. 
To aid comparison with earlier work, the Oaxaca decomposition of the earnings differential for paid-employees is carried out for both the simple OLS regression results as well as the parameter estimates corrected for sample selection effects. These differentials are interpreted and compared with previous Australian findings. A short section concludes the paper.
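As a hedged illustration of the decomposition referred to above, one common textbook form of the Oaxaca decomposition with a selection term is written out below. The notation is an assumption and is not taken from the paper: subscripts N and M denote natives and migrants, \bar{X} is the vector of mean characteristics, \hat{\beta} the estimated earnings coefficients, \bar{\lambda} the mean inverse Mills ratio, and \hat{\theta} its estimated coefficient.

```latex
\[
\overline{\ln w}_N - \overline{\ln w}_M
  = \underbrace{(\bar{X}_N - \bar{X}_M)'\hat{\beta}_N}_{\text{differences in characteristics}}
  + \underbrace{\bar{X}_M'(\hat{\beta}_N - \hat{\beta}_M)}_{\text{differences in coefficients}}
  + \underbrace{(\hat{\theta}_N\bar{\lambda}_N - \hat{\theta}_M\bar{\lambda}_M)}_{\text{differences in selection terms}}
\]
```

With simple OLS estimates the last term is absent, so comparing the two versions shows how much of the measured differential is attributed to selection. The choice of the native coefficients as the reference weights is itself an assumption; other weightings are possible.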
Abstract:
The question of whether or not there exists a meaningful economic distinction between quits and layoffs has attracted considerable attention. This paper utilizes a recent test proposed by J. S. Cramer and G. Ridder (1991) to test formally whether quits and layoffs may legitimately be aggregated into a single undifferentiated job-mover category. The paper also estimates wage equations for job stayers, quits, and layoffs, corrected for the endogeneity of job mobility. The major results are that quits and layoffs cannot legitimately be pooled and that correction for sample selection would appear to be important.
Abstract:
This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy makes possible, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the necessarily high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that 1) amongst existing query strategies, the ones based on the classification model's confidence are a better choice for clinical data as they perform equally well with a much lighter computational load, and 2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to manually annotate than with state-of-the-art query strategies.
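To make the idea of a confidence-based query strategy concrete, here is a minimal sketch of generic least-confidence sampling. It is not the DKI strategy, whose details the abstract does not give, and the function name, the pool of predicted probabilities and the batch size are all hypothetical.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the samples whose top class probability is lowest.

    probabilities: one list of class probabilities per unlabelled sample,
    as produced by some already-trained classifier (assumed, not shown).
    """
    # The model's confidence in a sample is its highest class probability.
    confidence = [max(sample) for sample in probabilities]
    # Rank sample indices from least to most confident.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabelled samples.
pool = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]

# Indices of the two samples to send to the annotator next.
selected = least_confident(pool, 2)
```

Each round of active learning would label the selected samples, retrain the classifier, and call the function again on the remaining pool.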
Abstract:
The robustness of multivariate calibration models, based on near infrared spectroscopy, for the assessment of total soluble solids (TSS) and dry matter (DM) of intact mandarin fruit (Citrus reticulata cv. Imperial) was assessed. TSS calibration model performance was validated in terms of prediction of populations of fruit not in the original population (different harvest days from a single tree, different harvest localities, different harvest seasons). Of these, calibration performance was most affected by validation across seasons (signal to noise statistic on root mean squared error of prediction of 3.8, compared with 20 and 13 for locality and harvest day, respectively). Procedures for sample selection from the validation population for addition to the calibration population (‘model updating’) were considered for both TSS and DM models. Random selection from the validation group worked as well as more sophisticated selection procedures, with approximately 20 samples required. Models that were developed using samples at a range of temperatures were robust in validation for TSS and DM.
Abstract:
In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
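For context on the baseline that the abstract compares against, the sketch below computes a simple biased estimate of the squared MMD between two samples using a Gaussian kernel. It illustrates the MMD only, not the manifold-based distance the paper proposes. The kernel choice, the bandwidth and the toy data are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""

    def mean_kernel(xs, ys):
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Toy two-dimensional samples: the target is the source shifted by (3, 3).
source_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_points = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

same_domain = mmd_squared(source_points, source_points)
shifted_domain = mmd_squared(source_points, target_points)
```

The estimate is zero when both samples are identical and grows as the two samples move apart, which is why it can serve as a criterion for making source and target distributions similar.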
Abstract:
Background Despite decades of research, bullying in all its forms is still a significant problem within schools in Australia, as it is internationally. Anti-bullying policies and guidelines are thought to be one strategy as part of a whole school approach to reduce bullying. However, although Australian schools are required to have these policies, their effectiveness is not clear. As policies and guidelines about bullying and cyberbullying are developed within education departments, this paper explores the perspectives of those who are involved in their construction. Purpose This study examined the perspectives of professionals involved in policy construction, across three different Australian states. The aim was to determine how their relative jurisdictions define bullying and cyberbullying, the processes for developing policy, the bullying prevention and intervention recommendations given to schools and the content considered essential in current policies. Sample Eleven key stakeholders from three Australian states with similar education systems were invited to participate. The sample selection criteria included professionals with experience and training in education, cyber-safety and the responsibility to contribute to or make decisions which inform policy in this area for schools in their state. Design and methods Participants were interviewed about the definitions of bullying they used in their state policy frameworks; the extent to which cyberbullying was included; and the content they considered essential for schools to include in anti-bullying policies. Data were collected through in-depth, semi-structured interviews and analysed thematically. 
Findings Seven themes were identified in the data: (1) Definition of bullying and cyberbullying; (2) Existence of a policy template; (3) Policy location; (4) Adding cyberbullying; (5) Distinguishing between bullying and cyberbullying; (6) Effective policy; and (7) Policy as a prevention or intervention tool. The results were similar both across state boundaries and across different disciplines. Conclusion Analysis of the data suggested that, across the themes, there was some lack of information about bullying and cyberbullying. This limitation could affect the subsequent development, dissemination and sustainability of school anti-bullying policies, which has implications for the translation of research to inform better student outcomes.
Abstract:
Modern sample surveys started to spread after statisticians at the U.S. Bureau of the Census had developed a sampling design for the Current Population Survey (CPS) in the 1940s. Another significant factor was that digital computers became available to statisticians. At the beginning of the 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of statistical inference for sample surveys. The idea of statistical inference was first enunciated by a French scientist, P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace's Principle of Inverse Probability and on his derivation of the Central Limit Theorem. They were published in a memoir in 1774, which is one of the origins of statistical inference. Laplace's inference model was based on Bernoulli trials and binomial probabilities. He assumed that populations were changing constantly, which he represented by assuming a priori distributions for parameters. Laplace's inference model dominated statistical thinking for a century. Sample selection in Laplace's investigations was purposive. In 1894, at the International Statistical Institute meeting, the Norwegian Anders Kiaer presented the idea of the Representative Method for drawing samples. Its idea was that the sample would be a miniature of the population, an idea that still prevails. The virtues of random sampling were known, but practical problems of sample selection and data collection hindered its use. Arthur Bowley realized the potential of Kiaer's method and at the beginning of the 20th century carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations. It was based on Laplace's inference model. R. A. Fisher's contributions in the 1920s constitute a watershed in statistical science. He revolutionized the theory of statistics. 
In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential ideas are that samples are drawn repeatedly from the same population and that population parameters are constants. Fisher's theory did not include a priori probabilities. Jerzy Neyman adopted Fisher's inference model and applied it to finite populations, with the difference that Neyman's inference model does not include any assumptions about the distributions of the study variables. Applying Fisher's fiducial argument, he developed the theory of confidence intervals. Neyman's last contribution to survey sampling presented a theory for double sampling. This gave statisticians at the U.S. Census Bureau the central idea for developing the complex survey design for the CPS. An important criterion was to have a method in which the costs of data collection were acceptable and which provided approximately equal interviewer workloads, in addition to sufficient accuracy in estimation.
Abstract:
The objectives of this study were to make a detailed and systematic empirical analysis of microfinance borrowers and non-borrowers in Bangladesh and also to examine how efficiency measures are influenced by access to agricultural microfinance. In the empirical analysis, this study used both parametric and non-parametric frontier approaches to investigate differences in efficiency estimates between microfinance borrowers and non-borrowers. This thesis, based on five articles, applied data obtained from a survey of 360 farm households from the north-central and north-western regions of Bangladesh. The methods used in this investigation involve stochastic frontier analysis (SFA) and data envelopment analysis (DEA) in addition to sample selectivity and limited dependent variable models. In article I, technical efficiency (TE) estimation and identification of its determinants were performed by applying an extended Cobb-Douglas stochastic frontier production function. The results show that farm households had a mean TE of 83%, with lower TE scores for the non-borrowers of agricultural microfinance. Addressing institutional policies regarding the consolidation of individual plots into farm units, ensuring access to microfinance, and extension education for the farmers with longer farming experience are suggested to improve the TE of the farmers. In article II, the objective was to assess the effects of access to microfinance on household production and cost efficiency (CE) and to determine the efficiency differences between the microfinance participating and non-participating farms. In addition, a non-discretionary DEA model was applied to capture directly the influence of microfinance on farm households' production and CE. The results suggested that under both pooled DEA models and non-discretionary DEA models, farmers with access to microfinance were significantly more efficient than their non-borrowing counterparts. 
Results also revealed that land fragmentation, family size, household wealth, on-farm training and off-farm income share are the main determinants of inefficiency after effectively correcting for sample selection bias. In article III, the TE of traditional variety (TV) and high-yielding-variety (HYV) rice producers was estimated, in addition to investigating the determinants of the adoption rate of HYV rice. Furthermore, the role of TE as a potential determinant to explain the differences in the adoption rate of HYV rice among the farmers was assessed. The results indicated that in spite of its much higher yield potential, HYV rice production was associated with lower TE and had a greater variability in yield. It was also found that TE had a significant positive influence on the adoption rates of HYV rice. In article IV, we estimated profit efficiency (PE) and profit-loss between microfinance borrowers and non-borrowers by a sample selection framework, which provided a general framework for testing and taking into account the sample selection in the stochastic (profit) frontier function analysis. After effectively correcting for selectivity bias, the mean PE of the microfinance borrowers and non-borrowers was estimated at 68% and 52% respectively. This suggested that a considerable share of profits was lost due to profit inefficiencies in rice production. The results also demonstrated that access to microfinance contributes significantly to increasing PE and reducing profit-loss per hectare of land. In article V, the effects of credit constraints on TE, allocative efficiency (AE) and CE were assessed while adequately controlling for sample selection bias. The confidence intervals were determined by the bootstrap method for both samples. The results indicated that differences in average efficiency scores of credit constrained and unconstrained farms were not statistically significant, although the average efficiencies tended to be higher in the group of unconstrained farms. 
After effectively correcting for selectivity bias, household experience, number of dependents, off-farm income, farm size, access to on-farm training and yearly savings were found to be the main determinants of inefficiencies. In general, the results of the study revealed the existence of substantial technical, allocative and economic inefficiencies and also considerable profit inefficiencies. The results of the study suggested the need to streamline agricultural microfinance by the microfinance institutions (MFIs), donor agencies and government at all tiers. Moreover, formulating policies that ensure greater access to agricultural microfinance for smallholder farmers on a sustainable basis in the study areas, in order to enhance productivity and efficiency, has been recommended. Key Words: Technical, allocative, economic efficiency, DEA, Non-discretionary DEA, selection bias, bootstrapping, microfinance, Bangladesh.
Abstract:
This paper focuses on studying the relationship between patent latent variables and patent price. From the existing literature, seven patent latent variables, namely age, generality, originality, foreign filings, technology field, forward citations, and backward citations, were identified as having an influence on patent value. We used Ocean Tomo's patent auction price data in this study. We transformed the price and the predictor variables (excluding the dummy variables) to their logarithmic values. The OLS estimates revealed that forward citations and foreign filings were positively correlated with price. The two variables jointly explained 14.79% of the variance in patent pricing. We did not find sufficient evidence to draw any definite conclusions on the relationship between price and variables such as age, technology field, generality, backward citations and originality. The Heckman two-stage sample selection model was used to test for selection bias. (C) 2011 Elsevier Ltd. All rights reserved.
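The log-log OLS setup described above can be sketched as follows, reduced to a single predictor for brevity. This is an illustrative simplification under stated assumptions, not the paper's specification: the function name is hypothetical, the numbers are made up and are not Ocean Tomo data, and all values are assumed to be strictly positive so that logarithms exist.

```python
import math


def fit_log_log(x_values, y_values):
    """Fit ln(y) = intercept + slope * ln(x) by OLS; return slope, intercept, R^2."""
    log_x = [math.log(x) for x in x_values]
    log_y = [math.log(y) for y in y_values]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Closed-form OLS estimates for a single regressor.
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Share of the variance in ln(y) explained by the fitted line.
    ss_total = sum((b - mean_y) ** 2 for b in log_y)
    ss_residual = sum(
        (b - (intercept + slope * a)) ** 2 for a, b in zip(log_x, log_y)
    )
    r_squared = 1.0 - ss_residual / ss_total
    return slope, intercept, r_squared


# Made-up illustrative numbers following y = 3 * x**2 exactly.
predictor = [1.0, 2.0, 4.0, 8.0]
price = [3.0, 12.0, 48.0, 192.0]

slope, intercept, r_squared = fit_log_log(predictor, price)
```

In a log-log model the slope reads as an elasticity, the percentage change in price associated with a one percent change in the predictor, and the R-squared value corresponds to the "share of variance explained" quoted in the abstract.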