31 resultados para Semi-parametric models

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The spatial distribution of self-employment in India: evidence from semiparametric geoadditive models, Regional Studies. The entrepreneurship literature has rarely considered spatial location as a micro-determinant of occupational choice. It has also ignored self-employment in developing countries. Using Bayesian semiparametric geoadditive techniques, this paper models spatial location as a micro-determinant of self-employment choice in India. The empirical results suggest the presence of spatial occupational neighbourhoods and a clear north–south divide in self-employment when the entire sample is considered; however, spatial variation in the non-agriculture sector disappears to a large extent when individual factors that influence self-employment choice are explicitly controlled. The results further suggest non-linear effects of age, education and wealth on self-employment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged 1. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding(SNE) and the Locally Linear Embedding(LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70 dimensional PGLs per patient, classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and investigate whether a-posteriori two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between two prognosis patients. Uncertainty and diversity across multiple gene expressions prevents unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty precludes any attempts at constructing discriminative classifiers. However a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Practitioners assess performance of entities in increasingly large and complicated datasets. If non-parametric models, such as Data Envelopment Analysis, were ever considered as simple push-button technologies, this is impossible when many variables are available or when data have to be compiled from several sources. This paper introduces by the 'COOPER-framework' a comprehensive model for carrying out non-parametric projects. The framework consists of six interrelated phases: Concepts and objectives, On structuring data, Operational models, Performance comparison model, Evaluation, and Result and deployment. Each of the phases describes some necessary steps a researcher should examine for a well defined and repeatable analysis. The COOPER-framework provides for the novice analyst guidance, structure and advice for a sound non-parametric analysis. The more experienced analyst benefits from a check list such that important issues are not forgotten. In addition, by the use of a standardized framework non-parametric assessments will be more reliable, more repeatable, more manageable, faster and less costly. © 2010 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The generative topographic mapping (GTM) model was introduced by Bishop et al. (1998, Neural Comput. 10(1), 215-234) as a probabilistic re- formulation of the self-organizing map (SOM). It offers a number of advantages compared with the standard SOM, and has already been used in a variety of applications. In this paper we report on several extensions of the GTM, including an incremental version of the EM algorithm for estimating the model parameters, the use of local subspace models, extensions to mixed discrete and continuous data, semi-linear models which permit the use of high-dimensional manifolds whilst avoiding computational intractability, Bayesian inference applied to hyper-parameters, and an alternative framework for the GTM based on Gaussian processes. All of these developments directly exploit the probabilistic structure of the GTM, thereby allowing the underlying modelling assumptions to be made explicit. They also highlight the advantages of adopting a consistent probabilistic framework for the formulation of pattern recognition algorithms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In recent years there has been an increased interest in applying non-parametric methods to real-world problems. Significant research has been devoted to Gaussian processes (GPs) due to their increased flexibility when compared with parametric models. These methods use Bayesian learning, which generally leads to analytically intractable posteriors. This thesis proposes a two-step solution to construct a probabilistic approximation to the posterior. In the first step we adapt the Bayesian online learning to GPs: the final approximation to the posterior is the result of propagating the first and second moments of intermediate posteriors obtained by combining a new example with the previous approximation. The propagation of em functional forms is solved by showing the existence of a parametrisation to posterior moments that uses combinations of the kernel function at the training points, transforming the Bayesian online learning of functions into a parametric formulation. The drawback is the prohibitive quadratic scaling of the number of parameters with the size of the data, making the method inapplicable to large datasets. The second step solves the problem of the exploding parameter size and makes GPs applicable to arbitrarily large datasets. The approximation is based on a measure of distance between two GPs, the KL-divergence between GPs. This second approximation is with a constrained GP in which only a small subset of the whole training dataset is used to represent the GP. This subset is called the em Basis Vector, or BV set and the resulting GP is a sparse approximation to the true posterior. As this sparsity is based on the KL-minimisation, it is probabilistic and independent of the way the posterior approximation from the first step is obtained. We combine the sparse approximation with an extension to the Bayesian online algorithm that allows multiple iterations for each input and thus approximating a batch solution. The resulting sparse learning algorithm is a generic one: for different problems we only change the likelihood. The algorithm is applied to a variety of problems and we examine its performance both on more classical regression and classification tasks and to the data-assimilation and a simple density estimation problems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents differences in firm-level total factor productivity (TFP) across 22 manufacturing and 17 service industries in Germany over the period 1995–2004. It is an attempt to study whether and to what extent foreign multinational enterprises (MNEs) are more productive relative to German firms. As well as distinguishing between foreign and domestic firms, we also distinguish between German MNEs and domestic firms that do not have any foreign presence. Controlling for endogeneity through semi-parametric techniques, our findings indicate considerable heterogeneity in firm performance across types of firms. The foreign/domestic distinction is not as clear cut as has been suggested elsewhere; multinationality is important in explaining productivity differences rather than foreignness.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Traditionally, geostatistical algorithms are contained within specialist GIS and spatial statistics software. Such packages are often expensive, with relatively complex user interfaces and steep learning curves, and cannot be easily integrated into more complex process chains. In contrast, Service Oriented Architectures (SOAs) promote interoperability and loose coupling within distributed systems, typically using XML (eXtensible Markup Language) and Web services. Web services provide a mechanism for a user to discover and consume a particular process, often as part of a larger process chain, with minimal knowledge of how it works. Wrapping current geostatistical algorithms with a Web service layer would thus increase their accessibility, but raises several complex issues. This paper discusses a solution to providing interoperable, automatic geostatistical processing through the use of Web services, developed in the INTAMAP project (INTeroperability and Automated MAPping). The project builds upon Open Geospatial Consortium standards for describing observations, typically used within sensor webs, and employs Geography Markup Language (GML) to describe the spatial aspect of the problem domain. Thus the interpolation service is extremely flexible, being able to support a range of observation types, and can cope with issues such as change of support and differing error characteristics of sensors (by utilising descriptions of the observation process provided by SensorML). XML is accepted as the de facto standard for describing Web services, due to its expressive capabilities which allow automatic discovery and consumption by ‘naive’ users. Any XML schema employed must therefore be capable of describing every aspect of a service and its processes. However, no schema currently exists that can define the complex uncertainties and modelling choices that are often present within geostatistical analysis. We show a solution to this problem, developing a family of XML schemata to enable the description of a full range of uncertainty types. These types will range from simple statistics, such as the kriging mean and variances, through to a range of probability distributions and non-parametric models, such as realisations from a conditional simulation. By employing these schemata within a Web Processing Service (WPS) we show a prototype moving towards a truly interoperable geostatistical software architecture.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

For analysing financial time series two main opposing viewpoints exist, either capital markets are completely stochastic and therefore prices follow a random walk, or they are deterministic and consequently predictable. For each of these views a great variety of tools exist with which it can be tried to confirm the hypotheses. Unfortunately, these methods are not well suited for dealing with data characterised in part by both paradigms. This thesis investigates these two approaches in order to model the behaviour of financial time series. In the deterministic framework methods are used to characterise the dimensionality of embedded financial data. The stochastic approach includes here an estimation of the unconditioned and conditional return distributions using parametric, non- and semi-parametric density estimation techniques. Finally, it will be shown how elements from these two approaches could be combined to achieve a more realistic model for financial time series.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes a semiparametric smooth-coefficient stochastic production frontier model where all the coefficients are expressed as some unknown functions of environmental factors. The inefficiency term is multiplicatively decomposed into a scaling function of the environmental factors and a standard truncated normal random variable. A testing procedure is suggested for the relevance of the environmental factors. Monte Carlo study shows plausible ¯nite sample behavior of our proposed estimation and inference procedure. An empirical example is given, where both the semiparametric and standard parametric models are estimated and results are compared.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We examine the empirical evidence for an environmental Kuznets curve using a semiparametric smooth coefficient regression model that allows us to incorporate flexibility in the parameter estimates, while maintaining the basic econometric structure that is typically used to estimate the pollution-income relationship. This allows us to assess the sensitivity to parameter heterogeneity of typical parametric models used to estimate the relationship between pollution and income, as well as identify why the results from such models are seldom found to be robust. Our results confirm that the resulting relationship between pollution and income is fragile; we show that the estimated pollution-income relationship depends substantially on the heterogeneity of the slope coefficients and the parameter values at which the relationship is evaluated. Different sets of parameters obtained from the semiparametric model give rise to many different shapes for the pollution-income relationship that are commonly found in the literature.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper analyses the survival of the complete cohort of more than 162,000 limited companies incorporated in Britain in 2001 over the subsequent five-year period. For this purpose, we estimate firms' hazards of failure and survival functions using nonparametric and semi-parametric techniques. The paper focuses on two important policy-related issues.The first is to what extent survival rates vary across regions in Britain. A second, and related, policy issue concerns innovation. The data available allows us to look at the intellectual property (IP) activity of all British firms, including that of the 162,000 new firms in 2001. The results indicate substantial differences in survival rates across regions, and also that IP activity is associated with a higher probability of survival. These differences across regions, and the importance of IP activity, remain when we condition on a large range of regional, industry and firm-level characteristics shifting firms' hazards of failure.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A simplified (without phase modulator) scheme of a black box optical regenerator is proposed, where an appropriate nonlinear propagation is used to enhance regeneration. Applying semi-theoretical models the authors optimise and demonstrate feasibility of error-free long distance transmission at 40 Gbit/s.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

An interactive hierarchical Generative Topographic Mapping (HGTM) ¸iteHGTM has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: bf (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in ¸iteKabanpami. bf (2) We give the user a choice of initializing the child plots of the current plot in either em interactive, or em automatic mode. In the interactive mode the user interactively selects ``regions of interest'' as in ¸iteHGTM, whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. bf (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Most parametric software cost estimation models used today evolved in the late 70's and early 80's. At that time, the dominant software development techniques being used were the early 'structured methods'. Since then, several new systems development paradigms and methods have emerged, one being Jackson Systems Development (JSD). As current cost estimating methods do not take account of these developments, their non-universality means they cannot provide adequate estimates of effort and hence cost. In order to address these shortcomings two new estimation methods have been developed for JSD projects. One of these methods JSD-FPA, is a top-down estimating method, based on the existing MKII function point method. The other method, JSD-COCOMO, is a sizing technique which sizes a project, in terms of lines of code, from the process structure diagrams and thus provides an input to the traditional COCOMO method.The JSD-FPA method allows JSD projects in both the real-time and scientific application areas to be costed, as well as the commercial information systems applications to which FPA is usually applied. The method is based upon a three-dimensional view of a system specification as opposed to the largely data-oriented view traditionally used by FPA. The method uses counts of various attributes of a JSD specification to develop a metric which provides an indication of the size of the system to be developed. This size metric is then transformed into an estimate of effort by calculating past project productivity and utilising this figure to predict the effort and hence cost of a future project. The effort estimates produced were validated by comparing them against the effort figures for six actual projects.The JSD-COCOMO method uses counts of the levels in a process structure chart as the input to an empirically derived model which transforms them into an estimate of delivered source code instructions.