919 results for High-dimensional


Relevance: 60.00%

Abstract:

Major in Competition and Regulation

Relevance: 60.00%

Abstract:

In Portugal, about 20% of full-time workers are employed under a fixed-term contract. Using a rich longitudinal matched employer-employee dataset for Portugal, with more than 20 million observations and covering the 2002-2012 period, we confirm the common idea that fixed-term contracts are less desirable than permanent ones, estimating a conditional wage gap of -1.7 log points. Then, we evaluate the sources of that wage penalty by combining a three-way high-dimensional fixed-effects model with the decomposition of Gelbach (2014), in which the three dimensions considered are the worker's unobserved ability, the firm's compensation wage policy and the job title effect. It is shown that the average worker with a fixed-term contract is less productive than his/her permanent counterparts, explaining -3.92 log points of the FTC wage penalty. Additionally, the sorting of workers into lower-paid job titles is also responsible for -0.59 log points of the wage gap. Surprisingly, we find that the allocation of workers among firms mitigates the existing wage penalty (by 4.23 log points), as fixed-term workers are concentrated in firms with a more generous compensation policy. Finally, following Figueiredo et al. (2014), we further control for worker-firm match characteristics and reach the conclusion that fixed-term employment relationships have an overrepresentation of low-quality worker-firm matches, explaining 0.65 log points of the FTC wage penalty.
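
In outline (a hedged reconstruction of the setup from the abstract; the notation is illustrative, not the authors'), the wage equation augments the fixed-term contract (FTC) dummy with three high-dimensional fixed effects,

\[
\ln w_{it} = \delta\,\mathrm{FTC}_{it} + x_{it}'\beta + \alpha_{i} + \psi_{F(i,t)} + \lambda_{J(i,t)} + \varepsilon_{it},
\]

where $\alpha_i$ captures the worker's unobserved ability, $\psi_{F(i,t)}$ the firm's compensation policy and $\lambda_{J(i,t)}$ the job-title effect. Gelbach's (2014) decomposition then splits the gap between the raw coefficient $\delta^{\text{base}}$ (estimated without the fixed effects) and the conditional coefficient $\delta^{\text{full}}$ into the additive contributions of the three effects, $\delta^{\text{base}} - \delta^{\text{full}} = \delta_{\alpha} + \delta_{\psi} + \delta_{\lambda}$.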

Relevance: 60.00%

Abstract:

We develop methods for Bayesian inference in vector error correction models which are subject to a variety of switches in regime (e.g. Markov switches in regime or structural breaks). An important aspect of our approach is that we allow both the cointegrating vectors and the number of cointegrating relationships to change when the regime changes. We show how Bayesian model averaging or model selection methods can be used to deal with the high-dimensional model space that results. Our methods are used in an empirical study of the Fisher effect.
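
Schematically, a regime-switching vector error correction model of the kind described can be written as follows (a hedged sketch; the deterministic terms and lag structure are illustrative assumptions):

\[
\Delta y_t = \alpha_{s_t} \beta_{s_t}' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_{i,s_t}\, \Delta y_{t-i} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma_{s_t}),
\]

where the regime indicator $s_t$ follows, e.g., a Markov chain, and both the cointegrating vectors $\beta_{s_t}$ and the cointegration rank $r_{s_t} = \operatorname{rank}(\alpha_{s_t}\beta_{s_t}')$ may change across regimes; Bayesian model averaging or selection then operates over the resulting high-dimensional space of regime and rank configurations.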


Relevance: 60.00%

Abstract:

Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields. The corresponding results indicate that the proposed data integration procedure does indeed allow for obtaining faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high-resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity and the larger-scale trend of the prevailing hydraulic conductivity field. The results also indicate that this novel data integration approach is remarkably flexible and robust, and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges.
In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems, in order to allow for a more accurate and realistic quantification of the uncertainty associated with the thus inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigate two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution. The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach proves to be highly effective in decreasing the number of iterations required for drawing independent samples from the Bayesian posterior distribution.
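
A rough illustration of the gradual-deformation proposal idea described above (a minimal sketch, not the author's implementation; the field size and the value of the perturbation parameter theta are assumptions chosen for illustration):

import numpy as np

def gradual_deformation(m_cur, theta, rng):
    """Propose a new standard-Gaussian field by gradually deforming the
    current one: cos(theta)*m_cur + sin(theta)*z remains standard Gaussian
    with the same covariance because cos(theta)**2 + sin(theta)**2 = 1."""
    z = rng.standard_normal(m_cur.shape)  # fresh independent realization
    return np.cos(theta) * m_cur + np.sin(theta) * z

# theta near 0 gives small, highly correlated perturbations;
# theta = pi/2 replaces the field entirely (independent proposal).
rng = np.random.default_rng(0)
m = rng.standard_normal((64, 64))        # current model realization
m_prop = gradual_deformation(m, theta=0.1, rng=rng)

Within an MCMC chain, m_prop would be accepted or rejected by the usual Metropolis rule; the single scalar theta is what makes the perturbation strength tunable regardless of the number of model parameters.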

Relevance: 60.00%

Abstract:

This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve the basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of analysing and modelling geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, i.e. when the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns taking into account real-space constraints such as geomorphology, networks and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and model inputs. This problem is approached using different nonlinear feature selection/feature extraction tools. To demonstrate the application of machine learning algorithms, several interesting case studies are considered: digital soil mapping using SVM; automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides); and assessments of renewable resources (wind fields) with SVM and ANN models. The dimensionality of the spaces considered varies from 2 to more than 30. Figures 1, 2 and 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
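
A minimal sketch of the regression/mapping task on geo-features described above, using scikit-learn (the feature set, network size and synthetic target are illustrative assumptions, not the report's actual models):

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Illustrative geo-feature vectors: x, y coordinates plus DEM-derived
# attributes (e.g. elevation, slope) standing in for a geo-feature space.
X = rng.uniform(size=(500, 4))
y = np.sin(3 * X[:, 0]) + X[:, 2] ** 2 + 0.1 * rng.standard_normal(500)

ann = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=0)
ann.fit(X, y)                  # learn the mapping from empirical data
y_pred = ann.predict(X[:10])   # predictions at new locations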

Relevance: 60.00%

Abstract:

We present a method for segmenting white matter tracts from high angular resolution diffusion MR images by representing the data in a 5-dimensional space of position and orientation. Whereas crossing fiber tracts cannot be separated in 3D position space, they clearly disentangle in 5D position-orientation space. The segmentation is done using a 5D level set method applied to hyper-surfaces evolving in 5D position-orientation space. In this paper we present a methodology for constructing the position-orientation space. We then show how to implement the standard level set method in such a non-Euclidean high-dimensional space. Level set theory is defined for N dimensions, but there are several practical implementation details to consider, such as the computation of mean curvature. Finally, we show results from a synthetic model and a few preliminary results on real data of a human brain acquired by high angular resolution diffusion MRI.
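
The underlying construction can be summarised as follows (a hedged sketch; the weighting factor $\beta$ and the exact form of the metric are assumptions for illustration and may differ from the paper's):

\[
(\mathbf{p}, \mathbf{n}) \in \mathbb{R}^3 \times S^2, \qquad
ds^2 = dx^2 + dy^2 + dz^2 + \beta^2 \left( d\theta^2 + \sin^2\theta \, d\varphi^2 \right),
\]

where $\mathbf{p}$ is the voxel position, $\mathbf{n}$ the fiber orientation parametrised by spherical angles $(\theta, \varphi)$, and $\beta$ balances positional against angular distances. Two tracts crossing at the same position $\mathbf{p}$ but with orientations $\mathbf{n}_1 \neq \mathbf{n}_2$ map to distinct points in this 5D space, which is what allows the level set to separate them.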

Relevance: 60.00%

Abstract:

Self-organizing maps (Kohonen 1997) are a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of the Euclidean metric in the process of adaptation of the model vectors, thus rendering, in theory, the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data.
2. Whether a modification of the conventional approach, replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved".
Key words: self-organizing maps; artificial neural networks; compositional data
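
For reference, the two simplex operators mentioned in point 2 can be written compactly as follows (a minimal sketch; the unit-sum closure is one common convention for compositional data):

import numpy as np

def closure(x):
    """Normalise a positive vector to the simplex (unit-sum convention)."""
    return x / x.sum()

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(x * y)

def power(alpha, x):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(x ** alpha)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])
print(perturb(x, y), power(2.0, x))
# In the modified SOM, perturbation and powering replace the Euclidean
# update m + alpha*(x - m) when adapting the model vectors.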

Relevance: 60.00%

Abstract:

The paper proposes an approach aimed at detecting optimal model parameter combinations to achieve the most representative description of uncertainty in the model performance. A classification problem is posed to find the regions of good-fitting models according to the values of a cost function. Support Vector Machine (SVM) classification in the parameter space is applied to decide whether a forward model simulation is to be computed for a particular generated model. SVM are particularly well suited to classification problems in high-dimensional spaces, in a non-parametric and non-linear way. SVM decision boundaries determine the regions that are subject to the largest uncertainty in the cost function classification and, therefore, provide guidelines for further iterative exploration of the model space. The proposed approach is illustrated by a synthetic example of fluid flow through porous media, which features a highly variable response depending on the combination of parameter values.
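
A minimal sketch of this classification step (the synthetic cost function, threshold and sample sizes are illustrative assumptions, not the paper's actual model):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
params = rng.uniform(size=(300, 2))   # sampled parameter combinations
# Stand-in cost function; in practice this comes from forward simulations.
cost = (params[:, 0] - 0.5) ** 2 + (params[:, 1] - 0.5) ** 2
labels = (cost < 0.05).astype(int)    # 1 = "good-fitting" region

svm = SVC(kernel="rbf", C=10.0).fit(params, labels)

# |decision_function| near zero marks the region of largest classification
# uncertainty: candidate points there are the ones worth simulating next.
candidates = rng.uniform(size=(1000, 2))
uncertainty = np.abs(svm.decision_function(candidates))
next_runs = candidates[np.argsort(uncertainty)[:10]]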

Relevance: 60.00%

Abstract:

Graphical displays which show inter-sample distances are important for the interpretation and presentation of multivariate data. Except when the displays are two-dimensional, however, they are often difficult to visualize as a whole. A device, based on multidimensional unfolding, is described for presenting some intrinsically high-dimensional displays in fewer, usually two, dimensions. This goal is achieved by representing each sample by a pair of points, say $R_i$ and $r_i$, so that a theoretical distance between the $i$-th and $j$-th samples is represented twice, once by the distance between $R_i$ and $r_j$ and once by the distance between $R_j$ and $r_i$. Self-distances between $R_i$ and $r_i$ need not be zero. The mathematical conditions for unfolding to exhibit symmetry are established. Algorithms for finding approximate fits, not constrained to be symmetric, are discussed and some examples are given.
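
In symbols, the fitting problem sketched above amounts to a least-squares criterion of the following form (a hedged reconstruction from the abstract; the exact loss used in the paper may differ):

\[
\min_{\{R_i\},\{r_i\}} \; \sum_{i \neq j} \left( \lVert R_i - r_j \rVert - d_{ij} \right)^2 ,
\]

where $d_{ij}$ is the theoretical distance between samples $i$ and $j$; each $d_{ij}$ enters twice, through $\lVert R_i - r_j \rVert$ and $\lVert R_j - r_i \rVert$, while the self-distances $\lVert R_i - r_i \rVert$ are left unconstrained.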

Relevance: 60.00%

Abstract:

The paper proposes a numerical solution method for general equilibrium models with a continuum of heterogeneous agents, which combines elements of projection and of perturbation methods. The basic idea is to first solve for the stationary solution of the model, without aggregate shocks but with fully specified idiosyncratic shocks. Afterwards one computes a first-order perturbation of the solution in the aggregate shocks. This approach makes it possible to include a high-dimensional representation of the cross-sectional distribution in the state vector. The method is applied to a model of household saving with uninsurable income risk and liquidity constraints. The model includes not only productivity shocks, but also shocks to redistributive taxation, which cause substantial short-run variation in the cross-sectional distribution of wealth. If those shocks are operative, it is shown that a solution method based on very few statistics of the distribution is not suitable, while the proposed method can solve the model with high accuracy, at least for the case of small aggregate shocks. Techniques are discussed to reduce the dimension of the state space such that higher-order perturbations are feasible. Matlab programs to solve the model can be downloaded.
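
Schematically, the two steps read as follows (a hedged sketch of the projection-plus-perturbation idea; the notation is illustrative, not the paper's). Step 1: solve the stationary model without aggregate shocks, $F(\bar{x}, \bar{x}, 0) = 0$, where the state vector $\bar{x}$ includes a high-dimensional discretization (e.g. a histogram) of the cross-sectional distribution. Step 2: linearize the equilibrium conditions $E_t\, F(x_t, x_{t+1}, \sigma \varepsilon_{t+1}) = 0$ around $\bar{x}$, which yields a first-order law of motion in the aggregate shocks,

\[
x_t - \bar{x} \;\approx\; A\,(x_{t-1} - \bar{x}) + B\,\varepsilon_t .
\]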

Relevance: 60.00%

Abstract:

In recent years there has been an explosive growth in the development of adaptive and data-driven methods. One of the efficient data-driven approaches is based on statistical learning theory (Vapnik 1998). The theory rests on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce the training error, i.e. to fit the available data with a model, but also to reduce the complexity of the model and thereby the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present statistical learning theory is still under intensive development and SVM are finding new areas of application (www.kernel-machines.org). SVM develop robust and nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are extremely good when the input space is high-dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens a way to sampling optimisation, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
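
The trade-off that SRM formalizes is usually stated through a VC-type generalisation bound of the following form (quoted from standard statistical learning theory in its common simplified form; the exact constants vary across statements):

With probability at least $1-\eta$,
\[
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}},
\]
where $R$ is the expected risk, $R_{\mathrm{emp}}$ the empirical (training) risk, $n$ the number of training samples and $h$ the VC dimension of the model class. SRM minimizes the sum of the two terms rather than the training error alone, which is why it controls both fit and model complexity.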

Relevance: 60.00%

Abstract:

This work proposes novel network analysis techniques for multivariate time series. We define the network of a multivariate time series as a graph where vertices denote the components of the process and edges denote non-zero long-run partial correlations. We then introduce a two-step LASSO procedure, called NETS, to estimate high-dimensional sparse long-run partial correlation networks. This approach is based on a VAR approximation of the process and allows the long-run linkages to be decomposed into the contributions of the dynamic and contemporaneous dependence relations of the system. The large-sample properties of the estimator are analysed and we establish conditions for consistent selection and estimation of the non-zero long-run partial correlations. The methodology is illustrated with an application to a panel of U.S. blue chips.
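
In outline (a hedged reconstruction from the abstract; the precise estimator in the paper may differ): under a VAR approximation $y_t = \sum_k A_k y_{t-k} + \varepsilon_t$, the long-run covariance and the network quantities are

\[
\Sigma_{LR} = \Big(I - \sum_k A_k\Big)^{-1} \Sigma_{\varepsilon} \Big(I - \sum_k A_k\Big)^{-\top},
\qquad
\rho_{ij} = -\frac{K_{ij}}{\sqrt{K_{ii} K_{jj}}}, \quad K = \Sigma_{LR}^{-1},
\]

and an edge is drawn between components $i$ and $j$ whenever the long-run partial correlation $\rho_{ij}$ is non-zero; the two LASSO steps enforce sparsity on the VAR coefficients (the dynamic part) and on $K$ (the contemporaneous part).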

Relevance: 60.00%

Abstract:

The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model, the simple k-nearest neighbor algorithm is considered. PNN are a neural network reformulation of the well-known nonparametric principles of probability density modeling using kernel density estimators together with Bayesian optimal or maximum a posteriori decision rules. PNN are well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can easily be used in decision support systems dealing with problems of automatic classification. Support vector machines are an implementation of the principles of statistical learning theory for classification tasks. Recently they have been successfully applied to different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low- and high-dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the applied algorithms.
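
A minimal numpy sketch of the PNN decision rule described above (Gaussian kernel density per class combined with a Bayes/MAP rule; the bandwidth sigma and the synthetic data are illustrative assumptions):

import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.2):
    """Assign each test point to the class whose kernel density estimate,
    weighted by the class prior, is largest (the MAP decision rule)."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # squared distances from every test point to every pattern of class c
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        density = np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1)
        prior = len(Xc) / len(X_train)
        scores.append(prior * density)
    return classes[np.argmax(np.array(scores), axis=0)]

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [2, 2]], 100, axis=0)
y = np.repeat([0, 1], 100)
print(pnn_predict(X, y, X[:5]))  # predicted classes for five sample points

Because the per-class densities are explicit, the same scores also give the posterior class probabilities, which is what makes PNN convenient when quantification of accuracy is needed, as the abstract notes.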