73 resultados para Error estimate.
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical {\sc vc} dimension, empirical {\sc vc} entropy, andmargin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
Resumo:
Several methods have been suggested to estimate non-linear models with interaction terms in the presence of measurement error. Structural equation models eliminate measurement error bias, but require large samples. Ordinary least squares regression on summated scales, regression on factor scores and partial least squares are appropriate for small samples but do not correct measurement error bias. Two stage least squares regression does correct measurement error bias but the results strongly depend on the instrumental variable choice. This article discusses the old disattenuated regression method as an alternative for correcting measurement error in small samples. The method is extended to the case of interaction terms and is illustrated on a model that examines the interaction effect of innovation and style of use of budgets on business performance. Alternative reliability estimates that can be used to disattenuate the estimates are discussed. A comparison is made with the alternative methods. Methods that do not correct for measurement error bias perform very similarly and considerably worse than disattenuated regression
Resumo:
Selected configuration interaction (SCI) for atomic and molecular electronic structure calculations is reformulated in a general framework encompassing all CI methods. The linked cluster expansion is used as an intermediate device to approximate CI coefficients BK of disconnected configurations (those that can be expressed as products of combinations of singly and doubly excited ones) in terms of CI coefficients of lower-excited configurations where each K is a linear combination of configuration-state-functions (CSFs) over all degenerate elements of K. Disconnected configurations up to sextuply excited ones are selected by Brown's energy formula, ΔEK=(E-HKK)BK2/(1-BK2), with BK determined from coefficients of singly and doubly excited configurations. The truncation energy error from disconnected configurations, Δdis, is approximated by the sum of ΔEKS of all discarded Ks. The remaining (connected) configurations are selected by thresholds based on natural orbital concepts. Given a model CI space M, a usual upper bound ES is computed by CI in a selected space S, and EM=E S+ΔEdis+δE, where δE is a residual error which can be calculated by well-defined sensitivity analyses. An SCI calculation on Ne ground state featuring 1077 orbitals is presented. Convergence to within near spectroscopic accuracy (0.5 cm-1) is achieved in a model space M of 1.4× 109 CSFs (1.1 × 1012 determinants) containing up to quadruply excited CSFs. Accurate energy contributions of quintuples and sextuples in a model space of 6.5 × 1012 CSFs are obtained. The impact of SCI on various orbital methods is discussed. Since ΔEdis can readily be calculated for very large basis sets without the need of a CI calculation, it can be used to estimate the orbital basis incompleteness error. A method for precise and efficient evaluation of ES is taken up in a companion paper
Resumo:
Based on an behavioral equilibrium exchange rate model, this paper examines the determinants of the real effective exchange rate and evaluates the degree of misalignment of a group of currencies since 1980. Within a panel cointegration setting, we estimate the relationship between exchange rate and a set of economic fundamentals, such as traded-nontraded productivity differentials and the stock of foreign assets. Having ascertained the variables are integrated and cointegrated, the long-run equilibrium value of the fundamentals are estimated and used to derive equilibrium exchange rates and misalignments. Although there is statistical homogeneity, some structural differences were found to exist between advanced and emerging economies.
Resumo:
Report for the scientific sojourn carried out at the University of California at Berkeley, from September to December 2007. Environmental niche modelling (ENM) techniques are powerful tools to predict species potential distributions. In the last ten years, a plethora of novel methodological approaches and modelling techniques have been developed. During three months, I stayed at the University of California, Berkeley, working under the supervision of Dr. David R. Vieites. The aim of our work was to quantify the error committed by these techniques, but also to test how an increase in the sample size affects the resultant predictions. Using MaxEnt software we generated distribution predictive maps, from different sample sizes, of the Eurasian quail (Coturnix coturnix) in the Iberian Peninsula. The quail is a generalist species from a climatic point of view, but an habitat specialist. The resultant distribution maps were compared with the real distribution of the species. This distribution was obtained from recent bird atlases from Spain and Portugal. Results show that ENM techniques can have important errors when predicting the species distribution of generalist species. Moreover, an increase of sample size is not necessary related with a better performance of the models. We conclude that a deep knowledge of the species’ biology and the variables affecting their distribution is crucial for an optimal modelling. The lack of this knowledge can induce to wrong conclusions.
Resumo:
Interaction effects are usually modeled by means of moderated regression analysis. Structural equation models with non-linear constraints make it possible to estimate interaction effects while correcting formeasurement error. From the various specifications, Jöreskog and Yang's(1996, 1998), likely the most parsimonious, has been chosen and further simplified. Up to now, only direct effects have been specified, thus wasting much of the capability of the structural equation approach. This paper presents and discusses an extension of Jöreskog and Yang's specification that can handle direct, indirect and interaction effects simultaneously. The model is illustrated by a study of the effects of an interactive style of use of budgets on both company innovation and performance
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models
Resumo:
In the accounting literature, interaction or moderating effects are usually assessed by means of OLS regression and summated rating scales are constructed to reduce measurement error bias. Structural equation models and two-stage least squares regression could be used to completely eliminate this bias, but large samples are needed. Partial Least Squares are appropriate for small samples but do not correct measurement error bias. In this article, disattenuated regression is discussed as a small sample alternative and is illustrated on data of Bisbe and Otley (in press) that examine the interaction effect of innovation and style of use of budgets on performance. Sizeable differences emerge between OLS and disattenuated regression
Resumo:
The author studies the error and complexity of the discrete random walk Monte Carlo technique for radiosity, using both the shooting and gathering methods. The author shows that the shooting method exhibits a lower complexity than the gathering one, and under some constraints, it has a linear complexity. This is an improvement over a previous result that pointed to an O(n log n) complexity. The author gives and compares three unbiased estimators for each method, and obtains closed forms and bounds for their variances. The author also bounds the expected value of the mean square error (MSE). Some of the results obtained are also shown
Resumo:
Møller-Plesset (MP2) and Becke-3-Lee-Yang-Parr (B3LYP) calculations have been used to compare the geometrical parameters, hydrogen-bonding properties, vibrational frequencies and relative energies for several X- and X+ hydrogen peroxide complexes. The geometries and interaction energies were corrected for the basis set superposition error (BSSE) in all the complexes (1-5), using the full counterpoise method, yielding small BSSE values for the 6-311 + G(3df,2p) basis set used. The interaction energies calculated ranged from medium to strong hydrogen-bonding systems (1-3) and strong electrostatic interactions (4 and 5). The molecular interactions have been characterized using the atoms in molecules theory (AIM), and by the analysis of the vibrational frequencies. The minima on the BSSE-counterpoise corrected potential-energy surface (PES) have been determined as described by S. Simón, M. Duran, and J. J. Dannenberg, and the results were compared with the uncorrected PES
Resumo:
A comparision of the local effects of the basis set superposition error (BSSE) on the electron densities and energy components of three representative H-bonded complexes was carried out. The electron densities were obtained with Hartee-Fock and density functional theory versions of the chemical Hamiltonian approach (CHA) methodology. It was shown that the effects of the BSSE were common for all complexes studied. The electron density difference maps and the chemical energy component analysis (CECA) analysis confirmed that the local effects of the BSSE were different when diffuse functions were present in the calculations
Resumo:
The effect of basis set superposition error (BSSE) on molecular complexes is analyzed. The BSSE causes artificial delocalizations which modify the first order electron density. The mechanism of this effect is assessed for the hydrogen fluoride dimer with several basis sets. The BSSE-corrected first-order electron density is obtained using the chemical Hamiltonian approach versions of the Roothaan and Kohn-Sham equations. The corrected densities are compared to uncorrected densities based on the charge density critical points. Contour difference maps between BSSE-corrected and uncorrected densities on the molecular plane are also plotted to gain insight into the effects of BSSE correction on the electron density
Resumo:
The basis set superposition error-free second-order MØller-Plesset perturbation theory of intermolecular interactions was studied. The difficulties of the counterpoise (CP) correction in open-shell systems were also discussed. The calculations were performed by a program which was used for testing the new variants of the theory. It was shown that the CP correction for the diabatic surfaces should be preferred to the adiabatic ones
Resumo:
Geometries, vibrational frequencies, and interaction energies of the CNH⋯O3 and HCCH⋯O3 complexes are calculated in a counterpoise-corrected (CP-corrected) potential-energy surface (PES) that corrects for the basis set superposition error (BSSE). Ab initio calculations are performed at the Hartree-Fock (HF) and second-order Møller-Plesset (MP2) levels, using the 6-31G(d,p) and D95++(d,p) basis sets. Interaction energies are presented including corrections for zero-point vibrational energy (ZPVE) and thermal correction to enthalpy at 298 K. The CP-corrected and conventional PES are compared; the unconnected PES obtained using the larger basis set including diffuse functions exhibits a double well shape, whereas use of the 6-31G(d,p) basis set leads to a flat single-well profile. The CP-corrected PES has always a multiple-well shape. In particular, it is shown that the CP-corrected PES using the smaller basis set is qualitatively analogous to that obtained with the larger basis sets, so the CP method becomes useful to correctly describe large systems, where the use of small basis sets may be necessary
Resumo:
We describe a simple method to automate the geometric optimization of molecular orbital calculations of supermolecules on potential surfaces that are corrected for basis set superposition error using the counterpoise (CP) method. This method is applied to the H-bonding complexes HF/HCN, HF/H2O, and HCCH/H2O using the 6-31G(d,p) and D95 + + (d,p) basis sets at both the Hartree-Fock and second-order Møller-Plesset levels. We report the interaction energies, geometries, and vibrational frequencies of these complexes on the CP-optimized surfaces; and compare them with similar values calculated using traditional methods, including the (more traditional) single point CP correction. Upon optimization on the CP-corrected surface, the interaction energies become more negative (before vibrational corrections) and the H-bonding stretching vibrations decrease in all cases. The extent of the effects vary from extremely small to quite large depending on the complex and the calculational method. The relative magnitudes of the vibrational corrections cannot be predicted from the H-bond stretching frequencies alone