964 results for variance component models
Abstract:
The aim of this thesis is to apply multilevel regression models in the context of household surveys. The hierarchical structure of this type of data is characterized by many small groups. In recent years, comparative and multilevel analyses in the field of perceived health have expanded. The purpose of this thesis is to develop a multilevel analysis with three levels of hierarchy for the Physical Component Summary outcome in order to: evaluate the magnitude of within- and between-group variance at each level (individual, household and municipality); explore which covariates affect perceived physical health at each level; compare the model-based and design-based approaches in order to establish the informativeness of the sampling design; and estimate a quantile regression for hierarchical data. The target population is Italian residents aged 18 years and older. Our study shows a high degree of homogeneity among level-1 units belonging to the same group, with an intraclass correlation of 27% in a level-2 null model. Almost all of the variance is explained by level-1 covariates. In our model, the explanatory variables with the greatest impact on the outcome are disability, inability to work, age and chronic diseases (18 pathologies). An additional analysis is performed using a novel procedure, the "Linear Quantile Mixed Model", also called "Multilevel Linear Quantile Regression". This makes it possible to describe the conditional distribution of the response more generally through the estimation of its quantiles, while accounting for the dependence among the observations. This is a great advantage of our models over classic multilevel regression. The median regression with random effects proves more efficient than the mean regression in representing the central tendency of the outcome. A more detailed analysis of the conditional distribution of the response at other quantiles highlights a differential effect of some covariates along the distribution.
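The intraclass correlation quoted above is a ratio of variance components from a null (intercept-only) model. The following is a minimal sketch of that arithmetic, assuming the usual definition (between-group variance divided by total variance). The function names and the numeric variance components are hypothetical, chosen only so that the result matches the 27% reported in the abstract; they are not estimates from the thesis.

```python
def intraclass_correlation(between_var, within_var):
    """Share of total variance attributable to differences between groups."""
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not all zero")
    return between_var / total


def variance_shares(components):
    """Map each level name to its share of the total variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {level: value / total for level, value in components.items()}


# Hypothetical variance components, chosen only so that the ICC equals 27%.
icc = intraclass_correlation(between_var=27.0, within_var=73.0)
print(f"ICC = {icc:.2f}")  # ICC = 0.27
```

For a three-level null model, `variance_shares` can be applied to a dictionary with one entry per level (individual, household, municipality) to see how the total variance splits across levels.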
Abstract:
This thesis focuses on the analysis of Dupire's formula, which yields an expression for the local volatility, in exponential Lévy models. The Merton, Kou and Variance Gamma market models are studied, and it is shown that, off the money, the local volatility tends to infinity as the time to maturity of the options tends to zero. In particular, a regularization procedure is proposed such that Dupire's local volatility process reproduces the correct option prices even in the presence of jumps. Finally, this result is verified numerically by solving the Cauchy problem for the option prices.
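For reference, Dupire's formula in its standard textbook form is given below. The notation is an assumption, since the abstract does not state it: C(K, T) is the price of a call with strike K and maturity T, r is the risk-free rate and q the dividend yield. The thesis may use a different normalization.

```latex
\[
\sigma_{\mathrm{loc}}^{2}(K,T)
  = \frac{\dfrac{\partial C}{\partial T}
          + (r-q)\,K\,\dfrac{\partial C}{\partial K}
          + q\,C}
         {\dfrac{1}{2}\,K^{2}\,\dfrac{\partial^{2} C}{\partial K^{2}}},
\qquad
\text{and for } r=q=0:\quad
\sigma_{\mathrm{loc}}^{2}(K,T)
  = \frac{\partial C/\partial T}{\tfrac{1}{2}K^{2}\,\partial^{2} C/\partial K^{2}} .
\]
```

A heuristic reading of the blow-up described in the abstract, not the thesis's own proof: in models with jumps, the price of an out-of-the-money option decays roughly linearly in T as T tends to zero, so the numerator stays bounded away from zero while the second strike derivative in the denominator vanishes.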
Abstract:
In the first chapter, I develop a panel no-cointegration test which extends the bounds test of Pesaran, Shin and Smith (2001) to the panel framework by considering the individual regressions in a Seemingly Unrelated Regression (SUR) system. This makes it possible to take into account unobserved common factors that contemporaneously affect all the units of the panel and provides, at the same time, unit-specific test statistics. Moreover, the approach is particularly suited to cases where the number of individuals in the panel is small relative to the number of time series observations. I develop the algorithm to implement the test and use Monte Carlo simulation to analyze its properties. The small-sample properties of the test are remarkable compared to those of its single-equation counterpart. I illustrate the use of the test through a test of Purchasing Power Parity in a panel of EU15 countries. In the second chapter of my PhD thesis, I test the Expectation Hypothesis of the Term Structure (EHTS) in the repurchase agreement (repo) market with a new testing approach. I consider an "inexact" formulation of the EHTS, which models a time-varying component in the risk premia, and I treat the interest rates as a non-stationary cointegrated system. The effect of heteroskedasticity is controlled by means of testing procedures (bootstrap and heteroskedasticity correction) which are robust to variance and covariance shifts over time. I find that the long-run implications of the EHTS are verified. A rolling window analysis clarifies that the EHTS is only rejected in periods of turbulence in financial markets. The third chapter introduces the Stata command "bootrank", which implements the bootstrap likelihood ratio rank test algorithm developed by Cavaliere et al. (2012). The command is illustrated through an empirical application to the term structure of interest rates in the US.
Abstract:
A critical point in the analysis of ground displacement time series is the development of data-driven methods that allow the different sources that generate the observed displacements to be discerned and characterised. A widely used multivariate statistical technique is Principal Component Analysis (PCA), which reduces the dimensionality of the data space while retaining most of the variance of the dataset. However, PCA does not perform well in solving the so-called Blind Source Separation (BSS) problem, i.e. in recovering and separating the original sources that generated the observed data. This is mainly due to the assumptions on which PCA relies: it looks for a new Euclidean space where the projected data are uncorrelated. Independent Component Analysis (ICA) is a popular technique adopted to approach this problem. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, I use a variational Bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mixture of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources, giving a more reliable estimate of them. Here I present the application of the vbICA technique to GPS position time series. First, I use vbICA on synthetic data that simulate a seismic cycle (interseismic + coseismic + postseismic + seasonal + noise) and a volcanic source, and I study the ability of the algorithm to recover the original (known) sources of deformation. Secondly, I apply vbICA to different tectonically active scenarios, such as the 2009 L'Aquila (central Italy) earthquake, the 2012 Emilia (northern Italy) seismic sequence, and the 2006 Guerrero (Mexico) Slow Slip Event (SSE).
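The PCA property the abstract relies on (projections that are uncorrelated and ordered by explained variance) can be illustrated with a short NumPy sketch. Everything here is an assumption made for illustration: the function name `pca`, the two synthetic sources (a seasonal sinusoid and a step standing in for a coseismic offset), the random mixing matrix and the noise level. It is plain PCA via the singular value decomposition, not the vbICA method used in the thesis.

```python
import numpy as np


def pca(data, n_components):
    """PCA via SVD. Rows of `data` are epochs, columns are station components."""
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = vt[:n_components]        # principal directions
    scores = centered @ components.T      # projected (uncorrelated) time series
    return scores, components, explained[:n_components]


# Synthetic illustration (hypothetical): two sources mixed into five channels.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
sources = np.column_stack([
    np.sin(2 * np.pi * t),          # seasonal signal
    np.where(t > 0.5, 1.0, 0.0),    # step, standing in for a coseismic offset
])
mixing = rng.normal(size=(2, 5))
data = sources @ mixing + 0.01 * rng.normal(size=(200, 5))

scores, components, explained = pca(data, n_components=2)
```

The two score series are uncorrelated by construction, but each is in general still a blend of the sinusoid and the step. That is the limitation for blind source separation that motivates ICA-type methods in the abstract.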
Abstract:
This thesis deals with three different physical models, each of which involves a random component linked to a cubic lattice. First, a model is studied which is used in numerical calculations of Quantum Chromodynamics. In these calculations random gauge fields are distributed on the bonds of the lattice. The formulation of the model is fitted into the mathematical framework of ergodic operator families. We prove that, for small coupling constants, the ergodicity of the underlying probability measure is indeed ensured and that the integrated density of states of the Wilson-Dirac operator exists. The physical situations treated in the next two chapters are more similar to one another. In both cases the principal idea is to study a fermion system in a cubic crystal with impurities that are modeled by a random potential located at the lattice sites. In the second model we apply the Hartree-Fock approximation to such a system. For the case of reduced Hartree-Fock theory at positive temperatures and a fixed chemical potential we consider the limit of an infinite system. In that case we show the existence and uniqueness of minimizers of the Hartree-Fock functional. In the third model we formulate the fermion system algebraically via C*-algebras. The question posed here is how to calculate the heat production of the system under the influence of an external electromagnetic field. We show that the heat production corresponds exactly to what is empirically predicted by Joule's law in the regime of linear response.
Abstract:
Analyzing and modeling relationships between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects in chemical datasets is a challenging task for scientific researchers in the field of cheminformatics. Therefore, (Q)SAR model validation is essential to ensure future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to approve its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model is still under discussion. In this work, we empirically compare k-fold cross-validation with external test set validation. The introduced workflow makes it possible to apply the built and validated models to large amounts of unseen data, and to compare the performance of the different validation approaches. Our experimental results indicate that cross-validation produces (Q)SAR models with higher predictivity than external test set validation and reduces the variance of the results. Statistical validation is important to evaluate the performance of (Q)SAR models, but does not support the user in better understanding the properties of the model or the underlying correlations. We present the 3D molecular viewer CheS-Mapper (Chemical Space Mapper) that arranges compounds in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, such as structural fragments and quantitative chemical descriptors. Comprehensive functionalities including clustering, alignment of compounds according to their 3D structure, and feature highlighting help the chemist to better understand patterns and regularities and relate the observations to established scientific knowledge.
Even though visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow for the investigation of model validation results are still lacking. We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. New functionalities in CheS-Mapper 2.0 facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. Our approach reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
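The mechanics of k-fold cross-validation mentioned in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the workflow of the cited work: the function names are hypothetical, and a nearest-centroid classifier stands in for a real (Q)SAR model so that the example stays self-contained.

```python
import numpy as np


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in model: assign each test point to the closest class centroid."""
    labels = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]


def cross_validated_accuracy(x, y, k=5, seed=0):
    """Mean and standard deviation of accuracy over the k held-out folds."""
    scores = []
    for train, test in k_fold_splits(len(y), k, seed):
        pred = nearest_centroid_predict(x[train], y[train], x[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores)), float(np.std(scores))
```

Every compound is held out exactly once, so the accuracy estimate uses all of the data. A single external test set uses only one such split, which is one reason its estimate can vary more from split to split.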
Abstract:
Adaptive radiation is usually thought to be associated with speciation, but the evolution of intraspecific polymorphisms without speciation is also possible. The radiation of cichlid fish in Lake Victoria (LV) is perhaps the most impressive example of a recent rapid adaptive radiation, with 600+ very young species. Key questions about its origin remain poorly characterized, such as the importance of speciation versus polymorphism, whether species persist on evolutionary time scales, and if speciation happens more commonly in small isolated or in large connected populations. We used 320 individuals from 105 putative species from Lakes Victoria, Edward, Kivu, Albert, Nabugabo and Saka, in a radiation-wide amplified fragment length polymorphism (AFLP) genome scan to address some of these questions. We demonstrate pervasive signatures of speciation supporting the classical model of adaptive radiation associated with speciation. A positive relationship between the age of lakes and the average genomic differentiation of their species, and a significant fraction of molecular variance explained by above-species level taxonomy suggest the persistence of species on evolutionary time scales, with radiation through sequential speciation rather than a single starburst. Finally the large gene diversity retained from colonization to individual species in every radiation suggests large effective population sizes and makes speciation in small geographical isolates unlikely.
Abstract:
Model-based calibration of steady-state engine operation is commonly performed with highly parameterized empirical models that are accurate but not very robust, particularly when predicting highly nonlinear responses such as diesel smoke emissions. To address this problem, and to boost the accuracy of more robust non-parametric methods to the same level, GT-Power was used to transform the empirical model input space into multiple input spaces that simplified the input-output relationship and improved the accuracy and robustness of smoke predictions made by three commonly used empirical modeling methods: Multivariate Regression, Neural Networks and the k-Nearest Neighbor method. The availability of multiple input spaces allowed the development of two committee techniques: a 'Simple Committee' technique that used averaged predictions from a set of 10 pre-selected input spaces chosen by the training data and the "Minimum Variance Committee" technique where the input spaces for each prediction were chosen on the basis of disagreement between the three modeling methods. This latter technique equalized the performance of the three modeling methods. The successively increasing improvements resulting from the use of a single best transformed input space (Best Combination Technique), Simple Committee Technique and Minimum Variance Committee Technique were verified with hypothesis testing. The transformed input spaces were also shown to improve outlier detection and to improve k-Nearest Neighbor performance when predicting dynamic emissions with steady-state training data. An unexpected finding was that the benefits of input space transformation were unaffected by changes in the hardware or the calibration of the underlying GT-Power model.
Abstract:
The study describes brain areas involved in medial temporal lobe (mTL) seizures of 12 patients. All patients showed so-called oro-alimentary behavior within the first 20 s of clinical seizure manifestation characteristic of mTL seizures. Single photon emission computed tomography (SPECT) images of regional cerebral blood flow (rCBF) were acquired from the patients in ictal and interictal phases and from normal volunteers. Image analysis employed categorical comparisons with statistical parametric mapping and principal component analysis (PCA) to assess functional connectivity. PCA supplemented the findings of the categorical analysis by decomposing the covariance matrix containing images of patients and healthy subjects into distinct component images of independent variance, including areas not identified by the categorical analysis. Two principal components (PCs) discriminated the subject groups: patients with right or left mTL seizures and normal volunteers, indicating distinct neuronal networks implicated by the seizure. Both PCs were correlated with seizure duration, one positively and the other negatively, confirming their physiological significance. The independence of the two PCs yielded a clear clustering of subject groups. The local pattern within the temporal lobe describes critical relay nodes which are the counterpart of oro-alimentary behavior: (1) right mesial temporal zone and ipsilateral anterior insula in right mTL seizures, and (2) temporal poles on both sides that are densely interconnected by the anterior commissure. Regions remote from the temporal lobe may be related to seizure propagation and include positively and negatively loaded areas. These patterns, the covarying areas of the temporal pole and occipito-basal visual association cortices, for example, are related to known anatomic paths.
Abstract:
OBJECTIVES: This paper examines four different levels of possible variation in symptom reporting: occasion, day, person and family. DESIGN: In order to rule out effects of retrospection, concurrent symptom reporting was assessed prospectively using a computer-assisted self-report method. METHODS: A decomposition of variance in symptom reporting was conducted using diary data from families with adolescent children. We used palmtop computers to assess concurrent somatic complaints from parents and children six times a day for seven consecutive days. In two separate studies, 314 and 254 participants from 96 and 77 families, respectively, participated. A generalized multilevel linear models approach was used to analyze the data. Symptom reports were modelled using a logistic response function, and random effects were allowed at the family, person and day level, with extra-binomial variation allowed for on the occasion level. RESULTS: Substantial variability was observed at the person, day and occasion level but not at the family level. CONCLUSIONS: To explain symptom reporting in normally healthy individuals, situational as well as person characteristics should be taken into account. Family characteristics, however, would not help to clarify symptom reporting in all family members.
Abstract:
Marginal generalized linear models can be used for clustered and longitudinal data by fitting a model as if the data were independent and using an empirical estimator of parameter standard errors. We extend this approach to data where the number of observations correlated with a given one grows with sample size and show that parameter estimates are consistent and asymptotically Normal with a slower convergence rate than for independent data, and that an information sandwich variance estimator is consistent. We present two problems that motivated this work, the modelling of patterns of HIV genetic variation and the behavior of clustered data estimators when clusters are large.
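The general idea of fitting as if the data were independent and then correcting the standard errors with an information sandwich can be sketched in the simplest setting, a linear model with identity link and disjoint clusters. This is an assumption-laden illustration: the function name is hypothetical, and the sketch does not implement the paper's extension to data where the number of correlated observations grows with sample size.

```python
import numpy as np


def ols_with_cluster_sandwich(x, y, clusters):
    """Working-independence least squares with a cluster-robust sandwich covariance.

    x: (n, p) design matrix, y: (n,) response, clusters: (n,) cluster labels.
    """
    bread = np.linalg.inv(x.T @ x)       # inverse "information" under independence
    beta = bread @ x.T @ y               # point estimate ignores the correlation
    resid = y - x @ beta
    meat = np.zeros((x.shape[1], x.shape[1]))
    for label in np.unique(clusters):
        rows = clusters == label
        score = x[rows].T @ resid[rows]  # summed score contribution of one cluster
        meat += np.outer(score, score)
    cov = bread @ meat @ bread           # sandwich: bread * meat * bread
    return beta, cov
```

The square roots of the diagonal of `cov` are the empirical standard errors. When every observation is its own cluster, the expression reduces to the usual heteroskedasticity-robust (HC0) estimator.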
Abstract:
Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of the classic ACE twin model with a (possibly covariate-specific) liability-threshold model for binary outcomes. Our likelihood-based approach to fitting involves conditioning on the proband’s disease status, as well as setting prevalence equal to a pre-specified value that can be estimated from the data themselves if necessary. Simulation experiments suggest that our approach to fitting yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly-made assumptions hold. These assumptions include: the usual assumptions for the classic ACE and liability-threshold models; assumptions about shared family environment for relative pairs; and assumptions about the case-control family sampling, including single ascertainment. When our approach is used to fit the ACE model to Austrian case-control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data.
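How the A, C and E variance components translate into heritability and into expected correlations between relatives can be shown with a small sketch of the classic ACE decomposition. The function names and the numeric values in the usage note are hypothetical, and whether a given relative pair shares family environment is exactly one of the assumptions the abstract mentions, so it is left as a parameter here.

```python
def heritability(a, c, e):
    """Proportion of liability variance due to additive genetic effects."""
    total = a + c + e
    if min(a, c, e) < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not all zero")
    return a / total


def expected_liability_correlation(a, c, e, relatedness, shared_environment=True):
    """Expected liability correlation for a relative pair under the ACE model.

    relatedness: expected share of additive genetic effects in common,
    e.g. 1.0 for monozygotic twins and 0.5 for full siblings or parent-child pairs.
    """
    total = a + c + e
    if min(a, c, e) < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not all zero")
    shared_c = c if shared_environment else 0.0
    return (relatedness * a + shared_c) / total
```

With hypothetical components A = 0.4, C = 0.2 and E = 0.4, heritability is 0.4, the expected correlation is 0.6 for monozygotic twins and 0.4 for full siblings who share their family environment.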
Abstract:
We derive the additive-multiplicative error model for microarray intensities, and describe two applications. For the detection of differentially expressed genes, we obtain a statistic whose variance is approximately independent of the mean intensity. For the post hoc calibration (normalization) of data with respect to experimental factors, we describe a method for parameter estimation.
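A worked sketch of why such a model leads to a statistic with approximately constant variance is given below. The notation is an assumption, since the abstract gives none: Y is the measured intensity, α an offset, μ the true expression level, and η and ε independent zero-mean normal errors (multiplicative and additive respectively). The paper's own parameterization may differ.

```latex
\[
Y = \alpha + \mu\,e^{\eta} + \varepsilon,
\qquad
\eta \sim N(0,\sigma_\eta^{2}),\quad
\varepsilon \sim N(0,\sigma_\varepsilon^{2}).
\]
Writing $u = \mathrm{E}[Y] - \alpha = \mu\,e^{\sigma_\eta^{2}/2}$, the variance is a quadratic function of the mean:
\[
\mathrm{Var}(Y) = c^{2}u^{2} + s^{2},
\qquad
c^{2} = e^{\sigma_\eta^{2}} - 1,\quad
s^{2} = \sigma_\varepsilon^{2}.
\]
The transformation
\[
h(y) = \frac{1}{c}\,\operatorname{arsinh}\!\left(\frac{c\,(y-\alpha)}{s}\right),
\qquad
h'(y) = \frac{1}{\sqrt{c^{2}(y-\alpha)^{2} + s^{2}}},
\]
then satisfies, by the delta method,
\[
\mathrm{Var}\bigl(h(Y)\bigr) \approx h'\bigl(\mathrm{E}[Y]\bigr)^{2}\,\mathrm{Var}(Y) = 1 .
\]
```

Differences of h-transformed intensities therefore have a variance that is approximately independent of the mean intensity, which is the property the abstract describes for the detection of differentially expressed genes.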
Abstract:
Statistical shape analysis techniques commonly employed in the medical imaging community, such as active shape models or active appearance models, rely on principal component analysis (PCA) to decompose shape variability into a reduced set of interpretable components. In this paper we propose principal factor analysis (PFA) as an alternative and complementary tool to PCA providing a decomposition into modes of variation that can be more easily interpretable, while still being a linear efficient technique that performs dimensionality reduction (as opposed to independent component analysis, ICA). The key difference between PFA and PCA is that PFA models covariance between variables, rather than the total variance in the data. The added value of PFA is illustrated on 2D landmark data of corpora callosa outlines. Then, a study of the 3D shape variability of the human left femur is performed. Finally, we report results on vector-valued 3D deformation fields resulting from non-rigid registration of ventricles in MRI of the brain.
Abstract:
Protecting modern interconnected power systems from blackouts is an important and difficult challenge. Applying advanced power system protection techniques and increasing power system stability are ways to improve the reliability and security of power systems. Phasor-domain software packages such as Power System Simulator for Engineers (PSS/E) can be used to study large power systems but cannot be used for transient analysis. In order to observe both power system stability and transient behavior of the system during disturbances, modeling has to be done in the time domain. This work focuses on modeling of power systems and various control systems in the Alternative Transients Program (ATP). ATP is a time-domain power system modeling software in which all the power system components can be modeled in detail. Models are implemented with attention to component representation and parameters. The synchronous machine model includes the saturation characteristics and control interface. Transient Analysis Control System is used to model the excitation control system, power system stabilizer and the turbine governor system of the synchronous machine. Several base cases of a single machine system are modeled and benchmarked against PSS/E. A two area system is modeled and inter-area and intra-area oscillations are observed. The two area system is reduced to a two machine system using reduced dynamic equivalencing. The original and the reduced systems are benchmarked against PSS/E. This work also includes the simulation of single-pole tripping using one of the base case models. Advantages of single-pole tripping and comparison of system behavior against three-pole tripping are studied. Results indicate that the built-in control system models in PSS/E can be effectively reproduced in ATP. The benchmarked models correctly simulate the power system dynamics.
The successful implementation of a dynamically reduced system in ATP shows promise for studying a small sub-system of a large system without losing the dynamic behaviors. Other aspects such as relaying can be investigated using the benchmarked models. It is expected that this work will provide guidance in modeling different control systems for the synchronous machine and in representing dynamic equivalents of large power systems.