10 resultados para model selection in binary regression

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a criterion introduced recently that solves Akaike’s original research problem posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such a widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties; and the cases of small samples, misspecification, further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for: univariate models (AR, ARMA) in the time and frequency domain, parametric multivariate (VARMA, VAR); nonparametric nonlinear (NAR); and high-dimensional models. The MRIC answers Akaike’s original question on efficient criteria, for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME), for Least Squares multi-step prediction with univariate regressor. Chapter 5 extends the VMRIC to the general multiple regressor case, by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency. The chapter concludes with a digression on the conditions for PM VARX models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main topic of this thesis is confounding in linear regression models. It arises when a relationship between an observed process, the covariate, and an outcome process, the response, is influenced by an unmeasured process, the confounder, associated with both. Consequently, the estimators for the regression coefficients of the measured covariates might be severely biased, less efficient and characterized by misleading interpretations. Confounding is an issue when the primary target of the work is the estimation of the regression parameters. The central point of the dissertation is the evaluation of the sampling properties of parameter estimators. This work aims to extend the spatial confounding framework to general structured settings and to understand the behaviour of confounding as a function of the data generating process structure parameters in several scenarios focusing on the joint covariate-confounder structure. In line with the spatial statistics literature, our purpose is to quantify the sampling properties of the regression coefficient estimators and, in turn, to identify the most prominent quantities depending on the generative mechanism impacting confounding. Once the sampling properties of the estimator conditionally on the covariate process are derived as ratios of dependent quadratic forms in Gaussian random variables, we provide an analytic expression of the marginal sampling properties of the estimator using Carlson’s R function. Additionally, we propose a representative quantity for the magnitude of confounding as a proxy of the bias, its first-order Laplace approximation. To conclude, we work under several frameworks considering spatial and temporal data with specific assumptions regarding the covariance and cross-covariance functions used to generate the processes involved. This study allows us to claim that the variability of the confounder-covariate interaction and of the covariate plays the most relevant role in determining the principal marker of the magnitude of confounding.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We study some perturbative and nonperturbative effects in the framework of the Standard Model of particle physics. In particular we consider the time dependence of the Higgs vacuum expectation value given by the dynamics of the StandardModel and study the non-adiabatic production of both bosons and fermions, which is intrinsically non-perturbative. In theHartree approximation, we analyze the general expressions that describe the dissipative dynamics due to the backreaction of the produced particles. Then, we solve numerically some relevant cases for the Standard Model phenomenology in the regime of relatively small oscillations of the Higgs vacuum expectation value (vev). As perturbative effects, we consider the leading logarithmic resummation in small Bjorken x QCD, concentrating ourselves on the Nc dependence of the Green functions associated to reggeized gluons. Here the eigenvalues of the BKP kernel for states of more than three reggeized gluons are unknown in general, contrary to the large Nc limit (planar limit) case where the problem becomes integrable. In this contest we consider a 4-gluon kernel for a finite number of colors and define some simple toy models for the configuration space dynamics, which are directly solvable with group theoretical methods. In particular we study the depencence of the spectrum of thesemodelswith respect to the number of colors andmake comparisons with the planar limit case. In the final part we move on the study of theories beyond the Standard Model, considering models built on AdS5 S5/Γ orbifold compactifications of the type IIB superstring, where Γ is the abelian group Zn. We present an appealing three family N = 0 SUSY model with n = 7 for the order of the orbifolding group. This result in a modified Pati–Salam Model which reduced to the StandardModel after symmetry breaking and has interesting phenomenological consequences for LHC.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work we aim to propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid bias carried on by aggregated analyses. Starting from collecting disease counts and calculating expected disease counts by means of reference population disease rates, in each area an SMR is derived as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the low population underlying the area or the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates among the map aiming to detect anomalies and possible high-risk areas have been proposed in literature according to the classic and the Bayesian paradigm. Our proposal is approaching this issue by a decision-oriented method, which focus on multiple testing control, without however leaving the preliminary study perspective that an analysis on SMR indicators is asked to. We implement the control of the FDR, a quantity largely used to address multiple comparisons problems in the eld of microarray data analysis but which is not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small areas issue arises diculties in applying traditional methods for FDR estimation, that are usually based only on the p-values knowledge (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover tests cannot be assumed as independent when spatial correlation between SMRs is expected, neither they are identical distributed when population underlying the map is heterogeneous. The Bayesian paradigm oers a way to overcome the inappropriateness of p-values based methods. Another peculiarity of the present work is to propose a hierarchical full Bayesian model for FDR estimation in testing many null hypothesis of absence of risk.We will use concepts of Bayesian models for disease mapping, referring in particular to the Besag York and Mollié model (1991) often used in practice for its exible prior assumption on the risks distribution across regions. The borrowing of strength between prior and likelihood typical of a hierarchical Bayesian model takes the advantage of evaluating a singular test (i.e. a test in a singular area) by means of all observations in the map under study, rather than just by means of the singular observation. This allows to improve the power test in small areas and addressing more appropriately the spatial correlation issue that suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC estimated posterior probabilities b i's of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on data (\FDR) can be calculated in any set of b i's relative to areas declared at high-risk (where thenull hypothesis is rejected) by averaging the b i's themselves. The\FDR can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many as possible areas such that the\FDR is non-lower than a prexed value; we call them\FDR based decision (or selection) rules. The sensitivity and specicity of such rule depend on the accuracy of the FDR estimate, the over-estimation of FDR causing a loss of power and the under-estimation of FDR producing a loss of specicity. Moreover, our model has the interesting feature of still being able to provide an estimate of relative risk values as in the Besag York and Mollié model (1991). A simulation study to evaluate the model performance in FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of relative risks, was set up. We chose a real map from which we generated several spatial scenarios whose counts of disease vary according to the spatial correlation degree, the size areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing simulation results we will always consider the FDR estimation in sets constituted by all b i's selected lower than a threshold t. We will show graphs of the\FDR and the true FDR (known by simulation) plotted against a threshold t to assess the FDR estimation. Varying the threshold we can learn which FDR values can be accurately estimated by the practitioner willing to apply the model (by the closeness between\FDR and true FDR). By plotting the calculated sensitivity and specicity (both known by simulation) vs the\FDR we can check the sensitivity and specicity of the corresponding\FDR based decision rules. For investigating the over-smoothing level of relative risk estimates we will compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag York Mollié model. All the summary tools are worked out for all simulated scenarios (in total 54 scenarios). Results show that FDR is well estimated (in the worst case we get an overestimation, hence a conservative FDR control) in small areas, low risk levels and spatially correlated risks scenarios, that are our primary aims. In such scenarios we have good estimates of the FDR for all values less or equal than 0.10. The sensitivity of\FDR based decision rules is generally low but specicity is high. In such scenario the use of\FDR = 0:05 or\FDR = 0:10 based selection rule can be suggested. In cases where the number of true alternative hypotheses (number of true high-risk areas) is small, also FDR = 0:15 values are well estimated, and \FDR = 0:15 based decision rules gains power maintaining an high specicity. On the other hand, in non-small areas and non-small risk level scenarios the FDR is under-estimated unless for very small values of it (much lower than 0.05); this resulting in a loss of specicity of a\FDR = 0:05 based decision rule. In such scenario\FDR = 0:05 or, even worse,\FDR = 0:1 based decision rules cannot be suggested because the true FDR is actually much higher. As regards the relative risk estimation, our model achieves almost the same results of the classic Besag York Molliè model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and the FDR control, except for non-small areas and large risk level scenarios. A case of study is nally presented to show how the method can be used in epidemiology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Among the experimental methods commonly used to define the behaviour of a full scale system, dynamic tests are the most complete and efficient procedures. A dynamic test is an experimental process, which would define a set of characteristic parameters of the dynamic behaviour of the system, such as natural frequencies of the structure, mode shapes and the corresponding modal damping values associated. An assessment of these modal characteristics can be used both to verify the theoretical assumptions of the project, to monitor the performance of the structural system during its operational use. The thesis is structured in the following chapters: The first introductive chapter recalls some basic notions of dynamics of structure, focusing the discussion on the problem of systems with multiply degrees of freedom (MDOF), which can represent a generic real system under study, when it is excited with harmonic force or in free vibration. The second chapter is entirely centred on to the problem of dynamic identification process of a structure, if it is subjected to an experimental test in forced vibrations. It first describes the construction of FRF through classical FFT of the recorded signal. A different method, also in the frequency domain, is subsequently introduced; it allows accurately to compute the FRF using the geometric characteristics of the ellipse that represents the direct input-output comparison. The two methods are compared and then the attention is focused on some advantages of the proposed methodology. The third chapter focuses on the study of real structures when they are subjected to experimental test, where the force is not known, like in an ambient or impact test. In this analysis we decided to use the CWT, which allows a simultaneous investigation in the time and frequency domain of a generic signal x(t). The CWT is first introduced to process free oscillations, with excellent results both in terms of frequencies, dampings and vibration modes. The application in the case of ambient vibrations defines accurate modal parameters of the system, although on the damping some important observations should be made. The fourth chapter is still on the problem of post processing data acquired after a vibration test, but this time through the application of discrete wavelet transform (DWT). In the first part the results obtained by the DWT are compared with those obtained by the application of CWT. Particular attention is given to the use of DWT as a tool for filtering the recorded signal, in fact in case of ambient vibrations the signals are often affected by the presence of a significant level of noise. The fifth chapter focuses on another important aspect of the identification process: the model updating. In this chapter, starting from the modal parameters obtained from some environmental vibration tests, performed by the University of Porto in 2008 and the University of Sheffild on the Humber Bridge in England, a FE model of the bridge is defined, in order to define what type of model is able to capture more accurately the real dynamic behaviour of the bridge. The sixth chapter outlines the necessary conclusions of the presented research. They concern the application of a method in the frequency domain in order to evaluate the modal parameters of a structure and its advantages, the advantages in applying a procedure based on the use of wavelet transforms in the process of identification in tests with unknown input and finally the problem of 3D modeling of systems with many degrees of freedom and with different types of uncertainty.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pig meat quality is determined by several parameters, such as lipid content, tenderness, water-holding capacity, pH, color and flavor, that affect consumers’ acceptance and technological properties of meat. Carcass quality parameters are important for the production of fresh and dry-cure high-quality products, in particular the fat deposition and the lean cut yield. The identification of genes and markers associated with meat and carcass quality traits is of prime interest, for the possibility of improving the traits by marker-assisted selection (MAS) schemes. Therefore, the aim of this thesis was to investigate seven candidate genes for meat and carcass quality traits in pigs. In particular, we focused on genes belonging to the family of the lipid droplet coat proteins perilipins (PLIN1 and PLIN2) and to the calpain/calpastatin system (CAST, CAPN1, CAPN3, CAPNS1) and on the gene encoding for PPARg-coactivator 1A (PPARGC1A). In general, the candidate genes investigation included the protein localization, the detection of polymorphisms, the association analysis with meat and carcass traits and the analysis of the expression level, in order to assess the involvement of the gene in pork quality. Some of the analyzed genes showed effects on various pork traits that are subject to selection in genetic improvement programs, suggesting a possible involvement of the genes in controlling the traits variability. In particular, significant association results have been obtained for PLIN2, CAST and PPARGC1A genes, that are worthwhile of further validation. The obtained results contribute to a better understanding of biological mechanisms important for pig production as well as for a possible use of pig as animal model for studies regarding obesity in humans.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this work is to characterize the genome of the chromosome 1 of A.thaliana, a small flowering plants used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. In order to accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of the three different dichotomic classes. The descriptive analysis of binary strings indicate the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences whose existence is a clue of an underlying error detection and correction mechanism. No doubt, further studies are needed in order to assess how the information carried by dichotomic classes could discriminate between coding and noncoding sequence and, therefore, contribute to unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis three measurements of top-antitop differential cross section at an energy in the center of mass of 7 TeV will be shown, as a function of the transverse momentum, the mass and the rapidity of the top-antitop system. The analysis has been carried over a data sample of about 5/fb recorded with the ATLAS detector. The events have been selected with a cut based approach in the "one lepton plus jets" channel, where the lepton can be either an electron or a muon. The most relevant backgrounds (multi-jet QCD and W+jets) have been extracted using data driven methods; the others (Z+ jets, diboson and single top) have been simulated with Monte Carlo techniques. The final, background-subtracted, distributions have been corrected, using unfolding methods, for the detector and selection effects. At the end, the results have been compared with the theoretical predictions. The measurements are dominated by the systematic uncertainties and show no relevant deviation from the Standard Model predictions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The diagnosis, grading and classification of tumours has benefited considerably from the development of DCE-MRI which is now essential to the adequate clinical management of many tumour types due to its capability in detecting active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves vs time is a very simple yet operator dependent procedure, therefore more objective approaches have been developed in order to facilitate comparison between studies. In so called model free approaches, descriptive or heuristic information extracted from time series raw data have been used for tissue classification. The main issue concerning these schemes is that they have not a direct interpretation in terms of physiological properties of the tissues. On the other hand, model based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied on region of interests opportunely selected by the physician. This approach has the advantage to provide parameters directly related to the pathophysiological properties of the tissue such as vessel permeability, local regional blood flow, extraction fraction, concentration gradient between plasma and extravascular-extracellular space. Anyway, nonlinear modelling is computational demanding and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is investigate the use of semi-quantitative and quantitative parameters for segmentation and classification of breast lesion. The objectives can be subdivided as follow: describe the principal techniques to evaluate time intensity curve in DCE-MRI with focus on kinetic model proposed in literature; to evaluate the influence in parametrization choice for a classic bi-compartmental kinetic models; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; to evaluate performance of machine learning techniques training for segmentation and classification of breast lesion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is a need to identify factors that are able to influence health in old age and to develop interventions that could slow down the process of aging and its associated pathologies. Lifestyle modifications, and especially nutrition, appear to be promising strategies to promote healthy aging. Their impact on aging biomarkers has been poorly investigated. In the first part of this work, we evaluated the impact of a one-year Mediterranean-like diet, delivered within the framework of the NU-AGE project in 120 elderly subjects, on epigenetic age acceleration measures assessed with Horvath’s clock. We observed a rejuvenation of participants after nutritional intervention. The effect was more marked in the group of Polish females and in subjects who were epigenetically older at baseline. In the second part of this work, we developed a new model of epigenetic biomarker, based on a gene-targeted approach with the EpiTYPER® system. We selected six regions of interest (associated with ELOVL2, NHLRC1, SIRT7/MAFG, AIM2, EDARADD and TFAP2E genes) and constructed our model through a ridge regression analysis. In controls, estimation of chronological age was accurate, with a correlation coefficient between predicted and chronological age of 0.92 and a mean absolute deviation of 4.70 years. Our model was able to capture phenomena of accelerated or decelerated aging, in Down syndrome subjects and centenarians and offspring respectively. Applying our model to samples of the NU-AGE project, we observed similar results to the ones obtained with the canonical epigenetic clock, with a rejuvenation of the individuals after one-year of nutritional intervention. Together, our findings indicate that nutrition can promote epigenetic rejuvenation and that epigenetic age acceleration measures could be suitable biomarkers to evaluate their impact. We demonstrated that the effect of the dietary intervention is country-, sex- and individual-specific, thus suggesting the need for a personalized approach to nutritional interventions.