5 resultados para Model selection
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
In the present study we are using multi variate analysis techniques to discriminate signal from background in the fully hadronic decay channel of ttbar events. We give a brief introduction to the role of the Top quark in the standard model and a general description of the CMS Experiment at LHC. We have used the CMS experiment computing and software infrastructure to generate and prepare the data samples used in this analysis. We tested the performance of three different classifiers applied to our data samples and used the selection obtained with the Multi Layer Perceptron classifier to give an estimation of the statistical and systematical uncertainty on the cross section measurement.
Resumo:
This thesis presents a creative and practical approach to dealing with the problem of selection bias. Selection bias may be the most important vexing problem in program evaluation or in any line of research that attempts to assert causality. Some of the greatest minds in economics and statistics have scrutinized the problem of selection bias, with the resulting approaches – Rubin’s Potential Outcome Approach(Rosenbaum and Rubin,1983; Rubin, 1991,2001,2004) or Heckman’s Selection model (Heckman, 1979) – being widely accepted and used as the best fixes. These solutions to the bias that arises in particular from self selection are imperfect, and many researchers, when feasible, reserve their strongest causal inference for data from experimental rather than observational studies. The innovative aspect of this thesis is to propose a data transformation that allows measuring and testing in an automatic and multivariate way the presence of selection bias. The approach involves the construction of a multi-dimensional conditional space of the X matrix in which the bias associated with the treatment assignment has been eliminated. Specifically, we propose the use of a partial dependence analysis of the X-space as a tool for investigating the dependence relationship between a set of observable pre-treatment categorical covariates X and a treatment indicator variable T, in order to obtain a measure of bias according to their dependence structure. The measure of selection bias is then expressed in terms of inertia due to the dependence between X and T that has been eliminated. Given the measure of selection bias, we propose a multivariate test of imbalance in order to check if the detected bias is significant, by using the asymptotical distribution of inertia due to T (Estadella et al. 2005) , and by preserving the multivariate nature of data. Further, we propose the use of a clustering procedure as a tool to find groups of comparable units on which estimate local causal effects, and the use of the multivariate test of imbalance as a stopping rule in choosing the best cluster solution set. The method is non parametric, it does not call for modeling the data, based on some underlying theory or assumption about the selection process, but instead it calls for using the existing variability within the data and letting the data to speak. The idea of proposing this multivariate approach to measure selection bias and test balance comes from the consideration that in applied research all aspects of multivariate balance, not represented in the univariate variable- by-variable summaries, are ignored. The first part contains an introduction to evaluation methods as part of public and private decision process and a review of the literature of evaluation methods. The attention is focused on Rubin Potential Outcome Approach, matching methods, and briefly on Heckman’s Selection Model. The second part focuses on some resulting limitations of conventional methods, with particular attention to the problem of how testing in the correct way balancing. The third part contains the original contribution proposed , a simulation study that allows to check the performance of the method for a given dependence setting and an application to a real data set. Finally, we discuss, conclude and explain our future perspectives.
Resumo:
The presented study carried out an analysis on rural landscape changes. In particular the study focuses on the understanding of driving forces acting on the rural built environment using a statistical spatial model implemented through GIS techniques. It is well known that the study of landscape changes is essential for a conscious decision making in land planning. From a bibliography review results a general lack of studies dealing with the modeling of rural built environment and hence a theoretical modelling approach for such purpose is needed. The advancement in technology and modernity in building construction and agriculture have gradually changed the rural built environment. In addition, the phenomenon of urbanization of a determined the construction of new volumes that occurred beside abandoned or derelict rural buildings. Consequently there are two types of transformation dynamics affecting mainly the rural built environment that can be observed: the conversion of rural buildings and the increasing of building numbers. It is the specific aim of the presented study to propose a methodology for the development of a spatial model that allows the identification of driving forces that acted on the behaviours of the building allocation. In fact one of the most concerning dynamic nowadays is related to an irrational expansion of buildings sprawl across landscape. The proposed methodology is composed by some conceptual steps that cover different aspects related to the development of a spatial model: the selection of a response variable that better describe the phenomenon under study, the identification of possible driving forces, the sampling methodology concerning the collection of data, the most suitable algorithm to be adopted in relation to statistical theory and method used, the calibration process and evaluation of the model. A different combination of factors in various parts of the territory generated favourable or less favourable conditions for the building allocation and the existence of buildings represents the evidence of such optimum. Conversely the absence of buildings expresses a combination of agents which is not suitable for building allocation. Presence or absence of buildings can be adopted as indicators of such driving conditions, since they represent the expression of the action of driving forces in the land suitability sorting process. The existence of correlation between site selection and hypothetical driving forces, evaluated by means of modeling techniques, provides an evidence of which driving forces are involved in the allocation dynamic and an insight on their level of influence into the process. GIS software by means of spatial analysis tools allows to associate the concept of presence and absence with point futures generating a point process. Presence or absence of buildings at some site locations represent the expression of these driving factors interaction. In case of presences, points represent locations of real existing buildings, conversely absences represent locations were buildings are not existent and so they are generated by a stochastic mechanism. Possible driving forces are selected and the existence of a causal relationship with building allocations is assessed through a spatial model. The adoption of empirical statistical models provides a mechanism for the explanatory variable analysis and for the identification of key driving variables behind the site selection process for new building allocation. The model developed by following the methodology is applied to a case study to test the validity of the methodology. In particular the study area for the testing of the methodology is represented by the New District of Imola characterized by a prevailing agricultural production vocation and were transformation dynamic intensively occurred. The development of the model involved the identification of predictive variables (related to geomorphologic, socio-economic, structural and infrastructural systems of landscape) capable of representing the driving forces responsible for landscape changes.. The calibration of the model is carried out referring to spatial data regarding the periurban and rural area of the study area within the 1975-2005 time period by means of Generalised linear model. The resulting output from the model fit is continuous grid surface where cells assume values ranged from 0 to 1 of probability of building occurrences along the rural and periurban area of the study area. Hence the response variable assesses the changes in the rural built environment occurred in such time interval and is correlated to the selected explanatory variables by means of a generalized linear model using logistic regression. Comparing the probability map obtained from the model to the actual rural building distribution in 2005, the interpretation capability of the model can be evaluated. The proposed model can be also applied to the interpretation of trends which occurred in other study areas, and also referring to different time intervals, depending on the availability of data. The use of suitable data in terms of time, information, and spatial resolution and the costs related to data acquisition, pre-processing, and survey are among the most critical aspects of model implementation. Future in-depth studies can focus on using the proposed model to predict short/medium-range future scenarios for the rural built environment distribution in the study area. In order to predict future scenarios it is necessary to assume that the driving forces do not change and that their levels of influence within the model are not far from those assessed for the time interval used for the calibration.
Resumo:
One of the main targets of the CMS experiment is to search for the Standard Model Higgs boson. The 4-lepton channel (from the Higgs decay h->ZZ->4l, l = e,mu) is one of the most promising. The analysis is based on the identification of two opposite-sign, same-flavor lepton pairs: leptons are required to be isolated and to come from the same primary vertex. The Higgs would be statistically revealed by the presence of a resonance peak in the 4-lepton invariant mass distribution. The 4-lepton analysis at CMS is presented, spanning on its most important aspects: lepton identification, variables of isolation, impact parameter, kinematics, event selection, background control and statistical analysis of results. The search leads to an evidence for a signal presence with a statistical significance of more than four standard deviations. The excess of data, with respect to the background-only predictions, indicates the presence of a new boson, with a mass of about 126 GeV/c2 , decaying to two Z bosons, whose characteristics are compatible with the SM Higgs ones.
Resumo:
The development of a multibody model of a motorbike engine cranktrain is presented in this work, with an emphasis on flexible component model reduction. A modelling methodology based upon the adoption of non-ideal joints at interface locations, and the inclusion of component flexibility, is developed: both are necessary tasks if one wants to capture dynamic effects which arise in lightweight, high-speed applications. With regard to the first topic, both a ball bearing model and a journal bearing model are implemented, in order to properly capture the dynamic effects of the main connections in the system: angular contact ball bearings are modelled according to a five-DOF nonlinear scheme in order to grasp the crankshaft main bearings behaviour, while an impedance-based hydrodynamic bearing model is implemented providing an enhanced operation prediction at the conrod big end locations. Concerning the second matter, flexible models of the crankshaft and the connecting rod are produced. The well-established Craig-Bampton reduction technique is adopted as a general framework to obtain reduced model representations which are suitable for the subsequent multibody analyses. A particular component mode selection procedure is implemented, based on the concept of Effective Interface Mass, allowing an assessment of the accuracy of the reduced models prior to the nonlinear simulation phase. In addition, a procedure to alleviate the effects of modal truncation, based on the Modal Truncation Augmentation approach, is developed. In order to assess the performances of the proposed modal reduction schemes, numerical tests are performed onto the crankshaft and the conrod models in both frequency and modal domains. A multibody model of the cranktrain is eventually assembled and simulated using a commercial software. Numerical results are presented, demonstrating the effectiveness of the implemented flexible model reduction techniques. The advantages over the conventional frequency-based truncation approach are discussed.