14 results for Zero-inflated models, Statistical models, Poisson, Negative binomial, Statistical methods
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied to the modelling and processing of biomedical signals in two different contexts: ultrasound medical imaging systems and primate neural activity analysis and modelling. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from an ultrasonic transducer, and automatic image segmentation and classification of prostate ultrasound scans. In the former application, a stochastic model of the radio-frequency signal measured from an ultrasonic transducer is derived. This model is then employed to develop, in a statistical framework, a regularized deconvolution procedure for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then used to segment the images into regions of interest by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different regions of interest. In the context of neural activity signals, a bio-inspired dynamical network was developed to support studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in the 7a parietal region of primate monkeys during the execution of learned behavioural tasks.
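The abstract does not spell out the deconvolution step; purely as an illustration of the general idea of regularized deconvolution (not the procedure actually derived in the thesis), the sketch below applies a Tikhonov-regularized inverse filter in the frequency domain to a simulated RF trace. The pulse model, the regularization weight lam and all variable names are hypothetical.

    import numpy as np

    def tikhonov_deconvolve(rf_signal, pulse, lam=1e-2):
        """Frequency-domain Tikhonov-regularized deconvolution (illustrative only).

        rf_signal : measured RF trace (pulse convolved with reflectivity, plus noise)
        pulse     : assumed transducer pulse (point-spread function)
        lam       : regularization weight trading resolution against noise amplification
        """
        n = len(rf_signal)
        H = np.fft.rfft(pulse, n)                     # transfer function of the pulse
        Y = np.fft.rfft(rf_signal, n)
        X = np.conj(H) * Y / (np.abs(H) ** 2 + lam)   # regularized inverse filter
        return np.fft.irfft(X, n)

    # Toy usage: sparse reflectivity convolved with a Gaussian-modulated pulse.
    rng = np.random.default_rng(0)
    reflectivity = np.zeros(512)
    reflectivity[[100, 180, 300]] = [1.0, -0.6, 0.8]
    t = np.arange(-32, 32)
    pulse = np.exp(-(t / 8.0) ** 2) * np.cos(2 * np.pi * t / 10.0)
    rf = np.convolve(reflectivity, pulse, mode="same") + 0.02 * rng.standard_normal(512)
    estimate = tikhonov_deconvolve(rf, pulse, lam=1e-2)   # recovered up to a shift of about len(pulse)/2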
Abstract:
In this thesis two major topics inherent to medical ultrasound images are addressed: deconvolution and segmentation. In the first case, a deconvolution algorithm is described that allows statistically consistent maximum a posteriori estimates of the tissue reflectivity to be restored. These estimates are proven to provide a reliable source of information for achieving an accurate characterization of biological tissues through the ultrasound echo. The second topic involves the definition of a semi-automatic algorithm for myocardium segmentation in 2D echocardiographic images. The results show that the proposed method can reduce inter- and intra-observer variability in myocardial contour delineation and is feasible and accurate even on clinical data.
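In generic terms (a standard formulation, not the thesis's specific likelihood or prior), a maximum a posteriori estimate of the tissue reflectivity r from the measured echo y, with known pulse h and additive Gaussian noise of variance \sigma^2, reads:

    \hat{r}_{\mathrm{MAP}}
      = \arg\max_{r}\; p(y \mid r)\, p(r)
      = \arg\min_{r}\; \left\{ \frac{1}{2\sigma^{2}} \lVert y - h * r \rVert^{2} - \log p(r) \right\},

where the choice of the prior p(r) determines the regularization applied to the restored reflectivity.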
Abstract:
Neurodevelopment of preterm children has become an outcome of major interest since the improvement in survival due to advances in neonatal care. Many studies have focused on the relationships between prenatal characteristics and neurodevelopmental outcome in order to identify the higher-risk preterm subgroups. The aim of this study is to analyze and relate growth and development trajectories in order to investigate their association. 346 children born at the S.Orsola Hospital in Bologna from 01/01/2005 to 30/06/2011 with a birth weight of <1500 grams were followed up in a longitudinal study at different intervals from 3 to 24 months of corrected age. During follow-up visits, the preterms' main biometrical characteristics were measured and the Griffiths Mental Development Scale was administered to assess neurodevelopment. Latent Curve Models were developed to estimate the trajectories of length and of neurodevelopment, both separately and combined in a single model, and to assess the influence of clinical and socio-economic variables. The neurodevelopment trajectory declined stepwise over time, while the length trajectory showed a steep increase until 12 months and was flat afterwards. Higher initial values of length were correlated with higher initial values of neurodevelopment and predicted a more declining neurodevelopment. SGA preterms and those from families with higher status had a less declining neurodevelopment slope, while being born to a migrant mother proved negative for neurodevelopment through the mediating effect of being taller at 3 months. A longer stay in the NICU (used as a proxy of preterm morbidity) was predictive of lower initial neurodevelopment levels. At 24 months, neurodevelopment is more similar among preterms and is more accurately evaluated. The association between preterms' neurodevelopment and physiological growth may provide further insights on the determinants of preterm outcomes. Sound statistical methods, exploiting all the information collected in a longitudinal study, may be more appropriate for this analysis.
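For readers unfamiliar with the methodology, a generic (unconditional) latent curve model for a repeated measure y_{it} of child i at follow-up occasion t can be written as follows; this is only a sketch of the model class, not the exact specification estimated in the study:

    y_{it} = \alpha_i + \lambda_t \beta_i + \varepsilon_{it},
    \qquad
    \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix}
      \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \Psi \right),

where \alpha_i and \beta_i are the child-specific intercept and slope, \lambda_t are time loadings, and \Psi links initial status and rate of change; covariates (e.g., NICU stay, socio-economic status) enter as predictors of \alpha_i and \beta_i, and the length and neurodevelopment curves can be combined by allowing their random effects to covary.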
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation makes it possible to obtain deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and the Negative Binomial distribution is considered the most adequate for modeling these data, especially because it allows for overdispersion. However, the estimation of the dispersion parameter is a very delicate issue, because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to common procedures, since it is the best at reaching the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
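To make the role of the dispersion parameter concrete, the following sketch (with hypothetical function names, and unrelated to the mixture model and tests proposed in the thesis) parameterizes the Negative Binomial by its mean mu and dispersion phi, so that Var(Y) = mu + phi*mu^2, and maps this onto scipy's (n, p) parameterization.

    import numpy as np
    from scipy import stats

    def nb_counts(mu, phi, size, rng):
        """Draw Negative Binomial counts with mean mu and dispersion phi.

        Under this parameterization Var(Y) = mu + phi * mu**2, so phi -> 0
        recovers the Poisson and larger phi means stronger overdispersion.
        scipy's nbinom uses (n, p) with n = 1/phi and p = n / (n + mu).
        """
        n = 1.0 / phi
        p = n / (n + mu)
        return stats.nbinom.rvs(n, p, size=size, random_state=rng)

    rng = np.random.default_rng(1)
    y = nb_counts(mu=100.0, phi=0.2, size=5000, rng=rng)
    print(y.mean(), y.var())   # close to 100 and 100 + 0.2 * 100**2 = 2100

A misestimated phi directly distorts the null distribution of any test statistic built on this model, which is the source of the inflated type I errors discussed in the abstract.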
Abstract:
Logistics involves planning, managing, and organizing the flows of goods from the point of origin to the point of destination in order to meet some requirements. Logistics and transportation aspects are very important and represent a relevant cost for producing and shipping companies, but also for public administration and private citizens. The optimization of resources and the improvement in the organization of operations are crucial for all branches of logistics, from operations management to transportation. As this work shows, optimization techniques, models, and algorithms are important methods for solving the ever new and more complex problems arising in different segments of logistics. Many operations management and transportation problems are related to the class of optimization problems called Vehicle Routing Problems (VRPs). In this work, we consider several real-world deterministic and stochastic problems that belong to the wide class of VRPs, and we solve them by means of exact and heuristic methods. We treat three classes of real-world routing and logistics problems. We deal with one of the most important tactical problems arising in the management of bike sharing systems, namely the Bike sharing Rebalancing Problem (BRP). We propose models and algorithms for real-world earthwork optimization problems. We describe the 3DP process and highlight several optimization issues in 3DP. Among these, we define the problem related to tool path definition in the 3DP process, the 3D Routing Problem (3DRP), which is a generalization of the arc routing problem. We present an ILP model and several heuristic algorithms to solve the 3DRP.
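As a minimal, generic illustration of the constructive heuristics used for this family of routing problems (not one of the algorithms developed in the thesis), the sketch below builds a single closed route with a nearest-neighbour rule; the coordinates and names are hypothetical.

    import math

    def nearest_neighbour_route(points, depot=0):
        """Greedy single-vehicle tour: start at the depot and repeatedly visit
        the closest unvisited point, then return to the depot. A common baseline
        for VRP-type problems, usually improved afterwards by local search."""
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])

        unvisited = set(range(len(points))) - {depot}
        route, current = [depot], depot
        while unvisited:
            nxt = min(unvisited, key=lambda j: dist(points[current], points[j]))
            route.append(nxt)
            unvisited.remove(nxt)
            current = nxt
        route.append(depot)   # close the tour
        return route

    pts = [(0, 0), (2, 1), (5, 4), (1, 3), (4, 0)]   # depot plus four customers
    print(nearest_neighbour_route(pts))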
Abstract:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above, we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results that enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean children. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit circle.
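For reference, the 2-Wasserstein distance underpinning point (iii) has, for probability measures \mu and \nu on the real line, the well-known quantile-function representation (a standard fact, not a contribution of the thesis):

    W_2^2(\mu, \nu) = \int_0^1 \bigl( F_\mu^{-1}(u) - F_\nu^{-1}(u) \bigr)^2 \, du,

where F_\mu^{-1} and F_\nu^{-1} are the quantile functions; this closed form is what makes principal component analysis and regression for distributions on \mathbb{R} computationally tractable.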
Abstract:
Environmental computer models are deterministic models devoted to predicting environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased, with unknown calibration, and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations are more accurate, since they essentially provide the true levels. Due to the leading role played by numerical models, it is now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast, in real time, the current 8-hour average ozone level, defined as the average of the previous four hours, the current hour, and predictions for the next three hours. We propose a Bayesian downscaler model based on first differences, with a flexible coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows substantial improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
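In symbols (merely restating the definition given above, with hypothetical notation), the forecast target at hour t mixes observed hourly ozone values O and model-based predictions \hat{O}:

    \bar{O}^{(8)}_t = \frac{1}{8} \left( \sum_{k=-4}^{0} O_{t+k} + \sum_{k=1}^{3} \hat{O}_{t+k} \right).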
Abstract:
This doctoral work gains deeper insight into the dynamics of knowledge flows within and across clusters, unfolding their features, directions and strategic implications. Alliances, networks and personnel mobility are acknowledged as the three main channels of inter-firm knowledge flows, thus offering three heterogeneous measures to analyze the phenomenon. The interplay between the three channels and the richness of available research methods has allowed for the elaboration of three different papers and perspectives. The common empirical setting is the IT cluster in Bangalore, chosen for its distinctive features as a high-tech cluster and for its steady double-digit yearly growth around the service-based business model. The first paper deploys both a firm-level and a tie-level analysis, exploring the cases of 4 domestic companies and 2 MNCs active in the cluster, according to a cluster-based perspective. The distinction between business-domain knowledge and technical knowledge emerges from the qualitative evidence and is further confirmed by quantitative analyses at tie level. At firm level, the degree of specialization seems to influence the kind of knowledge shared, while at tie level both the frequency of interaction and the governance mode prove to determine differences in the distribution of knowledge flows. The second paper zooms out and considers inter-firm networks; focusing in particular on the role of the cluster boundary, internal and external networks are analyzed in their size, long-term orientation and exploration degree. The research method is purely qualitative and allows for the observation of the evolving strategic role of the internal network: from exploitation-based to exploration-based. Moreover, a causal pattern is emphasized, linking the evolution and features of the external network to the evolution and features of the internal network. The final paper addresses the softer and more micro-level side of knowledge flows: personnel mobility. A social capital perspective is developed here, which considers both employee acquisition and employee loss as building inter-firm ties, thus enhancing the company's overall social capital. Negative binomial regression analyses at dyad level test the significant impact of cluster affiliation (cluster firms vs non-cluster firms), industry affiliation (IT firms vs non-IT firms) and foreign affiliation (MNCs vs domestic firms) in shaping the uneven distribution of personnel mobility, and thus of knowledge flows, among companies.
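As a generic illustration of the dyad-level count regressions mentioned above (synthetic data and hypothetical variable names, not the thesis's actual specification or dataset), a negative binomial model can be fitted with statsmodels as follows:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    n_dyads = 500

    # Hypothetical binary dyad-level covariates: cluster, industry and foreign affiliation
    X = rng.integers(0, 2, size=(n_dyads, 3)).astype(float)
    X = sm.add_constant(X)                       # add an intercept column

    # Synthetic overdispersed mobility counts (gamma-Poisson mixture)
    mu = np.exp(X @ np.array([0.5, 0.8, 0.4, -0.3]))
    y = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=n_dyads))

    # Negative binomial GLM with log link; alpha is the dispersion parameter
    model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5))
    print(model.fit().summary())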
Abstract:
Recently, JPL's MarCO mission demonstrated that CubeSats are also mature enough to be employed in deep space, albeit with the limitations related to the commercial components employed. Currently, other deep space CubeSats are planned either as stand-alone missions or as companions of a traditional large probe. Therefore, developing a dedicated navigation suite is crucial to reaching the mission's goals, considering the limitations of the onboard components compared to typical deep space missions. In this framework, the LICIACube mission represents an ideal candidate test bench, as it performs a flyby of the Didymos asteroid system subject to strong position, epoch, and pointing requirements. This mission will also allow us to infer the capabilities of such microsatellites and highlight their limitations, to be weighed against the benefits of a lighter design and tailoring effort. In this work, the orbit determination (OD) and guidance methods and tools adopted for classical deep space missions have been tailored to CubeSat applications and validated through extensive analyses. In addition, navigation procedures and interfaces have been designed in view of the operations foreseen in late 2022. The pre-launch covariance analysis has been performed to assess the mission's feasibility for the nominal trajectory and its associated uncertainties, based on conservative assumptions on the main parameters. Extensive sensitivity analyses have been carried out to understand the main mission parameters affecting the performance and to demonstrate the robustness of the designed trajectory and operation schedule in fulfilling the mission requirements. The developed system was also stressed by tuning the models to assess different reconstruction methods for the maneuvers. The analysis demonstrated the feasibility of the LICIACube mission navigation in compliance with the mission requirements, compatible with the limited resources available, both in space and on the ground.
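As general background on the covariance analysis mentioned above (the textbook linear formulation, not the specific LICIACube filter setup), the state covariance is propagated between tracking epochs and updated at each measurement as:

    P_{k+1}^{-} = \Phi_k P_k^{+} \Phi_k^{\top} + Q_k,
    \qquad
    P_{k+1}^{+} = \bigl( I - K_{k+1} H_{k+1} \bigr) P_{k+1}^{-},

where \Phi_k is the state transition matrix, Q_k the process noise, H_{k+1} the measurement partials and K_{k+1} the gain; a pre-launch covariance analysis evaluates these quantities along the nominal trajectory, without real data, to predict the achievable orbit determination accuracy.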
Abstract:
It is well known that the deposition of gaseous pollutants and aerosols plays a major role in causing the deterioration of monuments and built cultural heritage in European cities. Despite the many studies dedicated to the environmental damage of cultural heritage, in the case of cement mortars, commonly used in 20th-century architecture, the deterioration due to the impact of air multi-pollutants, and especially the formation of black crusts, is still not well explored, making this issue a challenging area of research. This work centers on cement mortar–environment interactions, focusing on the diagnosis of the damage to modern built heritage due to air multi-pollutants. For this purpose three sites, exposed to different urban areas in Europe, were selected for sampling and subsequent laboratory analyses: Centennial Hall, Wroclaw (Poland); Chiesa dell'Autostrada del Sole, Florence (Italy); and Casa Galleria Vichi, Florence (Italy). The sampling sessions were performed taking into account the height from ground level and the protection from rain run-off (sheltered, partly sheltered and exposed areas). The complete characterization of the collected damage layers and underlying materials was performed using a range of analytical techniques: optical and scanning electron microscopy, X-ray diffractometry, differential thermal and thermogravimetric analysis, ion chromatography, flash combustion/gas chromatographic analysis, and inductively coupled plasma optical emission spectrometry. The data were elaborated using statistical methods (i.e., principal component analysis), and the enrichment factor for cement mortars was calculated for the first time. The results obtained from the experimental activity performed on the damage layers indicate that gypsum, due to the deposition of atmospheric sulphur compounds, is the main damage product at surfaces sheltered from rain run-off at Centennial Hall and Casa Galleria Vichi. By contrast, gypsum has not been identified in the samples collected at Chiesa dell'Autostrada del Sole. This is connected to the restoration works, particularly surface cleaning, regularly performed for the maintenance of the building. Moreover, the results obtained demonstrate the correlation between the location of the building and the composition of the damage layer: Centennial Hall is mainly subject to the impact of pollutants emitted from the nearby coal power stations, whilst Casa Galleria Vichi is principally affected by pollutants from vehicular exhaust in front of the building.
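For context, the enrichment factor of an element X in the damage layer is conventionally defined as below (standard formula; the reference element and background composition used in the thesis are not specified in the abstract):

    \mathrm{EF}_X = \frac{\left( C_X / C_{\mathrm{ref}} \right)_{\text{damage layer}}}
                         {\left( C_X / C_{\mathrm{ref}} \right)_{\text{background}}},

where C_{\mathrm{ref}} is the concentration of a conservative reference element; values well above 1 point to an external (e.g., atmospheric) source of X rather than the mortar itself.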
Abstract:
Throughout the twentieth century statistical methods have increasingly become part of experimental research. In particular, statistics has made quantification processes meaningful in the soft sciences, which had traditionally relied on activities such as collecting and describing diversity rather than timing variation. The thesis explores this change in relation to agriculture and biology, focusing on analysis of variance and experimental design, the statistical methods developed by the mathematician and geneticist Ronald Aylmer Fisher during the 1920s. The role that Fisher's methods acquired as tools of scientific research, side by side with the laboratory equipment and the field practices adopted by research workers, is here investigated bottom-up, beginning with the computing instruments and the information technologies that were the tools of the trade for statisticians. Four case studies show, from several perspectives, the interaction of statistics, computing and information technologies, giving on the one hand an overview of the main tools – mechanical calculators, statistical tables, punched and index cards, standardised forms, digital computers – adopted in the period, and on the other pointing out how these tools complemented each other and were instrumental for the development and dissemination of analysis of variance and experimental design. The period considered is the half-century from the early 1920s to the late 1960s, the institutions investigated are Rothamsted Experimental Station and the Galton Laboratory, and the statisticians examined are Ronald Fisher and Frank Yates.
Abstract:
This Ph.D. thesis focuses on the investigation of some chemical and sensorial analytical parameters linked to the quality and purity of different categories of oils obtained from olives: extra virgin olive oils (EVOOs), both those sold in large retail trade (supermarkets and discount stores) and those collected directly at some Italian mills, and lower-quality oils (refined, lampante and "repaso"). Alongside the adoption of traditional and well-known analytical procedures such as gas chromatography and high-performance liquid chromatography, I set up innovative, fast and environmentally friendly methods. For example, I developed some analytical approaches based on Fourier transform mid-infrared spectroscopy (FT-MIR) and time domain reflectometry (TDR), coupled with a robust chemometric elaboration of the results. I also investigated some other freshness and quality markers that are not included among the official parameters (in Italian and European regulations): the adoption of such a full chemical and sensorial analytical plan allowed me to obtain interesting information about the degree of quality of the EVOOs, mostly within the Italian market. Here the range of quality of EVOOs proved to be very wide, in terms of sensory attributes, price classes and chemical parameters. Thanks to the collaboration with other Italian and foreign research groups, I carried out several applicative studies, especially focusing on the shelf-life of oils obtained from olives and on the effects of thermal stresses on the quality of the products. I also studied some innovative technological treatments, such as clarification using inert gases as an alternative to traditional filtration. Moreover, during a three-and-a-half-month research stay at the University of Applied Sciences in Zurich, I carried out a study on the application of statistical methods for the elaboration of sensory results, obtained thanks to the official Swiss Panel and to some consumer tests.
Abstract:
Despite the scientific achievements of the last decades in the astrophysical and cosmological fields, the majority of the Universe's energy content is still unknown. A potential solution to the "missing mass problem" is the existence of dark matter in the form of WIMPs. Due to the very small cross section for WIMP-nucleon interactions, the number of expected events is very limited (about 1 event/tonne/year), thus requiring detectors with a large target mass and a low background level. The aim of the XENON1T experiment, the first tonne-scale LXe-based detector, is to be sensitive to WIMP-nucleon cross sections as low as 10^-47 cm^2. To investigate whether such a detector can reach its goal, Monte Carlo simulations are mandatory for estimating the background. To this aim, the GEANT4 toolkit has been used to implement the detector geometry and to simulate the decays from the various background sources, electromagnetic and nuclear. From the analysis of the simulations, the level of background has been found to be fully acceptable for the experiment's purposes: about 1 background event in a 2 tonne-year exposure. Using the Maximum Gap method, the XENON1T sensitivity has been evaluated and the minimum of the WIMP-nucleon cross section has been found at 1.87 x 10^-47 cm^2, at 90% CL, for a WIMP mass of 45 GeV/c^2. The results have been independently cross-checked using the Likelihood Ratio method, which confirmed them with an agreement within less than a factor of two. Such a result is completely acceptable considering the intrinsic differences between the two statistical methods. Thus, this PhD thesis shows that the XENON1T detector will be able to reach the designed sensitivity, lowering the limits on the WIMP-nucleon cross section by about 2 orders of magnitude with respect to current experiments.
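For a rough sense of scale (a textbook zero-background counting argument, not the Maximum Gap or Likelihood Ratio calculations actually used in the thesis), observing zero events gives a 90% CL Poisson upper limit on the expected signal of

    e^{-N_{90}} = 0.10 \;\Longrightarrow\; N_{90} = \ln 10 \approx 2.3,

so the excluded cross section scales as \sigma_{90} \propto N_{90} / (\varepsilon \, M T), with \varepsilon the detection efficiency and M T the exposure; interval-based methods such as the Maximum Gap refine this when a small number of background events cannot be excluded.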