9 resultados para Data analysis system
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data-flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the life-time of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge amount of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and a particular care has to be used to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”), to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: with the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and to the maintenance of the data archives. However, the field I was personally responsible for was photometry and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curves production and analysis.
Resumo:
The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important source of information in cosmology. The excellent accuracy of the recent CMB data of WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis. We focus on their optimization in order to have a complete exploitation of the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low resolution maps, the comparison of the Angular Power Spectrum (APS) extraction performances of the optimal QML method, implemented in the code called BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method has been then applied to the Planck data at large angular scales to extract the CMB APS. The same method has been applied also to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model, the possible origins for this results have been discussed. The Cromaster code instead has been applied to the 408 MHz and 1.42 GHz surveys focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, for which are necessary high accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device and its performances are compared to the devices used nowadays.
Resumo:
This thesis is a collection of works focused on the topic of Earthquake Early Warning, with a special attention to large magnitude events. The topic is addressed from different points of view and the structure of the thesis reflects the variety of the aspects which have been analyzed. The first part is dedicated to the giant, 2011 Tohoku-Oki earthquake. The main features of the rupture process are first discussed. The earthquake is then used as a case study to test the feasibility Early Warning methodologies for very large events. Limitations of the standard approaches for large events arise in this chapter. The difficulties are related to the real-time magnitude estimate from the first few seconds of recorded signal. An evolutionary strategy for the real-time magnitude estimate is proposed and applied to the single Tohoku-Oki earthquake. In the second part of the thesis a larger number of earthquakes is analyzed, including small, moderate and large events. Starting from the measurement of two Early Warning parameters, the behavior of small and large earthquakes in the initial portion of recorded signals is investigated. The aim is to understand whether small and large earthquakes can be distinguished from the initial stage of their rupture process. A physical model and a plausible interpretation to justify the observations are proposed. The third part of the thesis is focused on practical, real-time approaches for the rapid identification of the potentially damaged zone during a seismic event. Two different approaches for the rapid prediction of the damage area are proposed and tested. The first one is a threshold-based method which uses traditional seismic data. Then an innovative approach using continuous, GPS data is explored. Both strategies improve the prediction of large scale effects of strong earthquakes.
Resumo:
The aging process is characterized by the progressive fitness decline experienced at all the levels of physiological organization, from single molecules up to the whole organism. Studies confirmed inflammaging, a chronic low-level inflammation, as a deeply intertwined partner of the aging process, which may provide the “common soil” upon which age-related diseases develop and flourish. Thus, albeit inflammation per se represents a physiological process, it can rapidly become detrimental if it goes out of control causing an excess of local and systemic inflammatory response, a striking risk factor for the elderly population. Developing interventions to counteract the establishment of this state is thus a top priority. Diet, among other factors, represents a good candidate to regulate inflammation. Building on top of this consideration, the EU project NU-AGE is now trying to assess if a Mediterranean diet, fortified for the elderly population needs, may help in modulating inflammaging. To do so, NU-AGE enrolled a total of 1250 subjects, half of which followed a 1-year long diet, and characterized them by mean of the most advanced –omics and non –omics analyses. The aim of this thesis was the development of a solid data management pipeline able to efficiently cope with the results of these assays, which are now flowing inside a centralized database, ready to be used to test the most disparate scientific hypotheses. At the same time, the work hereby described encompasses the data analysis of the GEHA project, which was focused on identifying the genetic determinants of longevity, with a particular focus on developing and applying a method for detecting epistatic interactions in human mtDNA. Eventually, in an effort to propel the adoption of NGS technologies in everyday pipeline, we developed a NGS variant calling pipeline devoted to solve all the sequencing-related issues of the mtDNA.
Resumo:
The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi in 2004, and developed by Trevisan, Uboldi and Carrassi, to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment in the unstable subspace of the forecast-analysis cycle system so that it will have the same structure of the dominant instabilities of the system. The unstable subspace is estimated by Breeding on the Data Assimilation System (BDAS). AUS- BDAS has already been tested in realistic models and observational configurations, including a Quasi-Geostrophicmodel and a high dimensional, primitive equation ocean model; the experiments include both fixed and“adaptive”observations. In these contexts, the AUS-BDAS approach greatly reduces the analysis error, with reasonable computational costs for data assimilation with respect, for example, to a prohibitive full Extended Kalman Filter. This is a follow-up study in which we revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective model. We run observation system simulation experiments in a perfect model setting, and with two types of model error as well: random and systematic. In the different configurations examined, and in a perfect model setting, AUS once again shows better efficiency than other advanced data assimilation schemes. In the present study, we develop an iterative scheme that leads to a significant improvement of the overall assimilation performance with respect also to standard AUS. In particular, it boosts the efficiency of regime’s changes tracking, with a low computational cost. Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned for the specific model at hand. In Numerical Weather Prediction models, tuning of parameters — and in particular an estimate of the model error covariance matrix — may turn out to be quite difficult. Our proposed approach, instead, may be easier to implement in operational models.
Resumo:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.
Resumo:
The quench characteristics of second generation (2 G) YBCO Coated Conductor (CC) tapes are of fundamental importance for the design and safe operation of superconducting cables and magnets based on this material. Their ability to transport high current densities at high temperature, up to 77 K, and at very high fields, over 20 T, together with the increasing knowledge in their manufacturing, which is reducing their cost, are pushing the use of this innovative material in numerous system applications, from high field magnets for research to motors and generators as well as for cables. The aim of this Ph. D. thesis is the experimental analysis and numerical simulations of quench in superconducting HTS tapes and coils. A measurements facility for the characterization of superconducting tapes and coils was designed, assembled and tested. The facility consist of a cryostat, a cryocooler, a vacuum system, resistive and superconducting current leads and signal feedthrough. Moreover, the data acquisition system and the software for critical current and quench measurements were developed. A 2D model was developed using the finite element code COMSOL Multiphysics R . The problem of modeling the high aspect ratio of the tape is tackled by multiplying the tape thickness by a constant factor, compensating the heat and electrical balance equations by introducing a material anisotropy. The model was then validated both with the results of a 1D quench model based on a non-linear electric circuit coupled to a thermal model of the tape, to literature measurements and to critical current and quench measurements made in the cryogenic facility. Finally the model was extended to the study of coils and windings with the definition of the tape and stack homogenized properties. The procedure allows the definition of a multi-scale hierarchical model, able to simulate the windings with different degrees of detail.
Resumo:
Coastal sand dunes represent a richness first of all in terms of defense from the sea storms waves and the saltwater ingression; moreover these morphological elements constitute an unique ecosystem of transition between the sea and the land environment. The research about dune system is a strong part of the coastal sciences, since the last century. Nowadays this branch have assumed even more importance for two reasons: on one side the born of brand new technologies, especially related to the Remote Sensing, have increased the researcher possibilities; on the other side the intense urbanization of these days have strongly limited the dune possibilities of development and fragmented what was remaining from the last century. This is particularly true in the Ravenna area, where the industrialization united to the touristic economy and an intense subsidence, have left only few dune ridges residual still active. In this work three different foredune ridges, along the Ravenna coast, have been studied with Laser Scanner technology. This research didn’t limit to analyze volume or spatial difference, but try also to find new ways and new features to monitor this environment. Moreover the author planned a series of test to validate data from Terrestrial Laser Scanner (TLS), with the additional aim of finalize a methodology to test 3D survey accuracy. Data acquired by TLS were then applied on one hand to test some brand new applications, such as Digital Shore Line Analysis System (DSAS) and Computational Fluid Dynamics (CFD), to prove their efficacy in this field; on the other hand the author used TLS data to find any correlation with meteorological indexes (Forcing Factors), linked to sea and wind (Fryberger's method) applying statistical tools, such as the Principal Component Analysis (PCA).