11 resultados para Automatic Analysis of Multivariate Categorical Data Sets

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data-flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the life-time of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge amount of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and a particular care has to be used to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”), to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: with the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and to the maintenance of the data archives. However, the field I was personally responsible for was photometry and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curves production and analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Time Series Analysis of multispectral satellite data offers an innovative way to extract valuable information of our changing planet. This is now a real option for scientists thanks to data availability as well as innovative cloud-computing platforms, such as Google Earth Engine. The integration of different missions would mitigate known issues in multispectral time series construction, such as gaps due to clouds or other atmospheric effects. With this purpose, harmonization among Landsat-like missions is possible through statistical analysis. This research offers an overview of the different instruments from Landsat and Sentinel missions (TM, ETM, OLI, OLI-2 and MSI sensors) and products levels (Collection-2 Level-1 and Surface Reflectance for Landsat and Level-1C and Level-2A for Sentinel-2). Moreover, a cross-sensors comparison was performed to assess the interoperability of the sensors on-board Landsat and Sentinel-2 constellations, having in mind a possible combined use for time series analysis. Firstly, more than 20,000 pairs of images almost simultaneously acquired all over Europe were selected over a period of several years. The study performed a cross-comparison analysis on these data, and provided an assessment of the calibration coefficients that can be used to minimize differences in the combined use. Four of the most popular vegetation indexes were selected for the study: NDVI, EVI, SAVI and NDMI. As a result, it is possible to reconstruct a longer and denser harmonized time series since 1984, useful for vegetation monitoring purposes. Secondly, the spectral characteristics of the recent Landsat-9 mission were assessed for a combined use with Landsat-8 and Sentinel-2. A cross-sensor analysis of common bands of more than 3,000 almost simultaneous acquisitions verified a high consistency between datasets. The most relevant discrepancy has been observed in the blue and SWIRS bands, often used in vegetation and water related studies. This analysis was supported with spectroradiometer ground measurements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The cone penetration test (CPT), together with its recent variation (CPTU), has become the most widely used in-situ testing technique for soil profiling and geotechnical characterization. The knowledge gained over the last decades on the interpretation procedures in sands and clays is certainly wide, whilst very few contributions can be found as regards the analysis of CPT(u) data in intermediate soils. Indeed, it is widely accepted that at the standard rate of penetration (v = 20 mm/s), drained penetration occurs in sands while undrained penetration occurs in clays. However, a problem arise when the available interpretation approaches are applied to cone measurements in silts, sandy silts, silty or clayey sands, since such intermediate geomaterials are often characterized by permeability values within the range in which partial drainage is very likely to occur. Hence, the application of the available and well-established interpretation procedures, developed for ‘standard’ clays and sands, may result in invalid estimates of soil parameters. This study aims at providing a better understanding on the interpretation of CPTU data in natural sand and silt mixtures, by taking into account two main aspects, as specified below: 1)Investigating the effect of penetration rate on piezocone measurements, with the aim of identifying drainage conditions when cone penetration is performed at a standard rate. This part of the thesis has been carried out with reference to a specific CPTU database recently collected in a liquefaction-prone area (Emilia-Romagna Region, Italy). 2)Providing a better insight into the interpretation of piezocone tests in the widely studied silty sediments of the Venetian lagoon (Italy). Research has focused on the calibration and verification of some site-specific correlations, with special reference to the estimate of compressibility parameters for the assessment of long-term settlements of the Venetian coastal defences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the present work we perform an econometric analysis of the Tribal art market. To this aim, we use a unique and original database that includes information on Tribal art market auctions worldwide from 1998 to 2011. In Literature, art prices are modelled through the hedonic regression model, a classic fixed-effect model. The main drawback of the hedonic approach is the large number of parameters, since, in general, art data include many categorical variables. In this work, we propose a multilevel model for the analysis of Tribal art prices that takes into account the influence of time on artwork prices. In fact, it is natural to assume that time exerts an influence over the price dynamics in various ways. Nevertheless, since the set of objects change at every auction date, we do not have repeated measurements of the same items over time. Hence, the dataset does not constitute a proper panel; rather, it has a two-level structure in that items, level-1 units, are grouped in time points, level-2 units. The main theoretical contribution is the extension of classical multilevel models to cope with the case described above. In particular, we introduce a model with time dependent random effects at the second level. We propose a novel specification of the model, derive the maximum likelihood estimators and implement them through the E-M algorithm. We test the finite sample properties of the estimators and the validity of the own-written R-code by means of a simulation study. Finally, we show that the new model improves considerably the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints. Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The surface electrocardiogram (ECG) is an established diagnostic tool for the detection of abnormalities in the electrical activity of the heart. The interest of the ECG, however, extends beyond the diagnostic purpose. In recent years, studies in cognitive psychophysiology have related heart rate variability (HRV) to memory performance and mental workload. The aim of this thesis was to analyze the variability of surface ECG derived rhythms, at two different time scales: the discrete-event time scale, typical of beat-related features (Objective I), and the “continuous” time scale of separated sources in the ECG (Objective II), in selected scenarios relevant to psychophysiological and clinical research, respectively. Objective I) Joint time-frequency and non-linear analysis of HRV was carried out, with the goal of assessing psychophysiological workload (PPW) in response to working memory engaging tasks. Results from fourteen healthy young subjects suggest the potential use of the proposed indices in discriminating PPW levels in response to varying memory-search task difficulty. Objective II) A novel source-cancellation method based on morphology clustering was proposed for the estimation of the atrial wavefront in atrial fibrillation (AF) from body surface potential maps. Strong direct correlation between spectral concentration (SC) of atrial wavefront and temporal variability of the spectral distribution was shown in persistent AF patients, suggesting that with higher SC, shorter observation time is required to collect spectral distribution, from which the fibrillatory rate is estimated. This could be time and cost effective in clinical decision-making. The results held for reduced leads sets, suggesting that a simplified setup could also be considered, further reducing the costs. In designing the methods of this thesis, an online signal processing approach was kept, with the goal of contributing to real-world applicability. An algorithm for automatic assessment of ambulatory ECG quality, and an automatic ECG delineation algorithm were designed and validated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Analysts, politicians and international players from all over the world look at China as one of the most powerful countries on the international scenario, and as a country whose economic development can significantly impact on the economies of the rest of the world. However many aspects of this country have still to be investigated. First the still fundamental role played by Chinese rural areas for the general development of the country from a political, economic and social point of view. In particular, the way in which the rural areas have influenced the social stability of the whole country has been widely discussed due to their strict relationship with the urban areas where most people from the countryside emigrate searching for a job and a better life. In recent years many studies have mostly focused on the urbanization phenomenon with little interest in the living conditions in rural areas and in the deep changes which have occurred in some, mainly agricultural provinces. An analysis of the level of infrastructure is one of the main aspects which highlights the principal differences in terms of living conditions between rural and urban areas. In this thesis, I first carried out the analysis through the multivariate statistics approach (Principal Component Analysis and Cluster Analysis) in order to define the new map of rural areas based on the analysis of living conditions. In the second part I elaborated an index (Living Conditions Index) through the Fuzzy Expert/Inference System. Finally I compared this index (LCI) to the results obtained from the cluster analysis drawing geographic maps. The data source is the second national agricultural census of China carried out in 2006. In particular, I analysed the data refer to villages but aggregated at province level.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Perfusion CT imaging of the liver has potential to improve evaluation of tumour angiogenesis. Quantitative parameters can be obtained applying mathematical models to Time Attenuation Curve (TAC). However, there are still some difficulties for an accurate quantification of perfusion parameters due, for example, to algorithms employed, to mathematical model, to patient’s weight and cardiac output and to the acquisition system. In this thesis, new parameters and alternative methodologies about liver perfusion CT are presented in order to investigate the cause of variability of this technique. Firstly analysis were made to assess the variability related to the mathematical model used to compute arterial Blood Flow (BFa) values. Results were obtained implementing algorithms based on “ maximum slope method” and “Dual input one compartment model” . Statistical analysis on simulated data demonstrated that the two methods are not interchangeable. Anyway slope method is always applicable in clinical context. Then variability related to TAC processing in the application of slope method is analyzed. Results compared with manual selection allow to identify the best automatic algorithm to compute BFa. The consistency of a Standardized Perfusion Index (SPV) was evaluated and a simplified calibration procedure was proposed. At the end the quantitative value of perfusion map was analyzed. ROI approach and map approach provide related values of BFa and this means that pixel by pixel algorithm give reliable quantitative results. Also in pixel by pixel approach slope method give better results. In conclusion the development of new automatic algorithms for a consistent computation of BFa and the analysis and definition of simplified technique to compute SPV parameter, represent an improvement in the field of liver perfusion CT analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tractor rollover represent a primary cause of death or serious injury in agriculture and despite the mandatory Roll-Over Protective Structures (ROPS), that reduced the number of injuries, tractor accidents are still of great concern. Because of their versatility and wide use many studies on safety are concerned with the stability of tractors, but they often prefer controlled tests or laboratory tests. The evaluation of tractors working in field, instead, is a very complex issue because the rollover could be influenced by the interaction among operator, tractor and environment. Recent studies are oriented towards the evaluation of the actual working conditions developing prototypes for driver assistance and data acquisition. Currently these devices are produced and sold by manufacturers. A warning device was assessed in this study with the aim to evaluate its performance and to collect data on different variables influencing the dynamics of tractors in field by monitoring continuously the working conditions of tractors operating at the experimental farm of the Bologna University. The device consists of accelerometers, gyroscope, GSM/GPRS, GPS for geo-referencing and a transceiver for the automatic recognition of tractor-connected equipment. A microprocessor processes data and provides information, through a dedicated algorithm requiring data on the geometry of the tested tractor, on the level of risk for the operator in terms of probable loss of stability and suggests corrective measures to reduce the potential instability of the tractor.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent advent of Next-generation sequencing technologies has revolutionized the way of analyzing the genome. This innovation allows to get deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications with these data is the differential analysis, that is investigating if one gene exhibit a different expression level in correspondence of two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim will be statistical testing and for modeling these data the Negative Binomial distribution is considered the most adequate one especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because few information are usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the first-type error, while keeping elevate power. The method is finally illustrated on prostate cancer RNA-seq data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work is focused on the study of saltwater intrusion in coastal aquifers, and in particular on the realization of conceptual schemes to evaluate the risk associated with it. Saltwater intrusion depends on different natural and anthropic factors, both presenting a strong aleatory behaviour, that should be considered for an optimal management of the territory and water resources. Given the uncertainty of problem parameters, the risk associated with salinization needs to be cast in a probabilistic framework. On the basis of a widely adopted sharp interface formulation, key hydrogeological problem parameters are modeled as random variables, and global sensitivity analysis is used to determine their influence on the position of saltwater interface. The analyses presented in this work rely on an efficient model reduction technique, based on Polynomial Chaos Expansion, able to combine the best description of the model without great computational burden. When the assumptions of classical analytical models are not respected, and this occurs several times in the applications to real cases of study, as in the area analyzed in the present work, one can adopt data-driven techniques, based on the analysis of the data characterizing the system under study. It follows that a model can be defined on the basis of connections between the system state variables, with only a limited number of assumptions about the "physical" behaviour of the system.