17 results for Data Mining and its Application
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The electromagnetic spectrum can be regarded as a resource for the designer, as well as for the manufacturer, from two complementary points of view: first, because it is a good in great demand by many different kinds of applications; second, because despite its scarce availability, it may be advantageous to use more spectrum than necessary. This is the case of spread-spectrum systems, i.e. systems in which the transmitted signal is spread over a wide frequency band, much wider, in fact, than the minimum bandwidth required to transmit the information being sent. Part I of this dissertation deals with Spread-Spectrum Clock Generators (SSCG) aimed at reducing the Electromagnetic Interference (EMI) of clock signals in integrated circuit (IC) design. In particular, the modulation of the clock and the consequent spreading of its spectrum are obtained through a random modulating signal produced by a chaotic map, i.e. a discrete-time dynamical system exhibiting chaotic behavior. The advantages offered by this kind of modulation are highlighted. Three different prototypes of chaos-based SSCGs are presented in all their aspects: design, simulation, and post-fabrication measurements. The third one, operating at 3 GHz, targets Serial ATA, the de facto standard for fast data transmission to and from hard disk drives. The most extreme example of spread-spectrum signalling is the emerging ultra-wideband (UWB) technology, which proposes the use of large sections of the radio spectrum at low amplitudes to transmit high-bandwidth digital data. In Part II of the dissertation, two UWB applications are presented, both dealing with the advantages as well as with the challenges of a wide-band system, namely: a chaos-based sequence generation method for reducing Multiple Access Interference (MAI) in Direct Sequence UWB Wireless Sensor Networks (WSNs), and the design and simulation of a Low-Noise Amplifier (LNA) for impulse-radio UWB. This latter topic was studied during a study-abroad period in collaboration with Delft University of Technology, Delft, the Netherlands.
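As a rough, hedged illustration of the modulation principle (not the thesis' circuit nor its specific chaotic map), the following Python sketch frequency-modulates an ideal clock with a logistic-map sequence and compares the height of the spectral peak of the plain and spread clocks; all numbers are placeholder values.

# Illustrative sketch: chaos-driven spread-spectrum clocking (toy model).
import numpy as np

fs = 1e9          # simulation sample rate (Hz), illustrative value
f0 = 50e6         # nominal clock frequency (Hz), illustrative value
spread = 0.02     # +/-2% peak frequency deviation, illustrative value
n = 2**16
t = np.arange(n) / fs

# Chaotic modulating signal: iterate the logistic map (a generic chaotic map,
# standing in for the thesis' map), then centre it in [-1, 1].
x = np.empty(n)
x[0] = 0.37
for k in range(1, n):
    x[k] = 4.0 * x[k-1] * (1.0 - x[k-1])
m = 2.0 * x - 1.0

# Instantaneous frequency and phase of the modulated clock.
f_inst = f0 * (1.0 + spread * m)
phase = 2.0 * np.pi * np.cumsum(f_inst) / fs
clk_plain = np.sign(np.sin(2.0 * np.pi * f0 * t))
clk_spread = np.sign(np.sin(phase))

def peak_db(sig):
    # Fraction of power concentrated in the highest spectral bin, in dB.
    spec = np.abs(np.fft.rfft(sig * np.hanning(n)))**2
    return 10.0 * np.log10(spec.max() / spec.sum())

print("unmodulated clock, peak bin (dB rel.):", round(peak_db(clk_plain), 1))
print("spread-spectrum clock, peak bin (dB rel.):", round(peak_db(clk_spread), 1))

The spread clock shows a noticeably lower peak, which is the EMI-reduction effect the modulation is after.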
Abstract:
The purpose of this thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of galaxies (spectral features, colours, morphological indices), and to help unveil the evolution of these parameters from $z \sim 1$ to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects ($\sim 10\,000$ galaxies in the redshift range $0 < z \lesssim 1.2$) and its great reliability in redshift and spectral-property determinations, we first adopt and extend the \emph{classification cube method}, as developed by Mignoli et al. (2009), to exploit the bimodal properties of galaxies (spectral, photometric and morphological) separately, and then combine these three subclassifications. We use this classification method as a test for a newly devised statistical classification, based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering simultaneously up to 8 different properties. The PCA+UFP analysis is a very powerful and robust tool to probe the nature and the evolution of galaxies in a survey. It allows the classification of galaxies to be defined with smaller uncertainties, adding the flexibility to be adapted to different parameters: being a fuzzy classification, it avoids the problems of a hard classification, such as the classification cube presented in the first part of the work. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data and for this reason it can be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high. ``Early'' and ``late'' type galaxies are well defined by the spectral, photometric and morphological properties, both when these are considered separately and the classifications are then combined (classification cube) and when they are treated as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are ``averaged out'' during the process. The method allowed us to observe the \emph{downsizing} effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determination of the transition mass $M_{\mathrm{cross}}$ is in good agreement with other values in the literature.
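A minimal sketch of the PCA-plus-fuzzy-clustering idea follows. Fuzzy c-means is used here only as a generic stand-in for the UFP algorithm (whose exact formulation is not given in the abstract), and the data are random placeholders rather than zCOSMOS measurements.

# Sketch: PCA followed by a two-cluster fuzzy partition ("early" vs "late").
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)              # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** (2.0 / (m - 1.0)))
        u /= u.sum(axis=1, keepdims=True)          # normalised fuzzy memberships
    return centers, u

# X: one row per galaxy, columns = up to 8 properties (colours, spectral
# indices, morphological indices, ...); random data here as a placeholder.
X = np.random.default_rng(1).normal(size=(1000, 8))
Z = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
centers, memberships = fuzzy_cmeans(Z, c=2)
print(memberships[:5])   # degree of membership of each galaxy in each cluster

The soft memberships are what distinguishes this from a hard classification: an object close to the boundary simply gets comparable membership in both clusters instead of being forced into one.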
Abstract:
In this thesis we present the implementation of the quadratic maximum likelihood (QML) method, ideal for estimating the angular power spectrum of the cross-correlation between cosmic microwave background (CMB) and large scale structure (LSS) maps as well as their individual auto-spectra. Such a tool is an optimal method (unbiased and with minimum variance) in pixel space and goes beyond all the previous harmonic analyses present in the literature. We describe the implementation of the QML method in the {\it BolISW} code and demonstrate its accuracy on simulated maps through a Monte Carlo analysis. We apply this optimal estimator to WMAP 7-year and NRAO VLA Sky Survey (NVSS) data and explore the robustness of the angular power spectrum estimates obtained by the QML method. Taking into account the shot noise and one of the systematics (declination correction) in NVSS, we can safely use most of the information contained in this survey. By contrast, we neglect the noise in temperature, since WMAP is already cosmic variance dominated on large scales. Because of a discrepancy in the galaxy auto-spectrum between the estimates and the theoretical model, we use two different galaxy distributions: the first one with a constant bias $b$ and the second one with a redshift dependent bias $b(z)$. Finally, we make use of the angular power spectrum estimates obtained by the QML method to derive constraints on the dark energy critical density in a flat $\Lambda$CDM model using different likelihood prescriptions. When using just the cross-correlation between WMAP7 and NVSS maps with 1.8° resolution, we show that $\Omega_\Lambda$ is about 70\% of the total energy density, disfavouring an Einstein-de Sitter Universe at more than 2 $\sigma$ CL (confidence level).
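For reference, the QML estimator in its standard pixel-space form (the textbook expression following Tegmark 1997, not reproduced from the thesis itself) reads

$$\hat{C}_\ell = \sum_{\ell'} \left(F^{-1}\right)_{\ell\ell'} \left[ \mathbf{x}^{T} \mathbf{E}^{\ell'} \mathbf{x} - \mathrm{Tr}\!\left(\mathbf{N}\,\mathbf{E}^{\ell'}\right) \right], \qquad \mathbf{E}^{\ell} = \frac{1}{2}\,\mathbf{C}^{-1}\,\frac{\partial \mathbf{C}}{\partial C_\ell}\,\mathbf{C}^{-1}, \qquad F_{\ell\ell'} = \frac{1}{2}\,\mathrm{Tr}\!\left[\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial C_\ell}\,\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial C_{\ell'}}\right],$$

where $\mathbf{x}$ is the pixel data vector (here the stacked CMB and galaxy maps), $\mathbf{C}$ its total covariance matrix, $\mathbf{N}$ the noise covariance and $F$ the Fisher matrix; unbiasedness and minimum variance follow from this construction.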
Abstract:
Landslide hazard and risk are growing as a consequence of climate change and demographic pressure. Land‐use planning represents a powerful tool to manage this socio‐economic problem and build sustainable and landslide-resilient communities. Landslide inventory maps are a cornerstone of land‐use planning and, consequently, their quality assessment represents a pressing issue. This work aimed to define the quality parameters of a landslide inventory and to assess its spatial and temporal accuracy with regard to its possible applications to land‐use planning. In this sense, I proceeded according to a two‐step approach. An overall assessment of the accuracy of data geographic positioning was performed on four case study sites located in the Italian Northern Apennines. The quantification of the overall spatial and temporal accuracy, instead, focused on the Dorgola Valley (Province of Reggio Emilia). The assessment of spatial accuracy involved a comparison between remotely sensed and field survey data, as well as an innovative fuzzy-like analysis of a multi‐temporal landslide inventory map. Conversely, long‐ and short‐term landslide temporal persistence was appraised over a period of 60 years with the aid of 18 remotely sensed image sets. These results were eventually compared with the current Territorial Plan for Provincial Coordination (PTCP) of the Province of Reggio Emilia. The outcome of this work suggests that geomorphologically detected and mapped landslides are a significant approximation of a more complex reality. In order to convey this intrinsic uncertainty to the end‐users, a new form of cartographic representation is needed. In this sense, a fuzzy raster landslide map may be an option. With regard to land‐use planning, landslide inventory maps, if appropriately updated, were confirmed to be essential decision‐support tools. This research, however, proved that their spatial and temporal uncertainty discourages any direct use as zoning maps, especially when zoning itself is associated with statutory or advisory regulations.
Abstract:
Fog oases, locally named Lomas, are distributed in a fragmented way along the western coast of Chile and Peru (South America) between ~6°S and 30°S, following an altitudinal gradient determined by a fog layer. This fragmentation has been attributed to the hyper-aridity of the desert. However, periodic climatic events influence the ‘normal seasonality’ of this ecosystem through a higher than average water input that triggers plant responses (e.g. primary productivity and phenology). The impact of the climatic oscillation may vary according to the season (wet/dry). This thesis evaluates the potential effect of climate oscillations, such as the El Niño Southern Oscillation (ENSO), through the analysis of the vegetation of this ecosystem, following different approaches. Chapters two and three present the analysis of fog oases along the Peruvian and Chilean deserts. The objectives are: 1) to explain the floristic connection of fog oases by analysing the differences in their taxa composition and the phylogenetic affinities among them; 2) to explore the climate variables related to ENSO which likely affect fog production, and the responses of Lomas vegetation (composition, productivity, distribution) to climate patterns during ENSO events. Chapters four and five describe a fog oasis in southern Peru during the 2008-2010 period. The objectives are: 3) to describe and create a new vegetation map of the Lomas vegetation using remote sensing analysis supported by field survey data, and 4) to identify the vegetation change during the dry season. The first part of our results shows that: 1) there are three significantly different groups of Lomas (Northern Peru, Southern Peru, and Chile) with a significant phylogenetic divergence among them. The species composition reveals a latitudinal gradient of plant assemblages. The species origin, growth-form typologies, and geographic position also reinforce the differences among groups. 2) Contradictory results have emerged from studies of low-cloud anomalies and fog collection during El Niño (EN). EN increases water availability in fog oases when fog should be less frequent due to the reduction of low-cloud amount and stratocumulus cover. Because a minor role of fog during EN is expected, it is likely that measurements of fog-water collection during EN are capturing drizzle and fog at the same time. Although recent studies on fog oases have shown some relationship with ENSO, responses of vegetation have been largely based on descriptive data, and the absence of long temporal records limits the establishment of a direct relationship with climatic oscillations. The second part of the results shows that: 3) five classes with different spectral values correspond to the main land cover of the Lomas using a Vegetation Index (VI). The case study is characterised by shrubs and trees with variable cover (dense, semi-dense and open). A secondary area is covered by small shrubs where the dominant tree species is not present. The cacti area and the old terraces with open vegetation were not identified with the VI. Agriculture is present in the area. Finally, 4) contrary to the dry seasons of 2008 and 2009, a higher VI was obtained during the dry season of 2010. The VI increased up to three times its average value, showing a clear spectral signal change, which coincided with the ENSO event of that period.
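As a small, hedged illustration of the VI-based mapping step: the abstract does not state which vegetation index was used, so NDVI is taken below as a common example, and the reflectance data and class thresholds are placeholders rather than values from the study.

# Sketch: compute a vegetation index (NDVI) and bin it into coarse cover classes.
import numpy as np

def ndvi(red, nir):
    red = red.astype(float)
    nir = nir.astype(float)
    return (nir - red) / (nir + red + 1e-9)

# Placeholder reflectance rasters; in practice these come from satellite scenes.
rng = np.random.default_rng(0)
red = rng.uniform(0.05, 0.4, size=(100, 100))
nir = rng.uniform(0.1, 0.6, size=(100, 100))

vi = ndvi(red, nir)
classes = np.digitize(vi, bins=[0.1, 0.25, 0.4, 0.55])   # hypothetical class thresholds
print("mean VI:", round(vi.mean(), 3), "class counts:", np.bincount(classes.ravel()))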
Abstract:
Advances in biomedical signal acquisition systems for motion analysis have led to low-cost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. The thesis focuses on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson's subjects in different stages of the disease were considered. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques used in this thesis are feature selection (a technique used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify subjects at high risk for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes, and automatically assess the severity of symptoms in the home setting.
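A minimal sketch of the kind of pipeline described above (feature selection followed by classification) is given below. The feature matrix, labels and hyperparameters are placeholders, not the thesis datasets or its specific algorithms.

# Sketch: feature selection + classification on accelerometer-derived features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))       # e.g. accelerometer-derived features (placeholder)
y = rng.integers(0, 2, size=200)     # e.g. early-PD vs healthy labels (placeholder)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # keep the 10 most informative features
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print("cross-validated accuracy:", round(scores.mean(), 3))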
Abstract:
In recent years, the use of Reverse Engineering systems has gained considerable interest for a wide number of applications. Therefore, many research activities are focused on the accuracy and precision of the acquired data and on improvements of the post-processing phase. In this context, this PhD thesis deals with the definition of two novel methods for data post-processing and data fusion between physical and geometrical information. In particular, a technique has been defined for the characterization of the error in the 3D point coordinates acquired by an optical triangulation laser scanner, with the aim of identifying adequate correction arrays to apply under different acquisition parameters and operative conditions. The systematic error in the acquired data is thus compensated, in order to increase accuracy. Moreover, the definition of a 3D thermogram is examined. The geometrical information of an object and its thermal properties, coming from a thermographic inspection, are combined in order to have a temperature value for each recognizable point. Data acquired by the optical triangulation laser scanner are also used to normalize temperature values and make the thermal data independent of the thermal camera's point of view.
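To make the data-fusion idea behind a "3D thermogram" concrete, the sketch below projects 3D points from a scan into a thermal image through a simple pinhole camera model and attaches a temperature to each visible point. The calibration matrices, pose and thermal image are hypothetical placeholders, not the thesis setup.

# Sketch: assign a temperature to each 3D point via projection into a thermal image.
import numpy as np

K = np.array([[400.0, 0.0, 160.0], [0.0, 400.0, 120.0], [0.0, 0.0, 1.0]])  # thermal camera intrinsics (hypothetical)
R, t = np.eye(3), np.array([0.0, 0.0, 0.5])                                 # scanner-to-camera pose (hypothetical)
thermal_img = 20.0 + 5.0 * np.random.default_rng(0).random((240, 320))      # temperatures in deg C (placeholder)

points = np.random.default_rng(1).uniform(-0.2, 0.2, size=(1000, 3)) + [0.0, 0.0, 1.0]

cam = (R @ points.T).T + t                 # points in camera coordinates
uv = (K @ cam.T).T
uv = uv[:, :2] / uv[:, 2:3]                # perspective division -> pixel coordinates
u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
ok = (u >= 0) & (u < 320) & (v >= 0) & (v < 240) & (cam[:, 2] > 0)

temperatures = np.full(len(points), np.nan)
temperatures[ok] = thermal_img[v[ok], u[ok]]   # temperature value per recognizable 3D point
print("points with an assigned temperature:", int(ok.sum()))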
Abstract:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase data locality. It is therefore the programmer's responsibility to explicitly manage the memory transfers, and this makes programming these platforms cumbersome. Moreover, complex modern applications must be adequately parallelized before they can turn the parallel potential of the platform into actual performance. To support this, programming languages have been proposed which work at a high level of abstraction and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on ease of programming. It focuses on OpenMP, the de facto standard for shared memory programming. In the first part, the cost of algorithms for synchronization and data partitioning is analyzed, and these algorithms are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders of magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues arise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which shares the memory banks with the cores, and a template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploit the accelerators of the platform. The OpenMP frontend is extended to interact with them.
Abstract:
The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important sources of information in cosmology. The excellent accuracy of the recent CMB data from the WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis, focusing on their optimization in order to fully exploit the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low resolution maps, and the comparison of the Angular Power Spectrum (APS) extraction performance of the optimal QML method, implemented in the code called BolPol, and of the pseudo-Cl method, implemented in Cromaster. The QML method has then been applied to the Planck data at large angular scales to extract the CMB APS. The same method has also been applied to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model; the possible origins of these results are discussed. The Cromaster code has instead been applied to the 408 MHz and 1.42 GHz surveys, focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, which require high-accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device, and its performance is compared to the devices used nowadays.
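A small sketch of the aliasing issue mentioned above, assuming the healpy Python interface to HEALPix (a generic illustration with a toy spectrum, not code or data from the thesis): degrading a map directly lets small-scale power alias into the low-resolution map, while band-limiting (smoothing) it first mitigates the effect.

# Sketch: naive degrade vs smooth-then-degrade of a simulated HEALPix map.
import numpy as np
import healpy as hp

nside_hi, nside_lo = 256, 16
ell = np.arange(3 * nside_hi)
cl = np.zeros(ell.size)
cl[1:] = 1.0 / ell[1:] ** 2                  # toy red angular power spectrum
m_hi = hp.synfast(cl, nside_hi)              # high-resolution simulated map

m_direct = hp.ud_grade(m_hi, nside_lo)       # direct degrade: small scales alias in
m_banded = hp.ud_grade(hp.smoothing(m_hi, fwhm=np.radians(7.0)), nside_lo)  # smooth first, then degrade

print("rms, direct degrade  :", np.std(m_direct))
print("rms, smooth + degrade:", np.std(m_banded))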
Abstract:
The Internet of Things (IoT) is the next industrial revolution: we will interact naturally with real and virtual devices as a key part of our daily life. This technology shift is expected to be greater than the Web and Mobile combined. As extremely different technologies are needed to build connected devices, the Internet of Things field is a junction between electronics, telecommunications and software engineering. Internet of Things application development happens in silos, often using proprietary and closed communication protocols. There is a common belief that we can have a real Internet of Things only if we solve the interoperability problem. After a deep analysis of the IoT protocols, we identified a set of primitives for IoT applications. We argue that each IoT protocol can be expressed in terms of those primitives, thus solving the interoperability problem at the application protocol level. Moreover, the primitives are network and transport independent and make no assumption in that regard. This dissertation presents our implementation of an IoT platform: the Ponte project. Privacy issues follow the rise of the Internet of Things: it is clear that the IoT must ensure resilience to attacks, data authentication, access control and client privacy. We argue that it is not possible to solve the privacy issue without solving the interoperability problem: enforcing privacy rules implies the need to limit and filter the data delivery process. However, filtering data requires knowledge of the format and the semantics of the data: after an analysis of the possible data formats and representations for the IoT, we identify JSON-LD and the Semantic Web as the best solution for IoT applications. Then, this dissertation presents our approach to increase the throughput of filtering semantic data by a factor of ten.
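The abstract does not list the primitives themselves, so the sketch below only illustrates the general idea with hypothetical names (publish, subscribe, request): a single protocol-independent interface behind which different IoT protocols could be mapped; the in-memory backend stands in for a real MQTT, CoAP or HTTP binding.

# Sketch: a hypothetical protocol-independent primitive interface for IoT data.
from abc import ABC, abstractmethod

class IoTBridge(ABC):
    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...
    @abstractmethod
    def subscribe(self, topic: str, callback) -> None: ...
    @abstractmethod
    def request(self, resource: str) -> bytes: ...

class InMemoryBridge(IoTBridge):
    """Toy backend; a real backend would speak MQTT, CoAP or HTTP."""
    def __init__(self):
        self.retained, self.subscribers = {}, {}
    def publish(self, topic, payload):
        self.retained[topic] = payload                  # keep last value per topic
        for cb in self.subscribers.get(topic, []):
            cb(topic, payload)                          # push to subscribers
    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
    def request(self, resource):
        return self.retained.get(resource, b"")         # request/response access

bridge = InMemoryBridge()
bridge.subscribe("home/temp", lambda t, p: print(t, p))
bridge.publish("home/temp", b'{"value": 21.5}')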
Abstract:
We have realized a data acquisition chain for the use and characterization of APSEL4D, a 32 x 128 Monolithic Active Pixel Sensor, developed as a prototype for frontier experiments in high energy particle physics. In particular, a transition board was realized for the conversion between the chip and the FPGA voltage levels and for signal quality enhancement. A Xilinx Spartan-3 FPGA was used for real time data processing, for the chip control and for the communication with a Personal Computer through a USB 2.0 port. For this purpose, firmware was written in the VHDL language. Finally, a Graphical User Interface for online system monitoring, hit display and chip control, based on windows and widgets, was realized by developing C++ code and using the dedicated Qt and Qwt libraries. APSEL4D and the full acquisition chain were characterized for the first time with the electron beam of a transmission electron microscope and with 55Fe and 90Sr radioactive sources. In addition, a beam test was performed at the T9 station of the CERN PS, where hadrons with a momentum of 12 GeV/c are available. The very high time resolution of APSEL4D (up to 2.5 Mfps, but used at 6 kfps) was fundamental in realizing a single-electron Young experiment using nanometric double slits obtained by a FIB technique. On high-statistics samples, it was possible to observe the interference and diffraction of single isolated electrons traveling inside a transmission electron microscope. For the first time, information on the distribution of the arrival times of the single electrons has been extracted.
Abstract:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in the Emilia-Romagna region, in Italy, is characterized by an abundance of zero values and right-skewness of the distribution of positive amounts. Rain gauge direct measurements at sparsely distributed locations and hourly cumulated radar grids are provided by ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as the reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatially correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and a Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing the COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to the R software. Communication and evaluation of probabilistic, point and interval predictions are investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error, addressing the predictive mean and median, respectively). Calibration is reached, and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with uncorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
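A schematic form of such a two-part semicontinuous model, written here as one plausible reading of the abstract (illustrative symbols, not the thesis notation): let $Y(s)$ be the hourly rainfall at location $s$ and $R(s)$ the co-located radar value; then

$$\Pr\left(Y(s) > 0\right) = \Phi\!\left(\alpha_0 + \alpha_1 \log R(s) + w_1(s)\right),$$
$$Y(s) \mid Y(s) > 0 \sim \mathrm{Gamma}\!\left(\text{mean } \mu(s),\ \text{shape } \nu\right), \qquad \log \mu(s) = \beta_0 + \beta_1 \log R(s) + w_2(s),$$

where $\Phi$ is the standard normal CDF (the probit link) and $w_1(s)$, $w_2(s)$ are spatially correlated Gaussian effects capturing the residual structure; the occurrence and amount parts are fitted jointly within the hierarchical model.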
Abstract:
Synthetic lethality represents an anticancer strategy that targets tumor-specific gene defects. One of the most studied applications is the use of PARP inhibitors (e.g. olaparib) in BRCA1/2-less cancer cells. In BRCA2-defective tumors, olaparib (OLA) inhibits DNA single-strand break repair, while BRCA2 mutations hamper homologous recombination (HR) repair. The simultaneous impairment of these pathways leads BRCA-less cells to death by synthetic lethality. The projects described in this thesis were aimed at extending the use of OLA to cancer cells that do not carry a mutation in BRCA2, by combining this drug with compounds that could mimic a BRCA-less environment via HR inhibition. We demonstrated the effectiveness of our “fully small-molecule induced synthetic lethality” by using two different approaches. In the direct approach (Project A), we identified a series of newly synthesized compounds (named RAD51-BRCA2 disruptors) that mimic BRCA2 mutations by disrupting the RAD51-BRCA2 interaction and thus the HR pathway. Compound ARN 24089 inhibited HR in a human pancreatic adenocarcinoma cell line and triggered synthetic lethality by synergizing with OLA. Interestingly, the observed synthetic lethality was triggered by tackling two biochemically different mechanisms: enzyme inhibition (PARP) and protein-protein disruption (RAD51-BRCA2). In the indirect approach (Project B), we inhibited HR by interfering with cellular metabolism through inhibition of LDH activity. The data obtained suggest an LDH-mediated control on HR that can be exerted by regulating either the energy supply needed for this repair mechanism or the expression level of genes involved in DNA repair. LDH inhibition also succeeded in increasing the efficiency of OLA in BRCA-proficient cell lines. Although preliminary, these results highlight a complex relationship between metabolic reactions and the control of DNA integrity. Both of the described projects proved that our “fully small-molecule-induced synthetic lethality” strategy could be an innovative approach to unmet oncological needs.
Abstract:
The increasing demand for alternatives to meat food products, which is linked to ethical and environmental reasons, highlights the necessity of using different protein sources. Plant proteins provide a valid option, thanks to their relatively low costs, high availability and wide supply sources. The current process used to produce plant concentrates and isolates is alkaline extraction followed by isoelectric precipitation. However, despite the high purity of the proteins, it presents some drawbacks. Innovative protein extraction processes are emerging, with the aim of reducing the environmental impact and the costs, as well as improving the functional properties. In this study, the traditional wet protein extraction and another, simplified wet process were used to obtain protein-rich extracts from different plants. The sources considered in the project were de-oiled sunflower and canola, chickpea, lentils, and camelina meal, an emerging oleaginous seed interesting for its high content of omega-3. The extracts obtained from the two processes were then analysed for their capacity to hold water and fat, to form a gel and to form a stable foam. Results highlighted strong differences concerning protein content, yield and functionalities. The extracts obtained with the alkaline process confirmed the literature data for the four plant sources (sunflower, canola, chickpea and lentils) and made it possible to obtain a camelina concentrate with a protein content of 63 % and a protein recovery of 41 %. The simplified process was not effective in obtaining a protein enrichment in the oleaginous sources, whereas an enrichment of 10 and 15 % was obtained in chickpea and lentils, respectively. The functional properties were also completely different: the simplified process produced protein ingredients that were completely water-soluble at pH 7, with a fair foaming capacity compared to the extracts obtained with the alkaline process. These characteristics could make these extracts suitable for plant milk-analogue products.