5 resultados para Open Data, Dati Aperti, Open Government Data
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.
Resumo:
To understand a city and its urban structure it is necessary to study its history. This is feasible through GIS (Geographical Information Systems) and its by-products on the web. Starting from a cartographic view they allow an initial understanding of, and a comparison between, present and past data together with an easy and intuitive access to database information. The research done led to the creation of a GIS for the city of Bologna. It is based on varied data such as historical map, vector and alphanumeric historical data, etc.. After providing information about GIS we thought of spreading and sharing the collected data on the Web after studying two solutions available on the market: Web Mapping and WebGIS. In this study we discuss the stages, beginning with the development of Historical GIS of Bologna, which led to the making of a WebGIS Open Source (MapServer and Chameleon) and the Web Mapping services (Google Earth, Google Maps and OpenLayers).
Resumo:
Subduction zones are the favorite places to generate tsunamigenic earthquakes, where friction between oceanic and continental plates causes the occurrence of a strong seismicity. The topics and the methodologies discussed in this thesis are focussed to the understanding of the rupture process of the seismic sources of great earthquakes that generate tsunamis. The tsunamigenesis is controlled by several kinematical characteristic of the parent earthquake, as the focal mechanism, the depth of the rupture, the slip distribution along the fault area and by the mechanical properties of the source zone. Each of these factors plays a fundamental role in the tsunami generation. Therefore, inferring the source parameters of tsunamigenic earthquakes is crucial to understand the generation of the consequent tsunami and so to mitigate the risk along the coasts. The typical way to proceed when we want to gather information regarding the source process is to have recourse to the inversion of geophysical data that are available. Tsunami data, moreover, are useful to constrain the portion of the fault area that extends offshore, generally close to the trench that, on the contrary, other kinds of data are not able to constrain. In this thesis I have discussed the rupture process of some recent tsunamigenic events, as inferred by means of an inverse method. I have presented the 2003 Tokachi-Oki (Japan) earthquake (Mw 8.1). In this study the slip distribution on the fault has been inferred by inverting tsunami waveform, GPS, and bottom-pressure data. The joint inversion of tsunami and geodetic data has revealed a much better constrain for the slip distribution on the fault rather than the separate inversions of single datasets. Then we have studied the earthquake occurred on 2007 in southern Sumatra (Mw 8.4). By inverting several tsunami waveforms, both in the near and in the far field, we have determined the slip distribution and the mean rupture velocity along the causative fault. Since the largest patch of slip was concentrated on the deepest part of the fault, this is the likely reason for the small tsunami waves that followed the earthquake, pointing out how much the depth of the rupture plays a crucial role in controlling the tsunamigenesis. Finally, we have presented a new rupture model for the great 2004 Sumatra earthquake (Mw 9.2). We have performed the joint inversion of tsunami waveform, GPS and satellite altimetry data, to infer the slip distribution, the slip direction, and the rupture velocity on the fault. Furthermore, in this work we have presented a novel method to estimate, in a self-consistent way, the average rigidity of the source zone. The estimation of the source zone rigidity is important since it may play a significant role in the tsunami generation and, particularly for slow earthquakes, a low rigidity value is sometimes necessary to explain how a relatively low seismic moment earthquake may generate significant tsunamis; this latter point may be relevant for explaining the mechanics of the tsunami earthquakes, one of the open issues in present day seismology. The investigation of these tsunamigenic earthquakes has underlined the importance to use a joint inversion of different geophysical data to determine the rupture characteristics. The results shown here have important implications for the implementation of new tsunami warning systems – particularly in the near-field – the improvement of the current ones, and furthermore for the planning of the inundation maps for tsunami-hazard assessment along the coastal area.
Resumo:
Il problema dell'antibiotico-resistenza è un problema di sanità pubblica per affrontare il quale è necessario un sistema di sorveglianza basato sulla raccolta e l'analisi dei dati epidemiologici di laboratorio. Il progetto di dottorato è consistito nello sviluppo di una applicazione web per la gestione di tali dati di antibiotico sensibilità di isolati clinici utilizzabile a livello di ospedale. Si è creata una piattaforma web associata a un database relazionale per avere un’applicazione dinamica che potesse essere aggiornata facilmente inserendo nuovi dati senza dover manualmente modificare le pagine HTML che compongono l’applicazione stessa. E’ stato utilizzato il database open-source MySQL in quanto presenta numerosi vantaggi: estremamente stabile, elevate prestazioni, supportato da una grande comunità online ed inoltre gratuito. Il contenuto dinamico dell’applicazione web deve essere generato da un linguaggio di programmazione tipo “scripting” che automatizzi operazioni di inserimento, modifica, cancellazione, visualizzazione di larghe quantità di dati. E’ stato scelto il PHP, linguaggio open-source sviluppato appositamente per la realizzazione di pagine web dinamiche, perfettamente utilizzabile con il database MySQL. E’ stata definita l’architettura del database creando le tabelle contenenti i dati e le relazioni tra di esse: le anagrafiche, i dati relativi ai campioni, microrganismi isolati e agli antibiogrammi con le categorie interpretative relative al dato antibiotico. Definite tabelle e relazioni del database è stato scritto il codice associato alle funzioni principali: inserimento manuale di antibiogrammi, importazione di antibiogrammi multipli provenienti da file esportati da strumenti automatizzati, modifica/eliminazione degli antibiogrammi precedenti inseriti nel sistema, analisi dei dati presenti nel database con tendenze e andamenti relativi alla prevalenza di specie microbiche e alla chemioresistenza degli stessi, corredate da grafici. Lo sviluppo ha incluso continui test delle funzioni via via implementate usando reali dati clinici e sono stati introdotti appositi controlli e l’introduzione di una semplice e pulita veste grafica.
Resumo:
This thesis is a collection of essays related to the topic of innovation in the service sector. The choice of this structure is functional to the purpose of single out some of the relevant issues and try to tackle them, revising first the state of the literature and then proposing a way forward. Three relevant issues has been therefore selected: (i) the definition of innovation in the service sector and the connected question of measurement of innovation; (ii) the issue of productivity in services; (iii) the classification of innovative firms in the service sector. Facing the first issue, chapter II shows how the initial width of the original Schumpeterian definition of innovation has been narrowed and then passed to the service sector form the manufacturing one in a reduce technological form. Chapter III tackle the issue of productivity in services, discussing the difficulties for measuring productivity in a context where the output is often immaterial. We reconstruct the dispute on the Baumol’s cost disease argument and propose two different ways to go forward in the research on productivity in services: redefining the output along the line of a characteristic approach; and redefining the inputs, particularly analysing which kind of input it’s worth saving. Chapter IV derives an integrated taxonomy of innovative service and manufacturing firms, using data coming from the 2008 CIS survey for Italy. This taxonomy is based on the enlarged definition of “innovative firm” deriving from the Schumpeterian definition of innovation and classify firms using a cluster analysis techniques. The result is the emergence of a four cluster solution, where firms are differentiated by the breadth of the innovation activities in which they are involved. Chapter 5 reports some of the main conclusions of each singular previous chapter and the points worth of further research in the future.