945 results for Data-driven science


Relevance:

100.00%

Publisher:

Abstract:

This research explores how the infrastructure and uses of eBird, one of the largest citizen science projects in the world, develop and evolve over time and space. We focus on eBird's work with two of its Latin American partners, Mexico and Peru, each with a web portal managed by local organizations. eBird, now a large global network of partnerships, gives citizens around the world the opportunity to contribute to science and to bird conservation through observations uploaded online. These observations are managed and stored in a unified, global database accessible to anyone interested in birds and their conservation. Users also take advantage of the platform's features to organize and visualize their own data and those of others. The study is based on a qualitative methodology combining observation of the web platforms and semi-structured interviews with members of the Cornell Lab of Ornithology, the eBird team, and members of the local partner organizations responsible for eBird Peru and eBird Mexico. We analyze eBird as an infrastructure whose social and technical aspects are considered together, as a whole. We also explore the variety of uses of the platform and its data by its diverse users. Three major themes emerge: the importance of collaboration as a philosophy underlying eBird's development, the broadening of eBird's relationships and connections through its partnerships, and the growth of participation and data volume. Finally, over time we observe an evolution of the data and of its different uses, and of what eBird represents as an infrastructure.

Relevance:

100.00%

Publisher:

Abstract:

Imaging technologies are widely used in application fields such as the natural sciences, engineering, medicine, and the life sciences. A broad class of imaging problems reduces to solving ill-posed inverse problems (IPs). Traditional strategies for solving these ill-posed IPs rely on variational regularization methods, which are based on the minimization of suitable energies and make use of knowledge about the image formation model (forward operator) and prior knowledge about the solution, but fall short in incorporating knowledge directly from data. On the other hand, more recent learned approaches can easily learn the intricate statistics of images from large sets of data, but lack a systematic method for incorporating prior knowledge about the image formation model. The main purpose of this thesis is to discuss data-driven image reconstruction methods that combine the benefits of these two different reconstruction strategies for the solution of highly nonlinear ill-posed inverse problems. Mathematical formulations and numerical approaches for imaging IPs, including linear as well as strongly nonlinear problems, are described. More specifically, we address the Electrical Impedance Tomography (EIT) reconstruction problem by unrolling the regularized Gauss-Newton method and integrating the regularization learned by a data-adaptive neural network. Furthermore, we investigate the solution of nonlinear ill-posed IPs by introducing a deep-PnP framework that integrates a graph convolutional denoiser into the proximal Gauss-Newton method, with a practical application to EIT, a recently introduced and promising imaging technique. Efficient algorithms are then applied to the solution of the limited-electrodes problem in EIT, combining compressive sensing techniques and deep learning strategies. Finally, a transformer-based neural network architecture is adapted to restore the noisy solution of the Computed Tomography problem recovered using the filtered back-projection method.
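To make the unrolling idea concrete, the following is a minimal sketch, in PyTorch, of a single regularized Gauss-Newton step in which the regularization term is supplied by a small learned network. The forward operator, its Jacobian, the tensor shapes, and the `LearnedRegularizer` architecture are illustrative assumptions, not the thesis' actual EIT formulation.

```python
# Sketch of one unrolled, regularized Gauss-Newton step with a learned prior.
# `forward_op` and `jacobian` are placeholders for the (nonlinear) EIT model.
import torch

class LearnedRegularizer(torch.nn.Module):
    """Tiny CNN mapping the current iterate to a regularization update (assumed architecture)."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(channels, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, channels, 3, padding=1))

    def forward(self, x):
        return self.net(x)

def gauss_newton_step(x, y, forward_op, jacobian, reg_net, step=1.0, lam=1e-2):
    """One unrolled step: damped Gauss-Newton data-fit update plus learned prior term.

    x          current image estimate, shape (1, 1, H, W)
    y          measured data, shape (n_data,)
    forward_op callable mapping x -> simulated data of shape (n_data,)
    jacobian   callable returning the Jacobian as an (n_data, H*W) matrix at x
    reg_net    learned regularizer (LearnedRegularizer)
    """
    residual = forward_op(x) - y                       # data misfit
    J = jacobian(x)                                    # linearized forward model
    JtJ = J.T @ J + lam * torch.eye(J.shape[1])        # damped normal equations
    gn_dir = torch.linalg.solve(JtJ, J.T @ residual)   # Gauss-Newton direction
    gn_dir = gn_dir.reshape(x.shape)
    return x - step * (gn_dir + reg_net(x))            # descent step + learned prior
```

In an unrolled scheme, a fixed number of such steps is stacked and the regularizer weights are trained end-to-end on reconstruction examples.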

Relevance:

100.00%

Publisher:

Abstract:

The present dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first part, unsupervised artificial neural networks (ANNs) are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation for locally estimating the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24 h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs can markedly improve the estimation of percentiles relative to the benchmark approach from the literature, in which Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and across time-aggregation intervals. In the second part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing approaches based on digital elevation models (DEMs), this method is innovative in that it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. The methods are applied first to northern Italy, represented with the MERIT DEM (∼90 m resolution), and second to the whole of Italy, represented with the EU-DEM (25 m resolution). The results show that multivariate approaches may (a) significantly enhance the delineation of flood-prone areas relative to a selected univariate approach, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into a continuous representation of FH.
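As a small illustration of how estimated Gumbel parameters translate into rainfall quantiles, the sketch below evaluates the standard Gumbel quantile function x_T = mu - beta * ln(-ln(1 - 1/T)) for a few return periods. The parameter values are invented placeholders, not outputs of the dissertation's ANN models.

```python
# Gumbel rainfall-depth quantiles from (location, scale) parameters.
import numpy as np

def gumbel_quantile(mu, beta, return_period):
    """Rainfall depth with return period T (years) under a Gumbel(mu, beta) law."""
    p_non_exceed = 1.0 - 1.0 / return_period      # annual non-exceedance probability
    return mu - beta * np.log(-np.log(p_non_exceed))

# Hypothetical parameters for a 1-hour duration at one gauge (mm).
mu_1h, beta_1h = 22.0, 7.5
for T in (10, 50, 100):
    print(f"T = {T:>3} yr: {gumbel_quantile(mu_1h, beta_1h, T):.1f} mm")
```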

Relevance:

100.00%

Publisher:

Abstract:

This thesis analyzes different techniques for detecting active, constant jammers in an uplink satellite communication. The goal is to identify the presence of a jammer by observing a limited number of received samples. To this end, the following binary classifiers were implemented: support vector machine (SVM), multilayer perceptron (MLP), spectrum guarding, and autoencoder. These machine learning algorithms depend on the features they receive as input, so particular attention was paid to their selection. The accuracies obtained by detectors trained on different types of information were therefore compared: the raw time-domain signals, statistical features, wavelet transforms, and the cyclic spectrum. The patterns produced by extracting these features from the satellite signals can be high-dimensional, so before detection the following dimensionality-reduction algorithms are applied: principal component analysis (PCA) and linear discriminant analysis (LDA). The purpose of this step is not to discard the least relevant features, but to combine them so as to preserve as much information as possible while avoiding overfitting and underfitting. The numerical simulations showed that the cyclic spectrum provides the best features for detection, but produces high-dimensional patterns, which made dimensionality-reduction algorithms necessary. In particular, PCA extracted better information than LDA, whose accuracies were too sensitive to the type of jammer used during training. Finally, the best-performing algorithm was the multilayer perceptron, which required short training times and achieved high accuracy.
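A minimal sketch of this kind of detection chain (feature scaling, PCA dimensionality reduction, then a binary MLP classifier), written with scikit-learn. The random feature matrix stands in for, e.g., flattened cyclic-spectrum features and is not the thesis' dataset; the layer sizes and number of components are arbitrary.

```python
# PCA + MLP jammer-detection pipeline on placeholder features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))        # hypothetical feature vectors per observation window
y = rng.integers(0, 2, size=1000)       # 1 = jammer present, 0 = clean channel

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

detector = make_pipeline(
    StandardScaler(),                   # normalize feature scales
    PCA(n_components=30),               # combine features while preserving variance
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))

detector.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, detector.predict(X_te)))
```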

Relevance:

100.00%

Publisher:

Abstract:

Intelligent systems are now inherent to society, supporting a synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems, and poor software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, and their interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline for supporting automatic, life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach for keeping systems in operation. Logs are Big Data assembled in high-volume streams and are unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods that provide maintenance solutions for anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses; LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed to approach automatic parsing of system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according to the behavior of the data. For the first time in the evolving-intelligent-systems literature, the proposed method, eLP, is able to process streams of words and sentences. Regarding AD accuracy, FBeM achieved (85.64 ± 3.69)%, eGNN reached (96.17 ± 0.78)%, eGFC obtained (92.48 ± 1.21)%, and eLP reached (96.05 ± 1.04)%. Besides being competitive, eLP also generates a log grammar and presents a higher level of model interpretability.
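The toy sketch below illustrates only the generic create/update mechanics shared by evolving granular classifiers, not the actual FBeM, eGNN, eGFC, or eLP formulations: each granule keeps a center, a class label, and a count, and incoming stream samples either refine the nearest compatible granule or spawn a new one.

```python
# Toy evolving granular classifier: granules are created and updated online.
import numpy as np

class EvolvingGranularClassifier:
    def __init__(self, radius=0.5):
        self.radius = radius                      # granule compatibility radius (assumed fixed)
        self.centers, self.labels, self.counts = [], [], []

    def _nearest(self, x):
        d = [np.linalg.norm(x - c) for c in self.centers]
        i = int(np.argmin(d))
        return i, d[i]

    def learn_one(self, x, y):
        """Update the model with one labeled sample from the stream."""
        x = np.asarray(x, dtype=float)
        if self.centers:
            i, dist = self._nearest(x)
            if dist <= self.radius and self.labels[i] == y:
                self.counts[i] += 1                                   # update existing granule
                self.centers[i] += (x - self.centers[i]) / self.counts[i]
                return
        self.centers.append(x.copy())                                 # create a new granule
        self.labels.append(y)
        self.counts.append(1)

    def predict_one(self, x):
        """Label of the nearest granule (assumes at least one granule exists)."""
        i, _ = self._nearest(np.asarray(x, dtype=float))
        return self.labels[i]
```

The real methods additionally merge and delete granules and adapt their granularity over time; this sketch keeps only the core incremental step.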

Relevance:

100.00%

Publisher:

Abstract:

In today's business landscape, the ability of a company or service organization to steer its innovation in a deliberate, programmatic way is of fundamental importance in order to remain competitive on the market. In many cases, this means investing a considerable amount of money in projects that will improve essential aspects of the product or service and have a significant impact on the company's digital transformation. The study proposed here concerns two approaches that are typically seen as opposites precisely because they are based on two different types of data, Big Data and Thick Data. The two approaches are, respectively, Data Science and Design Thinking. In the following chapters, after defining the Design Thinking and Data Science approaches, the concept of blending is introduced, together with the issues surrounding the intersection of the two innovation methods. To highlight the different aspects of the topic, cases of companies that have integrated the two approaches into their innovation processes, obtaining important results, are also reported. In particular, the author's research work on examining, classifying, and analyzing the existing literature at the intersection of data-driven and design-thinking-driven innovation is presented. Finally, a business case conducted at the hospital and healthcare organization of Parma is reported, in which, faced with a problem concerning the relationship between hospital clinicians and community clinicians, an innovative system was designed using Design Thinking. In addition, a critical "what-if" analysis is developed in order to outline a possible scenario in which methods or techniques from the Data Science world are integrated and applied to the case study in question.

Relevance:

100.00%

Publisher:

Abstract:

Dissertation presented to obtain the degree of Doctor in Environmental Engineering from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.

Relevance:

100.00%

Publisher:

Abstract:

Earthworks tasks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). They comprise sequential tasks, such as excavation, transportation, spreading and compaction, and are strongly based on heavy mechanical equipment and repetitive processes. In this context, it is essential to optimize the usage of all available resources under two key criteria: the cost and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial-intelligence-based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site have shown that the proposed system is competitive when compared with current manual earthwork design.
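As an illustration of the optimization side only, the sketch below enumerates candidate equipment allocations, scores each by (duration, cost) with a made-up productivity model, and keeps the non-dominated (Pareto-optimal) allocations. The productivity and cost figures are placeholders, not the data-mined estimates used in the paper, and a simple enumeration replaces the evolutionary search.

```python
# Pareto filtering of candidate earthwork resource allocations (duration vs. cost).
from itertools import product

VOLUME = 50_000                         # m3 of earth to move (hypothetical)
EXCAVATOR_RATE = 120                    # m3/h per excavator (hypothetical)
TRUCK_RATE = 40                         # m3/h hauled per truck (hypothetical)
HOURLY_COST = {"excavator": 90, "truck": 60}

def evaluate(n_excavators, n_trucks):
    """Duration (h) and cost for one allocation; hauling capacity can bottleneck digging."""
    rate = min(n_excavators * EXCAVATOR_RATE, n_trucks * TRUCK_RATE)
    duration = VOLUME / rate
    cost = duration * (n_excavators * HOURLY_COST["excavator"]
                       + n_trucks * HOURLY_COST["truck"])
    return duration, cost

candidates = [(e, t, *evaluate(e, t))
              for e, t in product(range(1, 6), range(2, 16))]

def dominated(a, b):
    """True if allocation b is at least as good as a on both objectives and better on one."""
    return b[2] <= a[2] and b[3] <= a[3] and (b[2] < a[2] or b[3] < a[3])

pareto = [c for c in candidates if not any(dominated(c, o) for o in candidates)]
for e, t, dur, cost in sorted(pareto):
    print(f"{e} excavators, {t:2d} trucks -> {dur:6.1f} h, cost {cost:9.0f}")
```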

Relevance:

100.00%

Publisher:

Abstract:

Background: To enhance our understanding of complex biological systems such as diseases, we need to put all of the available data into context and use this to detect relations, patterns and rules that allow predictive hypotheses to be defined. Life science has become a data-rich science, with information about the behaviour of millions of entities such as genes, chemical compounds, diseases, cell types and organs, organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist, but so far none has proven entirely satisfactory. Results: To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain-specific knowledge representation models based on specific objects and their relations, supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions: We generate the first semantically integrated, COPD-specific public knowledge base and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration-based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data, which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser-based interface, which is currently under development.
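The snippet below is a generic illustration, using networkx rather than BioXM's own interface, of the kind of sub-network retrieval described in the conclusions: entities and typed relations are stored as a graph, and the neighbourhood of the COPD node is extracted for downstream analysis. The example entities and relations are illustrative only.

```python
# Generic graph-based sub-network retrieval around a disease node.
import networkx as nx

G = nx.Graph()
G.add_edge("COPD", "SERPINA1", relation="gene-disease")
G.add_edge("COPD", "smoking", relation="risk factor")
G.add_edge("SERPINA1", "ELANE", relation="protein-protein interaction")
G.add_edge("SERPINA1", "alpha-1 antitrypsin deficiency", relation="gene-disease")
G.add_edge("COPD", "roflumilast", relation="disease-compound")

# Retrieve the sub-network within two relation steps of the COPD node.
subnet = nx.ego_graph(G, "COPD", radius=2)
for u, v, data in subnet.edges(data=True):
    print(f"{u} --[{data['relation']}]-- {v}")
```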

Relevance:

100.00%

Publisher:

Abstract:

Radioactive soil-contamination mapping and risk assessment are vital issues for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by an estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations, in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models to prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
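The schematic example below shows why multiple stochastic realizations matter for risk mapping: instead of a single predicted concentration per location, many equally plausible maps are generated and the probability of exceeding a threshold is estimated per grid cell. The random fields and the threshold are stand-ins for a proper geostatistical simulation and a regulatory limit.

```python
# Probabilistic exceedance mapping from an ensemble of simulated contamination maps.
import numpy as np

rng = np.random.default_rng(42)
n_realizations, grid = 200, (50, 50)

mean_field = rng.gamma(shape=2.0, scale=5.0, size=grid)        # placeholder mean map (kBq/m2)
realizations = rng.normal(loc=mean_field, scale=3.0,
                          size=(n_realizations, *grid))        # placeholder simulated maps

threshold = 15.0                                               # placeholder threshold (kBq/m2)
prob_exceed = (realizations > threshold).mean(axis=0)          # per-cell exceedance probability

print("cells with P(exceedance) > 0.5:", int((prob_exceed > 0.5).sum()))
```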

Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

100.00%

Publisher:

Abstract:

The purpose of this study is multifaceted: 1) to describe eScience research in a comprehensive way; 2) to help library and information specialists understand the realm of eScience research and the information needs of the community, and to demonstrate the importance of LIS professionals within the eScience domain; and 3) to explore the current state of curricular content in ALA-accredited MLS/MLIS programs to understand the extent to which they prepare new professionals for eScience librarianship. The literature review focuses heavily on the information service needs of eScientists and other data-driven researchers, in addition to demonstrating how and why librarians and information specialists can and should fill these service gaps and information needs within eScience research. By looking at the current curriculum of American Library Association (ALA) accredited MLS/MLIS programs, we can identify potential gaps in knowledge and areas for improvement, in order to prepare and train new MLS/MLIS graduates to fulfill the needs of eScientists. This investigation is meant to be informative and can be used as a tool for LIS programs to assess their curricula against the needs of eScience and other data-driven and networked research. Finally, this investigation will provide the LIS profession with awareness of, and insight into, the services needed to support a thriving eScience and data-driven research community.

Relevance:

100.00%

Publisher:

Abstract:

Overview of the key aspects of, and approaches to, open access, open data and open science, with an emphasis on sharing scientific knowledge for sustainable progress and development.

Relevance:

100.00%

Publisher:

Abstract:

Overview of the growth of policies and a critical appraisal of the issues affecting open access, open data and open science policies. Example policies and a roadmap for open access, open research data and open science are included.