940 resultados para data-driven modelling
Resumo:
Performance modelling is a useful tool in the lifeycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is had in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. The experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow model as an example.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Membrane bioreactors (MBRs) are a combination of activated sludge bioreactors and membrane filtration, enabling high quality effluent with a small footprint. However, they can be beset by fouling, which causes an increase in transmembrane pressure (TMP). Modelling and simulation of changes in TMP could be useful to describe fouling through the identification of the most relevant operating conditions. Using experimental data from a MBR pilot plant operated for 462days, two different models were developed: a deterministic model using activated sludge model n°2d (ASM2d) for the biological component and a resistance in-series model for the filtration component as well as a data-driven model based on multivariable regressions. Once validated, these models were used to describe membrane fouling (as changes in TMP over time) under different operating conditions. The deterministic model performed better at higher temperatures (>20°C), constant operating conditions (DO set-point, membrane air-flow, pH and ORP), and high mixed liquor suspended solids (>6.9gL-1) and flux changes. At low pH (<7) or periods with higher pH changes, the data-driven model was more accurate. Changes in the DO set-point of the aerobic reactor that affected the TMP were also better described by the data-driven model. By combining the use of both models, a better description of fouling can be achieved under different operating conditions
Resumo:
The global vegetation response to climate and atmospheric CO2 changes between the last glacial maximum and recent times is examined using an equilibrium vegetation model (BIOME4), driven by output from 17 climate simulations from the Palaeoclimate Modelling Intercomparison Project. Features common to all of the simulations include expansion of treeless vegetation in high northern latitudes; southward displacement and fragmentation of boreal and temperate forests; and expansion of drought-tolerant biomes in the tropics. These features are broadly consistent with pollen-based reconstructions of vegetation distribution at the last glacial maximum. Glacial vegetation in high latitudes reflects cold and dry conditions due to the low CO2 concentration and the presence of large continental ice sheets. The extent of drought-tolerant vegetation in tropical and subtropical latitudes reflects a generally drier low-latitude climate. Comparisons of the observations with BIOME4 simulations, with and without consideration of the direct physiological effect of CO2 concentration on C3 photosynthesis, suggest an important additional role of low CO2 concentration in restricting the extent of forests, especially in the tropics. Global forest cover was overestimated by all models when climate change alone was used to drive BIOME4, and estimated more accurately when physiological effects of CO2 concentration were included. This result suggests that both CO2 effects and climate effects were important in determining glacial-interglacial changes in vegetation. More realistic simulations of glacial vegetation and climate will need to take into account the feedback effects of these structural and physiological changes on the climate.
Resumo:
The creation of Causal Loop Diagrams (CLDs) is a major phase in the System Dynamics (SD) life-cycle, since the created CLDs express dependencies and feedback in the system under study, as well as, guide modellers in building meaningful simulation models. The cre-ation of CLDs is still subject to the modeller's domain expertise (mental model) and her ability to abstract the system, because of the strong de-pendency on semantic knowledge. Since the beginning of SD, available system data sources (written and numerical models) have always been sparsely available, very limited and imperfect and thus of little benefit to the whole modelling process. However, in recent years, we have seen an explosion in generated data, especially in all business related domains that are analysed via Business Dynamics (BD). In this paper, we intro-duce a systematic tool supported CLD creation approach, which analyses and utilises available disparate data sources within the business domain. We demonstrate the application of our methodology on a given business use-case and evaluate the resulting CLD. Finally, we propose directions for future research to further push the automation in the CLD creation and increase confidence in the generated CLDs.
Resumo:
The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). The results show that multivariate approaches may (a) significantly enhance flood-prone areas delineation relative to a selected univariate one, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into continuous representation of FH.
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Resumo:
Functional magnetic resonance imaging (fMRI) based on BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). In cases when the task is cognitively complex or in cases of diseases, variations in shape and/or delay may reduce the reliability of results. A novel exploratory method using fMRI data, which attempts to discriminate between neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors, is introduced in this paper. This new method is based on the fusion between correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The aim of this paper is to develop models for experimental open-channel water delivery systems and assess the use of three data-driven modeling tools toward that end. Water delivery canals are nonlinear dynamical systems and thus should be modeled to meet given operational requirements while capturing all relevant dynamics, including transport delays. Typically, the derivation of first principle models for open-channel systems is based on the use of Saint-Venant equations for shallow water, which is a time-consuming task and demands for specific expertise. The present paper proposes and assesses the use of three data-driven modeling tools: artificial neural networks, composite local linear models and fuzzy systems. The canal from Hydraulics and Canal Control Nucleus (A parts per thousand vora University, Portugal) will be used as a benchmark: The models are identified using data collected from the experimental facility, and then their performances are assessed based on suitable validation criterion. The performance of all models is compared among each other and against the experimental data to show the effectiveness of such tools to capture all significant dynamics within the canal system and, therefore, provide accurate nonlinear models that can be used for simulation or control. The models are available upon request to the authors.
Resumo:
Conferência: CONTROLO’2012 - 16-18 July 2012 - Funchal
Resumo:
Project submitted as part requirement for the degree of Masters in English teaching,
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore this data and extract from it useful patterns, tendencies, models or other useful knowledge, which could be used to support the decision-making process, to define new strategies or to understand what is happening in a specific field. Only with a deep understanding of a phenomenon it is possible to fight it. In this paper, a data-driven analytics approach is used for the analysis of the increasing incidence of fatalities by pneumonia in the Portuguese population, characterizing the disease and its incidence in terms of fatalities, knowledge that can be used to define appropriate strategies that can aim to reduce this phenomenon, which has increased more than 65% in a decade.
Resumo:
Systemidentification, evolutionary automatic, data-driven model, fuzzy Takagi-Sugeno grammar, genotype interpretability, toxicity-prediction
Resumo:
Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.