893 resultados para data driven approach
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Membrane bioreactors (MBRs) are a combination of activated sludge bioreactors and membrane filtration, enabling high quality effluent with a small footprint. However, they can be beset by fouling, which causes an increase in transmembrane pressure (TMP). Modelling and simulation of changes in TMP could be useful to describe fouling through the identification of the most relevant operating conditions. Using experimental data from a MBR pilot plant operated for 462days, two different models were developed: a deterministic model using activated sludge model n°2d (ASM2d) for the biological component and a resistance in-series model for the filtration component as well as a data-driven model based on multivariable regressions. Once validated, these models were used to describe membrane fouling (as changes in TMP over time) under different operating conditions. The deterministic model performed better at higher temperatures (>20°C), constant operating conditions (DO set-point, membrane air-flow, pH and ORP), and high mixed liquor suspended solids (>6.9gL-1) and flux changes. At low pH (<7) or periods with higher pH changes, the data-driven model was more accurate. Changes in the DO set-point of the aerobic reactor that affected the TMP were also better described by the data-driven model. By combining the use of both models, a better description of fouling can be achieved under different operating conditions
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
This study occurred in 2009 and questioned how Ontario secondary school principals perceived their role had changed, over a 7 year period, in response to the increased demands of data-driven school environments. Specifically, it sought to identify principals' perceptions on how high-stakes testing and data-driven environments had affected their role, tasks, and accountability responsibilities. This study contextualized the emergence of the Education Quality and Accountability Offices (EQAO) as a central influence in the creation of data-driven school environments, and conceptualized the role of the principal as using data to inform and persuade a shift in thinking about the use of data to improve instruction and student achievement. The findings of the study suggest that data-driven environments had helped principals reclaim their positional power as instructional leaders, using data as an avenue back into the classroom. The use of data shifted the responsibilities of the principal to persuade teachers to work collaboratively to improve classroom instruction in order to demonstrate accountability.
Resumo:
In the current study, epidemiology study is done by means of literature survey in groups identified to be at higher potential for DDIs as well as in other cases to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in data mining the drug-drug interactions. The necessary pre-processing algorithms are developed based on the analysis and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared using standard drug interaction database for validation. 31% of the associations obtained were identified to be new and the match with existing interactions was 69%. This match clearly indicates the validity of the methodology and its applicability to similar databases. Formulation of the results using the generic names expanded the relevance of the results to a global scale. The global applicability helps the health care professionals worldwide to observe caution during various stages of drug administration thus considerably enhancing pharmacovigilance
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced
Resumo:
Title: Data-Driven Text Generation using Neural Networks Speaker: Pavlos Vougiouklis, University of Southampton Abstract: Recent work on neural networks shows their great potential at tackling a wide variety of Natural Language Processing (NLP) tasks. This talk will focus on the Natural Language Generation (NLG) problem and, more specifically, on the extend to which neural network language models could be employed for context-sensitive and data-driven text generation. In addition, a neural network architecture for response generation in social media along with the training methods that enable it to capture contextual information and effectively participate in public conversations will be discussed. Speaker Bio: Pavlos Vougiouklis obtained his 5-year Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2013. He was awarded an MSc degree in Software Engineering from the University of Southampton in 2014. In 2015, he joined the Web and Internet Science (WAIS) research group of the University of Southampton and he is currently working towards the acquisition of his PhD degree in the field of Neural Network Approaches for Natural Language Processing. Title: Provenance is Complicated and Boring — Is there a solution? Speaker: Darren Richardson, University of Southampton Abstract: Paper trails, auditing, and accountability — arguably not the sexiest terms in computer science. But then you discover that you've possibly been eating horse-meat, and the importance of provenance becomes almost palpable. Having accepted that we should be creating provenance-enabled systems, the challenge of then communicating that provenance to casual users is not trivial: users should not have to have a detailed working knowledge of your system, and they certainly shouldn't be expected to understand the data model. So how, then, do you give users an insight into the provenance, without having to build a bespoke system for each and every different provenance installation? Speaker Bio: Darren is a final year Computer Science PhD student. He completed his undergraduate degree in Electronic Engineering at Southampton in 2012.
Resumo:
Transient neural assemblies mediated by synchrony in particular frequency ranges are thought to underlie cognition. We propose a new approach to their detection, using empirical mode decomposition (EMD), a data-driven approach removing the need for arbitrary bandpass filter cut-offs. Phase locking is sought between modes. We explore the features of EMD, including making a quantitative assessment of its ability to preserve phase content of signals, and proceed to develop a statistical framework with which to assess synchrony episodes. Furthermore, we propose a new approach to ensure signal decomposition using EMD. We adapt the Hilbert spectrum to a time-frequency representation of phase locking and are able to locate synchrony successfully in time and frequency between synthetic signals reminiscent of EEG. We compare our approach, which we call EMD phase locking analysis (EMDPL) with existing methods and show it to offer improved time-frequency localisation of synchrony.
Resumo:
Synoptic climatology relates the atmospheric circulation with the surface environment. The aim of this study is to examine the variability of the surface meteorological patterns, which are developing under different synoptic scale categories over a suburban area with complex topography. Multivariate Data Analysis techniques were performed to a data set with surface meteorological elements. Three principal components related to the thermodynamic status of the surface environment and the two components of the wind speed were found. The variability of the surface flows was related with atmospheric circulation categories by applying Correspondence Analysis. Similar surface thermodynamic fields develop under cyclonic categories, which are contrasted with the anti-cyclonic category. A strong, steady wind flow characterized by high shear values develops under the cyclonic Closed Low and the anticyclonic H–L categories, in contrast to the variable weak flow under the anticyclonic Open Anticyclone category.
Resumo:
Includes bibliography
Resumo:
Breast cancer (BC) is the most often diagnosed cancer entity of women worldwide. No molecular biomarkers are usable in the clinical routine for the early detection of BC. Proteomics is one of the dynamic tools for the successful examination of changes on the protein level. In this thesis different proteomics-based investigations were performed for the detection of protein and autoantibody biomarkers in serum samples of BC and healthy (CTRL) subjects. First, protein levels of candidates from previous profiling studies were investigated via antibody-microarray platform. Three proteins were found in distinct levels in both groups: secretoglobin family 1D member 1, alpha-2 macroglobulin and inter-alpha-trypsin inhibitor heavy chain family member 4. The second part was dedicated to the de novo exploration of potentially immunogenic tumor antigens (TA’s) with immunoprecipitation and Western immunoblotting followed by identification over mass spectrometry. Autoantibody levels were verified in individual serum profiling via the protein microarray platform. Two autoantibody’ cohorts (anti-Histone 2B and anti-Recoverin) were found in different levels in both groups. The findings of this PhD thesis underline deregulated serum protein and autoantibody levels in the presence of BC. Further investigations are needed to confirm the results in an independent study population.
Resumo:
This work is focused on the study of saltwater intrusion in coastal aquifers, and in particular on the realization of conceptual schemes to evaluate the risk associated with it. Saltwater intrusion depends on different natural and anthropic factors, both presenting a strong aleatory behaviour, that should be considered for an optimal management of the territory and water resources. Given the uncertainty of problem parameters, the risk associated with salinization needs to be cast in a probabilistic framework. On the basis of a widely adopted sharp interface formulation, key hydrogeological problem parameters are modeled as random variables, and global sensitivity analysis is used to determine their influence on the position of saltwater interface. The analyses presented in this work rely on an efficient model reduction technique, based on Polynomial Chaos Expansion, able to combine the best description of the model without great computational burden. When the assumptions of classical analytical models are not respected, and this occurs several times in the applications to real cases of study, as in the area analyzed in the present work, one can adopt data-driven techniques, based on the analysis of the data characterizing the system under study. It follows that a model can be defined on the basis of connections between the system state variables, with only a limited number of assumptions about the "physical" behaviour of the system.