903 results for Data-driven knowledge acquisition
Abstract:
This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as on very large problems.
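A minimal sketch of the second idea described above: a functional clustering model (a small neural network with learnable centroids) trained by mini-batch stochastic gradient descent, so that out-of-sample assignment is just a forward pass on new samples. The architecture, the soft k-means style loss, and all hyperparameters below are illustrative assumptions, not the thesis' actual formulation.

```python
import torch
import torch.nn as nn

class ClusterNet(nn.Module):
    """Encoder plus learnable centroids; forward() returns soft cluster assignments."""
    def __init__(self, in_dim, embed_dim=10, n_clusters=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, embed_dim))
        self.centroids = nn.Parameter(torch.randn(n_clusters, embed_dim))

    def forward(self, x):
        z = self.encoder(x)                      # (batch, embed_dim)
        d = torch.cdist(z, self.centroids)       # distances to each centroid
        return torch.softmax(-d, dim=1)          # soft cluster assignments

def train(model, dataset, epochs=20, batch_size=256, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for (x,) in loader:
            z = model.encoder(x)
            d = torch.cdist(z, model.centroids)
            q = torch.softmax(-d, dim=1).detach()      # assignments held fixed per step
            loss = (q * d.pow(2)).sum(dim=1).mean()    # assignment-weighted squared distances
            opt.zero_grad()
            loss.backward()
            opt.step()

# Illustrative usage:
# ds = torch.utils.data.TensorDataset(torch.as_tensor(X, dtype=torch.float32))
# model = ClusterNet(in_dim=X.shape[1]); train(model, ds)
# labels = model(torch.as_tensor(X_new, dtype=torch.float32)).argmax(dim=1)
```

Because training uses mini-batches, memory stays bounded regardless of database size, which is the scaling property the abstract emphasises.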
Abstract:
Automatic environmental monitoring networks, enabled by wireless communication technologies, now provide large and ever-increasing volumes of data. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which aim at non-parametric, robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows prediction maps to be produced in real time, which makes them superior to physical models for operational use in risk assessment and mitigation. This situation is encountered in particular in the spatial prediction of climatic variables (topo-climatic mapping). In the complex topographies of mountainous regions, meteorological processes are strongly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using information from digital elevation models. The methodology is illustrated by mapping temperatures (including situations of Föhn and temperature inversion) from measurements taken by the Swiss meteorological monitoring network. The range of methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
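A hedged sketch of the general topo-climatic workflow described above: fit a non-parametric regression (here support vector regression, as one of the support vector algorithms mentioned) on station measurements with DEM-derived covariates, then predict on a fine grid. The feature set and hyperparameters are illustrative assumptions.

```python
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_topo_climatic_model(X, y):
    """X: one row per station, e.g. [easting, northing, elevation, slope, ...];
    y: measured temperature at each station."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X, y)
    return model

def predict_map(model, grid_features):
    """grid_features: the same covariates computed for every DEM grid cell."""
    return model.predict(grid_features)
```

The same pipeline could be swapped for a neural network regressor; the key design choice is that the covariates carry the relief information, so regionalized, non-linear relief effects are learned from data rather than prescribed.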
Abstract:
Membrane bioreactors (MBRs) combine activated sludge bioreactors with membrane filtration, enabling high-quality effluent with a small footprint. However, they can be beset by fouling, which causes an increase in transmembrane pressure (TMP). Modelling and simulation of changes in TMP could be useful to describe fouling through the identification of the most relevant operating conditions. Using experimental data from an MBR pilot plant operated for 462 days, two different models were developed: a deterministic model using activated sludge model n°2d (ASM2d) for the biological component and a resistance-in-series model for the filtration component, and a data-driven model based on multivariable regressions. Once validated, these models were used to describe membrane fouling (as changes in TMP over time) under different operating conditions. The deterministic model performed better at higher temperatures (>20°C), constant operating conditions (DO set-point, membrane air-flow, pH and ORP), and high mixed liquor suspended solids (>6.9 g L⁻¹) and flux changes. At low pH (<7), or during periods with larger pH changes, the data-driven model was more accurate. Changes in the DO set-point of the aerobic reactor that affected the TMP were also better described by the data-driven model. By combining the use of both models, a better description of fouling can be achieved under different operating conditions.
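A minimal sketch of a data-driven TMP model of the kind described: a multivariable regression of transmembrane pressure on operating conditions. The column names, variable list and chronological train/test split are illustrative assumptions, not the paper's exact setup.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical operating-condition columns in the pilot-plant dataset.
FEATURES = ["temperature", "DO_setpoint", "membrane_airflow", "pH", "ORP",
            "MLSS", "flux"]

def fit_tmp_model(df: pd.DataFrame):
    X, y = df[FEATURES], df["TMP"]
    # Keep time order: train on the earlier part of the 462-day record.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)
    model = LinearRegression().fit(X_tr, y_tr)
    print("R^2 on held-out period:", r2_score(y_te, model.predict(X_te)))
    return model
```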
Abstract:
The study investigates organisational learning and knowledge acquisition among wood-based prefabricated building manufacturers. This particular group of case companies was chosen because their management and employees generally have a strong manufacturing and engineering background, while the housing sector is characterised by national norms and regulations as well as local building styles. Against this setting, it was investigated how the case companies develop organisational learning capabilities and acquire and transfer knowledge for their internationalisation. The theoretical framework of this study is the knowledge-based conceptualisation of internationalisation, which combines the traditional internationalisation process and the international new venture perspective on the basis of their commonalities in the knowledge-based view of the firm. Different theories of internationalisation, including the network perspective, were outlined, and a framework on organisational learning and knowledge acquisition was established. The empirical research followed a qualitative approach, deploying a multiple-case study with five case companies from Austria, Finland and Germany. The study describes the development of the wood-based prefabricated building industry and of the case companies, and compares the motives, facilitators and challenges for foreign expansion as well as the companies' internationalisation approaches. Different methods by which companies facilitate knowledge exchange or learn about new markets are also outlined. Experience, market knowledge and personal contacts are considered essential for the internationalisation process. The major finding of the study is that it is not necessary to acquire market knowledge internally in a slow process as proposed by the Uppsala model. In four cases, companies acquired knowledge through symbiotic relations with local business partners: the building manufacturers contribute their design and production capabilities, and in return their local partners provide knowledge about the market and local regulations while managing the sales and construction operations. Thus, the study provides strong evidence for the propositions of the network perspective. One case company developed the knowledge internally in a gradual process: it entered the market sequentially with several business lines of increasing complexity. In both observed strategies, single-loop and double-loop learning processes occurred.
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
This study, conducted in 2009, questioned how Ontario secondary school principals perceived their role had changed over a 7-year period in response to the increased demands of data-driven school environments. Specifically, it sought to identify principals' perceptions of how high-stakes testing and data-driven environments had affected their role, tasks, and accountability responsibilities. The study contextualized the emergence of the Education Quality and Accountability Office (EQAO) as a central influence in the creation of data-driven school environments, and conceptualized the role of the principal as one of using data to inform and persuade a shift in thinking about the use of data to improve instruction and student achievement. The findings suggest that data-driven environments had helped principals reclaim their positional power as instructional leaders, using data as an avenue back into the classroom. The use of data shifted the responsibilities of the principal towards persuading teachers to work collaboratively to improve classroom instruction in order to demonstrate accountability.
Abstract:
As our world becomes increasingly interconnected, diseases can spread at an ever faster rate. Recent years have seen large-scale influenza, cholera and Ebola outbreaks, and failing to react to an outbreak in a timely manner leads to a larger spread and longer persistence. Furthermore, diseases like malaria, polio and dengue fever have been eliminated in some parts of the world but continue to put a substantial burden on countries in which they are still endemic. To reduce the disease burden and eventually move towards countrywide elimination of diseases such as malaria, understanding human mobility is crucial both for planning interventions and for estimating the prevalence of the disease. In this talk, I will discuss how various data sources can be used to estimate human movements, population distributions and disease prevalence, as well as the relevance of this information for intervention planning. In particular, anonymised mobile phone data has been shown to be a valuable source of information for countries with unreliable population density and migration data, and I will present several studies where mobile phone data has been used to derive these measures.
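One common way anonymised mobile phone records are turned into movement estimates is to count transitions between the regions where a subscriber is observed at successive times, producing an origin-destination matrix. The sketch below assumes a simplified record format (user, day, region) and is illustrative only, not the procedure used in the studies mentioned.

```python
from collections import defaultdict

def origin_destination_matrix(records):
    """records: iterable of (user_id, day, region) tuples, already anonymised.
    Returns {(origin_region, destination_region): trip_count}."""
    last_seen = {}                # user_id -> region of the most recent observation
    od = defaultdict(int)
    for user, day, region in sorted(records, key=lambda r: (r[0], r[1])):
        if user in last_seen and last_seen[user] != region:
            od[(last_seen[user], region)] += 1
        last_seen[user] = region
    return dict(od)
```

Such a matrix can then be rescaled by regional population estimates to approximate flows for the whole population rather than only for subscribers.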
Abstract:
Seamless phase II/III clinical trials are conducted in two stages with treatment selection at the first stage. In the first stage, patients are randomized to a control or one of k > 1 experimental treatments. At the end of this stage, interim data are analysed, and a decision is made concerning which experimental treatment should continue to the second stage. If the primary endpoint is observable only after some period of follow-up, data on an early outcome may be available at the interim analysis for a larger number of patients than those for whom the primary endpoint is available. These early endpoint data can thus be used for treatment selection. For two previously proposed approaches, the power has been shown to be greater for one or the other method depending on the true treatment effects and correlations. We propose a new approach that builds on the previously proposed approaches and uses data available at the interim analysis to estimate these parameters and then, on the basis of these estimates, chooses the treatment selection method with the highest probability of correctly selecting the most effective treatment. This method is shown to perform well compared with the two previously described methods for a wide range of true parameter values. In most cases, the performance of the new method is similar to, or in some cases better than, either of the two previously proposed methods.
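A very rough sketch of the adaptive idea described above: at the interim analysis, plug the current estimates of the treatment effects and the early/final endpoint correlation into a quick Monte Carlo, estimate for each candidate selection rule the probability of picking the arm that looks truly best, and adopt the rule with the higher estimate. The normal-endpoint setup, sample sizes and rules below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_correct_selection(effects_early, effects_final, corr, n_early, n_final,
                           n_sim=5000):
    """Estimate P(correct selection) for two rules:
    rule 0 selects on the early endpoint, rule 1 on the partially observed final endpoint.
    effects_*: interim effect estimates per experimental arm; corr: estimated
    early/final correlation; n_early/n_final: patients per arm with each endpoint."""
    k = len(effects_early)
    best = int(np.argmax(effects_final))      # arm treated as truly best under the estimates
    cov = np.array([[1.0, corr], [corr, 1.0]])
    correct = np.zeros(2)
    for _ in range(n_sim):
        early_means = np.empty(k)
        final_means = np.empty(k)
        for a in range(k):
            z = rng.multivariate_normal([effects_early[a], effects_final[a]],
                                        cov, size=n_early)
            early_means[a] = z[:, 0].mean()
            final_means[a] = z[:n_final, 1].mean()
        correct[0] += np.argmax(early_means) == best
        correct[1] += np.argmax(final_means) == best
    return correct / n_sim

# Illustrative usage:
# p = prob_correct_selection([0.2, 0.4], [0.1, 0.3], corr=0.6, n_early=100, n_final=30)
# rule = "early endpoint" if p[0] >= p[1] else "final endpoint"
```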
Abstract:
Performance modelling is a useful tool in the lifecycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared against real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is achieved in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. This experience leads to the investigation of a data-driven, application-benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow water model as an example.
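A minimal sketch of a data-driven benchmarking idea in the spirit described above: time a small kernel at several problem sizes, fit a simple empirical model to the measurements, and use it to predict runtime at a larger size. The toy kernel and the power-law model form are illustrative assumptions, not the paper's method.

```python
import time
import numpy as np

def time_once(kernel, n):
    t0 = time.perf_counter()
    kernel(n)
    return time.perf_counter() - t0

def benchmark(kernel, sizes, repeats=3):
    """Best-of-`repeats` wall-clock time of kernel(n) for each problem size n."""
    return np.array([min(time_once(kernel, n) for _ in range(repeats)) for n in sizes])

def fit_power_law(sizes, times):
    # Fit t = a * n^b by linear least squares in log space.
    b, log_a = np.polyfit(np.log(sizes), np.log(times), 1)
    return np.exp(log_a), b

if __name__ == "__main__":
    # Toy stand-in for a shallow-water-style stencil sweep on an n x n grid.
    kernel = lambda n: np.roll(np.ones((n, n)), 1, axis=0).sum()
    sizes = np.array([256, 512, 1024, 2048])
    a, b = fit_power_law(sizes, benchmark(kernel, sizes))
    print(f"predicted time at n=4096: {a * 4096 ** b:.4f} s")
```

The contrast with the source-code-based approach is that nothing about the processor microarchitecture is modelled explicitly; the machine's behaviour is captured entirely through the measured benchmark data.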
Abstract:
As the fidelity of virtual environments (VE) continues to increase, the possibility of using them as training platforms is becoming increasingly realistic for a variety of application domains, including military and emergency personnel training. In the past, there was much debate on whether the acquisition and subsequent transfer of spatial knowledge from VEs to the real world is possible, or whether the difference in medium during training would essentially be an obstacle to truly learning geometric space. In this paper, the authors present various cognitive and environmental factors that not only contribute to this process but also interact with each other to a certain degree, leading to a variable exposure time required for the process of spatial knowledge acquisition (SKA) to occur. The cognitive factors discussed include a variety of individual user differences: knowledge and experience; cognitive gender differences; aptitude and spatial orientation skill; and cognitive styles. The environmental factors discussed include size, spatial layout complexity and landmark distribution. It may seem obvious that, since every individual's brain is unique (not only through experience but also through genetic predisposition), a one-size-fits-all approach to training would be illogical. Furthermore, considering that various cognitive differences may emerge further when a certain stimulus is present (e.g. a complex environmental space), it makes even more sense to understand how these factors can impact spatial memory and to adapt the training session by providing visual and auditory cues, as well as by changing the exposure time requirements, for each individual. The impact of this research domain is important to VE training in general; however, within service and military domains, guaranteeing appropriate spatial training is critical in order to ensure that disorientation does not occur in a life-or-death scenario.