783 resultados para Neural Network Models for Competing Risks Data
Resumo:
Dissertação para a obtenção do grau de Mestre em Engenharia Electrotécnica Ramo de Energia
Resumo:
Load forecasting has gradually becoming a major field of research in electricity industry. Therefore, Load forecasting is extremely important for the electric sector under deregulated environment as it provides a useful support to the power system management. Accurate power load forecasting models are required to the operation and planning of a utility company, and they have received increasing attention from researches of this field study. Many mathematical methods have been developed for load forecasting. This work aims to develop and implement a load forecasting method for short-term load forecasting (STLF), based on Holt-Winters exponential smoothing and an artificial neural network (ANN). One of the main contributions of this paper is the application of Holt-Winters exponential smoothing approach to the forecasting problem and, as an evaluation of the past forecasting work, data mining techniques are also applied to short-term Load forecasting. Both ANN and Holt-Winters exponential smoothing approaches are compared and evaluated.
Resumo:
Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.
Resumo:
The use of Geographic Information Systems has revolutionalized the handling and the visualization of geo-referenced data and has underlined the critic role of spatial analysis. The usual tools for such a purpose are geostatistics which are widely used in Earth science. Geostatistics are based upon several hypothesis which are not always verified in practice. On the other hand, Artificial Neural Network (ANN) a priori can be used without special assumptions and are known to be flexible. This paper proposes to discuss the application of ANN in the case of the interpolation of a geo-referenced variable.
Resumo:
MOTIVATION: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements, therefore, has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here, we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes. RESULTS: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as 'stepping stones' for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or 'trigger' is required to exit the stem cell state, with distinct triggers characterizing maturation into the various different lineages. By focusing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally thus validating our model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Resumo:
The paper presents an approach for mapping of precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce simultaneously both the spatial patterns and the extreme values. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) of a neural network for the reproduction of extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of radar Doppler imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as analyzing data patterns.
Resumo:
In occupational exposure assessment of airborne contaminants, exposure levels can either be estimated through repeated measurements of the pollutant concentration in air, expert judgment or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte-Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
Resumo:
Epidemiological data indicate that 75% of subjects with major psychiatric disorders have their onset in the age range of 17-24 years. An estimated 35-50% of college and university students drop out prematurely due to insufficient coping skills under chronic stress, while 85% of students receiving a psychiatric diagnosis withdraw from college/university prior to the completion of their education. In this study we aimed at developing standardized means for identifying students with insufficient coping skills under chronic stress and at risk for mental health problems. A sample of 1,217 college students from 3 different sites in the U.S. and Switzerland completed 2 self-report questionnaires: the Coping Strategies Inventory "COPE" and the Zurich Health Questionnaire "ZHQ" which assesses "regular exercises", "consumption behavior", "impaired physical health", "psychosomatic disturbances", and "impaired mental health". The data were subjected to structure analyses by means of a Neural Network approach. We found 2 highly stable and reproducible COPE scales that explained the observed inter-individual variation in coping behavior sufficiently well and in a socio-culturally independent way. The scales reflected basic coping behavior in terms of "activity-passivity" and "defeatism-resilience", and in the sense of stable, socio-culturally independent personality traits. Correlation analyses carried out for external validation revealed a close relationship between high scores on the defeatism scale and impaired physical and mental health. This underlined the role of insufficient coping behavior as a risk factor for physical and mental health problems. The combined COPE and ZHQ instruments appear to constitute powerful screening tools for insufficient coping skills under chronic stress and for risks of mental health problems.
Resumo:
OBJECTIVE. The main goal of this paper is to obtain a classification model based on feed-forward multilayer perceptrons in order to improve postpartum depression prediction during the 32 weeks after childbirth with a high sensitivity and specificity and to develop a tool to be integrated in a decision support system for clinicians. MATERIALS AND METHODS. Multilayer perceptrons were trained on data from 1397 women who had just given birth, from seven Spanish general hospitals, including clinical, environmental and genetic variables. A prospective cohort study was made just after delivery, at 8 weeks and at 32 weeks after delivery. The models were evaluated with the geometric mean of accuracies using a hold-out strategy. RESULTS. Multilayer perceptrons showed good performance (high sensitivity and specificity) as predictive models for postpartum depression. CONCLUSIONS. The use of these models in a decision support system can be clinically evaluated in future work. The analysis of the models by pruning leads to a qualitative interpretation of the influence of each variable in the interest of clinical protocols.
Resumo:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.
Resumo:
The localization of Last Glacial Maximum (LGM) refugia is crucial information to understand a species' history and predict its reaction to future climate changes. However, many phylogeographical studies often lack sampling designs intensive enough to precisely localize these refugia. The hairy land snail Trochulus villosus has a small range centred on Switzerland, which could be intensively covered by sampling 455 individuals from 52 populations. Based on mitochondrial DNA sequences (COI and 16S), we identified two divergent lineages with distinct geographical distributions. Bayesian skyline plots suggested that both lineages expanded at the end of the LGM. To find where the origin populations were located, we applied the principles of ancestral character reconstruction and identified a candidate refugium for each mtDNA lineage: the French Jura and Central Switzerland, both ice-free during the LGM. Additional refugia, however, could not be excluded, as suggested by the microsatellite analysis of a population subset. Modelling the LGM niche of T. villosus, we showed that suitable climatic conditions were expected in the inferred refugia, but potentially also in the nunataks of the alpine ice shield. In a model selection approach, we compared several alternative recolonization scenarios by estimating the Akaike information criterion for their respective maximum-likelihood migration rates. The 'two refugia' scenario received by far the best support given the distribution of genetic diversity in T. villosus populations. Provided that fine-scale sampling designs and various analytical approaches are combined, it is possible to refine our necessary understanding of species responses to environmental changes.
Resumo:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of well known nonparametric principles of probability density modeling using kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can be easily used in decision support systems dealing with problems of automatic classification. Support vector machine is an implementation of the principles of statistical learning theory for the classification tasks. Recently they were successfully applied for different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
Resumo:
Soil information is needed for managing the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) for the prediction of soil classes using orbital remote sensing products, terrain attributes derived from a digital elevation model and local geology information as data sources. This approach to digital soil mapping was evaluated in an area with a high degree of lithologic diversity in the Serra do Mar. The neural network simulator used in this study was JavaNNS and the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, declivity, aspect, curvature, curvature plan, curvature profile, topographic index, solar radiation, LS topographic factor, local geology information, and clay mineral indices, iron oxides and the normalized difference vegetation index (NDVI) derived from an image of a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) sensor. With the tested sets, best results were obtained when all discriminant variables were associated with geological information (overall accuracy 93.2 - 95.6 %, Kappa index 0.924 - 0.951, for set 13). Excluding the variable profile curvature (set 12), overall accuracy ranged from 93.9 to 95.4 % and the Kappa index from 0.932 to 0.948. The maps based on the neural network classifier were consistent and similar to conventional soil maps drawn for the study area, although with more spatial details. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.
Resumo:
Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.
Resumo:
Brain fluctuations at rest are not random but are structured in spatial patterns of correlated activity across different brain areas. The question of how resting-state functional connectivity (FC) emerges from the brain's anatomical connections has motivated several experimental and computational studies to understand structure-function relationships. However, the mechanistic origin of resting state is obscured by large-scale models' complexity, and a close structure-function relation is still an open problem. Thus, a realistic but simple enough description of relevant brain dynamics is needed. Here, we derived a dynamic mean field model that consistently summarizes the realistic dynamics of a detailed spiking and conductance-based synaptic large-scale network, in which connectivity is constrained by diffusion imaging data from human subjects. The dynamic mean field approximates the ensemble dynamics, whose temporal evolution is dominated by the longest time scale of the system. With this reduction, we demonstrated that FC emerges as structured linear fluctuations around a stable low firing activity state close to destabilization. Moreover, the model can be further and crucially simplified into a set of motion equations for statistical moments, providing a direct analytical link between anatomical structure, neural network dynamics, and FC. Our study suggests that FC arises from noise propagation and dynamical slowing down of fluctuations in an anatomically constrained dynamical system. Altogether, the reduction from spiking models to statistical moments presented here provides a new framework to explicitly understand the building up of FC through neuronal dynamics underpinned by anatomical connections and to drive hypotheses in task-evoked studies and for clinical applications.