184 results for MLP
Abstract:
The rationale of this study was to investigate molecular flexibility and its influence on physicochemical properties with a view to uncovering additional information on the fuzzy concept of dynamic molecular structure. Indeed, it is now known that computed molecular interaction fields (MIFs) such as molecular electrostatic potentials (MEPs) and lipophilicity potentials (MLPs) are conformation-dependent, as are dipole moments. The conformational space of a database of 125 compounds was explored, and conformation-dependent parameters were computed for each non-redundant conformer found. These parameters were the virtual log P (log P(MLP), calculated by an MLP approach), the apolar surface area (ASA), polar surface area (PSA), and solvent-accessible surface (SAS). For each compound, the range taken by each parameter (its property space) was divided by the number of rotors taken as an index of flexibility, yielding a parameter termed 'molecular sensitivity'. This parameter was poorly correlated with others (i.e., it contains novel information) and showed the compounds to fall into two broad classes. 'Sensitive' molecules are those whose computed property ranges are markedly sensitive to conformational effects, whereas 'insensitive' (in fact, less sensitive) molecules have property ranges which are comparatively less affected by conformational fluctuations. A pharmacokinetic application is presented.
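The 'molecular sensitivity' index described in this abstract is a simple ratio, and a small sketch may make it concrete. It assumes that the property range is the maximum minus the minimum over all conformers; the function names and the sample log P values are hypothetical and are not taken from the study.

```python
def property_range(values):
    """Range (max - min) of one property over all conformers of a compound."""
    return max(values) - min(values)


def molecular_sensitivity(values, n_rotors):
    """Property range divided by the number of rotors (the flexibility index)."""
    if n_rotors <= 0:
        # Assumption: rigid molecules (no rotors) are handled separately.
        raise ValueError("n_rotors must be positive")
    return property_range(values) / n_rotors


# Hypothetical virtual log P values for four conformers of one compound.
log_p_conformers = [1.2, 1.9, 1.5, 2.2]
sensitivity = molecular_sensitivity(log_p_conformers, n_rotors=4)
print(sensitivity)
```

Under these assumptions, a compound whose property spans a wide range with few rotors gets a high sensitivity, which matches the 'sensitive' class described above.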
Abstract:
This research reviews the analysis and modeling of Swiss franc interest rate curves (IRC) using unsupervised (SOM, Gaussian mixtures) and supervised (MLP) machine learning algorithms. IRC are treated as objects embedded in different feature spaces: maturities, maturity-date, and the parameters of the Nelson-Siegel model (NSM). Analysis of the NSM parameters and of their temporal and clustering structures helps to assess the relevance of the model and its potential use for forecasting. A mapping of IRC in the maturity-date feature space is presented and analyzed for visualization and forecasting purposes.
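For readers unfamiliar with the Nelson-Siegel model mentioned here, the sketch below evaluates the curve in one common parameterization: a level term (beta0), a slope term (beta1), a curvature term (beta2) and a decay parameter (lam). The abstract does not say which parameterization or parameter values the authors used, so both are assumptions made for illustration.

```python
import math


def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel yield at maturity tau > 0 with decay parameter lam > 0."""
    x = tau / lam
    # Slope loading (1 - exp(-x)) / x, written with expm1 for accuracy at small x.
    loading = -math.expm1(-x) / x
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))


# Hypothetical parameters: 3% long-run level, negative slope, mild hump.
params = dict(beta0=0.03, beta1=-0.02, beta2=0.01, lam=2.0)
curve = [nelson_siegel(t, **params) for t in (0.25, 1, 2, 5, 10, 30)]
print(curve)
```

In this form the curve tends to beta0 + beta1 at very short maturities and to beta0 at very long ones, so the four parameters form a compact feature vector for each curve.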
Abstract:
Methods for extracting features from physiological datasets are a growing need as clinical investigations of Alzheimer’s disease (AD) in large and heterogeneous populations increase. General tools that allow diagnosis regardless of recording site, such as different hospitals, are essential and, if combined with inexpensive non-invasive methods, could critically improve mass screening of subjects with AD. In this study, we applied three state-of-the-art multiway array decomposition (MAD) methods to extract features from electroencephalograms (EEGs) of AD patients obtained from multiple sites. For comparison with MAD, spectral-spatial average filters (SSFs) of control and AD subjects were used, as well as a common blind source separation method, the algorithm for multiple unknown signal extraction (AMUSE). We trained a feed-forward multilayer perceptron (MLP) to validate and optimize AD classification from two independent databases. Using a third EEG dataset, we demonstrated that features extracted with MAD outperformed features obtained from SSFs and AMUSE in terms of root mean squared error (RMSE), reaching up to 100% accuracy under test conditions. We propose that MAD may be a useful tool for extracting features for AD diagnosis, offering strong generalization across multi-site databases and opening the door to new characterizations of the disease.
Abstract:
OBJECTIVES: This study aimed at measuring the lipophilicity and ionization constants of diastereoisomeric dipeptides, interpreting them in terms of conformational behavior, and developing statistical models to predict them. METHODS: A series of 20 dipeptides of general structure NH(2)-L-X-(L or D)-His-OMe was designed and synthesized. Their experimental ionization constants (pK(1), pK(2) and pK(3)) and lipophilicity parameters (log P(N) and log D(7.4)) were measured by potentiometry. Molecular modeling in three media (vacuum, water, and chloroform) was used to explore and sample their conformational space and, for each stored conformer, to calculate its radius of gyration, virtual log P (preferably written as log P(MLP), meaning obtained by the molecular lipophilicity potential (MLP) method) and polar surface area (PSA). Means and ranges were calculated for these properties, as was their sensitivity (i.e., the ratio between property range and number of rotatable bonds). RESULTS: Marked differences between diastereoisomers were seen in their experimental ionization constants and lipophilicity parameters. These differences are explained by molecular flexibility, configuration-dependent differences in intramolecular interactions, and accessibility of functional groups. Multiple linear equations correlated experimental lipophilicity parameters and ionization constants with PSA range and other calculated parameters. CONCLUSION: This study documents the differences in lipophilicity and ionization constants between diastereoisomeric dipeptides. Such configuration-dependent differences are shown to depend markedly on differences in conformational behavior and to be amenable to multiple linear regression. Chirality 24:566-576, 2012. © 2012 Wiley Periodicals, Inc.
Abstract:
Summary: This thesis is devoted to the analysis, modeling and visualization of spatially referenced environmental data using machine learning algorithms. Machine learning can be broadly regarded as a subfield of artificial intelligence concerned in particular with developing techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient for modeling. They can solve classification, regression and probability density modeling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are well suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modeling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences. 
The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; general regression neural networks (GRNN); probabilistic neural networks (PNN); self-organizing maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, using experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations; it detects the presence of spatial patterns that can be described by a statistic. The machine learning approach to ESDA is presented through the application of the k-nearest neighbors method, which is very simple and has excellent interpretation and visualization qualities. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed to solve this task efficiently. 
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis consists of four chapters: theory, applications, software tools and guided examples. An important part of the work is a collection of software: Machine Learning Office. This software collection was developed over the last 15 years and has been used for teaching many courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a broad spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and natural hazard assessment (landslides, avalanches). Complementary tools for exploratory data analysis and visualization were also developed, with care taken to create a user-friendly, easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies
Abstract: The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? 
In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is the initial and a very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least those described by two-point statistics. 
A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a currently hot topic, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model for this task. The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed over the last 15 years and were used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
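The GRNN used here for automatic mapping can be viewed as a kernel-weighted average of the training values (the Nadaraya-Watson form). The sketch below assumes an isotropic Gaussian kernel with a single bandwidth sigma. It is an illustration only, not the Machine Learning Office implementation; the function name and sample points are hypothetical.

```python
import math


def grnn_predict(query, train_xy, train_values, sigma):
    """Predict at `query` as a Gaussian-kernel-weighted mean of training values."""
    weights = []
    for point in train_xy:
        sq_dist = sum((q - p) ** 2 for q, p in zip(query, point))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the query is too far away for this sigma.
        raise ValueError("all kernel weights are zero; increase sigma")
    return sum(w * v for w, v in zip(weights, train_values)) / total


# Hypothetical measurements at the four corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
center = grnn_predict((0.5, 0.5), points, values, sigma=0.5)
print(center)
```

The only free parameter in this sketch is sigma, which is one reason the model lends itself to automatic mapping: a small sigma reproduces the nearest measurement, while a large sigma tends to the global mean.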
Abstract:
The objective of this work was to evaluate the effect of sampling density on the prediction accuracy of soil orders, with high spatial resolution, in a viticultural zone of Serra Gaúcha, Southern Brazil. A digital elevation model (DEM), a cartographic base, a conventional soil map, and the Idrisi software were used. Seven predictor variables were calculated and read along with soil classes at randomly distributed points, with sampling densities of 0.5, 1, 1.5, 2, and 4 points per hectare. Data were used to train a decision tree (Gini) and three artificial neural networks: adaptive resonance theory, fuzzy ARTMap; self‑organizing map, SOM; and multi‑layer perceptron, MLP. Estimated maps were compared with the conventional soil map to calculate omission and commission errors, overall accuracy, and quantity and allocation disagreement. The decision tree was less sensitive to sampling density and had the highest accuracy and consistency. The SOM was the least sensitive and most consistent network. The MLP had a critical minimum and showed high inconsistency, whereas fuzzy ARTMap was more sensitive and less accurate. Results indicate that sampling densities used in conventional soil surveys can serve as a reference to predict soil orders in Serra Gaúcha.
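The map-comparison measures named in this abstract can be computed from a confusion matrix. The sketch below assumes the commonly used definitions, in which quantity disagreement is half the summed absolute difference between mapped and reference class proportions, and allocation disagreement is the summed minimum of commission and omission per class. The abstract does not state the exact formulas the authors applied, and the matrix values are hypothetical.

```python
def disagreement(confusion):
    """Return (overall_accuracy, quantity_disagreement, allocation_disagreement).

    confusion[i][j] counts points mapped as class i whose reference class is j.
    """
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    p = [[cell / total for cell in row] for row in confusion]
    mapped = [sum(p[i]) for i in range(n)]                          # row sums
    reference = [sum(p[i][j] for i in range(n)) for j in range(n)]  # column sums
    accuracy = sum(p[g][g] for g in range(n))
    quantity = 0.5 * sum(abs(mapped[g] - reference[g]) for g in range(n))
    # Per class: commission = mapped - diagonal, omission = reference - diagonal.
    allocation = sum(
        min(mapped[g] - p[g][g], reference[g] - p[g][g]) for g in range(n)
    )
    return accuracy, quantity, allocation


# Hypothetical two-class comparison of a predicted map with a reference map.
acc, quantity, allocation = disagreement([[40, 10], [5, 45]])
print(acc, quantity, allocation)
```

With these definitions the two disagreement components always sum to one minus the overall accuracy, which gives a quick consistency check on any reported values.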
Abstract:
The aim of this work is to determine whether a neural network can be used to model and predict the effect of fuel on the emissions of a modern car. This would reduce the need for time-consuming and expensive test runs. The work was carried out in a joint project between Lappeenranta University of Technology and Fortum Oy. Three different models were built. First, a car-specific model was built to predict the behavior of individual cars. Second, a model was tried in which the car model was one of the inputs. Third, an attempt was made to work around certain problems in the data by using "fuzzified" fuel compositions. The work used an MLP neural network trained with the backpropagation algorithm. It was found that, with the available data and the models used, the effect of fuel on emissions could not be modeled with sufficient accuracy. Problems with the data included large measurement variances, the small amount of data, and the unsuitability of the data for neural network modeling.
Abstract:
This work studied the veneer quality grading system operating on drying line 2 of the Koskisen Oy plywood mill, and sought new, alternative solutions to make it more efficient. To speed up the grading system and improve its performance, solutions were sought for handling dimensional, edge and internal defects. To increase the utilization rate of the line, new methods were also sought for its fault diagnostics and operation monitoring. Using the edge information of the imaged sheet, dimensional errors caused by positioning errors during operation can be taken into account. Information collected from the gray-level values of defect areas is used in extracting histogram features to improve the classification of knots. The adoption of neural classifiers is supported by their speed at classification time and by a classification accuracy that nearly reaches that of the k-NN classifier. Of the neural classifiers, the multilayer perceptron (MLP) and learning vector quantization (LVQ) classifiers were studied. Improved veneer sheet grading achieved by adopting the changes described above brings the company cost savings through both better utilization of veneer sheets and reduced labor in further veneer processing.
Abstract:
In this article we present a project [1] developed to demonstrate the capability of Multi-Layer Perceptrons (MLP) to approximate non-linear functions [2]. The simulation has been implemented in Java so that it can be used on any computer over the Internet [3], with simple operation and a pleasant interface. The strength of the simulation lies in letting the user see how the approximation evolves, observe the contribution of each neuron, control the different parameters, etc. In addition, online help has been implemented to guide the user during the simulation.
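The idea of viewing "the contribution of each neuron" can be sketched with a one-hidden-layer MLP whose output is a bias plus a sum of weighted hidden activations. This is a minimal illustration in Python, not the article's Java simulation. The tanh activation, the function name and the hand-picked weights are all assumptions.

```python
import math


def mlp_forward(x, hidden, output_bias):
    """One-hidden-layer MLP with tanh units and a linear output.

    `hidden` is a list of (w_in, b, w_out) triples, one per hidden neuron.
    Returns the network output and the list of per-neuron contributions.
    """
    contributions = [w_out * math.tanh(w_in * x + b) for w_in, b, w_out in hidden]
    return output_bias + sum(contributions), contributions


# Hypothetical hand-picked weights, not taken from the article.
hidden = [(2.0, 0.0, 1.0), (1.0, -1.0, -0.5)]
y, parts = mlp_forward(0.0, hidden, output_bias=0.1)
print(y, parts)
```

Plotting each entry of the contributions list over a range of inputs shows how shifted and scaled sigmoidal curves add up to the approximated function.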
Abstract:
The objective of this work was to evaluate the fitting of the Schumacher and Hall volumetric model by different algorithms, as well as the application of artificial neural networks to estimate eucalyptus wood volume as a function of diameter at 1.30 m above the ground (DBH), total height (Ht) and clone. Data came from 21 scaling (volume measurement) campaigns in eucalyptus clone stands, with DBH ranging from 4.5 to 28.3 cm and total height from 6.6 to 33.8 m, for a total of 862 trees. The Schumacher and Hall volumetric model was fitted in its linear and nonlinear forms with the following algorithms: Gauss-Newton, Quasi-Newton, Levenberg-Marquardt, Simplex, Hooke-Jeeves Pattern, Rosenbrock Pattern, and Simplex, Hooke-Jeeves and Rosenbrock used simultaneously with the Quasi-Newton method, and with the Maximum Likelihood principle. Different architectures and models (Multilayer Perceptron, MLP, and Radial Basis Function, RBF) of artificial neural networks were tested, and the networks that best represented the data were selected. The volume estimates were evaluated with plots of estimated volume against observed volume and with the L&O statistical test. It is concluded that the Schumacher and Hall model can be used in its linear form, with good representativeness and without bias; that the Gauss-Newton, Quasi-Newton and Levenberg-Marquardt algorithms were efficient for fitting the Schumacher and Hall volumetric model; and that the artificial neural networks were well suited to the problem and are highly recommended for forecasting the production of planted forests.
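The linear and nonlinear forms of the Schumacher and Hall model mentioned in this abstract are related by a logarithmic transformation. The sketch below assumes the usual form of the model, V = b0 * DBH^b1 * Ht^b2. The coefficient values are hypothetical and are not the fitted values from the study.

```python
import math


def schumacher_hall(dbh, ht, b0, b1, b2):
    """Nonlinear form: V = b0 * DBH**b1 * Ht**b2."""
    return b0 * dbh ** b1 * ht ** b2


def schumacher_hall_log(dbh, ht, b0, b1, b2):
    """Linear (log-transformed) form: ln V = ln b0 + b1 ln DBH + b2 ln Ht."""
    return math.log(b0) + b1 * math.log(dbh) + b2 * math.log(ht)


# Hypothetical coefficients for illustration only.
b0, b1, b2 = 5e-5, 1.8, 1.1
volume = schumacher_hall(20.0, 25.0, b0, b1, b2)
log_volume = schumacher_hall_log(20.0, 25.0, b0, b1, b2)
print(volume, log_volume)
```

The linear form can be fitted by ordinary least squares on the logarithms, whereas the nonlinear form needs an iterative algorithm such as Gauss-Newton or Levenberg-Marquardt, which is why the abstract compares several of them.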
Abstract:
Heart-related illnesses are among the leading causes of death worldwide and have cost many lives in recent decades. The good news is that many of these illnesses are preventable if they are detected at an early stage. However, the number of doctors is much lower than the number of patients, which makes automatic diagnosis of diseases increasingly essential. Furthermore, the current state of the art in diagnostic methods and algorithms lacks a comprehensive comparison of different diagnosis solutions. The absence of a single validated diagnostic solution has increased confusion among scholars and made it harder for them to take further steps. This master's thesis addresses the need for a reliable diagnostic algorithm. We investigate ECG signals and the relation between different diseases and the heart’s electrical activity. We also discuss the steps needed for automatic diagnosis of heart diseases, including the literature on the topic. The main goal of this master's thesis is to find a single reliable diagnostic algorithm and to identify the best classifier to date for heart-related illnesses. Five of the most suitable and best-known classifiers (KNN, CART, MLP, AdaBoost and SVM) were investigated. For a fair comparison, the experimental conditions were kept the same for all classification methods. The UCI repository arrhythmia dataset was used, and the data were not preprocessed. The experimental results indicate that AdaBoost classifies the different diseases with considerably better accuracy.
Abstract:
This thesis studies the modelling of colour difference using an artificial neural network. A multilayer perceptron (MLP) network is proposed to model the CIEDE2000 colour difference formula. An MLP is also applied to classify colour points in the CIE xy chromaticity diagram. In this context, the evaluation was performed using Munsell colour data and MacAdam colour discrimination ellipses. Moreover, the just noticeable differences (JND) at the centres of the MacAdam ellipses in the CIE xy chromaticity diagram are computed with CIEDE2000, in order to compare the JND of CIEDE2000 with the MacAdam ellipses. CIEDE2000 changes the orientation of blue areas in the CIE xy chromaticity diagram toward neutral areas, but on the whole it does not fully agree with the MacAdam ellipses. The proposed MLPs, for both modelling CIEDE2000 and classifying colour points, showed good accuracy and achieved acceptable results.
Abstract:
Thesis (Ph.D.)--Brock University, 2010.
Abstract:
Hepatitis C virus (HCV) is the causative agent of Hepatitis C, a serious global health problem which results in liver cirrhosis and hepatocellular carcinoma. Currently there is no effective treatment or vaccine against the virus. Therefore, development of a therapeutic vaccine is of paramount importance. In this project, three alternative approaches were used to control HCV: a DNA vaccine, a recombinant viral vaccine and RNA interference. The first approach was to test the effect of different promoters on the efficacy of a DNA vaccine against HCV. Plasmids encoding HCV-NS3 and E1 antigens were designed under three different promoters: adenoviral E1A, MLP, and CMV ie. The promoter effect on the antigen expression in 293 cells, as well as on the antibody level in immunized BALB/c mice, was evaluated. The results showed that the antigens were successfully expressed from all vectors. The CMV ie promoter induced the highest antigen expression and the highest antibody level. Second, the efficiency of a recombinant adenovirus vaccine encoding HCV-NS3 was compared to that of an HCV-NS3 plasmid vaccine. The results showed that the recombinant adenovirus vaccine induced higher antibody levels than the plasmid vaccine. The relationship between the immune response and miRNA was also evaluated. The levels of mir-181, mir-155, mir-21 and mir-296 were quantified in the sera of immunized animals. mir-181 and mir-21 were found to be upregulated in animals injected with adenoviral vectors. Third, two recombinant adenoviruses encoding siRNAs targeting the helicase and protease parts of the NS3 region were tested for their ability to inhibit NS3 expression. The results showed that the siRNA against protease was more effective in silencing the HCV-NS3 gene in an HCV replicon cell line. This result confirmed the efficiency of adenovirus for siRNA delivery. These results confirmed that CMV ie is the optimal promoter for inducing an immune response. 
Adenovirus was shown to be an effective delivery vector for antigens or siRNAs. In addition, miRNAs were shown to be involved in regulating the immune response.
Abstract:
Phrase-based statistical translation systems translate sentences one segment at a time, in several steps. At each step, these systems consider very little information when choosing the translation of a segment. The scores in the bilingual phrase table are computed without regard to the contexts in which the segments are used, and the language models consider only the few words surrounding the translated segment. In this thesis, we propose a new model that considers the entire sentence when selecting each target word. Our context-integration model differs from previous ones in its use of an MLP (multilayer perceptron). An interesting property of MLPs is their hidden layer, which offers an alternative to the word-based representation for encoding the sentences to be translated. A cursory evaluation of this alternative representation showed that it can group together certain similar source sentences even when they are worded differently. We first compared the predictions of our MLPs favorably with those of IBM1, a model commonly used in translation. We then integrated our MLPs into our English-to-French statistical translation system. Our MLPs improved the translations of our baseline system and of a second reference system into which IBM1 was integrated.