107 results for k nearest neighbour

at Université de Lausanne, Switzerland


Relevance:

90.00%

Publisher:

Abstract:

The analysis of rockfall characteristics and spatial distribution is fundamental to understanding and modelling the main factors that predispose to failure. In our study we analysed LiDAR point clouds with two aims: (1) to detect and characterise single rockfalls; and (2) to investigate their spatial distribution. To this end, different clustering algorithms were applied: (1a) Nearest Neighbour Clutter Removal (NNCR) combined with Expectation-Maximization (EM) to separate feature points from clutter; (1b) a density-based algorithm (DBSCAN) to isolate the single clusters (i.e. the rockfall events); and (2) Ripley's K-function to investigate the global spatial pattern of the extracted rockfalls. The method allowed proper identification and characterisation of more than 600 rockfalls that occurred on a cliff located in Puigcercos (Catalonia, Spain) during a time span of six months. The spatial distribution of these events showed that rockfalls were clustered within a well-defined distance range. Computations were carried out using R, the free software environment for statistical computing and graphics. Understanding the spatial distribution of precursory rockfalls may shed light on the forecasting of future failures.
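As an aside for readers who want to picture the clustering step, the sketch below shows the general idea of isolating individual rockfall events from a cloud of change-detection points with a density-based algorithm. It is only an illustration in Python with scikit-learn (the study itself used R), and the synthetic points and the eps/min_samples values are placeholders, not the parameters of the Puigcercos analysis.

```python
# Illustrative only: DBSCAN-style isolation of single rockfall clusters from
# 3-D change-detection points. Synthetic data; eps/min_samples are placeholders.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

points, _ = make_blobs(n_samples=600, centers=5, n_features=3,
                       cluster_std=0.3, random_state=0)   # stand-in feature points

labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(points)
n_events = len(set(labels)) - (1 if -1 in labels else 0)   # -1 marks residual clutter
print(f"{n_events} candidate rockfall clusters")

for k in range(n_events):                                  # per-event characterisation
    cluster = points[labels == k]
    extent = cluster.max(axis=0) - cluster.min(axis=0)
    print(k, len(cluster), np.round(extent, 2))
```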

Relevance:

80.00%

Publisher:

Abstract:

Abstract. Terrestrial laser scanning (TLS) is one of the most promising surveying techniques for rock slope characterization and monitoring. Landslide and rockfall movements can be detected by means of comparison of sequential scans. One of the most pressing challenges of natural hazards is combined temporal and spatial prediction of rockfall. An outdoor experiment was performed to ascertain whether the TLS instrumental error is small enough to enable detection of precursory displacements of millimetric magnitude. The experiment consisted of a known displacement of three objects relative to a stable surface. Results show that millimetric changes cannot be detected by the analysis of the unprocessed datasets. Displacement measurements are improved considerably by applying Nearest Neighbour (NN) averaging, which reduces the error (1σ) by up to a factor of 6. This technique was applied to displacements prior to the April 2007 rockfall event at Castellfollit de la Roca, Spain. The maximum precursory displacement measured was 45 mm, approximately 2.5 times the standard deviation of the model comparison, hampering the distinction between actual displacement and instrumental error using conventional methodologies. Encouragingly, the precursory displacement was clearly detected by applying the NN averaging method. These results show that millimetric displacements prior to failure can be detected using TLS.
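To make the NN-averaging idea concrete, here is a minimal Python sketch (not the authors' implementation): each point's raw scan-to-scan difference is replaced by the mean over its k nearest neighbours, which suppresses instrumental noise roughly by a factor of sqrt(k) if the errors were independent. The grid, noise level and value of k below are illustrative assumptions.

```python
# Hedged sketch of nearest-neighbour (NN) averaging for TLS change detection.
# Synthetic scene: a 3 mm step displacement hidden under 5 mm (1 sigma) noise.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(5000, 2))              # scan point positions (m)
true_disp = np.where(xy[:, 0] > 5, 0.003, 0.0)       # 3 mm displacement on one half
raw_diff = true_disp + rng.normal(0, 0.005, 5000)    # raw cloud-to-cloud differences

k = 36
tree = cKDTree(xy)
_, idx = tree.query(xy, k=k)                         # indices of k nearest neighbours
smoothed = raw_diff[idx].mean(axis=1)                # NN-averaged displacement field

print("raw std, stable area:   ", raw_diff[xy[:, 0] <= 5].std())
print("NN-avg std, stable area:", smoothed[xy[:, 0] <= 5].std())
```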

Relevance:

80.00%

Publisher:

Abstract:

Counterfeit pharmaceutical products have become a widespread problem in the last decade. Various analytical techniques have been applied to discriminate between genuine and counterfeit products. Among these, near-infrared (NIR) and Raman spectroscopy have provided promising results. The present study offers a methodology that provides more valuable information for organisations engaged in the fight against the counterfeiting of medicines. A database was established by analysing counterfeits of a particular pharmaceutical product using NIR and Raman spectroscopy. Unsupervised chemometric techniques (i.e. principal component analysis, PCA, and hierarchical cluster analysis, HCA) were implemented to identify the classes within the datasets. Gas chromatography coupled to mass spectrometry (GC-MS) and Fourier transform infrared spectroscopy (FT-IR) were used to determine the number of different chemical profiles within the counterfeits. A comparison with the classes established by NIR and Raman spectroscopy made it possible to evaluate the discriminating power provided by these techniques. Supervised classifiers (i.e. k-Nearest Neighbours, Partial Least Squares Discriminant Analysis, Probabilistic Neural Networks and Counterpropagation Artificial Neural Networks) were applied to the acquired NIR and Raman spectra, and the results were compared with those provided by the unsupervised classifiers. The strategy retained for routine applications, founded on the classes identified by NIR and Raman spectroscopy, uses a classification algorithm based on distance measures and Receiver Operating Characteristic (ROC) curves. The model is able to compare the spectrum of a new counterfeit with those of previously analysed products and to determine whether a new specimen belongs to one of the existing classes, consequently allowing a link to be established with other counterfeits in the database.
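For illustration, the sketch below shows the kind of distance-based class assignment the abstract refers to, using a k-Nearest Neighbours classifier on spectra. The spectra are synthetic and the ROC-derived acceptance thresholds of the actual strategy are not reproduced; scikit-learn and all parameter values here are assumptions made for the example.

```python
# Illustrative sketch only: k-NN assignment of spectra to previously
# established counterfeit classes, plus a crude distance-based membership check.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_per_class, n_points, n_classes = 40, 200, 4
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n_per_class, n_points))
               for c in range(n_classes)])              # synthetic "spectra"
y = np.repeat(np.arange(n_classes), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X_tr, y_tr)
print("hold-out accuracy:", knn.score(X_te, y_te))

# Distances to the nearest training spectra can flag specimens that may not
# belong to any known class before a label is assigned.
dist, _ = knn.kneighbors(X_te[:1], n_neighbors=3)
print("nearest-neighbour distances for one new spectrum:", dist)
```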

Relevance:

80.00%

Publisher:

Abstract:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers and on the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and of differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is the main difference from previous studies, which have mostly examined predictive performance and how it relates to the presence of batch effects. DATA: We work with simulated data sets. To obtain realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
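A compact way to see the nested cross-validation scheme described above is the following scikit-learn sketch (illustrative, not the study's simulation code): hyperparameter tuning and feature selection happen in the inner loop, while the outer loop estimates prediction performance. The univariate F-test filter stands in for the Wilcoxon/lasso selection used in the study, and all dimensions are arbitrary.

```python
# Nested cross-validation sketch: inner loop tunes, outer loop estimates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

pipe = Pipeline([("select", SelectKBest(f_classif)),      # feature selection inside CV
                 ("knn", KNeighborsClassifier())])
grid = {"select__k": [10, 50, 100], "knn__n_neighbors": [3, 5, 11]}

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
search = GridSearchCV(pipe, grid, cv=inner)               # inner loop: tuning
scores = cross_val_score(search, X, y, cv=outer)          # outer loop: performance
print("nested-CV accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```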

Relevance:

80.00%

Publisher:

Abstract:

The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). The simple k-nearest neighbor algorithm is considered as a benchmark model. The PNN is a neural-network reformulation of well-known nonparametric principles: probability density modeling with a kernel density estimator combined with Bayesian optimal or maximum a posteriori decision rules. The PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can easily be used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. Recently, SVMs have been successfully applied to a range of environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper both simulated and real-data case studies (low- and high-dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
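Since the PNN is described above as a kernel-density reformulation of Bayesian decision rules, a minimal numpy version may help fix ideas. This is a bare-bones sketch under stated assumptions (isotropic Gaussian kernel, class-frequency priors, an illustrative bandwidth), not the software used in the paper.

```python
# Minimal PNN sketch: per-class Parzen-window (Gaussian) density estimates
# combined with a maximum a posteriori decision rule.
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=0.3, priors=None):
    classes = np.unique(y_train)
    if priors is None:                                    # default: class frequencies
        priors = {c: np.mean(y_train == c) for c in classes}
    posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        # squared distances between each new point and the class-c training points
        d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        density = np.exp(-d2 / (2.0 * sigma ** 2)).mean(axis=1)
        posteriors.append(priors[c] * density)
    posteriors = np.column_stack(posteriors)              # unnormalised posteriors
    return classes[np.argmax(posteriors, axis=1)], posteriors

# Toy 2-D spatial example with two classes
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)), rng.normal([2, 2], 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
labels, post = pnn_predict(X, y, np.array([[0.1, 0.2], [1.9, 2.1]]))
print(labels)
```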

Relevance:

80.00%

Publisher:

Abstract:

The paper deals with the development and application of a generic methodology for the automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classification. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. The real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
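As a reading aid, the core of a GRNN can be written as Nadaraya-Watson kernel regression, which the following numpy sketch illustrates with an isotropic Gaussian kernel. The data, bandwidth and prediction grid are invented for the example; the paper's automatic (and anisotropic) tuning is not reproduced here.

```python
# GRNN as Nadaraya-Watson kernel regression: prediction is a kernel-weighted
# mean of the observed values. Bandwidth sigma is an illustrative placeholder.
import numpy as np

def grnn_predict(X_train, z_train, X_new, sigma=0.5):
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))                  # kernel weights
    return (w @ z_train) / w.sum(axis=1)                  # weighted mean of observations

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(300, 2))                     # monitoring locations (toy)
z = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)          # measured values (toy)
grid = np.array([[x, 5.0] for x in np.linspace(0, 10, 5)])
print(grnn_predict(X, z, grid))
```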

Relevance:

80.00%

Publisher:

Abstract:

Avalanche forecasting is a complex process involving the assimilation of multiple data sources to make predictions over varying spatial and temporal resolutions. Numerically assisted forecasting often uses nearest neighbour (NN) methods, which are known to have limitations when dealing with high-dimensional data. We apply Support Vector Machines (SVMs) to a dataset from Lochaber, Scotland, to assess their applicability in avalanche forecasting. SVMs belong to a family of theoretically grounded techniques from machine learning and are designed to deal with high-dimensional data. Initial experiments showed that SVMs gave results comparable with NN for categorical and probabilistic forecasts. Experiments utilising the ability of SVMs to deal with high dimensionality in producing a spatial forecast show promise, but require further work.
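For readers unfamiliar with using an SVM for both categorical and probabilistic forecasts, the scikit-learn sketch below illustrates the idea on synthetic data. It is not the Lochaber dataset or the authors' experimental setup; the class imbalance, kernel and parameters are assumptions.

```python
# Illustrative SVM avalanche-day classifier with categorical and probabilistic output.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)   # 1 = avalanche day
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True).fit(X_tr, y_tr)
print("categorical forecast accuracy:", svm.score(X_te, y_te))
print("probabilistic forecast, first 5 days:", svm.predict_proba(X_te[:5])[:, 1])
```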

Relevance:

80.00%

Publisher:

Abstract:

This thesis is devoted to the analysis, modelling and visualisation of spatially referenced environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence that is particularly concerned with the development of techniques and algorithms allowing a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modelling tools. They can solve classification, regression and probability density modelling problems in high-dimensional spaces composed of spatialised informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are ideally suited for implementation as decision-support tools for environmental questions ranging from pattern recognition to modelling and prediction, via automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features.

The most important and popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences. The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; general regression neural networks (GRNN); probabilistic neural networks (PNN); self-organising maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, based on experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and allows the detection of spatial patterns describable by a two-point statistic. The machine learning approach to ESDA is presented through the application of the k-nearest neighbours method, which is very simple and has excellent interpretation and visualisation properties.

An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed as an efficient solution to this task. The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, for which the GRNN significantly outperformed all the other methods, particularly in emergency situations. The thesis is composed of four chapters: theory, applications, software tools and guided examples. An important part of the work consists of a collection of software tools, Machine Learning Office. This software collection has been developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a broad spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualisation have also been developed, with care taken to create a user-friendly and easy-to-use interface.

Machine Learning for geospatial data: algorithms, software tools and case studies

Abstract: The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence; it is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In a few words, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression and probability density modeling problems in high-dimensional geo-feature spaces composed of geographical space and additional relevant spatially referenced features. They are well suited to implementation as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces.

The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: the multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, namely experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least those described by two-point statistics. A machine learning approach to ESDA is presented by applying the k-nearest neighbours (k-NN) method, which is simple and has very good interpretation and visualisation properties.

An important part of the thesis deals with a topical problem, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters with the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and have been used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals; soil type and hydro-geological unit classification; decision-oriented mapping with uncertainties; and natural hazard (landslide, avalanche) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
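As a small companion to the list of algorithms above, the following scikit-learn sketch shows the "workhorse" multi-layer perceptron used for spatial regression on coordinates augmented with a geo-feature. The architecture, the toy geo-feature and the data are illustrative assumptions and do not correspond to any of the thesis case studies or to Machine Learning Office.

```python
# MLP regression on (x, y, geo-feature) inputs for a toy spatial prediction task.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
coords = rng.uniform(0, 10, size=(1000, 2))               # x, y coordinates
elevation = np.sin(coords[:, 0]) + 0.5 * coords[:, 1]     # a toy geo-feature
X = np.column_stack([coords, elevation])
z = np.cos(coords[:, 0]) * elevation + 0.1 * rng.normal(size=1000)

X_tr, X_te, z_tr, z_te = train_test_split(X, z, random_state=0)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                                 random_state=0))
mlp.fit(X_tr, z_tr)
print("R^2 on hold-out locations:", mlp.score(X_te, z_te))
```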

Relevance:

80.00%

Publisher:

Abstract:

The paper deals with the development and application of a methodology for the automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of the mapping is controlled by the analysis of the raw data and of the residuals using variography. Maps of the probability of exceeding a given decision level and 'thick' isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. The real case study is based on the mapping of radioactively contaminated territories.
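The "automatic tuning ... using a cross-validation procedure" step can be pictured with the short sketch below, which selects an isotropic GRNN bandwidth by leave-one-out cross-validation on toy data. It is an assumption-laden illustration only; anisotropic tuning, residual variography and probability-of-exceedance mapping from the paper are not shown.

```python
# Leave-one-out selection of the isotropic GRNN bandwidth on toy data.
import numpy as np

def grnn_loo_error(X, z, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)                              # leave each point out of its own estimate
    pred = (w @ z) / np.maximum(w.sum(axis=1), 1e-300)    # guard against empty neighbourhoods
    return np.mean((pred - z) ** 2)

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 2))                     # sampling locations (toy)
z = np.cos(X[:, 1]) + 0.1 * rng.normal(size=200)          # measured values (toy)

sigmas = np.logspace(-2, 1, 20)
errors = [grnn_loo_error(X, z, s) for s in sigmas]
print("selected sigma:", sigmas[int(np.argmin(errors))])
```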

Relevance:

20.00%

Publisher:

Abstract:

Na,K-ATPase is the main active transport system that maintains the large gradients of Na(+) and K(+) across the plasma membrane of animal cells. The crystal structure of a K(+)-occluding conformation of this protein has recently been published, but the movements of its different domains that allow the cation pumping mechanism are not yet known. The structure of many more conformations is known for the related calcium ATPase SERCA, but the reliability of homology modeling is poor for several domains with low sequence identity, in particular the extracellular loops. To better define the structure of the large fourth extracellular loop between the seventh and eighth transmembrane segments of the alpha subunit, we studied the formation of a disulfide bond between pairs of cysteine residues introduced by site-directed mutagenesis into the second and the fourth extracellular loops. We found a specific pair of cysteine positions (Y308C and D884C) for which extracellular treatment with an oxidizing agent inhibited Na,K-pump function, which could be rapidly restored by a reducing agent. The formation of the disulfide bond occurred preferentially in the E2-P conformation of Na,K-ATPase, in the absence of extracellular cations. Using the recently published crystal structure and a distance constraint reproducing the existence of the disulfide bond, we performed an extensive conformational space search using simulated annealing and showed that the Tyr(308) and Asp(884) residues can be in close proximity and that, simultaneously, the SYGQ motif of the fourth extracellular loop, known to interact with the extracellular domain of the beta subunit, can be exposed to the exterior of the protein and can easily interact with the beta subunit.
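The conformational search mentioned at the end is, at its core, simulated annealing with an added distance-restraint term; the toy sketch below shows only that generic mechanism. The pseudo-structure, energy function, restraint weight and cooling schedule are all invented for illustration and bear no relation to the authors' actual modelling of the Na,K-ATPase.

```python
# Generic simulated annealing with a distance restraint between two chosen
# pseudo-atoms, mimicking the use of a disulfide-bond constraint in a search.
import numpy as np

rng = np.random.default_rng(7)
coords = rng.normal(size=(10, 3))                 # toy "structure": 10 pseudo-atoms
i, j, target = 2, 7, 2.05                         # restrained pair and target distance

def energy(x):
    compact = np.sum(x ** 2)                      # stand-in for a real force field
    restraint = 100.0 * (np.linalg.norm(x[i] - x[j]) - target) ** 2
    return compact + restraint

temp, state, e = 5.0, coords.copy(), energy(coords)
for step in range(20000):
    trial = state + rng.normal(scale=0.05, size=state.shape)
    e_trial = energy(trial)
    if e_trial < e or rng.random() < np.exp(-(e_trial - e) / temp):
        state, e = trial, e_trial                 # Metropolis acceptance
    temp *= 0.9997                                # slow cooling schedule
print("final restrained distance:", np.linalg.norm(state[i] - state[j]))
```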

Relevance:

20.00%

Publisher:

Abstract:

The role of cell type-specific Na+,K+-ATPase isozymes in function-related glucose metabolism was studied using differentiated rat brain cell aggregate cultures. In mixed neuron-glia cultures, glucose utilization, determined by measuring the rate of radiolabeled 2-deoxyglucose accumulation, was markedly stimulated by the voltage-dependent sodium channel agonist veratridine (0.75 micromol/L), as well as by glutamate (100 micromol/L) and the ionotropic glutamate receptor agonist N-methyl-D-aspartate (NMDA) (10 micromol/L). Significant stimulation also was elicited by elevated extracellular potassium (12 mmol/L KCl), which was even more pronounced at 30 mmol/L KCl. In neuron-enriched cultures, a similar stimulation of glucose utilization was obtained with veratridine, specific ionotropic glutamate receptor agonists, and 30 mmol/L but not 12 mmol/L KCl. The effects of veratridine, glutamate, and NMDA were blocked by specific antagonists (tetrodotoxin, CNQX, or MK801, respectively). Low concentrations of ouabain (10(-6) mol/L) prevented stimulation by the depolarizing agents but reduced only partially the response to 12 mmol/L KCl. Together with previous data showing cell type-specific expression of Na+,K+-ATPase subunit isoforms in these cultures, the current results support the view that distinct isoforms of Na+,K+-ATPase regulate glucose utilization in neurons in response to membrane depolarization, and in glial cells in response to elevated extracellular potassium.

Relevance:

20.00%

Publisher:

Abstract:

The Na,K-ATPase is a major ion-motive ATPase of the P-type family responsible for many aspects of cellular homeostasis. To determine the structure of the pathway for cations across the transmembrane portion of the Na,K-ATPase, we mutated 24 residues of the fourth transmembrane segment into cysteine and studied their function and accessibility by exposure to the sulfhydryl reagent 2-aminoethyl-methanethiosulfonate. Accessibility was also examined after treatment with palytoxin, which transforms the Na,K-pump into a cation channel. Of the 24 tested cysteine mutants, seven had no or a much reduced transport function. In particular, cysteine mutants of the highly conserved "PEG" motif had strongly reduced activity. However, most of the non-functional mutants, as well as all of the functional mutants, could still be transformed by palytoxin. Accessibility, determined as a 2-aminoethyl-methanethiosulfonate-induced reduction of the transport activity or as inhibition of the membrane conductance after palytoxin treatment, was observed for the following positions: Phe(323), Ile(322), Gly(326), Ala(330), Pro(333), Glu(334), and Gly(335). In accordance with a structural model of the Na,K-ATPase obtained by homology modeling with the two published structures of the sarcoplasmic and endoplasmic reticulum calcium ATPase (Protein Data Bank codes 1EUL and 1IWO), the results suggest the presence of a cation pathway along the side of the fourth transmembrane segment that faces the space between transmembrane segments 5 and 6. The phenylalanine residue at position 323 occupies a critical position at the outer mouth of the cation pathway. The residues thought to form cation binding site II ((333)PEGL) are also part of the accessible wall of the cation pathway opened by palytoxin through the Na,K-pump.

Relevance:

20.00%

Publisher:

Abstract:

Familial hemiplegic migraine type 2, an autosomal dominant form of migraine with aura, has been associated with four distinct mutations in the alpha2-subunit of the Na+,K+-ATPase. We introduced these mutations into the alpha2-subunit of the human Na+,K+-ATPase, and the corresponding mutations into the Bufo marinus alpha1-subunit, and studied the mutants by expression in Xenopus oocytes. Metabolic labeling studies showed that the mutants were synthesized and associated with the beta-subunit, except for the alpha2HW887R mutant, which was poorly synthesized, and the alpha1BW890R mutant, which was partially retained in the endoplasmic reticulum. [3H]ouabain binding showed the presence of alpha2HR689Q and alpha2HM731T at the membrane, whereas alpha2HL764P and alpha2HW887R could not be detected. Functional studies with the mutants of the B. marinus Na+,K+-ATPase showed reduced or abolished electrogenic activity and a low K+ affinity for the alpha1BW890R mutant. Through different mechanisms, all of these mutations result in a strong decrease in the functional expression of the Na+,K+-pump. The decreased activity of the alpha2 isoform of the Na+,K+-pump expressed in astrocytes appears to be an essential component of hemiplegic migraine pathogenesis and may be responsible for cortical spreading depression, which is one of the first events in migraine attacks.

Relevance:

20.00%

Publisher:

Abstract:

BACKGROUND AND PURPOSE: To assess whether the combined analysis of all phase III trials of non-vitamin-K-antagonist (non-VKA) oral anticoagulants in patients with atrial fibrillation and previous stroke or transient ischemic attack shows a significant difference in efficacy or safety compared with warfarin. METHODS: We searched PubMed until May 31, 2012, for randomized clinical trials using the following search terms: atrial fibrillation, anticoagulation, warfarin, and previous stroke or transient ischemic attack. Studies had to be phase III trials in atrial fibrillation patients comparing warfarin with a non-VKA currently on the market or intended to be brought to market in North America or Europe. Analysis was performed on an intention-to-treat basis. A fixed-effects model was used, as it is more appropriate than a random-effects model when combining a small number of studies. RESULTS: Among 47 potentially eligible articles, 3 were included in the meta-analysis. In 14 527 patients, non-VKAs were associated with a significant reduction of stroke/systemic embolism (odds ratio, 0.85 [95% CI, 0.74-0.99]; relative risk reduction, 14%; absolute risk reduction, 0.7%; number needed to treat, 134 over 1.8-2.0 years) compared with warfarin. Non-VKAs were also associated with a significant reduction of major bleeding compared with warfarin (odds ratio, 0.86 [95% CI, 0.75-0.99]; relative risk reduction, 13%; absolute risk reduction, 0.8%; number needed to treat, 125), mainly driven by the significant reduction of hemorrhagic stroke (odds ratio, 0.44 [95% CI, 0.32-0.62]; relative risk reduction, 57.9%; absolute risk reduction, 0.7%; number needed to treat, 139). CONCLUSIONS: In the context of the significant limitations of combining the results of disparate trials of different agents, non-VKAs seem to be associated with a significant reduction in rates of stroke or systemic embolism, hemorrhagic stroke, and major bleeding when compared with warfarin in patients with previous stroke or transient ischemic attack.
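The absolute and relative measures quoted above are linked by standard effect-measure identities; as a brief reminder (textbook definitions, not re-derived from the trial data), with CER and EER denoting the control- and experimental-arm event rates:

```latex
% Standard effect-measure identities (definitions only; values are not recomputed here).
\[
\mathrm{RRR} = 1 - \frac{\mathrm{EER}}{\mathrm{CER}}, \qquad
\mathrm{ARR} = \mathrm{CER} - \mathrm{EER}, \qquad
\mathrm{NNT} = \frac{1}{\mathrm{ARR}}.
\]
% Example consistent with the reported figures: ARR = 0.8\% gives NNT = 1/0.008 = 125.
```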