47 results for Collection of Network Data
at Université de Lausanne, Switzerland
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot-spot detection and monitoring network optimization, are discussed as well.
Abstract:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model, the simple k-nearest-neighbour algorithm is considered. PNN are a neural-network reformulation of the well-known nonparametric principles of probability density modelling, combining a kernel density estimator with Bayesian optimal or maximum a posteriori decision rules. PNN are well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can easily be used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. Recently, SVM have been successfully applied to different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper, both simulated and real-data case studies (low- and high-dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the applied algorithms.
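For readers unfamiliar with the PNN construction mentioned in this abstract, the following is a minimal sketch of the idea: class-conditional Gaussian kernel density estimates combined through a maximum a posteriori decision rule. The shared bandwidth `sigma` and the function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal PNN-style classifier: per-class Gaussian kernel density estimates
# plus a maximum a posteriori decision rule. `sigma` is a hypothetical shared
# bandwidth; real applications would tune it (e.g. by cross-validation).
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=1.0, priors=None):
    classes = np.unique(y_train)
    if priors is None:  # default to class frequencies as prior probabilities
        priors = {c: np.mean(y_train == c) for c in classes}
    preds = []
    for x in X_test:
        posteriors = []
        for c in classes:
            Xc = X_train[y_train == c]
            # Gaussian kernel density estimate of p(x | class c)
            sq_dists = np.sum((Xc - x) ** 2, axis=1)
            density = np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2)))
            posteriors.append(priors[c] * density)
        preds.append(classes[np.argmax(posteriors)])
    return np.array(preds)
```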
Abstract:
Computational network analysis provides new methods to analyse the brain's structural organization based on diffusion imaging tractography data. Networks are characterized by global and local metrics that have recently given promising insights into diagnosis and the further understanding of psychiatric and neurologic disorders. Most of these metrics are based on the idea that information in a network flows along the shortest paths. In contrast to this notion, communicability is a broader measure of connectivity which assumes that information could flow along all possible paths between two nodes. In our work, the features of network metrics related to communicability were explored for the first time in the healthy structural brain network. In addition, the sensitivity of such metrics was analysed using simulated lesions to specific nodes and network connections. Results showed advantages of communicability over conventional metrics in detecting densely connected nodes as well as subsets of nodes vulnerable to lesions. In addition, communicability centrality was shown to be widely affected by the lesions, and the changes were negatively correlated with the distance from the lesion site. In summary, our analysis suggests that communicability metrics may provide insight into the integrative properties of the structural brain network and that these metrics may be useful for the analysis of brain networks in the presence of lesions. Nevertheless, the interpretation of communicability is not straightforward; hence these metrics should be used as a supplement to the more standard connectivity network metrics.
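The communicability measure referred to here is commonly defined via the matrix exponential of the adjacency matrix, G = exp(A), whose (p, q) entry sums walks of every length between nodes p and q, down-weighted by 1/k!. A minimal sketch on a toy graph, not the diffusion-tractography pipeline of the study:

```python
# Communicability matrix G = exp(A): G[p, q] aggregates walks of all lengths
# between p and q, in contrast to shortest-path metrics which use only the
# geodesic. The diagonal of G is the subgraph (communicability) centrality.
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # toy undirected adjacency matrix

G = expm(A)                       # communicability between every node pair
subgraph_centrality = np.diag(G)  # per-node communicability centrality
print(G[0, 3], subgraph_centrality)
```

A simulated lesion, as described in the abstract, amounts to zeroing a node's row and column (or a single connection) in A and recomputing G.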
Abstract:
The proportion of the population living in or around cities is higher than ever. Urban sprawl and car dependence have taken over from the pedestrian-friendly compact city. Environmental problems such as air pollution, land waste, and noise, as well as health problems, are the result of this still-continuing process. Urban planners have to find solutions to these complex problems while at the same time ensuring the economic performance of the city and its surroundings. Meanwhile, an increasing quantity of socio-economic and environmental data is being acquired. In order to get a better understanding of the processes and phenomena taking place in the complex urban environment, these data should be analysed. Numerous methods for modelling and simulating such a system exist, are still under development, and can be exploited by urban geographers to improve our understanding of the urban metabolism. Modern and innovative visualisation techniques help in communicating the results of such models and simulations. This thesis covers several methods for the analysis, modelling, simulation, and visualisation of problems related to urban geography. The analysis of high-dimensional socio-economic data using artificial neural network techniques, especially self-organising maps, is shown using two examples at different scales. The problem of spatio-temporal modelling and data representation is treated and some possible solutions are shown. The simulation of urban dynamics, and more specifically of the traffic due to commuting to work, is illustrated using multi-agent micro-simulation techniques. A section on visualisation methods presents cartograms for transforming the geographic space into a feature space, and the distance circle map, a centre-based map representation particularly useful for urban agglomerations. Some issues concerning the importance of scale in urban analysis and the clustering of urban phenomena are discussed. A new approach to defining urban areas at different scales is developed, and the link with percolation theory is established. Fractal statistics, especially the lacunarity measure, and scaling laws are used to characterise urban clusters. In a last section, population evolution is modelled using a model close to the well-established gravity model. The work covers quite a wide range of methods useful in urban geography. These methods should be developed further and, at the same time, find their way into the daily work and decision processes of urban planners.
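For reference, the classical gravity model that the thesis's population model is said to approximate takes the form T_ij = k · P_i · P_j / d_ij^β. A minimal sketch with illustrative parameter values (k and β are not taken from the thesis):

```python
# Gravity model of spatial interaction: flows between places scale with the
# product of their populations and decay with distance. Parameters k and beta
# are purely illustrative.
import numpy as np

def gravity_interaction(pop, coords, k=1.0, beta=2.0):
    """Interaction matrix between places with populations `pop` at `coords`."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # no self-interaction
    return k * np.outer(pop, pop) / d ** beta

pop = np.array([120_000, 45_000, 8_000])          # toy populations
coords = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # toy locations (km)
print(gravity_interaction(pop, coords))
```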
Abstract:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data-exploration tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, in the context of both feature selection and optimal parameter determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
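The core idea behind MKL is a combined kernel K = Σ_m d_m K_m built from per-feature-subset kernels. A minimal sketch feeding such a combined kernel to an SVR; a full MKL procedure would also optimise the weights d_m, and the feature subsets below are hypothetical stand-ins for the terrain-feature groups:

```python
# Combined-kernel regression in the spirit of MKL: one RBF kernel per thematic
# feature subset, summed with fixed weights d_m and passed to an SVR with a
# precomputed Gram matrix. Real MKL would learn the weights as well.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 6)), rng.normal(size=100)   # toy data
subsets = [slice(0, 2), slice(2, 4), slice(4, 6)]  # e.g. slope/curvature/derivative groups
weights = np.array([0.5, 0.3, 0.2])                # fixed kernel weights d_m

K = sum(w * rbf_kernel(X[:, s], X[:, s], gamma=1.0) for w, s in zip(weights, subsets))
model = SVR(kernel="precomputed").fit(K, y)
print(model.predict(K)[:5])
```

The learned weights are what makes MKL an exploratory tool: a subset whose kernel receives a near-zero weight contributes little, which is how the paper reads feature relevance off the model.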
Abstract:
The paper deals with the development and application of a methodology for the automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of the mapping is controlled by the analysis of the raw data and of the residuals using variography. Maps of the probability of exceeding a given decision level and 'thick' isoline visualisations of the uncertainties are presented as examples of decision-oriented mapping. A real case study is based on the mapping of radioactively contaminated territories.
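A GRNN is equivalent to Nadaraya-Watson kernel regression, which makes the automatic tuning described above easy to illustrate. A minimal sketch, assuming a Gaussian kernel and an isotropic bandwidth selected by leave-one-out cross-validation; this is not the paper's code:

```python
# GRNN prediction as Nadaraya-Watson kernel regression, with the isotropic
# bandwidth sigma chosen by leave-one-out cross-validation.
import numpy as np

def grnn_predict(X_train, y_train, X_test, sigma):
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)          # weighted average of targets

def tune_sigma(X, y, sigmas):
    """Pick the bandwidth with the lowest leave-one-out squared error."""
    best, best_err = None, np.inf
    for s in sigmas:
        errs = [(grnn_predict(np.delete(X, i, 0), np.delete(y, i),
                              X[i:i + 1], s)[0] - y[i]) ** 2
                for i in range(len(X))]
        if np.mean(errs) < best_err:
            best, best_err = s, np.mean(errs)
    return best
```

An anisotropic variant, as tuned in the paper, would replace the single sigma with one bandwidth per coordinate (or a rotation plus per-axis bandwidths).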
Abstract:
BACKGROUND: We analysed 5-year treatment with agalsidase alfa enzyme replacement therapy in patients with Fabry's disease who were enrolled in the Fabry Outcome Survey observational database (FOS). METHODS: Baseline and 5-year data were available for up to 181 adults (126 men) in FOS. Serial data for cardiac mass and function, renal function, pain, and quality of life were assessed. Safety and sensitivity analyses were done in patients with baseline and at least one relevant follow-up measurement during the 5 years (n=555 and n=475, respectively). FINDINGS: In patients with baseline cardiac hypertrophy, treatment resulted in a sustained reduction in left ventricular mass (LVM) index after 5 years (from 71.4 [SD 22.5] g/m^2.7 to 64.1 [18.7] g/m^2.7, p=0.0111) and a significant increase in midwall fractional shortening (MFS) from 14.3% (2.3) to 16.0% (3.8) after 3 years (p=0.02). In patients without baseline hypertrophy, LVM index and MFS remained stable. Mean yearly fall in estimated glomerular filtration rate versus baseline after 5 years of enzyme replacement therapy was -3.17 mL/min per 1.73 m^2 for men and -0.89 mL/min per 1.73 m^2 for women. Average pain, measured by Brief Pain Inventory score, improved significantly, from 3.7 (2.3) at baseline to 2.5 (2.4) after 5 years (p=0.0023). Quality of life, measured by deviation scores from normal EuroQol values, improved significantly, from -0.24 (0.3) at baseline to -0.17 (0.3) after 5 years (p=0.0483). Findings were confirmed by sensitivity analysis. No unexpected safety concerns were identified. INTERPRETATION: By comparison with historical natural history data for patients with Fabry's disease who were not treated with enzyme replacement therapy, long-term treatment with agalsidase alfa leads to substantial and sustained clinical benefits. FUNDING: Shire Human Genetic Therapies AB.
Abstract:
BACKGROUND: Individually, randomised trials have not shown conclusively whether adjuvant chemotherapy benefits adult patients with localised resectable soft-tissue sarcoma. METHODS: A quantitative meta-analysis of updated data from individual patients from all available randomised trials was carried out to assess whether adjuvant chemotherapy improves overall survival, recurrence-free survival, and local and distant recurrence-free intervals (RFI), and whether chemotherapy is differentially effective in patients defined by age, sex, disease status at randomisation, disease site, histology, grade, tumour size, extent of resection, and use of radiotherapy. FINDINGS: 1568 patients from 14 trials of doxorubicin-based adjuvant chemotherapy were included (median follow-up 9.4 years). Hazard ratios of 0.73 (95% CI 0.56-0.94, p = 0.016) for local RFI, 0.70 (0.57-0.85, p = 0.0003) for distant RFI, and 0.75 (0.64-0.87, p = 0.0001) for overall recurrence-free survival correspond to absolute benefits from adjuvant chemotherapy of 6% (95% CI 1-10), 10% (5-15), and 10% (5-15), respectively, at 10 years. For overall survival the hazard ratio of 0.89 (0.76-1.03) was not significant (p = 0.12), but represents an absolute benefit of 4% (1-9) at 10 years. These results were not affected by prespecified changes in the groups of patients analysed. There was no consistent evidence that the relative effect of adjuvant chemotherapy differed for any subgroup of patients for any endpoint. However, the best evidence of an effect of adjuvant chemotherapy for survival was seen in patients with sarcomas of the extremities. INTERPRETATION: The meta-analysis provides evidence that adjuvant doxorubicin-based chemotherapy significantly improves the time to local and distant recurrence and overall recurrence-free survival. There is a trend towards improved overall survival.
Advanced mapping of environmental data: Geostatistics, Machine Learning and Bayesian Maximum Entropy
Abstract:
This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model-dependent (geostatistics) and data-driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.
Abstract:
The contribution of ink evidence to forensic science is described and supported by an abundant literature, including two standards from the American Society for Testing and Materials (ASTM). The vast majority of the available literature is concerned with the physical and chemical analysis of ink evidence. The relevant ASTM standards mention some principles regarding the comparison of pairs of ink samples and the evaluation of their evidential value. The review of this literature and, more specifically, of the ASTM standards in the light of recent developments in the interpretation of forensic evidence has shown some potential improvements, which would maximise the benefits of the use of ink evidence in forensic science. This thesis proposes to interpret ink evidence within the widely accepted and recommended framework of Bayes' theorem. This proposition has required the development of a new quality assurance process for the analysis and comparison of ink samples, as well as the definition of a theoretical framework for ink evidence. The proposed technology has been extensively tested using a large dataset of ink samples and state-of-the-art tools commonly used in biometry. Overall, this research successfully addresses a concrete problem generally encountered in forensic science, where scientists tend to limit the usefulness of the information present in various types of evidence by trying to answer the wrong questions.
The declaration of an explicit framework, which defines and formalises their goals and expected contributions to the criminal and civil justice system, enables the determination of their needs in terms of technology and data. The development of this technology and the collection of the data are then justified economically, structured scientifically, and can be carried out efficiently.
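The Bayesian framework referred to in this abstract evaluates evidence through the likelihood ratio: posterior odds = LR × prior odds, with LR = P(E | same source) / P(E | different sources). A minimal numerical sketch; the probabilities below are purely illustrative, not data from the thesis:

```python
# Bayesian evaluation of forensic evidence via the likelihood ratio.
# All numbers are illustrative placeholders.
def posterior_odds(prior_odds, p_e_given_hp, p_e_given_hd):
    lr = p_e_given_hp / p_e_given_hd   # evidential value of the ink comparison
    return lr * prior_odds

# e.g. the observed analytical match is 20x more probable if the two inks
# share a source than if they do not (LR = 0.8 / 0.04 = 20):
print(posterior_odds(prior_odds=0.1, p_e_given_hp=0.8, p_e_given_hd=0.04))
```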