Biblioteca Digital

69 resultados para Stochastic algorithms

Machine Learning Algorithms for Spatial Categorical Data: Classification, Prediction, and Monitoring Networks Optimization

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Simple estimation of incident HIV infection rates in notification cohorts based on window periods of algorithms for evaluation of line-immunoassay result patterns.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: Tests for recent infections (TRIs) are important for HIV surveillance. We have shown that a patient's antibody pattern in a confirmatory line immunoassay (Inno-Lia) also yields information on time since infection. We have published algorithms which, with a certain sensitivity and specificity, distinguish between incident (< = 12 months) and older infection. In order to use these algorithms like other TRIs, i.e., based on their windows, we now determined their window periods. METHODS: We classified Inno-Lia results of 527 treatment-naïve patients with HIV-1 infection < = 12 months according to incidence by 25 algorithms. The time after which all infections were ruled older, i.e. the algorithm's window, was determined by linear regression of the proportion ruled incident in dependence of time since infection. Window-based incident infection rates (IIR) were determined utilizing the relationship 'Prevalence = Incidence x Duration' in four annual cohorts of HIV-1 notifications. Results were compared to performance-based IIR also derived from Inno-Lia results, but utilizing the relationship 'incident = true incident + false incident' and also to the IIR derived from the BED incidence assay. RESULTS: Window periods varied between 45.8 and 130.1 days and correlated well with the algorithms' diagnostic sensitivity (R(2) = 0.962; P<0.0001). Among the 25 algorithms, the mean window-based IIR among the 748 notifications of 2005/06 was 0.457 compared to 0.453 obtained for performance-based IIR with a model not correcting for selection bias. Evaluation of BED results using a window of 153 days yielded an IIR of 0.669. Window-based IIR and performance-based IIR increased by 22.4% and respectively 30.6% in 2008, while 2009 and 2010 showed a return to baseline for both methods. CONCLUSIONS: IIR estimations by window- and performance-based evaluations of Inno-Lia algorithm results were similar and can be used together to assess IIR changes between annual HIV notification cohorts.

Veja mais

Machine learning algorithms for topo-climatic data modelling

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Uncertainty Quantification with Support Vector Regression Prediction Models

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.

Veja mais

Time-lapse and probabilistic inversion strategies for plane-wave electromagnetic data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

L'utilisation efficace des systèmes géothermaux, la séquestration du CO2 pour limiter le changement climatique et la prévention de l'intrusion d'eau salée dans les aquifères costaux ne sont que quelques exemples qui démontrent notre besoin en technologies nouvelles pour suivre l'évolution des processus souterrains à partir de la surface. Un défi majeur est d'assurer la caractérisation et l'optimisation des performances de ces technologies à différentes échelles spatiales et temporelles. Les méthodes électromagnétiques (EM) d'ondes planes sont sensibles à la conductivité électrique du sous-sol et, par conséquent, à la conductivité électrique des fluides saturant la roche, à la présence de fractures connectées, à la température et aux matériaux géologiques. Ces méthodes sont régies par des équations valides sur de larges gammes de fréquences, permettant détudier de manières analogues des processus allant de quelques mètres sous la surface jusqu'à plusieurs kilomètres de profondeur. Néanmoins, ces méthodes sont soumises à une perte de résolution avec la profondeur à cause des propriétés diffusives du champ électromagnétique. Pour cette raison, l'estimation des modèles du sous-sol par ces méthodes doit prendre en compte des informations a priori afin de contraindre les modèles autant que possible et de permettre la quantification des incertitudes de ces modèles de façon appropriée. Dans la présente thèse, je développe des approches permettant la caractérisation statique et dynamique du sous-sol à l'aide d'ondes EM planes. Dans une première partie, je présente une approche déterministe permettant de réaliser des inversions répétées dans le temps (time-lapse) de données d'ondes EM planes en deux dimensions. Cette stratégie est basée sur l'incorporation dans l'algorithme d'informations a priori en fonction des changements du modèle de conductivité électrique attendus. Ceci est réalisé en intégrant une régularisation stochastique et des contraintes flexibles par rapport à la gamme des changements attendus en utilisant les multiplicateurs de Lagrange. J'utilise des normes différentes de la norme l2 pour contraindre la structure du modèle et obtenir des transitions abruptes entre les régions du model qui subissent des changements dans le temps et celles qui n'en subissent pas. Aussi, j'incorpore une stratégie afin d'éliminer les erreurs systématiques de données time-lapse. Ce travail a mis en évidence l'amélioration de la caractérisation des changements temporels par rapport aux approches classiques qui réalisent des inversions indépendantes à chaque pas de temps et comparent les modèles. Dans la seconde partie de cette thèse, j'adopte un formalisme bayésien et je teste la possibilité de quantifier les incertitudes sur les paramètres du modèle dans l'inversion d'ondes EM planes. Pour ce faire, je présente une stratégie d'inversion probabiliste basée sur des pixels à deux dimensions pour des inversions de données d'ondes EM planes et de tomographies de résistivité électrique (ERT) séparées et jointes. Je compare les incertitudes des paramètres du modèle en considérant différents types d'information a priori sur la structure du modèle et différentes fonctions de vraisemblance pour décrire les erreurs sur les données. Les résultats indiquent que la régularisation du modèle est nécessaire lorsqu'on a à faire à un large nombre de paramètres car cela permet d'accélérer la convergence des chaînes et d'obtenir des modèles plus réalistes. Cependent, ces contraintes mènent à des incertitudes d'estimations plus faibles, ce qui implique des distributions a posteriori qui ne contiennent pas le vrai modèledans les régions ou` la méthode présente une sensibilité limitée. Cette situation peut être améliorée en combinant des méthodes d'ondes EM planes avec d'autres méthodes complémentaires telles que l'ERT. De plus, je montre que le poids de régularisation des paramètres et l'écart-type des erreurs sur les données peuvent être retrouvés par une inversion probabiliste. Finalement, j'évalue la possibilité de caractériser une distribution tridimensionnelle d'un panache de traceur salin injecté dans le sous-sol en réalisant une inversion probabiliste time-lapse tridimensionnelle d'ondes EM planes. Etant donné que les inversions probabilistes sont très coûteuses en temps de calcul lorsque l'espace des paramètres présente une grande dimension, je propose une stratégie de réduction du modèle ou` les coefficients de décomposition des moments de Legendre du panache de traceur injecté ainsi que sa position sont estimés. Pour ce faire, un modèle de résistivité de base est nécessaire. Il peut être obtenu avant l'expérience time-lapse. Un test synthétique montre que la méthodologie marche bien quand le modèle de résistivité de base est caractérisé correctement. Cette méthodologie est aussi appliquée à un test de trac¸age par injection d'une solution saline et d'acides réalisé dans un système géothermal en Australie, puis comparée à une inversion time-lapse tridimensionnelle réalisée selon une approche déterministe. L'inversion probabiliste permet de mieux contraindre le panache du traceur salin gr^ace à la grande quantité d'informations a priori incluse dans l'algorithme. Néanmoins, les changements de conductivités nécessaires pour expliquer les changements observés dans les données sont plus grands que ce qu'expliquent notre connaissance actuelle des phénomenès physiques. Ce problème peut être lié à la qualité limitée du modèle de résistivité de base utilisé, indiquant ainsi que des efforts plus grands devront être fournis dans le futur pour obtenir des modèles de base de bonne qualité avant de réaliser des expériences dynamiques. Les études décrites dans cette thèse montrent que les méthodes d'ondes EM planes sont très utiles pour caractériser et suivre les variations temporelles du sous-sol sur de larges échelles. Les présentes approches améliorent l'évaluation des modèles obtenus, autant en termes d'incorporation d'informations a priori, qu'en termes de quantification d'incertitudes a posteriori. De plus, les stratégies développées peuvent être appliquées à d'autres méthodes géophysiques, et offrent une grande flexibilité pour l'incorporation d'informations additionnelles lorsqu'elles sont disponibles. -- The efficient use of geothermal systems, the sequestration of CO2 to mitigate climate change, and the prevention of seawater intrusion in coastal aquifers are only some examples that demonstrate the need for novel technologies to monitor subsurface processes from the surface. A main challenge is to assure optimal performance of such technologies at different temporal and spatial scales. Plane-wave electromagnetic (EM) methods are sensitive to subsurface electrical conductivity and consequently to fluid conductivity, fracture connectivity, temperature, and rock mineralogy. These methods have governing equations that are the same over a large range of frequencies, thus allowing to study in an analogous manner processes on scales ranging from few meters close to the surface down to several hundreds of kilometers depth. Unfortunately, they suffer from a significant resolution loss with depth due to the diffusive nature of the electromagnetic fields. Therefore, estimations of subsurface models that use these methods should incorporate a priori information to better constrain the models, and provide appropriate measures of model uncertainty. During my thesis, I have developed approaches to improve the static and dynamic characterization of the subsurface with plane-wave EM methods. In the first part of this thesis, I present a two-dimensional deterministic approach to perform time-lapse inversion of plane-wave EM data. The strategy is based on the incorporation of prior information into the inversion algorithm regarding the expected temporal changes in electrical conductivity. This is done by incorporating a flexible stochastic regularization and constraints regarding the expected ranges of the changes by using Lagrange multipliers. I use non-l2 norms to penalize the model update in order to obtain sharp transitions between regions that experience temporal changes and regions that do not. I also incorporate a time-lapse differencing strategy to remove systematic errors in the time-lapse inversion. This work presents improvements in the characterization of temporal changes with respect to the classical approach of performing separate inversions and computing differences between the models. In the second part of this thesis, I adopt a Bayesian framework and use Markov chain Monte Carlo (MCMC) simulations to quantify model parameter uncertainty in plane-wave EM inversion. For this purpose, I present a two-dimensional pixel-based probabilistic inversion strategy for separate and joint inversions of plane-wave EM and electrical resistivity tomography (ERT) data. I compare the uncertainties of the model parameters when considering different types of prior information on the model structure and different likelihood functions to describe the data errors. The results indicate that model regularization is necessary when dealing with a large number of model parameters because it helps to accelerate the convergence of the chains and leads to more realistic models. These constraints also lead to smaller uncertainty estimates, which imply posterior distributions that do not include the true underlying model in regions where the method has limited sensitivity. This situation can be improved by combining planewave EM methods with complimentary geophysical methods such as ERT. In addition, I show that an appropriate regularization weight and the standard deviation of the data errors can be retrieved by the MCMC inversion. Finally, I evaluate the possibility of characterizing the three-dimensional distribution of an injected water plume by performing three-dimensional time-lapse MCMC inversion of planewave EM data. Since MCMC inversion involves a significant computational burden in high parameter dimensions, I propose a model reduction strategy where the coefficients of a Legendre moment decomposition of the injected water plume and its location are estimated. For this purpose, a base resistivity model is needed which is obtained prior to the time-lapse experiment. A synthetic test shows that the methodology works well when the base resistivity model is correctly characterized. The methodology is also applied to an injection experiment performed in a geothermal system in Australia, and compared to a three-dimensional time-lapse inversion performed within a deterministic framework. The MCMC inversion better constrains the water plumes due to the larger amount of prior information that is included in the algorithm. The conductivity changes needed to explain the time-lapse data are much larger than what is physically possible based on present day understandings. This issue may be related to the base resistivity model used, therefore indicating that more efforts should be given to obtain high-quality base models prior to dynamic experiments. The studies described herein give clear evidence that plane-wave EM methods are useful to characterize and monitor the subsurface at a wide range of scales. The presented approaches contribute to an improved appraisal of the obtained models, both in terms of the incorporation of prior information in the algorithms and the posterior uncertainty quantification. In addition, the developed strategies can be applied to other geophysical methods, and offer great flexibility to incorporate additional information when available.

Veja mais

High specificity of line-immunoassay based algorithms for recent HIV-1 infection independent of viral subtype and stage of disease.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

ABSTRACT: BACKGROUND: Serologic testing algorithms for recent HIV seroconversion (STARHS) provide important information for HIV surveillance. We have shown that a patient's antibody reaction in a confirmatory line immunoassay (INNO-LIATM HIV I/II Score, Innogenetics) provides information on the duration of infection. Here, we sought to further investigate the diagnostic specificity of various Inno-Lia algorithms and to identify factors affecting it. METHODS: Plasma samples of 714 selected patients of the Swiss HIV Cohort Study infected for longer than 12 months and representing all viral clades and stages of chronic HIV-1 infection were tested blindly by Inno-Lia and classified as either incident (up to 12 m) or older infection by 24 different algorithms. Of the total, 524 patients received HAART, 308 had HIV-1 RNA below 50 copies/mL, and 620 were infected by a HIV-1 non-B clade. Using logistic regression analysis we evaluated factors that might affect the specificity of these algorithms. RESULTS: HIV-1 RNA <50 copies/mL was associated with significantly lower reactivity to all five HIV-1 antigens of the Inno-Lia and impaired specificity of most algorithms. Among 412 patients either untreated or with HIV-1 RNA ≥50 copies/mL despite HAART, the median specificity of the algorithms was 96.5% (range 92.0-100%). The only factor that significantly promoted false-incident results in this group was age, with false-incident results increasing by a few percent per additional year. HIV-1 clade, HIV-1 RNA, CD4 percentage, sex, disease stage, and testing modalities exhibited no significance. Results were similar among 190 untreated patients. CONCLUSIONS: The specificity of most Inno-Lia algorithms was high and not affected by HIV-1 variability, advanced disease and other factors promoting false-recent results in other STARHS. Specificity should be good in any group of untreated HIV-1 patients.

Veja mais

Ispol'zovanie metodov teorii vosstanovleniia v modeliakh optimal'nogo dobyvaniia pishchi [Restoration theory in models of optimal feeding behavior]

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The method of stochastic dynamic programming is widely used in ecology of behavior, but has some imperfections because of use of temporal limits. The authors presented an alternative approach based on the methods of the theory of restoration. Suggested method uses cumulative energy reserves per time unit as a criterium, that leads to stationary cycles in the area of states. This approach allows to study the optimal feeding by analytic methods.

Veja mais

Stochastic Population Projection for Germany - based on the Quadratic Spline approach to modelling age specific fertility rates

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This contribution builds upon a former paper by the authors (Lipps and Betz 2004), in which a stochastic population projection for East- and West Germany is performed. Aim was to forecast relevant population parameters and their distribution in a consistent way. We now present some modifications, which have been modelled since. First, population parameters for the entire German population are modelled. In order to overcome the modelling problem of the structural break in the East during reunification, we show that the adaptation process of the relevant figures by the East can be considered to be completed by now. As a consequence, German parameters can be modelled just by using the West German historic patterns, with the start-off population of entire Germany. Second, a new model to simulate age specific fertility rates is presented, based on a quadratic spline approach. This offers a higher flexibility to model various age specific fertility curves. The simulation results are compared with the scenario based official forecasts for Germany in 2050. Exemplary for some population parameters (e.g. dependency ratio), it can be shown that the range spanned by the medium and extreme variants correspond to the s-intervals in the stochastic framework. It seems therefore more appropriate to treat this range as a s-interval covering about two thirds of the true distribution.

Veja mais

Endoleak and in-stent thrombus detection with CT angiography in a thoracic aortic aneurysm phantom at different tube energies using filtered back projection and iterative algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

PURPOSE: To determine the lower limit of dose reduction with hybrid and fully iterative reconstruction algorithms in detection of endoleaks and in-stent thrombus of thoracic aorta with computed tomographic (CT) angiography by applying protocols with different tube energies and automated tube current modulation. MATERIALS AND METHODS: The calcification insert of an anthropomorphic cardiac phantom was replaced with an aortic aneurysm model containing a stent, simulated endoleaks, and an intraluminal thrombus. CT was performed at tube energies of 120, 100, and 80 kVp with incrementally increasing noise indexes (NIs) of 16, 25, 34, 43, 52, 61, and 70 and a 2.5-mm section thickness. NI directly controls radiation exposure; a higher NI allows for greater image noise and decreases radiation. Images were reconstructed with filtered back projection (FBP) and hybrid and fully iterative algorithms. Five radiologists independently analyzed lesion conspicuity to assess sensitivity and specificity. Mean attenuation (in Hounsfield units) and standard deviation were measured in the aorta to calculate signal-to-noise ratio (SNR). Attenuation and SNR of different protocols and algorithms were analyzed with analysis of variance or Welch test depending on data distribution. RESULTS: Both sensitivity and specificity were 100% for simulated lesions on images with 2.5-mm section thickness and an NI of 25 (3.45 mGy), 34 (1.83 mGy), or 43 (1.16 mGy) at 120 kVp; an NI of 34 (1.98 mGy), 43 (1.23 mGy), or 61 (0.61 mGy) at 100 kVp; and an NI of 43 (1.46 mGy) or 70 (0.54 mGy) at 80 kVp. SNR values showed similar results. With the fully iterative algorithm, mean attenuation of the aorta decreased significantly in reduced-dose protocols in comparison with control protocols at 100 kVp (311 HU at 16 NI vs 290 HU at 70 NI, P ≤ .0011) and 80 kVp (400 HU at 16 NI vs 369 HU at 70 NI, P ≤ .0007). CONCLUSION: Endoleaks and in-stent thrombus of thoracic aorta were detectable to 1.46 mGy (80 kVp) with FBP, 1.23 mGy (100 kVp) with the hybrid algorithm, and 0.54 mGy (80 kVp) with the fully iterative algorithm.

Veja mais

Estimating dormant and active hematopoietic stem cell kinetics through extensive modeling of bromodeoxyuridine label-retaining cell dynamics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bone marrow hematopoietic stem cells (HSCs) are responsible for both lifelong daily maintenance of all blood cells and for repair after cell loss. Until recently the cellular mechanisms by which HSCs accomplish these two very different tasks remained an open question. Biological evidence has now been found for the existence of two related mouse HSC populations. First, a dormant HSC (d-HSC) population which harbors the highest self-renewal potential of all blood cells but is only induced into active self-renewal in response to hematopoietic stress. And second, an active HSC (a-HSC) subset that by and large produces the progenitors and mature cells required for maintenance of day-to-day hematopoiesis. Here we present computational analyses further supporting the d-HSC concept through extensive modeling of experimental DNA label-retaining cell (LRC) data. Our conclusion that the presence of a slowly dividing subpopulation of HSCs is the most likely explanation (amongst the various possible causes including stochastic cellular variation) of the observed long term Bromodeoxyuridine (BrdU) retention, is confirmed by the deterministic and stochastic models presented here. Moreover, modeling both HSC BrdU uptake and dilution in three stages and careful treatment of the BrdU detection sensitivity permitted improved estimates of HSC turnover rates. This analysis predicts that d-HSCs cycle about once every 149-193 days and a-HSCs about once every 28-36 days. We further predict that, using LRC assays, a 75%-92.5% purification of d-HSCs can be achieved after 59-130 days of chase. Interestingly, the d-HSC proportion is now estimated to be around 30-45% of total HSCs - more than twice that of our previous estimate.

Veja mais

On the problematic of reserving and stochastic loss models

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract Traditionally, the common reserving methods used by the non-life actuaries are based on the assumption that future claims are going to behave in the same way as they did in the past. There are two main sources of variability in the processus of development of the claims: the variability of the speed with which the claims are settled and the variability between the severity of the claims from different accident years. High changes in these processes will generate distortions in the estimation of the claims reserves. The main objective of this thesis is to provide an indicator which firstly identifies and quantifies these two influences and secondly to determine which model is adequate for a specific situation. Two stochastic models were analysed and the predictive distributions of the future claims were obtained. The main advantage of the stochastic models is that they provide measures of variability of the reserves estimates. The first model (PDM) combines one conjugate family Dirichlet - Multinomial with the Poisson distribution. The second model (NBDM) improves the first one by combining two conjugate families Poisson -Gamma (for distribution of the ultimate amounts) and Dirichlet Multinomial (for distribution of the incremental claims payments). It was found that the second model allows to find the speed variability in the reporting process and development of the claims severity as function of two above mentioned distributions' parameters. These are the shape parameter of the Gamma distribution and the Dirichlet parameter. Depending on the relation between them we can decide on the adequacy of the claims reserve estimation method. The parameters have been estimated by the Methods of Moments and Maximum Likelihood. The results were tested using chosen simulation data and then using real data originating from the three lines of business: Property/Casualty, General Liability, and Accident Insurance. These data include different developments and specificities. The outcome of the thesis shows that when the Dirichlet parameter is greater than the shape parameter of the Gamma, resulting in a model with positive correlation between the past and future claims payments, suggests the Chain-Ladder method as appropriate for the claims reserve estimation. In terms of claims reserves, if the cumulated payments are high the positive correlation will imply high expectations for the future payments resulting in high claims reserves estimates. The negative correlation appears when the Dirichlet parameter is lower than the shape parameter of the Gamma, meaning low expected future payments for the same high observed cumulated payments. This corresponds to the situation when claims are reported rapidly and fewer claims remain expected subsequently. The extreme case appears in the situation when all claims are reported at the same time leading to expectations for the future payments of zero or equal to the aggregated amount of the ultimate paid claims. For this latter case, the Chain-Ladder is not recommended.

Veja mais

Essays on the treatment of cash flows under stochastic interest rates

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract This paper shows how to calculate recursively the moments of the accumulated and discounted value of cash flows when the instantaneous rates of return follow a conditional ARMA process with normally distributed innovations. We investigate various moment based approaches to approximate the distribution of the accumulated value of cash flows and we assess their performance through stochastic Monte-Carlo simulations. We discuss the potential use in insurance and especially in the context of Asset-Liability Management of pension funds.

Veja mais

Data-driven clustering : new methods and applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.

Veja mais

Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.

Veja mais

Advanced geostatistical and machine-learning models for spatial data analysis of radioactively contaminated regions

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.

Veja mais

69 resultados para Stochastic algorithms

Filtro por publicador