76 results for heterogeneous regressions algorithms
at Université de Lausanne, Switzerland
Abstract:
There is increasing evidence to suggest that the presence of mesoscopic heterogeneities constitutes the predominant attenuation mechanism at seismic frequencies. As a consequence, centimeter-scale perturbations of the subsurface physical properties should be taken into account for seismic modeling whenever detailed and accurate responses of the target structures are desired. This is, however, computationally prohibitive since extremely small grid spacings would be necessary. A convenient way to circumvent this problem is to use an upscaling procedure to replace the heterogeneous porous media by equivalent visco-elastic solids. In this work, we solve Biot's equations of motion to perform numerical simulations of seismic wave propagation through porous media containing mesoscopic heterogeneities. We then use an upscaling procedure to replace the heterogeneous poro-elastic regions by homogeneous equivalent visco-elastic solids and repeat the simulations using visco-elastic equations of motion. We find that, despite the equivalent attenuation behavior of the heterogeneous poro-elastic medium and the equivalent visco-elastic solid, the seismograms may differ due to diverging boundary conditions at fluid-solid interfaces, where there exist additional options for the poro-elastic case. In particular, we observe that the seismograms agree for closed-pore boundary conditions, but differ significantly for open-pore boundary conditions. This is an interesting result, which has potentially important implications for wave-equation-based algorithms in exploration geophysics involving fluid-solid interfaces, such as, for example, wave field decomposition.
Abstract:
There is increasing evidence to suggest that the presence of mesoscopic heterogeneities constitutes an important seismic attenuation mechanism in porous rocks. As a consequence, centimetre-scale perturbations of the rock physical properties should be taken into account for seismic modelling whenever detailed and accurate responses of specific target structures are desired, which is, however, computationally prohibitive. A convenient way to circumvent this problem is to use an upscaling procedure to replace each of the heterogeneous porous media composing the geological model by corresponding equivalent visco-elastic solids and to solve the visco-elastic equations of motion for the inferred equivalent model. While the overall qualitative validity of this procedure is well established, there are as of yet no quantitative analyses regarding the equivalence of the seismograms resulting from the original poro-elastic and the corresponding upscaled visco-elastic models. To address this issue, we compare poro-elastic and visco-elastic solutions for a range of marine-type models of increasing complexity. We found that despite the identical dispersion and attenuation behaviour of the heterogeneous poro-elastic and the equivalent visco-elastic media, the seismograms may differ substantially due to diverging boundary conditions, where there exist additional options for the poro-elastic case. In particular, we observe that at the fluid/porous-solid interface, the poro- and visco-elastic seismograms agree for closed-pore boundary conditions, but differ significantly for open-pore boundary conditions. This is an important result which has potentially far-reaching implications for wave-equation-based algorithms in exploration geophysics involving fluid/porous-solid interfaces, such as, for example, wavefield decomposition.
Abstract:
Unraveling the effect of selection vs. drift on the evolution of quantitative traits is commonly achieved by one of two methods. Either one contrasts population differentiation estimates for genetic markers and quantitative traits (the Q(st)-F(st) contrast) or multivariate methods are used to study the covariance between sets of traits. In particular, many studies have focused on the genetic variance-covariance matrix (the G matrix). However, both drift and selection can cause changes in G. To understand their joint effects, we recently combined the two methods into a single test (accompanying article by Martin et al.), which we apply here to a network of 16 natural populations of the freshwater snail Galba truncatula. Using this new neutrality test, extended to hierarchical population structures, we studied the multivariate equivalent of the Q(st)-F(st) contrast for several life-history traits of G. truncatula. We found strong evidence of selection acting on multivariate phenotypes. Selection was homogeneous among populations within each habitat and heterogeneous between habitats. We found that the G matrices were relatively stable within each habitat, with proportionality between the among-populations (D) and the within-populations (G) covariance matrices. The effect of habitat heterogeneity is to break this proportionality because of selection for habitat-dependent optima. Individual-based simulations mimicking our empirical system confirmed that these patterns are expected under the selective regime inferred. We show that homogenizing selection can mimic some effect of drift on the G matrix (G and D almost proportional), but that incorporating information from molecular markers (multivariate Q(st)-F(st)) allows disentangling the two effects.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means to model local anomalies that may typically arise at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
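The multi-scale idea in this abstract can be illustrated with a small sketch. It is not the paper's method: kernel ridge regression is swapped in for SVR to keep the example dependency-free, the mixture weights are fixed by hand rather than learned from data, and all names (`rbf_kernel`, `multiscale_kernel`, `fit_predict`), the length scales, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Gaussian kernel matrix between point sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def multiscale_kernel(a, b, scales, weights):
    """Weighted sum of Gaussian kernels, one per spatial scale."""
    return sum(w * rbf_kernel(a, b, s) for w, s in zip(weights, scales))


def fit_predict(x_train, y_train, x_test, scales, weights, ridge=1e-3):
    """Kernel ridge regression with the multi-scale kernel (stand-in for SVR)."""
    k_train = multiscale_kernel(x_train, x_train, scales, weights)
    alpha = np.linalg.solve(k_train + ridge * np.eye(len(x_train)), y_train)
    return multiscale_kernel(x_test, x_train, scales, weights) @ alpha


# Hypothetical 1-D data: a smooth large-scale trend plus a narrow local anomaly.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=(200, 1))
trend = np.sin(0.5 * x[:, 0])
anomaly = 2.0 * np.exp(-((x[:, 0] - 5.0) ** 2) / (2.0 * 0.1 ** 2))
y = trend + anomaly

x_new = np.linspace(0.0, 10.0, 101)[:, None]
# Assumed scales: 0.1 for the short-scale anomaly, 2.0 for the large-scale trend.
pred = fit_predict(x, y, x_new, scales=(0.1, 2.0), weights=(0.5, 0.5))
```

A sum of valid kernels is itself a valid kernel, which is why a single model can hold a short-scale and a large-scale component at once; the approach described in the abstract additionally learns the mixture from the data.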
Abstract:
Knowledge of the spatial distribution of hydraulic conductivity (K) within an aquifer is critical for reliable predictions of solute transport and the development of effective groundwater management and/or remediation strategies. While core analyses and hydraulic logging can provide highly detailed information, such information is inherently localized around boreholes that tend to be sparsely distributed throughout the aquifer volume. Conversely, larger-scale hydraulic experiments like pumping and tracer tests provide relatively low-resolution estimates of K in the investigated subsurface region. As a result, traditional hydrogeological measurement techniques leave a gap in terms of spatial resolution and coverage, and on their own they are often inadequate for characterizing heterogeneous aquifers. Geophysical methods have the potential to bridge this gap. The recent increased interest in the application of geophysical methods to hydrogeological problems is clearly evidenced by the formation and rapid growth of the domain of hydrogeophysics over the past decade (e.g., Rubin and Hubbard, 2005).
Abstract:
MAGE-encoded antigens, which are expressed by tumors of many histological types but not in normal tissues, are suitable candidates for vaccine-based immunotherapy of cancers. Thus far, however, T-cell responses to MAGE antigens have been detected only occasionally in cancer patients. In contrast, by using HLA/peptide fluorescent tetramers, we have observed recently that CD8(+) T cells specific for peptide MAGE-A10(254-262) can be detected frequently in peptide-stimulated peripheral blood mononuclear cells from HLA-A2-expressing melanoma patients and healthy donors. On the basis of these results, antitumoral vaccination trials using peptide MAGE-A10(254-262) have been implemented recently. In the present study, we have characterized MAGE-A10(254-262)-specific CD8(+) T cells in polyclonal cultures and at the clonal level. The results indicate that the repertoire of MAGE-A10(254-262)-specific CD8(+) T cells is diverse in terms of clonal composition, efficiency of peptide recognition, and tumor-specific lytic activity. Importantly, only CD8(+) T cells able to recognize the antigenic peptide with high efficiency are able to lyse MAGE-A10-expressing tumor cells. Under defined experimental conditions, the tetramer staining intensity exhibited by MAGE-A10(254-262)-specific CD8(+) T cells correlates with efficiency of peptide recognition so that "high" and "low" avidity cells can be separated by FACS. Altogether, the data reported here provide evidence for functional diversity of MAGE-A10(254-262)-specific T cells and will be instrumental for the monitoring of peptide MAGE-A10(254-262)-based clinical trials.
Abstract:
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership, and the user is then asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing a suitable architecture are provided for new and/or inexperienced users.
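The ranking step described in this abstract can be sketched for the posterior probability-based family. The example below is an illustration under assumptions, not code from the paper: it ranks unlabeled pixels by the gap between their two largest class posteriors and returns those with the smallest gap. The function name `smallest_margin_queries` and the probability values are hypothetical.

```python
import numpy as np


def smallest_margin_queries(probs, n_queries):
    """Return indices of the n_queries most uncertain samples.

    probs: (n_samples, n_classes) array of class posterior probabilities.
    Uncertainty is measured as the gap between the two largest posteriors;
    a small gap means the classifier hesitates between two classes.
    """
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margin, kind="stable")[:n_queries]


# Hypothetical posteriors for four unlabeled pixels and three classes.
probs = np.array([
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.50, 0.49, 0.01],  # most uncertain: near tie between two classes
    [0.70, 0.20, 0.10],  # fairly confident
])

queries = smallest_margin_queries(probs, n_queries=2)
```

In an active learning loop, the user would label the returned pixels, the classifier would be retrained on the enlarged training set, and the ranking would be repeated on the remaining unlabeled pixels.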
Abstract:
The paper presents an approach for mapping precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce simultaneously both the spatial patterns and the extreme values. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) of a neural network for the reproduction of extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of Doppler radar imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as analyzing data patterns.
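As a point of reference for the geostatistical side of this abstract, here is a minimal sketch of ordinary kriging at a single target location. It is an illustration under assumptions, not the paper's implementation: the exponential variogram, its sill and range, the function names, and the four data points are all hypothetical, and no nugget effect is included.

```python
import numpy as np


def exp_variogram(h, sill=1.0, range_=2.0):
    """Exponential variogram model without nugget (assumed parameters)."""
    return sill * (1.0 - np.exp(-h / range_))


def ordinary_kriging(coords, values, target, variogram=exp_variogram):
    """Ordinary kriging estimate at one target point.

    Solves the system [[Gamma, 1], [1^T, 0]] [w, mu]^T = [gamma_0, 1]^T,
    where the Lagrange multiplier mu enforces that the weights sum to one.
    """
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    system = np.ones((n + 1, n + 1))
    system[:n, :n] = variogram(dists)
    system[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(system, rhs)
    weights = solution[:n]
    return weights @ values, weights


# Hypothetical gauge locations (x, y) and precipitation values.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
values = np.array([1.0, 3.0, 2.0, 5.0])

estimate, weights = ordinary_kriging(coords, values, np.array([0.5, 0.5]))
```

Kriging with external drift extends this system with extra constraint rows built from an auxiliary variable, such as the radar imagery mentioned in the abstract; that extension is not shown here.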
Abstract:
This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report, some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, i.e., when the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns the consideration of real-space constraints such as geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and model inputs. This problem is approached by using different nonlinear feature selection/feature extraction tools.
To demonstrate the application of machine learning algorithms, several interesting case studies are considered: digital soil mapping using SVM; automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides); assessments of renewable resources (wind fields) with SVM and ANN models; etc. The dimensionality of the spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.