993 resultados para Over sampling
Resumo:
L’apprentissage supervisé de réseaux hiérarchiques à grande échelle connaît présentement un succès fulgurant. Malgré cette effervescence, l’apprentissage non-supervisé représente toujours, selon plusieurs chercheurs, un élément clé de l’Intelligence Artificielle, où les agents doivent apprendre à partir d’un nombre potentiellement limité de données. Cette thèse s’inscrit dans cette pensée et aborde divers sujets de recherche liés au problème d’estimation de densité par l’entremise des machines de Boltzmann (BM), modèles graphiques probabilistes au coeur de l’apprentissage profond. Nos contributions touchent les domaines de l’échantillonnage, l’estimation de fonctions de partition, l’optimisation ainsi que l’apprentissage de représentations invariantes. Cette thèse débute par l’exposition d’un nouvel algorithme d'échantillonnage adaptatif, qui ajuste (de fa ̧con automatique) la température des chaînes de Markov sous simulation, afin de maintenir une vitesse de convergence élevée tout au long de l’apprentissage. Lorsqu’utilisé dans le contexte de l’apprentissage par maximum de vraisemblance stochastique (SML), notre algorithme engendre une robustesse accrue face à la sélection du taux d’apprentissage, ainsi qu’une meilleure vitesse de convergence. Nos résultats sont présent ́es dans le domaine des BMs, mais la méthode est générale et applicable à l’apprentissage de tout modèle probabiliste exploitant l’échantillonnage par chaînes de Markov. Tandis que le gradient du maximum de vraisemblance peut-être approximé par échantillonnage, l’évaluation de la log-vraisemblance nécessite un estimé de la fonction de partition. Contrairement aux approches traditionnelles qui considèrent un modèle donné comme une boîte noire, nous proposons plutôt d’exploiter la dynamique de l’apprentissage en estimant les changements successifs de log-partition encourus à chaque mise à jour des paramètres. Le problème d’estimation est reformulé comme un problème d’inférence similaire au filtre de Kalman, mais sur un graphe bi-dimensionnel, où les dimensions correspondent aux axes du temps et au paramètre de température. Sur le thème de l’optimisation, nous présentons également un algorithme permettant d’appliquer, de manière efficace, le gradient naturel à des machines de Boltzmann comportant des milliers d’unités. Jusqu’à présent, son adoption était limitée par son haut coût computationel ainsi que sa demande en mémoire. Notre algorithme, Metric-Free Natural Gradient (MFNG), permet d’éviter le calcul explicite de la matrice d’information de Fisher (et son inverse) en exploitant un solveur linéaire combiné à un produit matrice-vecteur efficace. L’algorithme est prometteur: en terme du nombre d’évaluations de fonctions, MFNG converge plus rapidement que SML. Son implémentation demeure malheureusement inefficace en temps de calcul. Ces travaux explorent également les mécanismes sous-jacents à l’apprentissage de représentations invariantes. À cette fin, nous utilisons la famille de machines de Boltzmann restreintes “spike & slab” (ssRBM), que nous modifions afin de pouvoir modéliser des distributions binaires et parcimonieuses. Les variables latentes binaires de la ssRBM peuvent être rendues invariantes à un sous-espace vectoriel, en associant à chacune d’elles, un vecteur de variables latentes continues (dénommées “slabs”). Ceci se traduit par une invariance accrue au niveau de la représentation et un meilleur taux de classification lorsque peu de données étiquetées sont disponibles. Nous terminons cette thèse sur un sujet ambitieux: l’apprentissage de représentations pouvant séparer les facteurs de variations présents dans le signal d’entrée. Nous proposons une solution à base de ssRBM bilinéaire (avec deux groupes de facteurs latents) et formulons le problème comme l’un de “pooling” dans des sous-espaces vectoriels complémentaires.
Resumo:
Roadside surveys such as the Breeding Bird Survey (BBS) are widely used to assess the relative abundance of bird populations. The accuracy of roadside surveys depends on the extent to which surveys from roads represent the entire region under study. We quantified roadside land cover sampling bias in Tennessee, USA, by comparing land cover proportions near roads to proportions of the surrounding region. Roadside surveys gave a biased estimate of patterns across the region because some land cover types were over- or underrepresented near roads. These biases changed over time, introducing varying levels of distortion into the data. We constructed simulated population trends for five bird species of management interest based on these measured roadside sampling biases and on field data on bird abundance. These simulations indicated that roadside surveys may give overly negative assessments of the population trends of early successional birds and of synanthropic birds, but not of late-successional birds. Because roadside surveys are the primary source of avian population trend information in North America, we conclude that these surveys should be corrected for roadside land cover sampling bias. In addition, current recommendations about the need to create more early successional habitat for birds may need reassessment in the light of the undersampling of this habitat by roads.
Resumo:
The North American Breeding Bird Survey (BBS) is the principal source of data to inform researchers about the status of and trend for boreal forest birds. Unfortunately, little BBS coverage is available in the boreal forest, where increasing concern over the status of species breeding there has increased interest in northward expansion of the BBS. However, high disturbance rates in the boreal forest may complicate roadside monitoring. If the roadside sampling frame does not capture variation in disturbance rates because of either road placement or the use of roads for resource extraction, biased trend estimates might result. In this study, we examined roadside bias in the proportional representation of habitat disturbance via spatial data on forest “loss,” forest fires, and anthropogenic disturbance. In each of 455 BBS routes, the area disturbed within multiple buffers away from the road was calculated and compared against the area disturbed in degree blocks and BBS strata. We found a nonlinear relationship between bias and distance from the road, suggesting forest loss and forest fires were underrepresented below 75 and 100 m, respectively. In contrast, anthropogenic disturbance was overrepresented at distances below 500 m and underrepresented thereafter. After accounting for distance from road, BBS routes were reasonably representative of the degree blocks they were within, with only a few strata showing biased representation. In general, anthropogenic disturbance is overrepresented in southern strata, and forest fires are underrepresented in almost all strata. Similar biases exist when comparing the entire road network and the subset sampled by BBS routes against the amount of disturbance within BBS strata; however, the magnitude of biases differed. Based on our results, we recommend that spatial stratification and rotating panel designs be used to spread limited BBS and off-road sampling effort in an unbiased fashion and that new BBS routes be established where sufficient road coverage exists.
Resumo:
In the Radiative Atmospheric Divergence Using ARM Mobile Facility GERB and AMMA Stations (RADAGAST) project we calculate the divergence of radiative flux across the atmosphere by comparing fluxes measured at each end of an atmospheric column above Niamey, in the African Sahel region. The combination of broadband flux measurements from geostationary orbit and the deployment for over 12 months of a comprehensive suite of active and passive instrumentation at the surface eliminates a number of sampling issues that could otherwise affect divergence calculations of this sort. However, one sampling issue that challenges the project is the fact that the surface flux data are essentially measurements made at a point, while the top-of-atmosphere values are taken over a solid angle that corresponds to an area at the surface of some 2500 km2. Variability of cloud cover and aerosol loading in the atmosphere mean that the downwelling fluxes, even when averaged over a day, will not be an exact match to the area-averaged value over that larger area, although we might expect that it is an unbiased estimate thereof. The heterogeneity of the surface, for example, fixed variations in albedo, further means that there is a likely systematic difference in the corresponding upwelling fluxes. In this paper we characterize and quantify this spatial sampling problem. We bound the root-mean-square error in the downwelling fluxes by exploiting a second set of surface flux measurements from a site that was run in parallel with the main deployment. The differences in the two sets of fluxes lead us to an upper bound to the sampling uncertainty, and their correlation leads to another which is probably optimistic as it requires certain other conditions to be met. For the upwelling fluxes we use data products from a number of satellite instruments to characterize the relevant heterogeneities and so estimate the systematic effects that arise from the flux measurements having to be taken at a single point. The sampling uncertainties vary with the season, being higher during the monsoon period. We find that the sampling errors for the daily average flux are small for the shortwave irradiance, generally less than 5 W m−2, under relatively clear skies, but these increase to about 10 W m−2 during the monsoon. For the upwelling fluxes, again taking daily averages, systematic errors are of order 10 W m−2 as a result of albedo variability. The uncertainty on the longwave component of the surface radiation budget is smaller than that on the shortwave component, in all conditions, but a bias of 4 W m−2 is calculated to exist in the surface leaving longwave flux.
Resumo:
The Representative Soil Sampling Scheme (RSSS) has monitored the soil of agricultural land in England and Wales since 1969. Here we describe the first spatial analysis of the data from these surveys using geostatistics. Four years of data (1971, 1981, 1991 and 2001) were chosen to examine the nutrient (available K, Mg and P) and pH status of the soil. At each farm, four fields were sampled; however, for the earlier years, coordinates were available for the farm only and not for each field. The averaged data for each farm were used for spatial analysis and the variograms showed spatial structure even with the smaller sample size. These variograms provide a reasonable summary of the larger scale of variation identified from the data of the more intensively sampled National Soil Inventory. Maps of kriged predictions of K generally show larger values in the central and southeastern areas (above 200 mg L-1) and an increase in values in the west over time, whereas Mg is fairly stable over time. The kriged predictions of P show a decline over time, particularly in the east, and those of pH show an increase in the east over time. Disjunctive kriging was used to examine temporal changes in available P using probabilities less than given thresholds of this element. The RSSS was not designed for spatial analysis, but the results show that the data from these surveys are suitable for this purpose. The results of the spatial analysis, together with those of the statistical analyses, provide a comprehensive view of the RSSS database as a basis for monitoring the soil. These data should be taken into account when future national soil monitoring schemes are designed.
Resumo:
It has been generally accepted that the method of moments (MoM) variogram, which has been widely applied in soil science, requires about 100 sites at an appropriate interval apart to describe the variation adequately. This sample size is often larger than can be afforded for soil surveys of agricultural fields or contaminated sites. Furthermore, it might be a much larger sample size than is needed where the scale of variation is large. A possible alternative in such situations is the residual maximum likelihood (REML) variogram because fewer data appear to be required. The REML method is parametric and is considered reliable where there is trend in the data because it is based on generalized increments that filter trend out and only the covariance parameters are estimated. Previous research has suggested that fewer data are needed to compute a reliable variogram using a maximum likelihood approach such as REML, however, the results can vary according to the nature of the spatial variation. There remain issues to examine: how many fewer data can be used, how should the sampling sites be distributed over the site of interest, and how do different degrees of spatial variation affect the data requirements? The soil of four field sites of different size, physiography, parent material and soil type was sampled intensively, and MoM and REML variograms were calculated for clay content. The data were then sub-sampled to give different sample sizes and distributions of sites and the variograms were computed again. The model parameters for the sets of variograms for each site were used for cross-validation. Predictions based on REML variograms were generally more accurate than those from MoM variograms with fewer than 100 sampling sites. A sample size of around 50 sites at an appropriate distance apart, possibly determined from variograms of ancillary data, appears adequate to compute REML variograms for kriging soil properties for precision agriculture and contaminated sites. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
The Representative Soil Sampling Scheme of England and Wales has recorded information on the soil of agricultural land in England and Wales since 1969. It is a valuable source of information about the soil in the context of monitoring for sustainable agricultural development. Changes in soil nutrient status and pH were examined over the period 1971-2001. Several methods of statistical analysis were applied to data from the surveys during this period. The main focus here is on the data for 1971, 1981, 1991 and 2001. The results of examining change over time in general show that levels of potassium in the soil have increased, those of magnesium have remained fairly constant, those of phosphorus have declined and pH has changed little. Future sampling needs have been assessed in the context of monitoring, to determine the mean at a given level of confidence and tolerable error and to detect change in the mean over time at these same levels over periods of 5 and 10 years. The results of a non-hierarchical multivariate classification suggest that England and Wales could be stratified to optimize future sampling and analysis. To monitor soil quality and health more generally than for agriculture, more of the country should be sampled and a wider range of properties recorded.
Resumo:
[1] Cloud cover is conventionally estimated from satellite images as the observed fraction of cloudy pixels. Active instruments such as radar and Lidar observe in narrow transects that sample only a small percentage of the area over which the cloud fraction is estimated. As a consequence, the fraction estimate has an associated sampling uncertainty, which usually remains unspecified. This paper extends a Bayesian method of cloud fraction estimation, which also provides an analytical estimate of the sampling error. This method is applied to test the sensitivity of this error to sampling characteristics, such as the number of observed transects and the variability of the underlying cloud field. The dependence of the uncertainty on these characteristics is investigated using synthetic data simulated to have properties closely resembling observations of the spaceborne Lidar NASA-LITE mission. Results suggest that the variance of the cloud fraction is greatest for medium cloud cover and least when conditions are mostly cloudy or clear. However, there is a bias in the estimation, which is greatest around 25% and 75% cloud cover. The sampling uncertainty is also affected by the mean lengths of clouds and of clear intervals; shorter lengths decrease uncertainty, primarily because there are more cloud observations in a transect of a given length. Uncertainty also falls with increasing number of transects. Therefore a sampling strategy aimed at minimizing the uncertainty in transect derived cloud fraction will have to take into account both the cloud and clear sky length distributions as well as the cloud fraction of the observed field. These conclusions have implications for the design of future satellite missions. This paper describes the first integrated methodology for the analytical assessment of sampling uncertainty in cloud fraction observations from forthcoming spaceborne radar and Lidar missions such as NASA's Calipso and CloudSat.
Resumo:
Seed of 15 species of Brassicaceae were stored hermetically in a genebank (at -5 degrees C to -10 degrees C with c. 3% moisture content) for 40 years. Samples were withdrawn at intervals for germination tests. Many accessions showed an increase in ability to germinate over this period. due to loss in dormancy. Nevertheless, some dormancy remained after 40 years' storage and was broken by pre-applied gibberellic acid. The poorest seed survival occurred in Hormatophylla spinosa. Even in this accession the ability to germinate declined by only 7% between 1966 and 2006. Comparison of seeds from 1966 stored for 40 years with those collected anew in 2006 from the original sampling sites, where possible, showed few differences, other than a tendency (7 of 9 accessions) for the latter to show greater dormancy. These results for hermetic storage at sub-zero temperatures and low moisture contents confirm that long-term seed storage can provide a successful technology for ex situ plant biodiversity conservation.
Resumo:
The sensitivity of 73 isolates of Mycosphaerella graminicola collected over the period 1993–2002 from wheat fields in South England was tested in vitro against the triazole fluquinconazole, the strobilurin azoxystrobin and to the imidazole prochloraz. Over the sampling period, sensitivity of the population to fluquinconazole and prochloraz decreased by factors of approximately 10 and 2, respectively, but there was no evidence of changes in sensitivity to azoxystrobin. There was no correlation between sensitivity to fluquinconazole and prochloraz, but there was a weak negative cross-resistance between fluquinconazole and azoxystrobin.
Resumo:
The sampling of certain solid angle is a fundamental operation in realistic image synthesis, where the rendering equation describing the light propagation in closed domains is solved. Monte Carlo methods for solving the rendering equation use sampling of the solid angle subtended by unit hemisphere or unit sphere in order to perform the numerical integration of the rendering equation. In this work we consider the problem for generation of uniformly distributed random samples over hemisphere and sphere. Our aim is to construct and study the parallel sampling scheme for hemisphere and sphere. First we apply the symmetry property for partitioning of hemisphere and sphere. The domain of solid angle subtended by a hemisphere is divided into a number of equal sub-domains. Each sub-domain represents solid angle subtended by orthogonal spherical triangle with fixed vertices and computable parameters. Then we introduce two new algorithms for sampling of orthogonal spherical triangles. Both algorithms are based on a transformation of the unit square. Similarly to the Arvo's algorithm for sampling of arbitrary spherical triangle the suggested algorithms accommodate the stratified sampling. We derive the necessary transformations for the algorithms. The first sampling algorithm generates a sample by mapping of the unit square onto orthogonal spherical triangle. The second algorithm directly compute the unit radius vector of a sampling point inside to the orthogonal spherical triangle. The sampling of total hemisphere and sphere is performed in parallel for all sub-domains simultaneously by using the symmetry property of partitioning. The applicability of the corresponding parallel sampling scheme for Monte Carlo and Quasi-D/lonte Carlo solving of rendering equation is discussed.
Resumo:
High spatial resolution vertical profiles of pore-water chemistry have been obtained for a peatland using diffusive equilibrium in thin films (DET) gel probes. Comparison of DET pore-water data with more traditional depth-specific sampling shows good agreement and the DET profiling method is less invasive and less likely to induce mixing of pore-waters. Chloride mass balances as water tables fell in the early summer indicate that evaporative concentration dominates and there is negligible lateral flow in the peat. Lack of lateral flow allows element budgets for the same site at different times to be compared. The high spatial resolution of sampling also enables gradients to be observed that permit calculations of vertical fluxes. Sulfate concentrations fall at two sites with net rates of 1.5 and 5.0nmol cm− 3 day− 1, likely due to a dominance of bacterial sulfate reduction, while a third site showed a net gain in sulfate due to oxidation of sulfur over the study period at an average rate of 3.4nmol cm− 3 day− 1. Behaviour of iron is closely coupled to that of sulfur; there is net removal of iron at the two sites where sulfate reduction dominates and addition of iron where oxidation dominates. The profiles demonstrate that, in addition to strong vertical redox related chemical changes, there is significant spatial heterogeneity. Whilst overall there is evidence for net reduction of sulfate within the peatland pore-waters, this can be reversed, at least temporarily, during periods of drought when sulfide oxidation with resulting acid production predominates.
Resumo:
Urban boundary layers (UBLs) can be highly complex due to the heterogeneous roughness and heating of the surface, particularly at night. Due to a general lack of observations, it is not clear whether canonical models of boundary layer mixing are appropriate in modelling air quality in urban areas. This paper reports Doppler lidar observations of turbulence profiles in the centre of London, UK, as part of the second REPARTEE campaign in autumn 2007. Lidar-measured standard deviation of vertical velocity averaged over 30 min intervals generally compared well with in situ sonic anemometer measurements at 190 m on the BT telecommunications Tower. During calm, nocturnal periods, the lidar underestimated turbulent mixing due mainly to limited sampling rate. Mixing height derived from the turbulence, and aerosol layer height from the backscatter profiles, showed similar diurnal cycles ranging from c. 300 to 800 m, increasing to c. 200 to 850 m under clear skies. The aerosol layer height was sometimes significantly different to the mixing height, particularly at night under clear skies. For convective and neutral cases, the scaled turbulence profiles resembled canonical results; this was less clear for the stable case. Lidar observations clearly showed enhanced mixing beneath stratocumulus clouds reaching down on occasion to approximately half daytime boundary layer depth. On one occasion the nocturnal turbulent structure was consistent with a nocturnal jet, suggesting a stable layer. Given the general agreement between observations and canonical turbulence profiles, mixing timescales were calculated for passive scalars released at street level to reach the BT Tower using existing models of turbulent mixing. It was estimated to take c. 10 min to diffuse up to 190 m, rising to between 20 and 50 min at night, depending on stability. Determination of mixing timescales is important when comparing to physico-chemical processes acting on pollutant species measured simultaneously at both the ground and at the BT Tower during the campaign. From the 3 week autumnal data-set there is evidence for occasional stable layers in central London, effectively decoupling surface emissions from air aloft.
Resumo:
We report on the consistency of water vapour line intensities in selected spectral regions between 800–12,000 cm−1 under atmospheric conditions using sun-pointing Fourier transform infrared spectroscopy. Measurements were made across a number of days at both a low and high altitude field site, sampling a relatively moist and relatively dry atmosphere. Our data suggests that across most of the 800–12,000 cm−1 spectral region water vapour line intensities in recent spectral line databases are generally consistent with what was observed. However, we find that HITRAN-2008 water vapour line intensities are systematically lower by up to 20% in the 8000–9200 cm−1 spectral interval relative to other spectral regions. This discrepancy is essentially removed when two new linelists (UCL08, a compilation of linelists and ab-initio calculations, and one based on recent laboratory measurements by Oudot et al. (2010) [10] in the 8000–9200 cm−1 spectral region) are used. This strongly suggests that the H2O line strengths in the HITRAN-2008 database are indeed underestimated in this spectral region and in need of revision. The calculated global-mean clear-sky absorption of solar radiation is increased by about 0.3 W m−2 when using either the UCL08 or Oudot line parameters in the 8000–9200 cm−1 region, instead of HITRAN-2008. We also found that the effect of isotopic fractionation of HDO is evident in the 2500–2900 cm−1 region in the observations.
Resumo:
The decline of bees has raised concerns regarding their conservation and the maintenance of ecosystem services they provide to bee-pollinated wild flowers and crops. Although the Mediterranean region is a hotspot for bee species richness, their status remains poorly studied. There is an urgent need for cost-effective, reliable, and unbiased sampling methods that give good bee species richness estimates. This study aims: (a) to assess bee species richness in two common Mediterranean habitat types: semi-natural scrub (phrygana) and managed olive groves; (b) to compare species richness in those systems to that of other biogeographic regions, and (c) to assess whether six different sampling methods (pan traps, variable and standardized transect walks, observation plots and trap nests), previously tested in other European biogeographic regions, are suitable in Mediterranean communities. Eight study sites, four per habitat type, were selected on the island of Lesvos, Greece. The species richness observed was high compared to other habitat types worldwide for which comparable data exist. Pan traps collected the highest proportion of the total bee species richness across all methods at the scale of a study site. Variable and standardized transect walks detected the highest total richness over all eight study sites. Trap nests and observation plots detected only a limited fraction of the bee species richness. To assess the total bee species richness in bee diversity hotspots, such as the studied habitats, we suggest a combination of transect walks conducted by trained bee collectors and pan trap sampling