954 results for sampling methods


Relevance: 70.00%

Abstract:

In this paper we address the problem of extracting representative point samples from polygonal models. The goal of such a sampling algorithm is to find points that are evenly distributed. We propose star-discrepancy as a measure of sampling quality and introduce new sampling methods based on global line distributions. We investigate several line generation algorithms, including an efficient hardware-based sampling method. Our method contributes to the area of point-based graphics by extracting points that are more evenly distributed than those produced by current sampling algorithms.
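
As a rough illustration of the star-discrepancy measure the abstract refers to (not the authors' implementation), the following Python sketch estimates the star discrepancy of a 2D point set in the unit square by brute force; the point sets in the example are invented.

import numpy as np

def star_discrepancy_2d(points):
    """Brute-force estimate of the star discrepancy of a point set in [0,1]^2.

    The discrepancy is the largest gap between the fraction of points that
    fall in an axis-aligned box [0, x) x [0, y) and the area of that box;
    lower values indicate a more evenly distributed sample. Evaluating only
    boxes anchored at the sample coordinates (plus 1.0) gives a good
    lower-bound approximation of the supremum.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    xs = np.unique(np.append(pts[:, 0], 1.0))
    ys = np.unique(np.append(pts[:, 1], 1.0))
    worst = 0.0
    for x in xs:
        for y in ys:
            frac = np.count_nonzero((pts[:, 0] < x) & (pts[:, 1] < y)) / n
            worst = max(worst, abs(frac - x * y))
    return worst

# Example: compare a uniform random sample with a regular 16 x 16 grid of the
# same size.
rng = np.random.default_rng(0)
random_pts = rng.random((256, 2))
g = (np.arange(16) + 0.5) / 16
grid_pts = np.array([(x, y) for x in g for y in g])
print(star_discrepancy_2d(random_pts), star_discrepancy_2d(grid_pts))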

Relevance: 70.00%

Abstract:

The decline of bees has raised concerns regarding their conservation and the maintenance of the ecosystem services they provide to bee-pollinated wild flowers and crops. Although the Mediterranean region is a hotspot for bee species richness, their status remains poorly studied. There is an urgent need for cost-effective, reliable, and unbiased sampling methods that give good bee species richness estimates. This study aims: (a) to assess bee species richness in two common Mediterranean habitat types: semi-natural scrub (phrygana) and managed olive groves; (b) to compare species richness in those systems to that of other biogeographic regions; and (c) to assess whether six different sampling methods (pan traps, variable and standardized transect walks, observation plots, and trap nests), previously tested in other European biogeographic regions, are suitable in Mediterranean communities. Eight study sites, four per habitat type, were selected on the island of Lesvos, Greece. The species richness observed was high compared to other habitat types worldwide for which comparable data exist. Pan traps collected the highest proportion of the total bee species richness across all methods at the scale of a study site. Variable and standardized transect walks detected the highest total richness over all eight study sites. Trap nests and observation plots detected only a limited fraction of the bee species richness. To assess the total bee species richness in bee diversity hotspots, such as the studied habitats, we suggest a combination of transect walks conducted by trained bee collectors and pan trap sampling.

Relevance: 70.00%

Abstract:

One of the fundamental machine learning tasks is that of predictive classification. Given that organisations collect an ever increasing amount of data, predictive classification methods must be able to effectively and efficiently handle large amounts of data. However, it is understood that present requirements push existing algorithms to, and sometimes beyond, their limits, since many classification prediction algorithms were designed when currently common data set sizes were beyond imagination. This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered. This dissertation investigates two of these key questions.

The first is whether different types of algorithms to those currently employed are required when using large data sets. This is answered by analysis of the way in which the bias plus variance decomposition of predictive classification error changes as training set size is increased. Experiments find that larger training sets require different types of algorithms to those currently used. Some insight into the characteristics of suitable algorithms is provided, and this may give some direction for the development of future classification prediction algorithms specifically designed for use with large data sets.

The second question investigated is the role of sampling in machine learning with large data sets. Sampling has long been used as a means of avoiding the need to scale up algorithms to suit the size of the data set by scaling down the size of the data set to suit the algorithm. However, the costs of performing sampling have not been widely explored. Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little if any accuracy. Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step and using a statistical calculation at each inference step to determine sufficient sample size. Experiments show that using a statistical calculation of sample size can not only substantially reduce execution time but can do so with only a small loss, and occasional gain, in accuracy.

One of the common uses of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training size that will maximally reduce execution time while not being detrimental to accuracy. An analysis of the performance of methods for detecting convergence of learning curves is performed, with the focus of the analysis on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often results in greater computational cost than learning from all available data. These results cast doubt on the applicability of gradient-of-tangent methods for detecting convergence, and on the viability of learning curves for reducing execution time in general.
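
The gradient-of-tangent convergence test discussed above can be illustrated, in heavily reduced form, by the sketch below; the training sizes, accuracies, and tolerance are invented and this is not the dissertation's actual procedure.

def curve_converged(train_sizes, accuracies, slope_tol=1e-6):
    """Toy gradient-of-tangent convergence check for a learning curve.

    Convergence is declared when the slope between the two most recent
    (training size, accuracy) points drops below slope_tol. As the abstract
    notes, a local accuracy plateau can satisfy this test long before the
    curve has truly levelled off.
    """
    if len(accuracies) < 2:
        return False
    slope = (accuracies[-1] - accuracies[-2]) / (train_sizes[-1] - train_sizes[-2])
    return abs(slope) < slope_tol

# Invented learning curve with a local plateau between 4,000 and 8,000
# instances: the test fires at 8,000 even though accuracy later improves.
sizes = [1000, 2000, 4000, 8000, 16000, 32000]
accs = [0.710, 0.780, 0.800, 0.801, 0.840, 0.860]
for i in range(2, len(sizes) + 1):
    print(sizes[i - 1], curve_converged(sizes[:i], accs[:i], slope_tol=5e-7))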

Relevance: 70.00%

Abstract:

Sampling animals from the wild for study is something nearly every biologist has done, but despite our best efforts to obtain random samples of animals, ‘hidden’ trait biases may still exist. For example, consistent behavioral traits can affect trappability/catchability, independent of obvious factors such as size and gender, and these traits are often correlated with other repeatable physiological and/or life history traits. If so, systematic sampling bias may exist for any of these traits. The extent to which this is a problem, of course, depends on the magnitude of bias, which is presently unknown because the underlying trait distributions in populations are usually unknown, or unknowable. Indeed, our present knowledge about sampling bias comes from samples (not complete population censuses), which can possess bias to begin with. I had the unique opportunity to create naturalized populations of fish by seeding each of four small fishless lakes with equal densities of slow-, intermediate-, and fast-growing fish. Using sampling methods that are not size-selective, I observed that fast-growing fish were up to two times more likely to be sampled than slower-growing fish. This indicates substantial and systematic bias with respect to an important life history trait (growth rate). If correlations between behavioral, physiological, and life-history traits are as widespread as the literature suggests, then many animal samples may be systematically biased with respect to these traits (e.g., when collecting animals for laboratory use), affecting our inferences about population structure and abundance. I conclude with a discussion of ways to minimize sampling bias for particular physiological/behavioral/life-history types within animal populations.
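
A toy simulation of the kind of trait-linked sampling bias described here could look like the following; the capture probabilities are invented (loosely echoing the "up to two times" figure) and this is not the author's experimental design.

import numpy as np

rng = np.random.default_rng(2)

# Invented population: equal thirds of slow, intermediate and fast growers,
# with capture probability rising with growth rate.
growth_class = np.repeat(["slow", "intermediate", "fast"], 1000)
p_capture = np.repeat([0.10, 0.15, 0.20], 1000)

caught = rng.random(growth_class.size) < p_capture
sample = growth_class[caught]

# True proportions are one third each; the catch over-represents fast growers.
for g in ["slow", "intermediate", "fast"]:
    print(g, round(float(np.mean(sample == g)), 3))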

Relevance: 70.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance: 70.00%

Abstract:

Killer whale (Orcinus orca Linnaeus, 1758) abundance in the North Pacific is known only for a few populations for which extensive longitudinal data are available, with little quantitative data from more remote regions. Line-transect ship surveys were conducted in July and August of 2001–2003 in coastal waters of the western Gulf of Alaska and the Aleutian Islands. Conventional and Multiple Covariate Distance Sampling methods were used to estimate the abundance of different killer whale ecotypes, which were distinguished based upon morphological and genetic data. Abundance was calculated separately for two data sets that differed in the method by which killer whale group size data were obtained. Initial group size (IGS) data corresponded to estimates of group size at the time of first sighting, and post-encounter group size (PEGS) corresponded to estimates made after closely approaching sighted groups.
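
For orientation, the basic line-transect density estimator that conventional distance sampling rests on can be sketched as below; the sighting counts, group size, effort, strip width, and study-area size are invented, not values from this survey.

def line_transect_density(n_groups, mean_group_size, line_length_km, esw_km):
    """Conventional line-transect density estimate.

    D = n * E[s] / (2 * L * esw), where esw is the effective strip half-width
    derived from a fitted detection function. Multiple Covariate Distance
    Sampling additionally lets the detection function depend on covariates
    such as sea state or group size.
    """
    return (n_groups * mean_group_size) / (2.0 * line_length_km * esw_km)

# Invented example: 40 sightings averaging 5 whales, 3,000 km of effort, an
# effective strip half-width of 2 km, and a 100,000 km^2 study area.
density = line_transect_density(40, 5.0, 3000.0, 2.0)  # whales per km^2
print(density, density * 100000.0)                     # density and abundance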

Relevance: 70.00%

Abstract:

We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships between abundance and environmental covariates. Spatial Poisson process likelihoods can be used to simultaneously estimate detection and intensity parameters by modeling distance sampling data as a thinned spatial point process. A model-based spatial approach to distance sampling data has three main benefits: it allows complex and opportunistic transect designs to be employed, it allows estimation of abundance in small subregions, and it provides a framework to assess the effects of habitat or experimental manipulation on density. We demonstrate the model-based methodology with a small simulation study and analysis of the Dubbo weed data set. In addition, a simple ad hoc method for handling overdispersion is also proposed. The simulation study showed that the model-based approach compared favorably to conventional distance sampling methods for abundance estimation. In addition, the overdispersion correction performed adequately when the number of transects was high. Analysis of the Dubbo data set indicated a transect effect on abundance via Akaike’s information criterion model selection. Further goodness-of-fit analysis, however, indicated some potential confounding of intensity with the detection function.
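
The general form of the thinned spatial point-process likelihood described here can be written as follows; the notation is ours for illustration, and the paper's exact parameterisation may differ.

% lambda(s): intensity (expected density) at location s, typically a
% log-linear function of environmental covariates; g(d(s)): probability of
% detecting an individual at perpendicular distance d(s) from the transect;
% s_1, ..., s_n: locations of detected individuals; A: the surveyed region.
\begin{align}
  \lambda_{\mathrm{obs}}(s) &= \lambda(s)\, g\bigl(d(s)\bigr), \\
  L(\theta) &= \Biggl[\prod_{i=1}^{n} \lambda_{\mathrm{obs}}(s_i)\Biggr]
               \exp\!\Biggl(-\int_{A} \lambda_{\mathrm{obs}}(s)\, \mathrm{d}s\Biggr).
\end{align}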

Relevance: 70.00%

Abstract:

"How large a sample is needed to survey the bird damage to corn in a county in Ohio or New Jersey or South Dakota?" Like those in the Bureau of Sport Fisheries and Wildlife and the U.S.D.A. who have been faced with a question of this sort we found only meager information on which to base an answer, whether the problem related to a county in Ohio or to one in New Jersey, or elsewhere. Many sampling methods and rates of sampling did yield reliable estimates but the judgment was often intuitive or based on the reasonableness of the resulting data. Later, when planning the next study or survey, little additional information was available on whether 40 samples of 5 ears each or 5 samples of 200 ears should be examined, i.e., examination of a large number of small samples or a small number of large samples. What information is needed to make a reliable decision? Those of us involved with the Agricultural Experiment Station regional project concerned with the problems of bird damage to crops, known as NE-49, thought we might supply an ans¬wer if we had a corn field in which all the damage was measured. If all the damage were known, we could then sample this field in various ways and see how the estimates from these samplings compared to the actual damage and pin-point the best and most accurate sampling procedure. Eventually the investigators in four states became involved in this work1 and instead of one field we were able to broaden the geographical base by examining all the corn ears in 2 half-acre sections of fields in each state, 8 sections in all. When the corn had matured well past the dough stage, damage on each corn ear was assessed, without removing the ear from the stalk, by visually estimating the percent of the kernel surface which had been destroyed and rating it in one of 5 damage categories. Measurements (by row-centimeters) of the rows of kernels pecked by birds also were made on selected ears representing all categories and all parts of each field section. These measurements provided conversion factors that, when fed into a computer, were applied to the more than 72,000 visually assessed ears. The machine now had in its memory and could supply on demand a map showing each ear, its location and the intensity of the damage.

Relevance: 70.00%

Abstract:

Studies have shown similarities in the microflora between titanium implant and tooth sites when samples are taken by gingival crevicular fluid (GCF) sampling methods. The purpose of the present study was to examine the microflora from curette and GCF samples using the checkerboard DNA-DNA hybridization method to assess the microflora of patients who had at least one oral osseo-integrated implant and who were otherwise dentate. Plaque samples were taken from tooth/implant surfaces and from sulcular gingival surfaces with curettes, and from gingival fluid using filter papers. A total of 28 subjects (11 females) were enrolled in the study. The mean age of the subjects was 64.1 years (SD+/-4.7). On average, the implants studied had been in function for 3.7 years (SD+/-2.9). The proportions of Streptococcus oralis (P<0.02) and Fusobacterium periodonticum (P<0.02) were significantly higher at tooth sites (curette samples). The GCF samples yielded higher proportions for 28 of the 40 species studied (P-values varying between 0.05 and 0.001). The proportions of Tannerella forsythia (T. forsythensis) and Treponema denticola were both higher in GCF samples (P<0.02 and P<0.05, respectively) than in curette samples (implant sites). The microbial composition in gingival fluid from samples taken at implant sites differed partly from that of curette samples taken from implant surfaces or from sulcular soft tissues, providing higher counts for most bacteria studied at implant surfaces, with the exception of Porphyromonas gingivalis. A combination of GCF and curette sampling methods might be the most representative sampling approach.

Relevance: 70.00%

Abstract:

Proteins are linear chain molecules made out of amino acids. Only when they fold to their native states do they become functional. This dissertation aims to model the solvent (environment) effect and to develop and implement enhanced sampling methods that enable a reliable study of the protein folding problem in silico. We have developed an enhanced solvation model based on the solution to the Poisson-Boltzmann equation in order to describe the solvent effect. Following the quantum mechanical Polarizable Continuum Model (PCM), we decomposed the net solvation free energy into three physical terms: Polarization, Dispersion, and Cavitation. All the terms were implemented, analyzed, and parametrized individually to obtain a high level of accuracy. In order to describe the thermodynamics of proteins, their conformational space needs to be sampled thoroughly. Simulations of proteins are hampered by slow relaxation due to their rugged free-energy landscape, with the barriers between minima being higher than the thermal energy at physiological temperatures. To overcome this problem a number of approaches have been proposed, of which the replica exchange method (REM) is the most popular. In this dissertation we describe a new variant of the canonical replica exchange method in the context of molecular dynamics simulation. The advantage of this new method is the easily tunable high acceptance rate for the replica exchange. We call our method Microcanonical Replica Exchange Molecular Dynamics (MREMD). We describe the theoretical framework, comment on its actual implementation, and present its application to the Trp-cage mini-protein in implicit solvent. We have been able to correctly predict the folding thermodynamics of this protein using our approach.
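
For context, the exchange step that standard (canonical) replica exchange relies on is the Metropolis test sketched below; this is the textbook criterion the dissertation builds on, not the microcanonical (MREMD) exchange rule it proposes, and the temperatures and energies in the example are invented.

import math
import random

def accept_swap(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance test for swapping two replicas in canonical
    temperature replica exchange.

    The swap is accepted with probability min(1, exp(delta)), where
    delta = (beta_i - beta_j) * (e_i - e_j).
    """
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0.0 or random.random() < math.exp(delta)

# Invented example: replicas at 300 K and 320 K with instantaneous potential
# energies of -120 and -125 kcal/mol, respectively.
kB = 0.0019872  # Boltzmann constant in kcal/(mol*K)
print(accept_swap(1.0 / (kB * 300.0), 1.0 / (kB * 320.0), -120.0, -125.0))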

Relevance: 70.00%

Abstract:

Methods are described for working with Nosema apis and Nosema ceranae in the field and in the laboratory. For fieldwork, different sampling methods are described to determine colony-level infections at a given point in time, but also for following the temporal infection dynamics. Suggestions are made for how to standardise field trials for evaluating treatments and disease impact. The laboratory methods described include different means for determining colony-level and individual bee infection levels, and methods for species determination, including light microscopy, electron microscopy, and molecular methods (PCR). Suggestions are made for how to standardise cage trials, and different inoculation methods for infecting bees are described, including control methods for spore viability. A cell culture system for in vitro rearing of Nosema spp. is described. Finally, how to conduct different types of experiments is described, including infectious dose, dose effects, course of infection, and longevity tests.

Relevance: 70.00%

Abstract:

Conservation and monitoring of forest biodiversity requires reliable information about forest structure and composition at multiple spatial scales. However, detailed data about forest habitat characteristics across large areas are often incomplete due to difficulties associated with field sampling methods. To overcome this limitation we employed a nationally available light detection and ranging (LiDAR) remote sensing dataset to develop variables describing forest landscape structure across a large environmental gradient in Switzerland. Using a model species indicative of structurally rich mountain forests (hazel grouse Bonasa bonasia), we tested the potential of such variables to predict species occurrence and evaluated the additional benefit of LiDAR data when used in combination with traditional, sample plot-based field variables. We calibrated boosted regression trees (BRT) models for both variable sets separately and in combination, and compared the models’ accuracies. While both field-based and LiDAR models performed well, combining the two data sources improved the accuracy of the species’ habitat model. The variables retained from the two datasets held different types of information: field variables mostly quantified food resources and cover in the field and shrub layer, LiDAR variables characterized heterogeneity of vegetation structure which correlated with field variables describing the understory and ground vegetation. When combined with data on forest vegetation composition from field surveys, LiDAR provides valuable complementary information for encompassing species niches more comprehensively. Thus, LiDAR bridges the gap between precise, locally restricted field-data and coarse digital land cover information by reliably identifying habitat structure and quality across large areas.
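
A minimal sketch of the "separate versus combined variable sets" comparison described above might look like the following; it uses randomly generated stand-in data and scikit-learn's gradient boosting classifier as a stand-in for BRT, so the scores it prints are meaningless except to show the workflow.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 300
X_field = rng.random((n, 5))   # stand-ins for field variables (food, shrub cover, ...)
X_lidar = rng.random((n, 8))   # stand-ins for LiDAR structure variables
y = rng.integers(0, 2, n)      # invented presence/absence labels

def mean_auc(X, y):
    """Cross-validated AUC of a boosted-tree model on one variable set."""
    model = GradientBoostingClassifier()
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

print(mean_auc(X_field, y), mean_auc(X_lidar, y),
      mean_auc(np.hstack([X_field, X_lidar]), y))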

Relevance: 70.00%

Abstract:

Research on open source software (OSS) projects often focuses on the SourceForge collaboration platform. We argue that a GNU/Linux distribution, such as Debian, is better suited for the sampling of projects because it avoids biases and contains unique information only available in an integrated environment. In particular, research on the reuse of components can build on the dependency information inherent in the Debian GNU/Linux packaging system. This paper therefore contributes to the practice of sampling methods in OSS research and provides empirical data on reuse dependencies in Debian.
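
One way to tap the dependency information mentioned above is to parse the "Depends:" fields of a Debian "Packages" index; the sketch below is illustrative only (it keeps the first alternative of each dependency and strips version constraints) and the file name is a placeholder.

import re
from collections import Counter

def reuse_dependencies(packages_file):
    """Count how often each package is depended upon in a Debian Packages index.

    The index is the stanza-formatted file served by apt repositories; only
    'Depends:' lines are considered here.
    """
    counts = Counter()
    with open(packages_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("Depends:"):
                for dep in line[len("Depends:"):].split(","):
                    name = re.sub(r"\(.*?\)", "", dep.split("|")[0]).strip()
                    if name:
                        counts[name] += 1
    return counts

# Example (assuming a downloaded index saved locally as "Packages"):
# print(reuse_dependencies("Packages").most_common(10))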

Relevance: 70.00%

Abstract:

Since the early days of 3D computer vision, it has been necessary to use techniques that reduce the data to a tractable size while preserving the important aspects of the scene. Currently, with the new low-cost RGB-D sensors, which provide a stream of color and 3D data at approximately 30 frames per second, this is becoming even more relevant. Many applications make use of these sensors and need a preprocessing step to downsample the data in order to either reduce the processing time or improve the data (e.g., reducing noise or enhancing the important features). In this paper, we present a comparison of different downsampling techniques that are based on different principles. Concretely, five downsampling methods are included: a bilinear-based method, a normal-based method, a color-based method, a combination of the normal- and color-based samplings, and a growing neural gas (GNG)-based approach. For the comparison, two different models acquired with the Blensor software have been used. Moreover, to evaluate the effect of the downsampling in a real application, a 3D non-rigid registration is performed with the sampled data. From the experimentation we can conclude that, depending on the purpose of the application, some sampling kernels can drastically improve the results. Bilinear- and GNG-based methods provide homogeneous point clouds, whereas color-based and normal-based methods provide datasets with a higher density of points in areas with specific features. In the non-rigid registration application, if a color-based sampled point cloud is used, it is possible to properly register two datasets in cases where intensity data are relevant in the model, and to outperform the results obtained when only a homogeneous sampling is used.
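
As a minimal sketch of what a normal-based downsampling kernel could look like (assuming the input is NumPy arrays of points and unit normals; this is not the implementation evaluated in the paper), points can be kept with probability proportional to the local disagreement of normals, so feature-rich regions stay denser.

import numpy as np
from scipy.spatial import cKDTree

def normal_based_downsample(points, normals, keep_ratio=0.25, k=16, rng=None):
    """Illustrative normal-based downsampling of a point cloud.

    Each point is scored by how much its normal disagrees with the mean normal
    of its k nearest neighbours; points are then kept with probability
    proportional to that score, preserving more points near edges and highly
    curved areas.
    """
    rng = rng or np.random.default_rng()
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    neigh_mean = normals[idx].mean(axis=1)
    neigh_mean /= np.linalg.norm(neigh_mean, axis=1, keepdims=True) + 1e-12
    score = 1.0 - np.abs(np.sum(normals * neigh_mean, axis=1))
    prob = score + 1e-6
    prob /= prob.sum()
    n_keep = int(keep_ratio * len(points))
    keep = rng.choice(len(points), size=n_keep, replace=False, p=prob)
    return points[keep], normals[keep]

# Example with a synthetic flat patch (all normals pointing up): the sampling
# degenerates to near-uniform because no region carries more normal variation.
pts = np.column_stack([np.random.rand(1000), np.random.rand(1000), np.zeros(1000)])
nrm = np.tile([0.0, 0.0, 1.0], (1000, 1))
sub_pts, sub_nrm = normal_based_downsample(pts, nrm, keep_ratio=0.2)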

Relevance: 70.00%

Abstract:

The use of quantitative methods has become increasingly important in the study of neurodegenerative disease. Disorders such as Alzheimer's disease (AD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This article reviews the advantages and limitations of the different methods of quantifying the abundance of pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semiquantitative scores. The major sampling methods by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are also described. In addition, the data analysis methods commonly used to analyse quantitative data in neuropathology, including analyses of variance (ANOVA) and principal components analysis (PCA), are discussed. These methods are illustrated with reference to particular problems in the pathological diagnosis of AD and dementia with Lewy bodies (DLB).
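
The density estimators behind the plot/quadrat and point-quarter sampling schemes mentioned above are standard textbook formulas rather than ones specific to this article; a hedged sketch with invented lesion counts and distances is given below.

import numpy as np

def quadrat_density(counts, quadrat_area_mm2):
    """Mean lesion density (lesions per mm^2) from plot/quadrat counts."""
    return float(np.mean(counts)) / quadrat_area_mm2

def point_quarter_density(distances_mm):
    """Point-quarter (point-centred quarter) density estimate.

    With one point-to-nearest-lesion distance recorded in each of the four
    quarters around every sample point, density is estimated as
    1 / (mean distance)^2.
    """
    d_bar = float(np.mean(distances_mm))
    return 1.0 / (d_bar ** 2)

# Invented data: lesion counts in 50 quadrats of 0.25 mm^2, and distances (mm)
# from 20 sample points x 4 quarters to the nearest lesion.
rng = np.random.default_rng(4)
counts = rng.poisson(3.0, size=50)
distances = rng.gamma(shape=4.0, scale=0.08, size=80)
print(quadrat_density(counts, 0.25), point_quarter_density(distances))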