50 results for Sampling

in Deakin Research Online - Australia


Relevance: 20.00%

Abstract:

Researchers typically tackle questions by constructing powerful, highly replicated sampling protocols or experimental designs. Such approaches often demand large sample sizes and are usually only conducted on a once-off basis. In contrast, many industries need to continually monitor phenomena such as equipment reliability, water quality, or the abundance of a pest. In such instances, the costs and time inherent in sampling preclude the use of highly intensive methods. Ideally, one wants to collect the absolute minimum number of samples needed to make an appropriate decision. Sequential sampling, wherein the sample size is a function of the results of the sampling process itself, offers a practicable solution. But smaller sample sizes equate to less knowledge about the population, and thus an increased risk of making an incorrect management decision. There are various statistical techniques to account for and measure risk in sequential sampling plans. We illustrate these methods and assess them using examples relating to the management of arthropod pests in commercial crops, but they can be applied to any situation where sequential sampling is used.
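The abstract does not name a specific procedure, but one widely used risk-based sequential technique in pest management is Wald's sequential probability ratio test (SPRT). The Python sketch below is only an illustration of that general idea, not the authors' method: the hypothesised Poisson densities, error rates, and simulated counts are all assumptions chosen for the example.

```python
import math
import numpy as np

def sprt_poisson(counts, lam0, lam1, alpha=0.1, beta=0.1):
    """Wald's SPRT for Poisson counts: after each sample, decide whether the
    mean density is low (lam0, no action) or high (lam1, treat), or keep sampling."""
    upper = math.log((1 - beta) / alpha)   # cross this -> accept high density
    lower = math.log(beta / (1 - alpha))   # cross this -> accept low density
    llr = 0.0
    for n, c in enumerate(counts, start=1):
        # log-likelihood ratio increment for one Poisson observation
        llr += c * math.log(lam1 / lam0) - (lam1 - lam0)
        if llr >= upper:
            return "treat", n
        if llr <= lower:
            return "no action", n
    return "keep sampling", len(counts)

# Illustrative run on simulated field counts with a true mean density of 2.5
rng = np.random.default_rng(1)
counts = rng.poisson(2.5, size=200)
print(sprt_poisson(counts, lam0=1.0, lam1=3.0))
```

The error rates alpha and beta are exactly the risk quantities the abstract refers to: they bound the probability of wrongly deciding to treat or not to treat.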

Relevance: 20.00%

Abstract:

This paper proposes a sampling procedure called selected ranked set sampling (SRSS), in which only selected observations from a ranked set sample (RSS) are measured. The paper describes the optimal linear estimation of location and scale parameters based on SRSS and, for some distributions, presents the tables required for optimal selection. For these distributions, the optimal SRSS estimators are compared with the popular simple random sample (SRS) and RSS estimators. In every situation considered, the estimators based on SRSS are found to be advantageous in at least some respect compared with those obtained from SRS or RSS. The SRSS method with errors in ranking is also described. The relative precision of the estimator of the population mean is investigated for different degrees of correlation between the actual and erroneous rankings. The paper reports the minimum value of the correlation coefficient between the actual and the erroneous ranking required to achieve better precision relative to the usual SRS estimator and to the RSS estimator.
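The optimal SRSS estimators and selection tables are specific to the paper, but the basic reason ranked-set-based estimators can outperform the SRS mean is easy to demonstrate. The following sketch is a simulation under assumed conditions (perfect ranking, an exponential population, an arbitrary set size) that compares the variance of the balanced RSS mean with that of the SRS mean using the same number of measured units.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(n):
    """Illustrative population: exponential with mean 1."""
    return rng.exponential(1.0, size=n)

def rss_sample(k):
    """One balanced ranked set sample of set size k, assuming perfect ranking:
    for i = 1..k, draw a set of k units, rank them, measure the i-th order statistic."""
    return np.array([np.sort(draw(k))[i] for i in range(k)])

def compare_variances(k=5, reps=20000):
    rss_means = [rss_sample(k).mean() for _ in range(reps)]
    srs_means = [draw(k).mean() for _ in range(reps)]
    return np.var(rss_means), np.var(srs_means)

v_rss, v_srs = compare_variances()
print(f"var(RSS mean) = {v_rss:.4f}, var(SRS mean) = {v_srs:.4f}")
```

SRSS goes a step further by measuring only selected ranks and weighting them optimally; imperfect ranking, modelled in the paper through the correlation between true and erroneous ranks, erodes this advantage.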

Relevance: 20.00%

Abstract:

Sampling is fundamental to the credibility of any empirical investigation, and this is no different for the area of Gay, Lesbian and Bisexual (GLB) research. In fact, it would not be an overstatement to say that publication of GLB research in high-quality and mainstream journals has been limited in part because of sampling issues. The purpose of this paper is to raise awareness of the unique sampling problems posed by GLB research and to offer solutions to these through the use of web surveys and recruitment strategies. In particular, we provide data which show that, contrary to voiced concerns, when employed with a rigorous recruitment strategy, web surveys increase rather than reduce sampling coverage. Further, we provide evidence that the web survey technique can yield data of comparable quality to that obtained with a hard-copy survey. The paper concludes with strategies researchers can adopt to overcome barriers in obtaining a diverse GLB sample.

Relevance: 20.00%

Abstract:

Objective: To highlight the importance of sampling and data collection processes in qualitative interview studies, and to discuss the contribution of these processes to determining the strength of the evidence generated and thereby to decisions for public health practice and policy.

Approach:
This discussion is informed by a hierarchy-of-evidence-for-practice model. The paper provides succinct guidelines for key sampling and data collection considerations in qualitative research involving interview studies. The importance of allowing time for immersion in a given community to become familiar with the context and population is discussed, as well as the practical constraints that sometimes operate against this stage. The role of theory in guiding sample selection is discussed both in terms of identifying likely sources of rich data and in understanding the issues emerging from the data. It is noted that sampling further assists in confirming the developing evidence and also illuminates data that does not seem to fit. The importance of reporting sampling and data collection processes is highlighted clearly to enable others to assess both the strength of the evidence and the broader applications of the findings.

Conclusion:
Sampling and data collection processes are critical to determining the quality of a study and the generalisability of the findings. We argue that these processes should operate within the parameters of the research goal, be guided by emerging theoretical considerations, cover a range of relevant participant perspectives, and be clearly outlined in research reports with an explanation of any research limitations.

Relevance: 20.00%

Abstract:

Empirical investigations regarding ratio-based modelling of corporate collapse have been ongoing for decades. With any study of an empirical nature, a data sample is a necessary prerequisite: it allows the performance of the prediction model to be tested, thereby establishing its practical relevance. However, it is necessary to first ensure that the data sample used satisfies certain conditions, and these conditions have led to some choice controversies. This paper considers the controversial issues that arise in data sampling, provides a critical evaluation of these issues, and makes choice recommendations on the controversial aspects by empirically examining the literature.

Relevance: 20.00%

Abstract:

Helicoverpa spp. are the primary pests in the Australian fresh-market tomato industry. We describe the spatial distribution of Helicoverpa spp. eggs on fresh-market tomato crops in the Goulburn Valley region of Victoria, and present a sequential sampling plan for monitoring population densities. The distribution of Helicoverpa spp. eggs was highly contagious, as indicated by a Taylor's b-value of 1.59. This high level of contagion meant that relatively large sample sizes would need to be collected to obtain an estimate of population density. High-precision sampling plans generally necessitated impractical sample sizes, and thus the plan we present is a relatively low-precision plan (SE/mean = 0.3). Nonetheless, this level of precision is considered adequate for most agronomic scenarios. The plan was validated using a statistical re-sampling approach. The level of precision achieved was generally close to the nominal level. Likewise, the number of samples collected generally showed little departure from the theoretically calculated minimum sample size.
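As a hedged illustration of how a fixed-precision plan can be driven by Taylor's power law (variance = a * mean^b), the sketch below keeps sampling until the estimated precision SE/mean reaches the 0.3 target quoted in the abstract. The b-value of 1.59 comes from the abstract, but the intercept a, the simulated egg counts, and the iterative stopping form are illustrative assumptions rather than the published plan.

```python
import numpy as np

def required_n(mean, a, b, D):
    """Sample size needed for precision D = SE/mean under Taylor's power law,
    where the variance is modelled as a * mean**b:  n = a * mean**(b-2) / D**2."""
    return a * mean ** (b - 2) / D ** 2

def sequential_plan(counts, a, b=1.59, D=0.3, n_min=10):
    """Keep taking samples until the running mean implies the precision target is met."""
    for n in range(n_min, len(counts) + 1):
        m = np.mean(counts[:n])
        if m > 0 and n >= required_n(m, a, b, D):
            return n, m  # stop: n samples taken, estimated mean density m
    return len(counts), np.mean(counts)

# Illustrative run: 'a' is a hypothetical Taylor intercept; the counts are
# simulated from an aggregated (contagious) distribution.
rng = np.random.default_rng(2)
counts = rng.negative_binomial(1, 0.4, size=500)
print(sequential_plan(counts, a=2.0))
```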

Relevance: 20.00%

Abstract:

The Powerful Owl (Ninox strenua) is endemic to Australia, being resident in the three eastern mainland states and the Australian Capital Territory. It is classified nationally as being of conservation significance and as vulnerable in the state of Victoria. The elusive nature of this owl, along with its dispersed distribution, low population density and the difficulty of identifying individual birds, limits the collection of ecological data. Molecular methods can be used to obtain crucial ecological information, essential for Powerful Owl conservation.

Non-invasive sampling is a relatively new method for obtaining genetic material from free-ranging animals. This type of sampling, however, is generally overlooked as a potential DNA source. Shed hair and feathers, faeces, urine, skins and eggshells are all potential sources of DNA. Non-invasive sampling regimes may be the only alternative for the genetic analysis of endangered and/or elusive species that are difficult to sample otherwise.

Powerful Owls moult annually. Shed feathers can therefore be collected from under roost trees and used for genetic analysis. The feathers collected provide DNA that is unique to the individual and can yield additional ecological knowledge of the species.

In this study we collected shed Powerful Owl feathers during 2003 and 2004. In order to obtain samples from across the owl's large distribution, public awareness of the project was raised via flyers, mail-outs, media sources (radio, newspapers and magazines), email lists and public seminars. Overall, the collection strategy was very successful, with over 500 Powerful Owl feather samples being collected.

Genetic information obtained from the analysis of DNA from feathers can enable a more rigorous assessment of the population viability of the Powerful Owl. Specifically designed molecular markers will facilitate unequivocal identification of individual birds ("DNA fingerprinting"). Through the application of molecular techniques we can collect ecological information about the Powerful Owl, such as genetic divergence, population structure, dispersal patterns, migration and inbreeding. These questions cannot be addressed via traditional data collection, and the answers will contribute significantly to the successful conservation of the Powerful Owl and potentially other raptor species.

Relevance: 20.00%

Abstract:

Teachers in many introductory statistics courses demonstrate the Central Limit Theorem by using a computer to draw a large number of random samples of size n from a population distribution and plot the resulting empirical sampling distribution of the sample mean. There are many computer applications that can be used for this (see, for example, the Rice Virtual Lab in Statistics: http://www.ruf.rice.edu/~lane/rvls.html). The effectiveness of such demonstrations has been questioned (see delMas et al. (1999)), but in the work presented in this paper we do not rely on sampling distributions to convey or teach statistical concepts, beyond the fact that the shape of the sampling distribution does not depend on the distribution of the population, provided the sample size is sufficiently large.

We describe a lesson that starts out with a demonstration of the CLT, but samples from a (finite) population for which actual census data are provided; doing this may help students relate to the concepts more easily: they can see the original data as a column of numbers and, if the samples are shown, they can also see random samples being taken. We continue with this theme of sampling from census data to teach the basic ideas of inference, and end up with standard resampling/bootstrap procedures.

We also demonstrate how Excel can provide a tool for developing learning objects to support the program; a workbook called Sampling.xls is available from www.deakin.edu.au/~rodneyc/PS > Sampling.xls.
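A minimal sketch of the same lesson, written in Python rather than Excel (the lognormal "population" is only a stand-in for the census column the authors provide): it plots the skewed population, the empirical sampling distribution of the mean, and a bootstrap distribution built from a single sample.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Stand-in for a census column: a skewed finite population (e.g. incomes)
population = rng.lognormal(mean=10, sigma=0.8, size=20000)

# Empirical sampling distribution of the mean for samples of size n
n, reps = 50, 5000
sample_means = [rng.choice(population, size=n, replace=False).mean() for _ in range(reps)]

# Bootstrap: resample one observed sample to approximate the same distribution
one_sample = rng.choice(population, size=n, replace=False)
boot_means = [rng.choice(one_sample, size=n, replace=True).mean() for _ in range(reps)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, data, title in zip(axes, [population, sample_means, boot_means],
                           ["Population (skewed)", "Sampling distribution of the mean",
                            "Bootstrap distribution of the mean"]):
    ax.hist(data, bins=50)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```

Despite the skewed population, both mean distributions look approximately normal, which is the point the lesson makes before moving on to inference.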

Relevance: 20.00%

Abstract:

A non-destructive method for collecting samples for DNA analysis from the mucus of molluscs was successfully adapted for use with the genus Ischnochiton. DNA was extracted using a Chelex-based method and the COI subunit of the mtDNA was amplified and sequenced. Sequences from the mucus were crosschecked against sequences from the foot tissue of the same animal and were found to be identical. This method provides a non-destructive way of carrying out larger studies of the genetics of rare organisms and may be of general use for genetic-based field studies of molluscs.

Relevance: 20.00%

Abstract:

One of the fundamental machine learning tasks is that of predictive classification. Given that organisations collect an ever increasing amount of data, predictive classification methods must be able to handle large amounts of data effectively and efficiently. However, present requirements push existing algorithms to, and sometimes beyond, their limits, since many classification prediction algorithms were designed when currently common data set sizes were beyond imagination. This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered. This dissertation investigates two of these key questions.

The first is whether different types of algorithms to those currently employed are required when using large data sets. This is answered by analysis of the way in which the bias plus variance decomposition of predictive classification error changes as training set size is increased. Experiments find that larger training sets require different types of algorithms to those currently used. Some insight into the characteristics of suitable algorithms is provided, and this may provide some direction for the development of future classification prediction algorithms which are specifically designed for use with large data sets.

The second question investigated is that of the role of sampling in machine learning with large data sets. Sampling has long been used as a means of avoiding the need to scale up algorithms to suit the size of the data set, by scaling down the size of the data set to suit the algorithm. However, the costs of performing sampling have not been widely explored. Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little if any accuracy. Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step and using a statistical calculation at each inference step to determine a sufficient sample size. Experiments show that using a statistical calculation of sample size can not only substantially reduce execution time but can do so with only a small loss, and occasional gain, in accuracy.

One of the common uses of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training size which will maximally reduce execution time while not being detrimental to accuracy. An analysis of the performance of methods for detecting convergence of learning curves is performed, with the focus of the analysis on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often results in greater computational cost than learning from all available data. These results cast doubt over the applicability of gradient-of-tangent methods for detecting convergence, and over the viability of learning curves for reducing execution time in general.
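The thesis's own procedures are not reproduced here, but the flavour of progressive sub-sampling with a convergence check can be sketched as below. The learner, data, growth factor, and tolerance are all illustrative assumptions; the stopping rule is a crude version of the gradient-of-tangent idea and, as the thesis notes, it can be fooled by local accuracy plateaus.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Illustrative data and learner (not those used in the thesis)
X, y = make_classification(n_samples=50000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def progressive_sample(X_tr, y_tr, X_te, y_te, start=500, factor=2, tol=0.002):
    """Grow the training subsample geometrically; stop when the accuracy gain
    per step (a crude gradient of the learning curve) drops below tol."""
    n, prev_acc = start, None
    while n <= len(X_tr):
        model = GaussianNB().fit(X_tr[:n], y_tr[:n])
        acc = accuracy_score(y_te, model.predict(X_te))
        if prev_acc is not None and acc - prev_acc < tol:
            return n, acc  # apparent convergence; may only be a local plateau
        prev_acc, n = acc, n * factor
    return len(X_tr), prev_acc

print(progressive_sample(X_tr, y_tr, X_te, y_te))
```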

Relevance: 20.00%

Abstract:

The adaptive cluster sampling (ACS) procedure is difficult to apply if some of the networks appearing in the sample are large. To deal with such large networks, a two-stage adaptive cluster sampling (TACS) procedure and an adjusted two-stage adaptive cluster sampling (ATACS) procedure are discussed.
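For readers unfamiliar with the underlying procedure, a minimal sketch of basic adaptive cluster sampling on a grid of counts is given below; it shows how a single patch can balloon into a large network, which is precisely the situation TACS and ATACS address. The grid, threshold, and initial sample size are illustrative assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(3)
grid = rng.poisson(0.2, size=(20, 20))            # mostly empty cells
grid[5:8, 5:8] += rng.poisson(5, size=(3, 3))     # a dense patch -> a large network

def adaptive_cluster_sample(grid, n_initial=15, threshold=1):
    """Basic ACS: take a simple random sample of cells; whenever a sampled cell
    meets the threshold, add its 4-neighbours, and keep expanding."""
    rows, cols = grid.shape
    initial = [divmod(int(i), cols)
               for i in rng.choice(rows * cols, n_initial, replace=False)]
    sampled, queue = set(initial), deque(initial)
    while queue:
        r, c = queue.popleft()
        if grid[r, c] >= threshold:
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                    sampled.add((nr, nc))
                    queue.append((nr, nc))
    return sampled

cells = adaptive_cluster_sample(grid)
print(f"{len(cells)} cells sampled from an initial 15")
```

Broadly speaking, a two-stage version would take a subsample of units within any network that grows too large rather than enumerating it completely, with the adjusted variant correcting the estimator for that subsampling; the paper's exact formulation should be consulted for details.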

Relevance: 20.00%

Abstract:

Ranked set sampling is a desirable sampling technique in which the sample observations can be ranked without actually being measured. This thesis develops new ranked set sampling techniques and determines the sampling characteristics required to best estimate population parameters of interest, such as the mean and variance.

Relevance: 20.00%

Abstract:

Background
Medical and biological data commonly have small sample sizes, missing values and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and then combined with the minority class to form a balanced dataset.

Results
One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods under three different metrics. The sampling results also demonstrate good generalization across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system.

Conclusion
The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly with different datasets, the proposed hybrid system has a better generalization property which alleviates the method-data dependency problem. From the biological perspective, the system provides an indication for further investigation of the highly ranked samples, which may result in the discovery of new conditions or disease subtypes.
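The full system couples PSO with multiple classifiers and metrics, which is beyond a short sketch; the fragment below only illustrates the core sampling idea in a much-simplified, hedged form: score majority-class samples by their apparent merit, keep the top-ranked ones, and combine them with the minority class to form a balanced dataset. The scoring rule (cross-validated predicted probability of the minority class) and the toy data are stand-ins, not the paper's multi-objective PSO ranking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Imbalanced toy data: class 1 is the minority class
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
maj_idx = np.where(y == 0)[0]
min_idx = np.where(y == 1)[0]

# Score each sample by the cross-validated predicted probability of the minority
# class; borderline majority samples rank highest. This stands in for the
# multi-objective, PSO-driven ranking used in the paper.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")[:, 1]
ranked_maj = maj_idx[np.argsort(-proba[maj_idx])]

# Keep as many majority samples as there are minority samples -> balanced dataset
keep = np.concatenate([ranked_maj[:len(min_idx)], min_idx])
X_bal, y_bal = X[keep], y[keep]
print("class counts after balancing:", np.bincount(y_bal))
```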