988 resultados para Rejection-sampling Algorithm
Resumo:
The Inter-American Tropical Tuna Commission (IATTC) staff has been sampling the size distributions of tunas in the eastern Pacific Ocean (EPO) since 1954, and the species composition of the catches since 2000. The IATTC staff use the data from the species composition samples, in conjunction with observer and/or logbook data, and unloading data from the canneries to estimate the total annual catches of yellowfin (Thunnus albacares), skipjack (Katsuwonus pelamis), and bigeye (Thunnus obesus) tunas. These sample data are collected based on a stratified sampling design. I propose an update of the stratification of the EPO into more homogenous areas in order to reduce the variance in the estimates of the total annual catches and incorporate the geographical shifts resulting from the expansion of the floating-object fishery during the 1990s. The sampling model used by the IATTC is a stratified two-stage (cluster) random sampling design with first stage units varying (unequal) in size. The strata are month, area, and set type. Wells, the first cluster stage, are selected to be sampled only if all of the fish were caught in the same month, same area, and same set type. Fish, the second cluster stage, are sampled for lengths, and independently, for species composition of the catch. The EPO is divided into 13 sampling areas, which were defined in 1968, based on the catch distributions of yellowfin and skipjack tunas. This area stratification does not reflect the multi-species, multi-set-type fishery of today. In order to define more homogenous areas, I used agglomerative cluster analysis to look for groupings of the size data and the catch and effort data for 2000–2006. I plotted the results from both datasets against the IATTC Sampling Areas, and then created new areas. I also used the results of the cluster analysis to update the substitution scheme for strata with catch, but no sample. I then calculated the total annual catch (and variance) by species by stratifying the data into new Proposed Sampling Areas and compared the results to those reported by the IATTC. Results showed that re-stratifying the areas produced smaller variances of the catch estimates for some species in some years, but the results were not significant.
Resumo:
The measurement of high speed laser beam parameters during processing is a topic that has seen growing attention over the last few years as quality assurance places greater demand on the monitoring of the manufacturing process. The targets for any monitoring system is to be non-intrusive, low cost, simple to operate, high speed and capable of operation in process. A new ISO compliant system is presented based on the integration of an imaging plate and camera located behind a proprietary mirror sampling device. The general layout of the device is presented along with the thermal and optical performance of the sampling optic. Diagnostic performance of the system is compared with industry standard devices, demonstrating the high quality high speed data which has been generated using this system.
Resumo:
This document aims to describe an update of the implementation of the J48Consolidated class within WEKA platform. The J48Consolidated class implements the CTC algorithm [2][3] which builds a unique decision tree based on a set of samples. The J48Consolidated class extends WEKA’s J48 class which implements the well-known C4.5 algorithm. This implementation was described in the technical report "J48Consolidated: An implementation of CTC algorithm for WEKA". The main, but not only, change in this update is the integration of the notion of coverage in order to determine the number of samples to be generated to build a consolidated tree. We define coverage as the percentage of examples of the training sample present in –or covered by– the set of generated subsamples. So, depending on the type of samples that we use, we will need more or less samples in order to achieve a specific value of coverage.
Resumo:
The CTC algorithm, Consolidated Tree Construction algorithm, is a machine learning paradigm that was designed to solve a class imbalance problem, a fraud detection problem in the area of car insurance [1] where, besides, an explanation about the classification made was required. The algorithm is based on a decision tree construction algorithm, in this case the well-known C4.5, but it extracts knowledge from data using a set of samples instead of a single one as C4.5 does. In contrast to other methodologies based on several samples to build a classifier, such as bagging, the CTC builds a single tree and as a consequence, it obtains comprehensible classifiers. The main motivation of this implementation is to make public and available an implementation of the CTC algorithm. With this purpose we have implemented the algorithm within the well-known WEKA data mining environment http://www.cs.waikato.ac.nz/ml/weka/). WEKA is an open source project that contains a collection of machine learning algorithms written in Java for data mining tasks. J48 is the implementation of C4.5 algorithm within the WEKA package. We called J48Consolidated to the implementation of CTC algorithm based on the J48 Java class.
Resumo:
The Wyre estuary is sampled for water quality four times a year. The sampling locations are shown in Figure 1, and their descriptions are found in Appendix 1.