958 results for Data sets storage


Relevance:

90.00%

Publisher:

Abstract:

Research on assessment and monitoring methods has primarily focused on fisheries with long multivariate data sets. Less research exists on methods applicable to data-poor fisheries with univariate data sets with a small sample size. In this study, we examine the capabilities of seasonal autoregressive integrated moving average (SARIMA) models to fit, forecast, and monitor the landings of such data-poor fisheries. We use a European fishery on meagre (Sciaenidae: Argyrosomus regius), where only a short time series of landings was available to model (n=60 months), as our case study. We show that despite the limited sample size, a SARIMA model could be found that adequately fitted and forecasted the time series of meagre landings (12-month forecasts; mean error: 3.5 tons (t); annual absolute percentage error: 15.4%). We derive model-based prediction intervals and show how they can be used to detect problematic situations in the fishery. Our results indicate that over the course of one year the meagre landings remained within the prediction limits of the model and therefore indicated no need for urgent management intervention. We discuss the information that the SARIMA model structure conveys about the meagre lifecycle and fishery, the methodological requirements of SARIMA forecasting of data-poor fisheries landings, and the potential of SARIMA models within current efforts to monitor the world's data-poorest resources.
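As a rough illustration of this workflow, the sketch below fits a SARIMA model to a short synthetic monthly series with statsmodels and derives 12-month forecasts with prediction intervals of the kind used to flag problematic situations; the data and the (1,0,0)x(0,1,1,12) model order are placeholders, not those selected in the study.

```python
# A minimal sketch, assuming synthetic landings data; not the study's model.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
months = pd.date_range("1995-01", periods=60, freq="MS")
# Synthetic landings with an annual cycle, standing in for the meagre series.
landings = 20 + 8 * np.sin(2 * np.pi * months.month / 12) + rng.normal(0, 2, 60)
series = pd.Series(landings, index=months)

model = SARIMAX(series, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)

# 12-month forecast with 95% prediction intervals; a new observation falling
# outside these limits would flag a potentially problematic situation.
forecast = result.get_forecast(steps=12)
print(forecast.predicted_mean)
print(forecast.conf_int(alpha=0.05))
```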

Relevance:

90.00%

Publisher:

Abstract:

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example in diagnosis, image-based tumor recognition, or the analysis of electrocardiogram data. In this paper an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class, and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is applicable to high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas applying the state-of-the-art approaches is not straightforward when nominal variables are present.
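A minimal sketch of this evaluation protocol, using scikit-learn's OneClassSVM as a stand-in for the compared techniques (the typicality test itself is not available in scikit-learn): each class of a multi-class data set is treated in turn as the target class, the classifier is trained only on target units, and all remaining units are presented as new objects to be classified.

```python
# Sketch only: OneClassSVM stands in for the OCC techniques compared above.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

for target in set(y):
    # Train only on target-class units; everything else is unseen at fit time.
    clf = OneClassSVM(nu=0.1, kernel="rbf").fit(X[y == target])
    pred = clf.predict(X)           # +1 = accepted as target, -1 = rejected
    tpr = (pred[y == target] == 1).mean()
    fpr = (pred[y != target] == 1).mean()
    print(f"target={target}: true accept rate={tpr:.2f}, "
          f"false accept rate={fpr:.2f}")
```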

Relevance:

90.00%

Publisher:

Abstract:

Freehand 3D ultrasound can be acquired without a position sensor by finding the separations of pairs of frames using information in the images themselves. Previous work has not considered how to reconstruct entirely freehand data, which can exhibit irregularly spaced frames, non-monotonic out-of-plane probe motion, and significant in-plane motion. This paper presents reconstruction methods that overcome these limitations and are able to robustly reconstruct freehand data. The methods are assessed on freehand data sets and compared to reconstructions obtained using a position sensor.

Relevance:

90.00%

Publisher:

Abstract:

Harmful algal blooms (HABs) are a significant and potentially expanding problem around the world. Resource management and public health protection require sufficient information to reduce the impacts of HABs through response strategies, warnings, and advisories. To be effective, these programs are best served by integrating improved detection methods with both evolving monitoring systems and new communications capabilities. Data sets are typically collected from a variety of sources and fall into several types: point data, such as water samples; transects, such as from shipboard continuous sampling; and synoptic data, such as from satellite imagery. Generating a field of the HAB distribution requires all of these sampling approaches, which means the data sets need to be interpreted and analyzed together to create the field or distribution of the HAB. The HAB field is also a necessary input into models that forecast blooms. Several systems have developed strategies that demonstrate these approaches, ranging from data sets collected at key sites, such as swimming beaches, to automated collection systems, to integration of interpreted satellite data. Improved data collection, particularly in speed and cost, will be one of the advances of the next few years. Methods to improve creation of the HAB field from the variety of data types will be necessary for routine nowcasting and forecasting of HABs.
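A minimal sketch of one way point and synoptic data might be fused into a single HAB field (an assumed approach for illustration, not the systems described above): sparse point samples are interpolated onto the grid of a synoptic satellite product, and the two sources are blended where both are defined.

```python
# Sketch only: synthetic data and a simple interpolate-and-blend fusion.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(1)
# Point data: water-sample locations (lon, lat) and measured cell counts.
pts = rng.uniform(0, 1, size=(25, 2))
counts = np.exp(-((pts[:, 0] - 0.5) ** 2 + (pts[:, 1] - 0.5) ** 2) / 0.05)

# Synoptic grid standing in for a satellite chlorophyll product.
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
satellite = np.exp(-((gx - 0.5) ** 2 + (gy - 0.5) ** 2) / 0.08)

# Interpolate the point samples onto the grid; average the two sources where
# the interpolation is defined, falling back to the satellite field elsewhere.
field = griddata(pts, counts, (gx, gy), method="linear")
fused = np.where(np.isnan(field), satellite, 0.5 * (field + satellite))
print(fused.shape)  # (50, 50) nowcast field, ready as input to a forecast model
```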

Relevance:

90.00%

Publisher:

Abstract:

In the face of dramatic declines in groundfish populations and a lack of sufficient stock assessment information, a need has arisen for new methods of assessing groundfish populations. We describe the integration of seafloor transect data gathered by a manned submersible with high-resolution sonar imagery to produce a habitat-based stock assessment system for groundfish. The data sets used in this study were collected from Heceta Bank, Oregon, and were derived from 42 submersible dives (1988–90) and a multibeam sonar survey (1998). The submersible habitat survey investigated seafloor topography and groundfish abundance along 30-minute transects over six predetermined stations and found a statistical relationship between habitat variability and groundfish distribution and abundance. These transects were analyzed in a geographic information system (GIS) by using dynamic segmentation to display changes in habitat along the transects. We used the submersible data to extrapolate fish abundance within uniform habitat patches over broad areas of the bank by means of a habitat classification based on the sonar imagery. After applying a navigation correction to the submersible-based habitat segments, a good correspondence between the segment boundaries and the backscatter and topographic boundaries on the imagery was apparent. The extent of uniform habitats was extrapolated in the vicinity of the dive stations, and a preliminary stock assessment of several species of demersal fish was calculated. Such a habitat-based approach will allow researchers to characterize marine communities over large areas of the seafloor.
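The extrapolation step reduces to scaling observed transect densities by mapped habitat areas. A minimal sketch with hypothetical habitat classes and numbers (illustrative, not the study's values):

```python
# Sketch only: abundance = per-habitat density x mapped area of that habitat.
densities = {            # fish per hectare, from submersible transect counts
    "rock ridge": 142.0,
    "boulder": 87.5,
    "mud": 6.3,
}
mapped_area_ha = {       # hectares of each habitat class from sonar imagery
    "rock ridge": 1_250,
    "boulder": 3_400,
    "mud": 18_000,
}
estimate = sum(densities[h] * mapped_area_ha[h] for h in densities)
print(f"extrapolated abundance: {estimate:,.0f} fish")
```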

Relevance:

90.00%

Publisher:

Abstract:

Ninety-six bigeye tuna (88–134 cm fork length) were caught and released with implanted archival (electronic data storage) tags near fish-aggregating devices (FADs) in the equatorial eastern Pacific Ocean (EPO) during April 2000. Twenty-nine fish were recaptured, and the data from twenty-seven tags were successfully downloaded and processed. Time at liberty ranged from 8 to 446 days, and data for 23 fish at liberty for 30 days or more are presented. The accuracy of the geolocation estimates, derived from the light-level data, is about 2 degrees in latitude and 0.5 degrees in longitude in this region. The movement paths derived from the filtered geolocation estimates indicated that none of the fish traveled west of 110°W during the period between release and recapture. The null hypothesis that the movement path is random was rejected in 17 of the 22 statistical tests of the observed movement paths. The estimated mean velocity was 117 km/d. The fish exhibited occasional deep-diving behavior, and some dives exceeded 1000 m, where temperatures were less than 3°C. Evaluation of the timed depth records resulted in the discrimination of three distinct behaviors: 54.3% of all days were classified as unassociated (with a floating object) type-1 behavior, 27.7% as unassociated type-2 behavior, and 18.7% as behavior associated with a floating object. The mean residence time at floating objects was 3.1 d. Data sets separated into day and night were used to evaluate diel differences in behavior and habitat selection. When the fish were exhibiting unassociated type-1 behavior (diel vertical migrations), they were mostly at depths of less than 50 m (within the mixed layer) throughout the night, and during the day at depths of 200 to 300 m and temperatures of 13° to 14°C. They shifted their average depths in conjunction with dawn and dusk events, presumably tracking the deep-scattering layer as a foraging strategy. Changes in the average nighttime depth distributions of the fish in relation to moon phase were also observed.

Relevance:

90.00%

Publisher:

Abstract:

I simulated somatic growth and accompanying otolith growth using an individual-based bioenergetics model in order to examine the performance of several back-calculation methods. Four shapes of the otolith radius-total length relation (OR-TL) were simulated. Ten different back-calculation equations, two different regression models of the radius-length relation, and two schemes of annulus selection were examined, for a total of 20 different methods to estimate size at age from simulated data sets of length and annulus measurements. The accuracy of each of the twenty methods was evaluated by comparing the back-calculated length-at-age with the true length-at-age. The best back-calculation technique was directly related to how well the OR-TL model fitted. When the OR-TL was sigmoid shaped and all annuli were used, a least squares linear regression coupled with a log-transformed Lee back-calculation equation (y-intercept corrected) resulted in the least error; when only the last annulus was used, a direct proportionality back-calculation equation resulted in the least error. When the OR-TL was linear, a functional regression coupled with the Lee back-calculation equation resulted in the least error, both when all annuli were used and when only the last annulus was used. If the OR-TL was exponentially shaped, direct substitution into the fitted quadratic equation resulted in the least error, both when all annuli were used and when only the last annulus was used. Finally, an asymptotically shaped OR-TL was best modeled by the individually corrected Weibull cumulative distribution function, both when all annuli were used and when only the last annulus was used.
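Two of the equations named above are standard in the back-calculation literature. A minimal sketch, with hypothetical measurements, of the direct proportionality (Dahl-Lea) equation and the y-intercept-corrected Lee (Fraser-Lee) equation:

```python
# Sketch only: two standard back-calculation equations, hypothetical values.
def dahl_lea(L_c, R_c, R_i):
    """Length at annulus i, assuming length proportional to otolith radius."""
    return L_c * R_i / R_c

def fraser_lee(L_c, R_c, R_i, c):
    """Length at annulus i, corrected by the OR-TL regression intercept c."""
    return c + (L_c - c) * R_i / R_c

L_c, R_c = 320.0, 2.4          # length (mm) and otolith radius (mm) at capture
radii = [0.6, 1.1, 1.7, 2.1]   # otolith radii at annuli 1..4
c = 35.0                       # hypothetical regression intercept (mm)

for i, R_i in enumerate(radii, start=1):
    print(f"annulus {i}: Dahl-Lea {dahl_lea(L_c, R_c, R_i):6.1f} mm, "
          f"Fraser-Lee {fraser_lee(L_c, R_c, R_i, c):6.1f} mm")
```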

Relevance:

90.00%

Publisher:

Abstract:

This is an interim report for a study of mussel recovery and species dynamics at four California rocky intertidal sites. Conducted by Kinnetic Laboratories, Inc. (KLI), and funded by the Minerals Management Service (MMS), the initial experimental field study began in spring 1985 and continued through spring 1991, covering six sites along the central and northern California coast. In 1992, MMS decided to continue the work started by KLI through an in-house study and established the MMS Intertidal (MINT) team. Four of the original six sites have been continued by MMS. The methods of the original study have been retained by the MINT team, and close coordination with the original KLI team continues. In 1994, the MMS Environmental Studies Program officially awarded a contract to the MINT team for this in-house study. This interim report presents the results from the fall 1992 sampling, the first year of sampling by the MINT team. The report presents a limited statistical analysis and visual comparison of the 1992 data. The next interim report will include data collected during fall 1994 and will present a broader statistical analysis of both the 1992 and 1994 data sets.

Relevance:

90.00%

Publisher:

Abstract:

There is increasing evidence that many of the mitochondrial DNA (mtDNA) databases published in the fields of forensic science and molecular anthropology are flawed. An a posteriori phylogenetic analysis of the sequences could help to eliminate most of the errors and thus greatly improve data quality. However, previously published caveats and recommendations along these lines have not yet been taken up by all researchers. Here we call for stringent quality control of mtDNA data by haplogroup-directed database comparisons. We take some problematic databases of East Asian mtDNAs, published in the Journal of Forensic Sciences and Forensic Science International, as examples to demonstrate the process of pinpointing obvious errors. Our results show that data sets are not only notoriously plagued by base shifts and artificial recombination but also by lab-specific phantom mutations, especially in the second hypervariable region (HVR-II).

Relevance:

90.00%

Publisher:

Abstract:

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper; these are available from https://sites.google.com/site/randomisedbhc/.
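For orientation only, here is a generic agglomerative clustering of time-series profiles with scipy, standing in for the BHC method (which is provided by the Bioconductor R package named above, not by scipy); the data are synthetic:

```python
# Sketch only: generic hierarchical clustering, not the BHC algorithm itself.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# 30 synthetic "gene expression" time series over 10 time points, drawn from
# three underlying temporal patterns plus noise.
t = np.linspace(0, 1, 10)
patterns = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t]
X = np.vstack([p + rng.normal(0, 0.2, t.size)
               for p in patterns for _ in range(10)])

Z = linkage(X, method="average", metric="correlation")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # cluster assignment for each of the 30 series
```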

Relevance:

90.00%

Publisher:

Abstract:

Decision tree classification algorithms have significant potential for land cover mapping problems and have not been tested in detail by the remote sensing community relative to more conventional pattern recognition techniques such as maximum likelihood classification. In this paper, we present several types of decision tree classification algorithms and evaluate them on three different remote sensing data sets. The decision tree classification algorithms tested include a univariate decision tree, a multivariate decision tree, and a hybrid decision tree capable of including several different types of classification algorithms within a single decision tree structure. Classification accuracies produced by each of these decision tree algorithms are compared with both maximum likelihood and linear discriminant function classifiers. Results from this analysis show that the decision tree algorithms consistently outperform the maximum likelihood and linear discriminant function classifiers with regard to classification accuracy. In particular, the hybrid tree consistently produced the highest classification accuracies for the data sets tested. More generally, the results from this work show that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure. Further, decision tree algorithms are strictly nonparametric and therefore make no assumptions regarding the distribution of input data, and they are flexible and robust with respect to nonlinear and noisy relations among input features and class labels.
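A minimal sketch of this kind of comparison using scikit-learn, with a univariate decision tree against a linear discriminant classifier on a stand-in public data set (the paper's remote sensing data sets are not reproduced here):

```python
# Sketch only: compare a decision tree with a linear discriminant classifier.
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
for name, clf in [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("linear discriminant", LinearDiscriminantAnalysis()),
]:
    acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: mean accuracy {acc:.3f}")
```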

Relevance:

90.00%

Publisher:

Abstract:

Through extensive study and design, a technical plan for establishing the exploration database center was produced that combines imported and self-developed techniques. Through research and repeated experiments, a modern database center has been set up, with high-performance hardware and network, a well-configured system, complete data storage and management, and fast, direct data support. Based on the study of decision theory, methods, and models, an exploration decision assistant schema was designed, and one decision plan, the well location decision support system, was evaluated and put into action.

1. Establishment of the Shengli exploration database center. Research covered the hardware configuration of the database center, including its workstations and all connected hardware and systems. The hardware of the center is formed by connecting workstations, microcomputer workstations, disk arrays, and the equipment used for seismic processing and interpretation. Research on data storage and management covered the analysis of the contents to be managed, data flow, data standards, data quality control, backup and restore policy, and optimization of the database system. A reasonable data management regulation and workflow were established, creating a scientific exploration data management system. Data loading followed a planned schedule, and more than 200 seismic survey projects have been loaded, amounting to 25 TB.

2. Exploration work support system and its application. Seismic data processing support includes automatic extraction of seismic attributes, GIS navigation, data ordering, extraction of data cubes of any size, a pseudo huge-capacity disk array, and a standard output exchange format; prestack data can be accessed by the processing system directly or transferred to other processing systems through the standard exchange format. Seismic interpretation support includes automatic scanning and storage of interpretation results and internal data quality control; the interpretation system is connected directly to the database center for real-time access to seismic, formation, and well data. Comprehensive geological study is supported through the intranet, with the ability to query or display data graphically on the navigation system under geological constraints. The production management support system is mainly used to collect, analyze, and display production data, with controlled data collection and the creation of multiple standard forms as its core technology.

3. Exploration decision support system design. By classifying the workflow and data flow of all exploration stages and studying decision theory and methods, the target of each decision step, the decision models, and the requirements, three concept models were formed for the Shengli exploration decision support system: the exploration distribution support system, the well location support system, and the production management support system. The well location decision support system has passed evaluation and been put into action.

4. Technical advances. The hardware and software of the database center are matched for high performance. By combining a parallel computer system, database servers, a huge-capacity ATL, disk arrays, a network, and a firewall, the first exploration database center in China was created, with a reasonable configuration, high performance, and the capacity to manage the complete exploration data sets. A technology for managing huge volumes of exploration data was formed, with exploration data standards and management regulations that guarantee data quality, safety, and security. A multifunction query and support system provides comprehensive exploration information support, covering geological study, seismic processing and interpretation, and production management; many new database and computer technologies are used in the system to provide real-time information support for exploration work. Finally, the Shengli exploration decision support system was designed.

5. Application and benefit. Data storage has reached 25 TB, and thousands of users in the Shengli oil field access the data, improving work efficiency several times over. The technology has also been adopted by many other units of SINOPEC. Providing data to the project "Exploration Achievements and Evaluation of Favorable Targets in the Hekou Area" shortened the data preparation period from 30 days to 2 days, enriched data abundance by 15 percent, and gave the project full information support from the database center. Providing previously processed results to the project "Pre-stack Depth Migration in the Guxi Fracture Zone" reduced repeated processing, shortened the work period by one month, improved processing precision and quality, and saved about 30 million yuan in data processing investment. Automatically providing a project database for the project "Geological and Seismic Study of the Southern Slope Zone of Dongying Sag" shortened data preparation time, giving researchers more time for research and improving interpretation precision and quality.

Relevance:

90.00%

Publisher:

Abstract:

Pagel, M., & Meade, A. (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53(4), 571-581.

Relevance:

90.00%

Publisher:

Abstract:

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain a much smaller number of tuples than the sliding windows. A stream buffer management policy is therefore needed in such cases. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report the results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
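A minimal sketch of the general setting (not the GDJ algorithm itself, whose priority function the abstract does not spell out): an approximate stream join under a memory budget, where each stream keeps a bounded buffer and the tuple least recently involved in a match is evicted first, a simple locality-aware replacement policy in the spirit described above.

```python
# Sketch only: bounded join buffers with least-recently-matched eviction.
from collections import OrderedDict

class JoinBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tuples = OrderedDict()  # key -> value; order tracks match recency

    def insert(self, key, value):
        if len(self.tuples) >= self.capacity:
            self.tuples.popitem(last=False)   # evict least recently matched
        self.tuples[key] = value

    def probe(self, key):
        if key in self.tuples:
            self.tuples.move_to_end(key)      # matched: refresh its recency
            return self.tuples[key]
        return None

# Join an interleaved pair of streams of (key, value) tuples with tiny buffers.
buf_r, buf_s = JoinBuffer(4), JoinBuffer(4)
stream = [("R", 1, "r1"), ("S", 1, "s1"), ("R", 2, "r2"), ("S", 2, "s3"),
          ("R", 1, "r3"), ("S", 4, "s4")]
for side, key, val in stream:
    own, other = (buf_r, buf_s) if side == "R" else (buf_s, buf_r)
    if (m := other.probe(key)) is not None:   # probe the opposite buffer first
        print("match:", val, m)
    own.insert(key, val)
```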

Relevance:

90.00%

Publisher:

Abstract:

The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then aggregated through a hierarchy before queries may be conducted. As a result, many sketch-based methods for the single-stream case do not apply directly, either because the error introduced becomes large or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a general framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE) over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile-Digest sketches to apply in this distributed setting, and we analyze their performance. We conclude with an experimental evaluation to validate our analysis.
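For context, a minimal Count-Min sketch with hierarchical merging (only the basic mergeable structure; the duplicate-insensitivity augmentation the paper develops is not shown): two observation points that share hash seeds can aggregate their summaries by element-wise addition of their counter arrays.

```python
# Sketch only: a basic mergeable Count-Min structure, synthetic streams.
import numpy as np

class CountMin:
    def __init__(self, width=256, depth=4, seed=42):
        self.width, self.depth = width, depth
        rng = np.random.default_rng(seed)       # shared seed => mergeable
        self.salts = rng.integers(1, 2**31, size=depth)
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _cols(self, item):
        return [hash((int(s), item)) % self.width for s in self.salts]

    def add(self, item, count=1):
        for d, col in enumerate(self._cols(item)):
            self.table[d, col] += count

    def estimate(self, item):
        # Point query: the minimum counter bounds the true count from above.
        return min(self.table[d, col] for d, col in enumerate(self._cols(item)))

    def merge(self, other):
        self.table += other.table               # valid because seeds match

# Two observation points summarize their local streams, then aggregate.
a, b = CountMin(), CountMin()
for x in ["flow1"] * 30 + ["flow2"] * 5:
    a.add(x)
for x in ["flow1"] * 10 + ["flow3"] * 7:
    b.add(x)
a.merge(b)
print(a.estimate("flow1"))  # >= 40; Count-Min only ever overestimates
```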