999 resultados para QC sets max


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Big data is big news in almost every sector including crisis communication. However, not everyone has access to big data and even if we have access to big data, we often do not have necessary tools to analyze and cross reference such a large data set. Therefore this paper looks at patterns in small data sets that we have ability to collect with our current tools to understand if we can find actionable information from what we already have. We have analyzed 164390 tweets collected during 2011 earthquake to find out what type of location specific information people mention in their tweet and when do they talk about that. Based on our analysis we find that even a small data set that has far less data than a big data set can be useful to find priority disaster specific areas quickly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been heavily demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including, predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trees are capable of portraying the semi-structured data which is common in web domain. Finding similarities between trees is mandatory for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing through nodes and paths of two trees, and find the similarity between them. However, these methods provide unfavorable results for unordered tree data and result in yielding NP-hard or MAX-SNP hard complexity. In this paper, we present a novel method that encodes a tree with an optimal traversing approach first, and then, utilizes it to model the tree with its equivalent matrix representation for finding similarity between unordered trees efficiently. Empirical analysis shows that the proposed method is able to achieve high accuracy even on the large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present large, accurately calibrated and time-synchronized data sets, gathered outdoors in controlled and variable environmental conditions, using an unmanned ground vehicle (UGV), equipped with a wide variety of sensors. These include four 2D laser scanners, a radar scanner, a color camera and an infrared camera. It provides a full description of the system used for data collection and the types of environments and conditions in which these data sets have been gathered, which include the presence of airborne dust, smoke and rain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (eg. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (eg. frontal face being compared to non-frontal face). To address the above problem, we propose a novel approach to enhance nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Analytically or computationally intractable likelihood functions can arise in complex statistical inferential problems making them inaccessible to standard Bayesian inferential methods. Approximate Bayesian computation (ABC) methods address such inferential problems by replacing direct likelihood evaluations with repeated sampling from the model. ABC methods have been predominantly applied to parameter estimation problems and less to model choice problems due to the added difficulty of handling multiple model spaces. The ABC algorithm proposed here addresses model choice problems by extending Fearnhead and Prangle (2012, Journal of the Royal Statistical Society, Series B 74, 1–28) where the posterior mean of the model parameters estimated through regression formed the summary statistics used in the discrepancy measure. An additional stepwise multinomial logistic regression is performed on the model indicator variable in the regression step and the estimated model probabilities are incorporated into the set of summary statistics for model choice purposes. A reversible jump Markov chain Monte Carlo step is also included in the algorithm to increase model diversity for thorough exploration of the model space. This algorithm was applied to a validating example to demonstrate the robustness of the algorithm across a wide range of true model probabilities. Its subsequent use in three pathogen transmission examples of varying complexity illustrates the utility of the algorithm in inferring preference of particular transmission models for the pathogens.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the objectives of this study was to evaluate soil testing equipment based on its capability of measuring in-place stiffness or modulus values. As design criteria transition from empirical to mechanistic-empirical, soil test methods and equipment that measure properties such as stiffness and modulus and how they relate to Florida materials are needed. Requirements for the selected equipment are that they be portable, cost effective, reliable, a ccurate, and repeatable. A second objective is that the selected equipment measures soil properties without the use of nuclear materials.The current device used to measure soil compaction is the nuclear density gauge (NDG). Equipment evaluated in this research included lightweight deflectometers (LWD) from different manufacturers, a dynamic cone penetrometer (DCP), a GeoGauge, a Clegg impact soil tester (CIST), a Briaud compaction device (BCD), and a seismic pavement analyzer (SPA). Evaluations were conducted over ranges of measured densities and moistures.Testing (Phases I and II) was conducted in a test box and test pits. Phase III testing was conducted on materials found on five construction projects located in the Jacksonville, Florida, area. Phase I analyses determined that the GeoGauge had the lowest overall coefficient of variance (COV). In ascending order of COV were the accelerometer-type LWD, the geophone-type LWD, the DCP, the BCD, and the SPA which had the highest overall COV. As a result, the BCD and the SPA were excluded from Phase II testing.In Phase II, measurements obtained from the selected equipment were compared to the modulus values obtained by the static plate load test (PLT), the resilient modulus (MR) from laboratory testing, and the NDG measurements. To minimize soil and moisture content variability, the single spot testing sequence was developed. At each location, test results obtained from the portable equipment under evaluation were compared to the values from adjacent NDG, PLT, and laboratory MR measurements. Correlations were developed through statistical analysis. Target values were developed for various soils for verification on similar soils that were field tested in Phase III. The single spot testing sequence also was employed in Phase III, field testing performed on A-3 and A-2-4 embankments, limerock-stabilized subgrade, limerock base, and graded aggregate base found on Florida Department of Transportation construction projects. The Phase II and Phase III results provided potential trend information for future research—specifically, data collection for in-depth statistical analysis for correlations with the laboratory MR for specific soil types under specific moisture conditions. With the collection of enough data, stronger relationships could be expected between measurements from the portable equipment and the MR values. Based on the statistical analyses and the experience gained from extensive use of the equipment, the combination of the DCP and the LWD was selected for in-place soil testing for compaction control acceptance. Test methods and developmental specifications were written for the DCP and the LWD. The developmental specifications include target values for the compaction control of embankment, subgrade, and base materials.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Introduction Vascular access devices (VADs), such as peripheral or central venous catheters, are vital across all medical and surgical specialties. To allow therapy or haemodynamic monitoring, VADs frequently require administration sets (AS) composed of infusion tubing, fluid containers, pressure-monitoring transducers and/or burettes. While VADs are replaced only when necessary, AS are routinely replaced every 3–4 days in the belief that this reduces infectious complications. Strong evidence supports AS use up to 4 days, but there is less evidence for AS use beyond 4 days. AS replacement twice weekly increases hospital costs and workload. Methods and analysis This is a pragmatic, multicentre, randomised controlled trial (RCT) of equivalence design comparing AS replacement at 4 (control) versus 7 (experimental) days. Randomisation is stratified by site and device, centrally allocated and concealed until enrolment. 6554 adult/paediatric patients with a central venous catheter, peripherally inserted central catheter or peripheral arterial catheter will be enrolled over 4 years. The primary outcome is VAD-related bloodstream infection (BSI) and secondary outcomes are VAD colonisation, AS colonisation, all-cause BSI, all-cause mortality, number of AS per patient, VAD time in situ and costs. Relative incidence rates of VAD-BSI per 100 devices and hazard rates per 1000 device days (95% CIs) will summarise the impact of 7-day relative to 4-day AS use and test equivalence. Kaplan-Meier survival curves (with log rank Mantel-Cox test) will compare VAD-BSI over time. Appropriate parametric or non-parametric techniques will be used to compare secondary end points. p Values of <0.05 will be considered significant.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rapid advances in sequencing technologies (Next Generation Sequencing or NGS) have led to a vast increase in the quantity of bioinformatics data available, with this increasing scale presenting enormous challenges to researchers seeking to identify complex interactions. This paper is concerned with the domain of transcriptional regulation, and the use of visualisation to identify relationships between specific regulatory proteins (the transcription factors or TFs) and their associated target genes (TGs). We present preliminary work from an ongoing study which aims to determine the effectiveness of different visual representations and large scale displays in supporting discovery. Following an iterative process of implementation and evaluation, representations were tested by potential users in the bioinformatics domain to determine their efficacy, and to understand better the range of ad hoc practices among bioinformatics literate users. Results from two rounds of small scale user studies are considered with initial findings suggesting that bioinformaticians require richly detailed views of TF data, features to compare TF layouts between organisms quickly, and ways to keep track of interesting data points.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Synopsis and critique of Australian film in animation, comedy, and drama genres.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A fuzzy waste-load allocation model, FWLAM, is developed for water quality management of a river system using fuzzy multiple-objective optimization. An important feature of this model is its capability to incorporate the aspirations and conflicting objectives of the pollution control agency and dischargers. The vagueness associated with specifying the water quality criteria and fraction removal levels is modeled in a fuzzy framework. The goals related to the pollution control agency and dischargers are expressed as fuzzy sets. The membership functions of these fuzzy sets are considered to represent the variation of satisfaction levels of the pollution control agency and dischargers in attaining their respective goals. Two formulations—namely, the MAX-MIN and MAX-BIAS formulations—are proposed for FWLAM. The MAX-MIN formulation maximizes the minimum satisfaction level in the system. The MAX-BIAS formulation maximizes a bias measure, giving a solution that favors the dischargers. Maximization of the bias measure attempts to keep the satisfaction levels of the dischargers away from the minimum satisfaction level and that of the pollution control agency close to the minimum satisfaction level. Most of the conventional water quality management models use waste treatment cost curves that are uncertain and nonlinear. Unlike such models, FWLAM avoids the use of cost curves. Further, the model provides the flexibility for the pollution control agency and dischargers to specify their aspirations independently.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sampling strategies are developed based on the idea of ranked set sampling (RSS) to increase efficiency and therefore to reduce the cost of sampling in fishery research. The RSS incorporates information on concomitant variables that are correlated with the variable of interest in the selection of samples. For example, estimating a monitoring survey abundance index would be more efficient if the sampling sites were selected based on the information from previous surveys or catch rates of the fishery. We use two practical fishery examples to demonstrate the approach: site selection for a fishery-independent monitoring survey in the Australian northern prawn fishery (NPF) and fish age prediction by simple linear regression modelling a short-lived tropical clupeoid. The relative efficiencies of the new designs were derived analytically and compared with the traditional simple random sampling (SRS). Optimal sampling schemes were measured by different optimality criteria. For the NPF monitoring survey, the efficiency in terms of variance or mean squared errors of the estimated mean abundance index ranged from 114 to 199% compared with the SRS. In the case of a fish ageing study for Tenualosa ilisha in Bangladesh, the efficiency of age prediction from fish body weight reached 140%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Plotless density estimators are those that are based on distance measures rather than counts per unit area (quadrats or plots) to estimate the density of some usually stationary event, e.g. burrow openings, damage to plant stems, etc. These estimators typically use distance measures between events and from random points to events to derive an estimate of density. The error and bias of these estimators for the various spatial patterns found in nature have been examined using simulated populations only. In this study we investigated eight plotless density estimators to determine which were robust across a wide range of data sets from fully mapped field sites. They covered a wide range of situations including animal damage to rice and corn, nest locations, active rodent burrows and distribution of plants. Monte Carlo simulations were applied to sample the data sets, and in all cases the error of the estimate (measured as relative root mean square error) was reduced with increasing sample size. The method of calculation and ease of use in the field were also used to judge the usefulness of the estimator. Estimators were evaluated in their original published forms, although the variable area transect (VAT) and ordered distance methods have been the subjects of optimization studies. Results: An estimator that was a compound of three basic distance estimators was found to be robust across all spatial patterns for sample sizes of 25 or greater. The same field methodology can be used either with the basic distance formula or the formula used with the Kendall-Moran estimator in which case a reduction in error may be gained for sample sizes less than 25, however, there is no improvement for larger sample sizes. The variable area transect (VAT) method performed moderately well, is easy to use in the field, and its calculations easy to undertake. Conclusion: Plotless density estimators can provide an estimate of density in situations where it would not be practical to layout a plot or quadrat and can in many cases reduce the workload in the field.