Biblioteca Digital

826 resultados para Data Accuracy

Addressing accuracy and precision issues in iTRAQ quantitation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

iTRAQ (isobaric tags for relative or absolute quantitation) is a mass spectrometry technology that allows quantitative comparison of protein abundance by measuring peak intensities of reporter ions released from iTRAQ-tagged peptides by fragmentation during MS/MS. However, current data analysis techniques for iTRAQ struggle to report reliable relative protein abundance estimates and suffer with problems of precision and accuracy. The precision of the data is affected by variance heterogeneity: low signal data have higher relative variability; however, low abundance peptides dominate data sets. Accuracy is compromised as ratios are compressed toward 1, leading to underestimation of the ratio. This study investigated both issues and proposed a methodology that combines the peptide measurements to give a robust protein estimate even when the data for the protein are sparse or at low intensity. Our data indicated that ratio compression arises from contamination during precursor ion selection, which occurs at a consistent proportion within an experiment and thus results in a linear relationship between expected and observed ratios. We proposed that a correction factor can be calculated from spiked proteins at known ratios. Then we demonstrated that variance heterogeneity is present in iTRAQ data sets irrespective of the analytical packages, LC-MS/MS instrumentation, and iTRAQ labeling kit (4-plex or 8-plex) used. We proposed using an additive-multiplicative error model for peak intensities in MS/MS quantitation and demonstrated that a variance-stabilizing normalization is able to address the error structure and stabilize the variance across the entire intensity range. The resulting uniform variance structure simplifies the downstream analysis. Heterogeneity of variance consistent with an additive-multiplicative model has been reported in other MS-based quantitation including fields outside of proteomics; consequently the variance-stabilizing normalization methodology has the potential to increase the capabilities of MS in quantitation across diverse areas of biology and chemistry.

Predicting shot locations in tennis using spatiotemporal data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over the past decade, vision-based tracking systems have been successfully deployed in professional sports such as tennis and cricket for enhanced broadcast visualizations as well as aiding umpiring decisions. Despite the high-level of accuracy of the tracking systems and the sheer volume of spatiotemporal data they generate, the use of this high quality data for quantitative player performance and prediction has been lacking. In this paper, we present a method which predicts the location of a future shot based on the spatiotemporal parameters of the incoming shots (i.e. shot speed, location, angle and feet location) from such a vision system. Having the ability to accurately predict future short-term events has enormous implications in the area of automatic sports broadcasting in addition to coaching and commentary domains. Using Hawk-Eye data from the 2012 Australian Open Men's draw, we utilize a Dynamic Bayesian Network to model player behaviors and use an online model adaptation method to match the player's behavior to enhance shot predictability. To show the utility of our approach, we analyze the shot predictability of the top 3 players seeds in the tournament (Djokovic, Federer and Nadal) as they played the most amounts of games.

A web-based approach to data imputation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, Webput utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.

A comparison of methods to identify alcohol involvement in youth injury-related emergency department presentation data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aims: To compare different methods for identifying alcohol involvement in injury-related emergency department presentation in Queensland youth, and to explore the alcohol terminology used in triage text. Methods: Emergency Department Information System data were provided for patients aged 12-24 years with an injury-related diagnosis code for a 5 year period 2006-2010 presenting to a Queensland emergency department (N=348895). Three approaches were used to estimate alcohol involvement: 1) analysis of coded data, 2) mining of triage text, and 3) estimation using an adaptation of alcohol attributable fractions (AAF). Cases were identified as ‘alcohol-involved’ by code and text, as well as AAF weighted. Results: Around 6.4% of these injury presentations overall had some documentation of alcohol involvement, with higher proportions of alcohol involvement documented for 18-24 year olds, females, indigenous youth, where presentations occurred on a Saturday or Sunday, and where presentations occurred between midnight and 5am. The most common alcohol terms identified for all subgroups were generic alcohol terms (eg. ETOH or alcohol) with almost half of the cases where alcohol involvement was documented having a generic alcohol term recorded in the triage text. Conclusions: Emergency department data is a useful source of information for identification of high risk sub-groups to target intervention opportunities, though it is not a reliable source of data for incidence or trend estimation in its current unstandardised form. Improving the accuracy and consistency of identification, documenting and coding of alcohol-involvement at the point of data capture in the emergency department is the most desirable long term approach to produce a more solid evidence base to support policy and practice in this field.

Predicting the likelihood of QA failure using treatment plan accuracy metrics

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study used automated data processing techniques to calculate a set of novel treatment plan accuracy metrics, and investigate their usefulness as predictors of quality assurance (QA) success and failure. 151 beams from 23 prostate and cranial IMRT treatment plans were used in this study. These plans had been evaluated before treatment using measurements with a diode array system. The TADA software suite was adapted to allow automatic batch calculation of several proposed plan accuracy metrics, including mean field area, small-aperture, off-axis and closed-leaf factors. All of these results were compared the gamma pass rates from the QA measurements and correlations were investigated. The mean field area factor provided a threshold field size (5 cm2, equivalent to a 2.2 x 2.2 cm2 square field), below which all beams failed the QA tests. The small aperture score provided a useful predictor of plan failure, when averaged over all beams, despite being weakly correlated with gamma pass rates for individual beams. By contrast, the closed leaf and off-axis factors provided information about the geometric arrangement of the beam segments but were not useful for distinguishing between plans that passed and failed QA. This study has provided some simple tests for plan accuracy, which may help minimise time spent on QA assessments of treatments that are unlikely to pass.

Urban traffic state estimation: Fusing point and zone based data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Loop detectors are the oldest and widely used traffic data source. On urban arterials, they are mainly installed for signal control. Recently state of the art Bluetooth MAC Scanners (BMS) has significantly captured the interest of stakeholders for exploiting it for area wide traffic monitoring. Loop detectors provide flow- a fundamental traffic parameter; whereas BMS provides individual vehicle travel time between BMS stations. Hence, these two data sources complement each other, and if integrated should increase the accuracy and reliability of the traffic state estimation. This paper proposed a model that integrates loops and BMS data for seamless travel time and density estimation for urban signalised network. The proposed model is validated using both real and simulated data and the results indicate that the accuracy of the proposed model is over 90%.

Practical aspects of frequency-domain identification of dynamic models of marine structures from hydrodynamic data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The motion response of marine structures in waves can be studied using finite-dimensional linear-time-invariant approximating models. These models, obtained using system identification with data computed by hydrodynamic codes, find application in offshore training simulators, hardware-in-the-loop simulators for positioning control testing, and also in initial designs of wave-energy conversion devices. Different proposals have appeared in the literature to address the identification problem in both time and frequency domains, and recent work has highlighted the superiority of the frequency-domain methods. This paper summarises practical frequency-domain estimation algorithms that use constraints on model structure and parameters to refine the search of approximating parametric models. Practical issues associated with the identification are discussed, including the influence of radiation model accuracy in force-to-motion models, which are usually the ultimate modelling objective. The illustration examples in the paper are obtained using a freely available MATLAB toolbox developed by the authors, which implements the estimation algorithms described.

Design of a garment for data collection of toddler language and physical activity

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Design process phases of development, evaluation and implementation were used to create a garment to simultaneously collect reliable data of speech production and intensity of movement of toddlers (18-36 months). A series of prototypes were developed and evaluated that housed accelerometer-based motion sensors and a digital transmitter with microphone. The approved test garment was a top constructed from loop-faced fabric with interior pockets to house devices. Extended side panels allowed for sizing. In total, 56 toddlers (28 male; 28 female; 16-36 months of age) participated in the study providing pilot and baseline data. The test garment was effective in collecting data as evaluated for accuracy and reliability using ANOVA for accelerometer data, transcription of video for type of movement, and number and length of utterances for speech production. The data collection garment has been implemented in various studies across disciplines.

Performance measures for Australian laboratories reporting cervical cytology : a decade of data 1998–2008

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aim Performance measures for Australian laboratories reporting cervical cytology are a set of quantifiable measures relating to the profile and accuracy of reporting. This study reviews aggregate data collected over the ten years in which participation in the performance measures has been mandatory. Methods Laboratories submit annual data on performance measures relating to the profile of reporting, including reporting rates for technically unsatisfactory specimens, high grade or possible high grade abnormalities and abnormal reports. Cytology-histology correlation data and review findings of negative smears reported from women with histological high grade disease are also collected. Suggested acceptable standards are set for each measure. This study reviews the aggregate data submitted by all laboratories for the years 1998-2008 and examines trends in reporting and the performance of laboratories against the suggested standards. Results The performance of Australian laboratories has shown continued improvement over the study period. There has been a fall in the proportion of laboratories with data outside the acceptable standard range in all performance measures. Laboratories are reporting a greater proportion of specimens as definite or possible high grade abnormality. This is partly attributable to an increase in the proportion of abnormal results classified as high grade or possible high grade abnormality. Despite this, the positive predictive value for high grade and possible high grade abnormalities has continued to rise. Conclusion Performance measures for cervical cytology have provided a valuable addition to external quality assurance procedures in Australia. They have documented continued improvements in the aggregate performance, as well as providing benchmarking data and goals for acceptable performance for individual laboratories.

Mining business process deviance : a quest for accuracy

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.

miRPlant : an integrated tool for identification of plant miRNA from RNA sequencing data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA, while providing a user-friendly interface with improved accuracy of prediction. Result We have developed a user-friendly plant miRNA prediction tool called miRPlant. We show using 16 plant miRNA datasets from four different plant species that miRPlant has at least a 10% improvement in accuracy compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNA are shown with all RNAseq reads in a hairpin diagram. Conclusions We have developed miRPlant which extends miRDeep* to various plant species by adopting suitable strategies to identify hairpin excision regions and hairpin structure filtering for plants. miRPlant does not require any third party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills.

Generation and assessment of accuracy of Digital Elevation Model by using Pcigeomatica (10.3)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study aims to assess the accuracy of Digital Elevation Model (DEM) which is generated by using Toutin’s model. Thus, Toutin’s model was run by using OrthoEngineSE of PCI Geomatics 10.3.Thealong-track stereoimages of Advanced Spaceborne Thermal Emission and Reflection radiometer (ASTER) sensor with 15 m resolution were used to produce DEM on an area with low and near Mean Sea Level (MSL) elevation in Johor Malaysia. Despite the satisfactory pre-processing results the visual assessment of the DEM generated from Toutin’s model showed that the DEM contained many outliers and incorrect values. The failure of Toutin’s model may mostly be due to the inaccuracy and insufficiency of ASTER ephemeris data for low terrains as well as huge water body in the stereo images.

Anomaly detection in online social networks : using data-mining techniques and fuzzy logic

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research is a step forward in improving the accuracy of detecting anomaly in a data graph representing connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques utilising different types of structural input features. The methods are presented within a multi-layered framework which provides the full requirements needed for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.

Machine learning for activity recognition : hip versus wrist data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Problem addressed Wrist-worn accelerometers are associated with greater compliance. However, validated algorithms for predicting activity type from wrist-worn accelerometer data are lacking. This study compared the activity recognition rates of an activity classifier trained on acceleration signal collected on the wrist and hip. Methodology 52 children and adolescents (mean age 13.7 +/- 3.1 year) completed 12 activity trials that were categorized into 7 activity classes: lying down, sitting, standing, walking, running, basketball, and dancing. During each trial, participants wore an ActiGraph GT3X+ tri-axial accelerometer on the right hip and the non-dominant wrist. Features were extracted from 10-s windows and inputted into a regularized logistic regression model using R (Glmnet + L1). Results Classification accuracy for the hip and wrist was 91.0% +/- 3.1% and 88.4% +/- 3.0%, respectively. The hip model exhibited excellent classification accuracy for sitting (91.3%), standing (95.8%), walking (95.8%), and running (96.8%); acceptable classification accuracy for lying down (88.3%) and basketball (81.9%); and modest accuracy for dance (64.1%). The wrist model exhibited excellent classification accuracy for sitting (93.0%), standing (91.7%), and walking (95.8%); acceptable classification accuracy for basketball (86.0%); and modest accuracy for running (78.8%), lying down (74.6%) and dance (69.4%). Potential Impact Both the hip and wrist algorithms achieved acceptable classification accuracy, allowing researchers to use either placement for activity recognition.

A data mining based method for discovery of web services and their compositions

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Due to the availability of huge number of web services, finding an appropriate Web service according to the requirements of a service consumer is still a challenge. Moreover, sometimes a single web service is unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pair shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer requirement. The recommendation of individual and composite Web services composition for a service request is finally made. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison to traditional keyword-based discovery methods.

«
1
2
...
5
6
7
8
9
10
11
...
55
56
»