337 results for "outliers"


Relevance: 20.00%

Abstract:

Collecting ground truth data is an important step to be accomplished before performing a supervised classification. However, its quality depends on human, financial and time resources. It is therefore important to apply a validation process to assess the reliability of the acquired data. In this study, agricultural information was collected in the Brazilian Amazonian State of Mato Grosso in order to map crop expansion based on MODIS EVI temporal profiles. The field work was carried out through interviews for the years 2005-2006 and 2006-2007. This work presents a methodology to validate the quality of the training data and to determine the optimal sample to be used according to the classifier employed. The technique detects outlier pixels in each class by computing the Mahalanobis distance of each pixel: the higher the distance, the further the pixel is from the class centre. Preliminary observations of the coefficient of variation confirm that the technique detects outliers effectively. Various subsamples are then defined by applying different thresholds to exclude outlier pixels from the classification process. The classification results show the robustness of the Maximum Likelihood and Spectral Angle Mapper classifiers, which were insensitive to outlier exclusion. By contrast, the decision tree classifier performed better when 7.5% of the pixels were deleted from the training data. The technique detected outliers in all classes. In this study, few outliers were present in the training data, so the classification quality was not strongly affected by them.
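
The per-class screening described above can be illustrated with a minimal pure-Python sketch: each training pixel's Mahalanobis distance from its class centre is computed, and a chosen fraction of the farthest pixels is excluded (the abstract reports 7.5% as the best threshold for the decision tree). It assumes that each pixel is a list of EVI values (one per date) and that the class covariance is the sample covariance of that class's own pixels; the function names and the toy data are hypothetical, not taken from the study.

```python
import math

def mean_vector(rows):
    """Class centre: per-date mean of the training pixels."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, mu):
    """Sample covariance matrix of the training pixels (n - 1 denominator)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    d = len(b)
    m = [list(a[i]) + [b[i]] for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(d):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, d + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][d] / m[i][i] for i in range(d)]

def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^-1 (x - mu)), without forming the inverse explicitly."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)
    return math.sqrt(sum(di * zi for di, zi in zip(diff, z)))

def trim_class(rows, fraction):
    """Drop the given fraction of pixels farthest from the class centre."""
    mu = mean_vector(rows)
    cov = covariance(rows, mu)
    dists = [mahalanobis(r, mu, cov) for r in rows]
    n_drop = int(round(fraction * len(rows)))
    order = sorted(range(len(rows)), key=lambda i: dists[i])  # nearest first
    keep = sorted(order[:len(rows) - n_drop])  # restore original pixel order
    return [rows[i] for i in keep], dists

# Hypothetical two-date class: a 3x3 grid of ordinary pixels plus one stray pixel.
pixels = [[float(i), float(j)] for i in range(3) for j in range(3)] + [[10.0, 10.0]]
kept, d = trim_class(pixels, 0.10)  # exclude 10% of this class
```

Repeating `trim_class` with several fractions yields the nested subsamples that are then fed to each classifier; in practice NumPy or SciPy would replace the hand-written linear algebra.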

Relevance: 10.00%

Abstract:

The aim of this paper is to provide a contemporary summary of statistical and non-statistical meta-analytic procedures that have relevance to the type of experimental designs often used by sport scientists when examining differences/change in dependent measure(s) as a result of one or more independent manipulation(s). Using worked examples from studies on observational learning in the motor behaviour literature, we adopt a random effects model and give a detailed explanation of the statistical procedures for the three types of raw score difference-based analyses applicable to between-participant, within-participant, and mixed-participant designs. Major merits and concerns associated with these quantitative procedures are identified and agreed methods are reported for minimizing biased outcomes, such as those for dealing with multiple dependent measures from single studies, design variation across studies, different metrics (i.e. raw scores and difference scores), and variations in sample size. To complement the worked examples, we summarize the general considerations required when conducting and reporting a meta-analysis, including how to deal with publication bias, what information to present regarding the primary studies, and approaches for dealing with outliers. By bringing together these statistical and non-statistical meta-analytic procedures, we provide the tools required to clarify understanding of key concepts and principles.
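
As a rough illustration of the random effects pooling mentioned above, the sketch below combines per-study effects (for example raw-score mean differences) with the DerSimonian-Laird estimate of the between-study variance. The abstract does not name the estimator, so DerSimonian-Laird is an assumption here (it is one common choice), and the study values are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2 estimator.

    effects: one effect size per study (e.g. a raw-score mean difference)
    variances: the within-study sampling variance of each effect
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Two hypothetical studies that disagree strongly: tau^2 absorbs the disagreement.
pooled, se, tau2 = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # normal-approximation 95% interval
```

When all studies report the same effect, Q is 0, tau^2 is truncated to 0, and the estimate reduces to the fixed-effect (inverse-variance) mean.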

Relevance: 10.00%

Abstract:

In this study, the authors propose a novel video stabilisation algorithm for mobile platforms with moving objects in the scene. The quality of videos obtained from mobile platforms, such as unmanned airborne vehicles, suffers from jitter caused by several factors. To remove this undesired jitter, accurate estimation of global motion is essential. However, it is difficult to estimate global motion accurately from mobile platforms because of increased estimation errors and noise. Additionally, large moving objects in the video scenes contribute to the estimation errors. Currently, only a few motion estimation algorithms have been developed for video scenes collected from mobile platforms, and this paper shows that these algorithms fail when there are large moving objects in the scene. In this study, a theoretical proof is provided which demonstrates that the use of delta optical flow can improve the robustness of video stabilisation in the presence of large moving objects in the scene. The authors also propose using sorted arrays of local motions and the selection of feature points to separate outliers from inliers. The proposed algorithm is tested on six video sequences, collected from one fixed platform, four mobile platforms and one synthetic video, of which three contain large moving objects. Experiments show that the proposed algorithm performs well on all these video sequences.
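
The abstract does not say how the sorted arrays of local motions are used, so the following is only a generic illustration of the idea, not the authors' algorithm: each component of the local motion vectors is sorted, and only a central fraction (a hypothetical `keep` parameter) is averaged as the global-motion estimate. Vectors from a moving object fall into the discarded tails only as long as they belong to a minority of the feature points.

```python
def global_motion_from_sorted(local_motions, keep=0.5):
    """Estimate global (dx, dy) from local motion vectors via sorted arrays.

    Each axis is sorted independently; the central `keep` fraction is treated
    as inliers and averaged, and the tails are discarded as outliers.
    """
    def central_mean(values):
        s = sorted(values)
        n = len(s)
        cut = int(n * (1 - keep) / 2)  # number of values trimmed from each tail
        inliers = s[cut:n - cut]
        return sum(inliers) / len(inliers)

    dxs = [m[0] for m in local_motions]
    dys = [m[1] for m in local_motions]
    return central_mean(dxs), central_mean(dys)

# Hypothetical frame: six background feature points jitter by (2, 1) pixels,
# while two points on a moving vehicle report (15, -8).
motions = [(2.0, 1.0)] * 6 + [(15.0, -8.0)] * 2
gx, gy = global_motion_from_sorted(motions)
```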

Relevance: 10.00%

Abstract:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used to evaluate models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The misclassification rate (MR) used for model performance evaluation is influenced by class distribution. This effect could be eliminated by considering the AUC or Kappa statistic, as well as by evaluating subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to between 9.8 and 10.16, with MR of 9.8 for time-segmented summary data (dataset F) and 9.92 for raw time-series summary data (dataset A). However, for all datasets based only on time series, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-only datasets, but models derived from these subsets consist of a single leaf. MR values are consistent with the class distribution in the subset folds evaluated in the n-fold cross-validation. For models based on Cfs-selected time-series-derived and risk factor (RF) variables, MR ranges from 8.83 to 10.36, with 8.85 for dataset RF_A (raw time-series data and RF) and 9.09 for dataset RF_F (time-segmented time-series variables and RF).
The models based on counts of outliers and counts of data points outside the normal range (dataset RF_E), and on variables derived from time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (dataset RF_G), perform least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method SMO have the highest MR, 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method J48 have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The increase in predictive accuracy achieved by adding risk factor variables to time-series-based models is significant. Adding time-series-derived variables to models based on risk factor variables alone is associated with a trend towards improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules that are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables, compared with risk factors alone, is consistent with recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when both time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, using time-series variables after outlier removal, and time-series variables based on physiological values being outside the accepted normal range, is associated with some improvement in model performance.
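
The point that MR is influenced by class distribution, while the Kappa statistic is not fooled by it, can be shown with a short sketch. On hypothetical labels with 90 controls and 10 cases, a model that always predicts the majority class has a low MR of 10% but a kappa of 0. Expressing MR as a percentage is an assumption chosen to match the scale of the values reported above; the function name is hypothetical.

```python
def mr_and_kappa(y_true, y_pred):
    """Misclassification rate (%) and Cohen's kappa for one set of predictions."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    # agreement expected by chance, from the marginal class frequencies
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    mr = 100.0 * (1.0 - p_o)
    if p_e == 1.0:
        return mr, float("nan")  # kappa is undefined when chance agreement is total
    return mr, (p_o - p_e) / (1.0 - p_e)

# Hypothetical unbalanced data: 90 controls (0) and 10 CVD cases (1).
y_true = [0] * 90 + [1] * 10
always_majority = [0] * 100
mr, kappa = mr_and_kappa(y_true, always_majority)
```

Under-sampling the majority class before evaluation shrinks the gap between the two measures, which is why the abstract reports both.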

Relevance: 10.00%

Abstract:

Transport regulators consider heavy vehicles (HVs) to be the riskiest vehicles on the road network with respect to pavement damage. That HV suspension design contributes to road and bridge damage has been recognised for some decades. This thesis deals with some aspects of HV suspension characteristics, particularly (but not exclusively) air suspensions, in three areas: developing low-cost in-service HV suspension testing, the effects of larger-than-industry-standard longitudinal air lines, and the characteristics of on-board mass (OBM) systems for HVs. All these areas, while seemingly disparate, seek to inform the management of HVs, reduce their impact on the network asset and/or provide a measurement mechanism for worn HV suspensions. The results of the project underlying this thesis have been, and will be, presented to a number of project management groups at the State and National level in Australia, which should inform their activities related to this research. A number of HVs were tested for various characteristics, and these tests were used to draw a number of conclusions about HV suspension behaviour. Wheel forces from road test data were analysed. A “novel roughness” measure was developed and applied to the road test data to determine dynamic load sharing, amongst other research outcomes. It was further proposed that this approach could inform the future development of pavement models incorporating roughness and peak wheel forces. Left/right variations in wheel forces and wheel force variations at different speeds were also presented. This led to some conclusions regarding suspension and wheel force frequencies, their transmission to the pavement, and repetitive wheel loads in the spatial domain. An improved method of determining dynamic load sharing was developed and presented. It used the correlation coefficient between two elements of a HV to determine dynamic load sharing.
This was validated against a mature dynamic load sharing metric, the dynamic load sharing coefficient (de Pont, 1997). This was the first time that measuring the correlation between elements on a HV had been used to compare a test case with a control case for two different sizes of air line. The test case with large longitudinal air lines showed improved dynamic load sharing at the air springs; this statistically significant improvement ranged from approximately 30 percent to 80 percent. Dynamic load sharing at the wheels was improved only for low air line flow events in the test case of larger longitudinal air lines. The use of larger longitudinal air lines produced statistically significant improvements to some suspension metrics across the range of test speeds and “novel roughness” values, but these were not uniform. Of note were improvements to suspension metrics involving peak dynamic forces, ranging from below the error margin to approximately 24 percent. Abstract models of HV suspensions were developed from the results of some of the tests. These models were used to propose further development, and future directions of research, towards further gains in HV dynamic load sharing through alterations to currently available damping characteristics combined with the implementation of large longitudinal air lines. In-service testing of HV suspensions was found to be possible with errors ranging from below the error margin to approximately 16 percent, compared with either the manufacturer’s certified data or test results replicating the Australian standard for “road-friendly” HV suspensions, Vehicle Standards Bulletin 11.
OBM accuracy testing and the development of tamper evidence from OBM data were detailed for over 2000 individual data points across twelve test and control OBM systems from eight suppliers, installed on eleven HVs. The results indicated that 95 percent of contemporary OBM systems available in Australia are accurate to ±500 kg. After three outliers were removed from the data, the total variation in OBM linearity was 0.5 percent. A tamper indicator and other OBM metrics that jurisdictions could use to detect tamper events were developed and documented. The possibility of using OBM systems as one vector for in-service testing of HV suspensions was one of several synergies between the seemingly disparate streams of this project.
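
The correlation-based load sharing measure mentioned above rests on the Pearson correlation coefficient between force signals recorded on two elements of a HV. The sketch below computes it for hypothetical air-spring forces; it assumes that forces which rise and fall together (coefficient near +1) indicate good dynamic load sharing, an interpretation the abstract does not spell out. Python 3.10+ also offers `statistics.correlation` for the same quantity.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long force series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical air-spring forces (kN) sampled at the same instants on two axles.
front = [50.0, 52.0, 55.0, 53.0, 49.0, 47.0]
rear_shared = [51.0, 53.0, 56.0, 54.0, 50.0, 48.0]   # rises and falls with the front
rear_opposed = [52.0, 50.0, 47.0, 49.0, 53.0, 55.0]  # moves against the front
```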

Relevance: 10.00%

Abstract:

Road surface macro-texture is an indicator used to determine the skid resistance levels of pavements. Existing methods of quantifying macro-texture include the sand patch test and the laser profilometer. These methods use 3D information about the pavement surface to extract the average texture depth. Recently, interest has arisen in image processing techniques as quantifiers of macro-texture, mainly using the Fast Fourier Transform (FFT). This paper reviews the FFT method and then proposes two new methods, one using the autocorrelation function and the other using wavelets. The methods are tested on pictures obtained from a pavement surface extending more than 2 km. About 200 images were acquired from the surface at approximately 10 m intervals, from a height of 80 cm above the ground. The results obtained from the FFT, autocorrelation and wavelet image analysis methods are compared with sensor measured texture depth (SMTD) data obtained from the same paved surface. The results indicate that coefficients of determination (R²) exceeding 0.8 are obtained when up to 10% of outliers are removed.
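
The abstract does not say how the outliers were selected before computing R², so the sketch below assumes one simple rule: fit a least-squares line of SMTD against an image-derived texture index, drop the given fraction of points with the largest absolute residuals, and refit. The data and function names are hypothetical and only illustrate how a single bad image can pull R² well below 0.8.

```python
def fit_r2(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b, R^2, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot, resid

def r2_after_trimming(x, y, fraction):
    """Refit after dropping the `fraction` of points with the largest |residual|."""
    resid = fit_r2(x, y)[3]
    n_drop = int(fraction * len(x))
    worst = sorted(range(len(x)), key=lambda i: abs(resid[i]), reverse=True)[:n_drop]
    drop = set(worst)
    xs = [xi for i, xi in enumerate(x) if i not in drop]
    ys = [yi for i, yi in enumerate(y) if i not in drop]
    return fit_r2(xs, ys)[2]

# Hypothetical data: image-based texture index vs sensor-measured texture depth (mm).
index = [float(i) for i in range(10)]
smtd = [0.2 * i + 0.5 for i in range(10)]
smtd[5] = 4.0  # one corrupted image
r2_full = fit_r2(index, smtd)[2]
r2_trim = r2_after_trimming(index, smtd, 0.10)
```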

Relevance: 10.00%

Abstract:

The purpose of this review is to integrate and summarize specific measurement topics (instrument and metric choice, validity, reliability, how many and what types of days, reactivity, and data treatment) appropriate to the study of youth physical activity. Research-quality pedometers are necessary to aid interpretation of steps per day collected in a range of young populations under a variety of circumstances. Steps per day is the most appropriate metric choice, but steps per minute can be used to interpret time-in-intensity in specifically delimited time periods (e.g., physical education class). Reported intraclass correlations (ICC) have ranged from .65 over 2 days (although higher values also have been reported for 2 days) to .87 over 8 days (although higher values have been reported for fewer days). Reported ICCs are lower on weekend days (.59) than on weekdays (.75) and lower over vacation days (.69) than over school days (.74). There is no objective evidence of reactivity at this time. Data treatment includes (a) identifying and addressing missing values, (b) identifying outliers and reducing data appropriately if necessary, and (c) transforming the data as required in preparation for inferential analysis. As more pedometry studies in young populations are published, these preliminary methodological recommendations should be modified and refined.

Relevance: 10.00%

Abstract:

Large margin learning approaches, such as support vector machines (SVM), have been successfully applied to numerous classification tasks, especially automatic facial expression recognition. The risk of such approaches, however, is their sensitivity to large margin losses due to the influence of noisy training examples and outliers, which is a common problem in affective computing (i.e., manual coding at the frame level is tedious, so coarse labels are normally assigned). In this paper, we leverage the relaxation of the parallel-hyperplanes constraint and propose the use of modified correlation filters (MCF). The MCF is similar in spirit to SVMs and correlation filters, but with the key difference of optimizing only a single hyperplane. We demonstrate the superiority of MCF over current techniques on a battery of experiments.

Relevance: 10.00%

Abstract:

Data quality has become a major concern for organisations. The rapid growth in the size and technology of databases and data warehouses has brought significant advantages in accessing, storing, and retrieving information. At the same time, rapid data throughput and heterogeneous access pose great challenges for maintaining high data quality. Yet, despite the importance of data quality, the literature has usually reduced data quality to detecting and correcting poor data such as outliers and incomplete or inaccurate values. As a result, organisations are unable to assess data quality efficiently and effectively. An accurate and proper data quality assessment method would enable users to benchmark their systems and monitor their improvement. This paper introduces a granule mining approach for measuring the random degree of erroneous data, which will enable decision makers to conduct accurate quality assessment and allocate the most severe data, thereby providing an accurate estimation of the human and financial resources needed for quality improvement tasks.

Relevance: 10.00%

Abstract:

Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.
Results: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.
Conclusions: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis.
mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
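
The core COPA transformation can be sketched for a single gene: expression values are median-centred and divided by the median absolute deviation (MAD), and genes are then ranked by a high percentile of the transformed values (for example the 90th). Scoring a low percentile as well mirrors mCOPA's extension to under-expressed outliers. This is a minimal sketch, not the mCOPA package's code or its refinements; the function name, the default percentiles and the linear-interpolation quantile are assumptions.

```python
import statistics

def copa_scores(values, upper_q=0.90, lower_q=0.10):
    """COPA-style outlier scores for one gene across samples.

    Values are median-centred and scaled by the median absolute deviation;
    a high upper-percentile score flags over-expression in a subset of samples,
    and a strongly negative lower-percentile score flags under-expression.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be scaled")
    scaled = sorted((v - med) / mad for v in values)

    def quantile(p):
        # linear interpolation between neighbouring order statistics
        pos = p * (len(scaled) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(scaled) - 1)
        return scaled[lo] + (pos - lo) * (scaled[hi] - scaled[lo])

    return quantile(upper_q), quantile(lower_q)

# Hypothetical expression of one gene in ten tumours; one is strongly over-expressed.
up, down = copa_scores([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Because the median and MAD ignore the single extreme sample, it stays far from the bulk after scaling and drives the upper-percentile score up.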

Relevance: 10.00%

Abstract:

The rapid increase in the deployment of CCTV systems has led to a greater demand for algorithms that are able to process incoming video feeds. These algorithms are designed to extract information of interest for human operators. During the past several years, there has been a large effort to detect abnormal activities through computer vision techniques. Typically, the problem is formulated as a novelty detection task where the system is trained on normal data and is required to detect events which do not fit the learned 'normal' model. Many researchers have tried various sets of features to train different learning models to detect abnormal behaviour in video footage. In this work we propose using a Semi-2D Hidden Markov Model (HMM) to model the normal activities of people. Observations to which the model assigns insufficient likelihood are treated as outliers and identified as abnormal activities. Our Semi-2D HMM is designed to model both the temporal and spatial causalities of the crowd behaviour by assuming that the current state of the Hidden Markov Model depends not only on the previous state in the temporal direction, but also on the previous states of the adjacent spatial locations. Two different HMMs are trained to model the vertical and horizontal spatial causal information. Location features, flow features and optical flow textures are used as the features for the model. The proposed approach is evaluated using the publicly available UCSD datasets, and we demonstrate improved performance compared to other state-of-the-art methods.

Relevance: 10.00%

Abstract:

Travel time is an important transport performance indicator. Different modes of transport (buses and cars) have different mechanical and operational characteristics, resulting in significantly different travel behaviours and complexities in multimodal travel time estimation on urban networks. This paper explores the relationship between bus and car travel time on urban networks using empirical Bluetooth and Bus Vehicle Identification data from Brisbane. The technologies and issues behind the two datasets are studied. After the data are cleaned to remove outliers, the relationship between not-in-service bus and car travel time and the relationship between in-service bus and car travel time are discussed. The travel time estimation models reveal that not-in-service bus travel time is similar to car travel time and that in-service bus travel time could be used to estimate car travel time during off-peak hours.

Relevance: 10.00%

Abstract:

A satellite-based observation system can continuously or repeatedly generate a user state vector time series that may contain useful information. One typical example is the collection of International GNSS Service (IGS) station daily and weekly combined solutions. Another example is the epoch-by-epoch kinematic position time series of a receiver derived by a GPS real-time kinematic (RTK) technique. Although some multivariate analysis techniques have been adopted to assess the noise characteristics of multivariate state time series, statistical testing has been limited to univariate time series. After reviewing frequently used hypothesis test statistics in univariate analysis of GNSS state time series, the paper presents a number of T-squared multivariate analysis statistics for use in the analysis of multivariate GNSS state time series. These T-squared test statistics take into account the correlation between coordinate components, which is neglected in univariate analysis. Numerical analysis was conducted with the multi-year time series of an IGS station to schematically demonstrate the results of multivariate hypothesis testing in comparison with univariate hypothesis testing. The results demonstrate that, in general, testing for multivariate mean shifts and outliers tends to reject fewer data samples than testing for univariate mean shifts and outliers at the same confidence level. Note that neither univariate nor multivariate data analysis methods are intended to replace physical analysis. Instead, they should be treated as complementary statistical methods for a priori or a posteriori investigations. Physical analysis is subsequently necessary to refine and interpret the results.
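
Why a T-squared statistic can flag what univariate tests miss is easiest to see in two dimensions. The sketch below computes a T-squared-type (squared Mahalanobis) statistic for one new epoch against a sample of correlated coordinate components, alongside the per-component z-scores. The data are hypothetical, and this simplified single-epoch form is not necessarily one of the statistics presented in the paper; comparing it with a critical value from the appropriate F (or, for large samples, chi-square) distribution is omitted.

```python
def t_squared(samples, x):
    """T^2-type statistic of epoch x against a 2-D sample (e.g. north, east)."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mx, x[1] - my
    # d^T S^-1 d using the closed-form inverse of the 2x2 covariance matrix
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def z_scores(samples, x):
    """Univariate standardised deviation of each component, ignoring correlation."""
    n = len(samples)
    out = []
    for k in range(2):
        col = [s[k] for s in samples]
        m = sum(col) / n
        sd = (sum((c - m) ** 2 for c in col) / (n - 1)) ** 0.5
        out.append((x[k] - m) / sd)
    return out

# Hypothetical, strongly correlated coordinate residuals (mm).
history = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (4.0, 3.9)]
epoch = (3.0, 1.0)  # each component looks ordinary, but the pair breaks the correlation
t2 = t_squared(history, epoch)
z = z_scores(history, epoch)
```

Both z-scores stay below 1 in magnitude, yet T² is large, because the epoch departs from the correlation structure that univariate tests ignore.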

Relevance: 10.00%

Abstract:

Purpose. The purpose of this article was to present methods capable of estimating the size and shape of the human eye lens without resorting to phakometry or magnetic resonance imaging (MRI). Methods. Previously published biometry and phakometry data of 66 emmetropic eyes of 66 subjects (age range [18, 63] years, spherical equivalent range [−0.75, +0.75] D) were used to define multiple linear regressions for the radii of curvature and thickness of the lens, from which the lens refractive index could be derived. MRI biometry was also available for a subset of 30 subjects, from which regressions could be determined for the vertex radii of curvature, conic constants, equatorial diameter, volume, and surface area. All regressions were compared with the phakometry and MRI data; the radii of curvature regressions were also compared with a method proposed by Bennett and Royston et al. Results. The regressions were in good agreement with the original measurements. This was especially the case for the regressions of lens thickness, volume, and surface area, which each had an R² > 0.6. The regression for the posterior radius of curvature had an R² < 0.2, making this regression unreliable. For all other regressions we found 0.25 < R² < 0.6. The Bennett-Royston method also produced a good estimation of the radii of curvature, provided its parameters were adjusted appropriately. Conclusions. The regressions presented in this article offer a valuable alternative when no measured lens biometry values are available; however, care must be taken with possible outliers.