15 results for outlier
in Queensland University of Technology - ePrints Archive
Abstract:
Purpose: To examine the influence of two different fast-start pacing strategies on performance and oxygen consumption (V̇O2) during cycle ergometer time trials lasting ∼5 min. Methods: Eight trained male cyclists performed four cycle ergometer time trials whereby the total work completed (113 ± 11.5 kJ; mean ± SD) was identical to the better of two 5-min self-paced familiarization trials. During the performance trials, initial power output was manipulated to induce either an all-out or a fast start. Power output during the first 60 s of the fast-start trial was maintained at 471.0 ± 48.0 W, whereas the all-out start approximated a maximal starting effort for the first 15 s (mean power: 753.6 ± 76.5 W) followed by 45 s at a constant power output (376.8 ± 38.5 W). Irrespective of starting strategy, power output was controlled so that participants would complete the first quarter of the trial (28.3 ± 2.9 kJ) in 60 s. Participants performed two trials using each condition, with their fastest time trial compared. Results: Performance time was significantly faster when cyclists adopted the all-out start (4 min 48 s ± 8 s) compared with the fast start (4 min 51 s ± 8 s; P < 0.05). The first-quarter V̇O2 during the all-out start trial (3.4 ± 0.4 L·min⁻¹) was significantly higher than during the fast-start trial (3.1 ± 0.4 L·min⁻¹; P < 0.05). After removal of an outlier, the percentage increase in first-quarter V̇O2 was significantly correlated (r = -0.86, P < 0.05) with the relative difference in finishing time. Conclusions: An all-out start produces superior middle-distance cycling performance when compared with a fast start. The improvement in performance may be due to a faster V̇O2 response rather than time saved due to a rapid acceleration.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to give slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well as by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. 
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. 
In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.
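The abstract's point that misclassification rate depends on class distribution, while the Kappa statistic corrects for it, can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the study; the functions are generic textbook definitions, not the study's own tooling.

```python
from collections import Counter

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one label on both sides
    return (observed - expected) / (1.0 - expected)

# Hypothetical imbalanced sample: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
# A one-leaf model that always predicts the majority class.
y_majority = [0] * 100

mr = misclassification_rate(y_true, y_majority)
kappa = cohen_kappa(y_true, y_majority)
print(f"MR = {mr:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the majority-class predictor has a low misclassification rate simply because positives are rare, yet its kappa shows no agreement beyond chance, which is the kind of discrepancy the evaluation strategy in the abstract is designed to expose.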
Abstract:
Neoproterozoic glacigenic formations are preserved in the Kimberley region and northwestern Northern Territory of northern Australia. They are distributed in the west Kimberley adjacent to the northern margins of the King Leopold Orogen, the Mt Ramsay area at the junction of the King Leopold and Halls Creek Orogens, and the east Kimberley, adjacent to the eastern margin of the Halls Creek Orogen. Small outlier glacigenic deposits are preserved in the Litchfield Province, Northern Territory (Uniya Formation) and Georgina Basin, western Queensland (Little Burke Formation). Glacigenic strata comprise diamictite, conglomerate, sandstone and pebbly mudstone and characterize the Walsh, Landrigan and Fargoo/Moonlight Valley formations. Thin units of laminated dolomite sit conformably at the top of the Walsh, Landrigan and Moonlight Valley formations. Glacigenic units are also interbedded with the carbonate platform deposits of the Egan Formation and Boonall Dolomite. δ13C data are available for all carbonate units. There is no direct chronological constraint on these successions. Dispute over regional correlation of the Neoproterozoic succession has been largely resolved through biostratigraphic, chemostratigraphic and lithostratigraphic analysis. However, palaeomagnetic results from the Walsh Formation are inconsistent with sedimentologically based correlations. Two stratigraphically defined glaciations are preserved in northwestern Australia: the ‘Landrigan Glaciation’, characterized by southwest-directed continental ice-sheet movement and correlated with late Cryogenian glaciation elsewhere in Australia and the world; and, the ‘Egan Glaciation’, a more localized glaciation of the Ediacaran Period. Future research focus should include chronology, palaeomagnetic constraint and tectonostratigraphic controls on deposition.
Abstract:
Background Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. Results We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. Conclusions We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. 
mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
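As a rough illustration of the idea behind outlier profile analysis, the sketch below applies a COPA-style transformation to one gene: values are centred on the median, scaled by the median absolute deviation (MAD), and then summarised by an upper and a lower percentile so that both over- and under-expressed outliers are scored. This is a simplified sketch under stated assumptions, not the mCOPA implementation; the function names, the 90th/10th percentile choice and the expression values are all hypothetical.

```python
import statistics

def copa_transform(values):
    """Median-centre and MAD-scale one gene's expression values."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread: nothing can stand out
    return [(v - med) / mad for v in values]

def nearest_rank(sorted_vals, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    k = max(1, (q * len(sorted_vals) + 99) // 100)  # integer ceiling
    return sorted_vals[k - 1]

def outlier_scores(values, upper_q=90, lower_q=10):
    """Return (over-expression score, under-expression score) for one gene."""
    z = sorted(copa_transform(values))
    return nearest_rank(z, upper_q), nearest_rank(z, lower_q)

# Hypothetical genes: most samples sit near 5; two samples deviate strongly.
gene_up = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.1, 9.0, 9.5]
gene_down = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.1, 1.0, 0.5]

up_hi, up_lo = outlier_scores(gene_up)
down_hi, down_lo = outlier_scores(gene_down)
print(f"gene_up:   upper={up_hi:.1f}, lower={up_lo:.1f}")
print(f"gene_down: upper={down_hi:.1f}, lower={down_lo:.1f}")
```

Ranking genes by the upper score would surface candidates over-expressed in a subset of samples, while ranking by the lower score would surface under-expressed candidates, such as possible tumour suppressors.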
Abstract:
This thesis describes the development of a robust and novel prototype to address the data quality problems that relate to the dimension of outlier data. It thoroughly investigates the associated problems with regard to detecting, assessing and determining the severity of the problem of outlier data, and proposes alternative techniques based on granule mining to significantly improve the effectiveness of mining and assessing outlier data.
Abstract:
We describe a new species of dasyurid marsupial within the genus Antechinus that was previously known as a northern outlier of Dusky Antechinus (A. swainsonii). The Black-tailed Antechinus, Antechinus arktos sp. nov., is known only from areas of high altitude and high rainfall on the Tweed Volcano caldera of far south-east Queensland and north-east New South Wales, Australia. Antechinus arktos formerly sheltered under the taxonomic umbrella of A. swainsonii mimetes, the widespread mainland form of Dusky Antechinus. With the benefit of genetic hindsight, some striking morphological differences are herein resolved: A. s. mimetes is more uniformly deep brown-black to grizzled grey-brown from head to rump, with brownish (clove brown—raw umber) hair on the upper surface of the hindfoot and tail, whereas A. arktos is more vibrantly coloured, with a marked change from greyish-brown head to orange-brown rump, fuscous black on the upper surface of the hindfoot and dense, short fur on the evenly black tail. Further, A. arktos has marked orange-brown fur on the upper and lower eyelid, cheek and in front of the ear and very long guard hairs all over the body; these characters are more subtle in A. s. mimetes. There are striking genetic differences between the two species: at mtDNA, A. s. mimetes from north-east New South Wales is 10% divergent to A. arktos from its type locality at Springbrook NP, Queensland. In contrast, the Ebor A. s. mimetes clades closely with conspecifics from ACT and Victoria. A. arktos skulls are strikingly different to all subspecies of A. swainsonii. A. arktos are markedly larger than A. s. mimetes and A. s. swainsonii (Tasmania) for a range of craniodental measures. Antechinus arktos were historically found at a few proximate mountainous sites in south-east Queensland, and have only recently been recorded from or near the type locality. Even there, the species is likely in low abundance. 
The Black-tailed Antechinus has plausibly been detrimentally affected by climate change in recent decades, and will be at further risk with increasing warming trends.
Abstract:
A systematic literature review and a comprehensive meta-analysis that combines the findings from existing studies were conducted in this thesis to analyse the impact of traffic characteristics on crash occurrence. Sensitivity analyses were conducted to investigate the quality, publication bias and outlier bias of the various studies, and the time intervals used to measure traffic characteristics were considered. Based on this comprehensive and systematic review, and the results of the subsequent meta-analysis, major issues in study design, traffic and crash data, and model development and evaluation are discussed.
Abstract:
The ability to function in a nocturnal and ground-dwelling niche requires a unique set of sensory specializations. The New Zealand kiwi has shifted away from vision, instead relying on auditory and tactile stimuli to function in its environment and locate prey. Behavioral evidence suggests that kiwi also rely on their sense of smell, using olfactory cues in foraging and possibly also in communication and social interactions. Anatomical studies appear to support these observations: the olfactory bulbs and tubercles have been suggested to be large in the kiwi relative to other birds, although the extent of this enlargement is poorly understood. In this study, we examine the size of the olfactory bulbs in kiwi and compare them with 55 other bird species, including emus, ostriches, rheas, tinamous, and 2 extinct species of moa (Dinornithiformes). We also examine the cytoarchitecture of the olfactory bulbs and olfactory epithelium to determine if any neural specializations beyond size are present that would increase olfactory acuity. Kiwi were a clear outlier in our analysis, with olfactory bulbs that are proportionately larger than those of any other bird in this study. Emus, close relatives of the kiwi, also had a relative enlargement of the olfactory bulbs, possibly supporting a phylogenetic link to well-developed olfaction. The olfactory bulbs in kiwi are almost in direct contact with the olfactory epithelium, which is indeed well developed and complex, with olfactory receptor cells occupying a large percentage of the epithelium. The anatomy of the kiwi olfactory system supports an enhancement for olfactory sensitivities, which is undoubtedly associated with their unique nocturnal niche.
Abstract:
This paper presents a technique for the automated removal of noise from process execution logs. Noise is the result of data quality issues such as logging errors and manifests itself in the form of infrequent process behavior. The proposed technique generates an abstract representation of an event log as an automaton capturing the directly-follows relations between event labels. This automaton is then pruned of arcs with low relative frequency and used to remove from the log those events not fitting the automaton, which are identified as outliers. The technique has been extensively evaluated on top of various automated process discovery algorithms using both artificial logs with different levels of noise, as well as a variety of real-life logs. The results show that the technique significantly improves the quality of the discovered process model along fitness, appropriateness and simplicity, without negative effects on generalization. Further, the technique scales well to large and complex logs.
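The filtering idea described above can be sketched roughly as follows. This is a deliberately simplified illustration, not the paper's algorithm: it assumes that "relative frequency" means an arc's share of all arcs leaving the same source label, and it removes non-fitting events with a greedy replay rather than whatever fitting procedure the authors use. The `START` label, the threshold and the toy log are hypothetical.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical artificial label marking the start of a trace

def directly_follows(log):
    """Count how often each label directly follows another across all traces."""
    arcs = Counter()
    for trace in log:
        prev = START
        for label in trace:
            arcs[(prev, label)] += 1
            prev = label
    return arcs

def prune(arcs, threshold):
    """Keep arcs whose share of their source's outgoing arcs meets the threshold."""
    out_totals = defaultdict(int)
    for (src, _), count in arcs.items():
        out_totals[src] += count
    return {arc for arc, count in arcs.items()
            if count / out_totals[arc[0]] >= threshold}

def filter_log(log, kept_arcs):
    """Greedy replay: drop any event not reachable from the last kept event."""
    cleaned = []
    for trace in log:
        prev, kept = START, []
        for label in trace:
            if (prev, label) in kept_arcs:
                kept.append(label)
                prev = label
        cleaned.append(kept)
    return cleaned

# Toy log: nine regular traces and one trace with a spurious event "x".
log = [["a", "b", "c"]] * 9 + [["a", "x", "b", "c"]]
kept_arcs = prune(directly_follows(log), threshold=0.2)
cleaned = filter_log(log, kept_arcs)
print(cleaned[-1])
```

In the toy log the arc from "a" to "x" accounts for only one of ten arcs leaving "a", so it is pruned and the event "x" is dropped from the last trace while the remaining events are kept.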
Abstract:
The development of methods for real-time crash prediction as a function of current or recent traffic and roadway conditions is gaining increasing attention in the literature. Numerous studies have modeled the relationships between traffic characteristics and crash occurrence, and significant progress has been made. Given the accumulated evidence on this topic and the lack of an articulate summary of research status, challenges, and opportunities, there is an urgent need to scientifically review these studies and to synthesize the existing state-of-the-art knowledge. This paper addresses this need by undertaking a systematic literature review to identify current knowledge, challenges, and opportunities, and then conducts a meta-analysis of existing studies to provide a summary impact of traffic characteristics on crash occurrence. Sensitivity analyses were conducted to assess quality, publication bias, and outlier bias of the various studies; and the time intervals used to measure traffic characteristics were also considered. As a result of this comprehensive and systematic review, issues in study designs, traffic and crash data, and model development and validation are discussed. Outcomes of this study are intended to provide researchers focused on real-time crash prediction with greater insight into the modeling of this important but extremely challenging safety issue.
Abstract:
Opsins are ancient molecules that enable animal vision by coupling to a vitamin-derived chromophore to form light-sensitive photopigments. The primary drivers of evolutionary diversification in opsins are thought to be visual tasks related to spectral sensitivity and color vision. Typically, only a few opsin amino acid sites affect photopigment spectral sensitivity. We show that opsin genes of the North American butterfly Limenitis arthemis have diversified along a latitudinal cline, consistent with natural selection due to environmental factors. We sequenced single nucleotide polymorphisms (SNPs) in the coding regions of the ultraviolet (UVRh), blue (BRh), and long-wavelength (LWRh) opsin genes from ten butterfly populations along the eastern United States and found that a majority of opsin SNPs showed significant clinal variation. Outlier detection and analysis of molecular variance indicated that many SNPs are under balancing selection and show significant population structure. This contrasts with what we found by analysing SNPs in the wingless and EF-1 alpha loci, and from neutral amplified fragment length polymorphisms, which show no evidence of significant locus-specific or genome-wide structure among populations. Using a combination of functional genetic and physiological approaches, including expression in cell culture, transgenic Drosophila, UV-visible spectroscopy, and optophysiology, we show that key BRh opsin SNPs that vary clinally have almost no effect on spectral sensitivity. Our results suggest that opsin diversification in this butterfly is more consistent with natural selection unrelated to spectral tuning. Some of the clinally varying SNPs may instead play a role in regulating opsin gene expression levels or the thermostability of the opsin protein. Lastly, we discuss the possibility that insect opsins might have important, yet-to-be-elucidated adaptive functions in mediating animal responses to abiotic factors, such as temperature or photoperiod.
Abstract:
The output of a differential scanning fluorimetry (DSF) assay is a series of melt curves, which need to be interpreted to get value from the assay. An application that translates raw thermal melt curve data into more easily assimilated knowledge is described. This program, called “Meltdown,” conducts four main activities—control checks, curve normalization, outlier rejection, and melt temperature (Tm) estimation—and performs optimally in the presence of triplicate (or higher) sample data. The final output is a report that summarizes the results of a DSF experiment. The goal of Meltdown is not to replace human analysis of the raw fluorescence data but to provide a meaningful and comprehensive interpretation of the data to make this useful experimental technique accessible to inexperienced users, as well as providing a starting point for detailed analyses by more experienced users.
Abstract:
Problem The Manchester Driver Behaviour Questionnaire (DBQ) is the most commonly used self-report tool in traffic safety research and applied settings. It has been claimed that the violation factor of this instrument predicts accident involvement, which was supported by a previous meta-analysis. However, that analysis did not test for methodological effects, or include contacting researchers to obtain unpublished results. Method The present study re-analysed studies on prediction of accident involvement from DBQ factors, including lapses, and many unpublished effects. Tests of various types of dissemination bias and common method variance were undertaken. Results Outlier analysis showed that some effects were probably not reliable data, but excluding them did not change the results. For correlations between violations and crashes, tendencies for published effects to be larger than unpublished ones and for effects to decrease over time were observed, but were not significant. Also, analysis using the proxy of the mean of accidents in studies indicated that studies where effects for violations are unknown have smaller effect sizes. These differences indicate dissemination bias. Studies using self-reported accidents as dependent variables had much larger effects than those using recorded accident data. Also, zero-order correlations were larger than partial correlations that controlled for exposure. Similarly, violations/accidents effects were strong only when there was also a strong correlation between accidents and exposure. Overall, the true effect is probably very close to zero (r<.07) for violations versus traffic accident involvement, depending upon which systematic tendencies in the data are controlled for. Conclusions: Methodological factors and dissemination bias have inflated the mean effect size of the DBQ in the published literature. Strong evidence of various artefactual effects is apparent. 
Practical Applications A greater level of care should be taken if the DBQ continues to be used in traffic safety research. Also, validation of self-reports should be more comprehensive in the future, taking into account the possibility of common method variance.
Abstract:
We report sensitive high mass resolution ion microprobe, stable isotopes (SHRIMP SI) multiple sulfur isotope analyses (32S, 33S, 34S) to constrain the sources of sulfur in three Archean VMS deposits (Teutonic Bore, Bentley, and Jaguar) from the Teutonic Bore volcanic complex of the Yilgarn Craton, Western Australia, together with sedimentary pyrites from associated black shales and interpillow pyrites. The pyrites from VMS mineralization are dominated by mantle sulfur but include a small amount of slightly negative mass-independent fractionation (MIF) anomalies, whereas sulfur from the pyrites in the sedimentary rocks has pronounced positive MIF, with ∆33S values that lie between 0.19 and 6.20‰ (with one outlier at −1.62‰). The wall rocks to the mineralization include sedimentary rocks that have contributed no detectable positive MIF sulfur to the VMS deposits, which is difficult to reconcile with the leaching model for the formation of these deposits. The sulfur isotope data are best explained by mixing between sulfur derived from a magmatic-hydrothermal fluid and seawater sulfur as represented by the interpillow pyrites. The massive sulfide lens pyrites have a weighted mean ∆33S value of −0.27 ± 0.05‰ (MSWD = 1.6), nearly identical to the −0.31 ± 0.08‰ (MSWD = 2.4) for pyrites from the stringer zone, which requires mixing to have occurred below the sea floor. We employed a two-component mixing model to estimate the contribution of seawater sulfur to the total sulfur budget of the two Teutonic Bore volcanic complex VMS deposits. The results are 15 to 18% for both Teutonic Bore and Bentley, much higher than the 3% obtained by Jamieson et al. (2013) for the giant Kidd Creek deposit. Similar calculations carried out for other Neoarchean VMS deposits give values between 2% and 30%, which are similar to those for modern hydrothermal VMS deposits. 
We suggest that multiple sulfur isotope analyses may be used to predict the size of Archean VMS deposits and to provide a vector to ore deposits, but further studies are needed to test these suggestions.
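A two-component mixing model of the kind mentioned above can be written, in its generic textbook form, as a mass balance on ∆33S. The expression below is a sketch that assumes linear mixing between a seawater end-member (subscript sw) and a magmatic end-member (subscript mag); the symbols and subscripts are introduced here for illustration, and the abstract does not state the end-member values the authors used.

```latex
% f is the fraction of seawater-derived sulfur in the mixture (assumed notation)
\Delta^{33}\mathrm{S}_{\mathrm{mix}}
  = f\,\Delta^{33}\mathrm{S}_{\mathrm{sw}}
  + (1 - f)\,\Delta^{33}\mathrm{S}_{\mathrm{mag}}
\quad\Longrightarrow\quad
f = \frac{\Delta^{33}\mathrm{S}_{\mathrm{mix}} - \Delta^{33}\mathrm{S}_{\mathrm{mag}}}
         {\Delta^{33}\mathrm{S}_{\mathrm{sw}} - \Delta^{33}\mathrm{S}_{\mathrm{mag}}}
```

Under this form, a measured mean ∆33S that sits close to the magmatic end-member gives a small seawater fraction f, and the estimate is only as good as the end-member values chosen.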
Abstract:
We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
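For orientation, the standard scale mixture of Gaussians that this abstract generalises can be written with a single scalar weight, as below. This is the usual textbook form rather than the authors' new family, and the notation (M for the dimension, W for the weight, ν for the degrees of freedom) is an assumption made here; the abstract's contribution is to replace the single scalar weight with a multidimensional one so that tail weight can differ between margins.

```latex
% Scale mixture of Gaussians with one scalar weight w shared by all dimensions
p(\mathbf{y}) = \int_{0}^{\infty}
  \mathcal{N}_{M}\!\left(\mathbf{y};\ \boldsymbol{\mu},\ \boldsymbol{\Sigma}/w\right)
  f_{W}(w)\,\mathrm{d}w
% With W ~ Gamma(shape = \nu/2, rate = \nu/2), p(y) is the multivariate t
% distribution with \nu degrees of freedom.
```

Because one weight scales the whole covariance matrix here, every margin shares the same tail behaviour, which is the restriction the proposed family is meant to lift.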