7 results for random forest data analysis
Abstract:
Algorithms for handling concept drift are important for various applications, including video analysis and smart grids. In this paper we present a decision tree ensemble classification method for concept drift based on the Random Forest algorithm. A weighted majority voting rule is employed for ensemble aggregation, based on the ideas of the Accuracy Weighted Ensemble (AWE) method. In our case, each base learner's weight is computed for each evaluated sample using the base learner's accuracy and the intrinsic proximity measure of Random Forest. Our algorithm exploits both temporal weighting of samples and ensemble pruning as forgetting strategies. We present results of an empirical comparison of our method with the original Random Forest with incorporated replace-the-loser forgetting and with other state-of-the-art concept-drift classifiers such as AWE2.
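As a rough illustration of the aggregation rule this abstract describes, the sketch below implements accuracy-weighted majority voting over a pool of trees. It deliberately omits the paper's proximity-based weighting and forgetting strategies; all data and names are hypothetical, so treat it as a minimal sketch rather than the authors' algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a drifting stream: "old" data to train on,
# "new" data to evaluate on.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a small pool of base learners on bootstrap samples of the older data.
rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    idx = rng.integers(0, len(X_old), len(X_old))
    trees.append(DecisionTreeClassifier(max_depth=5).fit(X_old[idx], y_old[idx]))

# AWE-style idea: weight each tree by its accuracy on the most recent
# window of labelled data (here, the first 200 new samples).
window = slice(0, 200)
weights = np.array(
    [accuracy_score(y_new[window], t.predict(X_new[window])) for t in trees]
)

# Weighted majority vote: sum each tree's class-probability vector
# scaled by that tree's weight, then take the argmax class.
proba = sum(w * t.predict_proba(X_new) for w, t in zip(weights, trees))
y_pred = proba.argmax(axis=1)
print("ensemble accuracy:", accuracy_score(y_new, y_pred))
```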
Abstract:
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to the calibration of geochemical data and to multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw’ format and require processing to remove instrument bias, as well as informed sequence interpretation. The applicability of conventional calibration equations to core-scanning XRF data is further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as by matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences, focusing mainly on energy-dispersive XRF (ED-XRF) core-scanning. This study applied high-resolution core-scanning XRF to Holocene sedimentary sequences from the tide-dominated Indian Sundarbans (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition, providing a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
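To make the calibration step concrete, the sketch below fits a reduced major axis (RMA) regression between log-ratios of scanner counts and log-ratios of conventionally measured concentrations. The element pair, variable names and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rma_fit(x, y):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * (np.std(y, ddof=1) / np.std(x, ddof=1))
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

# Synthetic paired measurements: ln(Fe/Ca) from conventional ED-XRF (the
# "truth") and the corresponding core-scanner log-ratio with bias and noise.
rng = np.random.default_rng(1)
true_log_ratio = rng.normal(0.5, 0.4, 50)
scanner_log_ratio = 0.8 * true_log_ratio + 0.1 + rng.normal(0, 0.05, 50)

slope, intercept = rma_fit(scanner_log_ratio, true_log_ratio)

# Calibrate a new scanner log-ratio back to an estimated concentration ratio.
new_scan = 0.6
est_ln_fe_ca = slope * new_scan + intercept
print("estimated Fe/Ca ratio:", np.exp(est_ln_fe_ca))
```

RMA is used here, as in the abstract, because both variables carry measurement error, so ordinary least squares (which assumes an error-free predictor) would bias the slope.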
Abstract:
A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional principal component analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis, were used to determine the influence of the underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was explained by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this on classification analysis.
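The clr-then-PCA workflow described above is simple to sketch. The example below applies the centered log-ratio transform to a synthetic composition (rows summing to 1, as compositional data must) before PCA; the five "elements" are placeholders, not the Tellus variables.

```python
import numpy as np
from sklearn.decomposition import PCA

def clr(comp):
    """Centered log-ratio transform: log of each part over the
    geometric mean of its row, removing the closure constraint."""
    log_comp = np.log(comp)
    return log_comp - log_comp.mean(axis=1, keepdims=True)

# Synthetic compositional data: 500 samples over 5 parts, each row sums to 1.
rng = np.random.default_rng(2)
parts = rng.dirichlet(alpha=[4, 3, 2, 1, 1], size=500)

pca = PCA(n_components=4).fit(clr(parts))
print("variance explained by first 4 PCs:",
      pca.explained_variance_ratio_.sum())
```

Running PCA on the raw proportions instead would mix genuine geochemical structure with spurious negative correlations induced by the constant-sum constraint, which is the closure issue the abstract mentions.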
Abstract:
BACKGROUND & AIMS: Gluteofemoral obesity (determined by measurement of subcutaneous fat in hip and thigh regions) could reduce risks of cardiovascular and diabetic disorders associated with abdominal obesity. We evaluated whether gluteofemoral obesity also reduces risk of Barrett's esophagus (BE), a premalignant lesion associated with abdominal obesity.
METHODS: We collected data from non-Hispanic white participants in 8 studies in the Barrett's and Esophageal Adenocarcinoma Consortium. We compared measures of hip circumference (as a proxy for gluteofemoral obesity) from cases of BE (n=1559) separately with 2 control groups: 2557 population-based controls and 2064 individuals with gastroesophageal reflux disease (GERD controls). Study-specific odds ratios (ORs) and 95% confidence intervals (95% CIs) were estimated from individual participant data using multivariable logistic regression and combined using random-effects meta-analysis.
RESULTS: We found an inverse relationship between hip circumference and BE (OR per 5 cm increase, 0.88; 95% CI, 0.81-0.96), compared with population-based controls in a multivariable model that included waist circumference. This association was not observed in models that did not include waist circumference. Similar results were observed in analyses stratified by frequency of GERD symptoms. The inverse association with hip circumference was only statistically significant among men (vs population-based controls: OR, 0.85; 95% CI, 0.76-0.96 for men; OR, 0.93; 95% CI, 0.74-1.16 for women). For men, within each category of waist circumference, a larger hip circumference was associated with decreased risk of BE. Increasing waist circumference was associated with increased risk of BE in the mutually adjusted population-based and GERD control models.
CONCLUSIONS: Although abdominal obesity is associated with increased risk of BE, there is an inverse association between gluteofemoral obesity and BE, particularly among men.
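A hedged sketch of the two-stage analysis described in METHODS follows: a per-study log odds ratio from multivariable logistic regression, pooled with a DerSimonian-Laird random-effects model. The data are simulated and the variable names are illustrative; this is not the consortium's analysis code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
log_ors, variances = [], []
for _ in range(8):  # 8 studies, as in the consortium
    n = 400
    hip = rng.normal(100, 10, n)    # hip circumference, cm
    waist = rng.normal(90, 12, n)   # waist circumference, cm
    logit = -1 + 0.04 * (waist - 90) - 0.03 * (hip - 100)
    case = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    # Mutually adjusted model: hip and waist scaled to per-5 cm units.
    X = sm.add_constant(np.column_stack([hip / 5, waist / 5]))
    fit = sm.Logit(case, X).fit(disp=0)
    log_ors.append(fit.params[1])            # hip coefficient = log OR per 5 cm
    variances.append(fit.cov_params()[1, 1])

log_ors, variances = np.array(log_ors), np.array(variances)

# DerSimonian-Laird: estimate between-study variance tau^2, then pool.
w = 1 / variances
fixed = np.sum(w * log_ors) / w.sum()
q = np.sum(w * (log_ors - fixed) ** 2)
tau2 = max(0.0, (q - (len(w) - 1)) / (w.sum() - np.sum(w**2) / w.sum()))
w_re = 1 / (variances + tau2)
pooled = np.sum(w_re * log_ors) / w_re.sum()
se = np.sqrt(1 / w_re.sum())
print(f"pooled OR per 5 cm hip: {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96 * se):.2f}-"
      f"{np.exp(pooled + 1.96 * se):.2f})")
```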
Abstract:
The application of custom classification techniques and posterior probability modeling (PPM) using Worldview-2 multispectral imagery to archaeological field survey is presented in this paper. The research focuses on the identification of Neolithic felsite stone tool workshops in the North Mavine region of the Shetland Islands in Northern Scotland. Sample data from known workshops surveyed using differential GPS are used alongside known non-sites to train a linear discriminant analysis (LDA) classifier based on a combination of datasets, including Worldview-2 bands, band difference ratios (BDR) and topographical derivatives. Principal components analysis is further used to test for and reduce dimensionality caused by redundant datasets. Probability models were generated by LDA using principal components and tested against sites identified through geological field survey. Testing shows the prospective ability of this technique, with significance levels between 0.05 and 0.01 and gain statistics between 0.90 and 0.94, higher than those obtained using maximum likelihood and random forest classifiers. Results suggest that this approach is best suited to relatively homogeneous site types and performs better with correlated data sources. Finally, by combining posterior probability models and least-cost analysis, a survey least-cost efficacy model is generated, showing the utility of such approaches to archaeological field survey.
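The classification core of this workflow, PCA to handle correlated inputs followed by LDA posterior probabilities, can be sketched as below. The features are simulated stand-ins for the Worldview-2 bands, band-difference ratios and topographic derivatives; numbers of bands and samples are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n_feat = 12  # e.g. 8 spectral bands plus BDRs and topographic derivatives

# Training pixels over surveyed workshops (sites) and known non-sites.
sites = rng.normal(0.4, 0.2, (60, n_feat))
non_sites = rng.normal(0.0, 0.2, (200, n_feat))
X = np.vstack([sites, non_sites])
y = np.r_[np.ones(len(sites)), np.zeros(len(non_sites))]

# PCA first reduces the redundancy among correlated inputs; LDA then
# supplies class posteriors rather than hard labels.
model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
model.fit(X, y)

# Posterior probability surface for unvisited pixels: P(site | features).
new_pixels = rng.normal(0.2, 0.3, (1000, n_feat))
posterior = model.predict_proba(new_pixels)[:, 1]
print("pixels with P(site) > 0.9:", int((posterior > 0.9).sum()))
```

The posterior map, rather than a binary classification, is what makes the survey-targeting and least-cost combination described at the end of the abstract possible.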
Abstract:
Background: There is increasing interest in how culture may affect the quality of healthcare services, and previous research has shown that ‘treatment culture’—of which there are three categories (resident centred, ambiguous and traditional)—in a nursing home may influence prescribing of psychoactive medications.
Objective: The objective of this study was to explore and understand treatment culture in prescribing of psychoactive medications for older people with dementia in nursing homes.
Method: Six nursing homes—two from each treatment culture category—participated in this study. Qualitative data were collected through semi-structured interviews with nursing home staff and general practitioners (GPs), which sought to determine participants’ views on prescribing and administration of psychoactive medication, and their understanding of treatment culture and its potential influence on prescribing of psychoactive drugs. Following verbatim transcription, the data were analysed and themes were identified, facilitated by NVivo and discussion within the research team.
Results: Interviews took place with five managers, seven nurses, 13 care assistants and two GPs. Four themes emerged: the characteristics of the setting, the characteristics of the individual, relationships, and decision making. The characteristics of the setting were exemplified by views of the setting, daily routines and staff training. The characteristics of the individual were demonstrated by views on the personhood of residents and staff attitudes. Relationships varied between staff within and outside the home, and these relationships appeared to influence decision making about prescribing of medications. The data analysis found that each home exhibited traits indicative of its assigned treatment culture.
Conclusion: Nursing home treatment culture appeared to be influenced by four main themes. Modification of these factors may lead to a shift towards a more flexible, resident-centred culture and a reduction in prescribing and use of psychoactive medication.
Abstract:
Tide gauge data are identified as legacy data, given the radical transition in observation method and required output format associated with tide gauges over the 20th century. Observed water level variation through tide-gauge records is regarded as the only significant basis for determining recent historical variation (decade to century) in mean sea level and storm surge. There are limited tide gauge records that cover the 20th century, such that the Belfast (UK) Harbour tide gauge would be a strategic long-term (110-year) record if the full paper-based records (marigrams) were digitally restructured to allow for consistent data analysis. This paper presents a methodology for extracting a consistent time series of observed water levels from the five different Belfast Harbour tide-gauge positions/machine types, starting in late 1901. Tide-gauge data were digitally retrieved from the original analogue (daily) records by scanning the marigrams and then extracting the sequential tidal elevations with graph-line-seeking software (Ungraph™). This automation of signal extraction allowed the full Belfast series to be retrieved quickly, relative to any manual x–y digitisation of the signal. Restructuring variable-length tidal data sets to a consistent daily, monthly and annual file format was undertaken by project-developed software: Merge&Convert and MergeHYD allow consistent water-level sampling at both 60 min (the past standard) and 10 min intervals, the latter enhancing surge measurement. The Belfast tide-gauge data have been rectified, validated and quality controlled (IOC 2006 standards). The result is a consistent annual-based legacy data series for Belfast Harbour that includes over 2 million tidal-level observations.
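As a loose illustration of the restructuring step (not the project's Merge&Convert or MergeHYD tools), the sketch below resamples a digitised water-level series onto consistent 60-minute and 10-minute grids. The source interval, tidal signal and dates are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic digitised record: water levels at a 7-minute spacing standing in
# for the irregular points extracted from scanned marigrams.
rng = np.random.default_rng(5)
t = pd.date_range("1901-11-01", periods=500, freq="7min")
m2_period_min = 12.42 * 60  # principal lunar semidiurnal constituent
level = 2.0 + 1.5 * np.sin(np.arange(500) * 2 * np.pi * 7 / m2_period_min)
raw = pd.Series(level + rng.normal(0, 0.02, 500), index=t)

# Resample onto the two target intervals used in the paper; empty bins are
# filled by time-weighted linear interpolation, one simple choice among many.
hourly = raw.resample("60min").mean().interpolate("time")
ten_min = raw.resample("10min").mean().interpolate("time")

print(hourly.head(3))
print(ten_min.head(3))
```

The 10-minute grid matters because storm surges evolve on sub-hourly timescales that an hourly series (the past standard) can smooth over.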