941 results for random forest
Abstract:
The Zagros oak forests in Western Iran are critically important to the sustainability of the region. These forests have undergone dramatic declines in recent decades. We evaluated the utility of the non-parametric Random Forest classification algorithm for land cover classification of Zagros landscapes, and selected the best spatial and spectral predictive variables. The algorithm yielded high overall classification accuracies (>85%) and equivalent classification accuracies for the datasets from the three different sensors. We evaluated the associations between trends in forest area and structure and trends in socioeconomic and climatic conditions, to identify the most likely driving forces of deforestation and landscape structure change. We used available socioeconomic (urban and rural population, and rural income) and climatic (mean annual rainfall and mean annual temperature) data for two provinces in northern Zagros. The driving force most strongly correlated with forest area loss was urban population, followed to a lesser extent by climatic variables. Landscape structure changes were more closely associated with rural population. We examined the effects of scale changes on the results of spatial pattern analysis. We assessed the impacts of eight years of protection in a protected area in northern Zagros at two different scales (both grain and extent). The effects of protection on the amount and structure of forests were scale dependent. We evaluated the nature and magnitude of changes in forest area and structure over the entire Zagros region from 1972 to 2009. We divided the Zagros region into 167 landscape units and developed two measures, Deforestation Sensitivity (DS) and Connectivity Sensitivity (CS), for each landscape unit, defined as the percentage of time steps in which forest area (for DS) or ECA (for CS) decreased by more than 10%.
A considerable loss in forest area and connectivity was detected, but no sudden (nonlinear) changes were detected at the spatial and temporal scale of the study. Connectivity loss occurred more rapidly than forest loss due to the loss of connecting patches. More connectivity was lost in southern Zagros due to climatic differences and different forms of traditional land use.
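The sensitivity measures described above can be illustrated with a minimal sketch. It assumes that each decrease is measured relative to the previous time step, which the abstract does not state; the function name and the forest-area values are hypothetical.

```python
def sensitivity_score(series, threshold=0.10):
    """Percent of time steps in which the value drops by more than
    `threshold` relative to the previous time step (assumed baseline)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    steps = list(zip(series[:-1], series[1:]))
    drops = sum(
        1 for prev, curr in steps
        if prev > 0 and (prev - curr) / prev > threshold
    )
    return 100.0 * drops / len(steps)


# Hypothetical forest area (ha) of one landscape unit at five dates.
forest_area = [1000.0, 950.0, 800.0, 790.0, 600.0]
ds = sensitivity_score(forest_area)
print(f"Deforestation Sensitivity: {ds:.1f}%")
```

The same function could be applied to a connectivity series to obtain a CS-style score, again only under the assumptions named above.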
Abstract:
BACKGROUND: Periodontitis is the major cause of tooth loss in adults and is linked to systemic illnesses, such as cardiovascular disease and stroke. The development of rapid point-of-care (POC) chairside diagnostics has the potential for the early detection of periodontal infection and progression to identify incipient disease and reduce health care costs. However, validation of effective diagnostics requires the identification and verification of biomarkers correlated with disease progression. This clinical study sought to determine the ability of putative host- and microbially derived biomarkers to identify periodontal disease status from whole saliva and plaque biofilm. METHODS: One hundred human subjects were equally recruited into a healthy/gingivitis group or a periodontitis population. Whole saliva was collected from all subjects and analyzed using antibody arrays to measure the levels of multiple proinflammatory cytokines and bone resorptive/turnover markers. RESULTS: Salivary biomarker data were correlated to comprehensive clinical, radiographic, and microbial plaque biofilm levels measured by quantitative polymerase chain reaction (qPCR) for the generation of models for periodontal disease identification. Significantly elevated levels of matrix metalloproteinase (MMP)-8 and -9 were found in subjects with advanced periodontitis with Random Forest importance scores of 7.1 and 5.1, respectively. The generation of receiver operating characteristic curves demonstrated that permutations of salivary biomarkers and pathogen biofilm values augmented the prediction of disease category. Multiple combinations of salivary biomarkers (especially MMP-8 and -9 and osteoprotegerin) combined with red-complex anaerobic periodontal pathogens (such as Porphyromonas gingivalis or Treponema denticola) provided highly accurate predictions of periodontal disease category. Elevated salivary MMP-8 and T. denticola biofilm levels displayed robust combinatorial characteristics in predicting periodontal disease severity (area under the curve = 0.88; odds ratio = 24.6; 95% confidence interval: 5.2 to 116.5). CONCLUSIONS: Using qPCR and sensitive immunoassays, we identified host- and bacterially derived biomarkers correlated with periodontal disease. This approach offers significant potential for the discovery of biomarker signatures useful in the development of rapid POC chairside diagnostics for oral and systemic diseases. Studies are ongoing to apply this approach to the longitudinal predictions of disease activity.
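As a small worked check of the reported odds ratio and confidence interval, the sketch below assumes the interval is a Wald interval computed on the log-odds scale. The abstract does not say how the interval was derived, so this is only a consistency check; the helper names are hypothetical.

```python
import math


def implied_log_se(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a symmetric log-scale interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def log_midpoint(lower, upper):
    """Point estimate implied by the interval, i.e. its geometric mean."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


# Values reported in the abstract.
odds_ratio, ci_low, ci_high = 24.6, 5.2, 116.5

se = implied_log_se(ci_low, ci_high)
mid = log_midpoint(ci_low, ci_high)
print(f"implied SE of log(OR): {se:.3f}")
print(f"geometric midpoint of CI: {mid:.1f} (reported OR: {odds_ratio})")
```

The geometric midpoint of the interval lands very close to the reported odds ratio, which is what a log-scale Wald interval would produce.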
Abstract:
Most published genomewide association studies (GWAS) in sheep have investigated recessively inherited monogenic traits. The objective here was to assess the feasibility of performing GWAS for a dominant trait for which the genetic basis was already known. A total of 42 Manchega and Rasa Aragonesa sheep that segregate solid black or white coat pigmentation were genotyped using the SNP50 BeadChip. Previous analysis in Manchegas demonstrated a complete association between the pigmentation trait and alleles of the MC1R gene, setting an a priori expectation for GWAS. Multiple methods were used to identify and quantify the strength of population substructure between black and white animals, before allelic association testing was performed for 49 034 SNPs. Following correction for substructure, GWAS identified the most strongly associated SNP (s26449), which was also the SNP closest to the MC1R gene. The finding was strongly supported by the permutation tree-based random forest (RF) analysis. Importantly, GWAS also identified unlinked SNPs with only slightly higher p-values than that of s26449. Random forest analysis indicated these were false positives, suggesting that interpretation based on both approaches was beneficial. The results indicate that a combined analytical approach can be successful in studies where a modest number of animals are available and substantial population stratification exists.
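To make the idea of single-SNP allelic association testing concrete, here is a minimal sketch using a Pearson chi-square test on a 2x2 table of allele counts with a Bonferroni threshold for 49 034 tests. Both the choice of test and the Bonferroni correction are assumptions, since the abstract names neither, and the allele counts are hypothetical.

```python
import math


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and p-value for a 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    den = ((case_a + case_b) * (ctrl_a + ctrl_b)
           * (case_a + ctrl_a) * (case_b + ctrl_b))
    if den == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / den
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


N_SNPS = 49034                 # number of SNPs tested, from the abstract
bonferroni = 0.05 / N_SNPS     # assumed multiple-testing threshold

# Hypothetical allele counts: 21 black and 21 white animals (42 alleles each).
chi2, p = allelic_chi2(case_a=40, case_b=2, ctrl_a=4, ctrl_b=38)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, significant: {p < bonferroni}")
```

A test like this does not account for population substructure, which is why the study corrected for stratification and cross-checked hits with random forest analysis.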
Abstract:
Brain tumors are among the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short- vs. long-term survival), and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death to the gene expression profile directly. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
In addition, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
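For readers unfamiliar with false discovery rate control, the sketch below shows the Benjamini-Hochberg step-up procedure. This is a standard frequentist technique swapped in for illustration; it is not the Bayesian selection method the abstract proposes, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_vals)
print("selected gene indices:", selected)
```

Only the two smallest p-values pass their rank-specific thresholds here, even though five of them are below 0.05 individually.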
Abstract:
This work addresses the extraction of both pelvic and femoral surface models of a hip joint from CT data for computer-assisted pre-operative planning of hip arthroscopy. We present a method for fully automatic image segmentation of a hip joint. Our method works by combining fast random forest (RF) regression based landmark detection and atlas-based segmentation with articulated statistical shape model (aSSM) based hip joint reconstruction. The two fundamental contributions of our method are: (1) an improved fast Gaussian transform (IFGT) is used within the RF regression framework for fast and accurate landmark detection, which then allows for a fully automatic initialization of the atlas-based segmentation; and (2) aSSM based fitting is used to preserve hip joint structure and to avoid penetration between the pelvic and femoral models. Validation on 30 hip CT images shows that our method achieves high performance in segmenting the pelvis, left proximal femur, and right proximal femur surfaces, with an average accuracy of 0.59 mm, 0.62 mm, and 0.58 mm, respectively.
Abstract:
Extraction of surface models of a hip joint from CT data is a prerequisite step for computer assisted diagnosis and planning (CADP) of periacetabular osteotomy (PAO). Most existing CADP systems are based on manual segmentation, which is time-consuming and makes reproducible results hard to achieve. In this paper, we present a Fully Automatic CT Segmentation (FACTS) approach to simultaneously extract both pelvic and femoral models. Our approach works by combining fast random forest (RF) regression based landmark detection and multi-atlas based segmentation with articulated statistical shape model (aSSM) based fitting. The two fundamental contributions of our approach are: (1) an improved fast Gaussian transform (IFGT) is used within the RF regression framework for fast and accurate landmark detection, which then allows for a fully automatic initialization of the multi-atlas based segmentation; and (2) aSSM based fitting is used to preserve hip joint structure and to avoid penetration between the pelvic and femoral models. Taking manual segmentation as the ground truth, we evaluated the present approach on 30 hip CT images (60 hips) with a 6-fold cross validation. When the present approach was compared to manual segmentation, mean segmentation accuracies of 0.40, 0.36, and 0.36 mm were found for the pelvis, the left proximal femur, and the right proximal femur, respectively. When the models derived from both segmentations were used to compute the PAO diagnosis parameters, differences of 2.0 ± 1.5°, 2.1 ± 1.6°, and 3.5 ± 2.3% were found for anteversion, inclination, and acetabular coverage, respectively. The achieved accuracy is regarded as sufficient for our target clinical applications.
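The 6-fold cross validation on 30 CT images mentioned above can be sketched as follows. The sketch assumes that splitting happens at the image level (so both hips of one scan stay in the same fold) and that folds are formed after a seeded shuffle; neither detail is given in the abstract, and the function name is hypothetical.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_items=30, k=6))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training images, {len(test)} test images")
```

With 30 images and 6 folds, each round trains on 25 images and evaluates on the remaining 5, and every image is used for testing exactly once.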
Abstract:
Activities of daily living (ADL) are important for quality of life. They are indicators of cognitive health status and their assessment is a measure of independence in everyday living. ADL are difficult to reliably assess using questionnaires due to self-reporting biases. Various sensor-based (wearable, in-home, intrusive) systems have been proposed to successfully recognize and quantify ADL without relying on self-reporting. New classifiers required to classify sensor data are on the rise. We propose two ad-hoc classifiers that are based only on non-intrusive sensor data. METHODS: A wireless sensor system with ten sensor boxes was installed in the home of each of ten healthy subjects to collect ambient data over a duration of 20 consecutive days. A handheld protocol device and a paper logbook were also provided to the subjects. Eight ADL were selected for recognition. We developed two ad-hoc ADL classifiers, namely the rule based forward chaining inference engine (RBI) classifier and the circadian activity rhythm (CAR) classifier. The RBI classifier finds facts in data and matches them against the rules. The CAR classifier works within a framework to automatically rate routine activities to detect regular repeating patterns of behavior. For comparison, two state-of-the-art classifiers [Naïve Bayes (NB), Random Forest (RF)] have also been used. All classifiers were validated with the collected data sets for classification and recognition of the eight specific ADL. RESULTS: Out of a total of 1,373 ADL, the RBI classifier correctly determined 1,264 while missing 109, and the CAR classifier determined 1,305 while missing 68. The RBI and CAR classifiers recognized activities with an average sensitivity of 91.27% and 94.36%, respectively, outperforming both RF and NB. CONCLUSIONS: The performance of the classifiers varied significantly, which shows that the choice of classifier plays an important role in ADL recognition.
Both the RBI and CAR classifiers performed better than the existing state-of-the-art classifiers (NB, RF) on all ADL. Of the two ad-hoc classifiers, the CAR classifier was more accurate and is likely to be better suited than the RBI classifier for distinguishing and recognizing complex ADL.
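The counts reported in the results can be turned into pooled recognition rates with a few lines of arithmetic. This sketch pools all 1,373 ADL instances together; the resulting figures differ slightly from the reported average sensitivities (91.27% and 94.36%), which would be expected if those were averaged per activity, although the abstract does not say how they were averaged.

```python
def pooled_rate(correct, missed):
    """Percentage of instances recognized, pooled over all activities."""
    return 100.0 * correct / (correct + missed)


# Counts reported in the abstract (each pair sums to 1,373 ADL).
rbi = pooled_rate(correct=1264, missed=109)
car = pooled_rate(correct=1305, missed=68)

print(f"RBI pooled recognition rate: {rbi:.2f}%")
print(f"CAR pooled recognition rate: {car:.2f}%")
```

Pooled and per-activity averages generally diverge when some activities occur much more often than others.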
Abstract:
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. 
Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.
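The reported F-measure for the RF classifier can be reproduced from the reported precision and sensitivity, assuming it is the usual harmonic mean of the two (the balanced F1 score); the abstract does not define it explicitly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1 score)."""
    if precision + recall == 0:
        raise ValueError("precision and recall cannot both be zero")
    return 2 * precision * recall / (precision + recall)


# Precision and sensitivity (recall) of the RF classifier, in percent,
# as reported in the abstract.
f1 = f_measure(precision=74.41, recall=68.49)
print(f"F-measure: {f1:.2f}%")
```

The result agrees with the 71.33% stated in the abstract, which supports the harmonic-mean reading.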
Abstract:
Facial nerve segmentation plays an important role in surgical planning of cochlear implantation. Clinically available CBCT images are used for surgical planning. However, their relatively low resolution makes the facial nerve difficult to identify. In this work, we present a supervised learning approach to enhance facial nerve image information from CBCT. The approach, based on a multi-output random forest, was employed to learn the mapping between CBCT and micro-CT images. Evaluation was performed qualitatively and quantitatively by using the predicted image as input to OtoPlan, a previously published dedicated facial nerve segmentation and cochlear implantation surgical planning software. Results show the potential of the proposed approach to improve facial nerve image quality as imaged by CBCT and to improve its segmentation using OtoPlan.
Abstract:
MRSI grids frequently show spectra with poor quality, mainly because of the high sensitivity of MRS to field inhomogeneities. These poor quality spectra are prone to quantification and/or interpretation errors that can have a significant impact on the clinical use of spectroscopic data. Therefore, quality control of the spectra should always precede their clinical use. When performed manually, quality assessment of MRSI spectra is not only a tedious and time-consuming task, but is also affected by human subjectivity. Consequently, automatic, fast and reliable methods for spectral quality assessment are of utmost interest. In this article, we present a new random forest-based method for automatic quality assessment of ¹H MRSI brain spectra, which uses a new set of MRS signal features. The random forest classifier was trained on spectra from 40 MRSI grids that were classified as acceptable or non-acceptable by two expert spectroscopists. To account for the effects of intra-rater reliability, each spectrum was rated for quality three times by each rater. The automatic method classified these spectra with an area under the curve (AUC) of 0.976. Furthermore, in the subset of spectra containing only the cases that were classified every time in the same way by the spectroscopists, an AUC of 0.998 was obtained. Feature importance for the classification was also evaluated. Frequency domain skewness and kurtosis, as well as time domain signal-to-noise ratios (SNRs) in the ranges 50-75 ms and 75-100 ms, were the most important features. Given that the method is able to assess a whole MRSI grid faster than a spectroscopist (approximately 3 s versus approximately 3 min), and without loss of accuracy (agreement between classifier trained with just one session and any of the other labelling sessions, 89.88%; agreement between any two labelling sessions, 89.03%), the authors suggest its implementation in the clinical routine.
The method presented in this article was implemented in jMRUI's SpectrIm plugin. Copyright © 2016 John Wiley & Sons, Ltd.
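Two of the features named above, skewness and kurtosis, can be computed from a list of spectral amplitudes as in the sketch below. It uses the population (biased) moment definitions and non-excess kurtosis, which is an assumption since the abstract does not give the exact formulas, and the sample values are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    if n == 0:
        raise ValueError("empty input")
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant input has undefined skewness and kurtosis")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical magnitudes of a short frequency-domain signal.
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0]
skew, kurt = skewness_kurtosis(spectrum)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

In a pipeline like the one described, such values would be computed per spectrum and passed to the classifier alongside the other signal features.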
Abstract:
To address growing concern over the effects of fisheries non-target catch on elasmobranchs worldwide, the accurate reporting of elasmobranch catch is essential. This requires data on a combination of measures, including reported landings, retained and discarded non-target catch, and post-discard survival. Identification of the factors influencing discard vs. retention is needed to improve catch estimates and to determine wasteful fishing practices. To do this we compared retention rates of elasmobranch non-target catch in a broad subset of fisheries throughout the world by taxon, fishing country, and gear. A regression tree and random forest analysis indicated that taxon was the most important determinant of retention in this dataset, but all three factors together explained 59% of the variance. Estimates of total elasmobranch removals were calculated by dividing the FAO global elasmobranch landings by average retention rates and suggest that total elasmobranch removals may exceed FAO reported landings by as much as 400%. This analysis is the first effort to directly characterize global drivers of discards for elasmobranch non-target catch. Our results highlight the importance of accurate quantification of retention and discard rates to improve assessments of the potential impacts of fisheries on these species.
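The removal estimate described above (landings divided by the average retention rate) can be written out as a short calculation. The landings figure and the retention rate below are hypothetical; they are chosen only to show that, under this formula, an average retention rate of 20% corresponds to removals exceeding landings by 400%.

```python
def total_removals(landings, retention_rate):
    """Estimate total removals as landings divided by the retention rate."""
    if not 0 < retention_rate <= 1:
        raise ValueError("retention_rate must be in (0, 1]")
    return landings / retention_rate


def percent_excess(landings, retention_rate):
    """Percentage by which estimated removals exceed reported landings."""
    removals = total_removals(landings, retention_rate)
    return 100.0 * (removals - landings) / landings


# Hypothetical inputs: 100 units landed, 20% of the catch retained.
excess = percent_excess(landings=100.0, retention_rate=0.2)
print(f"removals exceed landings by {excess:.0f}%")
```

Because the estimate scales with the inverse of the retention rate, small errors in low retention rates translate into large changes in estimated removals.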