15 results for Multivariate Adaptive Regression Splines (MARS)
in CentAUR: Central Archive University of Reading - UK
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Abstract:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.
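For readers unfamiliar with MARS, the following is a minimal sketch of its core building block: a mirrored pair of hinge functions, max(0, x - t) and max(0, t - x), with the knot t chosen by least squares. It is not the hierarchical Bayesian model of the abstract and omits the full forward/backward passes of frequentist MARS. The function names and the toy data are assumptions made purely for illustration.

```python
def hinge_pair(x, knot):
    """Return the mirrored MARS hinge pair (max(0, x - knot), max(0, knot - x))."""
    return max(0.0, x - knot), max(0.0, knot - x)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_knot(xs, ys, knot):
    """Least-squares fit of y ~ b0 + b1*max(0, x - knot) + b2*max(0, knot - x).

    Returns (coefficients, residual sum of squares).
    """
    rows = [(1.0, *hinge_pair(x, knot)) for x in xs]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coefs = solve(gram, rhs)
    rss = sum((y - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, y in zip(rows, ys))
    return coefs, rss


def best_knot(xs, ys, candidates):
    """Pick the candidate knot whose hinge pair gives the smallest residual sum of squares."""
    return min(candidates, key=lambda t: fit_knot(xs, ys, t)[1])


# Toy data (assumed): a piecewise-linear response with a kink at x = 4.
xs = list(range(11))
ys = [2.0 + 3.0 * max(0, x - 4) - 1.0 * max(0, 4 - x) for x in xs]

# Only interior knots are tried, so both hinge columns are non-zero.
knot = best_knot(xs, ys, range(1, 10))
coefs, rss = fit_knot(xs, ys, knot)
```

A full MARS fit would repeat this search over all predictors and existing basis functions, allow products of hinges for interactions, and then prune terms with a criterion such as generalized cross-validation.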
Abstract:
To better understand the dynamics of bee populations in crops, we assessed the effect of landscape context and habitat type on bee communities in annual entomophilous crops in Europe. We quantified bee communities in five crop-country pairs: buckwheat in Poland, cantaloupe in France, field beans in the UK, spring oilseed rape in Sweden, and strawberries in Germany. For each country, 7-10 study fields were sampled over a gradient of increasing proportion of semi-natural habitats in the surrounding landscape. The CORINE land cover classification was used to characterize the landscape over a 3 km radius around each study field and we used multivariate and regression analyses to quantify the impact of landscape features on bee abundance and diversity at the sub-generic taxonomic level. Neither overall wild bee abundance nor diversity, taken as the number of sub-genera, was significantly affected by the proportion of semi-natural habitat. Therefore, we used the most precise level of the CORINE classification to examine the possible links between specific landscape features and wild bee communities. Bee community composition fell into three distinct groups across Europe: group 1 included Poland, Germany, and Sweden, group 2 the UK, and group 3 France. Among all three groups, wild bee abundance and sub-generic diversity were affected by 17 landscape elements including some semi-natural habitats (e.g., transitional woodland-shrub), some urban habitats (e.g., sport and leisure facilities) and some crop habitats (e.g., non-irrigated arable land). Some bee taxa were positively affected by urban habitats only, others by semi-natural habitats only, and others by a combination of semi-natural, urban and crop habitats. Bee sub-genera favoured by urban and crop habitats were more resistant to landscape change than those favoured only by semi-natural habitats.
In agroecosystems, the agricultural intensification defined as the loss of semi-natural habitats does not necessarily cause a decline in evenness at the local level, but can change community composition towards a bee fauna dominated by common taxa. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Recent empirical studies have shown that multi-angle spectral data can be useful for predicting canopy height, but the physical reason for this correlation was not understood. We follow the concept of canopy spectral invariants, specifically escape probability, to gain insight into the observed correlation. Airborne Multi-Angle Imaging Spectrometer (AirMISR) and airborne Laser Vegetation Imaging Sensor (LVIS) data acquired during a NASA Terrestrial Ecology Program aircraft campaign underlie our analysis. Two multivariate linear regression models were developed to estimate LVIS height measures from 28 AirMISR multi-angle spectral reflectances and from the spectrally invariant escape probability at 7 AirMISR view angles. Both models achieved nearly the same accuracy, suggesting that canopy spectral invariant theory can explain the observed correlation. We hypothesize that the escape probability is sensitive to the aspect ratio (crown diameter to crown height). The multi-angle spectral data alone therefore may not provide enough information to retrieve canopy height globally.
Abstract:
BACKGROUND: We examined the role of aerosol transmission of influenza in an acute ward setting. METHODS: We investigated a seasonal influenza A outbreak that occurred in our general medical ward (with open bay ward layout) in 2008. Clinical and epidemiological information was collected in real time during the outbreak. Spatiotemporal analysis was performed to estimate the infection risk among patients. Airflow measurements were conducted, and concentrations of hypothetical virus-laden aerosols at different ward locations were estimated using computational fluid dynamics modeling. RESULTS: Nine inpatients were infected with an identical strain of influenza A/H3N2 virus. With reference to the index patient's location, the attack rate was 20.0% and 22.2% in the "same" and "adjacent" bays, respectively, but 0% in the "distant" bay (P = .04). Temporally, the risk of being infected was highest on the day when noninvasive ventilation was used in the index patient; multivariate logistic regression revealed an odds ratio of 14.9 (95% confidence interval, 1.7-131.3; P = .015). A simultaneous, directional indoor airflow blown from the "same" bay toward the "adjacent" bay was found; it was inadvertently created by an unopposed air jet from a separate air purifier placed next to the index patient's bed. Computational fluid dynamics modeling revealed that the dispersal pattern of aerosols originating from the index patient coincided with the bed locations of affected patients. CONCLUSIONS: Our findings suggest a possible role of aerosol transmission of influenza in an acute ward setting. Source and engineering controls, such as avoiding aerosol generation and improving ventilation design, may warrant consideration to prevent nosocomial outbreaks.
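The reported odds ratio, confidence interval and P value can be cross-checked with a short calculation. The sketch below assumes the interval is a Wald interval, i.e. symmetric on the log-odds scale, which the abstract does not state. The function name is hypothetical, and the inputs are the rounded figures quoted above, so the outputs are approximate.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out the log-scale standard error, z statistic and two-sided P value
    from a reported odds ratio and its 95% Wald confidence interval (assumed)."""
    # A Wald interval is exp(beta +/- z_crit * se), so its width on the log scale is 2 * z_crit * se.
    log_se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / log_se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # For a Wald interval the geometric mean of the limits equals the point estimate.
    midpoint = math.sqrt(ci_low * ci_high)
    return midpoint, log_se, z, p_two_sided


midpoint, log_se, z, p = wald_summary(14.9, 1.7, 131.3)
print(f"geometric midpoint = {midpoint:.2f}, z = {z:.2f}, P = {p:.3f}")
```

Under this assumption the geometric midpoint of the interval lands close to the reported odds ratio, and the implied P value is close to the reported .015. The very wide interval reflects the small number of infected patients.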
Abstract:
Wine production is largely governed by atmospheric conditions, such as air temperature and precipitation, together with soil management and viticultural/enological practices. Therefore, anthropogenic climate change is likely to have important impacts on the winemaking sector worldwide. An important winemaking region is the Portuguese Douro Valley, which is known for its world-famous Port Wine. The identification of robust relationships between atmospheric factors and wine parameters is of great relevance for the region. A multivariate linear regression analysis of a long wine production series (1932–2010) reveals that high rainfall and cool temperatures during budburst, shoot and inflorescence development (February-March) and warm temperatures during flowering and berry development (May) are generally favourable to high production. The probabilities of occurrence of three production categories (low, normal and high) are also modelled using multinomial logistic regression. Results show that both statistical models are valuable tools for predicting the production in a given year with a lead time of 3–4 months prior to harvest. These statistical models are applied to an ensemble of 16 regional climate model experiments following the SRES A1B scenario to estimate possible future changes. Wine production is projected to increase by about 10 % by the end of the 21st century, while the occurrence of high production years is expected to increase from 25 % to over 60 %. Nevertheless, further model development will be needed to include other aspects that may shape production in the future. In particular, the rising heat stress and/or changes in ripening conditions could limit the projected production increase in future decades.
Abstract:
Factors influencing the use of chemotherapy for the initial (6 months) treatment of lung cancer in South East England were investigated. The variables explored as possibly influencing the use of chemotherapy were sex, age, the year of diagnosis, the type of lung cancer, the stage, the index of multiple deprivation and the cancer network of residence. Chi-squared analysis and multivariate logistic regression models were used to examine the effect of each of the variables on the use of chemotherapy. The results showed a highly significant trend in use of chemotherapy over time; the adjusted proportion of patients receiving chemotherapy increased from 13.6% in 1994 to 29.3% in 2003. However, age, cancer network and type of lung cancer had the strongest influence on the use of chemotherapy. This finding is important when we consider that the NHS Cancer Plan aims at reducing inequalities in cancer care in the UK.
Abstract:
Aims: To investigate the relationship between adiposity and plasma free fatty acid levels and the influence of total plasma free fatty acid level on insulin sensitivity and β-cell function. Methods: An insulin sensitivity index, acute insulin response to glucose and a disposition index, derived from i.v. glucose tolerance minimal model analysis and total fasting plasma free fatty acid levels were available for 533 participants in the Reading, Imperial, Surrey, Cambridge, Kings study. Bivariate correlations were made between insulin sensitivity index, acute insulin response to glucose and disposition index and both adiposity measures (BMI, waist circumference and body fat mass) and total plasma free fatty acid levels. Multivariate linear regression analysis was performed, controlling for age, sex, ethnicity and adiposity. Results: After adjustment, all adiposity measures were inversely associated with insulin sensitivity index (BMI: β = −0.357; waist circumference: β = −0.380; body fat mass: β = −0.375) and disposition index (BMI: β = −0.215; waist circumference: β = −0.248; body fat mass: β = −0.221) and positively associated with acute insulin response to glucose [BMI: β = 0.200; waist circumference: β = 0.195; body fat mass: β = 0.209 (P values <0.001)]. Adiposity explained 13, 4 and 5% of the variation in insulin sensitivity index, acute insulin response to glucose and disposition index, respectively. After adjustment, no adiposity measure was associated with free fatty acid level, but total plasma free fatty acid level was inversely associated with insulin sensitivity index (β = −0.133), acute insulin response to glucose (β = −0.148) and disposition index [β = −0.218 (P values <0.01)]. Plasma free fatty acid concentration accounted for 1.5, 2 and 4% of the variation in insulin sensitivity index, acute insulin response to glucose and disposition index, respectively.
Conclusions: Plasma free fatty acid levels have a modest negative association with insulin sensitivity, β-cell secretion and disposition index but no association with adiposity measures. It is unlikely that plasma free fatty acids are the primary mediators of obesity-related insulin resistance or β-cell dysfunction.
Abstract:
The Iowa gambling task (IGT) is one of the most influential behavioral paradigms in reward-related decision making and has been, most notably, associated with ventromedial prefrontal cortex function. However, performance in the IGT relies on a complex set of cognitive subprocesses, in particular integrating information about the outcome of choices into a continuously updated decision strategy under ambiguous conditions. The complexity of the task has made it difficult for neuroimaging studies to disentangle the underlying neurocognitive processes. In this study, we used functional magnetic resonance imaging in combination with a novel adaptation of the task, which allowed us to examine separately activation associated with the moment of decision or the evaluation of decision outcomes. Importantly, using whole-brain regression analyses with individual performance, in combination with the choice/outcome history of individual subjects, we aimed to identify the neural overlap between areas that are involved in the evaluation of outcomes and in the progressive discrimination of the relative value of available choice options, thus mapping the two fundamental cognitive processes that lead to adaptive decision making. We show that activation in right ventromedial and dorsolateral prefrontal cortex was predictive of adaptive performance, in both discriminating disadvantageous from advantageous decisions and confirming negative decision outcomes. We propose that these two prefrontal areas mediate shifting away from disadvantageous choices through their sensitivity to accumulating negative outcomes. These findings provide functional evidence of the underlying processes by which these prefrontal subregions drive adaptive choice in the task, namely through contingency-sensitive outcome evaluation.
Abstract:
In clinical trials, situations often arise where more than one response from each patient is of interest; and it is required that any decision to stop the study be based upon some or all of these measures simultaneously. Theory for the design of sequential experiments with simultaneous bivariate responses is described by Jennison and Turnbull (Jennison, C., Turnbull, B. W. (1993). Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics 49:741-752) and Cook and Farewell (Cook, R. J., Farewell, V. T. (1994). Guidelines for monitoring efficacy and toxicity responses in clinical trials. Biometrics 50:1146-1152) in the context of one efficacy and one safety response. These expositions are in terms of normally distributed data with known covariance. The methods proposed require specification of the correlation, ρ, between test statistics monitored as part of the sequential test. It can be difficult to quantify ρ and previous authors have suggested simply taking the lowest plausible value, as this will guarantee power. This paper begins with an illustration of the effect that inappropriate specification of ρ can have on the preservation of trial error rates. It is shown that both the type I error and the power can be adversely affected. As a possible solution to this problem, formulas are provided for the calculation of correlation from data collected as part of the trial. An adaptive approach is proposed and evaluated that makes use of these formulas and an example is provided to illustrate the method. Attention is restricted to the bivariate case for ease of computation, although the formulas derived are applicable in the general multivariate case.
Abstract:
Multivariate statistical methods were used to investigate the causes of toxicity and controls on groundwater chemistry from 274 boreholes in an urban area (London) of the United Kingdom. The groundwater was alkaline to neutral, and chemistry was dominated by calcium, sodium, and sulfate. Contaminants included fuels, solvents, and organic compounds derived from landfill material. The presence of organic material in the aquifer caused decreases in dissolved oxygen, sulfate and nitrate concentrations, and increases in ferrous iron and ammoniacal nitrogen concentrations. Pearson correlations between toxicity results and the concentration of individual analytes indicated that concentrations of ammoniacal nitrogen, dissolved oxygen, ferrous iron, and hydrocarbons were important where present. However, principal component and regression analysis suggested no significant correlation between toxicity and chemistry over the whole area. Multidimensional scaling was used to investigate differences in sites caused by historical use, landfill gas status, or position within the sample area. Significant differences were observed between sites with different historical land use and those with different gas status. Examination of the principal component matrix revealed that these differences are related to changes in the importance of reduced chemical species.
Abstract:
Baking and 2-g mixograph analyses were performed for 55 cultivars (19 spring and 36 winter wheat) from various quality classes from the 2002 harvest in Poland. An instrumented 2-g direct-drive mixograph was used to study the mixing characteristics of the wheat cultivars. A number of parameters were extracted automatically from each mixograph trace and correlated with baking volume and flour quality parameters (protein content and high molecular weight glutenin subunit [HMW-GS] composition by SDS-PAGE) using multiple linear regression statistical analysis. Principal component analysis of the mixograph data discriminated between four flour quality classes, and predictions of baking volume were obtained using several selected mixograph parameters, chosen using a best subsets regression routine, giving R² values of 0.862-0.866. In particular, three new spring wheat strains (CHD 502a-c) recently registered in Poland were highly discriminated and predicted to give high baking volume on the basis of two mixograph parameters: peak bandwidth and 10-min bandwidth.
Abstract:
Background: Robot-mediated therapies offer entirely new approaches to neurorehabilitation. In this paper we present the results obtained from trialling the GENTLE/S neurorehabilitation system assessed using the upper limb section of the Fugl-Meyer (FM) outcome measure. Methods: We demonstrate the design of our clinical trial and its results analysed using a novel statistical approach based on a multivariate analytical model. This paper provides the rationale for using multivariate models in robot-mediated clinical trials and draws conclusions from the clinical data gathered during the GENTLE/S study. Results: The FM outcome measures recorded during the baseline (8 sessions), robot-mediated therapy (9 sessions) and sling-suspension (9 sessions) were analysed using a multiple regression model. The results indicate positive but modest recovery trends favouring both interventions used in the GENTLE/S clinical trial. The modest recovery shown occurred at a time late after stroke when changes are not clinically anticipated. Conclusion: This study has applied a new method for analysing clinical data obtained from rehabilitation robotics studies. While the data obtained during the clinical trial are multivariate, multipoint and progressive in nature, the multiple regression model used showed great potential for drawing conclusions from this study. An important conclusion to draw from this paper is that the intervention and control phases both caused changes over a period of 9 sessions in comparison to the baseline. This might indicate that use of new challenging and motivational therapies can influence the outcome of therapies at a point when clinical changes are not expected. Further work is required to investigate the effects arising from early intervention, longer exposure and intensity of the therapies.
Finally, more function-oriented robot-mediated therapies or sling-suspension therapies are needed to clarify the effects resulting from each intervention for stroke recovery.
Abstract:
In this brief, we propose an orthogonal forward regression (OFR) algorithm based on the principles of the branch and bound (BB) and A-optimality experimental design. At each forward regression step, each candidate from a pool of candidate regressors, referred to as S, is evaluated in turn with three possible decisions: 1) one of these is selected and included into the model; 2) some of these remain in S for evaluation in the next forward regression step; and 3) the rest are permanently eliminated from S. Based on the BB principle in combination with an A-optimality composite cost function for model structure determination, a simple adaptive diagnostics test is proposed to determine the decision boundary between 2) and 3). As such the proposed algorithm can significantly reduce the computational cost in the A-optimality OFR algorithm. Numerical examples are used to demonstrate the effectiveness of the proposed algorithm.
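As background to the abstract above, the sketch below shows a plain orthogonal forward regression loop: Gram-Schmidt orthogonalisation, with the candidate that has the largest error reduction ratio selected at each step. It does not implement the branch-and-bound elimination or the A-optimality composite cost proposed in the brief. The function names and the toy regressors are assumptions for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_forward_regression(candidates, y, n_terms):
    """Greedy forward selection by Gram-Schmidt orthogonalisation.

    candidates: dict mapping a regressor name to its column (list of floats).
    Returns a list of (name, error_reduction_ratio) in order of selection.
    """
    yy = dot(y, y)
    pool = {name: list(col) for name, col in candidates.items()}
    selected = []
    for _ in range(n_terms):
        best_name, best_err = None, 0.0
        for name, w in pool.items():
            ww = dot(w, w)
            if ww < 1e-12:
                # Numerically dependent on terms already in the model.
                continue
            # Fraction of the output energy explained by this orthogonalised candidate.
            err = dot(w, y) ** 2 / (ww * yy)
            if err > best_err:
                best_name, best_err = name, err
        if best_name is None:
            break
        w_best = pool.pop(best_name)
        selected.append((best_name, best_err))
        ww_best = dot(w_best, w_best)
        # Remove the chosen direction from every remaining candidate.
        for name in list(pool):
            coef = dot(w_best, pool[name]) / ww_best
            pool[name] = [a - coef * b for a, b in zip(pool[name], w_best)]
    return selected


# Toy data (assumed): y depends on x1 and x2 only; x3 is a distractor.
regressors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "x3": [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
}
y = [2.0 * a + 3.0 * b for a, b in zip(regressors["x1"], regressors["x2"])]

selection = orthogonal_forward_regression(regressors, y, n_terms=2)
```

In this plain version every surviving candidate is re-evaluated at every step. The decision rule described in the abstract instead discards some candidates permanently, which is the stated source of its reduced computational cost.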
Abstract:
Cross-bred cow adoption is an important and potent policy variable precipitating subsistence household entry into emerging milk markets. This paper focuses on the problem of designing policies that encourage and sustain milk-market expansion among a sample of subsistence households in the Ethiopian highlands. In this context it is desirable to measure households’ ‘proximity’ to market in terms of the level of deficiency of essential inputs. This problem is compounded by four factors. One is the existence of cross-bred cow numbers (count data) as an important, endogenous decision by the household; second is the lack of a multivariate generalization of the Poisson regression model; third is the censored nature of the milk sales data (sales from non-participating households are, essentially, censored at zero); and fourth is an important simultaneity that exists between the decision to adopt a cross-bred cow, the decision about how much milk to produce, the decision about how much milk to consume and the decision to market that milk which is produced but not consumed internally by the household. Routine application of Gibbs sampling and data augmentation overcomes these problems in a relatively straightforward manner. We model the count data from two sites close to Addis Ababa in a latent, categorical-variable setting with known bin boundaries. The single-equation model is then extended to a multivariate system that accommodates the covariance between crossbred-cow adoption, milk-output, and milk-sales equations. The latent-variable procedure proves tractable in extension to the multivariate setting and provides important information for policy formation in emerging-market settings.
Abstract:
We discuss the modeling of dielectric responses of electromagnetically excited networks which are composed of a mixture of capacitors and resistors. Such networks can be employed as lumped-parameter circuits to model the response of composite materials containing conductive and insulating grains. The dynamics of the excited network systems are studied using a state space model derived from a randomized incidence matrix. Time and frequency domain responses from synthetic data sets generated from state space models are analyzed for the purpose of estimating the fraction of capacitors in the network. Good results were obtained by using either the time-domain response to a pulse excitation or impedance data at selected frequencies. A chemometric framework based on a Successive Projections Algorithm (SPA) enables the construction of multiple linear regression (MLR) models which can efficiently determine the ratio of conductive to insulating components in composite material samples. The proposed method avoids restrictions commonly associated with Archie’s law, the application of percolation theory or Kohlrausch-Williams-Watts models and is applicable to experimental results generated by either time domain transient spectrometers or continuous-wave instruments. Furthermore, it is quite generic and applicable to tomography, acoustics as well as other spectroscopies such as nuclear magnetic resonance, electron paramagnetic resonance and, therefore, should be of general interest across the dielectrics community.