958 resultados para multivariate binary data
Resumo:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220m deep were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses on Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally parallel to the SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems as well as glacial sediments transported by Alpine valley glaciers invaded the alluvial plain. Kew words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity, Canberra Distance; palaeodrainage
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has been shown that cause-specic mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit sum constraint is honoured. The structure of the best-known, single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of Euclidean metric in the process of adaptation of the model vectors, thus rendering in theory a whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using Euclidean metric can shed valid results with compositional data. 2. If a modification of the conventional approach replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering) can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs poorer than the conventional version, in particular, when the data is pathological. Moreover, the conventional ap- proach converges faster to a solution, when data is \well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1) with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛT + ψ (2) where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimation of Cov(y). Given observed clr transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely effect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), rely on a full-rank data matrix Y which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y ) = V C(Z)V T where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry. Our special interest is on comparing the results with those of Reimann et al. (2002) for the Kola project data
Resumo:
Slides based on Brookshear, augmented with a lot of simple material to encourage the students to understand binary numbers!
Resumo:
Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.
Resumo:
Ecological risk assessments must increasingly consider the effects of chemical mixtures on the environment as anthropogenic pollution continues to grow in complexity. Yet testing every possible mixture combination is impractical and unfeasible; thus, there is an urgent need for models that can accurately predict mixture toxicity from single-compound data. Currently, two models are frequently used to predict mixture toxicity from single-compound data: Concentration addition and independent action (IA). The accuracy of the predictions generated by these models is currently debated and needs to be resolved before their use in risk assessments can be fully justified. The present study addresses this issue by determining whether the IA model adequately described the toxicity of binary mixtures of five pesticides and other environmental contaminants (cadmium, chlorpyrifos, diuron, nickel, and prochloraz) each with dissimilar modes of action on the reproduction of the nematode Caenorhabditis elegans. In three out of 10 cases, the IA model failed to describe mixture toxicity adequately with significant or antagonism being observed. In a further three cases, there was an indication of synergy, antagonism, and effect-level-dependent deviations, respectively, but these were not statistically significant. The extent of the significant deviations that were found varied, but all were such that the predicted percentage effect seen on reproductive output would have been wrong by 18 to 35% (i.e., the effect concentration expected to cause a 50% effect led to an 85% effect). The presence of such a high number and variety of deviations has important implications for the use of existing mixture toxicity models for risk assessments, especially where all or part of the deviation is synergistic.
Resumo:
During the past 15 years, a number of initiatives have been undertaken at national level to develop ocean forecasting systems operating at regional and/or global scales. The co-ordination between these efforts has been organized internationally through the Global Ocean Data Assimilation Experiment (GODAE). The French MERCATOR project is one of the leading participants in GODAE. The MERCATOR systems routinely assimilate a variety of observations such as multi-satellite altimeter data, sea-surface temperature and in situ temperature and salinity profiles, focusing on high-resolution scales of the ocean dynamics. The assimilation strategy in MERCATOR is based on a hierarchy of methods of increasing sophistication including optimal interpolation, Kalman filtering and variational methods, which are progressively deployed through the Syst`eme d’Assimilation MERCATOR (SAM) series. SAM-1 is based on a reduced-order optimal interpolation which can be operated using ‘altimetry-only’ or ‘multi-data’ set-ups; it relies on the concept of separability, assuming that the correlations can be separated into a product of horizontal and vertical contributions. The second release, SAM-2, is being developed to include new features from the singular evolutive extended Kalman (SEEK) filter, such as three-dimensional, multivariate error modes and adaptivity schemes. The third one, SAM-3, considers variational methods such as the incremental four-dimensional variational algorithm. Most operational forecasting systems evaluated during GODAE are based on least-squares statistical estimation assuming Gaussian errors. In the framework of the EU MERSEA (Marine EnviRonment and Security for the European Area) project, research is being conducted to prepare the next-generation operational ocean monitoring and forecasting systems. The research effort will explore nonlinear assimilation formulations to overcome limitations of the current systems. This paper provides an overview of the developments conducted in MERSEA with the SEEK filter, the Ensemble Kalman filter and the sequential importance re-sampling filter.
The sequential analysis of repeated binary responses: a score test for the case of three time points
Resumo:
In this paper a robust method is developed for the analysis of data consisting of repeated binary observations taken at up to three fixed time points on each subject. The primary objective is to compare outcomes at the last time point, using earlier observations to predict this for subjects with incomplete records. A score test is derived. The method is developed for application to sequential clinical trials, as at interim analyses there will be many incomplete records occurring in non-informative patterns. Motivation for the methodology comes from experience with clinical trials in stroke and head injury, and data from one such trial is used to illustrate the approach. Extensions to more than three time points and to allow for stratification are discussed. Copyright © 2005 John Wiley & Sons, Ltd.
Resumo:
This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
Resumo:
The aim of phase II single-arm clinical trials of a new drug is to determine whether it has sufficient promising activity to warrant its further development. For the last several years Bayesian statistical methods have been proposed and used. Bayesian approaches are ideal for earlier phase trials as they take into account information that accrues during a trial. Predictive probabilities are then updated and so become more accurate as the trial progresses. Suitable priors can act as pseudo samples, which make small sample clinical trials more informative. Thus patients have better chances to receive better treatments. The goal of this paper is to provide a tutorial for statisticians who use Bayesian methods for the first time or investigators who have some statistical background. In addition, real data from three clinical trials are presented as examples to illustrate how to conduct a Bayesian approach for phase II single-arm clinical trials with binary outcomes.
Resumo:
Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
Resumo:
The purpose of this study was to apply and compare two time-domain analysis procedures in the determination of oxygen uptake (VO2) kinetics in response to a pseudorandom binary sequence (PRBS) exercise test. PRBS exercise tests have typically been analysed in the frequency domain. However, the complex interpretation of frequency responses may have limited the application of this procedure in both sporting and clinical contexts, where a single time measurement would facilitate subject comparison. The relative potential of both a mean response time (MRT) and a peak cross-correlation time (PCCT) was investigated. This study was divided into two parts: a test-retest reliability study (part A), in which 10 healthy male subjects completed two identical PRBS exercise tests, and a comparison of the VO2 kinetics of 12 elite endurance runners (ER) and 12 elite sprinters (SR; part B). In part A, 95% limits of agreement were calculated for comparison between MRT and PCCT. The results of part A showed no significant difference between test and retest as assessed by MRT [mean (SD) 42.2 (4.2) s and 43.8 (6.9) s] or by PCCT [21.8 (3.7) s and 22.7 (4.5) s]. Measurement error (%) was lower for MRT in comparison with PCCT (16% and 25%, respectively). In part B of the study, the VO2 kinetics of ER were significantly faster than those of SR, as assessed by MRT [33.4 (3.4) s and 39.9 (7.1) s, respectively; P<0.01] and PCCT [20.9 (3.8) s and 24.8 (4.5) s; P < 0.05]. It is possible that either analysis procedure could provide a single test measurement Of VO2 kinetics; however, the greater reliability of the MRT data suggests that this method has more potential for development in the assessment Of VO2 kinetics by PRBS exercise testing.
Resumo:
Baking and 2-g mixograph analyses were performed for 55 cultivars (19 spring and 36 winter wheat) from various quality classes from the 2002 harvest in Poland. An instrumented 2-g direct-drive mixograph was used to study the mixing characteristics of the wheat cultivars. A number of parameters were extracted automatically from each mixograph trace and correlated with baking volume and flour quality parameters (protein content and high molecular weight glutenin subunit [HMW-GS] composition by SDS-PAGE) using multiple linear regression statistical analysis. Principal component analysis of the mixograph data discriminated between four flour quality classes, and predictions of baking volume were obtained using several selected mixograph parameters, chosen using a best subsets regression routine, giving R-2 values of 0.862-0.866. In particular, three new spring wheat strains (CHD 502a-c) recently registered in Poland were highly discriminated and predicted to give high baking volume on the basis of two mixograph parameters: peak bandwidth and 10-min bandwidth.
Resumo:
Background: Robot-mediated therapies offer entirely new approaches to neurorehabilitation. In this paper we present the results obtained from trialling the GENTLE/S neurorehabilitation system assessed using the upper limb section of the Fugl-Meyer ( FM) outcome measure. Methods: We demonstrate the design of our clinical trial and its results analysed using a novel statistical approach based on a multivariate analytical model. This paper provides the rational for using multivariate models in robot-mediated clinical trials and draws conclusions from the clinical data gathered during the GENTLE/S study. Results: The FM outcome measures recorded during the baseline ( 8 sessions), robot-mediated therapy ( 9 sessions) and sling-suspension ( 9 sessions) was analysed using a multiple regression model. The results indicate positive but modest recovery trends favouring both interventions used in GENTLE/S clinical trial. The modest recovery shown occurred at a time late after stroke when changes are not clinically anticipated. Conclusion: This study has applied a new method for analysing clinical data obtained from rehabilitation robotics studies. While the data obtained during the clinical trial is of multivariate nature, having multipoint and progressive nature, the multiple regression model used showed great potential for drawing conclusions from this study. An important conclusion to draw from this paper is that this study has shown that the intervention and control phase both caused changes over a period of 9 sessions in comparison to the baseline. This might indicate that use of new challenging and motivational therapies can influence the outcome of therapies at a point when clinical changes are not expected. Further work is required to investigate the effects arising from early intervention, longer exposure and intensity of the therapies. Finally, more function-oriented robot-mediated therapies or sling-suspension therapies are needed to clarify the effects resulting from each intervention for stroke recovery.