977 resultados para binary data
Resumo:
Slides based on Brookshear, augmented with a lot of simple material to encourage the students to understand binary numbers!
Resumo:
Ecological risk assessments must increasingly consider the effects of chemical mixtures on the environment as anthropogenic pollution continues to grow in complexity. Yet testing every possible mixture combination is impractical and unfeasible; thus, there is an urgent need for models that can accurately predict mixture toxicity from single-compound data. Currently, two models are frequently used to predict mixture toxicity from single-compound data: Concentration addition and independent action (IA). The accuracy of the predictions generated by these models is currently debated and needs to be resolved before their use in risk assessments can be fully justified. The present study addresses this issue by determining whether the IA model adequately described the toxicity of binary mixtures of five pesticides and other environmental contaminants (cadmium, chlorpyrifos, diuron, nickel, and prochloraz) each with dissimilar modes of action on the reproduction of the nematode Caenorhabditis elegans. In three out of 10 cases, the IA model failed to describe mixture toxicity adequately with significant or antagonism being observed. In a further three cases, there was an indication of synergy, antagonism, and effect-level-dependent deviations, respectively, but these were not statistically significant. The extent of the significant deviations that were found varied, but all were such that the predicted percentage effect seen on reproductive output would have been wrong by 18 to 35% (i.e., the effect concentration expected to cause a 50% effect led to an 85% effect). The presence of such a high number and variety of deviations has important implications for the use of existing mixture toxicity models for risk assessments, especially where all or part of the deviation is synergistic.
The sequential analysis of repeated binary responses: a score test for the case of three time points
Resumo:
In this paper a robust method is developed for the analysis of data consisting of repeated binary observations taken at up to three fixed time points on each subject. The primary objective is to compare outcomes at the last time point, using earlier observations to predict this for subjects with incomplete records. A score test is derived. The method is developed for application to sequential clinical trials, as at interim analyses there will be many incomplete records occurring in non-informative patterns. Motivation for the methodology comes from experience with clinical trials in stroke and head injury, and data from one such trial is used to illustrate the approach. Extensions to more than three time points and to allow for stratification are discussed. Copyright © 2005 John Wiley & Sons, Ltd.
Resumo:
This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
Resumo:
The aim of phase II single-arm clinical trials of a new drug is to determine whether it has sufficient promising activity to warrant its further development. For the last several years Bayesian statistical methods have been proposed and used. Bayesian approaches are ideal for earlier phase trials as they take into account information that accrues during a trial. Predictive probabilities are then updated and so become more accurate as the trial progresses. Suitable priors can act as pseudo samples, which make small sample clinical trials more informative. Thus patients have better chances to receive better treatments. The goal of this paper is to provide a tutorial for statisticians who use Bayesian methods for the first time or investigators who have some statistical background. In addition, real data from three clinical trials are presented as examples to illustrate how to conduct a Bayesian approach for phase II single-arm clinical trials with binary outcomes.
Resumo:
Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
Resumo:
The purpose of this study was to apply and compare two time-domain analysis procedures in the determination of oxygen uptake (VO2) kinetics in response to a pseudorandom binary sequence (PRBS) exercise test. PRBS exercise tests have typically been analysed in the frequency domain. However, the complex interpretation of frequency responses may have limited the application of this procedure in both sporting and clinical contexts, where a single time measurement would facilitate subject comparison. The relative potential of both a mean response time (MRT) and a peak cross-correlation time (PCCT) was investigated. This study was divided into two parts: a test-retest reliability study (part A), in which 10 healthy male subjects completed two identical PRBS exercise tests, and a comparison of the VO2 kinetics of 12 elite endurance runners (ER) and 12 elite sprinters (SR; part B). In part A, 95% limits of agreement were calculated for comparison between MRT and PCCT. The results of part A showed no significant difference between test and retest as assessed by MRT [mean (SD) 42.2 (4.2) s and 43.8 (6.9) s] or by PCCT [21.8 (3.7) s and 22.7 (4.5) s]. Measurement error (%) was lower for MRT in comparison with PCCT (16% and 25%, respectively). In part B of the study, the VO2 kinetics of ER were significantly faster than those of SR, as assessed by MRT [33.4 (3.4) s and 39.9 (7.1) s, respectively; P<0.01] and PCCT [20.9 (3.8) s and 24.8 (4.5) s; P < 0.05]. It is possible that either analysis procedure could provide a single test measurement Of VO2 kinetics; however, the greater reliability of the MRT data suggests that this method has more potential for development in the assessment Of VO2 kinetics by PRBS exercise testing.
Resumo:
Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
Resumo:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Resumo:
Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.
Resumo:
Liquid-liquid equilibrium experimental data for refined sunflower seed oil, artificially acidified with commercial oleic acid or commercial linoleic acid and a solvent (ethanol + water), were determined at 298.2 K. This set of experimental data and the experimental data from Cuevas et al.,(1) which were obtained from (283.2 to 333.2) K, for degummed sunflower seed oil-containing systems were correlated using NRTL and UNIQUAC models with temperature-dependent binary parameters. The deviation between experimental and calculated compositions presented average values of (1.13 and 1.41) % for NRTL and UNIQUAC equations, respectively, indicating that the models were able to correctly describe the behavior of compounds under different temperature and solvent hydration.
Resumo:
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
Resumo:
The use of liposomes to encapsulate materials has received widespread attention for drug delivery, transfection, diagnostic reagent, and as immunoadjuvants. Phospholipid polymers form a new class of biomaterials with many potential applications in medicine and research. Of interest are polymeric phospholipids containing a diacetylene moiety along their acyl chain since these kinds of lipids can be polymerized by Ultra-Violet (UV) irradiation to form chains of covalently linked lipids in the bilayer. In particular the diacetylenic phosphatidylcholine 1,2-bis(10,12-tricosadiynoyl)- sn-glycero-3-phosphocholine (DC8,9PC) can form intermolecular cross-linking through the diacetylenic group to produce a conjugated polymer within the hydrocarbon region of the bilayer. As knowledge of liposome structures is certainly fundamental for system design improvement for new and better applications, this work focuses on the structural properties of polymerized DC8,9PC:1,2-dimyristoyl-sn-glycero-3-phusphocholine (DMPC) liposomes. Liposomes containing mixtures of DC8,9PC and DMPC, at different molar ratios, and exposed to different polymerization cycles, were studied through the analysis of the electron spin resonance (ESR) spectra of a spin label incorporated into the bilayer, and the calorimetric data obtained from differential scanning calorimetry (DSC) studies. Upon irradiation, if all lipids had been polymerized, no gel-fluid transition would be expected. However, even samples that went through 20 cycles of UV irradiation presented a DSC band, showing that around 80% of the DC8,9PC molecules were not polymerized. Both DSC and ESR indicated that the two different lipids scarcely mix at low temperatures, however few molecules of DMPC are present in DC8,9PC rich domains and vice versa. UV irradiation was found to affect the gel fluid transition of both DMPC and DC8,9PC rich regions, indicating the presence of polymeric units of DC8,9PC in both areas, A model explaining lipids rearrangement is proposed for this partially polymerized system.
Resumo:
We review several asymmetrical links for binary regression models and present a unified approach for two skew-probit links proposed in the literature. Moreover, under skew-probit link, conditions for the existence of the ML estimators and the posterior distribution under improper priors are established. The framework proposed here considers two sets of latent variables which are helpful to implement the Bayesian MCMC approach. A simulation study to criteria for models comparison is conducted and two applications are made. Using different Bayesian criteria we show that, for these data sets, the skew-probit links are better than alternative links proposed in the literature.
Resumo:
The second-order rate constants of thiolysis by n-heptanethiol on 4-nitro-N-n-butyl-1,8-naphthalimide (4NBN) are strongly affected by the water-methanol binary mixture composition reaching its maximum at around 50% mole fraction. In parallel solvent effects on 4NBN absorption molar extinction coefficient also shows a maximum at this composition region. From the spectroscopic study of reactant and product and the known H-bond capacity of the mixture a rationalization that involves specific solvent H-donor interaction with the nitro group is proposed to explain the kinetic data. Present findings also show a convenient methodology to obtain strongly fluorescent imides, valuable for peptide and analogs labeling as well as for thio-naphthalimide derivatives preparations. Copyright (C) 2008 John Wiley & Sons, Ltd.