985 resultados para binary data


Relevância:

30.00% 30.00%

Publicador:

Resumo:

La compression des données est la technique informatique qui vise à réduire la taille de l’information pour minimiser l’espace de stockage nécessaire et accélérer la transmission des données dans les réseaux à bande passante limitée. Plusieurs techniques de compression telles que LZ77 et ses variantes souffrent d’un problème que nous appelons la redondance causée par la multiplicité d’encodages. La multiplicité d’encodages (ME) signifie que les données sources peuvent être encodées de différentes manières. Dans son cas le plus simple, ME se produit lorsqu’une technique de compression a la possibilité, au cours du processus d’encodage, de coder un symbole de différentes manières. La technique de compression par recyclage de bits a été introduite par D. Dubé et V. Beaudoin pour minimiser la redondance causée par ME. Des variantes de recyclage de bits ont été appliquées à LZ77 et les résultats expérimentaux obtenus conduisent à une meilleure compression (une réduction d’environ 9% de la taille des fichiers qui ont été compressés par Gzip en exploitant ME). Dubé et Beaudoin ont souligné que leur technique pourrait ne pas minimiser parfaitement la redondance causée par ME, car elle est construite sur la base du codage de Huffman qui n’a pas la capacité de traiter des mots de code (codewords) de longueurs fractionnaires, c’est-à-dire qu’elle permet de générer des mots de code de longueurs intégrales. En outre, le recyclage de bits s’appuie sur le codage de Huffman (HuBR) qui impose des contraintes supplémentaires pour éviter certaines situations qui diminuent sa performance. Contrairement aux codes de Huffman, le codage arithmétique (AC) peut manipuler des mots de code de longueurs fractionnaires. De plus, durant ces dernières décennies, les codes arithmétiques ont attiré plusieurs chercheurs vu qu’ils sont plus puissants et plus souples que les codes de Huffman. Par conséquent, ce travail vise à adapter le recyclage des bits pour les codes arithmétiques afin d’améliorer l’efficacité du codage et sa flexibilité. Nous avons abordé ce problème à travers nos quatre contributions (publiées). Ces contributions sont présentées dans cette thèse et peuvent être résumées comme suit. Premièrement, nous proposons une nouvelle technique utilisée pour adapter le recyclage de bits qui s’appuie sur les codes de Huffman (HuBR) au codage arithmétique. Cette technique est nommée recyclage de bits basé sur les codes arithmétiques (ACBR). Elle décrit le cadriciel et les principes de l’adaptation du HuBR à l’ACBR. Nous présentons aussi l’analyse théorique nécessaire pour estimer la redondance qui peut être réduite à l’aide de HuBR et ACBR pour les applications qui souffrent de ME. Cette analyse démontre que ACBR réalise un recyclage parfait dans tous les cas, tandis que HuBR ne réalise de telles performances que dans des cas très spécifiques. Deuxièmement, le problème de la technique ACBR précitée, c’est qu’elle requiert des calculs à précision arbitraire. Cela nécessite des ressources illimitées (ou infinies). Afin de bénéficier de cette dernière, nous proposons une nouvelle version à précision finie. Ladite technique devienne ainsi efficace et applicable sur les ordinateurs avec les registres classiques de taille fixe et peut être facilement interfacée avec les applications qui souffrent de ME. Troisièmement, nous proposons l’utilisation de HuBR et ACBR comme un moyen pour réduire la redondance afin d’obtenir un code binaire variable à fixe. Nous avons prouvé théoriquement et expérimentalement que les deux techniques permettent d’obtenir une amélioration significative (moins de redondance). À cet égard, ACBR surpasse HuBR et fournit une classe plus étendue des sources binaires qui pouvant bénéficier d’un dictionnaire pluriellement analysable. En outre, nous montrons qu’ACBR est plus souple que HuBR dans la pratique. Quatrièmement, nous utilisons HuBR pour réduire la redondance des codes équilibrés générés par l’algorithme de Knuth. Afin de comparer les performances de HuBR et ACBR, les résultats théoriques correspondants de HuBR et d’ACBR sont présentés. Les résultats montrent que les deux techniques réalisent presque la même réduction de redondance sur les codes équilibrés générés par l’algorithme de Knuth.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dynamic mechanical properties such as storage modulus, loss modulus and damping properties of blends of nylon copolymer (PA6,66) with ethylene propylene diene (EPDM) rubber was investigated with special reference to the effect of blend ratio and compatibilisation over a temperature range –100°C to 150°C at different frequencies. The effect of change in the composition of the polymer blends on tanδ was studied to understand the extent of polymer miscibility and damping characteristics. The loss tangent curve of the blends exhibited two transition peaks, corresponding to the glass transition temperature (Tg) of individual components indicating incompatibility of the blend systems. The morphology of the blends has been examined by using scanning electron microscopy. The Arrhenius relationship was used to calculate the activation energy for the glass transition of the blends. Finally, attempts have been made to compare the experimental data with theoretical models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The TRIM.SP program which is based on the binary collision approximation was changed to handle not only repulsive interaction potentials, but also potentials with an attractive part. Sputtering yields, average depth and reflection coefficients calculated with four different potentials are compared. Three purely repulsive potentials (Meliere, Kr-C and ZBL) are used and an ab initio pair potential, which is especially calculated for silicon bombardment by silicon. The general trends in the calculated results are similar for all potentials applied, but differences between the repulsive potentials and the ab initio potential occur for the reflection coefficients and the sputtering yield at large angles of incidence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Slides based on Brookshear, augmented with a lot of simple material to encourage the students to understand binary numbers!

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ecological risk assessments must increasingly consider the effects of chemical mixtures on the environment as anthropogenic pollution continues to grow in complexity. Yet testing every possible mixture combination is impractical and unfeasible; thus, there is an urgent need for models that can accurately predict mixture toxicity from single-compound data. Currently, two models are frequently used to predict mixture toxicity from single-compound data: Concentration addition and independent action (IA). The accuracy of the predictions generated by these models is currently debated and needs to be resolved before their use in risk assessments can be fully justified. The present study addresses this issue by determining whether the IA model adequately described the toxicity of binary mixtures of five pesticides and other environmental contaminants (cadmium, chlorpyrifos, diuron, nickel, and prochloraz) each with dissimilar modes of action on the reproduction of the nematode Caenorhabditis elegans. In three out of 10 cases, the IA model failed to describe mixture toxicity adequately with significant or antagonism being observed. In a further three cases, there was an indication of synergy, antagonism, and effect-level-dependent deviations, respectively, but these were not statistically significant. The extent of the significant deviations that were found varied, but all were such that the predicted percentage effect seen on reproductive output would have been wrong by 18 to 35% (i.e., the effect concentration expected to cause a 50% effect led to an 85% effect). The presence of such a high number and variety of deviations has important implications for the use of existing mixture toxicity models for risk assessments, especially where all or part of the deviation is synergistic.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper a robust method is developed for the analysis of data consisting of repeated binary observations taken at up to three fixed time points on each subject. The primary objective is to compare outcomes at the last time point, using earlier observations to predict this for subjects with incomplete records. A score test is derived. The method is developed for application to sequential clinical trials, as at interim analyses there will be many incomplete records occurring in non-informative patterns. Motivation for the methodology comes from experience with clinical trials in stroke and head injury, and data from one such trial is used to illustrate the approach. Extensions to more than three time points and to allow for stratification are discussed. Copyright © 2005 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of phase II single-arm clinical trials of a new drug is to determine whether it has sufficient promising activity to warrant its further development. For the last several years Bayesian statistical methods have been proposed and used. Bayesian approaches are ideal for earlier phase trials as they take into account information that accrues during a trial. Predictive probabilities are then updated and so become more accurate as the trial progresses. Suitable priors can act as pseudo samples, which make small sample clinical trials more informative. Thus patients have better chances to receive better treatments. The goal of this paper is to provide a tutorial for statisticians who use Bayesian methods for the first time or investigators who have some statistical background. In addition, real data from three clinical trials are presented as examples to illustrate how to conduct a Bayesian approach for phase II single-arm clinical trials with binary outcomes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this study was to apply and compare two time-domain analysis procedures in the determination of oxygen uptake (VO2) kinetics in response to a pseudorandom binary sequence (PRBS) exercise test. PRBS exercise tests have typically been analysed in the frequency domain. However, the complex interpretation of frequency responses may have limited the application of this procedure in both sporting and clinical contexts, where a single time measurement would facilitate subject comparison. The relative potential of both a mean response time (MRT) and a peak cross-correlation time (PCCT) was investigated. This study was divided into two parts: a test-retest reliability study (part A), in which 10 healthy male subjects completed two identical PRBS exercise tests, and a comparison of the VO2 kinetics of 12 elite endurance runners (ER) and 12 elite sprinters (SR; part B). In part A, 95% limits of agreement were calculated for comparison between MRT and PCCT. The results of part A showed no significant difference between test and retest as assessed by MRT [mean (SD) 42.2 (4.2) s and 43.8 (6.9) s] or by PCCT [21.8 (3.7) s and 22.7 (4.5) s]. Measurement error (%) was lower for MRT in comparison with PCCT (16% and 25%, respectively). In part B of the study, the VO2 kinetics of ER were significantly faster than those of SR, as assessed by MRT [33.4 (3.4) s and 39.9 (7.1) s, respectively; P<0.01] and PCCT [20.9 (3.8) s and 24.8 (4.5) s; P < 0.05]. It is possible that either analysis procedure could provide a single test measurement Of VO2 kinetics; however, the greater reliability of the MRT data suggests that this method has more potential for development in the assessment Of VO2 kinetics by PRBS exercise testing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Liquid-liquid equilibrium experimental data for refined sunflower seed oil, artificially acidified with commercial oleic acid or commercial linoleic acid and a solvent (ethanol + water), were determined at 298.2 K. This set of experimental data and the experimental data from Cuevas et al.,(1) which were obtained from (283.2 to 333.2) K, for degummed sunflower seed oil-containing systems were correlated using NRTL and UNIQUAC models with temperature-dependent binary parameters. The deviation between experimental and calculated compositions presented average values of (1.13 and 1.41) % for NRTL and UNIQUAC equations, respectively, indicating that the models were able to correctly describe the behavior of compounds under different temperature and solvent hydration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.