932 results for data complexity
Abstract:
The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Quickly obtaining the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
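As a rough illustration of the three Halstead metrics named in this abstract, the sketch below computes program length, difficulty, and effort from lists of operator and operand tokens using the standard Halstead definitions. The tokenization of the sample query, the function names, and the weighting scheme are assumptions made here for illustration; the abstract does not say how the authors classified query tokens or chose weights.

```python
import math

def halstead(operators, operands):
    """Compute Halstead length, difficulty, and effort from token lists."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct operators / operands
    N1, N2 = len(operators), len(operands)            # total occurrences
    length = N1 + N2                                  # program length N
    volume = length * math.log2(n1 + n2)              # volume V = N * log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)                 # difficulty D
    effort = difficulty * volume                      # effort E = D * V
    return {"length": length, "difficulty": difficulty, "effort": effort}

def weighted_average(values, weights):
    """Weighted mean, e.g. of per-query complexity (the weights are hypothetical)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical tokenization of: SELECT name FROM emp WHERE dept = 10
ops = ["SELECT", "FROM", "WHERE", "="]
opnds = ["name", "emp", "dept", "10"]
metrics = halstead(ops, opnds)
print(metrics)
```

Under this tokenization, a schema whose queries need fewer or less varied tokens gets lower scores, which is the sense in which one instantiation can have a lower average complexity than another.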
Abstract:
The increased data complexity and task interdependency associated with servitization represent significant barriers to its adoption. The outline of a business game is presented which demonstrates the increasing complexity of the management problem when moving through Base, Intermediate and Advanced levels of servitization. Linked data is proposed as an agile set of technologies, based on well-established standards, for data exchange both in the game and more generally in supply chains.
Abstract:
SIMON is a family of 10 lightweight block ciphers published by Beaulieu et al. from the United States National Security Agency (NSA). A cipher in this family with K-bit key and N-bit block is called SIMON N/K. We present several linear characteristics for reduced-round SIMON32/64 that can be used for a key-recovery attack and extend them further to attack other variants of SIMON. Moreover, we provide results of key recovery analysis using several impossible differential characteristics starting from 14 out of 32 rounds for SIMON32/64 to 22 out of 72 rounds for SIMON128/256. In some cases the presented observations do not directly yield an attack, but provide a basis for further analysis for the specific SIMON variant. Finally, we exploit a connection between linear and differential characteristics for SIMON to construct linear characteristics for different variants of reduced-round SIMON. Our attacks extend to all variants of SIMON covering more rounds compared to any known results using linear cryptanalysis. We present a key recovery attack against SIMON128/256 which covers 35 out of 72 rounds with data complexity 2^123. We have implemented our attacks for small scale variants of SIMON and our experiments confirm the theoretical bias presented in this work.
Abstract:
This paper studies the security of the block ciphers ARIA and Camellia against impossible differential cryptanalysis. Our work improves the best impossible differential cryptanalysis of ARIA and Camellia known so far. The designers of ARIA expected that no impossible differentials exist for 4-round ARIA. However, we found some nontrivial 4-round impossible differentials, which may lead to a possible attack on 6-round ARIA. Moreover, we found some nontrivial 8-round impossible differentials for Camellia, whereas only 7-round impossible differentials were previously known. By using the 8-round impossible differentials, we present an attack on 12-round Camellia without FL/FL^-1 layers.
Abstract:
IEEE Computer Society
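Abstract:
FOX is a recently introduced family of block ciphers whose design is based on provable-security results and which performs well on a variety of platforms. This paper analyzes the security of FOX using a technique that combines collision attacks with integral attacks. The results show that the collision-integral attack is more effective than the integral attack: the computational complexity of the attack is 2^45.4 for 4-round FOX64, 2^109.4 for 5-round FOX64, 2^173.4 for 6-round FOX64, and 2^237.4 for 7-round FOX64, and the amount of data required is 2^9 in every case. In other words, 4-round FOX64/64, 5-round FOX64/128, 6-round FOX64/192, and 7-round FOX64/256 are not immune to the attack presented in this paper.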
Abstract:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
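For orientation, the Gaussian latent variable model usually associated with probabilistic PCA, and the mixture built from it, can be sketched as follows. The notation (W, μ, σ², π_i, latent dimension q, data dimension d, number of components M) is assumed here and is not taken from the abstract.

```latex
\begin{align}
\mathbf{x} &= \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
  \quad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q),
  \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d) \\
p(\mathbf{x}) &= \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu},\;
  \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}_d\right) \\
p_{\text{mix}}(\mathbf{x}) &= \sum_{i=1}^{M} \pi_i\,
  \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_i,\;
  \mathbf{W}_i\mathbf{W}_i^\top + \sigma_i^2 \mathbf{I}_d\right),
  \quad \sum_{i=1}^{M} \pi_i = 1
\end{align}
```

Because each component is a proper probability density, the components can be combined with mixing weights and fitted by EM, which is the point the abstract makes about conventional PCA lacking a density.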
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
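To make the difference between Boolean and ordinary matrix multiplication concrete, here is a minimal sketch in plain Python. The function names and the example matrices are illustrative assumptions, not taken from the thesis.

```python
def boolean_product(A, B):
    """Boolean matrix product: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    inner = len(B)
    return [[int(any(A[i][k] and B[k][j] for k in range(inner)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def ordinary_product(A, B):
    """Ordinary matrix product over the integers, for comparison."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Two small binary factor matrices (made-up example data).
A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]

print(boolean_product(A, B))   # entries stay binary because 1 OR 1 = 1
print(ordinary_product(A, B))  # entries can exceed 1 because 1 + 1 = 2
```

The Boolean product of binary factors is again binary, so the reconstruction has the same type as the original data; the ordinary product does not have this property.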
Abstract:
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods.
Abstract:
Complexity in time series is an intriguing feature of living dynamical systems, with potential use for identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, erroneously classifying them as complex physiological signals. Here, we propose and discuss a method for complex system analysis based on generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was reformulated, inspired by the Tsallis generalized entropy, as a function of the parameter q (qSampEn). qSDiff curves were calculated, which consist of differences between the qSampEn of the original and surrogate series. We evaluated qSDiff for 125 real heart rate variability (HRV) dynamics, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series of stochastic and chaotic processes. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiff(max)) for q ≠ 1. Values of q where the maximum point occurs and where qSDiff is zero were also evaluated. Only qSDiff(max) values were capable of distinguishing HRV groups (p-values 5.10 x 10^-3, 1.11 x 10^-7, and 5.50 x 10^-7 for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistent with the concept of physiologic complexity, which suggests a potential use for chaotic system analysis. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4758815]
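For readers unfamiliar with sample entropy, the following is a minimal sketch of the standard (q = 1) SampEn that the abstract takes as its starting point. It does not reproduce the authors' qSampEn or qSDiff definitions, which the abstract does not spell out. The function name, the use of an absolute tolerance `r`, and the choice to return infinity when no matches exist are assumptions made here.

```python
import math

def sample_entropy(series, m, r):
    """Standard SampEn(m, r): -ln(A / B), with Chebyshev distance and no self-matches.

    B counts template pairs of length m within tolerance r;
    A counts the same pairs extended to length m + 1.
    """
    n = len(series)

    def count_matches(length):
        count = 0
        # Use the same n - m template start positions for both lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                dist = max(abs(series[i + k] - series[j + k]) for k in range(length))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined in theory; reported as infinity in this sketch
    return -math.log(a / b)

# A strictly periodic series is perfectly predictable, so its SampEn is zero.
periodic = [0, 1] * 20
print(sample_entropy(periodic, m=2, r=0.2))
```

In practice `r` is often set relative to the standard deviation of the series; that convention is not applied automatically here.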
Abstract:
Intensity modulated radiation therapy (IMRT) is a technique that delivers a highly conformal dose distribution to a target volume while attempting to maximally spare the surrounding normal tissues. IMRT is a common treatment modality used for treating head and neck (H&N) cancers, and the presence of many critical structures in this region requires accurate treatment delivery. The Radiological Physics Center (RPC) acts as both a remote and on-site quality assurance agency that credentials institutions participating in clinical trials. To date, about 30% of all IMRT participants have failed the RPC’s remote audit using the IMRT H&N phantom. The purpose of this project is to evaluate possible causes of H&N IMRT delivery errors observed by the RPC, specifically IMRT treatment plan complexity and the use of improper dosimetry data from machines that were thought to be matched but in reality were not. Eight H&N IMRT plans with a range of complexity defined by total MU (1460-3466), number of segments (54-225), and modulation complexity scores (MCS) (0.181-0.609) were created in Pinnacle v.8m. These plans were delivered to the RPC’s H&N phantom on a single Varian Clinac. One of the IMRT plans (1851 MU, 88 segments, and MCS=0.469) was equivalent to the median H&N plan from 130 previous RPC H&N phantom irradiations. This average IMRT plan was also delivered on four matched Varian Clinac machines and the dose distribution calculated using a different 6MV beam model. Radiochromic film and TLD within the phantom were used to analyze the dose profiles and absolute doses, respectively. The measured and calculated doses were compared to evaluate the dosimetric accuracy. All deliveries met the RPC acceptance criteria of ±7% absolute dose difference and 4 mm distance-to-agreement (DTA). Additionally, gamma index analysis was performed for all deliveries using ±7%/4 mm and ±5%/3 mm criteria.
Increasing the treatment plan complexity by varying the MU, number of segments, or MCS resulted in no clear trend toward an increase in dosimetric error determined by the absolute dose difference, DTA, or gamma index. Varying the delivery machines as well as the beam model (use of a Clinac 6EX 6MV beam model vs. Clinac 21EX 6MV model) also did not show any clear trend towards an increased dosimetric error using the same criteria indicated above.
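The gamma index mentioned in this abstract combines a dose-difference tolerance and a distance-to-agreement tolerance into one pass/fail number. The sketch below evaluates it on a one-dimensional profile with a brute-force search; the function name, the sample profiles, and the use of an absolute dose tolerance (rather than a percentage of a chosen normalization dose) are assumptions for illustration and are not the RPC's analysis procedure.

```python
import math

def gamma_index_1d(positions_mm, measured, calculated, dose_tol, dta_mm):
    """Return the gamma value at each measured point of a 1D dose profile.

    For each measured point, gamma is the minimum over all calculated points of
    sqrt((distance / dta_mm)^2 + (dose difference / dose_tol)^2).
    A point passes when its gamma is <= 1.
    """
    gammas = []
    for xm, dm in zip(positions_mm, measured):
        best = min(
            math.sqrt(((xc - xm) / dta_mm) ** 2 + ((dc - dm) / dose_tol) ** 2)
            for xc, dc in zip(positions_mm, calculated)
        )
        gammas.append(best)
    return gammas

# Made-up flat profiles: the calculated dose is uniformly 3.5 units higher.
positions = [0.0, 1.0, 2.0]
measured = [100.0, 100.0, 100.0]
calculated = [103.5, 103.5, 103.5]
gammas = gamma_index_1d(positions, measured, calculated, dose_tol=7.0, dta_mm=4.0)
print(gammas, all(g <= 1.0 for g in gammas))
```

With a tolerance of 7 dose units, a uniform 3.5-unit offset gives a gamma of 0.5 at every point, so all points pass.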
Abstract:
Assessment of the temporal transferability of species distribution models for present-day application, using paleobotanical data for Corylus avellana and Alnus glutinosa.