981 results for clustered binary data


Relevance:

90.00%

Publisher:

Abstract:

The clusters of binary patterns can be considered as Boolean functions of the (binary) features. Such a relationship between the linearly separable (LS) Boolean functions and LS clusters of binary patterns is examined. An algorithm is presented to answer the questions of the type: “Is the cluster formed by the subsets of the (binary) data set having certain features AND/NOT having certain other features, LS from the remaining set?” The algorithm uses the sequences of Numbered Binary Form (NBF) notation and some elementary (NPN) transformations of the binary data.

Relevance:

90.00%

Publisher:

Abstract:

Optical films containing the genetic variant bacteriorhodopsin BR-D96N were experimentally studied in view of their properties as media for holographic storage. Different polarization recording schemes were tested and compared. The influence of the polarization states of the recording and readout waves on the retrieved diffractive image's intensity and its signal-to-noise ratio was analyzed. The experimental results showed that, compared with the other tested polarization relations during holographic recording, the discrimination between the polarization states of diffracted and scattered light is optimized with orthogonal circular polarization of the recording beams, and thus a high signal-to-noise ratio and a high diffraction efficiency are obtained. Using a He-Ne laser (633 nm, 3 mW) for recording and readout, a spatial light modulator as a data input element, and a 2D-CCD sensor for data capture in a Fourier-transform holographic setup, a storage density of 2 × 10^8 bits/cm^2 was obtained on a 60 × 42 µm^2 area in the BR-D96N film. The readout of encoded binary data was possible with a zero-error rate at the tested storage density. © 2005 Optical Society of America.

Relevance:

90.00%

Publisher:

Abstract:

A role for sequential test procedures is emerging in genetic and epidemiological studies using banked biological resources. This stems from the methodology's potential for improved use of information relative to comparable fixed sample designs. Studies in which cost, time and ethics feature prominently are particularly suited to a sequential approach. In this paper sequential procedures for matched case–control studies with binary data will be investigated and assessed. Design issues such as sample size evaluation and error rates are identified and addressed. The methodology is illustrated and evaluated using both real and simulated data sets.

Relevance:

90.00%

Publisher:

Abstract:

This work develops a new methodology to discriminate between models for interval-censored data based on bootstrap residual simulation, by observing the deviance difference between one model and another, according to Hinde (1992). Generally, this sort of data can generate a large number of tied observations and, in this case, survival time can be regarded as discrete. Therefore, the Cox proportional hazards model for grouped data (Prentice & Gloeckler, 1978) and the logistic model (Lawless, 1982) can be fitted by means of generalized linear models. Whitehead (1989) considered censoring to be an indicator variable with a binomial distribution and fitted the Cox proportional hazards model using the complementary log-log as a link function. In addition, a logistic model can be fitted using the logit as a link function. The proposed methodology arises as an alternative to the score tests developed by Colosimo et al. (2000), where such models can be obtained for discrete binary data as particular cases of the Aranda-Ordaz asymmetric family of distributions. These tests are thus based on link functions to generate such a fit. The example that motivates this study is the dataset from an experiment carried out on a flax cultivar planted on four substrata susceptible to the pathogen Fusarium oxysporum. The response variable, which is the time until blighting, was observed in intervals over 52 days. The results were compared with the model fit and the AIC values.
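As a rough illustration of the two link functions named in this abstract, the sketch below converts a linear predictor into an interval-specific event probability under the complementary log-log link (the grouped-data Cox model) and under the logit link (the logistic model). The function names and the sample predictor values are illustrative assumptions and are not taken from the paper.

```python
import math

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def cloglog(p):
    """Complementary log-log link: eta = log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def logit(p):
    """Logit link: eta = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical linear predictor values, chosen only to show that the
# two links give different probabilities for the same predictor.
for eta in (-2.0, 0.0, 1.0):
    print(eta, round(inv_cloglog(eta), 4), round(inv_logit(eta), 4))
```

The logit is symmetric around p = 0.5, while the complementary log-log is not, which is one reason the two models can fit the same interval-censored data differently.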

Relevance:

90.00%

Publisher:

Abstract:

An interim analysis is usually applied in later phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to trial termination at an earlier point than planned at the beginning. This can result in the saving of patient resources and shortening of drug development and approval time. In addition, ethics and economics are also the reasons to stop a trial earlier. In clinical trials of eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, data may include distribution-free random variables with matched and unmatched subjects in one study. It is important to properly include both subjects in the interim and the final analyses so that the maximum efficiency of statistical and clinical inferences can be obtained at different stages of the trials. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects in the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate the empirical powers and the empirical type I errors among the simulated datasets with different sample sizes, different effect sizes, different correlation coefficients for matched pairs, and different data distributions, respectively, in the interim and final analysis with 4 different group sequential methods. Empirical powers and empirical type I errors were also compared to those estimated by using the meta-analysis t-test among the same simulated datasets. Results from this simulation study show that, compared to the meta-analysis t-test commonly used for data with normally distributed observations, the hybrid statistic has a greater power for data observed from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Powers rose with the increase in sample size, effect size, and correlation coefficient for the matched pairs. 
In addition, lower type I errors were observed with the hybrid statistic, which indicates that this test is also conservative for data with outliers in the interim analysis of clinical trials.

Relevance:

90.00%

Publisher:

Abstract:

The Semantic Binary Data Model (SBM) is a viable alternative to the now-dominant relational data model. SBM would be especially advantageous for applications dealing with complex interrelated networks of objects provided that a robust efficient implementation can be achieved. This dissertation presents an implementation design method for SBM, algorithms, and their analytical and empirical evaluation. Our method allows building a robust and flexible database engine with a wider applicability range and improved performance. Extensions to SBM are introduced and an implementation of these extensions is proposed that allows the database engine to efficiently support applications with a predefined set of queries. A New Record data structure is proposed. Trade-offs of employing Fact, Record and Bitmap Data structures for storing information in a semantic database are analyzed. A clustering ID distribution algorithm and an efficient algorithm for object ID encoding are proposed. Mapping to an XML data model is analyzed and a new XML-based XSDL language facilitating interoperability of the system is defined. Solutions to issues associated with making the database engine multi-platform are presented. An improvement to the atomic update algorithm suitable for certain scenarios of database recovery is proposed. Specific guidelines are devised for implementing a robust and well-performing database engine based on the extended Semantic Data Model.

Relevance:

90.00%

Publisher:

Abstract:

This collection contains measurements of abundance and diversity of different groups of aboveground invertebrates sampled on the plots of the different sub-experiments at the field site of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the main experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, 4 functional groups). Plots were maintained by bi-annual weeding and mowing. The following series of datasets are contained in this collection: 1. Measurements of ant abundance (number of individuals attracted to baits) and ant occurrence (binary data) in the Main Experiment in 2006 and 2013. Ants were sampled using two types of baited traps receiving ~10 g of tuna or ~10 g of honey/sucrose. After 30 min the occurrence (presence = 1 / absence = 0) and abundance (number) of ants at the two types of baits were recorded and pooled per plot.

Relevance:

90.00%

Publisher:

Abstract:

The viscosity of ionic liquids (ILs) has been modeled as a function of temperature and at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al. J. Chem. Eng. Data 2016, 61, 2160–2169), which used 154 experimental viscosity data points of 25 ionic liquids for regression of a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies in the experimental data of the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and recommend one set of reliable data for each IL. These recommended data (819 data points in total) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from the literature to establish this model. The binary data comprise 128 training data points used for the optimization of binary interaction parameters and 838 test data points used for the comparison of the pure evaluated values. The relative average absolute deviation (RAAD) for the training and test sets is 2.9% and 3.9%, respectively.
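For orientation, the sketch below shows the commonly used Vogel–Fulcher–Tammann form, eta = A * exp(B / (T - T0)), together with a relative average absolute deviation computed as the mean of |calculated - experimental| / experimental, in percent. Both are stated here as assumptions: the abstract does not give the exact parameterization or the RAAD definition used in the paper, and every number in the example is made up for illustration.

```python
import math

def vft_viscosity(temperature_k, a, b, t0):
    """Vogel-Fulcher-Tammann form: eta = a * exp(b / (T - t0))."""
    return a * math.exp(b / (temperature_k - t0))

def raad_percent(calculated, experimental):
    """Relative average absolute deviation between two series, in percent."""
    terms = [abs(c - e) / e for c, e in zip(calculated, experimental)]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical parameters and "experimental" viscosities (mPa.s);
# none of these values come from the paper.
temperatures = [293.15, 313.15, 333.15]
experimental = [100.0, 40.0, 20.0]
calculated = [vft_viscosity(t, a=0.2, b=800.0, t0=165.0) for t in temperatures]
print(f"RAAD = {raad_percent(calculated, experimental):.1f}%")
```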

Relevance:

80.00%

Publisher:

Abstract:

We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.

Relevance:

80.00%

Publisher:

Abstract:

The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.

Relevance:

80.00%

Publisher:

Abstract:

In terabit-density magnetic recording, several bits of data can be replaced by the values of their neighbors in the storage medium. As a result, errors in the medium are dependent on each other and also on the data written. We consider a simple 1-D combinatorial model of this medium. In our model, we assume a setting where binary data is sequentially written on the medium and a bit can erroneously change to the immediately preceding value. We derive several properties of codes that correct this type of error, focusing on bounds on their cardinality. We also define a probabilistic finite-state channel model of the storage medium, and derive lower and upper estimates of its capacity. A lower bound is derived by evaluating the symmetric capacity of the channel, i.e., the maximum transmission rate under the assumption of the uniform input distribution of the channel. An upper bound is found by showing that the original channel is a stochastic degradation of another, related channel model whose capacity we can compute explicitly.
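The error mechanism described in this abstract can be simulated in a few lines. The sketch below assumes that each bit after the first is independently overwritten, with probability p, by the bit that was written immediately before it; the abstract does not specify the probabilistic details, so the independence assumption, the parameter p, and the function name are all hypothetical.

```python
import random

def write_with_neighbor_errors(bits, p, rng):
    """Return the stored sequence when each bit may copy its predecessor.

    Assumption: bit i (i >= 1) is replaced by the originally written
    bit i-1 with probability p, independently of the other positions.
    """
    stored = list(bits)
    for i in range(1, len(bits)):
        if rng.random() < p:
            stored[i] = bits[i - 1]
    return stored

rng = random.Random(0)  # fixed seed so the run is reproducible
written = [0, 1, 1, 0, 1, 0, 0, 1]
stored = write_with_neighbor_errors(written, p=0.3, rng=rng)
error_positions = [i for i, (w, s) in enumerate(zip(written, stored)) if w != s]
print(written, stored, error_positions)
```

Under this model a visible error can only appear where the written sequence changes value, which is why the errors depend on the data written.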

Relevance:

80.00%

Publisher:

Abstract:

A zone based systems design framework is described and utilised in the implementation of a message authentication code (MAC) algorithm based on symmetric key block ciphers. The resulting block cipher based MAC algorithm may be used to provide assurance of the authenticity and, hence, the integrity of binary data. Using software simulation to benchmark against the de facto cipher block chaining MAC (CBC-MAC) variant used in the TinySec security protocol for wireless sensor networks and the NIST cipher block chaining MAC standard, CMAC, we show that our zone based systems design framework can lead to block cipher based MAC constructs that point to improvements in message processing efficiency, processing throughput and processing latency.

Relevance:

80.00%

Publisher:

Abstract:

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
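The classification step described in this abstract might look roughly like the sketch below: each collection holds clusters built on its own subset of features, a new instance is assigned to the closest cluster in each collection, and the per-collection labels are then combined. The abstract does not say how the labels are combined or which distance is used, so the majority vote, the squared Euclidean distance (which equals the Hamming distance on binary vectors), and all names and data are assumptions.

```python
from collections import Counter

def closest_label(instance, feature_idx, clusters):
    """Label of the cluster whose centroid is nearest on the chosen features."""
    sub = [instance[i] for i in feature_idx]

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sub, centroid))

    _, label = min(clusters, key=lambda cluster: sq_dist(cluster[0]))
    return label

def ensemble_predict(instance, collections):
    """Majority vote over the closest cluster of every collection (assumed rule)."""
    votes = Counter(closest_label(instance, idx, cl) for idx, cl in collections)
    return votes.most_common(1)[0][0]

# Hypothetical collections: (feature indices, [(centroid, activity label), ...]).
collections = [
    ([0, 1], [((1, 1), "cooking"), ((0, 0), "sleeping")]),
    ([2, 3], [((1, 0), "cooking"), ((0, 1), "sleeping")]),
    ([0, 1, 3], [((1, 1, 0), "cooking"), ((0, 0, 1), "sleeping")]),
]
binary_sensor_reading = [1, 1, 0, 1]
print(ensemble_predict(binary_sensor_reading, collections))
```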

Relevance:

80.00%

Publisher:

Abstract:

Doctoral thesis, Statistics and Operations Research (Probability and Statistics), Universidade de Lisboa, Faculdade de Ciências, 2014