Digital Library

988 results for clustered binary data

Designs for generalized linear models with several variables and model uncertainty

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.

Optimal crossover designs for logistic regression models in pharmacodynamics

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Pharmacodynamics (PD) is the study of the biochemical and physiological effects of drugs. The construction of optimal designs for dose-ranging trials with multiple periods is considered in this paper, where the outcome of the trial (the effect of the drug) is considered to be a binary response: the success or failure of a drug to bring about a particular change in the subject after a given amount of time. The carryover effect of each dose from one period to the next is assumed to be proportional to the direct effect. It is shown for a logistic regression model that the efficiency of an optimal parallel (single-period) or crossover (two-period) design is substantially greater than that of a balanced design. The optimal designs are also shown to be robust to misspecification of the value of the parameters. Finally, the parallel and crossover designs are combined to provide the experimenter with greater flexibility.

Visualization of molecular fingerprints

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.

Mass transfer and chemical reaction on a distillation plate

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A multistage distillation column in which mass transfer and a reversible chemical reaction occurred simultaneously has been investigated to formulate a technique by which this process can be analysed or predicted. A transesterification reaction between ethyl alcohol and butyl acetate, catalysed by concentrated sulphuric acid, was selected for the investigation and all the components were analysed on a gas liquid chromatograph. The transesterification reaction kinetics have been studied in a batch reactor for catalyst concentrations of 0.1–1.0 weight percent and temperatures between 21.4 and 85.0 °C. The reaction was found to be second order and dependent on the catalyst concentration at a given temperature. The vapour liquid equilibrium data for six binary, four ternary and one quaternary systems were measured at atmospheric pressure using a modified Cathala dynamic equilibrium still. The systems, with the exception of ethyl alcohol–butyl alcohol mixtures, were found to be non-ideal. Multicomponent vapour liquid equilibrium compositions were predicted by a computer programme which utilised the Van Laar constants obtained from the binary data sets. Good agreement was obtained between the predicted and experimental quaternary equilibrium vapour compositions. Continuous transesterification experiments were carried out in a six stage sieve plate distillation column. The column was 3" in internal diameter and of unit construction in glass. The plates were 8" apart and had a free area of 7.7%. Both the liquid and vapour streams were analysed. The component conversion was dependent on the boilup rate and the reflux ratio. Because of the presence of the reaction, the concentration of one of the lighter components increased below the feed plate. In the same region a highly developed foam was formed due to the presence of the catalyst. The experimental results were analysed by the solution of a series of simultaneous enthalpy and mass equations. Good agreement was obtained between the experimental and calculated results.

Ant abundance and occurrence along the plant diversity gradient in the Jena Experiment (Main Experiment, year 2006)

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This data set contains measurements of ant abundance (number of individuals observed at the baits) and ant occurrence (binary data) measured in the Main Experiment plots of a large grassland biodiversity experiment (the Jena Experiment; see further details below). Ants were sampled in 80 plots of the Main Experiment using baited traps in July 2006. In each plot two petri dishes were set on the ground; one received ~10 g of tuna, the other ~10 g of sugar (sucrose). After 30 min the occurrence (presence = 1 / absence = 0) and abundance (number) of ants at the two baits were recorded. Given is, per plot, the sum of ants attracted to the two different baits. In the Main Experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown in the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, or 4 functional groups). Plots were maintained by bi-annual weeding and mowing.

Ant abundance and occurrence along the plant diversity gradient in the Jena Experiment (Main Experiment, year 2013)

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This data set contains measurements of ant abundance (number of individuals attracted to baits) and ant occurrence (binary data) measured in the Main Experiment plots of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the Main Experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown in the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, or 4 functional groups). Plots were maintained by bi-annual weeding and mowing. Ants were sampled in 80 plots of the Main Experiment using baited traps at the end of July / beginning of August 2013. Sampling took place 36 days after the end of a major flooding of the field site that lasted for several weeks (see DOI flood descriptor). In each plot two petri dishes were set on the ground; one received ~10 g of tuna, the other ~10 g of honey. After 30 min the occurrence (presence = 1 / absence = 0) and abundance (number) of ants at the two baits were recorded. Given is, per plot, the sum of ants attracted to the two different baits.

Extending Zelterman’s approach for robust estimation of population size to zero-truncated clustered data

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search for a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
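
As a rough illustration of the starting point of the abstract above, the following sketch implements the classical Zelterman estimator for unclustered counts: the Poisson rate is estimated from the frequencies of ones and twos only, and the population size follows from a Horvitz-Thompson correction for the unobserved zero class. It is not the clustered extension the paper proposes, and the function name `zelterman_estimate` is hypothetical.

```python
import math
from collections import Counter


def zelterman_estimate(counts):
    """Zelterman-type population size estimate from zero-truncated counts.

    `counts` holds the observed count for each unit that was seen at
    least once (the zero class is missing by construction).
    Unclustered case only; an illustrative sketch, not the paper's method.
    """
    if any(c < 1 for c in counts):
        raise ValueError("counts must be zero-truncated (all >= 1)")
    n = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f1 == 0 or f2 == 0:
        raise ValueError("need at least one unit with count 1 and one with count 2")
    # Rate estimate uses only the singletons and doubletons: lambda = 2*f2/f1.
    lam = 2.0 * f2 / f1
    # Under a Poisson model, P(count = 0) = exp(-lambda).
    p_zero = math.exp(-lam)
    # Horvitz-Thompson: each observed unit represents 1 / P(observed) units.
    return n / (1.0 - p_zero)
```

For example, with four units seen once, two seen twice and one seen three times, the rate estimate is 2·2/4 = 1 and the estimated population size is 7 / (1 − e⁻¹) ≈ 11.07. Because only the two lowest frequencies enter the rate estimate, heterogeneity in the upper tail of the counts has little influence on it.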

Liquid-liquid equilibria in binary solutions formed by [Pyridinium-Derived][F4B] ionic liquids and alkanols: new experimental data and validation of a multiparametric model for correlating LLE data

Relevance:

40.00% 40.00%

Publisher:

Abstract:

[EN] Experimental solubility data are presented for a set of binary systems composed of ionic liquids (IL) derived from pyridinium, with the tetrafluoroborate anion, and normal alcohols ranging from ethanol to decanol, in the temperature interval of 275–420 K, at atmospheric pressure. For each case, the miscibility curve and the upper critical solubility temperature (UCST) values are presented. The effects of the ILs on the behavior of solutions with alkanols are analyzed, paying special attention to the pyridine derivatives, and considering a series of structural characteristics of the compounds involved.

Multiproperty modeling for a set of binary systems. Evaluation of a model to correlate simultaneously several mixing properties of methyl ethanoate+alkanes and new experimental data

Relevance:

40.00% 40.00%

Publisher:

Abstract:

[EN] This work makes a theoretical–experimental contribution to the study of ester and alkane solutions. Experimental data of isobaric vapor–liquid equilibria (VLE) are presented at 101.3 kPa for binary systems of methyl ethanoate with six alkanes (from C5 to C10), and of volumes and mixing enthalpies, vE and hE.

A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2)$, where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
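
To make the $O(N^2)$ cost mentioned in the abstract above concrete, here is a minimal sketch of the brute-force baseline: a maximal two-sample statistic over all arcs of the circularised sequence, with a full permutation reference distribution. It is a simplification under stated assumptions, not the DNAcopy implementation and not the hybrid method: the statistic is scaled by the overall sample standard deviation rather than a pooled within-segment estimate, and the names `max_circular_stat` and `permutation_p_value` are hypothetical.

```python
import math
import random


def max_circular_stat(x):
    """Largest |z| over all arcs x[i:j] versus the rest of the sequence.

    Every pair (i, j) is examined, hence O(N^2) work per call.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    if sd == 0.0:
        return 0.0  # constant data: no evidence of a change
    # Prefix sums give each arc sum in O(1).
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i  # arc length
            if k == n:
                continue  # the complement would be empty
            inside = prefix[j] - prefix[i]
            outside = total - inside
            diff = inside / k - outside / (n - k)
            z = diff / (sd * math.sqrt(1.0 / k + 1.0 / (n - k)))
            best = max(best, abs(z))
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation p-value for the maximal statistic (full permutation approach)."""
    rng = random.Random(seed)
    observed = max_circular_stat(x)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_circular_stat(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Each permutation repeats the quadratic scan, so the total cost grows as the number of permutations times $N^2$; this is the burden that motivates replacing the full permutation distribution with a faster approximation.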

Efficient binary classification of large data sets

Relevance:

40.00% 40.00%

Publisher:

Index Tracking Using Data-Mining Techniques and Mixed-Binary Linear Programming

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Index tracking has become one of the most common strategies in asset management. The index-tracking problem consists of constructing a portfolio that replicates the future performance of an index by including only a subset of the index constituents in the portfolio. Finding the most representative subset is challenging when the number of stocks in the index is large. We introduce a new three-stage approach that at first identifies promising subsets by employing data-mining techniques, then determines the stock weights in the subsets using mixed-binary linear programming, and finally evaluates the subsets based on cross validation. The best subset is returned as the tracking portfolio. Our approach outperforms state-of-the-art methods in terms of out-of-sample performance and running times.

Validation of proposed CCSDS compression scheme for GRIB2 gridded binary format, links to test-data

Relevance:

40.00% 40.00%

Publisher:

Correlation of the liquid–liquid equilibrium data for specific ternary systems with one or two partially miscible binary subsystems

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper presents the results of a liquid–liquid equilibrium data correlation for 11 ternary systems which have not been previously fitted using the NRTL model or, when they have, the results presented in the literature are inconsistent with the experimental behavior of the system. These ternary systems include mixtures with one or two partially miscible pairs. During the correlation process, new restrictions were imposed on the values for the NRTL binary parameters to ensure correct prediction of the total or partial miscibility for the binary pairs involved. In addition, topological concepts related to the Gibbs stability test have been applied in order to validate the results in the whole range of compositions.
