987 results for correlated binary regression
Abstract:
This study examines the properties of Generalised Regression (GREG) estimators for domain class frequencies and proportions. The family of GREG estimators forms the class of design-based model-assisted estimators. All GREG estimators utilise auxiliary information via modelling. The classic GREG estimator with a linear fixed-effects assisting model (GREG-lin) is one example. When estimating class frequencies, however, the study variable is binary or polytomous, so logistic-type assisting models (e.g. logistic or probit models) should be preferred over the linear one. Yet GREG estimators other than GREG-lin are rarely used, and knowledge about their properties is limited. This study examines the properties of L-GREG estimators, which are GREG estimators with fixed-effects logistic-type models. Three research questions are addressed. First, I study whether and when L-GREG estimators are more accurate than GREG-lin. Theoretical results and Monte Carlo experiments, covering both equal- and unequal-probability sampling designs and a wide variety of model formulations, show that in standard situations the difference between L-GREG and GREG-lin is small. With a strong assisting model, however, two interesting situations arise: if the domain sample size is reasonably large, L-GREG is more accurate than GREG-lin, and if the domain sample size is very small, estimation of the assisting model parameters may be inaccurate, resulting in bias for L-GREG. Second, I study variance estimation for the L-GREG estimators. The standard variance estimator (S) for all GREG estimators resembles the Sen-Yates-Grundy variance estimator, but it is a double sum of prediction errors, not of the observed values of the study variable. Monte Carlo experiments show that S underestimates the variance of L-GREG especially if the domain sample size is small or the assisting model is strong. Third, since the standard variance estimator S often fails for the L-GREG estimators, I propose a new augmented variance estimator (A). The difference between S and the new estimator A is that the latter takes into account the difference between the sample-fit model and the census-fit model. In Monte Carlo experiments, the new estimator A outperformed the standard estimator S in terms of bias, root mean square error and coverage rate. Thus the new estimator provides a good alternative to the standard estimator.
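For orientation, a minimal sketch of the standard design-based forms referred to above, in generic notation assumed here rather than taken from the study: with fitted values $\hat{y}_k$ from the assisting model (linear for GREG-lin, logistic-type for L-GREG), inclusion probabilities $\pi_k$, $\pi_{kl}$, and prediction errors $e_k = y_k - \hat{y}_k$, the GREG estimator of a total and the Sen-Yates-Grundy-type variance estimator S can be written as

$$\hat{t}_{y,\mathrm{GREG}} = \sum_{k \in U} \hat{y}_k + \sum_{k \in s} \frac{y_k - \hat{y}_k}{\pi_k}, \qquad \hat{V}_S = \frac{1}{2} \sum_{k \in s} \sum_{l \in s} \frac{\pi_k \pi_l - \pi_{kl}}{\pi_{kl}} \left( \frac{e_k}{\pi_k} - \frac{e_l}{\pi_l} \right)^2,$$

which makes explicit that S is a double sum over prediction errors rather than over the observed values of the study variable.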
Abstract:
Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation techniques. The use of a sparse GPR model helps make the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
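As a point of reference only, the sketch below shows the generic idea of using GP regression on ±1 labels as a binary classifier, with hyperparameters fitted by maximising the marginal likelihood; it is not the authors' sparse, semi-supervised algorithm, and the data and parameter choices are illustrative assumptions.

```python
# Generic illustration (not the paper's algorithm): GP regression on +/-1 labels,
# classifying by the sign of the predictive mean. Kernel hyperparameters are set
# by maximising the marginal likelihood rather than by cross-validation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))            # toy inputs
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # toy +/-1 labels

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1)
gpr.fit(X, y)                                    # fits the kernel length-scale

X_new = rng.uniform(-3, 3, size=(5, 2))
print(np.sign(gpr.predict(X_new)))               # predicted class labels
```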
Abstract:
The surface tensions of binary mixtures of 1-alkanols (Cl-Cd) with benzene, toluene, or xylene were measured. The results were correlated with activity coefficients calculated through a group-contribution method (UNIFAC), with a maximum deviation from the experimental results of less than 5%. The coefficients of the correlation are correlated with the alkanol chain length.
Abstract:
Symmetry-adapted linear combinations of valence-bond (VB) diagrams are constructed for arbitrary point groups and total spin S using diagrammatic VB methods. VB diagrams are related uniquely to invariant subspaces whose size reflects the number of group elements; their nonorthogonality leads to sparser matrices and is fully incorporated into a binary integer representation. Symmetry-adapted linear combinations of VB diagrams are constructed for the 1764 singlets of a half-filled cube of eight sites, for the 2.8 million π-electron singlets of anthracene, and for illustrative S ≠ 0 systems.
Abstract:
The problem of sensor-network-based distributed intrusion detection in the presence of clutter is considered. It is argued that sensing is best regarded as a local phenomenon, in that only sensors in the immediate vicinity of an intruder are triggered. In such a setting, lack of knowledge of intruder location gives rise to correlated sensor readings. A signal-space viewpoint is introduced in which the noise-free sensor readings associated with intruder and clutter appear as surfaces f(s) and f(g), and the problem reduces to one of determining, in a distributed fashion, whether the current noisy sensor reading is best classified as intruder or clutter. Two approaches to distributed detection are pursued. In the first, a decision surface separating f(s) and f(g) is identified using Neyman-Pearson criteria. Thereafter, the individual sensor nodes interactively exchange bits to determine whether the sensor readings are on one side or the other of the decision surface. Bounds on the number of bits that need to be exchanged are derived, based on communication-complexity (CC) theory. A lower bound derived for the two-party average-case CC of general functions is compared against the performance of a greedy algorithm. Extensions to the multi-party case are straightforward and are briefly discussed. The average-case CC of the relevant greater-than (GT) function is characterized to within two bits. Under the second approach, each sensor node broadcasts a single bit arising from appropriate two-level quantization of its own sensor reading, keeping in mind the fusion rule to be subsequently applied at a local fusion center. The optimality of a threshold test as a quantization rule is proved under simplifying assumptions. Finally, results from a QualNet simulation of the algorithms are presented that include intruder tracking using a naive polynomial-regression algorithm. © 2010 Elsevier B.V. All rights reserved.
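To make the second approach concrete, here is a minimal sketch of one-bit threshold quantization at each sensor followed by a counting rule at the fusion center; the threshold, the k-out-of-n rule, and the toy readings are illustrative assumptions rather than details from the paper.

```python
# Generic sketch of the second approach: each sensor quantises its noisy reading
# to one bit with a threshold test; the local fusion centre declares an intruder
# if at least k of the n bits are set. Threshold and k are illustrative choices.
import numpy as np

def sensor_bit(reading: float, threshold: float) -> int:
    """One-bit (two-level) quantisation of a sensor reading."""
    return int(reading > threshold)

def fusion_decision(bits, k: int) -> bool:
    """k-out-of-n counting rule applied at the fusion centre."""
    return sum(bits) >= k

rng = np.random.default_rng(1)
clutter_readings = rng.normal(0.0, 1.0, size=10)   # noise-only readings
intruder_readings = rng.normal(2.0, 1.0, size=10)  # signal-plus-noise readings

threshold, k = 1.0, 3
print(fusion_decision([sensor_bit(r, threshold) for r in clutter_readings], k))
print(fusion_decision([sensor_bit(r, threshold) for r in intruder_readings], k))
```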
Abstract:
Validation of the flux-partitioning-of-species model is illustrated. Various combinations of inequality expressions for the fluxes of species A and B in two successively grown hypothetical intermetallic phases in the interdiffusion zone have been considered within the constraints of this concept. Furthermore, the ratio of the intrinsic diffusivities of species A and B in these two phases has been correlated in four different cases. Moreover, complete and/or partial validation or invalidation of this model with respect to both species has been proven theoretically and discussed with the Co-Si system as an example.
Abstract:
The solidification pathways of Nb-rich Nb-Si alloys processed under non-equilibrium conditions require understanding. Continuing our earlier work on alloying additions at the single eutectic composition [1,2], we report a detailed characterization of the microstructures of Nb-Si binary alloys over a wide composition range (10-25 at.% Si). The alloys are processed using chilled copper mould suction casting. This has allowed us to correlate the evolution of microstructure and phases with the different possible solidification pathways. Finally, these are correlated with mechanical properties through studies of deformation using mechanical testing under indentation and compressive loads. It is shown that microstructure modification can significantly influence the plasticity of these alloys.
Abstract:
This paper provides a root-n consistent, asymptotically normal weighted least squares estimator of the coefficients in a truncated regression model. The distribution of the errors is unknown and permits general forms of unknown heteroskedasticity. Also provided is an instrumental variables based two-stage least squares estimator for this model, which can be used when some regressors are endogenous, mismeasured, or otherwise correlated with the errors. A simulation study indicates that the new estimators perform well in finite samples. Our limiting distribution theory includes a new asymptotic trimming result addressing the boundary bias in first-stage density estimation without knowledge of the support boundary. © 2007 Cambridge University Press.
Abstract:
Coloured effluents from textile industries are a problem in many rivers and waterways. Prediction of the adsorption capacities of dyes by adsorbents is important in design considerations. The sorption of three basic dyes, namely Basic Blue 3, Basic Yellow 21 and Basic Red 22, onto peat is reported. Equilibrium sorption isotherms have been measured for the three single-component systems. Equilibrium was achieved after twenty-one days. The experimental isotherm data were analysed using the Langmuir, Freundlich, Redlich-Peterson, Temkin and Toth isotherm equations. A detailed error analysis has been undertaken to investigate the effect of using different error criteria for the determination of the single-component isotherm parameters and hence to obtain the best isotherm and isotherm parameters describing the adsorption process. The linear transform model provided the highest R² regression coefficient with the Redlich-Peterson model. The Redlich-Peterson model also yielded the best fit to the experimental data for all three dyes using the non-linear error functions. An extended Langmuir model has been used to predict the isotherm data for the binary systems using the single-component data. The agreement between theoretical and experimental data was only limited, due to competitive and interactive effects between the dyes and the dye-surface interactions.
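To illustrate the kind of non-linear fitting discussed above, the sketch below fits the Redlich-Peterson isotherm q_e = K_R C_e / (1 + a_R C_e^g) by least squares; the equilibrium data are synthetic placeholders, not the measured dye/peat data, and the sum of squared errors is only one of the error criteria the study compares.

```python
# Sketch: non-linear least-squares fit of the Redlich-Peterson isotherm.
# The concentrations and uptakes below are synthetic placeholders.
import numpy as np
from scipy.optimize import curve_fit

def redlich_peterson(Ce, KR, aR, g):
    """q_e = K_R * C_e / (1 + a_R * C_e**g)."""
    return KR * Ce / (1.0 + aR * Ce**g)

Ce = np.array([5.0, 10.0, 20.0, 40.0, 80.0, 160.0])      # mg/L (synthetic)
qe = np.array([45.0, 78.0, 120.0, 160.0, 190.0, 205.0])  # mg/g (synthetic)

popt, _ = curve_fit(redlich_peterson, Ce, qe, p0=[10.0, 0.05, 0.9], maxfev=10000)
sse = float(np.sum((qe - redlich_peterson(Ce, *popt)) ** 2))
print("K_R, a_R, g =", popt, " SSE =", sse)
```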
Abstract:
The viscosity η has been reported at 303.15 K over the entire range of composition for eighteen binary mixtures: cyclopentane + cyclohexane and + cyclooctane; cyclohexane + cycloheptane, + cyclooctane, + methylcyclohexane, + n-hexane, + n-heptane, + n-octane, + i-octane, + benzene, + toluene, + ethylbenzene, + p-xylene, and + propylbenzene; methylcyclohexane + n-hexane, + i-octane, and + benzene; and cyclooctane + benzene. The viscosity deviations Δη and the excess Gibbs energy of activation of viscous flow ΔG*E, based on Eyring's theory, have been calculated. The effects of the molecular sizes and shapes of the component molecules and of the interaction energy in the mixture have been discussed. The viscosity data have been correlated with the equations of Grunberg and Nissan, Hind, McLaughlin and Ubbelohde, Tamura and Kurata, Katti and Chaudhri, McAllister, Heric and Brewer, and Auslaender.
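For reference, the standard expressions behind two of the quantities above, in generic notation assumed here: the viscosity deviation and the Grunberg-Nissan correlation are

$$\Delta\eta = \eta - (x_1\eta_1 + x_2\eta_2), \qquad \ln\eta = x_1\ln\eta_1 + x_2\ln\eta_2 + x_1 x_2 G_{12},$$

and the excess Gibbs energy of activation of viscous flow from Eyring's theory is

$$\Delta G^{*E} = RT\left[\ln(\eta V) - x_1\ln(\eta_1 V_1) - x_2\ln(\eta_2 V_2)\right],$$

where V, V_1 and V_2 are the molar volumes of the mixture and the pure components and G_{12} is an adjustable interaction parameter.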
Abstract:
We present a study of the phase equilibrium behaviour of binary mixtures containing two 1-alkyl-3-methylimidazolium bis{(trifluoromethyl)sulfonyl}imide-based ionic liquids, [Cnmim][NTf2] (n = 2 and 4), mixed with diethylamine or triethylamine, as a function of temperature and composition using different experimental techniques. In this work, two systems showing an LCST and one system with a possible hourglass shape are measured. Their phase behaviours are then correlated and predicted using Flory–Huggins equations and the UNIQUAC method implemented in Aspen. The potential of the COSMO-RS methodology to predict the phase equilibria was also tested for the binary systems studied. However, this methodology is unable to predict the trends obtained experimentally, limiting its use for systems involving amines in ionic liquids. The liquid-state structure of the binary mixture ([C2mim][NTf2] + diethylamine) is also investigated by molecular dynamics simulation and neutron diffraction. Finally, the absorption of gaseous ethane by the ([C2mim][NTf2] + diethylamine) binary mixture is determined and compared with that observed in the pure solvents.
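For context, the correlation mentioned above rests on the standard Flory-Huggins expression for the free energy of mixing per mole of lattice sites (generic notation assumed here; the temperature dependence of the interaction parameter used in the study is not reproduced):

$$\frac{\Delta G_{\mathrm{mix}}}{RT} = \frac{\phi_1}{n_1}\ln\phi_1 + \frac{\phi_2}{n_2}\ln\phi_2 + \chi\,\phi_1\phi_2,$$

where φ_i are volume fractions, n_i the relative chain lengths, and χ an interaction parameter that can be fitted to the measured coexistence data.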
Abstract:
The structure and dynamics of the common polysaccharide dextran have been investigated in mixed solvents at two different temperatures using small-angle X-ray scattering (SAXS) and viscosity measurements. More specifically, binary mixtures of a good solvent (water, formamide, dimethylsulfoxide, ethanolamine) and the bad solvent ethanol as the minority component have been considered. The experimentally observed effects on the polymer conformation (intrinsic viscosity, coil radius, and radius of gyration) of the bad solvent addition are discussed in terms of hydrogen bonding density and are correlated with the Hansen solubility parameters and the surface tension of the solvent mixtures. Hydrogen bonding appears to be an important contributor to the solubility of dextran but is not sufficient to capture the dextran coil contraction in the mixtures of good+bad solvents.
Abstract:
In the present paper, a study of the influence of the alkyl chain length in N-alkyl-triethylammonium bis(trifluoromethylsulfonyl)imide ionic liquids, [NR,222][Tf2N] (R = 6, 8 or 12), on the excess molar enthalpy at 303.15 K and the excess molar volume within the temperature interval (283.15–338.15 K) of ionic liquid + methanol mixtures is carried out. Small excess molar volumes with highly asymmetric (S-shaped) curves as a function of mole-fraction composition were obtained, with negative values in the methanol-rich region. The excess molar volumes increase with the alkyl-chain length of the ammonium cation of the ionic liquid and decrease with temperature. The excess enthalpies of the selected binary mixtures are positive over the whole composition range and increase slightly with the length of the alkyl side-chain of the cation of the ionic liquid. Both excess properties were subsequently correlated using a Redlich–Kister-type equation as well as the ERAS model. From this semi-predictive model, the studied excess quantities could be resolved into chemical and physical contributions. Finally, the COSMOThermX software has been used to evaluate its prediction capability for the excess enthalpy of the investigated mixtures at 303.15 K and 0.1 MPa. From this work, it appears that the COSMOThermX method predicts this property with a good accuracy of approximately 10%, providing at the same time the correct order of magnitude of the partial molar excess enthalpies at infinite dilution for the studied ILs, H̄1E,∞, and for methanol, H̄2E,∞.
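For reference, the Redlich-Kister form used for such correlations (generic notation assumed here, not the fitted coefficients from the paper) expresses an excess property X^E of a binary mixture as

$$X^E = x_1 x_2 \sum_{i=0}^{m} A_i\,(x_1 - x_2)^i,$$

where x_1 and x_2 are the mole fractions and the coefficients A_i are obtained by least-squares fitting to the measured excess molar volumes or enthalpies.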
Abstract:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be drawn from a large variety of linear and nonlinear regression methods, and selecting a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is high-dimensional, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.
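As a rough illustration of the kind of comparison described above, the sketch below fits MLR, LASSO and GPR to a synthetic high-dimensional, correlated, noisy regression problem and reports test errors; it uses neither the benchmark semiconductor data nor the paper's variable-selection procedure, and all settings are illustrative assumptions.

```python
# Generic sketch: comparing three of the regression methods named in the abstract
# on synthetic, correlated, noisy data (not the semiconductor benchmark data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=300, n_features=100, n_informative=20,
                       noise=10.0, effective_rank=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "MLR": LinearRegression(),
    "LASSO": Lasso(alpha=1.0, max_iter=5000),
    "GPR": GaussianProcessRegressor(kernel=RBF(length_scale=10.0),
                                    alpha=1.0, normalize_y=True),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: test MSE = {mse:.1f}")
```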
Abstract:
Histone deacetylases (HDACs) are enzymes involved in transcriptional repression. We aimed to examine the significance of HDAC1 and HDAC2 gene expression in the prediction of recurrence and survival in 156 patients with hepatocellular carcinoma (HCC) from a South East Asian population who underwent curative surgical resection in Singapore. We found that HDAC1 and HDAC2 were upregulated in the majority of HCC tissues. The presence of HDAC1 in tumor tissues was correlated with poor tumor differentiation. Notably, HDAC1 expression in adjacent non-tumor hepatic tissues was correlated with the presence of satellite nodules and multiple lesions, suggesting that HDAC1 upregulation within the field of HCC may contribute to tumor spread. Using competing risk regression analysis, we found that increased cancer-specific mortality was significantly associated with HDAC2 expression. Mortality was also increased with high HDAC1 expression. In the liver cancer cell lines HEP3B, HEPG2 and PLC5, and in the colorectal cancer cell line HCT116, the combined knockdown of HDAC1 and HDAC2 increased cell death and reduced cell proliferation as well as colony formation. In contrast, knockdown of either HDAC1 or HDAC2 alone had minimal effects on cell death and proliferation. Taken together, our study suggests that both HDAC1 and HDAC2 exert pro-survival effects in HCC cells, and that a combination of isoform-specific HDAC inhibitors against both HDACs may be effective in targeting HCC to reduce mortality.