958 results for multivariate binary data
Abstract:
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
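The decomposition idea described in this abstract can be illustrated with a minimal sketch. It assumes a one-vs-one scheme with majority voting, and it uses a nearest-centroid rule as a stand-in binary learner; the survey itself does not prescribe either choice, and all function names and the toy data below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_centroid(points):
    """Return the mean vector of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_vs_one(X, y):
    """Train one binary nearest-centroid classifier per pair of classes."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        cent_a = train_centroid([x for x, lab in zip(X, y) if lab == a])
        cent_b = train_centroid([x for x, lab in zip(X, y) if lab == b])
        models[(a, b)] = (cent_a, cent_b)
    return models


def predict_one_vs_one(models, x):
    """Combine the outputs of the binary subtasks by majority vote."""
    votes = Counter()
    for (a, b), (cent_a, cent_b) in models.items():
        winner = a if sq_dist(x, cent_a) <= sq_dist(x, cent_b) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Toy three-class data set (illustrative values only).
X = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.0, 5.0), (0.1, 5.2)]
y = ["a", "a", "b", "b", "c", "c"]
models = fit_one_vs_one(X, y)
```

With three classes the scheme trains three binary classifiers, one per class pair; `predict_one_vs_one(models, (0.1, 0.0))` then returns the class that wins the most pairwise contests.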
Abstract:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Abstract:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under a multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follow a Student t and a skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.
Abstract:
Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null intercept errors-in-variables regression model, where the explanatory and the response variables are subject to measurement errors, and a possible structure of dependency between the measurements taken within the same individual is incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the Bayesian approach. In this article, considering the classical approach, we analyze asymptotic test statistics and present a simulation study to compare the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Also, closed-form expressions for the score function and the Fisher information matrix are presented. We consider two real numerical illustrations: the odontological data set from Hadgu and Koch (1999) and a quality control data set.
Abstract:
The use of liposomes to encapsulate materials has received widespread attention for drug delivery, transfection, diagnostic reagents, and immunoadjuvants. Phospholipid polymers form a new class of biomaterials with many potential applications in medicine and research. Of interest are polymeric phospholipids containing a diacetylene moiety along their acyl chain, since these kinds of lipids can be polymerized by ultraviolet (UV) irradiation to form chains of covalently linked lipids in the bilayer. In particular, the diacetylenic phosphatidylcholine 1,2-bis(10,12-tricosadiynoyl)-sn-glycero-3-phosphocholine (DC8,9PC) can form intermolecular cross-links through the diacetylenic group to produce a conjugated polymer within the hydrocarbon region of the bilayer. As knowledge of liposome structures is fundamental to improving system design for new and better applications, this work focuses on the structural properties of polymerized DC8,9PC:1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) liposomes. Liposomes containing mixtures of DC8,9PC and DMPC, at different molar ratios, and exposed to different polymerization cycles, were studied through the analysis of the electron spin resonance (ESR) spectra of a spin label incorporated into the bilayer, and the calorimetric data obtained from differential scanning calorimetry (DSC) studies. Upon irradiation, if all lipids had been polymerized, no gel-fluid transition would be expected. However, even samples that went through 20 cycles of UV irradiation presented a DSC band, showing that around 80% of the DC8,9PC molecules were not polymerized. Both DSC and ESR indicated that the two different lipids scarcely mix at low temperatures; however, a few molecules of DMPC are present in DC8,9PC-rich domains and vice versa. UV irradiation was found to affect the gel-fluid transition of both DMPC-rich and DC8,9PC-rich regions, indicating the presence of polymeric units of DC8,9PC in both areas. A model explaining the lipid rearrangement is proposed for this partially polymerized system.
Abstract:
Canalizing genes possess such broad regulatory power, and their action sweeps across such a wide swath of processes, that the full set of affected genes is not highly correlated under normal conditions. When not active, the controlling gene will not be predictable to any significant degree by its subject genes, either alone or in groups, since their behavior will be highly varied relative to the inactive controlling gene. When the controlling gene is active, its behavior is not well predicted by any one of its targets, but can be very well predicted by groups of genes under its control. To investigate this question, we introduce in this paper the concept of intrinsically multivariate predictive (IMP) genes, and present a mathematical study of IMP in the context of binary genes with respect to the coefficient of determination (CoD), which measures the predictive power of a set of genes with respect to a target gene. A set of predictor genes is said to be IMP for a target gene if all properly contained subsets of the predictor set are bad predictors of the target but the full predictor set predicts the target with great accuracy. We show that the logic of prediction, predictive power, covariance between predictors, and the entropy of the joint probability distribution of the predictors jointly affect the appearance of IMP genes. In particular, we show that high predictive power, small covariance among predictors, a large entropy of the joint probability distribution of predictors, and certain logics, such as XOR in the 2-predictor case, are factors that favor the appearance of IMP. The IMP concept is applied to characterize the behavior of the gene DUSP1, which exhibits control over a central, process-integrating signaling pathway, thereby providing preliminary evidence that IMP can be used as a criterion for discovery of canalizing genes.
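The XOR case mentioned in this abstract can be worked through numerically. The sketch below assumes the usual definition of the CoD as the relative reduction in prediction error, (e0 - e_opt) / e0, where e0 is the error of the best constant guess of the target and e_opt is the error of the best predictor built from the chosen genes. It also assumes two independent, uniformly distributed binary predictors. The function name `cod` and the toy distribution are illustrative and not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def cod(joint, predictor_idx):
    """Coefficient of determination of a target given a subset of predictors.

    joint maps tuples (x1, ..., xn, y) to probabilities; y is the target.
    predictor_idx lists the positions of the predictors to use.
    Assumes the target is not constant, so that e0 > 0.
    """
    p_y = defaultdict(float)
    cond = defaultdict(lambda: defaultdict(float))
    for state, p in joint.items():
        *xs, y = state
        key = tuple(xs[i] for i in predictor_idx)
        p_y[y] += p
        cond[key][y] += p
    # Error of the best constant predictor of the target.
    eps0 = 1.0 - max(p_y.values())
    # Error of the best predictor: for each predictor state, guess the
    # most probable target value and pay for the remaining mass.
    eps_opt = sum(sum(d.values()) - max(d.values()) for d in cond.values())
    return (eps0 - eps_opt) / eps0


# Target = XOR of two independent, uniform binary predictors.
joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

cod_x1 = cod(joint, [0])       # first predictor alone
cod_x2 = cod(joint, [1])       # second predictor alone
cod_both = cod(joint, [0, 1])  # both predictors together
```

Under these assumptions each predictor alone gives a CoD of 0, while the pair gives a CoD of 1, which is the pattern the abstract calls intrinsically multivariate predictive.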
Abstract:
Scale mixtures of skew-normal (SMSN) distributions form a class of asymmetric thick-tailed distributions that includes the skew-normal (SN) distribution as a special case. The main advantage of this class of distributions is that its members are easy to simulate and have a convenient hierarchical representation, which facilitates implementation of the expectation-maximization algorithm for maximum-likelihood estimation. In this paper, we assume an SMSN distribution for the unobserved value of the covariates and a symmetric scale mixture of normal distributions for the error term of the model. This provides a robust alternative for parameter estimation in multivariate measurement error models. Specific distributions examined include univariate and multivariate versions of the SN, skew-t, skew-slash and skew-contaminated normal distributions. The results and methods are applied to a real data set.
Abstract:
We review several asymmetrical links for binary regression models and present a unified approach for two skew-probit links proposed in the literature. Moreover, under the skew-probit link, conditions for the existence of the ML estimators and of the posterior distribution under improper priors are established. The framework proposed here considers two sets of latent variables, which are helpful for implementing the Bayesian MCMC approach. A simulation study of criteria for model comparison is conducted and two applications are presented. Using different Bayesian criteria, we show that, for these data sets, the skew-probit links are better than alternative links proposed in the literature.
Abstract:
This paper derives the second-order biases of maximum likelihood estimates from a multivariate normal model where the mean vector and the covariance matrix have parameters in common. We show that the second-order bias can always be obtained by means of ordinary weighted least-squares regressions. We conduct simulation studies which indicate that the bias correction scheme yields nearly unbiased estimators. (C) 2009 Elsevier B.V. All rights reserved.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. Usually, in this setup it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distributions form a class of asymmetric thick-tailed distributions which includes the skew-normal distribution as a special case. In this paper, we explore the use of skew-normal/independent distributions as a robust alternative for the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal distribution, the skew-t distributions, the skew-slash distributions and the skew-contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
The second-order rate constants of thiolysis by n-heptanethiol on 4-nitro-N-n-butyl-1,8-naphthalimide (4NBN) are strongly affected by the composition of the water-methanol binary mixture, reaching a maximum at around 50% mole fraction. In parallel, solvent effects on the 4NBN absorption molar extinction coefficient also show a maximum in this composition region. From the spectroscopic study of reactant and product and the known H-bond capacity of the mixture, a rationalization that involves specific solvent H-donor interaction with the nitro group is proposed to explain the kinetic data. The present findings also show a convenient methodology to obtain strongly fluorescent imides, valuable for labeling peptides and analogs as well as for preparing thio-naphthalimide derivatives. Copyright (C) 2008 John Wiley & Sons, Ltd.
Abstract:
This paper is concerned with the cost efficiency in achieving the Swedish national air quality objectives under uncertainty. To realize an ecologically sustainable society, the parliament has approved a set of interim and long-term pollution reduction targets. However, there are considerable quantification uncertainties on the effectiveness of the proposed pollution reduction measures. In this paper, we develop a multivariate stochastic control framework to deal with the cost efficiency problem with multiple pollutants. Based on the cost and technological data collected by several national authorities, we explore the implications of alternative probabilistic constraints. It is found that a composite probabilistic constraint induces considerably lower abatement cost than separable probabilistic restrictions. The trend is reinforced by the presence of positive correlations between reductions in the multiple pollutants.
Abstract:
This paper presents a two-step pseudo-likelihood estimation technique for generalized linear mixed models in which the random effects are correlated between groups. The core idea is to deal with the intractable integrals in the likelihood function by a multivariate Taylor approximation. The accuracy of the estimation technique is assessed in a Monte Carlo study. An application with a binary response variable is presented using a real data set on credit defaults from two Swedish banks. Thanks to the two-step estimation technique, the proposed algorithm outperforms conventional pseudo-likelihood algorithms in terms of computational time.
Abstract:
Researchers analyzing spatiotemporal or panel data, which varies both in location and over time, often find that their data has holes or gaps. This thesis explores alternative methods for filling those gaps and also suggests a set of techniques for evaluating those gap-filling methods to determine which works best.
Abstract:
The aim of this paper is to test whether or not there was evidence of contagion across the various financial crises that assailed some countries in the 1990s. Data on sovereign debt bonds for Brazil, Mexico, Russia and Argentina were used to implement the test. The contagion hypothesis is tested using multivariate volatility models. If there is any evidence of structural break in volatility that can be linked to financial crises, the contagion hypothesis will be confirmed. Results suggest that there is evidence in favor of the contagion hypothesis.