958 results for multivariate binary data
Abstract:
We present results of a search for continuously emitted gravitational radiation, directed at the brightest low-mass X-ray binary, Scorpius X-1. Our semicoherent analysis covers 10 days of LIGO S5 data spanning 50-550 Hz, and performs an incoherent sum of coherent F-statistic power distributed amongst frequency-modulated orbital sidebands. All candidates not removed at the veto stage were found to be consistent with noise at a 1% false alarm rate. We present Bayesian 95% confidence upper limits on gravitational-wave strain amplitude using two different prior distributions: a standard one, with no a priori assumptions about the orientation of Scorpius X-1; and an angle-restricted one, using a prior derived from electromagnetic observations. Median strain upper limits of 1.3 × 10^-24 and 8 × 10^-25 are reported at 150 Hz for the standard and angle-restricted searches respectively. This proof-of-principle analysis was limited to a short observation time by unknown effects of accretion on the intrinsic spin frequency of the neutron star, but improves upon previous upper limits by factors of ~1.4 for the standard, and 2.3 for the angle-restricted search at the sensitive region of the detector.
Abstract:
In this paper we describe how morphological castes can be distinguished using multivariate statistical methods combined with jackknife estimators of the allometric coefficients. Data from the polymorphic ant, Camponotus rufipes, produced two distinct patterns of allometric variation, and thus two morphological castes. Morphometric analysis distinguished different allometric patterns within the two castes, with overall variability being greater in the major workers. Caste-specific scaling variabilities were associated with the relative importance of the first principal component. The static multivariate allometric coefficients for each of 10 measured characters differed between castes, but their relative magnitudes within castes were similar. Multivariate statistical analysis of worker polymorphism in ants is a more complete descriptor of shape variation than, and provides statistical and conceptual advantages over, the standard bivariate techniques commonly used.
Abstract:
Background: Several researchers seek methods for selecting homogeneous groups of animals in experimental studies, since homogeneity is an indispensable prerequisite for the randomization of treatments. The lack of robust methods that comply with statistical and biological principles is the reason why researchers use empirical or subjective methods, which influence their results. Objective: To develop a multivariate statistical model for the selection of a homogeneous group of animals for experimental research and to build a computational package for using it. Methods: The set of echocardiographic data of 115 male Wistar rats with supravalvular aortic stenosis (AoS) was used as an example of model development. Initially, the data were standardized, and thus became dimensionless. Then, the variance matrix of the set was submitted to principal components analysis (PCA), aiming at reducing the parametric space and at retaining the relevant variability. That technique established a new Cartesian system into which the animals were allocated, and finally the confidence region (ellipsoid) was built for the profile of the animals’ homogeneous responses. The animals located inside the ellipsoid were considered as belonging to the homogeneous batch; those outside the ellipsoid were considered spurious. Results: The PCA established eight descriptive axes that accounted for 88.71% of the accumulated variance of the data set. The allocation of the animals in the new system and the construction of the confidence region revealed six spurious animals, as compared to the homogeneous batch of 109 animals. Conclusion: The biometric criterion presented proved to be effective, because it considers the animal as a whole, analyzing jointly all parameters measured, in addition to having a small discard rate.
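The selection rule described in the Methods above (standardize, build a confidence ellipsoid, keep the animals inside it) can be illustrated with a minimal sketch. It is not the authors' package: it assumes only two measured variables, keeps both principal components (in which case the ellipse in principal-component space is the same region as a Mahalanobis-distance threshold on the standardized data), and assumes a chi-square cutoff for the ellipse. The function names and the synthetic data are hypothetical.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance (dimensionless data)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def mahalanobis_sq_2d(z):
    """Squared Mahalanobis distance of each standardized 2-D point from the origin.

    After standardization the covariance matrix is [[1, r], [r, 1]], so its
    inverse can be written out explicitly.
    """
    n = len(z)
    r = sum(a * b for a, b in z) / (n - 1)  # sample correlation
    det = 1.0 - r * r
    return [(a * a - 2.0 * r * a * b + b * b) / det for a, b in z]

def select_homogeneous(rows, coverage=0.95):
    """Split row indices into (inside, outside) the assumed confidence ellipse."""
    d2 = mahalanobis_sq_2d(standardize(rows))
    # Assumption: chi-square cutoff with 2 degrees of freedom, whose quantile
    # has the closed form -2 * ln(1 - coverage).
    threshold = -2.0 * math.log(1.0 - coverage)
    inside = [i for i, d in enumerate(d2) if d <= threshold]
    outside = [i for i, d in enumerate(d2) if d > threshold]
    return inside, outside

# Synthetic illustration only (not the echocardiographic data): 20 similar
# "animals" on a small circle plus one that is clearly different.
animals = [[10 + math.cos(2 * math.pi * i / 20), 10 + math.sin(2 * math.pi * i / 20)]
           for i in range(20)]
animals.append([20.0, 20.0])
inside, outside = select_homogeneous(animals)
print("homogeneous batch:", inside)
print("spurious:", outside)
```

With more than two variables, or when only some components are retained as in the study, the same idea would need a proper eigendecomposition and a cutoff matched to the number of retained axes.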
Abstract:
Concentrations of 39 organic compounds were determined in three fractions (head, heart and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA). According to PCA and HCA, the experimental data form three clusters. The head fractions give rise to a more defined group. The heart and tail fractions showed some overlap, consistent with their acid composition. The model generated by LDA classified the three fractions with predictive abilities of 90.5% in calibration and 100% in validation. This model recognized as heart twelve of the thirteen commercial cachacas (92.3%) with good sensory characteristics, thus showing potential for guiding the process of cuts.
Abstract:
IDENTIFICATION OF ETHANOLIC WOOD EXTRACTS USING ELECTRONIC ABSORPTION SPECTRUM AND MULTIVARIATE ANALYSIS. The application of multivariate analysis to spectrophotometric (UV) data was explored for distinguishing extracts of cachaca woods commonly used in the manufacture of casks for aging cachacas (oak, cabretiva-parda, jatoba, amendoim and canela-sassafras). Absorbances close to 280 nm were more strongly correlated with oak and jatoba woods, whereas absorbances near 230 nm were more correlated with canela-sassafras and cabretiva-parda. A comparison between the spectrophotometric model and the model based on chromatographic (HPLC-DAD) data was carried out. The spectrophotometric model explained the data variance better (PC1 + PC2 = 91%), showing potential as a routine method for checking aged spirits.
Abstract:
Aims. Our goal is to study the circumstellar environment associated with each component of the wide intermediate-mass pre-main sequence binary system PDS 144 using broadband polarimetry. Methods. We present near-infrared (NIR) linear polarimetric observations of PDS 144 gathered with the IAGPOL imaging polarimeter along with the CamIV infrared camera at the Observatorio do Pico dos Dias (OPD). In addition, we re-analyzed OPD archive optical polarization to separate the binary and estimate the interstellar polarization using foreground stars. Results. After discounting the interstellar component, we found that both stars of the binary system are intrinsically polarized. The polarization vectors at optical and NIR bands of both components are aligned with the local magnetic field and the jet axis. These findings indicate an interplay between the interstellar magnetic field and the formation of the binary system. We also found that PDS 144N is less polarized than its southern companion in the optical. However, in the NIR PDS 144N is more polarized. Our polarization data can only be explained by high inclinations (i ≳ 80°) for the disks of both members. In particular, comparisons of our NIR data with young stellar object disk models suggest predominantly small grains in the circumstellar environment of PDS 144N. In spite of the different grain types in each component, the infrared spectral indexes indicate a coeval system. We also found evidence of coplanarity between the disks.
Abstract:
In this work, 50 ceramic fragments from the Lago Grande and 30 from the Osvaldo archaeological site were compared to assess elemental similarities. The aim is to perform a preliminary comparison between the sites, which are located in the central Amazon, Brazil. The analytical technique employed to obtain the elemental composition of the ceramics was instrumental neutron activation analysis (INAA). The data set obtained was explored by the multivariate statistical techniques of cluster, principal component and discriminant analysis. The analyzed elements were: Na, Lu, U, Yb, La, Th, Cr, Cs, Sc, Fe, Eu, Ce and Hf. The results showed the existence of at least two compositional groups for Lago Grande and Osvaldo. Each compositional group of the Osvaldo archaeological site matches one group of Lago Grande. Correlated with the archaeological background, the results suggest commercial or cultural exchange in the region, which is indicative of socio-cultural interactions between those sites.
Abstract:
Although a large amount of data has been published in past years on the taxonomic status of the Anastrepha fraterculus (Wiedemann) species complex, there is still a need to know how many species this complex comprises, the distribution of each one, and their distinguishing features. In this study, we assessed the morphometric variability of 32 populations from the A. fraterculus complex, located in major biogeographical areas from the Neotropics. Multivariate techniques for analysis were applied to the measurements of 21 variables referring to the mesonotum, aculeus, and wing. For the first time, our results identified the presence of seven distinct morphotypes within this species complex. According to the biogeographical areas, populations occurring in the Mesoamerican dominion (Mexico, Guatemala, and Panama) were clustered within a single natural entity labeled as the "Mexican" morphotype; whereas in the northwestern South American dominion, samples fell into three distinct groups: the "Venezuelan" morphotype with a single population from the Caribbean lowlands of Venezuela, the "Andean" morphotype from the highlands of Venezuela and Colombia, and the third group or "Peruvian" morphotype comprised the samples from the Pacific coastal lowlands of Ecuador and Peru. Three additional groups were identified from the Chacoan and Paranaense sub-regions: the morphotype "Brazilian-1" was recognized as including the Argentinean samples with most pertaining to Brazil, and widely distributed in these biogeographical areas; the morphotype "Brazilian-2" was recognized as including two samples from the state of Sao Paulo (Ilha-Bela and Sao Sebastiao); whereas the morphotype "Brazilian-3" included a single population from Botucatu (state of Sao Paulo).
Based on data published by previous authors showing genetic and karyotypic differentiation, as well as reproductive isolation, we have concluded that such morphotypes indeed represent natural groups and distinct taxonomic entities.
Abstract:
A portable energy-dispersive X-ray fluorescence system was used to determine the elemental composition of 68 pottery fragments from Sambaqui do Bacanga, an archeological site in Sao Luis, Maranhao, Brazil. This site was occupied from 6600 BP until 900 BP. By determining the elemental composition of those fragments, it was possible to verify the existence of engobe in 43 pottery fragments. Using two-dimensional graphs and hierarchical cluster analysis of fragments from the surface and 113-cm levels, and from the 10 to 20, 132 and 144-cm levels, it was possible to sort these fragments into five distinct groups, according to their stratigraphies. The results of data grouping (two-dimensional graphs) are in agreement with hierarchical cluster analysis by the Ward method. Copyright (C) 2011 John Wiley & Sons, Ltd.
Abstract:
In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods, where the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the set of labels, and the final labels for each example are determined by aggregating the predictions from all binary classifiers. However, this approach fails to consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependency by themselves. An experimental study using decision trees, a kernel method as well as Naive Bayes as base-learning techniques shows the potential of the proposed approach to improve the multi-label classification performance.
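The binary relevance decomposition described above can be sketched in a few lines. This shows only the plain baseline (one independent binary classifier per label, predictions aggregated into a label set), not the dependency-aware approach the paper proposes, which the abstract does not detail. The nearest-centroid base learner, the class names and the toy data are hypothetical stand-ins chosen to keep the example self-contained.

```python
class CentroidClassifier:
    """Hypothetical base learner: predicts the class whose feature mean is nearest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [s / counts[lab] for s in acc]
                          for lab, acc in sums.items()}
        return self

    def predict_one(self, row):
        def dist_sq(center):
            return sum((a - b) ** 2 for a, b in zip(row, center))
        return min(self.centroids, key=lambda lab: dist_sq(self.centroids[lab]))


class BinaryRelevance:
    """Trains one independent binary classifier per label."""

    def __init__(self, labels, make_base=CentroidClassifier):
        self.labels = list(labels)
        self.make_base = make_base

    def fit(self, X, Y):
        # Y is a list of label sets; each label gets its own 0/1 target vector.
        self.models = {}
        for lab in self.labels:
            target = [1 if lab in label_set else 0 for label_set in Y]
            self.models[lab] = self.make_base().fit(X, target)
        return self

    def predict(self, row):
        # Aggregate the per-label decisions into the final label set.
        return {lab for lab, model in self.models.items()
                if model.predict_one(row) == 1}


# Toy data: label "a" follows the first feature, label "b" the second.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [set(), {"b"}, {"a"}, {"a", "b"}]
model = BinaryRelevance(labels=["a", "b"]).fit(X, Y)
print(model.predict([0.9, 0.1]))
print(model.predict([0.8, 0.9]))
```

Because each per-label model is fitted without seeing the other labels, any correlation between "a" and "b" is ignored, which is the limitation the abstract points out.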
Abstract:
Background: Prostate cancer is a leading cause of death in the male population; therefore, a comprehensive study of the genes and the molecular networks involved in the tumoral prostate process becomes necessary. In order to understand the biological process behind potential biomarkers, we have analyzed a set of 57 cDNA microarrays containing ~25,000 genes. Results: Principal Component Analysis (PCA) combined with the Maximum-entropy Linear Discriminant Analysis (MLDA) were applied in order to identify genes with the most discriminative information between normal and tumoral prostatic tissues. Data analysis was carried out using three different approaches, namely: (i) differences in gene expression levels between normal and tumoral conditions from a univariate point of view; (ii) in a multivariate fashion using MLDA; and (iii) with a dependence network approach. Our results show that malignant transformation in the prostatic tissue is more related to functional connectivity changes in their dependence networks than to differential gene expression. The MYLK, KLK2, KLK3, HAN11, LTF, CSRP1 and TGM4 genes presented significant changes in their functional connectivity between normal and tumoral conditions and were also classified as the top seven most informative genes for the prostate cancer genesis process by our discriminant analysis. Moreover, among the identified genes we found classically known biomarkers and genes which are closely related to tumoral prostate, such as KLK3 and KLK2, and several other potential ones. Conclusion: We have demonstrated that changes in functional connectivity may be implicit in the biological process which renders some genes more informative to discriminate between normal and tumoral conditions. Using the proposed method, namely MLDA, to analyze the multivariate characteristic of genes, it was possible to capture the changes in dependence networks which are related to cell transformation.
Abstract:
In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. 
In Paper III an investigation is made of the ability of a control sample of known content and identity to diagnose and correct errors in multivariate predictions, something that, together with the use of multivariate residuals, can make it possible to use the same calibration model over time. In Paper IV a method is proposed for simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied for the decomposition of DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.
Abstract:
A faithful depiction of the tropical atmosphere requires three-dimensional sets of observations. Despite the increasing amount of observations presently available, these will hardly ever encompass the entire atmosphere and, in addition, observations have errors. Additional (background) information will always be required to complete the picture. Valuable added information comes from the physical laws governing the flow, usually mediated via a numerical weather prediction (NWP) model. These models are, however, never going to be error-free, and a reliable estimate of their errors poses a real challenge since the whole truth will never be within our grasp. The present thesis addresses the question of improving the analysis procedures for NWP in the tropics. Improvements are sought by addressing the following issues: the efficiency of the internal model adjustment; the potential of reliable background-error information, as compared to observations; the impact of new, space-borne line-of-sight wind measurements; and the usefulness of multivariate relationships for data assimilation in the tropics. Most NWP assimilation schemes are effectively univariate near the equator. In this thesis, a multivariate formulation of the variational data assimilation in the tropics has been developed. The proposed background-error model supports the mass-wind coupling based on convectively-coupled equatorial waves. The resulting assimilation model produces balanced analysis increments and thereby increases the efficiency of all types of observations. Idealized adjustment and multivariate analysis experiments highlight the importance of direct wind measurements in the tropics. In particular, the presented results confirm the superiority of wind observations compared to mass data, in spite of the exact multivariate relationships available from the background information. The internal model adjustment is also more efficient for wind observations than for mass data.
In accordance with these findings, new satellite wind observations are expected to contribute towards the improvement of NWP and climate modeling in the tropics. Although incomplete, the new wind-field information has the potential to reduce uncertainties in the tropical dynamical fields, if used together with the existing satellite mass-field measurements. The results obtained by applying the new background-error representation to the tropical short-range forecast errors of a state-of-the-art NWP model suggest that achieving useful tropical multivariate relationships may be feasible within an operational NWP environment.
Abstract:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in every field where multivariate dependence is of great interest, and their use in clustering has not yet been investigated. The first part of this work contains a review of the literature on clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of Sklar’s theorem (Sklar, 1959), estimation methods for copulas such as Inference for Margins (Joe and Xu, 1996), and the Archimedean and Elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given.
The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the non-rejection percentage of H0 on . It is shown that the CoClust algorithm overcomes all observed limitations of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many distinctive characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Abstract:
In this work, the measurements of the isobaric vapor−liquid equilibrium (VLE) data at 101.32 kPa and the excess molar volumes (v^E), obtained at 10 K intervals of temperature in the range (288.15 to 328.15) K, for four binary systems comprised of methyl or ethyl butanoate with two alkanes (heptane and nonane) are presented.