877 results for Canonical Correlation Analysis
Abstract:
We consider rank-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate the covariance of the estimators is also given, which can bypass estimation of the density function. Simulation studies are carried out to compare different estimators for a number of scenarios on the correlation structure, presence/absence of outliers and different correlation values. The proposed methods appear to perform well; in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of correlation structure and outliers. A real example is provided for illustration.
Abstract:
We consider the analysis of longitudinal data when the covariance function is modeled by additional parameters to the mean parameters. In general, inconsistent estimators of the covariance (variance/correlation) parameters will be produced when the "working" correlation matrix is misspecified, which may result in great loss of efficiency of the mean parameter estimators (although consistency is preserved). We consider using different "working" correlation models for the variance and the mean parameters. In particular, we find that an independence working model should be used for estimating the variance parameters to ensure their consistency in case the correlation structure is misspecified. The designated "working" correlation matrices should be used for estimating the mean and the correlation parameters to attain high efficiency for estimating the mean parameters. Simulation studies indicate that the proposed algorithm performs very well. We also applied different estimation procedures to a data set from a clinical trial for illustration.
Abstract:
The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows for specification of a working matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impacts of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve the estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function impacts much more on estimators for within-cluster covariates than for cluster-level covariates; and (3) if the variance function is misspecified, correct choice of the correlation structure may not necessarily improve estimation efficiency. We illustrate impacts of different variance functions using a real data set from cow growth.
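For orientation, the estimating equation this abstract refers to can be written in the usual textbook form. The notation below is an assumption introduced here for illustration (it does not appear in the abstract): $K$ clusters, mean vector $\mu_i(\beta)$, derivative matrix $D_i = \partial \mu_i / \partial \beta^{\top}$, variance function $v(\cdot)$, scale $\phi$, and working correlation $R_i(\alpha)$.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2},
\qquad
A_i = \operatorname{diag}\bigl\{ v(\mu_{i1}), \dots, v(\mu_{i n_i}) \bigr\}
```

In this form the variance function enters only through $A_i$ and the working correlation only through $R_i(\alpha)$, which is why the two can be misspecified separately, as the abstract discusses.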
Abstract:
Efficiency of analysis using generalized estimating equations is enhanced when the intracluster correlation structure is accurately modeled. We compare two existing criteria (a quasi-likelihood information criterion, and the Rotnitzky-Jewell criterion) to identify the true correlation structure via simulations with Gaussian or binomial response, covariates varying at cluster or observation level, and exchangeable or AR(1) intracluster correlation structure. Rotnitzky and Jewell's approach performs better when the true intracluster correlation structure is exchangeable, while the quasi-likelihood criterion performs better for an AR(1) structure.
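The two intracluster correlation structures compared in this abstract can be illustrated with a minimal sketch. The function names and the example values (cluster size 4, correlation 0.5) are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def exchangeable(n, rho):
    """n x n exchangeable correlation: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """n x n AR(1) correlation: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Hypothetical example: a cluster of 4 observations with rho = 0.5.
R_exch = exchangeable(4, 0.5)
R_ar1 = ar1(4, 0.5)

for row in R_exch:
    print(row)
for row in R_ar1:
    print(row)
```

Under the exchangeable structure every pair of observations in a cluster shares the same correlation, whereas under AR(1) the correlation decays with the distance between observations.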
Abstract:
The method of generalized estimating equations (GEEs) has been criticized recently for a failure to protect against misspecification of working correlation models, which in some cases leads to loss of efficiency or infeasibility of solutions. However, the feasibility and efficiency of GEE methods can be enhanced considerably by using flexible families of working correlation models. We propose two ways of constructing unbiased estimating equations from general correlation models for irregularly timed repeated measures to supplement and enhance GEE. The supplementary estimating equations are obtained by differentiation of the Cholesky decomposition of the working correlation, or as score equations for decoupled Gaussian pseudolikelihood. The estimating equations are solved with computational effort equivalent to that required for a first-order GEE. Full details and analytic expressions are developed for a generalized Markovian model that was evaluated through simulation. Large-sample "sandwich" standard errors for working correlation parameter estimates are derived and shown to have good performance. The proposed estimating functions are further illustrated in an analysis of repeated measures of pulmonary function in children.
Abstract:
The method of generalised estimating equations for regression modelling of clustered outcomes allows for specification of a working matrix that is intended to approximate the true correlation matrix of the observations. We investigate the asymptotic relative efficiency of the generalised estimating equation for the mean parameters when the correlation parameters are estimated by various methods. The asymptotic relative efficiency depends on three features of the analysis, namely (i) the discrepancy between the working correlation structure and the unobservable true correlation structure, (ii) the method by which the correlation parameters are estimated and (iii) the 'design', by which we refer to both the structures of the predictor matrices within clusters and the distribution of cluster sizes. Analytical and numerical studies of realistic data-analysis scenarios show that the choice of working covariance model has a substantial impact on regression estimator efficiency. Protection against avoidable loss of efficiency associated with covariance misspecification is obtained when a 'Gaussian estimation' pseudolikelihood procedure is used with an AR(1) structure.
Abstract:
The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.
Abstract:
Ultraviolet irradiation of crystalline molecular inclusion complexes of deoxycholic acid with di-tert-butyl thioketone results in no reaction. The structure of the above complex has been determined via X-ray diffraction. The absence of the expected photoreactions, namely photoreduction and photooxidation, is rationalized on the basis of the X-ray structure analysis of the complex.
Abstract:
This thesis is an empirical study of how two words in Icelandic, "nú" and "núna", are used in contemporary Icelandic conversation. My aims in this study are, first, to explain the differences between the temporal functions of "nú" and "núna", and, second, to describe the non-temporal functions of "nú". In the analysis, a focus is placed on comparing the sequential placement of the two words, on their syntactical distribution, and on their prosodic realization. The empirical data comprise 14 hours and 11 minutes of naturally occurring conversation recorded between 1996 and 2003. The selected conversations represent a wide range of interactional contexts including informal dinner parties, institutional and non-institutional telephone conversations, radio programs for teenagers, phone-in programs, and, finally, a political debate on television. The theoretical and methodological framework is interactional linguistics, which can be described as linguistically oriented conversation analysis (CA). A comparison of "nú" and "núna" shows that the two words have different syntactic distributions. "Nú" has a clear tendency to occur in the front field, before the finite verb, while "núna" typically occurs in the end field, after the object. It is argued that this syntactic difference reflects a functional difference between "nú" and "núna". A sequential analysis of "núna" shows that the word refers to an unspecified period of time which includes the utterance time as well as some time in the past and in the future. This temporal relation is referred to as reference time. "Nú", by contrast, is mainly used in three different environments: 1) in temporal comparisons, 2) in transitions, and 3) when the speaker is taking an affective stance. The non-temporal functions of "nú" are divided into three categories: 1) "nú" as a tone particle, 2) "nú" as an utterance particle, and 3) "nú" as a dialogue particle.
"Nú" as a tone particle is syntactically integrated and can occur in two syntactic positions: pre-verbally and post-verbally. I argue that these instances are employed in utterances in which a speaker is foregrounding information or marking it as particularly important. The study shows that, although these instances are typically prosodically non-prominent and unstressed, they are in some cases delivered with stress and with a higher pitch than the surrounding talk. "Nú" as an utterance particle occurs turn-initially and is syntactically non-integrated. By using "nú", speakers show continuity between turns and link new turns to prior ones. These instances initiate either continuations by the same speaker or new turns after speaker shifts. "Nú" as a dialogue particle occurs as a turn of its own. The study shows that these instances register informings in prior turns as unexpected or as a departure from the normal state of affairs. "Nú" as a dialogue particle is often delivered with a prolonged vowel and a recognizable intonation contour. A comparative sequential and prosodic analysis shows that in these cases there is a correlation between the function of "nú" and the intonation contour by which it is delivered. Finally, I argue that despite the many functions of "nú", all the instances can be said to have a common denominator, which is to display attention towards the present moment and the utterances which are produced prior or after the production of "nú". Instead of anchoring the utterances in external time or reference time, these instances position the utterance in discourse internal time, or discourse time.
Abstract:
The relationship between EUF extractable nutrients and conventional soil test extractable nutrients in the acid soils of Southern India on one hand and that between EUF values and tea productivity on the other are described. Close correlation exists between EUF-NO3–N at 20°C and CuSO4–Ag2SO4-extractable NO3–N (r=0.98***), EUF-Norg and Morgan's reagent extractable NH4–N (r=0.97***), total EUF-N and CuSO4–Ag2SO4-extractable NO3–N plus Morgan's reagent NH4–N (r=0.96***), EUF-P at 20°C and modified Bray II-P (r=0.93***) and EUF-P at 20°C plus that at 80°C and modified Bray II-P (r=0.91***). The EUF-K at 20°C shows close correlation with NH4OAc–K (r=0.80***), Ag-thiourea-K (r=0.86***) and Morgan's reagent-K (r=0.84***) whereas the EUF-K at 80°C shows close correlation with the difference in K contents of NH4OAc–K and Ag-thiourea-K (r=0.92***) or of NH4OAc–K and Morgan's reagent-K (r=0.93***) and fixed NH4–N (r=0.89***). EUF-Ca, EUF-Mg and EUF-Mn do not show any relationship with conventional soil test values. Tea productivity is strongly associated with EUF-N and EUF-P extracted at 20°C.
Abstract:
OBJECTIVE To quantify genetic overlap between migraine and ischemic stroke (IS) with respect to common genetic variation. METHODS We applied 4 different approaches to large-scale meta-analyses of genome-wide data on migraine (23,285 cases and 95,425 controls) and IS (12,389 cases and 62,004 controls). First, we queried known genome-wide significant loci for both disorders, looking for potential overlap of signals. We then analyzed the overall shared genetic load using polygenic scores and estimated the genetic correlation between disease subtypes using data derived from these models. We further interrogated genomic regions of shared risk using analysis of covariance patterns between the 2 phenotypes using cross-phenotype spatial mapping. RESULTS We found substantial genetic overlap between migraine and IS using all 4 approaches. Migraine without aura (MO) showed much stronger overlap with IS and its subtypes than migraine with aura (MA). The strongest overlap existed between MO and large artery stroke (LAS; p = 6.4 × 10^-28 for the LAS polygenic score in MO) and between MO and cardioembolic stroke (CE; p = 2.7 × 10^-20 for the CE score in MO). CONCLUSIONS Our findings indicate shared genetic susceptibility to migraine and IS, with a particularly strong overlap between MO and both LAS and CE pointing towards shared mechanisms. Our observations on MA are consistent with a limited role of common genetic variants in this subtype.
Abstract:
Telomere length (TL) has been associated with aging and mortality, but individual differences are also influenced by genetic factors, with previous studies reporting heritability estimates ranging from 34 to 82%. Here we investigate the heritability, mode of inheritance and the influence of parental age at birth on TL in six large, independent cohort studies with a total of 19 713 participants. The meta-analysis estimate of TL heritability was 0.70 (95% CI 0.64–0.76) and is based on a pattern of results that is highly similar for twins and other family members. We observed a stronger mother–offspring (r=0.42; P-value=3.60 × 10^-61) than father–offspring correlation (r=0.33; P-value=7.01 × 10^-5), and a significant positive association with paternal age at offspring birth (β=0.005; P-value=7.01 × 10^-5). Interestingly, a significant and quite substantial correlation in TL between spouses (r=0.25; P-value=2.82 × 10^-30) was seen, which appeared stronger in older spouse pairs (mean age ≥55 years; r=0.31; P-value=4.27 × 10^-23) than in younger pairs (mean age<55 years; r=0.20; P-value=3.24 × 10^-10). In summary, we find a high and very consistent heritability estimate for TL, evidence for a maternal inheritance component and a positive association with paternal age.
Abstract:
Context: Identifying susceptibility genes for schizophrenia may be complicated by phenotypic heterogeneity, with some evidence suggesting that phenotypic heterogeneity reflects genetic heterogeneity. Objective: To evaluate the heritability and conduct genetic linkage analyses of empirically derived, clinically homogeneous schizophrenia subtypes. Design: Latent class and linkage analysis. Setting: Taiwanese field research centers. Participants: The latent class analysis included 1236 Han Chinese individuals with DSM-IV schizophrenia. These individuals were members of a large affected-sibling-pair sample of schizophrenia (606 ascertained families), original linkage analyses of which detected a maximum logarithm of odds (LOD) of 1.8 (z = 2.88) on chromosome 10q22.3. Main Outcome Measures: Multipoint exponential LOD scores by latent class assignment and parametric heterogeneity LOD scores. Results: Latent class analyses identified 4 classes, with 2 demonstrating familial aggregation. The first (LC2) described a group with severe negative symptoms, disorganization, and pronounced functional impairment, resembling “deficit schizophrenia.” The second (LC3) described a group with minimal functional impairment, mild or absent negative symptoms, and low disorganization. Using the negative/deficit subtype, we detected genome-wide significant linkage to 1q23-25 (LOD = 3.78, empiric genome-wide P = .01). This region was not detected using the DSM-IV schizophrenia diagnosis, but has been strongly implicated in schizophrenia pathogenesis by previous linkage and association studies. Variants in the 1q region may specifically increase risk for a negative/deficit schizophrenia subtype. Alternatively, these results may reflect increased familiality/heritability of the negative class, the presence of multiple 1q schizophrenia risk genes, or a pleiotropic 1q risk locus or loci, with stronger genotype-phenotype correlation with negative/deficit symptoms.
Using the second familial latent class, we identified nominally significant linkage to the original 10q peak region. Conclusion: Genetic analyses of heritable, homogeneous phenotypes may improve the power of linkage and association studies of schizophrenia and thus have relevance to the design and analysis of genome-wide association studies.
Abstract:
BACKGROUND: In order to rapidly and efficiently screen potential biofuel feedstock candidates for quintessential traits, robust high-throughput analytical techniques must be developed and honed. The traditional methods of measuring lignin syringyl/guaiacyl (S/G) ratio can be laborious, involve hazardous reagents, and/or be destructive. Vibrational spectroscopy can furnish high-throughput instrumentation without the limitations of the traditional techniques. Spectral data from mid-infrared, near-infrared, and Raman spectroscopies were combined with S/G ratios, obtained using pyrolysis molecular beam mass spectrometry, from 245 different eucalypt and Acacia trees across 17 species. Iterations of spectral processing allowed the assembly of robust predictive models using partial least squares (PLS). RESULTS: The PLS models were rigorously evaluated using three different randomly generated calibration and validation sets for each spectral processing approach. Root mean standard errors of prediction for validation sets were lowest for models comprised of Raman (0.13 to 0.16) and mid-infrared (0.13 to 0.15) spectral data, while near-infrared spectroscopy led to more erroneous predictions (0.18 to 0.21). Correlation coefficients (r) for the validation sets followed a similar pattern: Raman (0.89 to 0.91), mid-infrared (0.87 to 0.91), and near-infrared (0.79 to 0.82). These statistics signify that Raman and mid-infrared spectroscopy led to the most accurate predictions of S/G ratio in a diverse consortium of feedstocks. CONCLUSION: Eucalypts present an attractive option for biofuel and biochemical production. Given the assortment of over 900 different species of Eucalyptus and Corymbia, in addition to various species of Acacia, it is necessary to isolate those possessing ideal biofuel traits.
This research has demonstrated the validity of vibrational spectroscopy to efficiently partition different potential biofuel feedstocks according to lignin S/G ratio, significantly reducing experiment and analysis time and expense while providing non-destructive, accurate, global, predictive models encompassing a diverse array of feedstocks.
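The two validation statistics quoted in this abstract, the error of prediction and the correlation coefficient r, can be sketched as follows. This assumes the usual root-mean-square definition of the prediction error and the Pearson definition of r; the abstract does not spell out its formulas, and the function names and sample values below are hypothetical, not data from the study.

```python
import math


def rmsep(observed, predicted):
    """Root-mean-square error of prediction over a validation set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical S/G ratios: reference values and model predictions.
observed = [2.0, 2.5, 3.0, 3.5]
predicted = [2.1, 2.4, 3.2, 3.4]

print("RMSEP:", rmsep(observed, predicted))
print("r:", pearson_r(observed, predicted))
```

A lower prediction error and an r closer to 1 both indicate closer agreement between predicted and reference values, which is how the abstract ranks the three spectroscopies.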
Abstract:
Variety selection in perennial pasture crops involves identifying best varieties from data collected from multiple harvest times in field trials. For accurate selection, the statistical methods for analysing such data need to account for the spatial and temporal correlation typically present. This paper provides an approach for analysing multi-harvest data from variety selection trials in which there may be a large number of harvest times. Methods are presented for modelling the variety by harvest effects while accounting for the spatial and temporal correlation between observations. These methods provide an improvement in model fit compared to separate analyses for each harvest, and provide insight into variety by harvest interactions. The approach is illustrated using two traits from a lucerne variety selection trial. The proposed method provides variety predictions allowing for the natural sources of variation and correlation in multi-harvest data.