931 results for Generalized Shift Operator
Abstract:
Marginal generalized linear models can be used for clustered and longitudinal data by fitting a model as if the data were independent and using an empirical estimator of parameter standard errors. We extend this approach to data where the number of observations correlated with a given one grows with sample size and show that parameter estimates are consistent and asymptotically Normal with a slower convergence rate than for independent data, and that an information sandwich variance estimator is consistent. We present two problems that motivated this work, the modelling of patterns of HIV genetic variation and the behavior of clustered data estimators when clusters are large.
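As a rough illustration of the working-independence idea described above, the Python sketch below estimates a common mean (an intercept-only marginal model with identity link) and computes an empirical sandwich variance from cluster-level residual sums. The function name and the toy data are assumptions made for illustration. The sketch covers only the standard clustered case, not the extension to growing correlation neighbourhoods that the abstract describes.

```python
def sandwich_variance_of_mean(clusters):
    """Working-independence estimate of a common mean and its
    cluster-robust (sandwich) variance.

    clusters: list of lists of observations; values within a list may be
    correlated, different lists are treated as independent.
    """
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n  # point estimate ignores the correlation

    # "Bread": inverse of the working information, 1/n for an intercept.
    bread = 1.0 / n
    # "Meat": squared residual sums, one term per independent cluster.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    return mean, bread * meat * bread


# Hypothetical toy data: three clusters with strong within-cluster similarity.
clusters = [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [2.0, 2.2, 1.9]]
mean, clustered_var = sandwich_variance_of_mean(clusters)

# Treating every observation as its own cluster ignores the correlation.
singletons = [[y] for cluster in clusters for y in cluster]
_, independent_var = sandwich_variance_of_mean(singletons)
```

With positively correlated clusters such as these, the clustered variance comes out larger than the one that treats all observations as independent, which is why the empirical estimator matters for standard errors.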
Abstract:
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only provide a much richer information context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
Abstract:
Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model, normal base measures and Gibbs sampling procedures based on the Pólya urn scheme are often used to simulate posterior draws. These algorithms are applicable in the conjugate case when (for a normal base measure) the likelihood is normal. In the non-conjugate case, the algorithms proposed by MacEachern and Müller (1998) and Neal (2000) are often applied to generate posterior samples. Some common problems associated with simulation algorithms for non-conjugate MDP models include convergence and mixing difficulties. This paper proposes an algorithm based on the Pólya urn scheme that extends the Gibbs sampling algorithms to non-conjugate models with normal base measures and exponential family likelihoods. The algorithm proceeds by making Laplace approximations to the likelihood function, thereby reducing the procedure to that of conjugate normal MDP models. To ensure the validity of the stationary distribution in the non-conjugate case, the proposals are accepted or rejected by a Metropolis-Hastings step. In the special case where the data are normally distributed, the algorithm is identical to the Gibbs sampler.
Abstract:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms such as iterative weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property, which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
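The collapsibility property mentioned above can be illustrated with a small Python sketch: a Poisson log-linear model fitted to individual records gives the same coefficients as one fitted to per-covariate totals with the log of the group size as an offset. This is only a minimal fixed-effects example under assumed toy data, not the paper's Gauss-Seidel strategy; `fit_poisson`, `collapse` and the records are hypothetical names and values introduced here.

```python
import math


def fit_poisson(records, n_iter=25):
    """Fit log E[y] = offset + b0 + b1 * x by Newton steps
    (equivalent to IWLS for the canonical log link).

    records: list of (x, y, offset) tuples.
    """
    # Start from the intercept-only fit.
    total_y = sum(y for _, y, _ in records)
    total_exposure = sum(math.exp(off) for _, _, off in records)
    b0, b1 = math.log(total_y / total_exposure), 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, off in records:
            mu = math.exp(off + b0 + b1 * x)
            g0 += y - mu          # score for the intercept
            g1 += (y - mu) * x    # score for the slope
            h00 += mu             # Fisher information entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def collapse(records):
    """Sum counts within each covariate value (assumes zero offsets on input);
    log(group size) becomes the offset of the summary record."""
    groups = {}
    for x, y, _ in records:
        count, total = groups.get(x, (0, 0))
        groups[x] = (count + 1, total + y)
    return [(x, total, math.log(count)) for x, (count, total) in groups.items()]


# Hypothetical individual-level records: (covariate, count, offset).
individual = (
    [(0, y, 0.0) for y in [1, 0, 2, 1]]
    + [(1, y, 0.0) for y in [3, 2, 1]]
    + [(2, y, 0.0) for y in [4, 6, 5, 3, 7]]
)
full_fit = fit_poisson(individual)
collapsed_fit = fit_poisson(collapse(individual))
```

Here twelve records reduce to three summary rows while the fitted coefficients agree up to floating-point error. The same reduction is not available for the logistic model, which the abstract describes as non-collapsible.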
Abstract:
Nearly 22 million Americans operate as shift workers, and shift work has been linked to the development of cardiovascular disease (CVD). This study aimed to identify pivotal risk factors for CVD by assessing 24-hour ambulatory blood pressure, state anxiety levels and sleep patterns in 12-hour fixed shift workers. We hypothesized that night shift work would negatively affect blood pressure regulation, anxiety levels and sleep patterns. A total of 28 subjects (ages 22-60) were divided into two groups: 12-hour fixed night shift workers (n=15) and 12-hour fixed day shift workers (n=13). 24-hour ambulatory blood pressure measurements (Space Labs 90207) were taken twice: once during a regular work day and once on a non-work day. State anxiety levels were assessed on both test days using Spielberger's State-Trait Anxiety Inventory. Total sleep time (TST) was determined using a self-recorded sleep diary. Night shift workers demonstrated increases in 24-hour systolic (122 ± 2 to 126 ± 2 mmHg, P=0.012), diastolic (75 ± 1 to 79 ± 2 mmHg, P=0.001) and mean arterial pressures (90 ± 2 to 94 ± 2 mmHg, P<0.001) during work days compared to off days. In contrast, 24-hour blood pressures were similar during work and off days in day shift workers. Night shift workers reported less TST on work days than on off days (345 ± 16 vs. 552 ± 30 min; P<0.001), whereas day shift workers reported similar TST during work and off days (475 ± 16 vs. 437 ± 20 min; P=0.231). State anxiety scores did not differ between the groups or testing days (time*group interaction P=0.248), suggesting that increased 24-hour blood pressure during night shift work is related to decreased TST, not short-term anxiety. Our findings suggest that fixed night shift work disrupts the normal sleep-wake cycle, negatively affecting acute blood pressure regulation, which may increase the long-term risk for CVD.
Abstract:
Fuzzy community detection aims to identify fuzzy communities in a network: groups of vertices such that the membership of a vertex in each community lies in [0,1] and the memberships of a vertex across all communities sum to 1. Fuzzy communities are pervasive in social networks, but only a few studies have addressed fuzzy community detection. Recently, a one-step forward extension of Newman's Modularity, the most popular quality function for disjoint community detection, led to the Generalized Modularity (GM), which demonstrates good performance in finding well-known fuzzy communities. GM is therefore chosen as the quality function in our research. We first propose a generalized fuzzy t-norm modularity to investigate the effect of different fuzzy intersection operators on fuzzy community detection, since GM makes the introduction of a fuzzy intersection operation feasible. The experimental results show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure. Then, we focus on how to find optimal fuzzy communities in a network by directly maximizing GM, which we call the Fuzzy Modularity Maximization (FMM) problem. Work on the FMM problem yields the major contribution of this thesis: an efficient and effective GM-based fuzzy community detection method that automatically discovers a fuzzy partition of a network when appropriate, one much better than the fuzzy partitions found by existing fuzzy community detection methods, and a crisp partition when appropriate, one competitive with the partitions produced by the best disjoint community detection methods to date. We address the FMM problem by iteratively solving a sub-problem called One-Step Modularity Maximization (OSMM).
We present two approaches for solving this iterative procedure: a tree-based global optimizer called Find Best Leaf Node (FBLN) and a heuristic-based local optimizer. The OSMM problem is based on a simplified quadratic knapsack problem that can be solved in linear time; thus, a solution of OSMM can be found in linear time. Since the OSMM algorithm is called recursively within FBLN and the structure of the search tree is non-deterministic, the FMM/FBLN algorithm has a time complexity of at least O(n^2). We therefore also propose several highly efficient and effective heuristic algorithms, the FMM/H algorithms. We compared our proposed FMM/H algorithms with two state-of-the-art community detection methods, modified MULTICUT Spectral Fuzzy c-Means (MSFCM) and Genetic Algorithm with a Local Search strategy (GALS), on 10 real-world data sets. The experimental results suggest that the H2 variant of FMM/H is the best-performing version. The H2 algorithm is very competitive with GALS in producing maximum modularity partitions and performs much better than MSFCM. On all 10 data sets, H2 is also 2-3 orders of magnitude faster than GALS. Furthermore, by adopting a simple modification of the H2 algorithm as a mutation operator, we designed a genetic algorithm for fuzzy community detection, GAFCD, in which elite selection and early termination are applied. The crossover operator is designed to make GAFCD converge quickly and to enhance its ability to escape local minima. Experimental results on all the data sets show that GAFCD uncovers better community structure than GALS.
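The abstract does not state the GM formula, so the Python sketch below assumes a common form of fuzzy modularity in which the crisp "same community" indicator of Newman's modularity is replaced by a sum over communities of a t-norm applied to the two vertices' memberships. The function names, the Yager parameter value and the toy graph are illustrative assumptions, not the thesis's definitions.

```python
def t_product(a, b):
    """Product t-norm."""
    return a * b


def t_yager(a, b, p=2.0):
    """Yager t-norm with parameter p > 0."""
    return max(0.0, 1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))


def fuzzy_modularity(adj, memberships, t_norm=t_product):
    """Assumed form: Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * s_ij,
    where s_ij = sum_c t_norm(u_ic, u_jc)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            s_ij = sum(
                t_norm(ui, uj) for ui, uj in zip(memberships[i], memberships[j])
            )
            q += (adj[i][j] - degree[i] * degree[j] / two_m) * s_ij
    return q / two_m


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

# Each row holds one vertex's memberships and sums to 1.
crisp = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
fuzzy = [[1.0, 0.0], [1.0, 0.0], [0.7, 0.3], [0.3, 0.7], [0.0, 1.0], [0.0, 1.0]]

q_crisp = fuzzy_modularity(adj, crisp)
q_fuzzy_product = fuzzy_modularity(adj, fuzzy, t_product)
q_fuzzy_yager = fuzzy_modularity(adj, fuzzy, t_yager)
```

For crisp memberships both t-norms reduce to the usual indicator, so the score coincides with Newman's modularity. The two operators only differ once memberships are genuinely fractional, which is the setting the thesis compares.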
Abstract:
BACKGROUND: The aim of this study was to analyse the relationship between popliteal artery aneurysm (PAA) and generalized arteriomegaly. PATIENTS AND METHODS: In this consecutive series, thirty-three patients (1 woman, mean age 69.7 ± 9.6 years) undergoing PAA repair between 1996 and 2000 agreed to participate in a duplex screening program to assess the diameters of the infrarenal abdominal aorta, the common and external iliac, common and superficial femoral and contralateral popliteal arteries, as well as the common carotid and brachial arteries. RESULTS: The prevalence of arteriomegaly and aneurysmal disease, respectively, was as follows: abdominal aorta 15/33 (45.5%) and 8/33 (24.2%), common iliac artery 34/66 (51.5%) and 23/66 (34.8%), common femoral artery 55/66 (83.3%) and 7/66 (10.6%), and contralateral popliteal artery 7/33 (21.2%) and 15/33 (45.5%). Carotid artery diameters were significantly larger in PAA patients than in healthy controls adjusted for age and body surface (p < 0.001). Furthermore, patients with multiple peripheral arterial aneurysms had significantly larger diameters of the brachial (p < 0.02) and external iliac (p < 0.005) arteries. CONCLUSIONS: Our findings support the hypothesis of a diathesis for generalized arteriomegaly, with a predilection for further aneurysms of the abdominal aorta, iliac arteries, and femoral and contralateral popliteal arteries in patients with PAA.