331 results for Régression linéaire multiple
Abstract:
Environmental data, such as water quality data, usually include measurements that fall below detection limits because of limitations of the instruments or of the analytical methods used. The fact that some responses are not detected needs to be properly taken into account in the statistical analysis of such data. However, it is well known that analyzing a data set with detection limits is challenging, and we often have to rely on traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justifying a distribution is often not possible when the data are correlated and a large proportion of the data fall below detection limits. The extent of the bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
Abstract:
The extended recruitment season for short-lived species such as prawns biases the estimation of growth parameters from length-frequency data when conventional methods are used. We propose a simple method for overcoming this bias given a time series of length-frequency data. The difficulties arising from extended recruitment are eliminated by predicting the growth of the succeeding samples and the length increments of the recruits in previous samples. This method requires that some maximum size at recruitment can be specified. The advantages of this multiple length-frequency method are: it is simple to use; it requires only three parameters; no specific distributions need to be assumed; and the actual seasonal recruitment pattern does not have to be specified. We illustrate the new method with length-frequency data on the tiger prawn Penaeus esculentus from the north-western Gulf of Carpentaria, Australia.
Abstract:
We consider estimation of mortality rates and growth parameters from length-frequency data of a fish stock and derive the underlying length distribution of the population and the catch when there is individual variability in the von Bertalanffy growth parameter L-infinity. The model is flexible enough to accommodate 1) any recruitment pattern as a function of both time and length, 2) length-specific selectivity, and 3) varying fishing effort over time. The maximum likelihood method gives consistent estimates, provided the underlying distribution for individual variation in growth is correctly specified. Simulation results indicate that our method is reasonably robust to violations in the assumptions. The method is applied to tiger prawn data (Penaeus semisulcatus) to obtain estimates of natural and fishing mortality.
Abstract:
Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions; 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.
Abstract:
The goal of this article is to provide a new design framework and its corresponding estimation for phase I trials. Existing phase I designs assign each subject to one dose level based on responses from previous subjects. Yet it is possible that subjects with neither toxicity nor efficacy responses can be treated at higher dose levels, and their subsequent responses to higher doses will provide more information. In addition, for some trials, it might be possible to obtain multiple responses (repeated measures) from a subject at different dose levels. In this article, a nonparametric estimation method is developed for such studies. We also explore how the designs of multiple doses per subject can be implemented to improve design efficiency. The gain of efficiency from "single dose per subject" to "multiple doses per subject" is evaluated for several scenarios. Our numerical study shows that using "multiple doses per subject" and the proposed estimation method together increases the efficiency substantially.
Abstract:
A decision-theoretic framework is proposed for designing sequential dose-finding trials with multiple outcomes. The optimal strategy is solvable theoretically via backward induction. However, for dose-finding studies involving k doses, the computational complexity is the same as that of the bandit problem with k dependent arms, which is computationally prohibitive. We therefore provide two computationally compromised strategies, which are of practical interest because the computational complexity is greatly reduced: one is closely related to the continual reassessment method (CRM), and the other improves on CRM and approximates the optimal strategy more closely. In particular, we present the framework for phase I/II trials with multiple outcomes. Applications to a pediatric HIV trial and a cancer chemotherapy trial are given to illustrate the proposed approach. Simulation results for the two trials show that the computationally compromised strategy can perform well and appears to be ethical for allocating patients. The proposed framework can provide a better approximation to the optimal strategy if more extensive computing is available.
Abstract:
The oncogene MDM4, also known as MDMX or HDMX, contributes to cancer susceptibility and progression through its capacity to negatively regulate a range of genes with tumour-suppressive functions. As part of a recent genome-wide association study it was determined that the A-allele of the rs4245739 SNP (A>C), located in the 3'-UTR of MDM4, is associated with an increased risk of prostate cancer. Computational predictions revealed that the rs4245739 SNP is located within a predicted binding site for three microRNAs (miRNAs): miR-191-5p, miR-887 and miR-3669. Herein, we show using reporter gene assays and endogenous MDM4 expression analyses that miR-191-5p and miR-887 have a specific affinity for the rs4245739 SNP C-allele in prostate cancer. These miRNAs do not affect MDM4 mRNA levels; rather, they inhibit its translation in C-allele-containing PC3 cells but not in LNCaP cells homozygous for the A-allele. By analysing gene expression datasets from patient cohorts, we found that MDM4 is associated with metastasis and prostate cancer progression and that targeting this gene with miR-191-5p or miR-887 decreases PC3 cell viability. This study is the first, to our knowledge, to demonstrate regulation of the MDM4 rs4245739 SNP C-allele by two miRNAs in prostate cancer, and thereby to identify a mechanism by which the MDM4 rs4245739 SNP A-allele may be associated with an increased risk for prostate cancer.
Abstract:
This article develops a method for analysis of growth data with multiple recaptures when the initial ages for all individuals are unknown. The existing approaches either impute the initial ages or model them as random effects. Assumptions about the initial age are not verifiable because all the initial ages are unknown. We present an alternative approach that treats all the lengths, including the length at first capture, as correlated repeated measures for each individual. Optimal estimating equations are developed using the generalized estimating equations approach, which requires assumptions only about the first two moments. Explicit expressions for estimation of both mean growth parameters and variance components are given to minimize the computational complexity. Simulation studies indicate that the proposed method works well. Two real data sets are analyzed for illustration, one from whelks (Dicathais aegrota) and the other from southern rock lobster (Jasus edwardsii) in South Australia.
Abstract:
The work is a report of research on using multiple inverters of Battery Energy Storage Systems with angle droop controllers to share real power in an isolated micro grid system consisting of inertia based Distributed Generation units and variable load. The proposed angle droop control method helps to balance the supply and demand in the micro grid autonomous mode through charging and discharging of the Battery Energy Storage Systems while ensuring that the state of charge of the storage devices is within safe operating conditions. The proposed method is also studied for its effectiveness for frequency control. The proposed control system is verified and its performance validated with simulation software MATLAB/SIMULINK.
Abstract:
This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case, and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. An outcome refers to a label associated with completed cases, for example a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating whether a given case led to a customer complaint. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.
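The runtime prediction step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the Euclidean distance is taken into account, so inverse-distance weighting is assumed here, and the names `predict_outcome`, `centers`, `classifiers` and `n_closest` are hypothetical. The per-cluster classifiers are plain callables standing in for the random forests mentioned in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_outcome(trace_vec, centers, classifiers, n_closest=1):
    """Estimate the probability of the positive outcome for an ongoing case.

    trace_vec   -- encoded prefix of the ongoing case (hypothetical encoding)
    centers     -- one center vector per cluster
    classifiers -- one callable per cluster, returning P(positive outcome)
    n_closest   -- how many nearest clusters to consult
    """
    distances = [euclidean(trace_vec, c) for c in centers]
    # Indices of the n_closest clusters, nearest first.
    nearest = sorted(range(len(centers)), key=lambda i: distances[i])[:n_closest]
    # Assumption: closer clusters get proportionally more weight (1 / distance).
    weights = [1.0 / (distances[i] + 1e-9) for i in nearest]
    total = sum(weights)
    return sum(w * classifiers[i](trace_vec) for w, i in zip(weights, nearest)) / total


# Toy usage with two clusters and constant stand-in classifiers.
centers = [[0.0, 0.0], [10.0, 10.0]]
classifiers = [lambda v: 0.9, lambda v: 0.2]
p_single = predict_outcome([1.0, 1.0], centers, classifiers, n_closest=1)
p_blend = predict_outcome([1.0, 1.0], centers, classifiers, n_closest=2)
```

With `n_closest=1` only the nearest cluster's classifier is used; with larger values the predictions are blended, so a trace lying between clusters is not forced onto a single model.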
Abstract:
Red blood cells (RBCs) are the most common type of blood cell, making up 99% of the cells in the blood. During the circulation of blood in the cardiovascular network, RBCs squeeze through the tiny blood vessels (capillaries). They exhibit various types of motion and deformed shapes when flowing through these capillaries, whose diameters vary between 5 and 10 µm. RBCs occupy about 45% of the whole blood volume, and the interaction between RBCs directly influences their motion and deformation. However, most previous numerical studies have explored the motion and deformation of a single RBC and have neglected the interaction between RBCs. In this study, the motion and deformation of two 2D (two-dimensional) RBCs in capillaries are comprehensively explored using a coupled smoothed particle hydrodynamics (SPH) and discrete element method (DEM) model. In order to model the interactions between RBCs clearly, only two RBCs are considered in this study, even though blood with RBCs flows continuously through the blood vessels. A spring network based on the DEM is employed to model the viscoelastic membrane of the RBC, while the fluid inside and outside the RBC is modelled by SPH. The effects of the initial distance between the two RBCs, the membrane bending stiffness (Kb) of one RBC and the undeformed diameter of one RBC on the motion and deformation of both RBCs in a uniform capillary are studied. Finally, the deformation behavior of two RBCs in a stenosed capillary is also examined. Simulation results reveal that the interaction between RBCs has a significant influence on their motion and deformation.
Genetic loci for Epstein-Barr Virus nuclear antigen-1 are associated with risk of multiple sclerosis
Abstract:
Common diseases such as endometriosis (ED), Alzheimer's disease (AD) and multiple sclerosis (MS) account for a significant proportion of the health care burden in many countries. Genome-wide association studies (GWASs) for these diseases have identified a number of individual genetic variants contributing to the risk of those diseases. However, the effect size for most variants is small and collectively the known variants explain only a small proportion of the estimated heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously, and estimated genetic variances on the liability scale using SNPs from GWASs in unrelated individuals for these three diseases. For each of the three diseases, case and control samples were not all genotyped in the same laboratory. We demonstrate that a careful analysis can obtain robust estimates, but also that insufficient quality control (QC) of SNPs can lead to spurious results and that too stringent QC is likely to remove real genetic signals. Our estimates show that common SNPs on commercially available genotyping chips capture significant variation contributing to liability for all three diseases. The estimated proportion of total variation tagged by all SNPs was 0.26 (SE 0.04) for ED, 0.24 (SE 0.03) for AD and 0.30 (SE 0.03) for MS. Further, we partitioned the genetic variance explained into five categories by a minor allele frequency (MAF), by chromosomes and gene annotation. We provide strong evidence that a substantial proportion of variation in liability is explained by common SNPs, and thereby give insights into the genetic architecture of the diseases.
Abstract:
CONTEXT People meeting diagnostic criteria for anxiety or depressive disorders tend to score high on the personality scale of neuroticism. Studying this personality dimension can give insights into the etiology of these important psychiatric disorders.
OBJECTIVES To undertake a comprehensive genome-wide linkage study of neuroticism using large study samples that have been measured multiple times and to compare the results between countries for replication and across time within countries for consistency.
DESIGN Genome-wide linkage scan.
SETTING Twin individuals and their family members from Australia and the Netherlands.
PARTICIPANTS Nineteen thousand six hundred thirty-five sibling pairs completed self-report questionnaires for neuroticism up to 5 times over a period of up to 22 years. Five thousand sixty-nine sibling pairs were genotyped with microsatellite markers.
METHODS Nonparametric linkage analyses were conducted in MERLIN-REGRESS for the mean neuroticism scores averaged across time. Additional analyses were conducted for the time-specific measures of neuroticism from each country to investigate consistency of linkage results.
RESULTS Three chromosomal regions exceeded empirically derived thresholds for suggestive linkage using mean neuroticism scores: 10p 5 Kosambi centimorgans (cM) (Dutch study sample), 14q 103 cM (Dutch study sample), and 18q 117 cM (combined Australian and Dutch study sample), but only 14q retained significance after correction for multiple testing. These regions all showed evidence for linkage in individual time-specific measures of neuroticism and 1 (18q) showed some evidence for replication between countries. Linkage intervals for these regions all overlap with regions identified in other studies of neuroticism or related traits and/or in studies of anxiety in mice.
CONCLUSIONS Our results demonstrate the value of the availability of multiple measures over time and add to the optimism reported in recent reviews for replication of linkage regions for neuroticism. These regions are likely to harbor causal variants for neuroticism and its related psychiatric disorders and can inform prioritization of results from genome-wide association studies.
Abstract:
Accurate determination of same-sex twin zygosity is important for medical, scientific and personal reasons. Determination may be based upon questionnaire data, blood group, enzyme isoforms and fetal membrane examination, but assignment of zygosity must ultimately be confirmed by genotypic data. Here methods are reviewed for calculating average probabilities of correctly concluding a twin pair is monozygotic, given they share the same genotypes across all loci for commonly utilized multiplex short tandem repeat (STR) kits.
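As a rough illustration of the kind of calculation this abstract refers to, the sketch below applies Bayes' rule to a twin pair that is concordant at every tested locus. It is not necessarily the method reviewed in the paper. It assumes Hardy-Weinberg proportions, independent loci, no mutation or genotyping error, and a user-supplied prior proportion of monozygotic pairs; the function names and the example allele frequencies are hypothetical.

```python
def sib_match_probability(allele_freqs):
    """Probability that two full siblings (e.g. DZ twins) have identical
    genotypes at one locus, given its allele frequencies.

    Averages over 2, 1 and 0 alleles shared identical by descent
    (probabilities 1/4, 1/2, 1/4), assuming Hardy-Weinberg proportions.
    """
    s2 = sum(p ** 2 for p in allele_freqs)
    s4 = sum(p ** 4 for p in allele_freqs)
    return (1 + 2 * s2 + 2 * s2 ** 2 - s4) / 4


def prob_monozygotic(loci_allele_freqs, prior_mz=0.5):
    """Posterior probability that a pair concordant at every locus is MZ.

    prior_mz is an assumed prior proportion of MZ pairs, not a value
    taken from the abstract.
    """
    dz_all_match = 1.0
    for freqs in loci_allele_freqs:
        dz_all_match *= sib_match_probability(freqs)
    # MZ twins match at every locus with probability 1 under these assumptions.
    return prior_mz / (prior_mz + (1 - prior_mz) * dz_all_match)


# Toy usage: one biallelic locus, then ten copies of a four-allele locus.
one_locus = prob_monozygotic([[0.5, 0.5]])
ten_loci = prob_monozygotic([[0.25, 0.25, 0.25, 0.25]] * 10)
```

Each additional concordant locus multiplies the dizygotic likelihood by a factor below one, so the posterior probability of monozygosity rises towards one as more informative loci are typed.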