984 results for "analysis of variance"
Abstract:
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
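As an illustration of the input format these packages share — a matrix of counts, rows being genomic features and columns samples — the following minimal Python sketch applies a log-CPM transform and a naive per-gene t-test. It is a toy stand-in for the variance-stabilizing-transform-plus-'limma' pipelines mentioned above, not one of the eleven evaluated R methods; all data and settings are invented.

```python
import numpy as np
from scipy import stats

# Toy count matrix: rows = genomic features (genes), columns = samples.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(1000, 6))   # 1000 genes, 6 samples
group = np.array([0, 0, 0, 1, 1, 1])           # two conditions, n=3 each

# Variance-stabilizing-style transform: log2 counts-per-million (log-CPM),
# analogous in spirit to the voom/limma preprocessing mentioned above.
lib_size = counts.sum(axis=0)
log_cpm = np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

# Naive per-gene two-sample t-test on the transformed values
# (limma instead fits linear models with empirical-Bayes moderation).
t, p = stats.ttest_ind(log_cpm[:, group == 0], log_cpm[:, group == 1], axis=1)
print("smallest p-value:", p.min())
```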
Abstract:
The work presented evaluates the statistical characteristics of regional bias and expected error in reconstructions of real positron emission tomography (PET) data of human brain fluorodeoxyglucose (FDG) studies carried out by the maximum likelihood estimator (MLE) method with a robust stopping rule, and compares them with the results of filtered backprojection (FBP) reconstructions and with the method of sieves. The task of evaluating radioisotope uptake in regions-of-interest (ROIs) is investigated. An assessment of bias and variance in uptake measurements is carried out with simulated data. Then, by using three different transition matrices with different degrees of accuracy and a components-of-variance model for statistical analysis, it is shown that the characteristics obtained from real human FDG brain data are consistent with the results of the simulation studies.
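A minimal sketch of the MLEM (maximum-likelihood expectation-maximization) update that underlies such ML reconstructions, with a toy transition matrix; the dimensions, data, and the fixed iteration budget (a crude stand-in for a robust stopping rule) are all assumptions for illustration.

```python
import numpy as np

# Toy MLEM reconstruction: y ~ Poisson(A @ x), with A the transition
# (system) matrix mapping image voxels to detector bins.
rng = np.random.default_rng(1)
A = rng.random((200, 50))            # 200 detector bins, 50 voxels (toy sizes)
x_true = rng.random(50) * 10
y = rng.poisson(A @ x_true)

x = np.ones(50)                      # uniform initial image
sens = A.sum(axis=0)                 # sensitivity image: column sums of A
for it in range(30):                 # fixed budget stands in for a stopping rule
    proj = A @ x                     # forward projection
    x *= (A.T @ (y / np.maximum(proj, 1e-12))) / sens   # multiplicative update

print("mean absolute error:", np.abs(x - x_true).mean())
```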
Abstract:
With the trend in molecular epidemiology towards both genome-wide association studies and complex modelling, the need for large sample sizes to detect small effects and to allow for the estimation of many parameters within a model continues to increase. Unfortunately, most methods of association analysis have been restricted to either a family-based or a case-control design, resulting in the lack of synthesis of data from multiple studies. Transmission disequilibrium-type methods for detecting linkage disequilibrium from family data were developed as an effective way of preventing the detection of association due to population stratification. Because these methods condition on parental genotype, however, they have precluded the joint analysis of family and case-control data, although methods for case-control data may not protect against population stratification and do not allow for familial correlations. We present here an extension of a family-based association analysis method for continuous traits that will simultaneously test for, and if necessary control for, population stratification. We further extend this method to analyse binary traits (and therefore family and case-control data together) and to accurately estimate genetic effects in the population, even when using an ascertained family sample. Finally, we present the power of this binary extension for both family-only and joint family and case-control data, and demonstrate the accuracy of the association parameter and variance components in an ascertained family sample.
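For background, this sketch shows the classic transmission disequilibrium test (a McNemar-type statistic) on which transmission disequilibrium-type methods build; the counts are invented, and this is the basic test, not the extended method the abstract describes.

```python
from scipy import stats

# Classic transmission disequilibrium test (TDT): among heterozygous
# parents, compare counts of transmissions (b) vs non-transmissions (c)
# of the candidate allele; under no linkage/association, b ≈ c.
b, c = 60, 38                        # illustrative counts
chi2 = (b - c) ** 2 / (b + c)        # McNemar-type statistic, 1 df
p = stats.chi2.sf(chi2, df=1)
print(f"TDT chi2 = {chi2:.2f}, p = {p:.3g}")
```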
Abstract:
OBJECTIVES: Etravirine (ETV) is metabolized by cytochrome P450 (CYP) 3A, 2C9, and 2C19. Metabolites are glucuronidated by uridine diphosphate glucuronosyltransferases (UGT). To identify the potential impact of genetic and non-genetic factors involved in ETV metabolism, we carried out a two-step pharmacogenetics-based population pharmacokinetic study in HIV-1 infected individuals. MATERIALS AND METHODS: The study population included 144 individuals contributing 289 ETV plasma concentrations and four individuals contributing 23 ETV plasma concentrations collected in a rich sampling design. Genetic variants [n=125 single-nucleotide polymorphisms (SNPs)] in 34 genes with a predicted role in ETV metabolism were selected. A first-step population pharmacokinetic model included non-genetic and known genetic factors (seven SNPs in CYP2C, one SNP in CYP3A5) as covariates. Post-hoc individual ETV clearance (CL) was used in a second (discovery) step, in which the effect of the remaining 98 SNPs in CYP3A, P450 cytochrome oxidoreductase (POR), nuclear receptor genes, and UGTs was investigated. RESULTS: A one-compartment model with zero-order absorption best characterized ETV pharmacokinetics. The average ETV CL was 41 l/h (CV 51.1%), the volume of distribution was 1325 l, and the mean absorption time was 1.2 h. The administration of darunavir/ritonavir or tenofovir was the only non-genetic covariate influencing ETV CL significantly, resulting in a 40% [95% confidence interval (CI): 13-69%] and a 42% (95% CI: 17-68%) increase in ETV CL, respectively. Carriers of rs4244285 (CYP2C19*2) had 23% (8-38%) lower ETV CL. Co-administered antiretroviral agents and genetic factors explained 16% of the variance in ETV concentrations. None of the SNPs in the discovery step influenced ETV CL. CONCLUSION: ETV concentrations are highly variable, and co-administered antiretroviral agents and genetic factors explained only a modest part of the interindividual variability in ETV elimination. Opposing effects of interacting drugs effectively abrogate genetic influences on ETV CL, and vice versa.
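A sketch of the reported structural model — one compartment, zero-order absorption, first-order elimination — using the published CL and V; the dose and the input duration (derived from the mean absorption time, since MAT = T/2 for zero-order input) are illustrative assumptions, not study values.

```python
import numpy as np

# One-compartment model, zero-order absorption, first-order elimination.
# CL = 41 L/h and V = 1325 L follow the abstract; dose and input duration
# T are assumptions (MAT = T/2 for zero-order input => T = 2.4 h).
CL, V, dose, T = 41.0, 1325.0, 200.0, 2.4      # L/h, L, mg, h
ke, R0 = CL / V, dose / T                       # elimination rate, input rate

def conc(t):
    """Plasma concentration (mg/L) at time t (h) after one dose."""
    t = np.asarray(t, dtype=float)
    rising = R0 / CL * (1 - np.exp(-ke * np.minimum(t, T)))  # during input
    decay = np.exp(-ke * np.clip(t - T, 0, None))            # after input ends
    return rising * decay

print(conc([1.0, 2.4, 12.0]))                   # concentrations at chosen times
```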
Abstract:
Introduction: Genome-wide association (GWA) studies have identified and confirmed more than 20 susceptibility genes for type 2 diabetes (T2D) and have contributed to a better understanding of its pathophysiology. Fasting glucose (FG) and glucose measured 2 hours after an oral glucose tolerance test (2h-glucose) are the two clinical measures used to diagnose T2D. We recently identified pancreatic G6P (G6PC2) as a determinant of the physiological variability of FG, and then the melatonin receptor (MTNR1B), which furthermore links circadian rhythm regulation to T2D. In this work we studied the genetics of 2h-glucose using the GWA approach. Results: We performed a GWA meta-analysis within MAGIC (Meta-Analysis of Glucose and Insulin related traits Consortium) that included 9 GWA studies (N = 15,234). Replication of 29 loci (N = 6,958-30,121, P < 10⁻⁵) confirmed 5 new loci; 2 were already known to be associated with T2D (TCF7L2, P = 1.6 × 10⁻¹⁰) and FG (GCKR, P = 5.6 × 10⁻¹⁰), whereas GIPR (P = 5.2 × 10⁻¹²), VPS13C (P = 3.9 × 10⁻⁸) and ADCY5 (P = 1.11 × 10⁻¹⁵) are novel. GIPR encodes the receptor for GIP (gastric inhibitory polypeptide), which is secreted by intestinal cells to stimulate insulin secretion in response to glucose (the incretin effect). Carriers of the GIPR variant that raises 2h-glucose also have a lower insulinogenic index (P = 1.0 × 10⁻¹⁷), but show no change in glycaemia after an intravenous glucose challenge (P = 0.21). These results support an incretin effect of the GIPR locus, which would explain ~9.6% of the total variance of this trait. The biology of ADCY5 and VPS13C and their link with glucose homeostasis remain to be elucidated. GIPR is not associated with T2D risk, indicating that it influences the physiological variability of 2h-glucose, whereas the ADCY5 locus is associated with T2D (OR = 1.11, P = 1.5 × 10⁻¹⁵). Conclusion: Our study demonstrates that studying 2h-glucose is an effective approach both for understanding the genetic basis of the physiology of this important clinical trait and for identifying new T2D susceptibility genes.
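For context, a minimal sketch of the fixed-effect inverse-variance meta-analysis that GWA consortia such as MAGIC use to combine per-study SNP effects; the effect sizes and standard errors below are invented.

```python
import numpy as np
from scipy import stats

# Fixed-effect inverse-variance meta-analysis of one SNP across studies.
beta = np.array([0.10, 0.07, 0.12, 0.09])   # per-study effect sizes (invented)
se = np.array([0.03, 0.04, 0.05, 0.03])     # per-study standard errors

w = 1 / se**2                               # inverse-variance weights
beta_meta = np.sum(w * beta) / np.sum(w)    # pooled effect
se_meta = np.sqrt(1 / np.sum(w))
z = beta_meta / se_meta
p = 2 * stats.norm.sf(abs(z))               # two-sided p-value
print(f"meta beta = {beta_meta:.3f}, p = {p:.2e}")
```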
Abstract:
Experimental research has identified many putative agents of amphibian decline, yet the population-level consequences of these agents remain unknown, owing to lack of information on compensatory density dependence in natural populations. Here, we investigate the relative importance of intrinsic (density-dependent) and extrinsic (climatic) factors impacting the dynamics of a tree frog (Hyla arborea) population over 22 years. A combination of log-linear density dependence and rainfall (with a 2-year time lag corresponding to development time) explain 75% of the variance in the rate of increase. Such fluctuations around a variable return point might be responsible for the seemingly erratic demography and disequilibrium dynamics of many amphibian populations.
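A sketch of the model class described — log-linear (Gompertz-type) density dependence plus rainfall lagged by the two-year development time — fitted by ordinary least squares; the series and coefficients are simulated, not the study's data.

```python
import numpy as np

# Simulate a 22-year series with log-linear density dependence and a
# rainfall effect lagged by two years (the development time).
rng = np.random.default_rng(2)
years = 22
rain = rng.gamma(5, 20, size=years)
logN = np.empty(years)
logN[0] = np.log(100)
for t in range(years - 1):
    r = 2.0 - 0.45 * logN[t] + 0.004 * (rain[t - 2] if t >= 2 else rain.mean())
    logN[t + 1] = logN[t] + r + rng.normal(0, 0.1)

# Regress the realized rate of increase r_t = log(N_{t+1}/N_t) on log N_t
# and on rainfall lagged by two years.
r_t = np.diff(logN)[2:]
X = np.column_stack([np.ones(years - 3), logN[2:-1], rain[: years - 3]])
coef, *_ = np.linalg.lstsq(X, r_t, rcond=None)
print("intercept, density dependence, rainfall effect:", coef)
```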
Abstract:
Genetic variants influence the risk of developing certain diseases or give rise to differences in drug response. Recent progress in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), has facilitated genotyping of large clinical and population cohorts. Combining the massive genotypic data with measurements of phenotypic traits allows for the determination of genetic differences that explain, at least in part, the phenotypic variations within a population. So far, models combining the most significant variants can only explain a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors.

In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy. Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs and SNP-drug interactions, and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models selecting a subset of SNP-drug interactions in a principled way have better predictive power than control models using "random" subsets. Nevertheless, all tested models containing SNP and/or drug terms exhibited significant predictive power (as compared to a random predictor) and explained a sizable proportion of variance in the patient-stratified cross-validation context. Importantly, the model containing stepwise-selected SNP terms showed higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered, and thus are missing from the data, possibly explaining the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes, which we were unable to detect in a sample of the present size (<800 patients).

In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs that are associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, as well as metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease.
I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits. This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.
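A minimal sketch of the patient-stratified cross-validation described above, with a design matrix built from SNPs, drug indicators, and their interaction products; all data, dimensions, and the ridge penalty are illustrative assumptions, not the thesis's models.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score

# Patient-stratified cross-validation: all observations from one patient
# stay in the same fold, so models are always scored on unseen patients.
rng = np.random.default_rng(3)
n_obs, n_snp, n_drug = 600, 20, 5
snps = rng.integers(0, 3, size=(n_obs, n_snp))        # 0/1/2 allele counts
drugs = rng.integers(0, 2, size=(n_obs, n_drug))      # on/off drug exposure
inter = (snps[:, :, None] * drugs[:, None, :]).reshape(n_obs, -1)
X = np.hstack([snps, drugs, inter])                   # SNP + drug + interactions
y = rng.normal(size=n_obs)                            # stand-in lipid response
patient = rng.integers(0, 150, size=n_obs)            # ~4 visits per patient

scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         groups=patient, cv=GroupKFold(n_splits=5))
print("patient-stratified R^2 per fold:", scores.round(3))
```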
Abstract:
BACKGROUND: The quantification of total (free+sulfated) metanephrines in urine is recommended to diagnose pheochromocytoma. Urinary metanephrines include metanephrine itself, normetanephrine and methoxytyramine, mainly in the form of sulfate conjugates (60-80%). Their determination requires the hydrolysis of the sulfate ester moiety to allow electrochemical oxidation of the phenolic group. Commercially available urine calibrators and controls contain essentially free, unhydrolysable metanephrines which are not representative of native urines. The lack of appropriate calibrators may lead to uncertainty regarding the completion of the hydrolysis of sulfated metanephrines, resulting in incorrect quantification. METHODS: We used chemically synthesized sulfated metanephrines to establish whether the procedure most frequently recommended for commercial kits (pH 1.0 for 30 min over a boiling water bath) ensures their complete hydrolysis. RESULTS: We found that sulfated metanephrines differ in the optimum pH required for complete hydrolysis. Highest yields and minimal variance were obtained for a 20 min incubation at pH 0.7-0.9. CONCLUSION: Urinary pH should be carefully controlled to ensure an efficient and reproducible hydrolysis of sulfated metanephrines. Synthetic sulfated metanephrines represent the optimal material for calibrators and proficiency testing to improve inter-laboratory accuracy.
Abstract:
Biological scaling analyses employing the widely used bivariate allometric model are beset by at least four interacting problems: (1) choice of an appropriate best-fit line with due attention to the influence of outliers; (2) objective recognition of divergent subsets in the data (allometric grades); (3) potential restrictions on statistical independence resulting from phylogenetic inertia; and (4) the need for extreme caution in inferring causation from correlation. A new non-parametric line-fitting technique has been developed that eliminates requirements for normality of distribution, greatly reduces the influence of outliers and permits objective recognition of grade shifts in substantial datasets. This technique is applied in scaling analyses of mammalian gestation periods and of neonatal body mass in primates. These analyses feed into a re-examination, conducted with partial correlation analysis, of the maternal energy hypothesis relating to mammalian brain evolution, which suggests links between body size and brain size in neonates and adults, gestation period and basal metabolic rate. Much has been made of the potential problem of phylogenetic inertia as a confounding factor in scaling analyses. However, this problem may be less severe than suspected earlier because nested analyses of variance conducted on residual variation (rather than on raw values) reveal that there is considerable variance at low taxonomic levels. In fact, limited divergence in body size between closely related species is one of the prime examples of phylogenetic inertia. One common approach to eliminating perceived problems of phylogenetic inertia in allometric analyses has been calculation of 'independent contrast values'. It is demonstrated that the reasoning behind this approach is flawed in several ways. Calculation of contrast values for closely related species of similar body size is, in fact, highly questionable, particularly when there are major deviations from the best-fit line for the scaling relationship under scrutiny.
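The abstract does not detail its new line-fitting technique; as a point of reference only, the sketch below applies the Theil-Sen estimator, a standard non-parametric, outlier-robust line fit, to simulated log-log allometric data (the slope is the scaling exponent).

```python
import numpy as np
from scipy import stats

# Theil-Sen line fit on simulated log-log allometric data: no normality
# assumption, and a few gross outliers barely move the slope estimate.
rng = np.random.default_rng(4)
log_mass = rng.uniform(0, 6, 80)                       # log body mass
log_trait = 0.75 * log_mass + 1.0 + rng.normal(0, 0.2, 80)
log_trait[:4] += 3.0                                   # inject gross outliers

slope, intercept, lo, hi = stats.theilslopes(log_trait, log_mass)
print(f"scaling exponent ~ {slope:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```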
Principal components analysis for quality evaluation of cooled 'Nanicão' bananas in different packaging
Abstract:
This work aimed to evaluate the quality of 'Nanicão' bananas stored at two temperatures in three different kinds of package, using principal components analysis (PCA) as a basis for an analysis of variance. The fruits used were 'Nanicão' bananas at ripening degree 3, that is, more green than yellow. The packages tested were: "Torito" wood boxes, load capacity 18 kg; "½ box" wood boxes, load capacity 13 kg; and cardboard boxes, load capacity 18 kg. The temperatures assessed were room temperature (control) and 13±1 °C with humidity controlled at 90±2.5%. Fruits were discarded when a sensory analysis determined they had become unfit for consumption. Peel coloration, percentage of imperfections, fresh mass, total acidity, pH, total soluble solids and percentage of sucrose were assessed. A completely randomized design with a 2-factorial treatment structure (packing × temperature) was used. The data were analyzed by principal components analysis, using S-plus 4.2. The conclusion was that the ½ box packages preserved the fruit best, indicating that reducing the number of fruits per package improves ventilation, decreases mechanical injury, and maintains quality for longer.
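A minimal sketch of the analysis pattern described (the study itself used S-plus 4.2): principal components of standardized quality variables; the data matrix and its dimensions are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PCA on fruit-quality variables: rows = fruit samples, columns =
# measured attributes (e.g. peel colour, acidity, pH, soluble solids).
rng = np.random.default_rng(5)
quality = rng.normal(size=(36, 7))        # invented measurements

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(quality))
print("variance explained by PC1, PC2:",
      pca.explained_variance_ratio_.round(3))
```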
Abstract:
In many industrial applications, accurate and fast surface reconstruction is essential for quality control. Variation in surface finishing parameters, such as surface roughness, can reflect defects in a manufacturing process, non-optimal product operational efficiency, and reduced life expectancy of the product. This thesis considers reconstruction and analysis of high-frequency variation, that is, roughness, on planar surfaces. Standard roughness measures in industry are calculated from surface topography. A fast and non-contact method to obtain surface topography is to apply photometric stereo in the estimation of surface gradients and to reconstruct the surface by integrating the gradient fields. Alternatively, visual methods, such as statistical measures, fractal dimension and distance transforms, can be used to characterize surface roughness directly from gray-scale images. In this thesis, the accuracy of distance transforms, statistical measures, and fractal dimension is evaluated in the estimation of surface roughness from gray-scale images and topographies. The results are contrasted with standard industry roughness measures. In distance transforms, the key idea is that distance values calculated along a highly varying surface are greater than distances calculated along a smoother surface. Statistical measures and fractal dimension are common surface roughness measures. In the experiments, skewness and variance of the brightness distribution, fractal dimension, and distance transforms exhibited strong linear correlations with standard industry roughness measures. One of the key strengths of the photometric stereo method is the acquisition of higher-frequency variation of surfaces. In this thesis, the reconstruction of planar high-frequency varying surfaces is studied in the presence of imaging noise and blur. Two Wiener filter-based methods are proposed, of which one is optimal in the sense of surface power spectral density, given the spectral properties of the imaging noise and blur. Experiments show that the proposed methods preserve the inherent high-frequency variation in the reconstructed surfaces, whereas traditional reconstruction methods typically handle incorrect measurements by smoothing, which dampens the high-frequency variation.
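As a baseline for the gradient-integration step described above, the sketch below implements Frankot-Chellappa FFT integration of gradient fields into a surface; it is a standard textbook method, not the thesis's Wiener-filter approach, and the test surface is synthetic.

```python
import numpy as np

# Frankot-Chellappa integration: recover surface z from gradient fields
# p = dz/dx, q = dz/dy by projecting onto integrable Fourier basis.
def integrate_gradients(p, q):
    h, w = p.shape
    wx = np.fft.fftfreq(w) * 2 * np.pi
    wy = np.fft.fftfreq(h) * 2 * np.pi
    u, v = np.meshgrid(wx, wy)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                       # avoid division by zero at DC
    Z = (-1j * u * np.fft.fft2(p) - 1j * v * np.fft.fft2(q)) / denom
    Z[0, 0] = 0.0                           # fix the arbitrary offset
    return np.real(np.fft.ifft2(Z))

# Round trip on a synthetic periodic surface:
y, x = np.mgrid[0:128, 0:128] / 128.0
z = np.sin(6 * np.pi * x) * np.cos(4 * np.pi * y)
p, q = np.gradient(z, axis=1), np.gradient(z, axis=0)
z_rec = integrate_gradients(p, q)
err = z - z_rec
print("reconstruction error (offset removed):", np.abs(err - err.mean()).max())
```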
Abstract:
Public opinion surveys have become progressively incorporated into systems of official statistics. Surveys of the economic climate are usually qualitative because they collect opinions of businesspeople and/or experts about long-term indicators described by a number of variables. In such cases the responses are expressed in ordinal numbers, that is, the respondents verbally report, for example, whether during a given trimester the sales or the new orders have increased, decreased or remained the same as in the previous trimester. These data make it possible to calculate the percentage of respondents in the total population (results are extrapolated) who select each of the three options. Data are often presented in the form of an index calculated as the difference between the percentage of those who report that a given variable has improved and the percentage of those who report that it has deteriorated.
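A minimal sketch of the balance index described in the last sentence; the response counts are invented and no survey weighting or extrapolation is applied.

```python
# Balance index for one qualitative survey question: percentage reporting
# "increased" minus percentage reporting "decreased". Counts are invented;
# real surveys weight respondents before extrapolating to the population.
responses = {"increased": 412, "same": 301, "decreased": 187}
total = sum(responses.values())
balance = 100 * (responses["increased"] - responses["decreased"]) / total
print(f"balance index: {balance:+.1f} points")
```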
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis of satellite data. We are interested in improving the classification of forest segments when a combination of information from two or more different satellites is used. The experimental part is based on real satellite data originating from Canada. This thesis summarizes the mathematical basics of image analysis and supervised learning, the methods used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles), 2) variance, 3) skewness and 4) kurtosis. Good overall performance was achieved when a combination of the ASTERBAND and RADARSAT2 data sets was used.
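A sketch computing the four feature sets listed above for the pixel intensities of a single forest segment; the intensity data are simulated for illustration.

```python
import numpy as np
from scipy import stats

# The four feature sets considered above, computed for one segment's
# pixel intensities (simulated data).
rng = np.random.default_rng(6)
pixels = rng.gamma(3.0, 25.0, size=2000)     # intensities of one segment

features = {
    "quantiles": np.quantile(pixels, [0.1, 0.25, 0.5, 0.75, 0.9]),
    "variance": pixels.var(),
    "skewness": stats.skew(pixels),
    "kurtosis": stats.kurtosis(pixels),
}
print(features)
```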
Abstract:
Raw measurement data do not always immediately convey useful information, but applying mathematical and statistical analysis tools to the data can improve the situation. Data analysis can offer benefits such as extracting meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study, mathematical tools such as Qlucore Omics Explorer (QOE) and sparse Bayesian (SB) regression are used. Later on, linear regression is used to build a model based on a subset of variables that have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. For SB and linear regression, the results show that both models built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit the whole available dataset well; therefore, it is proposed as future work to build piecewise nonlinear regression models if the same dataset is used, or for the plant to provide another dataset collected in a more systematic fashion than the present data.
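A minimal sketch of the two-stage approach described, with scikit-learn's ARDRegression standing in for the sparse Bayesian model (the thesis's exact SB implementation is not specified here): SB ranks the process variables, then linear regression is refit on those with the largest weights. All data are simulated.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, LinearRegression

# Stage 1: sparse Bayesian regression (ARD) ranks process variables.
# Stage 2: ordinary linear regression refit on the top-weighted subset.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 15))                  # 15 process variables
y = 2.0 * X[:, 3] - 1.5 * X[:, 8] + rng.normal(0, 0.5, 300)  # product quality

sb = ARDRegression().fit(X, y)
top = np.argsort(np.abs(sb.coef_))[-3:]         # variables with largest weights
lin = LinearRegression().fit(X[:, top], y)
print("selected variables:", sorted(top.tolist()))
print("linear model R^2 on selected subset:", round(lin.score(X[:, top], y), 3))
```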
Abstract:
The autonomic nervous system plays an important role in physiological and pathological conditions and has been extensively evaluated by parametric and non-parametric spectral analysis. To compare the results obtained with the fast Fourier transform (FFT) and the autoregressive (AR) method, we performed a comprehensive comparative study using data from humans and rats during pharmacological blockade (in rats), a postural test (in humans), and in the hypertensive state (in both humans and rats). Although postural hypotension in humans induced an increase in the normalized low-frequency power (LFnu) of systolic blood pressure, the increase in the LF/HF ratio was detected only by AR. In rats, AR and FFT analysis did not agree for LFnu and normalized high-frequency power (HFnu) under basal conditions and after vagal blockade. The increase in the LF/HF ratio of the pulse interval, induced by methylatropine, was detected only by FFT. In hypertensive patients, changes in LF and HF for systolic blood pressure were observed only by AR; FFT was able to detect the reduction in both blood pressure variance and total power. In hypertensive rats, AR presented different values of variance and total power for systolic blood pressure. Moreover, AR and FFT presented discordant results for LF, LFnu, HF, the LF/HF ratio, and total power for the pulse interval. We provide evidence for disagreement in 23% of the indices of blood pressure and heart rate variability in humans and 67% discordance in rats when these variables are evaluated by AR and FFT under physiological and pathological conditions. The overall disagreement between AR and FFT in this study was 43%.
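A sketch contrasting the two spectral routes on a simulated beat-interval series: an FFT-based Welch periodogram versus an autoregressive spectrum fitted via the Yule-Walker equations (implemented directly with numpy). The band limits are the usual human LF/HF definitions; the signal, sampling rate, and AR order are illustrative assumptions.

```python
import numpy as np
from scipy import signal
from scipy.linalg import toeplitz

# Simulated, evenly resampled beat-interval (tachogram) signal with an
# LF (~0.1 Hz) and an HF respiratory (~0.25 Hz) component plus noise.
rng = np.random.default_rng(8)
fs = 4.0                                           # Hz
t = np.arange(0, 300, 1 / fs)
x = (0.04 * np.sin(2 * np.pi * 0.10 * t)
     + 0.03 * np.sin(2 * np.pi * 0.25 * t)
     + 0.01 * rng.normal(size=t.size))

def band_power(f, psd, lo, hi):
    m = (f >= lo) & (f < hi)
    return np.trapz(psd[m], f[m])

# FFT route: Welch periodogram.
f_w, psd_w = signal.welch(x, fs=fs, nperseg=512)

# AR route: Yule-Walker coefficients from the autocorrelation sequence.
order = 16
x0 = x - x.mean()
r = np.correlate(x0, x0, "full")[x0.size - 1 : x0.size + order] / x0.size
a = np.linalg.solve(toeplitz(r[:order]), r[1 : order + 1])
sigma2 = r[0] - a @ r[1 : order + 1]
e = np.exp(-2j * np.pi * np.outer(f_w, np.arange(1, order + 1)) / fs)
psd_ar = sigma2 / fs / np.abs(1 - e @ a) ** 2       # AR spectral estimate

for name, psd in [("FFT (Welch)", psd_w), ("AR (Yule-Walker)", psd_ar)]:
    lf = band_power(f_w, psd, 0.04, 0.15)
    hf = band_power(f_w, psd, 0.15, 0.40)
    print(f"{name}: LF/HF = {lf / hf:.2f}")
```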