4 results for experimental designs
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are today well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which no compelling statistical method exists. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps involved in an experiment, from the production of the array to the quality controls, ending with the preprocessing steps that will be used in the data analysis in the rest of the dissertation. 
In Chapter 2, a critical review of standard analysis methods is provided, stressing most of the problems that [...]. In Chapter 3, a method is introduced to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4, a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4] is described. 
In fact, when looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability of G2 samples being members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
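The resampling scheme that the abstract above attributes to MultiSAM (repeatedly comparing the less populated class with equally sized random subsets of the more populated class, then counting how often each probe is called differentially expressed) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: a Welch t-statistic with a fixed cutoff is used here as a stand-in for the SAM call, and the function names, the cutoff `t_cut`, and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for two lists of expression values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def multi_resampling_scores(lpc, mpc, n_iter=1000, t_cut=4.0, seed=0):
    """Count how often each probe is called differentially expressed.

    lpc, mpc: lists of samples; each sample is a list of per-probe values.
    Returns one integer score per probe, between 0 and n_iter.
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    scores = [0] * n_probes
    for _ in range(n_iter):
        # Draw a random MPC subset of the same size as the LPC.
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            a = [sample[p] for sample in lpc]
            b = [sample[p] for sample in subset]
            # Stand-in for the SAM call (assumption): fixed cutoff on |t|.
            if abs(welch_t(a, b)) > t_cut:
                scores[p] += 1
    return scores


# Hypothetical toy data: 6 LPC samples, 30 MPC samples, 2 probes.
# Probe 0 is shifted by +10 in the LPC; probe 1 has no group difference.
levels = [-0.2, -0.1, 0.0, 0.1, 0.2]
lpc = [[10.0 + d, d] for d in [-0.2, -0.1, 0.0, 0.0, 0.1, 0.2]]
mpc = [[levels[i % 5], levels[i % 5]] for i in range(30)]
scores = multi_resampling_scores(lpc, mpc, n_iter=200)
print(scores)
```

In this toy setting the shifted probe is flagged in every iteration and the unshifted probe never is, which mirrors the idea of ranking probes by their recurrence across resamplings. A threshold on the score (the abstract mentions score >300 out of 1,000) would then select the probes to carry forward.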
Abstract:
Humans process numbers in a way similar to animals. Countless studies report similar performance between animals and humans (adults and/or children). Three models have been developed to explain the cognitive mechanisms underlying number processing. The triple-code model (Dehaene, 1992) posits a mental number line as the preferred way to represent magnitude. The mental number line has three characteristic effects: the distance, the magnitude and the SNARC effects. The SNARC effect shows a spatial association between number and space representations. In other words, small numbers are related to left space while large numbers are related to right space. Recently a vertical SNARC effect has been found (Ito & Hatta, 2004; Schwarz & Keus, 2004), reflecting a space-related bottom-to-top representation of numbers. Horizontal and vertical magnitude representations could influence subject performance in explicit and implicit digit tasks. This research project aimed to investigate the spatial components of number representation using different experimental designs and tasks. Experiment 1 focused on horizontal and vertical number representations in within- and between-subjects designs in parity and magnitude comparison tasks, presenting positive or negative Arabic digits (1-9 without 5). Experiment 1A replicated the SNARC and distance effects in both spatial arrangements. Experiment 1B showed a horizontal reversed SNARC effect in both tasks, while a vertical reversed SNARC effect was found only in the comparison task. In experiment 1C two groups of subjects performed both tasks in two different instruction-responding hand assignments with positive numbers. The results did not show any significant differences between the two assignments, even if the vertical number line seemed to be more flexible than the horizontal one. On the whole, experiment 1 seemed to demonstrate a contextual (i.e. 
task set) influence on the nature of the SNARC effect. Experiment 2 focused on the effect of horizontal and vertical number representations on spatial biases in paper-and-pencil bisection tasks. In experiment 2A the participants were requested to bisect physical and number (2 or 9) lines horizontally and vertically. The findings demonstrated that digit 9 strings tended to generate a more rightward bias than digit 2 strings horizontally. However, in the vertical condition the digit 2 strings generated a more upward bias than digit 9 strings, suggesting a top-to-bottom number line. In experiment 2B the participants were asked to bisect lines flanked by numbers (i.e. 1 or 7) in four spatial arrangements: horizontal, vertical, right-diagonal and left-diagonal lines. Four number conditions were created according to congruent or incongruent number line representation: 1-1, 1-7, 7-1 and 7-7. The main results showed a more reliable rightward bias in the horizontal congruent condition (1-7) than in the incongruent condition (7-1). Vertically, the incongruent condition (1-7) produced a significant bias towards the bottom of the line compared with the congruent condition (7-1). Experiment 2 suggested a more rigid horizontal number line, while in the vertical condition the number representation could be more flexible. In experiment 3 we adopted the materials of experiment 2B in order to find a number line effect on temporal (motor) performance. The participants were presented with horizontal, vertical, right-diagonal and left-diagonal lines flanked by the same digits (i.e. 1-1 or 7-7) or by different digits (i.e. 1-7 or 7-1). The digits were spatially congruent or incongruent with their respective hypothesized mental representations. Participants were instructed to touch the lines either close to the large digit, or close to the small digit, or to bisect the lines. Number processing influenced movement execution more than movement planning. 
Number congruency influenced spatial biases mostly along the horizontal but also along the vertical dimension. These results support a two-dimensional magnitude representation. Finally, experiment 4 addressed the visuo-spatial manipulation of number representations for accessing and retrieving arithmetic facts. The participants were requested to perform a number-matching task and an addition verification task. The findings showed an interference effect between sum-nodes and neutral-nodes only with a horizontal presentation of digit-cues, in the number-matching task. In the addition verification task, performance was similar for horizontal and vertical presentations of arithmetic problems. In conclusion, the data seemed to show an automatic activation of the horizontal number line, which is also used to retrieve arithmetic facts. The horizontal number line seemed to be more rigid and the preferred way to order numbers from left to right. A possible explanation could be the left-to-right direction of reading and writing. The vertical number line seemed to be more flexible and more dependent on the task, perhaps reflecting the several examples in the environment that represent numbers either from bottom to top or from top to bottom. However, the bottom-to-top number line seemed to be activated by explicit task demands.
Abstract:
Due to the growing attention of consumers towards their food, improving the quality of animal products has become one of the main focuses of research. To this aim, the application of modern molecular genetics approaches has proved extremely useful and effective. This innovative drive covers the production of all livestock species, including pork. The Italian pig breeding industry is unique because it needs heavy pigs slaughtered at about 160 kg for the production of high quality processed products. For this reason, it requires precise meat quality and carcass characteristics. Two aspects have been considered in this thesis: the application of transcriptome analysis in post mortem pig muscles as a possible method to evaluate meat quality parameters related to the pre mortem status of the animals, including health, nutrition and welfare, with potential applications for product traceability (chapters 3 and 4); and the study of candidate genes for obesity-related traits in order to identify markers associated with fatness in pigs that could be applied to improve carcass quality (chapters 5, 6, and 7). Chapter three addresses the first issue from a methodological point of view. When we considered this issue, it was not obvious that post mortem skeletal muscle could be useful for transcriptomic analysis. We therefore demonstrated that the quality of RNA extracted from skeletal muscle of pigs sampled at different post mortem intervals (20 minutes, 2 hours, 6 hours, and 24 hours) is good enough for downstream applications. Degradation started from 48 h post mortem, even if at this time it is still possible to use some RNA products. In the fourth chapter, in order to demonstrate the potential use of RNA obtained up to 24 hours post mortem, we present the results of RNA analysis with the Affymetrix microarray platform, which made it possible to assess the level of expression of more than 24,000 mRNAs. 
We did not identify any significant differences between the different post mortem times, suggesting that this technique could be applied to retrieve information from the transcriptome of skeletal muscle samples not collected just after slaughtering. This study represents the first contribution of this kind applied to pork. In the fifth chapter, we investigated the TBC1D1 [TBC1 (tre-2/USP6, BUB2, cdc16) domain family, member 1] gene as a candidate for fat deposition. This gene is involved in mechanisms regulating energy homeostasis in skeletal muscle and is associated with predisposition to obesity in humans. By resequencing a fragment of the TBC1D1 gene we identified three synonymous mutations localized in exon 2 (g.40A>G, g.151C>T, and g.172T>C) and two polymorphisms localized in intron 2 (g.219G>A and g.252G>A). One of these polymorphisms (g.219G>A) was genotyped by high resolution melting (HRM) analysis and PCR-RFLP. Moreover, this gene sequence was mapped by radiation hybrid analysis to porcine chromosome 8. The association study was conducted in 756 performance-tested pigs of the Italian Large White and Italian Duroc breeds. Significant results were obtained for lean meat content, back fat thickness, visible intermuscular fat and ham weight. In chapter six, a second candidate gene (tribbles homolog 3, TRIB3) is analyzed in an association study with carcass and meat quality traits. The TRIB3 gene is involved in the energy metabolism of skeletal muscle and plays a role as a suppressor of adipocyte differentiation. We identified two polymorphisms in the first coding exon of the porcine TRIB3 gene: one is a synonymous SNP (c.132T>C), the other is a missense mutation (c.146C>T, p.P49L). The two polymorphisms appear to be in complete linkage disequilibrium between and within breeds. The in silico analysis of the p.P49L substitution suggests that it might have a functional effect. 
The association study in about 650 pigs indicates that this marker is associated with back fat thickness in the Italian Large White and Italian Duroc breeds in two different experimental designs. This polymorphism is also associated with the lactate content of the semimembranosus muscle in Italian Large White pigs. Expression analysis indicated that this gene is transcribed in skeletal muscle and adipose tissue as well as in other tissues. In the seventh chapter, we report the genotyping results for 677 SNPs in extreme divergent groups of pigs chosen according to their extreme estimated breeding values for back fat thickness. SNPs were identified by resequencing, literature mining and in silico database mining, and from data reported in the literature, for 60 candidate genes for obesity. Genotyping was carried out using the GoldenGate (Illumina) platform. Of the analyzed SNPs, more than 300 were polymorphic in the genotyped population and had a minor allele frequency (MAF) >0.05. Of these SNPs, 65 were associated (P<0.10) with back fat thickness. One of the most significant gene markers was the same TBC1D1 SNP reported in chapter 5, confirming the role of this gene in fat deposition in pigs. These results could be important for better defining the pig as a model for human obesity, in addition to their use in marker-assisted selection to improve carcass characteristics.
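The MAF >0.05 filter mentioned in the abstract above can be made concrete with a short sketch. It assumes a biallelic SNP summarised by genotype counts; the function names and the example counts are hypothetical and are not taken from the thesis.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF of a biallelic SNP from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped animals")
    # Each AA animal carries two A alleles, each AB animal carries one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def passes_maf_filter(n_aa, n_ab, n_bb, threshold=0.05):
    """True if the SNP is polymorphic enough to keep (MAF > threshold)."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) > threshold


# Hypothetical counts: 90 AA, 9 AB and 1 BB animals.
print(minor_allele_frequency(90, 9, 1), passes_maf_filter(90, 9, 1))
# A monomorphic SNP is excluded by the filter.
print(passes_maf_filter(100, 0, 0))
```

A strict inequality is used to match the ">0.05" wording of the abstract.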
Abstract:
This work concerns nonparametric permutation-based methods that aim to find a ranking (stochastic ordering) of a given set of groups (populations), gathering together information from multiple variables under more than one experimental design. The problem of ranking populations arises in several fields of science from the need to compare G>2 given groups or treatments when the main goal is to find an order while taking into account several aspects. As can be imagined, this problem is not only of theoretical interest but also has a recognised relevance in several fields, such as industrial experiments or behavioural sciences, and this is reflected by the vast literature on the topic, although the problem is sometimes associated with different keywords such as "stochastic ordering", "ranking", "construction of composite indices" etc., or even "ranking probabilities" outside of the strictly statistical literature. The properties of the proposed method are empirically evaluated by means of an extensive simulation study, where several aspects of interest are allowed to vary within a reasonable practical range. These aspects comprise: sample size, number of variables, number of groups, and distribution of noise/error. The flexibility of the approach lies mainly in the several available choices for the test statistic and in the different types of experimental design that can be analysed. This allows the method to be tailored to the specific problem and to the nature of the data at hand. To perform the analyses an R package called SOUP (Stochastic Ordering Using Permutations) has been written, and it is available on CRAN.
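The general idea of ranking groups through permutation tests can be illustrated with a deliberately simplified Python sketch. It is not the method of the thesis and does not reproduce the SOUP package: it handles a single variable, uses the difference in means as the test statistic, applies no multiplicity correction, and orders groups by the number of other groups they significantly dominate. All names and data below are hypothetical.

```python
import itertools
import random
import statistics


def perm_pvalue(x, y, n_perm=2000, rng=None):
    """One-sided permutation p-value for H1: mean(x) > mean(y)."""
    rng = rng or random.Random(0)
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(x)]) - statistics.fmean(pooled[len(x):])
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def rank_groups(groups, alpha=0.05, n_perm=2000, seed=0):
    """Order group labels by the number of groups they significantly dominate."""
    rng = random.Random(seed)
    wins = {name: 0 for name in groups}
    # Test every ordered pair (a, b) for "a tends to be larger than b".
    for a, b in itertools.permutations(groups, 2):
        if perm_pvalue(groups[a], groups[b], n_perm, rng) < alpha:
            wins[a] += 1
    order = sorted(groups, key=lambda name: wins[name], reverse=True)
    return order, wins


# Hypothetical, clearly separated groups of six observations each.
groups = {
    "A": [20, 21, 22, 23, 24, 25],
    "B": [10, 11, 12, 13, 14, 15],
    "C": [0, 1, 2, 3, 4, 5],
}
order, wins = rank_groups(groups)
print(order, wins)
```

With multiple variables, as in the abstract, the per-variable statistics would have to be combined before ranking; that step is omitted here.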