6 results for Random noise theory
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are today well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to stumble on a clinician's question for which no compelling statistical method provides an answer. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps involved in an experiment, from the production of the array to quality control, ending with the preprocessing steps used in the data analysis in the rest of the dissertation. 
Chapter 2 provides a critical review of standard analysis methods, stressing most of their problems. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a repeated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 iterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] on two data sets simulated via beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4, a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4] is described. 
In fact, when looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful and sometimes even more important information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second one. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a tumor-grade three-class problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability for samples of G2 to be members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
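The resampling-and-scoring scheme described for MultiSAM can be illustrated with a minimal sketch. It is not the authors' implementation: a plain Welch t-statistic with a fixed cut-off stands in for the SAM call, and the names `welch_t`, `multi_resample_scores` and `threshold` are hypothetical. Only the structure is taken from the abstract, namely comparing the less populated class with repeated equal-size random draws from the more populated class and counting how often each probe is selected.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for two lists of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def multi_resample_scores(lpc, mpc, n_iter=1000, threshold=3.0, seed=0):
    """Count, per probe, how many resampled comparisons select it.

    lpc, mpc: lists of samples; each sample is a list of probe values.
    threshold: assumed cut-off on |t| that stands in for SAM's own
    selection rule (SAM itself is not implemented here).
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    scores = [0] * n_probes
    for _ in range(n_iter):
        # Draw a subset of the larger class with the size of the smaller one.
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            t = welch_t([s[p] for s in lpc], [s[p] for s in subset])
            if abs(t) > threshold:
                scores[p] += 1
    return scores
```

Each score lies between 0 and `n_iter`, so with the default of 1,000 iterations a cut-off such as the score >300 quoted above can be applied directly to the returned list.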
Abstract:
This work provides a step forward in the study and understanding of the relationships between stochastic processes and a certain class of partial integro-differential equations, which can be used to model anomalous diffusion and transport in statistical physics. In the first part, we guided the reader through the fundamental notions of probability and stochastic processes, as well as stochastic integration and stochastic differential equations. In particular, within the study of H-sssi processes, we focused on fractional Brownian motion (fBm) and its discrete-time increment process, the fractional Gaussian noise (fGn), which provide examples of non-Markovian Gaussian processes. The fGn, together with stationary FARIMA processes, is widely used in the modeling and estimation of long memory, or long-range dependence (LRD). Time series manifesting long-range dependence are often observed in nature, especially in physics, meteorology and climatology, but also in hydrology, geophysics, economics and many other fields. We studied LRD in depth, giving many real-data examples, providing statistical analysis and introducing parametric methods of estimation. Then, we introduced the theory of fractional integrals and derivatives, which turns out to be very appropriate for studying and modeling systems with long-memory properties. After having introduced the basic concepts, we provided many examples and applications. For instance, we investigated the relaxation equation with distributed-order time-fractional derivatives, which describes models characterized by a strong memory component and can be used to model relaxation in complex systems that deviates from the classical exponential Debye pattern. Then, we focused on the study of generalizations of the standard diffusion equation, passing through the preliminary study of the fractional forward drift equation. Such generalizations have been obtained by using fractional integrals and derivatives of distributed orders. 
In order to find a connection between the anomalous diffusion described by these equations and long-range dependence, we introduced and studied the generalized grey Brownian motion (ggBm), which is a parametric class of H-sssi processes whose marginal probability density function evolves in time according to a partial integro-differential equation of fractional type. The ggBm is non-Markovian. Throughout the work, we have remarked many times that, starting from a master equation for a probability density function f(x,t), it is always possible to define an equivalence class of stochastic processes with the same marginal density function f(x,t). All these processes provide suitable stochastic models for the starting equation. In studying the ggBm, we focused on a subclass made up of processes with stationary increments. The ggBm has been defined canonically in the so-called grey noise space. However, we have been able to provide a characterization that is independent of the underlying probability space. We also pointed out that the generalized grey Brownian motion is a direct generalization of a Gaussian process and, in particular, that it generalizes both Brownian motion and fractional Brownian motion. Finally, we introduced and analyzed a more general class of diffusion-type equations related to certain non-Markovian stochastic processes. We started from the forward drift equation, which was made non-local in time by the introduction of a suitably chosen memory kernel K(t). The resulting non-Markovian equation has been interpreted in a natural way as the evolution equation of the marginal density function of a random time process l(t). We then considered the subordinated process Y(t)=X(l(t)), where X(t) is a Markovian diffusion. The corresponding time evolution of the marginal density function of Y(t) is governed by a non-Markovian Fokker-Planck equation which involves the same memory kernel K(t). 
We developed several applications and derived the exact solutions. Moreover, we considered different stochastic models for the given equations, providing path simulations.
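As an illustration of the fractional Gaussian noise and fractional Brownian motion mentioned in this abstract, the following minimal sketch samples an fGn path by Cholesky factorization of its covariance matrix and sums it into an fBm path. It relies on the standard unit-variance fGn autocovariance, gamma(k) = (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)) / 2, with Hurst exponent H. The function names are hypothetical, and the thesis may well use a different simulation method.

```python
import math
import random


def fgn_autocov(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = abs(k)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_fbm(n, hurst, seed=0):
    """Return (fgn, fbm): n correlated increments and their cumulative sum."""
    rng = random.Random(seed)
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Multiplying i.i.d. normals by the Cholesky factor imposes the covariance.
    fgn = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
    fbm, total = [0.0], 0.0
    for x in fgn:
        total += x
        fbm.append(total)
    return fgn, fbm
```

For H = 1/2 the autocovariance vanishes at every non-zero lag and the path reduces to an ordinary random walk; for H > 1/2 the increments are positively correlated, which is the long-memory regime. The Cholesky approach costs O(n^3) operations, so it is only practical for short paths.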
Abstract:
For many years, RF and analog integrated circuits have been mainly developed using bipolar and compound semiconductor technologies because of their better performance. In recent years, advances in CMOS technology have allowed analog and RF circuits to be built with it, but the use of CMOS instead of bipolar technology in RF applications has brought more issues in terms of noise. Noise cannot be completely eliminated and will therefore ultimately limit the accuracy of measurements and set a lower limit on how small a signal can be detected and processed in an electronic circuit. One kind of noise which affects MOS transistors much more than bipolar ones is low-frequency noise. In MOSFETs, low-frequency noise is mainly of two kinds: flicker or 1/f noise and random telegraph signal (RTS) noise. The objective of this thesis is to characterize and model low-frequency noise by studying RTS and flicker noise under both constant and switched bias conditions. The effect of different biasing schemes on both RTS and flicker noise in the time and frequency domains has been investigated.
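A random telegraph signal is commonly idealized as a two-level process that switches between levels at random times. The sketch below simulates such a trace in discrete time under that textbook assumption. It does not reproduce the thesis model, and `tau_low`, `tau_high` and `amplitude` are hypothetical parameters (mean dwell times in each level and the step height). The time step `dt` should be much smaller than both dwell times.

```python
import random


def simulate_rts(n_steps, dt, tau_low, tau_high, amplitude=1.0, seed=0):
    """Simulate a two-level random telegraph signal sampled every dt."""
    rng = random.Random(seed)
    p_up = dt / tau_low     # probability per step of leaving the low level
    p_down = dt / tau_high  # probability per step of leaving the high level
    high = False
    trace = []
    for _ in range(n_steps):
        # Switch level with the probability that belongs to the current state.
        if rng.random() < (p_down if high else p_up):
            high = not high
        trace.append(amplitude if high else 0.0)
    return trace
```

In a long trace the fraction of time spent in the high level approaches `tau_high / (tau_low + tau_high)`.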
Abstract:
This thesis presents several techniques designed to drive a swarm of robots through an a priori unknown environment, moving the group from a starting area to a final one while avoiding obstacles. The presented techniques are based on two different theories, used alone or in combination: Swarm Intelligence (SI) and Graph Theory. Both theories are based on the study of interactions between different entities (also called agents or units) in Multi-Agent Systems (MAS). The first belongs to the Artificial Intelligence context and the second to the Distributed Systems context. These theories, each from its own point of view, exploit the emergent behaviour that comes from the interactive work of the entities in order to achieve a common goal. The flexibility and adaptability of the swarm have been exploited with the aim of overcoming and minimizing difficulties and problems that can affect one or more units of the group, with minimal impact on the whole group and on the common main target. Another aim of this work is to show the importance of the information shared between the units of the group, such as the communication topology, because it helps to keep the environmental information detected by each single agent updated across the swarm. Swarm Intelligence has been applied to the presented technique through the Particle Swarm Optimization (PSO) algorithm, taking advantage of its features as a navigation system. Graph Theory has been applied by exploiting Consensus and the agreement protocol, with the aim of maintaining the units in a desired and controlled formation. This approach has been followed in order to preserve the power of PSO and to control part of its random behaviour with a distributed control algorithm such as Consensus.
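The agreement protocol mentioned in this abstract can be illustrated with the basic discrete-time consensus update, in which each agent moves towards its neighbours: x_i <- x_i + eps * sum over neighbours j of (x_j - x_i). The sketch below is a generic textbook version, not the controller developed in the thesis. The names `consensus_step`, `run_consensus` and the step size `eps` are assumptions, and formation offsets and the PSO navigation layer are left out.

```python
def consensus_step(positions, neighbors, eps):
    """One synchronous agreement update for agents with 2-D positions.

    positions: list of (x, y) tuples, one per agent.
    neighbors: neighbors[i] lists the agents that agent i communicates with.
    """
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        dx = sum(positions[j][0] - xi for j in neighbors[i])
        dy = sum(positions[j][1] - yi for j in neighbors[i])
        new_positions.append((xi + eps * dx, yi + eps * dy))
    return new_positions


def run_consensus(positions, neighbors, eps=0.2, n_steps=200):
    """Iterate the update; eps should stay below 1 / (maximum node degree)."""
    for _ in range(n_steps):
        positions = consensus_step(positions, neighbors, eps)
    return positions
```

On a connected undirected communication graph the agents converge to the average of their initial positions. For example, four agents at the corners of a square that communicate along a ring all end up at the centre of the square.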
Abstract:
In this thesis we provide a characterization of probabilistic computation in itself, from a recursion-theoretical perspective, without reducing it to deterministic computation. More specifically, we show that probabilistic computable functions, i.e., those functions which are computed by Probabilistic Turing Machines (PTM), can be characterized by a natural generalization of Kleene's partial recursive functions which includes, among initial functions, one that returns identity or successor with probability 1/2. We then prove the equi-expressivity of the obtained algebra and the class of functions computed by PTMs. In the second part of the thesis we investigate the relations existing between our recursion-theoretical framework and sub-recursive classes, in the spirit of Implicit Computational Complexity. More precisely, endowing predicative recurrence with a random base function is proved to lead to a characterization of polynomial-time computable probabilistic functions.
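The random initial function described in this abstract, which returns its argument or its successor with probability 1/2 each, can be illustrated by treating a probabilistic function as a map from an input to a distribution over outputs. This reading, and the names `rand_base`, `bind` and `iterate_rand`, are assumptions made for the sketch, not the formal definitions of the thesis.

```python
from fractions import Fraction


def rand_base(x):
    """Initial function: returns x or x + 1, each with probability 1/2."""
    return {x: Fraction(1, 2), x + 1: Fraction(1, 2)}


def bind(dist, f):
    """Compose a distribution over values with a probabilistic function f."""
    out = {}
    for value, p in dist.items():
        for result, q in f(value).items():
            out[result] = out.get(result, Fraction(0)) + p * q
    return out


def iterate_rand(x, n):
    """Distribution obtained by applying rand_base n times, starting from x."""
    dist = {x: Fraction(1)}
    for _ in range(n):
        dist = bind(dist, rand_base)
    return dist
```

Applying the base function twice to 0 yields the outcomes 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4. This shows how composition turns a single fair choice into a non-trivial output distribution.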
Abstract:
The main topic of this thesis is confounding in linear regression models. It arises when a relationship between an observed process, the covariate, and an outcome process, the response, is influenced by an unmeasured process, the confounder, associated with both. Consequently, the estimators for the regression coefficients of the measured covariates might be severely biased, less efficient and characterized by misleading interpretations. Confounding is an issue when the primary target of the work is the estimation of the regression parameters. The central point of the dissertation is the evaluation of the sampling properties of parameter estimators. This work aims to extend the spatial confounding framework to general structured settings and to understand the behaviour of confounding as a function of the data generating process structure parameters in several scenarios focusing on the joint covariate-confounder structure. In line with the spatial statistics literature, our purpose is to quantify the sampling properties of the regression coefficient estimators and, in turn, to identify the most prominent quantities depending on the generative mechanism impacting confounding. Once the sampling properties of the estimator conditionally on the covariate process are derived as ratios of dependent quadratic forms in Gaussian random variables, we provide an analytic expression of the marginal sampling properties of the estimator using Carlson’s R function. Additionally, we propose a representative quantity for the magnitude of confounding as a proxy of the bias, its first-order Laplace approximation. To conclude, we work under several frameworks considering spatial and temporal data with specific assumptions regarding the covariance and cross-covariance functions used to generate the processes involved. 
This study allows us to claim that the variability of the confounder-covariate interaction and of the covariate plays the most relevant role in determining the principal marker of the magnitude of confounding.
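The bias caused by an unmeasured confounder can be illustrated with a deliberately simplified simulation. Observations are independent, so there is no spatial or temporal structure, and the covariate x and confounder z are standard Gaussians with correlation `rho`. The response is y = beta * x + gamma * z + noise, and y is regressed on x alone. All names and parameter choices are hypothetical, and this is not the framework developed in the thesis.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_confounded_slope(n, beta, gamma, rho, seed=0):
    """Estimate beta while omitting a confounder correlated with x."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    # x has unit variance and correlation rho with the confounder z.
    x = [rho * zi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for zi in z]
    y = [beta * xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)
```

In this toy setting the estimated slope concentrates around beta + gamma * rho rather than beta, so the bias grows with both the strength of the confounder and its association with the covariate.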