21 results for data-driven Stochastic Subspace Identification (SSI-data)
at Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
Membrane bioreactors (MBRs) combine activated sludge bioreactors with membrane filtration, enabling high-quality effluent with a small footprint. However, they can be beset by fouling, which causes an increase in transmembrane pressure (TMP). Modelling and simulation of changes in TMP could be useful to describe fouling through the identification of the most relevant operating conditions. Using experimental data from an MBR pilot plant operated for 462 days, two different models were developed: a deterministic model using Activated Sludge Model No. 2d (ASM2d) for the biological component and a resistance-in-series model for the filtration component, and a data-driven model based on multivariable regressions. Once validated, these models were used to describe membrane fouling (as changes in TMP over time) under different operating conditions. The deterministic model performed better at higher temperatures (>20°C), constant operating conditions (DO set-point, membrane air-flow, pH and ORP), high mixed liquor suspended solids (>6.9 g L-1) and flux changes. At low pH (<7) or during periods with larger pH changes, the data-driven model was more accurate. Changes in the DO set-point of the aerobic reactor that affected the TMP were also better described by the data-driven model. By combining the two models, a better description of fouling can be achieved under different operating conditions.
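As a point of reference for the filtration component, the standard resistance-in-series description of membrane filtration relates TMP, flux and the individual resistances roughly as follows (a generic textbook form; the exact resistance terms used in the paper may differ):

\[
J \;=\; \frac{\mathrm{TMP}}{\mu\,(R_m + R_c + R_f)}
\qquad\Longleftrightarrow\qquad
\mathrm{TMP} \;=\; \mu\, J\, (R_m + R_c + R_f),
\]

where \(J\) is the permeate flux, \(\mu\) the permeate viscosity, and \(R_m\), \(R_c\), \(R_f\) the membrane, cake and fouling resistances. Under constant-flux operation, growth of \(R_c + R_f\) therefore appears directly as a rise in TMP.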
Abstract:
Low-copy-number molecules are involved in many functions in cells. The intrinsic fluctuations of these numbers can enable stochastic switching between multiple steady states, inducing phenotypic variability. Herein we present a theoretical and computational study based on Master Equation, Fokker-Planck and Langevin descriptions of stochastic switching for a genetic autoactivation circuit. We show that in this circuit the intrinsic fluctuations arising from low copy numbers, which are inherently state-dependent, drive asymmetric switching. These theoretical results are consistent with experimental data reported for the bistable galactose signaling network in yeast. Our study reveals that intrinsic fluctuations, while not required to describe bistability, are fundamental to understanding stochastic switching and the relative dynamical stability of the multiple states.
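To make the "state-dependent intrinsic fluctuations" concrete, a generic chemical Langevin sketch of an autoactivating gene (illustrative form only; the paper's specific propensities and parameters are not reproduced here) reads:

\[
\frac{dx}{dt} \;=\; \underbrace{\alpha + \beta\,\frac{x^{n}}{K^{n}+x^{n}} - \gamma x}_{\text{deterministic drift}}
\;+\; \sqrt{\alpha + \beta\,\frac{x^{n}}{K^{n}+x^{n}} + \gamma x}\;\;\xi(t),
\]

with \(x\) the protein copy number, \(\alpha\) basal production, \(\beta\) and \(K\) the strength and threshold of the Hill-type autoactivation, \(\gamma\) the degradation rate and \(\xi(t)\) Gaussian white noise. Because the noise amplitude is the square root of the summed reaction propensities, it depends on the state \(x\) itself, which is what makes switching between the low and high steady states asymmetric.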
Abstract:
Recent single-cell studies in monkeys (Romo et al., 2004) show that the activity of neurons in the ventral premotor cortex covaries with the animal's decisions in a perceptual comparison task regarding the frequency of vibrotactile events. The firing rate response of these neurons was dependent only on the frequency differences between the two applied vibrations, the sign of that difference being the determining factor for correct task performance. We present a biophysically realistic neurodynamical model that can account for the most relevant characteristics of this decision-making-related neural activity. One of the nontrivial predictions of this model is that Weber's law will underlie the perceptual discrimination behavior. We confirmed this prediction in behavioral tests of vibrotactile discrimination in humans and propose a computational explanation of perceptual discrimination that accounts naturally for the emergence of Weber's law. We conclude that the neurodynamical mechanisms and computational principles underlying the decision-making processes in this perceptual discrimination task are consistent with a fluctuation-driven scenario in a multistable regime.
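For reference, Weber's law as invoked here states that the just-noticeable difference in vibration frequency scales with the base frequency (standard formulation; the specific constants from the behavioral experiments are not reproduced):

\[
\frac{\Delta f}{f} \;=\; k \quad\text{(constant)},
\]

i.e. discriminating 20 Hz from 22 Hz should be roughly as hard as discriminating 30 Hz from 33 Hz.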
Abstract:
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among females worldwide. It is considered a highly heterogeneous disease that needs to be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing, using different class discovery methods. 42 paired breast tumor samples were sequenced on the Illumina Genome Analyzer and the data were processed with TopHat2 and htseq-count. As reported previously, breast cancer could be grouped into five main groups, known as the basal epithelial-like group, the HER2 group, the normal breast-like group and two Luminal groups, each with a distinctive expression profile. When breast tumor samples were classified with the PAM50 method, the most common subtype was Luminal B, which was significantly associated with high ESR1 and ERBB2 expression. The Luminal A subtype had significantly high expression of ESR1 and SLC39A6, whereas the HER2 subtype had high expression of the ERBB2 and CCNE1 genes and low luminal epithelial gene expression. The basal-like and normal-like subtypes were associated with low expression of ESR1, PgR and HER2, and had significantly high expression of cytokeratins 5 and 17. Our results were consistent with the TCGA breast cancer data and with published studies on breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters and, moreover, identify marker genes for each subtype to find better therapies for patients with breast cancer.
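PAM50 assigns each sample to the subtype whose expression centroid it most resembles. A minimal, purely illustrative sketch of centroid-based subtype assignment (the gene set, centroid values and sample values below are made up; this is not the actual PAM50 gene list or implementation):

```python
import numpy as np
from scipy.stats import spearmanr

def assign_subtype(sample_expr, centroids):
    """Return the subtype whose centroid correlates best with the sample profile."""
    scores = {name: spearmanr(sample_expr, centroid)[0]   # Spearman rank correlation
              for name, centroid in centroids.items()}
    return max(scores, key=scores.get)

# Hypothetical centroids over four illustrative genes (e.g. ESR1, PGR, ERBB2, KRT5).
centroids = {
    "LumA":  np.array([ 2.1,  1.8, -0.5,  0.2]),
    "LumB":  np.array([ 1.5,  2.3,  0.9,  0.1]),
    "HER2":  np.array([-0.8,  0.4,  2.5,  0.3]),
    "Basal": np.array([-2.0, -1.5, -0.4,  2.2]),
}
sample = np.array([2.2, 1.7, -0.3, 0.1])   # made-up normalized expression values
print(assign_subtype(sample, centroids))   # prints the best-matching subtype label
```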
Abstract:
Stochastic convergence amongst Mexican federal entities is analyzed in a panel data framework. The joint consideration of cross-section dependence and multiple structural breaks is required to ensure that statistical inference is based on statistics with good properties. Once these features are accounted for, evidence in favour of stochastic convergence is found. Since stochastic convergence is a necessary, yet insufficient, condition for convergence as predicted by economic growth models, the paper also investigates whether a β-convergence process has taken place. We find that the Mexican states have followed either heterogeneous convergence patterns or a divergence process throughout the analyzed period.
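For orientation, the two convergence notions involved can be written as follows (standard formulations, not the paper's exact specification). Stochastic convergence requires the log relative income of state \(i\),
\[
RI_{i,t} \;=\; \ln\!\left(\frac{y_{i,t}}{\bar{y}_t}\right),
\]
to be stationary (here, around a trend with multiple structural breaks), while β-convergence is assessed through a cross-section growth regression of the type
\[
\frac{1}{T}\,\ln\!\left(\frac{y_{i,T}}{y_{i,0}}\right) \;=\; \alpha + \beta \ln y_{i,0} + \varepsilon_i ,
\]
where a significantly negative \(\beta\) indicates that initially poorer states grow faster. Stationarity of \(RI_{i,t}\) can hold without \(\beta<0\), which is why stochastic convergence alone is not sufficient.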
Abstract:
This paper provides empirical evidence that continuous-time models with one volatility factor are, under some conditions, able to fit the main characteristics of financial data. It also documents the importance of the feedback factor in capturing the strong volatility clustering of the data, caused by a possible change in the pattern of volatility in the last part of the sample. We use the Efficient Method of Moments (EMM) of Gallant and Tauchen (1996) to estimate logarithmic models with one and two stochastic volatility factors (with and without feedback) and to select among them.
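As a stylized illustration of the model class being compared (generic form; the paper's exact parameterization, and in particular how feedback enters, is not reproduced here), a discrete-time two-factor logarithmic stochastic volatility specification can be written as
\[
r_t \;=\; \exp\!\big(\tfrac{1}{2}(v_{1,t}+v_{2,t})\big)\,u_t,
\qquad
v_{i,t} \;=\; a_i + b_i\, v_{i,t-1} + s_i\, \eta_{i,t}, \quad i=1,2,
\]
with \(u_t\) and \(\eta_{i,t}\) standard normal shocks; a feedback term additionally lets the volatility factors respond to past returns. EMM estimates the structural parameters by simulating such a model and matching the scores of an auxiliary model fitted to the observed returns.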
Abstract:
This paper develops a methodology to estimate entire population distributions from bin-aggregated sample data. We do this by estimating the parameters of mixtures of distributions, which allow for maximal parametric flexibility. The statistical approach we develop enables comparisons of the full distributions of height data from potential army conscripts across France's 88 departments for most of the nineteenth century. These comparisons are made by testing for differences of means and for stochastic dominance. Corrections for possible measurement errors are also devised by taking advantage of the richness of the data sets. Our methodology is of interest to researchers working on historical as well as contemporary bin-aggregated or histogram-type data, which remain widespread since much of the publicly available information is in that form, often because of restrictions related to political sensitivity and/or confidentiality concerns.
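A minimal sketch of the core estimation idea, fitting a two-component normal mixture to binned counts by maximizing the (truncated) multinomial likelihood of the bins (bin edges, counts and starting values below are hypothetical; the paper's mixtures and measurement-error corrections are richer than this):

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical height bins (cm) and observed counts in each bin.
edges  = np.array([150.0, 155.0, 160.0, 165.0, 170.0, 175.0, 180.0])
counts = np.array([  80,   240,   420,   360,   160,    40])

def neg_log_lik(theta):
    """Negative multinomial log-likelihood of the bin counts under a 2-normal mixture."""
    w_raw, m1, ls1, m2, ls2 = theta
    w = 1.0 / (1.0 + np.exp(-w_raw))           # mixing weight constrained to (0, 1)
    s1, s2 = np.exp(ls1), np.exp(ls2)          # scales constrained to be positive
    cdf = (w       * stats.norm.cdf(edges, m1, s1)
           + (1-w) * stats.norm.cdf(edges, m2, s2))
    p = np.diff(cdf)                           # probability mass of each bin
    p = p / p.sum()                            # condition on falling inside the observed bins
    return -np.sum(counts * np.log(np.clip(p, 1e-12, None)))

start = np.array([0.0, 160.0, np.log(4.0), 170.0, np.log(4.0)])
fit = optimize.minimize(neg_log_lik, start, method="Nelder-Mead")
print(fit.x)   # estimated (logit weight, mean 1, log sd 1, mean 2, log sd 2)
```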
Abstract:
In an earlier investigation (Burger et al., 2000), five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and for evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used with the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check whether the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of the paper by R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transforming the components and visualizing the factor scores in a spatial context: the compositional factors are plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process.
Keywords: compositional data analysis, biplot, deep sea sediments
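For readers unfamiliar with the transformations mentioned, the centred log-ratio (clr) and isometric log-ratio (ilr) representations of a D-part composition \(\mathbf{x}=(x_1,\dots,x_D)\) are (standard definitions, independent of this particular case study):
\[
\operatorname{clr}(\mathbf{x}) \;=\; \Big(\ln\frac{x_1}{g(\mathbf{x})},\,\dots,\,\ln\frac{x_D}{g(\mathbf{x})}\Big),
\qquad g(\mathbf{x}) \;=\; \Big(\prod_{i=1}^{D} x_i\Big)^{1/D},
\]
\[
\operatorname{ilr}(\mathbf{x}) \;=\; V^{\top}\operatorname{clr}(\mathbf{x}) \;\in\; \mathbb{R}^{D-1},
\]
where the columns of \(V\) form an orthonormal basis of the clr hyperplane. Working in ilr coordinates replaces the constrained simplex geometry by ordinary Euclidean geometry, which is why factor extraction and biplots behave better than on the untransformed components.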
Abstract:
This article presents recent WMR (wheeled mobile robot) navigation experiences using local perception knowledge provided by monocular and odometer systems. A local narrow perception horizon is used to plan safe trajectories towards the objective. Monocular data are therefore proposed as a way to obtain real-time local information by building two-dimensional occupancy grids through time integration of the frames. Path planning is accomplished using attraction potential fields, while trajectory tracking is performed using model predictive control techniques. The results are reported for indoor situations using the available laboratory platform, a differentially driven mobile robot.
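A minimal sketch of planning over an occupancy grid with attraction/repulsion potential fields (illustrative gains, grid and goal; this is not the authors' implementation, and the MPC tracking stage is omitted):

```python
import numpy as np

def potential(grid, goal, k_att=1.0, k_rep=50.0, d0=3.0):
    """Total potential at every cell of a 2-D occupancy grid (1 = occupied)."""
    h, w = grid.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Attractive term: grows quadratically with distance to the goal cell.
    u_att = 0.5 * k_att * ((ys - goal[0]) ** 2 + (xs - goal[1]) ** 2)
    # Repulsive term: only cells within distance d0 of an obstacle are penalized.
    u_rep = np.zeros_like(u_att, dtype=float)
    for y, x in zip(*np.nonzero(grid)):
        d = np.hypot(ys - y, xs - x)
        mask = (d > 0) & (d < d0)
        u_rep[mask] += 0.5 * k_rep * (1.0 / d[mask] - 1.0 / d0) ** 2
    return u_att + u_rep

def greedy_path(grid, start, goal, max_steps=200):
    """Follow the locally steepest descent of the potential over the grid."""
    u = potential(grid, goal)
    path, pos = [start], start
    for _ in range(max_steps):
        y, x = pos
        neigh = [(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= y + dy < grid.shape[0] and 0 <= x + dx < grid.shape[1]]
        pos = min(neigh, key=lambda c: u[c])   # move to the lowest-potential neighbor
        path.append(pos)
        if pos == goal:
            break
    return path

grid = np.zeros((20, 20), dtype=int)
grid[8:12, 10] = 1                              # a small wall of occupied cells
print(greedy_path(grid, start=(2, 2), goal=(17, 17)))
```

Plain gradient descent on such a field can stall in local minima near obstacles, which is one reason it is typically paired with a higher-level planner or replanning over a moving local horizon, as the abstract describes.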
Abstract:
Isotopic data are currently becoming an important source of information regarding sources, evolution and mixing processes of water in hydrogeologic systems. However, it is not clear how to treat the geochemical data and the isotopic data together statistically. We propose to introduce the isotopic information as new parts and to apply compositional data analysis to the resulting enlarged composition. The results are equivalent to downscaling the classical isotopic delta variables, because these are already relative (as needed in the compositional framework) and isotopic variations are almost always very small. This methodology is illustrated and tested with a study of the Llobregat River Basin (Barcelona, NE Spain), where it is shown that, though very small, isotopic variations complement the geochemical principal components and help in better identifying pollution sources.
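For context, the "delta variables" referred to follow the usual isotopic delta notation (standard definition; how the new compositional parts are built from them is specific to the paper and not reproduced here):
\[
\delta \;=\; \left(\frac{R_{\text{sample}}}{R_{\text{standard}}} - 1\right)\times 1000\ \text{‰},
\]
where \(R\) is an isotope ratio such as \(^{18}\mathrm{O}/^{16}\mathrm{O}\) or \(^{2}\mathrm{H}/^{1}\mathrm{H}\). Because \(\delta\) is already a (small) relative quantity, appending it as an additional part and applying log-ratio analysis changes it essentially only by a scale factor, which is the equivalence the abstract points out.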
Abstract:
One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self-critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…
Abstract:
We use CEX repeated cross-section data on consumption and income to evaluate the nature of increased income inequality in the 1980s and 90s. We decompose unexpected changes in family income into transitory and permanent, and idiosyncratic and aggregate, components, and estimate the contribution of each component to total inequality. The model we use is a linearized incomplete-markets model, enriched to incorporate risk sharing while maintaining tractability. Our estimates suggest that taking risk sharing into account is important for the model fit; that the increase in inequality in the 1980s was mainly permanent; and that inequality is driven almost entirely by idiosyncratic income risk. In addition, we find no evidence of cyclical behavior of consumption risk, casting doubt on Constantinides and Duffie's (1995) explanation of the equity premium puzzle.
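A stylized version of the income decomposition described (generic permanent–transitory form; the paper's linearized model adds risk sharing on top of this) is
\[
\Delta \hat{y}_{i,t} \;=\; \zeta_{t} + \zeta_{i,t} \;+\; \Delta\big(\tau_{t} + \tau_{i,t}\big),
\]
where \(\Delta \hat{y}_{i,t}\) is the unexpected change in log family income, the \(\zeta\) terms are permanent shocks, the \(\tau\) terms are transitory shocks, and the subscripts distinguish aggregate (\(t\) only) from idiosyncratic (\(i,t\)) components; the contribution of each component to total inequality is then read off the variances of the four shock types.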
Abstract:
This paper estimates a translog stochastic frontier production function for a panel of 150 mixed Catalan farms over the period 1989-1993, in order to measure and explain variation in technical inefficiency scores with a one-stage approach. The model uses gross value added as the aggregate output measure. Total employment, fixed capital, current assets, specific costs and overhead costs are introduced into the model as inputs. Stochastic frontier estimates are compared with those obtained using a linear programming method with a two-stage approach. The translog stochastic frontier specification appears to be an appropriate representation of the data, technical change was rejected and the technical inefficiency effects were statistically significant. The mean technical efficiency over the period analyzed was estimated at 64.0%. Farm inefficiency levels were found to be significantly (at the 5% level) and positively correlated with the number of economic size units.
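For reference, a translog stochastic frontier of the kind estimated has the generic form (standard specification; the paper's one-stage inefficiency-effects equation is not reproduced here):
\[
\ln y_{it} \;=\; \beta_0 + \sum_{j}\beta_j \ln x_{jit}
\;+\; \tfrac{1}{2}\sum_{j}\sum_{k}\beta_{jk}\ln x_{jit}\,\ln x_{kit}
\;+\; v_{it} - u_{it},
\]
where \(v_{it}\sim N(0,\sigma_v^2)\) is statistical noise, \(u_{it}\ge 0\) captures technical inefficiency (in a one-stage approach its mean is modelled as a function of explanatory variables such as farm size), and technical efficiency is obtained as \(TE_{it}=\exp(-u_{it})\).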
Abstract:
We study the statistical properties of three estimation methods for a model of learning that is often fitted to experimental data: quadratic deviation measures without unobserved heterogeneity, and maximum likelihood with and without unobserved heterogeneity. After discussing identification issues, we show that the estimators are consistent and provide their asymptotic distribution. Using Monte Carlo simulations, we show that ignoring unobserved heterogeneity can lead to seriously biased estimates in samples of the length typical of actual experiments. Better small-sample properties are obtained if unobserved heterogeneity is introduced; that is, rather than estimating the parameters for each individual, the individual parameters are treated as random variables, and the distribution of those random variables is estimated.
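A compact way to state the unobserved-heterogeneity approach described (generic random-parameters likelihood; the specific learning model and mixing distribution are those of the paper, not shown here) is
\[
L_i(\Omega) \;=\; \int \prod_{t=1}^{T} f\big(y_{it}\mid x_{it},\theta\big)\, g(\theta;\Omega)\, d\theta ,
\]
i.e. instead of estimating an individual parameter vector \(\theta_i\) for each subject, one estimates the parameters \(\Omega\) of the population distribution \(g\) from which the \(\theta_i\) are drawn, with the integral typically approximated by simulation or quadrature.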