902 results for Deterministic imputation


Relevance: 20.00%

Abstract:

Membrane bioreactors (MBRs) combine activated sludge bioreactors with membrane filtration, enabling high-quality effluent with a small footprint. However, they can be beset by fouling, which causes an increase in transmembrane pressure (TMP). Modelling and simulation of changes in TMP could help to describe fouling by identifying the most relevant operating conditions. Using experimental data from an MBR pilot plant operated for 462 days, two different models were developed: a deterministic model using activated sludge model No. 2d (ASM2d) for the biological component and a resistance-in-series model for the filtration component, and a data-driven model based on multivariable regressions. Once validated, these models were used to describe membrane fouling (as changes in TMP over time) under different operating conditions. The deterministic model performed better at higher temperatures (>20 °C), constant operating conditions (DO set-point, membrane air-flow, pH and ORP), and high mixed liquor suspended solids (>6.9 g L⁻¹) and flux changes. At low pH (<7) or in periods with larger pH changes, the data-driven model was more accurate. Changes in the DO set-point of the aerobic reactor that affected the TMP were also better described by the data-driven model. By combining both models, a better description of fouling can be achieved under different operating conditions.
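The filtration component of such a deterministic model can be illustrated with a generic resistance-in-series relation, in which TMP equals permeate viscosity times flux times the sum of the hydraulic resistances. The sketch below is a minimal illustration under that assumption only; the function name, the split into a membrane and a fouling resistance, and all numerical values are hypothetical and are not taken from the study.

```python
def transmembrane_pressure(flux, viscosity, resistances):
    """Return TMP in Pa from flux (m/s), viscosity (Pa s) and resistances (1/m)."""
    # Resistances in series simply add up (Darcy-type relation).
    return flux * viscosity * sum(resistances)

# Hypothetical values: clean-membrane resistance alone, then with a fouling layer.
tmp_clean = transmembrane_pressure(1e-5, 1e-3, [1e12])
tmp_fouled = transmembrane_pressure(1e-5, 1e-3, [1e12, 2e12])
```

At constant flux, any growth in the fouling resistance appears directly as a proportional rise in TMP, which is why TMP over time is used as the fouling indicator.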

Relevance: 20.00%

Abstract:

This article discusses, from the standpoint of cellular biology, the deterministic and indeterministic theories of androgenesis. The role of the vacuole and of various types of stress in the deviation of the microspore from normal development, and the point at which androgenetic competence is acquired, are examined. Based on an extensive literature review and on data from wheat studies in our laboratory, a model for the androgenetic capacity of the pollen grain is proposed. We propose a two-point deterministic model for the acquisition of androgenetic potential by the pollen grain during in vitro androgenesis: the first switch point would be early meiosis and the second the uninucleate pollen stage, because the elimination of cytoplasmic sporophytic determinants takes place at those two strategic moments. Any abnormality in this process that allows sporophytic informational molecules to persist prevents the establishment of a gametophytic program and allows the reactivation of the embryogenic process.

Relevance: 20.00%

Abstract:

The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data grows rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to analyze and visualize high-dimensional data efficiently and reliably, especially data obtained from gene expression microarray experiments. First, we studied ways to improve the quality of microarray data by replacing (imputing) missing data entries with estimated values. Missing value imputation is commonly used to make originally incomplete data complete, so that it is easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. We observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but more advanced imputation methods, such as Bayesian Principal Component Algorithm (BPCA), were also needed in some cases. Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques. Such networks are typically very large and highly connected, so there is a need for fast algorithms that produce visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
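The k-NN imputation mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the thesis's implementation: it assumes rows are genes and columns are arrays, uses mean squared difference over jointly observed columns as the distance, and fills each gap with the unweighted mean of the k nearest rows that observed that column. The function name `knn_impute` and these choices are assumptions.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaN entries row by row using the k most similar rows."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = observed_i & ~missing[j]
            if not shared.any():
                continue  # no common columns, so no distance can be computed
            # Mean squared difference over the columns both rows observed.
            d = np.mean((X[i, shared] - X[j, shared]) ** 2)
            dists.append((d, j))
        dists.sort()
        for col in np.where(missing[i])[0]:
            # Take the nearest rows that actually observed this column.
            donors = [X[j, col] for _, j in dists if not missing[j, col]][:k]
            if donors:
                filled[i, col] = np.mean(donors)
    return filled

# Hypothetical toy matrix: the second row lacks its third value.
toy = [[1.0, 2.0, 3.0],
       [1.0, 2.0, np.nan],
       [10.0, 20.0, 30.0],
       [1.1, 2.1, 3.2]]
completed = knn_impute(toy, k=2)
```

Here the gap is filled from the two closest rows (the first and the fourth), while the distant third row is ignored.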

Relevance: 20.00%

Abstract:

One of the most important problems in the theory of cellular automata (CA) is determining the proportion of cells in a specific state after a given number of time iterations. We approach this problem using patterns in preimage sets, that is, the sets of blocks which iterate to the desired output. This allows us to construct a response curve: the proportion of cells in state 1 after n iterations as a function of the initial proportion. We derive response curve formulae for many two-dimensional deterministic CA rules with L-neighbourhood. For all remaining rules, we find experimental response curves. We also use preimage sets to classify surjective rules. In the last part of the thesis, we consider a special class of one-dimensional probabilistic CA rules. We find response surface formulae for these rules and experimental response surfaces for all remaining rules.
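An experimental response curve of the kind described above can be estimated by simulation. The sketch below is illustrative only and rests on several assumptions not stated in the abstract: the L-neighbourhood is taken to be the cell itself, its eastern neighbour and its northern neighbour; the grid is periodic; and a rule is encoded as a hypothetical length-8 lookup table indexed by `4*centre + 2*east + north`.

```python
import numpy as np

def step(grid, rule):
    """One synchronous update of a binary grid under a 3-cell L-neighbourhood rule."""
    rule = np.asarray(rule, dtype=np.int64)
    east = np.roll(grid, -1, axis=1)   # value of the neighbour to the right
    north = np.roll(grid, 1, axis=0)   # value of the neighbour above
    return rule[4 * grid + 2 * east + north]

def response_curve(rule, n_iter, densities, size=64, seed=0):
    """Proportion of cells in state 1 after n_iter steps, per initial density."""
    rng = np.random.default_rng(seed)
    result = []
    for p in densities:
        # Independent random initial configuration with density p.
        grid = (rng.random((size, size)) < p).astype(np.int64)
        for _ in range(n_iter):
            grid = step(grid, rule)
        result.append(grid.mean())
    return np.array(result)

# Rule that outputs 1 only when all three neighbourhood cells are 1.
and_rule = [0, 0, 0, 0, 0, 0, 0, 1]
curve = response_curve(and_rule, 1, [0.25, 0.5, 0.75])
```

For this particular rule, one iteration from independent random cells gives an exact response curve of p³, so the simulated values can be checked against a closed-form formula.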

Relevance: 20.00%

Abstract:

Imputation is often used in surveys to handle item nonresponse. It is well known that treating imputed values as observed values leads to substantial underestimation of the variance of point estimators. To remedy this problem, several variance estimation methods have been proposed in the literature, including adapted resampling methods such as the bootstrap and the jackknife. We define the concept of double robustness for point and variance estimation under the nonresponse model approach and the imputation model approach. We focus on variance estimation using the jackknife, which is often used in practice. We study the properties of different jackknife variance estimators for deterministic as well as random regression imputation. We first consider simple random sampling. Stratified sampling and unequal probability sampling are also studied. A simulation study compares several jackknife variance estimation methods in terms of bias and relative stability when the sampling fraction is not negligible. Finally, we establish the asymptotic normality of imputed estimators for deterministic and random regression imputation.
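Deterministic regression imputation and a delete-one jackknife can be sketched as follows. This is a simplified illustration, not one of the estimators studied in the thesis: it assumes simple random sampling with a negligible sampling fraction, no survey weights, a single auxiliary variable observed for every unit, and an imputation model that is refitted in every jackknife replicate. The function names are hypothetical.

```python
import numpy as np

def imputed_mean(x, y, responded):
    """Mean of y after deterministic regression imputation of nonrespondents."""
    # Fit y = intercept + slope * x on respondents only.
    slope, intercept = np.polyfit(x[responded], y[responded], 1)
    # Nonrespondents receive the fitted value, with no random residual added.
    y_filled = np.where(responded, y, intercept + slope * x)
    return y_filled.mean()

def jackknife_variance(x, y, responded):
    """Delete-one jackknife, re-imputing within every replicate."""
    n = len(x)
    keep = ~np.eye(n, dtype=bool)  # row i drops unit i
    reps = np.array([imputed_mean(x[m], y[m], responded[m]) for m in keep])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

# Hypothetical data: y is exactly linear in x, three units did not respond.
x = np.arange(10.0)
y = 2.0 * x + 1.0
responded = np.array([True, True, False, True, True,
                      False, True, True, False, True])
y[~responded] = np.nan
estimate = imputed_mean(x, y, responded)
variance = jackknife_variance(x, y, responded)
```

Re-imputing inside each replicate lets the variance estimate reflect the uncertainty in the fitted imputation model, which is lost when the imputed values are treated as if they had been observed.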

Relevance: 20.00%

Abstract:

The software used is Splus and R.

Relevance: 20.00%

Abstract:

This thesis comprises three articles, one of which is published and two of which are in preparation. The central topic of the thesis is the treatment of representative outliers in two important aspects of surveys: small area estimation and imputation in the presence of item nonresponse. Regarding small areas, robust estimators under unit-level models have been studied. Sinha & Rao (2009) propose a robust version of the empirical best linear unbiased predictor for the small area mean. Their robust estimator is of the plug-in type and, in light of the work of Chambers (1986), it can be biased in some situations. Chambers et al. (2014) propose a bias-corrected estimator. In addition, a mean squared error estimator has been associated with these point estimators. Sinha & Rao (2009) propose a parametric bootstrap procedure to estimate the mean squared error. Analytical methods are proposed in Chambers et al. (2014). However, their theoretical validity has not been established and their empirical performance is not fully satisfactory. Here, we examine two new approaches for obtaining a robust version of the empirical best linear unbiased predictor: the first is based on the work of Chambers (1986), and the second is based on the concept of conditional bias as a measure of the influence of a population unit. Both classes of robust small area estimators also include a bias correction term. However, both use the information available in all areas, unlike the estimator of Chambers et al. (2014), which uses only the information available in the area of interest. In some situations, a non-negligible bias is possible for the estimator of Sinha & Rao (2009), whereas the proposed estimators exhibit a small bias for an appropriate choice of the influence function and of the robustness constant. Monte Carlo simulations are carried out, and the proposed estimators are compared with those of Sinha & Rao (2009) and Chambers et al. (2014). The results show that the estimators of Sinha & Rao (2009) and Chambers et al. (2014) can have a substantial bias, whereas the proposed estimators perform better in terms of bias and mean squared error. In addition, we propose a new bootstrap procedure for estimating the mean squared error of robust small area estimators. Unlike existing procedures, we formally establish the asymptotic validity of the proposed bootstrap method. Moreover, the proposed method is semi-parametric, that is, it does not rely on an assumption about the distributions of the errors or of the random effects. It is therefore particularly attractive and more widely applicable. We examine the performance of our bootstrap procedure using Monte Carlo simulations. The results show that our procedure performs well and, above all, outperforms all the competitors studied. An application of the proposed method is illustrated by analyzing the real data of Battese, Harter & Fuller (1988), which contain outliers. Regarding imputation in the presence of item nonresponse, some forms of single imputation have been studied. Deterministic regression imputation between classes, which includes ratio imputation and mean imputation, is often used in surveys. These imputation methods can lead to biased imputed estimators if the imputation model or the nonresponse model is not correctly specified. Doubly robust estimators have been developed in recent years. These estimators are unbiased if at least one of the imputation and nonresponse models is correctly specified. However, in the presence of outliers, doubly robust imputed estimators can be very unstable. Using the concept of conditional bias, we propose an outlier-robust version of the doubly robust estimator. The results of simulation studies show that the proposed estimator performs well for an appropriate choice of the robustness constant.

Relevance: 20.00%

Abstract:

Nature is full of phenomena which we call "chaotic", the weather being a prime example. What we mean by this is that we cannot predict them to any significant accuracy, either because the system is inherently complex or because some of the governing factors are not deterministic. However, in recent years it has become clear that random behaviour can occur even in very simple systems with very few degrees of freedom, without any need for complexity or indeterminacy. The discovery that chaos can be generated even by systems with completely deterministic rules, often models of natural phenomena, has stimulated a lot of research interest recently. Not that this chaos has no underlying order, but it is of a subtle kind that has taken a great deal of ingenuity to unravel. In the present thesis, the author introduces a new nonlinear model, a 'modulated' logistic map, and analyses it from the viewpoint of 'deterministic chaos'.
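Deterministic chaos in a very simple system can be illustrated with the standard logistic map x(n+1) = r·x(n)·(1 − x(n)). The sketch below uses this standard map, not the 'modulated' variant introduced in the thesis, whose definition is not given in the abstract; the parameter values and starting points are chosen purely for illustration.

```python
def logistic_orbit(r, x0, n):
    """Return [x0, x1, ..., xn] for the map x -> r * x * (1 - x)."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# Two orbits at r = 4 whose starting points differ by only 1e-10.
a = logistic_orbit(4.0, 0.2, 60)
b = logistic_orbit(4.0, 0.2 + 1e-10, 60)
largest_gap = max(abs(u - v) for u, v in zip(a, b))
```

Although the rule is fully deterministic, the tiny initial difference is amplified at each step until the two orbits become unrelated, which is the sensitivity to initial conditions that makes long-term prediction impractical.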

Relevance: 20.00%

Abstract:

We study cooperating distributed systems (CD-systems) of restarting automata that are very restricted: they are deterministic, they cannot rewrite, but only delete symbols, they restart immediately after performing a delete operation, they are stateless, and they have a read/write window of size 1 only, that is, these are stateless deterministic R(1)-automata. We study the expressive power of these systems by relating the class of languages that they accept by mode =1 computations to other well-studied language classes, showing in particular that this class only contains semi-linear languages, and that it includes all rational trace languages. In addition, we investigate the closure and non-closure properties of this class of languages and some of its algorithmic properties.

Relevance: 20.00%

Abstract:

We study cooperating distributed systems (CD-systems) of stateless deterministic restarting automata with window size 1 that are governed by an external pushdown store. In this way we obtain an automata-theoretical characterization for the class of context-free trace languages.

Relevance: 20.00%

Abstract:

It is known that cooperating distributed systems (CD-systems) of stateless deterministic restarting automata with window size 1 accept a class of semi-linear languages that properly includes all rational trace languages. Although the component automata of such a CD-system are all deterministic, in general the CD-system itself is not, as in each of its computations, the initial component and the successor components are still chosen nondeterministically. Here we study CD-systems of stateless deterministic restarting automata with window size 1 that are themselves completely deterministic. In fact, we consider two such types of CD-systems, the strictly deterministic systems and the globally deterministic systems.

Relevance: 20.00%

Abstract:

Background: The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.
Methods: Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring's sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.
Results: Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.
Conclusions: Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.

Relevance: 20.00%

Abstract:

The goal of this dissertation is the quantitative evaluation of coupled hydrodynamic, ecological and clarity models for the deterministic prediction of water clarity in lakes and reservoirs. Prediction of water clarity is somewhat unique, insofar as it represents the integrated and coupled effects of a broad range of individual water quality components. These include biological components such as phytoplankton, together with the associated cycles of nutrients needed to sustain their populations, and abiotic components such as suspended particles that may be introduced by streams, atmospheric deposition or sediment resuspension. Changes in clarity induced by either component will feed back on the phytoplankton dynamics, as incident light also affects biological growth. Thus the ability to model changes in clarity successfully necessarily depends on correct modeling of these other water quality parameters. Water clarity is also unique in that it may be one of the earliest and most easily detected warnings of accelerating eutrophication in a water body.