999 resultados para CIS Statistical
Resumo:
We present an introductory overview of several challenging problems in the statistical characterization of turbulence. We provide examples from fluid turbulence in three and two dimensions, from the turbulent advection of passive scalars, turbulence in the one-dimensional Burgers equation, and fluid turbulence in the presence of polymer additives.
Resumo:
A method is developed for demonstrating how solitons with some internal periodic motion may emerge as elementary excitations in the statistical mechanics of field systems. The procedure is demonstrated in the context of complex scalar fields which can, for appropriate choices of the Lagrangian, yield charge-carrying solitons with such internal motion. The derivation uses the techniques of the steepest-descent method for functional integrals. It is shown that, despite the constraint of some fixed total charge, a gaslike excitation of such charged solitons does emerge.
Resumo:
The past decade has brought a proliferation of statistical genetic (linkage) analysis techniques, incorporating new methodology and/or improvement of existing methodology in gene mapping, specifically targeted towards the localization of genes underlying complex disorders. Most of these techniques have been implemented in user-friendly programs and made freely available to the genetics community. Although certain packages may be more 'popular' than others, a common question asked by genetic researchers is 'which program is best for me?'. To help researchers answer this question, the following software review aims to summarize the main advantages and disadvantages of the popular GENEHUNTER package.
Resumo:
Sequential firings with fixed time delays are frequently observed in simultaneous recordings from multiple neurons. Such temporal patterns are potentially indicative of underlying microcircuits and it is important to know when a repeatedly occurring pattern is statistically significant. These sequences are typically identified through correlation counts. In this paper we present a method for assessing the significance of such correlations. We specify the null hypothesis in terms of a bound on the conditional probabilities that characterize the influence of one neuron on another. This method of testing significance is more general than the currently available methods since under our null hypothesis we do not assume that the spiking processes of different neurons are independent. The structure of our null hypothesis also allows us to rank order the detected patterns. We demonstrate our method on simulated spike trains.
Resumo:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilous sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity. A common feature of T-RFLP and FAME data is zero-inflation, which indicates that the observation of a zero value is much more frequent than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated by both synthetic and empirical complex data sets. We show in the thesis that our model that takes into account dependencies between loci in MLST data can produce better clustering results than those methods which assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large scale data were adopted for meeting the increasing computational need. Our method that detects homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data include T-RFLP and FAME provides an initial effort for discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
Resumo:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Resumo:
This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved with the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest scoring putative cis-regulatory elements were found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution. The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight to the transcription level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyzes is stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
Resumo:
Variety selection in perennial pasture crops involves identifying best varieties from data collected from multiple harvest times in field trials. For accurate selection, the statistical methods for analysing such data need to account for the spatial and temporal correlation typically present. This paper provides an approach for analysing multi-harvest data from variety selection trials in which there may be a large number of harvest times. Methods are presented for modelling the variety by harvest effects while accounting for the spatial and temporal correlation between observations. These methods provide an improvement in model fit compared to separate analyses for each harvest, and provide insight into variety by harvest interactions. The approach is illustrated using two traits from a lucerne variety selection trial. The proposed method provides variety predictions allowing for the natural sources of variation and correlation in multi-harvest data.
Resumo:
The rarity of occurrence of cis peptide units is only partially explained by the higher intrinsic energy of the cis over the trans form, which provides a probability of 0·01 for cis peptide units to occur. An additional factor is the conformational restriction imposed by the occurrence of a cis peptide unit in a chain of trans units. Taking a section of three peptide units having the sequences trans-trans-trans (ttt) and trans-cis-trans (tct), conformational energy calculations indicate that the latter can occur only to an extent of 0·1%, unless there occurs the sequence X-Pro, in which case it is of the order of 30%. This explains the extreme rarity of cis peptide units, in general; however, it follows that even with non-prolyl residues, cis peptide units are not forbidden, but can occur in some rare examples and should be looked for.
Resumo:
Early detection of (pre-)signs of ulceration on a diabetic foot is valuable for clinical practice. Hyperspectral imaging is a promising technique for detection and classification of such (pre-)signs. However, the number of the spectral bands should be limited to avoid overfitting, which is critical for pixel classification with hyperspectral image data. The goal was to design a detector/classifier based on spectral imaging (SI) with a small number of optical bandpass filters. The performance and stability of the design were also investigated. The selection of the bandpass filters boils down to a feature selection problem. A dataset was built, containing reflectance spectra of 227 skin spots from 64 patients, measured with a spectrometer. Each skin spot was annotated manually by clinicians as "healthy" or a specific (pre-)sign of ulceration. Statistical analysis on the data set showed the number of required filters is between 3 and 7, depending on additional constraints on the filter set. The stability analysis revealed that shot noise was the most critical factor affecting the classification performance. It indicated that this impact could be avoided in future SI systems with a camera sensor whose saturation level is higher than 106, or by postimage processing.
Resumo:
From the autocorrelation function of geomagnetic polarity intervals, it is shown that the field reversal intervals are not independent but form a process akin to the Markov process, where the random input to the model is itself a moving average process. The input to the moving average model is, however, an independent Gaussian random sequence. All the parameters in this model of the geomagnetic field reversal have been estimated. In physical terms this model implies that the mechanism of reversal possesses a memory.
Resumo:
The pseudoproline residue (Psi Pro, L-2,2-dimethyl-1,3-thiazolidine-4-carboxylic acid) has been introduced into heterochiral diproline segments that have been previously shown to facilitate the formation of beta-hairpins, containing central two and three residue turns. NMR studies of the octapeptide Boc-Leu-Phe-Val-(D)Pro-Psi Pro-Leu-Phe-Val-OMe (1), Boc-Leu-Val-Val-(D)Pro-Psi Pro-Leu-Val-Val-OMe (2), and the nonapeptide sequence Boc-Leu-Phe-Val-(D)Pro-Psi Pro-(D)Ala-Leu-Phe-Val-OMe (3) established well-registered beta-hairpin structures in chloroform solution, with the almost exclusive population of the trans conformation for the peptide bond preceding the Psi Pro residue. The beta-hairpin conformation of 1 is confirmed by single crystal X-ray diffraction. Truncation of the strand length in Boc-Val-(D)Pro-Psi Pro-Leu-OMe (4) results in air increase in the population of the cis conformer, with a cis/trans ratio of 3.65. Replacement of Psi Pro in 4 by (L)Pro in 5, results in almost exclusive population of the trans form, resulting in an incipient beta-hairpin conformation, stabilized by two intramolecular hydrogen bonds. Further truncation of the sequence gives an appreciable rise in the population of cis conformers in the tripeptide piv-(D)Pro-Psi Pro-Leu-OMe (6). In the homochiral segment Piv-Pro Psi Pro-Leu-OMe (7) only the cis form is observed with the NMR evidence strongly supporting a type VIa beta-turn conformation, stabilized by a 4 -> 1 hydrogen bond between the Piv (CO) and Leu (3) NH groups. The crystal structure of the analog peptide 7a (Piv-Pro-Psi(H,CH3)Pro-Leu-NHMe) confirms the cis peptide bond geometry for the Pro-Psi(H,CH3)pro peptide bond, resulting in a type VIa beta-turn conformation.
Resumo:
Population dynamics are generally viewed as the result of intrinsic (purely density dependent) and extrinsic (environmental) processes. Both components, and potential interactions between those two, have to be modelled in order to understand and predict dynamics of natural populations; a topic that is of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II IV constitute empirical studies that statistically model environmental effects on population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine another important food resource for the species do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis), and point out effects of rainfall and habitat quality on population growth. Because the skylark time series and some of the environmental variables included show strong positive autocorrelation, the statistical significances are calculated using a Monte Carlo method, where random autocorrelated time series are generated. Chapter V is a simulation-based study, showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty, if the environmental variables are autocorrelated. It is concluded that the use of state-space models is an effective way to reach more accurate results. In summary, there are several biological assumptions and methodological issues that can affect the inferential outcome when estimating environmental effects from time series data, and that therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density dependent regulation, modelling potential observation error, and when needed, accounting for spatial and/or temporal autocorrelation.