60 results for Statistical variance
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
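The windowed n-gram analysis described above can be illustrated with a minimal sketch. It is not the Biological Language Modeling Toolkit: it uses a plain dictionary counter instead of a suffix array, add-one smoothing is an assumption, and the names `ngram_counts`, `window_perplexity`, and `scan` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, n, counts_n, counts_ctx, alphabet="ACGT"):
    """Perplexity of one window under an add-one smoothed n-gram model (n >= 2)."""
    log_prob = 0.0
    steps = 0
    for i in range(n - 1, len(window)):
        ctx = window[i - n + 1:i]        # the n-1 preceding symbols
        gram = window[i - n + 1:i + 1]   # context plus the current symbol
        p = (counts_n[gram] + 1) / (counts_ctx[ctx] + len(alphabet))
        log_prob += math.log2(p)
        steps += 1
    return 2 ** (-log_prob / steps)


def scan(seq, n=3, width=12, step=6):
    """Return (start, perplexity) for sliding windows over seq."""
    counts_n = ngram_counts(seq, n)
    counts_ctx = ngram_counts(seq, n - 1)
    return [(s, window_perplexity(seq[s:s + width], n, counts_n, counts_ctx))
            for s in range(0, len(seq) - width + 1, step)]


if __name__ == "__main__":
    for start, ppl in scan("ACGT" * 6 + "AAAAAAAAAAAA"):
        print(start, round(ppl, 3))
```

Windows whose n-grams are frequent in the whole sequence get low perplexity, so runs of unusually low or high values are candidates for closer inspection. A genome-scale tool would need a more compact index than a Python dictionary.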
Abstract:
Electrical failure of insulation is known to be an extremal random process in which nominally identical pro-rated specimens of equipment insulation, held at constant stress, fail at widely different times even under laboratory test conditions. To estimate the life of power equipment, it is necessary to run long-duration ageing experiments under accelerated stresses and to acquire and analyze insulation-specific failure data. In the present work, Resin Impregnated Paper (RIP), a relatively new insulation system of choice for transformer bushings, is taken as an example. The failure data have been processed using proven statistical methods, both graphical and analytical. The physical model governing insulation failure at constant accelerated stress is assumed to be a temperature-dependent inverse power law model.
Abstract:
With the rapid scaling down of semiconductor process technology, process-variation-aware circuit design has become essential. Several statistical models have been proposed to deal with process variation. We propose an accurate BSIM model for handling variability in 45nm CMOS technology. The MOSFET is designed to meet the low standby power technology specification of the International Technology Roadmap for Semiconductors (ITRS). Variations in four process parameters are considered for the model development: annealing temperature, oxide thickness, halo dose, and tilt angle of the halo implant. One parameter is varied at a time when developing the model. The model is validated by performance matching with device simulation results, and the reported error is less than 10%. (C) 2012 Society of Photo-Optical Instrumentation Engineers (SPIE).
Abstract:
A few variance reduction schemes are proposed within the broad framework of a particle filter as applied to the problem of structural system identification. Whereas the first scheme uses a directional descent step, possibly of the Newton or quasi-Newton type, within the prediction stage of the filter, the second relies on replacing the more conventional Monte Carlo simulation involving pseudorandom sequence with one using quasi-random sequences along with a Brownian bridge discretization while representing the process noise terms. As evidenced through the derivations and subsequent numerical work on the identification of a shear frame, the combined effect of the proposed approaches in yielding variance-reduced estimates of the model parameters appears to be quite noticeable. DOI: 10.1061/(ASCE)EM.1943-7889.0000480. (C) 2013 American Society of Civil Engineers.
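The idea behind the second scheme, replacing pseudorandom draws with a quasi-random sequence, can be sketched on a toy integral. This is not the authors' particle filter and it omits the Brownian bridge discretization. The base-2 van der Corput sequence and the integrand x^2 are assumptions chosen for illustration, and `van_der_corput` and `estimate` are hypothetical names.

```python
import random


def van_der_corput(i, base=2):
    """i-th point (i >= 1) of the van der Corput low-discrepancy sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def estimate(points):
    """Sample-mean estimate of the integral of x**2 over [0, 1] (exact value 1/3)."""
    return sum(x * x for x in points) / len(points)


n = 1024
rng = random.Random(0)  # fixed seed so the pseudorandom run is repeatable
pseudo = estimate([rng.random() for _ in range(n)])
quasi = estimate([van_der_corput(i) for i in range(1, n + 1)])

if __name__ == "__main__":
    print("pseudorandom error:", abs(pseudo - 1 / 3))
    print("quasi-random error:", abs(quasi - 1 / 3))
```

Quasi-random points cover the interval more evenly than independent draws, which is the usual reason their estimates fluctuate less for smooth integrands. In several dimensions a Halton or Sobol sequence would take the place of the one-dimensional sequence used here.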
Abstract:
Neutral and niche theories give contrasting explanations for the maintenance of tropical tree species diversity. Both have some empirical support, but methods to disentangle their effects have not yet been developed. We applied a statistical measure of spatial structure to data from 14 large tropical forest plots to test a prediction of niche theory that is incompatible with neutral theory: that species in heterogeneous environments should separate out in space according to their niche preferences. We chose plots across a range of topographic heterogeneity, and tested whether pairwise spatial associations among species were more variable in more heterogeneous sites. We found strong support for this prediction, based on a strong positive relationship between variance in the spatial structure of species pairs and topographic heterogeneity across sites. We interpret this pattern as evidence of pervasive niche differentiation, which increases in importance with increasing environmental heterogeneity.
Abstract:
Diketopyrrolopyrrole (DPP) containing copolymers have gained a lot of interest in organic optoelectronics, with great potential in organic photovoltaics. In this work, DPP-based statistical copolymers with slightly different bandgap energies and a varying donor-acceptor ratio are investigated using monochromatic photocurrent spectroscopy and Fourier-transform photocurrent spectroscopy (FTPS). The statistical copolymer with a lower DPP fraction, when blended with a fullerene derivative, shows the signature of an inter charge transfer complex state in photocurrent spectroscopy. Furthermore, the absorption spectrum of the blended sample with a lower DPP fraction is seen to change as a function of an external bias, qualitatively similar to the quantum confined Stark effect, from which we estimate the exciton binding energy. The statistical copolymer with a higher DPP fraction shows no signal of the inter charge transfer states and yields a higher external quantum efficiency in a photovoltaic structure. To gain insight into the origin of the observed charge transfer transitions, we present theoretical studies using density-functional theory and time-dependent density-functional theory for the two pristine DPP-based statistical monomers.
Abstract:
This paper attempts to unravel any relations that may exist between turbulent shear flows and statistical mechanics through a detailed numerical investigation in the simplest case where both can be well defined. The flow considered for the purpose is the two-dimensional (2D) temporal free shear layer with a velocity difference ΔU across it, statistically homogeneous in the streamwise direction (x) and evolving from a plane vortex sheet in the direction normal to it (y) in a periodic-in-x domain L × ±∞. Extensive computer simulations of the flow are carried out through appropriate initial-value problems for a "vortex gas" comprising N point vortices of the same strength (γ = L ΔU/N) and sign. Such a vortex gas is known to provide weak solutions of the Euler equation. More than ten different initial-condition classes are investigated using simulations involving up to 32 000 vortices, with ensemble averages evaluated over up to 10^3 realizations and integration over 10^4 L/ΔU. The temporal evolution of such a system is found to exhibit three distinct regimes. In Regime I the evolution is strongly influenced by the initial condition, sometimes lasting a significant fraction of L/ΔU. Regime III is a long-time domain-dependent evolution towards a statistically stationary state, via "violent" and "slow" relaxations [P.-H. Chavanis, Physica A 391, 3657 (2012)], over flow time scales of order 10^2 and 10^4 L/ΔU, respectively (for N = 400). The final state involves a single structure that stochastically samples the domain, possibly constituting a "relative equilibrium." The vortex distribution within the structure follows a nonisotropic truncated form of the Lundgren-Pointin (L-P) equilibrium distribution (with negatively high temperatures; L-P parameter λ close to -1). The central finding is that, in the intermediate Regime II, the spreading rate of the layer is universal over the wide range of cases considered here. The value (in terms of momentum thickness) is 0.0166 ± 0.0002 times ΔU. Regime II, extensively studied in the turbulent shear flow literature as a self-similar "equilibrium" state, is, however, a part of the rapid nonequilibrium evolution of the vortex-gas system, which we term "explosive" as it lasts less than one L/ΔU. Regime II also exhibits significant values of N-independent two-vortex correlations, indicating that current kinetic theories that neglect correlations or consider them as O(1/N) cannot describe this regime. The evolution of the layer thickness in the present simulations in Regimes I and II agrees with the experimental observations of spatially evolving (3D Navier-Stokes) shear layers. Further, the vorticity-stream-function relations in Regime III are close to those computed in 2D Navier-Stokes temporal shear layers [J. Sommeria, C. Staquet, and R. Robert, J. Fluid Mech. 233, 661 (1991)]. These findings suggest the dominance of what may be called the Kelvin-Biot-Savart mechanism in determining the growth of the free shear layer through large-scale momentum and vorticity dispersal.
Abstract:
Several statistical downscaling models have been developed in the past couple of decades to assess the hydrologic impacts of climate change by projecting station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs). This paper presents and compares different statistical downscaling models that use multiple linear regression (MLR), positive coefficient regression (PCR), stepwise regression (SR), and support vector machine (SVM) techniques for estimating monthly rainfall amounts in the state of Florida. Mean sea level pressure, air temperature, geopotential height, specific humidity, U wind, and V wind are used as the explanatory variables/predictors in the downscaling models. Data for these variables are obtained from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis dataset and the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled Global Climate Model, version 3 (CGCM3) GCM simulations. Principal component analysis (PCA) and the fuzzy c-means clustering method (FCM) are used as part of the downscaling model to reduce the dimensionality of the dataset and to identify the clusters in the data, respectively. Evaluation of the performances of the models using different error and statistical measures indicates that the SVM-based model performed better than all the other models in reproducing most monthly rainfall statistics at 18 sites. Output from the third-generation CGCM3 GCM for the A1B scenario was used for future projections. For the projection period 2001-10, MLR was used to relate variables at the GCM and NCEP grid scales. Using MLR to link the predictor variables at the GCM and NCEP grid scales reproduced monthly rainfall statistics better at most of the stations (12 out of 18) than the spatial interpolation technique used in earlier studies.
Abstract:
Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing the statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds on the non-overlapped frequency beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event types are not allowed to repeat. It is then pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
We formulate a natural model of loops and isolated vertices for arbitrary planar graphs, which we call the monopole-dimer model. We show that the partition function of this model can be expressed as a determinant. We then extend the method of Kasteleyn and Temperley-Fisher to calculate the partition function exactly in the case of rectangular grids. This partition function turns out to be a square of a polynomial with positive integer coefficients when the grid lengths are even. Finally, we analyse this formula in the infinite volume limit and show that the local monopole density, free energy and entropy can be expressed in terms of well-known elliptic functions. Our technique is a novel determinantal formula for the partition function of a model of isolated vertices and loops for arbitrary graphs.
Abstract:
It is well known that wrist pulse signals contain information about a person's state of health, and diagnosis based on pulse signals has therefore been regarded as important for a long time. In this paper, the efficacy of signal processing techniques in extracting useful information from wrist pulse signals is demonstrated using signals recorded under two different experimental conditions, namely before lunch and after lunch. We have used Pearson's product-moment correlation coefficient, which is an effective measure of phase synchronization, in a statistical analysis of wrist pulse signals. Contour plots and box plots are used to illustrate various differences. Two-sample t-tests show that the correlations differ significantly between the groups. Results show that the correlation coefficient is effective in distinguishing the changes taking place after having lunch. This paper demonstrates the ability of wrist pulse signals to detect changes occurring under two different conditions. The study is important in view of the limited literature available on the analysis of wrist pulse signals in the case of food intake, and also in view of its potential health care applications.
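The two statistics named above can be written out in a few lines. This sketch is generic and does not reproduce the authors' pipeline. The abstract does not say which form of the two-sample t-test was used, so the Welch (unequal-variance) statistic here is an assumption, and `pearson_r` and `welch_t` are hypothetical names.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Welch two-sample t statistic (does not assume equal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


if __name__ == "__main__":
    # Made-up numbers, only to show the call pattern.
    before = [0.91, 0.88, 0.93, 0.90]
    after = [0.80, 0.84, 0.79, 0.82]
    print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
    print(welch_t(before, after))
```

In a study like this one, `pearson_r` would be applied to pairs of pulse signals, and `welch_t` to the resulting correlation values grouped by condition. Turning the t statistic into a p-value additionally needs the t distribution, which is available in libraries such as SciPy.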
Abstract:
In this paper, we consider the problem of power allocation in the MIMO wiretap channel for secrecy in the presence of multiple eavesdroppers. Perfect knowledge of the destination channel state information (CSI) and only statistical knowledge of the eavesdroppers' CSI are assumed. We first consider the MIMO wiretap channel with Gaussian input. Using Jensen's inequality, we transform the secrecy rate max-min optimization problem into a single maximization problem. We use generalized singular value decomposition to transform the problem into a concave maximization problem which maximizes the sum secrecy rate of scalar wiretap channels subject to linear constraints on the transmit covariance matrix. We then consider the MIMO wiretap channel with finite-alphabet input. We show that the transmit covariance matrix obtained for the case of Gaussian input, when used in the MIMO wiretap channel with finite-alphabet input, can lead to zero secrecy rate at high transmit powers. We then propose a power allocation scheme with an additional power constraint which alleviates this secrecy rate loss problem and gives non-zero secrecy rates at high transmit powers.
Abstract:
Diffusion (a measure of dynamics) and entropy (a measure of disorder in the system) are found to be intimately correlated in many systems, and the correlation is often strongly non-linear. We explore the origin of this complex dependence by studying the diffusion of a point Brownian particle on a model potential energy surface characterized by ruggedness. If we assume that the ruggedness has a Gaussian distribution, then for this model one can obtain the excess entropy exactly for any dimension. By using the expression for the mean first passage time, we present a statistical mechanical derivation of the well-known and well-tested scaling relation proposed by Rosenfeld between diffusion and excess entropy. In anticipation that the Rosenfeld diffusion-entropy scaling (RDES) relation may continue to be valid in higher dimensions (where the mean first passage time approach is not available), we carry out an effective medium approximation (EMA) based analysis of the effective transition rate and hence of the effective diffusion coefficient. We show that the EMA expression can be used to derive the RDES scaling relation for any dimension higher than unity. However, RDES is shown to break down in the presence of spatial correlation among the energy landscape values. (C) 2015 AIP Publishing LLC.