65 results for 230204 Applied Statistics
at Indian Institute of Science - Bangalore - India
Abstract:
Maximum entropy approaches to classification are very well studied in applied statistics and machine learning, and almost all the methods that exist in the literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for high-dimensional data such as text datasets that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ the conditional independence assumption (Naive Bayes) and perform feature selection simultaneously, by enforcing a 'maximum discrimination' between the estimated class conditional densities. For two-class problems, the proposed method uses Jeffreys (J) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon (JS) divergence to discriminate conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)), based on the AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to the case of multiple distributions. As for theoretical justification, we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using J-divergence emerges naturally in binary classification. The performance of the proposed algorithms, together with a comparative study, is demonstrated on high-dimensional text and gene expression datasets, showing that our methods scale up very well with high-dimensional datasets.
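As a rough illustration of the divergences named in this abstract, the following Python sketch computes Jeffreys divergence between two discrete distributions and the standard (unmodified) Jensen-Shannon divergence across several, then ranks binary features by per-feature Jeffreys divergence. It is not the authors' algorithm: the Bernoulli feature model, the function names, and the toy parameters are all assumptions made for illustration, and the modified JS(GM) divergence is not reproduced here. The per-feature ranking relies on the fact that, under conditional independence, the KL divergence between product distributions is the sum of the per-feature divergences.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, natural log; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys_divergence(p, q):
    """Jeffreys (J) divergence: the symmetrized sum KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def jensen_shannon_divergence(dists, weights=None):
    """Standard weighted JS divergence: sum_i w_i * KL(p_i || mixture)."""
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(dists[0])
    mixture = [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(size)]
    return sum(w * kl_divergence(d, mixture) for w, d in zip(weights, dists))


def rank_features_by_jeffreys(theta_a, theta_b):
    """Rank binary features by J divergence between their Bernoulli class conditionals.

    theta_a[j] and theta_b[j] are hypothetical estimates of P(feature j = 1 | class).
    """
    scores = []
    for j, (a, b) in enumerate(zip(theta_a, theta_b)):
        scores.append((jeffreys_divergence([a, 1 - a], [b, 1 - b]), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy class-conditional parameters (made up for illustration only).
theta_pos = [0.90, 0.50, 0.20]
theta_neg = [0.10, 0.45, 0.25]
ranking = rank_features_by_jeffreys(theta_pos, theta_neg)
print(ranking)
```

The first feature separates the two classes most strongly, so it ranks first; the other two differ only slightly between classes and receive small scores.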
Abstract:
We derive analytical expressions for the probability distribution function (PDF) for electron transport in a simple model of a quantum junction in the presence of thermal fluctuations. Our approach is based on large deviation theory combined with the generating function method. For a large number of electrons transferred, the PDF is found to decay exponentially in the tails, with different rates due to the applied bias. This asymmetry in the PDF is related to the fluctuation theorem. Statistics of fluctuations are analyzed in terms of the Fano factor. Thermal fluctuations play a quantitative role in determining the statistics of electron transfer; they tend to suppress the average current while enhancing the fluctuations in particle transfer. This gives rise to both bunching and antibunching phenomena as determined by the Fano factor. The thermal fluctuations and shot noise compete with each other and determine the net (effective) statistics of particle transfer. An exact analytical expression is obtained for the delay time distribution. The optimal values of the delay time between successive electron transfers can be lowered below the corresponding shot noise values by tuning the thermal effects. (C) 2015 AIP Publishing LLC.
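The Fano factor used in this abstract is the ratio of the variance to the mean of the number of transferred particles. A minimal Python sketch of how it could be estimated from counting data follows; the function name and the sample counts are hypothetical and not taken from the paper. A value of 1 corresponds to Poissonian statistics, values below 1 to antibunching, and values above 1 to bunching.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Fano factor F = Var(N) / Mean(N) for a list of particle counts per time window."""
    avg = mean(counts)
    if avg == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    return pvariance(counts) / avg


# Hypothetical counts of electrons transferred in equal time windows.
regular_counts = [3, 3, 3, 3]   # perfectly regular transfer, no fluctuations
noisy_counts = [0, 2, 0, 2]     # mean 1, population variance 1

print(fano_factor(regular_counts))
print(fano_factor(noisy_counts))
```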
Abstract:
Conformational preferences of thiocarbonohydrazide (H2NNHCSNHNH2) in its basic and N,N′-diprotonated forms are examined by calculating the barrier to internal rotation around the C-N bonds, using the theoretical LCAO-MO (ab initio and semiempirical CNDO and EHT) methods. The calculated and experimental results are compared with each other and also with values for N,N′-dimethylthiourea, which is isoelectronic with thiocarbonohydrazide. The suitability of these methods for studying rotational isomerism seems suspect when lone pair interactions are present.
Abstract:
In this paper, we present an approach to estimate the fractal complexity of discrete time signal waveforms based on computation of the area bounded by sample points of the signal at different time resolutions. The slope of the best straight line fit to the graph of $\log(A(r_k)/r_k^2)$ versus $\log(1/r_k)$ is estimated, where $A(r_k)$ is the area computed at different time resolutions and $r_k$ are the time resolutions at which the area has been computed. The slope quantifies the complexity of the signal and is taken as an estimate of the fractal dimension (FD). The proposed approach is used to estimate the fractal dimension of parametric fractal signals with known fractal dimensions, and the method has given accurate results. The estimation accuracy of the method is compared with that of Higuchi's and Sevcik's methods. The proposed method has given more accurate results than Sevcik's method, and the results are comparable to those of Higuchi's method. The practical application of the complexity measure in detecting changes in the complexity of signals is discussed using real sleep electroencephalogram recordings from eight different subjects. The FD-based approach has shown good performance in discriminating different stages of sleep.
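The abstract does not give enough detail to reproduce the area-based estimator, so the sketch below instead illustrates Higuchi's method, which the abstract names as a comparison baseline. It shares the same final step: fitting a straight line on a log-log plot and reading the fractal dimension off the slope. The function names, the default `k_max`, and the test signals are assumptions made for illustration.

```python
import math
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def higuchi_fd(signal, k_max=10):
    """Estimate the fractal dimension as the slope of log L(k) versus log(1/k).

    Assumes the signal is not constant (a zero curve length has no logarithm)
    and that k_max is much smaller than the signal length.
    """
    n = len(signal)
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # m is the 0-based starting offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            total = sum(
                abs(signal[m + i * k] - signal[m + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Normalize for the number of samples used, then divide by k.
            lengths.append(total * (n - 1) / (steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / len(lengths)))
    return fit_slope(log_inv_k, log_length)


rng = random.Random(0)
straight_line = [float(t) for t in range(500)]
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

print(higuchi_fd(straight_line))  # a smooth line should give a value close to 1
print(higuchi_fd(white_noise))    # white noise should give a value close to 2
```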
Abstract:
The flow, heat and mass transfer problem for a steady laminar incompressible boundary layer flow in an electrically conducting fluid over a longitudinal cylinder with an applied magnetic field has been studied. The partial differential equations governing the flow have been solved numerically using an implicit finite-difference scheme. The results are found to be strongly dependent on the magnetic field and the dissipation parameter. The effect of the mass transfer is more pronounced on the skin friction than on the heat transfer. The results have been compared with those of the series solution, the asymptotic solution, Glauert and Lighthill's solution, and the local similarity, local nonsimilarity and difference-differential methods. Good agreement is found with all of them, except with the results of the local similarity and series solution methods.
Abstract:
The results are presented of applying multi-time scale analysis using the singular perturbation technique for long time simulation of power system problems. A linear system represented in state-space form can be decoupled into slow and fast subsystems. These subsystems can be simulated with different time steps and then recombined to obtain the system response. Simulation results with a two-time scale analysis of a power system show a large saving in computational costs.
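To make the slow/fast decoupling concrete, here is a minimal Python sketch for a two-state singularly perturbed linear system, x' = a11 x + a12 z and eps z' = a21 x + a22 z. The coefficients, the value of eps, and the function names are hypothetical and not taken from the paper. For simplicity the two subsystems are evaluated in closed form rather than integrated with separate time steps, and the recombined response is compared with a brute-force forward Euler simulation of the full system.

```python
import math


def decouple(a11, a12, a21, a22, eps):
    """Return the rates of the slow and fast subsystems (standard two-time-scale reduction)."""
    slow_rate = a11 - a12 * a21 / a22   # slow subsystem after eliminating z
    fast_rate = a22 / eps               # boundary-layer (fast) subsystem
    return slow_rate, fast_rate


def composite_response(x0, z0, t, a11, a12, a21, a22, eps):
    """Recombine the slow and fast subsystem responses into an approximation of (x, z)."""
    slow_rate, fast_rate = decouple(a11, a12, a21, a22, eps)
    x_slow = x0 * math.exp(slow_rate * t)
    z_quasi_steady = -(a21 / a22) * x_slow
    z_fast = (z0 + (a21 / a22) * x0) * math.exp(fast_rate * t)
    return x_slow, z_quasi_steady + z_fast


def simulate_full(x0, z0, t_end, a11, a12, a21, a22, eps, dt=1e-4):
    """Forward Euler simulation of the full system; dt must resolve the fast dynamics."""
    x, z = x0, z0
    for _ in range(int(round(t_end / dt))):
        dx = a11 * x + a12 * z
        dz = (a21 * x + a22 * z) / eps
        x, z = x + dt * dx, z + dt * dz
    return x, z


# Hypothetical coefficients: a11, a12, a21, a22, eps.
PARAMS = (-1.0, 1.0, 1.0, -2.0, 0.01)
approx = composite_response(1.0, 0.0, 1.0, *PARAMS)
full = simulate_full(1.0, 0.0, 1.0, *PARAMS)
print(approx, full)
```

The full simulation needs a step small enough for the fast mode over the whole horizon, whereas the slow subsystem alone could be stepped far more coarsely; that gap is the kind of saving the abstract refers to.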
Abstract:
The simultaneous state and parameter estimation problem for a linear discrete-time system with unknown noise statistics is treated as a large-scale optimization problem. The a posteriori probability density function is maximized directly with respect to the states and parameters, subject to the constraint of the system dynamics. The resulting optimization problem is too large for any of the standard non-linear programming techniques, and hence a hierarchical optimization approach is proposed. It turns out that the states can be computed at the first level for given noise and system parameters. These, in turn, are to be modified at the second level. The states are to be computed from a large system of linear equations, and two solution methods are considered for solving these equations, limiting the horizon to a suitable length. The resulting algorithm is a filter-smoother, suitable for off-line as well as on-line state estimation for given noise and system parameters. The second level problem is split up into two, one for modifying the noise statistics and the other for modifying the system parameters. An adaptive relaxation technique is proposed for modifying the noise statistics, and a modified Gauss-Newton technique is used to adjust the system parameters.
Abstract:
A very general and numerically quite robust algorithm has been proposed by Sastry and Gauvrit (1980) for system identification. The present paper takes it up and examines its performance on a real test example. The example considered is the lateral dynamics of an aircraft. This is used as a vehicle for demonstrating the performance of various aspects of the algorithm in several possible modes.
Abstract:
Direct numerical simulations (DNS) of spatially growing turbulent shear layers may be performed as temporal simulations by solving the governing equations with some additional terms while imposing streamwise periodicity. These terms are functions of the means whose spatial growth is calculated easily and accurately from statistics of the temporal DNS. Equations for such simulations are derived.
Abstract:
The determination of settlement of shallow foundations on cohesionless soil is an important task in geotechnical engineering. Available methods for the determination of settlement are not reliable. In this study, the support vector machine (SVM), a novel type of learning algorithm based on statistical theory, has been used to predict the settlement of shallow foundations on cohesionless soil. SVM uses a regression technique by introducing an ε-insensitive loss function. A thorough sensitivity analysis has been made to ascertain which parameters have the maximum influence on settlement. The study shows that SVM has the potential to be a useful and practical tool for prediction of settlement of shallow foundations on cohesionless soil.
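The ε-insensitive loss mentioned in this abstract ignores prediction errors smaller than ε and penalizes larger errors linearly. A minimal Python sketch of that loss follows; the function name and the sample settlement values are hypothetical and are not data from the study.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Epsilon-insensitive loss: max(0, |y_true - y_pred| - eps)."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical measured and predicted settlements (same units), with a tolerance eps.
measured = [10.0, 25.0, 40.0]
predicted = [10.5, 27.0, 39.8]
eps = 1.0

losses = [eps_insensitive_loss(y, f, eps) for y, f in zip(measured, predicted)]
print(losses)
```

Only the second prediction misses by more than ε, so it is the only one that contributes a nonzero loss.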
Abstract:
Let $X(t)$ be a right continuous temporally homogeneous Markov process, $T_t$ the corresponding semigroup and $A$ the weak infinitesimal generator. Let $g(t)$ be absolutely continuous and $\tau$ a stopping time satisfying $E_x\left(\int_0^\tau |g(t)|\,dt\right) < \infty$ and $E_x\left(\int_0^\tau |g'(t)|\,dt\right) < \infty$. Then for $f \in \mathcal{D}(A)$ with $f(X(t))$ right continuous the identity $E_x\, g(\tau) f(X(\tau)) - g(0) f(x) = E_x\left(\int_0^\tau g'(s) f(X(s))\,ds\right) + E_x\left(\int_0^\tau g(s) A f(X(s))\,ds\right)$ is a simple generalization of Dynkin's identity ($g(t) \equiv 1$). With further restrictions on $f$ and $\tau$ the following identity is obtained as a corollary: $E_x(f(X(\tau))) = f(x) + \sum_{k=1}^{n-1} \frac{(-1)^{k-1}}{k!} E_x\left(\tau^k A^k f(X(\tau))\right) + \frac{(-1)^{n-1}}{(n-1)!} E_x\left(\int_0^\tau u^{n-1} A^n f(X(u))\,du\right)$. These identities are applied to processes with stationary independent increments to obtain a number of new and known results relating the moments of stopping times to the moments of the stopped processes.
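As a loose numerical illustration of how such identities relate moments of stopping times to moments of the stopped process, the Python sketch below uses a discrete-time analogue rather than the continuous-time setting of the abstract. For a simple symmetric random walk started at 0 and stopped on first reaching ±a, taking f(x) = x² gives E[S_τ²] = E[τ], and since S_τ² = a² at the exit, E[τ] = a². The walk, the barrier, the trial count, and the function names are assumptions chosen for illustration.

```python
import random


def exit_time(barrier, rng):
    """Number of steps for a simple symmetric random walk from 0 to first hit +/- barrier."""
    position, steps = 0, 0
    while abs(position) < barrier:
        position += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps


def mean_exit_time(barrier, trials, seed=0):
    """Monte Carlo estimate of E[tau] for the exit time from (-barrier, barrier)."""
    rng = random.Random(seed)
    return sum(exit_time(barrier, rng) for _ in range(trials)) / trials


barrier = 5
estimate = mean_exit_time(barrier, trials=20000)
print(estimate, barrier ** 2)  # the estimate should be close to barrier squared
```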