168 results for Probabilistic Algorithms
Abstract:
Facet-based sentiment analysis involves discovering the latent facets, sentiments and their associations. Traditional facet-based sentiment analysis algorithms typically perform the various tasks in sequence, and fail to take advantage of the mutual reinforcement of the tasks. Additionally, inferring sentiment levels typically requires domain knowledge or human intervention. In this paper, we propose a series of probabilistic models that jointly discover latent facets and sentiment topics, and also order the sentiment topics with respect to a multi-point scale, in a language- and domain-independent manner. This is achieved by simultaneously capturing both short-range syntactic structure and long-range semantic dependencies between the sentiment and facet words. The models further incorporate coherence in reviews, where reviewers dwell on one facet or sentiment level before moving on, for more accurate facet and sentiment discovery. For reviews which are supplemented with ratings, our models automatically order the latent sentiment topics, without requiring seed words or domain knowledge. To the best of our knowledge, our work is the first attempt to combine the notions of syntactic and semantic dependencies in the domain of review mining. Further, the concept of facet and sentiment coherence has not been explored earlier either. Extensive experimental results on real-world review data show that the proposed models outperform various state-of-the-art baselines for facet-based sentiment analysis.
Abstract:
Time series classification deals with the problem of classification of data that is multivariate in nature. This means that one or more of the attributes is in the form of a sequence. The notion of similarity or distance, used in time series data, is significant and affects the accuracy, time, and space complexity of the classification algorithm. There exist numerous similarity measures for time series data, but each of them has its own disadvantages. Instead of relying upon a single similarity measure, our aim is to find the near optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to get the best performance. The weightage given to different similarity measures evolves over a number of generations so as to get the best combination. We test our approach on a number of benchmark time series datasets and present promising results.
Operator-splitting finite element algorithms for computations of high-dimensional parabolic problems
Abstract:
An operator-splitting finite element method for solving high-dimensional parabolic equations is presented. The stability and the error estimates are derived for the proposed numerical scheme. Furthermore, two variants of fully practical operator-splitting finite element algorithms based on the quadrature points and the nodal points, respectively, are presented. Both the quadrature and the nodal point based operator-splitting algorithms are validated using a three-dimensional (3D) test problem. The numerical results obtained with the full 3D computations and the operator-split 2D + 1D computations are found to be in good agreement with the analytical solution. Further, the optimal order of convergence is obtained in both variants of the operator-splitting algorithms. (C) 2012 Elsevier Inc. All rights reserved.
Abstract:
This paper considers sequential hypothesis testing in a decentralized framework. We start with two simple decentralized sequential hypothesis testing algorithms, one of which is later proved to be asymptotically Bayes optimal. We also consider composite versions of decentralized sequential hypothesis testing. A novel nonparametric version of decentralized sequential hypothesis testing using universal source coding theory is developed. Finally, we design a simple decentralized multihypothesis sequential detection algorithm.
Abstract:
Low-complexity near-optimal detection of signals in MIMO systems with a large number (tens) of antennas is receiving increased attention. In this paper, first, we propose a variant of the Markov chain Monte Carlo (MCMC) algorithm which i) alleviates the stalling problem encountered in the conventional MCMC algorithm at high SNRs, and ii) achieves near-optimal performance for a large number of antennas (e.g., 16×16, 32×32, 64×64 MIMO) with 4-QAM. We call the proposed algorithm the randomized MCMC (R-MCMC) algorithm. Second, we propose another algorithm based on a random selection approach to choose candidate vectors to be tested in a local neighborhood search. This algorithm, which we call the randomized search (RS) algorithm, also achieves near-optimal performance for a large number of antennas with 4-QAM. The complexities of the proposed R-MCMC and RS algorithms are quadratic/sub-quadratic in the number of transmit antennas, which is attractive for detection in large-MIMO systems. We also propose message passing aided R-MCMC and RS algorithms, which are shown to perform well for higher-order QAM.
Abstract:
This paper discusses an approach for river mapping and flood evaluation based on multi-temporal time-series analysis of satellite images, utilizing pixel spectral information for image clustering and region-based segmentation for extracting water-covered regions. MODIS satellite images are analyzed at two stages: before flood and during flood. Multi-temporal MODIS images are processed in two steps. In the first step, clustering algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are used to distinguish the water regions from the non-water regions based on spectral information. These algorithms are chosen since they are quite efficient in solving multi-modal optimization problems. These classified images are then segmented using spatial features of the water region to extract the river. From the results obtained, we evaluate the performance of the methods and conclude that incorporating region-based image segmentation along with clustering algorithms provides an accurate and reliable approach for the extraction of water-covered regions.
Abstract:
The delineation of seismic source zones plays an important role in the evaluation of seismic hazard. In most of the studies the seismic source delineation is done based on geological features. In the present study, an attempt has been made to delineate seismic source zones in the study area (south India) based on the seismicity parameters. Seismicity parameters and the maximum probable earthquake for these source zones were evaluated and were used in the hazard evaluation. The probabilistic evaluation of seismic hazard for south India was carried out using a logic tree approach. Two different types of seismic sources, linear and areal, were considered in the present study to model the seismic sources in the region more precisely. In order to properly account for the attenuation characteristics of the region, three different attenuation relations were used with different weightage factors. Seismic hazard evaluation was done for the probability of exceedance (PE) of 10% and 2% in 50 years. The spatial variation of rock level peak horizontal acceleration (PHA) and spectral acceleration (Sa) values corresponding to return periods of 475 and 2500 years for the entire study area are presented in this work. The peak ground acceleration (PGA) values at ground surface level were estimated based on different NEHRP site classes by considering local site effects.
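The link between the exceedance probabilities and the return periods quoted in the abstract above can be checked with a short calculation. The sketch below assumes earthquake occurrences follow a Poisson process, which is the usual convention in probabilistic seismic hazard analysis but is not stated in the abstract; the function name `return_period` is hypothetical.

```python
import math

def return_period(prob_exceedance: float, exposure_years: float) -> float:
    """Return period T in years, assuming Poisson occurrences.

    Under that assumption PE = 1 - exp(-t / T), so T = -t / ln(1 - PE).
    """
    return -exposure_years / math.log(1.0 - prob_exceedance)

# The two hazard levels named in the abstract, over a 50-year exposure time.
t_10_percent = return_period(0.10, 50.0)
t_02_percent = return_period(0.02, 50.0)
print(f"10% in 50 years: {t_10_percent:.1f} years")
print(f" 2% in 50 years: {t_02_percent:.1f} years")
```

Under this assumption, 10% in 50 years corresponds to roughly 475 years, and 2% in 50 years to roughly 2475 years, which the abstract rounds to 2500.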
Abstract:
For compressed sensing (CS), we develop a new scheme inspired by data fusion principles. In the proposed fusion-based scheme, several CS reconstruction algorithms participate and are executed in parallel, independently. The final estimate of the underlying sparse signal is derived by fusing the estimates obtained from the participating algorithms. We theoretically analyze this fusion-based scheme and derive sufficient conditions for achieving a better reconstruction performance than any participating algorithm. Through simulations, we show that the proposed scheme has two specific advantages: 1) it provides good performance in a low-dimensional measurement regime, and 2) it can deal with different statistical natures of the underlying sparse signals. The experimental results on real ECG signals show that the proposed scheme demands fewer CS measurements for an approximate sparse signal reconstruction.
Abstract:
Past studies use deterministic models to evaluate optimal cache configuration or to explore its design space. However, with the increasing number of components present on a chip multiprocessor (CMP), deterministic approaches do not scale well. Hence, we apply probabilistic genetic algorithms (GA) to determine a near-optimal cache configuration for a sixteen-tiled CMP. We propose and implement a faster trace-based approach to estimate the fitness of a chromosome. It shows up to 218x simulation speedup over cycle-accurate architectural simulation. Our methodology can be applied to solve other cache optimization problems, such as design space exploration of a cache and its partitioning among applications/virtual machines.
Abstract:
Opportunistic relay selection in a multiple source-destination (MSD) cooperative system requires quickly allocating to each source-destination (SD) pair a suitable relay based on channel gains. Since the channel knowledge is available only locally at a relay and not globally, efficient relay selection algorithms are needed. For an MSD system, in which the SD pairs communicate in a time-orthogonal manner with the help of decode-and-forward relays, we propose three novel relay selection algorithms, namely, contention-free en masse assignment (CFEA), contention-based en masse assignment (CBEA), and a hybrid algorithm that combines the best features of CFEA and CBEA. En masse assignment exploits the fact that a relay can often aid not one but multiple SD pairs, and, therefore, can be assigned to multiple SD pairs. This drastically reduces the average time required to allocate an SD pair when compared to allocating the SD pairs one by one. We show that the algorithms are much faster than other selection schemes proposed in the literature and yield significantly higher net system throughputs. Interestingly, CFEA is as effective as CBEA over a wider range of system parameters than in single SD pair systems.
Abstract:
Estimating program worst-case execution time (WCET) accurately and efficiently is a challenging task. Several programs exhibit phase behavior wherein cycles per instruction (CPI) varies in phases during execution. Recent work has suggested the use of phases in such programs to estimate WCET with minimal instrumentation. However, the suggested model uses a function of mean CPI that has no probabilistic guarantees. We propose to use Chebyshev's inequality, which can be applied to any arbitrary distribution of CPI samples, to probabilistically bound the CPI of a phase. Applying Chebyshev's inequality to phases that exhibit high CPI variation leads to pessimistic upper bounds. We propose a mechanism that refines such phases into sub-phases based on program counter (PC) signatures collected using profiling and also allows the user to control the variance of CPI within a sub-phase. We describe a WCET analyzer built on these lines and evaluate it with standard WCET and embedded benchmark suites on two different architectures for three chosen probabilities, p = {0.9, 0.95 and 0.99}. For p = 0.99, refinement based on PC signatures alone reduces the average pessimism of the WCET estimate by 36% (77%) on Arch1 (Arch2). Compared to Chronos, an open source static WCET analyzer, the average improvement in estimates obtained by refinement is 5% (125%) on Arch1 (Arch2). On limiting the variance of CPI within a sub-phase to {50%, 10%, 5% and 1%} of its original value, the average accuracy of the WCET estimate improves further to {9%, 11%, 12% and 13%}, respectively, on Arch1. On Arch2, the average accuracy of WCET improves to 159% when CPI variance is limited to 50% of its original value, and the improvement is marginal beyond that point.
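To make the Chebyshev step concrete, here is a minimal sketch of a distribution-free upper bound on the CPI of one phase. It is not the analyzer described in the abstract. It assumes the two-sided form of Chebyshev's inequality, P(|X − μ| ≥ kσ) ≤ 1/k², and it treats the sample mean and standard deviation as stand-ins for the true ones. The function name and the sample values are hypothetical.

```python
import math
import statistics

def chebyshev_cpi_bound(cpi_samples, p):
    """Upper bound B on CPI such that P(CPI >= B) <= 1 - p.

    Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2, so choosing
    k = 1/sqrt(1 - p) gives B = mu + k*sigma.
    Assumption: sample statistics approximate the true mu and sigma.
    """
    mean = statistics.fmean(cpi_samples)
    std = statistics.pstdev(cpi_samples)
    k = 1.0 / math.sqrt(1.0 - p)
    return mean + k * std

# Hypothetical CPI samples for a single phase.
samples = [1.0, 1.2, 0.8, 1.1, 0.9]
for p in (0.9, 0.95, 0.99):
    print(f"p = {p}: CPI bound = {chebyshev_cpi_bound(samples, p):.3f}")
```

Because k grows as p approaches 1 and the bound scales with the standard deviation, a phase with high CPI variance yields a loose bound, which is why the abstract splits such phases into lower-variance sub-phases.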
Abstract:
In this paper, we propose low-complexity algorithms based on Monte Carlo sampling for signal detection and channel estimation on the uplink in large-scale multiuser multiple-input-multiple-output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and a similar number of uplink users. A BS receiver that employs a novel mixed sampling technique (which makes a probabilistic choice between Gibbs sampling and random uniform sampling in each coordinate update) for detection and a Gibbs-sampling-based method for channel estimation is proposed. The algorithm proposed for detection alleviates the stalling problem encountered at high signal-to-noise ratios (SNRs) in conventional Gibbs-sampling-based detection and achieves near-optimal performance in large systems with M-ary quadrature amplitude modulation (M-QAM). A novel ingredient in the detection algorithm that is responsible for achieving near-optimal performance at low complexity is the joint use of a mixed Gibbs sampling (MGS) strategy coupled with a multiple restart (MR) strategy with an efficient restart criterion. Near-optimal detection performance is demonstrated for a large number of BS antennas and users (e.g., 64 and 128 BS antennas and users). The proposed Gibbs-sampling-based channel estimation algorithm refines an initial estimate of the channel obtained during the pilot phase through iterations with the proposed MGS-based detection during the data phase. In time-division duplex systems where channel reciprocity holds, these channel estimates can be used for multiuser MIMO precoding on the downlink. The proposed receiver is shown to achieve good performance and scale well for large dimensions.
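The mixed sampling idea, a probabilistic choice between a Gibbs draw and a uniform draw in each coordinate update, can be illustrated with a toy sketch. This is not the receiver from the abstract. It assumes a small real-valued system y = Hx + n with Gaussian noise, omits the multiple restart strategy and channel estimation, and uses hypothetical names and parameters throughout (`mixed_gibbs_detect`, `q`, `noise_var`).

```python
import math
import random

def ml_cost(H, y, x):
    """Squared residual ||y - Hx||^2 for a candidate symbol vector x."""
    return sum((yi - sum(h * xj for h, xj in zip(row, x))) ** 2
               for row, yi in zip(H, y))

def mixed_gibbs_detect(H, y, alphabet, noise_var, q, n_iter, rng):
    """Toy mixed Gibbs sampler; returns the lowest-cost vector visited."""
    n = len(H[0])
    x = [rng.choice(alphabet) for _ in range(n)]
    best, best_cost = list(x), ml_cost(H, y, x)
    for _ in range(n_iter):
        for i in range(n):
            if rng.random() < q:
                # Uniform draw: lets the chain escape states where it stalls.
                x[i] = rng.choice(alphabet)
            else:
                # Gibbs draw: sample x[i] from its conditional distribution,
                # proportional to exp(-cost / (2 * noise_var)) for real noise.
                costs = []
                for s in alphabet:
                    x[i] = s
                    costs.append(ml_cost(H, y, x))
                c_min = min(costs)  # subtract the minimum to avoid underflow
                weights = [math.exp(-(c - c_min) / (2.0 * noise_var))
                           for c in costs]
                x[i] = rng.choices(alphabet, weights=weights)[0]
            cost = ml_cost(H, y, x)
            if cost < best_cost:
                best, best_cost = list(x), cost
    return best, best_cost

# Hypothetical noiseless 3x3 example with symbols in {-1, +1}.
H = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]
x_true = [1, -1, 1]
y = [sum(h * xj for h, xj in zip(row, x_true)) for row in H]
x_hat, cost_hat = mixed_gibbs_detect(H, y, [-1, 1], noise_var=0.1,
                                     q=1.0 / 3.0, n_iter=200,
                                     rng=random.Random(0))
print(x_hat, cost_hat)
```

Setting `q` to zero recovers plain Gibbs sampling. At a small `noise_var` the Gibbs conditionals become nearly deterministic, so the chain can get stuck, which is the stalling behavior the uniform draws are meant to counter.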
Abstract:
The boxicity (cubicity) of a graph G, denoted by box(G) (respectively cub(G)), is the minimum integer k such that G can be represented as the intersection graph of axis-parallel boxes (cubes) in $\mathbb{R}^k$. The problem of computing boxicity (cubicity) is known to be inapproximable in polynomial time even for graph classes like bipartite, co-bipartite and split graphs, within an $O(n^{0.5 - \varepsilon})$ factor for any $\varepsilon > 0$, unless NP = ZPP. We prove that if a graph G on n vertices has a clique on n − k vertices, then box(G) can be computed in time $n^2 2^{O(k^2 \log k)}$. Using this fact, various FPT approximation algorithms for boxicity are derived. The parameter used is the vertex (or edge) edit distance of the input graph from certain graph families of bounded boxicity, like interval graphs and planar graphs. Using the same fact, we also derive an $O\left(\frac{n \sqrt{\log \log n}}{\sqrt{\log n}}\right)$ factor approximation algorithm for computing boxicity, which, to our knowledge, is the first $o(n)$ factor approximation algorithm for the problem. We also present an FPT approximation algorithm for computing the cubicity of graphs, with vertex cover number as the parameter.
Abstract:
Maximum likelihood (ML) algorithms for the joint estimation of synchronisation impairments and channel in a multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) system are investigated in this work. A system model that takes into account the effects of carrier frequency offset, sampling frequency offset, symbol timing error and channel impulse response is formulated. Cramer-Rao lower bounds for the estimation of continuous parameters are derived, which show the coupling effect among different impairments and the significance of the joint estimation. The authors propose an ML algorithm for the estimation of synchronisation impairments and channel together, using the grid search method. To reduce the complexity of the joint grid search in the ML algorithm, a modified ML (MML) algorithm with multiple one-dimensional searches is also proposed. Further, a stage-wise ML (SML) algorithm using existing algorithms, which estimate fewer parameters, is also proposed. Performance of the estimation algorithms is studied through numerical simulations, and it is found that the proposed ML and MML algorithms exhibit better performance than the SML algorithm.