155 results for Rainfall event classification
Abstract:
Proving the unsatisfiability of propositional Boolean formulas has applications in a wide range of fields. Minimal Unsatisfiable Sets (MUS) are signatures of the property of unsatisfiability in formulas and our understanding of these signatures can be very helpful in answering various algorithmic and structural questions relating to unsatisfiability. In this paper, we explore some combinatorial properties of MUS and use them to devise a classification scheme for MUS. We also derive bounds on the sizes of MUS in Horn, 2-SAT and 3-SAT formulas.
Abstract:
In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, we obtain various novel kernels that can be used with support vector machines to design classifiers capable of deciding the class of a given time series. The approach is general and is applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition, with promising results.
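One way to picture an interpolation-based kernel is sketched below. This is only an illustration and not the kernels derived in the paper: it assumes that each (non-empty) series is resampled to a fixed number of points by piecewise linear interpolation and that a Gaussian (RBF) kernel is then applied to the resampled vectors. The names `resample_linear`, `interpolated_rbf_kernel`, `n_points` and `gamma` are hypothetical.

```python
import math


def resample_linear(series, n_points):
    """Resample a non-empty series onto n_points evenly spaced positions
    using piecewise linear interpolation between neighbouring samples."""
    if len(series) == 1 or n_points == 1:
        return [float(series[0])] * n_points
    out = []
    for i in range(n_points):
        # Fractional position of the i-th output point on the input index axis.
        t = i * (len(series) - 1) / (n_points - 1)
        lo = min(int(t), len(series) - 2)  # left neighbour, clamped at the end
        frac = t - lo
        out.append((1 - frac) * series[lo] + frac * series[lo + 1])
    return out


def interpolated_rbf_kernel(a, b, n_points=16, gamma=1.0):
    """Gaussian kernel between two series of possibly different lengths,
    evaluated after resampling both to a common length (an assumption)."""
    ra = resample_linear(a, n_points)
    rb = resample_linear(b, n_points)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return math.exp(-gamma * squared_distance)
```

Because both series are mapped to vectors of the same length before the Gaussian kernel is applied, the resulting kernel matrix is positive semidefinite and could be passed to any SVM implementation that accepts precomputed kernels.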
Abstract:
In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they differ from the frequency counts of Mannila et al. [1], which count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.
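The idea of counting occurrences directly, rather than counting windows, can be illustrated for the simplest case. The sketch below is not the authors' algorithm: it assumes a single serial episode (an ordered tuple of event types), events without durations, and a greedy left-to-right scan. The function name `count_non_overlapped` is hypothetical.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode in an event
    sequence with one left-to-right pass.

    `events` is an iterable of event types in time order; `episode` is the
    ordered sequence of event types that must occur one after another.
    """
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the next episode event we are waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # One full occurrence finished; start looking for the next
                # occurrence only after this one has ended.
                count += 1
                pos = 0
    return count
```

For example, `count_non_overlapped(list("AABCBABCC"), list("ABC"))` tracks the occurrence ending at the first `C`, resets, and then tracks a second occurrence that begins strictly after it.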
Abstract:
Discovering patterns in temporal data is an important task in data mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations, etc. In this paper we extend Mannila et al.'s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.
Abstract:
We consider a small-extent sensor network for event detection, in which nodes periodically take samples and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center for processing the measurements. A Bayesian setting is assumed; that is, the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time-causal order. In this case, the decision statistic depends on the network delays, whereas in the network-oblivious case, the decision statistic does not. This yields a Bayesian change-detection problem with a trade-off between the random network delay and the decision delay; that is, a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analyzed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient decision statistic is the network state and the posterior probability of change having occurred, given the measurements received and the state of the network. The optimal regimes are studied using simulation.
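The posterior probability of change that serves as the decision statistic can be sketched for the case without any network. The code below is a minimal illustration under assumptions that are not stated in the abstract: a geometric prior on the change time with parameter `rho`, Gaussian observations whose mean shifts from `mu0` to `mu1`, and a fixed posterior threshold as the stopping rule. It ignores queueing and network delays entirely, and all names and default values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_threshold_stop(samples, rho=0.05, threshold=0.95,
                             mu0=0.0, mu1=1.0, std=1.0):
    """Update the posterior probability that the change has occurred after
    each sample and stop the first time it reaches `threshold`.

    Returns (stopping_index, posterior); stopping_index is 1-based, or None
    if the threshold is never reached.
    """
    p = 0.0  # posterior probability that the change has already happened
    for k, x in enumerate(samples, start=1):
        # Prediction step: the change may occur just before this sample
        # (geometric prior with parameter rho, an assumption).
        prior = p + (1.0 - p) * rho
        # Bayes update with pre-change and post-change likelihoods.
        l0 = gaussian_pdf(x, mu0, std)
        l1 = gaussian_pdf(x, mu1, std)
        p = prior * l1 / (prior * l1 + (1.0 - prior) * l0)
        if p >= threshold:
            return k, p
    return None, p
```

In the network-aware procedure described above, this scalar statistic would have to be tracked together with the network state, which this sketch does not attempt.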
Abstract:
In this study, an effort has been made to study heavy rainfall events during cyclonic storms over the Indian Ocean. The rainfall estimate is based on microwave observations from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). A regional scattering index (SI) developed for the Indian region, based on brightness temperature measurements at 19, 21 and 85 GHz, and the polarization corrected temperature (PCT) at 85 GHz have been utilized in this study. The PCT and SI are collocated against the Precipitation Radar (PR) onboard TRMM to establish a relationship between rainfall rate, PCT and SI. A retrieval technique using both linear and nonlinear regressions has been developed utilizing SI, PCT and the combination of SI and PCT. The results have been compared with the observations from PR. It was found that a nonlinear algorithm using the combination of SI and PCT is more accurate than a linear algorithm or a nonlinear algorithm using either SI or PCT alone. Statistical comparison with PR gives correlation coefficients (CC) of 0.68, 0.66 and 0.70, and root mean square errors (RMSE) of 1.78, 1.96 and 1.68 mm/h for SI, PCT and the combination of SI and PCT, respectively, using linear regression. When nonlinear regression is used, CC of 0.73, 0.71 and 0.79 and RMSE of 1.64, 1.95 and 1.54 mm/h are obtained for SI, PCT and the combination of SI and PCT, respectively. The error statistics for high rain events (above 10 mm/h) show CC of 0.58, 0.59 and 0.60 and RMSE of 5.07, 5.47 and 5.03 mm/h for SI, PCT and the combination of SI and PCT, respectively, using linear regression, whereas nonlinear regression yields CC of 0.66, 0.64 and 0.71 and RMSE of 4.68, 5.78 and 4.02 mm/h for SI, PCT and the combination of SI and PCT, respectively.
Abstract:
The failure of atmospheric general circulation models (AGCMs) forced by prescribed sea surface temperature (SST) to simulate and predict the interannual variability of the Indian/Asian monsoon has been widely attributed to their inability to reproduce the actual SST-rainfall relationship in the warm Indo-Pacific oceans. This assessment is based on a comparison of the observed and simulated correlation between the rainfall and local SST. However, the observed SST-convection/rainfall relationship is nonlinear, so a linear measure such as the correlation is not appropriate. We show that the SST-rainfall relationship simulated by atmospheric and coupled general circulation models in IPCC AR4 is nonlinear, as observed, and realistic over the tropical West Pacific (WPO) and the Indian Ocean (IO). The SST-rainfall pattern simulated by the coupled versions of these models is rather similar to that from the corresponding atmospheric one, except for a shift of the entire pattern to colder/warmer SSTs when there is a cold/warm bias in the coupled version.
Abstract:
This paper discusses an approach for river mapping and flood evaluation based on multi-temporal time series analysis of satellite images utilizing pixel spectral information for image classification and region-based segmentation for extracting water-covered regions. Analysis of MODIS satellite images is applied in three stages: before flood, during flood and after flood. Water regions are extracted from the MODIS images using image classification (based on spectral information) and image segmentation (based on spatial information). Multi-temporal MODIS images from "normal" (non-flood) and flood time periods are processed in two steps. In the first step, image classifiers such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) separate the image pixels into water and non-water groups based on their spectral features. The classified image is then segmented using spatial features of the water pixels to remove the misclassified water. From the results obtained, we evaluate the performance of the method and conclude that the use of image classification (SVM and ANN) and region-based image segmentation is an accurate and reliable approach for the extraction of water-covered regions. (c) 2012 COSPAR. Published by Elsevier Ltd. All rights reserved.
Abstract:
Mountain waves in the stratosphere have been observed over elevated topographies using both nadir-looking and limb-viewing satellites. However, the characteristics of mountain waves generated over the Himalayan Mountain range and the adjacent Tibetan Plateau are relatively less explored. The present study reports on three-dimensional (3-D) properties of a mountain wave event that occurred over the western Himalayan region on 9 December 2008. Observations made by the Atmospheric Infrared Sounder on board the Aqua satellite and the Microwave Limb Sounder on board the Aura satellite are used to delineate the wave properties. The observed horizontal (λ_x, λ_y) and vertical (λ_z) wavelengths are 276 km (zonal), 289 km (meridional), and 25 km, respectively. Good agreement is found between the observed and modeled/analyzed vertical wavelength for a stationary gravity wave determined using the Modern Era Retrospective Analysis for Research and Applications (MERRA) reanalysis winds. The analysis of both the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis and MERRA winds shows that the waves are primarily forced by strong flow across the topography. Using the 3-D properties of the waves and the corrected temperature amplitudes, we estimated wave momentum fluxes of the order of ~0.05 Pa, which is in agreement with large-amplitude mountain wave events reported elsewhere. In this regard, the present study is expected to be informative for the gravity wave drag schemes employed in current general circulation models for this region.
Abstract:
In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.
Abstract:
In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but, the class proportions in this set are not the same as those in the test distribution to which the classifier will be actually applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods to deal with this situation. We empirically show that when the labeled training data is small, TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
Abstract:
The present approach uses stopwords and the gaps that occur between successive stopwords, formed by content words, as features for sentiment classification.
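A rough sketch of what such features might look like follows. The abstract does not say how a gap is encoded, so this example assumes that a gap is represented by the number of content words between two successive stopwords. The stopword list, the feature names (`sw=...`, `gap=...`) and the function name are all hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "is", "not", "and", "of", "but", "it", "was"}


def stopword_gap_features(text):
    """Return a bag of features made of the stopwords themselves and the
    lengths of the content-word gaps between successive stopwords."""
    features = Counter()
    gap = 0                # content words seen since the last stopword
    seen_stopword = False  # gaps are only defined between two stopwords
    for token in text.lower().split():
        if token in STOPWORDS:
            if seen_stopword:
                features[f"gap={gap}"] += 1
            features[f"sw={token}"] += 1
            seen_stopword = True
            gap = 0
        else:
            gap += 1
    return features
```

The resulting counts could be fed to any bag-of-features classifier. Content words before the first stopword and after the last one are ignored here because they do not lie between two stopwords.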
Abstract:
Time series classification deals with the problem of classification of data that is multivariate in nature. This means that one or more of the attributes is in the form of a sequence. The notion of similarity or distance, used in time series data, is significant and affects the accuracy, time, and space complexity of the classification algorithm. There exist numerous similarity measures for time series data, but each of them has its own disadvantages. Instead of relying upon a single similarity measure, our aim is to find the near optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to get the best performance. The weightage given to different similarity measures evolves over a number of generations so as to get the best combination. We test our approach on a number of benchmark time series datasets and present promising results.
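The combination step can be sketched as a weighted sum of several distance measures used inside a nearest-neighbour classifier. This is an illustration only: the three measures, the 1-NN classifier and the use of classification accuracy as the fitness are assumptions, the series are assumed to have equal length, and the genetic algorithm that would evolve the weight vector is not shown. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative set of measures; the paper's actual measures may differ.
MEASURES = (euclidean, manhattan, chebyshev)


def combined_distance(a, b, weights):
    """Weighted sum of the individual distance measures."""
    return sum(w * measure(a, b) for w, measure in zip(weights, MEASURES))


def predict_1nn(query, train, weights):
    """Label of the training series closest to `query` under the combined
    distance. `train` is a list of (series, label) pairs."""
    nearest = min(train, key=lambda item: combined_distance(query, item[0], weights))
    return nearest[1]


def fitness(weights, train, validation):
    """Fraction of validation series classified correctly; a genetic
    algorithm could use this value to rank candidate weight vectors."""
    hits = sum(predict_1nn(series, train, weights) == label
               for series, label in validation)
    return hits / len(validation)
```

A genetic algorithm would treat each weight vector as an individual, evaluate it with `fitness`, and apply selection, crossover and mutation over successive generations.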
Abstract:
Native species' response to the presence of invasive species is context specific. This response cannot be studied in isolation from the prevailing environmental stresses in invaded habitats such as seasonal drought. We investigated the combined effects of an invasive shrub Lantana camara L. (lantana), seasonal rainfall and species' microsite preferences on the growth and survival of 1,105 naturally established seedlings of native trees and shrubs in a seasonally dry tropical forest. Individuals were followed from April 2008 to February 2010, and growth and survival measured in relation to lantana density, seasonality of rainfall and species characteristics in a 50-ha permanent forest plot located in Mudumalai, southern India. We used a mixed effects modelling approach to examine seedling growth and generalized linear models to examine seedling survival. The overall relative height growth rate of established seedlings was found to be very low irrespective of the presence or absence of dense lantana. 22-month growth rate of dry forest species was lower under dense lantana while moist forest species were not affected by the presence of lantana thickets. 4-month growth rates of all species increased with increasing inter-census rainfall. Community results may be influenced by responses of the most abundant species, Catunaregam spinosa, whose growth rates were always lower under dense lantana. Overall seedling survival was high, increased with increasing rainfall and was higher for species with dry forest preference than for species with moist forest preference. The high survival rates of naturally established seedlings combined with their basal sprouting ability in this forest could enable the persistence of woody species in the face of invasive species.