89 results for datasets


Relevance:

10.00%

Publisher:

Abstract:

We assess the impact of projected climate change on forest ecosystems in India. The assessment is based on climate projections from the Regional Climate Model of the Hadley Centre (HadRM3) and the dynamic global vegetation model IBIS, for the A2 and B2 scenarios. According to the model projections, 39% of forest grids are likely to undergo vegetation type change under the A2 scenario, and 34% under the B2 scenario, by the end of this century. In many forest-dominated states the fractions are higher: up to 73% of forested grids in Chhattisgarh, 67% in Karnataka and 62% in Andhra Pradesh are projected to undergo change. Net Primary Productivity (NPP) is projected to increase by 68.8% and 51.2% under the A2 and B2 scenarios, respectively, and soil organic carbon (SOC) by 37.5% for A2 and 30.2% for B2. Based on the dynamic global vegetation modelling, we present a forest vulnerability index for India built from observed datasets of forest density and forest biodiversity, together with model-predicted vegetation type shift estimates for forested grids. The vulnerability index suggests that the upper Himalayas, the northern and central Western Ghats and parts of central India are most vulnerable to the projected impacts of climate change, while the Northeastern forests are more resilient. Our study thus points to the need for developing and implementing adaptation strategies to reduce the vulnerability of forests to projected climate change.
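
A grid-level index of this kind can be illustrated as a weighted combination of normalized indicators. The sketch below is purely illustrative, not the paper's formulation: the weights and the exact combination rule are hypothetical assumptions.

```python
# Illustrative sketch (NOT the paper's exact index): a grid-level forest
# vulnerability score combining normalized forest density, biodiversity and a
# model-predicted vegetation-shift indicator. All weights are hypothetical.

def vulnerability_index(density, biodiversity, shift_predicted,
                        w_density=0.4, w_biodiv=0.3, w_shift=0.3):
    """All inputs normalized to [0, 1]; returns a score in [0, 1]."""
    return (w_density * (1.0 - density)        # sparse forests: more vulnerable
            + w_biodiv * (1.0 - biodiversity)  # low diversity: more vulnerable
            + w_shift * shift_predicted)       # projected vegetation-type change

# A dense, diverse grid with no projected shift scores low:
low = vulnerability_index(density=0.9, biodiversity=0.8, shift_predicted=0.0)
# A sparse, low-diversity grid with a projected shift scores high:
high = vulnerability_index(density=0.2, biodiversity=0.3, shift_predicted=1.0)
print(round(low, 3), round(high, 3))
```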

Relevance:

10.00%

Publisher:

Abstract:

The use of shear wave velocity data as a field index for evaluating the liquefaction potential of sands is receiving increased attention, because shear wave velocity and liquefaction resistance are similarly influenced by many of the same factors, such as void ratio, state of stress, stress history and geologic age. In this paper, a support vector machine (SVM) based classification approach is used to assess liquefaction potential from actual shear wave velocity data. This approach approximately implements the structural risk minimization (SRM) induction principle, which aims at minimizing a bound on the generalization error of a model rather than minimizing only the mean square error over the data set. Here, the SVM is used as a classification tool to predict the liquefaction potential of a soil from shear wave velocity. The dataset consists of soil characteristics such as effective vertical stress (sigma'(v0)), soil type and shear wave velocity (V-s), and earthquake parameters such as peak horizontal acceleration (a(max)) and earthquake magnitude (M). Out of the 186 available records, 130 are used for training and the remaining 56 for testing the model. The study indicates that the SVM can successfully model the complex relationship between seismic parameters, soil parameters and liquefaction potential. The model based on soil characteristics uses sigma'(v0), soil type, V-s, a(max) and M as input parameters; the model based on shear wave velocity alone uses V-s, a(max) and M. It is demonstrated that V-s alone can be used to predict the liquefaction potential of a soil using a support vector machine model. (C) 2010 Elsevier B.V. All rights reserved.
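
The classification setup can be sketched with a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a simplification, not the paper's model: the four data points below are synthetic, not the 186 field cases, and the feature scaling is an assumption.

```python
# Minimal sketch of the SVM classification setup: a linear classifier trained
# by sub-gradient descent on the regularized hinge loss, with illustrative
# features (shear wave velocity Vs, peak acceleration a_max, magnitude M).
# The data are synthetic, not the field records used in the paper.

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the regularizer contributes
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

# Synthetic, roughly scaled features: [Vs/100 (m/s), a_max (g), M/10]
X = [[1.2, 0.40, 0.75], [1.4, 0.35, 0.70],   # low Vs, strong shaking: liquefied (+1)
     [2.6, 0.10, 0.55], [2.9, 0.08, 0.50]]   # high Vs, weak shaking: no liquefaction (-1)
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1 for xi in X]
print(preds)
```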

Relevance:

10.00%

Publisher:

Abstract:

This paper studies the problem of constructing robust classifiers when the training data are plagued with uncertainty. The problem is posed as a Chance-Constrained Program (CCP) which ensures that the uncertain data points are classified correctly with high probability. Unfortunately, such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP as a convex second order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only Chebyshev-based relaxations had been exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Due to this more efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points, and error measures for evaluating classifiers robust to uncertain data, are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform the state-of-the-art in many cases.
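
To see how moment information enters such a probabilistic constraint, here is a numeric check using the simpler Chebyshev relaxation that the paper improves upon (not the Bernstein scheme itself); the hyperplane and moments below are made up for illustration.

```python
# Illustrative check of the Chebyshev-style probabilistic constraint: an
# uncertain point with mean mu and covariance Sigma is classified correctly
# with probability >= eta whenever
#   y * (w . mu + b) >= kappa * sqrt(w^T Sigma w),  kappa = sqrt(eta / (1 - eta)).
# (The paper's Bernstein bounds are tighter; this is the baseline relaxation.)

import math

def chebyshev_feasible(w, b, mu, Sigma, y, eta=0.9):
    wSw = sum(w[i] * Sigma[i][j] * w[j]
              for i in range(len(w)) for j in range(len(w)))
    kappa = math.sqrt(eta / (1.0 - eta))
    lhs = y * (sum(wi * mi for wi, mi in zip(w, mu)) + b)
    return lhs >= kappa * math.sqrt(wSw)

w, b = [1.0, 1.0], -1.0
Sigma = [[0.05, 0.0], [0.0, 0.05]]                        # small uncertainty ellipsoid
ok = chebyshev_feasible(w, b, [2.0, 2.0], Sigma, y=+1)    # far on the correct side
bad = chebyshev_feasible(w, b, [1.1, 0.2], Sigma, y=+1)   # too close to the boundary
print(ok, bad)
```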

Relevance:

10.00%

Publisher:

Abstract:

Due to its wide applicability, semi-supervised learning is an attractive method for using unlabeled data in classification. In this work, we present a semi-supervised support vector classifier designed using a quasi-Newton method for nonsmooth convex functions. The proposed algorithm is suitable for dealing with very large numbers of examples and features. Numerical experiments on various benchmark datasets show that the proposed algorithm is fast and gives improved generalization performance over existing methods. Further, a non-linear semi-supervised SVM is proposed based on a multiple label switching scheme. This non-linear semi-supervised SVM converges faster and improves generalization performance on several benchmark datasets. (C) 2010 Elsevier Ltd. All rights reserved.
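
The core semi-supervised loop can be sketched as iterative label re-assignment: fit a rule on labeled plus tentatively labeled data, relabel the unlabeled points, and repeat until no label switches. This is a deliberately simplified stand-in (a nearest-centroid rule instead of the paper's quasi-Newton SVM solver); the data are invented.

```python
# Simplified sketch of the label-switching idea: unlabeled points get
# tentative labels from the current linear rule, the rule is refit, and the
# loop repeats until the tentative labels stabilize. A nearest-centroid
# linear rule stands in for the SVM; all data below are synthetic.

def centroid_rule(X, y):
    """Fit a nearest-centroid linear rule; returns (w, b)."""
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == -1]
    cp = [sum(c) / len(pos) for c in zip(*pos)]
    cn = [sum(c) / len(neg) for c in zip(*neg)]
    w = [p - n for p, n in zip(cp, cn)]
    b = -0.5 * sum(wi * (p + n) for wi, p, n in zip(w, cp, cn))
    return w, b

def self_train(Xl, yl, Xu, max_iter=20):
    yu = [1] * len(Xu)                       # arbitrary initial labels
    for _ in range(max_iter):
        w, b = centroid_rule(Xl + Xu, yl + yu)
        new = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
               for x in Xu]
        if new == yu:                        # no label switched: converged
            break
        yu = new
    return yu

Xl = [[0.0, 0.0], [4.0, 4.0]]                # one labeled point per class
yl = [-1, 1]
Xu = [[0.5, 0.2], [3.8, 3.5], [0.1, 0.6], [4.2, 3.9]]
labels = self_train(Xl, yl, Xu)
print(labels)
```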

Relevance:

10.00%

Publisher:

Abstract:

We have analysed the diurnal cycle of rainfall over the Indian region (10S-35N, 60E-100E) using both satellite and in-situ data, and found many interesting features associated with this fundamental, yet under-explored, mode of variability. Since there is a distinct and strong diurnal mode of variability associated with Indian summer monsoon rainfall, we evaluate the ability of the Weather Research and Forecasting (WRF) model to simulate the observed diurnal rainfall characteristics. The model (at 54 km grid spacing) is integrated for the month of July 2006, since this period was particularly favourable for studying the diurnal cycle. We first evaluate the sensitivity of the model to the prescribed sea surface temperature (SST), using two different SST datasets, namely Final Analyses (FNL) and Real-Time Global (RTG). With RTG SST, the rainfall simulation over central India (CI) was significantly better than with FNL; over the Bay of Bengal (BoB), on the other hand, rainfall simulated with FNL was marginally better than with RTG. Since the overall performance of RTG SST was better than that of FNL, it was used for all further simulations. Next, we investigated the role of the convective parameterization scheme in the simulation of the diurnal cycle of rainfall, and found that the Kain-Fritsch (KF) scheme performs significantly better than the Betts-Miller-Janjić (BMJ) and Grell-Devenyi schemes. We also studied the impact of the other physical parameterizations, namely microphysics, boundary layer, land surface and radiation, and thereby identified the “best” model configuration. This “best” configuration was then used for a sensitivity study on the role of various convective components of the KF scheme; in particular, we studied the effects of convective downdrafts, the convective timescale and the feedback fraction on the simulated diurnal cycle of rainfall.
The “best” model simulations show, in general, good agreement with observations. Specifically: (i) over CI, the simulated diurnal rainfall peak is at 1430 IST, compared to the observed 1430-1730 IST peak; (ii) over the Western Ghats and the Burmese mountains, the model likewise simulates a diurnal rainfall peak at 1430 IST, as opposed to the observed 1430-1730 IST peak; (iii) over Sumatra, both model and observations show a diurnal peak at 1730 IST; and (iv) the observed southward-propagating diurnal rainfall bands over the BoB are only weakly simulated by WRF. Besides the diurnal cycle of rainfall, the mean spatial pattern of total rainfall, and its partitioning between convective and stratiform components, are also well simulated. The “best” configuration was used to conduct two nested simulations with one-way, three-level nesting (54-18-6 km) over CI and BoB. While the 54 km and 18 km simulations were conducted for the whole of July 2006, the 6 km simulation was carried out for the period 18-24 July 2006. The results of our coarse- and fine-scale numerical simulations of the diurnal cycle of monsoon rainfall will be discussed.
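
The basic diagnostic behind statements like "peak at 1430 IST" is a diurnal composite: average each clock hour across all days, then locate the maximum. A minimal sketch, with a synthetic afternoon-peaked series standing in for the model or satellite rainfall:

```python
# Sketch of how a diurnal cycle is extracted from hourly rainfall: composite
# (average) each clock hour across all days, then find the peak hour. The
# synthetic series below peaks in the afternoon, loosely mimicking the
# land-rainfall peak described above.

def diurnal_composite(hourly_rain):
    """hourly_rain: iterable of (hour_of_day, rain) samples spanning many days."""
    totals, counts = [0.0] * 24, [0] * 24
    for hour, rain in hourly_rain:
        totals[hour] += rain
        counts[hour] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# Synthetic month: afternoon-peaked rainfall, identical every day.
samples = [(h, max(0.0, 5.0 - abs(h - 15))) for _ in range(30) for h in range(24)]
composite = diurnal_composite(samples)
peak_hour = composite.index(max(composite))
print(peak_hour)
```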

Relevance:

10.00%

Publisher:

Abstract:

Background: Temporal analysis of gene expression data has been limited to identifying genes whose expression varies with time and/or correlations between genes that have similar temporal profiles. Often, these methods do not consider the underlying network constraints that connect the genes. It is becoming increasingly evident that interactions change substantially with time. Thus far, there has been no systematic method to relate temporal changes in gene expression to the dynamics of the interactions between genes. Information on interaction dynamics would open up possibilities for discovering new mechanisms of regulation, by providing valuable insight into identifying time-sensitive interactions, and would permit studies of the effect of a genetic perturbation. Results: We present NETGEM, a tractable model rooted in Markov dynamics, for analyzing the dynamics of the interactions between proteins based on the dynamics of the expression changes of the genes that encode them. The model treats the interaction strengths as random variables modulated by suitable priors. This approach is necessitated by the extremely small sample size of the datasets relative to the number of interactions. The model is amenable to a linear-time algorithm for efficient inference. Using temporal gene expression data, NETGEM was successful in identifying (i) temporal interactions and determining their strength, (ii) functional categories of the actively interacting partners and (iii) dynamics of interactions in perturbed networks. Conclusions: NETGEM represents an optimal trade-off between model complexity and data requirements. It was able to deduce actively interacting genes and functional categories from temporal gene expression data, and it permits inference by incorporating the information available in perturbed networks.
Given that the inputs to NETGEM are only the network and the temporal variation of the nodes, the algorithm promises to have widespread applications beyond biological systems. The source code for NETGEM is available from https://github.com/vjethava/NETGEM
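
The Markov-dynamics idea can be caricatured for a single edge: model its activity as a two-state (inactive/active) chain whose transitions are penalized, score each time point by co-expression evidence, and decode the most probable activity path. This is a heavily simplified sketch, not the released NETGEM implementation; the emission score and the evidence series are invented.

```python
# Heavily simplified sketch of per-edge Markov dynamics (NOT the NETGEM code):
# an edge is inactive (0) or active (1) at each time point; a two-state Markov
# prior discourages rapid switching, and Viterbi decoding recovers the most
# probable activity path given per-time co-expression evidence in [0, 1].

import math

def viterbi_edge_activity(evidence, p_stay=0.7):
    """evidence[t] near 1 favours 'active' at time t; returns the 0/1 path."""
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)

    def emit(state, e):   # crude log-likelihood: state 1 expects e near 1, state 0 near 0
        return -abs(e - state)

    score = [emit(0, evidence[0]), emit(1, evidence[0])]
    back = []
    for e in evidence[1:]:
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new.append(cands[best] + emit(s, e))
            ptr.append(best)
        score = new
        back.append(ptr)
    path = [0 if score[0] >= score[1] else 1]
    for ptr in reversed(back):            # backtrack through the pointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Co-expression evidence: weak early, strong in the middle, weak again.
evidence = [0.1, 0.0, 0.9, 1.0, 0.8, 0.1, 0.0]
path = viterbi_edge_activity(evidence)
print(path)
```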

Relevance:

10.00%

Publisher:

Abstract:

This paper addresses the problem of maximum margin classification given the moments of the class conditional densities and the false positive and false negative error rates. Using Chebyshev inequalities, the problem can be posed as a second order cone programming problem. The dual of the formulation leads to a geometric optimization problem, that of computing the distance between two ellipsoids, which is solved by an iterative algorithm. The formulation is extended to non-linear classifiers using kernel methods. The resulting classifiers are applied to the classification of unbalanced datasets with asymmetric misclassification costs. Experimental results on benchmark datasets show the efficacy of the proposed method.
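
The geometric dual can be illustrated numerically: the distance between two ellipsoids E_i = {c_i + L_i u : ||u|| <= 1} computed by a generic projected-gradient iteration. This is a stand-in for the paper's own iterative algorithm; the diagonal L matrices and centres are invented to keep the arithmetic simple.

```python
# Illustrative sketch of the dual geometric problem: distance between two
# ellipsoids E_i = {c_i + L_i u : ||u|| <= 1}, minimized here by projected
# gradient descent on ||p1 - p2||^2 (a generic method, not the paper's
# algorithm). Diagonal L matrices keep everything pure-Python, in 2D.

import math

def ellipsoid_distance(c1, L1, c2, L2, lr=0.05, iters=200):
    u, v = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iters):
        p1 = [c1[i] + L1[i] * u[i] for i in range(2)]
        p2 = [c2[i] + L2[i] * v[i] for i in range(2)]
        d = [p1[i] - p2[i] for i in range(2)]
        u = [u[i] - lr * 2 * L1[i] * d[i] for i in range(2)]   # gradient steps
        v = [v[i] + lr * 2 * L2[i] * d[i] for i in range(2)]
        for z in (u, v):                                       # project onto unit ball
            n = math.hypot(z[0], z[1])
            if n > 1.0:
                z[0], z[1] = z[0] / n, z[1] / n
    p1 = [c1[i] + L1[i] * u[i] for i in range(2)]
    p2 = [c2[i] + L2[i] * v[i] for i in range(2)]
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

# Ellipsoids centred 6 apart on the x-axis, semi-axes (2, 1) and (1, 1):
# the gap along x is 6 - 2 - 1 = 3.
dist = ellipsoid_distance([0.0, 0.0], [2.0, 1.0], [6.0, 0.0], [1.0, 1.0])
print(round(dist, 2))
```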

Relevance:

10.00%

Publisher:

Abstract:

Support Vector Clustering has gained considerable attention from researchers in exploratory data analysis due to its firm theoretical foundation in statistical learning theory. However, the hard partitioning of the data set achieved by Support Vector Clustering may not be acceptable in real-world scenarios. Rough Support Vector Clustering is an extension of Support Vector Clustering that attains a soft partitioning of the data set, but the quadratic programming problem it involves makes it computationally expensive on large datasets. In this paper, we propose the Rough Core Vector Clustering algorithm, a computationally efficient realization of Rough Support Vector Clustering. Here, the Rough Support Vector Clustering problem is formulated as an approximate Minimum Enclosing Ball problem and solved using an approximate Minimum Enclosing Ball algorithm. Experiments on several large multi-class datasets, such as Forest Cover Type, and other multi-class datasets taken from the LIBSVM page show that the proposed strategy is efficient and finds meaningful soft cluster abstractions that provide better generalization performance than the SVM classifier.
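
The approximate Minimum Enclosing Ball computation underlying Core Vector methods can be sketched with the classic Badoiu-Clarkson iteration: repeatedly step the centre toward the farthest point with a shrinking step size. Shown here in the plane, without the kernels and outlier margin the paper adds on top.

```python
# Sketch of an approximate Minimum Enclosing Ball via the Badoiu-Clarkson
# iteration: move the centre toward the current farthest point with step
# 1/(t+1). (The paper builds Rough Core Vector Clustering on this kind of
# approximation, with kernels; this is the plain geometric core.)

import math

def approx_meb(points, iters=500):
    c = list(points[0])                        # start at an arbitrary point
    for t in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(c, p))   # farthest point
        c = [ci + (fi - ci) / (t + 1) for ci, fi in zip(c, far)]
    radius = max(math.dist(c, p) for p in points)
    return c, radius

# Four corners of a square: the exact MEB is centred at (1, 1), radius sqrt(2).
c, r = approx_meb([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print([round(x, 2) for x in c], round(r, 2))
```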

Relevance:

10.00%

Publisher:

Abstract:

In this paper we propose a novel, scalable, clustering-based Ordinal Regression formulation, which is an instance of a Second Order Cone Program (SOCP) with one Second Order Cone (SOC) constraint. The main contribution of the paper is a fast algorithm, CB-OR, which solves the proposed formulation more efficiently than general purpose solvers. A second contribution is to pose the problem of focused crawling as a large scale Ordinal Regression problem and to solve it using the proposed CB-OR. Focused crawling is an efficient mechanism for discovering resources of interest on the web. Posing focused crawling as an Ordinal Regression problem avoids the need for a negative class and a topic hierarchy, which are the main drawbacks of existing focused crawling methods. Experiments on large synthetic and benchmark datasets show the scalability of CB-OR, and also show that the proposed focused crawler outperforms the state-of-the-art.
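
The clustering-based view can be caricatured as follows: each ordinal relevance level is summarized by cluster centroids, and a new page is assigned the level of its nearest centroid. This toy sketch only illustrates the data reduction; CB-OR itself solves an SOCP over such cluster summaries, which is not reproduced here, and the centroids below are invented.

```python
# Toy caricature of the clustering-based ordinal setup (NOT CB-OR itself,
# which solves an SOCP): relevance levels are summarized by centroids, and a
# page's feature vector is assigned the ordinal level of its nearest centroid.

import math

def nearest_level(x, centroids):
    """centroids: list of (level, centre) pairs; returns the predicted level."""
    return min(centroids, key=lambda lc: math.dist(x, lc[1]))[0]

# Hypothetical 2-feature page summaries, one centroid per relevance level.
centroids = [(0, (0.1, 0.1)),   # irrelevant
             (1, (0.5, 0.4)),   # weakly relevant
             (2, (0.9, 0.8))]   # on-topic
lv_hi = nearest_level((0.85, 0.75), centroids)
lv_lo = nearest_level((0.15, 0.05), centroids)
print(lv_hi, lv_lo)
```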

Relevance:

10.00%

Publisher:

Abstract:

Applications in various domains often produce very large and frequently high-dimensional datasets. Successful algorithms must avoid the curse of dimensionality while remaining computationally efficient, and finding useful patterns in large datasets has attracted considerable interest recently. The primary goal of this paper is to implement an efficient hybrid tree-based clustering method, built on the CF-Tree and the KD-Tree, and to combine it with KNN classification. The implementation must balance several concerns: good accuracy, low space usage and low running time. We evaluate time and space efficiency, sensitivity to data input order, and clustering quality through several experiments.
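
The space efficiency of the CF-Tree side comes from the Clustering Feature triple CF = (N, LS, SS): point count, linear sum and squared sum, from which a cluster's centroid and radius are recovered without storing the points. A minimal sketch of those updates (the tree structure itself is omitted):

```python
# Sketch of the Clustering Feature (CF) summary a CF-Tree node maintains:
# CF = (N, LS, SS). Centroid and radius are recovered from the triple alone,
# which is why the tree can cluster data it never stores.

import math

def cf_add(cf, x):
    """Absorb point x into the CF triple."""
    n, ls, ss = cf
    return (n + 1,
            [a + b for a, b in zip(ls, x)],
            ss + sum(v * v for v in x))

def cf_centroid(cf):
    n, ls, _ = cf
    return [v / n for v in ls]

def cf_radius(cf):
    """RMS distance of the absorbed points from the centroid."""
    n, ls, ss = cf
    c = cf_centroid(cf)
    return math.sqrt(max(0.0, ss / n - sum(v * v for v in c)))

cf = (0, [0.0, 0.0], 0.0)
for p in [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)]:
    cf = cf_add(cf, p)
print(cf_centroid(cf), round(cf_radius(cf), 3))
```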

Relevance:

10.00%

Publisher:

Abstract:

This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with high probability, and it leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets compared to the state-of-the-art classifier, the Support Vector Machine (SVM). Experiments on real-world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in training time while achieving similar accuracies.
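
For a spherical cluster the per-cluster constraint reduces to a single inequality in the cluster mean and standard deviation. The sketch below estimates those moments from a toy cluster and checks the constraint for a fixed hyperplane; the kappa value and the data are illustrative assumptions, and no SOCP is actually solved.

```python
# Numeric sketch of the per-cluster cone constraint: for a spherical cluster
# with mean mu and standard deviation sigma, the Chebyshev-Cantelli style
# constraint  y * (w . mu + b) >= 1 + kappa * sigma * ||w||  covers every
# point of the cluster with high probability, so one constraint replaces
# many per-point constraints. kappa and the data below are illustrative.

import math

def cluster_moments(points):
    """Mean vector and a single (spherical) per-coordinate std deviation."""
    n, d = len(points), len(points[0])
    mu = [sum(p[i] for p in points) / n for i in range(d)]
    var = sum(sum((p[i] - mu[i]) ** 2 for i in range(d)) for p in points) / (n * d)
    return mu, math.sqrt(var)

def cone_constraint_holds(w, b, mu, sigma, y, kappa=3.0):
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    lhs = y * (sum(wi * mi for wi, mi in zip(w, mu)) + b)
    return lhs >= 1 + kappa * sigma * norm_w

pos_cluster = [(4.0, 4.2), (4.4, 3.8), (3.9, 4.1)]   # tight cluster of +1 points
mu, sigma = cluster_moments(pos_cluster)
ok = cone_constraint_holds([1.0, 1.0], -4.0, mu, sigma, y=+1)
print(ok)
```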

Relevance:

10.00%

Publisher:

Abstract:

In this paper we propose a new algorithm for learning polyhedral classifiers. In contrast to existing methods for learning polyhedral classifiers, which solve a constrained optimization problem, our method solves an unconstrained one. The method is based on a logistic-function model of the posterior probability. We propose an alternating optimization algorithm, SPLA1 (Single Polyhedral Learning Algorithm 1), which maximizes the log-likelihood of the training data to learn the parameters. We also extend the method, in SPLA2, to make it independent of any user-specified parameter (e.g., the number of hyperplanes required to form the polyhedral set). We show the effectiveness of our approach with experiments on various synthetic and real-world datasets, comparing it with a standard decision tree method (OC1) and a constrained-optimization-based method for learning polyhedral sets.
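
What a polyhedral classifier computes can be sketched on the prediction side: a point is positive only if it satisfies every hyperplane of the polyhedral set. The logistic-of-minimum posterior below is one simple way to attach a probability to that rule, shown for illustration; the learning step (SPLA1/SPLA2's alternating maximization) is not reproduced.

```python
# Sketch of polyhedral classification (prediction only; the SPLA learning
# algorithms are not reproduced): a point is labelled positive iff it lies on
# the positive side of EVERY hyperplane, i.e. inside the polyhedral set.

import math

def polyhedral_predict(x, hyperplanes):
    """hyperplanes: list of (w, b); positive iff w.x + b > 0 for all of them."""
    return 1 if all(sum(wi * xi for wi, xi in zip(w, x)) + b > 0
                    for w, b in hyperplanes) else -1

def posterior_positive(x, hyperplanes):
    """Illustrative logistic-style posterior: sigmoid of the worst margin."""
    m = min(sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in hyperplanes)
    return 1.0 / (1.0 + math.exp(-m))

# A triangular positive region: x > 0, y > 0, x + y < 4.
H = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 4.0)]
inside = polyhedral_predict([1.0, 1.0], H)
outside = polyhedral_predict([3.0, 3.0], H)
print(inside, outside)
```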

Relevance:

10.00%

Publisher:

Abstract:

In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they differ from the frequency counts of Mannila et al. [1], who count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.

Relevance:

10.00%

Publisher:

Abstract:

During summer, the northern Indian Ocean exhibits significant atmospheric intraseasonal variability associated with active and break phases of the monsoon in the 30-90 day band. In this paper, we investigate the mechanisms behind the Sea Surface Temperature (SST) signature of this atmospheric variability, using a combination of observational datasets and Ocean General Circulation Model sensitivity experiments. In addition to the previously reported intraseasonal SST signature in the Bay of Bengal, observations show clear SST signals in the Arabian Sea related to the active/break cycle of the monsoon. As the atmospheric intraseasonal oscillation moves northward, SST variations appear first at the southern tip of India (day 0), then in the Somali upwelling region (day 10), the northern Bay of Bengal (day 19) and finally the Oman upwelling region (day 23). The Bay of Bengal and Oman signals are most clearly associated with the monsoon active/break index, whereas the relationship with the signals near the Somali upwelling and the southern tip of India is weaker. In agreement with previous studies, we find that heat flux variations drive most of the intraseasonal SST variability in the Bay of Bengal, both in our model (regression coefficient 0.9, against ~0.25 for wind stress) and in observations (regression coefficient 0.8); ~60% of the heat flux variation is due to shortwave radiation and ~40% to latent heat flux. On the other hand, both observations and model results indicate a prominent role for dynamical oceanic processes in the Arabian Sea: wind-stress variations force about 70-100% of the SST intraseasonal variations there, through the modulation of oceanic processes (entrainment, mixing, Ekman pumping, lateral advection). Our ~100 km resolution model suggests that internal oceanic variability (i.e. eddies) contributes substantially to intraseasonal variability at small scales in the Somali upwelling region, but does not contribute to large-scale intraseasonal SST variability, owing to its small spatial scale and its random phase relation to the active/break monsoon cycle. The effect of oceanic eddies, however, remains to be explored at higher spatial resolution.
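
The regression coefficients quoted above come from a standard least-squares regression of the filtered SST signal onto a forcing index. A minimal sketch of that diagnostic, on synthetic series invented so the slope is exactly 0.9:

```python
# Sketch of the regression diagnostic: the intraseasonal SST signal is
# regressed onto a forcing index, and the slope measures how much of the SST
# variability that forcing explains. Both series below are synthetic; the SST
# is constructed as 0.9 x the flux index, mimicking the quoted coefficient.

def regression_coefficient(y, x):
    """Least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

flux = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]   # synthetic heat-flux index
sst = [0.9 * f for f in flux]                         # SST response at 0.9 amplitude
coef = regression_coefficient(sst, flux)
print(round(coef, 2))
```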

Relevance:

10.00%

Publisher:

Abstract:

Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used structural alphabet, Protein Blocks (PBs), was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from the PB-based alignments are used to obtain a three-dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark MSTA datasets show that the alignment quality is better than that of MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparisons with other rigid-body and flexible MSTAs also indicate that mulPBA alignments are superior to most rigid-body MSTAs and highly comparable to the flexible alignment methods. (C) 2012 Elsevier Masson SAS. All rights reserved.
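
Once backbones are encoded as 1D PB strings, pairwise comparison reduces to standard global sequence alignment. A minimal Needleman-Wunsch sketch on two short PB-like strings, with flat match/mismatch scores standing in for the PB substitution table mulPBA would use:

```python
# Sketch of the sequence-alignment machinery such PB-based methods build on:
# global (Needleman-Wunsch) dynamic programming over two 1D strings. The flat
# match/mismatch/gap scores are placeholders for a PB substitution table.

def needleman_wunsch(s1, s2, match=2, mismatch=-1, gap=-2):
    n, m = len(s1), len(s2)
    # F[i][j] = best score aligning prefixes s1[:i] and s2[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align the two letters
                          F[i - 1][j] + gap,      # gap in s2
                          F[i][j - 1] + gap)      # gap in s1
    return F[n][m]

# Two similar PB-like strings (letters a-p) differing by one position:
score = needleman_wunsch("mmklnop", "mmklmop")
print(score)
```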