960 resultados para data sets


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The production of rainfed crops in semi-arid tropics exhibits large variation in response to the variation in seasonal rainfall. There are several farm-level decisions such as the choice of cropping pattern, whether to invest in fertilizers, pesticides etc., the choice of the period for planting, plant population density etc. for which the appropriate choice (associated with maximum production or minimum risk) depends upon the nature of the rainfall variability or the prediction for a specific year. In this paper, we have addressed the problem of identifying the appropriate strategies for cultivation of rainfed groundnut in the Anantapur region in a semi-arid part of the Indian peninsula. The approach developed involves participatory research with active collaboration with farmers, so that the problems with perceived need are addressed with the modern tools and data sets available. Given the large spatial variation of climate and soil, the appropriate strategies are necessarily location specific. With the approach adopted, it is possible to tap the detailed location specific knowledge of the complex rainfed ecosystem and gain an insight into the variety of options of land use and management practices available to each category of stakeholders. We believe such a participatory approach is essential for identifying strategies that have a favourable cost-benefit ratio over the region considered and hence are associated with a high chance of acceptance by the stakeholders. (C) 2002 Elsevier Science Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a scheme for the compression of tree structured intermediate code consisting of a sequence of trees specified by a regular tree grammar. The scheme is based on arithmetic coding, and the model that works in conjunction with the coder is automatically generated from the syntactical specification of the tree language. Experiments on data sets consisting of intermediate code trees yield compression ratios ranging from 2.5 to 8, for file sizes ranging from 167 bytes to 1 megabyte.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP(mean average precision). We propose new, almost-lineartime algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain)in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. E.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization.The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a randomized algorithm for large scale SVM learning which solves the problem by iterating over random subsets of the data. Crucial to the algorithm for scalability is the size of the subsets chosen. In the context of text classification we show that, by using ideas from random projections, a sample size of O(log n) can be used to obtain a solution which is close to the optimal with a high probability. Experiments done on synthetic and real life data sets demonstrate that the algorithm scales up SVM learners, without loss in accuracy. 1

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Even though several techniques have been proposed in the literature for achieving multiclass classification using Support Vector Machine(SVM), the scalability aspect of these approaches to handle large data sets still needs much of exploration. Core Vector Machine(CVM) is a technique for scaling up a two class SVM to handle large data sets. In this paper we propose a Multiclass Core Vector Machine(MCVM). Here we formulate the multiclass SVM problem as a Quadratic Programming(QP) problem defining an SVM with vector valued output. This QP problem is then solved using the CVM technique to achieve scalability to handle large data sets. Experiments done with several large synthetic and real world data sets show that the proposed MCVM technique gives good generalization performance as that of SVM at a much lesser computational expense. Further, it is observed that MCVM scales well with the size of the data set.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis purpose. Power system disturbances are often caused by faults on transmission lines. When fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can thus be taken to minimize the power interruption and limit the impact of outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in the neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power systems. In this paper we compare SVMs with radial basis function neural networks (RBFNN) in data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-Bus equivalent EHV transmission system of the Indian Southern region is presented for indicating the improved generalization with the large margin classifiers in enhancing the efficacy of the chosen model.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An extension of the supramolecular synthon-based fragment approach (SBFA) method for transferability of multipole charge density parameters to include weak supramolecular synthons is proposed. In particular, the SBFA method is applied to C-H center dot center dot center dot O, C-H center dot center dot center dot F, and F center dot center dot center dot F containing synthons. A high resolution charge density study has been performed on 4-fluorobenzoic acid to build a synthon library for C-H center dot center dot center dot F infinite chain interactions. Libraries for C-H center dot center dot center dot O and F center dot center dot center dot F synthons were taken from earlier work. The SBFA methodology was applied successfully to 2- and 3-fluorobenzoic acids, data sets for which were collected in a routine manner at 100 K, and the modularity of the synthons was demonstrated. Cocrystals of isonicotinamide with all three fluorobenzoic acids were also studied with the SBFA method. The topological analysis of inter- and intramolecular interaction regions was performed using Bader's AIM approach. This study shows that the SBFA method is generally applicable to generate charge density maps using information from multiple intermolecular regions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Long-distance dispersal (LDD) events, although rare for most plant species, can strongly influence population and community dynamics. Animals function as a key biotic vector of seeds and thus, a mechanistic and quantitative understanding of how individual animal behaviors scale to dispersal patterns at different spatial scales is a question of critical importance from both basic and applied perspectives. Using a diffusion-theory based analytical approach for a wide range of animal movement and seed transportation patterns, we show that the scale (a measure of local dispersal) of the seed dispersal kernel increases with the organisms' rate of movement and mean seed retention time. We reveal that variations in seed retention time is a key determinant of various measures of LDD such as kurtosis (or shape) of the kernel, thinkness of tails and the absolute number of seeds falling beyond a threshold distance. Using empirical data sets of frugivores, we illustrate the importance of variability in retention times for predicting the key disperser species that influence LDD. Our study makes testable predictions linking animal movement behaviors and gut retention times to dispersal patterns and, more generally, highlights the potential importance of animal behavioral variability for the LDD of seeds.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

1. Dispersal ability of a species is a key ecological characteristic, affecting a range of processes from adaptation, community dynamics and genetic structure, to distribution and range size. It is determined by both intrinsic species traits and extrinsic landscape-related properties. 2. Using butterflies as a model system, the following questions were addressed: (i) given similar extrinsic factors, which intrinsic species trait(s) explain dispersal ability? (ii) can one of these traits be used as a proxy for dispersal ability? (iii) the effect of interactions between the traits, and phylogenetic relatedness, on dispersal ability. 3. Four data sets, using different measures of dispersal, were compiled from published literature. The first data set uses mean dispersal distances from capture-mark-recapture studies, and the other three use mobility indices. Data for six traits that can potentially affect dispersal ability were collected: wingspan, larval host plant specificity, adult habitat specificity, mate location strategy, voltinism and flight period duration. Each data set was subjected to both unifactorial, and multifactorial, phylogenetically controlled analyses. 4. Among the factors considered, wingspan was the most important determinant of dispersal ability, although the predictive powers of regression models were low. Voltinism and flight period duration also affect dispersal ability, especially in case of temperate species. Interactions between the factors did not affect dispersal ability, and phylogenetic relatedness was significant in one data set. 5. While using wingspan as the only proxy for dispersal ability maybe problematic, it is usually the only easily accessible species-specific trait for a large number of species. It can thus be a satisfactory proxy when carefully interpreted, especially for analyses involving many species from all across the world.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we explore a novel idea of using high dynamic range (HDR) technology for uncertainty visualization. We focus on scalar volumetric data sets where every data point is associated with scalar uncertainty. We design a transfer function that maps each data point to a color in HDR space. The luminance component of the color is exploited to capture uncertainty. We modify existing tone mapping techniques and suitably integrate them with volume ray casting to obtain a low dynamic range (LDR) image. The resulting image is displayed on a conventional 8-bits-per-channel display device. The usage of HDR mapping reveals fine details in uncertainty distribution and enables the users to interactively study the data in the context of corresponding uncertainty information. We demonstrate the utility of our method and evaluate the results using data sets from ocean modeling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis purpose. Power system disturbances are often caused by faults on transmission lines. When fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can thus be taken to minimize the power interruption and limit the impact of outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in the neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power systems. In this paper we compare SVMs with radial basis function neural networks (RBFNN) in data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-Bus equivalent EHV transmission system of the Indian Southern region is presented for indicating the improved generalization with the large margin classifiers in enhancing the efficacy of the chosen model.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Also, scoring functions based on accessible surface area and amino acid neighbourhood considerations were used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size, clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also exhibits that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; thus also has a potential to be used in the protein `structure prediction' problem.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T-2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.