65 results for data sets


Relevance: 60.00%

Publisher:

Abstract:

Our study concerns an important current problem, that of diffusion of information in social networks. This problem has received significant attention from the Internet research community in recent times, driven by many potential applications such as viral marketing and sales promotions. In this paper, we focus on the target set selection problem, which involves discovering a small subset of influential players in a given social network, to perform a certain task of information diffusion. The target set selection problem manifests in two forms: 1) the top-k nodes problem and 2) the lambda-coverage problem. In the top-k nodes problem, we are required to find a set of k key nodes that would maximize the number of nodes being influenced in the network. The lambda-coverage problem is concerned with finding a set of key nodes of minimal size that can influence a given percentage lambda of the nodes in the entire network. We propose a new way of solving these problems using the concept of the Shapley value, a well known solution concept in cooperative game theory. Our approach leads to algorithms which we call the ShaPley value-based Influential Nodes (SPIN) algorithms for solving the top-k nodes problem and the lambda-coverage problem. We compare the performance of the proposed SPIN algorithms with well known algorithms in the literature. Through extensive experimentation on four synthetically generated random graphs and six real-world data sets (Celegans, Jazz, NIPS coauthorship, Netscience, High-Energy Physics, and Political Books), we show that the proposed SPIN approach is more powerful and computationally efficient. Note to Practitioners: In recent times, social networks have received a high level of attention due to their proven ability to improve the performance of web search, recommendations in collaborative filtering systems, the spread of a technology in the market using viral marketing techniques, etc.
It is well known that the interpersonal relationships (or ties or links) between individuals cause change or improvement in the social system, because the decisions made by individuals are influenced heavily by the behavior of their neighbors. An interesting and key problem in social networks is to discover the most influential nodes, those which can strongly influence the other nodes in the network. This problem is called the target set selection problem and has two variants: 1) the top-k nodes problem, where we are required to identify a set of k influential nodes that maximize the number of nodes being influenced in the network, and 2) the lambda-coverage problem, which involves finding a set of influential nodes of minimum size that can influence a given percentage lambda of the nodes in the entire network. There are many existing algorithms in the literature for solving these problems. In this paper, we propose a new algorithm based on a novel interpretation of information diffusion in a social network as a cooperative game. Using this analogy, we develop an algorithm based on the Shapley value of the underlying cooperative game. The proposed algorithm outperforms the existing algorithms in terms of generality, computational complexity, or both. Our results are validated through extensive experimentation on both synthetically generated and real-world data sets.
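The game-theoretic ranking described above can be illustrated with a small Monte Carlo sketch. The characteristic function below (1-hop coverage of a coalition) is a deliberately simplified stand-in for the paper's diffusion-based payoff, and Shapley values are estimated by sampling random permutations rather than by the optimized SPIN computations:

```python
import random
from collections import defaultdict

def neighborhood_value(graph, coalition):
    """Toy characteristic function: number of nodes a coalition covers
    (itself plus immediate neighbors). A stand-in for the paper's
    information-diffusion payoff, which is not reproduced here."""
    covered = set(coalition)
    for node in coalition:
        covered.update(graph.get(node, ()))
    return len(covered)

def approximate_shapley(graph, samples=2000, seed=0):
    """Estimate each node's Shapley value by averaging its marginal
    contribution over randomly sampled permutations of the nodes."""
    rng = random.Random(seed)
    nodes = list(graph)
    values = defaultdict(float)
    for _ in range(samples):
        rng.shuffle(nodes)
        coalition, prev = set(), 0
        for node in nodes:
            coalition.add(node)
            cur = neighborhood_value(graph, coalition)
            values[node] += cur - prev   # marginal contribution of `node`
            prev = cur
    return {n: v / samples for n, v in values.items()}

def top_k_nodes(graph, k):
    """Rank nodes by estimated Shapley value and return the top k."""
    shapley = approximate_shapley(graph)
    return sorted(shapley, key=shapley.get, reverse=True)[:k]
```

On a star graph the center accumulates the largest marginal contributions, so it is returned first; by efficiency, the estimated values sum to the grand coalition's payoff.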

Relevance: 60.00%

Publisher:

Abstract:

The modularity of the supramolecular synthon is used to obtain transferability of charge-density-derived multipolar parameters for structural fragments, thus creating an opportunity to derive charge density maps for new compounds. On the basis of high resolution X-ray diffraction data obtained at 100 K for three compounds, methoxybenzoic acid, acetanilide, and 4-methylbenzoic acid, multipole parameters for the O-H···O carboxylic acid dimer and N-H···O amide infinite chain synthon fragments have been derived. The robustness associated with these supramolecular synthons has been used to model charge-density-derived multipolar parameters for 4-(acetylamino)benzoic acid and 4-methylacetanilide. The study provides pointers to the design and fabrication of a synthon library of high resolution X-ray diffraction data sets. It has been demonstrated that the derived charge density features can be exploited in both intra- and intermolecular space for any organic compound, based on transferability of multipole parameters. The supramolecular synthon based fragments approach (SBFA) has been compared with experimental charge density data to check the reliability of this methodology for transferring charge-density-derived multipole parameters.

Relevance: 60.00%

Publisher:

Abstract:

Our ability to infer protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak, and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point-group symmetry under a Boolean framework to detect quaternary structures in the crystal lattice. It consistently produces ≥90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognition of heteromeric complexes, compared with 53% on the same data set by the current state-of-the-art method. A detailed study of a limited number of cases where prediction failed offers interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of the quaternary states of proteins, especially weak affinity complexes.
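The classifier side of such a scheme can be sketched in a few lines. Below is a minimal Bernoulli naive Bayes over Boolean features; it is a hedged illustration only — the feature set, training data, and the paper's point-group-symmetry logic are not reproduced here:

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on Boolean feature vectors.
    Returns, per class, a log prior and Laplace-smoothed per-feature
    Bernoulli parameters."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior = math.log(len(rows) / len(X))
        theta = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_feat)]
        model[c] = (prior, theta)
    return model

def predict_nb(model, x):
    """Pick the class with the highest posterior log-probability."""
    best, best_lp = None, -math.inf
    for c, (prior, theta) in model.items():
        lp = prior + sum(math.log(t if xi else 1 - t)
                         for xi, t in zip(x, theta))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

With hypothetical contact features (e.g. "large buried area", "conserved interface", "crystallographic symmetry relates the chains"), each crystal contact would be encoded as a 0/1 vector and classified as biological or lattice-only.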

Relevance: 60.00%

Publisher:

Abstract:

Ligand-induced conformational changes in proteins are of immense functional relevance. It is a major challenge to elucidate, from a global perspective, the network of amino acids responsible for the percolation of ligand-induced conformational changes to distal regions of the protein. Functionally important subtle conformational changes (at the level of side-chain noncovalent interactions) upon ligand binding, or as a result of environmental variations, are also elusive in conventional studies such as those using root-mean-square deviations (r.m.s.d.s). In this article, the network representation of protein structures and their analyses provide an efficient tool to capture these variations (both drastic and subtle) in atomistic detail in a global milieu. A generalized graph-theoretical metric, using network parameters such as cliques and/or communities, is used to determine similarities or differences between structures in a rigorous manner. The ligand-induced global rewiring in the protein structures is also quantified in terms of network parameters. Thus, a judicious use of graph theory in the context of protein structures can provide meaningful insights into global structural reorganizations upon perturbation and can also be helpful for rigorous structural comparison. The data sets for the present study comprise high-resolution crystal structures of serine proteases from the S1A family, which are probed to quantify the ligand-induced subtle structural variations.
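The clique-based comparison can be illustrated with a small sketch. Assuming a structure is reduced to a list of residue-residue contact pairs (a simplification of the article's noncovalent-interaction networks), the snippet compares two structures by the Jaccard overlap of their 3-cliques; the metric in the article uses richer clique/community information:

```python
from itertools import combinations

def contact_graph(contacts):
    """Build adjacency sets from residue-residue contact pairs."""
    adj = {}
    for a, b in contacts:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def triangle_cliques(adj):
    """Enumerate all 3-cliques (triangles) of contacts."""
    tris = set()
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj.get(a, ()):
                tris.add(tuple(sorted((v, a, b))))
    return tris

def clique_similarity(contacts1, contacts2):
    """Jaccard similarity of the two structures' triangle sets: a crude
    stand-in for the clique/community-based structural metric."""
    t1 = triangle_cliques(contact_graph(contacts1))
    t2 = triangle_cliques(contact_graph(contacts2))
    if not (t1 | t2):
        return 1.0
    return len(t1 & t2) / len(t1 | t2)
```

Losing a single contact can dissolve a clique, which is exactly the kind of subtle side-chain rewiring an r.m.s.d. comparison would miss.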

Relevance: 60.00%

Publisher:

Abstract:

Combinatorial exchanges are double-sided marketplaces with multiple sellers and multiple buyers trading with the help of combinatorial bids. The allocation and other associated problems in such exchanges are known to be among the hardest to solve across all economic mechanisms. It has been shown that the problems of surplus maximization and volume maximization in combinatorial exchanges are inapproximable even with free disposal. In this paper, the surplus maximization problem is formulated as an integer linear programming problem and we propose a Lagrangian-relaxation-based heuristic to find a near-optimal solution. We develop computationally efficient tâtonnement mechanisms for clearing combinatorial exchanges in which the Lagrangian multipliers can be interpreted as the prices of the items set by the exchange in each iteration. Our mechanisms satisfy the individual-rationality and budget-nonnegativity properties. Computational experiments performed on representative data sets show that the proposed heuristic produces a feasible solution with a negligible optimality gap.
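The tâtonnement idea can be sketched as a projected subgradient iteration in which the multipliers on the item-balance constraints play the role of prices. This is a deliberately simplified single-unit version; the paper's ILP formulation, bid structure, and heuristic details are not reproduced:

```python
def clear_exchange(buy_bids, sell_asks, iters=200, step=0.1):
    """Tatonnement sketch: Lagrange multipliers on the item-balance
    constraints act as per-item prices. Each iteration, every bid is
    accepted iff it is profitable at current prices, and prices move
    by a subgradient step in the direction of excess demand.
    buy_bids  : list of (bundle, value) -- pay `value` for `bundle`
    sell_asks : list of (bundle, cost)  -- sell `bundle` for `cost`
    """
    items = {i for bundle, _ in buy_bids + sell_asks for i in bundle}
    price = {i: 0.0 for i in items}
    for _ in range(iters):
        demand = {i: 0 for i in items}
        for bundle, value in buy_bids:
            if value >= sum(price[i] for i in bundle):   # profitable buy
                for i in bundle:
                    demand[i] += 1
        for bundle, cost in sell_asks:
            if sum(price[i] for i in bundle) >= cost:    # profitable sale
                for i in bundle:
                    demand[i] -= 1
        for i in items:  # projected subgradient step on the prices
            price[i] = max(0.0, price[i] + step * demand[i])
    return price
```

With one buyer valuing an item at 10 and one seller asking 4, the price climbs until both sides trade, settling near the seller's ask; real exchanges would also need the feasibility-restoration step the paper's heuristic provides.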

Relevance: 60.00%

Publisher:

Abstract:

The production of rainfed crops in the semi-arid tropics exhibits large variation in response to the variation in seasonal rainfall. There are several farm-level decisions, such as the choice of cropping pattern, whether to invest in fertilizers and pesticides, the choice of the planting period, plant population density, etc., for which the appropriate choice (associated with maximum production or minimum risk) depends upon the nature of the rainfall variability or the prediction for a specific year. In this paper, we address the problem of identifying appropriate strategies for the cultivation of rainfed groundnut in the Anantapur region, in a semi-arid part of the Indian peninsula. The approach developed involves participatory research in active collaboration with farmers, so that problems of perceived need are addressed with the modern tools and data sets available. Given the large spatial variation of climate and soil, the appropriate strategies are necessarily location specific. With the approach adopted, it is possible to tap the detailed location-specific knowledge of the complex rainfed ecosystem and gain insight into the variety of options for land use and management practices available to each category of stakeholders. We believe such a participatory approach is essential for identifying strategies that have a favourable cost-benefit ratio over the region considered and hence are associated with a high chance of acceptance by the stakeholders. (C) 2002 Elsevier Science Ltd. All rights reserved.

Relevance: 60.00%

Publisher:

Abstract:

We propose a scheme for the compression of tree structured intermediate code consisting of a sequence of trees specified by a regular tree grammar. The scheme is based on arithmetic coding, and the model that works in conjunction with the coder is automatically generated from the syntactical specification of the tree language. Experiments on data sets consisting of intermediate code trees yield compression ratios ranging from 2.5 to 8, for file sizes ranging from 167 bytes to 1 megabyte.
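The model-plus-coder pipeline can be approximated with a small entropy estimate. The sketch below serializes each tree in preorder and uses the parent label as the coding context, with adaptive Laplace-smoothed counts; a real arithmetic coder, and a model generated from the tree grammar itself, would replace the `-log2 p` accounting used here:

```python
import math

def tree_code_size_bits(trees):
    """Estimate the arithmetic-coded size (in bits) of a sequence of
    trees. A node is a (label, children) pair, visited in preorder;
    the coding context is the parent's label (a simplification of the
    grammar-derived model described above). Counts adapt as coding
    proceeds, mirroring an adaptive arithmetic coder."""
    counts = {}   # context label -> {symbol: count}
    bits = 0.0

    def code(node, context):
        nonlocal bits
        label, children = node
        ctx = counts.setdefault(context, {})
        # Laplace-smoothed probability estimate with one escape slot
        total = sum(ctx.values()) + len(ctx) + 1
        p = (ctx.get(label, 0) + 1) / total
        bits += -math.log2(p)
        ctx[label] = ctx.get(label, 0) + 1
        for child in children:
            code(child, label)

    for t in trees:
        code(t, "<root>")
    return bits
```

Highly repetitive intermediate-code trees drive the per-node probabilities toward 1 and hence the coded size down, which is the effect behind the compression ratios reported above.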

Relevance: 60.00%

Publisher:

Abstract:

Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. For example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments on the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
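The two target criteria have standard definitions, shown below for reference; the max-margin structured-learning machinery of the paper is not reproduced:

```python
import math

def mean_reciprocal_rank(ranked_relevance):
    """MRR over queries: each entry is one query's relevance list
    (1 = relevant, 0 = not) in ranked order; a query with no relevant
    result contributes 0."""
    total = 0.0
    for rels in ranked_relevance:
        for i, r in enumerate(rels, start=1):
            if r > 0:
                total += 1.0 / i   # reciprocal rank of first hit
                break
    return total / len(ranked_relevance)

def ndcg(rels, k=None):
    """NDCG@k for one query: graded relevances in ranked order,
    normalized by the DCG of the ideal (sorted) ordering."""
    if k is None:
        k = len(rels)
    def dcg(rs):
        return sum((2 ** r - 1) / math.log2(i + 1)
                   for i, r in enumerate(rs[:k], start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

MRR rewards only the first relevant result (navigational behavior), while NDCG rewards the whole graded ordering (informational behavior), which is why a single model optimized for one can underserve the other.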

Relevance: 60.00%

Publisher:

Abstract:

We propose a randomized algorithm for large-scale SVM learning which solves the problem by iterating over random subsets of the data. Crucial to the scalability of the algorithm is the size of the subsets chosen. In the context of text classification we show that, by using ideas from random projections, a sample size of O(log n) can be used to obtain a solution which is close to optimal with high probability. Experiments on synthetic and real-life data sets demonstrate that the algorithm scales up SVM learners without loss in accuracy.
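The iterate-over-random-subsets idea can be sketched as follows. The subproblem solver here is a plain hinge-loss subgradient step rather than an exact SVM solver, and the constant in the O(log n) sample size is an arbitrary choice for illustration, not the paper's bound:

```python
import math
import random

def hinge_subgradient_step(w, b, X, y, lr, lam):
    """One subgradient step of the L2-regularized hinge loss on (X, y)."""
    gw = [lam * wj for wj in w]
    gb = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1:   # margin violators contribute to the subgradient
            for j, xj in enumerate(x):
                gw[j] -= label * xj / len(X)
            gb -= label / len(X)
    return [wj - lr * gj for wj, gj in zip(w, gw)], b - lr * gb

def random_subset_svm(X, y, rounds=200, lr=0.1, lam=0.01, seed=0):
    """Sketch of SVM learning by iterating over random subsets: each
    round draws a subset of size O(log n) and takes one step on it."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(2, int(4 * math.log(n)))   # assumed constant 4; illustrative
    w, b = [0.0] * d, 0.0
    for _ in range(rounds):
        idx = rng.sample(range(n), min(m, n))
        w, b = hinge_subgradient_step(w, b, [X[i] for i in idx],
                                      [y[i] for i in idx], lr, lam)
    return w, b
```

On small data the subset covers everything, but on large n each round touches only O(log n) points, which is the source of the scalability claimed above.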

Relevance: 60.00%

Publisher:

Abstract:

Even though several techniques have been proposed in the literature for achieving multiclass classification using the Support Vector Machine (SVM), the scalability of these approaches to large data sets still needs much exploration. The Core Vector Machine (CVM) is a technique for scaling up a two-class SVM to handle large data sets. In this paper we propose a Multiclass Core Vector Machine (MCVM). We formulate the multiclass SVM problem as a Quadratic Programming (QP) problem defining an SVM with vector-valued output. This QP problem is then solved using the CVM technique to achieve scalability to large data sets. Experiments with several large synthetic and real-world data sets show that the proposed MCVM technique gives generalization performance comparable to that of SVM at much lower computational expense. Further, MCVM is observed to scale well with the size of the data set.

Relevance: 60.00%

Publisher:

Abstract:

This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis. Power system disturbances are often caused by faults on transmission lines. When a fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can be taken to minimize the power interruption and limit the impact of the outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in a neighboring line connected to the same substation. This may help improve the fault monitoring/diagnosis process, thus assuring secure operation of the power system. In this paper we compare SVMs with radial basis function neural networks (RBFNNs) on data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-bus equivalent EHV transmission system of the Indian Southern region are presented, indicating the improved generalization of large-margin classifiers in enhancing the efficacy of the chosen model.

Relevance: 60.00%

Publisher:

Abstract:

An extension of the supramolecular synthon-based fragment approach (SBFA) method for the transferability of multipole charge density parameters to include weak supramolecular synthons is proposed. In particular, the SBFA method is applied to C-H···O, C-H···F, and F···F containing synthons. A high resolution charge density study has been performed on 4-fluorobenzoic acid to build a synthon library for C-H···F infinite chain interactions. Libraries for C-H···O and F···F synthons were taken from earlier work. The SBFA methodology was applied successfully to 2- and 3-fluorobenzoic acids, data sets for which were collected in a routine manner at 100 K, and the modularity of the synthons was demonstrated. Cocrystals of isonicotinamide with all three fluorobenzoic acids were also studied with the SBFA method. The topological analysis of inter- and intramolecular interaction regions was performed using Bader's AIM approach. This study shows that the SBFA method is generally applicable to generating charge density maps using information from multiple intermolecular regions.

Relevance: 60.00%

Publisher:

Abstract:

The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.
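The local nature of the gradient computation can be illustrated with a vertex-level simplification: each grid vertex inspects only its 4-neighbors, so all vertices can be processed independently and hence in parallel. This is not Forman's full cell-complex construction, just a sketch of the locality that enables the parallelism described above; distinct scalar values are assumed:

```python
def discrete_gradient(grid):
    """For each vertex of a 2D scalar grid, pair it with its steepest
    lower 4-neighbor, or mark it critical (a local minimum) if none is
    lower. Each vertex is decided from local accesses only, so the
    loop body could run in parallel over all vertices."""
    rows, cols = len(grid), len(grid[0])
    pairing = {}
    for r in range(rows):
        for c in range(cols):
            best = None
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] < grid[r][c]):
                    if best is None or grid[nr][nc] < grid[best[0]][best[1]]:
                        best = (nr, nc)
            pairing[(r, c)] = best   # None marks a critical vertex
    return pairing

def minima(grid):
    """Critical (unpaired) vertices of the simplified gradient."""
    return [v for v, arrow in discrete_gradient(grid).items()
            if arrow is None]
```

Following the pairing arrows downhill traces (simplified) gradient paths; the merging of such paths with accurate geometry is the harder step the paper addresses.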

Relevance: 60.00%

Publisher:

Abstract:

Long-distance dispersal (LDD) events, although rare for most plant species, can strongly influence population and community dynamics. Animals function as a key biotic vector of seeds and thus a mechanistic and quantitative understanding of how individual animal behaviors scale to dispersal patterns at different spatial scales is a question of critical importance from both basic and applied perspectives. Using a diffusion-theory based analytical approach for a wide range of animal movement and seed transportation patterns, we show that the scale (a measure of local dispersal) of the seed dispersal kernel increases with the organism's rate of movement and mean seed retention time. We reveal that variation in seed retention time is a key determinant of various measures of LDD, such as the kurtosis (or shape) of the kernel, the thickness of its tails, and the absolute number of seeds falling beyond a threshold distance. Using empirical data sets on frugivores, we illustrate the importance of variability in retention times for predicting the key disperser species that influence LDD. Our study makes testable predictions linking animal movement behaviors and gut retention times to dispersal patterns and, more generally, highlights the potential importance of animal behavioral variability for the LDD of seeds.
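The link between retention-time variability and kernel shape can be reproduced in a toy 1-D diffusion model: a seed retained for time t lands at a Gaussian displacement with variance 2Dt, so a fixed retention time gives a Gaussian kernel (excess kurtosis near 0), while exponentially distributed retention times of the same mean give a fat-tailed, leptokurtic kernel. The diffusion constant and mean retention time below are arbitrary illustrative values, not the paper's parameters:

```python
import math
import random

def simulate_kernel(diffusion, retention_times, rng):
    """1-D diffusion sketch: each seed's displacement is Gaussian with
    variance 2*D*t for its retention time t."""
    return [rng.gauss(0.0, math.sqrt(2 * diffusion * t))
            for t in retention_times]

def excess_kurtosis(xs):
    """Excess kurtosis: 0 for a Gaussian, positive for fat tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0

rng = random.Random(42)
n = 20000
# Constant retention time -> Gaussian kernel
fixed = simulate_kernel(1.0, [5.0] * n, rng)
# Exponential retention times, same mean -> leptokurtic kernel
varied = simulate_kernel(1.0, [rng.expovariate(1 / 5.0) for _ in range(n)], rng)
```

The mixture over exponential retention times yields a Laplace-like displacement distribution, so the `varied` kernel shows markedly positive excess kurtosis while `fixed` stays near zero, mirroring the abstract's claim that retention-time variability drives LDD measures.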

Relevance: 60.00%

Publisher:

Abstract:

1. The dispersal ability of a species is a key ecological characteristic, affecting a range of processes from adaptation, community dynamics and genetic structure to distribution and range size. It is determined by both intrinsic species traits and extrinsic landscape-related properties. 2. Using butterflies as a model system, the following questions were addressed: (i) given similar extrinsic factors, which intrinsic species trait(s) explain dispersal ability? (ii) can one of these traits be used as a proxy for dispersal ability? and (iii) what is the effect of interactions between the traits, and of phylogenetic relatedness, on dispersal ability? 3. Four data sets, using different measures of dispersal, were compiled from the published literature. The first data set uses mean dispersal distances from capture-mark-recapture studies, and the other three use mobility indices. Data were collected for six traits that can potentially affect dispersal ability: wingspan, larval host plant specificity, adult habitat specificity, mate location strategy, voltinism and flight period duration. Each data set was subjected to both unifactorial and multifactorial phylogenetically controlled analyses. 4. Among the factors considered, wingspan was the most important determinant of dispersal ability, although the predictive powers of the regression models were low. Voltinism and flight period duration also affect dispersal ability, especially in the case of temperate species. Interactions between the factors did not affect dispersal ability, and phylogenetic relatedness was significant in one data set. 5. While using wingspan as the only proxy for dispersal ability may be problematic, it is usually the only easily accessible species-specific trait for a large number of species. It can thus be a satisfactory proxy when carefully interpreted, especially for analyses involving many species from all across the world.