919 results for VLE data sets
Abstract:
The K-means clustering algorithm is highly sensitive to the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of a given data set by selecting proper initial seeds for the K-means algorithm. The results are encouraging: in most cases, on data sets having well-separated clusters, the proposed scheme reached the global minimum.
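The abstract does not spell out the genetic encoding, so the sketch below fills the gaps with common choices (all assumptions, not necessarily the authors' design): a chromosome is a set of k seed indices into the data, fitness is the negated within-cluster sum of squares after running K-means from those seeds, and crossover/mutation act directly on the index sets.

```python
# Hedged sketch of GA-selected K-means seeding (encoding and operators
# are generic assumptions, not the paper's exact scheme).
import numpy as np
from sklearn.cluster import KMeans

def fitness(X, seeds):
    """Fitness of a chromosome = negated within-cluster SSE after K-means."""
    km = KMeans(n_clusters=len(seeds), init=X[seeds], n_init=1).fit(X)
    return -km.inertia_

def ga_seeded_kmeans(X, k, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(X, ind), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(len(survivors), size=2, replace=False)
            pool = np.union1d(survivors[a], survivors[b])
            child = rng.choice(pool, size=k, replace=False)   # crossover
            if rng.random() < p_mut:                          # mutation
                child[rng.integers(k)] = rng.integers(n)
            while len(np.unique(child)) < k:                  # repair duplicates
                child[rng.integers(k)] = rng.integers(n)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda ind: fitness(X, ind))
    return KMeans(n_clusters=k, init=X[best], n_init=1).fit(X)
```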
Abstract:
Our study concerns an important current problem, that of diffusion of information in social networks. This problem has received significant attention from the Internet research community in recent times, driven by many potential applications such as viral marketing and sales promotions. In this paper, we focus on the target set selection problem, which involves discovering a small subset of influential players in a given social network to perform a certain task of information diffusion. The target set selection problem manifests in two forms: 1) the top-k nodes problem and 2) the lambda-coverage problem. In the top-k nodes problem, we are required to find a set of k key nodes that would maximize the number of nodes being influenced in the network. The lambda-coverage problem is concerned with finding a minimum-size set of key nodes that can influence a given percentage lambda of the nodes in the entire network. We propose a new way of solving these problems using the concept of the Shapley value, a well-known solution concept in cooperative game theory. Our approach leads to algorithms which we call the ShaPley value-based Influential Nodes (SPIN) algorithms for solving the top-k nodes problem and the lambda-coverage problem. We compare the performance of the proposed SPIN algorithms with well-known algorithms in the literature. Through extensive experimentation on four synthetically generated random graphs and six real-world data sets (the Celegans, Jazz, NIPS coauthorship, Netscience, High-Energy Physics, and Political Books data sets), we show that the proposed SPIN approach is more powerful and computationally efficient. Note to Practitioners: In recent times, social networks have received a high level of attention due to their proven ability to improve the performance of web search, recommendations in collaborative filtering systems, the spreading of a technology in the market using viral marketing techniques, etc. It is well known that the interpersonal relationships (or ties or links) between individuals cause change or improvement in the social system, because the decisions made by individuals are influenced heavily by the behavior of their neighbors. An interesting and key problem in social networks is to discover the most influential nodes, which can influence other nodes in the network in a strong and deep way. This problem is called the target set selection problem and has two variants: 1) the top-k nodes problem, where we are required to identify a set of k influential nodes that maximize the number of nodes being influenced in the network, and 2) the lambda-coverage problem, which involves finding a minimum-size set of influential nodes that can influence a given percentage lambda of the nodes in the entire network. There are many existing algorithms in the literature for solving these problems. In this paper, we propose a new algorithm based on a novel interpretation of information diffusion in a social network as a cooperative game. Using this analogy, we develop an algorithm based on the Shapley value of the underlying cooperative game. The proposed algorithm outperforms the existing algorithms in terms of generality, computational complexity, or both. Our results are validated through extensive experimentation on both synthetically generated and real-world data sets.
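A hedged sketch of the core idea, with a deliberately simplified stand-in for the diffusion model: a coalition's value is taken to be its one-hop coverage of the graph, and each node's Shapley value is estimated by Monte Carlo averaging of marginal contributions over random permutations. The paper's SPIN algorithms use the actual information-diffusion process and further heuristics not shown here.

```python
# Monte Carlo Shapley-value node ranking under a toy one-hop coverage
# game (a simplification, not the paper's diffusion model).
import random
import networkx as nx

def shapley_ranking(G, samples=200, seed=0):
    rng = random.Random(seed)
    nodes = list(G.nodes)
    phi = {u: 0.0 for u in nodes}
    for _ in range(samples):
        rng.shuffle(nodes)
        covered = set()
        for u in nodes:
            # marginal contribution of u to the coalition of preceding nodes
            gain = len(({u} | set(G.neighbors(u))) - covered)
            phi[u] += gain / samples
            covered.add(u)
            covered.update(G.neighbors(u))
    return sorted(phi, key=phi.get, reverse=True)

# top-k nodes problem: take the first k nodes of the ranking
top_10 = shapley_ranking(nx.karate_club_graph())[:10]
```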
Abstract:
The lifestyles of people living in single-family housing areas on the outskirts of the Greater Helsinki Region (GHR) differ from those of people living in the inner city. The urban structure of the GHR is concentrated in the capital on the one hand and spread out across the outskirts on the other. Socioeconomic spatial divisions are evident as well-paid and educated residents move to the inner city or to the single-family-house-dominated suburban neighbourhoods, depending on their housing preferences and life situations. This thesis explores how these lifestyles have emerged through the housing choices and daily mobility of residents living in the new single-family housing areas on the outskirts of the GHR and in the inner city. The study shows that, when it comes to lifestyles, residents on the outskirts of the region have different housing preferences and daily mobility patterns compared with their inner-city counterparts. Based on five case study areas, my results show that these differences are related to residents' values, preferences and attitudes towards the neighbourhood on the one hand, and limited by the urban structure on the other. This also confirms earlier theoretical analyses and findings from the GHR. Residents who moved to the outskirts of the GHR and to the apartment buildings of the inner city were similar in the basic elements of their housing preferences: they sought a safe and peaceful neighbourhood close to the natural environment. However, as housing choices, daily mobility and activities vary, different lifestyles develop both on the outskirts and in the inner city. More specifically, lifestyles in the city apartment blocks were inherently urban: liveliness and highest-order facilities were appreciated, and daily mobility patterns were supported by diverse modes of transportation for the purposes of work, shopping and leisure. On the outskirts, by contrast, lifestyles were largely post-suburban and child-friendliness was appreciated. Owing to the heterarchical urban structure, daily mobility was more car-dependent, since the work, shopping and free-time activities of the residents were more spread around the region. The urban structure frames daily mobility on the outskirts of the region, but this is not to say that short local trips replace longer regional ones. This comparative case study was carried out in the single-family housing areas of Sundsberg in Kirkkonummi, Landbo in Helsinki and Ylästö in Vantaa, as well as in the inner-city apartment building areas of Punavuori and Katajanokka in Helsinki. The data comprise residential surveys, interviews, and statistics and GIS data sets that illustrate regional daily mobility, socio-economic structure and housing stock.
Abstract:
The modularity of the supramolecular synthon is used to obtain transferability of charge-density-derived multipolar parameters for structural fragments, thus creating an opportunity to derive charge density maps for new compounds. On the basis of high resolution X-ray diffraction data obtained at 100 K for three compounds, methoxybenzoic acid, acetanilide, and 4-methylbenzoic acid, multipole parameters for the O-H···O carboxylic acid dimer and N-H···O amide infinite chain synthon fragments have been derived. The robustness associated with these supramolecular synthons has been used to model charge-density-derived multipolar parameters for 4-(acetylamino)benzoic acid and 4-methylacetanilide. The study provides pointers to the design and fabrication of a synthon library of high resolution X-ray diffraction data sets. It has been demonstrated that the derived charge density features can be exploited in both intra- and intermolecular space for any organic compound based on transferability of multipole parameters. The supramolecular synthon based fragments approach (SBFA) has been compared with experimental charge density data to check the reliability of this methodology for transferring charge-density-derived multipole parameters.
Abstract:
Our ability to infer protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point group symmetry under a Boolean framework to detect quaternary structures in the crystal lattice. It consistently produces >= 90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognizing heteromeric complexes, compared with 53% on the same data set by the current state-of-the-art method. A detailed study of a limited number of prediction-failed cases offers interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of the quaternary states of proteins, especially weak affinity complexes.
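The abstract names the ingredients but not the features, so the toy sketch below only shows the shape of such a combination: a naive Bayes model over per-interface features (the feature names and training data are invented placeholders, not the authors' descriptors), whose prediction is combined with a separate symmetry test via Boolean AND.

```python
# Hypothetical sketch: naive Bayes over interface features, combined
# with a point-group symmetry check under a Boolean framework.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# columns: buried surface area, residue contacts, hydrogen bonds (assumed)
X_train = np.array([[1500.0, 42, 9], [300.0, 8, 1],
                    [2100.0, 60, 14], [450.0, 11, 2]])
y_train = np.array([1, 0, 1, 0])   # 1 = biological interface, 0 = lattice contact

nb = GaussianNB().fit(X_train, y_train)

def is_quaternary_contact(features, satisfies_point_group):
    """Interface must look biological to the classifier AND be consistent
    with a point-group symmetry of the lattice."""
    return bool(nb.predict([features])[0]) and satisfies_point_group

print(is_quaternary_contact([1800.0, 50, 10], satisfies_point_group=True))
```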
Abstract:
Hydrolytic polymerization of caprolactam to Nylon 6 in a semibatch reactor is carried out by heating a mixture of water and caprolactam. Evaporation of volatiles caused by heating results in a pressure build-up. After the pressure reaches a predetermined value, vapors are vented to keep the pressure constant for some time and, thereafter, to lower the pressure to a value slightly above atmospheric in a preprogrammed manner. The characteristics of the polymer are determined by the chemical reactions and by the vaporization of water and caprolactam. The semibatch operation has been simulated and the predictions compared with industrial data. The observed temperature and pressure histories were predicted with a fair degree of accuracy. It was found, however, that the predictions of the degree of polymerization are sensitive to the vapor-liquid equilibrium (VLE) relations. A comparison with an earlier model, which neglected mass transfer resistance, indicates that simulation using the VLE data of Giori and Hayes and accounting for mass transfer resistance is more reliable.
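The sensitivity to the VLE relation can be illustrated with a schematic Raoult's-law flash of the water/caprolactam vapor (a generic textbook form, not the Giori and Hayes correlation used in the paper; Antoine constants are passed in rather than assumed, and units depend on the constants supplied).

```python
# Schematic VLE sketch: Antoine saturation pressures + Raoult's law give
# the composition of the vented vapor at a reactor temperature.
def antoine_psat(T_c, A, B, C):
    """Saturation pressure from the Antoine equation; T_c in deg C.
    Output units follow the constants supplied (often mmHg)."""
    return 10 ** (A - B / (C + T_c))

def vapor_water_fraction(x_water, T_c, antoine_water, antoine_caprolactam):
    """Raoult's law: partial pressure p_i = x_i * Psat_i; returns the
    mole fraction of water in the equilibrium vapor."""
    p_w = x_water * antoine_psat(T_c, *antoine_water)
    p_c = (1 - x_water) * antoine_psat(T_c, *antoine_caprolactam)
    return p_w / (p_w + p_c)
```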
Abstract:
Ligand-induced conformational changes in proteins are of immense functional relevance. It is a major challenge to elucidate, from a global perspective, the network of amino acids responsible for the percolation of ligand-induced conformational changes to distal regions of the protein. Functionally important subtle conformational changes (at the level of side-chain noncovalent interactions) upon ligand binding, or as a result of environmental variations, are also elusive in conventional studies such as those using root-mean-square deviations (r.m.s.d.s). In this article, the network representation of protein structures and its analysis provide an efficient tool to capture these variations (both drastic and subtle) in atomistic detail in a global milieu. A generalized graph theoretical metric, using network parameters such as cliques and/or communities, is used to determine similarities or differences between structures in a rigorous manner. The ligand-induced global rewiring in the protein structures is also quantified in terms of network parameters. Thus, a judicious use of graph theory in the context of protein structures can provide meaningful insights into global structural reorganization upon perturbation and can also be helpful for rigorous structural comparison. The data sets for the present study comprise high-resolution crystal structures of serine proteases from the S1A family, which are probed to quantify the ligand-induced subtle structural variations.
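A minimal sketch of the network view described above, under simplifying assumptions: the paper works at the level of side-chain noncovalent interactions, whereas the sketch uses a plain C-alpha distance cutoff to define edges, then reads off cliques and communities with networkx.

```python
# Residue interaction network sketch (C-alpha distance cutoff is an
# assumption; the study uses side-chain noncovalent interactions).
import numpy as np
import networkx as nx

def residue_network(ca_coords, cutoff=6.5):
    """ca_coords: (n_residues, 3) array of C-alpha coordinates."""
    G = nx.Graph()
    n = len(ca_coords)
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(ca_coords[i] - ca_coords[j]) <= cutoff:
                G.add_edge(i, j)
    return G

# toy chain-like coordinates standing in for a protein backbone
coords = np.cumsum(np.random.default_rng(1).normal(scale=2, size=(50, 3)), axis=0)
G = residue_network(coords)
cliques = [c for c in nx.find_cliques(G) if len(c) >= 3]
communities = nx.algorithms.community.greedy_modularity_communities(G)
# comparing clique/community membership between apo and ligand-bound
# structures quantifies the rewiring discussed above
```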
Abstract:
Combinatorial exchanges are double-sided marketplaces with multiple sellers and multiple buyers trading with the help of combinatorial bids. The allocation and other associated problems in such exchanges are known to be among the hardest to solve of all economic mechanisms. It has been shown that the problems of surplus maximization and volume maximization in combinatorial exchanges are inapproximable even with free disposal. In this paper, the surplus maximization problem is formulated as an integer linear programming problem and we propose a Lagrangian relaxation based heuristic to find a near-optimal solution. We develop computationally efficient tâtonnement mechanisms for clearing combinatorial exchanges, where the Lagrangian multipliers can be interpreted as the prices of the items set by the exchange in each iteration. Our mechanisms satisfy the individual rationality and budget nonnegativity properties. Computational experiments performed on representative data sets show that the proposed heuristic produces a feasible solution with negligible optimality gap.
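A hedged sketch of the price-update (tâtonnement) loop as generic subgradient Lagrangian relaxation, not the paper's exact heuristic: the multipliers act as per-item prices, each bid is accepted when it is profitable at current prices, and prices move with the resulting excess demand.

```python
# Generic subgradient tatonnement for a combinatorial exchange sketch.
import numpy as np

def lagrangian_tatonnement(bundles, values, n_items, iters=100, step0=1.0):
    """bundles[b]: integer vector, +1 per unit bought, -1 per unit sold;
    values[b]: bid value (sellers' asks enter with negative sign)."""
    prices = np.zeros(n_items)
    for t in range(1, iters + 1):
        # a bid is accepted iff its surplus at current prices is positive
        accept = np.array([values[b] - prices @ bundles[b] > 0
                           for b in range(len(bundles))])
        # excess demand per item is a subgradient of the dual function
        excess = (sum(bundles[b] for b in range(len(bundles)) if accept[b])
                  if accept.any() else np.zeros(n_items))
        prices = np.maximum(0.0, prices + (step0 / t) * excess)
    return prices, accept

# toy exchange: one seller offers items {0,1}, two buyers want one item each
bundles = [np.array([-1, -1]), np.array([1, 0]), np.array([0, 1])]
values = [-5.0, 4.0, 3.0]
prices, accepted = lagrangian_tatonnement(bundles, values, n_items=2)
```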
Abstract:
The production of rainfed crops in the semi-arid tropics exhibits large variation in response to the variation in seasonal rainfall. There are several farm-level decisions, such as the choice of cropping pattern, whether to invest in fertilizers and pesticides, the choice of planting period, and the plant population density, for which the appropriate choice (associated with maximum production or minimum risk) depends upon the nature of the rainfall variability or the prediction for a specific year. In this paper, we address the problem of identifying appropriate strategies for the cultivation of rainfed groundnut in the Anantapur region in a semi-arid part of the Indian peninsula. The approach developed involves participatory research in active collaboration with farmers, so that problems with a perceived need are addressed with the modern tools and data sets available. Given the large spatial variation of climate and soil, the appropriate strategies are necessarily location specific. With the approach adopted, it is possible to tap the detailed location-specific knowledge of the complex rainfed ecosystem and gain insight into the variety of options for land use and management practices available to each category of stakeholders. We believe such a participatory approach is essential for identifying strategies that have a favourable cost-benefit ratio over the region considered and hence are associated with a high chance of acceptance by the stakeholders.
Abstract:
We propose a scheme for the compression of tree structured intermediate code consisting of a sequence of trees specified by a regular tree grammar. The scheme is based on arithmetic coding, and the model that works in conjunction with the coder is automatically generated from the syntactical specification of the tree language. Experiments on data sets consisting of intermediate code trees yield compression ratios ranging from 2.5 to 8, for file sizes ranging from 167 bytes to 1 megabyte.
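A small illustration of the modeling idea (the grammar and class below are toy assumptions, not the paper's generator): because the regular tree grammar fixes which symbols are legal in each state, the coder only needs to discriminate among the legal alternatives, with adaptive counts kept per grammar state.

```python
# Grammar-driven adaptive model sketch for an arithmetic coder.
from collections import defaultdict

GRAMMAR = {                      # toy grammar: nonterminal -> legal operators
    "Expr": ["add", "mul", "const", "var"],
    "Stmt": ["assign", "if", "while"],
}

class GrammarModel:
    def __init__(self):
        # Laplace-smoothed adaptive counts per grammar state
        self.counts = defaultdict(lambda: defaultdict(lambda: 1))

    def distribution(self, state):
        legal = GRAMMAR[state]
        total = sum(self.counts[state][op] for op in legal)
        return {op: self.counts[state][op] / total for op in legal}

    def update(self, state, op):
        self.counts[state][op] += 1

model = GrammarModel()
# an arithmetic coder would encode each tree node using
# model.distribution(current_nonterminal), then call model.update(...)
```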
Abstract:
Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria such as AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. For example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
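For concreteness, the two criteria being folded in, each in one common form (a plain evaluation sketch, independent of the max-margin training machinery; NDCG here uses the linear-gain variant).

```python
# Standard definitions of MRR and NDCG over a single ranked query result.
import numpy as np

def mrr(ranked_relevance):
    """ranked_relevance: 0/1 labels in ranked order for one query."""
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg(ranked_gains, k=10):
    """ranked_gains: graded relevance in ranked order (linear-gain DCG)."""
    gains = np.asarray(ranked_gains[:k], dtype=float)
    dcg = float((gains / np.log2(np.arange(2, len(gains) + 2))).sum())
    ideal = np.sort(np.asarray(ranked_gains, dtype=float))[::-1][:k]
    idcg = float((ideal / np.log2(np.arange(2, len(ideal) + 2))).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(mrr([0, 0, 1, 0]))          # 1/3: navigational-style query
print(ndcg([3, 2, 3, 0, 1, 2]))   # graded relevance: informational query
```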
Abstract:
We propose a randomized algorithm for large scale SVM learning which solves the problem by iterating over random subsets of the data. Crucial to the scalability of the algorithm is the size of the subsets chosen. In the context of text classification we show that, by using ideas from random projections, a sample size of O(log n) can be used to obtain a solution which is close to the optimal with high probability. Experiments on synthetic and real-life data sets demonstrate that the algorithm scales up SVM learners without loss in accuracy.
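A hedged sketch of the subset-iteration idea (generic, with an assumed constant inside the O(log n) sample size; not the authors' exact algorithm or analysis): repeatedly train on a small random sample together with the (near-)support vectors retained from earlier rounds.

```python
# Generic randomized-subset SVM training sketch.
import numpy as np
from sklearn.svm import LinearSVC

def randomized_subset_svm(X, y, rounds=20, c0=20, seed=0):
    """Assumes binary labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    sample_size = max(10, int(c0 * np.log(n)))     # O(log n) sample
    keep = np.array([], dtype=int)                 # retained (near-)SVs
    clf = None
    for _ in range(rounds):
        idx = np.union1d(keep, rng.choice(n, size=sample_size, replace=False))
        clf = LinearSVC().fit(X[idx], y[idx])
        margins = y[idx] * clf.decision_function(X[idx])
        keep = idx[margins <= 1.0 + 1e-9]          # points on/inside the margin
    return clf
```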
Abstract:
Even though several techniques have been proposed in the literature for achieving multiclass classification with Support Vector Machines (SVMs), the scalability of these approaches to large data sets still needs much exploration. The Core Vector Machine (CVM) is a technique for scaling up a two-class SVM to handle large data sets. In this paper we propose a Multiclass Core Vector Machine (MCVM). We formulate the multiclass SVM problem as a Quadratic Programming (QP) problem defining an SVM with vector-valued output. This QP problem is then solved using the CVM technique to achieve scalability to large data sets. Experiments with several large synthetic and real-world data sets show that the proposed MCVM technique gives generalization performance comparable to that of SVM at much lower computational expense. Further, it is observed that MCVM scales well with the size of the data set.
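CVM's engine is the reduction of SVM training to a minimum enclosing ball (MEB) problem solved over a small core-set; below is a sketch of the generic Badoiu-Clarkson core-set iteration for the MEB (the geometric subroutine, not the MCVM formulation itself).

```python
# Badoiu-Clarkson core-set iteration for the minimum enclosing ball,
# the geometric engine behind CVM-style training.
import numpy as np

def meb_coreset_center(points, eps=0.05):
    """Returns an approximate MEB center and radius after O(1/eps^2) steps."""
    c = points[0].astype(float).copy()
    for t in range(1, int(np.ceil(1.0 / eps**2)) + 1):
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (far - c) / (t + 1)       # pull center toward farthest point
    radius = float(np.max(np.linalg.norm(points - c, axis=1)))
    return c, radius
```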
Abstract:
This paper presents an approach for identifying the faulted line section and the fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis. Power system disturbances are often caused by faults on transmission lines. When a fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can be taken to minimize the power interruption and limit the impact of the outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in a neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power system. In this paper we compare SVMs with radial basis function neural networks (RBFNNs) on data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-bus equivalent EHV transmission system of the Indian Southern region are presented, indicating the improved generalization achieved with large-margin classifiers and the enhanced efficacy of the chosen model.
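A hedged illustration of the two learning tasks named above, with synthetic placeholder features (the paper's inputs would be measured line voltages and currents, not random data): an SVM classifies the faulted section and support vector regression estimates the fault distance.

```python
# Section identification (classification) + fault location (regression)
# with SVMs, on clearly synthetic placeholder data.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # e.g. phase voltage/current magnitudes
section = rng.integers(0, 4, 200)    # faulted line section label
location = rng.uniform(0, 100, 200)  # fault distance in % of line length

clf = SVC(kernel="rbf").fit(X, section)     # faulted-section identification
reg = SVR(kernel="rbf").fit(X, location)    # fault-location regression
print(clf.predict(X[:1]), reg.predict(X[:1]))
```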
Abstract:
An extension of the supramolecular synthon-based fragment approach (SBFA) method for transferability of multipole charge density parameters to include weak supramolecular synthons is proposed. In particular, the SBFA method is applied to C-H···O, C-H···F, and F···F containing synthons. A high resolution charge density study has been performed on 4-fluorobenzoic acid to build a synthon library for C-H···F infinite chain interactions. Libraries for C-H···O and F···F synthons were taken from earlier work. The SBFA methodology was applied successfully to 2- and 3-fluorobenzoic acids, data sets for which were collected in a routine manner at 100 K, and the modularity of the synthons was demonstrated. Cocrystals of isonicotinamide with all three fluorobenzoic acids were also studied with the SBFA method. The topological analysis of inter- and intramolecular interaction regions was performed using Bader's AIM approach. This study shows that the SBFA method is generally applicable to generate charge density maps using information from multiple intermolecular regions.