54 resultados para Web Mining
Resumo:
The protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selecting a subset of poses, (iii) their structural refinement and (iv) scoring, ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface area based edge-scoring function to eliminate > 95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single parameter based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server performs ranking of pruned poses, after structure refinement and scoring using a regression model for geometric compatibility, and normalized interaction energy. While web-service similar to PROBE is infrequent, no web-service akin to PRUNE has been described before. Both the servers are publicly accessible and free for use.
Resumo:
Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight < 1.5 KDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthew's correlation coefficient of similar to 0.4) over a testing set of 225 single and multi-chain protein structures. Users have the option of tuning several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format. The users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery.
Resumo:
In this paper we propose a new method of data handling for web servers. We call this method Network Aware Buffering and Caching (NABC for short). NABC facilitates reduction of data copies in web server's data sending path, by doing three things: (1) Layout the data in main memory in a way that protocol processing can be done without data copies (2) Keep a unified cache of data in kernel and ensure safe access to it by various processes and kernel and (3) Pass only the necessary meta data between processes so that bulk data handling time spent during IPC can be reduced. We realize NABC by implementing a set of system calls and an user library. The end product of the implementation is a set of APIs specifically designed for use by the web servers. We port an in house web server called SWEET, to NABC APIs and evaluate performance using a range of workloads both simulated and real. The results show a very impressive gain of 12% to 21% in throughput for static file serving and 1.6 to 4 times gain in throughput for lightweight dynamic content serving for a server using NABC APIs over the one using UNIX APIs.
Resumo:
CDS/ISIS, an advanced non-numerical information storage and retrieval software was developed by UNESCO. With the emergence of WWW technology, most of the information activities are becoming Web-centric. Libraries and information providers are taking advantage of these Internet developments to provide access to their resources/information on the Web. A number of tools are now available for publishing CDS/ISIS databases on the Internet. One such tool is the WWWISIS Web gateway software, developed by BIREME, Brazil. This paper illustrates porting of sample records from a bibliographic database into CDS/ISIS, and then publishing this database on the Internet using WWWISIS.
Resumo:
With the emergence of Internet, the global connectivity of computers has become a reality. Internet has progressed to provide many user-friendly tools like Gopher, WAIS, WWW etc. for information publishing and access. The WWW, which integrates all other access tools, also provides a very convenient means for publishing and accessing multimedia and hypertext linked documents stored in computers spread across the world. With the emergence of WWW technology, most of the information activities are becoming Web-centric. Once the information is published on the Web, a user can access this information from any part of the world. A Web browser like Netscape or Internet Explorer is used as a common user interface for accessing information/databases. This will greatly relieve a user from learning the search syntax of individual information systems. Libraries are taking advantage of these developments to provide access to their resources on the Web. CDS/ISIS is a very popular bibliographic information management software used in India. In this tutorial we present details of integrating CDS/ISIS with the WWW. A number of tools are now available for making CDS/ISIS database accessible on the Internet/Web. Some of these are 1) the WAIS_ISIS Server. 2) the WWWISIS Server 3) the IQUERY Server. In this tutorial, we have explained in detail the steps involved in providing Web access to an existing CDS/ISIS database using the freely available software, WWWISIS. This software is developed, maintained and distributed by BIREME, the Latin American & Caribbean Centre on Health Sciences Information. WWWISIS acts as a server for CDS/ISIS databases in a WWW client/server environment. It supports functions for searching, formatting and data entry operations over CDS/ISIS databases. WWWISIS is available for various operating systems. We have tested this software on Windows '95, Windows NT and Red Hat Linux release 5.2 (Appolo) Kernel 2. 0. 36 on an i686. The testing was carried out using IISc's main library's OPAC containing more than 80,000 records and Current Contents issues (bibliographic data) containing more than 25,000 records. WWWISIS is fully compatible with CDS/ISIS 3.07 file structure. However, on a system running Unix or its variant, there is no guarantee of this compatibility. It is therefore safe to recreate the master and the inverted files, using utilities provided by BIREME, under Unix environment.
Resumo:
PDB Goodies is a web-based graphical user interface (GUI) to manipulate the Protein Data Bank file containing the three-dimensional atomic coordinates of protein structures. The program also allows users to save the manipulated three-dimensional atomic coordinate file on their local client system. These fragments are used in various stages of structure elucidation and analysis. This software is incorporated with all the three-dimensional protein structures available in the Protein Data Bank, which presently holds approximately 18 000 structures. In addition, this program works on a three-dimensional atomic coordinate file (Protein Data Bank format) uploaded from the client machine. The program is written using CGI/PERL scripts and is platform independent. The program PDB Goodies can be accessed over the World Wide Web at http:// 144.16.71.11/pdbgoodies/.
Resumo:
A graphics package has been developed to display the main chain torsion angles phi, psi (phi, Psi); (Ramachandran angles) in a protein of known structure. In addition, the package calculates the Ramachandran angles at the central residue in the stretch of three amino acids having specified the flanking residue types. The package displays the Ramachandran angles along with a detailed analysis output. This software is incorporated with all the protein structures available in the Protein Databank.
Resumo:
Several replacement policies for web caches have been proposed and studied extensively in the literature. Different replacement policies perform better in terms of (i) the number of objects found in the cache (cache hit), (ii) the network traffic avoided by fetching the referenced object from the cache, or (iii) the savings in response time. In this paper, we propose a simple and efficient replacement policy (hereafter known as SE) which improves all three performance measures. Trace-driven simulations were done to evaluate the performance of SE. We compare SE with two widely used and efficient replacement policies, namely Least Recently Used (LRU) and Least Unified Value (LUV) algorithms. Our results show that SE performs at least as well as, if not better than, both these replacement policies. Unlike various other replacement policies proposed in literature, our SE policy does not require parameter tuning or a-priori trace analysis and has an efficient and simple implementation that can be incorporated in any existing proxy server or web server with ease.
Resumo:
In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
With the emergence of large-volume and high-speed streaming data, the recent techniques for stream mining of CFIpsilas (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high speed data streams, the rate of change of information across different sliding windows will be negligible. So, the user wonpsilat be devoid of change in information if we slide window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIpsilas cumulatively by making sliding width(ges1) over high speed data streams. However, it is nontrivial to mine CFIpsilas cumulatively over stream, because such growth may lead to the generation of exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIpsilas over stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
Resumo:
Web services are now a key ingredient of software services offered by software enterprises. Many standardized web services are now available as commodity offerings from web service providers. An important problem for a web service requester is the web service composition problem which involves selecting the right mix of web service offerings to execute an end-to-end business process. Web service offerings are now available in bundled form as composite web services and more recently, volume discounts are also on offer, based on the number of executions of web services requested. In this paper, we develop efficient algorithms for the web service composition problem in the presence of composite web service offerings and volume discounts. We model this problem as a combinatorial auction with volume discounts. We first develop efficient polynomial time algorithms when the end-to-end service involves a linear workflow of web services. Next we develop efficient polynomial time algorithms when the end-to-end service involves a tree workflow of web services.
Resumo:
Rapid urbanisation in India has posed serious challenges to the decision makers in regional planning involving plethora of issues including provision of basic amenities (like electricity, water, sanitation, transport, etc.). Urban planning entails an understanding of landscape and urban dynamics with causal factors. Identifying, delineating and mapping landscapes on temporal scale provide an opportunity to monitor the changes, which is important for natural resource management and sustainable planning activities. Multi-source, multi-sensor, multi-temporal, multi-frequency or multi-polarization remote sensing data with efficient classification algorithms and pattern recognition techniques aid in capturing these dynamics. This paper analyses the landscape dynamics of Greater Bangalore by: (i) characterisation of direct impervious surface, (ii) computation of forest fragmentation indices and (iii) modeling to quantify and categorise urban changes. Linear unmixing is used for solving the mixed pixel problem of coarse resolution super spectral MODIS data for impervious surface characterisation. Fragmentation indices were used to classify forests – interior, perforated, edge, transitional, patch and undetermined. Based on this, urban growth model was developed to determine the type of urban growth – Infill, Expansion and Outlying growth. This helped in visualising urban growth poles and consequence of earlier policy decisions that can help in evolving strategies for effective land use policies.
Resumo:
Engineering education quality embraces the activities through which a technical institution satisfies itself that the quality of education it provides and standards it has set are appropriate and are being maintained. There is a need to develop a standardised approach to most aspects of quality assurance for engineering programmes which is sufficiently well defined to be accepted for all assessments.We have designed a Technical Educational Quality Assurance and Assessment (TEQ-AA) System, which makes use of the information on the web and analyzes the standards of the institution. With the standards as anchors for definition, the institution is clearer about its present in order to plan better for its future and enhancing the level of educational quality.The system has been tested and implemented on the technical educational Institutions in the Karnataka State which usually host their web pages for commercially advertising their technical education programs and their Institution objectives, policies, etc., for commercialization and for better reach-out to the students and faculty. This helps in assisting the students in selecting an institution for study and to assist in employment.
Resumo:
In this paper, we address a key problem faced by advertisers in sponsored search auctions on the web: how much to bid, given the bids of the other advertisers, so as to maximize individual payoffs? Assuming the generalized second price auction as the auction mechanism, we formulate this problem in the framework of an infinite horizon alternative-move game of advertiser bidding behavior. For a sponsored search auction involving two advertisers, we characterize all the pure strategy and mixed strategy Nash equilibria. We also prove that the bid prices will lead to a Nash equilibrium, if the advertisers follow a myopic best response bidding strategy. Following this, we investigate the bidding behavior of the advertisers if they use Q-learning. We discover empirically an interesting trend that the Q-values converge even if both the advertisers learn simultaneously.