966 resultados para Web documents
Resumo:
The core aim of machine learning is to make a computer program learn from the experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
Resumo:
As the virtual world grows more complex, finding a standard way for storing data becomes increasingly important. Ideally, each data item would be brought into the computer system only once. References for data items need to be cryptographically verifiable, so the data can maintain its identity while being passed around. This way there will be only one copy of the users family photo album, while the user can use multiple tools to show or manipulate the album. Copies of users data could be stored on some of his family members computer, some of his computers, but also at some online services which he uses. When all actors operate over one replicated copy of the data, the system automatically avoids a single point of failure. Thus the data will not disappear with one computer breaking, or one service provider going out of business. One shared copy also makes it possible to delete a piece of data from all systems at once, on users request. In our research we tried to find a model that would make data manageable to users, and make it possible to have the same data stored at various locations. We studied three systems, Persona, Freenet, and GNUnet, that suggest different models for protecting user data. The main application areas of the systems studied include securing online social networks, providing anonymous web, and preventing censorship in file-sharing. Each of the systems studied store user data on machines belonging to third parties. The systems differ in measures they take to protect their users from data loss, forged information, censorship, and being monitored. All of the systems use cryptography to secure names used for the content, and to protect the data from outsiders. Based on the gained knowledge, we built a prototype platform called Peerscape, which stores user data in a synchronized, protected database. Data items themselves are protected with cryptography against forgery, but not encrypted as the focus has been disseminating the data directly among family and friends instead of letting third parties store the information. We turned the synchronizing database into peer-to-peer web by revealing its contents through an integrated http server. The REST-like http API supports development of applications in javascript. To evaluate the platform’s suitability for application development we wrote some simple applications, including a public chat room, bittorrent site, and a flower growing game. During our early tests we came to the conclusion that using the platform for simple applications works well. As web standards develop further, writing applications for the platform should become easier. Any system this complex will have its problems, and we are not expecting our platform to replace the existing web, but are fairly impressed with the results and consider our work important from the perspective of managing user data.
Resumo:
As the virtual world grows more complex, finding a standard way for storing data becomes increasingly important. Ideally, each data item would be brought into the computer system only once. References for data items need to be cryptographically verifiable, so the data can maintain its identity while being passed around. This way there will be only one copy of the users family photo album, while the user can use multiple tools to show or manipulate the album. Copies of users data could be stored on some of his family members computer, some of his computers, but also at some online services which he uses. When all actors operate over one replicated copy of the data, the system automatically avoids a single point of failure. Thus the data will not disappear with one computer breaking, or one service provider going out of business. One shared copy also makes it possible to delete a piece of data from all systems at once, on users request. In our research we tried to find a model that would make data manageable to users, and make it possible to have the same data stored at various locations. We studied three systems, Persona, Freenet, and GNUnet, that suggest different models for protecting user data. The main application areas of the systems studied include securing online social networks, providing anonymous web, and preventing censorship in file-sharing. Each of the systems studied store user data on machines belonging to third parties. The systems differ in measures they take to protect their users from data loss, forged information, censorship, and being monitored. All of the systems use cryptography to secure names used for the content, and to protect the data from outsiders. Based on the gained knowledge, we built a prototype platform called Peerscape, which stores user data in a synchronized, protected database. Data items themselves are protected with cryptography against forgery, but not encrypted as the focus has been disseminating the data directly among family and friends instead of letting third parties store the information. We turned the synchronizing database into peer-to-peer web by revealing its contents through an integrated http server. The REST-like http API supports development of applications in javascript. To evaluate the platform s suitability for application development we wrote some simple applications, including a public chat room, bittorrent site, and a flower growing game. During our early tests we came to the conclusion that using the platform for simple applications works well. As web standards develop further, writing applications for the platform should become easier. Any system this complex will have its problems, and we are not expecting our platform to replace the existing web, but are fairly impressed with the results and consider our work important from the perspective of managing user data.
Resumo:
Encoding protein 3D structures into 1D string using short structural prototypes or structural alphabets opens a new front for structure comparison and analysis. Using the well-documented 16 motifs of Protein Blocks (PBs) as structural alphabet, we have developed a methodology to compare protein structures that are encoded as sequences of PBs by aligning them using dynamic programming which uses a substitution matrix for PBs. This methodology is implemented in the applications available in Protein Block Expert (PBE) server. PBE addresses common issues in the field of protein structure analysis such as comparison of proteins structures and identification of protein structures in structural databanks that resemble a given structure. PBE-T provides facility to transform any PDB file into sequences of PBs. PBE-ALIGNc performs comparison of two protein structures based on the alignment of their corresponding PB sequences. PBE-ALIGNm is a facility for mining SCOP database for similar structures based on the alignment of PBs. Besides, PBE provides an interface to a database (PBE-SAdb) of preprocessed PB sequences from SCOP culled at 95% and of all-against-all pairwise PB alignments at family and superfamily levels. PBE server is freely available at http://bioinformatics.univ-reunion.fr/ PBE/.
Resumo:
Owing to high evolutionary divergence, it is not always possible to identify distantly related protein domains by sequence search techniques. Intermediate sequences possess sequence features of more than one protein and facilitate detection of remotely related proteins. We have demonstrated recently the employment of Cascade PSI-BLAST where we perform PSI-BLAST for many 'generations', initiating searches from new homologues as well. Such a rigorous propagation through generations of PSI-BLAST employs effectively the role of intermediates in detecting distant similarities between proteins. This approach has been tested on a large number of folds and its performance in detecting superfamily level relationships is similar to 35% better than simple PSI-BLAST searches. We present a web server for this search method that permits users to perform Cascade PSI-BLAST searches against the Pfam, SCOP and SwissProt databases. The URL for this server is http://crick.mbu.iisc.ernet.in/similar to CASCADE/CascadeBlast.html.
Resumo:
The unique characteristics of marketspace in combination with the fast growing number of consumers interested in e-commerce have created new research areas of interest to both marketing and consumer behaviour researchers. Consumer behaviour researchers interested in the decision making processes of consumers have two new sets of questions to answer. The first set of questions is related to how useful theories developed for a marketplace are in a marketspace context. Cyber auctions, Internet communities and the possibilities for consumers to establish dialogues not only with companies but also with other consumers make marketspace unique. The effects of these distinctive characteristics on the behaviour of consumers have not been systematically analysed and therefore constitute the second set of questions which have to be studied. Most companies feel that they have to be online even though the effects of being on the Net are not unambiguously positive. The relevance of the relationship marketing paradigm in a marketspace context have to be studied. The relationship enhancement effects of websites from the customers’ point of view are therefore emphasized in this research paper. Representatives of the Net-generation were analysed and the results show that companies should develop marketspace strategies while Net presence has a value-added effect on consumers. The results indicate that the decision making processes of the consumers are also changing as a result of the progress of marketspace
Resumo:
"Fifty-six teachers, from four European countries, were interviewed to ascertain their attitudes to and beliefs about the Collaborative Learning Environments (CLEs) which were designed under the Innovative Technologies for Collaborative Learning Project. Their responses were analysed using categories based on a model from cultural-historical activity theory [Engestrom, Y. (1987). Learning by expanding.- An activity-theoretical approach to developmental research. Helsinki: Orienta-Konsultit; Engestrom, Y., Engestrom, R., & Suntio, A. (2002). Can a school community learn to master its own future? An activity-theoretical study of expansive learning among middle school teachers. In G. Wells & G. Claxton (Eds.), Learning for life in the 21st century. Oxford: Blackwell Publishers]. The teachers were positive about CLEs and their possible role in initiating pedagogical innovation and enhancing personal professional development. This positive perception held across cultures and national boundaries. Teachers were aware of the fact that demanding planning was needed for successful implementations of CLEs. However, the specific strategies through which the teachers can guide students' inquiries in CLEs and the assessment of new competencies that may characterize student performance in the CLEs were poorly represented in the teachers' reflections on CLEs. The attitudes and beliefs of the teachers from separate countries had many similarities, but there were also some clear differences, which are discussed in the article. (c) 2005 Elsevier Ltd. All rights reserved."
Resumo:
The open development model of software production has been characterized as the future model of knowledge production and distributed work. Open development model refers to publicly available source code ensured by an open source license, and the extensive and varied distributed participation of volunteers enabled by the Internet. Contemporary spokesmen of open source communities and academics view open source development as a new form of volunteer work activity characterized by hacker ethic and bazaar governance . The development of the Linux operating system is perhaps the best know example of such an open source project. It started as an effort by a user-developer and grew quickly into a large project with hundreds of user-developer as contributors. However, in hybrids , in which firms participate in open source projects oriented towards end-users, it seems that most users do not write code. The OpenOffice.org project, initiated by Sun Microsystems, in this study represents such a project. In addition, the Finnish public sector ICT decision-making concerning open source use is studied. The purpose is to explore the assumptions, theories and myths related to the open development model by analysing the discursive construction of the OpenOffice.org community: its developers, users and management. The qualitative study aims at shedding light on the dynamics and challenges of community construction and maintenance, and related power relations in hybrid open source, by asking two main research questions: How is the structure and membership constellation of the community, specifically the relation between developers and users linguistically constructed in hybrid open development? What characterizes Internet-mediated virtual communities and how can they be defined? How do they differ from hierarchical forms of knowledge production on one hand and from traditional volunteer communities on the other? The study utilizes sociological, psychological and anthropological concepts of community for understanding the connection between the real and the imaginary in so-called virtual open source communities. Intermediary methodological and analytical concepts are borrowed from discourse and rhetorical theories. A discursive-rhetorical approach is offered as a methodological toolkit for studying texts and writing in Internet communities. The empirical chapters approach the problem of community and its membership from four complementary points of views. The data comprises mailing list discussion, personal interviews, web page writings, email exchanges, field notes and other historical documents. The four viewpoints are: 1) the community as conceived by volunteers 2) the individual contributor s attachment to the project 3) public sector organizations as users of open source 4) the community as articulated by the community manager. I arrive at four conclusions concerning my empirical studies (1-4) and two general conclusions (5-6). 1) Sun Microsystems and OpenOffice.org Groupware volunteers failed in developing necessary and sufficient open code and open dialogue to ensure collaboration thus splitting the Groupware community into volunteers we and the firm them . 2) Instead of separating intrinsic and extrinsic motivations, I find that volunteers unique patterns of motivations are tied to changing objects and personal histories prior and during participation in the OpenOffice.org Lingucomponent project. Rather than seeing volunteers as a unified community, they can be better understood as independent entrepreneurs in search of a collaborative community . The boundaries between work and hobby are blurred and shifting, thus questioning the usefulness of the concept of volunteer . 3) The public sector ICT discourse portrays a dilemma and tension between the freedom to choose, use and develop one s desktop in the spirit of open source on one hand and the striving for better desktop control and maintenance by IT staff and user advocates, on the other. The link between the global OpenOffice.org community and the local end-user practices are weak and mediated by the problematic IT staff-(end)user relationship. 4) Authoring community can be seen as a new hybrid open source community-type of managerial practice. The ambiguous concept of community is a powerful strategic tool for orienting towards multiple real and imaginary audiences as evidenced in the global membership rhetoric. 5) The changing and contradictory discourses of this study show a change in the conceptual system and developer-user relationship of the open development model. This change is characterized as a movement from hacker ethic and bazaar governance to more professionally and strategically regulated community. 6) Community is simultaneously real and imagined, and can be characterized as a runaway community . Discursive-action can be seen as a specific type of online open source engagement. Hierarchies and structures are created through discursive acts. Key words: Open Source Software, open development model, community, motivation, discourse, rhetoric, developer, user, end-user
Resumo:
Trafficking in human beings has become one of the most talked about criminal concerns of the 21st century. But this is not all that it has become. Trafficking has also been declared as one of the most pressing human rights issues of our time. In this sense, it has become a part of the expansion of the human rights phenomenon. Although it is easy to see that the crime of trafficking violates several of the human rights of its victims, it is still, in its essence, a fairly conventional although particularly heinous and often transnational crime, consisting of acts between private actors, and lacking, therefore, the vertical effect associated traditionally with human rights violations. This thesis asks, then, why, and how, has the anti-trafficking campaign been translated in human rights language. And even more fundamentally: in light of the critical, theoretical studies surrounding the expansion of the human rights phenomenon, especially that of Costas Douzinas, who has declared that we have come to the end of human rights as a consequence of the expansion and bureaucratization of the phenomenon, can human rights actually bring salvation to the victims of trafficking? The thesis demonstrates that the translation process of the anti-trafficking campaign into human rights language has been a complicated process involving various actors, including scholars, feminist NGOs, local activists and global human rights NGOs. It has also been driven by a complicated web of interests, the most prevalent one the sincere will to help the victims having become entangled with other aims, such as political, economical, and structural goals. As a consequence of its fragmented background, the human rights approach to trafficking seeks still its final form, consisting of several different claims. After an assessment of these claims from a legal perspective, this thesis concludes that the approach is most relevant regarding the mistreatment of victims of trafficking in the hands of state authorities. It seems to be quite common that authorities have trouble identifying the victims of trafficking, which means that the rights granted to themin international and national documents are not realized in practice, but victims of trafficking are systematically deported as illegal immigrants. It is argued that in order to understand the measures of the authorities, and to assess the usefulness of human rights, it is necessary to adopt a Foucauldian perspective and to observe the measures as biopolitical defence mechanisms. From a biopolitical perspective, the victims of trafficking can be seen as a threat to the population a threat that must be eliminated either by assimilating them to the main population with the help of disciplinary techniques, or by excluding them completely from the society. This biopolitical aim is accomplished through an impenetrable net of seemingly insignificant practices and discourses that not even the participants are aware of. As a result of these practices and discourses, trafficking victims only very few of fit the myth of the perfect victim, produced by biopolitical discourses become invisible and therefore subject to deportation as (risky) illegal immigrants, turning them into bare life in the Agambenian sense, represented by the homo sacer, who cannot be sacrificed, yet does not enjoy the protection of the society and its laws. It is argued, following Jacques Rancière and Slavoj i ek, that human rights can, through their universality and formal equality, provide bare life the tools to formulate political claims and therefore utilize their politicization through their exclusion to return to the sphere of power and politics. Even though human rights have inevitably become entangled with biopolitical practices, they are still perhaps the most efficient way to challenge biopower. Human rights have not, therefore, become useless for the victims of trafficking, but they must be conceived as a universal tool to formulate political claims and challenge power .In the case of trafficking this means that human rights must be utilized to constantly renegotiate the borders of the problematic concept of victim of trafficking created by international instruments, policies and discourses, including those that are sincerely aimed to provide help for the victims.
Resumo:
The protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selecting a subset of poses, (iii) their structural refinement and (iv) scoring, ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface area based edge-scoring function to eliminate > 95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single parameter based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server performs ranking of pruned poses, after structure refinement and scoring using a regression model for geometric compatibility, and normalized interaction energy. While web-service similar to PROBE is infrequent, no web-service akin to PRUNE has been described before. Both the servers are publicly accessible and free for use.
Resumo:
Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight < 1.5 KDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthew's correlation coefficient of similar to 0.4) over a testing set of 225 single and multi-chain protein structures. Users have the option of tuning several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format. The users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery.
Resumo:
In this paper we propose a new method of data handling for web servers. We call this method Network Aware Buffering and Caching (NABC for short). NABC facilitates reduction of data copies in web server's data sending path, by doing three things: (1) Layout the data in main memory in a way that protocol processing can be done without data copies (2) Keep a unified cache of data in kernel and ensure safe access to it by various processes and kernel and (3) Pass only the necessary meta data between processes so that bulk data handling time spent during IPC can be reduced. We realize NABC by implementing a set of system calls and an user library. The end product of the implementation is a set of APIs specifically designed for use by the web servers. We port an in house web server called SWEET, to NABC APIs and evaluate performance using a range of workloads both simulated and real. The results show a very impressive gain of 12% to 21% in throughput for static file serving and 1.6 to 4 times gain in throughput for lightweight dynamic content serving for a server using NABC APIs over the one using UNIX APIs.