963 resultados para Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An Android application uses a permission system to regulate the access to system resources and users' privacy-relevant information. Existing works have demonstrated several techniques to study the required permissions declared by the developers, but little attention has been paid towards used permissions. Besides, no specific permission combination is identified to be effective for malware detection. To fill these gaps, we have proposed a novel pattern mining algorithm to identify a set of contrast permission patterns that aim to detect the difference between clean and malicious applications. A benchmark malware dataset and a dataset of 1227 clean applications has been collected by us to evaluate the performance of the proposed algorithm. Valuable findings are obtained by analyzing the returned contrast permission patterns. © 2013 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

 This research investigated the proliferation of malicious applications on smartphones and a framework that can efficiently detect and classify such applications based on behavioural patterns was proposed. Additionally the causes and impact of unauthorised disclosure of personal information by clean applications were examined and countermeasures to protect smartphone users’ privacy were proposed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Various factors are believed to govern the selection of references in citation networks, but a precise, quantitative determination of their importance has remained elusive. In this paper, we show that three factors can account for the referencing pattern of citation networks for two topics, namely "graphenes" and "complex networks", thus allowing one to reproduce the topological features of the networks built with papers being the nodes and the edges established by citations. The most relevant factor was content similarity, while the other two - in-degree (i.e. citation counts) and age of publication - had varying importance depending on the topic studied. This dependence indicates that additional factors could play a role. Indeed, by intuition one should expect the reputation (or visibility) of authors and/or institutions to affect the referencing pattern, and this is only indirectly considered via the in-degree that should correlate with such reputation. Because information on reputation is not readily available, we simulated its effect on artificial citation networks considering two communities with distinct fitness (visibility) parameters. One community was assumed to have twice the fitness value of the other, which amounts to a double probability for a paper being cited. While the h-index for authors in the community with larger fitness evolved with time with slightly higher values than for the control network (no fitness considered), a drastic effect was noted for the community with smaller fitness. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Development of homology modeling methods will remain an area of active research. These methods aim to develop and model increasingly accurate three-dimensional structures of yet uncrystallized therapeutically relevant proteins e.g. Class A G-Protein Coupled Receptors. Incorporating protein flexibility is one way to achieve this goal. Here, I will discuss the enhancement and validation of the ligand-steered modeling, originally developed by Dr. Claudio Cavasotto, via cross modeling of the newly crystallized GPCR structures. This method uses known ligands and known experimental information to optimize relevant protein binding sites by incorporating protein flexibility. The ligand-steered models were able to model, reasonably reproduce binding sites and the co-crystallized native ligand poses of the β2 adrenergic and Adenosine 2A receptors using a single template structure. They also performed better than the choice of template, and crude models in a small scale high-throughput docking experiments and compound selectivity studies. Next, the application of this method to develop high-quality homology models of Cannabinoid Receptor 2, an emerging non-psychotic pain management target, is discussed. These models were validated by their ability to rationalize structure activity relationship data of two, inverse agonist and agonist, series of compounds. The method was also applied to improve the virtual screening performance of the β2 adrenergic crystal structure by optimizing the binding site using β2 specific compounds. These results show the feasibility of optimizing only the pharmacologically relevant protein binding sites and applicability to structure-based drug design projects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing ‘global views’ of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein–protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein–protein interaction data with structural information is a particularly novel feature of our system. We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to V–b, for attribute value V and constant exponent b), with a few folds having large values and most having small values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The visual stimuli that elicit neural activity differ for different retinal ganglion cells and these cells have been categorized by the visual information that they transmit. If specific visual information is conveyed exclusively or primarily by a particular set of ganglion cells, one might expect the cells to be organized spatially so that their sampling of information from the visual field is complete but not redundant. In other words, the laterally spreading dendrites of the ganglion cells should completely cover the retinal plane without gaps or significant overlap. The first evidence for this sort of arrangement, which has been called a tiling or tessellation, was for the two types of "alpha" ganglion cells in cat retina. Other reports of tiling by ganglion cells have been made subsequently. We have found evidence of a particularly rigorous tiling for the four types of ganglion cells in rabbit retina that convey information about the direction of retinal image motion (the ON-OFF direction-selective cells). Although individual cells in the four groups are morphologically indistinguishable, they are organized as four overlaid tilings, each tiling consisting of like-type cells that respond preferentially to a particular direction of retinal image motion. These observations lend support to the hypothesis that tiling is a general feature of the organization of information outflow from the retina and clearly implicate mechanisms for recognition of like-type cells and establishment of mutually acceptable territories during retinal development.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The exponential increase of subjective, user-generated content since the birth of the Social Web, has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task, by proposing a unified framework, composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, where different configurations of the framework are suggested and analyzed, in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as the framework as a whole. By achieving an improvement over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bibliography: p. 61-69.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"NSF 07-28" --p. 4 of cover.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"January 1985."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this article, we propose a framework, namely, Prediction-Learning-Distillation (PLD) for interactive document classification and distilling misclassified documents. Whenever a user points out misclassified documents, the PLD learns from the mistakes and identifies the same mistakes from all other classified documents. The PLD then enforces this learning for future classifications. If the classifier fails to accept relevant documents or reject irrelevant documents on certain categories, then PLD will assign those documents as new positive/negative training instances. The classifier can then strengthen its weakness by learning from these new training instances. Our experiments’ results have demonstrated that the proposed algorithm can learn from user-identified misclassified documents, and then distil the rest successfully.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional content-based filtering methods usually utilize text extraction and classification techniques for building user profiles as well as for representations of contents, i.e. item profiles. These methods have some disadvantages e.g. mismatch between user profile terms and item profile terms, leading to low performance. Some of the disadvantages can be overcome by incorporating a common ontology which enables representing both the users' and the items' profiles with concepts taken from the same vocabulary. We propose a new content-based method for filtering and ranking the relevancy of items for users, which utilizes a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering the existing of mutual concepts in the two profiles, as well as the existence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between the users' profiles and the items' profiles, and rank-orders the relevant items according to their relevancy to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, utilizing a hierarchical ontology designed specifically for classification of News items. It can, however, be utilized in other domains and extended to other ontologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The activities of the Institute of Information Technologies in the area of automatic text processing are outlined. Major problems related to different steps of processing are pointed out together with the shortcomings of the existing solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The principal feature of ontology, which is developed for a text processing, is wider knowledge representation of an external world due to introduction of three-level hierarchy. It allows to improve semantic interpretation of natural language texts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can dis- cuss, disseminate, collect or report information about news. However, the massive information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dis- sertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embedding as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a se- ries of experiments to evaluate the proposed methods for each step, and for the end- to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational jour- nalism, information retrieval, automatic summarization, and machine learning.