22 resultados para Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevância:

40.00% 40.00%

Publicador:

Resumo:

A general approach to information correction and fusion for belief functions is proposed, where not only may the information items be irrelevant, but sources may lie as well. We introduce a new correction scheme, which takes into account uncertain metaknowledge on the source’s relevance and truthfulness and that generalizes Shafer’s discounting operation. We then show how to reinterpret all connectives of Boolean logic in terms of source behavior assumptions with respect to relevance and truthfulness. We are led to generalize the unnormalized Dempster’s rule to all Boolean connectives, while taking into account the uncertainties pertaining to assumptions concerning the behavior of sources. Eventually, we further extend this approach to an even more general setting, where source behavior assumptions do not have to be restricted to relevance and truthfulness.We also establish the commutativity property between correction and fusion processes, when the behaviors of the sources are independent.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large representative collection of newspaper text is fed through a prototype system. In contrast to previous work, the output is subjected to human testing to verify that the text has not been significantly compromised by the information hiding procedure, yielding a success rate of 96% and bandwidth of 0.3 bits per sentence. © 2007 SPIE-IS&T.

Relevância:

40.00% 40.00%

Publicador:

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Modern cancer research on prognostic and predictive biomarkers demands the integration of established and emerging high-throughput technologies. However, these data are meaningless unless carefully integrated with patient clinical outcome and epidemiological information. Integrated datasets hold the key to discovering new biomarkers and therapeutic targets in cancer. We have developed a novel approach and set of methods for integrating and interrogating phenomic, genomic and clinical data sets to facilitate cancer biomarker discovery and patient stratification. Applied to a known paradigm, the biological and clinical relevance of TP53, PICan was able to recapitulate the known biomarker status and prognostic significance at a DNA, RNA and protein levels.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With over 50 billion downloads and more than 1.3 million apps in Google’s official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection avoidance techniques. As traditional signature based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. Thus this paper proposes an approach that utilizes ensemble learning for Android malware detection. It combines advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. Experimental results and analysis presented shows that the proposed method which uses a large feature space to leverage the power of ensemble learning is capable of 97.3 % to 99% detection accuracy with very low false positive rates.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Learning or writing regular expressions to identify instances of a specific
concept within text documents with a high precision and recall is challenging.
It is relatively easy to improve the precision of an initial regular expression
by identifying false positives covered and tweaking the expression to avoid the
false positives. However, modifying the expression to improve recall is difficult
since false negatives can only be identified by manually analyzing all documents,
in the absence of any tools to identify the missing instances. We focus on partially
automating the discovery of missing instances by soliciting minimal user
feedback. We present a technique to identify good generalizations of a regular
expression that have improved recall while retaining high precision. We empirically
demonstrate the effectiveness of the proposed technique as compared to
existing methods and show results for a variety of tasks such as identification of
dates, phone numbers, product names, and course numbers on real world datasets