852 resultados para content-based filtering
Resumo:
The number of research papers available today is growing at a staggering rate, generating a huge amount of information that people cannot keep up with. According to a tendency indicated by the United States’ National Science Foundation, more than 10 million new papers will be published in the next 20 years. Because most of these papers will be available on the Web, this research focus on exploring issues on recommending research papers to users, in order to directly lead users to papers of their interest. Recommender systems are used to recommend items to users among a huge stream of available items, according to users’ interests. This research focuses on the two most prevalent techniques to date, namely Content-Based Filtering and Collaborative Filtering. The first explores the text of the paper itself, recommending items similar in content to the ones the user has rated in the past. The second explores the citation web existing among papers. As these two techniques have complementary advantages, we explored hybrid approaches to recommending research papers. We created standalone and hybrid versions of algorithms and evaluated them through both offline experiments on a database of 102,295 papers, and an online experiment with 110 users. Our results show that the two techniques can be successfully combined to recommend papers. The coverage is also increased at the level of 100% in the hybrid algorithms. In addition, we found that different algorithms are more suitable for recommending different kinds of papers. Finally, we verified that users’ research experience influences the way users perceive recommendations. In parallel, we found that there are no significant differences in recommending papers for users from different countries. However, our results showed that users’ interacting with a research paper Recommender Systems are much happier when the interface is presented in the user’s native language, regardless the language that the papers are written. Therefore, an interface should be tailored to the user’s mother language.
Resumo:
Traditional content-based filtering methods usually utilize text extraction and classification techniques for building user profiles as well as for representations of contents, i.e. item profiles. These methods have some disadvantages e.g. mismatch between user profile terms and item profile terms, leading to low performance. Some of the disadvantages can be overcome by incorporating a common ontology which enables representing both the users' and the items' profiles with concepts taken from the same vocabulary. We propose a new content-based method for filtering and ranking the relevancy of items for users, which utilizes a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering the existing of mutual concepts in the two profiles, as well as the existence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between the users' profiles and the items' profiles, and rank-orders the relevant items according to their relevancy to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, utilizing a hierarchical ontology designed specifically for classification of News items. It can, however, be utilized in other domains and extended to other ontologies.
Resumo:
This research investigates techniques to analyse long duration acoustic recordings to help ecologists monitor birdcall activities. It designs a generalized algorithm to identify a broad range of bird species. It allows ecologists to search for arbitrary birdcalls of interest, rather than restricting them to just a very limited number of species on which the recogniser is trained. The algorithm can help ecologists find sounds of interest more efficiently by filtering out large volumes of unwanted sounds and only focusing on birdcalls.
Resumo:
Although context could be exploited to improve performance, elasticity and adaptation in most distributed systems that adopt the publish/subscribe (P/S) communication model, only a few researchers have focused on the area of context-aware matching in P/S systems and have explored its implications in domains with highly dynamic context like wireless sensor networks (WSNs) and IoT-enabled applications. Most adopted P/S models are context agnostic or do not differentiate context from the other application data. In this article, we present a novel context-aware P/S model. SilboPS manages context explicitly, focusing on the minimization of network overhead in domains with recurrent context changes related, for example, to mobile ad hoc networks (MANETs). Our approach represents a solution that helps to efficiently share and use sensor data coming from ubiquitous WSNs across a plethora of applications intent on using these data to build context awareness. Specifically, we empirically demonstrate that decoupling a subscription from the changing context in which it is produced and leveraging contextual scoping in the filtering process notably reduces (un)subscription cost per node, while improving the global performance/throughput of the network of brokers without fltering the cost of SIENA-like topology changes.
Resumo:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014
Resumo:
Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy.
Resumo:
To sustain an ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack of any standard. This doctoral consists of a research work based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple audio-visual modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users will benefit from the integration of high-level semantic and some descriptive mid-level features such as whistle and close-up view of player(s).
Resumo:
This paper introduces PartSS, a new partition-based fil- tering for tasks performing string comparisons under edit distance constraints. PartSS offers improvements over the state-of-the-art method NGPP with the implementation of a new partitioning scheme and also improves filtering abil- ities by exploiting theoretical results on shifting and scaling ranges, thus accelerating the rate of calculating edit distance between strings. PartSS filtering has been implemented within two major tasks of data integration: similarity join and approximate membership extraction under edit distance constraints. The evaluation on an extensive range of real-world datasets demonstrates major gain in efficiency over NGPP and QGrams approaches.
Resumo:
Bioacoustic data can provide an important base for environmental monitoring. To explore a large amount of field recordings collected, an automated similarity search algorithm is presented in this paper. A region of an audio defined by frequency and time bounds is provided by a user; the content of the region is used to construct a query. In the retrieving process, our algorithm will automatically scan through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations – in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
Resumo:
Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous many-to-many information dissemination. Event systems are widely used, because asynchronous messaging provides a flexible alternative to RPC (Remote Procedure Call). They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. The filters are organized into a routing table in order to forward incoming events to proper subscribers and neighbouring routers. This thesis addresses the optimization of content-based routing tables organized using the covering relation and presents novel data structures and configurations for improving local and distributed operation. Data structures are needed for organizing filters into a routing table that supports efficient matching and runtime operation. We present novel results on dynamic filter merging and the integration of filter merging with content-based routing tables. In addition, the thesis examines the cost of client mobility using different protocols and routing topologies. We also present a new matching technique called temporal subspace matching. The technique combines two new features. The first feature, temporal operation, supports notifications, or content profiles, that persist in time. The second feature, subspace matching, allows more expressive semantics, because notifications may contain intervals and be defined as subspaces of the content space. We also present an application of temporal subspace matching pertaining to metadata-based continuous collection and object tracking.
Resumo:
The increased availability of image capturing devices has enabled collections of digital images to rapidly expand in both size and diversity. This has created a constantly growing need for efficient and effective image browsing, searching, and retrieval tools. Pseudo-relevance feedback (PRF) has proven to be an effective mechanism for improving retrieval accuracy. An original, simple yet effective rank-based PRF mechanism (RB-PRF) that takes into account the initial rank order of each image to improve retrieval accuracy is proposed. This RB-PRF mechanism innovates by making use of binary image signatures to improve retrieval precision by promoting images similar to highly ranked images and demoting images similar to lower ranked images. Empirical evaluations based on standard benchmarks, namely Wang, Oliva & Torralba, and Corel datasets demonstrate the effectiveness of the proposed RB-PRF mechanism in image retrieval.
Resumo:
Image filtering techniques have numerous potential applications in biomedical imaging and image processing. The design of filters largely depends on the a-priori knowledge about the type of noise corrupting the image and image features. This makes the standard filters to be application and image specific. The most popular filters such as average, Gaussian and Wiener reduce noisy artifacts by smoothing. However, this operation normally results in smoothing of the edges as well. On the other hand, sharpening filters enhance the high frequency details making the image non-smooth. An integrated general approach to design filters based on discrete cosine transform (DCT) is proposed in this study for optimal medical image filtering. This algorithm exploits the better energy compaction property of DCT and re-arrange these coefficients in a wavelet manner to get the better energy clustering at desired spatial locations. This algorithm performs optimal smoothing of the noisy image by preserving high and low frequency features. Evaluation results show that the proposed filter is robust under various noise distributions.
Resumo:
This paper presents a novel approach using combined features to retrieve images containing specific objects, scenes or buildings. The content of an image is characterized by two kinds of features: Harris-Laplace interest points described by the SIFT descriptor and edges described by the edge color histogram. Edges and corners contain the maximal amount of information necessary for image retrieval. The feature detection in this work is an integrated process: edges are detected directly based on the Harris function; Harris interest points are detected at several scales and Harris-Laplace interest points are found using the Laplace function. The combination of edges and interest points brings efficient feature detection and high recognition ratio to the image retrieval system. Experimental results show this system has good performance. © 2005 IEEE.