976 results for Filtering techniques


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Information overload has become a serious issue for web users. Personalisation can provide effective solutions to overcome this problem. Recommender systems are one popular personalisation tool to help users deal with this issue. As the basis of personalisation, the accuracy and efficiency of web user profiling greatly affect the performance of recommender systems and other personalisation systems. In Web 2.0, the emerging user information provides new possible solutions to profile users. Folksonomy, or tag information, is a typical kind of Web 2.0 information. Folksonomy implies the users' topic interests and opinion information. It has become another important source of user information to profile users and to make recommendations. However, since tags are arbitrary words given by users, folksonomy contains a lot of noise such as tag synonyms, semantic ambiguities and personal tags. Such noise makes it difficult to profile users accurately or to make quality recommendations. This thesis investigates the distinctive features and multiple relationships of folksonomy and explores novel approaches to solve the tag quality problem and profile users accurately. Harvesting the wisdom of crowds and experts, three new user profiling approaches are proposed: a folksonomy-based user profiling approach, a taxonomy-based user profiling approach, and a hybrid user profiling approach based on folksonomy and taxonomy. The proposed user profiling approaches are applied to recommender systems to improve their performance. Based on the generated user profiles, user-based and item-based collaborative filtering approaches, combined with content filtering methods, are proposed to make recommendations. The proposed new user profiling and recommendation approaches have been evaluated through extensive experiments. The effectiveness evaluation experiments were conducted on two real-world datasets collected from the Amazon.com and CiteULike websites. The experimental results demonstrate that the proposed user profiling and recommendation approaches outperform related state-of-the-art approaches. In addition, this thesis proposes a parallel, scalable user profiling implementation approach based on advanced cloud computing techniques such as Hadoop, MapReduce and Cascading. The scalability evaluation experiments were conducted on a large-scale dataset collected from the Del.icio.us website. This thesis contributes to the effective use of the wisdom of crowds and experts to help users solve information overload issues by providing more accurate, effective and efficient user profiling and recommendation approaches. It also contributes to better use of taxonomy information given by experts and folksonomy information contributed by users in Web 2.0.
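
As a rough illustration of the idea of recommending from tag-based user profiles, the sketch below builds a tag-frequency profile per user and applies plain user-based collaborative filtering with cosine similarity. It is a minimal sketch under stated assumptions, not the thesis's method: the function names (`tag_profile`, `cosine`, `recommend`), the toy data, and the choice of cosine similarity are all illustrative, and none of the noise handling or taxonomy mapping described in the abstract is attempted.

```python
import math
from collections import Counter


def tag_profile(assignments):
    """Build a tag-frequency profile from a user's (item, tag) pairs."""
    return Counter(tag for _item, tag in assignments)


def cosine(p, q):
    """Cosine similarity between two tag-frequency profiles."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def recommend(target, profiles, items, k=1):
    """Recommend items tagged by the k most similar users (assumed scheme)."""
    neighbours = sorted(
        (u for u in profiles if u != target),
        key=lambda u: cosine(profiles[target], profiles[u]),
        reverse=True,
    )[:k]
    recs = []
    for user in neighbours:
        # Only suggest items the target user has not already tagged.
        for item in sorted(items[user] - items[target]):
            if item not in recs:
                recs.append(item)
    return recs


# Hypothetical toy data: user -> list of (item, tag) assignments.
assignments = {
    "alice": [("b1", "python"), ("b2", "python"), ("b2", "ml")],
    "bob": [("b2", "python"), ("b3", "ml")],
    "carol": [("b4", "cooking")],
}
profiles = {u: tag_profile(a) for u, a in assignments.items()}
items = {u: {item for item, _tag in a} for u, a in assignments.items()}
suggestions = recommend("alice", profiles, items)
```

Raw tag counts are used here for simplicity; a profile built this way inherits exactly the synonym and ambiguity noise that the abstract identifies as the problem to be solved.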

Relevance:

20.00% 20.00%

Publisher:

Abstract:

It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern-mining-based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and that the performance is also consistent for adaptive filtering.
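
The three-way split of terms described above can be illustrated with simple set operations. This is only a sketch under an assumed criterion (a term is "general" when it occurs in both positive documents and offenders); the abstract does not state the paper's actual classification rule, and the function name `classify_terms` and the sample documents are hypothetical.

```python
def classify_terms(positive_docs, offender_docs):
    """Split terms into positive-specific, general and negative-specific sets.

    Assumed rule (not from the paper): membership is decided purely by
    whether a term occurs in positive documents, offenders, or both.
    """
    pos_terms = set().union(*map(set, positive_docs))
    neg_terms = set().union(*map(set, offender_docs))
    return {
        "positive_specific": pos_terms - neg_terms,
        "general": pos_terms & neg_terms,
        "negative_specific": neg_terms - pos_terms,
    }


# Hypothetical tokenised documents.
positive = [["oil", "price"], ["oil", "market"]]
offenders = [["price", "football"]]
categories = classify_terms(positive, offenders)
```

With the terms separated this way, a different revising strategy could be applied per category, for example raising the weights of positive-specific terms and lowering those of negative-specific ones.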

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter the sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. Experiments have been conducted to compare the proposed two-stage filtering (T-SM) model with other possible "term-based + pattern-based" or "term-based + term-based" IF models. The results based on the RCV1 corpus show that the T-SM model significantly outperforms other types of "two-stage" IF models.
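
The general shape of a two-stage filter (a cheap term-based pass that discards clearly irrelevant documents, followed by a pattern-based pass that ranks the survivors) can be sketched as follows. This is an illustration only: the overlap threshold in `stage_one`, the weighted term-set patterns in `stage_two`, and the sample data are assumptions, and they stand in for, rather than reproduce, the rough analysis and pattern taxonomy mining models named in the abstract.

```python
def stage_one(docs, topic_terms, min_overlap=1):
    """Coarse filter: keep documents sharing at least min_overlap topic terms."""
    return [d for d in docs if len(set(d) & topic_terms) >= min_overlap]


def stage_two(docs, patterns):
    """Rank documents by the total weight of patterns they fully contain.

    `patterns` maps a frozenset of terms to a weight (assumed representation).
    """
    scored = []
    for doc in docs:
        terms = set(doc)
        score = sum(w for p, w in patterns.items() if p <= terms)
        scored.append((score, doc))
    # Highest score first; ties keep their input order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical tokenised documents and user-topic model.
docs = [
    ["oil", "price", "rise"],
    ["football", "match"],
    ["oil", "market"],
]
topic_terms = {"oil", "price"}
patterns = {frozenset({"oil", "price"}): 2.0, frozenset({"oil"}): 1.0}

survivors = stage_one(docs, topic_terms)
ranking = stage_two(survivors, patterns)
```

Running the inexpensive stage first means the costlier pattern matching is only applied to documents that already show some topical overlap, which is the division of labour the abstract describes.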

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Signal Processing (SP) is a subject of central importance in engineering and the applied sciences. Signals are information-bearing functions, and SP deals with the analysis and processing of signals (by dedicated systems) to extract or modify information. Signal processing is necessary because signals normally contain information that is not readily usable or understandable, or which might be disturbed by unwanted sources such as noise. Although many signals are non-electrical, it is common to convert them into electrical signals for processing. Most natural signals (such as acoustic and biomedical signals) are continuous functions of time, and these signals are referred to as analog signals. Prior to the advent of digital computers, Analog Signal Processing (ASP) and analog systems were the only tools to deal with analog signals. Although ASP and analog systems are still widely used, Digital Signal Processing (DSP) and digital systems are attracting more attention, due in large part to the significant advantages of digital systems over their analog counterparts. These advantages include superiority in performance, speed, reliability, efficiency of storage, size and cost. In addition, DSP can solve problems that cannot be solved using ASP, like the spectral analysis of multicomponent signals, adaptive filtering, and operations at very low frequencies. Following the developments in engineering which occurred in the 1980s and 1990s, DSP became one of the world's fastest growing industries. Since that time DSP has not only impacted on traditional areas of electrical engineering, but has had far-reaching effects on other domains that deal with information, such as economics, meteorology, seismology, bioengineering, oceanology, communications, astronomy, radar engineering, control engineering and various other applications. This book is based on the Lecture Notes of Associate Professor Zahir M. Hussain at RMIT University (Melbourne, 2001-2009), the research of Dr. Amin Z. Sadik (at QUT & RMIT, 2005-2008), and the Notes of Professor Peter O'Shea at Queensland University of Technology. Part I of the book addresses the representation of analog and digital signals and systems in the time domain and in the frequency domain. The core topics covered are convolution, transforms (Fourier, Laplace, Z, Discrete-time Fourier, and Discrete Fourier), filters, and random signal analysis. There is also a treatment of some important applications of DSP, including signal detection in noise, radar range estimation, banking and financial applications, and audio effects production. Design and implementation of digital systems (such as integrators, differentiators, resonators and oscillators) are also considered, along with the design of conventional digital filters. Part I is suitable for an elementary course in DSP. Part II (which is suitable for an advanced signal processing course) considers selected signal processing systems and techniques. Core topics covered are the Hilbert transformer, binary signal transmission, phase-locked loops, sigma-delta modulation, noise shaping, quantization, adaptive filters, and non-stationary signal analysis. Part III presents some selected advanced DSP topics. We hope that this book will contribute to the advancement of engineering education and that it will serve as a general reference book on digital signal processing.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Stem cells have attracted tremendous interest in recent times due to their promise in providing innovative new treatments for a great range of currently debilitating diseases. This is due to their potential ability to regenerate and repair damaged tissue, and hence restore lost body function, in a manner beyond the body's usual healing process. Bone marrow-derived mesenchymal stem cells or bone marrow stromal cells are one type of adult stem cells that are of particular interest. Since they are derived from a living human adult donor, they do not have the ethical issues associated with the use of human embryonic stem cells. They are also able to be taken from a patient or other donors with relative ease and then grown readily in the laboratory for clinical application. Despite the attractive properties of bone marrow stromal cells, there is presently no quick and easy way to determine the quality of a sample of such cells. Presently, a sample must be grown for weeks and subjected to various time-consuming assays, under the direction of an expert cell biologist, to determine whether it will be useful. Hence there is a great need for innovative new ways to assess the quality of cell cultures for research and potential clinical application. The research presented in this thesis investigates the use of computerised image processing and pattern recognition techniques to provide a quicker and simpler method for the quality assessment of bone marrow stromal cell cultures. In particular, the aim of this work is to find out whether it is possible, through the use of image processing and pattern recognition techniques, to predict the growth potential of a culture of human bone marrow stromal cells at early stages, before it is readily apparent to a human observer. With the above aim in mind, a computerised system was developed to classify the quality of bone marrow stromal cell cultures based on phase contrast microscopy images. Our system was trained and tested on mixed images of both healthy and unhealthy bone marrow stromal cell samples taken from three different patients. This system, when presented with 44 previously unseen bone marrow stromal cell culture images, outperformed human experts in the ability to correctly classify healthy and unhealthy cultures. The system correctly classified the health status of an image 88% of the time compared to an average of 72% of the time for human experts. Extensive training and testing of the system on a set of 139 normal-sized images and 567 smaller image tiles showed an average performance of 86% and 85% correct classifications, respectively. The contributions of this thesis include demonstrating the applicability and potential of computerised image processing and pattern recognition techniques to the task of quality assessment of bone marrow stromal cell cultures. As part of this system, an image normalisation method has been suggested and a new segmentation algorithm has been developed for locating cell regions of irregularly shaped cells in phase contrast images. Importantly, we have validated the efficacy of both the normalisation and segmentation methods by demonstrating that both quantitatively improve the classification performance of subsequent pattern recognition algorithms in discriminating between cell cultures of differing health status. We have shown that the quality of a culture of bone marrow stromal cells may be assessed without the need to either segment individual cells or use time-lapse imaging. Finally, we have proposed a set of features that, when extracted from the cell regions of segmented input images, can be used to train current state-of-the-art pattern recognition systems to predict the quality of bone marrow stromal cell cultures earlier and more consistently than human experts.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Ethernet is a key component of the standards used for digital process buses in transmission substations, namely IEC 61850 and IEEE Std 1588-2008 (PTPv2). These standards use multicast Ethernet frames that can be processed by more than one device. This presents some significant engineering challenges when implementing a sampled value process bus due to the large amount of network traffic. A system of network traffic segregation using a combination of Virtual LAN (VLAN) and multicast address filtering using managed Ethernet switches is presented. This includes VLAN prioritisation of traffic classes such as the IEC 61850 protocols GOOSE, MMS and sampled values (SV), and other protocols like PTPv2. Multicast address filtering is used to limit SV/GOOSE traffic to defined subsets of subscribers. A method to map substation plant reference designations to multicast address ranges is proposed that enables engineers to determine the type of traffic and location of the source by inspecting the destination address. This method and the proposed filtering strategy simplify future changes to the prioritisation of network traffic, and are applicable to both process bus and station bus applications.
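
To make the idea of identifying traffic by its destination address concrete, the sketch below checks the multicast bit of a MAC address and classifies a frame by its address prefix. Several things here are assumptions rather than content from the abstract: the prefixes in `TRAFFIC_PREFIXES` follow the multicast ranges commonly recommended for GOOSE and SV in IEC 61850 and should be checked against the standard, the `subscribers` table is hypothetical, and the paper's mapping from plant reference designations to address ranges is not given and is not reproduced.

```python
def parse_mac(text):
    """Parse 'AA-BB-CC-DD-EE-FF' or 'AA:BB:CC:DD:EE:FF' into 6 bytes."""
    return bytes(int(part, 16) for part in text.replace(":", "-").split("-"))


def is_multicast(mac):
    """A MAC address is multicast when the lowest bit of octet 0 is set."""
    return bool(mac[0] & 0x01)


# Assumed prefixes (first four octets); verify against IEC 61850 before use.
TRAFFIC_PREFIXES = {
    bytes.fromhex("010ccd01"): "GOOSE",
    bytes.fromhex("010ccd04"): "SV",
}


def traffic_type(mac):
    """Classify a destination address by prefix; unknown prefixes are 'other'."""
    return TRAFFIC_PREFIXES.get(mac[:4], "other")


def egress_ports(mac, subscribers):
    """Ports that should receive a multicast frame under static filtering.

    Assumption: frames to unregistered multicast addresses are dropped.
    Real switches may flood them instead, depending on configuration.
    """
    return subscribers.get(mac, set())


# Hypothetical static filter table: destination address -> subscriber ports.
sv_stream = parse_mac("01-0C-CD-04-00-01")
subscribers = {sv_stream: {3, 7}}
ports = egress_ports(sv_stream, subscribers)
kind = traffic_type(sv_stream)
```

Keeping the traffic type in a fixed part of the address is what lets an engineer, or a switch filter rule, recognise the stream without inspecting the frame payload.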