77 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Text clustering can be considered a four-step process consisting of feature extraction, text representation, document clustering and cluster interpretation. Most text clustering models consider text as an unordered collection of words. However, the semantics of text would be better captured if word sequences were taken into account.

In this paper we propose a sequence-based text clustering model in which a novel sequence-based component is introduced in each of the four steps of the text clustering process.

Experiments conducted on the Reuters dataset and Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence-based model, in terms of capturing context with semantics, accuracy and speed, compared to clustering of documents based on single words and n-gram based models.
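
The difference between an unordered collection of words and an order-sensitive representation can be illustrated with a minimal sketch. The abstract does not describe the paper's own sequence-based components, so contiguous word pairs are used here only as the simplest stand-in for order-sensitive features. They correspond to the n-gram baseline the paper compares against, not to the proposed model. The function names and the two example sentences are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Unordered word counts: word order is discarded."""
    return Counter(text.lower().split())

def word_sequences(text, n=2):
    """Counts of contiguous n-word sequences, which preserve local order."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Two toy sentences with the same words but opposite meanings.
a = "dog bites man"
b = "man bites dog"

same_bag = bag_of_words(a) == bag_of_words(b)
same_sequences = word_sequences(a) == word_sequences(b)
print(same_bag, same_sequences)
```

The two sentences are indistinguishable as bags of words but differ once word order is kept, which is the motivation the abstract gives for sequence-based features.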

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An Android application uses a permission system to regulate access to system resources and users' privacy-relevant information. Existing works have demonstrated several techniques to study the required permissions declared by developers, but little attention has been paid to used permissions. Besides, no specific permission combination has been identified as effective for malware detection. To fill these gaps, we propose a novel pattern mining algorithm to identify a set of contrast permission patterns that aim to detect the difference between clean and malicious applications. A benchmark malware dataset and a dataset of 1227 clean applications were collected to evaluate the performance of the proposed algorithm. Valuable findings are obtained by analyzing the returned contrast permission patterns. © 2013 Elsevier B.V. All rights reserved.
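
The idea of a contrast pattern can be sketched as follows. This is not the paper's algorithm, which the abstract does not specify. It is a brute-force illustration that assumes a pattern is "contrasting" when its support (the fraction of apps containing it) differs between the two groups by at least a threshold. The function names, the `min_diff` threshold and the four toy apps are all hypothetical.

```python
from itertools import combinations

def support(pattern, apps):
    """Fraction of apps whose permission set contains every permission in pattern."""
    return sum(pattern <= perms for perms in apps) / len(apps)

def contrast_patterns(malicious, clean, max_size=2, min_diff=0.5):
    """Permission sets whose support differs by at least min_diff between groups.

    Assumed contrast measure: support(malicious) - support(clean).
    """
    permissions = sorted(set().union(*malicious, *clean))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(permissions, size):
            pattern = frozenset(combo)
            diff = support(pattern, malicious) - support(pattern, clean)
            if abs(diff) >= min_diff:
                results.append((combo, round(diff, 2)))
    return results

# Toy data for illustration only, not taken from the paper's datasets.
malicious = [{"INTERNET", "SEND_SMS", "READ_CONTACTS"}, {"INTERNET", "SEND_SMS"}]
clean = [{"INTERNET"}, {"INTERNET", "READ_CONTACTS"}]

patterns = contrast_patterns(malicious, clean)
for combo, diff in patterns:
    print(combo, diff)
```

Exhaustive enumeration grows quickly with the number of permissions, so a real miner would prune the search space. The sketch only shows what is being measured.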

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This research investigated the proliferation of malicious applications on smartphones and proposed a framework that can efficiently detect and classify such applications based on behavioural patterns. Additionally, the causes and impact of unauthorised disclosure of personal information by clean applications were examined, and countermeasures to protect smartphone users’ privacy were proposed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A severely skewed class distribution, i.e. the presence of underrepresented data, strongly affects the performance of learning algorithms and remains a challenge in data mining and machine learning. Much current research focuses on experimental comparison of existing re-sampling approaches. We believe new ways of constructing better algorithms are required to further balance and analyse the data set. This paper presents a Fuzzy-based Information Decomposition oversampling (FIDoS) algorithm for handling imbalanced data. Generally speaking, this is a new way of addressing imbalanced learning problems from a missing data perspective. First, we assume that there are missing instances in the minority class that result in the imbalanced dataset. Then the proposed algorithm, which takes advantage of a fuzzy membership function, is used to transfer information to the missing minority class instances. Finally, the experimental results demonstrate that the proposed algorithm is more practical and applicable compared to sampling techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

When the average number of spam messages received keeps increasing exponentially, both the Internet service provider and the end user suffer. The lack of an efficient solution may threaten the usability of email as a means of communication. In this paper we present a filtering mechanism applying the idea of preference ranking. This filtering mechanism will distinguish spam emails from other emails on the Internet. The preference ranking gives the similarity values for nominated emails and spam emails specified by users, so that the ISP/end users can deal with spam emails at filtering points. We designed three filtering points to classify nominated emails into spam email, unsure email and legitimate email. This filtering mechanism can be applied both in middleware and on the client side. The experiments show that high precision, recall and TCR (total cost ratio) of spam emails can be predicted for the preference based filtering mechanisms.
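
A three-way decision of this kind can be sketched with a similarity score and two cut-offs. The abstract does not define the preference ranking or how the filtering points are placed, so everything here is an assumption made for illustration: Jaccard word overlap stands in for the similarity measure, and the thresholds, function names and example messages are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two messages (assumed similarity measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def classify(email, spam_examples, spam_threshold=0.6, legit_threshold=0.2):
    """Label an email by its highest similarity to user-specified spam examples."""
    score = max(jaccard(email, example) for example in spam_examples)
    if score >= spam_threshold:
        return "spam"
    if score <= legit_threshold:
        return "legitimate"
    return "unsure"

# Toy examples for illustration only.
spam_examples = ["win a free prize now"]

label_spam = classify("win a free prize now today", spam_examples)
label_legit = classify("meeting agenda for monday", spam_examples)
label_unsure = classify("claim a free prize at the office", spam_examples)
print(label_spam, label_legit, label_unsure)
```

Keeping an explicit "unsure" band lets borderline messages be deferred to the user instead of being forced into one of the two confident classes.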

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Using the vast amount of information on Web sites efficiently and effectively is very important for making informed decisions. There are, however, still many problems that need to be overcome in the information gathering research arena to enable the delivery of relevant information required by users. In this paper, an information gathering system is developed by means of multiple agents to solve those problems. We employed some ideas of the Gaia methodology and an open agent architecture to analyze and design the system. The system consists of a query preprocessing agent, an information retrieval agent, an information filtering agent, and an information management agent. The filtering agent is trained with categorized documents and can provide users with the necessary information. The experimental results show that all agents in the system can work cooperatively to retrieve relevant information from the World Wide Web environment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

While the important role of family carers has been increasingly recognized in healthcare service provision, particularly for patients with acute or chronic illnesses, the family carer's information needs have not been well understood or adequately supported by health information systems. In this study, we explore the information needs of a family carer by analyzing the extensive online diary of a Vietnamese family carer supporting his wife, who was a lung cancer patient. The study provides a deep understanding of the information needs of the family carer and suggests a four-stage information journey model including identification, searching, interpretation and information sharing, and collaboration. A number of themes emerge from the study including the key role of the carer, information filtering by the carer, information sharing and collaboration, and the influence of Vietnamese culture. The paper concludes with a discussion of the requirements for health information systems that meet the needs of family carers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recently, much attention has been given to mass spectrometry (MS) based disease classification, diagnosis, and protein-based biomarker identification. As in microarray-based investigation, proteomic data generated by this kind of high-throughput experiment often have a high feature-to-sample ratio. Moreover, biological information and patterns are compounded with data noise, redundancy and outliers. Thus, the development of algorithms and procedures for the analysis and interpretation of this kind of data is of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a feature extraction and selection procedure based on the k-means clustering algorithm to bridge the filter selection and wrapper selection methods. The potentially informative mass/charge (m/z) markers selected by filters are subjected to the k-means clustering algorithm for correlation and redundancy reduction, and a multi-objective Genetic Algorithm selector is then employed to identify discriminative m/z markers generated by the k-means clustering algorithm. Experimental results obtained using the proposed method indicate that it is suitable for m/z biomarker selection and MS-based sample classification.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The recent emergence of intelligent agent technology and advances in information gathering have been important steps forward in efficiently managing and using the vast amount of information now available on the Web to make informed decisions. There are, however, still many problems that need to be overcome in the information gathering research arena to enable the delivery of relevant information required by end users. Good decisions cannot be made without sufficient, timely, and correct information. Traditionally it is said that knowledge is power; nowadays, however, sufficient, timely, and correct information is power. Gathering relevant information to meet user information needs is therefore the crucial step for making good decisions. The ideal goal of information gathering is to obtain only the information that users need (no more and no less). However, the volume of information available, the diverse formats of information, the uncertainties of information, and the distributed locations of information (e.g. the World Wide Web) hinder the process of gathering the right information to meet user needs. Specifically, two fundamental issues with regard to the efficiency of information gathering are mismatch and overload. Mismatch means that some information that meets user needs has not been gathered (it has been missed), whereas overload means that some gathered information is not what users need. Traditional information retrieval has been developed well over the past twenty years. The introduction of the Web has changed people's perceptions of information retrieval. Usually, the task of information retrieval is considered to be leading the user to those documents that are relevant to his/her information needs. A similar function in information retrieval is to filter out the irrelevant documents (called information filtering). Research into traditional information retrieval has provided many retrieval models and techniques to represent documents and queries.
Nowadays, information is becoming highly distributed and increasingly difficult to gather. At the same time, user information needs have been found to contain many uncertainties. These observations motivate research in agent-based information gathering, and agent-based information systems have arisen in response. In these kinds of systems, intelligent agents get commitments from their users and act on the users' behalf to gather the required information. They can easily retrieve the relevant information from highly distributed uncertain environments because of their merits of intelligence, autonomy and distribution. Current research on agent-based information gathering systems is divided into single agent gathering systems and multi-agent gathering systems. In both research areas, there are still open problems to be solved so that agent-based information gathering systems can retrieve uncertain information more effectively from highly distributed environments. The aim of this thesis is to research the theoretical framework for intelligent agents to gather information from the Web. This research integrates the areas of information retrieval and intelligent agents. The specific research areas in this thesis are the development of an information filtering model for single agent systems, and the development of a dynamic belief model for information fusion for multi-agent systems. The research results are also supported by the construction of real information gathering agents (e.g., Job Agent) for the Internet to help users to gather useful information stored in Web sites. In such a framework, information gathering agents have abilities to describe (or learn) the user information needs, and act like users to retrieve, filter, and/or fuse the information. A rough set based information filtering model is developed to address the problem of overload.
The new approach allows users to describe their information needs on user concept spaces rather than on document spaces, and it views a user information need as a rough set over the document space. Rough set decision theory is used to classify new documents into three regions: positive region, boundary region, and negative region. Two experiments are presented to verify this model, and they show that the rough set based model provides an efficient approach to the overload problem. In this research, a dynamic belief model for information fusion in multi-agent environments is also developed. This model has polynomial time complexity, and it has been proven that the fusion results are belief (mass) functions. Using this model, a collection fusion algorithm for information gathering agents is presented. The difficult case for this research is where collections may be used by more than one agent. The algorithm uses cooperation between agents to provide a solution for this difficult problem in distributed information retrieval systems. This thesis presents solutions to the theoretical problems in agent-based information gathering systems, including information filtering models, agent belief modeling, and collection fusion. It also presents solutions to some of the technical problems in agent-based information systems, such as document classification, the architecture for agent-based information gathering systems, and decision making in multiple agent environments. Such information gathering agents will gather relevant information from highly distributed uncertain environments.
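
The three regions named in the abstract can be illustrated with the standard rough set approximations. This is a minimal sketch, not the thesis's filtering model. It assumes that documents with identical descriptions are indiscernible, and that a group of indiscernible documents is positive when all of its members are relevant, boundary when only some are, and negative when none are. The function name, the concept labels and the five toy documents are hypothetical.

```python
from collections import defaultdict

def rough_regions(documents, relevant):
    """Split documents into positive, boundary and negative regions.

    documents: dict mapping doc id -> hashable description (e.g. a frozenset of concepts)
    relevant:  set of doc ids the user marked as relevant
    """
    # Documents with the same description form one equivalence class.
    classes = defaultdict(set)
    for doc_id, description in documents.items():
        classes[description].add(doc_id)

    positive, boundary, negative = set(), set(), set()
    for members in classes.values():
        if members <= relevant:      # whole class is relevant
            positive |= members
        elif members & relevant:     # class is only partly relevant
            boundary |= members
        else:                        # class contains no relevant document
            negative |= members
    return positive, boundary, negative

# Toy data for illustration only.
documents = {
    "d1": frozenset({"jobs", "it"}),
    "d2": frozenset({"jobs", "it"}),
    "d3": frozenset({"jobs"}),
    "d4": frozenset({"jobs"}),
    "d5": frozenset({"sport"}),
}
relevant = {"d1", "d2", "d3"}

positive, boundary, negative = rough_regions(documents, relevant)
print(sorted(positive), sorted(boundary), sorted(negative))
```

Under these assumptions, a filter could pass the positive region, drop the negative region, and treat the boundary region as the uncertain cases that need a further decision.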

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we investigate an approach to eliciting practitioners’ problem-solving experience across an application domain. The approach is based on a well-known ‘pattern mining’ process which commonly results in a collection of sharable and reusable ‘design patterns’. While pattern mining has been recognised to work effectively in numerous domains, its main problem is the degree of technical proficiency that few domain practitioners are prepared to master. In our approach to pattern mining, patterns are induced indirectly from designers’ experience, as determined by analysing their past projects, the problems encountered and solutions applied in problem rectification. Through the cycles of hermeneutic revisions, the pattern mining process has been refined and ultimately its deficiencies addressed. The hermeneutic method used in the study has been clearly shown in the paper and illustrated with examples drawn from the multimedia domain. The resulting approach to experience elicitation provided opportunities for active participation of multimedia practitioners in capturing and sharing their design experience.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A performance feature around text-based artwork installed on Federation Square Plaza, Melbourne, celebrating the emergence of a new space. Nearamnew is derived from the local word 'narr-m', signifying 'the place where Melbourne now stands'. Memory traces are gathered into a whorl pattern in the cobbles and are studded with nine paved figures. Federal poems are carved into the figures, giving voice to the many historical, cultural and spiritual communities who are tributaries to this place.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information and communication technologies such as email, text messaging and video messaging are commonly used by the general population. However, international research has shown that they are not used routinely by GPs to communicate or consult with patients. Investigating Victorian GPs’ perceptions of doing so is timely given Australia’s new National Broadband Network, which may facilitate web-based modes of doctor-patient interaction. This study therefore aimed to explore Victorian GPs’ experiences of, and attitudes toward, using information and communication technologies to consult with patients. Qualitative telephone interviews were carried out with a maximum variation sample of 36 GPs from across Victoria. GPs reported a range of perspectives on using new consultation technologies within their practice. Common concerns included medico-legal and remuneration issues and perceived patient information technology literacy. Policy makers should incorporate GPs’ perspectives into primary care service delivery planning to promote the effective use of information and communication technologies in improving accessibility and quality of general practice care.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Email overload is a recent problem: people find it increasingly difficult to process the large number of emails received daily. This problem is becoming more and more serious and has already affected the normal use of email as a knowledge management tool. It has been recognized that categorizing emails into meaningful groups can greatly reduce the cognitive load of processing emails and is thus an effective way to manage the email overload problem. However, most current approaches still require significant human input when categorizing emails. In this paper we develop an automatic email clustering system, underpinned by a new nonparametric text clustering algorithm. This system does not require any predefined input parameters and can automatically generate meaningful email clusters. Experiments show that our new algorithm outperforms existing text clustering algorithms, with higher efficiency in terms of computational time and better clustering quality as measured by different gauges.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes capturing design experiences by applying grounded theory to pattern mining. The presented approach aims at inducing expert development knowledge and its subsequent packaging into domain-specific design patterns, which could later be used by both experienced and novice developers in the field. The method was evaluated empirically in a domain of Web development.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a small set of patterns produced in a process of domain-wide pattern mining. We provide a brief description of the experience mining process across the web development domain and explain how the resulting pattern languages were discovered. A subset of the mined patterns was selected for this paper because of their pertinence to most web development projects, i.e. colour scheme and readability issues and image download issues.