960 results for rough sets


Relevance:

100.00%

Publisher:

Abstract:

This paper presents a novel two-stage information filtering model that combines the merits of term-based and pattern-based approaches to effectively filter the sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model that efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model that effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. Experiments were conducted to compare the proposed two-stage filtering (T-SM) model with other possible "term-based + pattern-based" or "term-based + term-based" IF models. The results on the RCV1 corpus show that the T-SM model significantly outperforms other types of "two-stage" IF models.

Relevance:

100.00%

Publisher:

Abstract:

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance:

100.00%

Publisher:

Abstract:

R. Jensen and Q. Shen. Fuzzy-Rough Sets Assisted Attribute Selection. IEEE Transactions on Fuzzy Systems, vol. 15, no. 1, pp. 73-89, 2007.

Relevance:

100.00%

Publisher:

Abstract:

X. Wang, J. Yang, X. Teng, W. Xia, and R. Jensen. Feature Selection based on Rough Sets and Particle Swarm Optimization. Pattern Recognition Letters, vol. 28, no. 4, pp. 459-471, 2007.

Relevance:

100.00%

Publisher:

Abstract:

Q. Shen and R. Jensen, 'Rough sets, their extensions and applications,' International Journal of Automation and Computing (IJAC), vol. 4, no. 3, pp. 217-218, 2007.

Relevance:

100.00%

Publisher:

Abstract:

Q. Shen and R. Jensen, 'Selecting Informative Features with Fuzzy-Rough Sets and its Application for Complex Systems Monitoring,' Pattern Recognition, vol. 37, no. 7, pp. 1351-1363, 2004.

Relevance:

100.00%

Publisher:

Abstract:

R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.

Relevance:

100.00%

Publisher:

Abstract:

Rough Set Data Analysis (RSDA) is a non-invasive data analysis approach that relies solely on the data to find patterns and decision rules. Despite its non-invasive approach and its ability to generate human-readable rules, classical RSDA has not been successfully used in commercial data mining and rule-generating engines. The reason is scalability: classical RSDA slows down considerably on larger data sets and takes much longer to generate the rules. This research aims to address the scalability of rough sets by improving the performance of the attribute reduction step of classical RSDA, which is the root cause of its slow performance. We propose to move the entire attribute reduction process into the database. We define a new schema to store the initial data set, and then define SQL queries on this schema that find the attribute reducts correctly and faster than the traditional RSDA approach. We tested our technique on two typical data sets and compared our results with the traditional RSDA approach to attribute reduction. Finally, we highlight some issues with the proposed approach that could lead to future research.
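The abstract above does not describe its schema or queries, so the following is only a minimal sketch of the general idea of pushing a reduct check into the database. It assumes a plain one-row-per-object table in SQLite; the table name `weather`, its columns, and the helper `inconsistent_groups` are all hypothetical and are not taken from the paper. An attribute subset preserves the classification if no group of rows that agree on the subset disagrees on the decision.

```python
import sqlite3

def inconsistent_groups(conn, table, attrs, decision):
    """Count groups of rows that agree on `attrs` but disagree on `decision`.

    Identifiers are interpolated directly, so they must come from trusted code.
    """
    cols = ", ".join(attrs)
    sql = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(DISTINCT {decision}) > 1)"
    )
    return conn.execute(sql).fetchone()[0]

# Hypothetical toy decision table (not data from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, windy INTEGER, humid TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?)",
    [
        ("sunny", 0, "high", "no"),
        ("sunny", 1, "high", "no"),
        ("rainy", 0, "low", "yes"),
        ("rainy", 1, "low", "no"),
    ],
)

all_attrs = ["outlook", "windy", "humid"]
# A subset is a candidate reduct if it introduces no inconsistent group.
candidates = [
    subset
    for subset in (["outlook", "windy"], ["windy", "humid"], ["outlook"])
    if inconsistent_groups(conn, "weather", subset, "play") == 0
]
```

In this toy table `humid` duplicates the information in `outlook`, so either can be dropped, while `outlook` alone cannot separate the two rainy rows.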

Relevance:

100.00%

Publisher:

Abstract:

This paper highlights the prediction of learning disabilities (LD) in school-age children using rough set theory (RST), with an emphasis on the application of data mining. In rough sets, data analysis starts from a data table called an information system, which contains data about objects of interest characterized in terms of attributes. Here, these attributes are the properties of learning disabilities. By finding the relationships between these attributes, the redundant attributes can be eliminated and the core attributes determined. Rule mining is also performed in rough sets using the LEM1 algorithm. The prediction of LD is carried out accurately using Rosetta, the rough set toolkit for data analysis. The result obtained from this study is compared with the output of a similar study that we conducted using a Support Vector Machine (SVM) with the Sequential Minimal Optimisation (SMO) algorithm. We find that, using the concepts of reduct and global covering, learning disabilities in children can be predicted easily.
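To make the notions of information system, redundant attribute, and core concrete, here is a minimal sketch using the standard positive-region definition of the core. It is not the study's method or the Rosetta toolkit's API; the function names and the toy table (including its attribute names) are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over `attrs`."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def positive_region(rows, attrs, decision):
    """Rows whose indiscernibility class maps to a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos |= block
    return pos

def core(rows, attrs, decision):
    """Attributes whose removal shrinks the positive region."""
    full = positive_region(rows, attrs, decision)
    return [
        a for a in attrs
        if positive_region(rows, [b for b in attrs if b != a], decision) != full
    ]

# Hypothetical toy information system; the attributes are illustrative only.
table = [
    {"reading": "poor", "attention": "low",  "memory": "ok",   "ld": "yes"},
    {"reading": "poor", "attention": "high", "memory": "ok",   "ld": "yes"},
    {"reading": "good", "attention": "low",  "memory": "ok",   "ld": "no"},
    {"reading": "good", "attention": "high", "memory": "weak", "ld": "no"},
]
conditions = ["reading", "attention", "memory"]
core_attrs = core(table, conditions, "ld")
```

In this toy table only `reading` is indispensable: dropping it makes rows 0 and 2 indiscernible while their decisions differ, whereas `attention` and `memory` can each be removed without losing any classified row.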

Relevance:

100.00%

Publisher:

Abstract:

This paper proposes an optimal strategy for extracting probabilistic rules from databases. Two statistical measures from inductive learning, accuracy and coverage, are introduced together with their rough set-based definitions. The simplicity of a rule, which is emphasized in this paper, has previously been ignored in the discovery of probabilistic rules. To avoid the high computational complexity of the rough set approach, some rough set terminology, rather than the approach itself, is applied to represent the probabilistic rules. A genetic algorithm is exploited to find the optimal probabilistic rules, those with the highest accuracy and coverage and the shortest length. Some heuristic genetic operators are also used to make the global search and the evolution of rules more efficient. Experimental results show that the method runs more efficiently than traditional classification methods and generates probabilistic classification rules of the same integrity.
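The abstract does not state the formulas. As a sketch, the rough set-based definitions of accuracy and coverage commonly used in the literature are given below, where the symbols are assumptions introduced here: $R$ is the rule's condition, $[x]_R$ is the set of objects satisfying $R$, and $D$ is the set of objects belonging to the target class.

```latex
\alpha_R(D) = \frac{\lvert [x]_R \cap D \rvert}{\lvert [x]_R \rvert},
\qquad
\kappa_R(D) = \frac{\lvert [x]_R \cap D \rvert}{\lvert D \rvert}
```

Accuracy $\alpha_R(D)$ measures how often the rule is right when it fires, and coverage $\kappa_R(D)$ measures how much of the class the rule accounts for. For example, if 8 objects satisfy $R$, 6 of them are in $D$, and $D$ contains 10 objects, then $\alpha_R(D) = 6/8 = 0.75$ and $\kappa_R(D) = 6/10 = 0.6$.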

Relevance:

100.00%

Publisher:

Abstract:

Selecting a set of features that is optimal for a given task is a problem that plays an important role in a wide variety of contexts, including pattern recognition, image understanding and machine learning. The concept of decision table reduction based on rough sets is very useful for feature selection. In this paper, a genetic algorithm-based approach is presented to search for the relative reduct of the rough set decision table. This approach can accommodate multiple criteria, such as accuracy and cost of classification, in the feature selection process, and finds an effective feature subset for texture classification. On the basis of the selected feature subset, this paper presents a method to extract objects that are higher than their surroundings, such as trees or forest, in color aerial images. The experimental results show that the selected feature subset and the object extraction method presented in this paper are practical and effective.

Relevance:

100.00%

Publisher:

Abstract:

The rough set is a new mathematical approach to imprecision, vagueness and uncertainty. The concept of decision table reduction based on rough sets is very useful for feature selection. The paper describes an application of the rough set method to feature selection and reduction in texture image recognition. The methods applied include continuous data discretization based on fuzzy c-means, and a rough set method for feature selection and reduction. The methods were applied to the extraction of trees in aerial images. The experiments show that the methods presented in this paper are practical and effective.

Relevance:

100.00%

Publisher:

Abstract:

This project aims to develop methods for data classification in a Data Warehouse for decision-making purposes. A further goal is to reduce an attribute set in a Data Warehouse such that the reduced set keeps the same properties as the original one. With a reduced set, the computational cost of processing is lower, attributes that are not relevant to certain kinds of situations can be identified, and patterns in the database that help in making decisions can be recognized. To achieve these objectives, we will implement the Rough Sets algorithm. We chose PostgreSQL as our database management system because it is efficient, well established, and open source (freely distributed).

Relevance:

100.00%

Publisher:

Abstract:

Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct and an incorrect decision. This task is complicated by the large sizes of typical databases. Due to their importance in search processes in large volumes of data, researchers pay special attention to the development of efficient outlier detection techniques. This article presents a computationally efficient algorithm for the detection of outliers in large volumes of information. This proposal is based on an extension of the mathematical framework upon which the basic theory of detection of outliers, founded on Rough Set Theory, has been constructed. From this starting point, current problems are analyzed; a detection method is proposed, along with a computational algorithm that allows the performance of outlier detection tasks with an almost-linear complexity. To illustrate its viability, the results of the application of the outlier-detection algorithm to the concrete example of a large database are presented.