823 results for Two-stage classification


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Classification problems typically pose two challenges: negative documents have diverse characteristics, and many negative documents can be close to positive documents. It is therefore hard for a single classifier to separate incoming documents cleanly into classes. This paper proposes a gradual problem-solving approach that builds a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It is recall-oriented: it concentrates on minimizing the number of false negative documents. We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented “fine tuning” that concentrates on minimizing the number of false positive documents by applying pattern (statistical phrase) mining techniques. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experiments show that our statistical-phrase-based two-stage classifier is promising.
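
The two stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Rocchio weights `alpha` and `beta`, the thresholds `t1` and `t2`, the representation of a pattern as a weighted set of terms, and all function names are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def unit_tf(tokens):
    """Term-frequency vector scaled to unit length."""
    tf = Counter(tokens)
    norm = sqrt(sum(c * c for c in tf.values()))
    return {t: c / norm for t, c in tf.items()} if norm else {}


def rocchio_prototype(pos_docs, neg_docs, alpha=1.0, beta=0.25):
    """Rocchio prototype: alpha * mean(positives) - beta * mean(negatives)."""
    proto = {}
    for docs, weight in ((pos_docs, alpha), (neg_docs, -beta)):
        for doc in docs:
            for term, value in unit_tf(doc).items():
                proto[term] = proto.get(term, 0.0) + weight * value / len(docs)
    return proto


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pattern_score(tokens, patterns):
    """Sum the weights of all patterns whose terms all occur in the document."""
    terms = set(tokens)
    return sum(w for p, w in patterns.items() if p <= terms)


def two_stage_classify(tokens, proto, patterns, t1=0.05, t2=1.0):
    """Stage 1 (recall-oriented): drop documents far from the prototype.
    Stage 2 (precision-oriented): keep documents whose pattern score
    reaches the threshold t2."""
    if cosine(unit_tf(tokens), proto) < t1:
        return False  # treated as a reliable negative
    return pattern_score(tokens, patterns) >= t2


# Toy data, invented for the example.
positives = [["stock", "market", "rise"], ["market", "share", "price"]]
negatives = [["football", "match", "goal"], ["weather", "rain", "storm"]]
patterns = {frozenset({"stock", "market"}): 2.0,
            frozenset({"share", "price"}): 1.5}
proto = rocchio_prototype(positives, negatives)
```

The permissive threshold `t1` keeps stage 1 from discarding positives, while `t2` lets stage 2 reject negatives that merely share a term with the positive class.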

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information mismatch and overload are two fundamental issues influencing the effectiveness of information filtering systems. Even though both term-based and pattern-based approaches have been proposed to address the issues, neither of these approaches alone can provide a satisfactory decision for determining the relevant information. This paper presents a novel two-stage decision model for solving the issues. The first stage is a novel rough analysis model to address the overload problem. The second stage is a pattern taxonomy mining model to address the mismatch problem. The experimental results on RCV1 and TREC filtering topics show that the proposed model significantly outperforms the state-of-the-art filtering systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Topic modeling has been widely used in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This has two main shortcomings: first, popular or common words occur very often across different topics, which makes the topics ambiguous; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently, we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall prediction accuracies for the five types are 93.25% and 96.61% via the jackknife test and the independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a new hierarchical clustering algorithm for crop stage classification using hyperspectral satellite images. Among the many benefits and uses of remote sensing, one important application is solving the problem of crop stage classification. Modern commercial imaging satellites, owing to their large volume of satellite imagery, offer greater opportunities for automated image analysis. Hence, we propose an unsupervised algorithm, the Hierarchical Artificial Immune System (HAIS), which has two steps: splitting the cluster centers and merging them. The high dimensionality of the data is reduced with Principal Component Analysis (PCA). The classification results are compared with those of the K-means and Artificial Immune System algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm is accurate.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The large number of spectral bands in hyperspectral images increases the capability to distinguish between various physical structures. However, these images suffer from the high dimensionality of the data. Hence, hyperspectral images are processed in two stages: dimensionality reduction and unsupervised classification. The high dimensionality of the data is reduced with Principal Component Analysis (PCA). The selected dimensions are classified using the Niche Hierarchical Artificial Immune System (NHAIS). NHAIS combines a splitting method, which searches for the optimal cluster centers using a niching procedure, with a merging method, which groups the data points based on majority voting. Results are presented for two hyperspectral images, namely the EO-1 Hyperion image and the Indian Pines image. The performance of the proposed hierarchical clustering algorithm is compared with that of three earlier unsupervised algorithms. From the results obtained, we deduce that NHAIS is efficient.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper studies two models of two-stage processing with no-wait in process. The first model is the two-machine flow shop, and the other is the assembly model. For both models we consider the problem of minimizing the makespan, provided that the setup and removal times are separated from the processing times. Each of these scheduling problems is reduced to the Traveling Salesman Problem (TSP). We show that, in general, the assembly problem is NP-hard in the strong sense. On the other hand, the two-machine flow shop problem reduces to the Gilmore-Gomory TSP, and is solvable in polynomial time. The same holds for the assembly problem under some reasonable assumptions. Using these and existing results, we provide a complete complexity classification of the relevant two-stage no-wait scheduling models.
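
The reduction of the two-machine no-wait flow shop to a TSP can be illustrated with a small sketch. It makes simplifying assumptions that are not in the abstract: setup and removal times are ignored, each job is a pair `(a, b)` of processing times on machines 1 and 2, and the best sequence is found by brute force rather than by the polynomial-time Gilmore-Gomory procedure. The function names are hypothetical.

```python
from itertools import permutations


def start_gap(job_i, job_j):
    """Minimum delay between the starts of consecutive jobs i and j.

    Job j cannot start on machine 1 before job i leaves it (a_i), and it
    must reach machine 2 no earlier than job i finishes there, with no
    waiting in between. This gap is the TSP distance from i to j.
    """
    a_i, b_i = job_i
    a_j, _ = job_j
    return a_i + max(0, b_i - a_j)


def makespan(sequence, jobs):
    """Makespan of a no-wait schedule that follows the given job order."""
    gaps = sum(start_gap(jobs[i], jobs[j])
               for i, j in zip(sequence, sequence[1:]))
    a_last, b_last = jobs[sequence[-1]]
    return gaps + a_last + b_last


def tour_length(sequence, jobs):
    """Length of the TSP tour that starts and ends at a dummy job (0, 0)."""
    dummy = (0, 0)
    cycle = [dummy] + [jobs[k] for k in sequence] + [dummy]
    return sum(start_gap(x, y) for x, y in zip(cycle, cycle[1:]))


def best_sequence(jobs):
    """Exhaustive search; only practical for a handful of jobs."""
    return min(permutations(range(len(jobs))),
               key=lambda seq: makespan(seq, jobs))


jobs = [(3, 2), (1, 4), (2, 1)]  # toy (a, b) processing times
```

With the dummy job added, the tour length equals the makespan for every sequence, which is why minimizing the makespan is the same as solving the TSP on these distances.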

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model's generalization capability: at the lower level, a new elastic-net model identification algorithm using singular value decomposition is employed; at the upper level, two regularization parameters are optimized using a particle-swarm-optimization algorithm by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach for classifying noisy data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information overload and mismatch are two fundamental problems affecting the effectiveness of information filtering systems. Even though both term-based and pattern-based approaches have been proposed to address these problems, neither approach alone provides a satisfactory solution. This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter large volumes of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. The experimental results based on the RCV1 corpus show that the proposed two-stage filtering model significantly outperforms both term-based and pattern-based information filtering models.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter large volumes of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. Experiments were conducted to compare the proposed two-stage filtering (T-SM) model with other possible "term-based + pattern-based" or "term-based + term-based" IF models. The results based on the RCV1 corpus show that the T-SM model significantly outperforms other types of "two-stage" IF models.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper seeks to identify and quantify the sources of lagging productivity in Singapore’s retail sector, as reported in the Economic Strategies Committee 2010 report. A two-stage analysis is adopted. In the first stage, the Malmquist productivity index is employed, which provides measures of productivity change, technological change and efficiency change. In the second stage, technical efficiency estimates are regressed against explanatory variables using a truncated regression model. Technical efficiency was attributed to the quality of workers, while product assortment and competition had a negative impact on efficiency.
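
The first-stage decomposition can be illustrated with a short sketch. It assumes the commonly used output-oriented form of the Malmquist index, in which the index is the product of efficiency change and technological change; the abstract does not say which variant the paper uses. The four distance-function values are taken as given inputs (in practice they are usually estimated, for example by data envelopment analysis), and the function and argument names are hypothetical.

```python
from math import sqrt


def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Malmquist index and its two components.

    Argument d_S_P is the distance of the period-P observation measured
    against the period-S frontier, for example d_t_t1 = D^t(x^{t+1}, y^{t+1}).
    Returns (index, efficiency_change, technological_change).
    """
    # Catch-up effect: change in distance to the contemporaneous frontier.
    efficiency_change = d_t1_t1 / d_t_t
    # Frontier shift: geometric mean of the shift seen from both periods.
    technological_change = sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    index = efficiency_change * technological_change
    return index, efficiency_change, technological_change


# Invented distance values, for illustration only.
index, eff_change, tech_change = malmquist(0.8, 1.2, 0.6, 0.9)
```

Under this convention a value above 1 indicates improvement between the two periods. The second stage would then regress the efficiency estimates on explanatory variables, which is not shown here.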

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spectrum sensing is considered one of the most important tasks in cognitive radio. A common assumption among current spectrum sensing detectors is that the primary user is either fully present or completely absent within the sensing period. In reality, there are many situations where the primary user signal occupies only a portion of the observed signal, so the assumption about the primary user duty cycle is not necessarily fulfilled. In this paper we show that the true detection performance can degrade from the assumed achievable values when the observed primary user exhibits a certain duty cycle. Therefore, a two-stage detection method that incorporates the primary user duty cycle and enhances the detection performance is proposed. The proposed detector improves the probability of detection under low duty cycle at the expense of a small decrease in performance at high duty cycle.
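
The degradation caused by a partial duty cycle can be illustrated for a plain energy detector. This sketch does not reproduce the proposed two-stage detector, whose details the abstract does not give. It assumes complex Gaussian noise with unit power, a complex Gaussian primary signal, and a Gaussian (central-limit) approximation of the averaged energy; `snr` is a linear ratio, not dB, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def detection_probability(snr, duty_cycle, n_samples, p_fa):
    """Approximate detection probability of an energy detector when the
    primary user occupies only a fraction `duty_cycle` of the samples."""
    # Threshold set for the target false-alarm rate under noise only
    # (mean 1, variance 1 / n_samples for the averaged energy).
    threshold = 1.0 + _STD_NORMAL.inv_cdf(1.0 - p_fa) / sqrt(n_samples)
    # Occupied samples have power 1 + snr, the remaining ones power 1.
    mean = 1.0 + duty_cycle * snr
    variance = (duty_cycle * (1.0 + snr) ** 2 + (1.0 - duty_cycle)) / n_samples
    return 1.0 - _STD_NORMAL.cdf((threshold - mean) / sqrt(variance))


pd_full = detection_probability(snr=0.1, duty_cycle=1.0, n_samples=1000, p_fa=0.05)
pd_half = detection_probability(snr=0.1, duty_cycle=0.5, n_samples=1000, p_fa=0.05)
```

Under these assumptions `pd_half` is lower than `pd_full`, which matches the abstract's point that a detector designed for full occupancy loses performance when the duty cycle is lower.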