99 results for Classification Nomenclature


Relevance: 20.00%

Abstract:

This paper reviews the appropriateness, for application to large data sets, of standard machine learning algorithms that were mainly developed in the context of small data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm than optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective with an algorithm that places greater emphasis on bias management rather than on variance management.
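The decomposition the abstract refers to can be estimated empirically. Below is a minimal sketch in the style of Kohavi–Wolpert bias/variance estimation for 0/1 loss, computed from the predictions of models trained on repeated resamples of the training data; the helper name and procedure are illustrative, not the paper's exact method.

```python
from collections import Counter

def bias_variance_01(preds_per_model, y_true):
    """Estimate bias and variance of 0/1 loss from the predictions of several
    models, each trained on a different resample of the training data.

    preds_per_model[k][i] is model k's prediction for test point i.
    """
    n_models, n_points = len(preds_per_model), len(y_true)
    bias = variance = 0.0
    for i in range(n_points):
        votes = Counter(preds[i] for preds in preds_per_model)
        main_pred, main_count = votes.most_common(1)[0]
        bias += main_pred != y_true[i]          # main prediction wrong -> bias
        variance += 1 - main_count / n_models   # disagreement with main prediction
    return bias / n_points, variance / n_points
```

A learner that keeps the main prediction correct (low bias) can tolerate some disagreement across resamples, which is the trade-off the paper examines at scale.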

Relevance: 20.00%

Abstract:

This paper presents a novel ant-system-based optimisation method that integrates genetic algorithms and simplex algorithms. The method not only speeds up the search for solutions but also improves their quality. It is applied here to set up a learning model for the "tuned" mask used for texture classification. Experimental results on aerial images, together with comparisons against genetic algorithms and genetic simplex algorithms, illustrate the merit and feasibility of the proposed method.

Relevance: 20.00%

Abstract:

We present a novel ant colony algorithm integrating genetic algorithms and simplex algorithms. The method not only speeds up the search for optimal solutions but also improves their quality. It is applied to set up a learning model for the "tuned" mask used for texture classification. Experimental results on real-world images, together with comparisons against genetic algorithms and genetic simplex algorithms, illustrate the merit and feasibility of the proposed method.
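As a rough illustration of the ant-system component alone (the genetic and simplex parts of the hybrid are omitted), the sketch below optimises a sequence of discrete choices with pheromone-guided sampling; all names and parameter values are illustrative, not from the paper.

```python
import random

def aco_minimize(costs, n_ants=10, iters=50, rho=0.1, seed=1):
    """Minimal ant-system sketch over a sequence of discrete choices.

    costs[i][j] is the cost of picking option j at step i. Ants choose options
    with probability proportional to pheromone; the best path found so far is
    reinforced each iteration while all pheromone slowly evaporates.
    """
    rng = random.Random(seed)
    tau = [[1.0] * len(options) for options in costs]   # pheromone trails
    best_path, best_cost = None, float("inf")
    for _ in range(iters):
        for _ in range(n_ants):
            path = [rng.choices(range(len(options)), weights=tau[i])[0]
                    for i, options in enumerate(costs)]
            cost = sum(costs[i][j] for i, j in enumerate(path))
            if cost < best_cost:
                best_path, best_cost = path, cost
        tau = [[(1 - rho) * t for t in row] for row in tau]  # evaporation
        for i, j in enumerate(best_path):                    # reinforce best path
            tau[i][j] += 1.0 / (1.0 + best_cost)
    return best_path, best_cost
```

In the hybrid described above, genetic and simplex operators would further refine the candidate solutions between pheromone updates.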

Relevance: 20.00%

Abstract:

Rough set theory is a relatively new mathematical approach to imprecision, vagueness and uncertainty. The concept of reduction of a decision table based on rough sets is very useful for feature selection. This paper describes an application of rough set methods to feature selection and reduction in texture image recognition. The methods applied include continuous-data discretization based on fuzzy c-means, followed by the rough set method for feature selection and reduction. The methods were applied to tree extraction in aerial images. The experiments show that the methods presented in this paper are practical and effective.
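The reduct idea behind decision-table reduction can be illustrated with a small sketch: attributes are dropped greedily as long as the remaining ones still discern every decision class. This is a simplification for illustration (names are made up), not the paper's algorithm.

```python
def partition(rows, attrs):
    # Group row indices into indiscernibility classes: rows with identical
    # values on attrs cannot be told apart.
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(groups.values())

def is_consistent(rows, attrs, decisions):
    # attrs suffice if every indiscernibility class has a single decision value
    return all(len({decisions[i] for i in group}) == 1
               for group in partition(rows, attrs))

def greedy_reduct(rows, attrs, decisions):
    # Drop attributes one at a time while consistency is preserved
    # (a greedy sketch; it need not find a minimal reduct).
    reduct = list(attrs)
    for a in attrs:
        trial = [x for x in reduct if x != a]
        if trial and is_consistent(rows, trial, decisions):
            reduct = trial
    return reduct
```

For texture recognition, the rows would hold discretized texture features and the decisions the texture classes; the reduct keeps only the features needed to separate them.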

Relevance: 20.00%

Abstract:

An effective scheme for soccer video summarization is important for making use of this massively growing body of video data. This paper presents an extension of our recent work, which proposed a framework that integrates highlights into play-breaks to construct more complete soccer summaries. The current focus is to demonstrate the benefits of detecting specific audio-visual features during play-break sequences in order to classify the highlights contained within them. The main purpose is to generate summaries that are individually self-consumable. To support this framework, the algorithms for shot classification and for detection of near-goal and slow-motion replay scenes are described. The results of our experiment using five soccer videos (20 minutes each) show the performance and reliability of our framework.
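One common heuristic for the shot-classification step in soccer video analysis is the fraction of grass-coloured pixels in a frame: long shots show mostly pitch, close-ups mostly players. The sketch below assumes that heuristic; the thresholds are illustrative and not taken from the paper.

```python
def classify_shot(grass_ratio):
    """Classify a soccer frame by its fraction of grass-coloured pixels.

    grass_ratio is assumed to be precomputed from a colour segmentation
    of the frame; thresholds here are illustrative.
    """
    if grass_ratio > 0.6:
        return "long"       # wide view of the pitch
    if grass_ratio > 0.3:
        return "medium"     # mixed pitch and players
    return "close-up"       # mostly players, crowd or replay graphics
```

Sequences of such shot labels, combined with audio cues, are the kind of feature stream a play-break highlight classifier can operate on.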

Relevance: 20.00%

Abstract:

This paper presents a performance study of four statistical test algorithms used to identify smooth image blocks so that the reconstructed image in a video coding system can be filtered. The four algorithms considered are the Coefficient of Variation (CV), the Exponential Entropy of Pal and Pal (E), Shannon's (Logarithmic) Entropy (H), and Quadratic Entropy (Q). These statistics are used to distinguish between smooth and textured blocks in a reconstructed image. Linear filtering is then carried out on the smooth blocks to reduce the blocking artefact. The rationale for applying the filter to the smooth blocks only is that the blocking artefact is visually more prominent in the smooth regions of an image than in the textured regions.
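Two of the statistics named above are easy to sketch. Assuming the standard definitions (Shannon entropy over the block's pixel-value histogram, quadratic entropy as sum of p(1-p), and CV as standard deviation over mean), a smooth block has low diversity and thus low entropy; the threshold below is illustrative, not the paper's value.

```python
import math
from collections import Counter

def _probs(block):
    # histogram of pixel values -> probabilities
    n = len(block)
    return [c / n for c in Counter(block).values()]

def shannon_entropy(block):
    # H = -sum p * log2(p)
    return -sum(p * math.log2(p) for p in _probs(block))

def quadratic_entropy(block):
    # Q = sum p * (1 - p)
    return sum(p * (1 - p) for p in _probs(block))

def coefficient_of_variation(block):
    mean = sum(block) / len(block)
    sd = (sum((v - mean) ** 2 for v in block) / len(block)) ** 0.5
    return sd / mean if mean else 0.0

def is_smooth(block, h_thresh=1.0):
    # low pixel diversity -> low entropy -> smooth block (threshold illustrative)
    return shannon_entropy(block) < h_thresh
```

A deblocking filter would then be applied only to the blocks flagged smooth.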

Relevance: 20.00%

Abstract:

Data mining refers to extracting, or "mining", knowledge from large amounts of data. It is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques to gain insight into the relationships and patterns hidden in data. The availability of digital data within picture archiving and communication systems opens up possibilities for enhancing health care and research through computer-based manipulation, processing and handling of data. That is the basis for the development of computer-assisted radiology, whose further progress is associated with new intelligent capabilities such as multimedia support and data mining for discovering knowledge relevant to diagnosis. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we present our work on data mining in medical image archiving systems. We investigate the use of a very efficient data mining technique, the decision tree, to learn the knowledge needed for computer-assisted image analysis, and we apply the method to the classification of x-ray images for lung cancer diagnosis. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity together with high transparency and accuracy. The results show that the proposed algorithm is robust, accurate and fast, and that it produces a comprehensible structure summarizing the knowledge it induces.
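The split criterion at the heart of classical inductive decision-tree learners can be sketched as information gain (the ID3-style criterion); this is a generic illustration, not the paper's implementation, and the names are made up.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a class-label multiset: H = -sum p * log2(p)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    # Expected reduction in label entropy from splitting the data on attr;
    # the tree builder picks the attribute with the highest gain at each node.
    n = len(rows)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr], []).append(label)
    remainder = sum(len(ys) / n * entropy(ys) for ys in splits.values())
    return entropy(labels) - remainder
```

The resulting tree is what makes the induced knowledge comprehensible: each root-to-leaf path reads as an explicit diagnostic rule.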

Relevance: 20.00%

Abstract:

Communication via email is one of the most popular services on the Internet. Email has brought great convenience to our daily work and life, but unsolicited messages, or spam, flood our mailboxes, wasting bandwidth, time and money. To this end, this paper presents a rough set based model that classifies emails into three categories (spam, non-spam and suspicious) rather than the two classes (spam and non-spam) used in most current approaches. Compared with popular classification methods such as naive Bayes, the proposed model reduces the error rate at which non-spam messages are misclassified as spam.
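Rough sets partition objects into a positive region, a negative region, and a boundary region of undecidable cases, which is what motivates the third "suspicious" class. A minimal sketch of that three-way decision, using a hypothetical spam score and illustrative thresholds (not the paper's rough-set construction):

```python
def three_way_classify(spam_score, alpha=0.8, beta=0.3):
    # Positive region -> spam, negative region -> non-spam, and the
    # boundary region in between -> suspicious, deferred for review.
    # alpha and beta are illustrative thresholds on a [0, 1] spam score.
    if spam_score >= alpha:
        return "spam"
    if spam_score <= beta:
        return "non-spam"
    return "suspicious"
```

Routing boundary cases to "suspicious" instead of forcing a binary call is precisely how the model trades a little coverage for fewer legitimate messages lost to the spam folder.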

Relevance: 20.00%

Abstract:

Purpose - Research has so far not approached the contents of corporate codes of ethics from a strategic classification point of view. The objective of this paper is therefore to introduce and describe a framework of classification, with empirical illustration, that provides insight into the strategic approaches of corporate code of ethics content within and across contextual business environments.

Design/methodology/approach - The paper summarizes a content analysis of code prescription and the intensity of codification in the contents of 78 corporate codes of ethics in Australia.

Findings - The paper finds that, generally, the studied corporate codes of ethics in Australia follow standardized and replicated strategic approaches. Customized and individualized strategic approaches, in particular, are far from penetrating the ethos of corporate code of ethics content.

Research limitations/implications - The research is limited to Australian codes of ethics. Suggestions for further research are provided in terms of the search for best practice in customized and individualized corporate code of ethics content across countries.

Practical implications - The framework contributes to the identification of four strategic approaches to corporate code of ethics content, namely standardized, replicated, individualized and customized.

Originality/value - The principal contribution of this paper is a generic framework to identify strategic approaches of corporate codes of ethics content. The framework is derived from two generic dimensions: the context of application and the application of content. The timing of application is also a crucial generic dimension to the success or failure of codes of ethics content. Empirical illustrations based upon corporate codes of ethics in Australia's top companies underpin the topic explored.

Relevance: 20.00%

Abstract:

This paper contributes to a better understanding of the geophysical characteristics and benthic communities of the Hopkins site in Victoria, Australia. An automated decision tree classification system was used to classify substrata and dominant biota communities. The geophysical sampling and underwater video data collected in this study reveal a complex bathymetric and biological structure that complements the limited information on benthic marine ecosystems in the coastal waters of Victoria. Combining derivative products from the backscatter and bathymetry datasets was found to improve separability for broad biota and substrata categories over using either dataset alone.


Relevance: 20.00%

Abstract:

One of the key applications of microarray studies is to select and classify the gene expression profiles of cancer and normal subjects. In this study, two hybrid approaches, genetic algorithm with decision tree (GADT) and genetic algorithm with neural network (GANN), are used to select optimal gene sets that contribute to the highest classification accuracy. Two benchmark microarray datasets were tested, and the most significant disease-related genes were identified. Furthermore, the selected gene sets achieved comparably high sample classification accuracy (96.79% and 94.92% on the colon cancer dataset, 98.67% and 98.05% on the leukemia dataset) relative to those obtained by the mRMR algorithm. The results indicate that these two hybrid methods can select disease-related genes and improve classification accuracy.
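The genetic-algorithm half of such a hybrid can be sketched as evolving bitmasks over the genes: in GADT or GANN the fitness of a mask would be the cross-validated accuracy of a decision tree or neural network trained on the selected genes. The sketch below stubs that out with a caller-supplied fitness function; all names and parameter values are illustrative.

```python
import random

def ga_select(n_genes, fitness, pop_size=20, generations=30, seed=0):
    """Evolve gene-subset bitmasks toward higher fitness.

    fitness(mask) scores a subset; in GADT/GANN it would be classifier
    accuracy on the genes where mask[i] == 1.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # bit-flip mutation
                child[rng.randrange(n_genes)] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Because survivors are carried over unchanged, the best mask found never degrades across generations, which matters when each fitness evaluation (a full classifier training run) is expensive.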