87 results for Classification approach


Relevance:

30.00%

Publisher:

Abstract:

Clustering of multivariate data is a commonly used technique in ecology, and many approaches to clustering are available. The results from a clustering algorithm are uncertain, but few clustering approaches explicitly acknowledge this uncertainty. One exception is Bayesian mixture modelling, which treats all results probabilistically, and allows comparison of multiple plausible classifications of the same data set. We used this method, implemented in the AutoClass program, to classify catchments (watersheds) in the Murray Darling Basin (MDB), Australia, based on their physiographic characteristics (e.g. slope, rainfall, lithology). The most likely classification found nine classes of catchments. Members of each class were aggregated geographically within the MDB. Rainfall and slope were the two most important variables that defined classes. The second-most likely classification was very similar to the first, but had one fewer class. Increasing the nominal uncertainty of continuous data resulted in a most likely classification with five classes, which were again aggregated geographically. Membership probabilities suggested that a small number of cases could be members of either of two classes. Such cases were located on the edges of groups of catchments that belonged to one class, with a group belonging to the second-most likely class adjacent. A comparison of the Bayesian approach to a distance-based deterministic method showed that the Bayesian mixture model produced solutions that were more spatially cohesive and intuitively appealing. The probabilistic presentation of results from the Bayesian classification allows richer interpretation, including decisions on how to treat cases that are intermediate between two or more classes, and whether to consider more than one classification. The explicit consideration and presentation of uncertainty makes this approach useful for ecological investigations, where both data and expectations are often highly uncertain.
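The membership probabilities mentioned in this abstract can be illustrated with a minimal sketch. It is not the AutoClass implementation: it assumes a one-dimensional Gaussian mixture whose weights, means and standard deviations are invented for illustration, and it applies Bayes' rule to obtain the probability that a case belongs to each class.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, components):
    """Posterior class membership probabilities for one case.

    components is a list of (weight, mean, sd) tuples, one per class.
    These parameters are hypothetical; a real mixture model would
    estimate them from the data.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(joint)
    return [j / total for j in joint]


# Two illustrative classes with equal weight and equal spread.
classes = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

clear_case = membership_probabilities(0.0, classes)         # near class 0
intermediate_case = membership_probabilities(2.0, classes)  # halfway between
```

A case that sits halfway between two classes receives roughly equal probability for both, which is the kind of intermediate case the abstract describes at the edges of catchment groups.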

Relevance:

30.00%

Publisher:

Abstract:

Microarray data classification is one of the most important emerging clinical applications in the medical community. Machine learning algorithms are most frequently used to complete this task. We selected one of the state-of-the-art kernel-based algorithms, the support vector machine (SVM), to classify microarray data. As a large number of kernels are available, a significant research question is: what is the best kernel for patient diagnosis based on microarray data classification using SVM? We first suggest three solutions based on data visualization and quantitative measures. We then test the proposed solutions on different types of microarray problems. Finally, we found that the rule-based approach is most useful for automatic kernel selection for SVM to classify microarray data.

Relevance:

30.00%

Publisher:

Abstract:

A novel approach to Episodic Associative Memory (EAM), known as Episodic Associative Memory with a Neighborhood Effect (EAMwNE), is presented in this paper. It overcomes the representation limitations of existing episodic memory models and increases the potential for their use in practical applications.

Relevance:

30.00%

Publisher:

Abstract:

Appropriate choice of a kernel is the most important ingredient of kernel-based learning methods such as the support vector machine (SVM). Automatic kernel selection is a key issue given the number of kernels available and the current trial-and-error nature of selecting the best kernel for a given problem. This paper introduces a new method for automatic kernel selection, with empirical results based on classification. The empirical study has been conducted among five kernels with 112 different classification problems, using the popular kernel-based statistical learning algorithm SVM. We evaluate the kernels’ performance in terms of accuracy measures. We then focus on answering the question: which kernel is best suited to which type of classification problem? Our meta-learning methodology involves measuring the problem characteristics using classical, distance and distribution-based statistical information. We then combine these measures with the empirical results to present a rule-based method to select the most appropriate kernel for a classification problem. The rules are generated by the decision tree algorithm C5.0 and are evaluated with 10-fold cross-validation. All generated rules offer high accuracy ratings.
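The 10-fold cross-validation used to evaluate the rules can be sketched as an index split. This is a minimal illustration, not the authors' evaluation code: the function name `k_fold_indices` is hypothetical, and the sketch assigns contiguous folds without shuffling, whereas a real evaluation would usually shuffle or stratify the cases first.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n cases are split into k contiguous folds whose sizes differ
    by at most one. Each fold serves as the test set exactly once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Example: 25 cases split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Each rule set would be trained on the `train` indices and scored on the `test` indices, with the accuracy averaged over the ten folds.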

Relevance:

30.00%

Publisher:

Abstract:

Researchers worldwide have been actively seeking the most robust and powerful solutions to detect and classify key events (or highlights) in various sports domains. Most approaches have employed manual heuristics that model the typical pattern of audio-visual features within particular sport events. To avoid manual observation and knowledge, machine learning can be used as an alternative approach. To bridge the gaps between these two alternatives, an attempt is made to integrate statistics into heuristic models during highlight detection in our investigation. The models can be designed with a modest amount of domain knowledge, making them less subjective and more robust for different sports. We have also successfully used a universal scope of detection and a standard set of features that can be applied to different sports, including soccer, basketball and Australian football. An experiment on a large dataset of sport videos, with a total of around 15 hours, has demonstrated the effectiveness and robustness of our algorithms.

Relevance:

30.00%

Publisher:

Abstract:

A major challenge facing freshwater ecologists and managers is the development of models that link stream ecological condition to catchment scale effects, such as land use. Previous attempts to make such models have followed two general approaches. The bottom-up approach employs mechanistic models, which can quickly become too complex to be useful. The top-down approach employs empirical models derived from large data sets, and has often suffered from large amounts of unexplained variation in stream condition.

We believe that the lack of success of both modelling approaches may be at least partly explained by scientists considering too wide a breadth of catchment type. Thus, we believe that by stratifying large sets of catchments into groups of similar types prior to modelling, both types of models may be improved. This paper describes preliminary work using a Bayesian classification software package, ‘Autoclass’ (Cheeseman and Stutz 1996) to create classes of catchments within the Murray Darling Basin based on physiographic data.

Autoclass uses a model-based classification method that employs finite mixture modelling and trades off model fit versus complexity, leading to a parsimonious solution. The software provides information on the posterior probability that the classification is ‘correct’ and also probabilities for alternative classifications. The importance of each attribute in defining the individual classes is calculated and presented, assisting description of the classes. Each case is ‘assigned’ to a class based on membership probability, but the probability of membership of other classes is also provided. This feature deals very well with cases that do not fit neatly into a larger class. Lastly, Autoclass requires the user to specify the measurement error of continuous variables.

Catchments were derived from the Australian digital elevation model. Physiographic data were derived from national spatial data sets. There was very little information on measurement errors for the spatial data, and so a conservative error of 5% of data range was adopted for all continuous attributes. The incorporation of uncertainty into spatial data sets remains a research challenge.

The results of the classification were very encouraging. The software found nine classes of catchments in the Murray Darling Basin. The classes grouped together geographically, and followed altitude and latitude gradients, despite the fact that these variables were not included in the classification. Descriptions of the classes reveal very different physiographic environments, ranging from dry and flat catchments (i.e. lowlands), through to wet and hilly catchments (i.e. mountainous areas). Rainfall and slope were two important discriminators between classes. These two attributes, in particular, will affect the ways in which the stream interacts with the catchment, and can thus be expected to modify the effects of land use change on ecological condition. Thus, realistic models of the effects of land use change on streams would differ between the different types of catchments, and sound management practices will differ.

A small number of catchments were assigned to their primary class with relatively low probability. These catchments lie on the boundaries of groups of catchments, with the second most likely class being an adjacent group. The locations of these ‘uncertain’ catchments show that the Bayesian classification dealt well with cases that do not fit neatly into larger classes.

Although the results are intuitive, we cannot yet assess whether the classifications described in this paper would assist the modelling of catchment scale effects on stream ecological condition. It is most likely that catchment classification and modelling will be an iterative process, where the needs of the model are used to guide classification, and the results of classifications used to suggest further refinements to models.

Relevance:

30.00%

Publisher:

Abstract:

This research examined the corporate branding approaches and strategies adopted by six prominent Australian arts and cultural organisations. The aim of this exploration was to identify patterns in branding across different arts and cultural organisations, and attempt to provide an initial classification for understanding how these organisations approach branding strategy. We found that three factors influenced branding strategy in the surveyed organisations, viz., the focus of branding process, the degree of consistency in branding communication, and the required level of customers’ involvement in the branded products. The organisations studied were then plotted on a continuum that considered each of these factors.

Relevance:

30.00%

Publisher:

Abstract:

This paper introduces a new technique in the investigation of object classification and illustrates the potential use of this technique for the analysis of a range of biological data, using avian morphometric data as an example. The nascent variable precision rough sets (VPRS) model is introduced and compared with the decision tree method ID3 (through a ‘leave n out’ approach), using the same dataset of morphometric measures of European barn swallows (Hirundo rustica) and assessing the accuracy of gender classification based on these measures. The results demonstrate that the VPRS model, allied with the use of a modern method of discretization of data, is comparable with the more traditional non-parametric ID3 decision tree method. We show that, particularly in small samples, the VPRS model can improve classification and to a lesser extent prediction aspects over ID3. Furthermore, through the ‘leave n out’ approach, some indication can be produced of the relative importance of the different morphometric measures used in this problem. In this case we suggest that VPRS has advantages over ID3, as it intelligently uses more of the morphometric data available for the data classification, whilst placing less emphasis on variables with low reliability. In biological terms, the results suggest that the gender of swallows can be determined with reasonable accuracy from morphometric data and highlight the most important variables in this process. We suggest that both analysis techniques are potentially useful for the analysis of a range of different types of biological datasets, and that VPRS in particular has potential for application to a range of biological circumstances.

Relevance:

30.00%

Publisher:

Abstract:

Regardless of the technical procedure used in signalling corporate collapse, the bottom line rests on the predictive power of the corresponding statistical model. In that regard, it is imperative to empirically test the model using a data sample of both collapsed and non-collapsed companies. A superior model is one that successfully classifies collapsed and non-collapsed companies in their respective categories with a high degree of accuracy. Empirical studies of this nature have thus far done one of two things. (1) Some have classified companies based on a specific statistical modelling process. (2) Some have classified companies based on two (sometimes, but rarely, more than two) independent statistical modelling processes for the purposes of comparing one with the other. In the latter case, the mindset of the researchers has invariably been to pit one procedure against the other. This paper raises the question: why pit one statistical process against another; why not make the two procedures work together? As such, this paper puts forward an innovative dual-classification scheme for signalling corporate collapse: dual in the sense that it relies on two statistical procedures concurrently. Using a data sample of Australian publicly listed companies, the proposed scheme is tested against the traditional approach taken thus far in the pertinent literature. The results demonstrate that the proposed dual-classification scheme signals collapse with a higher degree of accuracy.

Relevance:

30.00%

Publisher:

Abstract:

Automated classification of lung nodules is challenging because of the variation in shape and size of lung nodules, as well as the associated differences in their images. Ensemble-based learners have demonstrated the potential for good performance. Random forests are employed for pulmonary nodule classification, where each tree in the forest produces a classification decision and an integrated output is calculated. A classification-aided-by-clustering approach is proposed to improve the lung nodule classification performance. Three experiments are performed using the LIDC lung image database of 32 cases. The classification performance and execution times are presented and discussed.

Relevance:

30.00%

Publisher:

Abstract:

It has been an important and challenging task to classify and evaluate the contents in wool blends. Quantitative characterisation of animal fibre scale patterns has attracted considerable attention, since it is the major evidence for identification and subsequent classification purposes. Although techniques such as image processing and linear demarcation functions have been used to identify unknown fibre types with some success, a more comprehensive approach is required to perform this task. In this paper, a new approach is presented, which employs non-linear demarcation functions by using an artificial neural network (ANN). Based on scale pattern features extracted using image processing techniques, the ANN model is used to classify mohair and merino fibres. It is observed that the techniques developed in this work are very effective and have the potential to be applied to other animal fibres.

Relevance:

30.00%

Publisher:

Abstract:

In the last decade, Internet email has become one of the primary methods of communication for the exchange of ideas and information. However, in recent years, along with the rapid growth of the Internet and email, there has been a dramatic growth in spam. Classification algorithms have been successfully used to filter spam, but with a certain amount of false positive trade-offs. This problem is mainly caused by the dynamic nature of spam content, spam delivery strategies, as well as the diversification of the classification algorithms. This paper presents an approach to email classification that overcomes the burden of the analyzing technique of the GL (grey list) analyser, as a further refinement of our previous multi-classifier based email classification [10]. In this approach, we introduce a “majority voting grey list (MVGL)” analyzing technique with two different variations, which will analyze only the product of GL emails. Our empirical evidence demonstrates the improvements of this approach, in terms of complexity and cost, compared to the existing GL analyser. This approach also overcomes the limitation of human interaction in the existing analyzing technique.
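The abstract does not specify how the MVGL technique or its two variations work, so the following is only a generic sketch of majority voting over the labels produced by several classifiers for one email. The function name `majority_vote` and the decision to return `None` on a tie are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most classifiers, or None on a tie.

    labels is a non-empty list with one predicted label per classifier.
    Returning None for ties is an illustrative choice; the paper's own
    tie handling is not described in the abstract.
    """
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return None  # no strict majority among the top labels
    return top_label


# Hypothetical outputs of three classifiers for one grey-list email.
decision = majority_vote(["spam", "ham", "spam"])
undecided = majority_vote(["spam", "ham"])
```

An email with an undecided vote could be left in the grey list for further analysis, which is one possible reading of how voting reduces the need for human interaction.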

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a novel multi-label classification framework for domains with large numbers of labels. Automatic image annotation is such a domain, as the available semantic concepts typically number in the hundreds. The proposed framework comprises an initial clustering phase that breaks the original training set into several disjoint clusters of data. It then trains a multi-label classifier from the data of each cluster. Given a new test instance, the framework first finds the nearest cluster and then applies the corresponding model. Empirical results using two clustering algorithms, four multi-label classification algorithms and three image annotation data sets suggest that the proposed approach can improve the performance and reduce the training time of standard multi-label classification algorithms, particularly in the case of a large number of labels.
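The prediction step described in this abstract (find the nearest cluster, then apply that cluster's model) can be sketched as follows. This is a minimal illustration under stated assumptions: clusters are represented by centroids, distance is squared Euclidean, and the per-cluster models are stand-in functions that return a set of labels. None of these choices is taken from the paper.

```python
def nearest_cluster(x, centroids):
    """Return the index of the centroid closest to x (squared Euclidean)."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i]))


def predict_labels(x, centroids, models):
    """Dispatch x to the multi-label model of its nearest cluster."""
    return models[nearest_cluster(x, centroids)](x)


# Hypothetical centroids and per-cluster models for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
models = [
    lambda x: {"sky"},            # stand-in for the model trained on cluster 0
    lambda x: {"grass", "tree"},  # stand-in for the model trained on cluster 1
]

labels_a = predict_labels((1.0, 2.0), centroids, models)
labels_b = predict_labels((9.0, 8.0), centroids, models)
```

Because each model is trained only on its own cluster, it sees fewer instances and typically fewer labels, which is consistent with the reduced training time the abstract reports.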

Relevance:

30.00%

Publisher:

Abstract:

Feature selection is an important technique in dealing with application problems with a large number of variables and limited training samples, such as image processing, combinatorial chemistry, and microarray analysis. Commonly employed feature selection strategies can be divided into filter and wrapper. In this study, we propose an embedded two-layer feature selection approach that combines the advantages of filter and wrapper algorithms while avoiding their drawbacks. The hybrid algorithm, called GAEF (Genetic Algorithm with embedded filter), divides the feature selection process into two stages. In the first stage, a Genetic Algorithm (GA) is employed to pre-select features, while in the second stage a filter selector is used to further identify a small feature subset for accurate sample classification. Three benchmark microarray datasets are used to evaluate the proposed algorithm. The experimental results suggest that this embedded two-layer feature selection strategy is able to improve the stability of the selection results as well as the sample classification accuracy.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a novel vehicle detection and classification method that measures and processes the magnetic signal from a single micro-electro-mechanical system (MEMS) magnetic sensor. When a vehicle moves over the ground, it generates a succession of impacts on the earth's magnetic field, which can be detected by a single magnetic sensor. The magnetic signal measured by the magnetic sensor is related to the moving direction and the type of the vehicle. Generally, the recognition rate using a single-sensor detector is not high. In order to improve the recognition rate, a novel feature extraction algorithm and a novel vehicle classification and recognition algorithm are presented. The concavity and convexity areas, and the angles of concave and convex parts of the waveform, are extracted. An improved support vector machine (ISVM) classifier is developed to perform vehicle classification and recognition. The effectiveness of the proposed approach is verified by outdoor experiments.