950 results for class imbalance problems


Relevance:

100.00%

Publisher:

Abstract:

A numerical procedure, based on parametric differentiation and an implicit finite difference scheme, has been developed for a class of problems in boundary-layer theory for saddle-point regions. Here, results are presented for the case of three-dimensional stagnation-point flow with massive blowing. The method compares very well with other methods for particular cases (zero or small mass blowing). The results emphasize that the present numerical procedure is well suited to the solution of saddle-point flows with massive blowing, which could not be solved by other methods.

Relevance:

100.00%

Publisher:

Abstract:

In recent years, the performance of semi-supervised learning has been investigated theoretically. However, most of this theoretical development has focused on binary classification problems. In this paper, we take a step further by extending the work of Castelli and Cover [1][2] to the multi-class paradigm. In particular, we consider the key problem in semi-supervised learning of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even under the assumption of identifiability of the mixture and the availability of infinitely many unlabelled examples, labelled records are still needed to determine the K decision regions. Therefore, in this paper, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm which is a generalisation of the optimal procedure proposed in the literature for binary problems. Finally, we make use of this generalisation to study the probability of error when the binary class constraint is relaxed.
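One back-of-the-envelope way to see why the labelled-sample requirement grows with K: even if the unlabelled data perfectly reveal the K decision regions, at least one labelled example per region is needed just to name them, and under uniform class priors the expected number of random labelled draws until every class has been seen is the coupon-collector quantity K·H_K. This is only an illustrative calculation, not the bound derived in the paper:

```python
def expected_draws_to_see_all_classes(K):
    """Coupon-collector expectation: mean number of labelled examples,
    drawn uniformly over K equiprobable classes, until every class has
    been observed at least once, i.e. K * (1 + 1/2 + ... + 1/K)."""
    harmonic = sum(1.0 / i for i in range(1, K + 1))
    return K * harmonic

# With K = 5 classes, roughly 11.4 labelled examples are needed on average.
e5 = expected_draws_to_see_all_classes(5)
```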

Relevance:

100.00%

Publisher:

Abstract:

This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) with a particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, SMOTE is applied to generate synthetic instances for the positive class and thereby balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying an orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined using a PSO algorithm that minimises the leave-one-out misclassification rate. Experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets demonstrate the effectiveness of the proposed algorithm.
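The over-sampling step above can be illustrated in a few lines of plain Python: the core of SMOTE interpolates between a minority example and one of its k nearest minority-class neighbours. This is a simplified sketch, not the implementation evaluated in the abstract, and the function and variable names are my own:

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority-class points by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (squared Euclidean)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sample(minority)
```

Because each synthetic point is a convex combination of two minority examples, it always lies inside the bounding box of the minority class.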

Relevance:

100.00%

Publisher:

Abstract:

The problem of learning from imbalanced data is of critical importance in a large number of application domains and can be a bottleneck for many conventional learning methods that assume a balanced data distribution. The class imbalance problem corresponds to the situation where one class massively outnumbers the other. If imbalanced data is used directly, the imbalance between the majority and minority classes biases the learned model and produces unreliable outcomes. There has been increasing interest in this research area, and a number of algorithms have been developed. However, independent evaluation of these algorithms is limited. This paper evaluates the performance of five representative data sampling methods that deal with class imbalance problems, namely SMOTE, ADASYN, BorderlineSMOTE, SMOTETomek and RUSBoost. A comparative study is conducted and the performance of each method is critically analysed in terms of assessment metrics. © 2013 Springer-Verlag.
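Implementations of all five methods are available in the Python `imbalanced-learn` library. To make the simplest family concrete, here is the random under-sampling step (the "RUS" in RUSBoost) in isolation, as a self-contained sketch with names of my own choosing rather than any of the compared implementations:

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Randomly discard majority-class examples until both classes are
    the same size as the minority class (random under-sampling, 'RUS')."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_label, n_min = min(counts.items(), key=lambda kv: kv[1])
    kept = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        if label != minority_label:
            idx = rng.sample(idx, n_min)  # down-sample to minority size
        kept.extend(idx)
    return [X[i] for i in kept], [y[i] for i in kept]

X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2          # 8:2 imbalance
Xb, yb = random_undersample(X, y)
```

After resampling, both classes contribute the same number of examples, at the cost of discarding most of the majority class; the over-sampling methods in the comparison avoid that loss by synthesising minority examples instead.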

Relevance:

100.00%

Publisher:

Abstract:

Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications that let people easily access and manage multimedia data in various areas. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and to incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis.
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.

Relevance:

100.00%

Publisher:

Abstract:

A large class of scattering problems of surface water waves by vertical barriers leads to mixed boundary value problems for the Laplace equation. Specific attention is paid, in the present article, to an analytical method for handling this class of surface water wave scattering problems when the barriers in question are non-reflecting in nature. A new set of boundary conditions is proposed for such non-reflecting barriers, and the resulting boundary value problems are handled in the linearized theory of water waves. Three basic problems of scattering by vertical barriers are solved. The present theory of non-reflecting vertical barriers predicts new transmission coefficients, and the solutions of the mathematical problems turn out to be extremely simple and straightforward compared with the solutions for other types of barriers handled previously.

Relevance:

90.00%

Publisher:

Abstract:

One of our most pressing needs in creating a more sustainable world is the explicit development of holistic policy. This is becoming increasingly apparent as we are faced with more and more ‘wicked problems’, the most difficult class of problems that we can conceptualize. Such problems consist of ‘clusters’ of problems and include socio-political and moral-spiritual issues. This paper articulates a methodology that can be applied to the analysis and design of underlying organizational structures and processes that will consistently and effectively address wicked problems while remaining consistent with the advocated ‘learning by doing’ approach to change management and policy making. This transdisciplinary methodology, known as the institutionalist policymaking framework, has been developed from the perspective of institutional economics synthesized with perspectives from ecological economics and system dynamics. In particular, it draws on the work first presented in Hayden’s 1993 paper ‘Institutionalist Policymaking’, further developed in his 2006 book, at the heart of which lies the social fabric matrix (SFM), and on the applicability of this approach in tackling complex and wicked problems.

Relevance:

90.00%

Publisher:

Abstract:

A systematic approach is developed for the scaling analysis of the momentum, heat and species conservation equations pertaining to the solidification of a binary mixture. The problem formulation and the description of boundary conditions are kept fairly general, so that a large class of problems can be addressed. Analysis of the momentum equations, coupled with phase change considerations, leads to the establishment of an advection velocity scale. Analysis of the energy equation leads to an estimate of the solid layer thickness. Different regimes corresponding to different dominant modes of transport are identified simultaneously. A comparative study involving several cases of possible thermal boundary conditions is also performed. Finally, a scaling analysis of the species conservation equation is carried out, revealing the effect of a non-equilibrium solidification model on solute segregation and species distribution. It is shown that non-equilibrium effects result in enhanced macrosegregation compared with the equilibrium model. To assess the scaling analysis, the predictions are validated against corresponding computational results.

Relevance:

90.00%

Publisher:

Abstract:

This document describes an update of the implementation of the J48Consolidated class within the WEKA platform. The J48Consolidated class implements the CTC algorithm [2][3], which builds a single decision tree based on a set of samples. The J48Consolidated class extends WEKA’s J48 class, which implements the well-known C4.5 algorithm. The original implementation was described in the technical report "J48Consolidated: An implementation of CTC algorithm for WEKA". The main, but not only, change in this update is the integration of the notion of coverage in order to determine the number of samples to be generated to build a consolidated tree. We define coverage as the percentage of examples of the training sample present in, or covered by, the set of generated subsamples. Thus, depending on the type of samples we use, we will need more or fewer samples in order to achieve a specific value of coverage.
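The coverage measure defined above is straightforward to compute: it is the fraction of distinct training examples that appear in at least one subsample. A minimal sketch (a Python illustration with names of my own; the actual class is Java code inside WEKA):

```python
def coverage(n_train, subsamples):
    """Percentage of the training sample (indices 0..n_train-1) that
    appears in at least one of the generated subsamples."""
    covered = set()
    for sample in subsamples:
        covered.update(sample)
    return 100.0 * len(covered) / n_train

# Toy example: 10 training examples, three subsamples given as index lists.
subs = [[0, 1, 2, 3], [2, 3, 4, 5], [5, 6, 7, 2]]
cov = coverage(10, subs)  # indices 0-7 are covered, so coverage is 80.0
```

Generating subsamples until this value reaches a target (say 99%) is one way to decide how many samples a consolidated tree should be built from.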

Relevance:

90.00%

Publisher:

Abstract:

A two-stage linear-in-the-parameter model construction algorithm is proposed, aimed at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing the model's generalization capability: a new elastic-net model identification algorithm using singular value decomposition is employed at the lower level, and two regularization parameters are then optimized at the upper level using a particle swarm optimization algorithm that minimizes the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be computed analytically without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach for classifying noisy data.
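To make the quantity being minimized concrete, here is a naive leave-one-out misclassification rate for a simple 1-nearest-neighbour rule, computed by the brute-force hold-one-out loop. This sketch (names my own, classifier deliberately simple) shows exactly the splitting that the abstract's analytic formulation avoids:

```python
def loo_error_1nn(X, y):
    """Naive LOO misclassification rate: classify each point by the label
    of its nearest *other* point and count disagreements with its label."""
    errors = 0
    for i, xi in enumerate(X):
        # nearest neighbour among all other points (squared Euclidean)
        j = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, X[k])),
        )
        errors += y[j] != y[i]
    return errors / len(X)

# Two well-separated clusters: every held-out point is classified correctly.
X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
y = [0, 0, 0, 1, 1, 1]
err = loo_error_1nn(X, y)
```

The naive loop refits (here, re-searches) once per example; the paper's point is that for its orthogonal linear-in-the-parameter models the same rate falls out analytically at negligible cost.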

Relevance:

90.00%

Publisher:

Abstract:

Logistics involves planning, managing, and organizing the flows of goods from the point of origin to the point of destination in order to meet given requirements. Logistics and transportation aspects are very important and represent a significant cost for producing and shipping companies, but also for public administration and private citizens. The optimization of resources and the improvement of the organization of operations are crucial for all branches of logistics, from operations management to transportation. As we will see in this work, optimization techniques, models, and algorithms are important methods for solving the ever newer and more complex problems arising in different segments of logistics. Many operations management and transportation problems belong to the optimization class of problems called Vehicle Routing Problems (VRPs). In this work, we consider several real-world deterministic and stochastic problems that fall within the wide class of VRPs, and we solve them by means of exact and heuristic methods. We treat three classes of real-world routing and logistics problems. First, we deal with one of the most important tactical problems arising in the management of bike sharing systems, the Bike sharing Rebalancing Problem (BRP). Second, we propose models and algorithms for real-world earthwork optimization problems. Third, we describe the 3D printing (3DP) process and highlight several optimization issues in 3DP. Among those, we define the problem related to tool path definition in the 3DP process, the 3D Routing Problem (3DRP), which is a generalization of the arc routing problem. We present an ILP model and several heuristic algorithms to solve the 3DRP.
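As a minimal illustration of the heuristic side of routing problems, here is the classic nearest-neighbour construction for a single vehicle route, one of the simplest building blocks used in VRP heuristics. It is a generic sketch, not one of the algorithms developed in this work:

```python
import math

def nearest_neighbour_route(depot, customers):
    """Build a single-vehicle route greedily: from the current stop,
    always visit the closest unvisited customer, then return to the depot."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    route, current = [depot], depot
    remaining = list(customers)
    while remaining:
        nxt = min(remaining, key=lambda c: dist(current, c))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    route.append(depot)  # close the tour
    length = sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))
    return route, length

depot = (0.0, 0.0)
customers = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
route, length = nearest_neighbour_route(depot, customers)
```

Construction heuristics like this give a feasible route quickly; exact methods (e.g. ILP models such as the one presented for the 3DRP) or improvement heuristics are then needed to close the gap to optimality.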

Relevance:

90.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

90.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

80.00%

Publisher:

Abstract:

This thesis articulates a methodology that can be applied to the analysis and design of underlying organisational structures and processes that will consistently and effectively address ‘wicked problems’ in forestry. Wicked problems are the most difficult class of problems that we can conceptualise: they consist of ‘clusters’ of problems that cannot be solved in isolation from one another and that include socio-political and moral-spiritual issues (Rittel and Webber 1973). This transdisciplinary methodology has been developed from the perspective of institutional economics synthesised with perspectives from ecological economics and system dynamics. The institutionalist policymaking framework provides an approach for the explicit development of holistic policy. An illustrative application of this framework to the wicked problem of forestry in southern Tasmania serves as an example of the applicability of the approach in the Australian context. To date, all attempts to seek solutions to that prevailing wicked problem set have relied on non-reflexive, partial and highly reductionist thinking. A formal assessment of the governance and process arrangements applying to that particular forestry industry has been undertaken using the social fabric matrix. This methodology lies at the heart of the institutionalist policymaking framework and allows for the systematic exploration of elaborately complex causal links and relationships, such as those present in southern Tasmania. Some possible attributes of an alternative approach to forest management, one that sustains the ecological, social and economic values of forests, have been articulated as indicative of the alternative policy and management outcomes that real-world application of this transdisciplinary, discursive and reflexive framework may crystallise. Substantive and lasting solutions to wicked problems need to be formed endogenously, that is, from within the system.
The institutionalist policymaking framework is a vehicle through which this endogenous creation of solutions to wicked problems may be realised.