845 resultados para Sign Data LMS algorithm.
Resumo:
This paper considers the problem of concept generalization in decision-making systems where such features of real-world databases as large size, incompleteness and inconsistence of the stored information are taken into account. The methods of the rough set theory (like lower and upper approximations, positive regions and reducts) are used for the solving of this problem. The new discretization algorithm of the continuous attributes is proposed. It essentially increases an overall performance of generalization algorithms and can be applied to processing of real value attributes in large data tables. Also the search algorithm of the significant attributes combined with a stage of discretization is developed. It allows avoiding splitting of continuous domains of insignificant attributes into intervals.
Resumo:
Data processing services for Meteosat geostationary satellite are presented. Implemented services correspond to the different levels of remote-sensing data processing, including noise reduction at preprocessing level, cloud mask extraction at low-level and fractal dimension estimation at high-level. Cloud mask obtained as a result of Markovian segmentation of infrared data. To overcome high computation complexity of Markovian segmentation parallel algorithm is developed. Fractal dimension of Meteosat data estimated using fractional Brownian motion models.
Resumo:
The problem of finding the optimal join ordering executing a query to a relational database management system is a combinatorial optimization problem, which makes deterministic exhaustive solution search unacceptable for queries with a great number of joined relations. In this work an adaptive genetic algorithm with dynamic population size is proposed for optimizing large join queries. The performance of the algorithm is compared with that of several classical non-deterministic optimization algorithms. Experiments have been performed optimizing several random queries against a randomly generated data dictionary. The proposed adaptive genetic algorithm with probabilistic selection operator outperforms in a number of test runs the canonical genetic algorithm with Elitist selection as well as two common random search strategies and proves to be a viable alternative to existing non-deterministic optimization approaches.
Resumo:
Constant increase of human population result in more and more people living in emergency dangerous regions. In order to protect them from possible emergencies we need effective solution for decision taking in case of emergencies, because lack of time for taking decision and possible lack of data. One among possible methods of taking such decisions is shown in this article.
Resumo:
A major drawback of artificial neural networks is their black-box character. Therefore, the rule extraction algorithm is becoming more and more important in explaining the extracted rules from the neural networks. In this paper, we use a method that can be used for symbolic knowledge extraction from neural networks, once they have been trained with desired function. The basis of this method is the weights of the neural network trained. This method allows knowledge extraction from neural networks with continuous inputs and output as well as rule extraction. An example of the application is showed. This example is based on the extraction of average load demand of a power plant.
Resumo:
In the paper learning algorithm for adjusting weight coefficients of the Cascade Neo-Fuzzy Neural Network (CNFNN) in sequential mode is introduced. Concerned architecture has the similar structure with the Cascade-Correlation Learning Architecture proposed by S.E. Fahlman and C. Lebiere, but differs from it in type of artificial neurons. CNFNN consists of neo-fuzzy neurons, which can be adjusted using high-speed linear learning procedures. Proposed CNFNN is characterized by high learning rate, low size of learning sample and its operations can be described by fuzzy linguistic “if-then” rules providing “transparency” of received results, as compared with conventional neural networks. Using of online learning algorithm allows to process input data sequentially in real time mode.
Resumo:
AMS Subj. Classification: 62P10, 62H30, 68T01
Resumo:
Report published in the Proceedings of the National Conference on "Education in the Information Society", Plovdiv, May, 2013
Resumo:
2000 Mathematics Subject Classification: 91E45.
Resumo:
The correlated probit model is frequently used for multiple ordered data since it allows to incorporate seamlessly different correlation structures. The estimation of the probit model parameters based on direct maximization of the limited information maximum likelihood is a numerically intensive procedure. We propose an extension of the EM algorithm for obtaining maximum likelihood estimates for a correlated probit model for multiple ordinal outcomes. The algorithm is implemented in the free software environment for statistical computing and graphics R. We present two simulation studies to examine the performance of the developed algorithm. We apply the model to data on 121 women with cervical or endometrial cancer. Patients developed normal tissue reactions as a result of post-operative external beam pelvic radiotherapy. In this work we focused on modeling the effects of a genetic factor on early skin and early urogenital tissue reactions and on assessing the strength of association between the two types of reactions. We established that there was an association between skin reactions and polymorphism XRCC3 codon 241 (C>T) (rs861539) and that skin and urogenital reactions were positively correlated. ACM Computing Classification System (1998): G.3.
Resumo:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014
Resumo:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require special attention to tuning necessitating several extensions of cross-validation to be investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
Resumo:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
Resumo:
We develop a simplified implementation of the Hoshen-Kopelman cluster counting algorithm adapted for honeycomb networks. In our implementation of the algorithm we assume that all nodes in the network are occupied and links between nodes can be intact or broken. The algorithm counts how many clusters there are in the network and determines which nodes belong to each cluster. The network information is stored into two sets of data. The first one is related to the connectivity of the nodes and the second one to the state of links. The algorithm finds all clusters in only one scan across the network and thereafter cluster relabeling operates on a vector whose size is much smaller than the size of the network. Counting the number of clusters of each size, the algorithm determines the cluster size probability distribution from which the mean cluster size parameter can be estimated. Although our implementation of the Hoshen-Kopelman algorithm works only for networks with a honeycomb (hexagonal) structure, it can be easily changed to be applied for networks with arbitrary connectivity between the nodes (triangular, square, etc.). The proposed adaptation of the Hoshen-Kopelman cluster counting algorithm is applied to studying the thermal degradation of a graphene-like honeycomb membrane by means of Molecular Dynamics simulation with a Langevin thermostat. ACM Computing Classification System (1998): F.2.2, I.5.3.
Resumo:
Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are not only interested in the number of occurrences of the sequences (supports of sequences), but also concerned about importance of sequences (weights). When generating subsequence candidates we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns which allows to accelerate the process of candidate generation.