6 resultados para learning machine

em Bulgarian Digital Mathematics Library at IMI-BAS


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in within the bigness taxonomy. Large p small n data sets for instance require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress the fact that simplicity in the sense of Ockham’s razor non-plurality principle of parsimony tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The results of research the intelligence multimodal man-machine interface and virtual reality means for assistive medical systems including computers and mechatronic systems (robots) are discussed. The gesture translation for disability peoples, the learning-by-showing technology and virtual operating room with 3D visualization are presented in this report and were announced at International exhibition "Intelligent and Adaptive Robots–2005".

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Formal grammars can used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and model the morphology of a variety of organisms. We believe that parallel grammars also can be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions which makes the promoter recognition a complex problem. We replace the problem of promoter recognition by induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the drosophila and vertebrate promoter datasets using a genetic programming technique and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L- grammar rules are analyzed and compared with natural promoter sequences.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The advances in building learning technology now have to emphasize on the aspect of the individual learning besides the popular focus on the technology per se. Unlike the common research where a great deal has been on finding ways to build, manage, classify, categorize and search knowledge on the server, there is an interest in our work to look at the knowledge development at the individual’s learning. We build the technology that resides behind the knowledge sharing platform where learning and sharing activities of an individual take place. The system that we built, KFTGA (Knowledge Flow Tracer and Growth Analyzer), demonstrates the capability of identifying the topics and subjects that an individual is engaged with during the knowledge sharing session and measuring the knowledge growth of the individual learning on a specific subject on a given time space.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require special attention to tuning necessitating several extensions of cross-validation to be investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of the work is to claim that engineers can be motivated to study statistical concepts by using the applications in their experience connected with Statistical ideas. The main idea is to choose a data from the manufacturing factility (for example, output from CMM machine) and explain that even if the parts used do not meet exact specifications they are used in production. By graphing the data one can show that the error is random but follows a distribution, that is, there is regularily in the data in statistical sense. As the error distribution is continuous, we advocate that the concept of randomness be introducted starting with continuous random variables with probabilities connected with areas under the density. The discrete random variables are then introduced in terms of decision connected with size of the errors before generalizing to abstract concept of probability. Using software, they can then be motivated to study statistical analysis of the data they encounter and the use of this analysis to make engineering and management decisions.