6 results for statistical learning
in Bulgarian Digital Mathematics Library at IMI-BAS
Abstract:
When recurrent neural networks (RNNs) are used as pattern recognition systems, the problem to be considered is how to impose prescribed prototype vectors ξ^1, ξ^2, ..., ξ^p as fixed points. In the classical approach, the synaptic matrix W is interpreted as a sort of sign correlation matrix of the prototypes. The weak point of this approach is that it lacks the appropriate tools to deal efficiently with the correlation between the state vectors and the prototype vectors. The capacity of the net is very poor because one can only know whether a given vector is adequately correlated with the prototypes, not its exact degree of correlation. The interest of our approach lies precisely in the fact that it provides these tools. In this paper, a geometrical view of the dynamics of states is explained. A fixed point is viewed as a point in the Euclidean plane R^2. The retrieval procedure is analyzed through the statistical frequency distribution of the prototypes. The capacity of the net is improved and the spurious states are reduced. To clarify and corroborate the theoretical results, an application is presented together with the formal theory.
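The classical setting this abstract starts from can be sketched as follows. This is a minimal illustration, assuming the standard Hebbian outer-product rule with a zero diagonal and synchronous sign updates; it is not the geometrical method the paper proposes. The prototypes `XI1` and `XI2` and the helper names are hypothetical.

```python
def hebbian_weights(prototypes):
    """Sum of outer products of the prototypes, with a zero diagonal."""
    n = len(prototypes[0])
    W = [[0] * n for _ in range(n)]
    for xi in prototypes:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += xi[i] * xi[j]
    return W


def step(W, state):
    """One synchronous update s <- sign(W s); a zero field keeps the old value."""
    new_state = []
    for i, row in enumerate(W):
        field = sum(w * s for w, s in zip(row, state))
        new_state.append(state[i] if field == 0 else (1 if field > 0 else -1))
    return new_state


def overlap(a, b):
    """Normalized correlation between two +/-1 vectors, in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Two orthogonal prototypes of length 8 (illustrative values only).
XI1 = [1, 1, 1, 1, -1, -1, -1, -1]
XI2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([XI1, XI2])

# Corrupt the first component of XI1 and let the network update once.
noisy = [-1] + XI1[1:]
recalled = step(W, noisy)

print("XI1 is a fixed point:", step(W, XI1) == XI1)
print("overlap of noisy state with XI1:", overlap(noisy, XI1))
print("recalled state equals XI1:", recalled == XI1)
```

The `overlap` function gives the degree of correlation between a state and a prototype as a number, which is the quantity the abstract says the classical approach cannot exploit.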
Abstract:
In this paper we first give a short introduction to e-learning platforms and review the case of the e-class open e-learning platform used by the Greek tertiary education sector. Our analysis covers strategic selection issues and outcomes in general, and operational and adoption issues in the case of the Technological Educational Institute (TEI) of Larissa, Greece. The methodology is based on qualitative analysis of interviews with key actors using the platform, and statistical analysis of quantitative data related to adoption and usage in the relevant populations. The author has been a key actor in all stages and describes his insights as an early adopter, diffuser and innovative user. We try to explain the issues under consideration using existing research outcomes, and we also arrive at some conclusions and points for further research.
Abstract:
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss preprocessing, standardization, imputation, projection, regularization, penalization, compression, reduction, selection, kernelization, hybridization, parallelization, aggregation, randomization, replication and sequentialization. It is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham’s razor non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
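The distinction between large p, small n and large n, small p data can be sketched as a small helper. This is only an illustration: the abstract gives no thresholds, so the cut-off `ratio_cut` and the function name `bigness_category` are assumptions, not the taxonomy proposed in the paper.

```python
def bigness_category(n, p, ratio_cut=1.0):
    """Label a data set by comparing sample size n with dimensionality p.

    ratio_cut is a hypothetical threshold on n / p chosen for illustration.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    ratio = n / p
    label = "large p, small n" if ratio < ratio_cut else "large n, small p"
    return label, ratio


# Made-up shapes: a microarray-like table and a long, narrow table.
for n, p in [(60, 5000), (1_000_000, 12)]:
    label, ratio = bigness_category(n, p)
    print(f"n={n}, p={p}, n/p={ratio:.4g}: {label}")
```

A fuller version would map each label to a family of tools, but that mapping belongs to the paper and is not stated in the abstract.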
Abstract:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
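Tuning a regularization parameter by cross-validation when p exceeds n can be sketched as below. This is a minimal illustration under stated assumptions: plain k-fold cross-validation, ridge regression without intercept fitted by cyclic coordinate descent, and a made-up data set with n = 6 and p = 8. The abstract does not say which regularization method or which cross-validation extensions the study used, and all function names here are hypothetical.

```python
def k_fold_indices(n, k):
    """Deterministic folds: observation i is assigned to fold i % k."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def fit_ridge(X, y, lam, sweeps=200):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 by cyclic coordinate descent (lam > 0)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w = 0 at the start
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Inner product of column j with the residual that excludes w[j]'s contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid))
            new_wj = rho / (sum(c * c for c in col) + lam)
            delta = new_wj - w[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            w[j] = new_wj
    return w


def cv_error(X, y, lam, k):
    """Mean squared prediction error over k held-out folds."""
    n = len(X)
    total = 0.0
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        w = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def select_lambda(X, y, grid, k):
    """Return the penalty in grid with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam, k))


# Made-up HDLSS-style data: n = 6 observations, p = 8 features.
X = [[float((7 * i + 3 * j) % 5 - 2) for j in range(8)] for i in range(6)]
y = [row[0] - row[1] for row in X]
GRID = [0.01, 0.1, 1.0, 10.0]

best_lam = select_lambda(X, y, GRID, k=3)
print("selected penalty:", best_lam)
```

With so few observations per fold the cross-validated error is noisy, which is one reason tuning needs special care in HDLSS problems.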
Abstract:
The purpose of this work is to argue that engineers can be motivated to study statistical concepts through applications from their own experience that are connected with statistical ideas. The main idea is to choose data from the manufacturing facility (for example, output from a CMM machine) and explain that even if the parts used do not meet exact specifications, they are used in production. By graphing the data, one can show that the error is random but follows a distribution; that is, there is regularity in the data in a statistical sense. As the error distribution is continuous, we advocate that the concept of randomness be introduced starting with continuous random variables, with probabilities connected with areas under the density. Discrete random variables are then introduced in terms of decisions connected with the size of the errors, before generalizing to the abstract concept of probability. Using software, engineers can then be motivated to study statistical analysis of the data they encounter and to use this analysis to make engineering and management decisions.
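The idea of probability as area under a density, followed by a discrete accept or reject decision, can be sketched with Python's standard library. This is only an illustration: the measurement values and the tolerance `TOL_MM` are made up, and the normal model is an assumption (the abstract says only that the error follows a continuous distribution).

```python
from statistics import NormalDist

# Hypothetical deviations from nominal, in millimetres (made-up values,
# standing in for CMM output).
errors_mm = [0.012, -0.008, 0.003, -0.015, 0.007,
             0.001, -0.004, 0.010, -0.006, 0.002]

# Assumption: the errors are modelled as normal, with mean and standard
# deviation estimated from the sample.
fitted = NormalDist.from_samples(errors_mm)


def prob_within(dist, lower, upper):
    """Area under the density between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)


TOL_MM = 0.02  # hypothetical tolerance band of +/- 0.02 mm

# Continuous variable: the size of the error.
p_in_spec = prob_within(fitted, -TOL_MM, TOL_MM)

# Discrete variable: the decision "in spec" versus "out of spec".
p_out_of_spec = 1.0 - p_in_spec

print(f"estimated mean = {fitted.mean:.4f} mm, sd = {fitted.stdev:.4f} mm")
print(f"P(in spec) = {p_in_spec:.4f}, P(out of spec) = {p_out_of_spec:.4f}")
```

The two-valued decision is a Bernoulli variable whose probabilities come directly from areas under the continuous density, which mirrors the teaching order the abstract recommends.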
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014