9 resultados para Lanczos, Linear systems, Generalized cross validation

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays problem of solving sparse linear systems over the field GF(2) remain as a challenge. The popular approach is to improve existing methods such as the block Lanczos method (the Montgomery method) and the Wiedemann-Coppersmith method. Both these methods are considered in the thesis in details: there are their modifications and computational estimation for each process. It demonstrates the most complicated parts of these methods and gives the idea how to improve computations in software point of view. The research provides the implementation of accelerated binary matrix operations computer library which helps to make the progress steps in the Montgomery and in the Wiedemann-Coppersmith methods faster.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mobile malwares are increasing with the growing number of Mobile users. Mobile malwares can perform several operations which lead to cybersecurity threats such as, stealing financial or personal information, installing malicious applications, sending premium SMS, creating backdoors, keylogging and crypto-ransomware attacks. Knowing the fact that there are many illegitimate Applications available on the App stores, most of the mobile users remain careless about the security of their Mobile devices and become the potential victim of these threats. Previous studies have shown that not every antivirus is capable of detecting all the threats; due to the fact that Mobile malwares use advance techniques to avoid detection. A Network-based IDS at the operator side will bring an extra layer of security to the subscribers and can detect many advanced threats by analyzing their traffic patterns. Machine Learning(ML) will provide the ability to these systems to detect unknown threats for which signatures are not yet known. This research is focused on the evaluation of Machine Learning classifiers in Network-based Intrusion detection systems for Mobile Networks. In this study, different techniques of Network-based intrusion detection with their advantages, disadvantages and state of the art in Hybrid solutions are discussed. Finally, a ML based NIDS is proposed which will work as a subsystem, to Network-based IDS deployed by Mobile Operators, that can help in detecting unknown threats and reducing false positives. In this research, several ML classifiers were implemented and evaluated. This study is focused on Android-based malwares, as Android is the most popular OS among users, hence most targeted by cyber criminals. Supervised ML algorithms based classifiers were built using the dataset which contained the labeled instances of relevant features. These features were extracted from the traffic generated by samples of several malware families and benign applications. These classifiers were able to detect malicious traffic patterns with the TPR upto 99.6% during Cross-validation test. Also, several experiments were conducted to detect unknown malware traffic and to detect false positives. These classifiers were able to detect unknown threats with the Accuracy of 97.5%. These classifiers could be integrated with current NIDS', which use signatures, statistical or knowledge-based techniques to detect malicious traffic. Technique to integrate the output from ML classifier with traditional NIDS is discussed and proposed for future work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of geneticvariants that are predictive of complex diseases. Modern studies, in the formof genome-wide association studies (GWAS) have afforded researchers with the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to have the ability to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to not only be effective at predicting the disease phenotypes, but also doing so efficiently through the use of computational shortcuts. While much of the work was able to be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers helping to assure that they will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The pulp and paper industry is currently facing broad structural changes due to global shifts in demand and supply. These changes have significant impacts on national economies worldwide. Planted forests (especially eucalyptus) and recovered paper have quickly increased their importance as raw material for paper and paperboard production. Although advances in information and communication technologies could reduce the demand for communication papers, and the growth of paper consumption has indeed flattened in developed economies, particularly in North America and Western Europe, the consumption is increasing on a global scale. Moreover, the focal point of production and consumption is moving from the Western world to the rapidly growing markets of Southeast Asia. This study analyzes how the so-called megatrends (globalization, technological development, and increasing environmental awareness) affect the pulp and paper industry’s external environment, and seeks reliable ways to incorporate the impact of the megatrends on the models concerning the demand, trade, and use of paper and pulp. The study expands current research in several directions and points of view, for example, by applying and incorporating several quantitative methods and different models. As a result, the thesis makes a significant contribution to better understand and measure the impacts of structural changes on the pulp and paper industry. It also provides some managerial and policy implications.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A rotating machine usually consists of a rotor and bearings that supports it. The nonidealities in these components may excite vibration of the rotating system. The uncontrolled vibrations may lead to excessive wearing of the components of the rotating machine or reduce the process quality. Vibrations may be harmful even when amplitudes are seemingly low, as is usually the case in superharmonic vibration that takes place below the first critical speed of the rotating machine. Superharmonic vibration is excited when the rotational velocity of the machine is a fraction of the natural frequency of the system. In such a situation, a part of the machine’s rotational energy is transformed into vibration energy. The amount of vibration energy should be minimised in the design of rotating machines. The superharmonic vibration phenomena can be studied by analysing the coupled rotor-bearing system employing a multibody simulation approach. This research is focused on the modelling of hydrodynamic journal bearings and rotorbearing systems supported by journal bearings. In particular, the non-idealities affecting the rotor-bearing system and their effect on the superharmonic vibration of the rotating system are analysed. A comparison of computationally efficient journal bearing models is carried out in order to validate one model for further development. The selected bearing model is improved in order to take the waviness of the shaft journal into account. The improved model is implemented and analyzed in a multibody simulation code. A rotor-bearing system that consists of a flexible tube roll, two journal bearings and a supporting structure is analysed employing the multibody simulation technique. The modelled non-idealities are the shell thickness variation in the tube roll and the waviness of the shaft journal in the bearing assembly. Both modelled non-idealities may cause subharmonic resonance in the system. In multibody simulation, the coupled effect of the non-idealities can be captured in the analysis. Additionally one non-ideality is presented that does not excite the vibrations itself but affects the response of the rotorbearing system, namely the waviness of the bearing bushing which is the non-rotating part of the bearing system. The modelled system is verified with measurements performed on a test rig. In the measurements the waviness of bearing bushing was not measured and therefore it’s affect on the response was not verified. In conclusion, the selected modelling approach is an appropriate method when analysing the response of the rotor-bearing system. When comparing the simulated results to the measured ones, the overall agreement between the results is concluded to be good.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This dissertation describes a networking approach to infinite-dimensional systems theory, where there is a minimal distinction between inputs and outputs. We introduce and study two closely related classes of systems, namely the state/signal systems and the port-Hamiltonian systems, and describe how they relate to each other. Some basic theory for these two classes of systems and the interconnections of such systems is provided. The main emphasis lies on passive and conservative systems, and the theoretical concepts are illustrated using the example of a lossless transfer line. Much remains to be done in this field and we point to some directions for future studies as well.