863 results for Parallel genetic algorithm
Abstract:
A workshop recently held at the École Polytechnique Fédérale de Lausanne (EPFL, Switzerland) was dedicated to understanding the genetic basis of adaptive change, taking stock of the approaches developed in theoretical population genetics and landscape genomics and bringing together the knowledge accumulated in both research fields. An important challenge in theoretical population genetics is to incorporate the effects of demographic history and population structure. However, important design problems in its methods (e.g. a focus on populations as units, a focus on hard selective sweeps, and no hypothesis-based framework in the design of the statistical tests) reduce their ability to detect adaptive genetic variation. Landscape genomics offers a solution to several of these problems and provides a number of advantages (e.g. fast computation and the integration of landscape heterogeneity). However, the approach makes several implicit assumptions that should be carefully considered (e.g. that selection has had enough time to create a functional relationship between the allele distribution and the environmental variable, or that this functional relationship is constant). To address these respective strengths and weaknesses, the workshop brought together a panel of experts from both disciplines to present their work and discuss the relevance of combining the two approaches, possibly resulting in a joint software solution in the future.
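Landscape genomics tests for a functional relationship between an allele's distribution and an environmental variable. As a minimal sketch of one common formulation (an assumption; the workshop report does not specify a model), the presence or absence of an allele in each sampled individual can be regressed on an environmental variable with a univariate logistic regression, fitted here by Newton-Raphson. The function name `fit_logistic` and the toy data are hypothetical.

```python
import math

def fit_logistic(env, presence, iterations=50):
    """Fit P(allele present) = 1 / (1 + exp(-(b0 + b1 * env))) by Newton-Raphson.

    env: environmental value for each individual (hypothetical data).
    presence: 1 if the individual carries the allele, else 0.
    Assumes the data are not perfectly separable; otherwise no finite MLE exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (gradient of the log-likelihood) and information matrix.
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(env, presence):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step: solve the 2x2 system I * delta = g.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1
```

A positive `b1` means the allele becomes more frequent as the environmental variable increases; in practice its significance would be assessed, for example, with a likelihood-ratio test against an intercept-only model.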
Abstract:
The objective of this work was to evaluate the efficiency of EST-SSR markers in assessing the genetic diversity of rubber tree genotypes (Hevea brasiliensis) and to verify the transferability of these markers to wild species of Hevea. Forty-five rubber tree accessions from the Instituto Agronômico (Campinas, SP, Brazil) and six wild species were used. The EST-SSR data were analyzed using modified Rogers' genetic distance. UPGMA clustering divided the samples into two major groups with high genetic differentiation, while the software Structure distributed the 51 clones into eight groups. A correspondence could be established between the groupings of the two clustering analyses. The 30 polymorphic EST-SSRs showed from two to ten alleles and successfully amplified the six wild species. Functional EST-SSR microsatellites are efficient for evaluating the genetic diversity among rubber tree clones and can be used to reveal genetic differences among cultivars and to fingerprint closely related materials. The accessions from the Instituto Agronômico show high genetic diversity. The EST-SSR markers developed from Hevea brasiliensis are transferable and can amplify other species of Hevea.
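The genetic distance named above can be computed directly from allele frequencies. A minimal sketch, assuming the usual definition of modified Rogers' distance, D = sqrt(sum over loci and alleles of (p - q)^2 / (2m)) with m the number of loci, and assuming each accession is represented as a list of per-locus dictionaries mapping allele to frequency (a hypothetical data layout; missing data are not handled):

```python
import math

def modified_rogers_distance(freqs_a, freqs_b):
    """Modified Rogers' distance between two accessions.

    freqs_a, freqs_b: one dict per locus mapping allele -> frequency
    (0, 0.5 or 1 for a single diploid individual). Both lists must
    cover the same loci in the same order.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both accessions must be scored at the same loci")
    total = 0.0
    for locus_a, locus_b in zip(freqs_a, freqs_b):
        # Alleles absent from one accession have frequency 0 there.
        for allele in set(locus_a) | set(locus_b):
            total += (locus_a.get(allele, 0.0) - locus_b.get(allele, 0.0)) ** 2
    return math.sqrt(total / (2 * len(freqs_a)))
```

Identical accessions give 0 and accessions sharing no allele at any locus give 1; a pairwise matrix of these values is the kind of input a UPGMA clustering step would consume.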
Genetic diversity between improved banana diploids using canonical variables and the Ward-MLM method
Abstract:
The objective of this work was to estimate the genetic diversity of improved banana diploids using quantitative data and simple sequence repeat (SSR) markers simultaneously. The experiment was carried out with 33 diploids in an augmented block design with 30 regular treatments and three common ones. Eighteen agronomic traits and 20 SSR primers were used. The agronomic traits and the SSR data were analyzed simultaneously with the Ward-MLM, cluster, and IML procedures. Ward clustering was applied to the combined matrix obtained with the Gower algorithm. The Ward-MLM procedure identified three ideal groups (G1, G2, and G3) based on pseudo-F and pseudo-t² statistics. The dendrogram showed relative similarity among the G1 genotypes, which is explained by their genealogy. In G2, 'Calcutta 4' appears in 62% of the genealogies. Similar behavior was observed in G3, in which the diploid 028003-01 is the male parent of genotypes 086079-10 and 042079-06. The method with canonical variables had greater discriminatory power than Ward-MLM. Although limited, the available genetic variability is sufficient for the development of new hybrids.
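The Gower algorithm mentioned above lets quantitative traits and marker scores share one distance matrix. A minimal sketch, assuming range-scaled absolute differences for quantitative traits and simple matching for categorical or binary SSR scores (a simplification: Gower's original coefficient treats dichotomous variables asymmetrically and ignores joint absences). The function and argument names are hypothetical.

```python
def gower_distance(row_a, row_b, kinds, ranges):
    """Gower distance (1 - Gower similarity) between two genotypes.

    kinds: 'quant' for quantitative traits, 'cat' for categorical or binary
    scores (e.g. presence/absence of an SSR allele), one entry per variable.
    ranges: max - min over all genotypes for each quantitative variable;
    ignored (may be None) for categorical variables.
    """
    total = 0.0
    for a, b, kind, r in zip(row_a, row_b, kinds, ranges):
        if kind == "quant":
            # Range-scaled absolute difference, in [0, 1].
            total += abs(a - b) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if equal, 1 otherwise.
            total += 0.0 if a == b else 1.0
    return total / len(kinds)
```

For example, genotypes (10.0, 1, 0) and (20.0, 1, 1) with a trait range of 40.0 are at distance (0.25 + 0 + 1) / 3, about 0.417. The full pairwise matrix is what a Ward clustering step would then operate on.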
Abstract:
Summary: Biodiversity is usually studied through species or genetic diversity. To date, these two levels of diversity have remained independent fields of investigation for community ecologists and population geneticists, respectively. However, recent joint analyses of species and genetic diversity have suggested that common processes may underlie the two levels. Positive correlations between species diversity and genetic diversity may arise when the effects of drift and migration overwhelm selective effects. The first goal of this thesis was to investigate jointly the patterns of species and genetic diversity in a community of freshwater gastropods living in a floodplain habitat. The second goal was to determine, as far as possible, the relative influences of the processes underlying the patterns observed at each level. In chapter 2, we investigate the relative influences of different evolutionary forces in shaping the genetic structure of four Radix balthica populations. The results revealed that the structure inferred from quantitative traits was lower than or equal to that inferred from neutral molecular markers. Consequently, the observed pattern of structure could be due to random drift alone, possibly to uniform selection, but definitely not to selection for local optima. In chapter 3, we analyze the temporal variation of species and genetic diversity in five localities. An extended period of drought at the end of the study period led to a decline in both species and genetic diversity. This parallel loss of diversity following a natural perturbation highlighted the sometimes predominant role of random drift over selection in shaping patterns of biodiversity in a floodplain habitat. In chapter 4, we compare the spatial genetic structures of two sympatric species: Radix balthica and Planorbis carinatus. We found that R. balthica populations are weakly structured and have moderate to high gene diversity. In contrast, P. carinatus populations are highly structured and poorly diverse. We then measured correlations between various indices of species and genetic diversity using genetic data from the two species. We found only one significant correlation: between species richness and the gene diversity of P. carinatus. This result highlights the need to use genetic data from more than one species to infer correlations between species and genetic diversity. Overall, this thesis provides new insights into the common processes underlying patterns of species and genetic diversity.
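The chapter 2 result compares population structure in quantitative traits with structure at neutral markers. In the usual Q_ST versus F_ST framing (an assumption here; the summary does not name the statistics), Q_ST for an outbreeding diploid is V_B / (V_B + 2 V_W), where V_B and V_W are the additive genetic variances among and within populations. The sketch below computes Q_ST and maps the comparison onto the three interpretations named in the summary; the `tolerance` threshold is hypothetical, since real studies compare confidence intervals.

```python
def q_st(var_between, var_within):
    """Q_ST = V_B / (V_B + 2 * V_W) for an outbreeding diploid.

    var_between: additive genetic variance among populations.
    var_within: additive genetic variance within populations.
    """
    return var_between / (var_between + 2.0 * var_within)

def interpret(qst, fst, tolerance=0.02):
    """Compare trait structure (Q_ST) with neutral-marker structure (F_ST).

    tolerance is a hypothetical threshold standing in for a proper
    comparison of confidence intervals.
    """
    if qst > fst + tolerance:
        return "divergent selection (local optima)"
    if qst < fst - tolerance:
        return "uniform selection"
    return "consistent with drift"
```

Under this framing, the chapter 2 finding (trait structure lower than or equal to marker structure) corresponds to the last two branches only.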
Abstract:
Converging evidence favors an abnormal susceptibility to oxidative stress in schizophrenia. Decreased levels of glutathione (GSH), the major cellular antioxidant and redox regulator, were observed in the cerebrospinal fluid and prefrontal cortex of patients. Importantly, abnormal GSH synthesis of genetic origin was observed: two case-control studies showed an association with a GAG trinucleotide repeat (TNR) polymorphism in the gene for the catalytic subunit (GCLC) of glutamate-cysteine ligase (GCL), the key GSH-synthesizing enzyme. The most common TNR genotype, 7/7, was more frequent in controls, whereas the rarest TNR genotype, 8/8, was three times more frequent in patients. The disease-associated genotypes (35% of patients) correlated with decreased GCLC protein, GCL activity and GSH content. Similar GSH system anomalies were observed in early psychosis patients. Such redox dysregulation, combined with environmental stressors at specific developmental stages, could underlie structural and functional connectivity anomalies. In pharmacological and knockout (KO) models, a GSH deficit induces anomalies analogous to those reported in patients. (a) Morphology: spine density and GABA-parvalbumin immunoreactivity (PV-I) were decreased in the anterior cingulate cortex. KO mice showed delayed cortical PV-I at postnatal day 10 (PD10). This effect is exacerbated in mice with increased dopamine (DA) from PD5 to PD10. KO mice exhibit cortical impairment in myelin and in the perineuronal net, which is known to modulate PV connectivity. (b) Physiology: in cultured neurons, NMDA responses are depressed by D2 activation. In the hippocampus, NMDA-dependent synaptic plasticity is impaired and kainate-induced γ-oscillations are reduced in parallel with PV-I. (c) Cognition: low-GSH models show increased sensitivity to stress, hyperactivity, and abnormal object recognition, olfactory integration and social behavior. In a clinical study, the GSH precursor N-acetylcysteine (NAC), given as add-on therapy, improves negative symptoms and decreases the side effects of antipsychotics.
In an auditory oddball paradigm, NAC improves the mismatch negativity, an evoked potential related to pre-attention and to NMDA receptor function. In summary, clinical and experimental evidence converge to demonstrate that a genetically induced dysregulation of GSH synthesis, combined with environmental insults in early development, represents a major risk factor contributing to the development of schizophrenia.
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged, and various parallel and distributed environments have been designed and implemented. Each environment, including its hardware and software, has unique strengths and weaknesses. No single parallel environment can be identified as the best for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Because of the parallel and distributed nature of these environments, the networks connecting their processors were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload, extracted from an application, that is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important when the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results show that the additional information about the workload has a positive impact on application performance.
Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm combines the Message Passing Interface (MPI) with threads to provide a methodology for writing parallel applications that efficiently utilize the available resources and minimize overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm, which is executed by the communication thread, so the scheduling does not affect the execution of the parallel application. Performance results show that MPIT achieves considerable improvements over conventional MPI applications.
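Application-specific information, as described above, amounts to giving the scheduler an estimate of each work unit's cost. As an illustration only (the thesis's three enhanced algorithms are not named here, and this is not one of them), the sketch below feeds such estimates to a greedy longest-processing-time-first assignment: units are sorted by estimated cost and each goes to the currently least-loaded worker. The function name `schedule_lpt` and the cost values are hypothetical.

```python
import heapq

def schedule_lpt(costs, n_workers):
    """Assign work units to workers using estimated costs, largest first.

    costs: dict mapping work-unit id -> estimated cost (the
    application-specific information about the workload).
    Returns (assignment, loads); assignment maps worker -> list of unit ids.
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for unit in sorted(costs, key=costs.get, reverse=True):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(unit)
        heapq.heappush(heap, (load + costs[unit], worker))
    loads = {w: sum(costs[u] for u in units) for w, units in assignment.items()}
    return assignment, loads
```

With estimated costs a=7, b=5, c=4, d=3, e=1 on two workers, both workers end with a load of 10, whereas assigning the same units round-robin in the order a, b, c, d, e would give loads of 12 and 8. Without cost estimates, the scheduler cannot tell the heterogeneous units apart.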
Abstract:
Transmission of drug-resistant pathogens presents an almost universal challenge for fighting infectious diseases. Transmitted drug resistance mutations (TDRM) can persist in the absence of drugs for a considerable time. It is generally believed that differential TDRM persistence is caused, at least partially, by variations in TDRM fitness costs. However, in vivo epidemiological evidence for the impact of fitness costs on TDRM persistence is rare. Here, we studied the persistence of TDRM in HIV-1 using longitudinally sampled nucleotide sequences from the Swiss HIV Cohort Study (SHCS). All treatment-naïve individuals with TDRM at baseline were included. Persistence of TDRM was quantified via reversion rates (RR) determined with interval-censored survival models. Fitness costs of TDRM were estimated in the genetic background in which they occurred, using a previously published and validated machine-learning algorithm (based on in vitro replicative capacities), and were included in the survival models as explanatory variables. In 857 sequential samples from 168 treatment-naïve patients, 17 TDRM were analyzed. RR varied substantially, ranging from 174.0 per 100 person-years (CI [51.4, 588.8]) for 184V to 2.7 per 100 person-years (CI [0.7, 10.9]) for 215D. RR increased significantly with fitness cost (increase by 1.6 [1.3, 2.0] per standard deviation of fitness costs). When subdividing fitness costs into the average fitness cost of a given mutation and the deviation from that average in a given genetic background, we found that both components were significantly associated with reversion rates. Our results show that the substantial variation in TDRM persistence in the absence of drugs is associated with fitness-cost differences both among mutations and among different genetic backgrounds for the same mutation.
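Interval censoring arises because a reversion is only known to have happened between two sampling dates. The study's survival models also include fitness cost as an explanatory variable; the sketch below drops covariates and assumes a constant hazard (an exponential model), purely to show how interval-censored and right-censored observations enter the likelihood. The function names and toy data are hypothetical.

```python
import math

def log_likelihood(rate, observations):
    """Log-likelihood of a constant reversion rate (per person-year).

    observations: list of (left, right) times in years since baseline.
      right is None -> mutation still present at the last sample `left`
                       (right-censored).
      otherwise     -> reversion happened between the samples at `left`
                       and `right` (interval-censored), with right > left.
    """
    ll = 0.0
    for left, right in observations:
        ll -= rate * left  # log S(left) for an exponential survival model
        if right is not None:
            # log(S(left) - S(right)) = log S(left) + log(1 - exp(-rate * (right - left)))
            ll += math.log(-math.expm1(-rate * (right - left)))
    return ll

def fit_rate(observations, lo=1e-4, hi=1e2, iterations=200):
    """Maximum-likelihood rate via golden-section search on log(rate)."""
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda u: log_likelihood(math.exp(u), observations)
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2.0)
```

For one reversion observed between baseline and a sample one year later, plus one patient still carrying the mutation after one year, the likelihood (1 - e^(-λ)) e^(-λ) is maximised at λ = ln 2, i.e. about 69.3 reversions per 100 person-years (`100 * fit_rate(...)`).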
Abstract:
Given the cost constraints of European health-care systems, criteria are needed to decide which genetic services to fund from public budgets if not all can be covered. To ensure that high-priority services are available equitably within and across European countries, a shared set of prioritization criteria would be desirable. A decision process following the accountability for reasonableness framework was undertaken, including a multidisciplinary EuroGentest/PPPC-ESHG workshop to develop shared prioritization criteria. Resources are currently too limited to fund all the beneficial genetic testing services available in the next decade. Ethically and economically well-considered prioritization criteria are needed. Prioritization should be based on considerations of medical benefit, health need and costs. Medical benefit includes evidence of benefit in terms of clinical benefit, benefit of information for important life decisions, benefit for people other than the person tested, and the patient-specific likelihood of being affected by the condition tested for. It may be subject to a finite time window. Health need includes the severity of the condition tested for and its progression at the time of testing. Further discussion and better evidence are needed before clearly defined recommendations can be made or a prioritization algorithm proposed. To our knowledge, this is the first time a clinical society has initiated a decision process about health-care prioritization at the European level following the principles of accountability for reasonableness. We provide points to consider to stimulate this debate across the EU and to serve as a reference for improving patient management.
Abstract:
The purpose of this research was to study the genetic diversity and genetic relatedness of 60 grapevine genotypes from the Germplasm Bank of Embrapa Semiárido, Juazeiro, BA, Brazil. Seven previously characterized microsatellite markers were used: VVS2, VVMD5, VVMD7, VVMD27, VVMD3, ssrVrZAG79 and ssrVrZAG62. The expected heterozygosity (He) and polymorphic information content (PIC) were calculated, and a cluster analysis was performed to generate a dendrogram using the UPGMA algorithm. He ranged from 81.8% to 88.1%, with a mean of 84.8%. The loci VrZAG79 and VVMD7 were the most informative, with PIC values of 87% and 86%, respectively, while VrZAG62 was the least informative, with a PIC value of 80%. UPGMA cluster analysis separated the genotypes according to their genealogy and identified possible parentage for the cultivars 'Dominga', 'Isaura', 'CG 26916', 'CG28467' and 'Roni Redi'.
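Both per-locus statistics reported above are functions of the allele frequencies at a locus. A minimal sketch, assuming the usual formulas He = 1 - Σ p_i² (without a small-sample correction; the abstract does not say whether one was applied) and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²:

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphic information content at one locus:
    PIC = 1 - sum(p_i^2) - sum over pairs i < j of 2 * p_i^2 * p_j^2.
    """
    squares = [p * p for p in freqs]
    pair_term = sum(2.0 * squares[i] * squares[j]
                    for i in range(len(squares))
                    for j in range(i + 1, len(squares)))
    return 1.0 - sum(squares) - pair_term
```

For two equally frequent alleles, He = 0.5 and PIC = 0.375; PIC is always slightly below He because it discounts matings that cannot reveal which parental allele was transmitted.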
Abstract:
We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimension larger than 1. It is based on a quadratically convergent scheme that simultaneously approximates the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons, it is a very good option for computing quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method on a parallel computer. In these flows, we compute invariant tori of dimension up to 5 by taking suitable sections.
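The three objects the scheme approximates simultaneously can be stated as equations. In the notation commonly used for this problem (the symbols are our assumption, not taken from the abstract), let F be the map, ω ∈ ℝ^d the vector of rotation numbers, K a parameterization of the d-dimensional torus, P the Floquet transformation and Λ the Floquet matrix:

```latex
\begin{aligned}
F\bigl(K(\theta)\bigr) &= K(\theta + \omega), && \theta \in \mathbb{T}^d, \\
DF\bigl(K(\theta)\bigr)\, P(\theta) &= P(\theta + \omega)\, \Lambda, \\
K(\theta) &= \sum_{k \in \mathbb{Z}^d} \hat{K}_k \, e^{\mathrm{i}\, k \cdot \theta}.
\end{aligned}
```

Reducibility means that Λ does not depend on θ, so its eigenvalues give the linear stability of the torus; in computations the Fourier sum is truncated to a finite set of modes.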
Abstract:
The maximum realizable power throughput of power electronic converters may be limited or constrained by technical or economic considerations. One solution to this problem is to connect several power converter units in parallel. The parallel connection can be used to increase the current-carrying capacity of the overall system beyond the ratings of individual power converter units. Thus, several lower-power converter units, produced in large quantities, can be used as building blocks to construct high-power converters in a modular manner. High-power converters realized by parallel connection are needed, for example, in multimegawatt wind power generation systems. Parallel connection of power converter units is also required in emerging applications such as photovoltaic and fuel cell power conversion. The parallel operation of power converter units is not, however, problem free, because parallel-operating units are subject to overcurrent stresses caused by unequal load current sharing or by currents that flow between the units. Commonly, the term 'circulating current' is used to describe both the unequal load current sharing and the currents flowing between the units. Circulating currents, in turn, are caused by component tolerances and asynchronous operation of the parallel units. Parallel-operating units are also subject to stresses caused by unequal thermal stress distribution. Both of these problems can, nevertheless, be handled with proper circulating current control. To design an effective circulating current control system, information about circulating current dynamics is needed. These dynamics can be investigated by developing appropriate mathematical models. In this dissertation, circulating current models are developed for two different types of parallel two-level three-phase inverter configurations.
The models, which are developed for an arbitrary number of parallel units, provide a framework for analyzing circulating current generation mechanisms and developing circulating current control systems. In addition to developing circulating current models, the modulation of parallel inverters is considered. It is shown that, depending on the parallel inverter configuration and the modulation method applied, common-mode circulating currents may be excited as a consequence of the differential-mode circulating current control. To prevent the common-mode circulating currents caused by the modulation, a dual modulator method is introduced. The dual modulator consists of two independently operating modulators, referred to as the primary and secondary modulators, whose outputs together constitute the switching commands of the inverter. In its intended usage, the same voltage vector is fed to the primary modulator of each parallel unit, and the inputs of the secondary modulators are obtained from the circulating current controllers. To ensure that the voltage commands obtained from the circulating current controllers are realizable, the primary modulator must not drive the inverter into saturation. Inverter saturation can be prevented by limiting the inputs of the primary and secondary modulators; for this purpose, a limitation algorithm is also proposed. The operation of both the proposed dual modulator and the limitation algorithm is verified experimentally.
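The term 'circulating current' above covers both unequal load sharing and currents flowing between units. One common way to quantify both (a definition assumed here; the dissertation develops its own models) is to compare each unit's phase currents with an equal share of the total phase current, and to take each unit's zero-sequence current as its common-mode component. The function name and the current values are hypothetical.

```python
def circulating_currents(unit_currents):
    """Split each unit's phase currents into circulating components.

    unit_currents: list with one (i_a, i_b, i_c) tuple per parallel inverter unit.
    Returns one dict per unit with:
      'differential': deviation of each phase current from an equal share
                      of the total phase current, i_k - sum(i) / N;
      'common_mode':  the unit's zero-sequence current, (i_a + i_b + i_c) / 3.
    """
    n_units = len(unit_currents)
    totals = [sum(unit[phase] for unit in unit_currents) for phase in range(3)]
    result = []
    for unit in unit_currents:
        result.append({
            "differential": tuple(unit[p] - totals[p] / n_units for p in range(3)),
            "common_mode": sum(unit) / 3.0,
        })
    return result
```

With two units carrying phase-a currents of 12 A and 8 A, each unit deviates by ±2 A from the 10 A equal share; the differential components of all units sum to zero in every phase, which is what makes them circulate between units rather than reach the load.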
Abstract:
The purpose of this master's thesis is to optimize the calculation of customers' electricity bills using distributed computing. As smart, remotely read energy meters are installed in every household, energy companies are obliged to calculate customers' electricity bills based on hourly metering data. The growing volume of data also increases the number of computing tasks required. The thesis evaluates alternatives for implementing distributed computing and takes a closer look at the possibilities of cloud computing. In addition, simulations were run to evaluate the differences between parallel and sequential computation. A metering-tree algorithm was developed to support the correct calculation of electricity bills.
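Hourly billing is embarrassingly parallel at the customer level, which is what makes the parallel-versus-sequential comparison above possible. The sketch below computes each bill as the sum of hourly consumption times hourly price, once sequentially and once with a pool of workers. All names are hypothetical; fixed fees, tariff structures and the thesis's metering-tree algorithm are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

def bill(hourly_kwh, hourly_price):
    """Energy charge for one customer: sum over hours of consumption x hourly price."""
    return sum(kwh * price for kwh, price in zip(hourly_kwh, hourly_price))

def bills_sequential(customers, hourly_price):
    """customers: dict mapping customer id -> list of hourly kWh readings."""
    return {cid: bill(kwh, hourly_price) for cid, kwh in customers.items()}

def bills_parallel(customers, hourly_price, workers=4):
    # Each customer's bill is independent, so the bills can be computed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {cid: pool.submit(bill, kwh, hourly_price)
                   for cid, kwh in customers.items()}
        return {cid: f.result() for cid, f in futures.items()}
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock; `ProcessPoolExecutor`, which has the same `submit`/`result` interface, or a distributed or cloud framework would be needed for real gains. The thread pool here only demonstrates that the decomposition gives the same bills as the sequential loop.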
Abstract:
To obtain the desired accuracy of a robot, two techniques are available. The first option is to make the robot match the nominal mathematical model. In other words, the manufacturing and assembly tolerances of every part would be extremely tight, so that all the parameters match the “design” or “nominal” values as closely as possible. This method can satisfy most accuracy requirements, but the cost increases dramatically as the accuracy requirement increases. Alternatively, a more cost-effective solution is to build a manipulator with relaxed manufacturing and assembly tolerances and to compensate for the actual errors of the robot by modifying the mathematical model in the controller. This is the essence of robot calibration. Simply put, robot calibration is the process of defining an appropriate error model and then identifying the parameter errors that make the error model match the robot as closely as possible. This work focuses on the kinematic calibration of a 10-degree-of-freedom (DOF) redundant serial-parallel hybrid robot. The robot consists of a 4-DOF serial mechanism and a 6-DOF hexapod parallel manipulator. The redundant 4-DOF serial structure is used to enlarge the workspace, and the 6-DOF hexapod manipulator provides high load capacity and stiffness for the whole structure. The main objective of the study is to develop a suitable calibration method to improve the accuracy of the redundant serial-parallel hybrid robot. To this end, a Denavit–Hartenberg (DH) hybrid error model and a Product-of-Exponential (POE) error model are developed for error modeling of the proposed robot. Furthermore, two global optimization methods, namely the differential evolution (DE) algorithm and the Markov chain Monte Carlo (MCMC) algorithm, are employed to identify the parameter errors of the derived error model.
A measurement method based on a 3-2-1 wire-based pose estimation system is proposed and implemented in a SolidWorks environment to simulate real experimental validation. Numerical simulations and SolidWorks prototype-model validations are carried out on the hybrid robot to verify the effectiveness, accuracy and robustness of the calibration algorithms.
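Parameter identification, as described above, means finding the parameter values that minimise the mismatch between the error model and measured positions. A much-reduced sketch, assuming a planar two-link arm with three unknown parameters (two link lengths and one joint offset) instead of the 10-DOF hybrid robot, noise-free simulated measurements, and a textbook DE/rand/1/bin variant. The population size, F, CR, the poses and the "actual" parameters are illustrative choices, not values from the thesis.

```python
import math
import random

def forward_kinematics(params, q1, q2):
    """Planar two-link arm. params = (l1, l2, offset1): link lengths and joint-1 offset."""
    l1, l2, off1 = params
    a1 = q1 + off1
    return (l1 * math.cos(a1) + l2 * math.cos(a1 + q2),
            l1 * math.sin(a1) + l2 * math.sin(a1 + q2))

def cost(params, poses, measured):
    """Sum of squared position errors between the error model and the measurements."""
    total = 0.0
    for (q1, q2), (mx, my) in zip(poses, measured):
        x, y = forward_kinematics(params, q1, q2)
        total += (x - mx) ** 2 + (y - my) ** 2
    return total

def differential_evolution(f, bounds, pop_size=30, generations=300, F=0.5, CR=0.9, seed=1):
    """Classic DE/rand/1/bin minimiser with clipping to the search box."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < CR:
                    lo, hi = bounds[k]
                    v = pop[a][k] + F * (pop[b][k] - pop[c][k])
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][k])
            s = f(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Hypothetical "actual" robot; the search box is centred on nominal values (0.50, 0.40, 0.0).
true_params = (0.52, 0.41, 0.01)
poses = [(0.1 * i, 0.3 + 0.05 * i) for i in range(12)]
measured = [forward_kinematics(true_params, q1, q2) for q1, q2 in poses]
bounds = [(0.45, 0.55), (0.35, 0.45), (-0.05, 0.05)]
best, err = differential_evolution(lambda p: cost(p, poses, measured), bounds)
```

The joint-2 angle varies across the poses on purpose: with a fixed elbow angle, the joint offset and the link lengths would not be separately identifiable from end-point positions.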
Abstract:
The dissertation proposes two control strategies, trajectory planning and vibration suppression, for a kinematically redundant serial-parallel robot machine, with the aim of attaining satisfactory machining performance. For a prescribed trajectory of the robot's end-effector in Cartesian space, a set of trajectories in the robot's joint space is generated based on the best stiffness performance of the robot along the prescribed trajectory. To construct the required system-wide analytical stiffness model for the serial-parallel robot machine, a variant of the virtual joint method (VJM) is proposed. The modified method is an evolution of Gosselin's lumped model that can account for the deformations of a flexible link in more directions. The effectiveness of this VJM variant is validated by comparing the computed stiffness results of a flexible link with those of a matrix structural analysis (MSA) method. The comparison shows that the numerical results from both methods on an individual flexible beam are almost identical, which, in some sense, provides mutual validation. The most prominent advantage of the presented VJM variant compared with the MSA method is that it can be applied to a flexible structure system with complicated kinematics formed by flexible serial links and joints. Moreover, by combining the VJM variant and the virtual work principle, a system-wide analytical stiffness model can easily be obtained for mechanisms with both serial and parallel kinematics. In the dissertation, a system-wide stiffness model of a kinematically redundant serial-parallel robot machine is constructed by integrating the VJM variant and the virtual work principle. Numerical results of its stiffness performance are reported.
To generate a set of feasible joint trajectories for a prescribed end-effector trajectory of a kinematically redundant robot, the dissertation takes the robot's system-wide stiffness performance as the constraint in joint trajectory planning. For a prescribed location of the end-effector, the robot permits an infinite number of inverse kinematic solutions, each with a different stiffness performance. Therefore, a differential evolution (DE) algorithm, with the positions of the redundant joints as input variables, was employed to search for the best stiffness performance of the robot. Numerical results of the generated joint trajectories are given for a kinematically redundant serial-parallel robot machine, IWR (Intersector Welding/Cutting Robot), for a particular prescribed end-effector trajectory. The numerical results show that the joint trajectories generated by stiffness optimization are feasible for realization in the control system, since they are acceptably smooth. The results imply that the stiffness performance of the robot machine varies smoothly with the kinematic configuration in the neighborhood of its best stiffness performance. To suppress the vibration of the robot machine caused by the varying cutting force during machining, this dissertation proposes a feedforward control strategy constructed from the derived inverse dynamics model of the target system. The effectiveness of such feedforward control in vibration suppression has been validated on a parallel manipulator in a software environment. An experimental study of this feedforward control is also included in the dissertation. However, modelling the actual system is difficult because some components of its dynamics are unknown.
As a solution, a back-propagation (BP) neural network is proposed for identifying the unknown components of the dynamics model of the target system. To train such a BP neural network, a modified Levenberg-Marquardt algorithm that can utilize an experimental input-output data set of the entire dynamic system is introduced. The BP neural network and the modified Levenberg-Marquardt algorithm are validated by a sinusoidal output approximation, a second-order system parameter estimation, and a friction model estimation of a parallel manipulator, which represent three different application aspects of the method.
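The modification to Levenberg-Marquardt is not described in the abstract, so the sketch below shows only the standard iteration (damped normal equations, with the damping factor decreased after a successful step and increased after a rejected one), applied to one of the validation cases named above: estimating the damping ratio and natural frequency of a second-order system. The step-response data are simulated, and the function names, initial guess and true values are hypothetical.

```python
import math

def step_response(params, t):
    """Unit-step response of an underdamped 2nd-order system; NaN outside 0 < zeta < 1."""
    zeta, wn = params
    if not 0.0 < zeta < 1.0:
        return float("nan")  # makes the cost NaN, so the LM step is rejected
    root = math.sqrt(1.0 - zeta * zeta)
    wd = wn * root
    return 1.0 - math.exp(-zeta * wn * t) * (math.cos(wd * t) + zeta / root * math.sin(wd * t))

def levenberg_marquardt(model, params, ts, ys, iterations=100, lam=1e-3, h=1e-7):
    """Standard Levenberg-Marquardt for two parameters with a forward-difference Jacobian."""
    def residuals(p):
        return [model(p, t) - y for t, y in zip(ts, ys)]

    def cost(r):
        return sum(v * v for v in r)

    p = list(params)
    r = residuals(p)
    for _ in range(iterations):
        cols = []  # Jacobian columns by forward differences
        for k in range(2):
            shifted = p[:]
            shifted[k] += h
            cols.append([(a - b) / h for a, b in zip(residuals(shifted), r)])
        a00 = sum(v * v for v in cols[0])
        a11 = sum(v * v for v in cols[1])
        a01 = sum(u * v for u, v in zip(cols[0], cols[1]))
        g0 = sum(u * v for u, v in zip(cols[0], r))
        g1 = sum(u * v for u, v in zip(cols[1], r))
        while True:
            # Solve (J^T J + lam * diag(J^T J)) delta = -J^T r for the 2x2 case.
            m00, m11 = a00 * (1.0 + lam), a11 * (1.0 + lam)
            det = m00 * m11 - a01 * a01
            d0 = (-g0 * m11 + g1 * a01) / det
            d1 = (g0 * a01 - g1 * m00) / det
            candidate = [p[0] + d0, p[1] + d1]
            r_new = residuals(candidate)
            if cost(r_new) < cost(r):
                p, r, lam = candidate, r_new, lam / 10.0  # accept: move toward Gauss-Newton
                break
            lam *= 10.0  # reject: move toward gradient descent
            if lam > 1e12:
                return p  # no further improvement possible
    return p

# Hypothetical simulated measurements of a system with zeta = 0.3, wn = 2.0 rad/s.
ts = [0.1 * i for i in range(1, 51)]
ys = [step_response((0.3, 2.0), t) for t in ts]
zeta_hat, wn_hat = levenberg_marquardt(step_response, (0.4, 2.5), ts, ys)
```

Returning NaN for an invalid damping ratio is a simple way to keep the parameters in the underdamped region: any step that leaves it fails the cost comparison and the damping factor is increased instead.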
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is deciphering the genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have given researchers the opportunity to reveal new genotype-phenotype relationships through extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Analyzing such data with methods other than univariate statistics is challenging and requires advanced algorithms that scale to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting the genetic variant subsets that are most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were shown to predict the disease phenotypes not only effectively but also efficiently, through the use of computational shortcuts. While much of the work could be run on high-end desktop computers, some of it was further extended to parallel computers, helping to ensure that the methods will also scale to NGS data sets. Furthermore, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.
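Nested cross-validation keeps model selection (here, choosing how many variants to keep) inside the training folds, so the outer test folds give an honest accuracy estimate. A minimal pure-Python sketch on hypothetical toy data, assuming a simple filter selector (absolute difference of class means) and a nearest-centroid classifier; neither is claimed to be one of the methods examined in the thesis, and folds are interleaved rather than randomly shuffled for reproducibility.

```python
from statistics import mean

def interleaved_folds(indices, k):
    """Split sample indices into k (train, test) pairs; fold i tests indices[i::k]."""
    folds = []
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        folds.append((train, test))
    return folds

def fit(X, y, rows, k):
    """Filter step (top-k features by |class-mean difference|) + nearest-centroid model."""
    n_features = len(X[0])

    def class_mean(f, label):
        return mean(X[r][f] for r in rows if y[r] == label)

    scores = [abs(class_mean(f, 1) - class_mean(f, 0)) for f in range(n_features)]
    selected = sorted(range(n_features), key=lambda f: -scores[f])[:k]
    centroids = {label: [class_mean(f, label) for f in selected] for label in (0, 1)}
    return selected, centroids

def accuracy(model, X, y, rows):
    selected, centroids = model
    correct = 0
    for r in rows:
        point = [X[r][f] for f in selected]
        dist = {label: sum((p - c) ** 2 for p, c in zip(point, cen))
                for label, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y[r]
    return correct / len(rows)

def nested_cv(X, y, candidate_ks, outer_k=5, inner_k=3):
    outer_scores, chosen_ks = [], []
    for outer_train, outer_test in interleaved_folds(list(range(len(y))), outer_k):
        # Inner loop: pick the number of features using the outer training rows only.
        best_k, best_score = None, -1.0
        for k in candidate_ks:
            score = mean(accuracy(fit(X, y, tr, k), X, y, va)
                         for tr, va in interleaved_folds(outer_train, inner_k))
            if score > best_score:
                best_k, best_score = k, score
        # Refit on all outer training rows; the outer test fold is used exactly once.
        model = fit(X, y, outer_train, best_k)
        outer_scores.append(accuracy(model, X, y, outer_test))
        chosen_ks.append(best_k)
    return outer_scores, chosen_ks

# Hypothetical toy data: feature 0 carries the class signal, features 1 and 2 are noise.
y = [i % 2 for i in range(20)]
X = [[float(y[i]), (i * 7 % 5) / 5, (i * 3 % 4) / 4] for i in range(20)]
scores, ks = nested_cv(X, y, candidate_ks=[1, 3])
```

The key design point is that feature selection runs inside `fit`, on training rows only; selecting variants on the full data set and then cross-validating only the classifier is exactly the leak that produces the overly optimistic accuracies warned about above.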