975 results for Selection Algorithms
Abstract:
Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to the increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence that the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question of whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
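For readers unfamiliar with the two statistics contrasted above, the following minimal Python sketch (not the authors' pipeline) illustrates how dN/dS and Kr/Kc can be formed once substitutions and sites have been classified; the charge-based definition of "radical" and all numbers are hypothetical.

```python
# Minimal sketch: compute dN/dS and Kr/Kc from pre-classified counts.
# Radical vs. conservative is judged here by a single crude property
# (charge class), purely for illustration.

CHARGE = {**{aa: "+" for aa in "KRH"}, **{aa: "-" for aa in "DE"}}

def is_radical(aa1: str, aa2: str) -> bool:
    """Toy classification: a replacement is 'radical' if it changes charge class."""
    return CHARGE.get(aa1, "0") != CHARGE.get(aa2, "0")

def dn_ds(n_subs, s_subs, n_sites, s_sites):
    """dN/dS from nonsynonymous/synonymous substitution and site counts."""
    dn = n_subs / n_sites   # nonsynonymous substitutions per nonsynonymous site
    ds = s_subs / s_sites   # synonymous substitutions per synonymous site
    return dn / ds

def kr_kc(replacements, r_sites, c_sites):
    """Kr/Kc from a list of observed amino acid replacements (pairs)."""
    r = sum(is_radical(a, b) for a, b in replacements)
    c = len(replacements) - r
    return (r / r_sites) / (c / c_sites)

# Hypothetical numbers, for illustration only
print(dn_ds(n_subs=120, s_subs=300, n_sites=2400, s_sites=800))   # ~0.13
print(kr_kc([("K", "E"), ("A", "V"), ("D", "N")], r_sites=900, c_sites=1500))
```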
Abstract:
The noise power spectrum (NPS) is the reference metric for understanding the noise content in computed tomography (CT) images. To evaluate the noise properties of clinical multidetector CT (MDCT) scanners, local 2D and 3D NPS were computed for different acquisition and reconstruction parameters. A 64-slice and a 128-slice MDCT scanner were employed. Measurements were performed on a water phantom in axial and helical acquisition modes. The CT dose index was identical for both installations. The influence of parameters such as the pitch, the reconstruction filter (soft, standard and bone) and the reconstruction algorithm (filtered back-projection (FBP) and adaptive statistical iterative reconstruction (ASIR)) was investigated. Images were also reconstructed in the coronal plane using a reformat process. 2D and 3D NPS were then computed. In axial acquisition mode, the 2D axial NPS showed a substantial magnitude variation as a function of the z-direction when measured at the phantom center. In helical mode, a directional dependency with a lobular shape was observed, while the magnitude of the NPS remained constant. Strong effects of the reconstruction filter, pitch and reconstruction algorithm were observed on the 3D NPS results for both MDCTs. With ASIR, a reduction of the NPS magnitude and a shift of the NPS peak toward the low-frequency range were visible. The 2D coronal NPS obtained from the reformatted images was affected by the interpolation when compared to the 2D coronal NPS obtained from 3D measurements. The noise properties of volumes measured on latest-generation MDCTs were thus studied using a local 3D NPS metric; however, the impact of noise non-stationarity may need further investigation.
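A minimal sketch of how a local 2D NPS can be estimated from noise-only regions of interest (ROIs) of a uniform water phantom, assuming NumPy; the simple mean-subtraction detrending and all parameter values are illustrative and may differ from the exact procedure used in the study.

```python
import numpy as np

def nps_2d(rois, pixel_spacing_mm):
    """Estimate a local 2D NPS from a stack of square noise-only ROIs.

    rois: array of shape (n_rois, N, N) taken from a uniform phantom.
    pixel_spacing_mm: (dx, dy) in mm.
    Returns the NPS and the frequency axis in mm^-1.
    """
    rois = np.asarray(rois, dtype=float)
    n, N, _ = rois.shape
    dx, dy = pixel_spacing_mm
    # First-order detrending: subtract each ROI's mean (simplified; a 2D
    # polynomial fit is often used to remove low-frequency non-uniformity).
    noise = rois - rois.mean(axis=(1, 2), keepdims=True)
    dft2 = np.fft.fftshift(np.fft.fft2(noise), axes=(1, 2))
    nps = (dx * dy) / (N * N) * np.mean(np.abs(dft2) ** 2, axis=0)
    freqs = np.fft.fftshift(np.fft.fftfreq(N, d=dx))
    return nps, freqs

# Usage with synthetic white noise (a flat NPS is expected):
rois = np.random.normal(0.0, 10.0, size=(64, 128, 128))
nps, f = nps_2d(rois, pixel_spacing_mm=(0.4, 0.4))
print(nps.mean())   # approximately dx*dy*sigma^2 = 0.4*0.4*100 = 16 for this synthetic noise
```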
Abstract:
quantiNemo is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNemo is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc. quantiNemo is coded in C++ using an object-oriented approach and runs on any computer platform. Availability: Executables for several platforms, user's manual, and source code are freely available under the GNU General Public License at http://www2.unil.ch/popgen/softwares/quantinemo.
Abstract:
The state of the art for describing image quality in medical imaging is to assess the performance of an observer conducting a task of clinical interest. This can be done by using a model observer, leading to a figure of merit such as the signal-to-noise ratio (SNR). Using the non-prewhitening (NPW) model observer, we objectively characterised the evolution of its figure of merit under various acquisition conditions. The NPW model observer usually requires the modulation transfer function (MTF) as well as the noise power spectrum. However, although computing the MTF poses no problem with the traditional filtered back-projection (FBP) algorithm, this is not the case with iterative reconstruction (IR) algorithms, such as adaptive statistical iterative reconstruction (ASIR) or model-based iterative reconstruction (MBIR). Given that the target transfer function (TTF) had already been shown to accurately express the system resolution even with non-linear algorithms, we decided to tune the NPW model observer by replacing the standard MTF with the TTF. The TTF was estimated using a custom-made phantom containing cylindrical inserts surrounded by water. The contrast differences between the inserts and water were plotted for each acquisition condition, and mathematical transformations were then performed to obtain the TTF. As expected, the first results showed that the TTF depends on the image contrast and noise levels for both ASIR and MBIR. Moreover, FBP also proved to be contrast- and noise-dependent when using the lung kernel. These results were then introduced into the NPW model observer. We observed an increase in SNR when switching from FBP to ASIR and from ASIR to MBIR. IR algorithms greatly improve image quality, especially in low-dose conditions. Based on our results, the use of MBIR could lead to further dose reduction in several clinical applications.
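The sketch below shows one common form of the NPW figure of merit with the TTF substituted for the MTF, discretized on a 2D frequency grid; the task spectrum, TTF shape and NPS values are made up for illustration and are not the study's data.

```python
import numpy as np

def snr_npw(task_spectrum, ttf, nps, df):
    """Non-prewhitening (NPW) observer SNR on a 2D frequency grid.

    SNR_NPW^2 = [ sum(S^2 * TTF^2) * df^2 ]^2 / [ sum(S^2 * TTF^2 * NPS) * df^2 ]
    where S is the task (signal difference) spectrum and the TTF replaces the
    MTF to account for the contrast/noise dependence of iterative reconstruction.
    """
    w = (task_spectrum ** 2) * (ttf ** 2)
    num = (np.sum(w) * df * df) ** 2
    den = np.sum(w * nps) * df * df
    return np.sqrt(num / den)

# Toy example: low-contrast task, Gaussian-like TTF, flat NPS (all hypothetical)
N, df = 256, 1.0 / 50.0                      # frequency grid step, mm^-1
fx = (np.arange(N) - N // 2) * df
FX, FY = np.meshgrid(fx, fx)
rho = np.hypot(FX, FY)
task = np.exp(-(np.pi * 4.0 * rho) ** 2)     # stand-in for a blurred low-contrast object spectrum
ttf = np.exp(-(rho / 0.6) ** 2)              # hypothetical TTF
nps = np.full((N, N), 20.0)                  # flat NPS
print(snr_npw(task, ttf, nps, df))
```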
Abstract:
The objective of this work was to validate microsatellite markers associated with resistance to soybean cyst nematode (Heterodera glycines Ichinohe) races 3 and 14 in soybean (Glycine max L.) genotypes, for use in marker-assisted selection (MAS) programs. Microsatellites of soybean linkage groups A2, D2 and G were tested in two populations, and their selection efficiencies were determined. The populations were 65 F2:3 families from the Msoy8001 (resistant) x Conquista (susceptible) cross, and 66 F2:3 families from the S5995 (resistant) x Renascença (susceptible) cross, evaluated for resistance to races 3 and 14, respectively. Families with a female index of up to 30% were considered moderately resistant. Markers of the A2 and G linkage groups were associated with resistance to race 3. Markers Satt309 and GMENOD2B explained the greatest proportion of phenotypic variance in the different groups. The combinations Satt309+GMENOD2B and Satt309+Satt187 presented 100% selection efficiency. Resistance to race 14 was associated with markers of the G linkage group, and the selection efficiency of the Satt309+Satt356 combination was 100%. The selection differential obtained by phenotypic and marker-assisted selection showed that both can result in similar gains.
Abstract:
Modeling the mechanisms that determine how humans and other agents choose among different behavioral and cognitive processes (be they strategies, routines, actions, or operators) represents a paramount theoretical stumbling block across disciplines, ranging from the cognitive and decision sciences to economics, biology, and machine learning. By using the cognitive and decision sciences as a case study, we provide an introduction to what is also known as the strategy selection problem. First, we explain why many researchers assume humans and other animals to come equipped with a repertoire of behavioral and cognitive processes. Second, we expose three descriptive, predictive, and prescriptive challenges that are common to all disciplines which aim to model the choice among these processes. Third, we give an overview of different approaches to strategy selection. These include cost-benefit, ecological, learning, memory, unified, connectionist, sequential sampling, and maximization approaches. We conclude by pointing to opportunities for future research and by stressing that the selection problem is far from being resolved.
Abstract:
The objectives of this work were to analyze theoretical genetic gains of maize due to recurrent selection among full-sib and half-sib families, obtained by Design I, Full-Sib Design and Half-Sib Design, and genotypic variability and gene loss with long-term selection. The designs were evaluated by simulation, based on average estimated gains after ten selection cycles. The simulation process was based on seven gene systems with ten genes (with distinct degrees of dominance), three population classes (with different gene frequencies), three environmental conditions (heritability values), and four selection strategies. Each combination was repeated ten times, amounting to 25,200 simulations. Full-sib selection is generally more efficient than half-sib selection, mainly with favorable dominant genes. The use of full-sib families derived by Design I is generally more efficient than using progenies obtained by Full-Sib Design. Using Design I with 50 males and 200 females (effective size of 160) did not result in improved populations with minimum genotypic variability. In the populations with lower effective size (160 and 400) the loss of favorable genes was restricted to recessive genes with reduced frequencies.
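As a toy illustration of the kind of process being simulated (much simpler than the seven gene systems, dominance levels and mating designs of the study), the sketch below runs a few cycles of full-sib family selection on a purely additive trait and tracks the favorable-allele frequency; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

N_FAM, FAM_SIZE, N_LOCI, H2 = 100, 10, 10, 0.4   # illustrative values
p = np.full(N_LOCI, 0.3)                          # favorable-allele frequencies

def make_family(p, n):
    """Full-sib family: two parents drawn from allele frequencies p, n offspring."""
    par = rng.binomial(1, p, size=(2, 2, N_LOCI))             # 2 parents x 2 haplotypes
    gametes = lambda parent: parent[rng.integers(0, 2, size=(n, N_LOCI)),
                                    np.arange(N_LOCI)]
    return gametes(par[0]) + gametes(par[1])                   # offspring genotypes 0/1/2

def one_cycle(p, top_frac=0.2):
    fams = [make_family(p, FAM_SIZE) for _ in range(N_FAM)]
    g = np.array([f.sum(axis=1) for f in fams], dtype=float)   # additive genotypic value
    var_g = g.var()
    e = rng.normal(0, np.sqrt(var_g * (1 - H2) / H2), size=g.shape)
    fam_means = (g + e).mean(axis=1)                            # phenotypic family means
    keep = np.argsort(fam_means)[-int(top_frac * N_FAM):]       # select the best families
    selected = np.concatenate([fams[i] for i in keep])
    return selected.mean(axis=0) / 2.0                          # new allele frequencies

for cycle in range(5):
    p = one_cycle(p)
    print(cycle + 1, round(p.mean(), 3))                        # gain in favorable alleles
```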
Abstract:
The objective of this work was to determine the inheritance of soybean resistance to Heterodera glycines Ichinohe (soybean cyst nematode, SCN) races 3 and 9, as well as to evaluate the efficiency of direct and indirect selection in a soybean population of 112 recombinant inbred lines (RIL) derived from the resistant cultivar Hartwig. The experiment was conducted in a completely randomized design, in Londrina, PR, Brazil. The estimated narrow-sense heritabilities for resistance to races 3 and 9 were 80.67% and 77.97%, respectively. The genetic correlation coefficient (rg = 0.17; p<0.01) shows that some genetic components of resistance to these two races are inherited together. The greatest genetic gain by indirect selection was obtained for race 9 when selecting for race 3, due to the simpler inheritance of resistance to race 9 and not because these two races share common resistance genes. The resistance of cultivar Hartwig to races 3 and 9 is determined by 4 and 2 genes, respectively. One of these genes confers resistance to both races, explaining a fraction of the significant genetic correlation found between resistance to these SCN races. The inheritance pattern described indicates that selection for resistance to SCN must be performed for each race individually.
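The quantities discussed above can be made concrete with the standard breeder's-equation forms for direct and correlated (indirect) response to selection; in the sketch below the heritabilities and genetic correlation are taken from the abstract, while the selection intensity and phenotypic standard deviation are assumed purely for illustration.

```python
import math

def direct_response(i, h2, sigma_p):
    """Expected direct response: R = i * h^2 * sigma_P."""
    return i * h2 * sigma_p

def correlated_response(i, h_x, h_y, r_g, sigma_py):
    """Expected correlated response in trait y when selecting on trait x:
    CR_y = i * h_x * h_y * r_g * sigma_P(y)."""
    return i * h_x * h_y * r_g * sigma_py

# Hypothetical illustration: selecting for race-3 resistance (x) and asking
# what is gained in race-9 resistance (y).
i = 1.76                       # selection intensity (top ~10%), assumed
h2_x, h2_y, r_g = 0.81, 0.78, 0.17
sigma_py = 12.0                # phenotypic SD of the race-9 trait, assumed
print(direct_response(i, h2_y, sigma_py))                                        # direct selection on y
print(correlated_response(i, math.sqrt(h2_x), math.sqrt(h2_y), r_g, sigma_py))   # indirect, via x
```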
Abstract:
The objective of this work was to evaluate the potential of an allohexaploid pearl millet x elephantgrass (HGL) population for a recurrent selection program through open-pollinated progenies. Seventy-eight progenies, one representative sample of the population, and two commercial cultivars, Pioneiro and Paraíso, were evaluated in a 9x9 triple lattice design at two sites. Plant height and dry matter yield were evaluated in three and four cuts, respectively. For plant height, the 17 best progenies were similar to both commercial controls, while for dry matter yield they were higher than 'Paraíso' and lower than 'Pioneiro'. The correlations between progenies across cuts indicated that the fourth cut represents the mean of all cuts and suggested the possibility of early selection. Heritability estimates considering cuts and sites were 56.9% for plant height and 58.8% for dry matter yield, and the expected response to selection was 23.4% for dry matter yield and 18.1% for plant height. These results demonstrate the promising potential of the HGL population for a recurrent selection program.
Abstract:
Recently, several anonymization algorithms have appeared for privacy preservation on graphs. Some of them are based on randomization techniques and some on k-anonymity concepts. Both can be used to obtain an anonymized graph with a given k-anonymity value. In this paper we compare algorithms based on both techniques in order to obtain an anonymized graph with a desired k-anonymity value. We aim to analyze the complexity of these methods for generating anonymized graphs and the quality of the resulting graphs.
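As a rough illustration of the two ingredients being compared, the sketch below checks one common graph k-anonymity notion (k-degree anonymity) and applies a simple randomization step based on edge deletion and addition; it is a toy example, not any of the algorithms evaluated in the paper.

```python
from collections import Counter
import random

def k_degree_anonymity(adj):
    """Smallest class size in the degree sequence: the graph is k-degree
    anonymous if every degree value is shared by at least k vertices."""
    degrees = Counter(len(neigh) for neigh in adj.values())
    return min(degrees.values())

def random_edge_perturbation(adj, n_swaps, rng=random.Random(0)):
    """Randomization-based perturbation: repeatedly delete one existing edge
    and add one non-existing edge (a simple random add/delete scheme)."""
    nodes = list(adj)
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    for _ in range(n_swaps):
        u, v = rng.choice(sorted(edges))
        a, b = rng.sample(nodes, 2)
        if b in adj[a] or (min(a, b), max(a, b)) == (u, v):
            continue                          # skip invalid proposal
        edges.discard((u, v)); adj[u].discard(v); adj[v].discard(u)
        edges.add((min(a, b), max(a, b))); adj[a].add(b); adj[b].add(a)
    return adj

# Toy graph as adjacency sets
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(k_degree_anonymity(adj))                 # 1: degrees are 2,2,3,1, so two degrees are unique
print(k_degree_anonymity(random_edge_perturbation(adj, n_swaps=2)))
```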
Abstract:
Positive selection is widely estimated from protein-coding sequence alignments by the nonsynonymous-to-synonymous ratio omega. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of the estimation of the omega ratio, more efforts are needed to assess this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model. We investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence of underestimation of the synonymous substitution rate (dS) for values as small as 0.5, with a slight increase in false positives for the branch-site test. When dS increases further, the underestimation of dS is worse, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum at intermediate values of dS. Thus, high dS is more of a concern for loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We showed that there is no significant difference in false positives between high-GC (up to ~80%) and low-GC (~30%) genes. Moreover, neither shifts of GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site test is very conservative.
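The branch-site test referred to above is a likelihood-ratio test between a null model (foreground omega fixed at 1) and an alternative model. Given the two log-likelihoods, the decision step can be sketched as follows, using the chi-square reference with one degree of freedom that is commonly recommended as a slightly conservative approximation to the asymptotic 50:50 mixture null; the log-likelihood values are hypothetical.

```python
import math

def branch_site_lrt(lnL_alt, lnL_null):
    """Likelihood-ratio test for the branch-site model.

    The statistic 2*(lnL_alt - lnL_null) is compared with a chi-square
    distribution with 1 degree of freedom, the slightly conservative
    reference often used in place of the exact 50:50 mixture of a point
    mass at 0 and chi-square with 1 df.
    """
    lrt = max(2.0 * (lnL_alt - lnL_null), 0.0)
    p_value = math.erfc(math.sqrt(lrt / 2.0))   # upper tail of chi-square, df = 1
    return lrt, p_value

# Hypothetical log-likelihoods for the alternative and null models
print(branch_site_lrt(-20341.7, -20344.9))      # LRT = 6.4, p ~ 0.011
```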
Abstract:
This thesis is devoted to the analysis, modeling and visualization of spatially referenced environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence concerned in particular with the development of techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted to environmental data and to spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can solve classification, regression and probability density modeling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to the geographical coordinates. Moreover, they are well suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modeling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and popular machine learning algorithms are presented theoretically and implemented as software tools for the environmental sciences. The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; general regression neural networks (GRNN); probabilistic neural networks (PNN); self-organizing maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers tasks as varied as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, based on experimental variography, and according to machine learning principles. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and allows the detection of spatial patterns describable by two-point statistics. The machine learning approach to ESDA is presented through the application of the k-nearest neighbors method, which is very simple and has excellent interpretation and visualization properties. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. General regression neural networks are proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperformed all other methods, particularly under emergency conditions. The thesis is composed of four chapters: theory, applications, software tools and guided examples. An important part of the work is a collection of software tools, Machine Learning Office. This software collection has been developed over the last 15 years and has been used to teach many courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a broad spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualization were also developed, with care taken to provide a user-friendly and easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies
Abstract: The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It mainly concerns the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In a few words, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to implementation as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from a theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: the multilayer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is the initial and a very important part of data analysis.
In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, namely experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations which helps to detect the presence of spatial patterns, at least those described by two-point statistics. A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a topical subject: the automatic mapping of geospatial data. General regression neural networks (GRNN) are proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where it significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters with the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and have been used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydrogeological units, decision-oriented mapping with uncertainties, and natural hazard (landslide, avalanche) assessment and susceptibility mapping. Complementary tools for exploratory data analysis and visualisation were developed as well. The software is user-friendly and easy to use.
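Since the GRNN plays a central role in the automatic-mapping results described above, a minimal sketch may help: a GRNN is essentially Nadaraya-Watson kernel regression with a Gaussian kernel and a single smoothing parameter sigma. The sketch below is illustrative only and is not the Machine Learning Office implementation; the data are synthetic.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """General Regression Neural Network (Nadaraya-Watson) prediction.

    Each prediction is a kernel-weighted average of the training targets:
        y(x) = sum_i y_i * exp(-||x - x_i||^2 / (2 sigma^2)) / sum_i exp(-||x - x_i||^2 / (2 sigma^2))
    sigma is the only free parameter and is usually tuned by cross-validation.
    """
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Toy spatial example: noisy samples of a smooth surface on the unit square
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + np.cos(2 * X[:, 1]) + rng.normal(0, 0.1, 200)
Xq = np.array([[0.5, 0.5], [0.1, 0.9]])
print(grnn_predict(X, y, Xq, sigma=0.08))
```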
Abstract:
Most local agencies in Iowa currently make their pavement treatment decisions based on limited experience, due primarily to the lack of a systematic decision-making framework and a decision-aid tool. The lack of objective condition assessment data on agency pavements also contributes to this problem. This study developed a systematic pavement treatment selection framework for local agencies to assist them in selecting the most appropriate treatment and to help justify their maintenance and rehabilitation decisions. The framework is based on an extensive literature review of various pavement treatment techniques in terms of their technical applicability and limitations, meaningful practices of neighboring states, and the results of a survey of local agencies. The treatment selection framework involves three steps: pavement condition assessment, selection of technically feasible treatments using decision trees, and selection of the most appropriate treatment considering return on investment (ROI) and other non-economic factors. An Excel-based spreadsheet tool that automates the treatment selection framework was also developed, along with a standalone user guide for the tool. The Pavement Treatment Selection Tool (PTST) for Local Agencies allows users to enter the severity and extent levels of existing distresses and then recommends a set of technically feasible treatments. The tool also evaluates the ROI of each feasible treatment and, if necessary, can evaluate the non-economic value of each treatment option to help determine the most appropriate treatment for the pavement. It is expected that the framework and tool will help local agencies significantly improve their pavement asset management practices and make better and more defensible economic decisions on pavement treatment selection.
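A hypothetical sketch of the three-step logic (condition assessment, feasibility rules, ROI ranking) is shown below; the distress thresholds, treatment names, costs and life-extension figures are invented for illustration and do not reproduce the PTST decision trees.

```python
from dataclasses import dataclass

@dataclass
class Condition:
    pci: int              # pavement condition index, 0-100
    rut_depth_in: float   # average rut depth, inches
    cracking_pct: float   # percent of area with cracking

# Hypothetical decision rules standing in for the framework's decision trees
def feasible_treatments(c: Condition):
    options = []
    if c.pci >= 70 and c.cracking_pct < 10:
        options += ["crack seal", "fog seal"]
    if 50 <= c.pci < 85 and c.rut_depth_in < 0.5:
        options += ["chip seal", "thin overlay"]
    if c.pci < 50 or c.rut_depth_in >= 0.5:
        options += ["mill and overlay", "full-depth reclamation"]
    return options

# Invented cost and life-extension figures, used only to illustrate ROI ranking
COST_PER_SQYD = {"crack seal": 0.6, "fog seal": 1.1, "chip seal": 3.5,
                 "thin overlay": 9.0, "mill and overlay": 18.0,
                 "full-depth reclamation": 30.0}
LIFE_EXT_YRS = {"crack seal": 2, "fog seal": 3, "chip seal": 6,
                "thin overlay": 9, "mill and overlay": 14,
                "full-depth reclamation": 25}

def rank_by_roi(treatments):
    """Simple ROI proxy: years of extended life per dollar per square yard."""
    return sorted(treatments, key=lambda t: LIFE_EXT_YRS[t] / COST_PER_SQYD[t],
                  reverse=True)

print(rank_by_roi(feasible_treatments(Condition(pci=62, rut_depth_in=0.3,
                                                cracking_pct=12.0))))
```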
Abstract:
The objective of this work was to identify the best strategies for selecting the most promising parental combinations to obtain lines with good resistance to Asian soybean rust (Phakopsora pachyrhizi). Two experiments were carried out in the field during the 2006/2007 and 2007/2008 growing seasons, to determine the percentage of infected leaf area of individual plants of five parents and their segregating F2 and F3 populations. The data obtained indicate that additive genetic variance predominates in the control of soybean resistance to Asian rust, and that the year and time of assessment do not significantly influence the estimates of the genetic parameters obtained. The narrow-sense heritability (h²r) ranged from 23.12 to 55.83%, indicating the possibility of successful selection of resistant individuals in the early generations of the breeding program. All the procedures used to select the most promising populations for generating superior inbred lines resistant to P. pachyrhizi gave similar results and identified the BR01-18437 x BRS 232 population as the best for inbred line selection.