976 results for Structured methods
Abstract:
Nowadays, genome-wide association studies (GWAS) and genomic selection (GS) methods, which use genome-wide marker data for phenotype prediction, are of great potential interest in plant breeding. However, to our knowledge, no studies have yet been performed on the predictive ability of these methods for structured traits when using training populations with high levels of genetic diversity. Grapevine is one such highly heterozygous, perennial species. The present study compares the accuracy of models based on GWAS or GS alone, or in combination, for predicting simple or complex traits, linked or not with population structure. In order to explore the relevance of these methods in this context, we performed simulations using approximately 90,000 SNPs on a population of 3,000 individuals structured into three groups and corresponding to published grapevine diversity data. To estimate the parameters of the prediction models, we defined four training populations of 1,000 individuals, corresponding to these three groups and a core collection. Finally, to estimate the accuracy of the models, we also simulated four breeding populations of 200 individuals. Although prediction accuracy was low when breeding populations were too distant from the training populations, high accuracy levels were obtained when using the core collection alone as the training population. The highest prediction accuracy (up to 0.9) was obtained using the combined GWAS-GS model. We thus recommend using the combined prediction model and a core collection as the training population for grapevine breeding, or for other economically important crops with the same characteristics.
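As an illustration of how a combined GWAS-GS prediction model can be assembled, the sketch below first screens markers by single-SNP association (the GWAS step) and then fits ridge regression on all markers (a stand-in for GBLUP-style genomic selection), combining the two for prediction. It is a minimal, hypothetical Python/scikit-learn sketch on toy simulated data, not the authors' pipeline; the array names, sizes and the 50-SNP cutoff are assumptions.

    # Hedged sketch: combined GWAS + genomic selection (GS) prediction on toy data.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p = 1000, 5000                      # toy sizes, far smaller than 3,000 x 90,000
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
    beta = np.zeros(p); beta[rng.choice(p, 20, replace=False)] = rng.normal(0, 1, 20)
    y = X @ beta + rng.normal(0, 2, n)     # simulated trait with 20 causal SNPs

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # GWAS step: single-marker regression, keep the most significant SNPs.
    pvals = np.array([stats.linregress(X_tr[:, j], y_tr).pvalue for j in range(p)])
    top = np.argsort(pvals)[:50]           # assumed cutoff of 50 SNPs

    # GS step: ridge regression on all markers (GBLUP-like shrinkage).
    gs = Ridge(alpha=1.0).fit(X_tr, y_tr)

    # Combined model: GWAS hits as explicit features alongside the GS prediction.
    Z_tr = np.column_stack([X_tr[:, top], gs.predict(X_tr)])
    Z_te = np.column_stack([X_te[:, top], gs.predict(X_te)])
    combined = Ridge(alpha=1.0).fit(Z_tr, y_tr)

    # Prediction accuracy as in the abstract: correlation between predicted and true values.
    acc = np.corrcoef(combined.predict(Z_te), y_te)[0, 1]
    print(f"predictive accuracy (r): {acc:.2f}")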
Abstract:
Many traits and/or strategies expressed by organisms are quantitative phenotypes. Because populations are of finite size and genomes are subject to mutations, these continuously varying phenotypes are under the joint pressure of mutation, natural selection and random genetic drift. This article derives the stationary distribution for such a phenotype under mutation-selection-drift balance in a class-structured population, allowing for demographically varying class sizes and/or changing environmental conditions. The salient feature of the stationary distribution is that it can be entirely characterized in terms of the average size of the gene pool and Hamilton's inclusive fitness effect. The exploration of the phenotypic space varies exponentially with the cumulative inclusive fitness effect over state space, which determines an adaptive landscape. The peaks of the landscape are those phenotypes that are candidate evolutionarily stable strategies and can be determined by standard phenotypic selection gradient methods (e.g. evolutionary game theory, kin selection theory, adaptive dynamics). The curvature of the stationary distribution provides a measure of the convergence stability of candidate evolutionarily stable strategies, and it is evaluated explicitly for two biological scenarios: first, a coordination game, which illustrates that, for a multipeaked adaptive landscape, stochastically stable strategies can be singled out by letting the size of the gene pool grow large; second, a sex-allocation game for diploids and haplo-diploids, which suggests that the equilibrium sex ratio follows a Beta distribution with parameters depending on the features of the genetic system.
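The exponential dependence described above can be written schematically as a math block; this is a hedged, generic reconstruction of the usual form of mutation-selection-drift stationary distributions, not the paper's exact expression, with N denoting the average size of the gene pool, S(x) the inclusive fitness effect at phenotype x, and c an unspecified constant depending on the genetic system:

    \pi(z) \;\propto\; \exp\!\Big( c\, N \int_{z_0}^{z} S(x)\,\mathrm{d}x \Big)

The cumulative integral of S defines the adaptive landscape, and the peaks of \pi(z) coincide with the candidate evolutionarily stable strategies mentioned above.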
Abstract:
BACKGROUND: We aimed to assess the value of a structured clinical assessment and genetic testing for refining the diagnosis of abacavir hypersensitivity reactions (ABC-HSRs) in a routine clinical setting. METHODS: We performed a diagnostic reassessment using a structured patient chart review in individuals who had stopped ABC because of a suspected HSR. Two HIV physicians blinded to the human leukocyte antigen (HLA) typing results independently classified these individuals on a scale from 3 (ABC-HSR highly likely) to -3 (ABC-HSR highly unlikely). Scoring was based on symptoms, onset of symptoms and comedication use. Patients were classified as clinically likely (mean score ≥ 2), uncertain (mean score ≥ -1 and ≤ 1) or unlikely (mean score ≤ -2). HLA typing was performed using sequence-based methods. RESULTS: Of the 131 reassessed individuals, 27 (21%) were classified as likely, 43 (33%) as unlikely and 61 (47%) as uncertain ABC-HSR. Of the 131 individuals with suspected ABC-HSR, 31% were HLA-B*5701-positive, compared with 1% of 140 ABC-tolerant controls (P < 0.001). The HLA-B*5701 carriage rate was higher in individuals with likely ABC-HSR than in those with uncertain or unlikely ABC-HSR (78%, 30% and 5%, respectively; P < 0.001). Only six (7%) HLA-B*5701-negative individuals were classified as likely HSR after reassessment. CONCLUSIONS: HLA-B*5701 carriage is highly predictive of clinically diagnosed ABC-HSR. The high proportion of HLA-B*5701-negative individuals with minor symptoms among individuals with suspected HSR indicates overdiagnosis of ABC-HSR in the era preceding genetic screening. A structured clinical assessment and genetic testing could reduce the rate of inappropriate ABC discontinuation and identify individuals at high risk of ABC-HSR.
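For clarity, the chart-review classification rule can be expressed in a few lines of code. This is a trivial, hypothetical Python sketch of the scoring thresholds stated above (the mean of the two physicians' scores mapped to likely/uncertain/unlikely); it is not the study's software, and scores falling in the gaps between the stated thresholds are returned as unclassified.

    # Hedged sketch of the ABC-HSR chart-review classification described above.
    def classify_abc_hsr(score_a: float, score_b: float) -> str:
        """Each physician scores from -3 (HSR highly unlikely) to +3 (HSR highly likely)."""
        mean = (score_a + score_b) / 2.0
        if mean >= 2:
            return "likely"
        if -1 <= mean <= 1:
            return "uncertain"
        if mean <= -2:
            return "unlikely"
        return "unclassified"  # means between 1 and 2, or between -2 and -1

    print(classify_abc_hsr(3, 2))   # -> likely
    print(classify_abc_hsr(0, -1))  # -> uncertain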
Abstract:
Summary: Following recent technological advances, digital image archives have experienced unprecedented qualitative and quantitative growth. Despite the enormous possibilities they offer, these advances raise new questions about how to process the masses of data acquired. This question is at the heart of this Thesis: the problems of processing digital information at very high spatial and/or spectral resolution are addressed using statistical learning approaches, namely kernel methods. This Thesis studies image classification problems, that is, the categorization of pixels into a reduced number of classes reflecting the spectral and contextual properties of the objects they represent. The emphasis is placed on the efficiency of the algorithms and on their simplicity, so as to increase their potential for adoption by users. Moreover, the challenge of this Thesis is to remain close to the concrete problems of satellite image users without losing sight of the interest of the proposed methods for the machine learning community from which they originate. In this sense, this work is deliberately transdisciplinary, maintaining a strong link between the two fields in all the developments proposed. Four models are proposed: the first addresses the problem of high dimensionality and data redundancy with a model that optimizes classification performance by adapting to the particularities of the image. This is made possible by a ranking system for the variables (the bands) that is optimized jointly with the base model: in doing so, only the variables important for solving the problem are used by the classifier. The scarcity of labeled information and the uncertainty about its relevance to the problem motivate the next two models, based respectively on active learning and semi-supervised methods: the former improves the quality of a training set through direct interaction between the user and the machine, while the latter uses unlabeled pixels to improve the description of the available data and the robustness of the model. Finally, the last proposed model considers the more theoretical question of structure among the outputs: integrating this source of information, never before considered in remote sensing, opens new research challenges.
Advanced kernel methods for remote sensing image classification. Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009.
Abstract: The technical developments of recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and processing. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The emphasis is put on algorithmic efficiency and on the simplicity of the proposed approaches, to avoid overly complex models that would not be adopted by users. The major challenge of the Thesis is to remain close to concrete remote sensing problems without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities, and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features is proposed to solve the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the individual features. The scarcity and unreliability of labeled information are the common root of the second and third models proposed: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase the robustness and quality of the data description. Both solutions have been explored, resulting in two methodological contributions based respectively on active learning and semi-supervised learning. Finally, the more theoretical issue of structured outputs is considered in the last model, which, by integrating output similarity into the model, opens new challenges and opportunities for remote sensing image processing.
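As a concrete illustration of the active learning idea in the second model (iteratively building the labeled set by querying the user on uncertain pixels), here is a minimal, hypothetical margin-sampling sketch with an SVM in scikit-learn; it is not the thesis code, and the synthetic dataset and query budget are assumptions.

    # Hedged sketch: margin-sampling active learning for pixel classification.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=2000, n_features=10, n_classes=3,
                               n_informative=6, random_state=0)   # stand-in for pixel features
    labeled = list(rng.choice(len(X), 20, replace=False))          # small initial training set
    pool = [i for i in range(len(X)) if i not in labeled]

    for _ in range(10):                                            # 10 query rounds (assumed budget)
        clf = SVC(kernel="rbf", probability=True).fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])
        sorted_p = np.sort(proba, axis=1)
        margin = sorted_p[:, -1] - sorted_p[:, -2]                 # small margin = uncertain pixel
        query = pool[int(np.argmin(margin))]
        labeled.append(query)                                      # the user would label this pixel
        pool.remove(query)

    print("final training-set size:", len(labeled))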
Abstract:
Introduction: Our institution (a university hospital) encourages physical activity for health through various popular sporting events in the city of Lausanne, the biggest of which is a road race of 2, 4, 10 and 20 km. Objective: To create an efficient and sustainable training program in preparation for the race for a group of motivated hospital employees without any prior experience of structured training, and to identify the benefits and limitations encountered. Methods: Subjects of various fitness levels were recruited by advertisement and agreed to undergo laboratory and field testing before a 12-week, three-sessions-per-week running program based on maximal aerobic speed (MAS, 30/30 s intervals), running technique exercises and endurance training. The interval session was the only supervised one. The goal was the 10 km race for 11 subjects and the 20 km race for 6 subjects. Results: A group of 17 subjects (7 male and 10 female), mean age 36.6±7.3 years, VO2max 44.0±5.5 ml/kg/min, field-test interval MAS 15.1±2.4 km/h, started the program. Two were lost to injury (while skiing). Adherence to the interval sessions was excellent, although three weekly training sessions proved difficult for most subjects. Race performance was satisfying for all of them: 6 of 7 subjects improved their running time from the previous year; of those participating for the first time, 7 of 8 completed the race satisfyingly and one did not finish because of sinusitis. A repeat MAS field test was available for 6 subjects, who improved by 5.9% (p<0.01). Subjectively, all participants were very satisfied with their improvement, the interaction with colleagues from various professions, and their sense of achievement and confidence. Conclusions: Implementing a structured training program for recreational or non-athletes can be very successful in creating better self-confidence, a better working environment inside a hospital facility and, obviously, improved physical fitness and athletic performance. Above all, it can only encourage health institutions to promote the health of their own employees through physical activity, which allows people to connect through sports. As a result, subjects in this study tend to encourage other employees to be more active and are eager for more advice and a continued offer of physical activities, benefiting both themselves and the institution through better efficiency at work and the lower absenteeism common to more active people.
Abstract:
We propose a novel formulation to solve the problem of intra-voxel reconstruction of the fibre orientation distribution function (FOD) in each voxel of the white matter of the brain from diffusion MRI data. The majority of the state-of-the-art methods in the field perform the reconstruction on a voxel-by-voxel level, promoting sparsity of the orientation distribution. Recent methods have proposed a global denoising of the diffusion data using spatial information prior to reconstruction, while others promote spatial regularisation through an additional empirical prior on the diffusion image at each q-space point. Our approach reconciles voxelwise sparsity and spatial regularisation and defines a spatially structured FOD sparsity prior, where the structure originates from the spatial coherence of the fibre orientation between neighbour voxels. The method is shown, through both simulated and real data, to enable accurate FOD reconstruction from a much lower number of q-space samples than the state of the art, typically 15 samples, even for quite adverse noise conditions.
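A hedged sketch of the kind of optimization problem the above describes (not necessarily the authors' exact functional): per voxel v, the FOD coefficients x_v are recovered from the q-space measurements y_v under a sparsity prior whose weights are informed by neighbouring voxels,

    \hat{x} \;=\; \arg\min_{x \ge 0} \; \sum_{v} \Big( \tfrac{1}{2}\,\| \Phi x_v - y_v \|_2^2 \;+\; \lambda \sum_{d} w_{v,d}\, |x_{v,d}| \Big)

where Φ maps FOD coefficients to diffusion signals and the weights w_{v,d} are reduced for orientations d that are well supported in the spatial neighbourhood of v, which is one way of encoding the spatial coherence of fibre orientations mentioned above.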
Abstract:
Objective: To evaluate the knowledge of diagnostic imaging methods among primary care and medical emergency physicians. Materials and Methods: Study developed with 119 primary care and medical emergency physicians in Montes Claros, MG, Brazil, by means of a structured questionnaire about general knowledge and indications of imaging methods in common clinical settings. A rate of correct responses ≥ 80% was considered satisfactory. Poisson regression was used in the data analysis to estimate prevalence ratios (PR). Results: Among the 81 individuals who responded to the questionnaire, 65% (n = 53) demonstrated satisfactory general knowledge and 44% (n = 36) gave correct responses regarding indications of imaging methods. Respectively, 65% (n = 53) and 51% (n = 41) of the respondents consider that radiography and computed tomography do not use ionizing radiation. The prevalence of satisfactory general knowledge about imaging methods was associated with medical residency in the respondent's field of work (PR = 4.55; 95% CI: 1.18-16.67; p = 0.03), while the prevalence of correct responses regarding indications of imaging methods was associated with professional practice in primary health care (PR = 1.79; 95% CI: 1.16-2.70; p = 0.01). Conclusion: Major deficiencies were observed in physicians' knowledge of imaging methods, with better results obtained by those involved in primary health care and by residents.
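Prevalence ratios of the kind reported above are commonly estimated by Poisson regression with robust standard errors. The following is a minimal, hypothetical statsmodels sketch on invented data (the variable names and the robust-variance choice are assumptions, not details taken from the study):

    # Hedged sketch: prevalence ratio (PR) via Poisson regression with robust errors.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "satisfactory": rng.integers(0, 2, 81),        # 1 = satisfactory knowledge (binary outcome)
        "residency_in_field": rng.integers(0, 2, 81),  # 1 = residency in the field of work
    })

    model = smf.glm("satisfactory ~ residency_in_field", data=df,
                    family=sm.families.Poisson()).fit(cov_type="HC0")  # robust (sandwich) variance
    pr = np.exp(model.params["residency_in_field"])
    ci_low, ci_high = np.exp(model.conf_int().loc["residency_in_field"])
    print(f"PR = {pr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")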
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, used to take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
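To make the regularized least-squares approach to preference learning concrete, here is a minimal, hypothetical sketch that fits a linear scoring function by penalized least squares on pairwise feature differences; it illustrates the general idea only and is not the thesis's algorithms (the data, pair-generation rule and regularization value are invented).

    # Hedged sketch: learning a scoring function from pairwise preferences
    # with a regularized least-squares objective on score differences.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    true_w = rng.normal(size=d)
    scores = X @ true_w

    # Preference pairs (i, j) meaning "item i is preferred to item j".
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and scores[i] > scores[j] + 0.5][:2000]

    # Design matrix of feature differences; target = +1 for each preference.
    D = np.array([X[i] - X[j] for i, j in pairs])
    t = np.ones(len(pairs))

    lam = 1.0                                    # regularization parameter
    w = np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ t)

    # Evaluate: fraction of random pairs ranked consistently with the true scores.
    test = [(i, j) for i, j in zip(rng.integers(0, n, 500), rng.integers(0, n, 500)) if i != j]
    agree = np.mean([(X[i] - X[j]) @ w > 0 if scores[i] > scores[j] else (X[i] - X[j]) @ w < 0
                     for i, j in test])
    print(f"pairwise ranking accuracy: {agree:.2f}")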
Abstract:
Investment decision-making on far-reaching innovation ideas is one of the key challenges practitioners and academics face in the field of innovation management. However, the management practices and theories strongly rely on evaluation systems that do not fit in well with this setting. These systems and practices normally cannot capture the value of future opportunities under high uncertainty because they ignore the firm’s potential for growth and flexibility. Real options theory and options-based methods have been offered as a solution to facilitate decision-making on highly uncertain investment objects. Much of the uncertainty inherent in these investment objects is attributable to unknown future events. In this setting, real options theory and methods have faced some challenges. First, the theory and its applications have largely been limited to market-priced real assets. Second, the options perspective has not proved as useful as anticipated because the tools it offers are perceived to be too complicated for managerial use. Third, there are challenges related to the type of uncertainty existing real options methods can handle: they are primarily limited to parametric uncertainty. Nevertheless, the theory is considered promising in the context of far-reaching and strategically important innovation ideas. The objective of this dissertation is to clarify the potential of options-based methodology in the identification of innovation opportunities. The constructive research approach gives new insights into the development potential of real options theory under non-parametric and close-to-radical uncertainty. The distinction between real options and strategic options is presented as an explanans for the discovered limitations of the theory. The findings offer managers a new means of assessing future innovation ideas based on the frameworks constructed during the course of the study.
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
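The leave-pair-out cross-validation idea mentioned above can be illustrated in a few lines: each positive-negative pair is held out in turn, the model is retrained on the remaining data, and the estimate is the fraction of held-out pairs ranked correctly. This is a naive, hypothetical sketch on toy data (retraining once per pair, with none of the computational shortcuts developed in the thesis).

    # Hedged sketch: leave-pair-out cross-validation estimate of AUC.
    import numpy as np
    from itertools import product
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = (X[:, 0] + 0.5 * rng.normal(size=40) > 0).astype(int)   # toy binary labels

    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]

    correct = 0
    pairs = list(product(pos, neg))
    for i, j in pairs:                                # hold out one positive and one negative
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        clf = LogisticRegression().fit(X[train], y[train])
        si = clf.decision_function(X[[i]])[0]
        sj = clf.decision_function(X[[j]])[0]
        correct += si > sj                            # the positive should score above the negative

    print(f"leave-pair-out AUC estimate: {correct / len(pairs):.2f}")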
Abstract:
Products developed by industry, institutes and research centers are expected to have a high level of quality and performance with minimum waste, which requires efficient and robust tools to numerically simulate stringent project conditions with great reliability. In this context, Computational Fluid Dynamics (CFD) plays an important role, and the present work presents two numerical algorithms used in the CFD community to solve the Euler and Navier-Stokes equations applied to typical aerospace and aeronautical problems. In particular, unstructured discretization of the spatial domain has gained special attention in the international community due to its ease in discretizing complex spatial domains. The main objective of this work is to illustrate some advantages and disadvantages of numerical algorithms using structured and unstructured spatial discretization of the flow governing equations. The numerical methods use a finite volume formulation, and the Euler and Navier-Stokes equations are applied to solve a transonic nozzle problem, a low supersonic airfoil problem and a hypersonic inlet problem. In the structured context, these problems are solved using MacCormack's implicit algorithm with the Steger and Warming flux vector splitting technique, while in the unstructured context, the Jameson and Mavriplis explicit algorithm is used. Convergence acceleration is obtained using a spatially variable time stepping procedure.
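The convergence acceleration by a spatially variable (local) time step mentioned at the end can be sketched as follows; this is a generic illustration of local time stepping for a 1-D inviscid flow under an assumed CFL condition, not the specific implementation used in the work.

    # Hedged sketch: spatially variable (local) time stepping for steady-state convergence.
    # Each cell advances with its own dt limited by a local CFL condition,
    # dt_i = CFL * dx_i / (|u_i| + a_i), where a_i is the local speed of sound.
    import numpy as np

    def local_time_steps(u, p, rho, dx, cfl=0.8, gamma=1.4):
        """Per-cell time steps for a 1-D inviscid flow (perfect gas assumed)."""
        a = np.sqrt(gamma * p / rho)          # local speed of sound
        return cfl * dx / (np.abs(u) + a)     # local convective CFL limit

    # Toy state on a nonuniform mesh.
    dx  = np.linspace(0.01, 0.05, 50)         # cell sizes [m]
    u   = np.linspace(50.0, 400.0, 50)        # velocity [m/s]
    rho = np.full(50, 1.2)                    # density [kg/m^3]
    p   = np.full(50, 101325.0)               # pressure [Pa]

    dt = local_time_steps(u, p, rho, dx)
    print(f"min dt = {dt.min():.2e} s, max dt = {dt.max():.2e} s")
    # In a steady-state march, cell i is updated with dt[i] rather than a single global dt.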
Abstract:
Phenomena in the cyber domain, especially threats to security and privacy, have proven to be an increasingly heated topic addressed by different writers and scholars at an increasing pace, both nationally and internationally. However, little public research has been done on the subject of cyber intelligence. The main research question of the thesis was: to what extent is the applicability of cyber intelligence acquisition methods circumstantial? The study was conducted in a sequential manner, starting with defining the concept of intelligence in the cyber domain and identifying its key attributes, followed by identifying the range of intelligence methods in the cyber domain, the criteria influencing their applicability, and the types of operatives utilizing cyber intelligence. The methods and criteria were refined into a hierarchical model. The existing conceptions of cyber intelligence were mapped through an extensive literature study of a wide variety of sources. The established understanding was further developed through 15 semi-structured interviews with experts of different backgrounds, whose wide range of points of view proved to substantially enhance the perspective on the subject. Four of the interviewed experts participated in a relatively extensive survey based on the constructed hierarchical model of cyber intelligence, which was formulated into an AHP (Analytic Hierarchy Process) hierarchy and executed in the Expert Choice Comparion online application. It was concluded that intelligence in the cyber domain is an endorsing, cross-cutting intelligence discipline that adds value to all aspects of conventional intelligence, that it bears a substantial number of characteristic traits, both advantageous and disadvantageous, and that the applicability of cyber intelligence methods is partly circumstantially limited.
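As background for the AHP-based survey described above, the sketch below shows how priorities are derived from a pairwise comparison matrix via its principal eigenvector, together with Saaty's consistency ratio. It is a generic, hypothetical example (the matrix values are invented), not the Expert Choice Comparion computation itself.

    # Hedged sketch: AHP priorities from a pairwise comparison matrix.
    import numpy as np

    # Invented 3x3 comparison matrix for three criteria (reciprocal by construction).
    A = np.array([[1.0, 3.0, 5.0],
                  [1/3, 1.0, 2.0],
                  [1/5, 1/2, 1.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                    # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                   # normalized priority vector

    # Saaty consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
    n = A.shape[0]
    lambda_max = eigvals.real[k]
    ci = (lambda_max - n) / (n - 1)
    ri = 0.58                                      # random index for n = 3 (Saaty's table)
    print("priorities:", np.round(w, 3), " CR =", round(ci / ri, 3))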
Abstract:
A project to identify metrics for assessing the quality of open data based on the needs of small voluntary sector organisations in the UK and India. For this project we assumed the purpose of open data metrics is to determine the value of a group of open datasets to a defined community of users. We adopted a much more user-centred approach than most open data research using small structured workshops to identify users’ key problems and then working from those problems to understand how open data can help address them and the key attributes of the data if it is to be successful. We then piloted different metrics that might be used to measure the presence of those attributes. The result was six metrics that we assessed for validity, reliability, discrimination, transferability and comparability. This user-centred approach to open data research highlighted some fundamental issues with expanding the use of open data from its enthusiast base.
Abstract:
Visual perception is improved when a wide field of view is available. This thesis focuses on visual depth perception with the help of omnidirectional cameras. In computer vision, 3D perception is generally obtained using stereo configurations, with the disadvantage of the high computational cost of searching for the visual elements common to the images. The solution offered by this thesis is the use of structured light to solve the correspondence-matching problem. A study of omnidirectional vision systems has been carried out. Several stereo configurations have been evaluated and the best one has been chosen. The parameters of the model are difficult to measure directly and, consequently, a series of calibration methods has been developed. The results obtained are promising and show that the sensor can be used in depth-perception applications such as scene modeling, pipe inspection, robot navigation, etc.
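For context, depth recovery in a calibrated structured-light setup reduces to intersecting the camera's viewing ray with a known light plane; schematically (this is the generic perspective-camera case, a simplification of the omnidirectional geometry treated in the thesis),

    \mathbf{X} = t\,\mathbf{d}, \qquad t = -\frac{c}{\mathbf{n}^{\top}\mathbf{d}}

where d is the back-projected ray direction of an illuminated pixel (camera centre at the origin) and n^T X + c = 0 is the calibrated equation of the projected light plane.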
Abstract:
Alternative meshes of the sphere and adaptive mesh refinement could be immensely beneficial for weather and climate forecasts, but it is not clear how mesh refinement should be achieved. A finite-volume model that solves the shallow-water equations on any mesh of the surface of the sphere is presented. The accuracy and cost effectiveness of four quasi-uniform meshes of the sphere are compared: a cubed sphere, reduced latitude–longitude, hexagonal–icosahedral, and triangular–icosahedral. On some standard shallow-water tests, the hexagonal–icosahedral mesh performs best and the reduced latitude–longitude mesh performs well only when the flow is aligned with the mesh. The inclusion of a refined mesh over a disc-shaped region is achieved using either gradual Delaunay, gradual Voronoi, or abrupt 2:1 block-structured refinement. These refined regions can actually degrade global accuracy, presumably because of changes in wave dispersion where the mesh is highly nonuniform. However, using gradual refinement to resolve a mountain in an otherwise coarse mesh can improve accuracy for the same cost. The model prognostic variables are height and momentum collocated at cell centers, and (to remove grid-scale oscillations of the A grid) the mass flux between cells is advanced from the old momentum using the momentum equation. Quadratic and upwind biased cubic differencing methods are used as explicit corrections to a fast implicit solution that uses linear differencing.
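For reference, the rotating shallow-water equations solved by such models can be written in height-momentum (flux) form, here as a standard textbook statement with spherical metric terms omitted, not a quotation of the model's discretized equations:

    \frac{\partial h}{\partial t} + \nabla \cdot (h\mathbf{u}) = 0, \qquad
    \frac{\partial (h\mathbf{u})}{\partial t} + \nabla \cdot (h\mathbf{u}\otimes\mathbf{u}) + f\,\hat{\mathbf{k}} \times (h\mathbf{u}) = -g\,h\,\nabla (h + h_s)

where h is the fluid depth, u the horizontal velocity, f the Coriolis parameter, g gravity and h_s the height of the underlying orography (e.g. the mountain in the refined-mesh test mentioned above).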