944 resultados para selection methods
Resumo:
Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
Resumo:
A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.
Resumo:
Locomotor tasks characterization plays an important role in trying to improve the quality of life of a growing elderly population. This paper focuses on this matter by trying to characterize the locomotion of two population groups with different functional fitness levels (high or low) while executing three different tasks-gait, stair ascent and stair descent. Features were extracted from gait data, and feature selection methods were used in order to get the set of features that allow differentiation between functional fitness level. Unsupervised learning was used to validate the sets obtained and, ultimately, indicated that it is possible to distinguish the two population groups. The sets of best discriminate features for each task are identified and thoroughly analysed. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
Resumo:
Atualmente, as estratégias que as empresas optam por seguir para a maximização de recursos materiais e humanos, podem representar a diferença entre o sucesso e o fracasso. A seleção de fornecedores é um fator bastante crítico para o desempenho da empresa compradora, sendo por vezes necessária a resolução de problemas que apresentam um elevado grau de complexidade. A escolha dos métodos a ser utilizados e a eleição dos critérios mais relevantes foi feito com base no estudo de diversos autores e nas repostas obtidas a um inquérito online difundido por uma amostra de empresas portuguesas, criado especificamente para compreender quais os fatores que mais peso tinham nas decisões de escolha de parceiros. Além disso, os resultados adquiridos desta forma foram utilizados para conceder mais precisão às ponderações efetuadas na ferramenta de seleção, na escolha dos melhores fornecedores introduzidos pelos utilizadores da mesma. Muitos estudos literários propõem o uso de métodos para simplificar a tarefa de seleção de fornecedores. Esta dissertação aplica o estudo realizado nos métodos de seleção, nomeadamente o Simple Multi-Attribute Rating Technique (SMART) e Analytic Hierarchy Process (AHP), necessários para o desenvolvimento de uma ferramenta de software online que permitia, a qualquer empresa nacional, obter uma classificação para os seus fornecedores perante um conjunto de critérios e subcritérios.
Resumo:
No ambiente altamente competitivo de hoje, um processo eficaz de seleção de fornecedores é muito importante para o sucesso de qualquer organização [33]. Esta dissertação procura determinar quais os critérios e métodos mais utilizados no problema da seleção de fornecedores, contribuindo assim para o apoio a entidades que pretendam iniciar uma seleção de fornecedores de uma forma mais eficaz. Para atingir os objetivos propostos, foi realizada uma análise de artigos que fazem a revisão literária dos métodos e critérios desde o ano de 1985 até ao ano 2012. Com os dados obtidos destas revisões, foi possível identificar quais os três principais métodos utilizados ao longo dos anos, sendo eles o DEA, AHP e Fuzzy set theory e os principais critérios utilizados na seleção de fornecedores. Nesta dissertação, é apresentada uma visão geral da tomada de decisão e os métodos utilizados na tomada de decisão multicritério. É abordado o problema da seleção de fornecedores, o seu processo de seleção e as revisões literárias dos métodos e critérios de seleção utilizados nos últimos anos. Por fim, é apresentada a contribuição para a seleção de fornecedores do estudo realizado durante o desenvolvimento desta dissertação, sendo apresentados e explicados os principais métodos de seleção de fornecedores, bem como os critérios utilizados.
Resumo:
Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Eletrónica Médica)
Resumo:
Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Eletrónica Médica)
Resumo:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.
Resumo:
We develop methods for Bayesian inference in vector error correction models which are subject to a variety of switches in regime (e.g. Markov switches in regime or structural breaks). An important aspect of our approach is that we allow both the cointegrating vectors and the number of cointegrating relationships to change when the regime changes. We show how Bayesian model averaging or model selection methods can be used to deal with the high-dimensional model space that results. Our methods are used in an empirical study of the Fisher effect.
Resumo:
We develop methods for Bayesian inference in vector error correction models which are subject to a variety of switches in regime (e.g. Markov switches in regime or structural breaks). An important aspect of our approach is that we allow both the cointegrating vectors and the number of cointegrating relationships to change when the regime changes. We show how Bayesian model averaging or model selection methods can be used to deal with the high-dimensional model space that results. Our methods are used in an empirical study of the Fisher e ffect.
Resumo:
This paper discusses the challenges faced by the empirical macroeconomist and methods for surmounting them. These challenges arise due to the fact that macroeconometric models potentially include a large number of variables and allow for time variation in parameters. These considerations lead to models which have a large number of parameters to estimate relative to the number of observations. A wide range of approaches are surveyed which aim to overcome the resulting problems. We stress the related themes of prior shrinkage, model averaging and model selection. Subsequently, we consider a particular modelling approach in detail. This involves the use of dynamic model selection methods with large TVP-VARs. A forecasting exercise involving a large US macroeconomic data set illustrates the practicality and empirical success of our approach.
Resumo:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
Resumo:
SUMMARYSpecies distribution models (SDMs) represent nowadays an essential tool in the research fields of ecology and conservation biology. By combining observations of species occurrence or abundance with information on the environmental characteristic of the observation sites, they can provide information on the ecology of species, predict their distributions across the landscape or extrapolate them to other spatial or time frames. The advent of SDMs, supported by geographic information systems (GIS), new developments in statistical models and constantly increasing computational capacities, has revolutionized the way ecologists can comprehend species distributions in their environment. SDMs have brought the tool that allows describing species realized niches across a multivariate environmental space and predict their spatial distribution. Predictions, in the form of probabilistic maps showing the potential distribution of the species, are an irreplaceable mean to inform every single unit of a territory about its biodiversity potential. SDMs and the corresponding spatial predictions can be used to plan conservation actions for particular species, to design field surveys, to assess the risks related to the spread of invasive species, to select reserve locations and design reserve networks, and ultimately, to forecast distributional changes according to scenarios of climate and/or land use change.By assessing the effect of several factors on model performance and on the accuracy of spatial predictions, this thesis aims at improving techniques and data available for distribution modelling and at providing the best possible information to conservation managers to support their decisions and action plans for the conservation of biodiversity in Switzerland and beyond. Several monitoring programs have been put in place from the national to the global scale, and different sources of data now exist and start to be available to researchers who want to model species distribution. However, because of the lack of means, data are often not gathered at an appropriate resolution, are sampled only over limited areas, are not spatially explicit or do not provide a sound biological information. A typical example of this is data on 'habitat' (sensu biota). Even though this is essential information for an effective conservation planning, it often has to be approximated from land use, the closest available information. Moreover, data are often not sampled according to an established sampling design, which can lead to biased samples and consequently to spurious modelling results. Understanding the sources of variability linked to the different phases of the modelling process and their importance is crucial in order to evaluate the final distribution maps that are to be used for conservation purposes.The research presented in this thesis was essentially conducted within the framework of the Landspot Project, a project supported by the Swiss National Science Foundation. The main goal of the project was to assess the possible contribution of pre-modelled 'habitat' units to model the distribution of animal species, in particular butterfly species, across Switzerland. While pursuing this goal, different aspects of data quality, sampling design and modelling process were addressed and improved, and implications for conservation discussed. The main 'habitat' units considered in this thesis are grassland and forest communities of natural and anthropogenic origin as defined in the typology of habitats for Switzerland. These communities are mainly defined at the phytosociological level of the alliance. For the time being, no comprehensive map of such communities is available at the national scale and at fine resolution. As a first step, it was therefore necessary to create distribution models and maps for these communities across Switzerland and thus to gather and collect the necessary data. In order to reach this first objective, several new developments were necessary such as the definition of expert models, the classification of the Swiss territory in environmental domains, the design of an environmentally stratified sampling of the target vegetation units across Switzerland, the development of a database integrating a decision-support system assisting in the classification of the relevés, and the downscaling of the land use/cover data from 100 m to 25 m resolution.The main contributions of this thesis to the discipline of species distribution modelling (SDM) are assembled in four main scientific papers. In the first, published in Journal of Riogeography different issues related to the modelling process itself are investigated. First is assessed the effect of five different stepwise selection methods on model performance, stability and parsimony, using data of the forest inventory of State of Vaud. In the same paper are also assessed: the effect of weighting absences to ensure a prevalence of 0.5 prior to model calibration; the effect of limiting absences beyond the environmental envelope defined by presences; four different methods for incorporating spatial autocorrelation; and finally, the effect of integrating predictor interactions. Results allowed to specifically enhance the GRASP tool (Generalized Regression Analysis and Spatial Predictions) that now incorporates new selection methods and the possibility of dealing with interactions among predictors as well as spatial autocorrelation. The contribution of different sources of remotely sensed information to species distribution models was also assessed. The second paper (to be submitted) explores the combined effects of sample size and data post-stratification on the accuracy of models using data on grassland distribution across Switzerland collected within the framework of the Landspot project and supplemented with other important vegetation databases. For the stratification of the data, different spatial frameworks were compared. In particular, environmental stratification by Swiss Environmental Domains was compared to geographical stratification either by biogeographic regions or political states (cantons). The third paper (to be submitted) assesses the contribution of pre- modelled vegetation communities to the modelling of fauna. It is a two-steps approach that combines the disciplines of community ecology and spatial ecology and integrates their corresponding concepts of habitat. First are modelled vegetation communities per se and then these 'habitat' units are used in order to model animal species habitat. A case study is presented with grassland communities and butterfly species. Different ways of integrating vegetation information in the models of butterfly distribution were also evaluated. Finally, a glimpse to climate change is given in the fourth paper, recently published in Ecological Modelling. This paper proposes a conceptual framework for analysing range shifts, namely a catalogue of the possible patterns of change in the distribution of a species along elevational or other environmental gradients and an improved quantitative methodology to identify and objectively describe these patterns. The methodology was developed using data from the Swiss national common breeding bird survey and the article presents results concerning the observed shifts in the elevational distribution of breeding birds in Switzerland.The overall objective of this thesis is to improve species distribution models as potential inputs for different conservation tools (e.g. red lists, ecological networks, risk assessment of the spread of invasive species, vulnerability assessment in the context of climate change). While no conservation issues or tools are directly tested in this thesis, the importance of the proposed improvements made in species distribution modelling is discussed in the context of the selection of reserve networks.RESUMELes modèles de distribution d'espèces (SDMs) représentent aujourd'hui un outil essentiel dans les domaines de recherche de l'écologie et de la biologie de la conservation. En combinant les observations de la présence des espèces ou de leur abondance avec des informations sur les caractéristiques environnementales des sites d'observation, ces modèles peuvent fournir des informations sur l'écologie des espèces, prédire leur distribution à travers le paysage ou l'extrapoler dans l'espace et le temps. Le déploiement des SDMs, soutenu par les systèmes d'information géographique (SIG), les nouveaux développements dans les modèles statistiques, ainsi que la constante augmentation des capacités de calcul, a révolutionné la façon dont les écologistes peuvent comprendre la distribution des espèces dans leur environnement. Les SDMs ont apporté l'outil qui permet de décrire la niche réalisée des espèces dans un espace environnemental multivarié et prédire leur distribution spatiale. Les prédictions, sous forme de carte probabilistes montrant la distribution potentielle de l'espèce, sont un moyen irremplaçable d'informer chaque unité du territoire de sa biodiversité potentielle. Les SDMs et les prédictions spatiales correspondantes peuvent être utilisés pour planifier des mesures de conservation pour des espèces particulières, pour concevoir des plans d'échantillonnage, pour évaluer les risques liés à la propagation d'espèces envahissantes, pour choisir l'emplacement de réserves et les mettre en réseau, et finalement, pour prévoir les changements de répartition en fonction de scénarios de changement climatique et/ou d'utilisation du sol. En évaluant l'effet de plusieurs facteurs sur la performance des modèles et sur la précision des prédictions spatiales, cette thèse vise à améliorer les techniques et les données disponibles pour la modélisation de la distribution des espèces et à fournir la meilleure information possible aux gestionnaires pour appuyer leurs décisions et leurs plans d'action pour la conservation de la biodiversité en Suisse et au-delà. Plusieurs programmes de surveillance ont été mis en place de l'échelle nationale à l'échelle globale, et différentes sources de données sont désormais disponibles pour les chercheurs qui veulent modéliser la distribution des espèces. Toutefois, en raison du manque de moyens, les données sont souvent collectées à une résolution inappropriée, sont échantillonnées sur des zones limitées, ne sont pas spatialement explicites ou ne fournissent pas une information écologique suffisante. Un exemple typique est fourni par les données sur 'l'habitat' (sensu biota). Même s'il s'agit d'une information essentielle pour des mesures de conservation efficaces, elle est souvent approximée par l'utilisation du sol, l'information qui s'en approche le plus. En outre, les données ne sont souvent pas échantillonnées selon un plan d'échantillonnage établi, ce qui biaise les échantillons et par conséquent les résultats de la modélisation. Comprendre les sources de variabilité liées aux différentes phases du processus de modélisation s'avère crucial afin d'évaluer l'utilisation des cartes de distribution prédites à des fins de conservation.La recherche présentée dans cette thèse a été essentiellement menée dans le cadre du projet Landspot, un projet soutenu par le Fond National Suisse pour la Recherche. L'objectif principal de ce projet était d'évaluer la contribution d'unités 'd'habitat' pré-modélisées pour modéliser la répartition des espèces animales, notamment de papillons, à travers la Suisse. Tout en poursuivant cet objectif, différents aspects touchant à la qualité des données, au plan d'échantillonnage et au processus de modélisation sont abordés et améliorés, et leurs implications pour la conservation des espèces discutées. Les principaux 'habitats' considérés dans cette thèse sont des communautés de prairie et de forêt d'origine naturelle et anthropique telles que définies dans la typologie des habitats de Suisse. Ces communautés sont principalement définies au niveau phytosociologique de l'alliance. Pour l'instant aucune carte de la distribution de ces communautés n'est disponible à l'échelle nationale et à résolution fine. Dans un premier temps, il a donc été nécessaire de créer des modèles de distribution de ces communautés à travers la Suisse et par conséquent de recueillir les données nécessaires. Afin d'atteindre ce premier objectif, plusieurs nouveaux développements ont été nécessaires, tels que la définition de modèles experts, la classification du territoire suisse en domaines environnementaux, la conception d'un échantillonnage environnementalement stratifié des unités de végétation cibles dans toute la Suisse, la création d'une base de données intégrant un système d'aide à la décision pour la classification des relevés, et le « downscaling » des données de couverture du sol de 100 m à 25 m de résolution. Les principales contributions de cette thèse à la discipline de la modélisation de la distribution d'espèces (SDM) sont rassemblées dans quatre articles scientifiques. Dans le premier article, publié dans le Journal of Biogeography, différentes questions liées au processus de modélisation sont étudiées en utilisant les données de l'inventaire forestier de l'Etat de Vaud. Tout d'abord sont évalués les effets de cinq méthodes de sélection pas-à-pas sur la performance, la stabilité et la parcimonie des modèles. Dans le même article sont également évalués: l'effet de la pondération des absences afin d'assurer une prévalence de 0.5 lors de la calibration du modèle; l'effet de limiter les absences au-delà de l'enveloppe définie par les présences; quatre méthodes différentes pour l'intégration de l'autocorrélation spatiale; et enfin, l'effet de l'intégration d'interactions entre facteurs. Les résultats présentés dans cet article ont permis d'améliorer l'outil GRASP qui intègre désonnais de nouvelles méthodes de sélection et la possibilité de traiter les interactions entre variables explicatives, ainsi que l'autocorrélation spatiale. La contribution de différentes sources de données issues de la télédétection a également été évaluée. Le deuxième article (en voie de soumission) explore les effets combinés de la taille de l'échantillon et de la post-stratification sur le la précision des modèles. Les données utilisées ici sont celles concernant la répartition des prairies de Suisse recueillies dans le cadre du projet Landspot et complétées par d'autres sources. Pour la stratification des données, différents cadres spatiaux ont été comparés. En particulier, la stratification environnementale par les domaines environnementaux de Suisse a été comparée à la stratification géographique par les régions biogéographiques ou par les cantons. Le troisième article (en voie de soumission) évalue la contribution de communautés végétales pré-modélisées à la modélisation de la faune. C'est une approche en deux étapes qui combine les disciplines de l'écologie des communautés et de l'écologie spatiale en intégrant leurs concepts de 'habitat' respectifs. Les communautés végétales sont modélisées d'abord, puis ces unités de 'habitat' sont utilisées pour modéliser les espèces animales. Une étude de cas est présentée avec des communautés prairiales et des espèces de papillons. Différentes façons d'intégrer l'information sur la végétation dans les modèles de répartition des papillons sont évaluées. Enfin, un clin d'oeil aux changements climatiques dans le dernier article, publié dans Ecological Modelling. Cet article propose un cadre conceptuel pour l'analyse des changements dans la distribution des espèces qui comprend notamment un catalogue des différentes formes possibles de changement le long d'un gradient d'élévation ou autre gradient environnemental, et une méthode quantitative améliorée pour identifier et décrire ces déplacements. Cette méthodologie a été développée en utilisant des données issues du monitoring des oiseaux nicheurs répandus et l'article présente les résultats concernant les déplacements observés dans la distribution altitudinale des oiseaux nicheurs en Suisse.L'objectif général de cette thèse est d'améliorer les modèles de distribution des espèces en tant que source d'information possible pour les différents outils de conservation (par exemple, listes rouges, réseaux écologiques, évaluation des risques de propagation d'espèces envahissantes, évaluation de la vulnérabilité des espèces dans le contexte de changement climatique). Bien que ces questions de conservation ne soient pas directement testées dans cette thèse, l'importance des améliorations proposées pour la modélisation de la distribution des espèces est discutée à la fin de ce travail dans le contexte de la sélection de réseaux de réserves.
Resumo:
Details of the timescales and processes involved in each of the recruitment/selection methods
Resumo:
A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can accommodate to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy tailed distributions. Implementation is straightforward and R programs are available.