880 results for Feature selection process


Relevance:

90.00% 90.00%

Publisher:

Abstract:

Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease in which the heart muscle is partially thickened and blood flow is obstructed, potentially fatally. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests because of the cost and time involved in having an expert cardiologist interpret the results of these tests. Initially, we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past to classify individual heartbeats into different types of arrhythmia, as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients to one of two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature set. We address these by proposing imputation and feature-selection methods. We develop heartbeat classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. 
The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
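
The final step described above, labeling a full recording from the share of its heartbeats classified as HCM, can be sketched as follows. This is only an illustration: the function name `classify_ecg`, the 0/1 label encoding and the 0.5 cut-off are assumptions, since the abstract does not state how the proportion is thresholded.

```python
from typing import Sequence


def classify_ecg(beat_labels: Sequence[int], threshold: float = 0.5) -> str:
    """Label a full recording from per-heartbeat predictions.

    beat_labels: 1 for a heartbeat classified as HCM, 0 for non-HCM.
    threshold: assumed cut-off on the HCM proportion (not from the source).
    """
    if not beat_labels:
        raise ValueError("at least one classified heartbeat is required")
    hcm_fraction = sum(beat_labels) / len(beat_labels)
    return "HCM" if hcm_fraction >= threshold else "non-HCM"


# Hypothetical per-beat outputs of a heartbeat classifier for two recordings.
print(classify_ecg([1, 1, 0, 1, 0, 1]))
print(classify_ecg([0, 0, 1, 0]))
```

In practice the cut-off would presumably be tuned on validation data rather than fixed in advance.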

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Selecting suppliers/partners is a crucial part of decision making for companies that intend to compete in their area of activity. It is a time- and resource-consuming task that involves data collection and careful analysis of the factors that can positively or negatively influence the choice. Nevertheless, it is a critical process that significantly affects the operational performance of each company. In this work, five broad supplier selection criteria were identified through a literature review: Quality, Financial, Synergies, Cost, and Production System. Five sub-criteria were also included within these criteria. A survey was then prepared, and companies were asked which factors are most relevant to their supplier selection decisions. After the results were interpreted and the data processed, a linear weighting model was adopted to reflect the importance of each factor. The model has a hierarchical structure and can be applied with the Analytic Hierarchy Process (AHP) method or the Simple Multi-Attribute Rating Technique (SMART). The result of the research is a reference model that supports decision making in the supplier/partner selection process.
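
A linear weighting model of the kind mentioned above scores each supplier as a weighted sum of its ratings on the criteria. The sketch below uses the five top-level criteria named in the abstract; the weights, the ratings and the supplier names are invented for illustration and are not the survey results.

```python
CRITERIA = ("Quality", "Financial", "Synergies", "Cost", "Production System")


def weighted_score(ratings: dict, weights: dict) -> float:
    """Weighted average of a supplier's ratings over the five criteria."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * ratings[c] for c in CRITERIA) / total_weight


# Illustrative weights (assumed, not taken from the survey).
weights = {"Quality": 0.30, "Financial": 0.15, "Synergies": 0.10,
           "Cost": 0.30, "Production System": 0.15}

# Hypothetical ratings on a 0-10 scale.
suppliers = {
    "Supplier A": {"Quality": 8, "Financial": 6, "Synergies": 5,
                   "Cost": 7, "Production System": 6},
    "Supplier B": {"Quality": 6, "Financial": 8, "Synergies": 7,
                   "Cost": 9, "Production System": 5},
}

# Rank suppliers from highest to lowest weighted score.
ranking = sorted(suppliers,
                 key=lambda name: weighted_score(suppliers[name], weights),
                 reverse=True)
print(ranking)
```

With a hierarchical model, the same weighted sum would be applied first to the sub-criteria within each criterion and then to the criteria themselves.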

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Part 17: Risk Analysis

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This paper proposes a process for the classification of new residential electricity customers. The current state of the art is extended by using a combination of smart metering and survey data and by using model-based feature selection for the classification task. Firstly, the normalized representative consumption profiles of the population are derived through the clustering of data from households. Secondly, new customers are classified using survey data and a limited amount of smart metering data. Thirdly, regression analysis and model-based feature selection results explain the importance of the variables and which of them drive the different consumption profiles, enabling the extraction of appropriate models. The results of a case study show that the use of survey data significantly increases the accuracy of the classification task (up to 20%). Considering four consumption groups, more than half of the customers are correctly classified with only one week of metering data; with more weeks, the accuracy is significantly improved. The use of model-based feature selection resulted in a significantly lower number of features, allowing an easy interpretation of the derived models.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

INTRODUCTION: Many studies have investigated the association of the VNTR (variable number of tandem repeats) polymorphism located in the promoter region of the monoamine oxidase A (MAOA) enzyme gene with changes in human behavior and with several psychiatric disorders. OBJECTIVE: The aim of this work was to review the literature on the role of this functional polymorphism in modulating human behavior toward the development of psychiatric disorders. METHOD: The search covered the English-language literature from January 1998 to June 2009 available in Medline, Embase, Web of Science and the PsycInfo database, using the following terms: "MAOA and human behavior" and "MAOA and psychiatry". RESULTS: A total of 3,873 studies were found. Of these, 109 were selected and included in the review. Low-activity VNTR alleles were associated with antisocial personality disorder, conduct disorder, attention deficit hyperactivity disorder, pathological gambling and substance dependence. High-activity MAOA alleles were associated with depression, anxiety, neuroticism and anorexia nervosa. No association was found between MAOA polymorphisms and schizophrenia or bipolar disorder. CONCLUSION: The main findings support a role for the VNTR polymorphism of the MAOA gene promoter region in some psychiatric disorders, despite the discrepancies found, which stem from the methodological difficulties of genetic studies. In general, the studies associate low-activity MAOA alleles with impulsive and aggressive behaviors ("hyperactive behaviors"), whereas high-activity alleles of the gene are more often associated with "hypoactive behaviors".

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Thanks to recent advances in molecular biology, allied to an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for the validation of gene network modeling and identification, which comprises the following three main aspects: (1) Artificial Gene Network (AGN) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). 
The experimental results indicate that the inference method was sensitive to variation in the average degree k, with its network recovery rate decreasing as k increased. The signal size was important for the accuracy of the inference method's network identification, which presented very good results even with small expression profiles. However, the adopted inference method was not sensitive enough to distinguish different structures of interaction among genes, behaving similarly when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, and it can be extended to other inference methods.
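
The validation step, comparing the inferred network with the original artificial one, can be illustrated by counting how many true edges are recovered. The abstract does not say which similarity measures the authors used, so the precision and recall below are an assumed choice, and the edge lists are made up.

```python
from typing import Iterable, Tuple

Edge = Tuple[int, int]  # (predictor gene, target gene)


def edge_recovery(true_edges: Iterable[Edge],
                  inferred_edges: Iterable[Edge]) -> Tuple[float, float]:
    """Return (precision, recall) of inferred edges against the true network."""
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    true_positives = len(true_set & inferred_set)
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical four-gene example: two of the three true edges are recovered.
original = [(0, 1), (1, 2), (2, 3)]
inferred = [(0, 1), (2, 3), (3, 0)]
precision, recall = edge_recovery(original, inferred)
print(precision, recall)
```

Repeating such a comparison over networks generated with different average degrees and topologies is one way to expose the sensitivities reported above.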

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the increased number of biomechanical gait variables reported and subsequently the lack of differences presented in these studies. Data mining techniques have been applied in recent biomedical studies to solve this problem using a more general approach. In the present work, we re-analyzed lower extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classification algorithm and test the generalized performance. The results revealed different accuracy rates across three different kernel methods adopted in the classifier, with the linear kernel performing the best. A subsequent forward feature selection algorithm demonstrated that with only six features, the linear kernel SVM achieved a 100% classification performance rate, showing that these features provided powerful combined information to distinguish age groups. The results of the present work demonstrate the potential of this approach to improve knowledge about age-related differences in running gait biomechanics and encourage the use of the SVM in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.
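
Forward feature selection, as used above, grows the feature subset greedily: at each step it adds the single feature that most improves a score and stops when no addition helps. The sketch below is generic and keeps the scoring function abstract. In the study the score would presumably be the accuracy of the linear-kernel SVM, whereas here `toy_score` is an invented stand-in so that the example runs without external libraries.

```python
from typing import Callable, List, Sequence


def forward_select(n_features: int,
                   score: Callable[[Sequence[int]], float],
                   max_features: int) -> List[int]:
    """Greedy forward selection over feature indices 0..n_features-1."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every subset obtained by adding one more feature.
        trial_scores = {f: score(selected + [f]) for f in candidates}
        best_feature = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_feature] <= best_score:
            break  # no candidate improves the current subset
        selected.append(best_feature)
        best_score = trial_scores[best_feature]
    return selected


# Toy score (assumption): features 2 and 5 are informative, and every
# selected feature carries a small penalty.
USEFUL = {2, 5}


def toy_score(subset: Sequence[int]) -> float:
    return len(USEFUL & set(subset)) - 0.01 * len(subset)


print(forward_select(6, toy_score, max_features=4))
```

With a real classifier, the score should come from cross-validation on held-out data so that the selected subset is not fitted to the training samples.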

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The purpose of this study was to investigate the respiratory protective device selection process and to identify changes in this process when an exposure limit value is updated. Data from two previous studies conducted in mining industries in the metropolitan area of São Paulo were put through the respiratory protective device selection process. The protection factors of the equipment provided by the companies were compared with the required protection factors and with FUNDACENTRO's respiratory protection program. The results showed that until 2005 some companies were providing inadequate protection, and that after the change in the crystalline silica exposure limit value in 2006 all the analyzed companies were providing inadequate respirators. This study suggests that there is an opportunity to create a web portal where companies can carry out the selection process with up-to-date information.
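
The comparison described above can be illustrated with the usual definition of the required protection factor as the ratio of the measured airborne concentration to the exposure limit. The sketch assumes that definition, and every number in it is hypothetical rather than taken from the study. It shows why a respirator that was adequate under an older limit can become inadequate once the limit is lowered.

```python
def required_protection_factor(concentration: float, exposure_limit: float) -> float:
    """Ratio of measured concentration to the exposure limit (same units)."""
    if exposure_limit <= 0:
        raise ValueError("exposure limit must be positive")
    return concentration / exposure_limit


def is_adequate(assigned_pf: float, concentration: float, exposure_limit: float) -> bool:
    """A respirator is adequate if its protection factor covers the requirement."""
    return assigned_pf >= required_protection_factor(concentration, exposure_limit)


# Hypothetical values: the same workplace and respirator, before and after
# the exposure limit is lowered.
concentration = 0.5   # mg/m3, assumed measurement
assigned_pf = 10      # assumed protection factor of the respirator in use
old_limit, new_limit = 0.1, 0.025  # mg/m3, illustrative limits

print(is_adequate(assigned_pf, concentration, old_limit))
print(is_adequate(assigned_pf, concentration, new_limit))
```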

Relevance:

80.00% 80.00%

Publisher:

Abstract:

There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been reported in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims to provide a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A brief introduction to the classification techniques is given for completeness. Two algorithms, a support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

David Hull's (1988c) model of science as a selection process suffers from a two-fold inability: (a) to ascertain when a lineage of theories has been established; i.e., when theories are descendants of older theories or are novelties, and what counts as a distinct lineage; and (b) to specify what the scientific analogue is of genotype and phenotype. This paper seeks to clarify these issues and to propose an abstract model of theories analogous to particulate genetic structure, in order to reconstruct relationships of descent and identity.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Superparamagnetic iron oxide nanoparticles (SPIONs) are applied in stem cell labeling because of their high magnetic susceptibility as compared with ordinary paramagnetic species, their low toxicity, and their ease of magnetic manipulation. The present work is a study of CD133+ stem cell labeling by SPIONs coupled to a specific antibody (AC133), resulting in the antigenic labeling of the CD133+ stem cell. A method was also developed for the quantification of the SPION content per cell, which is necessary for molecular imaging optimization. Flow cytometry analysis established the efficiency of the selection process and helped determine that the CD133 cells selected by chromatographic affinity express the transmembrane glycoprotein CD133. The presence of antibodies coupled to the SPION, expressed in the cell membrane, was observed by transmission electron microscopy. Quantification of the SPION concentration in the marked cells using the ferromagnetic resonance technique resulted in a value of 1.70 × 10^-13 mol iron (9.5 pg) or 7.0 × 10^6 nanoparticles per cell (the measurement was carried out in a volume of 2 μL containing about 6.16 × 10^5 pg iron, equivalent to 4.5 × 10^11 SPIONs). (c) 2008 Elsevier Inc. All rights reserved.
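
The per-cell quantities reported above can be cross-checked with simple arithmetic. The check below assumes the standard molar mass of iron (55.845 g/mol) and reads the iron content per cell as 9.5 pg, which corresponds to an amount of the order of 10^-13 mol. The variable names are illustrative.

```python
MOLAR_MASS_FE = 55.845  # g/mol, standard atomic weight of iron (assumed input)

iron_per_cell_pg = 9.5   # pg of iron per labeled cell, from the abstract
spions_per_cell = 7.0e6  # nanoparticles per cell, from the abstract
sample_iron_pg = 6.16e5  # pg of iron in the 2 uL measured volume

# Convert picograms to grams, then to moles of iron per cell.
mol_per_cell = iron_per_cell_pg * 1e-12 / MOLAR_MASS_FE

# Number of cells implied by the sample's iron content, and the
# corresponding total number of nanoparticles in the sample.
cells_in_sample = sample_iron_pg / iron_per_cell_pg
spions_in_sample = cells_in_sample * spions_per_cell

print(f"{mol_per_cell:.2e} mol iron per cell")
print(f"{spions_in_sample:.1e} SPIONs in the sample")
```

Both results agree with the reported values to the precision given in the abstract.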

Relevance:

80.00% 80.00%

Publisher:

Abstract:

In this paper, we propose a method based on association rule-mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates suggestions of diagnoses employing mining of association rules. The suggestions of diagnosis are used to accelerate the image analysis performed by specialists as well as to provide them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable to perform feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.
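
Association rules of the form "feature values imply diagnosis" are usually ranked by support and confidence. The sketch below computes both for a handful of invented records. It is not the PreSAGe or HiCARe algorithm, whose details the abstract does not give, and the feature names and diagnosis keywords are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimated probability of the consequent given the antecedent."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | frozenset(consequent)) / base


# Hypothetical records: discretized image features plus a diagnosis keyword.
records = [
    frozenset({"texture=high", "shape=irregular", "malignant"}),
    frozenset({"texture=high", "shape=irregular", "malignant"}),
    frozenset({"texture=high", "shape=round", "benign"}),
    frozenset({"texture=low", "shape=round", "benign"}),
]

print(support(records, {"texture=high", "malignant"}))
print(confidence(records, {"shape=irregular"}, {"malignant"}))
```

Feature selection and discretization matter here because every continuous image feature has to be turned into a small number of items before rules like these can be mined.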

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This research takes a multi-method qualitative/quantitative approach and aims to investigate how elite athletes who play for the Brazilian men's Under-17 and Under-19 national basketball teams build strategies to reconcile athletic and school education. The study is organized in three chapters. The first chapter, a “state of knowledge” review, aims to map the academic literature on reconciling school and athletic education. It uses the Scielo database for the national search and the Portal Periódicos Capes for the international search. Seventeen articles were found, distributed across 13 journals. The data were classified and analyzed using bibliometric indicators such as annual distribution, distribution by journal, authorship and demographic origin. The analysis also took into account one doctoral thesis, three master's dissertations and three conference papers, as well as a special journal issue not found in the chosen databases. The chapter shows that concern with the topic emerged in Europe and the United States in the 1970s and that in Brazil the issue began to be addressed in the 2000s. It describes attempts to reconcile the two kinds of education in European countries, the United States and Brazil, as well as the importance of family and social class in the possibility of prioritizing one of them. The second chapter, which is qualitative-quantitative in nature, investigates the strategies used by the athletes called up in 2013 to the Brazilian men's Under-17 and Under-19 youth national basketball teams to reconcile athletic and school education. It also seeks to understand the influence of national team call-ups on these elite athletes' schooling indicators, such as dropout, delay and grade repetition. 
The research shows that this group of elite athletes has higher average rates of grade repetition, dropout and school delay than the national averages. The third chapter analyzes how this group of young athletes understands school education and whether any lack of interest in the current school model is due solely to their being elite athletes. To this end, it draws on the second half of the questionnaire used as the research instrument, which applies a free word association method built on four stimulus words (semantic structures): “train”, “study”, “go to school” and “compete”. Free association of this kind is commonly used as theoretical and methodological support in research on social representations (ACOSTA, 2005). Bringing these questions to light shows that these athletes' stance toward school does not differ from that found in other studies of young people in secondary education. Because what is learned at school seems to have little bearing on the work they wish to pursue, they see school as monotonous but at the same time necessary in case their plans for an athletic career do not work out.