981 results for VARIABLE SELECTION


Relevance: 60.00%

Abstract:

Background: Malignancies arising in the large bowel cause the second largest number of cancer deaths in the Western world. Despite the progress made in recent decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. Methods: A genomic study of human colorectal cancer was carried out on a total of 31 tumour samples, corresponding to different stages of the disease, and 33 non-tumour samples. Tumour samples were hybridised against a reference pool of non-tumour samples using Agilent Human 1A 60-mer oligo microarrays, and the results were validated by qRT-PCR. In the subsequent bioinformatic analysis, gene networks were built using Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After exhaustive pre-processing to ensure data quality (missing-value imputation, probe quality, data smoothing and intraclass variability filtering), the final dataset comprised a total of 8,104 probes. Next, a supervised classification approach and data analysis were carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression, in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new, unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values between 0.955 and 0.997).
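The abstract does not detail how the consensus over bootstrap models was computed. As a minimal sketch of the general idea, probes can be ranked by how often they are selected across bootstrap resamples. The univariate filter used here (absolute difference in class means) is an assumed stand-in for the paper's Bayesian classifiers, and the function names, stratified resampling and `top_k` are illustrative choices.

```python
import random
from collections import Counter


def stratified_bootstrap(labels, rng):
    """Resample row indices with replacement separately within each class,
    so every bootstrap sample still contains both classes."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in idx)
    return sample


def mean_diff_scores(X, labels, rows):
    """Score each probe by |mean(class 1) - mean(class 0)| on the given rows
    (a hypothetical filter, not the paper's Bayesian-classifier selection)."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if labels[i] == 1]
        b = [X[i][j] for i in rows if labels[i] == 0]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return scores


def bootstrap_consensus(X, labels, top_k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each probe ranks in the top_k."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        rows = stratified_bootstrap(labels, rng)
        scores = mean_diff_scores(X, labels, rows)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        counts.update(ranked[:top_k])
    return [counts[j] / n_boot for j in range(len(X[0]))]


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [0] * 5 + [1] * 5
    # Probe 0 separates the classes; probes 1-3 are noise in [0, 1).
    X = [[10.0 * y] + [rng.random() for _ in range(3)] for y in labels]
    print(bootstrap_consensus(X, labels, top_k=1))
```

Probes that are selected in nearly every resample form the top of the resulting hierarchy; stratifying the resampling keeps both classes present so the class-mean score is always defined.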

Relevance: 60.00%

Abstract:

Biodiesel has been widely used as a renewable energy source that helps reduce demand for mineral diesel. Several of its properties must therefore be monitored in order to produce and distribute biodiesel of the required quality. In this work, physical properties of biodiesel, namely density, refractive index and cold filter plugging point (CFPP), were measured and correlated with near-infrared (NIR) and mid-infrared (Mid-IR) spectra using chemometric tools. Partial least squares (PLS) regression, interval partial least squares (iPLS) regression, and support vector machine (SVM) regression with genetic algorithm (GA) variable selection were used to model these properties. The biodiesel samples were synthesized from different feedstocks, such as canola, sunflower, corn and soybean. Additional biodiesel samples were obtained from a supplier in southern Brazil. First, baseline correction was used to normalize the NIR spectral data, followed by other preprocessing methods such as mean centering, first derivative and standard normal variate (SNV). The best CFPP prediction was obtained using Mid-IR spectra and GA-SVM regression, with a high coefficient of determination of prediction, R²pred = 0.96, and a low root mean square error of prediction, RMSEP = 0.6 °C. For density, the best model used Mid-IR spectra and PLS regression, with R²pred = 0.98 and RMSEP = 0.0002 g/cm³. For refractive index, the best model also used Mid-IR spectra and PLS regression, with an excellent R²pred = 0.98 and RMSEP = 0.0001.
For these datasets, PLS and SVM demonstrated their robustness, proving to be useful tools for predicting the biodiesel properties studied.
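To make the preprocessing step and the reported figures of merit concrete, here is a minimal sketch of SNV scaling and of RMSEP and R²pred. R²pred is computed here as 1 − SS_res/SS_tot around the test-set mean. That is one common convention and an assumption, because the abstract does not state which definition was used. The function names are illustrative.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an external test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_pred(y_true, y_pred):
    """Coefficient of determination of prediction, 1 - SS_res / SS_tot,
    with SS_tot taken around the test-set mean (assumed convention)."""
    mu = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(snv([1.0, 2.0, 3.0]))
    print(rmsep([1, 2, 3], [1, 2, 4]), r2_pred([1, 2, 3], [1, 2, 4]))
```

SNV works row by row (one spectrum at a time), whereas mean centering works column by column across samples. Because RMSEP is expressed in the units of the property, the values above are reported in °C and g/cm³.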

Relevance: 60.00%

Abstract:

In most parts of China, mosquitoes have been subjected to organophosphate (OP) insecticide treatments since the mid-1960s, and monitoring of resistance genes in the Culex pipiens complex (Diptera: Culicidae) began at only a few locations in the late 1980s. Many resistant alleles at the Ester locus have been found in field populations, including those common worldwide (Ester(B1) and Ester(2)) and those endemic to China (Ester(B6), Ester(B7), Ester(8), and Ester(9)). This situation is atypical and may reflect a complex evolution of insecticide resistance genes in China. To improve our understanding of the Chinese situation and our ability to manage resistance in the C. pipiens complex, a large-scale study was performed. Twenty field populations were sampled from Beijing to Guangzhou. Bioassays with five insecticides (dichlorvos, parathion, chlorpyrifos, 2-sec-butylphenyl methyl carbamate, and propoxur) revealed resistance levels that varied with geographic origin, reaching up to 85-fold for dichlorvos. Six overproduced esterases were identified, including two that had not been previously described. Most of them were found in all samples, although at variable frequencies, suggesting either variable selection or a transient situation, for example one in which each esterase was until recently restricted to a particular geographic area. The results are discussed in the context of recent changes to insecticide campaigns and of the evolution of resistance genes in Chinese C. pipiens populations.

Relevance: 60.00%

Abstract:

The extended gravitational index G(Q) and quantum-chemical descriptors were calculated for quantitative structure-activity relationship (QSAR) analysis of aminoquinolines. An evolutionary algorithm is described for variable selection and QSAR model building, and quasi-Newton neural networks were employed, giving better results.
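The abstract does not specify the evolutionary algorithm. As a hedged sketch, a basic genetic algorithm for choosing a descriptor subset could look like the following. The tournament selection, uniform crossover, default mutation rate and the caller-supplied `fitness` function are all illustrative assumptions. In a QSAR setting the fitness would presumably be a cross-validated score of a model built on the selected descriptors. The toy fitness in the usage block simply rewards agreement with a known mask.

```python
import random


def ga_select(n_vars, fitness, pop_size=30, generations=40, p_mut=None, seed=0):
    """Genetic-algorithm variable selection over boolean masks.

    Each individual is a list of booleans (True = descriptor kept) and
    `fitness` maps a mask to a score to be maximised.
    """
    rng = random.Random(seed)
    p_mut = p_mut if p_mut is not None else 1.0 / n_vars
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Binary tournament on the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


if __name__ == "__main__":
    target = [False, True, False, False, True, False, True, False]

    def toy_fitness(mask):  # hypothetical stand-in for a QSAR model score
        return sum(m == t for m, t in zip(mask, target))

    print(ga_select(8, toy_fitness))
```

Elitism guarantees that the best score never decreases between generations. This matters when each fitness evaluation means refitting a model.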

Relevance: 60.00%

Abstract:

We discuss a general approach to dynamic sparsity modeling in multivariate time series analysis. Time-varying parameters are linked to latent processes that are thresholded to induce zero values adaptively, providing natural mechanisms for dynamic variable inclusion/selection. We discuss Bayesian model specification, analysis and prediction in dynamic regressions, time-varying vector autoregressions, and multivariate volatility models using latent thresholding. Application to a topical macroeconomic time series problem illustrates some of the benefits of the approach in terms of statistical and economic interpretations as well as improved predictions. Supplementary materials for this article are available online. © 2013 Copyright Taylor and Francis Group, LLC.
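The core mechanism, a latent process that is thresholded so that a coefficient is exactly zero whenever the latent value is small, can be sketched in a few lines. The stationary AR(1) form of the latent process, the parameter values and the function names are assumptions made for illustration. The Bayesian estimation of the latent states and threshold, which is the substance of the approach, is not shown.

```python
import random


def latent_threshold(b, d):
    """Coefficient beta_t = b_t if |b_t| >= d, else exactly 0."""
    return [x if abs(x) >= d else 0.0 for x in b]


def simulate_latent_ar1(n, mu, phi, sigma, seed=0):
    """Simulate b_t = mu + phi * (b_{t-1} - mu) + eta_t, eta_t ~ N(0, sigma^2),
    starting from the stationary distribution (requires |phi| < 1)."""
    rng = random.Random(seed)
    b = [mu + rng.gauss(0.0, sigma / (1 - phi ** 2) ** 0.5)]
    for _ in range(n - 1):
        b.append(mu + phi * (b[-1] - mu) + rng.gauss(0.0, sigma))
    return b


if __name__ == "__main__":
    b = simulate_latent_ar1(200, mu=0.2, phi=0.95, sigma=0.1)
    beta = latent_threshold(b, d=0.3)
    # Share of time points at which this predictor is switched off.
    print(sum(x == 0.0 for x in beta) / len(beta))
```

Because the latent path is persistent, the coefficient tends to switch in and out of the model in runs rather than at isolated time points. This is what lets variable inclusion change dynamically over time.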

Relevance: 60.00%

Abstract:

Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
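To show how SNP-level summaries follow from a model search, here is a minimal sketch. Given posterior probabilities of the visited models, the posterior inclusion probability of a SNP is the total probability of the models that contain it. An inclusion Bayes factor is then posterior odds divided by prior odds. A smaller prior inclusion probability raises the bar for a given Bayes factor, which is the intuition behind a prior-based multiplicity correction. MISA's actual prior, parametrization search and model space are not reproduced, and the function names and SNP labels are hypothetical.

```python
def marginal_inclusion(model_probs):
    """Posterior inclusion probability per SNP: the summed posterior
    probability of the models that contain it. SNPs that appear in no
    visited model get no entry (inclusion probability 0)."""
    incl = {}
    for model, prob in model_probs.items():
        for snp in model:
            incl[snp] = incl.get(snp, 0.0) + prob
    return incl


def inclusion_bayes_factor(post_incl, prior_incl):
    """Bayes factor for inclusion = posterior odds / prior odds."""
    post_odds = post_incl / (1.0 - post_incl)
    prior_odds = prior_incl / (1.0 - prior_incl)
    return post_odds / prior_odds


if __name__ == "__main__":
    models = {
        frozenset(): 0.2,
        frozenset({"rs_a"}): 0.5,
        frozenset({"rs_a", "rs_b"}): 0.3,
    }
    incl = marginal_inclusion(models)
    print(incl, inclusion_bayes_factor(incl["rs_a"], 0.5))
```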

Relevance: 60.00%

Abstract:

During mitotic cell cycles, DNA is exposed to many endogenous and exogenous damaging agents that can cause double-strand breaks (DSBs). In S. cerevisiae, DSBs are primarily repaired by mitotic recombination and, as a result, can lead to loss of heterozygosity (LOH). Genetic recombination can occur in both meiosis and mitosis. While the genome-wide distribution of meiotic recombination events has been studied intensively, mitotic recombination events had not been mapped in an unbiased manner across the genome until recently. Methods for selecting mitotic crossovers and mapping their positions have recently been developed in our lab. Our current approach uses a diploid yeast strain that is heterozygous for about 55,000 SNPs and employs SNP microarrays to map LOH events throughout the genome. These methods allow us to examine selected crossovers and unselected mitotic recombination events (crossover, noncrossover and break-induced replication, BIR) at about 1 kb resolution across the genome. Using this method, we generated maps of spontaneous and UV-induced LOH events. In this study, we explore machine learning and variable selection techniques to build a predictive model of where LOH events occur in the genome.

We simulated control tracts drawn at random from the yeast genome, matching the LOH tracts in tract length and in location relative to single-nucleotide polymorphism positions. We then extracted roughly 1,100 features, such as base composition, histone modifications and the presence of tandem repeats, and trained classifiers to distinguish control tracts from LOH tracts. We identified several features with good predictive value. We also found that, with the current repertoire of features, prediction is generally better for spontaneous LOH events than for UV-induced LOH events.
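A minimal sketch of length-matched control tracts follows. Each observed tract length is placed uniformly at random over all valid positions in the genome, so each chromosome is weighted by its number of possible start positions. This sketch matches length only. The additional matching on position relative to SNPs described above is not attempted, and the function name and chromosome names are hypothetical.

```python
import random


def simulate_control_tracts(tract_lengths, chrom_sizes, seed=0):
    """Place one random control tract per observed LOH tract length.

    tract_lengths: lengths (bp) of the observed LOH tracts.
    chrom_sizes:   dict mapping chromosome name -> length in bp.
    Returns (chrom, start, end) tuples, 0-based and half-open.
    """
    rng = random.Random(seed)
    tracts = []
    for length in tract_lengths:
        # Weight chromosomes by their number of valid start positions, which
        # makes the placement uniform over all valid genome positions.
        choices = [(c, size - length + 1) for c, size in chrom_sizes.items()
                   if size >= length]
        if not choices:
            raise ValueError(f"no chromosome can hold a {length} bp tract")
        names, weights = zip(*choices)
        chrom = rng.choices(names, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - length + 1)
        tracts.append((chrom, start, start + length))
    return tracts


if __name__ == "__main__":
    sizes = {"chrA": 5000, "chrB": 20000}  # hypothetical chromosome sizes
    print(simulate_control_tracts([100, 2500, 8000], sizes))
```

Features computed on these control tracts and on the real LOH tracts then form the two classes for the classifier.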

Relevance: 60.00%

Abstract:

MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data, as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors, presents serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified component-wise gradient boosting to improve its computational feasibility and introduced random permutation into stability selection to control false discoveries. RESULTS: We have proposed a high-dimensional variable selection method that incorporates stability selection to control false discoveries. Comparisons between the proposed method and the commonly used univariate and Lasso approaches to variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results confirm that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported in previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
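The two building blocks named above can be sketched as follows. Component-wise boosting updates only the single best-fitting predictor at each step, so the predictors it touches are its selected set. Stability selection reruns it on random half-subsamples and records selection frequencies. This sketch uses squared-error loss on a linear model without an intercept, whereas the paper targets censored survival outcomes. It also omits the paper's permutation-based false discovery control, and the function names and defaults are illustrative.

```python
import random


def componentwise_l2_boost(X, y, n_steps=10, nu=0.1):
    """Component-wise L2 boosting: at each step fit every single predictor
    to the residuals by least squares and update only the best one, shrunk
    by nu. Returns the indices of predictors selected at least once."""
    n, p = len(X), len(X[0])
    resid = list(y)
    selected = set()
    for _ in range(n_steps):
        best_j, best_coef, best_gain = None, 0.0, -1.0
        for j in range(p):
            xj = [X[i][j] for i in range(n)]
            sxx = sum(v * v for v in xj)
            sxy = sum(a * b for a, b in zip(xj, resid))
            gain = sxy * sxy / sxx  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_coef, best_gain = j, sxy / sxx, gain
        selected.add(best_j)
        resid = [r - nu * best_coef * X[i][best_j] for i, r in enumerate(resid)]
    return selected


def stability_selection(X, y, n_subsamples=50, seed=0, **boost_kw):
    """Selection frequency of each predictor over random half-subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)
        sel = componentwise_l2_boost([X[i] for i in rows],
                                     [y[i] for i in rows], **boost_kw)
        for j in sel:
            counts[j] += 1
    return [c / n_subsamples for c in counts]


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    y = [3.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
    print(stability_selection(X, y))
```

Predictors whose frequency exceeds a chosen cutoff are kept. Using frequencies across subsamples rather than a single fit is what gives stability selection its control over spurious selections.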

Relevance: 60.00%

Abstract:

Semiconductor fabrication involves several sequential processing steps, with the result that critical production variables are often affected by a superposition of effects over multiple steps. In this paper, a Virtual Metrology (VM) system for early-stage measurement of such variables is presented; the VM system seeks to express the contribution to output variability that is due to a defined, observable part of the production line. The resulting outputs may be used for process monitoring and control purposes. A second contribution of this work is the introduction of Elastic Nets, a regularization and variable selection technique for modelling highly correlated datasets, as a technique for developing VM models. Elastic Nets and the proposed VM system are illustrated using real data from a multi-stage etch process used in the fabrication of disk drive read/write heads. © 2013 IEEE.
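For readers unfamiliar with Elastic Nets, here is a minimal coordinate-descent sketch. It uses the widely used objective (1/2n)‖y − Xb‖² + λ(α‖b‖₁ + (1 − α)/2 ‖b‖²₂), where α = 1 gives the lasso and α = 0 gives ridge regression. The abstract does not state the exact formulation or tuning used, so this objective, the omission of an intercept and of column standardisation, and the function names are assumptions.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=100):
    """Cyclic coordinate descent for the elastic net (no intercept).

    Minimises (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    In practice columns are usually standardised first; that is omitted here.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb for b = 0
    for _ in range(n_iter):
        for j in range(p):
            xj = [X[i][j] for i in range(n)]
            z = sum(v * v for v in xj) / n
            # Correlation of x_j with the partial residual (own term added back).
            rho = sum(xj[i] * (resid[i] + xj[i] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            if new != b[j]:
                resid = [resid[i] - xj[i] * (new - b[j]) for i in range(n)]
                b[j] = new
    return b


if __name__ == "__main__":
    X = [[1, 0], [0, 1], [1, 0], [0, 1]]
    y = [2, 3, 2, 3]
    print(elastic_net(X, y, lam=0.0, alpha=0.5))  # least-squares limit
    print(elastic_net(X, y, lam=1.0, alpha=0.0))  # pure ridge shrinkage
```

The ℓ₁ part zeroes out weak predictors, while the ℓ₂ part spreads weight across correlated predictors instead of arbitrarily keeping one. That combination is why the technique suits highly correlated process data.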

Relevance: 60.00%

Abstract:

Chapter 1 introduces the scope of the work by identifying the clinically relevant prenatal disorders and the diagnostic methods presently available. The methodology followed in this work is presented, along with a brief account of the principles of the analytical and statistical tools employed. A thorough description of the state of the art of metabolomics in prenatal research concludes the chapter, highlighting the merit of this novel strategy for identifying robust disease biomarkers. The scarce use of maternal and newborn urine in previous reports underlines the relevance of this work. Chapter 2 describes all the experimental details of the work performed, including sampling, sample collection and preparation, data acquisition protocols and data analysis procedures. The proton Nuclear Magnetic Resonance (NMR) characterization of maternal urine composition in healthy pregnancies is presented in Chapter 3. The urinary metabolic profile characteristic of each pregnancy trimester was defined, and a 21-metabolite signature was found to describe the metabolic adaptations occurring throughout pregnancy. Eight metabolites were found, to our knowledge for the first time, to vary in connection with pregnancy, while known metabolic effects were confirmed. This chapter also includes a study of non-fasting (the condition used in this work) as a possible confounder. Chapter 4 describes the metabolomic study of 2nd trimester maternal urine for the diagnosis of fetal disorders and the prediction of later-developing complications. This was achieved by applying a novel variable selection method developed in the context of this work.
Fetal malformations (FM) (specifically those of the central nervous system, CNS) and chromosomal disorders (CD) (specifically trisomy 21, T21) were found to be accompanied by changes in energy, amino acid, lipid and nucleotide metabolic pathways, with CD causing further deregulation of sugar metabolism, the urea cycle and/or creatinine biosynthesis. Validation of the multivariate analysis models revealed classification rates (CR) of 84% for FM (87% for CNS) and 85% for CD (94% for T21). For later-diagnosed preterm delivery (PTD), preeclampsia (PE) and intrauterine growth restriction (IUGR), urinary NMR profiles were found to have early predictive value, with CRs of 84% for PTD (11-20 gestational weeks, g.w., prior to diagnosis), 94% for PE (18-24 g.w. pre-diagnosis) and 94% for IUGR (2-22 g.w. pre-diagnosis). This chapter also includes results from an ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) study of pre-PTD samples and their correlation with NMR data. One possible marker was detected, although it could not be identified. Chapter 5 reports the NMR metabolomic study of gestational diabetes mellitus (GDM), establishing a potentially predictive urinary metabolic profile for GDM 2-21 g.w. prior to diagnosis (CR 83%). Furthermore, the NMR spectrum was shown to carry information on individual phenotypes, enabling prediction of future insulin treatment requirement (CR 94%). Chapter 6 describes results that demonstrate the impact of delivery mode (CR 88%) and gender (CR 76%) on the newborn urinary profile. Newborn prematurity, respiratory depression, large-for-gestational-age growth and malformations were also found to induce relevant metabolic perturbations (CR 82-92%), as did maternal conditions, namely GDM (CR 82%) and maternal psychiatric disorders (CR 91%).
Finally, the main conclusions of this thesis are presented in Chapter 7, highlighting the value of maternal or newborn urine metabolomics for pregnancy monitoring and disease prediction, towards the development of new early and non-invasive diagnostic methods.

Relevance: 60.00%

Abstract:

The simulations were implemented in Java.

Relevance: 60.00%

Abstract:

This project aims to identify which concepts of health, disease, epidemiology and risk are applicable to companies in the oil and natural gas extraction sector in Colombia. Given the low predictive power of traditional financial analysis and its shortcomings for investment and long-term decision-making, as well as its failure to consider variables such as risk and future expectations, there is a need to adopt different perspectives and integrative models. This consideration is pertinent to the oil and natural gas extraction sector because of the growing foreign investment it has received: US$2,862 million in 2010, more than ten times its 2003 value. Multidimensional models could thus be developed based on concepts of financial health, epidemiology and statistics. The concept of health, as adopted in the business sector, is useful and conceptually coherent, reflecting the presence of different interacting and interconnected subsystems or factors. It should also be noted that a multidimensional (multi-stage) model must take risk into account, and epidemiological analysis has proven useful for determining risk and integrating it into the system together with other concepts, such as the risk ratio and relative risk. This will be analyzed through a theoretical-conceptual study that complements a previous study, as a contribution to the corporate finance project of the Management research line.

Relevance: 60.00%

Abstract:

The lasso is a shrinkage and variable selection method. This paper shows that there always exists an interval of tuning parameter values for which the mean squared prediction error of the lasso estimator is smaller than that of the ordinary least squares estimator. For an estimator satisfying certain conditions, such as unbiasedness, the paper defines the corresponding generalized lasso estimator, whose mean squared prediction error is shown to be smaller than that of the original estimator for tuning parameter values in some interval. This implies that unbiased estimators are not admissible. Simulation results for five models support the theoretical results.
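The claim can be illustrated, though not proved, with a small Monte Carlo experiment. Assume an orthonormal design. Then the OLS coefficient estimate is the true coefficient plus N(0, σ²) noise, the lasso estimate is its soft-thresholded version, and squared coefficient error equals in-sample prediction error. The true coefficients, λ and the function names below are illustrative assumptions, not the paper's five simulation models.

```python
import random


def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    return max(abs(z) - lam, 0.0) * (1 if z > 0 else -1)


def mse_ols_vs_lasso(beta, lam, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo squared error of OLS and lasso estimates of beta.

    Under an orthonormal design the OLS estimate of each coefficient is
    b + N(0, sigma^2); the lasso estimate soft-thresholds it at lam.
    """
    rng = random.Random(seed)
    err_ols = err_lasso = 0.0
    for _ in range(reps):
        for b in beta:
            z = b + rng.gauss(0.0, sigma)  # OLS estimate of this coefficient
            err_ols += (z - b) ** 2
            err_lasso += (soft_threshold(z, lam) - b) ** 2
    return err_ols / reps, err_lasso / reps


if __name__ == "__main__":
    ols, lasso = mse_ols_vs_lasso([0.0, 0.0, 0.5, 1.0], lam=0.5)
    print(f"OLS: {ols:.3f}  lasso: {lasso:.3f}")
```

Small amounts of shrinkage trade a little bias for a larger drop in variance. The gain is largest for coefficients that are zero or near zero.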

Relevance: 60.00%

Abstract:

1. Studies of landscape change are seldom conducted at scales commensurate with the processes they purport to investigate. Landscape change is a landscape-level process, yet most studies focus on patches. Even when landscape context is considered, inference remains at the patch level. The unit of investigation must be extended beyond individual patches to whole mosaics in order to advance understanding of faunal responses to landscape change.

2. In this study, we aggregated data from multiple sites per landscape such that both the response and explanatory variables characterized 'whole' landscapes, allowing for landscape-level inference about factors influencing species' incidence.

3. We used hierarchical partitioning and Bayesian variable selection methods to develop species-specific models that examined the influence of four categories of landscape properties – habitat extent, habitat configuration, landscape composition and geographical location – on the incidence of 58 species of woodland-dependent birds in 24 agricultural landscapes (each 100 km2) in south-eastern Australia.

4. There was strong evidence for a positive effect of habitat extent for 27 species. Thirty species were related to at least one of the four landscape composition variables, and geographical location was important for 19 species. Habitat configuration was influential for 13 species; where it was important, the effects of fragmentation per se were detrimental.

5. Variation among species in the influential landscape variables indicates that different species respond to different sets of cues in land mosaics. Thus, although all species were grouped a priori as 'woodland-dependent', expectations based on general ecological characteristics may prove unreliable.

6. Synthesis and applications. These results underscore the value of moving beyond the fragmentation paradigm focused on the spatial pattern of habitat vs. non-habitat, to a greater appreciation of the composition and heterogeneity of land mosaics. Landscape-level inference will enable improved conservation outcomes by recognizing the influence of landscape properties on biota and devising strategies at this scale to complement patch-based management. We provide strong empirical evidence that biodiversity management in agricultural landscapes must focus on habitat extent. Complementary management of other landscape attributes, such as habitat aggregation and intensity of agricultural land-use, will also enhance the value of agricultural landscapes for woodland birds.
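Hierarchical partitioning, used in item 3, assigns each explanatory variable an independent contribution. This is its gain in goodness of fit, averaged over all subsets of the other variables of each size and then over sizes, so the contributions sum to the fit of the full model minus that of the null model. A minimal sketch follows. The `fit` function is a hypothetical stand-in for whatever goodness-of-fit measure the species models used, and the variable names are illustrative.

```python
from itertools import combinations


def independent_contributions(variables, fit):
    """Hierarchical partitioning of a goodness-of-fit measure.

    fit maps a frozenset of variable names to a goodness-of-fit value.
    Each variable's independent contribution is its mean gain in fit,
    averaged within each subset size k of the other variables, then
    averaged over k.
    """
    contrib = {}
    for v in variables:
        others = [u for u in variables if u != v]
        level_means = []
        for k in range(len(others) + 1):
            gains = [fit(frozenset(s) | {v}) - fit(frozenset(s))
                     for s in combinations(others, k)]
            level_means.append(sum(gains) / len(gains))
        contrib[v] = sum(level_means) / len(level_means)
    return contrib


if __name__ == "__main__":
    # Toy, non-additive fit measure: explanatory power grows with model size.
    toy_fit = lambda s: len(s) ** 2
    print(independent_contributions(["extent", "configuration"], toy_fit))
```

The fit is evaluated for all 2^p subsets, so this exhaustive version is practical only for a modest number of variables. That is the case for the four landscape categories considered here.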

Relevance: 60.00%

Abstract:

This paper applies generalised linear models of geographical variation to esophageal cancer incidence data in the Caspian region of Iran. The data have a complex, hierarchical structure that makes them suitable for hierarchical analysis using Bayesian techniques, although care is required to deal with the problems that arise when counts of events are observed in small geographical areas in the presence of overdispersion and residual spatial autocorrelation. These considerations lead to nine regression models, derived from three probability distributions for count data (Poisson, generalised Poisson and negative binomial) combined with three different autocorrelation structures. We employ the framework of Bayesian variable selection and a Gibbs sampling based technique to identify significant cancer risk factors. The framework handles situations where the number of possible models, based on different combinations of candidate explanatory variables, is so large that calculating posterior probabilities for all models is difficult or infeasible. The evidence from applying the modelling methodology suggests that strategies based on the generalised Poisson and negative binomial distributions with spatial autocorrelation work well and provide a robust basis for inference.
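To see why the generalised Poisson and negative binomial distributions help with overdispersion, the sketch below computes their means and variances numerically. Unlike the Poisson, whose variance equals its mean, both allow the variance to exceed the mean. The parameterisations are assumptions, since the abstract does not give them: Consul's generalised Poisson, with mean θ/(1 − λ) and variance θ/(1 − λ)³, and a negative binomial with mean μ and variance μ + μ²/size. The spatial terms and the Gibbs-sampling variable selection are not shown.

```python
import math


def gen_poisson_pmf(y, theta, lam):
    """Consul's generalised Poisson:
    theta * (theta + lam*y)^(y-1) * exp(-(theta + lam*y)) / y!,
    for theta > 0 and 0 <= lam < 1 (lam > 0 gives overdispersion)."""
    rate = theta + lam * y
    return math.exp(math.log(theta) + (y - 1) * math.log(rate) - rate
                    - math.lgamma(y + 1))


def neg_binomial_pmf(y, mu, size):
    """Negative binomial with mean mu and variance mu + mu^2 / size."""
    p = size / (size + mu)
    return math.exp(math.lgamma(y + size) - math.lgamma(size)
                    - math.lgamma(y + 1)
                    + size * math.log(p) + y * math.log(1 - p))


def moments(pmf, y_max=400):
    """Mean and variance of a count distribution, truncated at y_max."""
    probs = [pmf(y) for y in range(y_max + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    print(moments(lambda y: gen_poisson_pmf(y, 2.0, 0.3)))  # var > mean
    print(moments(lambda y: neg_binomial_pmf(y, 3.0, 2.0)))  # var > mean
```

Working on the log scale with `math.lgamma` avoids overflow in the factorial and gamma terms for large counts.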