197 results for Permutation
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and yields false-positive results. Although this problem can be handled effectively by approaches such as Bonferroni correction, permutation testing, and false discovery rates, the joint effects of several genes, each with a weak individual effect, may still go undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. An exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search over small subsets of one SNP, two SNPs, or three SNPs drawn from the best 100 composite 2-SNP pairs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always improve the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from visiting more complex subset states.

Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls reaches 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be at least 0.4. Conversely, the highest test accuracy of sIB for diagnosing disease among cases reaches 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
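The HMSS criterion used throughout this abstract is simple to state; the following is an illustrative Python sketch, not the thesis code:

```python
def hmss(sensitivity, specificity):
    """Harmonic mean of sensitivity and specificity (HMSS). Unlike raw
    accuracy, it collapses to 0 when either class is entirely missed,
    which is why it suits imbalanced data without resampling."""
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# A classifier that labels everything "control" on a 90/10 data set has
# accuracy 0.90 but sensitivity 0, and HMSS exposes the failure:
print(hmss(0.0, 1.0))  # 0.0
print(hmss(0.8, 0.7))  # ~0.747
```

Because HMSS is symmetric in the two class-wise accuracies, a high value requires the classifier to perform on both cases and controls, which is the property exploited above.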
A further genome-wide association study through chi-square tests shows no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in this study, complete descriptions of the classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
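The per-SNP chi-square test against a Bonferroni-style genome-wide cutoff can be sketched as follows; the table counts and the number of SNPs are hypothetical, chosen only to illustrate the procedure:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2
    genotype-by-disease table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2.0))  # exact chi2 survival function for 1 df
    return stat, p

alpha, n_snps = 0.05, 500_000       # hypothetical SNP count
cutoff = alpha / n_snps             # Bonferroni-style genome-wide cutoff
stat, p = chi2_2x2(60, 40, 35, 65)  # invented case/control counts
print(round(stat, 2), p < cutoff)   # nominally significant, not genome-wide
```

With roughly 500,000 SNPs this cutoff is 1e-7, the same order of magnitude as the 9.09451E-08 and 1.11E-07 thresholds quoted above.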
Abstract:
This study retrospectively evaluated the spatial and temporal disease patterns associated with influenza-like illness (ILI), positive rapid influenza antigen detection tests (RIDT), and confirmed H1N1 S-OIV cases reported to the Cameron County Department of Health and Human Services between April 26 and May 13, 2009, using the space-time permutation scan statistic software SaTScan in conjunction with the geographical information system (GIS) software ArcGIS 9.3. The rate and age-adjusted relative risk of each influenza measure were calculated, and a cluster analysis was conducted to determine the geographic regions with statistically higher incidence of disease. A Poisson distribution model was developed to identify the effect that socioeconomic status, population density, and certain population attributes of a census block-group had on that area's frequency of S-OIV confirmed cases over the entire outbreak. Predominant among the spatiotemporal analyses of ILI, RIDT, and S-OIV cases in Cameron County is the consistent pattern of a high concentration of cases along the southern border with Mexico. These findings, in conjunction with the slight northward space-time shifts of the ILI and RIDT cluster centers, highlight the southern border as the primary site for public health interventions. Finally, the community-based multiple regression model revealed that three factors (percentage of the population under age 15, average household size, and the number of high school graduates over age 25) were significantly associated with laboratory-confirmed S-OIV in the Lower Rio Grande Valley. Together, these findings underscore the need for community-based surveillance, improve our understanding of the distribution of the burden of influenza within the community, and have implications for vaccination and community outreach initiatives.
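The space-time permutation scan statistic used here builds its null expectation from the case data alone, assuming zone and day are independent; a toy sketch with invented case records (not the Cameron County data):

```python
from collections import Counter

# Hypothetical (zone, day) case records.
cases = [("south", 1), ("south", 1), ("south", 2), ("north", 2),
         ("south", 3), ("north", 3), ("south", 3)]

by_zone = Counter(z for z, _ in cases)
by_day = Counter(d for _, d in cases)
total = len(cases)

def expected(zone, day):
    """Null expectation of the space-time permutation model: the zone
    and day margins are fixed, and their interaction is assumed away."""
    return by_zone[zone] * by_day[day] / total

observed = Counter(cases)
for (zone, day), obs in sorted(observed.items()):
    print(f"{zone} day {day}: observed={obs} expected={expected(zone, day):.2f}")
```

Clusters are zone-day windows where observed counts exceed these expectations by more than chance, which SaTScan evaluates with a permutation-based likelihood ratio.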
Abstract:
Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard-of-care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment. The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important since tumor shrinkage is an important cancer treatment endpoint that is correlated with probability of disease progression and overall survival. Accurate prediction of tumor shrinkage could also lead to individually customized treatment plans. To accomplish this objective, 64 stage NSCLC patients with similar treatments were all imaged using the same CT scanner and protocol. Quantitative image features were extracted and principal component regression with simulated annealing subset selection was used to predict shrinkage. Cross validation and permutation tests were used to validate the results. The optimal model gave a strong correlation between the observed and predicted shrinkages. The second objective of this work was to identify sets of NSCLC CT image features that are reproducible, non-redundant, and informative across multiple machines. Feature sets with these qualities are needed for NSCLC radiomics models to be robust to machine variation and spurious correlation. To accomplish this objective, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines from two institutions. For each machine, quantitative image features with concordance correlation coefficient values greater than 0.90 were considered reproducible. Multi-machine reproducible feature sets were created by taking the intersection of individual machine reproducible feature sets. Redundant features were removed through hierarchical clustering.
The findings showed that image feature reproducibility and redundancy depended on both the CT machine and the CT image type (average cine 4D-CT imaging vs. end-exhale cine 4D-CT imaging vs. helical inspiratory breath-hold 3D CT). For each image type, a set of cross-machine reproducible, non-redundant, and informative image features was identified. Compared to end-exhale 4D-CT and breath-hold 3D-CT, average 4D-CT derived image features showed superior multi-machine reproducibility and are the best candidates for clinical correlation.
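The reproducibility screen described above keeps features whose test-retest concordance correlation coefficient (CCC) exceeds 0.90; a minimal sketch of Lin's CCC with hypothetical feature values:

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between test and
    retest measurements; unlike Pearson correlation, it also penalizes
    shifts and scale changes between the two scans."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) / n
    sy = sum((v - my) ** 2 for v in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sx + sy + (mx - my) ** 2)

# Hypothetical test-retest values for one image feature:
scan1 = [1.0, 2.0, 3.0, 4.0]
scan2 = [1.1, 2.0, 2.9, 4.2]
print(concordance_cc(scan1, scan2) > 0.90)  # True: the feature is kept
```

Taking the intersection of the per-machine feature sets that pass this threshold then yields the multi-machine reproducible sets described above.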
Abstract:
Finding the degree-constrained minimum spanning tree (DCMST) of a graph is a widely studied NP-hard problem. One of its most important applications is network design. Here we deal with a new variant of the DCMST problem, which consists of finding not only the degree- but also the role-constrained minimum spanning tree (DRCMST), i.e., we add constraints to restrict the role of the nodes in the tree to root, intermediate, or leaf node. Furthermore, we do not limit the number of root nodes to one, thereby generally building a forest of DRCMSTs. The modeling of network design problems can benefit from the possibility of generating more than one tree and determining the role of the nodes in the network. We propose a novel permutation-based representation to encode these forests. In this new representation, one permutation simultaneously encodes all the trees to be built. We simulate a wide variety of DRCMST problems, which we optimize using eight different evolutionary computation algorithms that encode the individuals of the population using the proposed representation. The algorithms we use are: estimation of distribution algorithm, generational genetic algorithm, steady-state genetic algorithm, covariance matrix adaptation evolution strategy, differential evolution, elitist evolution strategy, non-elitist evolution strategy, and particle swarm optimization. The best results are obtained with the estimation of distribution algorithm and both types of genetic algorithms, although the genetic algorithms are significantly faster.
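As a simplified illustration of how a permutation can drive the construction of a degree-constrained tree (the representation in the work above is richer, also encoding node roles and multiple roots), one can rank edges by a permutation and add them greedily under degree caps:

```python
def decode_tree(n, edges, perm, max_degree):
    """Illustrative decoder: the permutation over edge indices fixes
    the order in which a Kruskal-style union-find pass considers edges,
    and edges violating the degree cap are skipped. Role constraints
    (root/intermediate/leaf) of the DRCMST variant are omitted here."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    degree = [0] * n
    tree = []
    for idx in perm:                       # permutation = edge priority
        u, v = edges[idx]
        ru, rv = find(u), find(v)
        if ru != rv and degree[u] < max_degree and degree[v] < max_degree:
            parent[ru] = rv
            degree[u] += 1
            degree[v] += 1
            tree.append((u, v))
    return tree

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
print(decode_tree(4, edges, [0, 1, 2, 3, 4], max_degree=2))
# [(0, 1), (0, 2), (2, 3)]
```

Evolutionary operators then only need to manipulate permutations, while the decoder guarantees that every individual maps to a feasible tree.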
Abstract:
Finding the degree-constrained minimum spanning tree (DCMST) of a graph is a widely studied NP-hard problem. One of its most important applications is network design. Here we deal with a new variant of the DCMST problem, which consists of finding not only the degree- but also the role-constrained minimum spanning tree (DRCMST), i.e., we add constraints to restrict the role of the nodes in the tree to root, intermediate, or leaf node. Furthermore, we do not limit the number of root nodes to one, thereby generally building a forest of DRCMSTs. The modeling of network design problems can benefit from the possibility of generating more than one tree and determining the role of the nodes in the network. We propose a novel permutation-based representation to encode the forest of DRCMSTs. In this new representation, one permutation simultaneously encodes all the trees to be built. We simulate a wide variety of DRCMST problems, which we optimize using eight different evolutionary computation algorithms that encode the individuals of the population using the proposed representation. The algorithms we use are: estimation of distribution algorithm (EDA), generational genetic algorithm (gGA), steady-state genetic algorithm (ssGA), covariance matrix adaptation evolution strategy (CMAES), differential evolution (DE), elitist evolution strategy (ElitistES), non-elitist evolution strategy (NonElitistES), and particle swarm optimization (PSO). The best results are obtained with the estimation of distribution algorithm and both types of genetic algorithms, although the genetic algorithms are significantly faster.
Abstract:
Nowadays, any telecommunications company that owns its own network must face the problem of maintaining it (service assurance). Providing a minimum quality of service to its customers must be one of its main objectives, and this quality should be maintained even when incidents occur in the network. This work aims to solve the problem of prioritizing the order in which the cables, paths, and circuits damaged by an incident are restored within the backbone transport network of a telecommunications operator. After a detailed statement of the problem and of all the factors that influence the decision, a solution based on discrete multicriteria decision methods is attempted first, specifically using ELECTRE I and AHP. Next, a solution based on neural networks (with two different approaches to the problem) is proposed. Finally, a method based on Particle Swarm Optimization (PSO) is used, adapted to an integer permutation (ordering) problem, with a particular way of evaluating the best global position of the swarm. In addition, an overview is given of what a telecommunications operator is: its departments and internal processes, the services it offers, the networks that support them, and the key points to consider when implementing its integrated management information systems.
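A common way to adapt PSO to an integer-permutation (ordering) problem is random-key decoding: each particle lives in a continuous space and is decoded to a permutation by sorting indices. This is a hedged, generic sketch (not the thesis's exact PSO variant), with hypothetical restoration weights and a random-search stand-in for the PSO update loop:

```python
import random

def decode_permutation(position):
    """Random-key decoding: rank the indices by their continuous
    coordinate, turning a real-valued particle into a permutation."""
    return sorted(range(len(position)), key=lambda i: position[i])

# Hypothetical restoration weights: higher weight = restore earlier.
weights = [5, 1, 4, 2, 3]

def cost(perm):
    """Weighted completion order: the item restored at rank r adds
    r * weight. The optimum is descending weight, [0, 2, 4, 3, 1], cost 20."""
    return sum(rank * weights[item] for rank, item in enumerate(perm))

rng = random.Random(1)
best = None
for _ in range(2000):                 # random-search stand-in for the PSO loop
    pos = [rng.random() for _ in weights]
    perm = decode_permutation(pos)
    if best is None or cost(perm) < best[0]:
        best = (cost(perm), perm)
print(best)
```

The appeal of random keys is that the standard continuous PSO velocity and position updates apply unchanged; only the fitness evaluation goes through the decoder.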
Abstract:
We study the Morton-Franks-Williams inequality for closures of simple braids (also known as positive permutation braids). This allows us to prove, in a simple way, that the set of simple braids is an orthonormal basis for the inner product of the Hecke algebra of the braid group defined by Kálmán, who first obtained this result by using an interesting connection with contact topology. We also introduce a new technique to study the Homflypt polynomial for closures of positive braids, namely resolution trees whose leaves are simple braids. In terms of these simple resolution trees, we characterize closed positive braids for which the Morton-Franks-Williams inequality is strict. In particular, we determine explicitly the positive braid words on three strands whose closures have braid index three.
Abstract:
Geographic variation in cancer rates is thought to be the result of two major factors: environmental agents varying spatially, and the attributes, genetic or cultural, of the populations inhabiting the areas studied. These attributes in turn result from the history of the populations in question. We had previously constructed an ethnohistorical database for Europe since 2200 B.C., permitting estimates of the ethnic composition of modern European populations. We were able to show that these estimates correlate with genetic distances. In this study, we wanted to see whether they also correlate with cancer rates. We employed two data sets of cancer mortalities from 42 types of cancer, for the European Economic Community and for Central Europe. We subjected spatial differences in cancer mortalities and genetic, ethnohistorical, and geographic distances to matrix permutation tests to determine the magnitude and significance of their association. Our findings are that distances in cancer mortalities are correlated more with ethnohistorical distances than with genetic distances. The cancer rates may be affected by loci other than the genetic systems available to us, and/or by cultural factors mediated by the ethnohistorical differences. We find it remarkable that patterns of frequently ancient ethnic admixture are still reflected in modern cancer mortalities. Partial correlations with geography suggest that local environmental factors affect the mortalities as well.
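The matrix permutation tests applied above can be illustrated with a generic Mantel-style test; the point coordinates and matrices below are invented:

```python
import random

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Matrix permutation (Mantel-style) test: correlate the upper
    triangles of two distance matrices, then re-correlate under random
    relabelings of one matrix to obtain a permutation p-value."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [dist_a[i][j] for i, j in pairs]

    def corr(perm):
        ys = [dist_b[perm[i]][perm[j]] for i, j in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs)
               * sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den

    rng = random.Random(seed)
    observed = corr(list(range(n)))
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        if corr(p) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical populations on a line; dist_b is a rescaling of dist_a.
pts = [0, 1, 2, 5, 9, 12]
dist_a = [[abs(a - b) for b in pts] for a in pts]
dist_b = [[2 * abs(a - b) for b in pts] for a in pts]
r, p_value = mantel_test(dist_a, dist_b)
print(round(r, 3), p_value < 0.05)
```

Permuting population labels rather than individual entries preserves the internal structure of each matrix, which is what makes the test valid for distance data.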
Abstract:
Early detection is an effective means of reducing cancer mortality. Here, we describe a highly sensitive high-throughput screen that can identify panels of markers for the early detection of solid tumor cells disseminated in peripheral blood. The method is a two-step combination of differential display and high-sensitivity cDNA arrays. In a primary screen, differential display identified 170 candidate marker genes differentially expressed between breast tumor cells and normal breast epithelial cells. In a secondary screen, high-sensitivity arrays assessed expression levels of these genes in 48 blood samples, 22 from healthy volunteers and 26 from breast cancer patients. Cluster analysis identified a group of 12 genes that were elevated in the blood of cancer patients. Permutation analysis of individual genes defined five core genes (P ≤ 0.05, permax test). As a group, the 12 genes generally distinguished accurately between healthy volunteers and patients with breast cancer. Mean expression levels of the 12 genes were elevated in 77% (10 of 13) of untreated invasive cancer patients, and cluster analysis correctly classified volunteers and patients (P = 0.0022, Fisher's exact test). Quantitative real-time PCR confirmed the array results and indicated that the sensitivity of the assay (1:2 × 10^8 transcripts) was sufficient to detect disseminated solid tumor cells in blood. Expression-based blood assays developed with the screening approach described here have the potential to detect and classify solid tumor cells originating from virtually any primary site in the body.
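The per-gene permutation analysis can be illustrated by a simple label-permutation test on hypothetical expression values (an illustration of the idea, not the permax procedure itself):

```python
import random

def permutation_pvalue(case_vals, control_vals, n_perm=2000, seed=0):
    """Label-permutation test for a single gene: the p-value is the
    fraction of label shuffles whose case-minus-control mean difference
    is at least as large as the observed one (with a +1 correction)."""
    rng = random.Random(seed)
    k = len(case_vals)
    observed = sum(case_vals) / k - sum(control_vals) / len(control_vals)
    pooled = case_vals + control_vals
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical expression levels for one candidate marker gene:
p = permutation_pvalue([5.1, 4.8, 5.5, 5.0], [3.9, 4.1, 4.0, 3.8])
print(p < 0.05)  # True: elevation is unlikely under shuffled labels
```

Procedures such as permax additionally adjust for testing many genes at once, which a single-gene p-value like this one does not.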
Abstract:
An artificial DNA bending agent has been designed to assess helix flexibility over regions as small as a protein binding site. Bending was obtained by linking a pair of 15-base-long triple helix forming oligonucleotides (TFOs) with an adjustable polymeric linker. By design, DNA bending was introduced into the double helix within a 10-bp spacer region positioned between the two sites of 15-base triple helix formation. The existence of this bend has been confirmed by circular permutation and phase-sensitive electrophoresis, and the directionality of the bend has been determined as a compression of the minor helix groove. The magnitude of the resulting duplex bend was found to depend on the length of the polymeric linker in a fashion consistent with a simple geometric model. The data suggested that a 50-70 degree bend was achieved by binding of the TFO chimera with the shortest linker span (18 rotatable bonds). Equilibrium analysis showed that the stability of the triple helix possessing a 50-70 degree bend was reduced by less than 1 kcal/mol relative to that of a chimera which did not bend the duplex. Based upon this similarity, it is proposed that duplex DNA may be much more flexible with respect to minor groove compression than previously assumed. This unusual flexibility is shown to be consistent with recent quantitation of protein-induced minor groove bending.
Abstract:
We perform numerical simulations, including parallel tempering, of a four-state Potts glass model with binary random quenched couplings using the JANUS application-oriented computer. We find and characterize a glassy transition, estimating the critical temperature and the values of the critical exponents. Nevertheless, the extrapolation to infinite volume is hampered by strong scaling corrections. We show that there is no ferromagnetic transition in a large temperature range around the glassy critical temperature. We also compare our results with those obtained recently on the “random permutation” Potts glass.
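The parallel tempering used in such simulations hinges on a simple replica-swap acceptance rule; a generic Metropolis sketch (not JANUS code):

```python
import math
import random

def attempt_swap(beta_cold, e_cold, beta_hot, e_hot, rng):
    """Replica-exchange (parallel tempering) acceptance rule: swap the
    configurations held at two inverse temperatures with probability
    min(1, exp(delta)), delta = (beta_cold - beta_hot) * (e_cold - e_hot)."""
    delta = (beta_cold - beta_hot) * (e_cold - e_hot)
    return delta >= 0 or rng.random() < math.exp(delta)

rng = random.Random(0)
# Handing the cold replica a lower-energy configuration is always accepted:
print(attempt_swap(beta_cold=1.0, e_cold=5.0, beta_hot=0.5, e_hot=2.0, rng=rng))  # True
```

Sweeping this swap move across neighboring temperatures lets cold replicas escape local minima by briefly borrowing configurations equilibrated at higher temperature, which is what makes glassy systems tractable.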
Abstract:
We introduce a general class of su(1|1) supersymmetric spin chains with long-range interactions which includes as particular cases the su(1|1) Inozemtsev (elliptic) and Haldane-Shastry chains, as well as the XX model. We show that this class of models can be fermionized with the help of the algebraic properties of the su(1|1) permutation operator and take advantage of this fact to analyze their quantum criticality when a chemical potential term is present in the Hamiltonian. We first study the low-energy excitations and the low-temperature behavior of the free energy, which coincides with that of a (1+1)-dimensional conformal field theory (CFT) with central charge c=1 when the chemical potential lies in the critical interval (0,E(π)), E(p) being the dispersion relation. We also analyze the von Neumann and Rényi ground state entanglement entropies, showing that they exhibit the logarithmic scaling with the size of the block of spins characteristic of a one-boson (1+1)-dimensional CFT. Our results thus show that the models under study are quantum critical when the chemical potential belongs to the critical interval, with central charge c=1. From the analysis of the fermion density at zero temperature, we also conclude that there is a quantum phase transition at both ends of the critical interval. This is further confirmed by the behavior of the fermion density at finite temperature, which is studied analytically (at low temperature), as well as numerically for the su(1|1) elliptic chain.
Abstract:
Subpixel methods increase the accuracy and efficiency of image detectors, processing units, and algorithms, and provide very cost-effective systems for object tracking. A recently proposed method permits micropixel and submicropixel accuracies, provided certain design constraints on the target are met. In this paper, we explore the use of Costas arrays (permutation matrices with ideal auto-ambiguity properties) for the design of such targets.
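The defining auto-ambiguity property of a Costas array is that all displacement vectors between pairs of its points are distinct; this is easy to check directly for a 0-indexed permutation (an illustrative sketch, not from the paper):

```python
def is_costas(perm):
    """Check the Costas property for a 0-indexed permutation: every
    displacement vector between two points (i, perm[i]) must be
    distinct, giving the ideal (thumbtack) auto-ambiguity function."""
    n = len(perm)
    seen = set()
    for i in range(n):
        for j in range(i + 1, n):
            vec = (j - i, perm[j] - perm[i])
            if vec in seen:
                return False
            seen.add(vec)
    return True

print(is_costas([0, 1, 3, 2]))  # True: an order-4 Costas array
print(is_costas([0, 1, 2, 3]))  # False: vector (1, 1) repeats
```

Because no displacement repeats, any shifted copy of the pattern agrees with the original in at most one point, which is the correlation property exploited for target design.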
Abstract:
An optimized structure for an educational program, consisting of a set of interconnected educational objects, is obtained by solving the problem of optimally partitioning an acyclic weighted graph. The condition for preserving acyclicity in the subgraphs is formulated, and a quantitative assessment of the candidate partitions is given. An original algorithm for finding a quasi-optimal partition is proposed, based on the genetic algorithm scheme with chromosomes encoded as permutations. An object-oriented implementation of the algorithm in C++ is described, and the results of numerical experiments are presented.
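Permutation-encoded chromosomes need recombination operators that preserve validity; order crossover (OX) is a standard choice, sketched here in Python for brevity rather than the paper's C++:

```python
import random

def order_crossover(p1, p2, rng):
    """Order crossover (OX): copy a random slice from parent 1, then
    fill the remaining positions with the missing genes in the order
    they appear in parent 2, so the child is always a permutation."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]              # inherited slice
    fill = [g for g in p2 if g not in child]  # remaining genes, p2 order
    it = iter(fill)
    for i in range(n):
        if child[i] is None:
            child[i] = next(it)
    return child

rng = random.Random(42)
child = order_crossover([0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 4, 2], rng)
print(sorted(child) == [0, 1, 2, 3, 4, 5])  # True: always a valid permutation
```

OX preserves a contiguous block from one parent and the relative order of the remaining genes from the other, so no repair step is needed after crossover.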
Abstract:
Originally presented as the author's thesis, University of Illinois at Urbana-Champaign.