935 results for reverse logistic
Abstract:
Data mining, and decision trees in particular, has been used in many fields (engineering, medicine, banking and finance, among others) to analyze a target variable through decision variables. This article examines the use of the decision tree algorithm as a tool in territorial logistics planning. The resulting decision tree estimates population density indexes for territorial units with similar logistics characteristics in a concise and practical way.
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important of these techniques and has been successfully applied in several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from the available training data, and second, classify new, unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), or as stationary or streaming depending on the characteristics of the data and the rate of change underlying it. In this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data.
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged.
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective at detecting different kinds of drift in partially labeled data streams and also achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. The experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
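The drift-monitoring step mentioned in this abstract can be illustrated with a minimal sketch of the Page-Hinkley test. This is a generic textbook form of the test, not the thesis's implementation: the class name `PageHinkley`, the helper `first_alarm`, and the values of `delta` and `threshold` are all assumptions chosen for illustration. The detector below signals an increase in the monitored value, so a falling average log-likelihood would be fed in with its sign flipped.

```python
class PageHinkley:
    """Generic Page-Hinkley detector for an upward shift in a stream's mean.

    delta     -- tolerated magnitude of change (hypothetical default)
    threshold -- alarm level, often written lambda (hypothetical default)
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Toy stream standing in for a negative average log-likelihood score:
# stable for 50 steps, then shifted upward (the model fits the data worse).
scores = [0.0] * 50 + [1.0] * 50
print(first_alarm(scores))
```

In a setting like the one described, an alarm would be the trigger for either the local adaptation or the relearning from scratch; how the score is computed per batch is left out of this sketch.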
Abstract:
We study the dynamics of the bistable logistic map with delayed feedback under the influence of white Gaussian noise and periodic modulation applied to the variable. This system may serve as a model of population dynamics under finite resources in a noisy environment with seasonal fluctuations. While a very small amount of noise has no effect on the global structure of the coexisting attractors in phase space, intermediate noise totally eliminates one of the attractors. Slow periodic modulation enhances the attractor annihilation.
Abstract:
Predicting failures in a distributed system from previous events through logistic regression is a standard approach in the literature. This technique is unreliable, though, in two situations: in the prediction of rare events, which do not appear in sufficient proportion for the algorithm to capture them, and in environments with too many variables, where logistic regression tends to overfit, while manually selecting a subset of variables to build the model is error-prone. In this paper, we solve an industrial research case that presented this situation with a combination of elastic net logistic regression (a method that automatically selects useful variables), a cross-validation process on top of it, and a rare-events prediction technique to reduce computation time. This process provides two layers of cross-validation that automatically obtain the optimal model complexity and the optimal model parameter values, while ensuring that even rare events are correctly predicted with a small number of training instances. We tested this method against real industrial data, obtaining a total of 60 out of 80 possible models with a 90% average model accuracy.
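To make the variable-selection effect of the elastic net concrete, here is a minimal sketch of elastic net logistic regression fitted by proximal gradient descent. The abstract does not say which solver, penalty weights, or data layout the authors used, so the optimizer, the function names (`fit_elastic_net_logreg`, `soft_threshold`), the parameters `lam`, `alpha`, `lr`, `n_iter`, and the toy data are all assumptions. The cross-validation layers and the rare-events technique are not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logreg(X, y, lam=0.1, alpha=0.5, lr=0.5, n_iter=500):
    """Minimize mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).

    The intercept b is left unpenalized. All defaults are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i under the current weights.
        resid = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            for row, yi in zip(X, y)
        ]
        b -= lr * sum(resid) / n
        for j in range(d):
            # Gradient of the smooth part: log-loss plus the L2 term.
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * w[j]
            # Gradient step, then soft-thresholding for the L1 term.
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * alpha)
    return w, b


# Toy data: the first feature determines the label, the second is only
# weakly related to it.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 0.3],
     [-1.0, 1.0], [-1.0, -1.0], [-1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_elastic_net_logreg(X, y)
print(w, b)
```

On this toy data the L1 part of the penalty keeps the weight of the second feature at exactly zero while the first stays positive, which is the automatic variable selection the abstract refers to. In practice `lam` and `alpha` would be chosen by cross-validation rather than fixed by hand.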
Abstract:
Like all hyperthermophiles tested so far, the bacterium Thermotoga maritima contains a reverse gyrase. Here we show that it also contains a DNA gyrase. The genes top2A and top2B encoding the two subunits of a DNA gyrase-like enzyme have been cloned and sequenced. The Top2A (type II DNA topoisomerase A protein) is more similar to GyrA (DNA gyrase A protein) than to ParC [topoisomerase IV (Topo IV) C protein]. The difference is especially striking at the C-terminal domain, which differentiates DNA gyrases from Topo IV. DNA gyrase activity was detected in T. maritima and purified to homogeneity using a novobiocin-Sepharose column. This hyperthermophilic DNA gyrase has an optimal activity around 82–86°C. In contrast to plasmids from hyperthermophilic archaea, which range from relaxed to positively supercoiled, we found that the plasmid pRQ7 from Thermotoga sp. RQ7 is negatively supercoiled. pRQ7 became positively supercoiled after addition of novobiocin to cell cultures, indicating that its negative supercoiling is due to the DNA gyrase of the host strain. The findings concerning DNA gyrase and negative supercoiling in Thermotogales call into question the role of reverse gyrase in hyperthermophiles.
Abstract:
HIV-1 reverse transcriptase (RT) catalyzes the synthesis of DNA from DNA or RNA templates. During this process, it must transfer its primer from one template to another RNA or DNA template. Binary complexes made of RT and a primer/template bind an additional single-stranded RNA molecule of the same nucleotide sequence as that of the DNA or RNA template. The additional RNA strand leads to a 10-fold decrease of the off-rate constant, koff, of RT from a primer/DNA template. In a binary complex of RT and a primer/template, the primer can be cross-linked to both the p66 and p51 subunits. Depending on the location of the photoreactive group in the primer, the distribution of the cross-linked primers between subunits is dependent on the nature of the template and of the additional single-stranded molecule. Greater cross-linking of the primer to p51 occurs with DNA templates, whereas cross-linking to p66 predominates with RNA templates. Excess single-stranded DNA shifts the distribution of cross-linking from p66 to p51 with RNA templates, and excess single-stranded RNA shifts the cross-linking from p51 to p66 with DNA templates. RT thus uses two primer/template binding modes depending on the nature of the template.
Abstract:
Exposure to 3TC of HIV-1 mutant strains containing non-nucleoside reverse transcriptase inhibitor (NNRTI)-specific mutations in their reverse transcriptase (RT) easily selected for double-mutant viruses that had acquired the characteristic 184-Ile mutation in their RT in addition to the NNRTI-specific mutations. Conversely, exposure of 3TC-resistant 184-Val mutant HIV-1 strains to nine different NNRTIs resulted in the rapid emergence of NNRTI-resistant virus strains at a time that was not more delayed than when wild-type HIV-1(IIIB) was exposed to the same compounds. The RTs of these resistant virus strains had acquired the NNRTI-characteristic mutations in addition to the preexisting 184-Val mutation. Surprisingly, when the 184-Ile mutant HIV-1 was exposed to a variety of NNRTIs, the 188-His mutation invariably occurred concomitantly with the 184-Ile mutation in the HIV-1 RT. Breakthrough of this double-mutant virus was markedly accelerated as compared with the mutant virus selected from the wild-type or 184-Val mutant HIV-1 strain. The double (184-Ile + 188-His) mutant virus showed a much more profound resistance profile against the NNRTIs than the 188-His HIV-1 mutant. In contrast with the sequential chemotherapy, concomitant combination treatment of HIV-1-infected cells with 3TC and a variety of NNRTIs resulted in a dramatic delay of virus breakthrough and resistance development.
Abstract:
A critical requirement for integration of retroviruses, other than HIV and possibly related lentiviruses, is the breakdown of the nuclear envelope during mitosis. Nuclear envelope breakdown occurs during mitotic M-phase, the envelope reforming immediately after cell division, thereby permitting the translocation of the retroviral preintegration complex into the nucleus and enabling integration to proceed. In the oocyte, during metaphase II (MII) of the second meiosis, the nuclear envelope is also absent and the oocyte remains in MII arrest for a much longer period of time compared with M-phase in a somatic cell. Pseudotyped replication-defective retroviral vector was injected into the perivitelline space of bovine oocytes during MII. We show that reverse-transcribed gene transfer can take place in an oocyte in MII arrest of meiosis, leading to production of offspring, the majority of which are transgenic. We discuss the implications of this mechanism both as a means of production of transgenic livestock and as a model for naturally occurring recursive transgenesis.
Abstract:
HIV-1 replication is inhibited by the incorporation of chain-terminating nucleotides at the 3′ end of the growing DNA chain. Here we show a nucleotide-dependent reaction catalyzed by HIV-1 reverse transcriptase that can efficiently remove the chain-terminating residue, yielding an extendible primer terminus. Radioactively labeled 3′-terminal residue from the primer can be transferred into a product that is resistant to calf intestinal alkaline phosphatase and sensitive to cleavage by snake venom phosphodiesterase. The products formed from different nucleotide substrates have unique electrophoretic migrations and have been identified as dinucleoside tri- or tetraphosphates. The reaction is inhibited by dNTPs that are complementary to the next position on the template (Ki ≈ 5 μM), suggesting competition between dinucleoside polyphosphate synthesis and DNA polymerization. Dinucleoside polyphosphate synthesis was inhibited by an HIV-1 specific non-nucleoside inhibitor and was absent in mutant HIV-1 reverse transcriptase deficient in polymerase activity, indicating that this activity requires a functional polymerase active site. We suggest that dinucleoside polyphosphate synthesis occurs by transfer of the 3′ nucleotide from the primer to the pyrophosphate moiety in the nucleoside di- or triphosphate substrate through a mechanism analogous to pyrophosphorolysis. Unlike pyrophosphorolysis, however, the reaction is nucleotide-dependent, is resistant to pyrophosphatase, and produces dinucleoside polyphosphates. Because it occurs at physiological concentrations of ribonucleoside triphosphates, this reaction may determine the in vivo activity of many nucleoside antiretroviral drugs.
Abstract:
We previously demonstrated that hybrid retrotransposons composed of the yeast Ty1 element and the reverse transcriptase (RT) of HIV-1 are active in the yeast Saccharomyces cerevisiae. The RT activity of these hybrid Ty1/HIV-1 (his3AI/AIDS RT; HART) elements can be monitored by using a simple genetic assay. HART element reverse transcription depends on both the polymerase and RNase H domains of HIV-1 RT. Here we demonstrate that the HART assay is sensitive to inhibitors of HIV-1 RT. (−)-(S)-8-Chloro-4,5,6,7-tetrahydro-5-methyl-6-(3-methyl-2-butenyl)imidazo[4,5,1-jk][1,4]-benzodiazepin-2(1H)-thione monohydrochloride (8 Cl-TIBO), a well characterized non-nucleoside RT inhibitor (NNRTI) of HIV-1 RT, blocks propagation of HART elements. HART elements that express NNRTI-resistant RT variants of HIV-1 are insensitive to 8 Cl-TIBO, demonstrating the specificity of inhibition in this assay. HART elements carrying NNRTI-resistant variants of HIV-1 RT can be used to identify compounds that are active against drug-resistant viruses.
Abstract:
Mutations in the human presenilin genes PS1 and PS2 cause early-onset Alzheimer’s disease. Studies in Caenorhabditis elegans and in mice indicate that one function of presenilin genes is to facilitate Notch-pathway signaling. Notably, mutations in the C. elegans presenilin gene sel-12 reduce signaling through an activated version of the Notch receptor LIN-12. To investigate the function of a second C. elegans presenilin gene hop-1 and to examine possible genetic interactions between hop-1 and sel-12, we used a reverse genetic strategy to isolate deletion alleles of both loci. Animals bearing both hop-1 and sel-12 deletions displayed new phenotypes not observed in animals bearing either single deletion. These new phenotypes—germ-line proliferation defects, maternal-effect embryonic lethality, and somatic gonad defects—resemble those resulting from a reduction in signaling through the C. elegans Notch receptors GLP-1 and LIN-12. Thus SEL-12 and HOP-1 appear to function redundantly in promoting Notch-pathway signaling. Phenotypic analyses of hop-1 and sel-12 single and double mutant animals suggest that sel-12 provides more presenilin function than does hop-1.
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.
Abstract:
The (β/α)8 barrel is the most commonly occurring fold among protein catalysts. To lay a groundwork for engineering novel barrel proteins, we investigated the amino acid sequence restrictions at 182 structural positions of the prototypical (β/α)8 barrel enzyme triosephosphate isomerase. Using combinatorial mutagenesis and functional selection, we find that turn sequences, α-helix capping and stop motifs, and residues that pack the interface between β-strands and α-helices are highly mutable. Conversely, any mutation of residues in the central core of the β-barrel, β-strand stop motifs, and a single buried salt bridge between amino acids R189 and D227 substantially reduces catalytic activity. Four positions are effectively immutable: conservative single substitutions at these four positions prevent the mutant protein from complementing a triosephosphate isomerase knockout in Escherichia coli. At 142 of the 182 positions, mutation to at least one amino acid of a seven-letter amino acid alphabet produces a triosephosphate isomerase with wild-type activity. Consequently, it seems likely that (β/α)8 barrel structures can be encoded with a subset of the 20 amino acids. Such simplification would greatly decrease the computational burden of (β/α)8 barrel design.
Abstract:
Recent evidence suggests that the Myc and Mad1 proteins are implicated in the regulation of the gene encoding the human telomerase reverse transcriptase (hTERT), the catalytic subunit of telomerase. We have analyzed the in vivo interaction between endogenous c-Myc and Mad1 proteins and the hTERT promoter in HL60 cells with the use of the chromatin immunoprecipitation assay. The E-boxes at the hTERT proximal promoter were occupied in vivo by c-Myc in exponentially proliferating HL60 cells but not in cells induced to differentiate by DMSO. In contrast, Mad1 protein was induced and bound to the hTERT promoter in differentiated HL60 cells. Concomitantly, the acetylation of the histones at the promoter was significantly reduced. These data suggest that the reciprocal E-box occupancy by c-Myc and Mad1 is responsible for activation and repression of the hTERT gene in proliferating and differentiated HL60 cells, respectively. Furthermore, the histone deacetylase inhibitor trichostatin A inhibited deacetylation of histones at the hTERT promoter and attenuated the repression of hTERT transcription during HL60 cell differentiation. In addition, trichostatin A treatment activated hTERT transcription in resting human lymphocytes and fibroblasts. Taken together, these results indicate that acetylation/deacetylation of histones is operative in the regulation of hTERT expression.