481 results for Hinkley, Sherman
Abstract:
Nowadays, with the continuous and rapid evolution of information technologies and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such an enormous amount of data usually cannot be done manually and requires suitable machine learning and data mining techniques. Classification is one of the most important of these techniques and has been applied successfully in several areas. In general, classification consists of two main steps: first, learning a classification model, or classifier, from a training data set, and second, classifying new data instances using the learned classifier. Classification is supervised when all labels are present in the training data (that is, fully labeled data), semi-supervised when only some labels are known (partially labeled data), and unsupervised when all labels are absent from the training data (unlabeled data). Beyond this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; it can also be categorized as stationary or time-changing depending on the characteristics of the data and the underlying rate of change. Throughout this thesis, we address the classification problem from three different perspectives, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional time-changing classification, and supervised multi-dimensional time-changing classification. To carry out this task, we mainly used Bayesian classifiers as models. 
The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian classifiers from stationary data. The methods are proposed from two different points of view. The first method, named CB-MBC, is based on a greedy forward wrapper strategy for variable selection, while the second, named MB-MBC, is a variable-filtering strategy with a constraint-based approach built on the Markov blanket. Both methods have been applied to two important real-world problems, namely, the prediction of reverse transcriptase and protease inhibitors for human immunodeficiency virus type 1 (HIV-1) infection, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC with state-of-the-art multi-dimensional classification methods, as well as with methods commonly used to solve the Parkinson's disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. In both applications, the results were promising in terms of classification accuracy and in terms of the analysis of the graphical structures, which identify known and novel interactions among the variables. The second contribution, concerning the semi-supervised uni-dimensional time-changing classification problem, consists of a new method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and in their concept drift. 
That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model obsolete and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of change: in the predictors, in the class posterior, or in both. Then, if any change is detected, a new classification model is learned using the EM algorithm; otherwise, the current classification model is kept unmodified. CPL-DS is general, since it can be applied to several classification models. Using two different models, the naive Bayes classifier and logistic regression, CPL-DS was tested on synthetic data streams and was also applied to the real-world problem of malware detection, in which newly received files must be continuously classified as malware or goodware. The experimental results show that our method is effective at detecting different kinds of change in partially labeled data streams and also achieves good classification accuracy. Finally, the third contribution, on the supervised multi-dimensional time-changing classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood as a metric together with the Page-Hinkley test. Then, if a concept drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian classifier. The experimental study, carried out on synthetic multi-dimensional data streams, indicates the merits of the proposed adaptive methods. 
ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important of these techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new, unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Beyond this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; it can also be categorized as stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. 
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. 
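The drift check described above can be illustrated with a minimal sketch. It is not the CPL-DS procedure itself: it assumes a single discrete feature, Laplace-smoothed empirical distributions, and a pooled bootstrap to estimate how large the Kullback-Leibler divergence $D_{KL}(P\,\|\,Q)=\sum_v P(v)\log\frac{P(v)}{Q(v)}$ gets when both batches come from the same distribution. The function names, the smoothing constant `eps`, and the significance level `alpha` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def kl_divergence(p_sample, q_sample, support, eps=0.5):
    """KL(P || Q) between smoothed empirical distributions of two samples."""
    p_counts, q_counts = Counter(p_sample), Counter(q_sample)
    p_total = len(p_sample) + eps * len(support)
    q_total = len(q_sample) + eps * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + eps) / p_total  # smoothing keeps p and q > 0
        q = (q_counts[value] + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def drift_detected(old, new, n_boot=500, alpha=0.05, seed=0):
    """Compare the observed KL with a bootstrap estimate of its no-drift range.

    Assumption (not from the source): under no drift both batches share one
    distribution, so resampling from the pooled data approximates the null.
    """
    rng = random.Random(seed)
    support = sorted(set(old) | set(new))
    observed = kl_divergence(old, new, support)
    pooled = list(old) + list(new)
    null_values = []
    for _ in range(n_boot):
        a = [rng.choice(pooled) for _ in old]
        b = [rng.choice(pooled) for _ in new]
        null_values.append(kl_divergence(a, b, support))
    null_values.sort()
    # Approximate (1 - alpha) quantile of the bootstrap KL values.
    critical = null_values[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return observed > critical, observed, critical


if __name__ == "__main__":
    old_batch = ["a"] * 150 + ["b"] * 50
    new_batch = ["a"] * 50 + ["b"] * 150
    print(drift_detected(old_batch, new_batch))
```

In a streaming setting, a check of this kind would be run on consecutive batches, and a positive result would trigger relearning of the classifier; a negative result would leave the current model in place.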
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective at detecting different kinds of drift in partially labeled data streams and achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
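The monitoring step mentioned above can be sketched as follows. This is a generic Page-Hinkley test configured to flag a drop in the mean of a monitored score, such as an average log-likelihood per batch; it is not the thesis implementation. The class name, the tolerance `delta`, and the alarm `threshold` are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for a downward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=20.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation from the mean
        self.cum_max = 0.0          # largest cumulative deviation so far

    def update(self, x):
        """Feed one observation; return True if a drift alarm is raised."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # A sustained drop in x pulls cum far below its running maximum.
        return self.cum_max - self.cum > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in the stream, or None."""
    detector = PageHinkley(**kwargs)
    for t, x in enumerate(stream):
        if detector.update(x):
            return t
    return None


if __name__ == "__main__":
    # Hypothetical scores: the average log-likelihood drops at t = 200.
    scores = [-1.0] * 200 + [-3.0] * 200
    print(first_alarm(scores))
```

On an alarm, a locally adaptive method would update only the parts of the model affected by the change, while a globally adaptive one would relearn the model; in either case the detector would typically be reset afterwards.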
Abstract:
Gaines studied History and Education at Lincoln and was frequently seen in Memorial Hall chatting with his mentors in the History Department, Drs. W. Sherman Savage and Lorenzo Greene, about his future after graduation.
Abstract:
The Founding of Lincoln Institute -- Period of Development -- The Administrations of Smith, Mitchell, and Clayton -- The Grand Old Man -- The Period from 1989-1898 -- The Period of the Presidents -- Lincoln Institute at the Turn of the Century -- The Period of Opposition -- From Institute to University -- A New President with a New Program -- Growth Despite Opposition -- Lincoln University since 1931.
Abstract:
Transgenic expression of the influenza virus hemagglutinin (HA) in the pancreatic islet β cells of InsHA mice leads to peripheral tolerance of HA-specific T cells. To examine the onset of tolerance, InsHA mice were immunized with influenza virus A/PR/8 at different ages, and the presence of nontolerant T cells was determined by the induction of autoimmune diabetes. The data revealed a neonatal period wherein T cells were not tolerant and influenza virus infection led to HA-specific β cell destruction and autoimmune diabetes. The ability to induce autoimmunity gradually waned, such that adult mice were profoundly tolerant to viral HA and were protected from diabetes. Because cross-presentation of islet antigens by professional antigen-presenting cells had been reported to induce peripheral tolerance, the temporal relationship between tolerance induction and activation of HA-specific T cells in the lymph nodes draining the pancreas was examined. In tolerant adult mice, but not in 1-week-old neonates, activation and proliferation of HA-specific CD8+ T cells occurred in the pancreatic lymph nodes. Thus, lack of tolerance in the perinatal period correlated with lack of activation of antigen-specific CD8+ T cells. This work provides evidence for the developmental regulation of peripheral tolerance induction.
Abstract:
Although the CLN3 gene for Batten disease, the most common inherited neurovisceral storage disease of childhood, was identified in 1995, the function of the corresponding protein still remains elusive. We previously cloned the Saccharomyces cerevisiae homologue to the human CLN3 gene, designated BTN1, which is not essential and whose product is 39% identical and 59% similar to Cln3p. We report that btn1-Δ deletion yeast strains are more resistant to d-(−)-threo-2-amino-1-[p-nitrophenyl]-1,3-propanediol (denoted ANP), a phenotype that is complemented in yeast by the human CLN3 gene. Furthermore, the severity of Batten disease in humans and the degree of ANP resistance in yeast are related when the equivalent amino acid replacements in Cln3p and Btn1p are compared. These results indicate that yeast can be used as a model for the study of Batten disease.
Abstract:
When one nerve cell acts on another, its postsynaptic effect can vary greatly. In sensory systems, inputs from “drivers” can be differentiated from those of “modulators.” The driver can be identified as the transmitter of receptive field properties; the modulator can be identified as altering the probability of certain aspects of that transmission. Where receptive fields are not available, the distinction is more difficult and currently is undefined. We use the visual pathways, particularly the thalamic geniculate relay for which much relevant evidence is available, to explore ways in which drivers can be distinguished from modulators. The extent to which the distinction may apply first to other parts of the thalamus and then, possibly, to other parts of the brain is considered. We suggest the following distinctions: Cross-correlograms from driver inputs have sharper peaks than those from modulators; there are likely to be few drivers but many modulators for any one cell; and drivers are likely to act only through ionotropic receptors having a fast postsynaptic effect whereas modulators also are likely to activate metabotropic receptors having a slow and prolonged postsynaptic effect.
Abstract:
In a survey of microbial systems capable of generating unusual metabolite structural variability, Streptomyces venezuelae ATCC 15439 is notable in its ability to produce two distinct groups of macrolide antibiotics. Methymycin and neomethymycin are derived from the 12-membered ring macrolactone 10-deoxymethynolide, whereas narbomycin and pikromycin are derived from the 14-membered ring macrolactone, narbonolide. This report describes the cloning and characterization of the biosynthetic gene cluster for these antibiotics. Central to the cluster is a polyketide synthase locus (pikA) that encodes a six-module system comprised of four multifunctional proteins, in addition to a type II thioesterase (TEII). Immediately downstream is a set of genes for desosamine biosynthesis (des) and macrolide ring hydroxylation. The study suggests that Pik TEII plays a role in forming a metabolic branch through which polyketides of different chain length are generated, and the glycosyl transferase (encoded by desVII) has the ability to catalyze glycosylation of both the 12- and 14-membered ring macrolactones. Moreover, the pikC-encoded P450 hydroxylase provides yet another layer of structural variability by introducing regiochemical diversity into the macrolide ring systems. The data support the notion that the architecture of the pik gene cluster as well as the unusual substrate specificity of particular enzymes contributes to its ability to generate four macrolide antibiotics.
Abstract:
The structural and functional organization of the Cct complex was addressed by genetic analyses of subunit interactions and catalytic cooperativity among five of the eight different essential subunits, Cct1p–Cct8p, in the yeast Saccharomyces cerevisiae. The cct1–1, cct2–3, and cct3–1 alleles, containing mutations at the conserved putative ATP-binding motif, GDGTT, are cold-sensitive, whereas single and multiple replacements of the corresponding motif in Cct6p are well tolerated by the cell. We demonstrate herein that cct6–3 (L19S), but not the paralog cct1–5 (R26I), specifically suppresses the cct1–1, cct2–3, and cct3–1 alleles, and that this suppression can be modulated by mutations in a putative phosphorylation motif, RXS, and the putative ATP-binding pocket of Cct6p. Our results suggest that the Cct ring is comprised of a single hetero-oligomer containing eight subunits of differential functional hierarchy, in which catalytic cooperativity of ATP-binding/hydrolysis takes place in a sequential manner different from the concerted cooperativity proposed for GroEL.