206 results for Imputation déterministe
Abstract:
The objective of this thesis by articles is to modestly present a few steps along the path that will (we hope) lead to a general solution to the problem of artificial intelligence. This thesis contains four articles, each presenting a different new method of perceptual inference using machine learning and, more particularly, deep neural networks. Each of these papers demonstrates the usefulness of its proposed method on a computer vision task. These methods are applicable in a more general context, and in some cases they have been applied elsewhere, but this is not covered within the scope of this thesis. In the first article, we present two new variational inference algorithms for the generative image model known as spike-and-slab sparse coding (S3C). These faster inference methods allow us to use S3C models of much larger size than before. We show that they are better at extracting feature detectors when very few labeled examples are available for training. Starting from an S3C model, we then build a deep architecture, the partially directed deep Boltzmann machine (PD-DBM). This model was designed to simplify the training of deep Boltzmann machines, which normally require a greedy layer-wise pretraining phase. That problem is solved to some extent, but the inference cost of the new model remains too high for practical use. In the second article, we return to the problem of jointly training deep Boltzmann machines. This time, instead of changing the model family, we introduce a new training criterion that gives rise to multi-prediction deep Boltzmann machines (MP-DBMs). MP-DBMs can be trained in a single stage and achieve better classification accuracy than classical DBMs. They also train with standard variational methods instead of requiring a discriminative classifier to reach good classification accuracy. One drawback of such models is their inability to generate samples, but this is not too serious, since the classification performance of deep Boltzmann machines is no longer a priority given the latest advances in supervised learning. Despite this, MP-DBMs remain interesting because they can perform certain tasks that purely supervised models cannot, such as classifying incomplete data or intelligently filling in the missing information in such incomplete data. The work presented in this thesis took place in the middle of a period of major transformation in the field of deep neural network learning, triggered by Geoffrey Hinton's discovery of the dropout algorithm. Dropout makes purely supervised training of feed-forward architectures possible without exposure to the danger of overfitting. The third article presented in this thesis introduces a new activation function specially designed to work with the dropout algorithm.
This activation function, called maxout, enables the use of multi-channel pooling in a purely supervised learning setting. We show that several object recognition tasks are better accomplished using maxout. Finally, we present a real industrial use case: the transcription of multi-digit house numbers. By combining maxout with a new kind of output layer for convolutional neural networks, we show that it is possible to reach an accuracy comparable to that of humans on a tough dataset made up of photos taken by Google's cars. This system has been successfully deployed at Google to read approximately one hundred million house numbers.
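To make the maxout idea above concrete, here is a minimal NumPy sketch (an illustration, not the thesis's code): each output unit takes the maximum over k affine projections of its input, which is the multi-channel pooling the abstract describes. Shapes and names are illustrative assumptions.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each of the d_out units returns the maximum
    of its k affine pieces, giving a learned piecewise-linear activation.

    x: (batch, d_in) inputs
    W: (d_in, d_out, k) weights, k linear pieces per output unit
    b: (d_out, k) biases
    """
    z = np.einsum('nd,dok->nok', x, W) + b  # (batch, d_out, k)
    return z.max(axis=-1)                   # (batch, d_out)

# Toy usage: 4 inputs, 3 maxout units with k = 2 pieces each.
rng = np.random.default_rng(0)
h = maxout(rng.normal(size=(5, 4)),
           rng.normal(size=(4, 3, 2)),
           np.zeros((3, 2)))
print(h.shape)  # (5, 3)
```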
Abstract:
This research project was carried out in collaboration with FPInnovations. Part of the work on the Chilean harvesting problem was conducted at the Instituto Sistemas Complejos de Ingeniería (ISCI) in Santiago, Chile.
Abstract:
Learning Disability (LD) is a classification comprising several disorders in which a child has difficulty learning in a typical manner, usually caused by one or more unknown factors. LD affects about 15% of children enrolled in schools. Predicting learning disability is a complicated task, since identifying LD from diverse features or signs is itself a complicated problem. There is no cure for learning disabilities, and they are life-long. The problems of children with specific learning disabilities have long been a cause of concern to parents and teachers. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and the dimensionality reduction method for the performance of fuzzy and neuro-fuzzy classifiers, with specific emphasis on the prediction of learning disabilities in school-age children. In the basic assessment method for prediction of LD, checklists are generally used, and the data cases thus collected depend heavily on the mood of the children and may contain redundant as well as missing values. Therefore, in this study, we propose a new correlation-based algorithm for imputing the missing values, together with Principal Component Analysis (PCA) for removing irrelevant attributes. We found that the preprocessing methods we applied improve the quality of the data and thereby increase the accuracy of the classifiers. The system is implemented in MathWorks MATLAB 7.10. The results obtained from this study illustrate that the developed missing value imputation method contributes substantially to the prediction system and is capable of improving the performance of a classifier.
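The abstract does not spell out the correlation-based imputation algorithm, so the following Python sketch only illustrates the general idea under stated assumptions: each incomplete feature is imputed by a simple linear regression on the complete feature with which it is most strongly correlated. The function name and selection rule are hypothetical.

```python
import numpy as np
import pandas as pd

def correlation_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Impute each missing entry from the feature most correlated with
    its column, via a univariate linear regression (illustrative sketch)."""
    out = df.copy()
    corr = df.corr().abs()  # pairwise-complete absolute correlations
    for col in df.columns[df.isna().any()]:
        donor = corr[col].drop(col).idxmax()      # best-correlated predictor
        mask = df[col].isna() & df[donor].notna()
        both = df[[donor, col]].dropna()
        slope, intercept = np.polyfit(both[donor], both[col], 1)
        out.loc[mask, col] = intercept + slope * df.loc[mask, donor]
    return out
```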
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as "a trace too small to measure", it seems reasonable to replace it by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts (and thus the metric properties) should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, a substitution method for missing values in compositional data sets is introduced in the same paper.
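For concreteness, a minimal Python sketch of the multiplicative replacement described above, assuming a composition closed to 1 and a single small value delta for every rounded zero:

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace rounded zeros by delta and rescale the non-zero parts
    multiplicatively so the composition still sums to 1; ratios between
    non-zero parts (hence their covariance structure) are unchanged."""
    x = np.asarray(x, dtype=float)
    zero = (x == 0)
    return np.where(zero, delta, x * (1.0 - delta * zero.sum()))

print(multiplicative_replacement([0.6, 0.4, 0.0], 0.01))
# [0.594 0.396 0.01 ]  -- still sums to 1, and 0.594/0.396 == 0.6/0.4
```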
Abstract:
All of the imputation techniques usually applied for replacing values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm, applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set and the results are compared with the usual imputation techniques.
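The abstract does not give the algorithm itself; the Python sketch below shows the core computation such a modification rests on, namely one EM loop for left-censored normal data in a single (e.g. alr-transformed) coordinate. The univariate restriction and all names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def em_censored_normal(obs, n_cens, c, tol=1e-8, max_iter=500):
    """EM estimates of (mu, sigma) when n_cens values lie below the
    (transformed) detection limit c and obs holds the observed values."""
    mu, sigma = obs.mean(), obs.std() + 1e-9
    n = len(obs) + n_cens
    for _ in range(max_iter):
        # E-step: moments of a normal truncated to (-inf, c)
        a = (c - mu) / sigma
        lam = norm.pdf(a) / norm.cdf(a)         # inverse Mills ratio
        e1 = mu - sigma * lam                   # E[X | X < c]
        v = sigma**2 * (1 - a * lam - lam**2)   # Var[X | X < c]
        # M-step: refit mu, sigma from observed data + expected moments
        mu_new = (obs.sum() + n_cens * e1) / n
        ss = ((obs - mu_new)**2).sum() + n_cens * (v + (e1 - mu_new)**2)
        sigma_new = np.sqrt(ss / n)
        if abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol:
            break
        mu, sigma = mu_new, sigma_new
    return mu, sigma, e1  # e1 doubles as the imputed value for censored data
```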
Abstract:
Low concentrations of elements in geochemical analyses have the peculiarity of being compositional data and, for a given level of significance, are likely to be beyond the capability of laboratories to distinguish between minute concentrations and complete absence, thus preventing laboratories from reporting extremely low concentrations of the analyte. Instead, what is reported is the detection limit, which is the minimum concentration that conclusively differentiates between presence and absence of the element. A spatially distributed exhaustive sample is employed in this study to generate unbiased sub-samples, which are further censored to observe the effect that different detection limits and sample sizes have on the inference of population distributions from geochemical analyses containing specimens below the detection limit (nondetects). The isometric logratio transformation is used to convert the compositional data in the simplex to samples in real space, thus allowing the practitioner to properly borrow from the large body of statistical techniques valid only in real space. The bootstrap method is used to numerically investigate the reliability of inferring several distributional parameters employing different forms of imputation for the censored data. The case study illustrates that, in general, the best results are obtained when imputations are made using the distribution that best fits the readings above the detection limit, and it exposes the problems of other, more widely used practices. When the sample is spatially correlated, it is necessary to combine the bootstrap with stochastic simulation.
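Since the workflow above hinges on the isometric logratio (ilr) transformation, here is a minimal Python sketch using one standard orthonormal basis (pivot coordinates); the paper's exact basis choice is not stated, so this choice is an assumption.

```python
import numpy as np

def ilr(x):
    """Map a strictly positive D-part composition isometrically to
    R^(D-1) with pivot coordinates, so real-space statistics (e.g. the
    bootstrap) can be applied."""
    x = np.asarray(x, dtype=float)
    D = x.size
    z = np.empty(D - 1)
    for i in range(1, D):
        g = np.exp(np.log(x[:i]).mean())  # geometric mean of the first i parts
        z[i - 1] = np.sqrt(i / (i + 1)) * np.log(g / x[i])
    return z

print(ilr([0.2, 0.3, 0.5]))  # two real coordinates for a 3-part composition
```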
Abstract:
There is almost no case in exploration geology where the studied data do not include below-detection-limit and/or zero values, and since most geological data follow lognormal distributions, these "zero data" represent a mathematical challenge for interpretation. We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a North azimuth; however, we can always replace that zero with the value 360°. These are known as "essential zeros", but what can we do with "rounded zeros" that result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is one solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires good knowledge of the distribution of the data and of the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the detection limit of the equipment used will generate spurious distributions, especially in ternary diagrams. The same occurs if we replace the zero values with a small amount using non-parametric or parametric techniques (imputation). The method we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between copper values and molybdenum values, but while copper will always be above the detection limit, many of the molybdenum values will be "rounded zeros". We therefore take the lower quartile of the real molybdenum values, establish a regression equation with copper, and then estimate the "rounded" zero values of molybdenum from their corresponding copper values. The method can be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the "rounded zeros", but one that depends on the value of the other variable. Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
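A minimal Python sketch of the proposed regression imputation, under stated assumptions: a log-log fit (consistent with the lognormal behaviour mentioned above), censored molybdenum readings coded as values below the detection limit, and illustrative names throughout.

```python
import numpy as np

def impute_rounded_zeros(cu, mo, detection_limit):
    """Fit Mo ~ Cu on the lower quartile of Mo readings above the
    detection limit, then predict each censored Mo value from its own
    Cu value, so no two imputed values need be identical."""
    cu, mo = np.asarray(cu, float), np.asarray(mo, float)
    observed = mo >= detection_limit
    q1 = np.quantile(mo[observed], 0.25)
    fit = observed & (mo <= q1)  # lower quartile of the real Mo values
    slope, intercept = np.polyfit(np.log(cu[fit]), np.log(mo[fit]), 1)
    out = mo.copy()
    out[~observed] = np.exp(intercept + slope * np.log(cu[~observed]))
    return out
```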
Abstract:
The R package "compositions" is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, now containing facilities to work with and represent ilr bases built from balances, and an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeros, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a "regular" subcomposition (where all parts are actually observed and the datum behaves typically) and a "problematic" subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
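The projection approach can be illustrated outside R with a short Python sketch (not the package's actual code): here the compositional variation matrix T[i, j] = var(log(x_i / x_j)) is estimated so that each sample contributes only on pairs of parts it actually observed, with NaN marking missing parts.

```python
import numpy as np

def variation_matrix_pairwise(X):
    """Estimate T[i, j] = var(log(x_i / x_j)) from an (n, D) array with
    NaN for unobserved parts, using pairwise-complete samples only."""
    n, D = X.shape
    T = np.full((D, D), np.nan)
    for i in range(D):
        for j in range(D):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if ok.sum() > 1:  # need at least two samples observing both parts
                T[i, j] = np.var(np.log(X[ok, i] / X[ok, j]), ddof=1)
    return T
```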
Abstract:
The military agreement between Colombia and the United States, Law 418 of 1997, Law 975 of 2005, and amnesty and pardon, considered as cases of jurisdictional immunity, in order to establish the grounds of imputation that may give rise to the patrimonial liability of the State in Colombia.
Abstract:
In a recent decision, the Colombian Supreme Court of Justice convicted a physician for having provided his professional services to members of an armed group operating outside the law. In this paper we review that ruling in light of the theory of objective imputation and depart from the High Court's opinion, since we understand that the practice of medicine can never constitute a disapproved risk, and such a risk is a necessary element for speaking of a crime.
Abstract:
On 16 March 2011, the Supreme Court of Justice addressed a case in which a judge was charged with the crime of ideological falsity in a public document. The ruling laid down the fundamental outlines of this crime, which are analyzed in this paper to conclude that, although the outcome of the proceedings before the Court (the conviction of the accused) is correct, it is necessary to move beyond the causal understanding of this crime and interpret it in accordance with the modern theory of objective imputation.
Abstract:
The complexity involved in studying the State's liability for damages in the medical and healthcare field makes it necessary to pay attention to certain especially relevant issues that have been settled by the case law of the Honorable Council of State. This paper therefore develops prominent and novel topics in the field of imputability, such as proof of medical fault through the "res ipsa loquitur" doctrine, and proof of the causal link through circumstantial evidence and the theory of preponderant probability. It also examines the various types of unlawful harm that may arise in medical care provided by the State, highlighting in particular the injury to the right to receive timely and effective care, and the loss of opportunity caused by failure to obtain the patient's informed consent, which in turn curtails the patient's right to choose whether or not to undergo a given treatment after weighing the pros and cons of the therapy suggested by the physician (principle of non-aggravation). It likewise analyzes the hypotheses of unlawful harm arising from diagnostic error, fault by omission on the part of oversight and control entities, and fault in obstetrics and gynecology, as well as the hypotheses of strict State liability for surgical items left inside the patient, and finally addresses the novel subject of therapeutic risk, with its particular characteristics and its possible applicability in the Colombian legal system.
Abstract:
Protein tyrosine phosphatase non-receptor type 22 (PTPN22) is a negative regulator of T-cell activation associated with several autoimmune diseases, including systemic lupus erythematosus (SLE). Missense rs2476601 is associated with SLE in individuals with European ancestry. Since the rs2476601 risk allele frequency differs dramatically across ethnicities, we assessed the robustness of the PTPN22 association with SLE and its clinical subphenotypes across four ethnically diverse populations. Ten SNPs were genotyped in 8220 SLE cases and 7369 controls from European-Americans (EA), African-Americans (AA), Asians (AS), and Hispanics (HS). We performed imputation-based association followed by conditional analysis to identify independent associations. Significantly associated SNPs were tested for association with SLE clinical sub-phenotypes, including autoantibody profiles. Multiple testing was accounted for by using the false discovery rate. We successfully imputed and tested allelic association for 107 SNPs within the PTPN22 region and detected evidence of ethnic-specific associations in EA and HS. In EA, the strongest association was at rs2476601 (P = 4.76×10−9, OR = 1.40, 95% CI = 1.25–1.56). Independent association with rs1217414 was also observed in EA, and both SNPs are correlated with increased European ancestry. In HS, the imputed intronic SNP rs3765598, predicted to be a cis-eQTL, was associated (P = 0.007, OR = 0.79, 95% CI = 0.67–0.94). No significant associations were observed in AA or AS. Case-only analysis using lupus-related clinical criteria revealed differences at rs2476601 between EA SLE patients positive for moderate to high titers of IgG anti-cardiolipin (aCL IgG >20) and those negative for aCL IgG (P = 0.012, OR = 1.65). The association was reinforced when these cases were compared to controls (P = 2.76×10−5, OR = 2.11). Our results validate that rs2476601 is the most significantly associated SNP in individuals with European ancestry. Additionally, rs1217414 and rs3765598 may be associated with SLE. Further studies are required to confirm the involvement of rs2476601 with aCL IgG.
Abstract:
Immunoregulatory cytokine interleukin-10 (IL-10) is elevated in sera from patients with systemic lupus erythematosus (SLE), correlating with disease activity. The established association of IL10 with SLE and other autoimmune diseases led us to fine-map the causal variant(s) and to explore the underlying mechanisms. We assessed 19 tag SNPs, covering the IL10 gene cluster including IL19, IL20 and IL24, for association with SLE in 15,533 case and control subjects from four ancestries. The previously reported IL10 variant rs3024505, located 1 kb downstream of IL10, exhibited the strongest association signal and was confirmed for association with SLE in European Americans (EA) (P = 2.7×10−8, OR = 1.30), but not in non-EA ancestries. SNP imputation conducted in the EA dataset identified three additional SLE-associated SNPs tagged by rs3024505 (rs3122605, rs3024493 and rs3024495, located 9.2 kb upstream of IL10 and in introns 3 and 4, respectively), and the SLE-risk alleles of these SNPs were dose-dependently associated with elevated levels of IL10 mRNA in PBMCs and circulating IL-10 protein in SLE patients and controls. Using nuclear extracts of peripheral blood cells from SLE patients for electrophoretic mobility shift assays, we identified specific binding of transcription factor Elk-1 to oligodeoxynucleotides containing the risk (G) allele of rs3122605, suggesting rs3122605 as the most likely causal variant regulating IL10 expression. Elk-1 is known to be activated by phosphorylation and nuclear localization to induce transcription. Of interest, phosphorylated Elk-1 (p-Elk-1), detected only in nuclear extracts of SLE PBMCs, appeared to increase with disease activity. Co-expression levels of p-Elk-1 and IL-10 were elevated in SLE T cells, B cells and monocytes, were associated with increased disease activity in SLE B cells, and were best downregulated by an ERK inhibitor. Taken together, our data suggest that preferential binding of activated Elk-1 to the IL10 rs3122605-G allele upregulates IL10 expression and confers increased risk for SLE in European Americans.
Abstract:
We previously established an 80 kb haplotype upstream of TNFSF4 as a susceptibility locus in the autoimmune disease SLE. SLE-associated alleles at this locus are associated with inflammatory disorders, including atherosclerosis and ischaemic stroke. In Europeans, the TNFSF4 causal variants have remained elusive due to strong linkage disequilibrium exhibited by alleles spanning the region. Using a trans-ancestral approach to fine-map the locus, utilising 17,900 SLE and control subjects including Amerindian/Hispanics (1348 cases, 717 controls), African-Americans (AA) (1529, 2048) and better powered cohorts of Europeans and East Asians, we find strong association of risk alleles in all ethnicities; the AA association replicates in African-American Gullah (152, 122). The best evidence of association comes from two adjacent markers: rs2205960-T (P = 1.71×10−34, OR = 1.43 [1.26–1.60]) and rs1234317-T (P = 1.16×10−28, OR = 1.38 [1.24–1.54]). Inference of fine-scale recombination rates for all populations tested finds the 80 kb risk and non-risk haplotypes in all except African-Americans. In this population the decay of recombination equates to an 11 kb risk haplotype, anchored in the 5′ region proximal to TNFSF4 and tagged by rs2205960-T after 1000 Genomes phase 1 (v3) imputation. Conditional regression analyses delineate the 5′ risk signal to rs2205960-T and the independent non-risk signal to rs1234314-C. Our case-only and SLE-control cohorts demonstrate robust association of rs2205960-T with autoantibody production. The rs2205960-T is predicted to form part of a decameric motif which binds NF-κB p65 with increased affinity compared to rs2205960-G. ChIP-seq data also indicate NF-κB interaction with the DNA sequence at this position in LCL cells. Our research suggests association of rs2205960-T with SLE across multiple groups and an independent non-risk signal at rs1234314-C. rs2205960-T is associated with autoantibody production and lymphopenia. Our data confirm a global signal at TNFSF4 and a role for the expressed product at multiple stages of lymphocyte dysregulation during SLE pathogenesis. We confirm the validity of trans-ancestral mapping in a complex trait. © 2013 Manku et al.