864 resultados para Feature selection algorithm
Resumo:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
Resumo:
Remote sensing image processing is nowadays a mature research area. The techniques developed in the field allow many real-life applications with great societal value. For instance, urban monitoring, fire detection or flood prediction can have a great impact on economical and environmental issues. To attain such objectives, the remote sensing community has turned into a multidisciplinary field of science that embraces physics, signal theory, computer science, electronics, and communications. From a machine learning and signal/image processing point of view, all the applications are tackled under specific formalisms, such as classification and clustering, regression and function approximation, image coding, restoration and enhancement, source unmixing, data fusion or feature selection and extraction. This paper serves as a survey of methods and applications, and reviews the last methodological advances in remote sensing image processing.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Alzheimer's disease is the most prevalent form of progressive degenerative dementia; it has a high socio-economic impact in Western countries. Therefore it is one of the most active research areas today. Alzheimer's is sometimes diagnosed by excluding other dementias, and definitive confirmation is only obtained through a post-mortem study of the brain tissue of the patient. The work presented here is part of a larger study that aims to identify novel technologies and biomarkers for early Alzheimer's disease detection, and it focuses on evaluating the suitability of a new approach for early diagnosis of Alzheimer’s disease by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying Machine Learning algorithms to speech features obtained from suspected Alzheimer sufferers in order help diagnose this disease and determine its degree of severity. Two human capabilities relevant in communication have been analyzed for feature selection: Spontaneous Speech and Emotional Response. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of Alzheimer’s disease patients.
Resumo:
Background To analyse the extent and profile of outpatient regular dispensation of antipsychotics, both in combination and monotherapy, in the Barcelona Health Region (Spain), focusing on the use of clozapine and long-acting injections (LAI). Methods Antipsychotic drugs dispensed for people older than 18 and processed by the Catalan Health Service during 2007 were retrospectively reviewed. First and second generation antipsychotic drugs (FGA and SGA) from the Anatomical Therapeutic Chemical classification (ATC) code N05A (except lithium) were included. A patient selection algorithm was designed to identify prescriptions regularly dispensed. Variables included were age, gender, antipsychotic type, route of administration and number of packages dispensed. Results A total of 117,811 patients were given any antipsychotic, of whom 71,004 regularly received such drugs. Among the latter, 9,855 (13.9%) corresponded to an antipsychotic combination, 47,386 (66.7%) to monotherapy and 13,763 (19.4%) to unspecified combinations. Of the patients given antipsychotics in association, 58% were men. Olanzapine (37.1%) and oral risperidone (36.4%) were the most common dispensations. Analysis of the patients dispensed two antipsychotics (57.8%) revealed 198 different combinations, the most frequent being the association of FGA and SGA (62.0%). Clozapine was dispensed to 2.3% of patients. Of those who were receiving antipsychotics in combination, 6.6% were given clozapine, being clozapine plus amisulpride the most frequent association (22.8%). A total of 3.800 patients (5.4%) were given LAI antipsychotics, and 2.662 of these (70.1%) were in combination. Risperidone was the most widely used LAI. Conclusions The scant evidence available regarding the efficacy of combining different antipsychotics contrasts with the high number and variety of combinations prescribed to outpatients, as well as with the limited use of clozapine. Background To analyse the extent and profile of outpatient regular dispensation of antipsychotics, both in combination and monotherapy, in the Barcelona Health Region (Spain), focusing on the use of clozapine and long-acting injections (LAI). Methods Antipsychotic drugs dispensed for people older than 18 and processed by the Catalan Health Service during 2007 were retrospectively reviewed. First and second generation antipsychotic drugs (FGA and SGA) from the Anatomical Therapeutic Chemical classification (ATC) code N05A (except lithium) were included. A patient selection algorithm was designed to identify prescriptions regularly dispensed. Variables included were age, gender, antipsychotic type, route of administration and number of packages dispensed. Results A total of 117,811 patients were given any antipsychotic, of whom 71,004 regularly received such drugs. Among the latter, 9,855 (13.9%) corresponded to an antipsychotic combination, 47,386 (66.7%) to monotherapy and 13,763 (19.4%) to unspecified combinations. Of the patients given antipsychotics in association, 58% were men. Olanzapine (37.1%) and oral risperidone (36.4%) were the most common dispensations. Analysis of the patients dispensed two antipsychotics (57.8%) revealed 198 different combinations, the most frequent being the association of FGA and SGA (62.0%). Clozapine was dispensed to 2.3% of patients. Of those who were receiving antipsychotics in combination, 6.6% were given clozapine, being clozapine plus amisulpride the most frequent association (22.8%). A total of 3.800 patients (5.4%) were given LAI antipsychotics, and 2.662 of these (70.1%) were in combination. Risperidone was the most widely used LAI. Conclusions The scant evidence available regarding the efficacy of combining different antipsychotics contrasts with the high number and variety of combinations prescribed to outpatients, as well as with the limited use of clozapine.
Resumo:
This thesis is composed of three main parts. The first consists of a state of the art of the different notions that are significant to understand the elements surrounding art authentication in general, and of signatures in particular, and that the author deemed them necessary to fully grasp the microcosm that makes up this particular market. Individuals with a solid knowledge of the art and expertise area, and that are particularly interested in the present study are advised to advance directly to the fourth Chapter. The expertise of the signature, it's reliability, and the factors impacting the expert's conclusions are brought forward. The final aim of the state of the art is to offer a general list of recommendations based on an exhaustive review of the current literature and given in light of all of the exposed issues. These guidelines are specifically formulated for the expertise of signatures on paintings, but can also be applied to wider themes in the area of signature examination. The second part of this thesis covers the experimental stages of the research. It consists of the method developed to authenticate painted signatures on works of art. This method is articulated around several main objectives: defining measurable features on painted signatures and defining their relevance in order to establish the separation capacities between groups of authentic and simulated signatures. For the first time, numerical analyses of painted signatures have been obtained and are used to attribute their authorship to given artists. An in-depth discussion of the developed method constitutes the third and final part of this study. It evaluates the opportunities and constraints when applied by signature and handwriting experts in forensic science. A brief summary covering each chapter allows a rapid overview of the study and summarizes the aims and main themes of each chapter. These outlines presented below summarize the aims and main themes addressed in each chapter. Part I - Theory Chapter 1 exposes legal aspects surrounding the authentication of works of art by art experts. The definition of what is legally authentic, the quality and types of the experts that can express an opinion concerning the authorship of a specific painting, and standard deontological rules are addressed. The practices applied in Switzerland will be specifically dealt with. Chapter 2 presents an overview of the different scientific analyses that can be carried out on paintings (from the canvas to the top coat). Scientific examinations of works of art have become more common, as more and more museums equip themselves with laboratories, thus an understanding of their role in the art authentication process is vital. The added value that a signature expertise can have in comparison to other scientific techniques is also addressed. Chapter 3 provides a historical overview of the signature on paintings throughout the ages, in order to offer the reader an understanding of the origin of the signature on works of art and its evolution through time. An explanation is given on the transitions that the signature went through from the 15th century on and how it progressively took on its widely known modern form. Both this chapter and chapter 2 are presented to show the reader the rich sources of information that can be provided to describe a painting, and how the signature is one of these sources. Chapter 4 focuses on the different hypotheses the FHE must keep in mind when examining a painted signature, since a number of scenarios can be encountered when dealing with signatures on works of art. The different forms of signatures, as well as the variables that may have an influence on the painted signatures, are also presented. Finally, the current state of knowledge of the examination procedure of signatures in forensic science in general, and in particular for painted signatures, is exposed. The state of the art of the assessment of the authorship of signatures on paintings is established and discussed in light of the theoretical facets mentioned previously. Chapter 5 considers key elements that can have an impact on the FHE during his or her2 examinations. This includes a discussion on elements such as the skill, confidence and competence of an expert, as well as the potential bias effects he might encounter. A better understanding of elements surrounding handwriting examinations, to, in turn, better communicate results and conclusions to an audience, is also undertaken. Chapter 6 reviews the judicial acceptance of signature analysis in Courts and closes the state of the art section of this thesis. This chapter brings forward the current issues pertaining to the appreciation of this expertise by the non- forensic community, and will discuss the increasing number of claims of the unscientific nature of signature authentication. The necessity to aim for more scientific, comprehensive and transparent authentication methods will be discussed. The theoretical part of this thesis is concluded by a series of general recommendations for forensic handwriting examiners in forensic science, specifically for the expertise of signatures on paintings. These recommendations stem from the exhaustive review of the literature and the issues exposed from this review and can also be applied to the traditional examination of signatures (on paper). Part II - Experimental part Chapter 7 describes and defines the sampling, extraction and analysis phases of the research. The sampling stage of artists' signatures and their respective simulations are presented, followed by the steps that were undertaken to extract and determine sets of characteristics, specific to each artist, that describe their signatures. The method is based on a study of five artists and a group of individuals acting as forgers for the sake of this study. Finally, the analysis procedure of these characteristics to assess of the strength of evidence, and based on a Bayesian reasoning process, is presented. Chapter 8 outlines the results concerning both the artist and simulation corpuses after their optical observation, followed by the results of the analysis phase of the research. The feature selection process and the likelihood ratio evaluation are the main themes that are addressed. The discrimination power between both corpuses is illustrated through multivariate analysis. Part III - Discussion Chapter 9 discusses the materials, the methods, and the obtained results of the research. The opportunities, but also constraints and limits, of the developed method are exposed. Future works that can be carried out subsequent to the results of the study are also presented. Chapter 10, the last chapter of this thesis, proposes a strategy to incorporate the model developed in the last chapters into the traditional signature expertise procedures. Thus, the strength of this expertise is discussed in conjunction with the traditional conclusions reached by forensic handwriting examiners in forensic science. Finally, this chapter summarizes and advocates a list of formal recommendations for good practices for handwriting examiners. In conclusion, the research highlights the interdisciplinary aspect of signature examination of signatures on paintings. The current state of knowledge of the judicial quality of art experts, along with the scientific and historical analysis of paintings and signatures, are overviewed to give the reader a feel of the different factors that have an impact on this particular subject. The temperamental acceptance of forensic signature analysis in court, also presented in the state of the art, explicitly demonstrates the necessity of a better recognition of signature expertise by courts of law. This general acceptance, however, can only be achieved by producing high quality results through a well-defined examination process. This research offers an original approach to attribute a painted signature to a certain artist: for the first time, a probabilistic model used to measure the discriminative potential between authentic and simulated painted signatures is studied. The opportunities and limits that lie within this method of scientifically establishing the authorship of signatures on works of art are thus presented. In addition, the second key contribution of this work proposes a procedure to combine the developed method into that used traditionally signature experts in forensic science. Such an implementation into the holistic traditional signature examination casework is a large step providing the forensic, judicial and art communities with a solid-based reasoning framework for the examination of signatures on paintings. The framework and preliminary results associated with this research have been published (Montani, 2009a) and presented at international forensic science conferences (Montani, 2009b; Montani, 2012).
Resumo:
BACKGROUND: Transcranial magnetic stimulation combined with electroencephalogram (TMS-EEG) can be used to explore the dynamical state of neuronal networks. In patients with epilepsy, TMS can induce epileptiform discharges (EDs) with a stochastic occurrence despite constant stimulation parameters. This observation raises the possibility that the pre-stimulation period contains multiple covert states of brain excitability some of which are associated with the generation of EDs. OBJECTIVE: To investigate whether the interictal period contains "high excitability" states that upon brain stimulation produce EDs and can be differentiated from "low excitability" states producing normal appearing TMS-EEG responses. METHODS: In a cohort of 25 patients with Genetic Generalized Epilepsies (GGE) we identified two subjects characterized by the intermittent development of TMS-induced EDs. The high-excitability in the pre-stimulation period was assessed using multiple measures of univariate time series analysis. Measures providing optimal discrimination were identified by feature selection techniques. The "high excitability" states emerged in multiple loci (indicating diffuse cortical hyperexcitability) and were clearly differentiated on the basis of 14 measures from "low excitability" states (accuracy = 0.7). CONCLUSION: In GGE, the interictal period contains multiple, quasi-stable covert states of excitability a class of which is associated with the generation of TMS-induced EDs. The relevance of these findings to theoretical models of ictogenesis is discussed.
Resumo:
In this study, a wrapper approach was applied to objectively select the most important variables related to two different anaerobic digestion imbalances, acidogenic states and foaming. This feature selection method, implemented in artificial neural networks (ANN), was performed using input and output data from a fully instrumented pilot plant (1 m 3 upflow fixed bed digester). Results for acidogenic states showed that pH, volatile fatty acids, and inflow rate were the most relevant variables. Results for foaming showed that inflow rate and total organic carbon were among the relevant variables, both of which were related to the feed loading of the digester. Because there is not a complete agreement on the causes of foaming, these results highlight the role of digester feeding patterns in the development of foaming
Resumo:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Based on the much improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study is focused on the development of image analyses methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension that is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index correlating well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination whereas the accurate estimation of bubble size categories still remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images. Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Resumo:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry that launched competitive electricity markets now embracing all market participants including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for normal range prices but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In the early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, are significant for energy market participants to stay competitive in a market. Accurate price spike forecasting is important for generation companies to strategically bid into the market and to optimally manage their assets; for retailer companies, since they cannot pass the spikes onto final customers, and finally, for market managers to provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since the real-world problems are often complex in nature, no single model is able to capture different patterns equally well. Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a possibly productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to the price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of the explanatory variables. The numerical studies show that the proposed methodology has more accurate behavior than all other examined methods most recently applied to case studies of energy markets in different countries. The obtained results can be considered as providing extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction to set up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, given the result of this work, it is very likely that the same methodology will give good results when forecasting the prices on energy markets of other countries.
Resumo:
Human beings have always strived to preserve their memories and spread their ideas. In the beginning this was always done through human interpretations, such as telling stories and creating sculptures. Later, technological progress made it possible to create a recording of a phenomenon; first as an analogue recording onto a physical object, and later digitally, as a sequence of bits to be interpreted by a computer. By the end of the 20th century technological advances had made it feasible to distribute media content over a computer network instead of on physical objects, thus enabling the concept of digital media distribution. Many digital media distribution systems already exist, and their continued, and in many cases increasing, usage is an indicator for the high interest in their future enhancements and enriching. By looking at these digital media distribution systems, we have identified three main areas of possible improvement: network structure and coordination, transport of content over the network, and the encoding used for the content. In this thesis, our aim is to show that improvements in performance, efficiency and availability can be done in conjunction with improvements in software quality and reliability through the use of formal methods: mathematical approaches to reasoning about software so that we can prove its correctness, together with the desirable properties. We envision a complete media distribution system based on a distributed architecture, such as peer-to-peer networking, in which different parts of the system have been formally modelled and verified. Starting with the network itself, we show how it can be formally constructed and modularised in the Event-B formalism, such that we can separate the modelling of one node from the modelling of the network itself. We also show how the piece selection algorithm in the BitTorrent peer-to-peer transfer protocol can be adapted for on-demand media streaming, and how this can be modelled in Event-B. Furthermore, we show how modelling one peer in Event-B can give results similar to simulating an entire network of peers. Going further, we introduce a formal specification language for content transfer algorithms, and show that having such a language can make these algorithms easier to understand. We also show how generating Event-B code from this language can result in less complexity compared to creating the models from written specifications. We also consider the decoding part of a media distribution system by showing how video decoding can be done in parallel. This is based on formally defined dependencies between frames and blocks in a video sequence; we have shown that also this step can be performed in a way that is mathematically proven correct. Our modelling and proving in this thesis is, in its majority, tool-based. This provides a demonstration of the advance of formal methods as well as their increased reliability, and thus, advocates for their more wide-spread usage in the future.
Resumo:
Les réseaux optiques à commutation de rafales (OBS) sont des candidats pour jouer un rôle important dans le cadre des réseaux optiques de nouvelle génération. Dans cette thèse, nous nous intéressons au routage adaptatif et au provisionnement de la qualité de service dans ce type de réseaux. Dans une première partie de la thèse, nous nous intéressons à la capacité du routage multi-chemins et du routage alternatif (par déflection) à améliorer les performances des réseaux OBS, pro-activement pour le premier et ré-activement pour le second. Dans ce contexte, nous proposons une approche basée sur l’apprentissage par renforcement où des agents placés dans tous les nœuds du réseau coopèrent pour apprendre, continuellement, les chemins du routage et les chemins alternatifs optimaux selon l’état actuel du réseau. Les résultats numériques montrent que cette approche améliore les performances des réseaux OBS comparativement aux solutions proposées dans la littérature. Dans la deuxième partie de cette thèse, nous nous intéressons au provisionnement absolu de la qualité de service où les performances pire-cas des classes de trafic de priorité élevée sont garanties quantitativement. Plus spécifiquement, notre objectif est de garantir la transmission sans pertes des rafales de priorité élevée à l’intérieur du réseau OBS tout en préservant le multiplexage statistique et l’utilisation efficace des ressources qui caractérisent les réseaux OBS. Aussi, nous considérons l’amélioration des performances du trafic best effort. Ainsi, nous proposons deux approches : une approche basée sur les nœuds et une approche basée sur les chemins. Dans l’approche basée sur les nœuds, un ensemble de longueurs d’onde est assigné à chaque nœud du bord du réseau OBS pour qu’il puisse envoyer son trafic garanti. Cette assignation prend en considération les distances physiques entre les nœuds du bord. En outre, nous proposons un algorithme de sélection des longueurs d’onde pour améliorer les performances des rafales best effort. Dans l’approche basée sur les chemins, le provisionnement absolu de la qualité de service est fourni au niveau des chemins entre les nœuds du bord du réseau OBS. À cette fin, nous proposons une approche de routage et d’assignation des longueurs d’onde qui a pour but la réduction du nombre requis de longueurs d’onde pour établir des chemins sans contentions. Néanmoins, si cet objectif ne peut pas être atteint à cause du nombre limité de longueurs d’onde, nous proposons de synchroniser les chemins en conflit sans le besoin pour des équipements additionnels. Là aussi, nous proposons un algorithme de sélection des longueurs d’onde pour les rafales best effort. Les résultats numériques montrent que l’approche basée sur les nœuds et l’approche basée sur les chemins fournissent le provisionnement absolu de la qualité de service pour le trafic garanti et améliorent les performances du trafic best effort. En outre, quand le nombre de longueurs d’ondes est suffisant, l’approche basée sur les chemins peut accommoder plus de trafic garanti et améliorer les performances du trafic best effort par rapport à l’approche basée sur les nœuds.
Resumo:
Les documents publiés par des entreprises, tels les communiqués de presse, contiennent une foule d’informations sur diverses activités des entreprises. C’est une source précieuse pour des analyses en intelligence d’affaire. Cependant, il est nécessaire de développer des outils pour permettre d’exploiter cette source automatiquement, étant donné son grand volume. Ce mémoire décrit un travail qui s’inscrit dans un volet d’intelligence d’affaire, à savoir la détection de relations d’affaire entre les entreprises décrites dans des communiqués de presse. Dans ce mémoire, nous proposons une approche basée sur la classification. Les méthodes de classifications existantes ne nous permettent pas d’obtenir une performance satisfaisante. Ceci est notamment dû à deux problèmes : la représentation du texte par tous les mots, qui n’aide pas nécessairement à spécifier une relation d’affaire, et le déséquilibre entre les classes. Pour traiter le premier problème, nous proposons une approche de représentation basée sur des mots pivots c’est-à-dire les noms d’entreprises concernées, afin de mieux cerner des mots susceptibles de les décrire. Pour le deuxième problème, nous proposons une classification à deux étapes. Cette méthode s’avère plus appropriée que les méthodes traditionnelles de ré-échantillonnage. Nous avons testé nos approches sur une collection de communiqués de presse dans le domaine automobile. Nos expérimentations montrent que les approches proposées peuvent améliorer la performance de classification. Notamment, la représentation du document basée sur les mots pivots nous permet de mieux centrer sur les mots utiles pour la détection de relations d’affaire. La classification en deux étapes apporte une solution efficace au problème de déséquilibre entre les classes. Ce travail montre que la détection automatique des relations d’affaire est une tâche faisable. Le résultat de cette détection pourrait être utilisé dans une analyse d’intelligence d’affaire.
Resumo:
Les scores de propension (PS) sont fréquemment utilisés dans l’ajustement pour des facteurs confondants liés au biais d’indication. Cependant, ils sont limités par le fait qu’ils permettent uniquement l’ajustement pour des facteurs confondants connus et mesurés. Les scores de propension à hautes dimensions (hdPS), une variante des PS, utilisent un algorithme standardisé afin de sélectionner les covariables pour lesquelles ils vont ajuster. L’utilisation de cet algorithme pourrait permettre l’ajustement de tous les types de facteurs confondants. Cette thèse a pour but d’évaluer la performance de l’hdPS vis-à-vis le biais d’indication dans le contexte d’une étude observationnelle examinant l’effet diabétogénique potentiel des statines. Dans un premier temps, nous avons examiné si l’exposition aux statines était associée au risque de diabète. Les résultats de ce premier article suggèrent que l’exposition aux statines est associée avec une augmentation du risque de diabète et que cette relation est dose-dépendante et réversible dans le temps. Suite à l’identification de cette association, nous avons examiné dans un deuxième article si l’hdPS permettait un meilleur ajustement pour le biais d’indication que le PS; cette évaluation fut entreprise grâce à deux approches: 1) en fonction des mesures d’association ajustées et 2) en fonction de la capacité du PS et de l’hdPS à sélectionner des sous-cohortes appariées de patients présentant des caractéristiques similaires vis-à-vis 19 caractéristiques lorsqu’ils sont utilisés comme critère d’appariement. Selon les résultats présentés dans le cadre du deuxième article, nous avons démontré que l’évaluation de la performance en fonction de la première approche était non concluante, mais que l’évaluation en fonction de la deuxième approche favorisait l’hdPS dans son ajustement pour le biais d’indication. Le dernier article de cette thèse a cherché à examiner la performance de l’hdPS lorsque des facteurs confondants connus et mesurés sont masqués à l’algorithme de sélection. Les résultats de ce dernier article indiquent que l’hdPS pourrait, au moins partiellement, ajuster pour des facteurs confondants masqués et qu’il pourrait donc potentiellement ajuster pour des facteurs confondants non mesurés. Ensemble ces résultats indiquent que l’hdPS serait supérieur au PS dans l’ajustement pour le biais d’indication et supportent son utilisation lors de futures études observationnelles basées sur des données médico-administratives.
Resumo:
This paper presents a writer identification scheme for Malayalam documents. As the accomplishment rate of a scheme is highly dependent on the features extracted from the documents, the process of feature selection and extraction is highly relevant. The paper describes a set of novel features exclusively for Malayalam language. The features were studied in detail which resulted in a comparative study of all the features. The features are fused to form the feature vector or knowledge vector. This knowledge vector is then used in all the phases of the writer identification scheme. The scheme has been tested on a test bed of 280 writers of which 50 writers having only one page, 215 writers with at least 2 pages and 15 writers with at least 4 pages. To perform a comparative evaluation of the scheme the test is conducted using WD-LBP method also. A recognition rate of around 95% was obtained for the proposed approach