Biblioteca Digital

438 resultados para Estimateur de Bayes

Classification supervisée multi-étiquette en actes de dialogue: analyse discriminante et transformations de Schoenberg

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Abstract This work studies the multi-label classification of turns in simple English Wikipedia talk pages into dialog acts. The treated dataset was created and multi-labeled by (Ferschke et al., 2012). The first part analyses dependences between labels, in order to examine the annotation coherence and to determine a classification method. Then, a multi-label classification is computed, after transforming the problem into binary relevance. Regarding features, whereas (Ferschke et al., 2012) use features such as uni-, bi-, and trigrams, time distance between turns or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines. The present paper proposes, as an alternative, to use Schoenberg transformations which, following the example of kernel methods, transform original Euclidean distances into other Euclidean distances, in a space of high dimensionality. Résumé Ce travail étudie la classification supervisée multi-étiquette en actes de dialogue des tours de parole des contributeurs aux pages de discussion de Simple English Wikipedia (Wikipédia en anglais simple). Le jeu de données considéré a été créé et multi-étiqueté par (Ferschke et al., 2012). Une première partie analyse les relations entre les étiquettes pour examiner la cohérence des annotations et pour déterminer une méthode de classification. Ensuite, une classification supervisée multi-étiquette est effectuée, après recodage binaire des étiquettes. Concernant les variables, alors que (Ferschke et al., 2012) utilisent des caractéristiques telles que les uni-, bi- et trigrammes, le temps entre les tours de parole ou l'indentation d'un tour de parole, d'autres descripteurs sont considérés ici : les lemmes, les catégories morphosyntaxiques et le sens des verbes (selon WordNet). Les auteurs du jeu de données ont employé des approches telles que le Naive Bayes ou les Séparateurs à Vastes Marges (SVM) pour la classification. Cet article propose, de façon alternative, d'utiliser et d'étendre l'analyse discriminante linéaire aux transformations de Schoenberg qui, à l'instar des méthodes à noyau, transforment les distances euclidiennes originales en d'autres distances euclidiennes, dans un espace de haute dimensionnalité.

Statistical hypothesis testing and common misinterpretations: should we abandon p-value in forensic science applications?

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons about their advantages and drawbacks are widely available and continue to span over large debates in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on the so called p-values, it is of interest to expose the discussion of this journal's decision within the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.

Impact of a cis-associated gene expression SNP on chromosome 20q11.22 on bipolar disorder susceptibility, hippocampal structure and cognitive performance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BackgroundBipolar disorder is a highly heritable polygenic disorder. Recent enrichment analyses suggest that there may be true risk variants for bipolar disorder in the expression quantitative trait loci (eQTL) in the brain.AimsWe sought to assess the impact of eQTL variants on bipolar disorder risk by combining data from both bipolar disorder genome-wide association studies (GWAS) and brain eQTL.MethodTo detect single nucleotide polymorphisms (SNPs) that influence expression levels of genes associated with bipolar disorder, we jointly analysed data from a bipolar disorder GWAS (7481 cases and 9250 controls) and a genome-wide brain (cortical) eQTL (193 healthy controls) using a Bayesian statistical method, with independent follow-up replications. The identified risk SNP was then further tested for association with hippocampal volume (n = 5775) and cognitive performance (n = 342) among healthy individuals.ResultsIntegrative analysis revealed a significant association between a brain eQTL rs6088662 on chromosome 20q11.22 and bipolar disorder (log Bayes factor = 5.48; bipolar disorder P = 5.85×10(-5)). Follow-up studies across multiple independent samples confirmed the association of the risk SNP (rs6088662) with gene expression and bipolar disorder susceptibility (P = 3.54×10(-8)). Further exploratory analysis revealed that rs6088662 is also associated with hippocampal volume and cognitive performance in healthy individuals.ConclusionsOur findings suggest that 20q11.22 is likely a risk region for bipolar disorder; they also highlight the informative value of integrating functional annotation of genetic variants for gene expression in advancing our understanding of the biological basis underlying complex disorders, such as bipolar disorder.

Dismissal of the illusion of uncertainty in the assessment of a likelihood ratio

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The use of the Bayes factor (BF) or likelihood ratio as a metric to assess the probative value of forensic traces is largely supported by operational standards and recommendations in different forensic disciplines. However, the progress towards more widespread consensus about foundational principles is still fragile as it raises new problems about which views differ. It is not uncommon e.g. to encounter scientists who feel the need to compute the probability distribution of a given expression of evidential value (i.e. a BF), or to place intervals or significance probabilities on such a quantity. The article here presents arguments to show that such views involve a misconception of principles and abuse of language. The conclusion of the discussion is that, in a given case at hand, forensic scientists ought to offer to a court of justice a given single value for the BF, rather than an expression based on a distribution over a range of values.

Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

One of the global targets for non-communicable diseases is to halt, by 2025, the rise in the age-standardised adult prevalence of diabetes at its 2010 levels. We aimed to estimate worldwide trends in diabetes, how likely it is for countries to achieve the global target, and how changes in prevalence, together with population growth and ageing, are affecting the number of adults with diabetes. We pooled data from population-based studies that had collected data on diabetes through measurement of its biomarkers. We used a Bayesian hierarchical model to estimate trends in diabetes prevalence-defined as fasting plasma glucose of 7.0 mmol/L or higher, or history of diagnosis with diabetes, or use of insulin or oral hypoglycaemic drugs-in 200 countries and territories in 21 regions, by sex and from 1980 to 2014. We also calculated the posterior probability of meeting the global diabetes target if post-2000 trends continue. We used data from 751 studies including 4,372,000 adults from 146 of the 200 countries we make estimates for. Global age-standardised diabetes prevalence increased from 4.3% (95% credible interval 2.4-7.0) in 1980 to 9.0% (7.2-11.1) in 2014 in men, and from 5.0% (2.9-7.9) to 7.9% (6.4-9.7) in women. The number of adults with diabetes in the world increased from 108 million in 1980 to 422 million in 2014 (28.5% due to the rise in prevalence, 39.7% due to population growth and ageing, and 31.8% due to interaction of these two factors). Age-standardised adult diabetes prevalence in 2014 was lowest in northwestern Europe, and highest in Polynesia and Micronesia, at nearly 25%, followed by Melanesia and the Middle East and north Africa. Between 1980 and 2014 there was little change in age-standardised diabetes prevalence in adult women in continental western Europe, although crude prevalence rose because of ageing of the population. By contrast, age-standardised adult prevalence rose by 15 percentage points in men and women in Polynesia and Micronesia. In 2014, American Samoa had the highest national prevalence of diabetes (>30% in both sexes), with age-standardised adult prevalence also higher than 25% in some other islands in Polynesia and Micronesia. If post-2000 trends continue, the probability of meeting the global target of halting the rise in the prevalence of diabetes by 2025 at the 2010 level worldwide is lower than 1% for men and is 1% for women. Only nine countries for men and 29 countries for women, mostly in western Europe, have a 50% or higher probability of meeting the global target. Since 1980, age-standardised diabetes prevalence in adults has increased, or at best remained unchanged, in every country. Together with population growth and ageing, this rise has led to a near quadrupling of the number of adults with diabetes worldwide. The burden of diabetes, both in terms of prevalence and number of adults affected, has increased faster in low-income and middle-income countries than in high-income countries. Wellcome Trust.

Clasificación automática de textos y explotación BI

Relevância:

10.00% 10.00%

Publicador:

Resumo:

El presente proyecto tiene como objetivo desarrollar una tecnología que permita codificar grandes cantidades de texto de manera automática para posteriormente ser visualizada y analizada mediante una aplicación diseñada en Qlikview. El motor de la investigación e implementación de este proyecto se ha encontrado en la incipiente presencia de tecnologías informáticas en los procesos de codificación para ciencias políticas. De esta manera, el programa creado tiene como objetivo automatizar un proceso que se desarrolla comúnmente de manera manual y, por ende, las ventajas de introducir técnicas informáticas son notablemente valiosas. Estas automatizaciones permiten ahorrar tanto en tiempo de codificación, como en recursos económicos o humanos. Se ha elaborado una revisión teórica y metodológica que han servido como instrumentos de estudio y mejora, con el firme propósito de reducir al máximo el margen de error y ofrecer un instrumento de calidad con salida de mercado real. El método de clasificación utilizado ha sido Bayes, y se ha implementado utilizando Matlab. Los resultados de la clasificación han llegado a índices del 99.2%. En la visualización y análisis mediante Qlikview se pueden modificar los parámetros referentes a partido político, año, categoría o región, con lo que se permite analizar numerosos aspectos relacionados con la distribución de las palabras repartidas entre las diferentes categorías y en el tiempo.

Efficiency of distinct data mining algorithms for classifying stress level in piglets from their vocalization

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Among the challenges of pig farming in today's competitive market, there is factor of the product traceability that ensures, among many points, animal welfare. Vocalization is a valuable tool to identify situations of stress in pigs, and it can be used in welfare records for traceability. The objective of this work was to identify stress in piglets using vocalization, calling this stress on three levels: no stress, moderate stress, and acute stress. An experiment was conducted on a commercial farm in the municipality of Holambra, São Paulo State , where vocalizations of twenty piglets were recorded during the castration procedure, and separated into two groups: without anesthesia and local anesthesia with lidocaine base. For the recording of acoustic signals, a unidirectional microphone was connected to a digital recorder, in which signals were digitized at a frequency of 44,100 Hz. For evaluation of sound signals, Praat® software was used, and different data mining algorithms were applied using Weka® software. The selection of attributes improved model accuracy, and the best attribute selection was used by applying Wrapper method, while the best classification algorithms were the k-NN and Naive Bayes. According to the results, it was possible to classify the level of stress in pigs through their vocalization.

Spatial distribution of pregnancy in adolescence and associations with socioeconomic and social responsibility indicators: State of Minas Gerais, Southeast of Brazil

Relevância:

10.00% 10.00%

Publicador:

Statistical segmentation methods and color variance analysis of retinal images

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this research, the effectiveness of Naive Bayes and Gaussian Mixture Models classifiers on segmenting exudates in retinal images is studied and the results are evaluated with metrics commonly used in medical imaging. Also, a color variation analysis of retinal images is carried out to find how effectively can retinal images be segmented using only the color information of the pixels.

Development and validation of a genotype 3 recombinant protein-based immunoassay for hepatitis E virus serology in swine

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hepatitis E virus (HEV) is classified within the family Hepeviridae, genus Hepevirus. HEV genotype 3 (Gt3) infections are endemic in pigs in Western Europe and in North and South America and cause zoonotic infections in humans. Several serological assays to detect HEV antibodies in pigs have been developed, at first mainly based on HEV genotype 1 (Gt1) antigens. To develop a sensitive HEV Gt3 ELISA, a recombinant baculovirus expression product of HEV Gt3 open reading frame-2 was produced and coated onto polystyrene ELISA plates. After incubation of porcine sera, bound HEV antibodies were detected with anti-porcine anti-IgG and anti-IgM conjugates. For primary estimation of sensitivity and specificity of the assay, sets of sera were used from pigs experimentally infected with HEV Gt3. For further validation of the assay and to set the cutoff value, a batch of 1100 pig sera was used. All pig sera were tested using the developed HEV Gt3 assay and two other serologic assays based on HEV Gt1 antigens. Since there is no gold standard available for HEV antibody testing, further validation and a definite setting of the cutoff of the developed HEV Gt3 assay were performed using a statistical approach based on Bayes' theorem. The developed and validated HEV antibody assay showed effective detection of HEV-specific antibodies. This assay can contribute to an improved detection of HEV antibodies and enable more reliable estimates of the prevalence of HEV Gt3 in swine in different regions.

A Bayesian Approach for Forecasting Heat Load in a District Heating System

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The growing population in cities increases the energy demand and affects the environment by increasing carbon emissions. Information and communications technology solutions which enable energy optimization are needed to address this growing energy demand in cities and to reduce carbon emissions. District heating systems optimize the energy production by reusing waste energy with combined heat and power plants. Forecasting the heat load demand in residential buildings assists in optimizing energy production and consumption in a district heating system. However, the presence of a large number of factors such as weather forecast, district heating operational parameters and user behavioural parameters, make heat load forecasting a challenging task. This thesis proposes a probabilistic machine learning model using a Naive Bayes classifier, to forecast the hourly heat load demand for three residential buildings in the city of Skellefteå, Sweden over a period of winter and spring seasons. The district heating data collected from the sensors equipped at the residential buildings in Skellefteå, is utilized to build the Bayesian network to forecast the heat load demand for horizons of 1, 2, 3, 6 and 24 hours. The proposed model is validated by using four cases to study the influence of various parameters on the heat load forecast by carrying out trace driven analysis in Weka and GeNIe. Results show that current heat load consumption and outdoor temperature forecast are the two parameters with most influence on the heat load forecast. The proposed model achieves average accuracies of 81.23 % and 76.74 % for a forecast horizon of 1 hour in the three buildings for winter and spring seasons respectively. The model also achieves an average accuracy of 77.97 % for three buildings across both seasons for the forecast horizon of 1 hour by utilizing only 10 % of the training data. The results indicate that even a simple model like Naive Bayes classifier can forecast the heat load demand by utilizing less training data.

Wood species identification by visual characterization of fibers

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Kandidaatintyö tehtiin osana PulpVision-tutkimusprojektia, jonka tarkoituksena on kehittää kuvapohjaisia laskenta- ja luokittelumetodeja sellun laaduntarkkailuun paperin valmistuksessa. Tämän tutkimusprojektin osana on aiemmin kehitetty metodi, jolla etsittiin kaarevia rakenteita kuvista, ja tätä metodia hyödynnettiin kuitujen etsintään kuvista. Tätä metodia käytettiin lähtökohtana kandidaatintyölle. Työn tarkoituksena oli tutkia, voidaanko erilaisista kuitukuvista laskettujen piirteiden avulla tunnistaa kuvassa olevien kuitujen laji. Näissä kuitukuvissa oli kuituja neljästä eri puulajista ja yhdestä kasvista. Nämä lajit olivat akasia, koivu, mänty, eukalyptus ja vehnä. Jokaisesta lajista valittiin 100 kuitukuvaa ja nämä kuvat jaettiin kahteen ryhmään, joista ensimmäistä käytettiin opetusryhmänä ja toista testausryhmänä. Opetusryhmän avulla jokaiselle kuitulajille laskettiin näitä kuvaavia piirteitä, joiden avulla pyrittiin tunnistamaan testausryhmän kuvissa olevat kuitulajit. Nämä kuvat oli tuottanut CEMIS-Oulu (Center for Measurement and Information Systems), joka on mittaustekniikkaan keskittynyt yksikkö Oulun yliopistossa. Yksittäiselle opetusryhmän kuitukuvalle laskettiin keskiarvot ja keskihajonnat kolmesta eri piirteestä, jotka olivat pituus, leveys ja kaarevuus. Lisäksi laskettiin, kuinka monta kuitua kuvasta löydettiin. Näiden piirteiden eri yhdistelmien avulla testattiin tunnistamisen tarkkuutta käyttämällä k:n lähimmän naapurin menetelmää ja Naiivi Bayes -luokitinta testausryhmän kuville. Testeistä saatiin lupaavia tuloksia muun muassa pituuden ja leveyden keskiarvoja käytettäessä saavutettiin jopa noin 98 %:n tarkkuus molemmilla algoritmeilla. Tunnistuksessa kuitujen keskimäärinen pituus vaikutti olevan kuitukuvia parhaiten kuvaava piirre. Käytettyjen algoritmien välillä ei ollut suurta vaihtelua tarkkuudessa. Testeissä saatujen tulosten perusteella voidaan todeta, että kuitukuvien tunnistaminen on mahdollista. Testien perusteella kuitukuvista tarvitsee laskea vain kaksi piirrettä, joilla kuidut voidaan tunnistaa tarkasti. Käytetyt lajittelualgoritmit olivat hyvin yksinkertaisia, mutta ne toimivat testeissä hyvin.

The Seemingly Unrelated Dynamic Cointegration Regression Model and Testing for Purching Power Parity.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.

The Bootstrap of Mean for Dependent Heterogeneous Arrays.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Presently, conditions ensuring the validity of bootstrap methods for the sample mean of (possibly heterogeneous) near epoch dependent (NED) functions of mixing processes are unknown. Here we establish the validity of the bootstrap in this context, extending the applicability of bootstrap methods to a class of processes broadly relevant for applications in economics and finance. Our results apply to two block bootstrap methods: the moving blocks bootstrap of Künsch ( 989) and Liu and Singh ( 992), and the stationary bootstrap of Politis and Romano ( 994). In particular, the consistency of the bootstrap variance estimator for the sample mean is shown to be robust against heteroskedasticity and dependence of unknown form. The first order asymptotic validity of the bootstrap approximation to the actual distribution of the sample mean is also established in this heterogeneous NED context.

Exact Nonparametric Two-Sample Homogeneity Tests for Possibly Discrete Distributions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we study several tests for the equality of two unknown distributions. Two are based on empirical distribution functions, three others on nonparametric probability density estimates, and the last ones on differences between sample moments. We suggest controlling the size of such tests (under nonparametric assumptions) by using permutational versions of the tests jointly with the method of Monte Carlo tests properly adjusted to deal with discrete distributions. We also propose a combined test procedure, whose level is again perfectly controlled through the Monte Carlo test technique and has better power properties than the individual tests that are combined. Finally, in a simulation experiment, we show that the technique suggested provides perfect control of test size and that the new tests proposed can yield sizeable power improvements.

«
1
2
...
9
10
11
12
13
14
15
...
29
30
»