869 results for Bayesian aggregation
Abstract:
Discrete and continuous zero-inflated models have a wide range of applications and their properties are well known. Although there is work on zero-deflated and zero-modified discrete models, the usual formulation of zero-inflated continuous models -- a mixture of a continuous density and a Dirac mass -- prevents them from being generalized to cover the case of zero deflation. An alternative formulation of zero-inflated continuous models, which can easily be generalized to the zero-deflated case, is presented here. Estimation is first addressed under the classical paradigm, and several methods for obtaining maximum likelihood estimators are proposed. The point estimation problem is also considered from the Bayesian point of view. Classical and Bayesian hypothesis tests for determining whether data are zero-inflated or zero-deflated are presented. The estimation and testing methods are also evaluated through simulation studies and applied to aggregated precipitation data. The various methods agree that the data are zero-deflated, demonstrating the relevance of the proposed model. We then consider the clustering of samples of zero-deflated data. Since such data are strongly non-normal, common methods for determining the number of clusters can be expected to perform poorly. We argue that Bayesian clustering, based on the marginal distribution of the observations, would account for the particularities of the model, translating into better performance. Several clustering methods are compared through a simulation study, and the proposed method is applied to aggregated precipitation data from 28 measurement stations in British Columbia.
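For reference, the usual mixture formulation mentioned above can be sketched as follows (generic notation, assumed here rather than taken from the thesis):

    % Zero-inflated continuous model as a two-part mixture:
    \[
      f(y) \;=\; \pi\,\delta_0(y) + (1-\pi)\,g(y;\theta), \qquad 0 \le \pi \le 1,
    \]
    % where g is a continuous density and delta_0 is a point mass at zero.

Because the continuous component g places no probability on the single point y = 0, the mass at zero equals pi, which cannot be negative; this mixture can only add zeros, never remove them, which is why the formulation does not extend to zero deflation.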
Abstract:
We investigate whether the relative contributions of genetic and shared environmental factors are associated with an increased risk of melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both provide natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov chain Monte Carlo method for imputation of missing covariate data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.
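As a point of reference, a subject-specific Weibull proportional hazards model with family random effects is often written along the following lines (notation assumed for illustration; the paper's exact parameterization may differ):

    \[
      h_{ij}(t) = \lambda \rho t^{\rho-1}
                  \exp\!\left(\mathbf{x}_{ij}^{\top}\boldsymbol\beta + a_{ij} + c_i\right),
      \qquad a_{ij} \sim N(0,\sigma_a^2), \quad c_i \sim N(0,\sigma_c^2),
    \]

where a_{ij} is an additive genetic effect for subject j in family i, c_i is an environmental effect shared within the family, and the remaining (unique environmental) variation is absorbed by the Weibull error term; the relative contributions are then read off the estimated variance components.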
Abstract:
As a thorough aggregation of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in the literature, the primary aim of this paper is to offer an essentially non-technical introduction to how interested readers may use these analytical approaches - with the help of Bayesian networks - for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocks when constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).
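As an illustration of the kind of small, context-independent fragment discussed above, the sketch below builds a two-node hypothesis-evidence network and queries it; the library (pgmpy), node names and probabilities are illustrative assumptions, not taken from the paper.

    # Minimal two-node "hypothesis -> evidence" fragment, the simplest kind of
    # reusable building block; all names and numbers here are illustrative.
    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("H", "E")])           # H: proposition, E: finding
    prior_h = TabularCPD("H", 2, [[0.01], [0.99]])  # P(H = true) = 0.01
    likelihood_e = TabularCPD(
        "E", 2,
        [[0.95, 0.10],   # P(E = present | H = true), P(E = present | H = false)
         [0.05, 0.90]],
        evidence=["H"], evidence_card=[2],
    )
    model.add_cpds(prior_h, likelihood_e)
    assert model.check_model()

    posterior = VariableElimination(model).query(["H"], evidence={"E": 0})
    print(posterior)  # posterior probability of H after observing the finding

Larger inference models can then be assembled by chaining such fragments, with each fragment contributing its own conditional probability tables.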
Abstract:
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under the GPL/GNU copyleft, through a user-friendly web-based online tool or as R language scripts at a supplemental website.
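For orientation, the connection between pooling, the Beta-Binomial model and a mixture formulation can be sketched as follows (generic notation, not necessarily the paper's own):

    % Library i contributes n_i sequenced tags, x_i of which match the
    % transcript of interest; within-class variability enters through a
    % distribution on the library-specific proportion p_i:
    \[
      x_i \mid p_i \sim \mathrm{Binomial}(n_i, p_i), \qquad
      p_i \sim \mathrm{Beta}(\alpha, \beta)
      \;\Rightarrow\;
      x_i \sim \mathrm{Beta\text{-}Binomial}(n_i, \alpha, \beta).
    \]
    % Pooling libraries into a single "pseudo-library" corresponds to the
    % degenerate case in which every p_i is forced to a common value.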
Abstract:
Temporal replicate counts are often aggregated to improve model fit by reducing zero-inflation and count variability and, in the case of migration counts collected hourly throughout a migration season, aggregation allows one to ignore nonindependence. However, aggregation can represent a loss of potentially useful information on the hourly or seasonal distribution of counts, which might impact our ability to estimate reliable trends. We simulated 20-year hourly raptor migration count datasets with a known rate of change to test the effect of aggregating hourly counts to daily or annual totals on our ability to recover the known trend. We simulated data for three types of species, to test whether results varied with species abundance or migration strategy: a commonly detected species, e.g., Northern Harrier, Circus cyaneus; a rarely detected species, e.g., Peregrine Falcon, Falco peregrinus; and a species typically counted in large aggregations with overdispersed counts, e.g., Broad-winged Hawk, Buteo platypterus. We compared the accuracy and precision of estimated trends across species and count types (hourly/daily/annual) using hierarchical models that assumed a Poisson, negative binomial (NB) or zero-inflated negative binomial (ZINB) count distribution. We found little benefit of modeling zero-inflation or of modeling the hourly distribution of migration counts. For the rare species, trends analyzed using daily totals and an NB or ZINB data distribution resulted in a higher probability of detecting an accurate and precise trend. In contrast, trends of the common and overdispersed species benefited from aggregation to annual totals, and for the overdispersed species in particular, trends estimated using annual totals were more precise and resulted in lower probabilities of estimating a trend (1) in the wrong direction, or (2) with credible intervals that excluded the true trend, as compared with hourly and daily counts.
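For reference, the zero-inflated negative binomial count distribution referred to above has the standard form (notation assumed; the trend enters through the NB mean):

    \[
      P(Y_t = 0) = \pi + (1-\pi)\,\mathrm{NB}(0;\mu_t,k), \qquad
      P(Y_t = y) = (1-\pi)\,\mathrm{NB}(y;\mu_t,k), \quad y > 0,
    \]
    % with NB(y; mu, k) the negative binomial pmf with mean mu and dispersion k,
    % and a log-linear trend such as log(mu_t) = alpha + beta * t.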
Abstract:
Histological and histochemical observations support the hypothesis that collagen fibers can link to elastic fibers. However, the resulting organization of elastin and collagen type complexes, and the differences between these materials in terms of macromolecular orientation and the frequencies of their chemical vibrational groups, have not yet been resolved. This study aimed to investigate the macromolecular organization of pure elastin, collagen type I and elastin-collagen complexes using polarized light DIC microscopy. Additionally, differences and similarities between pure elastin and collagen bundles (CB) were investigated by Fourier transform-infrared (FT-IR) microspectroscopy. Although elastin exhibited a faint birefringence, the elastin-collagen complex aggregates formed in solution exhibited a deep birefringence and formation of an ordered supramolecular complex typical of the collagen chiral structure. The FT-IR study revealed elastin and CB peptide NH groups involved in different types of H-bonding. More energy is absorbed in the vibrational transitions corresponding to the CH, CH2 and CH3 groups (probably associated with the hydrophobicity demonstrated by 8-anilino-1-naphthalene sulfonic acid sodium salt [ANS] fluorescence), and to the νCN, δNH and ωCH2 groups of elastin compared to CB. It is assumed that the α-helix contribution to the pure elastin amide I profile is 46.8%, whereas that of the β-sheet is 20%, with unordered structures contributing the remaining percentage. An FT-IR profile library reveals that the elastin signature within the 1360-1189 cm(-1) spectral range resembles that of Conex-Toray aramid fibers.
Abstract:
In this study, the role of different metal centers (magnesium, zinc and copper) in enhancing the hydrophilic character of metallochlorophylls was evaluated. The solvatochromism as well as the aggregation process of these compounds in water/ethanol mixtures at different volume ratios were evaluated using fluorescence and resonant light scattering (RLS) measurements, aiming to characterize the behavior of these compounds. Regardless of the metallochlorophyll studied, the presence of at least 60% water results in a considerable increase in fluorescence emission, probably a direct consequence of a lower aggregation of these compounds, which is confirmed by the RLS measurements. Additionally, the results suggest that magnesium and zinc chlorophylls should be promising phototherapeutic agents for Photodynamic Therapy.
Abstract:
Gene clustering is a useful exploratory technique for grouping together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests whether in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which requires neither model complexity penalization nor positive prior probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
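A minimal sketch of this sequential search, assuming scikit-learn's Gaussian mixtures and using a BIC comparison as a simple stand-in for the full Bayesian significance test actually used in the paper:

    # Sequential search for the number of clusters in a multivariate normal
    # mixture; the stopping rule below (BIC improvement) is only a stand-in
    # for the FBST described in the abstract.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def choose_num_clusters(X, max_m=10, random_state=0):
        """Increase m until an m-component mixture stops beating m-1 components."""
        prev = GaussianMixture(n_components=1, random_state=random_state).fit(X)
        for m in range(2, max_m + 1):
            cur = GaussianMixture(n_components=m, random_state=random_state).fit(X)
            # Accept "m - 1 components suffice" when the larger model no longer
            # lowers the BIC; otherwise reject and move on to m + 1.
            if cur.bic(X) >= prev.bic(X):
                return m - 1
            prev = cur
        return max_m

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
    print(choose_num_clusters(X))  # expected: 2 for this toy two-cluster dataset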
Abstract:
Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should have whenever they are not subject to adverse situations such as a complete lack of panmixia, excess of mutations, excess of selection pressure, etc. HWE has been evaluated for decades; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, the use of HWE concepts has expanded in human disease studies to detect genotyping error and disease susceptibility (association); see Ryckman and Williams (2008). Most analyses focus on trying to answer the question of whether a population is in HWE. They do not try to quantify how far from equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient for a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how one can conduct a Bayesian analysis to verify how far from HWE a population is. Other coefficients have been introduced in the literature; the advantage of the one introduced in this paper is that, just like standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) when comparing positive and negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern and Wechsler (2008) for a complete review. The proposed disequilibrium coefficient provides an easy and efficient way to carry out the analyses, especially if one uses Bayesian statistics. A routine in R (R Development Core Team, 2009) that implements the calculations is provided for readers.
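As a rough illustration of this style of analysis, the sketch below places a conjugate Dirichlet prior on the genotype frequencies and samples the posterior of the classical disequilibrium coefficient D = p_AA - p_A^2 (note this is not the bounded, symmetric coefficient the paper proposes, and the counts are made up):

    # Posterior of a simple disequilibrium coefficient for a biallelic locus,
    # using a Dirichlet-multinomial model for genotype counts (illustrative).
    import numpy as np

    def posterior_disequilibrium(n_AA, n_Aa, n_aa, prior=(1.0, 1.0, 1.0),
                                 n_draws=100_000, seed=0):
        rng = np.random.default_rng(seed)
        # Dirichlet posterior over genotype frequencies (p_AA, p_Aa, p_aa)
        alpha = np.array(prior) + np.array([n_AA, n_Aa, n_aa])
        p = rng.dirichlet(alpha, size=n_draws)
        p_A = p[:, 0] + 0.5 * p[:, 1]   # allele frequency of A
        return p[:, 0] - p_A ** 2        # classical disequilibrium coefficient D

    d = posterior_disequilibrium(n_AA=30, n_Aa=40, n_aa=30)
    print(np.mean(d), np.percentile(d, [2.5, 97.5]))  # posterior mean and 95% interval

Under HWE the coefficient is zero, so a posterior interval well away from zero indicates departure from equilibrium.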
Abstract:
Animal cloning has been associated with developmental abnormalities, with the level of heteroplasmy caused by the procedure being one of its potential limiting factors. The aim of this study was to determine the effect of the fusion of hemicytoplasts or aggregation of hemiembryos, varying the final cytoplasmic volume, on development and cell density of embryos produced by handmade cloning (HMC), parthenogenesis or in vitro fertilization (IVF). One or two enucleated hemicytoplasts were paired and fused with one skin somatic cell. Activated clone and zona-free parthenote embryos and hemiembryos were in vitro cultured in the well-of-the-well (WOW) system, being allocated to one of six experimental groups, on a per WOW basis: single clone or parthenote hemiembryos (1 x 50%); aggregation of two (2 x 50%), three (3 x 50%), or four (4 x 50%) clone or parthenote hemiembryos; single clone or parthenote embryos (1 x 100%); or aggregation of two clone or parthenote embryos (2 x 100%). Control zona-intact parthenote or IVF embryos were in vitro cultured in four-well dishes. Results indicated that the increase in the number of aggregated structures within each WOW was followed by a linear increase in cleavage, blastocyst rate, and cell density. The increase in cytoplasmic volume, either by fusion or by aggregation, had a positive effect on embryo development, supporting the establishment of pregnancies and the birth of a viable cloned calf after transfer to recipients. However, embryo aggregation did not improve development on a hemicytoplast basis, except for the aggregation of two clone embryos.
Abstract:
Background: Protein aggregates containing alpha-synuclein, beta-amyloid and hyperphosphorylated tau are commonly found during neurodegenerative processes, which are often accompanied by impairment of mitochondrial complex I of the respiratory chain and dysfunction of cellular systems of protein degradation. In view of this, we aimed to develop an in vitro model to study protein aggregation associated with neurodegenerative diseases using cultured cells from the hippocampus, locus coeruleus and substantia nigra of newborn Lewis rats exposed to 0.5, 1, 10 and 25 nM of rotenone, an agricultural pesticide, for 48 hours. Results: We demonstrated that the proportion of cells in culture is approximately the same as found in the brain nuclei from which they were extracted. Rotenone at 0.5 nM was able to induce alpha-synuclein and beta-amyloid aggregation, as well as increased hyperphosphorylation of tau, although higher concentrations of this pesticide (over 1 nM) led cells to death before protein aggregation. We also demonstrated that the 14 kDa isoform of alpha-synuclein is not present in newborn Lewis rats. Conclusion: Rotenone exposure may lead to constitutive protein aggregation in vitro, which may be of relevance for studying the mechanisms involved in idiopathic neurodegeneration.
Abstract:
We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with that of the already known Baldi-Chauvin algorithm. Using the Kullback-Leibler divergence as a measure of generalization, we draw learning curves in simplified situations for these algorithms and compare their performances.
Abstract:
Gas aggregation is a well-known method used to produce clusters of different materials with good size control, reduced dispersion, and precise stoichiometry. The cost of these systems is relatively high and they are generally dedicated apparatuses. Furthermore, the usual sample production speed of these systems is not as fast as that of physical vapor deposition devices, posing a problem when thick samples are needed. In this paper we describe the development of a multipurpose gas aggregation system constructed as an adaptation to a magnetron sputtering system. The cost of this adaptation is negligible and its installation and operation are both remarkably simple. A gas flow in the range of 60-130 SCCM (SCCM denotes cubic centimeters per minute at STP) is able to completely collimate all the sputtered material, producing spherical nanoparticles. Co nanoparticles were produced and characterized using electron microscopy techniques and Rutherford backscattering analysis. The size of the particles is around 10 nm, with a deposition rate of around 75 nm/min at the center of a Gaussian-profile nanoparticle beam.
Abstract:
Chagas disease is still a major public health problem in Latin America. Its causative agent, Trypanosoma cruzi, can be typed into three major groups, T. cruzi I, T. cruzi II and hybrids. These groups each have specific genetic characteristics and epidemiological distributions. Several highly virulent strains are found in the hybrid group; their origin is still a matter of debate. The null hypothesis is that the hybrids are of polyphyletic origin, evolving independently from various hybridization events. The alternative hypothesis is that all extant hybrid strains originated from a single hybridization event. We sequenced both alleles of genes encoding EF-1 alpha, actin and SSU rDNA of 26 T. cruzi strains and DHFR-TS and TR of 12 strains. This information was used for network genealogy analysis and Bayesian phylogenies. We found T. cruzi I and T. cruzi II to be monophyletic and that all hybrids had different combinations of T. cruzi I and T. cruzi II haplotypes plus hybrid-specific haplotypes. Bootstrap values (networks) and posterior probabilities (Bayesian phylogenies) of clades supporting the monophyly of hybrids were far below the 95% confidence level, indicating that the hybrid group is polyphyletic. We hypothesize that T. cruzi I and T. cruzi II are two different species and that the hybrids are extant representatives of independent events of genome hybridization, which sporadically have sufficient fitness to impact on the epidemiology of Chagas disease.
Abstract:
Here, I investigate the use of Bayesian updating rules applied to modeling how social agents change their minds in the case of continuous opinion models. Given another agent's statement about the continuous value of a variable, we will see that interesting dynamics emerge when an agent assigns to that value a likelihood that is a mixture of a Gaussian and a uniform distribution. This represents the idea that the other agent might have no idea what is being talked about. The effect of updating only the first moment of the distribution will be studied, and we will see that this generates results similar to those of the bounded confidence models. When the second moment is also updated, several different opinions always survive in the long run, as agents become more stubborn with time. However, depending on the probability of error and the initial uncertainty, those opinions might be clustered around a central value.
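A minimal numerical sketch of such an update (grid-based, with made-up parameter names; the paper itself works with closed-form updates of the first and second moments):

    # An agent holds a Gaussian belief over a continuous opinion, hears another
    # agent state the value x, and updates with a likelihood that mixes a
    # Gaussian around x with a uniform "the speaker may be clueless" component.
    import numpy as np

    def update_opinion(mu, sigma, x, p_useful=0.7, noise=0.5, lo=0.0, hi=1.0):
        theta = np.linspace(lo, hi, 2001)
        prior = np.exp(-0.5 * ((theta - mu) / sigma) ** 2)
        # Mixture likelihood: informative Gaussian term plus a flat term for the
        # chance that the statement carries no information at all.
        like = (p_useful * np.exp(-0.5 * ((x - theta) / noise) ** 2)
                / (noise * np.sqrt(2 * np.pi))
                + (1 - p_useful) / (hi - lo))
        post = prior * like
        post /= post.sum()                       # normalize on the grid
        new_mu = (theta * post).sum()            # updated first moment
        new_var = ((theta - new_mu) ** 2 * post).sum()
        return new_mu, np.sqrt(new_var)

    print(update_opinion(mu=0.3, sigma=0.2, x=0.8))  # opinion pulled toward 0.8

Keeping the variance fixed reproduces bounded-confidence-like behavior, while also shrinking it makes the agent progressively more stubborn, as described above.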