942 results for hierarchical Bayesian models
Abstract:
In this thesis we address a multi-label hierarchical text classification problem in a low-resource setting and explore different approaches to identify the best one for our case. The goal is to train a model that classifies English school exercises according to a hierarchical taxonomy using few labeled data. The experiments employ different machine learning models and text representation techniques: CatBoost with tf-idf features, classifiers based on pre-trained models (mBERT, LASER), and SetFit, a framework for few-shot text classification. SetFit proved to be the most promising approach, achieving better performance when only a few labeled examples per class are available during training. However, this thesis does not consider the entire hierarchical taxonomy, only its first two levels: addressing classification at the third level would require further experiments exploring methods for zero-shot text classification, data augmentation, and strategies to exploit the hierarchical structure of the taxonomy during training.
Abstract:
Prosopis rubriflora and Prosopis ruscifolia are important species in the Chaquenian regions of Brazil. Because of the restriction and frequency of their physiognomy, they are excellent models for conservation genetics studies. The use of microsatellite markers (Simple Sequence Repeats, SSRs) has become increasingly important in recent years and has proven to be a powerful tool for both ecological and molecular studies. In this study, we present the development and characterization of 10 new markers for P. rubriflora and 13 new markers for P. ruscifolia. The genotyping was performed using 40 P. rubriflora samples and 48 P. ruscifolia samples from the Chaquenian remnants in Brazil. The polymorphism information content (PIC) of the P. rubriflora markers ranged from 0.073 to 0.791, and no null alleles or deviation from Hardy-Weinberg equilibrium (HW) were detected. The PIC values for the P. ruscifolia markers ranged from 0.289 to 0.883, but a departure from HW and null alleles were detected for certain loci; however, this departure may have resulted from anthropic activities, such as the presence of livestock, which is very common in the remnant areas. In this study, we describe novel SSR polymorphic markers that may be helpful in future genetic studies of P. rubriflora and P. ruscifolia.
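The polymorphism information content (PIC) values quoted above are commonly computed per locus from allele frequencies using the formula of Botstein et al. (1980). The abstract does not say which estimator or software the authors used, so the following is only a minimal sketch under that assumption; the function name and the example frequencies are hypothetical.

```python
from itertools import combinations


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 for one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity under random mating.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings that are uninformative for linkage.
    correction = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical locus with two equally frequent alleles (not data from the study).
pic_two_alleles = polymorphism_information_content([0.5, 0.5])
```

A locus with a single allele yields a PIC of 0, and the value rises towards 1 as the number of alleles grows and their frequencies even out.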
Abstract:
New DNA-based predictive tests for physical characteristics and inference of ancestry are highly informative tools that are being increasingly used in forensic genetic analysis. Two eye colour prediction models, a Bayesian classifier (Snipper) and a multinomial logistic regression (MLR) system for the Irisplex assay, have been described for the analysis of unadmixed European populations. Since multiple SNPs in combination contribute in varying degrees to eye colour predictability in Europeans, it is likely that these predictive tests will perform differently in admixed populations that have European co-ancestry, compared to unadmixed Europeans. In this study we examined 99 individuals from two admixed South American populations, comparing eye colour with ancestry in order to reveal a direct correlation of light eye colour phenotypes with European co-ancestry in admixed individuals. Additionally, six eye colour prediction models, using varying numbers of SNPs and based on Snipper and MLR, were applied to the study populations. Furthermore, patterns of eye colour prediction have been inferred for a set of publicly available admixed and globally distributed populations from the HGDP-CEPH panel and 1000 Genomes databases, with a special emphasis on admixed American populations similar to those of the study samples.
Abstract:
Garlic is a spice and a medicinal plant; hence, there is an increasing interest in 'developing' new varieties with different culinary properties or with high content of nutraceutical compounds. Phenotypic traits and dominant molecular markers are predominantly used to evaluate the genetic diversity of garlic clones. However, 24 SSR markers (codominant) specific for garlic are available in the literature, fostering germplasm research. In this study, we genotyped 130 garlic accessions from Brazil and abroad using 17 polymorphic SSR markers to assess the genetic diversity and structure. This is the first attempt to evaluate a large set of accessions maintained by Brazilian institutions. A high level of redundancy was detected in the collection (50 % of the accessions represented eight haplotypes). However, non-redundant accessions presented high genetic diversity. We detected on average five alleles per locus, a Shannon index of 1.2, HO of 0.5, and HE of 0.6. A core collection was set with 17 accessions, covering 100 % of the alleles with minimum redundancy. Overall FST and D values indicate a strong genetic structure within accessions. Two major groups identified by both model-based (Bayesian approach) and hierarchical clustering (UPGMA dendrogram) techniques were coherent with the classification of accessions according to maturity time (growth cycle): early-late and midseason accessions. Assessing the genetic diversity and structure of garlic collections is the first step towards efficient management and conservation of accessions in genebanks, as well as towards advancing future genetic studies and improvement of garlic worldwide.
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assays. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
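The damped exponential correlation structure mentioned above is commonly written as corr(y_j, y_k) = phi1 ** (|t_j - t_k| ** phi2), with 0 <= phi1 < 1 and phi2 >= 0. The abstract does not give the parametrization, so the sketch below assumes this common form; the function name, the parameter names phi1 and phi2, and the visit times are hypothetical.

```python
def dec_correlation(times, phi1, phi2):
    """Correlation matrix with entries phi1 ** (|t_j - t_k| ** phi2), 1 on the diagonal."""
    if not (0.0 <= phi1 < 1.0) or phi2 < 0.0:
        raise ValueError("expected 0 <= phi1 < 1 and phi2 >= 0")
    n = len(times)
    corr = [[1.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                lag = abs(times[j] - times[k])
                corr[j][k] = phi1 ** (lag ** phi2)
    return corr


# Hypothetical, irregularly spaced visit times.
visit_times = [0.0, 1.0, 3.0]
R = dec_correlation(visit_times, phi1=0.8, phi2=1.0)
```

Under this form, phi2 = 1 gives a continuous-time AR(1) pattern in which correlation decays with the time gap, while phi2 = 0 gives the same correlation phi1 for every pair of visits (compound symmetry).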
Abstract:
Health economic evaluations require estimates of expected survival from patients receiving different interventions, often over a lifetime. However, data on the patients of interest are typically only available for a much shorter follow-up time, from randomised trials or cohorts. Previous work showed how to use general population mortality to improve extrapolations of the short-term data, assuming a constant additive or multiplicative effect on the hazards for all-cause mortality for study patients relative to the general population. A more plausible assumption may be a constant effect on the hazard for the specific cause of death targeted by the treatments. To address this problem, we use independent parametric survival models for cause-specific mortality among the general population. Because causes of death are unobserved for the patients of interest, a polyhazard model is used to express their all-cause mortality as a sum of latent cause-specific hazards. Assuming proportional cause-specific hazards between the general and study populations then allows us to extrapolate mortality of the patients of interest to the long term. A Bayesian framework is used to jointly model all sources of data. By simulation, we show that ignoring cause-specific hazards leads to biased estimates of mean survival when the proportion of deaths due to the cause of interest changes through time. The methods are applied to an evaluation of implantable cardioverter defibrillators for the prevention of sudden cardiac death among patients with cardiac arrhythmia. After accounting for cause-specific mortality, substantial differences are seen in estimates of life years gained from implantable cardioverter defibrillators.
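The polyhazard idea described above, all-cause mortality as a sum of latent cause-specific hazards with a proportional effect on one cause, can be sketched numerically. This is only an illustration under stated assumptions: Weibull cause-specific hazards, a constant hazard ratio, and every parameter value below are hypothetical choices, not the authors' fitted Bayesian model.

```python
import math


def weibull_cumulative_hazard(t, shape, scale):
    """Cumulative hazard H(t) = (t / scale) ** shape of a Weibull distribution."""
    return (t / scale) ** shape


def polyhazard_survival(t, causes, hazard_ratios):
    """S(t) = exp(-sum_k HR_k * H_k(t)): hazards add across causes of death."""
    total = sum(
        hr * weibull_cumulative_hazard(t, shape, scale)
        for hr, (shape, scale) in zip(hazard_ratios, causes)
    )
    return math.exp(-total)


def restricted_mean_survival(causes, hazard_ratios, horizon, steps=2000):
    """Trapezoidal integral of S(t) from 0 to horizon (expected years lived)."""
    dt = horizon / steps
    values = [polyhazard_survival(i * dt, causes, hazard_ratios) for i in range(steps + 1)]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


# Hypothetical (shape, scale) pairs: cause of interest first, then all other causes.
causes = [(1.5, 30.0), (1.2, 40.0)]
general_mean = restricted_mean_survival(causes, [1.0, 1.0], horizon=50.0)
# Study patients: hazard for the cause of interest doubled, other causes unchanged.
patient_mean = restricted_mean_survival(causes, [2.0, 1.0], horizon=50.0)
```

Because the hazard ratio acts on one cause only, its effect on all-cause survival changes over time as that cause's share of deaths changes, which is the situation the abstract identifies as a source of bias when cause-specific hazards are ignored.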
Abstract:
A common breeding strategy is to carry out basic studies to investigate the hypothesis of a single gene controlling the trait (major gene), with or without polygenes of minor effect. In this study we used Bayesian inference to fit genetic additive-dominance models of inheritance to plant breeding experiments with multiple generations. Normal densities with different means, according to the major gene genotype, were considered in a linear model in which the design matrix of the genetic effects had unknown coefficients (which were estimated on an individual basis). An actual data set from an inheritance study of parthenocarpy in zucchini (Cucurbita pepo L.) was used for illustration. Model fitting included posterior probabilities for all individual genotypes. The analysis agrees with results in the literature, but this approach was far more efficient than previous alternatives that assumed the design matrix was known for the generations. Parthenocarpy in zucchini is controlled by a major gene with an important additive effect and partial dominance.
Abstract:
Universidade Estadual de Campinas . Faculdade de Educação Física
Abstract:
The aim of this study was to comparatively assess dental arch width, in the canine and molar regions, by means of direct measurements from plaster models, photocopies and digitized images of the models. The sample consisted of 130 pairs of plaster models, photocopies and digitized images of the models of white patients (n = 65), both genders, with Class I and Class II Division 1 malocclusions, treated by standard Edgewise mechanics and extraction of the four first premolars. Maxillary and mandibular intercanine and intermolar widths were measured by a calibrated examiner, prior to and after orthodontic treatment, using the three modes of reproduction of the dental arches. Dispersion of the data relative to pre- and posttreatment intra-arch linear measurements (mm) was represented as box plots. The three measuring methods were compared by one-way ANOVA for repeated measurements (α = 0.05). Initial / final mean values varied as follows: 33.94 to 34.29 mm / 34.49 to 34.66 mm (maxillary intercanine width); 26.23 to 26.26 mm / 26.77 to 26.84 mm (mandibular intercanine width); 49.55 to 49.66 mm / 47.28 to 47.45 mm (maxillary intermolar width) and 43.28 to 43.41 mm / 40.29 to 40.46 mm (mandibular intermolar width). There were no statistically significant differences between mean dental arch widths estimated by the three studied methods, prior to and after orthodontic treatment. It may be concluded that photocopies and digitized images of the plaster models provided reliable reproductions of the dental arches for obtaining transversal intra-arch measurements.
Abstract:
The dental impression is an important step in the preparation of prostheses since it provides the reproduction of anatomic and surface details of teeth and adjacent structures. The objective of this study was to evaluate the linear dimensional alterations in gypsum dies obtained with different elastomeric materials, using a resin coping impression technique with individual shells. A master cast made of stainless steel with fixed prosthesis characteristics and two prepared abutment teeth was used to obtain the impressions. Reference points (A, B, C, D, E and F) were recorded on the occlusal and buccal surfaces of the abutments to register the distances. The impressions were obtained using the following materials: polyether, mercaptan-polysulfide, addition silicone, and condensation silicone. The transfer impressions were made with custom trays and an irreversible hydrocolloid material and were poured with type IV gypsum. The distances between identified points in gypsum dies were measured using an optical microscope and the results were statistically analyzed by ANOVA (p < 0.05) and Tukey's test. The mean distances were registered as follows: addition silicone (AB = 13.6 µm, CD = 15.0 µm, EF = 14.6 µm, GH = 15.2 µm), mercaptan-polysulfide (AB = 36.0 µm, CD = 36.0 µm, EF = 39.6 µm, GH = 40.6 µm), polyether (AB = 35.2 µm, CD = 35.6 µm, EF = 39.4 µm, GH = 41.4 µm) and condensation silicone (AB = 69.2 µm, CD = 71.0 µm, EF = 80.6 µm, GH = 81.2 µm). All of the measurements found in gypsum dies were compared to those of the master cast. The results demonstrated that addition silicone provides the best stability of the compounds tested, followed by polyether, polysulfide and condensation silicone. No statistical differences were found between the polyether and mercaptan-polysulfide materials.
Abstract:
The purpose of this study was to develop and validate equations to estimate the aboveground phytomass of a 30-year-old plot of Atlantic Forest. In two plots of 100 m², a total of 82 trees were cut down at ground level. For each tree, height and diameter were measured. Leaves and woody material were separated in order to determine their fresh weights in field conditions. Samples of each fraction were oven dried at 80 °C to constant weight to determine their dry weight. Tree data were divided into two random samples. One sample was used for the development of the regression equations, and the other for validation. The models were developed using simple linear regression analysis, where the dependent variable was the dry mass, and the independent variables were height (h), diameter (d) and d²h. The validation was carried out using the Pearson correlation coefficient, the paired Student's t-test and the standard error of estimation. The best equations to estimate aboveground phytomass were: ln DW = -3.068 + 2.522 ln d (r² = 0.91; s y/x = 0.67) and ln DW = -3.676 + 0.951 ln(d²h) (r² = 0.94; s y/x = 0.56).
Abstract:
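As a minimal sketch, the two fitted equations quoted above can be applied by exponentiating the predicted log dry weight. The abstract does not state the units of d, h or DW, and no correction for back-transformation bias is applied here; the function names and the example tree are hypothetical.

```python
import math


def dry_weight_from_diameter(d):
    """Back-transform ln DW = -3.068 + 2.522 ln d (units as in the source study)."""
    return math.exp(-3.068 + 2.522 * math.log(d))


def dry_weight_from_d2h(d, h):
    """Back-transform ln DW = -3.676 + 0.951 ln(d^2 h)."""
    return math.exp(-3.676 + 0.951 * math.log(d * d * h))


# Hypothetical tree; the values are illustrative, not taken from the study.
example_d, example_h = 10.0, 8.0
dw_diameter_only = dry_weight_from_diameter(example_d)
dw_diameter_height = dry_weight_from_d2h(example_d, example_h)
```

Both functions are power laws in the original scale, for example DW = exp(-3.068) * d ** 2.522, so predicted dry weight increases monotonically with diameter and height.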
In this article we present a Bayesian analysis of the stochastic volatility (SV) model and a generalized form of it, with the aim of estimating the volatility of financial time series. Considering some special cases of SV models, we use Markov chain Monte Carlo algorithms and the WinBUGS software to obtain posterior summaries for the different forms of SV models. We introduce some Bayesian discrimination techniques for choosing the best model to use for estimating volatilities and forecasting financial series. An empirical example applying the methodology is presented using the IBOVESPA financial series.
Abstract:
The enzyme purine nucleoside phosphorylase from Schistosoma mansoni (SmPNP) is an attractive molecular target for the treatment of major parasitic infectious diseases, with special emphasis on its role in the discovery of new drugs against schistosomiasis, a tropical disease that affects millions of people worldwide. In the present work, we have determined the inhibitory potency and developed descriptor- and fragment-based quantitative structure-activity relationships (QSAR) for a series of 9-deazaguanine analogs as inhibitors of SmPNP. Significant statistical parameters (descriptor-based model: r² = 0.79, q² = 0.62, r²pred = 0.52; and fragment-based model: r² = 0.95, q² = 0.81, r²pred = 0.80) were obtained, indicating the potential of the models for untested compounds. The fragment-based model was then used to predict the inhibitory potency of a test set of compounds, and the predicted values are in good agreement with the experimental results.
Abstract:
In this work we report on a comparison of some theoretical models usually used to fit the dependence on temperature of the fundamental energy gap of semiconductor materials. We used in our investigations the theoretical models of Viña, Pässler-p and Pässler-ρ to fit several sets of experimental data, available in the literature for the energy gap of GaAs in the temperature range from 12 to 974 K. Performing several fittings for different values of the upper limit of the analyzed temperature range (Tmax), we were able to follow in a systematic way the evolution of the fitting parameters up to the limit of high temperatures and make a comparison between the zero-point values obtained from the different models by extrapolating the linear dependence of the gaps at high T to T = 0 K and that determined by the dependence of the gap on isotope mass. Using experimental data measured by absorption spectroscopy, we observed the non-linear behavior of Eg(T) of GaAs for T > ΘD.
Abstract:
The aim of this study was to determine the reproducibility, reliability and validity of measurements in digital models compared to plaster models. Fifteen pairs of plaster models were obtained from orthodontic patients with permanent dentition before treatment. These were digitized to be evaluated with the program Cécile3 v2.554.2 beta. Two examiners measured, three times each, the mesiodistal width of all the teeth present, the intercanine, interpremolar and intermolar distances, overjet and overbite. The plaster models were measured using a digital vernier. Student's t-test for paired samples and the intraclass correlation coefficient (ICC) were used for statistical analysis. The ICCs of the digital models were 0.84 ± 0.15 (intra-examiner) and 0.80 ± 0.19 (inter-examiner). The average mean difference of the digital models was 0.23 ± 0.14 and 0.24 ± 0.11 for each examiner, respectively. When the two types of measurements were compared, the values obtained from the digital models were lower than those obtained from the plaster models (p < 0.05), although the differences were considered clinically insignificant (differences < 0.1 mm). The Cécile digital models are a clinically acceptable alternative for use in Orthodontics.