962 results for Unbiased estimates
Abstract:
This thesis comprises three articles and a note, one per chapter. All chapters fall within Applied Microeconomics and Labor Economics. The first article extends Shimer's (2012) traditional model for decomposing fluctuations in the unemployment rate by separating formal from informal employment. With this modification, the methodology's main results change, and we conclude that the main drivers of the fall in unemployment over the last decade were (i) the drop in the participation rate, chiefly through lower entry into the labor force; and (ii) increased formalization, driven both by the probability of finding a formal job and by the probability of leaving formal employment. The second chapter presents estimates of the return to education in Brazil, using a new methodology that requires no exclusion restrictions. Its advantage over instrumental-variable approaches is that it yields the average return for all workers (not only those affected by the instruments) and at any point in time. Given the results, we conclude that OLS estimates understate the average return; possible explanations for this phenomenon are discussed. The third article deals with labor outsourcing in Brazil; specifically, it measures the wage differential between outsourced workers and those hired directly. An unconditional comparison indicates that outsourced workers earned, on average, 17% less over 2007-2012. Once worker fixed effects are accounted for, however, the differential falls to 3.0%. Moreover, the differential is quite heterogeneous across service types: those employing low-skill workers show lower wages, while in higher-skill occupations outsourced workers earn the same as, or more than, directly hired ones.
The evidence also points to a narrowing of the differential over the period analyzed. Finally, the note that closes the thesis documents two relevant and little-known aspects of IBGE's Pesquisa Mensal de Emprego (Monthly Employment Survey) that, if not handled properly, can lead to imprecise results in studies using this panel.
Abstract:
Quantitative data on lung structure are essential to set up structure-function models for assessing the functional performance of the lung or to make statistically valid comparisons in experimental morphology, physiology, or pathology. The methods of choice for microscopy-based lung morphometry are those of stereology, the science of quantitative characterization of irregular three-dimensional objects on the basis of measurements made on two-dimensional sections. From a practical perspective, stereology is an assumption-free set of methods of unbiased sampling with geometric probes, based on a solid mathematical foundation. Here, we discuss the pitfalls of lung morphometry and present solutions, from specimen preparation to the sampling scheme in multiple stages, for obtaining unbiased estimates of morphometric parameters such as volumes, surfaces, lengths, and numbers. This is demonstrated on various examples. Stereological methods are accurate, efficient, simple, and transparent; the precision of the estimates depends on the size and distribution of the sample. For obtaining quantitative data on lung structure at all microscopic levels, state-of-the-art stereology is the gold standard.
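As one concrete instance of unbiased sampling with geometric probes, the Cavalieri point-counting estimator of volume can be sketched as below; the function name and the numbers are illustrative, not from the text.

```python
def cavalieri_volume(point_counts, area_per_point, section_spacing):
    """Cavalieri estimator: V ~ t * (a/p) * total points hitting the object.

    point_counts    -- points falling on the structure in each systematic section
    area_per_point  -- area associated with one grid point (a/p)
    section_spacing -- distance t between consecutive sections
    """
    return section_spacing * area_per_point * sum(point_counts)

# Illustrative: 5 systematic sections 10 um apart, grid with 100 um^2 per point
v = cavalieri_volume([12, 18, 25, 17, 8], area_per_point=100.0, section_spacing=10.0)
```

Because the sections are systematic with a random start and the grid is superimposed uniformly at random, the estimator is unbiased regardless of the object's shape, which is the point the abstract makes about stereological probes.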
Abstract:
Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of the classic ACE twin model with a (possibly covariate-specific) liability-threshold model for binary outcomes. Our likelihood-based approach to fitting involves conditioning on the proband’s disease status, as well as setting prevalence equal to a pre-specified value that can be estimated from the data themselves if necessary. Simulation experiments suggest that our approach to fitting yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly-made assumptions hold. These assumptions include: the usual assumptions for the classic ACE and liability-threshold models; assumptions about shared family environment for relative pairs; and assumptions about the case-control family sampling, including single ascertainment. When our approach is used to fit the ACE model to Austrian case-control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data.
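A liability-threshold ACE data-generating process of the kind described can be sketched as a small simulation; the variance fractions, prevalence, and pair correlations below are illustrative choices, not estimates from the study, and `statistics.NormalDist` supplies the threshold quantile.

```python
import math
import random
from statistics import NormalDist

def simulate_concordance(a2, c2, prevalence, r_a=1.0, n=100_000, seed=1):
    """Simulate pairs of relatives under an ACE liability-threshold model and
    return the probability that both members are affected.

    a2, c2 -- additive-genetic and shared-environment variance fractions
              (unique environment E makes up the remainder, so A+C+E = 1)
    r_a    -- additive-genetic correlation of the pair (1.0 for MZ twins,
              0.5 for DZ twins or full siblings)
    """
    rng = random.Random(seed)
    e2 = 1.0 - a2 - c2
    thr = NormalDist().inv_cdf(1.0 - prevalence)  # liability threshold
    hits = 0
    for _ in range(n):
        a_shared = rng.gauss(0.0, math.sqrt(r_a * a2))  # genetic part shared by the pair
        c = rng.gauss(0.0, math.sqrt(c2))               # shared family environment
        l1 = a_shared + rng.gauss(0.0, math.sqrt((1 - r_a) * a2)) + c + rng.gauss(0.0, math.sqrt(e2))
        l2 = a_shared + rng.gauss(0.0, math.sqrt((1 - r_a) * a2)) + c + rng.gauss(0.0, math.sqrt(e2))
        hits += (l1 > thr) and (l2 > thr)
    return hits / n
```

Under this model the pair liability correlation is r_a*a2 + c2, so MZ-like pairs (r_a = 1) should be more concordant than DZ-like pairs (r_a = 0.5) whenever a2 > 0, which is the contrast ACE fitting exploits.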
Abstract:
In rheumatology and joint research, as in other fields, a purely descriptive approach to morphology cannot satisfy the demands of modern clinical medicine. Investigators now appreciate the need to gauge pathological changes and their response to treatment by quantifying susceptible structural parameters. But the desired information regarding three-dimensional structures must be gleaned from either actual or virtual two-dimensional sections through the tissue. This information can be obtained only if the laws governing stereology are respected. In this chapter, the stereological principles that must be applied, and the practical methods that have been devised, to yield unbiased estimates of the most commonly determined structural parameters, namely volume, surface area and number, are summarized.
Abstract:
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.

Rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, the substitution rate is assumed to vary among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model that also includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution (α) and/or the proportion of invariable sites (θ). Computer simulation showed that (1) under the gamma model, α can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model. However, this ML method requires a huge amount of computation and is practical only for fewer than 6 sequences. I therefore developed a fast method for estimating α that is easy to implement and requires no knowledge of the tree, together with a computer program for estimating α and evolutionary distances that can handle as many as 30 sequences.

Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution that assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to an SRV model that allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method is better than a simpler method when the sequence length L > 1,000 bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.

Evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances; their performance, and that of the formulas for sampling variances, was examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed for the nonstationary case.
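In its simplest (Jukes-Cantor) form, the gamma-model distance correction referred to above is d = (3α/4)[(1 − 4p/3)^(−1/α) − 1], where p is the observed proportion of differing sites. The sketch below implements only this textbook special case; the dissertation's own estimators are more general.

```python
def jc_gamma_distance(p, alpha):
    """Jukes-Cantor evolutionary distance with gamma-distributed rate
    variation among sites.

    p     -- observed proportion of sites differing between two sequences
    alpha -- shape parameter of the gamma distribution of rates

    As alpha -> infinity this approaches the classic equal-rates JC
    correction d = -(3/4) * ln(1 - 4p/3).
    """
    if p >= 0.75:
        raise ValueError("p must be below 3/4 for the correction to exist")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)
```

Smaller α means stronger rate heterogeneity and therefore a larger corrected distance for the same observed p, which is why ignoring rate variation biases distances downward.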
Abstract:
In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken center stage in the public eye, owing to frequent natural disasters such as hurricanes, storm surges, and tsunamis driven by climate change and increased human activity on our planet. Statistical methods for complex survey design and analysis have gained significance as a consequence. However, many challenges remain in inferring such assessments over the target population for policy-level advocacy and implementation.

Objective. This study discusses the use of statistical methods for disaster preparedness and medical needs assessment to support local and state governments in policy-level decision making and logistic planning, so as to avoid loss of life and property in future calamities.

Methods. To obtain precise and unbiased estimates of medical special needs persons (MSNP) and of disaster preparedness for evacuation in the Rio Grande Valley (RGV) of Texas, a stratified, cluster-randomized, multi-stage sampling design was implemented. The US School of Public Health, Brownsville surveyed 3,088 households in three counties: Cameron, Hidalgo, and Willacy. Multiple statistical methods were applied, with estimates taking into account selection probabilities and clustering effects. The methods discussed were multivariate linear regression (MLR), survey linear regression (Svy-Reg), generalized estimating equations (GEE), and multilevel mixed models (MLM), all with and without sampling weights.

Results. The estimated population of the RGV was 1,146,796: 51.5% female, 90% Hispanic, 73% married, 56% unemployed, and 37% with personal transport. 40% of people attained education up to elementary school, another 42% reached high school, and only 18% went to college. Median household income is less than $15,000/year. MSNP were estimated at 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models are in concordance, with MSNP estimates ranging from 44,000 to 48,000: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), bootstrap regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390), and MLM (46,513; 95% CI: 39,869; 53,157).

Conclusion. The RGV is a flood zone, highly susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic and under-educated, with among the lowest income levels in the U.S. In a disaster the population is largely incapacitated, with only 37% having personal transport available to take care of MSNP. Local and state government intervention in planning, preparation, and evacuation support is necessary in any such disaster to avoid loss of precious human life.

Key words: complex surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimating equations (GEE), random effects, intracluster correlation coefficient (ICC).
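A design-weighted estimate of the kind underlying these population figures can be sketched as a Horvitz-Thompson-style weighted total; the households, indicators, and weights below are purely illustrative, not the survey's data.

```python
def weighted_total_and_proportion(indicators, weights):
    """Design-weighted estimate of a population total and proportion.

    indicators -- 1 if the sampled household has a medical special needs
                  person, else 0
    weights    -- sampling weights (inverse selection probabilities), i.e.
                  how many population households each sampled one represents
    """
    total = sum(i * w for i, w in zip(indicators, weights))
    population = sum(weights)
    return total, total / population

# Illustrative micro-example: 4 sampled households
tot, prop = weighted_total_and_proportion([1, 0, 0, 1],
                                          [300.0, 250.0, 400.0, 50.0])
```

Ignoring the weights here would give a sample proportion of 0.5, while the weighted proportion is 0.35, which is the kind of gap the "with and without sampling weights" comparisons in the abstract quantify.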
Abstract:
The objective of this dissertation was to design and implement strategies for assessment of exposures to organic chemicals used in the production of a styrene-butadiene polymer at the Texas Plastics Company (TPC). Linear statistical retrospective exposure models, univariate and multivariate, were developed based on the validation of historical industrial hygiene monitoring data collected by industrial hygienists at TPC, and additional current industrial hygiene monitoring data collected for the purposes of this study. The current monitoring data served several purposes. First, it provided information on current exposure data, in the form of unbiased estimates of mean exposure to organic chemicals for each job title included. Second, it provided information on homogeneity of exposure within each job title, through the use of a carefully designed sampling scheme which addressed variability of exposure both between and within job titles. Third, it permitted the investigation of how well current exposure data can serve as an evaluation tool for retrospective exposure estimation. Finally, this dissertation investigated the simultaneous evaluation of exposure to several chemicals, as well as the use of values below detection limits in a multivariate linear statistical model of exposures.
Abstract:
The measurement of fast changing temperature fluctuations is a challenging problem due to the inherent limited bandwidth of temperature sensors. This results in a measured signal that is a lagged and attenuated version of the input. Compensation can be performed provided an accurate, parameterised sensor model is available. However, to account for the influence of the measurement environment and changing conditions such as gas velocity, the model must be estimated in-situ. The cross-relation method of blind deconvolution is one approach for in-situ characterisation of sensors. However, a drawback with the method is that it becomes positively biased and unstable at high noise levels. In this paper, the cross-relation method is cast in the discrete-time domain and a bias compensation approach is developed. It is shown that the proposed compensation scheme is robust and yields unbiased estimates with lower estimation variance than the uncompensated version. All results are verified using Monte-Carlo simulations.
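The cross-relation identity behind the method (two sensors observing the same input s satisfy h2*y1 = h1*y2, since both equal h1*h2*s) can be cast as a null-space problem. The sketch below is a minimal noiseless illustration of that identity, not the paper's bias-compensated discrete-time scheme.

```python
import numpy as np

def cross_relation_identify(y1, y2, L):
    """Estimate two FIR sensor responses (each of length L) from their
    outputs alone, up to a common scale factor.

    Builds the convolution matrices Y1, Y2 of the two measured signals and
    takes the null-space direction of [Y2, -Y1] (smallest right singular
    vector), which encodes the cross-relation Y1 @ h2 - Y2 @ h1 = 0.
    """
    def conv_matrix(y, L):
        # Row i dotted with h gives the full convolution sample (h * y)[i+L-1]
        return np.array([y[i:i + L][::-1] for i in range(len(y) - L + 1)])

    M = np.hstack([conv_matrix(y2, L), -conv_matrix(y1, L)])
    _, _, vt = np.linalg.svd(M)
    h = vt[-1]                      # null-space direction ~ [h1; h2]
    return h[:L], h[L:]
```

In the noiseless case the null space is one-dimensional whenever the two responses share no common zeros, so the stacked vector [h1; h2] is recovered exactly up to scale; the positive bias the paper addresses appears when noise lifts this smallest singular value off zero.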
Abstract:
Plantings of mixed native species (termed 'environmental plantings') are increasingly being established for carbon sequestration whilst providing additional environmental benefits such as biodiversity and water quality. In Australia, they are currently one of the most common forms of reforestation. Investment in establishing and maintaining such plantings relies on having a cost-effective modelling approach that provides unbiased estimates of biomass production and carbon sequestration rates. In Australia, the Full Carbon Accounting Model (FullCAM) is used for both national greenhouse gas accounting and project-scale sequestration activities. Prior to the work presented here, the FullCAM tree growth curve was not calibrated specifically for environmental plantings and generally under-estimated their biomass. Here we collected and analysed above-ground biomass data from 605 mixed-species environmental plantings and tested the effects of several planting characteristics on growth rates. Plantings were then categorised based on significant differences in growth rates. Growth differed between temperate and tropical regions. Tropical plantings were relatively uniform in planting method, and their growth was largely related to stand age, consistent with the un-calibrated growth curve. In temperate regions, however, where plantings were more variable, the key factors influencing growth were planting width, stand density and species mix (the proportion of individuals that were trees). These categories provided the basis for FullCAM calibration. Although the overall model efficiency was only 39-46%, there was nonetheless no significant bias when the model was applied to the various planting categories. Thus, modelled estimates of biomass accumulation will be reliable on average, but estimates at any particular location will be uncertain, with either under- or over-prediction possible.
When compared with the un-calibrated yield curves, predictions using the new calibrations show that early growth is likely to be more rapid and total above-ground biomass may be higher for many plantings at maturity. This study has considerably improved understanding of the patterns of growth in different types of environmental plantings and of modelling biomass accumulation in young (<25 years old) plantings. However, significant challenges remain in understanding longer-term stand dynamics, particularly temporal changes in stand density and species composition.
Abstract:
This work presents the Bayes invariant quadratic unbiased estimator, BAIQUE for short. A Bayesian approach is used to estimate the covariance functions of the regionalized variables that appear in the spatial covariance structure of a mixed linear model. First, a brief review of spatial processes, variance-covariance component structures and Bayesian inference is given, since this project deals with these concepts. Then the system of linear equations corresponding to BAIQUE in the general case is formulated. This Bayes estimator of variance components is too complicated to solve analytically when there are many unknown parameters. Hence, to make the system tractable, BAIQUE for a spatial covariance model with two parameters is considered. The Bayesian estimate arises as the solution of a system of linear equations, which requires the covariance functions to be linear in the parameters. Prior information on the parameters is assumed to be available, in the form of a priori distribution functions from which the first and second moment matrices can be obtained; the Bayesian estimation suggested here depends only on the second moment of the prior distribution. The estimator takes the quadratic form y'Ay, where y is the vector of filtered data observations, and is used to estimate a linear function of the unknown variance components. The matrix A of BAIQUE plays an important role: if such a symmetric matrix exists, the Bayes risk becomes minimal and the unbiasedness conditions are fulfilled. Therefore, the symmetry of this matrix is elaborated in this work. By working with an infinite series of matrices, a representation of A is obtained that establishes its symmetry. In this context, the largest singular value of the decomposed matrix of the infinite series is used to handle the convergence condition, and is connected with Gershgorin discs and the Poincaré theorem.
The BAIQUE model is then computed and compared for several experimental designs. The comparison covers different aspects, such as the influence of the position of the design points in a fixed interval; the designs considered have their points distributed in the interval [0, 1]. These experimental structures are compared with respect to the Bayes risk and to the norms of the matrices corresponding to distances, covariance structures, and the matrices that must satisfy the convergence condition. Different types of regression functions and distance measures are also handled. The influence of scaling the design points is studied; moreover, the influence of the covariance structure on the best design is investigated for several covariance structures. Finally, BAIQUE is applied to real data, and the outcomes are compared with the results of other methods on the same data. The special BAIQUE that estimates the general variance of the data yields a result very close to the classical empirical variance.
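As a concrete anchor for the quadratic-form estimator y'Ay: the classical unbiased sample variance mentioned at the end is itself such a form, with the symmetric matrix A = (I - J/n)/(n - 1), where J is the all-ones matrix. A minimal numerical check (names illustrative):

```python
import numpy as np

def variance_quadratic_form(n):
    """Matrix A such that y'Ay equals the classical unbiased sample variance.

    A = (I - J/n) / (n - 1) is symmetric, as the unbiasedness conditions
    for such quadratic estimators require.
    """
    I = np.eye(n)
    J = np.ones((n, n))
    return (I - J / n) / (n - 1)

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
A = variance_quadratic_form(len(y))
qf = y @ A @ y   # equals the unbiased sample variance of y
```

The BAIQUE of the abstract generalizes this idea: A is chosen to minimize the Bayes risk subject to unbiasedness for a linear function of the variance components, rather than being fixed to the centering matrix above.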
Abstract:
Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining, operationally termed background, spectra originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum-to-sequence matching specificity. In sequence-similarity searches it reduced the number of orphan hits by, on average, 30-fold; these hits were not explicitly related to background protein contaminants and required manual validation. Removing high quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.
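The abstract does not specify its dissimilarity metric, so the sketch below uses a common normalized dot-product (cosine) distance purely for illustration; the function names and the threshold are hypothetical, not the published algorithm.

```python
import math

def cosine_dissimilarity(spec_a, spec_b):
    """1 - cosine similarity between two intensity vectors binned on the
    same m/z grid: 0 for identical direction, 1 for orthogonal spectra."""
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    return 1.0 - dot / (norm_a * norm_b)

def filter_background(queries, library, threshold=0.2):
    """Keep only query spectra whose dissimilarity to EVERY library
    (background) spectrum exceeds the threshold; the rest are treated as
    background and dropped before the database search."""
    return [q for q in queries
            if min(cosine_dissimilarity(q, b) for b in library) > threshold]
```

A query spectrum that closely resembles any entry of the blank-run library is removed, which mirrors the abstract's point that background removal must preserve genuine target-protein spectra while shrinking the search space.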
Abstract:
Aboveground tropical tree biomass and carbon storage estimates commonly ignore tree height (H). We estimate the effect of incorporating H on tropics-wide forest biomass estimates in 327 plots across four continents using 42,656 H and diameter measurements and harvested trees from 20 sites to answer the following questions: 1. What is the best H-model form and geographic unit to include in biomass models to minimise site-level uncertainty in estimates of destructive biomass? 2. To what extent does including H estimates derived in (1) reduce uncertainty in biomass estimates across all 327 plots? 3. What effect does accounting for H have on plot- and continental-scale forest biomass estimates? The mean relative error in biomass estimates of destructively harvested trees when including H (mean 0.06) was half that when excluding H (mean 0.13). Power- and Weibull-H models provided the greatest reduction in uncertainty, with regional Weibull-H models preferred because they reduce uncertainty in smaller-diameter classes (≤40 cm D) that store about one-third of biomass per hectare in most forests. Propagating the relationships from destructively harvested tree biomass to each of the 327 plots from across the tropics shows that including H reduces errors from 41.8 Mg ha⁻¹ (range 6.6 to 112.4) to 8.0 Mg ha⁻¹ (−2.5 to 23.0). For all plots, aboveground live biomass was −52.2 Mg ha⁻¹ (−82.0 to −20.3 bootstrapped 95% CI), or 13%, lower when including H estimates, with the greatest relative reductions in estimated biomass in forests of the Brazilian Shield, east Africa, and Australia, and relatively little change in the Guiana Shield, central Africa and southeast Asia. Appreciably different stand structure was observed among regions across the tropical continents, with some storing significantly more biomass in small-diameter stems, which affects selection of the best height models to reduce uncertainty and the biomass reductions due to H.
After accounting for variation in H, total biomass per hectare is greatest in Australia, the Guiana Shield, Asia, central and east Africa, and lowest in east-central Amazonia, W. Africa, W. Amazonia, and the Brazilian Shield (descending order). Thus, if tropical forests span 1668 million km2 and store 285 Pg C (estimate including H), then applying our regional relationships implies that carbon storage is overestimated by 35 Pg C (31−39 bootstrapped 95% CI) if H is ignored, assuming that the sampled plots are an unbiased statistical representation of all tropical forest in terms of biomass and height factors. Our results show that tree H is an important allometric factor that needs to be included in future forest biomass estimates to reduce error in estimates of tropical carbon stocks and emissions due to deforestation.
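A Weibull-form height-diameter model of the kind referred to here is commonly written H = a(1 − exp(−b·D^c)). The sketch below uses illustrative parameter values, not the study's fitted regional coefficients.

```python
import math

def weibull_height(d_cm, a, b, c):
    """Weibull-form height-diameter model: H = a * (1 - exp(-b * D**c)).

    H rises monotonically with diameter D (in cm) and saturates at the
    asymptote a, which is what keeps predicted heights plausible for the
    largest stems while still fitting the small-diameter classes.
    """
    return a * (1.0 - math.exp(-b * d_cm ** c))

# Illustrative parameters only (not fitted values from the study)
h_small = weibull_height(10.0, a=45.0, b=0.03, c=0.9)
h_large = weibull_height(80.0, a=45.0, b=0.03, c=0.9)
```

Plugging the modelled H into a biomass allometry instead of assuming a fixed diameter-height relationship is what produces the plot-level corrections the abstract reports.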
Abstract:
The aim of this study was to estimate genetic parameters to support the selection of bacuri progenies for a first cycle of recurrent selection, using the REML/BLUP (restricted maximum likelihood/best linear unbiased prediction) procedure to estimate the variance components and genotypic values. Twelve variables were evaluated in a total of 210 fruits from 39 different seed trees, in a field trial laid out as an incomplete-block design with clonal replicates among subplots. The three variables related to fruit development (weight, diameter, length) were strongly correlated, with fruit length showing the highest heritability and potential for use in indirect selection. Among the 39 progenies evaluated, five show potential to compose the next cycle of recurrent selection, as they hold a good selection differential both for agrotechnological variables and for bacuri fruit development.
Abstract:
This study aims to estimate an adult-equivalent scale for calorie requirements and to determine the differences between adult-equivalent and per capita measurements of calorie availability in the Brazilian population. The study used data from the 2002-2003 Brazilian Household Budget Survey. The calorie requirement for a reference adult individual was based on the mean requirements for adult males and females (2,550 kcal/day). The conversion factors were defined as the ratios between the calorie requirements for each age group and gender and that of the reference adult. The adult-equivalent calorie availability levels were higher than the per capita levels, with the largest differences in rural and low-income households. Differences in household calorie availability varied from 22 kcal/day (households with adults and an adolescent) to 428 kcal/day (households with elderly individuals), thus showing that per capita measurements can underestimate the real calorie availability, since they overlook differences in household composition.
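The conversion-factor construction described above (each group's requirement divided by the 2,550 kcal/day reference) can be sketched as follows; the child's requirement in the example is an illustrative figure, not one of the study's published factors.

```python
REFERENCE_KCAL = 2550.0  # mean adult male/female requirement used as the reference

def adult_equivalents(household_requirements):
    """Sum of conversion factors (member requirement / reference requirement)
    over all household members."""
    return sum(req / REFERENCE_KCAL for req in household_requirements)

def per_adult_equivalent(total_kcal_available, household_requirements):
    """Calorie availability per adult-equivalent rather than per capita."""
    return total_kcal_available / adult_equivalents(household_requirements)

# Illustrative household: two reference adults plus a child needing 1530 kcal/day.
# Per capita availability would be 6000/3 = 2000 kcal/day; dividing by 2.6
# adult-equivalents instead gives a higher figure, as the abstract describes.
pae = per_adult_equivalent(6000.0, [2550.0, 2550.0, 1530.0])
```

Because children and the elderly count as less than one adult-equivalent, the denominator shrinks relative to a head count, which is exactly why per capita measures understate availability in households with such members.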