832 resultados para Poisson generalized linear mixed models
Resumo:
The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present challenge of analyzing data with large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
Resumo:
In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken a center place in public eye in effect of frequent natural disasters such as hurricanes, storm surge or tsunami due to climate change and increased human activity on our planet. Statistical methods complex survey design and analysis have equally gained significance as a consequence. However, there exist many challenges still, to infer such assessments over the target population for policy level advocacy and implementation. ^ Objective. This study discusses the use of some of the statistical methods for disaster preparedness and medical needs assessment to facilitate local and state governments for its policy level decision making and logistic support to avoid any loss of life and property in future calamities. ^ Methods. In order to obtain precise and unbiased estimates for Medical Special Needs Persons (MSNP) and disaster preparedness for evacuation in Rio Grande Valley (RGV) of Texas, a stratified and cluster-randomized multi-stage sampling design was implemented. US School of Public Health, Brownsville surveyed 3088 households in three counties namely Cameron, Hidalgo, and Willacy. Multiple statistical methods were implemented and estimates were obtained taking into count probability of selection and clustering effects. Statistical methods for data analysis discussed were Multivariate Linear Regression (MLR), Survey Linear Regression (Svy-Reg), Generalized Estimation Equation (GEE) and Multilevel Mixed Models (MLM) all with and without sampling weights. ^ Results. Estimated population for RGV was 1,146,796. There were 51.5% female, 90% Hispanic, 73% married, 56% unemployed and 37% with their personal transport. 40% people attained education up to elementary school, another 42% reaching high school and only 18% went to college. Median household income is less than $15,000/year. MSNP estimated to be 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models are in concordance with MSNP estimates ranging from 44,000 to 48,000. MSNP estimates for statistical methods are: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), Bootstrap Regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390) and MLM (46,513; 95% CI: 39,869; 53,157). ^ Conclusion. RGV is a flood zone, most susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic, under-educated with least income levels in the U.S. In case of any disaster people in large are incapacitated with only 37% have their personal transport to take care of MSNP. Local and state government’s intervention in terms of planning, preparation and support for evacuation is necessary in any such disaster to avoid loss of precious human life. ^ Key words: Complex Surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimation equations (GEE), random effects, Intracluster correlation coefficient (ICC).^
Resumo:
Generalized linear Poisson and logistic regression models were utilized to examine the relationship between temperature and precipitation and cases of Saint Louis encephalitis virus spread in the Houston metropolitan area. The models were investigated with and without repeated measures, with a first order autoregressive (AR1) correlation structure used for the repeated measures model. The two types of Poisson regression models, with and without correlation structure, showed that a unit increase in temperature measured in degrees Fahrenheit increases the occurrence of the virus 1.7 times and a unit increase in precipitation measured in inches increases the occurrence of the virus 1.5 times. Logistic regression did not show these covariates to be significant as predictors for encephalitis activity in Houston for either correlation structure. This discrepancy for the logistic model could be attributed to the small data set.^ Keywords: Saint Louis Encephalitis; Generalized Linear Model; Poisson; Logistic; First Order Autoregressive; Temperature; Precipitation. ^
Resumo:
We introduce a new class of generalized isotropic Lipkin–Meshkov–Glick models with su(m+1) spin and long-range non-constant interactions, whose non-degenerate ground state is a Dicke state of su(m+1) type. We evaluate in closed form the reduced density matrix of a block of Lspins when the whole system is in its ground state, and study the corresponding von Neumann and Rényi entanglement entropies in the thermodynamic limit. We show that both of these entropies scale as a log L when L tends to infinity, where the coefficient a is equal to (m − k)/2 in the ground state phase with k vanishing magnon densities. In particular, our results show that none of these generalized Lipkin–Meshkov–Glick models are critical, since when L-->∞ their Rényi entropy R_q becomes independent of the parameter q. We have also computed the Tsallis entanglement entropy of the ground state of these generalized su(m+1) Lipkin–Meshkov–Glick models, finding that it can be made extensive by an appropriate choice of its parameter only when m-k≥3. Finally, in the su(3) case we construct in detail the phase diagram of the ground state in parameter space, showing that it is determined in a simple way by the weights of the fundamental representation of su(3). This is also true in the su(m+1) case; for instance, we prove that the region for which all the magnon densities are non-vanishing is an (m + 1)-simplex in R^m whose vertices are the weights of the fundamental representation of su(m+1).
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov Chain Monte Carlo method for covariate imputation of missing data was used and the actual implementation of the Bayesian model was based on Gibbs sampling using the free ware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.
Resumo:
Highways are generally designed to serve a mixed traffic flow that consists of passenger cars, trucks, buses, recreational vehicles, etc. The fact that the impacts of these different vehicle types are not uniform creates problems in highway operations and safety. A common approach to reducing the impacts of truck traffic on freeways has been to restrict trucks to certain lane(s) to minimize the interaction between trucks and other vehicles and to compensate for their differences in operational characteristics. ^ The performance of different truck lane restriction alternatives differs under different traffic and geometric conditions. Thus, a good estimate of the operational performance of different truck lane restriction alternatives under prevailing conditions is needed to help make informed decisions on truck lane restriction alternatives. This study develops operational performance models that can be applied to help identify the most operationally efficient truck lane restriction alternative on a freeway under prevailing conditions. The operational performance measures examined in this study include average speed, throughput, speed difference, and lane changes. Prevailing conditions include number of lanes, interchange density, free-flow speeds, volumes, truck percentages, and ramp volumes. ^ Recognizing the difficulty of collecting sufficient data for an empirical modeling procedure that involves a high number of variables, the simulation approach was used to estimate the performance values for various truck lane restriction alternatives under various scenarios. Both the CORSIM and VISSIM simulation models were examined for their ability to model truck lane restrictions. Due to a major problem found in the CORSIM model for truck lane modeling, the VISSIM model was adopted as the simulator for this study. ^ The VISSIM model was calibrated mainly to replicate the capacity given in the 2000 Highway Capacity Manual (HCM) for various free-flow speeds under the ideal basic freeway section conditions. Non-linear regression models for average speed, throughput, average number of lane changes, and speed difference between the lane groups were developed. Based on the performance models developed, a simple decision procedure was recommended to select the desired truck lane restriction alternative for prevailing conditions. ^
Resumo:
In this thesis used four different methods in order to diagnose the precipitation extremes on Northeastern Brazil (NEB): Generalized Linear Model s via logistic regression and Poisson, extreme value theory analysis via generalized extre me value (GEV) and generalized Pareto (GPD) distributions and Vectorial Generalized Linea r Models via GEV (MVLG GEV). The logistic regression and Poisson models were used to identify the interactions between the precipitation extremes and other variables based on the odds ratios and relative risks. It was found that the outgoing longwave radiation was the indicator variable for the occurrence of extreme precipitation on eastern, northern and semi arid NEB, and the relative humidity was verified on southern NEB. The GEV and GPD distribut ions (based on the 95th percentile) showed that the location and scale parameters were presented the maximum on the eastern and northern coast NEB, the GEV verified a maximum core on western of Pernambuco influenced by weather systems and topography. The GEV and GPD shape parameter, for most regions the data fitted by Weibull negative an d Beta distributions (ξ < 0) , respectively. The levels and return periods of GEV (GPD) on north ern Maranhão (centerrn of Bahia) may occur at least an extreme precipitation event excee ding over of 160.9 mm /day (192.3 mm / day) on next 30 years. The MVLG GEV model found tha t the zonal and meridional wind components, evaporation and Atlantic and Pacific se a surface temperature boost the precipitation extremes. The GEV parameters show the following results: a) location ( ), the highest value was 88.26 ± 6.42 mm on northern Maran hão; b) scale ( σ ), most regions showed positive values, except on southern of Maranhão; an d c) shape ( ξ ), most of the selected regions were adjusted by the Weibull negative distr ibution ( ξ < 0 ). The southern Maranhão and southern Bahia have greater accuracy. The level period, it was estimated that the centern of Bahia may occur at least an extreme precipitatio n event equal to or exceeding over 571.2 mm/day on next 30 years.
Resumo:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedure to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models explained. Numerical approximation methods based on the Genz algortithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Eventually, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that takes into account the survey design. All methods are rigorously validated on both real and simulated data.
Resumo:
Undoubtedly, statistics has become one of the most important subjects in the modern world, where its applications are ubiquitous. The importance of statistics is not limited to statisticians, but also impacts upon non-statisticians who have to use statistics within their own disciplines. Several studies have indicated that most of the academic departments around the world have realized the importance of statistics to non-specialist students. Therefore, the number of students enrolled in statistics courses has vastly increased, coming from a variety of disciplines. Consequently, research within the scope of statistics education has been able to develop throughout the last few years. One important issue is how statistics is best taught to, and learned by, non-specialist students. This issue is controlled by several factors that affect the learning and teaching of statistics to non-specialist students, such as the use of technology, the role of the English language (especially for those whose first language is not English), the effectiveness of statistics teachers and their approach towards teaching statistics courses, students’ motivation to learn statistics and the relevance of statistics courses to the main subjects of non-specialist students. Several studies, focused on aspects of learning and teaching statistics, have been conducted in different countries around the world, particularly in Western countries. Conversely, the situation in Arab countries, especially in Saudi Arabia, is different; here, there is very little research in this scope, and what there is does not meet the needs of those countries towards the development of learning and teaching statistics to non-specialist students. This research was instituted in order to develop the field of statistics education. The purpose of this mixed methods study was to generate new insights into this subject by investigating how statistics courses are currently taught to non-specialist students in Saudi universities. Hence, this study will contribute towards filling the knowledge gap that exists in Saudi Arabia. This study used multiple data collection approaches, including questionnaire surveys from 1053 non-specialist students who had completed at least one statistics course in different colleges of the universities in Saudi Arabia. These surveys were followed up with qualitative data collected via semi-structured interviews with 16 teachers of statistics from colleges within all six universities where statistics is taught to non-specialist students in Saudi Arabia’s Eastern Region. The data from questionnaires included several types, so different techniques were used in analysis. Descriptive statistics were used to identify the demographic characteristics of the participants. The chi-square test was used to determine associations between variables. Based on the main issues that are raised from literature review, the questions (items scales) were grouped and five key groups of questions were obtained which are: 1) Effectiveness of Teachers; 2) English Language; 3) Relevance of Course; 4) Student Engagement; 5) Using Technology. Exploratory data analysis was used to explore these issues in more detail. Furthermore, with the existence of clustering in the data (students within departments within colleges, within universities), multilevel generalized linear models for dichotomous analysis have been used to clarify the effects of clustering at those levels. Factor analysis was conducted confirming the dimension reduction of variables (items scales). The data from teachers’ interviews were analysed on an individual basis. The responses were assigned to one of the eight themes that emerged from within the data: 1) the lack of students’ motivation to learn statistics; 2) students' participation; 3) students’ assessment; 4) the effective use of technology; 5) the level of previous mathematical and statistical skills of non-specialist students; 6) the English language ability of non-specialist students; 7) the need for extra time for teaching and learning statistics; and 8) the role of administrators. All the data from students and teachers indicated that the situation of learning and teaching statistics to non-specialist students in Saudi universities needs to be improved in order to meet the needs of those students. The findings of this study suggested a weakness in the use of statistical software applications in these courses. This study showed that there is lack of application of technology such as statistical software programs in these courses, which would allow non-specialist students to consolidate their knowledge. The results also indicated that English language is considered one of the main challenges in learning and teaching statistics, particularly in institutions where English is not used as the main language. Moreover, the weakness of mathematical skills of students is considered another major challenge. Additionally, the results indicated that there was a need to tailor statistics courses to the needs of non-specialist students based on their main subjects. The findings indicate that statistics teachers need to choose appropriate methods when teaching statistics courses.
Resumo:
In this work, the relationship between diameter at breast height (d) and total height (h) of individual-tree was modeled with the aim to establish provisory height-diameter (h-d) equations for maritime pine (Pinus pinaster Ait.) stands in the Lomba ZIF, Northeast Portugal. Using data collected locally, several local and generalized h-d equations from the literature were tested and adaptations were also considered. Model fitting was conducted by using usual nonlinear least squares (nls) methods. The best local and generalized models selected, were also tested as mixed models applying a first-order conditional expectation (FOCE) approximation procedure and maximum likelihood methods to estimate fixed and random effects. For the calibration of the mixed models and in order to be consistent with the fitting procedure, the FOCE method was also used to test different sampling designs. The results showed that the local h-d equations with two parameters performed better than the analogous models with three parameters. However a unique set of parameter values for the local model can not be used to all maritime pine stands in Lomba ZIF and thus, a generalized model including covariates from the stand, in addition to d, was necessary to obtain an adequate predictive performance. No evident superiority of the generalized mixed model in comparison to the generalized model with nonlinear least squares parameters estimates was observed. On the other hand, in the case of the local model, the predictive performance greatly improved when random effects were included. The results showed that the mixed model based in the local h-d equation selected is a viable alternative for estimating h if variables from the stand are not available. Moreover, it was observed that it is possible to obtain an adequate calibrated response using only 2 to 5 additional h-d measurements in quantile (or random) trees from the distribution of d in the plot (stand). Balancing sampling effort, accuracy and straightforwardness in practical applications, the generalized model from nls fit is recommended. Examples of applications of the selected generalized equation to the forest management are presented, namely how to use it to complete missing information from forest inventory and also showing how such an equation can be incorporated in a stand-level decision support system that aims to optimize the forest management for the maximization of wood volume production in Lomba ZIF maritime pine stands.
Resumo:
Uma avaliação das metodologias de análise e recolha de dados aplicadas pelo Programa NOCTUAPortugal é de extrema importância para se apurar se estas são as mais indicadas em estudos de citizen science. Comparou-se os resultados de diferentes metodologias analíticas de estimação das tendências populacionais das espécies de aves noturnas durante o período de realização do Programa NOCTUA-Portugal (análise gráfica simples, modelos lineares generalizados (GLM-Poisson e GLMM), modelos aditivos generalizados (GAM-LOESS e GAM-mgcv) e software TRIM). Analisou-se a metodologia de censo de modo a avaliar o número de registos face à duração dos pontos de escuta, comparar a eficiência do ponto de deteção com outros estudos, variação das respostas ao longo da noite e efeito da época do ano, vento, nebulosidade e luminosidade da lua. Os resultados mostraram que a metodologia analítica mais indicada era o GLMM e que não era necessário realizar nenhum ajuste em particular na metodologia de censo; Trends in nocturnal birds in Portugal Methods and analysis of a volunteer-based monitoring program ABSTRACT: An evaluation of the methodologies of analysis and data collection applied by NOCTUA-Portugal Program is extremely important to determine whether these are the most suitable in citizen science studies. We compared the results of different analytical methodologies to estimate population trends of the species of nocturnal birds during the period of the NOCTUA-Portugal Program (simple graphical analysis, generalized linear models (GLM-Poisson and GLMM), generalized additive models (GAM-LOESS and GAMmgcv) and software TRIM). We analyzed the field methodology to assess the effect of point duration on the number of records, compared the point count efficiency with other sources, the variation of responses throughout the night, the effect of time of year, wind, cloud cover and moon luminosity. The results showed that the most suitable analytical methodology was the GLMM and it was not necessary to make any particular adjustment in the field methodology.
Resumo:
Distribution models are used increasingly for species conservation assessments over extensive areas, but the spatial resolution of the modeled data and, consequently, of the predictions generated directly from these models are usually too coarse for local conservation applications. Comprehensive distribution data at finer spatial resolution, however, require a level of sampling that is impractical for most species and regions. Models can be downscaled to predict distribution at finer resolutions, but this increases uncertainty because the predictive ability of models is not necessarily consistent beyond their original scale. We analyzed the performance of downscaled, previously published models of environmental favorability (a generalized linear modeling technique) for a restricted endemic insectivore, the Iberian desman (Galemys pyrenaicus), and a more widespread carnivore, the Eurasian otter ( Lutra lutra), in the Iberian Peninsula. The models, built from presence–absence data at 10 × 10 km resolution, were extrapolated to a resolution 100 times finer (1 × 1 km). We compared downscaled predictions of environmental quality for the two species with published data on local observations and on important conservation sites proposed by experts. Predictions were significantly related to observed presence or absence of species and to expert selection of sampling sites and important conservation sites. Our results suggest the potential usefulness of downscaled projections of environmental quality as a proxy for expensive and time-consuming field studies when the field studies are not feasible. This method may be valid for other similar species if coarse-resolution distribution data are available to define high-quality areas at a scale that is practical for the application of concrete conservation measures
Resumo:
Transferring distribution models between different geographical areas may be problematic, as the performance of models outside their original scope is hard to predict. A modelling procedure is needed that gets the gist of the environmental descriptors of a distribution area, without either overfitting to the training data or overestimating the species’ distribution potential.We tested the transferability power of the favourability function, a generalized linear model, on the distribution of the Iberian desman (Galemys pyrenaicus) in the Iberian territories of Portugal and Spain.We also tested the effects of two of the main potential constraints on model transferability: the analysed ranges of the predictor variables, and the completeness of the species distribution data. We modelled 10 km×10km presence/absence data from Portugal and Spain separately, extrapolated each model to the other country, and compared predictions with observations. The Spanish model, despite arguably containing more false absences, showed good predictive ability in Portugal. The Portuguese model, whose predictors ranged between only a subset of the values observed in Spain, overestimated desman distribution when transferred.We discuss possible reasons for this differential model behaviour, and highlight the importance of this kind of models for prediction and conservation applications