923 results for Generalised Linear Models
Abstract:
This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
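Illustrative sketch (not from the paper): a minimal Python example of Poisson regression for counts, an overdispersion check, and the two remedies the abstract mentions (corrected standard errors and negative binomial regression). The simulated data and the covariate name "unemployment" are hypothetical.

```python
# Minimal sketch of Poisson regression with an overdispersion check, using
# statsmodels. The simulated counts and the covariate "unemployment" are
# hypothetical, not taken from the paper.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"unemployment": rng.normal(8, 2, n)})
# Simulate counts with extra-Poisson variation via a gamma-distributed frailty.
mu = np.exp(-1.0 + 0.15 * df["unemployment"])
df["suicides"] = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=n))

poisson_fit = smf.glm("suicides ~ unemployment", data=df,
                      family=sm.families.Poisson()).fit()
# Pearson chi-square divided by residual df well above 1 signals overdispersion.
print("dispersion:", poisson_fit.pearson_chi2 / poisson_fit.df_resid)

# Remedy 1: keep the Poisson mean model but use robust (corrected) standard errors.
robust_fit = smf.glm("suicides ~ unemployment", data=df,
                     family=sm.families.Poisson()).fit(cov_type="HC0")
# Remedy 2: a negative binomial model (dispersion parameter fixed here for simplicity).
nb_fit = smf.glm("suicides ~ unemployment", data=df,
                 family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(nb_fit.summary())
```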
Abstract:
This study explores factors related to prompt difficulty in Automated Essay Scoring. The sample was composed of 6,924 students. For each student, there were 1-4 essays across 20 different writing prompts, for a total of 20,243 essays. The E-rater® v.2 essay scoring engine developed by the Educational Testing Service was used to score the essays. The scoring engine employs a statistical model that incorporates 10 predictors associated with writing characteristics, of which 8 were used. Rasch partial credit analysis was applied to the scores to determine the difficulty levels of the prompts. In addition, the scores were used as outcomes in a series of hierarchical linear models (HLM) in which students and prompts constituted the cross-classification levels. This methodology was used to explore the partitioning of the essay score variance. The results indicated significant differences in prompt difficulty levels due to genre. Descriptive prompts, as a group, were found to be more difficult than the persuasive prompts. In addition, the essay score variance was partitioned between students and prompts. The amount of the essay score variance that lies between prompts was found to be relatively small (4 to 7 percent). When essay-level, student-level and prompt-level predictors were included in the model, it was able to explain almost all of the variance that lies between prompts. Since most high-stakes writing assessments use only 1-2 prompts per student, the essay score variance that lies between prompts represents an undesirable or "noise" variation. Identifying factors associated with this "noise" variance may prove to be important for prompt writing and for constructing Automated Essay Scoring mechanisms that weight prompt difficulty when assigning essay scores.
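Illustrative sketch (not from the study): one way to set up the student-by-prompt variance partitioning with crossed random effects in Python's statsmodels. Column names and the data file are hypothetical, and dedicated HLM software is the more usual tool for data of this size; the prompt-level variance share should be read off against the component names in the fitted summary.

```python
# Minimal sketch of a cross-classified model (students x prompts) with
# statsmodels MixedLM variance components. Columns and file are hypothetical;
# for thousands of students this is computationally heavy and specialised HLM
# software would normally be used instead.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("essays.csv")   # assumed columns: score, student, prompt, genre
df["all"] = 1                    # single dummy group so both factors enter as crossed components

model = smf.mixedlm(
    "score ~ C(genre)", df, groups="all",
    vc_formula={"student": "0 + C(student)", "prompt": "0 + C(prompt)"},
)
result = model.fit()
print(result.summary())

# Share of score variance lying between prompts: the prompt variance component
# divided by the total (match components to their names in the summary above).
total = result.vcomp.sum() + result.scale
print("variance components:", result.vcomp, "residual:", result.scale, "total:", total)
```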
Abstract:
Light transmission was measured through intact, submerged periphyton communities on artificial seagrass leaves. The periphyton communities were representative of the communities on Thalassia testudinum in subtropical seagrass meadows. The periphyton communities sampled were adhered carbonate sediment, coralline algae, and mixed algal assemblages. Crustose or film-forming periphyton assemblages were best prepared for light transmission measurements using artificial leaves fouled on both sides, while measurements through three-dimensional filamentous algae required the periphyton to be removed from one side. For one-sided samples, light transmission could be measured as the difference between fouled and reference artificial leaf samples. For two-sided samples, the percent periphyton light transmission to the leaf surface was calculated as the square root of the fraction of incident light. Linear, exponential, and hyperbolic equations were evaluated as descriptors of the periphyton dry weight versus light transmission relationship. Hyperbolic and exponential decay models were superior to linear models and exhibited the best fits for the observed relationships. Differences between the coefficients of determination (r²) of hyperbolic and exponential decay models were statistically insignificant. Constraining these models for 100% light transmission at zero periphyton load did not result in any statistically significant loss in the explanatory capability of the models. In almost all cases, increasing model complexity using three-parameter models rather than two-parameter models did not significantly increase the amount of variation explained. Constrained two-parameter hyperbolic or exponential decay models were judged best for describing the periphyton dry weight versus light transmission relationship. On T. testudinum in Florida Bay and the Florida Keys, significant differences were not observed in the light transmission characteristics of the varying periphyton communities at different study sites. Using pooled data from the study sites, the hyperbolic decay coefficient for periphyton light transmission was estimated to be 4.36 mg dry wt. cm⁻². For exponential models, the exponential decay coefficient was estimated to be 0.16 cm² mg dry wt.⁻¹.
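Illustrative sketch (not from the study): fitting zero-load-constrained exponential and hyperbolic decay curves of percent light transmission against periphyton dry weight with SciPy. The exact functional forms are assumptions consistent with the reported coefficients, and the data are simulated.

```python
# Minimal sketch of fitting constrained decay models for percent light
# transmission (T, %) versus periphyton dry weight (W, mg cm^-2). The forms
# below are assumed (constrained to 100% transmission at zero load); the
# coefficient magnitudes roughly mimic those reported in the abstract.
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(W, k):
    """Exponential decay constrained to 100% transmission at W = 0."""
    return 100.0 * np.exp(-k * W)

def hyper_decay(W, d):
    """Hyperbolic decay constrained to 100% transmission at W = 0."""
    return 100.0 * d / (d + W)

rng = np.random.default_rng(1)
W = rng.uniform(0, 20, 80)                        # dry weight, mg cm^-2 (simulated)
T = hyper_decay(W, 4.36) + rng.normal(0, 3, 80)   # simulated transmission, %

k_hat, _ = curve_fit(exp_decay, W, T, p0=[0.1])
d_hat, _ = curve_fit(hyper_decay, W, T, p0=[5.0])

def r2(obs, pred):
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

print("exponential k:", k_hat[0], "r2:", r2(T, exp_decay(W, *k_hat)))
print("hyperbolic d:", d_hat[0], "r2:", r2(T, hyper_decay(W, *d_hat)))
```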
Abstract:
Background: Diabetes and diabetes-related complications are major causes of morbidity and mortality in the United States. Depressive symptoms and perceived stress have been identified as possible risk factors for beta cell dysfunction and diabetes. The purpose of this study was to assess associations of depression symptoms and perceived stress with beta cell function in African Americans and Haitian Americans with and without type 2 diabetes. Participants and Methods: Informed consent and data were available for 462 participants (231 African Americans and 231 Haitian Americans) for this cross-sectional study. A demographic questionnaire developed by the Principal Investigator was used to collect information regarding age, gender, smoking, and ethnicity. Diabetes status was determined by self-report and confirmed by fasting blood glucose. Anthropometrics (weight, height and waist circumference) and vital signs (blood pressure) were taken. Blood samples were drawn after 8-10 hours of overnight fasting to measure the lipid panel, fasting plasma glucose and serum insulin concentrations. The homeostatic model assessment, version 2 (HOMA2) computer model was used to calculate beta cell function. Depression was assessed using the Beck Depression Inventory-II (BDI-II) and stress levels were assessed using the Perceived Stress Scale (PSS). Results: Moderate to severe depressive symptoms were more likely for persons with diabetes (p = 0.030). There were no differences in perceived stress by ethnicity or diabetes status (p = 0.283). General linear models for participants with and without type 2 diabetes using beta cell function as the dependent variable showed no association with depressive symptoms or perceived stress; however, Haitian Americans had significantly lower beta cell function than African Americans, both with and without diabetes, after adjusting for age, gender, waist circumference and smoking. Further research is needed to compare these risk factors in other race/ethnic groups.
Abstract:
The purpose of this study was to analyze the behavior of sell-side analysts and to propose a classification of analysts based on the performance of their price forecasts and recommendations (sell-hold-buy) in the Brazilian stock market. The first step was to analyze the consensus of analysts, to understand the importance of this collective intervention in the market; the second was to analyze analysts individually, to understand how their analyses improve over time; the third was to review the main ranking methods used in markets; finally, we propose a form of classification that reflects the aspects previously discussed. To investigate the hypotheses proposed in the study, linear panel models were used to capture effects over time. Data on price forecasts and recommendations, both individual and consensus, for the period 2005-2013 were obtained from Bloomberg®. The main results were: (i) superior performance of consensus recommendations compared with individual analyses; (ii) the association between the number of analysts issuing recommendations and improved accuracy suggests that this number may be related to increased consensus strength and hence accuracy; (iii) the anchoring effect in analysts' consensus revisions makes their predictions biased, overvaluing the assets; (iv) analysts need to exercise greater caution in times of economic turbulence and should also monitor foreign markets such as the USA, since these may produce changes in bias between optimism and pessimism; (v) effects due to changes in bias, such as increased pessimism, can cause an excessive increase in the number of buy recommendations; in this case, analysts should be more cautious in their analyses, mainly regarding consistency between the recommendation and the expected price; (vi) the analyst's experience with the asset and with the asset's economic sector contributes to the improvement of forecasts, whereas overall experience showed the opposite effect; (vii) the optimism associated with overall experience shows, over time, behavior similar to overconfidence, which could reduce accuracy; (viii) the conflicting effect of general experience on accuracy and on observed returns suggests that, over time, the analyst is subject to effects similar to an endowment bias toward the assets, which would result in conflicting analyses of recommendations and forecasts; (ix) although focusing on fewer sectors contributes to accuracy, the same does not occur with a focus on fewer assets, so analysts may enjoy economies of scale when covering more assets within the same industry; and finally, (x) it was possible to develop a proposed classification of analysts that considers both returns and the consistency of predictions, called the Analysis Coefficient. This ranking produced better results in terms of return relative to standard deviation.
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even with the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is therefore of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
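Illustrative sketch (not the thesis code): the latent class / PARAFAC view of a joint probability mass function for multivariate categorical data, built directly in NumPy as a sum of rank-one tensors. All dimensions and parameter values are made up.

```python
# Minimal sketch of a latent class (PARAFAC-type) representation of a joint pmf:
# p(y1, y2, y3) = sum_h nu_h * prod_j lambda_j[h, y_j]. Values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
p, d, k = 3, 4, 2          # 3 categorical variables with 4 levels each, 2 latent classes

nu = rng.dirichlet(np.ones(k))                               # latent class weights
lam = [rng.dirichlet(np.ones(d), size=k) for _ in range(p)]  # lam[j][h] is a pmf over the levels of variable j

# Build the full 4 x 4 x 4 probability tensor as a sum of k rank-one tensors
# (the einsum indices are written out for the p = 3 case used here).
joint = np.zeros((d,) * p)
for h in range(k):
    joint += nu[h] * np.einsum("a,b,c->abc", lam[0][h], lam[1][h], lam[2][h])

print(joint.sum())   # sums to 1: a valid joint pmf
# The nonnegative rank of this probability tensor is at most k, the number of latent classes.
```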
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
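Illustrative sketch (not the chapter's construction): a generic Laplace-style Gaussian approximation to the posterior of a small Poisson log-linear model, with an ordinary Gaussian prior standing in for the Diaconis--Ylvisaker prior and a made-up 2x2 table. It shows the general idea of approximating a log-linear posterior by a Gaussian centred at the mode, not the chapter's optimal Gaussian approximation.

```python
# Generic Laplace (Gaussian) approximation to a Poisson log-linear posterior.
# A Gaussian prior is used as a stand-in for the Diaconis--Ylvisaker prior;
# the design matrix and cell counts are hypothetical.
import numpy as np
from scipy.optimize import minimize

# Main-effects design for a 2x2 table: intercept + two factor effects.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([30.0, 12.0, 9.0, 4.0])     # hypothetical cell counts
tau2 = 10.0                               # prior variance of the Gaussian prior

def neg_log_post(beta):
    eta = X @ beta
    return -(y @ eta - np.exp(eta).sum()) + 0.5 * beta @ beta / tau2

def grad(beta):
    mu = np.exp(X @ beta)
    return -(X.T @ (y - mu)) + beta / tau2

beta_map = minimize(neg_log_post, x0=np.zeros(3), jac=grad, method="BFGS").x

# Hessian of the negative log posterior at the mode gives the Gaussian precision.
mu_map = np.exp(X @ beta_map)
H = X.T @ (X * mu_map[:, None]) + np.eye(3) / tau2
print("posterior mode:", beta_map)
print("Gaussian approximation covariance:\n", np.linalg.inv(H))
```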
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
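Illustrative sketch (not the chapter's algorithm): a toy approximate Metropolis sampler for a Normal mean in which the exact log-likelihood is replaced by a scaled random-subset estimate, i.e. one of the approximation classes mentioned above. All data, subsample sizes and tuning constants are arbitrary; the subsampling noise visibly inflates the spread of the chain relative to the exact posterior, which is exactly the kind of approximation error such a framework would quantify.

```python
# Toy approximate Metropolis sampler with a subsampled (scaled) log-likelihood.
# Data and tuning are arbitrary; a flat prior on the mean is assumed.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(2.0, 1.0, 20_000)
n, m = data.size, 2_000            # full data size and per-iteration subsample size

def approx_loglik(theta, batch):
    # Unit-variance Normal log-likelihood on a subsample, scaled up by n/m.
    return (n / m) * (-0.5 * np.sum((batch - theta) ** 2))

theta, chain = 0.0, []
for _ in range(10_000):
    batch = rng.choice(data, size=m, replace=False)
    prop = theta + rng.normal(0.0, 0.01)
    if np.log(rng.uniform()) < approx_loglik(prop, batch) - approx_loglik(theta, batch):
        theta = prop
    chain.append(theta)

chain = np.array(chain)[2_000:]    # discard burn-in
print("approximate posterior mean:", chain.mean())
print("chain std:", chain.std(), "exact posterior std (flat prior):", 1.0 / np.sqrt(n))
```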
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
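Illustrative sketch (simulated data, not the thesis code): the truncated-normal (Albert-Chib) data augmentation Gibbs sampler for an intercept-only probit model, run on a large sample with few successes, together with a lag-1 autocorrelation check that makes the slow mixing visible. The Polya-Gamma logit analogue would need a Polya-Gamma sampler and is omitted here; the sample size, flat prior, and iteration counts are arbitrary.

```python
# Minimal Albert-Chib data augmentation Gibbs sampler for an intercept-only
# probit model on simulated rare-event data, illustrating slow mixing.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)
n, n_success = 10_000, 20                 # large n, very few successes
y = np.zeros(n); y[:n_success] = 1.0

beta = 0.0                                 # intercept, flat prior for simplicity
draws = []
for _ in range(2_000):
    # 1) Latent z_i ~ N(beta, 1) truncated to (0, inf) if y_i = 1, (-inf, 0) if y_i = 0.
    lo = np.where(y == 1, 0.0 - beta, -np.inf)
    hi = np.where(y == 1, np.inf, 0.0 - beta)
    z = beta + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) Intercept given z: Normal(mean(z), 1/n) under the flat prior.
    beta = rng.normal(z.mean(), 1.0 / np.sqrt(n))
    draws.append(beta)

draws = np.array(draws)[500:]              # discard burn-in
lag1 = np.corrcoef(draws[:-1], draws[1:])[0, 1]
print("posterior mean intercept:", draws.mean())
print("lag-1 autocorrelation (high values indicate poor mixing):", lag1)
```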
Abstract:
Numerous works have been conducted on modelling basic compliant elements such as wire beams, and closed-form analytical models of most basic compliant elements have been well developed. However, the modelling of complex compliant mechanisms remains challenging. This paper proposes a constraint-force-based (CFB) modelling approach to model compliant mechanisms, with a particular emphasis on modelling complex compliant mechanisms. The proposed CFB modelling approach can be regarded as an improved free-body-diagram (FBD) based modelling approach, and can be extended into a development of the screw-theory-based design approach. A compliant mechanism can be decomposed into rigid stages and compliant modules. A compliant module can offer elastic forces due to its deformation. Such elastic forces are regarded as variable constraint forces in the CFB modelling approach. Additionally, the CFB modelling approach defines external forces applied on a compliant mechanism as constant constraint forces. If a compliant mechanism is at static equilibrium, all the rigid stages are also at static equilibrium under the influence of the variable and constant constraint forces. Therefore, the constraint force equilibrium equations for all the rigid stages can be obtained, and the analytical model of the compliant mechanism can be derived from the constraint force equilibrium equations. The CFB modelling approach can model a compliant mechanism linearly and nonlinearly, can obtain displacements at any points of the rigid stages, and allows external forces to be exerted at any positions of the rigid stages. Compared with the FBD based modelling approach, the CFB modelling approach does not need to identify the possible deformed configuration of a complex compliant mechanism in order to obtain the geometric compatibility conditions and the force equilibrium equations. Additionally, the mathematical expressions in the CFB approach have an easily understood physical meaning. Using the CFB modelling approach, the variable constraint forces of three compliant modules, a wire beam, a four-beam compliant module and an eight-beam compliant module, are derived in this paper. Based on these variable constraint forces, the linear and non-linear models of a decoupled XYZ compliant parallel mechanism are derived and verified by FEA simulations and experimental tests.
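Illustrative sketch (not from the paper): a toy numerical example of the linear case only, for a single planar rigid stage supported by two compliant modules idealised as linear springs with made-up stiffness matrices. The variable constraint forces are the module reactions -K_i u and the constant constraint force is the external load F; static equilibrium of the stage then reduces to one linear system.

```python
# Toy linear equilibrium of one rigid stage (DOFs: x, y, rotation) supported by
# two compliant modules idealised as linear stiffness matrices. All stiffness
# values and loads are illustrative, not from the paper.
import numpy as np

K1 = np.diag([2000.0, 50.0, 10.0])   # N/m, N/m, N*m/rad (made-up module stiffness)
K2 = np.diag([50.0, 2000.0, 10.0])

F = np.array([1.0, 2.0, 0.0])        # constant constraint force: external load on the stage

# Equilibrium of the stage: sum of variable constraint forces (-K_i u) plus F = 0,
# i.e. (K1 + K2) u = F. Solve for the stage displacement u.
u = np.linalg.solve(K1 + K2, F)
print("stage displacement [m, m, rad]:", u)
print("module 1 constraint force:", -K1 @ u)
print("module 2 constraint force:", -K2 @ u)
```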
Abstract:
Trees and shrubs in tropical Africa use the C3 cycle as a carbon fixation pathway during photosynthesis, while grasses and sedges mostly use the C4 cycle. Leaf-wax lipids from sedimentary archives such as the long-chain n-alkanes (e.g., n-C27 to n-C33) inherit carbon isotope ratios that are representative of the carbon fixation pathway. Therefore, n-alkane δ13C values are often used to reconstruct past C3/C4 composition of vegetation, assuming that the relative proportions of C3 and C4 leaf waxes reflect the relative proportions of C3 and C4 plants. We have compared the δ13C values of n-alkanes from modern C3 and C4 plants with previously published values from recent lake sediments and provide a framework for estimating the fractional contribution (areal-based) of C3 vegetation cover (fC3) represented by these sedimentary archives. Samples were collected in Cameroon, across a latitudinal transect that accommodates a wide range of climate zones and vegetation types, as reflected in the progressive northward replacement of C3-dominated rain forest by C4-dominated savanna. The C3 plants analysed were characterised by substantially higher abundances of n-C29 alkanes and by substantially lower abundances of n-C33 alkanes than the C4 plants. Furthermore, the sedimentary δ13C values of n-C29 and n-C31 alkanes from recent lake sediments in Cameroon (-37.4 per mil to -26.5 per mil) were generally within the range of δ13C values for C3 plants, even for sites where C4 plants dominated the catchment vegetation. In such cases simple linear mixing models fail to accurately reconstruct the relative proportions of C3 and C4 vegetation cover when using the δ13C values of sedimentary n-alkanes, overestimating the proportion of C3 vegetation, likely as a consequence of the differences in plant wax production, preservation, transport, and/or deposition between C3 and C4 plants. We therefore tested a set of non-linear binary mixing models using δ13C values from both C3 and C4 vegetation as end-members. The non-linear models included a sigmoid function (sine-squared) that describes small variations in the fC3 values as the minimum and maximum δ13C values are approached, and a hyperbolic function that takes into account the differences between C3 and C4 plants discussed above. Model fitting and the estimation of uncertainties were completed using a Monte Carlo algorithm and can be improved as further data become available. Models that provided the best fit with the observed δ13C values of sedimentary n-alkanes were either hyperbolic functions or a combination of hyperbolic and sine-squared functions. Such non-linear models may be used to convert δ13C measurements on sedimentary n-alkanes directly into reconstructions of C3 vegetation cover.
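Illustrative sketch (not the paper's fitted model): one common way to write a hyperbolic two-endmember mixing relation, assuming the areal C3 fraction maps to a wax-weighted fraction via a constant C3:C4 wax-production ratio r. The endmember δ13C values and r below are illustrative only.

```python
# Minimal sketch of a hyperbolic two-endmember mixing model linking the areal
# fraction of C3 cover (fC3) to sedimentary n-alkane d13C, under an assumed
# constant C3:C4 wax-production ratio r. Endmember values and r are illustrative.
import numpy as np

d13c_c3 = -35.0     # C3 endmember (per mil), illustrative
d13c_c4 = -21.0     # C4 endmember (per mil), illustrative
r = 3.0             # assumed wax production per unit area, C3 relative to C4

def d13c_from_fc3(fc3):
    """Forward model: wax-weighted mixing of the two endmembers."""
    w = fc3 * r / (fc3 * r + (1.0 - fc3))      # wax-based C3 fraction
    return w * d13c_c3 + (1.0 - w) * d13c_c4

def fc3_from_d13c(d13c):
    """Inverse model: convert a measured d13C value back to areal C3 cover."""
    w = (d13c - d13c_c4) / (d13c_c3 - d13c_c4)
    return w / (r * (1.0 - w) + w)

measured = np.array([-27.0, -30.0, -33.0])
print("estimated fC3:", fc3_from_d13c(measured))
```

With r greater than 1 (C3 plants contributing more wax per unit area), a given sedimentary δ13C maps to a smaller areal C3 fraction, which is the direction of the overestimation discussed in the abstract.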
Abstract:
To compare the one-year effect of two dietary interventions with the Mediterranean diet (MeDiet) on dietary glycemic load (GL) and glycemic index (GI) in the PREDIMED trial. Methods. Participants were older subjects at high risk for cardiovascular disease. This analysis included 2866 nondiabetic subjects. Diet was assessed with a validated 137-item food frequency questionnaire (FFQ). The GI of each FFQ item was assigned by a 5-step methodology using the International Tables of GI and GL Values. Generalized linear models were fitted to assess the relationship between the intervention group and dietary GL and GI at one year of follow-up, using the control group as the reference.
Abstract:
Objective: 1) to assess preparedness to practice and satisfaction with the learning environment amongst new graduates from European osteopathic institutions; 2) to compare the results of preparedness to practice and satisfaction with the learning environment between and within countries where osteopathy is regulated and where regulation is still to be achieved; 3) to identify possible correlations between the learning environment and preparedness to practice. Method: Osteopathic providers of full-time education located in Europe were enrolled, and their final-year students were contacted to complete a survey. Measures used were the Dundee Ready Educational Environment Measure (DREEM), the Association of American Medical Colleges (AAMC) questionnaire and a demographic questionnaire. Scores were compared across institutions using one-way ANOVA and a generalised linear model. Results: Nine European osteopathic education institutions participated in the study (4 located in Italy, 2 in the UK, 1 in France, 1 in Belgium and 1 in the Netherlands) and 243 (77%) of their final-year students completed the survey. The DREEM total score mean was 121.4 (SEM: 1.66) whilst the AAMC mean was 17.58 (SEM: 0.35). A generalised linear model found a significant association between non-regulated countries and the DREEM total score as well as subscale scores (p<0.001). Learning environment and preparedness to practice were significantly positively correlated (r=0.76; p<0.01). Discussion: A higher perceived level of preparedness and satisfaction was found amongst students from osteopathic institutions located in countries without regulation compared to those located in countries where osteopathy is regulated; however, all institutions obtained a 'more positive than negative' result. Moreover, in general, cohorts with fewer than 20 students scored significantly higher compared to larger student cohorts. Finally, an overall positive correlation between students' preparedness and satisfaction was found across all institutions recruited.
Abstract:
Eucalyptus pellita demonstrated good growth and wood quality traits in this study, with young plantation-grown timber being suitable for both solid and pulp wood products. All traits examined were under moderate levels of genetic control, with little genotype by environment interaction when grown on two contrasting sites in Vietnam. Eucalyptus pellita currently has a significant role in reforestation in the tropics. Research to support expanded use of this species is needed: in particular, research to better understand the genetic control of key traits will facilitate the development of genetically improved planting stock. This study aimed to provide estimates of the heritability of diameter at breast height over bark, wood basic density, Kraft pulp yield, modulus of elasticity and microfibril angle, and of the genetic correlations among these traits, and to understand the importance of genotype by environment interactions in Vietnam. Data for diameter and wood properties were collected from two 10-year-old, open-pollinated progeny trials of E. pellita in Vietnam that evaluated 104 families from six native-range and three orchard sources. Wood properties were estimated from wood samples using near-infrared (NIR) spectroscopy. Data were analysed using mixed linear models to estimate genetic parameters (heritability, proportion of variance between seed sources and genetic correlations). Variation among the nine sources was small compared to additive variance. Narrow-sense heritability and genetic correlation estimates indicated that simultaneous improvements in most traits could be achieved from selection among and within families, as the genetic correlations among traits were either favourable or close to zero. Type B genetic correlations approached one for all traits, suggesting that genotype by environment interactions were of little importance. These results support a breeding strategy utilizing a single breeding population advanced by selecting the best individuals across all seed sources. Both growth and wood properties have been evaluated. Multi-trait selection for growth and wood property traits will lead to populations of E. pellita with both improved productivity and improved timber and pulp properties.
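For reference, the standard quantities behind such genetic-parameter estimates, written here for a half-sib (open-pollinated) family structure; the factor converting family variance to additive variance is an assumption that depends on the relatedness assumed within families, and is not taken from the study.

```latex
% Standard half-sib estimators (illustrative; the factor 4 assumes true half-sib
% families, and open-pollinated trials often assume a smaller coefficient).
\[
  \hat{\sigma}^2_A = 4\,\hat{\sigma}^2_{f}, \qquad
  \hat{h}^2 = \frac{\hat{\sigma}^2_A}{\hat{\sigma}^2_{f} + \hat{\sigma}^2_{e}}, \qquad
  \hat{r}_A(x,y) = \frac{\widehat{\mathrm{cov}}_A(x,y)}
                        {\sqrt{\hat{\sigma}^2_{A(x)}\,\hat{\sigma}^2_{A(y)}}}
\]
```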
Abstract:
Endogenous and environmental variables are fundamental in explaining variations in fish condition. Based on more than 20 yr of fish weight and length data, relative condition indices were computed for anchovy and sardine caught in the Gulf of Lions. Classification and regression trees (CART) were used to identify endogenous factors affecting fish condition, and to group years of similar condition. Both species showed a similar annual cycle with condition being minimal in February and maximal in July. CART identified 3 groups of years where the fish populations generally showed poor, average and good condition and within which condition differed between age classes but not according to sex. In particular, during the period of poor condition (mostly recent years), sardines older than 1 yr appeared to be more strongly affected than younger individuals. Time-series were analyzed using generalized linear models (GLMs) to examine the effects of oceanographic abiotic (temperature, Western Mediterranean Oscillation [WeMO] and Rhone outflow) and biotic (chlorophyll a and 6 plankton classes) factors on fish condition. The selected models explained 48 and 35% of the variance of anchovy and sardine condition, respectively. Sardine condition was negatively related to temperature but positively related to the WeMO and mesozooplankton and diatom concentrations. A positive effect of mesozooplankton and Rhone runoff on anchovy condition was detected. The importance of increasing temperatures and reduced water mixing in the NW Mediterranean Sea, affecting planktonic productivity and thus fish condition by bottom-up control processes, was highlighted by these results. Changes in plankton quality, quantity and phenology could lead to insufficient or inadequate food supply for both species.
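Illustrative sketch (not the study's code): one possible reading of the two-step analysis, using a small regression tree to group years by mean condition and a Gaussian GLM of condition on environmental covariates. Column names and the data file are hypothetical.

```python
# Hedged sketch of the two-step analysis: a regression tree to group years by
# mean relative condition (Kn), then a Gaussian GLM of condition on abiotic and
# biotic covariates. Columns and file are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("sardine_condition.csv")   # assumed columns: year, Kn, temp, wemo, mesozoo, diatoms

# Group years into three condition regimes (poor / average / good) with a small tree.
yearly = df.groupby("year", as_index=False)["Kn"].mean()
tree = DecisionTreeRegressor(max_leaf_nodes=3).fit(yearly[["year"]], yearly["Kn"])
yearly["regime"] = tree.apply(yearly[["year"]])
print(yearly)

# GLM of relative condition on environmental covariates (Gaussian family, identity link).
glm = smf.glm("Kn ~ temp + wemo + mesozoo + diatoms", data=df,
              family=sm.families.Gaussian()).fit()
print(glm.summary())
```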
Abstract:
Breast milk is regarded as an ideal source of nutrients for the growth and development of neonates, but it can also be a potential source of pollutants. Mothers can be exposed to different contaminants as a result of their lifestyle and environmental pollution. Mercury (Hg) and arsenic (As) could adversely affect the development of fetal and neonatal nervous system. Some fish and shellfish are rich in selenium (Se), an essential trace element that forms part of several enzymes related to the detoxification process, including glutathione S-transferase (GST). The goal of this study was to determine the interaction between Hg, As and Se and analyze its effect on the activity of GST in breast milk. Milk samples were collected from women between day 7 and 10 postpartum. The GST activity was determined spectrophotometrically; total Hg, As and Se concentrations were measured by atomic absorption spectrometry. To explain the possible association of Hg, As and Se concentrations with GST activity in breast milk, generalized linear models were constructed. The model explained 44% of the GST activity measured in breast milk. The GLM suggests that GST activity was positively correlated with Hg, As and Se concentrations. The activity of the enzyme was also explained by the frequency of consumption of marine fish and shellfish in the diet of the breastfeeding women.
Abstract:
Several recent offsite recreational fishing surveys have used public landline telephone directories as a sampling frame. Sampling biases inherent in this method are recognised, but are assumed to be corrected through demographic data expansion. However, the rising prevalence of mobile-only households has potentially increased these biases by skewing raw samples towards households that maintain relatively high levels of coverage in telephone directories. For biases to be corrected through demographic expansion, both the fishing participation rate and fishing activity must be similar among listed and unlisted fishers within each demographic group. In this study, we tested for a difference in the fishing activity of listed and unlisted fishers within demographic groups by comparing their avidity (number of fishing trips per year), as well as the platform used (boat or shore) and species targeted on their most recent fishing trip. 3062 recreational fishers were interviewed at 34 tackle stores across 12 residential regions of Queensland, Australia. For each fisher, data collected included their fishing avidity, the platform used and species targeted on their most recent trip, their gender, age, residential region, and whether their household had a listed telephone number. Although the most avid fishers were younger and less likely to have a listed phone number, cumulative link models revealed that avidity was not affected by an interaction of phone listing status, age group and residential region (p > 0.05). Likewise, binomial generalized linear models revealed that there was no interaction between phone listing, age group and avidity acting on platform (p > 0.05), and platform was not affected by an interaction of phone listing status, age group, and residential region (p > 0.05). Ordination of target species using Bray-Curtis dissimilarity indices found a significant but irrelevant difference (i.e. small effect size) between listed and unlisted fishers (ANOSIM R < 0.05, p < 0.05). These results suggest that, at this time, the fishing activity of listed and unlisted fishers in Queensland is similar within demographic groups. Future research seeking to validate the assumptions of recreational fishing telephone surveys should investigate fishing participation rates of listed and unlisted fishers within demographic groups.
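Illustrative sketch (not the survey's code): one of the binomial GLMs described above, modelling platform on the most recent trip as a function of phone-listing status, age group and avidity, with a deviance comparison against a reduced model. Column names and the data file are hypothetical.

```python
# Minimal sketch of a binomial GLM for platform (boat = 1, shore = 0) with
# phone-listing status, age group and avidity, including their interaction.
# Columns and file are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("tackle_store_interviews.csv")  # assumed columns: boat, listed, age_group, avidity

full = smf.glm("boat ~ C(listed) * C(age_group) * avidity", data=df,
               family=sm.families.Binomial()).fit()
reduced = smf.glm("boat ~ C(listed) + C(age_group) + avidity", data=df,
                  family=sm.families.Binomial()).fit()

print(full.summary())
# Deviance difference between the reduced and full models as a rough check on
# whether the interaction terms add anything.
print("deviance difference:", reduced.deviance - full.deviance)
```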
Abstract:
Undoubtedly, statistics has become one of the most important subjects in the modern world, where its applications are ubiquitous. The importance of statistics is not limited to statisticians, but also extends to non-statisticians who have to use statistics within their own disciplines. Several studies have indicated that most academic departments around the world have realized the importance of statistics to non-specialist students. Therefore, the number of students enrolled in statistics courses has vastly increased, coming from a variety of disciplines. Consequently, research in statistics education has developed considerably over the last few years. One important issue is how statistics is best taught to, and learned by, non-specialist students. This issue is influenced by several factors that affect the learning and teaching of statistics to non-specialist students, such as the use of technology, the role of the English language (especially for those whose first language is not English), the effectiveness of statistics teachers and their approach towards teaching statistics courses, students' motivation to learn statistics and the relevance of statistics courses to the main subjects of non-specialist students. Several studies focused on aspects of learning and teaching statistics have been conducted in different countries around the world, particularly in Western countries. Conversely, the situation in Arab countries, especially in Saudi Arabia, is different; there is very little research in this area, and what exists does not meet those countries' needs for developing the learning and teaching of statistics to non-specialist students. This research was instituted in order to develop the field of statistics education. The purpose of this mixed methods study was to generate new insights into this subject by investigating how statistics courses are currently taught to non-specialist students in Saudi universities. Hence, this study will contribute towards filling the knowledge gap that exists in Saudi Arabia. This study used multiple data collection approaches, including questionnaire surveys from 1053 non-specialist students who had completed at least one statistics course in different colleges of the universities in Saudi Arabia. These surveys were followed up with qualitative data collected via semi-structured interviews with 16 teachers of statistics from colleges within all six universities where statistics is taught to non-specialist students in Saudi Arabia's Eastern Region. The data from the questionnaires included several types, so different techniques were used in the analysis. Descriptive statistics were used to identify the demographic characteristics of the participants. The chi-square test was used to determine associations between variables. Based on the main issues raised in the literature review, the questions (item scales) were grouped into five key groups: 1) Effectiveness of Teachers; 2) English Language; 3) Relevance of Course; 4) Student Engagement; 5) Using Technology. Exploratory data analysis was used to explore these issues in more detail. Furthermore, given the clustering in the data (students within departments, within colleges, within universities), multilevel generalized linear models for dichotomous outcomes were used to clarify the effects of clustering at those levels.
Factor analysis was conducted, confirming the dimension reduction of the variables (item scales). The data from the teachers' interviews were analysed on an individual basis. The responses were assigned to one of the eight themes that emerged from within the data: 1) the lack of students' motivation to learn statistics; 2) students' participation; 3) students' assessment; 4) the effective use of technology; 5) the level of previous mathematical and statistical skills of non-specialist students; 6) the English language ability of non-specialist students; 7) the need for extra time for teaching and learning statistics; and 8) the role of administrators. All the data from students and teachers indicated that the situation of learning and teaching statistics to non-specialist students in Saudi universities needs to be improved in order to meet the needs of those students. The findings suggested a weakness in the use of statistical software applications in these courses: there is a lack of application of technology, such as statistical software programs, that would allow non-specialist students to consolidate their knowledge. The results also indicated that the English language is considered one of the main challenges in learning and teaching statistics, particularly in institutions where English is not used as the main language. Moreover, students' weak mathematical skills are considered another major challenge. Additionally, the results indicated that there was a need to tailor statistics courses to the needs of non-specialist students based on their main subjects. The findings indicate that statistics teachers need to choose appropriate methods when teaching statistics courses.
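The abstract above mentions multilevel generalized linear models for dichotomous outcomes. As an illustrative sketch (not the thesis code), one way to fit a random-intercept logistic model for students nested in departments within colleges is statsmodels' Bayesian mixed GLM; column names and the data file are hypothetical, and dedicated multilevel software is an equally valid choice.

```python
# Hedged sketch of a multilevel logistic model for a dichotomous survey item,
# with random intercepts for college and department (department labels assumed
# unique across colleges). Columns and file are hypothetical.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("student_survey.csv")  # assumed columns: agree, english_difficulty, college, department

model = BinomialBayesMixedGLM.from_formula(
    "agree ~ english_difficulty",
    {"college": "0 + C(college)", "department": "0 + C(department)"},
    df,
)
result = model.fit_vb()    # variational Bayes fit
print(result.summary())
```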