913 resultados para large-sample
Resumo:
The completion of and graduation from high school is a major problem that must be resolved. A 6% random sample of the students (1,059) who were in the 9th grade in Miami-Dade Public Schools, a large urban school district, September 1992, were selected for this study. The sample was divided into 2 groups, advanced academic and general track students. Each group was then divided into vocational and non-vocational. A causal comparative design was used to evaluate the results for graduate vs. non-graduate. The indicators were the program of study, attendance, standardized test scores, grade point average, ethnicity and gender. It was found that both advanced academic and general track students had significantly higher graduation rates at the .01 level when a vocational education program was part of their studies. All of the other indicators did not show any significant differences. If we arc to improve students educational outcomes and reduce the dropout rate, vocational education should be part of every student's education.^
Resumo:
Anthropogenic habitat alterations and water-management practices have imposed an artificial spatial scale onto the once contiguous freshwater marshes of the Florida Everglades. To gain insight into how these changes may affect biotic communities, we examined whether variation in the abundance and community structure of large fishes (SL . 8 cm) in Everglades marshes varied more at regional or intraregional scales, and whether this variation was related to hydroperiod, water depth, floating mat volume, and vegetation density. From October 1997 to October 2002, we used an airboat electrofisher to sample large fishes at sites within three regions of the Everglades. Each of these regions is subject to unique watermanagement schedules. Dry-down events (water depth , 10 cm) occurred at several sites during spring in 1999, 2000, 2001, and 2002. The 2001 dry-down event was the most severe and widespread. Abundance of several fishes decreased significantly through time, and the number of days post-dry-down covaried significantly with abundance for several species. Processes operating at the regional scale appear to play important roles in regulating large fishes. The most pronounced patterns in abundance and community structure occurred at the regional scale, and the effect size for region was greater than the effect size for sites nested within region for abundance of all species combined, all predators combined, and each of the seven most abundant species. Non-metric multi-dimensional scaling revealed distinct groupings of sites corresponding to the three regions. We also found significant variation in community structure through time that correlated with the number of days post-dry-down. Our results suggest that hydroperiod and water management at the regional scale influence large fish communities of Everglades marshes.
Resumo:
Exploring the relationship between early oral reading fluency ability and reading comprehension achievement among an ethnically and racially diverse sample of young learners from low-income families, attending elementary school within a large public school district in southeast Florida is the purpose of this longitudinal study. Although many studies have been conducted to address the relationship between oral reading fluency ability and reading comprehension achievement, most of the existing research failed either to disaggregate the data by demographic subgroups or secure a large enough sample of students to adequately represent the diverse subgroups. The research questions that guided this study were: (a) To what extent does early oral reading fluency ability measured in first, second, or third grade correlate with reading comprehension achievement in third grade? (b) To what extent does the relationship of early oral reading fluency ability and reading comprehension achievement vary by demographic subgroup membership (i.e., gender, race/ethnicity, socioeconomic status) among a diverse sample of students? A predictive research design using archived secondary data was employed in this nonexperimental quantitative methods study of 1,663 third grade students who attended a cohort of 25 Reading First funded schools. The data analyzed derived from the Dynamic Indicators of Basic Early Literacy Skills Oral Reading Fluency (DIBELS ORF) measure administered in first, second, and third grades and the Florida Comprehensive Assessment Test of the Sunshine State Standards (FCAT-SSS) Reading administered in third grade. Linear regression analyses between each of the oral reading fluency and reading comprehension measures produced significant positive correlations. Hierarchical regression analyses supported the predictive potential of all three oral reading fluency ability measures toward reading comprehension achievement, with the first grade oral reading fluency ability measure explaining the most significant variance in third grade reading comprehension achievement. Male students produced significant overall differences in variance when compared to female students as did the Other student subgroup (i.e., Asian, Multiracial, and Native American) when compared to Black, White, and Hispanic students. No significant differences in variance were produced between students from low and moderate socioeconomic families. These findings are vital toward adding to the literature of diverse young learners.
Resumo:
This research was undertaken to explore dimensions of the risk construct, identify factors related to risk-taking in education, and study risk propensity among employees at a community college. Risk-taking propensity (RTP) was measured by the 12-item BCDQ, which consisted of personal and professional risk-related situations balanced for the money, reputation, and satisfaction dimensions of the risk construct. Scoring ranged from 1.00 (most cautious) to 6.00 (most risky).^ Surveys including the BCDQ and seven demographic questions relating to age, gender, professional status, length of service, academic discipline, highest degree, and campus location were sent to faculty, administrators, and academic department heads. A total of 325 surveys were returned, resulting in a 66.7% response rate. Subjects were relatively homogeneous for age, length of service, and highest degree.^ Subjects were also homogeneous for risk-taking propensity: no substantive differences in RTP scores were noted within and among demographic groups, with the possible exception of academic discipline. The mean RTP score for all subjects was 3.77, for faculty was 3.76, for administrators was 3.83, and for department heads was 3.64.^ The relationship between propensity to take personal risks and propensity to take professional risks was tested by computing Pearson r correlation coefficients. The relationships for the total sample, faculty, and administrator groups were statistically significant, but of limited practical significance. Subjects were placed into risk categories by dividing the response scale into thirds. A 3 x 3 factorial ANOVA revealed no interaction effects between professional status and risk category with regard to RTP score. A discriminant analysis showed that a seven-factor model was not effective in predicting risk category.^ The homogeneity of the study sample and the effect of a risk-encouraging environment were discussed in the context of the community college. Since very little data on risk-taking in education is available, risk propensity data from this study could serve as a basis for comparison to future research. Results could be used by institutions to plan professional development activities, designed to increase risk-taking and encourage active acceptance of change. ^
Resumo:
The presence of inhibitory substances in biological forensic samples has, and continues to affect the quality of the data generated following DNA typing processes. Although the chemistries used during the procedures have been enhanced to mitigate the effects of these deleterious compounds, some challenges remain. Inhibitors can be components of the samples, the substrate where samples were deposited or chemical(s) associated to the DNA purification step. Therefore, a thorough understanding of the extraction processes and their ability to handle the various types of inhibitory substances can help define the best analytical processing for any given sample. A series of experiments were conducted to establish the inhibition tolerance of quantification and amplification kits using common inhibitory substances in order to determine if current laboratory practices are optimal for identifying potential problems associated with inhibition. DART mass spectrometry was used to determine the amount of inhibitor carryover after sample purification, its correlation to the initial inhibitor input in the sample and the overall effect in the results. Finally, a novel alternative at gathering investigative leads from samples that would otherwise be ineffective for DNA typing due to the large amounts of inhibitory substances and/or environmental degradation was tested. This included generating data associated with microbial peak signatures to identify locations of clandestine human graves. Results demonstrate that the current methods for assessing inhibition are not necessarily accurate, as samples that appear inhibited in the quantification process can yield full DNA profiles, while those that do not indicate inhibition may suffer from lowered amplification efficiency or PCR artifacts. The extraction methods tested were able to remove >90% of the inhibitors from all samples with the exception of phenol, which was present in variable amounts whenever the organic extraction approach was utilized. Although the results attained suggested that most inhibitors produce minimal effect on downstream applications, analysts should practice caution when selecting the best extraction method for particular samples, as casework DNA samples are often present in small quantities and can contain an overwhelming amount of inhibitory substances.
Resumo:
This research was undertaken to explore dimensions of the risk construct, identify factors related to risk-taking in education, and study risk propensity among employees at a community college. Risk-taking propensity (RTP) was measured by the 12-item BCDQ, which consisted of personal and professional risk-related situations balanced for the money, reputation, and satisfaction dimensions of the risk construct. Scoring ranged from 1.00 (most cautious) to 6.00 (most risky). Surveys including the BCDQ and seven demographic questions relating to age, gender, professional status, length of service, academic discipline, highest degree, and campus location were sent to faculty, administrators, and academic department heads. A total of 325 surveys were returned, resulting in a 66.7% response rate. Subjects were relatively homogeneous for age, length of service, and highest degree. Subjects were also homogeneous for risk-taking propensity: no substantive differences in RTP scores were noted within and among demographic groups, with the possible exception of academic discipline. The mean RTP score for all subjects was 3.77, for faculty was 3.76, for administrators was 3.83, and for department heads was 3.64. The relationship between propensity to take personal risks and propensity to take professional risks was tested by computing Pearson r correlation coefficients. The relationships for the total sample, faculty, and administrator groups were statistically significant, but of limited practical significance. Subjects were placed into risk categories by dividing the response scale into thirds. A 3 X 3 factorial ANOVA revealed no interaction effects between professional status and risk category with regard to RTP score. A discriminant analysis showed that a seven-factor model was not effective in predicting risk category. The homogeneity of the study sample and the effect of a risk encouraging environment were discussed in the context of the community college. Since very little data on risk-taking in education is available, risk propensity data from this study could serve as a basis for comparison to future research. Results could be used by institutions to plan professional development activities, designed to increase risk-taking and encourage active acceptance of change.
Resumo:
Current interest in measuring quality of life is generating interest in the construction of computerized adaptive tests (CATs) with Likert-type items. Calibration of an item bank for use in CAT requires collecting responses to a large number of candidate items. However, the number is usually too large to administer to each subject in the calibration sample. The concurrent anchor-item design solves this problem by splitting the items into separate subtests, with some common items across subtests; then administering each subtest to a different sample; and finally running estimation algorithms once on the aggregated data array, from which a substantial number of responses are then missing. Although the use of anchor-item designs is widespread, the consequences of several configuration decisions on the accuracy of parameter estimates have never been studied in the polytomous case. The present study addresses this question by simulation, comparing the outcomes of several alternatives on the configuration of the anchor-item design. The factors defining variants of the anchor-item design are (a) subtest size, (b) balance of common and unique items per subtest, (c) characteristics of the common items, and (d) criteria for the distribution of unique items across subtests. The results of this study indicate that maximizing accuracy in item parameter recovery requires subtests of the largest possible number of items and the smallest possible number of common items; the characteristics of the common items and the criterion for distribution of unique items do not affect accuracy.
Resumo:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Resumo:
We report the results of direct measurement of remanent hysteresis loops on nanochains of BiFeO3 at room temperature under zero and ∼20 kOe magnetic field. We noticed a suppression of remanent polarization by nearly ∼40% under the magnetic field. The powder neutron diffraction data reveal significant ion displacements under a magnetic field which seems to be the origin of the suppression of polarization. The isolated nanoparticles, comprising the chains, exhibit evolution of ferroelectric domains under dc electric field and complete 180 switching in switching-spectroscopy piezoresponse force microscopy. They also exhibit stronger ferromagnetism with nearly an order of magnitude higher saturation magnetization than that of the bulk sample. These results show that the nanoscale BiFeO3 exhibits coexistence of ferroelectric and ferromagnetic order and a strong magnetoelectric multiferroic coupling at room temperature comparable to what some of the type-II multiferroics show at a very low temperature.
Resumo:
This reconnaissance study was undertaken to determine whether the mass extinctions and faunal successions that mark the Cretaceous/Tertiary (K/T) boundary left a discernible molecular fossil record in the sediments of this period. Lipid signatures of sediments taken from above and below the K/T boundary were compared in core and outcrop samples taken from two locations: the U.S. east coast continental margin (western Atlantic Ocean, DSDP Site 605) and Stevns Klint, Denmark. Four calcareous sediments taken from above and below the K/T boundary in DSDP Hole 605, Section 605-66-1, revealed changing lipid signatures between above and below that are characterized by a large component of unresolved naphthenic hydrocarbons and a homologous series of n-alkanes ranging from Ci6 to C33. These lipid signatures are attributed to an influx of a terrestrial higher plant component and to bacterial reworking of the sediments under partially anoxic depositional and/or diagenetic conditions. The outcrop samples from Stevns Klint had extremely low concentrations of indigenous lipids. The fish clay at the K/T boundary contained traces of microbial hydrocarbons and fatty acids, whereas the carbonates above and below had only microbial fatty acids and additional terrestrial resin acids. The data from both sites indicate a perturbation in the deposition of lipid compound classes across the K/T boundary.
Resumo:
Oceanic flood basalts are poorly understood, short-term expressions of highly increased heat flux and mass flow within the convecting mantle. The uniqueness of the Caribbean Large Igneous Province (CLIP, 92-74 Ma) with respect to other Cretaceous oceanic plateaus is its extensive sub-aerial exposures, providing an excellent basis to investigate the temporal and compositional relationships within a starting plume head. We present major element, trace element and initial Sr-Nd-Pb isotope composition of 40 extrusive rocks from the Caribbean Plateau, including onland sections in Costa Rica, Colombia and Curaçao as well as DSDP Sites in the Central Caribbean. Even though the lavas were erupted over an area of ~3*10**6 km**2, the majority have strikingly uniform incompatible element patterns (La/Yb=0.96+/-0.16, n=64 out of 79 samples, 2sigma) and initial Nd-Pb isotopic compositions (e.g. 143Nd/144Ndin=0.51291+/-3, epsilon-Nd i=7.3+/-0.6, 206Pb/204Pbin=18.86+/-0.12, n=54 out of 66, 2sigma). Lavas with endmember compositions have only been sampled at the DSDP Sites, Gorgona Island (Colombia) and the 65-60 Ma accreted Quepos and Osa igneous complexes (Costa Rica) of the subsequent hotspot track. Despite the relatively uniform composition of most lavas, linear correlations exist between isotope ratios and between isotope and highly incompatible trace element ratios. The Sr-Nd-Pb isotope and trace element signatures of the chemically enriched lavas are compatible with derivation from recycled oceanic crust, while the depleted lavas are derived from a highly residual source. This source could represent either oceanic lithospheric mantle left after ocean crust formation or gabbros with interlayered ultramafic cumulates of the lower oceanic crust. High 3He/4He in olivines of enriched picrites at Quepos are ~12 times higher than the atmospheric ratio suggesting that the enriched component may have once resided in the lower mantle. Evaluation of the Sm-Nd and U-Pb isotope systematics on isochron diagrams suggests that the age of separation of enriched and depleted components from the depleted MORB source mantle could have been <=500 Ma before CLIP formation and interpreted to reflect the recycling time of the CLIP source. Mantle plume heads may provide a mechanism for transporting large volumes of possibly young recycled oceanic lithosphere residing in the lower mantle back into the shallow MORB source mantle.
Resumo:
A comprehensive approach to sport expertise should consider the entire situation that is comprised of the person, the task, the environment, and the complex interplay of these components (Hackfort, 1986). Accordingly, the Developmental Model of Sport Participation (Côté, Baker, & Abernethy, 2007; Côté & Fraser-Thomas, 2007) provides a comprehensive framework for sport expertise that outlines different pathways of involvement in sport. In pathways one and two, early sampling serves as the foundation for both elite and recreational sport participation. Early sampling is based on two main elements of childhood sport participation: 1) involvement in various sports and 2) participation in deliberate play. In contrast, pathway three shows the course to elite performance through early specialization in one sport. Early specialization implies a focused involvement on one sport and a large number of deliberate practice activities with the goal of improving sport skills and performance during childhood. This paper proposes seven postulates regarding the role that sampling and deliberate play, as opposed to specialization and deliberate practice, can have during childhood in promoting continued participation and elite performance in sport.
Resumo:
Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers which affect least squares estimator on a large scale in linear regression. Instead of modeling mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and covariates. Therefore, QR can be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental design and observational studies. In ordinary linear regression, sample size may be determined based on either precision analysis or power analysis with closed form formulas. There are also methods that calculate sample size based on precision analysis for QR like C.Jennen-Steinmetz and S.Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under hypothesis test of covariate effects. Even though error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions of error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate error distribution. Since the method proposed can be implemented in R, user is able to choose either parametric distribution or nonparametric kernel density estimation for error distribution. User also needs to specify the covariate structure and effect size to carry out sample size and power calculation. The performance of the method proposed is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers that are closed to the nominal power level, for example, 80%.
Resumo:
It is increasingly recognized that ecological restoration demands conservation action beyond the borders of existing protected areas. This requires the coordination of land uses and management over a larger area, usually with a range of partners, which presents novel institutional challenges for conservation planners. Interviews were undertaken with managers of a purposive sample of large-scale conservation areas in the UK. Interviews were open-ended and analyzed using standard qualitative methods. Results show a wide variety of organizations are involved in large-scale conservation projects, and that partnerships take time to create and demand resilience in the face of different organizational practices, staff turnover, and short-term funding. Successful partnerships with local communities depend on the establishment of trust and the availability of external funds to support conservation land uses. We conclude that there is no single institutional model for large-scale conservation: success depends on finding institutional strategies that secure long-term conservation outcomes, and ensure that conservation gains are not reversed when funding runs out, private owners change priorities, or land changes hands.