5 results for dependent data
in DigitalCommons@University of Nebraska - Lincoln
Abstract:
Dynamic conferencing refers to a scenario in which any subset of users in a universe of users forms a conference to share confidential information among themselves. The key distribution (KD) problem in dynamic conferencing is to compute a shared secret key for such a dynamically formed conference. In the literature, KD schemes for dynamic conferencing are either computationally unscalable or require communication among users, which is undesirable. The extended symmetric polynomial based dynamic conferencing scheme (ESPDCS) is one such KD scheme, and its high computational complexity depends on the universe size. In this paper we present an enhancement to the ESPDCS scheme, yielding a KD scheme called universe-independent SPDCS (UI-SPDCS) whose complexity is independent of the universe size. However, the UI-SPDCS scheme does not scale with the conference size. We therefore propose a relatively scalable KD scheme, termed DH-SPDCS, that combines the UI-SPDCS scheme with the tree-based group Diffie-Hellman (TGDH) key exchange protocol. The proposed DH-SPDCS scheme provides a configurable trade-off between the scheme's computation and communication complexity.
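For intuition, the sketch below illustrates the basic symmetric-polynomial idea that SPDCS-style schemes build on, reduced to the two-party (pairwise) case: a trusted authority hands out shares of a symmetric bivariate polynomial over a prime field, and any two users can independently derive the same key. This is a minimal illustration only; the field size, polynomial degree, and function names are assumptions, and the paper's ESPDCS/UI-SPDCS and DH-SPDCS constructions extend the idea to arbitrary conference sizes.

# Minimal sketch (not the paper's scheme): pairwise key distribution from a
# symmetric bivariate polynomial f(x, y) over a prime field.
import random

P = 2**127 - 1          # prime modulus (field size); illustrative only
DEGREE = 3              # polynomial degree; bounds collusion resistance

def setup(degree=DEGREE):
    """Trusted authority picks a symmetric polynomial f(x, y) = sum a[i][j] x^i y^j."""
    a = [[0] * (degree + 1) for _ in range(degree + 1)]
    for i in range(degree + 1):
        for j in range(i, degree + 1):
            c = random.randrange(P)
            a[i][j] = a[j][i] = c      # symmetry: a[i][j] == a[j][i]
    return a

def share_for(a, user_id):
    """User's secret share: coefficients of g_u(y) = f(user_id, y)."""
    degree = len(a) - 1
    return [sum(a[i][j] * pow(user_id, i, P) for i in range(degree + 1)) % P
            for j in range(degree + 1)]

def pairwise_key(share, peer_id):
    """Evaluate g_u(peer_id); symmetry of f guarantees both parties agree."""
    return sum(c * pow(peer_id, j, P) for j, c in enumerate(share)) % P

if __name__ == "__main__":
    f = setup()
    alice, bob = 17, 42                      # public user identifiers
    k_ab = pairwise_key(share_for(f, alice), bob)
    k_ba = pairwise_key(share_for(f, bob), alice)
    assert k_ab == k_ba                      # both derive the same key

In Blundo-style schemes of this kind the polynomial degree bounds how many colluding users can be tolerated, which is one source of the computation/storage trade-offs that the ESPDCS and DH-SPDCS schemes address for full conferences.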
Abstract:
1. The crabeater seal Lobodon carcinophaga is considered to be a key species in the krill-based food web of the Southern Ocean. Reliable estimates of the abundance of this species are necessary to allow the development of multispecies, predator–prey models as a basis for management of the krill fishery in the Southern Ocean.
2. A survey of crabeater seal abundance was undertaken in 1,500,000 km2 of pack-ice off east Antarctica between longitudes 64–150° E during the austral summer of 1999/2000. Sighting surveys, using double-observer line transect methods, were conducted from an icebreaker and two helicopters to estimate the density of seals hauled out on the ice in survey strips. Satellite-linked dive recorders were deployed on a sample of seals to estimate the probability of seals being hauled out on the ice at the times of day when sighting surveys were conducted. Model-based inference, involving fitting a density surface, was used to infer densities across the entire survey region from estimates in the surveyed areas.
3. Crabeater seal abundance was estimated to be between 0.7 and 1.4 million animals (with 95% confidence), with the most likely estimate slightly less than 1 million.
4. Synthesis and applications. The estimation of crabeater seal abundance in Convention for the Conservation of Antarctic Marine Living Resources (CCAMLR) management areas off east Antarctica, where krill biomass has also been estimated recently, provides the data necessary to begin extending from single-species to multispecies management of the krill fishery. Incorporating all major sources of uncertainty allows a precautionary interpretation of crabeater abundance and demand for krill, in keeping with CCAMLR’s precautionary approach to management. While this study focuses on the crabeater seal and management of living resources in the Southern Ocean, it has also led to technical and theoretical developments in survey methodology that have widespread potential application in ecological and resource management studies, and it will contribute to a more fundamental understanding of the structure and function of the Southern Ocean ecosystem.
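In outline, the abundance estimate combines the line-transect density of seals visible on the ice with the haul-out correction from the dive recorders. With illustrative notation (the symbols below are not the paper's), the corrected density and total abundance take roughly the form

\hat{D}(s) = \frac{\hat{D}_{\mathrm{ice}}(s)}{\hat{p}_{\mathrm{haul}}}, \qquad \hat{N} = \int_{A} \hat{D}(s)\, ds

where \hat{D}_{\mathrm{ice}}(s) is the fitted density surface of hauled-out seals at location s, \hat{p}_{\mathrm{haul}} is the estimated probability that a seal is hauled out during survey hours, and A is the 1,500,000 km2 survey region.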
Abstract:
Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage each day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology that helps identify the data sets to be deleted when storage space is needed. CMS data are stored in HDFS (Hadoop Distributed File System), and HDFS logs record file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time required to classify data sets with this method depends linearly on the size of the input HDFS log file, since the MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost, calculated from the size of a data set and the time since its last access, which helps decide how useful a data set is. The accuracy of each approach was compared by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our methodology using SVMs proved to be more accurate than the Retention Cost heuristic. This methodology could be applied to similar problems involving other large data sets.
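As a rough illustration of the two approaches being compared, the Python sketch below ranks data sets with a size-times-idle-time retention-cost score and fits an SVM classifier on simple per-data-set features. The exact Retention Cost formula, the feature set, the data-set names, and the labels are assumptions made for illustration; the thesis derives its features from HDFS logs via Hadoop MapReduce.

# Hedged sketch: ranking data sets for deletion from access-log summaries.
from dataclasses import dataclass
from sklearn.svm import SVC

@dataclass
class DatasetStats:
    name: str
    size_gb: float
    days_since_last_access: float
    accesses_last_90d: int

def retention_cost(d: DatasetStats) -> float:
    """Heuristic score: larger, colder data sets are cheaper to delete (assumed formula)."""
    return d.size_gb * d.days_since_last_access

def train_deletion_classifier(history, labels):
    """Fit an SVM on per-data-set features; labels: 1 = safe to delete, 0 = keep."""
    X = [[d.size_gb, d.days_since_last_access, d.accesses_last_90d] for d in history]
    clf = SVC(kernel="rbf")
    return clf.fit(X, labels)

# Usage: rank candidates by the heuristic (or query a trained SVM instead).
candidates = [
    DatasetStats("/store/mc/example/setA", 1200.0, 200.0, 0),    # hypothetical paths
    DatasetStats("/store/data/example/setB", 800.0, 5.0, 40),
]
for d in sorted(candidates, key=retention_cost, reverse=True):
    print(d.name, retention_cost(d))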
Abstract:
Evaluations of measurement invariance provide essential construct validity evidence. However, the quality of such evidence is partly dependent upon the validity of the resulting statistical conclusions. The presence of Type I or Type II errors can render measurement invariance conclusions meaningless. The purpose of this study was to determine the effects of categorization and censoring on the behavior of the chi-square/likelihood ratio test statistic and two alternative fit indices (CFI and RMSEA) in the context of evaluating measurement invariance. Monte Carlo simulation was used to examine Type I error and power rates for (a) the overall test statistic and fit indices, and (b) the change in the test statistic and fit indices. Data were generated according to a multiple-group single-factor CFA model across 40 conditions that varied by sample size, strength of item factor loadings, and categorization thresholds. Seven different combinations of model estimators (ML, Yuan-Bentler scaled ML, and WLSMV) and specified measurement scales (continuous, censored, and categorical) were used to analyze each of the simulation conditions. As hypothesized, non-normality increased Type I error rates for the continuous scale of measurement and did not affect error rates for the categorical scale of measurement. Maximum likelihood estimation combined with a categorical scale of measurement resulted in more correct statistical conclusions than the other analysis combinations. For the continuous and censored scales of measurement, the Yuan-Bentler scaled ML resulted in more correct conclusions than normal-theory ML. The censored measurement scale did not offer any advantages over the continuous measurement scale. Comparing across fit statistics and indices, the chi-square-based test statistics were preferred over the alternative fit indices, and ΔRMSEA was preferred over ΔCFI. Results from this study should be used to inform the modeling decisions of applied researchers. However, no single analysis combination can be recommended for all situations. Therefore, it is essential that researchers consider the context and purpose of their analyses.
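For reference, the change-in-fit statistics examined here compare a model with invariance constraints to the less restrictive (configural) model; in standard notation,

\Delta\chi^{2} = \chi^{2}_{\mathrm{constrained}} - \chi^{2}_{\mathrm{configural}}, \qquad \Delta df = df_{\mathrm{constrained}} - df_{\mathrm{configural}},
\Delta\mathrm{CFI} = \mathrm{CFI}_{\mathrm{constrained}} - \mathrm{CFI}_{\mathrm{configural}}, \qquad \Delta\mathrm{RMSEA} = \mathrm{RMSEA}_{\mathrm{constrained}} - \mathrm{RMSEA}_{\mathrm{configural}},

with invariance rejected when Δχ² is significant against a chi-square distribution with Δdf degrees of freedom, or when the change in an alternative fit index exceeds a conventional cutoff (e.g., |ΔCFI| > .01 in the wider literature; the specific cutoffs used in the study are not restated here).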
Abstract:
Regression coefficients specify the partial effect of a regressor on the dependent variable. Sometimes the bivariate or limited multivariate relationship of that regressor variable with the dependent variable is known from population-level data. We show here that such population-level data can be used to reduce variance and bias in estimates of those regression coefficients from sample survey data. The method of constrained MLE is used to achieve these improvements. Its statistical properties are first described. The method constrains the weighted sum of all the covariate-specific associations (partial effects) of the regressors on the dependent variable to equal the overall association of one or more regressors, where the latter is known exactly from the population data. We refer to those regressors whose bivariate or limited multivariate relationships with the dependent variable are constrained by population data as being “directly constrained.” Our study investigates the improvements in the estimation of directly constrained variables as well as the improvements in the estimation of other regressor variables that may be correlated with the directly constrained variables, and thus “indirectly constrained” by the population data. The example application is to the marital fertility of black versus white women. The difference between white and black women’s rates of marital fertility, available from population-level data, gives the overall association of race with fertility. We show that the constrained MLE technique both provides a far more powerful statistical test of the partial effect of being black and purges the test of a bias that would otherwise distort the estimated magnitude of this effect. We find only trivial reductions, however, in the standard errors of the parameters for indirectly constrained regressors.
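In schematic form, and with notation that is illustrative rather than the authors', the constrained MLE solves

\hat{\beta}_{c} = \arg\max_{\beta} \; \ell(\beta) \quad \text{subject to} \quad \sum_{k} w_{k}\, \beta_{k} = b^{*},

where \ell(\beta) is the sample log-likelihood, b^{*} is the overall (bivariate or limited multivariate) association of the directly constrained regressor with the dependent variable known exactly from population data, and the weights w_{k} aggregate the covariate-specific partial effects into that overall association; the constraint is typically imposed with a Lagrange multiplier.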