975 results for data consistency


Relevance:

30.00%

Publisher:

Abstract:

Systematic protocols that use decision rules or scores are seen to improve consistency and transparency in classifying the conservation status of species. When applying these protocols, assessors are typically required to decide on estimates for attributes that are inherently uncertain. Input data and resulting classifications are usually treated as though they are exact and hence free of operator error. We investigated the impact of data interpretation on the consistency of extinction risk classification protocols and diagnosed the causes of discrepancies when they occurred. We tested three widely used systematic classification protocols employed by the World Conservation Union, NatureServe, and the Florida Fish and Wildlife Conservation Commission. We provided 18 assessors with identical information for 13 different species and asked them to infer estimates for each of the parameters required by the three protocols. The threat classification of several of the species varied from low risk to high risk depending on who did the assessment, and this occurred across all three protocols investigated. Assessors tended to agree on their placement of species in the highest (50-70%) and lowest (20-40%) risk categories, but there was poor agreement on which species should be placed in the intermediate categories. Furthermore, the correspondence between the three classification methods was unpredictable, with large variation among assessors. These results highlight the importance of peer review and consensus among multiple assessors in species classifications and the need to be cautious with assessments carried out by a single assessor. Greater consistency among assessors requires wide use of training manuals and formal methods for estimating parameters that allow uncertainties to be represented, carried through chains of calculations, and reported transparently.
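
A minimal sketch of how such a rule-based protocol behaves under assessor uncertainty (the thresholds and parameter estimates below are invented for illustration and are not the actual IUCN, NatureServe, or Florida criteria):

```python
# Toy illustration of a rule-based threat classification protocol.
# Thresholds are hypothetical, NOT the actual IUCN/NatureServe rules.

def classify(population: int, decline_pct: float) -> str:
    """Map two uncertain parameter estimates to a risk category."""
    if population < 250 or decline_pct >= 80:
        return "high risk"
    if population < 2500 or decline_pct >= 50:
        return "intermediate risk"
    return "low risk"

# Three assessors reading the same species account may settle on
# different estimates for the same inherently uncertain attributes.
estimates = {
    "assessor_1": (2000, 55),
    "assessor_2": (3000, 45),
    "assessor_3": (2400, 82),
}

for name, (pop, decline) in estimates.items():
    print(name, "->", classify(pop, decline))
```

Run it and the same species lands in three different categories, one per assessor, which is exactly the kind of discrepancy the study documents.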

Relevance:

30.00%

Publisher:

Abstract:

We investigated cross-cultural differences in the factor structure and psychometric properties of the 75-item Young Schema Questionnaire-Short Form (YSQ-SF). Participants were 833 South Korean and 271 Australian undergraduate students. The South Korean sample was randomly divided into two sub-samples: sample A was used for Exploratory Factor Analysis (EFA) and sample B for Confirmatory Factor Analysis (CFA). EFA on the South Korean sample revealed a 13-factor solution to be the best fit for the data, and CFA on the data from sample B confirmed this result. CFA on the data from the Australian sample also supported a 13-factor solution. The overall YSQ-SF scale demonstrated a high level of internal consistency in both the South Korean and Australian groups, and adequate internal consistencies were demonstrated for all subscales in both samples. In conclusion, the results showed that the 13-factor YSQ-SF has good psychometric properties and reliability for South Korean and Australian university students. The South Korean sample had significantly higher YSQ-SF scores on most of the 13 subscales than the Australian sample. However, limitations of the current study preclude generalisation of the findings beyond undergraduate student populations. (c) 2006 Elsevier B.V. All rights reserved.
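
The internal consistency reported above is conventionally summarised with Cronbach's alpha. A minimal sketch, using a made-up item matrix rather than the YSQ-SF responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Made-up 6-point Likert responses for a hypothetical 5-item subscale.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))  # shared trait drives item correlation
scores = np.clip(np.rint(3.5 + latent + rng.normal(scale=0.7, size=(100, 5))), 1, 6)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```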

Relevance:

30.00%

Publisher:

Abstract:

Non-technical loss (NTL) identification and prediction are important tasks for many utilities. Data from the customer information system (CIS) can be used for NTL analysis. However, to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed analysis can be carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: finding an optimal subset of features improves the quality of the results, giving faster processing, higher accuracy, and simpler models with fewer features. A detailed feature selection analysis is presented in the paper. Both time-domain and load-shape data are compared in terms of accuracy, consistency, and statistical dependencies between features.
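
A sketch of a filter-style feature selection step of the kind the paper proposes (the data and feature count are invented; the paper's actual CIS fields and selection criteria may differ): rank candidate features by mutual information with the NTL label and keep the top-k.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(42)
n = 500
# Invented stand-ins for pre-processed CIS / load-shape features.
X = rng.normal(size=(n, 8))
# Synthetic NTL label that actually depends on features 0 and 3.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

selector = SelectKBest(mutual_info_classif, k=3).fit(X, y)
print("MI scores:", np.round(selector.scores_, 3))
print("kept feature indices:", selector.get_support(indices=True))
```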

Relevance:

30.00%

Publisher:

Abstract:

The purpose of the present study is to test the case linkage principles of behavioural consistency and behavioural distinctiveness using serial vehicle theft data. Data from 386 solved vehicle thefts committed by 193 offenders were analysed using Jaccard's coefficient, regression, and receiver operating characteristic (ROC) analyses to determine whether objectively observable aspects of crime scene behaviour could be used to distinguish crimes committed by the same offender from those committed by different offenders. The findings indicate that spatial behaviour, specifically the distance between theft locations and between dump locations, is a highly consistent and distinctive aspect of vehicle theft behaviour; thus, intercrime and interdump distance represent the most useful aspects of vehicle theft for the purpose of case linkage analysis. The findings have theoretical and practical implications for the understanding of criminal behaviour and for the development of decision-support tools to assist police in the investigation and apprehension of serial vehicle theft offenders.
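
A minimal sketch of this linkage logic, assuming binary-coded crime scene behaviours (the behaviour sets and offender labels are invented): score every crime pair by Jaccard similarity, then measure with ROC AUC how well that score separates linked (same-offender) pairs from unlinked pairs.

```python
from itertools import combinations
from sklearn.metrics import roc_auc_score

# Invented binary behaviour sets for four solved vehicle thefts.
crimes = {
    "A1": {"night", "residential", "window_forced"},
    "A2": {"night", "residential", "key_theft"},
    "B1": {"day", "car_park", "window_forced"},
    "B2": {"day", "car_park", "towed"},
}
offender = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

labels, scores = [], []
for c1, c2 in combinations(crimes, 2):
    labels.append(int(offender[c1] == offender[c2]))  # 1 = linked pair
    scores.append(jaccard(crimes[c1], crimes[c2]))

print("AUC =", roc_auc_score(labels, scores))
```

In the study itself the most discriminating "score" turned out to be spatial (intercrime and interdump distance) rather than behavioural overlap, but the evaluation logic is the same.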

Relevance:

30.00%

Publisher:

Abstract:

Indicators which summarise the characteristics of spatiotemporal data coverages significantly simplify quality evaluation, decision making, and justification processes by providing a number of quality cues that are easy to manage, avoiding information overflow. Criteria which are commonly prioritised in evaluating spatial data quality and assessing a dataset’s fitness for use include lineage, completeness, logical consistency, positional accuracy, and temporal and attribute accuracy. However, user requirements may go far beyond these broadly accepted spatial quality metrics, to incorporate specific and complex factors which are less easily measured. This paper discusses the results of a study of high-level user requirements in geospatial data selection and data quality evaluation. It reports on the geospatial data quality indicators which were identified as user priorities, and which can potentially be standardised to enable intercomparison of datasets against user requirements. We briefly describe the implications for tools and standards to support the communication and intercomparison of data quality, and the ways in which these can contribute to the generation of a GEO label.

Relevance:

30.00%

Publisher:

Abstract:

Results from sediment trap experiments conducted in the seasonal upwelling area off south Java from November 2000 until July 2003 revealed significant monsoon-, El Niño-Southern Oscillation-, and Indian Ocean Dipole-induced seasonal and interannual variations in the flux and shell geochemistry of planktonic foraminifera. Surface net primary production rates together with total and species-specific planktonic foraminiferal flux rates were highest during the SE monsoon-induced coastal upwelling period from July to October, with three species, Globigerina bulloides, Neogloboquadrina pachyderma dex., and Globigerinita glutinata, contributing 40% of the total foraminiferal flux. Shell stable oxygen isotope (δ18O) and Mg/Ca data of Globigerinoides ruber sensu stricto (s.s.), G. ruber sensu lato (s.l.), Neogloboquadrina dutertrei, Pulleniatina obliquiloculata, and Globorotalia menardii in the sediment trap time series recorded surface and subsurface conditions. We infer habitats of 0-30 m for G. ruber at the mixed layer depth, 60-80 m (60-90 m) for P. obliquiloculata (N. dutertrei) at the upper thermocline depth, and 90-110 m (100-150 m) for G. menardii in the 355-500 µm (>500 µm) size fraction, corresponding to the (lower) thermocline depth in the study area. The shell Mg/Ca ratio of G. ruber (s.l. and s.s.) reveals an exponential relationship with temperature that agrees with published relationships, particularly the Anand et al. (2003) equations. Flux-weighted foraminiferal data from the sediment trap are consistent with average values in surface sediment samples off SW Indonesia. This consistency confirms the excellent potential of these proxies for reconstructing past environmental conditions in this part of the ocean realm.
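
For readers unfamiliar with Mg/Ca palaeothermometry, the exponential calibration has the general form Mg/Ca = B·exp(A·T). The constants below are the commonly cited Anand et al. (2003) multi-species values, and the sample ratios are illustrative rather than the trap data:

```python
import math

# General exponential calibration: Mg/Ca = B * exp(A * T).
# Anand et al. (2003) multi-species constants, as commonly cited.
A, B = 0.09, 0.38  # A in 1/degC, B in mmol/mol

def temperature_from_mgca(mgca_mmol_mol: float) -> float:
    """Invert the calibration: T = ln(Mg/Ca / B) / A."""
    return math.log(mgca_mmol_mol / B) / A

for mgca in (2.5, 3.5, 4.5):  # illustrative shell Mg/Ca ratios
    print(f"Mg/Ca = {mgca:.1f} mmol/mol -> T ~ {temperature_from_mgca(mgca):.1f} degC")
```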

Relevance:

30.00%

Publisher:

Abstract:

In an effort to achieve greater consistency and comparability in state-wide seat belt use reporting, the National Highway Traffic Safety Administration (NHTSA) issued new requirements in 2011 for observing and reporting future seat belt use. The requirements included the involvement of a qualified statistician in the sampling and weighting portions of the process as well as a variety of operational details. The Iowa Governor’s Traffic Safety Bureau contracted with Iowa State University’s Survey & Behavioral Research Services (SBRS) in 2011 to develop the study design and data collection plan for the State of Iowa annual survey that would meet the new requirements of the NHTSA. A seat belt survey plan for Iowa was developed by SBRS with statistical expertise provided by Zhengyuan Zhu, Ph.D., Associate Professor of Statistics at Iowa State University. The Iowa plan was submitted to NHTSA in December of 2011 and official approval was received on March 19, 2012.

Relevance:

30.00%

Publisher:

Abstract:

The protein lysate array is an emerging technology for quantifying the protein concentration ratios in multiple biological samples. It is gaining popularity, and has the potential to answer questions about post-translational modifications and protein pathway relationships. Statistical inference for a parametric quantification procedure has been inadequately addressed in the literature, mainly due to two challenges: the increasing dimension of the parameter space and the need to account for dependence in the data. Each chapter of this thesis addresses one of these issues. In Chapter 1, an introduction to protein lysate array quantification is presented, followed by the motivations and goals for this thesis work. In Chapter 2, we develop a multi-step procedure for sigmoidal models, ensuring consistent estimation of the concentration level with full asymptotic efficiency. The results obtained in this chapter justify inferential procedures based on large-sample approximations. Simulation studies and real data analysis are used to illustrate the performance of the proposed method in finite samples. The multi-step procedure is simpler in both theory and computation than the single-step least squares method that has been used in current practice. In Chapter 3, we introduce a new model to account for the dependence structure of the errors by a nonlinear mixed effects model. We consider a method to approximate the maximum likelihood estimator of all the parameters. Using simulation studies on various error structures, we show that for data with non-i.i.d. errors the proposed method leads to more accurate estimates and better confidence intervals than the existing single-step least squares method.
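
As a sketch of the sigmoidal dilution-response model that underlies such quantification, here is a generic four-parameter logistic fitted by ordinary least squares; this is a stand-in illustration, not the thesis's multi-step estimator or its dependence model:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, a, d, c, b):
    """Four-parameter logistic: signal as a function of dilution step."""
    return d + (a - d) / (1.0 + np.exp(b * (x - c)))

rng = np.random.default_rng(1)
x = np.repeat(np.arange(8), 3).astype(float)    # 8 dilution steps, 3 replicates
true = four_pl(x, a=0.1, d=2.0, c=4.0, b=-1.2)  # assumed "true" curve
y = true + rng.normal(scale=0.05, size=x.size)  # i.i.d. noise for simplicity

popt, _ = curve_fit(four_pl, x, y, p0=[0.0, 2.5, 3.0, -1.0])
print("fitted [a, d, c, b]:", np.round(popt, 2))
```

Chapter 3's point is precisely that the i.i.d. noise assumed here is unrealistic for lysate arrays, which motivates the nonlinear mixed effects extension.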

Relevance:

30.00%

Publisher:

Abstract:

The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand the biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address such challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework involves integration of the Bayesian network and the Gaussian mixture model. Based on the proposed framework and its conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarrays where biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions of the network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
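
A compact sketch of the PCA-plus-Gaussian-mixture portion of the pipeline described above (random data stands in for an expression matrix, and this is plain scikit-learn rather than the thesis's Bayesian-network-integrated model):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Random stand-in for a (samples x genes) expression matrix with two groups.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 20)),
    rng.normal(loc=2.0, scale=1.0, size=(100, 20)),
])

Z = PCA(n_components=3).fit_transform(X)           # reduce dimensionality
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z)
print("cluster sizes:", np.bincount(gmm.predict(Z)))
```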

Relevance:

30.00%

Publisher:

Abstract:

This paper explores the effect of using regional data for livestock attributes on the estimation of greenhouse gas (GHG) emissions for the northern beef industry in Australia, compared with using state/territory-wide values, as currently used in Australia’s national GHG inventory report. Regional GHG emissions associated with beef production are reported for 21 defined agricultural statistical regions within state/territory jurisdictions. A management scenario for reduced emissions that could qualify as an Emissions Reduction Fund (ERF) project was used to illustrate the effect of regional-level model parameters on estimated abatement levels. Using regional parameters, instead of state-level parameters, for liveweight (LW), LW gain and proportion of cows lactating, and an expanded number of livestock classes, gives a 5.2% reduction in estimated emissions (range +12% to –34% across regions). Estimated GHG emissions intensity (emissions per kilogram of LW sold) varied across the regions by up to 2.5-fold, ranging from 10.5 kg CO2-e kg–1 LW sold for the Darling Downs, Queensland, through to 25.8 kg CO2-e kg–1 LW sold for the Pindan and North Kimberley, Western Australia. This range was driven by differences in production efficiency, reproduction rate, growth rate and survival. This suggests that some regions in northern Australia are likely to have substantial opportunities for GHG abatement and higher livestock income. However, this must be coupled with the availability of management activities that can be implemented to improve production efficiency; wet season phosphorus (P) supplementation is one such practice. An ERF case study comparison showed that P supplementation of a typical-sized herd produced an estimated reduction of 622 t CO2-e year–1, or 7%, compared with a non-P-supplemented herd. However, the different model parameters used by the National Inventory Report and the ERF project mean that there was an anomaly between the herd emissions for project cattle excised from the national accounts (13 479 t CO2-e year–1) and the baseline herd emissions estimated for the ERF project (8 896 t CO2-e year–1) before P supplementation was implemented. Regionalising livestock model parameters in both ERF projects and the national accounts offers the attraction of being able to more easily and accurately reflect emissions savings from this type of emissions reduction project in Australia’s national GHG accounts.
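
Emissions intensity as used above is a simple ratio, so a short worked sketch (with invented numbers, not the paper's regional estimates) makes the metric concrete:

```python
# Emissions intensity = herd GHG emissions / liveweight sold.
# Numbers are invented for illustration; see the paper for regional values.
herd_emissions_t_co2e = 13_000     # t CO2-e per year
liveweight_sold_kg = 1_000_000     # kg LW sold per year

intensity = herd_emissions_t_co2e * 1000 / liveweight_sold_kg  # t -> kg
print(f"{intensity:.1f} kg CO2-e per kg LW sold")
# 13.0 here, which would sit inside the 10.5-25.8 range reported above.
```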

Relevance:

30.00%

Publisher:

Abstract:

This dissertation research points out major challenges with current Knowledge Organization (KO) systems, such as subject gateways or web directories: (1) the current systems use traditional knowledge organization schemes based on controlled vocabularies, which are not well suited to web resources, and (2) information is organized by professionals rather than by users, so it does not reflect users’ current needs as they are intuitively and spontaneously expressed. In order to explore users’ needs, I examined social tags, which are user-generated uncontrolled vocabulary. As investment in professionally developed subject gateways and web directories diminishes (support for both BUBL and Intute, examined in this study, is being discontinued), understanding the characteristics of social tagging becomes even more critical. Several researchers have discussed social tagging behavior and its usefulness for classification or retrieval; however, further research is needed to investigate social tagging qualitatively and quantitatively in order to verify its quality and benefit. This research examined the indexing consistency of social tagging in comparison to professional indexing to assess the quality and efficacy of tagging. The data analysis was divided into three phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of tag attributes. Most indexing consistency studies have been conducted with a small number of professional indexers, have tended to exclude users, and have mainly focused on physical library collections. This dissertation research bridged these gaps by (1) extending the scope of resources to various web documents indexed by users and (2) employing the Information Retrieval (IR) Vector Space Model (VSM)-based indexing consistency method, since it is suitable for dealing with a large number of indexers. As a second phase, an analysis of tagging effectiveness, using tagging exhaustivity and tag specificity, was conducted to ameliorate the drawbacks of consistency analysis based only on quantitative measures of vocabulary matching. Finally, to investigate tagging patterns and behaviors, a content analysis of tag attributes was conducted based on the FRBR model. The findings revealed greater consistency across all subjects among taggers than among the two groups of professionals. Examination of the exhaustivity and specificity of social tags provided insights into particular characteristics of tagging behavior and its variation across subjects. To further investigate the quality of tags, a Latent Semantic Analysis (LSA) was conducted to determine to what extent tags are conceptually related to professionals’ keywords; tags of higher specificity tended to have higher semantic relatedness to professionals’ keywords, which leads to the conclusion that a term’s power as a differentiator is related to its semantic relatedness to documents. The findings on tag attributes identified important bibliographic attributes of tags beyond describing the subjects or topics of a document, and showed that tags have essential attributes matching those defined in FRBR.
Furthermore, in terms of specific subject areas, the findings newly identified that taggers exhibited different tagging behaviors, with distinctive features and tendencies for web documents characterized by heterogeneous digital media resources. These results lead to the conclusion that there should be an increased awareness of diverse user needs by subject in order to improve metadata in practical applications. This dissertation research is a first necessary step toward utilizing social tagging in digital information organization by verifying its quality and efficacy. It combined quantitative (statistical) and qualitative (content analysis using FRBR) approaches to the vocabulary analysis of tags, providing a more complete examination of tag quality. Through the detailed analysis of tag properties undertaken in this dissertation, we have a clearer understanding of the extent to which social tagging can be used to replace (and in some cases improve upon) professional indexing.
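
The VSM-based indexing consistency measure mentioned above can be sketched as the cosine similarity between two indexers' term vectors over a shared vocabulary (the terms and binary weights below are invented):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two term-weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vocabulary = ["python", "tutorial", "programming", "snake", "reference"]
# Invented binary term vectors for one document: a tagger vs. a professional.
tagger       = np.array([1, 1, 1, 0, 0], dtype=float)
professional = np.array([1, 0, 1, 0, 1], dtype=float)

print(f"indexing consistency (cosine) = {cosine(tagger, professional):.2f}")
```

Unlike exact vocabulary matching, this measure scales naturally to many indexers: consistency for a group can be summarised as the mean pairwise cosine.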

Relevance:

30.00%

Publisher:

Abstract:

A Similar Exposure Group (SEG) can be created through the evaluation of workers performing the same or similar tasks, the hazards they are exposed to, the frequency and duration of their exposures, the engineering controls available during their operations, the personal protective equipment used, and exposure data. For this report, samples from one facility that has collected nearly 40,000 samples of various types will be evaluated to determine whether the creation of a SEG can be supported. The data will be reviewed for consistency with collection methods and laboratory detection limits, and a subset of the samples may be selected based on this review. The data will also be statistically evaluated to determine whether they are sufficient to justify terminating sampling. IHDataAnalyst V1.27 will be used to assess the data; this program uses Bayesian analysis to assist in making determinations. The 95 percent confidence interval will be calculated and used in making decisions. This evaluation will be used to determine whether a SEG can be created for any of the workers and to determine the need for future sample collection. The data and evaluation presented in this report have been selected and evaluated specifically for the purposes of this project.
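
Occupational exposure data of this kind are conventionally modelled as lognormal. The sketch below shows the sort of 95% interval summary such a review relies on, using made-up concentrations and a standard frequentist t-interval; it is not IHDataAnalyst's actual Bayesian routine:

```python
import numpy as np
from scipy import stats

# Made-up airborne concentrations (mg/m^3) for one candidate SEG.
x = np.array([0.12, 0.30, 0.08, 0.22, 0.45, 0.15, 0.27, 0.19])
logs = np.log(x)

n = logs.size
mean, sd = logs.mean(), logs.std(ddof=1)

# 95% confidence interval for the geometric mean (t-interval on log scale).
half = stats.t.ppf(0.975, df=n - 1) * sd / np.sqrt(n)
gm_lo, gm_hi = np.exp(mean - half), np.exp(mean + half)
print(f"GM 95% CI: [{gm_lo:.3f}, {gm_hi:.3f}] mg/m^3")

# Point estimate of the 95th percentile exposure, often compared to the OEL.
p95 = np.exp(mean + 1.645 * sd)
print(f"estimated 95th percentile: {p95:.3f} mg/m^3")
```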

Relevance:

30.00%

Publisher:

Abstract:

In an effort to achieve greater consistency and comparability in state-wide seat belt use reporting, the National Highway Traffic Safety Administration (NHTSA) issued new requirements in 2011 for observing and reporting future seat belt use. The requirements included the involvement of a qualified statistician in the sampling and weighting portions of the process as well as a variety of operational details. The Iowa Governor’s Traffic Safety Bureau contracted with Iowa State University’s Survey & Behavioral Research Services (SBRS) in 2011 to develop the study design and data collection plan for the State of Iowa annual survey that would meet the new requirements of the NHTSA. A seat belt survey plan for Iowa was developed by SBRS with statistical expertise provided by Zhengyuan Zhu, Ph.D., Associate Professor of Statistics at Iowa State University and was approved by NHTSA on March 19, 2012.

Relevance:

30.00%

Publisher:

Abstract:

In an effort to achieve greater consistency and comparability in state-wide seat belt use reporting, the National Highway Traffic Safety Administration (NHTSA) issued new requirements in 2011 for observing and reporting future seat belt use. The requirements included the involvement of a qualified statistician in the sampling and weighting portions of the process as well as a variety of operational details. The Iowa Governor’s Traffic Safety Bureau contracted with Iowa State University’s Survey & Behavioral Research Services (SBRS) in 2011 to develop the study design and data collection plan for the State of Iowa annual survey that would meet the new requirements of the NHTSA. A seat belt survey plan for Iowa was developed by SBRS with statistical expertise provided by Zhengyuan Zhu, Ph.D., Associate Professor of Statistics at Iowa State University and was approved by NHTSA on March 19, 2012.

Relevance:

30.00%

Publisher:

Abstract:

In an effort to achieve greater consistency and comparability in state-wide seat belt use reporting, the National Highway Traffic Safety Administration (NHTSA) issued new requirements in 2011 for observing and reporting future seat belt use. The requirements included the involvement of a qualified statistician in the sampling and weighting portions of the process as well as a variety of operational details. The Iowa Governor’s Traffic Safety Bureau contracted with Iowa State University’s Survey & Behavioral Research Services (SBRS) in 2011 to develop the study design and data collection plan for the State of Iowa annual survey that would meet the new requirements of the NHTSA. A seat belt survey plan for Iowa was developed by SBRS with statistical expertise provided by Zhengyuan Zhu, Ph.D., Associate Professor of Statistics at Iowa State University and Director of the Center for Survey Statistics and Methodology. The plan was approved by NHTSA on March 19, 2012.