17 results for Healthcare Big Data Analytics

in DigitalCommons@The Texas Medical Center


Relevance:

100.00%

Publisher:

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and produces false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing, and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we took two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs, or three SNPs based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence interval or statistical test of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and those SNPs with good discriminant power are not necessarily causal markers for the disease.
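
As an illustration of the HMSS criterion named in this abstract, the sketch below computes the harmonic mean of sensitivity and specificity and contrasts it with plain accuracy on imbalanced data. It is not the authors' code: the function names and the toy labels (10 cases, 90 controls) are assumptions made only for this example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def hmss(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity (0.0 if both are zero)."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced data set: 10 cases followed by 90 controls.
y_true = [1] * 10 + [0] * 90

# A degenerate classifier that calls everyone a control.
always_control = [0] * 100

# A classifier that gets 8 of 10 cases and 72 of 90 controls right.
fair_pred = [1] * 8 + [0] * 2 + [1] * 18 + [0] * 72

print(accuracy(y_true, always_control), hmss(y_true, always_control))
print(accuracy(y_true, fair_pred), hmss(y_true, fair_pred))
```

In this toy setup the degenerate classifier has the higher accuracy but an HMSS of zero, because it detects no cases. That is the sense in which a criterion like HMSS can be applied to imbalanced data without resampling.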

Relevance:

30.00%

Publisher:

Abstract:

Similar to other health care processes, referrals are susceptible to breakdowns. These breakdowns in the referral process can lead to poor continuity of care, slow diagnostic processes, delays and repetition of tests, patient and provider dissatisfaction, and a loss of confidence in providers. These facts and the necessity for a deeper understanding of referrals in healthcare served as the motivation to conduct a comprehensive study of referrals. The research began with the real problem and need to understand referral communication as a means to improve patient care. Despite previous efforts to explain referrals and the dynamics and interrelations of the variables that influence referrals, there is no common, contemporary, and accepted definition of what a referral is in the health care context. The research agenda was guided by the need to explore referrals as an abstract concept by 1) developing a conceptual definition of referrals, 2) developing a model of referrals, and 3) proposing a comprehensive research framework. This dissertation has resulted in a standard conceptual definition of referrals and a model of referrals. In addition, a mixed-method framework to evaluate referrals was proposed, and finally a data-driven model was developed to predict whether a referral would be approved or denied by a specialty service. The three manuscripts included in this dissertation present the basis for studying and assessing referrals using a common framework that should allow an easier comparative research agenda to improve referrals, taking into account the context where referrals occur.

Relevance:

30.00%

Publisher:

Abstract:

The current state of health and biomedicine includes an enormity of heterogeneous data ‘silos’, collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, this process uses a reference terminology, and associated "maps" to transform heterogeneous data to a standard representation for comparability and secondary use. The capture of quality and precision of the “maps” between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, and this process can be applied and extended to other problems where heterogeneous data needs to be merged.

Relevance:

30.00%

Publisher:

Abstract:

Objective: Interruptions are known to have a negative impact on activity performance. Understanding how an interruption contributes to human error is limited because there is not a standard method for analyzing and classifying interruptions. Qualitative data are typically analyzed by either a deductive or an inductive method. Both methods have limitations. In this paper a hybrid method was developed that integrates deductive and inductive methods for the categorization of activities and interruptions recorded during an ethnographic study of physicians and registered nurses in a Level One Trauma Center. Understanding the effects of interruptions is important for designing and evaluating informatics tools in particular and for improving healthcare quality and patient safety in general. Method: The hybrid method was developed using a deductive a priori classification framework with the provision of adding new categories discovered inductively in the data. The inductive process utilized line-by-line coding and constant comparison as stated in Grounded Theory. Results: The categories of activities and interruptions were organized into a three-tiered hierarchy of activity. Validity and reliability of the categories were tested by categorizing a medical error case external to the study. No new categories of interruptions were identified during analysis of the medical error case. Conclusions: Findings from this study provide evidence that the hybrid model of categorization is more complete than either a deductive or an inductive method alone. The hybrid method developed in this study provides the methodical support for understanding, analyzing, and managing interruptions and workflow.

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVE: Interruptions are known to have a negative impact on activity performance. Understanding how an interruption contributes to human error is limited because there is not a standard method for analyzing and classifying interruptions. Qualitative data are typically analyzed by either a deductive or an inductive method. Both methods have limitations. In this paper, a hybrid method was developed that integrates deductive and inductive methods for the categorization of activities and interruptions recorded during an ethnographic study of physicians and registered nurses in a Level One Trauma Center. Understanding the effects of interruptions is important for designing and evaluating informatics tools in particular as well as improving healthcare quality and patient safety in general. METHOD: The hybrid method was developed using a deductive a priori classification framework with the provision of adding new categories discovered inductively in the data. The inductive process utilized line-by-line coding and constant comparison as stated in Grounded Theory. RESULTS: The categories of activities and interruptions were organized into a three-tiered hierarchy of activity. Validity and reliability of the categories were tested by categorizing a medical error case external to the study. No new categories of interruptions were identified during analysis of the medical error case. CONCLUSIONS: Findings from this study provide evidence that the hybrid model of categorization is more complete than either a deductive or an inductive method alone. The hybrid method developed in this study provides the methodical support for understanding, analyzing, and managing interruptions and workflow.

Relevance:

30.00%

Publisher:

Abstract:

People often use tools to search for information. In order to improve the quality of an information search, it is important to understand how internal information, which is stored in the user's mind, and external information, represented by the interface of tools, interact with each other. How information is distributed between internal and external representations significantly affects information search performance. However, few studies have examined the relationship between types of interface and types of search task in the context of information search. For a distributed information search task, how data are distributed, represented, and formatted significantly affects the user's search performance in terms of response time and accuracy. Guided by UFuRT (User, Function, Representation, Task), a human-centered process, I propose a search model and a task taxonomy. The model defines its relationship with other existing information models. The taxonomy clarifies the legitimate operations for each type of search task of relational data. Based on the model and taxonomy, I have also developed prototypes of interfaces for the search tasks of relational data. These prototypes were used for experiments. The experiments described in this study are of a within-subject design with a sample of 24 participants recruited from the graduate schools located in the Texas Medical Center. Participants performed one-dimensional nominal search tasks over nominal, ordinal, and ratio displays, and performed one-dimensional nominal, ordinal, interval, and ratio search tasks over table and graph displays. Participants also performed the same task and display combinations for two-dimensional searches. Distributed cognition theory has been adopted as a theoretical framework for analyzing and predicting the search performance of relational data. It has been shown that the representation dimensions and data scales, as well as the search task types, are the main factors in determining search efficiency and effectiveness.
In particular, the more external representations used, the better search task performance, and the results suggest the ideal search performance occurs when the question type and corresponding data scale representation match. The implications of the study lie in contributing to the effective design of search interface for relational data, especially laboratory results, which are often used in healthcare activities.

Relevance:

30.00%

Publisher:

Abstract:

Objective. The purpose of this study was to determine the relationship between ethnicity and skin cancer risk perception while controlling for other risk factors: education, gender, age, access to healthcare, family history of skin cancer, fear, and worry.
Methods. This study utilized the Health Information National Trends Survey (HINTS) dataset, a nationally representative sample of 5,586 individuals 18 years of age or older. One third of the respondents were chosen at random and asked questions involving skin cancer. Analysis was based on questions that identified skin cancer risk perception, fear of finding skin cancer, and frequency of worry about skin cancer and a variety of sociodemographic factors.
Results. Ethnicity had a significant impact on risk perception scores while controlling for other risk factors. Other risk factors that also had a significant impact on risk perception scores included family history of skin cancer, age, and worry.

Relevance:

30.00%

Publisher:

Abstract:

Between the 1990 and 2000 Censuses, the Latino population accounted for 40% of the increase in the nation’s total population. The growing population of Latinos underscores the importance of understanding factors that influence whether and how Latinos take care of their health. According to the U.S. Department of Health and Human Services’ Office of Minority Health (OMH), Latinos are at greater risk for health disparities (2003). Factors such as lack of health insurance and access to preventive care play a major role in limiting Latino use of primary health care (Institute of Medicine, 2005). Other significant barriers to preventive health care maintenance behaviors have been identified in current literature, such as primary care physician interaction, self-perceived health status, and socio-cultural beliefs and traditions (Rojas-Guyler, King, and Montieth, 2008; Meir, Medina, and Ory, 2007; Black, 1999). Despite these studies, there remains less information regarding interpersonal perceptions, environmental dynamics, and individual and cultural attitudes relevant to utilization of healthcare (Rojas-Guyler, King, and Montieth, 2008; Aguirre-Molina, Molina, and Zambrana, 2001). Understanding the perceptions of Latinos and the barriers to health care could directly affect healthcare delivery. Improved healthcare utilization among Latinos could reduce the long-term health consequences of many preventable and manageable diseases. The purpose of this study was to explore Latino perceptions of U.S. health care and the changes Latinos desire in the U.S. healthcare system. The study had several objectives: to explore perceived barriers to healthcare utilization and the resulting effects on health among Latinos, to describe culturally influenced attitudes about health care and use of health care services among Latinos, and to make recommendations for reducing disparities by improving healthcare and its utilization.
The current study utilized data that were collected as part of a larger study to examine multidimensional, cross-cultural issues relevant to interactions between healthcare consumers and providers. Qualitative methods were used to analyze four Spanish-language focus group transcripts to interpret cultural influences on perceptions and beliefs among Latinos. Direct coding of transcript content was carried out by two reviewers, who conducted independent reviews of each transcript. Team members developed and refined thematic categories, positive and negative cases, and example text segments for each theme and sub-theme. Incongruities of interpretations were resolved through extensive discussion. Study participants included 44 self-identified Latino adults (16 male, 28 female) between age 18 and 64 years. Thirty seven (84.1%) of the participants were immigrants. The study population comprised eight ethnic subgroups. While 31% of the participants reported being employed on a full-time basis, only 18.4% had medical insurance that was private or employee sponsored. Five major themes regarding the perceptions and healthcare utilization behaviors of Latinos were consistent across all focus groups and were identified during the analysis. These were: (1) healthcare utilization, experience, and access; (2) organizational and institutional systems; (3) communication and interpersonal interactions between healthcare provider, staff, and patient; (4) Latinos’ perception of their own health status; (5) cultural influences on healthcare utilization, which included an innovation termed culturally-bound locus of control. Healthcare utilization was directly influenced by healthcare experience, access, current health status, and cultural factors and indirectly influenced by organizational systems. There was a strong interdependence among the main themes. 
The ability to communicate and interact effectively with healthcare providers and navigate healthcare systems (organizational and institutional access) significantly influenced the participant’s health care experience, most often (indirectly) impacting utilization negatively.
Research such as this can help to identify those perceptions and attitudes held by Latinos concerning utilization or underutilization of healthcare systems. These data suggest that for healthcare utilization to improve among Latinos, healthcare systems must create more culturally competent environments by providing better language services at the organizational level and more culturally sensitive providers at the interpersonal level. Better understanding of the complex interactions between these impediments can aid intervention development and help health providers and researchers determine appropriate, adequate, and effective measures of care to improve the overall health of Latinos.

Relevance:

30.00%

Publisher:

Abstract:

Objective. To determine the association between nativity status and mammography utilization among women in the U.S. and assess whether demographic variables, socioeconomic factors, healthcare access, breast cancer risk factors, and acculturation variables were predictors in the relationship between nativity status and mammography in the past two years.
Methods. The NHIS collects demographic and health information using face-to-face interviews among a representative sample of the U.S. population, and a cancer control module assessing screening behaviors is included every five years. Descriptive statistics were used to report demographic characteristics of women aged 40 and older who have received a mammogram in the last 2 years from 2000 and 2005. We used chi-square analyses to determine statistically significant differences by mammography screening for each covariate. Logistic regression was used to determine whether demographic characteristics, socioeconomic characteristics, healthcare access, breast cancer risk factors, and acculturation variables among foreign-born Hispanics affected the relationship between nativity status and mammography use in the past 2 years.
Results. In 2000, the crude model between nativity and mammography was significant, but results were not significant after adjusting for health insurance, access, and reported health status. Significant results were also reported for years in the U.S. and mammography among foreign-born women. In 2005, the crude model was also significant, but results were not significant after adjusting for demographic factors. Furthermore, there was a significant finding between citizenship and mammography in the past 2 years.
Conclusions. Our study contributes to the literature as one of the first national-based studies assessing mammography in the past two years based on nativity status.
Based on our findings, health insurance and access to care are important predictors of mammography utilization among foreign-born women. For those with health care access, physician recommendation should be further assessed to determine whether women are made aware of mammography as a means to detect breast cancer at an early stage and further reduce the risk of mortality from breast cancer.

Relevance:

30.00%

Publisher:

Abstract:

Increasing attention has been given to the problem of medical errors over the past decade. Included within that focused attention has been a strong interest in reducing the occurrence of healthcare-associated infections (HAIs). Acting concurrently with federal initiatives, the majority of U.S. states have statutorily required reporting and public disclosure of HAI data. Although these state statutory enactments and other state initiatives represent a recognition of the strong concern pertaining to HAIs, vast differences in each state’s HAI reporting and public disclosure requirements create a varied and unequal response to what has become a national problem.
The purpose of this research was to explore the variations in state HAI legal requirements and other state mandates. State actions, including statutory enactments, regulations, and other initiatives related to state reporting and public disclosure mechanisms, were compared, discussed, and analyzed in an effort to illustrate the impact of the lack of uniformity as a public health concern.
The HAI statutes, administrative requirements, and other mandates of each state and two U.S. territories were reviewed to answer the following seven research questions: How far has the state progressed in its HAI initiative? If the state has a HAI reporting requirement, is it mandatory or voluntary? What healthcare entities are subject to the reporting requirements? What data collection system is utilized? What measures are required to be reported? What is the public disclosure mechanism? How is the underlying reported information protected from public disclosure or other legal release?
Secondary publicly available data, including state statutes, administrative rules, and other initiatives, were utilized to examine the current HAI-related legislative and administrative activity of the study subjects.
The information was reviewed and analyzed to determine variations in HAI reporting and public disclosure laws. Particular attention was given to the seven key research questions.
The research revealed that considerable progress has been achieved in state HAI initiatives since 2004. Despite this progress, however, when the state laws and HAI programs were reviewed comparatively, considerable variations were found with regard to the type of reporting requirements, healthcare facilities subject to the reporting laws, data collection systems utilized, reportable measures, public disclosure requirements, and confidentiality and privilege provisions. The wide variations in state statutes, administrative rules, and other agency directives create a fragmented and inconsistent approach to addressing the nationwide occurrence of HAIs in the U.S. healthcare system.

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this culminating experience was to investigate the relationships between healthcare utilization, insurance coverage, and socioeconomic characteristics of children with asthma along the Texas-Mexico Border. A secondary data analysis was conducted on cross-sectional data from the Texas Child Asthma Call-back Survey, a follow-up survey to the random-digit-dialed Behavioral Risk Factor Surveillance System (BRFSS), conducted from 2006 to 2009 (n = 556 adults living in households with a child with asthma).
The proportion of Hispanic children with asthma in Border areas of Texas was more than twice that of non-Border areas (84.8% vs. 28.8%). Parents in Border areas were less likely to have their own health insurance (OR = 0.251, 95% C.I. = 0.117-0.540) and less likely to complete the survey in English than in Spanish (OR = 0.251, 95% C.I. = 0.117-0.540) than parents in non-Border areas. No significant socio-economic or health care utilization differences were noted between Hispanic children living in Border areas and Hispanic children living in non-Border areas. Children with asthma along the Texas-Mexico Border, regardless of ethnicity and language, have insurance coverage rates, reported cost barriers to care, symptom management, and medication usage patterns similar to those in non-Border areas. When compared to English-speakers, Spanish-speaking parents in Texas as a whole are far less likely to be taught what to do during an asthma attack (50.2% vs. 78.6%).
Language preference, rather than ethnicity or geographical residence, played a larger role in childhood asthma-related health disparities for children in Texas. Spanish-speaking parents are less likely to receive adequate asthma self-management education. Investigating the effects of Hispanic acculturation rates and incongruent parent-child health insurance coverage may provide better insight into the health disparities of children along the Texas-Mexico Border.

Relevance:

30.00%

Publisher:

Abstract:

Stakeholder groups with special interests as donors to finance congressional campaigns have been a controversial issue in the United States. While previous studies concentrated on whether a connection existed between the campaign contributions provided by stakeholder groups and the voting behavior of congressional members, there is little evidence to show the trend of allocation of their campaign contributions to their favorite candidates during the elections. This issue has become increasingly important in the health sector since the health care reform bill was passed in early 2010.
This study examined the long-term trend of campaign contributions offered by various top healthcare stakeholder groups to particular political parties (i.e., Democratic and Republican). The main focus of this paper was to observe and describe the financial donations provided by these healthcare stakeholder groups in the congressional election cycles from 1990 to 2008 in order to obtain an overview of their patterns of campaign contributions. Their contributing behaviors were characterized based on the campaign finance data collected by the Center for Responsive Politics (CRP). Specifically, I answered the questions: (1) to which political party did specific healthcare stakeholder groups give money and (2) what was the pattern of their campaign contributions from 1990 to 2008?
The findings of my study revealed that the healthcare stakeholder groups had different political party preferences and partisanship orientations regarding the Democratic or Republican Party. These differences were obvious throughout the election cycles from 1990 to 2008, and their distinct patterns of financial contribution were evident across industries in the health sector as well. Among all the healthcare stakeholder groups in this study, physicians were the top contributors in the congressional election.
The pharmaceutical industry was the only group where the majority of contribution funds were allocated to Republicans in every election period studied. This study found that no interest group has succeeded in electing the preferred congressional candidate by giving the majority of its financial support to the winning party in every election. Chiropractors, hospitals/nursing homes, and health services/HMOs performed better than other healthcare stakeholder groups by supporting the electoral winner in 8 out of 9 election cycles.

Relevance:

30.00%

Publisher:

Abstract:

Background: The Sacred Vocation Program (SVP) (Amick B, Karff S., 2003) helps workers find meaning and spirituality and see their job as a sacred vocation. The SVP is based on Participatory Action Research (PAR) (Minkler & Wallerstein, 1997; Parker & Wall, 1998). This study aims to evaluate the SVP implemented at the Baylor Healthcare System, Dallas-Fort Worth.
Methods: The study uses a qualitative design. We used data from study participants who took part in focus groups. During these focus groups, specific questions and probes regarding the effectiveness of the SVP were asked. We analyzed the focus groups and derived themes.
Results: Results of this study demonstrate that the SVP helps graduates feel valued and important. The SVP has improved meaningful work for employees and improved a sense of belonging for participants. The program has also increased participant spirituality. The coping techniques developed during an SVP class help participants deal with stressful situations. The SVP faces challenges of implementation fidelity, poor communication, program viability in tough economic times, and implementation of phase II. Another sustainability challenge for the SVP is the perception of the program as a religious one rather than a spiritual one.
Conclusion: Several aspects of the SVP work. Phase I of the SVP is successful in improving meaningful work and a sense of belonging for participants. The coping techniques help participants deal with difficult work situations. The SVP can increase effectiveness through improvements in implementation fidelity, communication, and leadership commitment.

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVE. To determine the effectiveness of active surveillance cultures and associated infection control practices on the incidence of methicillin-resistant Staphylococcus aureus (MRSA) in the acute care setting. DESIGN. A historical analysis of existing clinical data utilizing an interrupted time series design. SETTING AND PARTICIPANTS. Patients admitted to a 260-bed tertiary care facility in Houston, TX, between January 2005 and December 2010. INTERVENTION. Infection control practices, including enhanced barrier precautions, compulsive hand hygiene, disinfection and environmental cleaning, and executive ownership and education, were simultaneously introduced during a 5-month intervention implementation period culminating with the implementation of active surveillance screening. Beginning June 2007, all high-risk patients were cultured for MRSA nasal carriage within 48 hours of admission. Segmented Poisson regression was used to test the significance of the difference in incidence of healthcare-associated MRSA during the 29-month pre-intervention period compared to the 43-month post-intervention period. RESULTS. A total of 9,957 of 11,095 high-risk patients (89.7%) were screened for MRSA carriage during the intervention period. Active surveillance cultures identified 1,330 MRSA-positive patients (13.4%), contributing to an admission prevalence of 17.5% in high-risk patients. The mean rate of healthcare-associated MRSA infection and colonization decreased from 1.1 per 1,000 patient-days in the pre-intervention period to 0.36 per 1,000 patient-days in the post-intervention period (P<0.001). The intervention and the percentage of S. aureus isolates susceptible to oxacillin were each statistically significantly associated with the incidence of MRSA infection and colonization (IRR = 0.50, 95% CI = 0.31-0.80 and IRR = 0.004, 95% CI = 0.00003-0.40, respectively). CONCLUSIONS. Aggressively targeting patients at high risk for MRSA colonization with active surveillance cultures and associated infection control practices, as part of a multifaceted, hospital-wide intervention, is effective in reducing the incidence of healthcare-associated MRSA.
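As an illustration of the segmented Poisson regression named in the abstract, the following is a minimal sketch and not the study's actual model or data. It assumes a common interrupted time series parameterization (intercept, baseline trend, level change at the intervention, and post-intervention trend change) with patient-days as an offset. The monthly patient-day count, the simulated event counts, and the function names `design_matrix` and `fit_poisson_irls` are hypothetical; only the 29-month and 43-month segment lengths come from the abstract.

```python
import numpy as np

def design_matrix(n_pre, n_post):
    """Columns: intercept, month index, post-intervention indicator,
    months elapsed since the intervention (0 before it)."""
    n = n_pre + n_post
    t = np.arange(n, dtype=float)
    post = (t >= n_pre).astype(float)
    t_since = np.where(post == 1.0, t - n_pre, 0.0)
    return np.column_stack([np.ones(n), t, post, t_since])

def fit_poisson_irls(X, y, offset, n_iter=100, tol=1e-10):
    """Log-link Poisson regression fitted by iteratively reweighted
    least squares. Assumes the first column of X is the intercept."""
    beta = np.zeros(X.shape[1])
    # Start the intercept at the log of the overall event rate.
    beta[0] = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response
        XtW = X.T * mu                       # Poisson weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data: 29 pre- and 43 post-intervention months,
# with an assumed 6,000 patient-days per month.
X = design_matrix(29, 43)
offset = np.log(np.full(X.shape[0], 6000.0))
assumed_beta = np.array([np.log(1.1 / 1000), 0.0, np.log(0.5), 0.0])
rng = np.random.default_rng(0)
counts = rng.poisson(np.exp(X @ assumed_beta + offset)).astype(float)

beta_hat = fit_poisson_irls(X, counts, offset)
# exp of the level-change coefficient is the incidence rate ratio (IRR).
print("Estimated level-change IRR:", np.exp(beta_hat[2]))
```

In this parameterization, the exponentiated coefficient on the post-intervention indicator is read as the incidence rate ratio for the immediate level change. A real analysis would also report confidence intervals and check for overdispersion and autocorrelation, which this sketch omits.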

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Clinical Research Data Quality Literature Review and Pooled Analysis

We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to their data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 fields to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70–5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had error rates comparable to those of data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to those of double-entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.

Defining Data Quality for Clinical Research: A Concept Analysis

Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.

Medical Record Abstractor’s Perceptions of Factors Impacting the Accuracy of Abstracted Data

Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractors’ perceptions of these factors. The Delphi processes identified 9 factors that were not found in the literature, and differed from the literature on 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into, and indicate the content validity of, a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and those of registry and quality improvement abstractors.

Distributed Cognition Artifacts on Clinical Research Data Collection Forms

Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction was high in 61% of the sampled data elements, and exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.