931 resultados para Topological data analysis
Resumo:
Exposimeters are increasingly applied in bioelectromagnetic research to determine personal radiofrequency electromagnetic field (RF-EMF) exposure. The main advantages of exposimeter measurements are their convenient handling for study participants and the large amount of personal exposure data, which can be obtained for several RF-EMF sources. However, the large proportion of measurements below the detection limit is a challenge for data analysis. With the robust ROS (regression on order statistics) method, summary statistics can be calculated by fitting an assumed distribution to the observed data. We used a preliminary sample of 109 weekly exposimeter measurements from the QUALIFEX study to compare summary statistics computed by robust ROS with a naïve approach, where values below the detection limit were replaced by the value of the detection limit. For the total RF-EMF exposure, differences between the naïve approach and the robust ROS were moderate for the 90th percentile and the arithmetic mean. However, exposure contributions from minor RF-EMF sources were considerably overestimated with the naïve approach. This results in an underestimation of the exposure range in the population, which may bias the evaluation of potential exposure-response associations. We conclude from our analyses that summary statistics of exposimeter data calculated by robust ROS are more reliable and more informative than estimates based on a naïve approach. Nevertheless, estimates of source-specific medians or even lower percentiles depend on the assumed data distribution and should be considered with caution.
Resumo:
Turrialba is one of the largest and most active stratovolcanoes in the Central Cordillera of Costa Rica and an excellent target for validation of satellite data using ground based measurements due to its high elevation, relative ease of access, and persistent elevated SO2 degassing. The Ozone Monitoring Instrument (OMI) aboard the Aura satellite makes daily global observations of atmospheric trace gases and it is used in this investigation to obtain volcanic SO2 retrievals in the Turrialba volcanic plume. We present and evaluate the relative accuracy of two OMI SO2 data analysis procedures, the automatic Band Residual Index (BRI) technique and the manual Normalized Cloud-mass (NCM) method. We find a linear correlation and good quantitative agreement between SO2 burdens derived from the BRI and NCM techniques, with an improved correlation when wet season data are excluded. We also present the first comparisons between volcanic SO2 emission rates obtained from ground-based mini-DOAS measurements at Turrialba and three new OMI SO2 data analysis techniques: the MODIS smoke estimation, OMI SO2 lifetime, and OMI SO2 transect techniques. A robust validation of OMI SO2 retrievals was made, with both qualitative and quantitative agreements under specific atmospheric conditions, proving the utility of satellite measurements for estimating accurate SO2 emission rates and monitoring passively degassing volcanoes.
DIMENSION REDUCTION FOR POWER SYSTEM MODELING USING PCA METHODS CONSIDERING INCOMPLETE DATA READINGS
Resumo:
Principal Component Analysis (PCA) is a popular method for dimension reduction that can be used in many fields including data compression, image processing, exploratory data analysis, etc. However, traditional PCA method has several drawbacks, since the traditional PCA method is not efficient for dealing with high dimensional data and cannot be effectively applied to compute accurate enough principal components when handling relatively large portion of missing data. In this report, we propose to use EM-PCA method for dimension reduction of power system measurement with missing data, and provide a comparative study of traditional PCA and EM-PCA methods. Our extensive experimental results show that EM-PCA method is more effective and more accurate for dimension reduction of power system measurement data than traditional PCA method when dealing with large portion of missing data set.
Resumo:
Dr. Rossi discusses the common errors that are made when fitting statistical models to data. Focuses on the planning, data analysis, and interpretation phases of a statistical analysis, and highlights the errors that are commonly made by researchers of these phases. The implications of these commonly made errors are discussed along with a discussion of the methods that can be used to prevent these errors from occurring. A prescription for carrying out a correct statistical analysis will be discussed.
Resumo:
Cluster randomized trials (CRTs) use as the unit of randomization clusters, which are usually defined as a collection of individuals sharing some common characteristics. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Additionally, several measurements (repeated measurements) taken on the same individual at different time points are also considered to be clusters. In dentistry, CRTs are applicable as patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, due to similarity of the observations within clusters, each individual within a cluster provides less information compared with an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes compared with non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. It is the purpose of this article to highlight with relevant examples the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
Resumo:
This paper presents an overview of the Mobile Data Challenge (MDC), a large-scale research initiative aimed at generating innovations around smartphone-based research, as well as community-based evaluation of mobile data analysis methodologies. First, we review the Lausanne Data Collection Campaign (LDCC), an initiative to collect unique longitudinal smartphone dataset for the MDC. Then, we introduce the Open and Dedicated Tracks of the MDC, describe the specific datasets used in each of them, discuss the key design and implementation aspects introduced in order to generate privacy-preserving and scientifically relevant mobile data resources for wider use by the research community, and summarize the main research trends found among the 100+ challenge submissions. We finalize by discussing the main lessons learned from the participation of several hundred researchers worldwide in the MDC Tracks.
Resumo:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Resumo:
OBJECTIVES To identify the timing of significant arch dimensional increases during orthodontic alignment involving round and rectangular nickel-titanium (NiTi) wires and rectangular stainless steel (SS). A secondary aim was to compare the timing of changes occurring with conventional and self-ligating fixed appliance systems. METHODS In this non-primary publication, additional data from a multicenter randomised trial initially involving 96 patients, aged 16 years and above, were analysed. The main pre-specified outcome measures were the magnitude and timing of maxillary intercanine, interpremolar, and intermolar dimensions. Each participant underwent alignment with a standard Damon (Ormco, Orange, CA) wire sequence for a minimum of 34 weeks. Blinding of clinicians and patients was not possible; however, outcome assessors and data analysts were kept blind to the appliance type during data analysis. RESULTS Complete data were obtained from 71 subjects. Significant arch dimensional changes were observed relatively early in treatment. In particular, changes in maxillary inter-first and second premolar dimensions occurred after alignment with an 0.014in. NiTi wire (P<0.05). No statistical differences in transverse dimensions were found between rectangular NiTi and working SS wires for each transverse dimension (P>0.05). Bracket type had no significant effect on the timing of the transverse dimensional changes. CONCLUSIONS Arch dimensional changes were found to occur relatively early in treatment, irrespective of the appliance type. Nickel-titanium wires may have a more profound effect on transverse dimensions than previously believed. CLINICAL SIGNIFICANCE On the basis of this research orthodontic expansion may occur relatively early in treatment. Nickel-titanium wires may have a more profound effect on transverse dimensions than previously believed.
Resumo:
PRINCIPALS Over a million people worldwide die each year from road traffic injuries and more than 10 million sustain permanent disabilities. Many of these victims are pedestrians. The present retrospective study analyzes the severity and mortality of injuries suffered by adult pedestrians, depending on whether they used a zebra crosswalk. METHODS Our retrospective data analysis covered adult patients admitted to our emergency department (ED) between 1 January 2000 and 31 December 2012 after being hit by a vehicle while crossing the road as a pedestrian. Patients were identified by using a string term. Medical, police and ambulance records were reviewed for data extraction. RESULTS A total of 347 patients were eligible for study inclusion. Two hundred and three (203; 58.5%) patients were on a zebra crosswalk and 144 (41.5%) were not. The mean ISS (injury Severity Score) was 12.1 (SD 14.7, range 1-75). The vehicles were faster in non-zebra crosswalk accidents (47.7 km/n, versus 41.4 km/h, p<0.027). The mean ISS score was higher in patients with non-zebra crosswalk accidents; 14.4 (SD 16.5, range 1-75) versus 10.5 (SD13.14, range 1-75) (p<0.019). Zebra crosswalk accidents were associated with less risk of severe injury (OR 0.61, 95% CI 0.38-0.98, p<0.042). Accidents involving a truck were associated with increased risk of severe injury (OR 3.53, 95%CI 1.21-10.26, p<0.02). CONCLUSION Accidents on zebra crosswalks are more common than those not on zebra crosswalks. The injury severity of non-zebra crosswalk accidents is significantly higher than in patients with zebra crosswalk accidents. Accidents involving large vehicles are associated with increased risk of severe injury. Further prospective studies are needed, with detailed assessment of motor vehicle types and speed.
Resumo:
Noble gas analysis in early solar system materials, which can provide valuable information about early solar system processes and timescales, are very challenging because of extremely low noble gas concentrations (ppt). We therefore developed a new compact sized (33 cm length, 7.2cm diameter, 1.3 L internal volume) Time-of-Flight (TOF) noble gas mass spectrometer for high sensitivity. We call it as Edel Gas Time-of-flight (EGT) mass spectrometer. The instrument uses electron impact ionization coupled to an ion trap, which allows us to ionize and measure all noble gas isotopes. Using a reflectron set-up improves the mass resolution. In addition, the reflectron set-up also enables some extra focusing. The detection is via MCPs and the signals are processed either via ADC or TDC systems. The objective of this work is to understand the newly developed Time-Of-Flight (TOF) mass spectrometer for noble gas analysis in presolar grains of the meteorites. Chapter 1 briefly introduces the basic idea and importance of the instrument. The physics relevant to time-of-flight mass spectrometry technique is discussed in the Chapter 2 and Chapter 3 will present the oxidation technique of nanodiamonds of the presolar grains by using copper oxide. Chapter 4 will present the details about EGT data analysis software. Chapter 5 and Chapter 6 will explain the details about EGT design and operation. Finally, the performance results will be presented and discussed in the Chapter 7, and whole work is summarized in Chapter 8 and also outlook of the future work is given.
Resumo:
The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) can not be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. ^ This dissertation research proposes methods to analyze longitudinal data (1) that have categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered for empirical studies using methods that include EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated by the Schizophrenia example. The relevance of public health, the strength and limitations, and possible future research were also discussed. ^
Resumo:
Introduction. The HIV/AIDS disease burden disproportionately affects minority populations, specifically African Americans. While sexual risk behaviors play a role in the observed HIV burden, other factors including gender, age, socioeconomics, and barriers to healthcare access may also be contributory. The goal of this study was to determine how far down the HIV/AIDS disease process people of different ethnicities first present for healthcare. The study specifically analyzed the differences in CD4 cell counts at the initial HIV-1 diagnosis with respect to ethnicity. The study also analyzed racial differences in HIV/AIDS risk factors. ^ Methods. This is a retrospective study using data from the Adult Spectrum of HIV Disease (ASD), collected by the City of Houston Department of Health. The ASD database contains information on newly reported HIV cases in the Harris County District Hospitals between 1989 and 2000. Each patient had an initial and a follow-up report. The extracted variables of interest from the ASD data set were CD4 counts at the initial HIV diagnosis, race, gender, age at HIV diagnosis and behavioral risk factors. One-way ANOVA was used to examine differences in baseline CD4 counts at HIV diagnosis between racial/ethnic groups. Chi square was used to analyze racial differences in risk factors. ^ Results. The analyzed study sample was 4767. The study population was 47% Black, 37% White and 16% Hispanic [p<0.05]. The mean and median CD4 counts at diagnosis were 254 and 193 cells per ml, respectively. At the initial HIV diagnosis Blacks had the highest average CD4 counts (285), followed by Whites (233) and Hispanics (212) [p<0.001 ]. These statistical differences, however, were only observed with CD4 counts above 350 [p<0.001], even when adjusted for age at diagnosis and gender [p<0.05]. Looking at risk factors, Blacks were mostly affected by intravenous drug use (IVDU) and heterosexuality, whereas Whites and Hispanics were more affected by male homosexuality [ p<0.05]. ^ Conclusion. (1) There were statistical differences in CD4 counts with respect to ethnicity, but these differences only existed for CD4 counts above 350. These differences however do not appear to have clinical significance. Antithetically, Blacks had the highest CD4 counts followed by Whites and Hispanics. (2) 50% of this study group clinically had AIDS at their initial HIV diagnosis (median=193), irrespective of ethnicity. It was not clear from data analysis if these observations were due to failure of early HIV surveillance, HIV testing policies or healthcare access. More studies need to be done to address this question. (3) Homosexuality and bisexuality were the biggest risk factors for Whites and Hispanics, whereas for Blacks were mostly affected by heterosexuality and IVDU, implying a need for different public health intervention strategies for these racial groups. ^
Resumo:
The purpose of this comparative analysis of CHIP Perinatal policy (42 CFR § 457) was to provide a basis for understanding the variation in policy outputs across the twelve states that, as of June 2007, implemented the Unborn Child rule. This Department of Health and Human Services regulation expanded in 2002 the definition of “child” to include the period from conception to birth, allowing states to consider an unborn child a “targeted low-income child” and therefore eligible for SCHIP coverage. ^ Specific study aims were to (1) describe typologically the structural and contextual features of the twelve states that adopted a CHIP Perinatal policy; (2) describe and differentiate among the various designs of CHIP Perinatal policy implemented in the states; and (3) develop a conceptual model that links the structural and contextual features of the adopting states to differences in the forms the policy assumed, once it was implemented. ^ Secondary data were collected from publicly available information sources to describe characteristics of states’ political system, health system, economic system, sociodemographic context and implemented policy attributes. I posited that socio-demographic differences, political system differences and health system differences would directly account for the observed differences in policy output among the states. ^ Exploratory data analysis techniques, which included median polishing and multidimensional scaling, were employed to identify compelling patterns in the data. Scaled results across model components showed that economic system was most closely related to policy output, followed by health system. Political system and socio-demographic characteristics were shown to be weakly associated with policy output. Goodness-of-fit measures for MDS solutions implemented across states and model components, in one- and two-dimensions, were very good. ^ This comparative policy analysis of twelve states that adopted and implemented HHS Regulation 42 C.F.R. § 457 contributes to existing knowledge in three areas: CHIP Perinatal policy, public health policy and policy sciences. First, the framework allows for the identification of CHIP Perinatal program design possibilities and provides a basis for future studies that evaluate policy impact or performance. Second, studies of policy determinants are not well represented in the health policy literature. Thus, this study contributes to the development of the literature in public health policy. Finally, the conceptual framework for policy determinants developed in this study suggests new ways for policy makers and practitioners to frame policy arguments, encouraging policy change or reform. ^
Resumo:
As schools are pressured to perform on academics and standardized examinations, schools are reluctant to dedicate increased time to physical activity. After-school exercise and health programs may provide an opportunity to engage in more physical activity without taking time away from coursework during the day. The current study is a secondary data analysis of data from a randomized trial of a 10-week after-school program (six schools, n = 903) that implemented an exercise component based on the CATCH physical activity component and health modules based on the culturally-tailored Bienestar health education program. Outcome variables included BMI and aerobic capacity, health knowledge and healthy food intentions as assessed through path analysis techniques. Both the baseline model (χ2 (df = 8) = 16.90, p = .031; RMSEA = .035 (90% CI of .010–.058), NNFI = 0.983 and the CFI = 0.995) and the model incorporating intervention participation proved to be a good fit to the data (χ2 (df = 10) = 11.59, p = .314. RMSEA = .013 (90% CI of .010–.039); NNFI = 0.996 and CFI = 0.999). Experimental group participation was not predictive of changes in health knowledge, intentions to eat healthy foods or changes in Body Mass Index, but it was associated with increased aerobic capacity, β = .067, p < .05. School characteristics including SES and Language proficiency proved to be significantly associated with changes in knowledge and physical indicators. Further effects of school level variables on intervention outcomes are recommended so that tailored interventions can be developed aimed at the specific characteristics of each participating school. ^