961 results for Data utility
Abstract:
There is a growing body of literature that provides evidence for the efficacy of positive youth development programs in general, and preliminary empirical support for the efficacy of the Changing Lives Program (CLP) in particular. This dissertation sought to extend previous efforts to develop and preliminarily examine the Transformative Goal Attainment Scale (TGAS) as a measure of participant empowerment in the promotion of positive development. First, consistent with recent advances in qualitative research methods, this dissertation sought to further investigate the utility of Relational Data Analysis (RDA) for categorizing qualitative open-ended response data. In particular, a qualitative index of Transformative Goals (TG) was developed to complement the previously developed quantitative index of Transformative Goal Attainment (TGA), and RDA procedures for calculating reliability and content validity were refined. Second, as a Stage I pilot/feasibility study, this study preliminarily examined the potential mediating role of empowerment, as indexed by the TGAS, in the promotion of positive development.
Fifty-seven participants took part in this study: forty CLP intervention participants and seventeen control condition participants. All 57 participants were administered the study's measures just before and just after the fall 2003 semester. This study thus used a short-term longitudinal quasi-experimental research design with a comparison control group.
RDA procedures were refined and applied to the categorization of open-ended response data regarding participants' transformative goals (TG) and future possible selves (PSQ-QE). These analyses revealed relatively strong, indirect evidence for the construct validity of the categories, as well as for their theoretically meaningful structural organization, thereby providing sufficient support for the utility of RDA procedures in categorizing qualitative open-ended response data.
In addition, transformative goals (TG), future possible selves (PSQ-QE), and the quantitative index of perceived goal attainment (TGA) were evaluated as potential mediators of positive development by testing their relationships to other indices of positive intervention outcome, using a four-step method that involved both analysis of variance (ANOVAs and RMANOVAs) and regression analysis. Although more limited in scope than the efforts to develop and refine the measures of these mediators, the results were also promising.
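The abstract does not spell out the four steps. Assuming they follow the classic Baron and Kenny regression sequence (an assumption; the ANOVA side of the analysis is not covered), a minimal Python sketch of the regression checks is shown below. The function names and any data passed to them are hypothetical.

```python
from statistics import fmean


def _cross_sum(u, v):
    """Sum of cross-products of mean-centered u and v."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(x, y):
    """OLS slope of y on x (intercept included)."""
    return _cross_sum(x, y) / _cross_sum(x, x)


def two_predictor_slopes(x, m, y):
    """OLS slopes of y on x and m jointly (intercept included)."""
    sxx, smm, sxm = _cross_sum(x, x), _cross_sum(m, m), _cross_sum(x, m)
    sxy, smy = _cross_sum(x, y), _cross_sum(m, y)
    det = sxx * smm - sxm ** 2
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m


def baron_kenny(x, m, y):
    """Coefficients behind the four Baron-Kenny mediation steps."""
    c = simple_slope(x, y)                      # Step 1: X -> Y (total effect)
    a = simple_slope(x, m)                      # Step 2: X -> M
    c_prime, b = two_predictor_slopes(x, m, y)  # Step 3: M -> Y adjusting for X (b)
    # Step 4: compare c_prime (direct effect of X) with c
    return {"c": c, "a": a, "b": b, "c_prime": c_prime}
```

Because ordinary least squares satisfies c = c' + a·b exactly, the drop from `c` to `c_prime` examined in step 4 equals the indirect effect `a * b`.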
Abstract:
Research has identified a number of putative risk factors that place adolescents at incrementally higher risk for involvement in alcohol and other drug (AOD) use and sexual risk behaviors (SRBs). Such factors include personality characteristics such as sensation-seeking, cognitive factors such as positive expectancies and inhibition conflict, and peer norm processes. The current study was guided by a conceptual perspective holding that an integrative framework including multi-level factors has significant explanatory value for understanding the processes associated with the co-occurrence of AOD use and sexual risk behavior outcomes. This study simultaneously evaluated the mediating roles of AOD-sex related expectancies and inhibition conflict in the effects of antecedents of AOD use and SRBs, including sexual sensation-seeking and peer norms for condom use. The sample was drawn from the Enhancing My Personal Options While Evaluating Risk (EMPOWER; Jonathan Tubman, PI) data set (N = 396; aged 12-18 years). Measures included the Sexual Sensation-Seeking Scale, the Inhibition Conflict for Condom Use measure, and the Risky Sex Scale; all relevant measures had well-documented psychometric properties. A global assessment of alcohol use, drug use, and sexual risk behaviors was also used. Results demonstrated that AOD-sex related expectancies mediated the influence of sexual sensation-seeking on the co-occurrence of alcohol and other drug use and sexual risk behaviors. The evaluation of the integrative model also revealed that sexual sensation-seeking was positively associated with peer norms for condom use. In addition, peer norms predicted inhibition conflict in this sample of multi-problem youth. This dissertation research identified mechanisms of risk and protection associated with the co-occurrence of AOD use and SRBs among a multi-problem sample of adolescents receiving treatment for alcohol or drug use and related problems.
This study is informative for adolescent-serving programs that address the individual and contextual characteristics that enhance treatment efficacy and effectiveness among adolescents receiving services for substance use and related problems.
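The abstract reports mediation without naming the estimator. As a hedged illustration only, the sketch below computes a product-of-coefficients indirect effect (X to M to Y) with ordinary least squares and a percentile bootstrap confidence interval, one common way to test such a path. It is not necessarily the dissertation's method, and all names are hypothetical.

```python
import random
from statistics import fmean


def _cross_sum(u, v):
    """Sum of cross-products of mean-centered u and v."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b of x on y through m (OLS)."""
    sxx, smm, sxm = _cross_sum(x, x), _cross_sum(m, m), _cross_sum(x, m)
    sxy, smy = _cross_sum(x, y), _cross_sum(m, y)
    a = sxm / sxx                                          # path X -> M
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)   # path M -> Y, adjusting for X
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (e.g., all x identical)
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi
```

In this framing, `x` might stand for sexual sensation-seeking, `m` for AOD-sex related expectancies, and `y` for a co-occurrence outcome; an interval that excludes zero would be read as evidence of mediation.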
Abstract:
A class of multi-process models is developed for collections of time-indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
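The exact form of the dissertation's approximation is not given here. One natural version, sketched below under that assumption, matches a Gaussian to the known closed-form mean and variance of a PG(b, c) variable; for large b (a sum of b independent PG(1, c) draws when b is an integer) the central limit theorem makes this reasonable. Function names are hypothetical.

```python
import math
import random


def pg_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable.

    E[w]   = b / (2c) * tanh(c/2)
    Var[w] = b / (4c^3) * (sinh(c) - c) * sech(c/2)^2, rewritten below as
             b / (4c^3) * (2 tanh(c/2) - c * (1 - tanh(c/2)^2)) to avoid overflow.
    The c -> 0 limits are b/4 and b/24.
    """
    c = abs(c)  # PG(b, c) depends on c only through |c|
    if c < 1e-4:
        return b / 4.0, b / 24.0
    t = math.tanh(c / 2.0)
    mean = b / (2.0 * c) * t
    var = b / (4.0 * c ** 3) * (2.0 * t - c * (1.0 - t * t))
    return mean, var


def pg_gaussian_approx(b, c, rng=random):
    """Draw from a moment-matched Gaussian approximation to PG(b, c).

    Only sensible when b is large; for small b the draw may even be negative.
    """
    mean, var = pg_moments(b, c)
    return rng.gauss(mean, math.sqrt(var))
```

In a Gibbs sampler, `b` would typically be the number of trials and `c` the current linear predictor, so large counts are exactly where the approximation is intended to replace exact PG sampling.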
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
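A minimal generative sketch of the state-space idea follows, for a single stream of documents without the theme structure. It assumes a Gaussian random walk on the natural parameters and uses the last vocabulary term as a baseline with natural parameter 0; both are assumptions, and the dissertation's own parameterization may differ. Names are hypothetical.

```python
import math
import random


def softmax_with_baseline(eta):
    """Map K-1 natural parameters to K multinomial probabilities.

    The last category is a baseline whose natural parameter is fixed at 0.
    """
    logits = list(eta) + [0.0]
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]  # shift by max for stability
    total = sum(weights)
    return [w / total for w in weights]


def simulate_dynamic_multinomial(n_steps, n_terms, words_per_doc, sigma=0.1, seed=0):
    """Simulate term counts whose natural parameters follow a Gaussian random walk."""
    rng = random.Random(seed)
    eta = [0.0] * (n_terms - 1)  # log-odds of each term against the baseline term
    probs_path, counts_path = [], []
    for _ in range(n_steps):
        eta = [e + rng.gauss(0.0, sigma) for e in eta]  # state evolution
        probs = softmax_with_baseline(eta)
        draws = rng.choices(range(n_terms), weights=probs, k=words_per_doc)
        counts = [draws.count(j) for j in range(n_terms)]  # bag-of-words counts
        probs_path.append(probs)
        counts_path.append(counts)
    return probs_path, counts_path
```

Because each step's natural parameters start from the previous step's values, term proportions drift smoothly over time, which is the autocorrelation the state-space formulation is meant to induce.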
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series that record neuron-level firing patterns in rhesus monkeys. Each monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
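As a simplified, hypothetical illustration (not the dissertation's model), the sketch below draws a smooth Gaussian-process path, squashes it through a logistic function into a time-varying weight on sound A, and generates Poisson spike counts from a mixture of two single-sound rates. The squared-exponential kernel, the logistic link, and all names are assumptions.

```python
import math
import random


def se_kernel(ts, length_scale, amplitude):
    """Squared-exponential covariance matrix over time points ts."""
    return [[amplitude ** 2 * math.exp(-0.5 * ((s - t) / length_scale) ** 2) for t in ts]
            for s in ts]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_mixed_membership(ts, rate_a, rate_b, length_scale=5.0, amplitude=2.0, seed=0):
    """Poisson counts whose rate mixes two sound-specific rates via a GP weight."""
    rng = random.Random(seed)
    cov = se_kernel(ts, length_scale, amplitude)
    for i in range(len(ts)):
        cov[i][i] += 1e-5  # jitter for numerical stability
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in ts]
    f = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(ts))]  # GP draw
    w = [1.0 / (1.0 + math.exp(-fi)) for fi in f]  # weight on sound A
    rates = [wi * rate_a + (1.0 - wi) * rate_b for wi in w]
    counts = [poisson(r, rng) for r in rates]
    return w, rates, counts
```

The GP's length scale controls how quickly the neuron can switch between the two sound-specific firing patterns.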
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
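The dissertation develops its own convergence diagnostics, which are not described here. As a point of reference only, the standard Gelman-Rubin potential scale reduction factor (original, non-split form) can be sketched as follows; values close to 1 suggest the chains agree. The function name is hypothetical.

```python
from statistics import fmean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length scalar chains.

    chains is a list of at least two lists, each holding n >= 2 draws.
    """
    n = len(chains[0])
    chain_means = [fmean(ch) for ch in chains]
    w = fmean([variance(ch) for ch in chains])  # average within-chain variance
    b = n * variance(chain_means)               # between-chain variance
    var_plus = (n - 1) / n * w + b / n          # pooled estimate of posterior variance
    return (var_plus / w) ** 0.5
```

Chains stuck in different regions inflate the between-chain term and push R-hat well above 1.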
Abstract:
Over 50% of the world's population lives within 3 km of rivers and lakes, highlighting the ongoing importance of freshwater resources to human health and societal well-being. Whilst the world's standing waters (natural lakes and reservoirs) cover c. 3.5% of the Earth's non-glaciated land mass, trends in their environmental quality are poorly understood, at least in comparison with rivers, so evaluating their current condition and sensitivity to change is a global priority. Here it is argued that a geospatial approach harnessing existing global datasets, along with new-generation remote sensing products, offers the basis to characterise trajectories of change in lake properties, e.g. water quality, physical structure, hydrological regime and ecological behaviour. This approach also provides the evidence base for understanding the relative importance of climatic forcing and/or changing catchment processes; for example, land cover and soil moisture data, coupled with climate data, provide the basis for modelling regional water balance and estimating runoff over time. Using examples derived primarily from the Danube Basin but also from other parts of the world, we demonstrate the power of the approach and its utility for assessing the sensitivity of lake systems to environmental change, and hence for better managing these key resources in the future.
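The abstract does not name a trend estimator. As one common, hedged choice for characterising a trajectory of change in a short, noisy lake-property series (for example, a hypothetical annual water-quality index), the Theil-Sen slope takes the median of all pairwise slopes and is robust to occasional outliers. The function name is hypothetical.

```python
from statistics import median


def theil_sen_slope(times, values):
    """Median of pairwise slopes: a robust trend estimate for a short time series."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]  # skip pairs observed at the same time
    ]
    return median(slopes)
```

For example, with values 1, 2, 3, 4 and a spurious 100 at times 0 to 4, six of the ten pairwise slopes equal 1, so the estimate stays at 1 despite the outlier.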
Abstract:
Tumor genomic instability and selective treatment pressures result in clonal disease evolution; molecular stratification for molecularly targeted drug administration requires repeated access to tumor DNA. We hypothesized that circulating plasma DNA (cpDNA) in advanced cancer patients is largely derived from tumor, has prognostic utility, and can be used for multiplex tumor mutation sequencing when repeat biopsy is not feasible. We used the Sequenom MassArray System and OncoCarta panel for somatic mutation profiling. Matched samples, acquired from the same patient but at different time points, were evaluated; these comprised formalin-fixed paraffin-embedded (FFPE) archival tumor tissue (primary and/or metastatic) and cpDNA. The feasibility, sensitivity, and specificity of this high-throughput, multiplex mutation detection approach were tested using specimens acquired from 105 patients with solid tumors referred for participation in Phase I trials of molecularly targeted drugs. The median cpDNA concentration was 17 ng/ml (range: 0.5-1600), 3-fold higher than in healthy volunteers. Moreover, higher cpDNA concentrations were associated with worse overall survival (OS): the OS hazard ratio was 2.4 (95% CI 1.4, 4.2) for each 10-fold increase in cpDNA concentration, and in multivariate analyses cpDNA concentration, albumin, and performance status remained independent predictors of OS. These data suggest that plasma DNA in these cancer patients is largely derived from tumor. We also observed high detection concordance for critical 'hot-spot' mutations (KRAS, BRAF, PIK3CA) in matched cpDNA and archival tumor tissue, as well as important differences between archival tumor and cpDNA. This multiplex sequencing assay can be used to detect somatic mutations from plasma in advanced cancer patients when safe repeat tumor biopsy is not feasible and genomic analysis of archival tumor is deemed insufficient.
Overall, circulating nucleic acid biomarker studies have clinically important multi-purpose utility in advanced cancer patients and further studies to pursue their incorporation into the standard of care are warranted.
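A hazard ratio quoted "per 10-fold increase" can be rescaled to other fold changes. A minimal sketch, assuming (as the per-10-fold phrasing implies) that the survival model is linear in log10 of cpDNA concentration; the function name is hypothetical.

```python
import math


def hazard_ratio_for_fold_change(hr_per_10fold, fold_change):
    """Rescale a per-10-fold hazard ratio to an arbitrary fold change.

    Assumes the log hazard is linear in log10(concentration), so the
    hazard ratio for a fold change f is hr_per_10fold ** log10(f).
    """
    return hr_per_10fold ** math.log10(fold_change)
```

With the reported HR of 2.4, a 2-fold increase corresponds to roughly 2.4^0.301 ≈ 1.30 and a 100-fold increase to 2.4^2 = 5.76. Because the transformation is monotone, the same rescaling applies to the CI bounds 1.4 and 4.2.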
Abstract:
Background: The impact of cancer upon children, teenagers and young people can be profound. Research has been undertaken to explore the impacts upon children, teenagers and young people with cancer, but little is known about how researchers can ‘best’ engage with this group to explore their experiences. This review paper provides an overview of the utility of data collection methods employed when undertaking research with children, teenagers and young people. A systematic review of relevant databases was undertaken using the search terms ‘young people’, ‘young adult’, ‘adolescent’ and ‘data collection methods’. The full texts of papers deemed eligible on the basis of title and abstract were accessed and, following discussion within the research team, thirty papers were included. Findings: Because the papers identified were heterogeneous in scope, the following data collection methods were included in the results section. Three of the papers provided an overview of data collection methods used with this population, and the remaining twenty-seven covered the following methods: digital technologies; art-based research; comparisons of ‘paper and pencil’ research with web-based technologies; the use of games; the use of a specific communication tool; questionnaires and interviews; focus groups; and telephone interviews/questionnaires. The strengths and limitations of these data collection methods are discussed, drawing on issues such as the appropriateness of particular methods for particular age groups and the most appropriate method to employ when exploring a particularly sensitive topic. Conclusions: A number of data collection methods have been used to undertake research with children, teenagers and young adults. This review summarises the currently available evidence and gives an overview of the strengths and limitations of the data collection methods employed.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
Cancer and cardiovascular diseases are the leading causes of death worldwide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of a profound disturbance of normal cellular homeostasis. People suffering from, or at high risk for, these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, the development of effective therapies is hindered by the challenge of identifying the genetic and molecular determinants of disease onset; and where therapies already exist, the main challenge is to identify the molecular determinants that drive resistance to them. Owing to progress in sequencing technologies, access to large genome-wide biological datasets now extends far beyond a few experimental labs to the global research community. This unprecedented availability of data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address long-standing problems from many different perspectives. Likewise, this thesis tackles these two main public-health-related challenges using data-driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited because they cannot distinguish causal variants from merely associated ones. In this thesis, we first propose a simple scheme that improves association studies in a supervised fashion, and we show its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential and identifies combinations of regulatory genomic variants that explain gene expression variance.
On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate by simulation that eQTeL accurately detects causal regulatory SNPs, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt the binding of core cardiac transcription factors and are spatially proximal to their targets. eQTeL SNPs capture a substantial proportion of the genetic determinants of expression variance, and we estimate that 58% of these SNPs are putatively causal. So far, the challenge of identifying molecular determinants of cancer resistance could be addressed only through labor-intensive and costly experimental studies, which are infeasible for experimental drugs. Here we take a fundamentally different, data-driven approach to understanding the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions in cancer, termed synthetic rescues (SRs): a functional interaction between two genes in which a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but a subsequent change in the activity of its partner rescuer gene restores cell viability. We then describe a comprehensive computational framework, termed INCISOR, for identifying SRs underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes.
We show that SRs can be used to successfully predict patients' survival and response to the majority of current cancer drugs and, importantly, to predict the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies: targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever-increasing high-throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel mechanistic molecular insights that will advance progress in early disease prevention and personalized therapeutics. Our analyses shed light on the fundamental biology of gene regulation and interaction, and open up exciting avenues for translational applications in risk prediction and therapeutics.
Abstract:
Personal information is increasingly gathered and used to provide services tailored to user preferences, but the datasets used to provide such functionality can pose serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing has targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or against sensitive attribute value disclosure, by introducing diversity or noise in the sensitive values. However, most approaches fail in the high-dimensional case, and those that do not fail introduce a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method.
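The abstract does not describe its partition-based sanitization method. As a hedged stand-in only, the sketch below shows a simplified Mondrian-style partitioning for k-anonymity: records are recursively split at the median of their widest quasi-identifier attribute, every group keeps at least k records, and each group is then published with generalized [min, max] ranges. This is a well-known family of techniques, not the paper's method, and the names are hypothetical.

```python
def mondrian_partition(records, k):
    """Recursively split records at the median of their widest attribute.

    Every returned group has at least k records, so publishing each group
    with generalized ranges makes its members indistinguishable on these
    attributes (k-anonymity).
    """
    if len(records) < 2 * k:
        return [records]  # cannot split without creating a group smaller than k
    dims = range(len(records[0]))
    spans = [max(r[d] for r in records) - min(r[d] for r in records) for d in dims]
    dim = max(dims, key=lambda d: spans[d])  # attribute with the widest range
    ordered = sorted(records, key=lambda r: r[dim])
    mid = len(ordered) // 2  # both halves have at least k records
    return mondrian_partition(ordered[:mid], k) + mondrian_partition(ordered[mid:], k)


def generalize(group):
    """Replace each attribute by its (min, max) range within the group."""
    return [(min(r[d] for r in group), max(r[d] for r in group))
            for d in range(len(group[0]))]
```

With many attributes, the ranges produced by `generalize` grow wide, which is one concrete way the utility cost in high-dimensional data mentioned above shows up.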
Abstract:
The main purpose of this study is to present an alternative benchmarking approach that can be used by national regulators of utilities. It is widely known that the lack of sizeable data sets limits the choice of benchmarking method and the specification of the model used to set price controls within incentive-based regulation. Some national regulators have faced the problem of ill-posed frontier models. Maximum entropy estimators are useful for estimating such ill-posed models, in particular models with small sample sizes, collinearity and non-normal errors, as well as models in which the number of parameters to be estimated exceeds the number of available observations. The empirical study uses sample data from the Portuguese regulator of the electricity sector, which used these data to set the parameters for electricity distribution companies in the 2012-2014 regulatory period. Data envelopment analysis (DEA) and maximum entropy methods are applied and the efficiency results are compared.
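DEA efficiency scores are normally obtained by solving a linear program per firm. In the simplest case of one input and one output under constant returns to scale, that program reduces to a ratio comparison, which the hedged sketch below uses to show what a DEA efficiency score means. It is not the regulator's multi-input specification, and the function name and any inputs are hypothetical.

```python
def crs_efficiency(inputs, outputs):
    """Constant-returns-to-scale DEA efficiency with one input and one output.

    Each firm's output/input ratio is divided by the best ratio observed,
    so frontier firms score 1 and others score between 0 and 1.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]
```

With several inputs and outputs, the frontier has to be built from weighted combinations of firms, which is where the linear program, and the sensitivity to small samples discussed above, come in.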
Abstract:
Ultrasound is the first-line examination for identifying and characterizing adnexal tumors. Several methods of differential diagnosis have been described, including the subjective assessment of the examiner, simple descriptive indices, and mathematically developed indices such as logistic regression models; subjective assessment by an experienced examiner remains the best method for discriminating between malignant and benign tumors. However, because of the subjectivity inherent in this assessment, it became necessary to establish a standardized nomenclature and a classification that would facilitate the communication of results and the corresponding surveillance recommendations. The aim of this article is to summarize and compare different methods for assessing and classifying adnexal tumors, namely the models of the International Ovarian Tumor Analysis (IOTA) group and the Gynecologic Imaging Reporting and Data System (GI-RADS) classification, in terms of diagnostic performance and usefulness in clinical practice.
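Among the IOTA models, the Simple Rules have a decision logic compact enough to sketch. The illustration below, for explanation only and not for clinical use, encodes the five benign (B) and five malignant (M) ultrasound features and the rule: malignant if at least one M-feature and no B-feature, benign if at least one B-feature and no M-feature, otherwise inconclusive. The feature codes as dictionary keys and the function name are hypothetical.

```python
B_FEATURES = {
    "B1": "unilocular cyst",
    "B2": "solid components, largest diameter < 7 mm",
    "B3": "acoustic shadows",
    "B4": "smooth multilocular tumor, largest diameter < 100 mm",
    "B5": "no blood flow (color score 1)",
}
M_FEATURES = {
    "M1": "irregular solid tumor",
    "M2": "ascites",
    "M3": "at least four papillary structures",
    "M4": "irregular multilocular solid tumor, largest diameter >= 100 mm",
    "M5": "very strong blood flow (color score 4)",
}


def iota_simple_rules(observed):
    """Classify a mass from the set of observed feature codes."""
    has_b = any(f in B_FEATURES for f in observed)
    has_m = any(f in M_FEATURES for f in observed)
    if has_m and not has_b:
        return "malignant"
    if has_b and not has_m:
        return "benign"
    return "inconclusive"  # both kinds present, or none
```

The "inconclusive" branch is where subjective assessment by an experienced examiner is typically still needed.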
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short-read sequencing methods, and develops new techniques that use deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline was written, incorporating methods that leverage the statistical power of large data sets. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups, as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates using this set of markers produced a phylogeny with good support for all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales.
The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for discovering horizontally transferred genes, and for separating sequences in metagenomic data sets.
Abstract:
Electoral researchers are so accustomed to analyzing the choice of the single most preferred party as the left-hand-side variable of their models of electoral behavior that they often ignore revealed preference data. Drawing on random utility theory, their models predict electoral behavior at the extensive margin of choice. Since the seminal work of Luce and others on individual choice behavior, however, many social science disciplines (consumer research, labor market research, travel demand, etc.) have extended their inventory of observed preference data with, for instance, multiple paired comparisons, complete or incomplete rankings, and multiple ratings. Eliciting (voter) preferences using these procedures and applying appropriate choice models is known to considerably increase the efficiency of estimates of causal factors in models of (electoral) behavior. In this paper, we demonstrate the efficiency gain from adding further preference information to first preferences, up to full ranking data. We do so for multi-party systems of different sizes, using simulation studies as well as empirical data from the 1972 German election study. A comparison of the practical considerations for using ranking versus single-preference data yields suggestions for the choice of measurement instruments in different multi-candidate and multi-party settings.
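The paper's exact choice model is not specified here. A standard random-utility model for full rankings is the rank-ordered ("exploded") logit, which applies Luce's choice rule sequentially: the first-ranked party is a multinomial-logit choice among all parties, the second among those remaining, and so on. The sketch below assumes that model; the function name and party labels are hypothetical.

```python
import math


def ranking_probability(ranking, utilities):
    """Probability of a full ranking under the rank-ordered logit.

    ranking lists alternatives from most to least preferred; utilities maps
    each alternative to its systematic utility. Each position is a
    multinomial-logit choice among the alternatives not yet ranked.
    """
    remaining = list(ranking)
    prob = 1.0
    for choice in ranking:
        denom = sum(math.exp(utilities[a]) for a in remaining)
        prob *= math.exp(utilities[choice]) / denom
        remaining.remove(choice)
    return prob
```

A first-preference-only analysis uses just the first factor of this product; each additional ranked position contributes another choice observation about the same utilities, which is the source of the efficiency gain described above.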
Abstract:
Agroforestry has large potential for carbon (C) sequestration while providing many economic, social, and ecological benefits via its diversified products. Airborne lidar is considered the most accurate technology for mapping aboveground biomass (AGB) at landscape levels. However, little research has been done to study the AGB of agroforestry systems using airborne lidar data. Focusing on an agroforestry system in the Brazilian Amazon, this study first predicted plot-level AGB using fixed-effects regression models that assumed the regression coefficients to be constants. The model prediction errors were then analyzed from the perspectives of tree DBH (diameter at breast height)-height relationships and plot-level wood density, which suggested the need to stratify agroforestry fields to improve plot-level AGB modeling. We separated teak plantations from other agroforestry types and predicted AGB using mixed-effects models that can incorporate the variation of the AGB-height relationship across agroforestry types. We found that, at the plot scale, mixed-effects models led to better prediction performance (based on leave-one-out cross-validation) than the fixed-effects models, with the coefficient of determination (R²) increasing from 0.38 to 0.64. At the landscape level, the difference between AGB densities from the two types of models was ~10% on average and up to ~30% at the pixel level. This study indicates the importance of stratification based on tree AGB allometry and the utility of mixed-effects models in modeling and mapping the AGB of agroforestry systems.
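The R² values above come from leave-one-out cross-validation. A minimal sketch of that scoring for a simple linear model follows, assuming R² is computed as 1 minus the ratio of the leave-one-out residual sum of squares to the total sum of squares (the study's exact definition and model form are not given here). The same scoring loop would be applied to each candidate model; the function names are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def loocv_r2(x, y):
    """R^2 of leave-one-out predictions from a simple linear regression."""
    preds = []
    for i in range(len(x)):
        # refit without observation i, then predict it
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    my = fmean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Here `x` could be a lidar height metric and `y` plot-level AGB; each plot is predicted by a model that never saw it, so the score reflects out-of-sample performance.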
Abstract:
Several factors have recently converged, elevating the need for highly parallel diagnostic platforms that have the ability to detect many known, novel, and emerging pathogenic agents simultaneously. Panviral DNA microarrays represent the most robust approach for massively parallel viral surveillance and detection. The Virochip is a panviral DNA microarray that is capable of detecting all known viruses, as well as novel viruses related to known viral families, in a single assay and has been used to successfully identify known and novel viral agents in clinical human specimens. However, the usefulness and the sensitivity of the Virochip platform have not been tested on a set of clinical veterinary specimens with the high degree of genetic variance that is frequently observed with swine virus field isolates. In this report, we investigate the utility and sensitivity of the Virochip to positively detect swine viruses in both cell culture-derived samples and clinical swine samples. The Virochip successfully detected porcine reproductive and respiratory syndrome virus (PRRSV) in serum containing 6.10 × 10² viral copies per microliter and influenza A virus in lung lavage fluid containing 2.08 × 10⁶ viral copies per microliter. The Virochip also successfully detected porcine circovirus type 2 (PCV2) in serum containing 2.50 × 10⁸ viral copies per microliter and porcine respiratory coronavirus (PRCV) in turbinate tissue homogenate. Collectively, the data in this report demonstrate that the Virochip can successfully detect pathogenic viruses frequently found in swine in a variety of solid and liquid specimens, such as turbinate tissue homogenate and lung lavage fluid, as well as antemortem samples, such as serum.