5 resultados para Large scale graph processing
em DigitalCommons@The Texas Medical Center
Resumo:
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a "cosmopolitan" tagging approach to capture the genetic diversity across approximately 2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
Resumo:
BACKGROUND: Enterococcus faecalis has emerged as a major hospital pathogen. To explore its diversity, we sequenced E. faecalis strain OG1RF, which is commonly used for molecular manipulation and virulence studies. RESULTS: The 2,739,625 base pair chromosome of OG1RF was found to contain approximately 232 kilobases unique to this strain compared to V583, the only publicly available sequenced strain. Almost no mobile genetic elements were found in OG1RF. The 64 areas of divergence were classified into three categories. First, OG1RF carries 39 unique regions, including 2 CRISPR loci and a new WxL locus. Second, we found nine replacements where a sequence specific to V583 was substituted by a sequence specific to OG1RF. For example, the iol operon of OG1RF replaces a possible prophage and the vanB transposon in V583. Finally, we found 16 regions that were present in V583 but missing from OG1RF, including the proposed pathogenicity island, several probable prophages, and the cpsCDEFGHIJK capsular polysaccharide operon. OG1RF was more rapidly but less frequently lethal than V583 in the mouse peritonitis model and considerably outcompeted V583 in a murine model of urinary tract infections. CONCLUSION: E. faecalis OG1RF carries a number of unique loci compared to V583, but the almost complete lack of mobile genetic elements demonstrates that this is not a defining feature of the species. Additionally, OG1RF's effects in experimental models suggest that mediators of virulence may be diverse between different E. faecalis strains and that virulence is not dependent on the presence of mobile genetic elements.
Cerebellar mechanisms for motor learning: Testing predictions from a large-scale computer simulation
Resumo:
The cerebellum is the major brain structure that contributes to our ability to improve movements through learning and experience. We have combined computer simulations with behavioral and lesion studies to investigate how modification of synaptic strength at two different sites within the cerebellum contributes to a simple form of motor learning—Pavlovian conditioning of the eyelid response. These studies are based on the wealth of knowledge about the intrinsic circuitry and physiology of the cerebellum and the straightforward manner in which this circuitry is engaged during eyelid conditioning. Thus, our simulations are constrained by the well-characterized synaptic organization of the cerebellum and further, the activity of cerebellar inputs during simulated eyelid conditioning is based on existing recording data. These simulations have allowed us to make two important predictions regarding the mechanisms underlying cerebellar function, which we have tested and confirmed with behavioral studies. The first prediction describes the mechanisms by which one of the sites of synaptic modification, the granule to Purkinje cell synapses (gr → Pkj) of the cerebellar cortex, could generate two time-dependent properties of eyelid conditioning—response timing and the ISI function. An empirical test of this prediction using small, electrolytic lesions of the cerebellar cortex revealed the pattern of results predicted by the simulations. The second prediction made by the simulations is that modification of synaptic strength at the other site of plasticity, the mossy fiber to deep nuclei synapses (mf → nuc), is under the control of Purkinje cell activity. The analysis predicts that this property should confer mf → nuc synapses with resistance to extinction. Thus, while extinction processes erase plasticity at the first site, residual plasticity at mf → nuc synapses remains. The residual plasticity at the mf → nuc site confers the cerebellum with the capability for rapid relearning long after the learned behavior has been extinguished. We confirmed this prediction using a lesion technique that reversibly disconnected the cerebellar cortex at various stages during extinction and reacquisition of eyelid responses. The results of these studies represent significant progress toward a complete understanding of how the cerebellum contributes to motor learning. ^
Resumo:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, synthetic positive control was used to validate Random Forests method. Throughout this study we showed that Random Forests can deal with large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. ^ Random Forests can deal with the large-scale data sets without rigorous data preprocessing. It has robust variable importance ranking measure. Proposed is a novel variable selection method in context of Random Forests that uses the data noise level as the cut-off value to determine the subset of the important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. ^ When the data set had high variables to observations ratio, Random Forests complemented the established logistic regression. This study suggested that Random Forests is recommended for such high dimensionality data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. ^ We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than mean decrease of Gini. ^
Resumo:
A review of literature related to appointment-keeping served as the basis for the development of an organizational paradigm for the study of appointment-keeping in the Beta-blocker Heart Attack Trial (BHAT). Features of the organizational environment, demographic characteristics of BHAT enrollees, organizational structure and processes and previous organizational performance variables were measured so as to provide exploratory information relating to the appointment-keeping behavior of 3,837 participants enrolled at thirty-two Clinical Centers. Results suggest that the social context of individual behavior is an important consideration for the understanding of patient compliance. In particular, the degree to which previous organizational performance--as measured by obtaining recruitment goals--and the ability to utilize resources had particularly strong bivariate associations with appointment-keeping. Implications for future theory development, research and practical implications were provided as was a suggestion for the development of multidisciplinary research efforts conducted within the context of Centers for the study and application of adherence behaviors. ^