Abstract:
KLK15 over-expression is reported to be a significant predictor of reduced progression-free survival and overall survival in ovarian cancer. Our aim was to analyse the KLK15 gene for putative functional single nucleotide polymorphisms (SNPs) and assess the association of these and KLK15 HapMap tag SNPs with ovarian cancer survival. Results: In silico analysis was performed to identify KLK15 regulatory elements and to classify potentially functional SNPs in these regions. After SNP validation and identification by DNA sequencing of ovarian cancer cell lines and aggressive ovarian cancer patients, 9 SNPs were shortlisted and genotyped using the Sequenom iPLEX MassARRAY platform in a cohort of Australian ovarian cancer patients (N = 319). In the Australian dataset we observed significantly worse survival for the KLK15 rs266851 SNP in a dominant model (Hazard Ratio (HR) 1.42, 95% CI 1.02-1.96). This association was observed in the same direction in two independent datasets, with a combined HR for the three studies of 1.16 (95% CI 1.00-1.34). This SNP lies 15 bp downstream of a novel exon and is predicted to be involved in mRNA splicing. The mutant allele is also predicted to abrogate an HSF-2 binding site. Conclusions: We provide evidence of association for the SNP rs266851 with ovarian cancer survival. Our results provide the impetus for downstream functional assays and additional independent validation studies to assess the role of KLK15 regulatory SNPs and of KLK15 isoforms with alternative intracellular functional roles in ovarian cancer survival.
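For readers unfamiliar with the dominant-model survival analysis described above: carriers of at least one minor allele are compared against non-carriers in a Cox proportional hazards model, and the hazard ratio is read off the fitted coefficient. A minimal, hypothetical sketch (invented column names and toy data, not the study's actual pipeline) using Python's lifelines package:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-patient data: genotype coded as minor-allele count (0, 1, 2),
# follow-up time in months, and an event indicator (1 = progression/death).
df = pd.DataFrame({
    "rs266851_count": [0, 1, 2, 0, 1, 0, 2, 1, 0, 1],
    "time_months":    [12, 8, 5, 30, 22, 40, 7, 15, 28, 9],
    "event":          [1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
})

# Dominant model: any copy of the minor allele counts as "carrier".
df["rs266851_dominant"] = (df["rs266851_count"] > 0).astype(int)

cph = CoxPHFitter()
cph.fit(df[["rs266851_dominant", "time_months", "event"]],
        duration_col="time_months", event_col="event")
cph.print_summary()  # exp(coef) column is the hazard ratio with its 95% CI
```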
Abstract:
Background: Cohort studies can provide valuable evidence of cause and effect relationships but are subject to loss of participants over time, limiting the validity of findings. Computerised record linkage offers a passive and ongoing method of obtaining health outcomes from existing routinely collected data sources. However, the quality of record linkage is reliant upon the availability and accuracy of common identifying variables. We sought to develop and validate a method for linking a cohort study to a state-wide hospital admissions dataset with limited availability of unique identifying variables. Methods: A sample of 2000 participants from a cohort study (n = 41 514) was linked to a state-wide hospitalisations dataset in Victoria, Australia, using the national health insurance (Medicare) number and demographic data as identifying variables. Availability of the health insurance number was limited in both datasets; therefore linkage was undertaken both with and without use of this number, and agreement was tested between the two algorithms. Sensitivity was calculated for a sub-sample of 101 participants with a hospital admission confirmed by medical record review. Results: Of the 2000 study participants, 85% were found to have a record in the hospitalisations dataset when the national health insurance number and sex were used as linkage variables, and 92% when demographic details only were used. When agreement between the two methods was tested, the disagreement fraction was 9%, mainly due to "false positive" links when demographic details only were used. A final algorithm that used multiple combinations of identifying variables resulted in a match proportion of 87%. Sensitivity of this final linkage was 95%. Conclusions: High quality record linkage of cohort data with a hospitalisations dataset that has limited identifiers can be achieved using combinations of a national health insurance number and demographic data as identifying variables.
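One way to read the final algorithm described above is as a sequence of deterministic linkage passes, each using a different combination of identifiers, with earlier (stricter) passes taking precedence. The sketch below is hypothetical: the field names and pass order are invented and are not those of the study.

```python
# Deterministic multi-pass linkage sketch: try strict key combinations first,
# then fall back to demographic-only keys. All field names are hypothetical.
PASSES = [
    ("medicare_no", "sex"),                      # strongest identifiers, when available
    ("surname", "first_initial", "dob", "sex"),  # demographics only
    ("surname", "dob", "postcode"),              # looser fallback
]

def make_index(records, keys):
    """Index hospital records by a tuple of identifying fields, skipping blanks."""
    index = {}
    for rec in records:
        values = tuple(rec.get(k) for k in keys)
        if all(values):
            index.setdefault(values, []).append(rec)
    return index

def link(cohort, hospital_records):
    """Return cohort_id -> matched hospital records, using the first pass that hits."""
    links = {}
    for keys in PASSES:
        index = make_index(hospital_records, keys)
        for person in cohort:
            if person["id"] in links:
                continue  # already linked by an earlier (stricter) pass
            values = tuple(person.get(k) for k in keys)
            if all(values) and values in index:
                links[person["id"]] = index[values]
    return links
```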
Abstract:
Bactrocera dorsalis (Hendel) and B. papayae Drew & Hancock represent a closely related sibling species pair for which the biological species limits are unclear; i.e., it is uncertain whether they are truly two biological species, or one biological species that has been incorrectly taxonomically split. The geographic ranges of the two taxa are thought to abut or overlap on or around the Isthmus of Kra, a recognised biogeographic barrier located on the narrowest portion of the Thai Peninsula. We collected fresh material of B. dorsalis sensu lato (i.e., B. dorsalis sensu stricto + B. papayae) in a north-south transect down the Thai Peninsula, from areas regarded as being exclusively B. dorsalis s.s., across the Kra Isthmus, and into regions regarded as exclusively B. papayae. We carried out microsatellite analyses and took measurements of male genitalia and wing shape; both of the latter morphological tests have been used previously to separate these two taxa. No significant population structuring was found in the microsatellite analysis, and the results were consistent with an interpretation of one, predominantly panmictic population. Both morphological datasets showed consistent, clinal variation along the transect, with no evidence for disjunction. No test provided evidence supporting historical vicariance driven by the Isthmus of Kra, and none of the three datasets supported the current taxonomy of two species. Rather, within and across the area of range overlap or abutment between the two species, only continuous morphological and genetic variation was recorded. Recognition that morphological traits previously used to separate these taxa are continuous, and that there is no genetic evidence for population segregation in the region of suspected species overlap, is consistent with a growing body of literature that reports no evidence of biological differentiation between these taxa.
Abstract:
Quality oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers' expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and are now applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the characteristics and needs of the health sector. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents opportunities for transferring knowledge and future development in each sector. Meanwhile, the capabilities of the Bayesian approach, particularly Bayesian hierarchical models and computational techniques in which all uncertainty is expressed as a structure of probability, facilitate decision making and cost-effectiveness analyses. Therefore, this research investigates the use of the quality improvement cycle in a health setting using clinical data from a hospital. The need for clinical data for monitoring purposes is investigated in two respects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine an economical sample size for statistical analyses within the quality improvement cycle. After ensuring clinical data quality, some characteristics of control charts in the health context, including the necessity of monitoring attribute data and correlated quality characteristics, are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiograms, and various risk-adjusted control charts are constructed and investigated for monitoring binary outcomes of clinical interventions as well as post-intervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework for estimating the change point following a control chart's signal. This estimate aims to facilitate root-cause analysis efforts in the quality improvement cycle, since it narrows the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters, since probability-distribution-based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios, including step changes, linear trends and multiple changes in a Poisson process, are developed and investigated.
The benefits of change point investigation are revisited and promoted in monitoring hospital outcomes, where the developed Bayesian estimator reports the true time of shifts, compared with a priori known causes, detected by control charts monitoring the rate of excess usage of blood products and of major adverse events during and after cardiac surgery in a local hospital. The Bayesian change point estimators are then further developed for healthcare surveillance of processes in which pre-intervention characteristics of patients affect the outcomes. In this setting, the Bayesian estimator is first extended to capture the patient mix (covariates) through the risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in the odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by risk-adjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix, and the survival function is constructed using a survival prediction model. The simulation studies undertaken in each research component, and the results obtained, strongly recommend the developed Bayesian estimators as an alternative for change point estimation within the quality improvement cycle in healthcare surveillance as well as in industrial and business contexts. The superiority of the proposed Bayesian framework and estimators is enhanced when the probability quantification, flexibility and generalizability of the developed models are also considered. The advantages of the Bayesian approach seen here in the context of quality control may also extend to the industrial and business domains where quality monitoring was initially developed.
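For the simplest scenario described above, a single step change in the rate of a Poisson process, the change point posterior can be written down directly by conditioning on the change time and using conjugate Gamma priors for the before/after rates. The sketch below is only an illustrative toy version, not the thesis's hierarchical MCMC implementation.

```python
import numpy as np
from scipy.special import gammaln

def changepoint_posterior(counts, a=1.0, b=1.0):
    """Posterior over a single step-change time tau in a Poisson process.

    counts[t] ~ Poisson(lambda1) for t <= tau and Poisson(lambda2) for t > tau,
    with independent Gamma(a, b) priors on lambda1 and lambda2 (the rates are
    marginalised analytically). Returns P(tau = t | counts) for t = 1..n-1.
    """
    y = np.asarray(counts, dtype=float)
    n = len(y)
    log_post = np.full(n - 1, -np.inf)
    for tau in range(1, n):
        s1, s2 = y[:tau].sum(), y[tau:].sum()
        log_post[tau - 1] = (gammaln(a + s1) - (a + s1) * np.log(b + tau)
                             + gammaln(a + s2) - (a + s2) * np.log(b + (n - tau)))
    log_post -= log_post.max()            # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Simulated adverse-event counts with a shift in rate after week 30.
rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(2.0, 30), rng.poisson(5.0, 20)])
post = changepoint_posterior(y)
print("posterior mode of change time:", post.argmax() + 1)
```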
Abstract:
Background: In Australia and other developed countries, there are consistent and marked socioeconomic inequalities in health. Diet is a major contributing factor to the poorer health of lower socioeconomic groups: the dietary patterns of disadvantaged groups are least consistent with dietary recommendations for the prevention of diet-related chronic diseases compared with their more advantaged counterparts. Part of the reason that lower socioeconomic groups have poorer diets may be their consumption of takeaway foods. These foods typically have nutrient contents that fail to comply with the dietary recommendations for the prevention of chronic disease and associated risk factors. A high level of takeaway food consumption, therefore, may negatively influence overall dietary intakes and, consequently, lead to adverse health outcomes. Despite this, little attention has focused on the association between socioeconomic position (SEP) and takeaway food consumption, with the limited number of studies showing mixed results. Additionally, studies have been limited by considering only a narrow range of takeaway foods and not examining how different socioeconomic groups make choices that are more (or less) consistent with dietary recommendations. While a large number of earlier studies have consistently reported that socioeconomically disadvantaged groups consume less fruit and vegetables, there is limited knowledge about the role of takeaway food in socioeconomic variations in fruit and vegetable intake. Furthermore, no known studies have investigated why there are socioeconomic differences in takeaway food consumption. The aims of this study are to: examine takeaway food consumption and the types of takeaway food consumed (healthy and less healthy) by different socioeconomic groups; determine whether takeaway food consumption patterns explain socioeconomic variations in fruit and vegetable intake; and investigate the role of a range of psychosocial factors in explaining the association between SEP and takeaway food consumption and the choice of takeaway food. Methods: This study used two cross-sectional population-based datasets: 1) the 1995 Australian National Nutrition Survey (NNS), which was conducted among a nationally representative sample of adults aged between 25 and 64 years (N = 7319, 61% response rate); and 2) the Food and Lifestyle Survey (FLS), which was conducted by the candidate among randomly selected adults aged between 25 and 64 years residing in Brisbane, Australia in 2009 (N = 903, 64% response rate). The FLS extended the NNS in several ways by describing current socioeconomic differences in takeaway food consumption patterns, formally assessing the mediating effect of takeaway food consumption on socioeconomic inequalities in fruit and vegetable intake, and investigating whether (and which) psychosocial factors contributed to the observed socioeconomic variations in takeaway food consumption patterns. Results: Approximately 32% of the NNS participants consumed takeaway food in the previous 24 hours, and 38% of the FLS participants reported consuming takeaway food once a week or more. The results from analyses of the NNS and the FLS were somewhat mixed; however, disadvantaged groups were more likely to consume a high level of 'less healthy' takeaway food compared with their more advantaged counterparts. The lower fruit and vegetable intake among lower socioeconomic groups was partly mediated by their higher consumption of 'less healthy' takeaway food.
Lower socioeconomic groups were more likely to have negative meal preparation behaviours and attitudes, and weaker health and nutrition-related beliefs and knowledge. Socioeconomic differences in takeaway food consumption were partly explained by meal preparation behaviours and attitudes, and these factors, along with health and nutrition-related beliefs and knowledge, appeared to contribute to the socioeconomic variations in choice of takeaway foods. Conclusion: This thesis enhances our understanding of socioeconomic differences in dietary behaviours and the potential pathways involved by describing takeaway food consumption patterns by SEP, explaining the role of takeaway food consumption in socioeconomic inequalities in fruit and vegetable intake, and identifying the potential impact of psychosocial factors on socioeconomic differences in takeaway food consumption and the choice of takeaway food. Some important evidence is also provided for developing policies and effective intervention programs to improve the diet quality of the population, especially among lower socioeconomic groups. This thesis concludes with a discussion of a number of recommendations for future research and strategies to improve the dietary intake of the whole population, and especially of disadvantaged groups.
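The mediation result summarised above is the kind of finding typically obtained by comparing the exposure-outcome coefficient with and without the mediator (the product-of-coefficients approach). The sketch below is purely illustrative, with simulated data and invented variable names rather than the thesis's actual model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: lower SEP -> more "less healthy" takeaway -> fewer fruit/veg serves.
rng = np.random.default_rng(0)
n = 1000
sep = rng.integers(0, 3, n)                       # 0 = low, 2 = high SEP
takeaway = 3 - sep + rng.normal(0, 1, n)          # weekly less-healthy takeaway meals
fruit_veg = 2 + 0.3 * sep - 0.4 * takeaway + rng.normal(0, 1, n)
df = pd.DataFrame({"sep": sep, "takeaway": takeaway, "fruit_veg": fruit_veg})

total = smf.ols("fruit_veg ~ sep", data=df).fit()            # total SEP effect
a = smf.ols("takeaway ~ sep", data=df).fit()                 # SEP -> mediator
b = smf.ols("fruit_veg ~ sep + takeaway", data=df).fit()     # mediator -> outcome, SEP adjusted

indirect = a.params["sep"] * b.params["takeaway"]            # product of coefficients
print("total effect:   ", round(total.params["sep"], 3))
print("direct effect:  ", round(b.params["sep"], 3))
print("indirect effect:", round(indirect, 3))                # the "mediated" component
```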
Abstract:
While undertaking the ANDS RDA Gold Standard Record Exemplars project, we discussed research data sharing with many QUT researchers. Our experiences provided rich insight into researcher attitudes towards their data and the sharing of such data. Generally, we found that traditional altruistic motivations for research data sharing did not inspire researchers, but that an explanation of the more achievement-oriented benefits was more compelling.
Abstract:
The Queensland University of Technology (QUT) in Brisbane, Australia, is involved in a number of projects funded by the Australian National Data Service (ANDS). Currently, QUT is working on a project (Metadata Stores Project) that uses open source VIVO software to aid in the storage and management of metadata relating to data sets created/managed by the QUT research community. The registry (called QUT Research Data Finder) will support the sharing and reuse of research datasets, within and external to QUT. QUT uses VIVO for both the display and the editing of research metadata.
Abstract:
The most common software analysis tools available for measuring fluorescence images are designed for two-dimensional (2D) data; they rely on manual settings for inclusion and exclusion of data points, and on computer-aided pattern recognition to support the interpretation and findings of the analysis. It has become increasingly important to be able to measure fluorescence images constructed from three-dimensional (3D) datasets in order to capture the complexity of cellular dynamics and understand the basis of cellular plasticity within biological systems. Sophisticated microscopy instruments have permitted the visualization of 3D fluorescence images through the acquisition of multispectral fluorescence images and powerful analytical software that reconstructs the images from confocal stacks, providing a 3D representation of the collected 2D images. Advanced design-based stereology methods have progressed beyond the approximation and assumptions of the original model-based stereology(1), even in complex tissue sections(2). Despite these scientific advances in microscopy, a need remains for an automated analytic method that fully exploits the intrinsic 3D data to allow for the analysis and quantification of the complex changes in cell morphology, protein localization and receptor trafficking. Current techniques available to quantify fluorescence images include MetaMorph (Molecular Devices, Sunnyvale, CA) and ImageJ (NIH), which provide manual analysis. Imaris (Andor Technology, Belfast, Northern Ireland) software provides the feature MeasurementPro, which allows the manual creation of measurement points that can be placed in a volume image or drawn on a series of 2D slices to create a 3D object. This method is useful for single-click point measurements to measure a line distance between two objects or to create a polygon that encloses a region of interest, but it is difficult to apply to complex cellular network structures. Filament Tracer (Andor) allows automatic detection of 3D neuronal filament-like structures; however, this module has been developed to measure defined structures such as neurons, which are comprised of dendrites, axons and spines (tree-like structures). This module has been ingeniously utilized to make morphological measurements of non-neuronal cells(3); however, the output data provide information about an extended cellular network using software that depends on a defined cell shape rather than an amorphous-shaped cellular model. To overcome the issue of analyzing amorphous-shaped cells and to make the software more suitable to a biological application, Imaris developed Imaris Cell. This was a scientific project with the Eidgenössische Technische Hochschule, developed to calculate the relationship between cells and organelles. While the software enables the detection of biological constraints, by forcing one nucleus per cell and using cell membranes to segment cells, it cannot be utilized to analyze fluorescence data that are not continuous, because it ideally builds the cell surface without void spaces. To our knowledge, at present no user-modifiable automated approach has been developed that provides morphometric information from 3D fluorescence images of cells of undefined shape (Figure 1). We have developed an analytical platform using the Imaris core software module and Imaris XT interfaced to MATLAB (MathWorks, Inc.).
These tools allow the 3D measurement of cells without a pre-defined shape and with inconsistent fluorescence network components. Furthermore, this method will allow researchers who have extensive expertise in biological systems, but limited familiarity with computer applications, to perform quantification of morphological changes in cell dynamics.
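The platform described above is built on Imaris/Imaris XT bridged to MATLAB and is not reproduced here. As a rough illustration of the underlying idea only — per-object morphometrics extracted from a 3D fluorescence stack without assuming a predefined cell shape — a hypothetical sketch using the open-source scikit-image library might look like this:

```python
import numpy as np
from skimage import filters, measure

def quantify_volume(stack, voxel_volume_um3=1.0):
    """Threshold a 3D fluorescence stack and report per-object morphometrics.

    stack: 3D numpy array (z, y, x) of fluorescence intensities.
    Returns a list of dicts with volume and centroid for each connected object.
    """
    threshold = filters.threshold_otsu(stack)         # global Otsu threshold
    binary = stack > threshold
    labels = measure.label(binary, connectivity=3)     # 26-connected 3D components
    objects = []
    for region in measure.regionprops(labels):
        objects.append({
            "label": region.label,
            "volume_um3": region.area * voxel_volume_um3,  # 'area' = voxel count in 3D
            "centroid_zyx": region.centroid,
        })
    return objects

# Tiny synthetic example: two bright blobs in a noisy 3D volume.
rng = np.random.default_rng(0)
vol = rng.normal(10, 2, (32, 64, 64))
vol[10:16, 10:20, 10:20] += 50
vol[20:25, 40:55, 30:45] += 50
for obj in quantify_volume(vol, voxel_volume_um3=0.2**3):
    print(obj["label"], round(obj["volume_um3"], 2), obj["centroid_zyx"])
```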
Abstract:
This paper presents a combined structure for using real, complex, and binary valued vectors for semantic representation. The theory, implementation, and application of this structure are all significant. For the theory underlying quantum interaction, it is important to develop a core set of mathematical operators that describe systems of information, just as core mathematical operators in quantum mechanics are used to describe the behavior of physical systems. The system described in this paper enables us to compare more traditional quantum mechanical models (which use complex state vectors) alongside more generalized quantum models that use real and binary vectors. The implementation of such a system presents fundamental computational challenges. For large and sometimes sparse datasets, the demands on time and space are different for real, complex, and binary vectors. To accommodate these demands, the Semantic Vectors package has been carefully adapted and can now switch between different number types comparatively seamlessly. This paper describes the key abstract operations in our semantic vector models and their implementations for real, complex, and binary vectors. We also discuss some of the key questions that arise in the field of quantum interaction and informatics, explaining how the wide availability of modelling options for different number fields will help to investigate some of these questions.
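The Semantic Vectors package itself is written in Java and its internals are not shown here; the following toy Python sketch only illustrates the kind of abstraction described above — the same high-level operations (elemental vectors, superposition, similarity) realised over different ground fields (real versus binary):

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 1024

# --- Real-valued model: superposition is addition, similarity is cosine. ---
def real_vector():
    return rng.normal(0, 1, DIM)

def real_superpose(a, b):
    return a + b

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# --- Binary model: superposition is bitwise majority, similarity from Hamming distance. ---
def binary_vector():
    return rng.integers(0, 2, DIM, dtype=np.uint8)

def binary_superpose(a, b):
    ties = rng.integers(0, 2, DIM, dtype=np.uint8)    # break ties at random
    return np.where(a == b, a, ties).astype(np.uint8)

def binary_similarity(a, b):
    return 1.0 - np.count_nonzero(a != b) / DIM        # 1 = identical, ~0.5 = unrelated

# A superposition stays measurably similar to its components in both models.
x, y = real_vector(), real_vector()
print("real:  ", round(cosine(real_superpose(x, y), x), 3))
u, v = binary_vector(), binary_vector()
print("binary:", round(binary_similarity(binary_superpose(u, v), u), 3))
```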
Abstract:
Complex flow datasets are often difficult to represent in detail using traditional vector visualisation techniques such as arrow plots and streamlines. This is particularly true when the flow regime changes in time. Texture-based techniques, which are based on the advection of dense textures, are novel techniques for visualising such flows (i.e., flows with complex, time-dependent dynamics). In this paper, we review two popular texture-based techniques and their application to flow datasets sourced from real research projects. The texture-based techniques investigated were Line Integral Convolution (LIC) and Image-Based Flow Visualisation (IBFV). We evaluated these techniques and in this paper report on their visualisation effectiveness (compared with traditional techniques), their ease of implementation, and their computational overhead.
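To make the LIC idea concrete — each output pixel averages a white-noise texture along a short streamline traced through that pixel — here is a minimal, unoptimised sketch using fixed-step Euler integration. It is purely illustrative and is not the implementation evaluated in the paper.

```python
import numpy as np

def lic(vx, vy, length=20, rng=None):
    """Minimal Line Integral Convolution on a 2D vector field (vx, vy)."""
    rng = rng or np.random.default_rng(0)
    h, w = vx.shape
    noise = rng.random((h, w))
    mag = np.hypot(vx, vy) + 1e-12
    ux, uy = vx / mag, vy / mag                      # unit direction field
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for direction in (+1.0, -1.0):           # integrate forward and backward
                x, y = float(j), float(i)
                for _ in range(length):
                    xi, yi = int(round(x)), int(round(y))
                    if not (0 <= xi < w and 0 <= yi < h):
                        break
                    total += noise[yi, xi]
                    count += 1
                    x += direction * ux[yi, xi]      # fixed-step Euler advection
                    y += direction * uy[yi, xi]
            out[i, j] = total / max(count, 1)
    return out

# Example: a circular (vortex) flow field; view the result with e.g. matplotlib imshow.
ys, xs = np.mgrid[0:128, 0:128]
vx, vy = -(ys - 64.0), (xs - 64.0)
image = lic(vx, vy)
```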
Abstract:
A Flash Event (FE) represents a period of time when a web server experiences a dramatic increase in incoming traffic, either following a newsworthy event that has prompted users to locate and access it, or as a result of redirection from other popular web or social media sites. This usually leads to network congestion and Quality-of-Service (QoS) degradation. These events can be mistaken for Distributed Denial-of-Service (DDoS) attacks aimed at disrupting the server. Accurate detection of FEs and their distinction from DDoS attacks is important, since different actions need to be undertaken by network administrators in these two cases. However, a lack of public-domain FE datasets hinders research in this area. In this paper we present a detailed study of FEs and classify them into three broad categories. In addition, the paper describes FEs in terms of three key components: the volume of incoming traffic, the related source IP addresses, and the resources being accessed. We present such an FE model with minimal parameters and use publicly available datasets to analyse and validate our proposed model. The model can be used to generate different types of FE traffic, closely approximating real-world scenarios, in order to facilitate research into distinguishing FEs from DDoS attacks.
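The three components listed above suggest a simple way to synthesise FE-like traffic. The toy generator below is only a hypothetical illustration; its phase structure and parameters are invented and are not the model proposed in the paper.

```python
import random

def generate_flash_event(baseline=20, peak=500, ramp_s=30, sustain_s=120, decay_s=240,
                         ip_pool=5000, hot_pages=("/", "/news/story.html"), seed=1):
    """Toy flash-event generator: per-second request volume, source IP and resource.

    Three phases: rapid ramp-up to the peak, a sustained plateau, then gradual
    decay back to the baseline. Illustrative only, not the paper's parameterisation.
    """
    rng = random.Random(seed)
    ips = [f"10.{rng.randrange(256)}.{rng.randrange(256)}.{rng.randrange(256)}"
           for _ in range(ip_pool)]
    events = []
    for t in range(ramp_s + sustain_s + decay_s):
        if t < ramp_s:                                   # ramp-up
            rate = baseline + (peak - baseline) * t / ramp_s
        elif t < ramp_s + sustain_s:                     # sustained peak
            rate = peak
        else:                                            # gradual decay
            frac = (t - ramp_s - sustain_s) / decay_s
            rate = peak - (peak - baseline) * frac
        for _ in range(int(rate)):
            events.append((t, rng.choice(ips), rng.choice(hot_pages)))
    return events

trace = generate_flash_event()
print(len(trace), "requests;", len({ip for _, ip, _ in trace}), "distinct source IPs")
```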
Abstract:
Motor unit number estimation (MUNE) is a method which aims to provide a quantitative indicator of progression of diseases that lead to loss of motor units, such as motor neurone disease. However, the development of a reliable, repeatable and fast real-time MUNE method has hitherto proved elusive. Ridall et al. (2007) implement a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm to produce a posterior distribution for the number of motor units, using a Bayesian hierarchical model that takes into account biological information about motor unit activation. However, we find that the approach can be unreliable for some datasets, since it can suffer from poor cross-dimensional mixing. Here we focus on improved inference by marginalising over latent variables to create the likelihood. In particular, we explore how this can improve the RJMCMC mixing, and we investigate alternative approaches that utilise the likelihood (e.g. DIC (Spiegelhalter et al., 2002)). For this model the marginalisation is over latent variables which, for a larger number of motor units, is an intractable summation over all combinations of a set of latent binary variables whose joint sample space increases exponentially with the number of motor units. We provide a tractable and accurate approximation for this quantity and also investigate simulation approaches incorporated into RJMCMC using results of Andrieu and Roberts (2009).
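The marginalisation discussed above is over the 2^N possible firing patterns of N motor units, which is tractable only for small N; the brute-force version below makes the exponential cost concrete. It assumes a simplified model (independent firing, observed response equal to the noisy sum of the firing units' amplitudes) and is not the approximation or RJMCMC scheme developed in the paper.

```python
import itertools
import math

def marginal_likelihood(y, amplitudes, fire_probs, sigma):
    """p(y) = sum over all 2^N firing patterns of p(pattern) * Normal(y | sum, sigma).

    Exact but exponential in the number of motor units N, which is why larger
    problems need the approximation or simulation approaches discussed above.
    """
    total = 0.0
    n = len(amplitudes)
    for pattern in itertools.product((0, 1), repeat=n):        # all 2^N combinations
        mean = sum(z * a for z, a in zip(pattern, amplitudes))
        prior = math.prod(p if z else (1 - p) for z, p in zip(pattern, fire_probs))
        lik = math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += prior * lik
    return total

# Five hypothetical motor units with different amplitudes and firing probabilities.
amps = [0.8, 1.1, 0.9, 1.3, 0.7]
probs = [0.2, 0.5, 0.7, 0.4, 0.9]
print(marginal_likelihood(y=2.0, amplitudes=amps, fire_probs=probs, sigma=0.1))
```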
Abstract:
Detailed representations of complex flow datasets are often difficult to generate using traditional vector visualisation techniques such as arrow plots and streamlines. This is particularly true when the flow regime changes in time. Texture-based techniques, which are based on the advection of dense textures, are novel techniques for visualising such flows. We review two popular texture-based techniques and their application to flow datasets sourced from active research projects. The techniques investigated were Line Integral Convolution (LIC) [1] and Image-Based Flow Visualisation (IBFV) [18]. We evaluated these and report on their effectiveness from a visualisation perspective. We also report on their ease of implementation and computational overheads.
Abstract:
We conducted an in-situ X-ray micro-computed tomography heating experiment at the Advanced Photon Source (USA) to dehydrate an unconfined 2.3 mm diameter cylinder of Volterra Gypsum. We used a purpose-built X-ray transparent furnace to heat the sample to 388 K for a total of 310 min to acquire a three-dimensional time-series tomography dataset comprising nine time steps. The voxel size of 2.2 μm³ proved sufficient to pinpoint reaction initiation and the organization of drainage architecture in space and time. We observed that dehydration commences across a narrow front, which propagates from the margins to the centre of the sample in more than four hours. The advance of this front can be fitted with a square-root function, implying that the initiation of the reaction in the sample can be described as a diffusion process. Novel parallelized computer codes allow quantifying the geometry of the porosity and the drainage architecture from the very large tomographic datasets (2048³ voxels) in unprecedented detail. We determined position, volume, shape and orientation of each resolvable pore and tracked these properties over the duration of the experiment. We found that the pore-size distribution follows a power law. Pores tend to be anisotropic but rarely crack-shaped and have a preferred orientation, likely controlled by a pre-existing fabric in the sample. With on-going dehydration, pores coalesce into a single interconnected pore cluster that is connected to the surface of the sample cylinder and provides an effective drainage pathway. Our observations can be summarized in a model in which gypsum is stabilized by thermal expansion stresses and locally increased pore fluid pressures until the dehydration front approaches to within about 100 μm. Then, the internal stresses are released and dehydration happens efficiently, resulting in new pore space. Pressure release, the production of pores and the advance of the front are coupled in a feedback loop.
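The square-root fit of the dehydration front mentioned above is a diffusion-style relation, x(t) ≈ k·√t. A brief illustrative sketch of such a fit follows; the numbers are invented and are not the experimental measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def front_position(t, k):
    """Diffusion-style advance of the reaction front: x(t) = k * sqrt(t)."""
    return k * np.sqrt(t)

# Hypothetical front positions (micrometres) at nine tomography time steps (minutes).
t = np.array([10, 45, 80, 115, 150, 190, 230, 270, 310], dtype=float)
x = np.array([150, 320, 430, 520, 590, 670, 730, 800, 850], dtype=float)

(k,), _ = curve_fit(front_position, t, x)
print(f"fitted k = {k:.1f} um/min^0.5")   # x(t) ~ k*sqrt(t): diffusion-limited advance
```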
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation concerned the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. Some of these features proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, albeit this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available.
Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival.
Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity; and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
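The spectrum kernel used in this work compares sequences through their shared k-mer content; an SVM with a k-spectrum string kernel is equivalent to a linear SVM over k-mer count vectors. A toy, hypothetical sketch of refining binding-site predictions with this idea (made-up sequences and labels, not the E. coli CRP data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy candidate binding-site sequences (1 = true site, 0 = false positive). Made up.
seqs = ["TGTGATCTAGATCACA", "TGTGACGTAGGTCACA", "TTTGATCGAGATCACT", "TGTGATCAAGATCACA",
        "ACGTTGCAGTCCGATA", "GGCATTACGGATTCAG", "CCGTAGGCTAATCGGA", "ATATCGCGCTAGGCAT"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# k-mer "spectrum" features: counts of every length-4 substring. A linear SVM on
# these counts is equivalent to an SVM with the 4-spectrum string kernel.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(4, 4), lowercase=False)
X = vectorizer.fit_transform(seqs)

clf = SVC(kernel="linear").fit(X, labels)
test = ["TGTGATCGAGATCACA", "GCATCGATCGGATACC"]
print(clf.predict(vectorizer.transform(test)))   # first shares many 4-mers with the sites
```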