949 resultados para Statistical genomics
Resumo:
The gross overrepresentation of Indigenous peoples in prison populations suggests that sentencing may be a discriminatory process. Using findings from recent (1991–2011) multivariate statistical sentencing analyses from the United States, Canada, and Australia, we review the 3 key hypotheses advanced as plausible explanations for baseline sentencing discrepancies between Indigenous and non-Indigenous adult criminal defendants: (a) differential involvement, (b) negative discrimination, and (c) positive discrimination. Overall, the prior research shows strong support for the differential involvement thesis and some support for the discrimination theses (positive and negative). We argue that where discrimination is found, it may be explained by the lack of a more complete set of control variables in researchers’ multivariate models and/or differing political and social contexts.
Resumo:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
Resumo:
Certain statistic and scientometric features of articles published in the journal “International Research in Geographical and Environmental Education” are examined in this paper, for the period 1992-2009, by applying nonparametric statistics and Shannon’s entropy (diversity) formula. The main findings of this analysis are: a) after 2004 the research priorities of researchers in geographical and environmental education seem to have changed, b) “teacher education” has been the most recurrent theme throughout these 18 years, followed by “values & attitudes” and “inquiry & problem solving” c) the themes “GIS” and “Sustainability” were the most “stable” throughout the 18 years, meaning that they maintained their ranks as publication priorities more than other themes, d) citations of IRGEE increase annually, e) the average thematic diversity of articles published during the period 1992-2009 is 82.7% of the maximum thematic diversity (very high), meaning that the Journal has the capacity to attract a wide readership for the 10 themes it has successfully covered throughout the 18 years of its publication.
Resumo:
The Clarence-Moreton Basin (CMB) covers approximately 26000 km2 and is the only sub-basin of the Great Artesian Basin (GAB) in which there is flow to both the south-west and the east, although flow to the south-west is predominant. In many parts of the basin, including catchments of the Bremer, Logan and upper Condamine Rivers in southeast Queensland, the Walloon Coal Measures are under exploration for Coal Seam Gas (CSG). In order to assess spatial variations in groundwater flow and hydrochemistry at a basin-wide scale, a 3D hydrogeological model of the Queensland section of the CMB has been developed using GoCAD modelling software. Prior to any large-scale CSG extraction, it is essential to understand the existing hydrochemical character of the different aquifers and to establish any potential linkage. To effectively use the large amount of water chemistry data existing for assessment of hydrochemical evolution within the different lithostratigraphic units, multivariate statistical techniques were employed.
Resumo:
Cancer poses an undeniable burden to the health and wellbeing of the Australian community. In a recent report commissioned by the Australian Institute for Health and Welfare(AIHW, 2010), one in every two Australians on average will be diagnosed with cancer by the age of 85, making cancer the second leading cause of death in 2007, preceded only by cardiovascular disease. Despite modest decreases in standardised combined cancer mortality over the past few decades, in part due to increased funding and access to screening programs, cancer remains a significant economic burden. In 2010, all cancers accounted for an estimated 19% of the country's total burden of disease, equating to approximately $3:8 billion in direct health system costs (Cancer Council Australia, 2011). Furthermore, there remains established socio-economic and other demographic inequalities in cancer incidence and survival, for example, by indigenous status and rurality. Therefore, in the interests of the nation's health and economic management, there is an immediate need to devise data-driven strategies to not only understand the socio-economic drivers of cancer but also facilitate the implementation of cost-effective resource allocation for cancer management...
Resumo:
The selection of optimal camera configurations (camera locations, orientations etc.) for multi-camera networks remains an unsolved problem. Previous approaches largely focus on proposing various objective functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To tackle this, we introduce a statistical formulation of the optimal selection of camera configurations as well as propose a Trans-Dimensional Simulated Annealing (TDSA) algorithm to effectively solve the problem. We compare our approach with a state-of-the-art method based on Binary Integer Programming (BIP) and show that our approach offers similar performance on small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale problems and show that our approach produces better results than 2 alternative heuristics designed to deal with the scalability issue of BIP.
Resumo:
The need for a house rental model in Townsville, Australia is addressed. Models developed for predicting house rental levels are described. An analytical model is built upon a priori selected variables and parameters of rental levels. Regression models are generated to provide a comparison to the analytical model. Issues in model development and performance evaluation are discussed. A comparison of the models indicates that the analytical model performs better than the regression models.
Resumo:
Vacuum circuit breaker (VCB) overvoltage failure and its catastrophic failures during shunt reactor switching have been analyzed through computer simulations for multiple reignitions with a statistical VCB model found in the literature. However, a systematic review (SR) that is related to the multiple reignitions with a statistical VCB model does not yet exist. Therefore, this paper aims to analyze and explore the multiple reignitions with a statistical VCB model. It examines the salient points, research gaps and limitations of the multiple reignition phenomenon to assist with future investigations following the SR search. Based on the SR results, seven issues and two approaches to enhance the current statistical VCB model are identified. These results will be useful as an input to improve the computer modeling accuracy as well as the development of a reignition switch model with point-on-wave controlled switching for condition monitoring
Resumo:
Matched case–control research designs can be useful because matching can increase power due to reduced variability between subjects. However, inappropriate statistical analysis of matched data could result in a change in the strength of association between the dependent and independent variables or a change in the significance of the findings. We sought to ascertain whether matched case–control studies published in the nursing literature utilized appropriate statistical analyses. Of 41 articles identified that met the inclusion criteria, 31 (76%) used an inappropriate statistical test for comparing data derived from case subjects and their matched controls. In response to this finding, we developed an algorithm to support decision-making regarding statistical tests for matched case–control studies.
Resumo:
When a community already torn by an event such as a prolonged war, is then hit by a natural disaster, the negative impact of this subsequent disaster in the longer term can be extremely devastating. Natural disasters further damage already destabilised and demoralised communities, making it much harder for them to be resilient and recover. Communities often face enormous challenges during the immediate recovery and the subsequent long term reconstruction periods, mainly due to the lack of a viable community involvement process. In post-war settings, affected communities, including those internally displaced, are often conceived as being completely disabled and are hardly ever consulted when reconstruction projects are being instigated. This lack of community involvement often leads to poor project planning, decreased community support, and an unsustainable completed project. The impact of war, coupled with the tensions created by the uninhabitable and poor housing provision, often hinders the affected residents from integrating permanently into their home communities. This paper outlines a number of fundamental factors that act as barriers to community participation related to natural disasters in post-war settings. The paper is based on a statistical analysis of, and findings from, a questionnaire survey administered in early 2012 in Afghanistan.
Resumo:
Background: Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult owing to species biology and behavioural characteristics. The design of robust sampling programmes should be based on an underlying statistical distribution that is sufficiently flexible to capture variations in the spatial distribution of the target species. Results: Comparisons are made of the accuracy of four probability-of-detection sampling models - the negative binomial model,1 the Poisson model,1 the double logarithmic model2 and the compound model3 - for detection of insects over a broad range of insect densities. Although the double log and negative binomial models performed well under specific conditions, it is shown that, of the four models examined, the compound model performed the best over a broad range of insect spatial distributions and densities. In particular, this model predicted well the number of samples required when insect density was high and clumped within experimental storages. Conclusions: This paper reinforces the need for effective sampling programs designed to detect insects over a broad range of spatial distributions. The compound model is robust over a broad range of insect densities and leads to substantial improvement in detection probabilities within highly variable systems such as grain storage.
Resumo:
Operational modal analysis (OMA) is prevalent in modal identifi cation of civil structures. It asks for response measurements of the underlying structure under ambient loads. A valid OMA method requires the excitation be white noise in time and space. Although there are numerous applications of OMA in the literature, few have investigated the statistical distribution of a measurement and the infl uence of such randomness to modal identifi cation. This research has attempted modifi ed kurtosis to evaluate the statistical distribution of raw measurement data. In addition, a windowing strategy employing this index has been proposed to select quality datasets. In order to demonstrate how the data selection strategy works, the ambient vibration measurements of a laboratory bridge model and a real cable-stayed bridge have been respectively considered. The analysis incorporated with frequency domain decomposition (FDD) as the target OMA approach for modal identifi cation. The modal identifi cation results using the data segments with different randomness have been compared. The discrepancy in FDD spectra of the results indicates that, in order to fulfi l the assumption of an OMA method, special care shall be taken in processing a long vibration measurement data. The proposed data selection strategy is easy-to-apply and verifi ed effective in modal analysis.
Resumo:
This thesis explored the development of statistical methods to support the monitoring and improvement in quality of treatment delivered to patients undergoing coronary angioplasty procedures. To achieve this goal, a suite of outcome measures was identified to characterise performance of the service, statistical tools were developed to monitor the various indicators and measures to strengthen governance processes were implemented and validated. Although this work focused on pursuit of these aims in the context of a an angioplasty service located at a single clinical site, development of the tools and techniques was undertaken mindful of the potential application to other clinical specialties and a wider, potentially national, scope.
Resumo:
Cardiomyopathies represent a group of diseases of the myocardium of the heart and include diseases both primarily of the cardiac muscle and systemic diseases leading to adverse effects on the heart muscle size, shape, and function. Traditionally cardiomyopathies were defined according to phenotypical appearance. Now, as our understanding of the pathophysiology of the different entities classified under each of the different phenotypes improves and our knowledge of the molecular and genetic basis for these entities progresses, the traditional classifications seem oversimplistic and do not reflect current understanding of this myriad of diseases and disease processes. Although our knowledge of the exact basis of many of the disease processes of cardiomyopathies is still in its infancy, it is important to have a classification system that has the ability to incorporate the coming tide of molecular and genetic information. This paper discusses how the traditional classification of cardiomyopathies based on morphology has evolved due to rapid advances in our understanding of the genetic and molecular basis for many of these clinical entities.
Resumo:
Nitrous oxide emissions from soil are known to be spatially and temporally volatile. Reliable estimation of emissions over a given time and space depends on measuring with sufficient intensity but deciding on the number of measuring stations and the frequency of observation can be vexing. The question of low frequency manual observations providing comparable results to high frequency automated sampling also arises. Data collected from a replicated field experiment was intensively studied with the intention to give some statistically robust guidance on these issues. The experiment had nitrous oxide soil to air flux monitored within 10 m by 2.5 m plots by automated closed chambers under a 3 h average sampling interval and by manual static chambers under a three day average sampling interval over sixty days. Observed trends in flux over time by the static chambers were mostly within the auto chamber bounds of experimental error. Cumulated nitrous oxide emissions as measured by each system were also within error bounds. Under the temporal response pattern in this experiment, no significant loss of information was observed after culling the data to simulate results under various low frequency scenarios. Within the confines of this experiment observations from the manual chambers were not spatially correlated above distances of 1 m. Statistical power was therefore found to improve due to increased replicates per treatment or chambers per replicate. Careful after action review of experimental data can deliver savings for future work.