35 results for Core data set

in Queensland University of Technology - ePrints Archive


Relevance: 100.00%

Abstract:

A variety of sustainable development research efforts and related activities are attempting to reconcile conservation of our natural resources with economic motivation while also improving social equity and quality of life. Land use/land cover change, occurring on a global scale, is an aggregate of local land use decisions and profoundly impacts our environment. It is therefore the local decision-making process that should be the eventual target of many of the ongoing data collection and research efforts that strive toward supporting a sustainable future. Satellite imagery is a primary source upon which to build a core data set for use by researchers in analyzing this global change. A process is necessary to link global change research, utilizing satellite imagery, to the local land use decision-making process. One example is the NASA-sponsored Regional Data Center (RDC) prototype, an attempt to integrate science and technology at the community level. The anticipated result of this complex interaction between the research and decision-making communities will be realized in the form of long-term benefits to the public.

Relevance: 100.00%

Abstract:

OBJECTIVE: Corneal confocal microscopy is a novel diagnostic technique for the detection of nerve damage and repair in a range of peripheral neuropathies, in particular diabetic neuropathy. Normative reference values are required to enable clinical translation and wider use of this technique. We have therefore undertaken a multicenter collaboration to provide worldwide age-adjusted normative values of corneal nerve fiber parameters.

RESEARCH DESIGN AND METHODS: A total of 1,965 corneal nerve images from 343 healthy volunteers were pooled from six clinical academic centers. All subjects underwent examination with the Heidelberg Retina Tomograph corneal confocal microscope. Images of the central corneal subbasal nerve plexus were acquired by each center using a standard protocol and analyzed by three trained examiners using manual tracing and semiautomated software (CCMetrics). Age trends were established using simple linear regression, and normative corneal nerve fiber density (CNFD), corneal nerve fiber branch density (CNBD), corneal nerve fiber length (CNFL), and corneal nerve fiber tortuosity (CNFT) reference values were calculated using quantile regression analysis.

RESULTS: There was a significant linear age-dependent decrease in CNFD (-0.164 no./mm² per year for men, P < 0.01, and -0.161 no./mm² per year for women, P < 0.01). There was no change with age in CNBD (0.192 no./mm² per year for men, P = 0.26, and -0.050 no./mm² per year for women, P = 0.78). CNFL decreased in men (-0.045 mm/mm² per year, P = 0.07) and women (-0.060 mm/mm² per year, P = 0.02). CNFT increased with age in men (0.044 per year, P < 0.01) and women (0.046 per year, P < 0.01). Height, weight, and BMI did not influence the 5th percentile normative values for any corneal nerve parameter.

CONCLUSIONS: This study provides robust worldwide normative reference values for corneal nerve parameters to be used in research and clinical practice in the study of diabetic and other peripheral neuropathies.
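As a rough illustration of the two-step approach described above (an ordinary least squares fit for the age trend, then quantile regression for a percentile cut-off), here is a minimal sketch on simulated data; the column names and numbers are assumptions, not values from the study:

```python
# Minimal sketch: age trend by linear regression, then a 5th-percentile
# normative cut-off by quantile regression. Data are simulated; the
# column names ("age", "cnfd") are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 80, 300)})
df["cnfd"] = 35 - 0.16 * df["age"] + rng.normal(0, 4, 300)

# Simple linear regression for the age trend (slope per year)
trend = smf.ols("cnfd ~ age", data=df).fit()
print("slope per year:", trend.params["age"])

# Quantile regression for the age-adjusted 5th-percentile reference value
q05 = smf.quantreg("cnfd ~ age", data=df).fit(q=0.05)
print("5th percentile at age 50:",
      q05.params["Intercept"] + q05.params["age"] * 50)
```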

Relevance: 100.00%

Abstract:

Mass flows on volcanic islands generated by volcanic lava dome collapse and by larger-volume flank collapse can be highly dangerous locally and may generate tsunamis that threaten a wider area. It is therefore important to understand their frequency, emplacement dynamics, and relationship to volcanic eruption cycles. The best record of mass flow on volcanic islands may be found offshore, where most material is deposited and where intervening hemipelagic sediment aids dating. Here we analyze what is arguably the most comprehensive sediment core data set collected offshore from a volcanic island. The cores are located southeast of Montserrat, on which the Soufriere Hills volcano has been erupting since 1995. The cores provide a record of mass flow events during the last 110 thousand years. Older mass flow deposits differ significantly from those generated by the repeated lava dome collapses observed since 1995. The oldest mass flow deposit originated through collapse of the basaltic South Soufriere Hills at 103-110 ka, some 20-30 ka after eruptions formed this volcanic center. A ∼1.8 km³ blocky debris avalanche deposit that extends from a chute in the island shelf records a particularly deep-seated failure. It likely formed from a collapse of almost equal amounts of volcanic edifice and coeval carbonate shelf, emplacing a mixed bioclastic-andesitic turbidite in a complex series of stages. This study illustrates how volcanic island growth and collapse involved extensive, large-volume submarine mass flows with highly variable composition. Runout turbidites indicate that mass flows are emplaced either in multiple stages or as single events.

Relevance: 100.00%

Abstract:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a 'noisy' environment such as contemporary social media, is to collect the pertinent information, be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or information giving an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing over time, and both collection and analytic methodologies need to be continually adapted in response. While many of the data sets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders.

The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting those which need to be immediately placed in front of responders for manual filtering and possible action. The paper suggests two solutions: content analysis and user profiling. In the former, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information, whilst the latter attempts to identify users who are either serving as amplifiers of information or are known as authoritative sources. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection.

The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and sportswomen create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form, and emotions which has the potential to impact future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, like American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of them. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate.

The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis. A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically of tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement's presence on Twitter changes over time. We also discuss the opportunities and methods for extracting smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study.

The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
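A minimal sketch of the content-analysis and user-profiling filtering idea described for the first paper: score incoming tweets for topical relevance and urgency, boost known authoritative sources, and keep only as many as responders can handle. The keyword lists, weights, and account names are illustrative assumptions, not from the panel:

```python
# Sketch: triage a tweet stream down to responder capacity by scoring
# topical relevance, urgency, and source authority (all weights assumed).
from dataclasses import dataclass

TOPIC_TERMS = {"flood": 2.0, "evacuation": 2.0, "#qldflood": 3.0}
URGENCY_TERMS = {"help": 2.0, "trapped": 3.0, "urgent": 2.0}

@dataclass
class Tweet:
    user: str
    text: str

def score(tweet: Tweet, authoritative: set[str]) -> float:
    words = tweet.text.lower().split()
    s = sum(TOPIC_TERMS.get(w, 0.0) for w in words)       # content analysis
    s += sum(URGENCY_TERMS.get(w, 0.0) for w in words)
    if tweet.user in authoritative:                        # user profiling
        s += 5.0
    return s

def triage(tweets: list[Tweet], capacity: int, authoritative: set[str]) -> list[Tweet]:
    """Filter a large stream down to the responders' expected capacity."""
    ranked = sorted(tweets, key=lambda t: score(t, authoritative), reverse=True)
    return ranked[:capacity]

tweets = [Tweet("qps_media", "urgent evacuation order for low-lying areas"),
          Tweet("someone", "nice weather today")]
print([t.text for t in triage(tweets, capacity=1, authoritative={"qps_media"})])
```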

Relevance: 90.00%

Abstract:

1. Ecological data sets often use clustered measurements or repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations.
2. Three methods for choosing the covariance structure are the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and a data set that explored the effects of forest fragmentation on avian species richness over 15 years.
3. The overall success rate was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study, the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct.
4. We recommend using the DIC for selecting the correct covariance structure.
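As a rough sketch of one of the three criteria discussed (QIC for GEE models), the following compares working covariance structures on simulated site-by-year data; it assumes a recent statsmodels release, and the variable names and data are illustrative, not the study's:

```python
# Sketch: compare working covariance structures for clustered longitudinal
# data via QIC (lower is preferred). Data and names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.genmod.cov_struct import Independence, Exchangeable, Autoregressive

rng = np.random.default_rng(1)
n_sites, n_years = 30, 15
df = pd.DataFrame({
    "site": np.repeat(np.arange(n_sites), n_years),
    "year": np.tile(np.arange(n_years), n_sites),
})
df["richness"] = 20 - 0.3 * df["year"] + rng.normal(0, 2, len(df))

X = sm.add_constant(df[["year"]])
for cov in (Independence(), Exchangeable(), Autoregressive()):
    fit = sm.GEE(df["richness"], X, groups=df["site"],
                 time=df["year"], cov_struct=cov).fit()
    qic, _ = fit.qic()   # (QIC, QICu)
    print(type(cov).__name__, round(qic, 1))
```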

Relevance: 90.00%

Abstract:

Longitudinal panel studies of large, random samples of business start-ups captured at the pre-operational stage allow researchers to address core issues for entrepreneurship research, namely, the processes of creation of new business ventures as well as their antecedents and outcomes. Here, we perform a methods-orientated review of all 83 journal articles that have used this type of data set, our purpose being to assist users of current data sets as well as designers of new projects in making the best use of this innovative research approach. Our review reveals a number of methods issues that are largely particular to this type of research. We conclude that amidst exemplary contributions, much of the reviewed research has not adequately managed these methods challenges, nor has it made use of the full potential of this new research approach. Specifically, we identify and suggest remedies for context-specific and interrelated methods challenges relating to sample definition, choice of level of analysis, operationalization and conceptualization, use of longitudinal data and dealing with various types of problematic heterogeneity. In addition, we note that future research can make further strides towards full utilization of the advantages of the research approach through better matching (from either direction) between theories and the phenomena captured in the data, and by addressing some under-explored research questions for which the approach may be particularly fruitful.

Relevance: 90.00%

Abstract:

Background: Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer, shows that the relative contribution of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole-genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next-generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole-genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction.

Results: Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that, despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species: that is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia.

Conclusion: We describe an Australian origin for B. pseudomallei, characterized by a single introduction event into Southeast Asia during a recent glacial period, and variable levels of lateral gene transfer within populations. These patterns provide insights into mechanisms of genetic diversification in B. pseudomallei and its closest relatives, and provide a framework for integrating the traditionally separate fields of population genetics and phylogenetics for other bacterial species with high levels of lateral gene transfer.
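A minimal sketch of the core step of this kind of analysis, calling polymorphic sites from orthologous aligned regions; the input format (a dict of equal-length aligned sequences) and the toy data are assumptions:

```python
# Sketch: find SNP columns in an alignment of shared orthologous regions,
# the kind of matrix used for whole-genome phylogenetic reconstruction.
def snp_columns(alignment: dict[str, str]) -> list[int]:
    """Return 0-based columns that vary across strains (candidate SNPs)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        column.discard("-")            # ignore alignment gaps
        if len(column) > 1:            # polymorphic site
            sites.append(i)
    return sites

aln = {"strain_A": "ACGTAC", "strain_B": "ACGTAT", "strain_C": "ACCTAT"}
print(snp_columns(aln))   # -> [2, 5]
```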

Relevance: 90.00%

Abstract:

Collaboration has been enacted as a core strategy by both the government and non-government sectors to address many of the intractable issues confronting contemporary society. The cult of collaboration has become so pervasive that it is now an elastic term referring generally to any form of 'working together'. The lack of specificity about collaboration and its practice means that it risks being reduced to mere rhetoric without sustained practice or action. Drawing on an extensive data set (qualitative and quantitative) of broadly collaborative endeavours gathered over ten years in Queensland, Australia, this paper aims to open up the black box of collaboration. Specifically, it examines the drivers for collaboration, the dominant structures and mechanisms adopted, what has worked, and unintended consequences. In particular, it investigates the skills and competencies required for an embedded collaborative endeavour within and across organisations. Social network analysis is applied to isolate the structural properties of collaboration from other forms of integration, as well as to highlight key roles and tasks. Collaboration is found to be a distinctive form of working together, characterised by intense and interdependent relationships and exchanges, higher levels of cohesion (density), and requiring new ways of behaving, working, managing and leading. These elements are configured into a practice framework. Developing an empirical evidence base for collaboration structure, practice and strategy provides a useful foundation for theory extension. The paper concludes that for collaboration to be successfully employed as a management strategy, it must move beyond rhetoric and develop a coherent model for action.
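A minimal sketch of the social-network-analysis step mentioned above, comparing cohesion (density) and key roles across two toy networks; the edge lists and labels are illustrative assumptions, not the study's data:

```python
# Sketch: density and degree centrality distinguish a loosely connected
# network from a densely collaborative one. Toy data only.
import networkx as nx

cooperation = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
collaboration = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"),
                          ("B", "C"), ("B", "D"), ("C", "D")])

for name, g in [("cooperation", cooperation), ("collaboration", collaboration)]:
    print(name, "density:", nx.density(g))
    dc = nx.degree_centrality(g)            # highlights key roles
    print("  most central:", max(dc, key=dc.get))
```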

Relevance: 90.00%

Abstract:

A rule-based approach for classifying previously identified medical concepts in clinical free text into assertion categories is presented. There are six categories of assertion for the task: Present, Absent, Possible, Conditional, Hypothetical, and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and ConText algorithms. In addition, the clinical terminology SNOMED CT and other publicly available dictionaries were used to classify assertions which did not fit the NegEx/ConText model. The data for this task include discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Center, as well as discharge summaries and progress notes from the University of Pittsburgh Medical Center. The set consists of 349 discharge reports, each with pairs of ground-truth concept and assertion files, for system development, and 477 reports for evaluation. The system's performance on the evaluation data set was 0.83 recall, 0.83 precision, and 0.83 F1-measure. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.
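A minimal sketch of a NegEx-style rule of the kind the system extends: classify a pre-identified concept as Absent if a negation trigger appears in a short window before it. The trigger list and window size are illustrative assumptions; the real NegEx/ConText algorithms use far richer trigger sets and scope rules:

```python
# Sketch: window-based negation rule for assertion classification.
import re

NEGATION_TRIGGERS = ["no", "denies", "without", "ruled out"]
WINDOW = 5  # tokens to look back from the concept

def assert_concept(text: str, concept: str) -> str:
    tokens = re.findall(r"[a-z]+", text.lower())
    ctoks = concept.lower().split()
    for i in range(len(tokens) - len(ctoks) + 1):
        if tokens[i:i + len(ctoks)] == ctoks:
            window = " " + " ".join(tokens[max(0, i - WINDOW):i]) + " "
            if any(f" {trig} " in window for trig in NEGATION_TRIGGERS):
                return "Absent"
            return "Present"
    return "Not found"

print(assert_concept("The patient denies chest pain.", "chest pain"))  # Absent
print(assert_concept("Patient reports chest pain.", "chest pain"))     # Present
```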

Relevance: 90.00%

Abstract:

The National Road Safety Strategy 2011-2020 outlines plans to reduce the burden of road trauma via improvements and interventions relating to safe roads, safe speeds, safe vehicles, and safe people. It also highlights that a key aspect in achieving these goals is the availability of comprehensive data on the issue. Data are essential both for in-depth epidemiologic studies of risk and for effective evaluation of road safety interventions and programs. Before using data to evaluate the efficacy of prevention programs, it is important to systematically evaluate the quality of the underlying data sources, to ensure that any trends identified reflect true estimates rather than spurious data effects. However, there has been little scientific work specifically focused on establishing core data quality characteristics pertinent to the road safety field, and limited work undertaken to develop methods for evaluating data sources against such characteristics. There are a variety of data sources in which traffic-related incidents and resulting injuries are recorded, each collected for its own defined purpose: police reports, transport safety databases, emergency department data, hospital morbidity data and mortality data, to name a few. Because these data are collected for specific purposes, each source has limitations when seeking to gain a complete picture of the problem, including delays in data availability, a lack of accurate and/or specific location information, and underreporting of crashes involving particular road user groups such as cyclists. This paper proposes core data quality characteristics that could be used to systematically assess road crash data sources, providing a standardised approach for evaluating data quality in the road safety field. The potential for data linkage to qualitatively and quantitatively improve the quality and comprehensiveness of road crash data is also discussed.

Relevance: 90.00%

Abstract:

Data mining techniques extract repeated and useful patterns from a large data set, which in turn are used to predict the outcome of future events. The main purpose of the research presented in this paper is to investigate data mining strategies and develop an efficient framework for multi-attribute project information analysis to predict the performance of construction projects. The research team first reviewed existing data mining algorithms, applied them to systematically analyze a large project data set collected by survey, and finally proposed a data-mining-based decision support framework for project performance prediction. To evaluate the potential of the framework, a case study was conducted using data collected from 139 capital projects, analyzing the relationship between the use of information technology and project cost performance. The results showed that the proposed framework has the potential to promote fast, easy-to-use, interpretable, and accurate project data analysis.
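A minimal sketch of the prediction step in a framework like the one described: learn an interpretable model relating project attributes to cost performance. The features, data, and the decision-tree choice are illustrative assumptions, not the paper's method:

```python
# Sketch: interpretable classifier for project cost performance from
# multi-attribute project data. All data and feature names are assumed.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
n = 139
it_use = rng.uniform(0, 1, n)        # degree of IT use on the project
size = rng.uniform(1, 100, n)        # project size, $M (assumption)
X = np.column_stack([it_use, size])
# Toy rule for the label: high IT use tends to mean good cost performance
y = (it_use + rng.normal(0, 0.2, n) > 0.5).astype(int)

model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(model, feature_names=["it_use", "size"]))
```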

Relevance: 90.00%

Abstract:

Background: Bactrocera dorsalis s.s. is a pestiferous tephritid fruit fly distributed from Pakistan to the Pacific, with the Thai/Malay peninsula its southern limit. The sister pest taxa B. papayae and B. philippinensis occur in the southeast Asian archipelago and the Philippines, respectively. The relationship among these species is unclear due to their high molecular and morphological similarity. This study analysed the population structure of these three species within a southeast Asian biogeographical context to assess potential dispersal patterns and the validity of their current taxonomic status.

Results: Geometric morphometric results generated from 15 landmarks for the wings of 169 flies revealed significant differences in wing shape between almost all sites following canonical variate analysis. For the combined data set there was a greater isolation-by-distance (IBD) effect under a 'non-Euclidean' scenario, which used geographical distances within a biogeographical 'Sundaland context' (r² = 0.772, P < 0.0001), than under a 'Euclidean' scenario using direct geographic distances between sample sites (r² = 0.217, P < 0.01). COI sequence data were obtained for 156 individuals and yielded 83 unique haplotypes with no correlation to current taxonomic designations via a minimum spanning network. BEAST analysis provided a root age and location of 540 kya in northern Thailand, with migration of B. dorsalis s.l. into Malaysia 470 kya and Sumatra 270 kya. Two migration events into the Philippines are inferred. Sequence data revealed a weak but significant IBD effect under the 'non-Euclidean' scenario (r² = 0.110, P < 0.05), with no historical migration evident between Taiwan and the Philippines. Results are consistent with those expected at the intra-specific level.

Conclusions: Bactrocera dorsalis s.s., B. papayae and B. philippinensis likely represent one species structured around the South China Sea, having migrated from northern Thailand into the southeast Asian archipelago and across into the Philippines. No migration is apparent between the Philippines and Taiwan. This information has implications for quarantine, trade and pest management.
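Isolation-by-distance effects like those reported above are typically tested by correlating a genetic distance matrix with a geographic one and assessing significance by permutation (a Mantel test). A minimal sketch, with toy 3x3 matrices standing in for the real distance data:

```python
# Sketch: Mantel test for isolation by distance, with a permutation P-value.
# Matrices here are toy data, not the study's distances.
import numpy as np

def mantel(a: np.ndarray, b: np.ndarray, n_perm: int = 999, seed: int = 0):
    idx = np.triu_indices_from(a, k=1)      # upper-triangle pairs
    x, y = a[idx], b[idx]
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    n = a.shape[0]
    for _ in range(n_perm):                 # permute one matrix's labels
        p = rng.permutation(n)
        y_perm = b[np.ix_(p, p)][idx]
        if abs(np.corrcoef(x, y_perm)[0, 1]) >= abs(r):
            count += 1
    return r, (count + 1) / (n_perm + 1)

geo = np.array([[0, 1, 4], [1, 0, 3], [4, 3, 0]], float)
gen = np.array([[0, 2, 7], [2, 0, 5], [7, 5, 0]], float)
r, p = mantel(geo, gen)
print(f"r^2 = {r**2:.3f}, P = {p:.3f}")
```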

Relevance: 90.00%

Abstract:

Objective: Radiation safety principles dictate that imaging procedures should minimise the radiation risks involved without compromising diagnostic performance. This study aims to define a core set of views that maximises clinical information yield for minimum radiation risk; angiographers would supplement these views as clinically indicated.

Methods: An algorithm was developed to combine published data detailing the quality of information derived for the major coronary artery segments through the use of a common set of views in angiography with data relating to the dose–area product and scatter radiation associated with these views.

Results: The optimum view set for the left coronary system comprised four views: left anterior oblique (LAO) with cranial (Cr) tilt, shallow right anterior oblique (AP-RAO) with caudal (Ca) tilt, RAO with Ca tilt, and AP-RAO with Cr tilt. For the right coronary system three views were identified: LAO with Cr tilt, RAO, and AP-RAO with Cr tilt. An alternative left coronary view set including a left lateral achieved minimally superior efficiency (~5%), but with an ~8% higher radiation dose to the patient and a 40% higher cardiologist dose.

Conclusion: This algorithm identifies a core set of angiographic views that optimises information yield and minimises radiation risk. This basic data set would be supplemented by additional clinically determined views selected by the angiographer for each case. The decision to use additional views for diagnostic angiography and interventions would be assisted by referencing a table of relative radiation doses for the views being considered.
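A minimal sketch of the optimisation idea described, ranking candidate views by information yield per unit dose; the view names, information scores, and dose figures below are illustrative assumptions, not the study's published data:

```python
# Sketch: greedy selection of angiographic views by information-to-dose
# ratio. All numbers are toy values for illustration.
def pick_views(info: dict[str, float], dose: dict[str, float], k: int) -> list[str]:
    """Select the k views with the best information-per-dose ratio."""
    ranked = sorted(info, key=lambda v: info[v] / dose[v], reverse=True)
    return ranked[:k]

info = {"LAO-Cr": 9.0, "AP-RAO-Ca": 8.5, "RAO-Ca": 8.0, "AP-RAO-Cr": 7.5, "LL": 7.8}
dose = {"LAO-Cr": 1.4, "AP-RAO-Ca": 1.0, "RAO-Ca": 1.1, "AP-RAO-Cr": 1.0, "LL": 2.0}
print(pick_views(info, dose, k=4))  # left lateral drops out on dose grounds
```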

Relevance: 90.00%

Abstract:

Exponential growth of genomic data over the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through transcriptional regulatory network (TRN) structures, which model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics and, when used in conjunction with comparative genomics, have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we explored the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of inferred transcriptional regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Of chief interest was the relationship between transcription factors, grouped by regulatory role, and corresponding promoter strength. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. The t-tests assessed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggests support for our (alternative) hypothesis, albeit this trend may only be present for promoters whose corresponding TFBSs are either all repressors or all activators. Although the observations were specific to σ70, such suggestive results strongly encourage additional investigation when more experimentally confirmed data become available.
Much of the remainder of the thesis concerns a machine learning study of binding site prediction using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages of the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance, in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains, which revealed interesting strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation, due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees are constructed from the number of similar regulatory interactions. While common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system: the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships and increased confidence in the predicted regulatory interactions. In the present study, we distinguish between relationships found across the full set of genomes, the 'core regulatory set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival.
Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
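A minimal sketch of the k-spectrum kernel used in the SVM work above: map each sequence to its vector of k-mer counts and take the inner product. The sequences are toy data and k = 3 is an illustrative choice:

```python
# Sketch: k-spectrum kernel between two DNA sequences.
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1: str, s2: str, k: int = 3) -> int:
    """Inner product of the two sequences' k-mer count vectors."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    return sum(c1[m] * c2[m] for m in c1)

a = "ACGTACGT"
b = "ACGTTGCA"
print(spectrum_kernel(a, a), spectrum_kernel(a, b))  # -> 10 4
```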