98 results for Data type converter
in Queensland University of Technology - ePrints Archive
Abstract:
Data in germplasm collections contain a mixture of data types; binary, multistate and quantitative. Given the multivariate nature of these data, the pattern analysis methods of classification and ordination have been identified as suitable techniques for statistically evaluating the available diversity. The proximity (or resemblance) measure, which is in part the basis of the complementary nature of classification and ordination techniques, is often specific to particular data types. The use of a combined resemblance matrix has an advantage over data type specific proximity measures. This measure accommodates the different data types without manipulating them to be of a specific type. Descriptors are partitioned into their data types and an appropriate proximity measure is used on each. The separate proximity matrices, after range standardisation, are added as a weighted average and the combined resemblance matrix is then used for classification and ordination. Germplasm evaluation data for 831 accessions of groundnut (Arachis hypogaea L.) from the Australian Tropical Field Crops Genetic Resource Centre, Biloela, Queensland were examined. Data for four binary, five ordered multistate and seven quantitative descriptors have been documented. The interpretative value of different weightings - equal and unequal weighting of data types to obtain a combined resemblance matrix - was investigated by using principal co-ordinate analysis (ordination) and hierarchical cluster analysis. Equal weighting of data types was found to be more valuable for these data as the results provided a greater insight into the patterns of variability available in the Australian groundnut germplasm collection. The complementary nature of pattern analysis techniques enables plant breeders to identify relevant accessions in relation to the descriptors which distinguish amongst them. 
This additional information may provide plant breeders with a more defined entry point into the germplasm collection for identifying sources of variability for their plant improvement program, thus improving the utilisation of germplasm resources.
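The combined resemblance matrix described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the proximity measure used for each data type, so simple mismatch (binary) and Manhattan distance (ordered multistate and quantitative) are assumptions, as are the three tiny hypothetical accessions and every function name.

```python
from itertools import combinations


def distance_matrix(rows, dist):
    """Build a symmetric distance matrix for one block of descriptors."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(rows[i], rows[j])
    return matrix


def simple_mismatch(a, b):
    """Proportion of binary descriptors on which two accessions differ (assumed measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def manhattan(a, b):
    """Sum of absolute differences (assumed measure for ordered and quantitative data)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def range_standardise(matrix):
    """Rescale all entries of a matrix to the interval [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def combine(matrices, weights):
    """Weighted average of range-standardised proximity matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical descriptors for three accessions, partitioned by data type.
binary = [(1, 0), (1, 1), (0, 1)]
ordered = [(1,), (3,), (2,)]
quantitative = [(10.0,), (20.0,), (40.0,)]

blocks = [
    range_standardise(distance_matrix(binary, simple_mismatch)),
    range_standardise(distance_matrix(ordered, manhattan)),
    range_standardise(distance_matrix(quantitative, manhattan)),
]
combined = combine(blocks, weights=[1, 1, 1])  # equal weighting of data types
```

Changing `weights` to unequal values gives the alternative weighting that the study compares against; the resulting matrix would then be passed to an ordination or clustering routine.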
Abstract:
Background Traffic offences have been considered an important predictor of crash involvement, and have often been used as a proxy safety variable for crashes. However, the association between crashes and offences has never been meta-analysed and the population effect size never established. Research is yet to determine the extent to which this relationship may be spuriously inflated through systematic measurement error, with obvious implications for researchers endeavouring to accurately identify salient factors predictive of crashes. Methodology and Principal Findings Studies yielding a correlation between crashes and traffic offences were collated and a meta-analysis of 144 effects drawn from 99 road safety studies was conducted. The potential impact of factors such as age, time period, crash and offence rates, crash severity and data type, sourced from either self-report surveys or archival records, was considered and discussed. After weighting for sample size, an average correlation of r = .18 was observed over the mean time period of 3.2 years. Evidence emerged suggesting the strength of this correlation is decreasing over time. Stronger correlations between crashes and offences were generally found in studies involving younger drivers. Consistent with common method variance effects, a within-country analysis found stronger effect sizes in self-reported data even after controlling for crash mean. Significance The effectiveness of traffic offences as a proxy for crashes may be limited. Inclusion of elements such as independently validated crash and offence histories or accurate measures of exposure to the road would facilitate a better understanding of the factors that influence crash involvement.
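To make "weighting for sample size" concrete, the sketch below computes a sample-size-weighted mean correlation. It is an assumption that a plain weighted mean of r values was used; the abstract does not say whether, for example, a Fisher z transformation was applied. The two (r, n) pairs are hypothetical and are not taken from the 99 studies.

```python
def weighted_mean_correlation(effects):
    """Average correlations, weighting each effect by its sample size.

    `effects` is a list of (r, n) pairs: a correlation and the sample
    size of the study that produced it.
    """
    total_n = sum(n for _, n in effects)
    if total_n == 0:
        raise ValueError("at least one effect with a positive sample size is required")
    return sum(r * n for r, n in effects) / total_n


# Hypothetical effects: a small study with r = .10 and a larger one with r = .20.
example_effects = [(0.10, 100), (0.20, 300)]
pooled_r = weighted_mean_correlation(example_effects)
```

The larger study pulls the pooled value toward its own estimate, which is the point of weighting by sample size.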
Abstract:
AIM: To draw on empirical evidence to illustrate the core role of nurse practitioners in Australia and New Zealand. BACKGROUND: Enacted legislation provides for mutual recognition of qualifications, including nursing, between New Zealand and Australia. As the nurse practitioner role is relatively new in both countries, there is no consistency in role expectation and hence mutual recognition has not yet been applied to nurse practitioners. A study jointly commissioned by both countries' Regulatory Boards developed information on the core role of the nurse practitioner, to develop shared competency and educational standards. Reporting on this study's process and outcomes provides insights that are relevant both locally and internationally. METHOD: This interpretive study used multiple data sources, including published and grey literature, policy documents, nurse practitioner program curricula and interviews with 15 nurse practitioners from the two countries. Data were analysed according to the appropriate standard for each data type and included both deductive and inductive methods. The data were aggregated thematically according to patterns within and across the interview and material data. FINDINGS: The core role of the nurse practitioner was identified as having three components: dynamic practice, professional efficacy and clinical leadership. Nurse practitioner practice is dynamic and involves the application of high level clinical knowledge and skills in a wide range of contexts. The nurse practitioner demonstrates professional efficacy, enhanced by an extended range of autonomy that includes legislated privileges. The nurse practitioner is a clinical leader with a readiness and an obligation to advocate for their client base and their profession at the systems level of health care. 
CONCLUSION: A clearly articulated and research informed description of the core role of the nurse practitioner provides the basis for development of educational and practice competency standards. These research findings provide new perspectives to inform the international debate about this extended level of nursing practice. RELEVANCE TO CLINICAL PRACTICE: The findings from this research have the potential to achieve a standardised approach and internationally consistent nomenclature for the nurse practitioner role.
Abstract:
We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on different type and quality of data by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those contained in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations of training size, data type and quality in presence of sufficient training data.
Abstract:
The mitochondrial (mt) genome is, to date, the most extensively studied genomic system in insects, outnumbering nuclear genomes tenfold and representing all orders versus very few. Phylogenomic analysis methods have been tested extensively, identifying compositional bias and rate variation, both within and between lineages, as the principal issues confronting accurate analyses. Major studies at both inter- and intraordinal levels have contributed to our understanding of phylogenetic relationships within many groups. Genome rearrangements are an additional data type for defining relationships, with rearrangement synapomorphies identified across multiple orders and at many different taxonomic levels. Hymenoptera and Psocodea have greatly elevated rates of rearrangement offering both opportunities and pitfalls for identifying rearrangement synapomorphies in each group. Finally, insects are model systems for studying aberrant mt genomes, including truncated tRNAs and multichromosomal genomes. Greater integration of nuclear and mt genomic studies is necessary to further our understanding of insect genomic evolution.
Abstract:
Objective Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and Materials The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and their quality, with one of the datasets containing optical character recognition errors. Results Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion Findings show that Anonym compares to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations of training size, data type and quality in presence of sufficient training data.
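The recall and precision figures quoted above can be read through the following minimal sketch. It assumes exact matching of identifier spans, which may differ from the scoring rules of the i2b2 challenge; the function name and the example spans are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall for de-identification, using exact span matches.

    `predicted` and `gold` are collections of (start, end) character spans.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans: the system finds two of four identifiers plus one false alarm.
system_spans = [(0, 4), (10, 15), (20, 25)]
annotated_spans = [(0, 4), (10, 15), (30, 35), (40, 45)]
precision, recall = precision_recall(system_spans, annotated_spans)
```

Recall measures how many annotated identifiers were removed, while precision measures how much of what was removed really was an identifier.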
Abstract:
High-Order Co-Clustering (HOCC) methods have attracted considerable attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects, as well as object affinity information, i.e., intra-type relationships amongst the same types of objects. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created based on two types of intra-type relationships learnt using a p-nearest-neighbour graph and multiple subspace learning. Moreover, to ensure the robustness of the clustering process, we introduce a sparse error matrix into matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over the state-of-the-art HOCC methods in terms of FScore and NMI.
Abstract:
Background Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. Methods We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Results Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Conclusions Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
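The idea of choosing an imputation method by cross-validation can be illustrated with a small leave-one-out sketch. It does not implement the multivariate normal or conditional autoregressive models compared in the study; median imputation is used here purely as a stand-in competitor to mean imputation, and the data values and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def loo_rmse(observed, impute):
    """Leave-one-out error: hide each known value, impute it from the rest."""
    squared_errors = []
    for i, actual in enumerate(observed):
        rest = observed[:i] + observed[i + 1:]
        squared_errors.append((impute(rest) - actual) ** 2)
    return sqrt(mean(squared_errors))


def select_imputer(observed, imputers):
    """Return the name of the imputer with the lowest leave-one-out RMSE."""
    scores = {name: loo_rmse(observed, fn) for name, fn in imputers.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical covariate values for areas where the survey response is known.
known_values = [2.0, 4.0, 6.0, 8.0, 30.0]
best_method, rmse_by_method = select_imputer(
    known_values, {"mean": mean, "median": median}
)
```

The same loop would work with more elaborate imputers, provided each one can predict a hidden value from the remaining data.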
Abstract:
Background Several prospective studies have suggested that gait and plantar pressure abnormalities secondary to diabetic peripheral neuropathy contribute to foot ulceration. There are many different methods by which gait and plantar pressures are assessed and currently there is no agreed standardised approach. This study aimed to describe the methods and reproducibility of three-dimensional gait and plantar pressure assessments in a small subset of participants using pre-existing protocols. Methods Fourteen participants were selected by convenience sampling prior to a planned longitudinal study: four patients with diabetes and plantar foot ulcers, five patients with diabetes but no foot ulcers and five healthy controls. The repeatability of measuring key biomechanical data was assessed, including the identification of 16 key anatomical landmarks, the measurement of seven leg dimensions, the processing of 22 three-dimensional gait parameters and the analysis of four different plantar pressure measures at 20 foot regions. Results The mean inter-observer differences were within the pre-defined acceptable level (<7 mm) for 100 % (16 of 16) of key anatomical landmarks measured for gait analysis. The intra-observer assessment concordance correlation coefficients were > 0.9 for 100 % (7 of 7) of leg dimensions. The coefficients of variation (CVs) were within the pre-defined acceptable level (<10 %) for 100 % (22 of 22) of gait parameters. The CVs were within the pre-defined acceptable level (<30 %) for 95 % (19 of 20) of the contact area measures, 85 % (17 of 20) of mean plantar pressures, 70 % (14 of 20) of pressure time integrals and 55 % (11 of 20) of maximum sensor plantar pressure measures. Conclusion Overall, the findings of this study suggest that important gait and plantar pressure measurements can be reliably acquired. Nearly all measures contributing to three-dimensional gait parameter assessments were within predefined acceptable limits.
Most plantar pressure measurements were also within predefined acceptable limits; however, reproducibility was not as good for assessment of the maximum sensor pressure. To our knowledge, this is the first study to investigate the reproducibility of several biomechanical methods in a heterogeneous cohort.
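The reproducibility criterion based on coefficients of variation can be sketched as below. The use of the sample standard deviation is an assumption, and the repeated-trial values and measure names are hypothetical rather than data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation as a percentage: 100 * SD / |mean|."""
    centre = mean(values)
    if centre == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(centre)


def share_within_limit(trials_by_measure, limit_pct):
    """Fraction of measures whose CV falls below the acceptable limit."""
    cvs = {name: coefficient_of_variation(v) for name, v in trials_by_measure.items()}
    within = [name for name, cv in cvs.items() if cv < limit_pct]
    return len(within) / len(cvs), cvs


# Hypothetical repeated trials for two plantar pressure measures.
repeated_trials = {
    "contact_area": [100.0, 102.0, 98.0],
    "max_sensor_pressure": [200.0, 320.0, 260.0],
}
fraction_ok, cv_by_measure = share_within_limit(repeated_trials, limit_pct=10.0)
```

Counting the measures under the limit in this way yields proportions of the kind reported above, such as 19 of 20 regions.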
Abstract:
The infrared (IR) spectroscopic data for a series of eleven heteroleptic bis(phthalocyaninato) rare earth complexes MIII(Pc)[Pc(α-OC5H11)4] (M = Sm–Lu, Y) [H2Pc = unsubstituted phthalocyanine, H2Pc(α-OC5H11)4 = 1,8,15,22-tetrakis(3-pentyloxy)phthalocyanine] have been collected with 2 cm−1 resolution. Raman spectroscopic properties in the range of 500–1800 cm−1 for these double-decker molecules have also been comparatively studied using laser excitation sources emitting at 632.8 and 785 nm. Both the IR and Raman spectra for M(Pc)[Pc(α-OC5H11)4] are more complicated than those of homoleptic bis(phthalocyaninato) rare earth analogues due to the decreased molecular symmetry of these double-decker compounds, namely C4. For this series, the IR Pc•− marker band appears as an intense absorption at 1309–1317 cm−1, attributed to the pyrrole stretching. With laser excitation at 632.8 nm, Raman vibrations derived from isoindole ring and aza stretchings in the range of 1300–1600 cm−1 are selectively intensified. In contrast, when excited with laser radiation of 785 nm, the ring radial vibrations of isoindole moieties and dihedral plane deformations between 500 and 1000 cm−1 for M(Pc)[Pc(α-OC5H11)4] intensify to become the strongest scatterings. Both techniques reveal that the frequencies of pyrrole stretching, isoindole breathing, isoindole stretchings, aza stretchings and coupling of pyrrole and aza stretchings depend on the rare earth ionic size, shifting to higher energy along with the lanthanide contraction due to the increased ring-ring interaction across the series. The assignments of the vibrational bands for these compounds have been made and discussed in relation to other unsubstituted and substituted bis(phthalocyaninato) rare earth analogues, such as M(Pc)2 and M(OOPc)2 [H2OOPc = 2,3,9,10,16,17,23,24-octakis(octyloxy)phthalocyanine].
Abstract:
The infrared (IR) spectroscopic data and Raman spectroscopic properties for a series of 13 “pinwheel-like” homoleptic bis(phthalocyaninato) rare earth complexes M[Pc(α-OC5H11)4]2 [M = Y and Pr–Lu except Pm; H2Pc(α-OC5H11)4 = 1,8,15,22-tetrakis(3-pentyloxy)phthalocyanine] have been collected and comparatively studied. Both the IR and Raman spectra for M[Pc(α-OC5H11)4]2 are more complicated than those of homoleptic bis(phthalocyaninato) rare earth analogues, namely M(Pc)2 and M[Pc(OC8H17)8]2, but resemble (for IR) or are a bit more complicated (for Raman) than those of heteroleptic counterparts M(Pc)[Pc(α-OC5H11)4], revealing the decreased molecular symmetry of these double-decker compounds, namely S8. Except for the obvious splitting of the isoindole breathing band at 1110–1123 cm−1, the IR spectra of M[Pc(α-OC5H11)4]2 are quite similar to those of corresponding M(Pc)[Pc(α-OC5H11)4] and therefore are similarly assigned. With laser excitation at 633 nm, Raman bands derived from isoindole ring and aza stretchings in the range of 1300–1600 cm−1 are selectively intensified. The IR spectra reveal that the frequencies of pyrrole stretching and pyrrole stretching coupled with the symmetrical CH bending of –CH3 groups are sensitive to the rare earth ionic size, while the Raman technique shows that the bands due to the isoindole stretchings and the coupled pyrrole and aza stretchings are similarly affected. Nevertheless, the phthalocyanine monoanion radical Pc•− IR marker band of bis(phthalocyaninato) complexes involving the same rare earth ion is found to shift to lower energy in the order M(Pc)2 > M(Pc)[Pc(α-OC5H11)4] > M[Pc(α-OC5H11)4]2, revealing the weakened π–π interaction between the two phthalocyanine rings in the same order.
Abstract:
Reliable budget/cost estimates for road maintenance and rehabilitation are subject to uncertainty and variability in road asset condition and in the characteristics of road users. The CRC CI research project 2003-029-C ‘Maintenance Cost Prediction for Road’ developed a method for assessing variation and reliability in budget/cost estimates for road maintenance and rehabilitation. The method is based on probability-based reliability theory and statistical methods. The next stage of the current project is to apply the developed method to predict maintenance/rehabilitation budgets/costs of large networks for strategic investment. The first task is to assess the variability of road data. This report presents initial results of that assessment. A case study for dry non-reactive soil is presented to demonstrate the concept of analysing the variability of road data for large road networks. In assessing the variability of road data, large road networks were divided into categories with common characteristics according to soil and climatic conditions, pavement conditions, pavement types, surface types and annual average daily traffic. The probability distributions, statistical means, and standard deviation values of asset conditions and annual average daily traffic for each category were quantified. The probability distributions and the statistical information obtained in this analysis will be used to assess the variation and reliability in budget/cost estimates at a later stage. Generally, the mean values of asset data in each category are used as input values for investment analysis, and the variability of asset data within each category is not taken into account. The analysis demonstrated that the method can be applied in practice to take into account the variability of road data when analysing large road networks for maintenance/rehabilitation investment.
Abstract:
Phase-type distributions represent the time to absorption for a finite state Markov chain in continuous time, generalising the exponential distribution and providing a flexible and useful modelling tool. We present a new reversible jump Markov chain Monte Carlo scheme for performing a fully Bayesian analysis of the popular Coxian subclass of phase-type models; the convenient Coxian representation involves fewer parameters than a more general phase-type model. The key novelty of our approach is that we model covariate dependence in the mean whilst using the Coxian phase-type model as a very general residual distribution. Such incorporation of covariates into the model has not previously been attempted in the Bayesian literature. A further novelty is that we also propose a reversible jump scheme for investigating structural changes to the model brought about by the introduction of Erlang phases. Our approach addresses more questions of inference than previous Bayesian treatments of this model and is automatic in nature. We analyse an example dataset comprising lengths of hospital stays of a sample of patients collected from two Australian hospitals to produce a model for a patient's expected length of stay which incorporates the effects of several covariates. This leads to interesting conclusions about what contributes to length of hospital stay with implications for hospital planning. We compare our results with an alternative classical analysis of these data.
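As a small illustration of the Coxian structure, the sketch below computes the mean time to absorption and draws samples by walking through the phases. It is not the reversible jump scheme from the paper and involves no covariates; the parameterisation (one exit rate per phase plus a probability of continuing to the next phase) and all names and values are assumptions made for illustration.

```python
import random


def coxian_mean(rates, continue_probs):
    """Mean time to absorption for a Coxian phase-type distribution.

    rates[i] is the total exit rate of phase i; continue_probs[i] is the
    probability of moving from phase i to phase i + 1 rather than being
    absorbed. The last phase always absorbs, so continue_probs has one
    fewer entry than rates.
    """
    reach_prob = 1.0  # probability that phase i is ever visited
    total = 0.0
    for i, rate in enumerate(rates):
        total += reach_prob / rate
        if i < len(continue_probs):
            reach_prob *= continue_probs[i]
    return total


def sample_coxian(rates, continue_probs, rng):
    """Draw one absorption time by simulating the chain phase by phase."""
    elapsed = 0.0
    for i, rate in enumerate(rates):
        elapsed += rng.expovariate(rate)
        is_last = i == len(rates) - 1
        if is_last or rng.random() >= continue_probs[i]:
            break  # absorbed from this phase
    return elapsed


# Hypothetical two-phase model: rate 1.0, then rate 2.0 with probability 0.5.
example_rates = [1.0, 2.0]
example_probs = [0.5]
theoretical_mean = coxian_mean(example_rates, example_probs)
rng = random.Random(0)
samples = [sample_coxian(example_rates, example_probs, rng) for _ in range(20000)]
simulated_mean = sum(samples) / len(samples)
```

With a single phase the model reduces to the exponential distribution, which is the sense in which phase-type distributions generalise it.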
Abstract:
Oberon-2 is an object-oriented language with a class structure based on type extension. The runtime structure of Oberon-2 is described and the low-level mechanism for dynamic type checking explained. It is shown that the superior type-safety of the language, when used for programming styles based on heterogeneous, pointer-linked data structures, has an entirely negligible cost in runtime performance.
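One widely described way to make type tests cheap under single-inheritance type extension is to give every type descriptor a table of its ancestors indexed by extension level, so that a test needs one comparison and one table lookup. The sketch below models that idea in Python. Whether the paper describes exactly this mechanism is an assumption, since the abstract does not spell it out, and the class and function names are hypothetical.

```python
class TypeDescriptor:
    """Runtime descriptor for a record type in a type-extension hierarchy."""

    def __init__(self, name, base=None):
        self.name = name
        # Ancestor table: entry k holds the ancestor at extension level k,
        # and the final entry is the type itself.
        self.ancestors = (base.ancestors if base else ()) + (self,)
        self.level = len(self.ancestors) - 1


def is_type(dynamic, static):
    """Model of a type test: is `dynamic` the type `static` or an extension of it?

    The cost does not depend on the depth of the hierarchy.
    """
    return dynamic.level >= static.level and dynamic.ancestors[static.level] is static


# Hypothetical hierarchy for a heterogeneous, pointer-linked structure.
Node = TypeDescriptor("Node")
Leaf = TypeDescriptor("Leaf", base=Node)
Branch = TypeDescriptor("Branch", base=Node)
Unrelated = TypeDescriptor("Unrelated")
```

A constant-time check of this kind is consistent with the abstract's claim that the cost of type safety is negligible, although the measurements themselves are in the paper.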
Abstract:
The wide range of contributing factors and circumstances surrounding crashes on road curves suggests that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify the contributing factors that influence the risk of a crash and the relationships between them. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data and to reason using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers between 50 and 59 years old who were involved in a collision while driving during evening peak hours had the lowest crash risk. Drivers between 25 and 29 years old, driving a new car between around midnight and 6 am, had the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with 25 to 42 years of driving experience who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based approaches to analysing road crashes. Our data mining approach is supported by proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type with a view to designing tailored countermeasures.
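The core rough set operation, approximating a set of records from below and above using only the chosen attributes, can be sketched as follows. This shows the standard lower and upper approximations, not the paper's full analysis pipeline; the records, attribute names and target set are hypothetical.

```python
from collections import defaultdict


def approximations(records, attributes, target):
    """Lower and upper approximations of `target` under the given attributes.

    Records with identical values on `attributes` are indiscernible and
    form one equivalence class. `target` is a set of record indices.
    """
    classes = defaultdict(set)
    for index, record in enumerate(records):
        classes[tuple(record[a] for a in attributes)].add(index)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the target
            lower |= members
        if members & target:    # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical incident records and a hypothetical set of high-cost crashes.
incidents = [
    {"age": "25-29", "time": "night"},
    {"age": "25-29", "time": "night"},
    {"age": "50-59", "time": "evening"},
    {"age": "50-59", "time": "evening"},
]
high_cost = {0, 1, 2}
certain, possible = approximations(incidents, ["age", "time"], high_cost)
```

Records in `certain` belong to the target on the evidence of the attributes alone, while those in `possible` but not in `certain` form the boundary region where the data are too imprecise to decide.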