920 results for Data representation
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
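The k-mer representation mentioned in this abstract can be illustrated with a minimal Python sketch. It only builds a normalised k-mer frequency vector from a set of reads; the value k=3, the handling of ambiguous bases, and the function names are assumptions made for illustration, not details from the paper, and the SVM training step is not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(read, k):
    """Count overlapping k-mers in one read, skipping k-mers with non-ACGT symbols."""
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_vector(reads, k=3):
    """Normalised k-mer frequencies over all reads, in fixed lexicographic order."""
    total = Counter()
    for read in reads:
        total.update(kmer_counts(read, k))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    n = sum(total.values()) or 1  # avoid division by zero for empty input
    return [total[kmer] / n for kmer in vocabulary]


# Hypothetical toy reads; the trailing N makes the last 3-mer of the second read invalid.
features = kmer_vector(["ACGTACGT", "TTGACN"], k=3)
```

A vector like `features` (one per sequencing project) is the kind of fixed-length input a classifier such as an SVM could consume.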
Abstract:
Road surface skid resistance has been shown to have a strong relationship to road crash risk; however, the current method of using investigatory levels to identify crash-prone roads is problematic, as it may fail to identify risky roads outside of the norm. The proposed method analyses a complex and formerly impenetrable volume of data from roads and crashes using data mining. This method rapidly identifies roads with an elevated crash rate, potentially due to a skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values which occurs during network-wide crash analysis, and allows risk assessment of the major proportion of roads without skid resistance values.
Abstract:
Queensland University of Technology (QUT) Library offers a range of resources and services to researchers as part of their research support portfolio. This poster will present key features of two of the data management services offered by research support staff at QUT Library. The first service is QUT Research Data Finder (RDF), a product of the Australian National Data Service (ANDS) funded Metadata Stores project. RDF is a data registry (metadata repository) that aims to publicise datasets that are research outputs arising from completed QUT research projects. The second is a software and code registry, which is currently under development with the sole purpose of improving discovery of source code and software as QUT research outputs. RESEARCH DATA FINDER As an integrated metadata repository, Research Data Finder aligns with institutional sources of truth, such as QUT’s research administration system, ResearchMaster, as well as QUT’s Academic Profiles system to provide high quality data descriptions that increase awareness of, and access to, shareable research data. The repository and its workflows are designed to foster better data management practices, enhance opportunities for collaboration and research, promote cross-disciplinary research and maximise the impact of existing research data sets. SOFTWARE AND CODE REGISTRY The QUT Library software and code registry project stems from concerns amongst researchers with regards to development activities, storage, accessibility, discoverability and impact, sharing, copyright and IP ownership of software and code. As a result, the Library is developing a registry for code and software research outputs, which will use existing Research Data Finder architecture. The underpinning software for both registries is VIVO, open source software developed by Cornell University. 
The registry will use the Research Data Finder service instance of VIVO and will include a searchable interface, links to code/software locations and metadata feeds to Research Data Australia. Key benefits of the project include: improving the discoverability and reuse of QUT researchers’ code and software amongst QUT and the QUT research community; increasing the profile of QUT research outputs on a national level by providing a metadata feed to Research Data Australia; and improving the metrics for access and reuse of code and software in the repository.
Abstract:
Children are encountering more and more graphic representations of data in their learning and everyday life. Much of this data occurs in quantitative forms as different forms of measurement are incorporated into the graphics during their construction. In their formal education, children are required to learn to use a range of these quantitative representations in subjects across the school curriculum. Previous research that focuses on the use of information processing and traditional approaches to cognitive psychology concludes that the development of an understanding of such representations of data is a complex process. An alternative approach is to investigate the experiences of children as they interact with graphic representations of quantitative data in their own life-worlds. This paper demonstrates how a phenomenographic approach may be used to reveal the qualitatively different ways in which children in Australian primary and secondary education understand the phenomenon of graphic representations of quantitative data. Seven variations of the children’s understanding were revealed. These have been described interpretively in the article and confirmed through the words of the children. A detailed outcome space demonstrates how these seven variations are structurally related.
Abstract:
Objectives: This study examines the accuracy of Gestational Diabetes Mellitus (GDM) case-ascertainment in routinely collected data. Methods: Retrospective cohort study analysed routinely collected data from all births at Cairns Base Hospital, Australia, from 1 January 2004 to 31 December 2010 in the Cairns Base Hospital Clinical Coding system (CBHCC) and the Queensland Perinatal Data Collection (QPDC). GDM case ascertainment in the National Diabetes Services Scheme (NDSS) and Cairns Diabetes Centre (CDC) data were compared. Results: From 2004 to 2010, the specificity of GDM case-ascertainment in the QPDC was 99%. In 2010, only 2 of 225 additional cases were identified from the CDC and CBHCC, suggesting QPDC sensitivity is also over 99%. In comparison, the sensitivity of the CBHCC data was 80% during 2004–2010. The sensitivity of CDC data was 74% in 2010. During 2010, 223 births were coded as GDM in the QPDC, and the NDSS registered 247 women with GDM from the same postcodes, suggesting reasonable uptake on the NDSS register. However, the proportion of Aboriginal and Torres Strait Islander women was lower than expected. Conclusion: The accuracy of GDM case ascertainment in the QPDC appears high, with lower accuracy in routinely collected hospital and local health service data. This limits capacity of local data for planning and evaluation, and developing structured systems to improve post-pregnancy care, and may underestimate resources required. Implications: Data linkage should be considered to improve accuracy of routinely collected local health service data. The accuracy of the NDSS for Aboriginal and Torres Strait Islander women requires further evaluation.
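The sensitivity and specificity figures quoted in this abstract follow the usual definitions, sketched below in Python. The 2010 counts are one reading of the abstract (223 births coded as GDM in the QPDC plus 2 additional cases found in the other sources, for 225 in total); that interpretation is an assumption, not something the abstract states as a table.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of real cases that the data source records as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-cases that the data source records as non-cases."""
    return true_neg / (true_neg + false_pos)


# Assumed reading of the 2010 figures: 223 cases captured, 2 missed.
qpdc_2010 = sensitivity(true_pos=223, false_neg=2)
print(f"{qpdc_2010:.1%}")  # prints 99.1%
```

Under that reading the result is consistent with the abstract's statement that QPDC sensitivity is over 99%.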
Abstract:
Operational modal analysis (OMA) is prevalent in modal identification of civil structures. It requires response measurements of the underlying structure under ambient loads. A valid OMA method requires the excitation to be white noise in time and space. Although there are numerous applications of OMA in the literature, few have investigated the statistical distribution of a measurement and the influence of such randomness on modal identification. This research uses a modified kurtosis to evaluate the statistical distribution of raw measurement data. In addition, a windowing strategy employing this index is proposed to select quality datasets. To demonstrate how the data selection strategy works, ambient vibration measurements of a laboratory bridge model and of a real cable-stayed bridge are considered. The analysis used frequency domain decomposition (FDD) as the target OMA approach for modal identification. The modal identification results using data segments with different randomness have been compared. The discrepancy in the FDD spectra of the results indicates that, in order to fulfil the assumption of an OMA method, special care should be taken in processing long vibration measurement data. The proposed data selection strategy is easy to apply and was verified to be effective in modal analysis.
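The windowing idea in this abstract can be sketched as follows. The abstract does not define its modified kurtosis, so this Python sketch substitutes the ordinary excess kurtosis (zero for Gaussian data) as a stand-in; the window length, the threshold, and the function names are likewise assumptions for illustration only.

```python
import random


def excess_kurtosis(samples):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 ** 2) - 3.0


def select_windows(signal, window, threshold):
    """Return start indices of non-overlapping windows whose |excess kurtosis| <= threshold."""
    kept = []
    for start in range(0, len(signal) - window + 1, window):
        segment = signal[start:start + window]
        if abs(excess_kurtosis(segment)) <= threshold:
            kept.append(start)
    return kept


# Synthetic ambient response: Gaussian noise with one transient spike (hypothetical data).
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(8000)]
signal[4500] += 50.0
kept = select_windows(signal, window=2000, threshold=0.5)
```

The window containing the spike has a very large kurtosis and is rejected, while windows that look Gaussian are candidates for the subsequent modal identification step.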
Abstract:
Currently there are ~3000 known species of Sarcophagidae (Diptera), which are classified into 173 genera in three subfamilies. Almost 25% of sarcophagids belong to the genus Sarcophaga (sensu lato); however, little is known about the validity of, and relationships between, the ~150 (or more) subgenera of Sarcophaga s.l. In this preliminary study, we evaluated the usefulness of three sources of data for resolving relationships between 35 species from 14 Sarcophaga s.l. subgenera: the mitochondrial COI barcode region, ~800 bp of the nuclear gene CAD, and 110 morphological characters. Bayesian, maximum likelihood (ML) and maximum parsimony (MP) analyses were performed on the combined dataset. Much of the tree was only supported by the Bayesian and ML analyses, with the MP tree poorly resolved. The genus Sarcophaga s.l. was resolved as monophyletic in both the Bayesian and ML analyses and strong support was obtained at the species level. Notably, the only subgenus consistently resolved as monophyletic was Liopygia. The monophyly of and relationships between the remaining Sarcophaga s.l. subgenera sampled remain questionable. We suggest that future phylogenetic studies on the genus Sarcophaga s.l. use combined datasets for analyses. We also advocate the use of additional data and a range of inference strategies to assist with resolving relationships within Sarcophaga s.l.
Abstract:
Big Data is a rising IT trend similar to cloud computing, social networking or ubiquitous computing. Big Data can offer beneficial scenarios in the e-health arena. However, in some of these scenarios Big Data needs to be kept secure for a long period of time in order to deliver its benefits, such as finding cures for infectious diseases, while protecting patient privacy. In this context, it is beneficial to analyse Big Data to extract meaningful information while the data is stored securely. Therefore, the analysis of various database encryption techniques is essential. In this study, we simulated three types of technical environment, namely Plain-text, Microsoft Built-in Encryption, and custom Advanced Encryption Standard (AES), using Bucket Index in Data-as-a-Service (DaaS). The results showed that custom AES-DaaS has a faster range query response time than MS built-in encryption. Furthermore, while carrying out the scalability test, we observed that there are performance thresholds depending on physical IT resources. Therefore, for efficient Big Data management in e-health, it is worthwhile to examine scalability limits as well, even in a cloud computing environment. In addition, when designing an e-health database, both patient privacy and system performance need to be treated as top priorities.
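To make the bucket index idea concrete, here is a minimal Python sketch of a range query over encrypted values. It assumes equal-width buckets and uses a toy XOR function purely as a placeholder for encryption; it is not AES and not the study's implementation, and all names and boundaries are hypothetical.

```python
import bisect

# Assumed equal-width bucket boundaries for values in [0, 100).
BOUNDARIES = [0, 20, 40, 60, 80, 100]


def bucket_of(value):
    """Index i of the bucket [BOUNDARIES[i], BOUNDARIES[i + 1]) that contains value."""
    return bisect.bisect_right(BOUNDARIES, value) - 1


def toy_encrypt(value, key):
    """Placeholder cipher (XOR). NOT secure and NOT AES; illustration only."""
    return value ^ key


toy_decrypt = toy_encrypt  # XOR is its own inverse


def store(values, key):
    """Rows as the server sees them: a plaintext bucket id plus an opaque ciphertext."""
    return [(bucket_of(v), toy_encrypt(v, key)) for v in values]


def range_query(rows, low, high, key):
    """Server filters by bucket id; client decrypts and removes false positives."""
    wanted = set(range(bucket_of(low), bucket_of(high) + 1))
    candidates = [cipher for bucket, cipher in rows if bucket in wanted]  # server side
    plain = (toy_decrypt(cipher, key) for cipher in candidates)           # client side
    return sorted(v for v in plain if low <= v <= high)


rows = store([5, 25, 37, 41, 59, 88], key=0x5A)
result = range_query(rows, 30, 45, key=0x5A)  # buckets 1 and 2 are fetched, then filtered
```

The design trade-off is that wider buckets leak less about each value but return more false positives for the client to decrypt and discard.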
Abstract:
The representation of business process models has been a continuing research topic for many years now. However, many process model representations have not developed beyond minimally interactive 2D icon-based representations of directed graphs and networks, with little or no annotation for information overlays. In addition, very few of these representations have undergone a thorough analysis or design process with reference to psychological theories on data and process visualization. This dearth of visualization research, we believe, has led to problems with BPM uptake in some organizations, as the representations can be difficult for stakeholders to understand, and thus remains an open research question for the BPM community. In addition, business analysts and process modeling experts themselves need visual representations that are able to assist with key BPM life cycle tasks in the process of generating optimal solutions. With the rise of desktop computers and commodity mobile devices capable of supporting rich interactive 3D environments, we believe that much of the research performed in computer human interaction, virtual reality, games and interactive entertainment has much potential in areas of BPM: to engage, provide insight, and promote collaboration amongst analysts and stakeholders alike. We believe this is a timely topic, with research emerging in a number of places around the globe, relevant to this workshop. This is the second TAProViz workshop being run at BPM. The intention this year is to consolidate the results of last year's successful workshop by further developing this important topic, identifying the key research topics of interest to the BPM visualization community.
Abstract:
This paper describes the work being conducted in the baseline rail level crossing project, supported by the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper discusses the limitations of near-miss data for analysis obtained using current level crossing occurrence reporting practices. The project is addressing these limitations through the development of a data collection and analysis system with an underlying level crossing accident causation model. An overview of the methodology and the improved data recording process is provided. The paper concludes with a brief discussion of the benefits this project is expected to provide the Australian rail industry.
Abstract:
We employed a Hidden-Markov-Model (HMM) algorithm in loss of heterozygosity (LOH) analysis of high-density single nucleotide polymorphism (SNP) array data from Non-Hodgkin’s lymphoma (NHL) entities, follicular lymphoma (FL), and diffuse large B-cell lymphoma (DLBCL). This revealed a high frequency of LOH over the chromosomal region 11p11.2, containing the gene encoding the protein tyrosine phosphatase receptor type J (PTPRJ). Although PTPRJ regulates components of key survival pathways in B-cells (i.e., BCR, MAPK, and PI3K signaling), its role in B-cell development is poorly understood. LOH of PTPRJ has been described in several types of cancer but not in any hematological malignancy. Interestingly, FL cases with LOH exhibited down-regulation of PTPRJ; in contrast, no significant variation of expression was shown in DLBCLs. In addition, sequence screening in Exons 5 and 13 of PTPRJ identified the G973A (rs2270993), T1054C (rs2270992), A1182C (rs1566734), and G2971C (rs4752904) coding SNPs (cSNPs). The A1182 allele was significantly more frequent in FLs and in NHLs with LOH. Significant over-representation of the C1054 (rs2270992) and the C2971 (rs4752904) alleles was also observed in LOH cases. A haplotype analysis also revealed a significantly lower frequency of haplotype GTCG in NHL cases, but it was only detected in cases with retention. Conversely, haplotype GCAC was over-represented in cases with LOH. Altogether, these results indicate that the inactivation of PTPRJ may be a common lymphomagenic mechanism in these NHL subtypes and that haplotypes in the PTPRJ gene may play a role in susceptibility to NHL, by affecting activation of PTPRJ in these B-cell lymphomas.
Abstract:
This research aims to develop a reliable density estimation method for signalised arterials based on cumulative counts from upstream and downstream detectors. In order to overcome counting errors associated with urban arterials with mid-link sinks and sources, CUmulative plots and Probe Integration for Travel timE estimation (CUPRITE) is employed for density estimation. The method, by utilizing probe vehicles’ samples, reduces or cancels the counting inconsistencies when vehicles’ conservation is not satisfied within a section. The method is tested in a controlled environment, and the authors demonstrate the effectiveness of CUPRITE for density estimation in a signalised section, and discuss issues associated with the method.
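As background for this abstract, density from cumulative counts is commonly obtained as the number of vehicles inside the section divided by its length. The Python sketch below shows only that plain calculation; it does not reproduce CUPRITE's probe-based correction, and the function name, units, and sample counts are assumptions for illustration.

```python
def density_from_counts(upstream, downstream, length_km, initial_vehicles=0):
    """Density (veh/km) per time step from cumulative detector counts.

    Vehicles in the section = initial vehicles + cumulative inflow - cumulative outflow.
    Assumes vehicle conservation holds, i.e. no mid-link sinks or sources.
    """
    if len(upstream) != len(downstream):
        raise ValueError("count series must have the same length")
    if length_km <= 0:
        raise ValueError("section length must be positive")
    return [
        (initial_vehicles + u - d) / length_km
        for u, d in zip(upstream, downstream)
    ]


# Hypothetical cumulative counts sampled at four instants on a 0.5 km section.
densities = density_from_counts(
    upstream=[0, 10, 25, 40],
    downstream=[0, 4, 15, 35],
    length_km=0.5,
    initial_vehicles=2,
)
```

When mid-link sinks and sources break conservation, the two cumulative series drift apart, which is the counting inconsistency the abstract says probe vehicle samples are used to reduce or cancel.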
Abstract:
Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to calculate and estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was done using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity 62.3% and specificity 75.9%. No differences in disease progression or T2-lesion volumes were observed among four levels of predicted genetic risk groups (high, medium, low, misclassified).
On the other hand, a significant difference (F = 2.75, P = 0.04) was detected for age of disease onset between the affected misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and completeness of current knowledge in MS genetics.
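A predicted probability from a logistic regression model, which is what a P-Hat value typically denotes, can be sketched in Python as below. The additive allele-count coding, the coefficients, the intercept, and the 0.5 cutoff are all hypothetical; the abstract does not give the details of its multi-step protocol or its 350-marker profile.

```python
import math


def cumulative_risk(genotypes, weights, intercept):
    """Logistic predicted probability from per-marker allele counts (0, 1 or 2)."""
    score = intercept + sum(w * g for w, g in zip(weights, genotypes))
    return 1.0 / (1.0 + math.exp(-score))


def classify(p_hat, cutoff=0.5):
    """Label an individual as a predicted case when p_hat reaches the cutoff."""
    return p_hat >= cutoff


# Two hypothetical markers with made-up log-odds weights.
p_hat = cumulative_risk(genotypes=[2, 1], weights=[0.4, -0.2], intercept=-0.1)
is_case = classify(p_hat)
```

Comparing such predicted labels with true case status in a held-out dataset is how classification sensitivity and specificity figures like those in the abstract are usually derived.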
Abstract:
We present a method for optical encryption of information, based on the time-dependent dynamics of writing and erasure of refractive index changes in a bulk lithium niobate medium. Information is written into the photorefractive crystal with a spatially amplitude-modulated laser beam which, when overexposed, significantly degrades the stored data, making it unrecognizable. We show that the degradation can be reversed and that a one-to-one relationship exists between the degradation and recovery rates. It is shown that this simple relationship can be used to determine the erasure time required for decrypting the scrambled index patterns. In addition, this method could be used as a straightforward general technique for determining characteristic writing and erasure rates in photorefractive media.
Abstract:
During the current (1995-present) eruptive phase of the Soufrière Hills volcano on Montserrat, voluminous pyroclastic flows entered the sea off the eastern flank of the island, resulting in the deposition of well-defined submarine pyroclastic lobes. Previously reported bathymetric surveys documented the sequential construction of these deposits, but could not image their internal structure, the morphology or extent of their base, or interaction with the underlying sediments. We show, by combining these bathymetric data with new high-resolution three-dimensional (3D) seismic data, that the sequence of previously detected pyroclastic deposits from different phases of the ongoing eruptive activity is still well preserved. A detailed interpretation of the 3D seismic data reveals the absence of significant (> 3 m) basal erosion in the distal extent of submarine pyroclastic deposits. We also identify a previously unrecognized seismic unit directly beneath the stack of recent lobes. We propose three hypotheses for the origin of this seismic unit, but prefer an interpretation that the deposit is the result of the subaerial flank collapse that formed the English's Crater scarp on the Soufrière Hills volcano. The 1995-recent volcanic activity on Montserrat accounts for a significant portion of the sediments on the southeast slope of Montserrat, in places forming deposits that are more than 60 m thick, which implies that the potential for pyroclastic flows to build volcanic island edifices is significant.