895 resultados para Data alignment
Resumo:
Background The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. Results RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. Conclusions This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org
Resumo:
Problem addressed Wrist-worn accelerometers are associated with greater compliance. However, validated algorithms for predicting activity type from wrist-worn accelerometer data are lacking. This study compared the activity recognition rates of an activity classifier trained on acceleration signal collected on the wrist and hip. Methodology 52 children and adolescents (mean age 13.7 +/- 3.1 year) completed 12 activity trials that were categorized into 7 activity classes: lying down, sitting, standing, walking, running, basketball, and dancing. During each trial, participants wore an ActiGraph GT3X+ tri-axial accelerometer on the right hip and the non-dominant wrist. Features were extracted from 10-s windows and inputted into a regularized logistic regression model using R (Glmnet + L1). Results Classification accuracy for the hip and wrist was 91.0% +/- 3.1% and 88.4% +/- 3.0%, respectively. The hip model exhibited excellent classification accuracy for sitting (91.3%), standing (95.8%), walking (95.8%), and running (96.8%); acceptable classification accuracy for lying down (88.3%) and basketball (81.9%); and modest accuracy for dance (64.1%). The wrist model exhibited excellent classification accuracy for sitting (93.0%), standing (91.7%), and walking (95.8%); acceptable classification accuracy for basketball (86.0%); and modest accuracy for running (78.8%), lying down (74.6%) and dance (69.4%). Potential Impact Both the hip and wrist algorithms achieved acceptable classification accuracy, allowing researchers to use either placement for activity recognition.
Resumo:
Rapid recursive estimation of hidden Markov Model (HMM) parameters is important in applications that place an emphasis on the early availability of reasonable estimates (e.g. for change detection) rather than the provision of longer-term asymptotic properties (such as convergence, convergence rate, and consistency). In the context of vision- based aircraft (image-plane) heading estimation, this paper suggests and evaluates the short-data estimation properties of 3 recursive HMM parameter estimation techniques (a recursive maximum likelihood estimator, an online EM HMM estimator, and a relative entropy based estimator). On both simulated and real data, our studies illustrate the feasibility of rapid recursive heading estimation, but also demonstrate the need for careful step-size design of HMM recursive estimation techniques when these techniques are intended for use in applications where short-data behaviour is paramount.
Resumo:
The upstream oil & gas industry has been contending with massive data sets and monolithic files for many years, but “Big Data”—that is, the ability to apply more sophisticated types of analytical tools to information in a way that extracts new insights or creates new forms of value—is a relatively new concept that has the potential to significantly re-shape the industry. Despite the impressive amount of value that is being realized by Big Data technologies in other parts of the marketplace, however, much of the data collected within the oil & gas sector tends to be discarded, ignored, or analyzed in a very cursory way. This paper examines existing data management practices in the upstream oil & gas industry, and compares them to practices and philosophies that have emerged in organizations that are leading the Big Data revolution. The comparison shows that, in companies that are leading the Big Data revolution, data is regarded as a valuable asset. The presented evidence also shows, however, that this is usually not true within the oil & gas industry insofar as data is frequently regarded there as descriptive information about a physical asset rather than something that is valuable in and of itself. The paper then discusses how upstream oil & gas companies could potentially extract more value from data, and concludes with a series of specific technical and management-related recommendations to this end.
Resumo:
Heterogeneous health data is a critical issue when managing health information for quality decision making processes. In this paper we examine the efficient aggregation of lifestyle information through a data warehousing architecture lens. We present a proof of concept for a clinical data warehouse architecture that enables evidence based decision making processes by integrating and organising disparate data silos in support of healthcare services improvement paradigms.
Resumo:
Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.
Resumo:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distribution and/or non-linear inter-variable relationship. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions, which is then utilized to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Resumo:
The Echology: Making Sense of Data initiative seeks to break new ground in arts practice by asking artists to innovate with respect to a) the possible forms of data representation in public art and b) the artist's role in engaging publics on environmental sustainability in new urban developments. Initiated by ANAT and Carbon Arts in 2011, Echology has seen three artists selected by National competition in 2012 for Lend Lease sites across Australia. In 2013 commissioning of one of these works, the Mussel Choir by Natalie Jeremijenko, began in Melbourne's Victoria Harbour development. This emerging practice of data - driven and environmentally engaged public artwork presents multiple challenges to established systems of public arts production and management, at the same time as offering up new avenues for artists to forge new modes of collaboration. The experience of Echology and in particular, the Mussel Choir is examined here to reveal opportunities for expansion of this practice through identification of the factors that lead to a resilient 'ecology of part nership' between stakeholders that include science and technology researchers, education providers, city administrators, and urban developers.
Resumo:
Discovering the means to prevent and cure schizophrenia is a vision that motivates many scientists. But in order to achieve this goal, we need to understand its neurobiological basis. The emergent metadiscipline of cognitive neuroscience fields an impressive array of tools that can be marshaled towards achieving this goal, including powerful new methods of imaging the brain (both structural and functional) as well as assessments of perceptual and cognitive capacities based on psychophysical procedures, experimental tasks and models developed by cognitive science. We believe that the integration of data from this array of tools offers the greatest possibilities and potential for advancing understanding of the neural basis of not only normal cognition but also the cognitive impairments that are fundamental to schizophrenia. Since sufficient expertise in the application of these tools and methods rarely reside in a single individual, or even a single laboratory, collaboration is a key element in this endeavor. Here, we review some of the products of our integrative efforts in collaboration with our colleagues on the East Coast of Australia and Pacific Rim. This research focuses on the neural basis of executive function deficits and impairments in early auditory processing in patients using various combinations of performance indices (from perceptual and cognitive paradigms), ERPs, fMRI and sMRI. In each case, integration of two or more sources of information provides more information than any one source alone by revealing new insights into structure-function relationships. Furthermore, the addition of other imaging methodologies (such as DTI) and approaches (such as computational models of cognition) offers new horizons in human brain imaging research and in understanding human behavior.
Resumo:
Although there are many potential new insights to be gained through advancing research on the clients of male sex workers, significant social, ethical and methodological challenges to accessing this population exist. This research project case explores our attempts to recruit a population that does not typically form a cohesive or coherent 'community' and often avoids self-identifying to mitigate the stigma attached to buying sex. We used an arms-length recruitment campaign that focussed on directing potential participants to our study website, which could in turn lead them to participate in an anonymous telephone interview. Barriers to reaching male sex-work clients, however, demanded the evolution of our recruitment strategy. New technologies are part of the solution to accessing a hard-to-reach population, but they only work if researchers engage responsively. We also show how we conducted an in-depth interview with a client and discuss the value of using secondary data.
Resumo:
A tag-based item recommendation method generates an ordered list of items, likely interesting to a particular user, using the users past tagging behaviour. However, the users tagging behaviour varies in different tagging systems. A potential problem in generating quality recommendation is how to build user profiles, that interprets user behaviour to be effectively used, in recommendation models. Generally, the recommendation methods are made to work with specific types of user profiles, and may not work well with different datasets. In this paper, we investigate several tagging data interpretation and representation schemes that can lead to building an effective user profile. We discuss the various benefits a scheme brings to a recommendation method by highlighting the representative features of user tagging behaviours on a specific dataset. Empirical analysis shows that each interpretation scheme forms a distinct data representation which eventually affects the recommendation result. Results on various datasets show that an interpretation scheme should be selected based on the dominant usage in the tagging data (i.e. either higher amount of tags or higher amount of items present). The usage represents the characteristic of user tagging behaviour in the system. The results also demonstrate how the scheme is able to address the cold-start user problem.
Resumo:
Bactrocera papayae Drew & Hancock, Bactrocera philippinensis Drew & Hancock, Bactrocera carambolae Drew & Hancock, and Bactrocera invadens Drew, Tsuruta & White are four horticultural pest tephritid fruit fly species that are highly similar, morphologically and genetically, to the destructive pest, the Oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae). This similarity has rendered the discovery of reliable diagnostic characters problematic, which, in view of the economic importance of these taxa and the international trade implications, has resulted in ongoing difficulties for many areas of plant protection and food security. Consequently, a major international collaborative and integrated multidisciplinary research effort was initiated in 2009 to build upon existing literature with the specific aim of resolving biological species limits among B. papayae, B. philippinensis, B. carambolae, B. invadens and B. dorsalis to overcome constraints to pest management and international trade. Bactrocera philippinensis has recently been synonymized with B. papayae as a result of this initiative and this review corroborates that finding; however, the other names remain in use. While consistent characters have been found to reliably distinguish B. carambolae from B. dorsalis, B. invadens and B. papayae, no such characters have been found to differentiate the latter three putative species. We conclude that B. carambolae is a valid species and that the remaining taxa, B. dorsalis, B. invadens and B. papayae, represent the same species. Thus, we consider B. dorsalis (Hendel) as the senior synonym of B. papayae Drew and Hancock syn.n. and B. invadens Drew, Tsuruta & White syn.n. A redescription of B. dorsalis is provided. Given the agricultural importance of B. dorsalis, this taxonomic decision will have significant global plant biosecurity implications, affecting pest management, quarantine, international trade, postharvest treatment and basic research. Throughout the paper, we emphasize the value of independent and multidisciplinary tools in delimiting species, particularly in complicated cases involving morphologically cryptic taxa.
Resumo:
This thesis presents a novel program parallelization technique incorporating with dynamic and static scheduling. It utilizes a problem specific pattern developed from the prior knowledge of the targeted problem abstraction. Suitable for solving complex parallelization problems such as data intensive all-to-all comparison constrained by memory, the technique delivers more robust and faster task scheduling compared to the state-of-the art techniques. Good performance is achieved from the technique in data intensive bioinformatics applications.
Resumo:
This pilot project investigated the existing practices and processes of Proficient, Highly Accomplished and Lead teachers in the interpretation, analysis and implementation of National Assessment Program – Literacy and Numeracy (NAPLAN) data. A qualitative case study approach was the chosen methodology, with nine teachers across a variety of school sectors interviewed. Themes and sub-themes were identified from the participants’ interview responses revealing the ways in which Queensland teachers work with NAPLAN data. The data illuminated that generally individual schools and teachers adopted their own ways of working with data, with approaches ranging from individual/ad hoc, to hierarchical or a whole school approach. Findings also revealed that data are the responsibility of various persons from within the school hierarchy; some working with the data electronically whilst others rely on manual manipulation. Manipulation of data is used for various purposes including tracking performance, value adding and targeting programmes for specific groups of students, for example the gifted and talented. Whilst all participants had knowledge of intervention programmes and how practice could be modified, there were large inconsistencies in knowledge and skills across schools. Some see the use of data as a mechanism for accountability, whilst others mention data with regards to changing the school culture and identifying best practice. Overall, the findings showed inconsistencies in approach to focus area 5.4. Recommendations therefore include a more national approach to the use of educational data.
Resumo:
In this chapter, we draw out the relevant themes from a range of critical scholarship from the small body of digital media and software studies work that has focused on the politics of Twitter data and the sociotechnical means by which access is regulated. We highlight in particular the contested relationships between social media research (in both academic and non-academic contexts) and the data wholesale, retail, and analytics industries that feed on them. In the second major section of the chapter we discuss in detail the pragmatic edge of these politics in terms of what kinds of scientific research is and is not possible in the current political economy of Twitter data access. Finally, at the end of the chapter we return to the much broader implications of these issues for the politics of knowledge, demonstrating how the apparently microscopic level of how the Twitter API mediates access to Twitter data actually inscribes and influences the macro level of the global political economy of science itself, through re-inscribing institutional and traditional disciplinary privilege We conclude with some speculations about future developments in data rights and data philanthropy that may at least mitigate some of these negative impacts.