937 results for DATA INTEGRATION


Relevance: 30.00%

Abstract:

BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant and wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies is selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
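
The Entity-Quality (EQ) formalism described in this abstract can be pictured as a small record type. The following is only a minimal sketch: the class names, field names and identifiers are hypothetical placeholders and are not taken from the Phenoscape project or from any real ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term: a stable identifier plus a human-readable label."""
    term_id: str
    label: str


@dataclass(frozen=True)
class EQStatement:
    """One Entity-Quality annotation attached to a taxon."""
    taxon: Term    # the taxon the phenotype is reported for
    entity: Term   # the anatomical structure being described
    quality: Term  # the quality that the entity exhibits

    def describe(self) -> str:
        return f"{self.taxon.label}: {self.entity.label} is {self.quality.label}"


# Placeholder identifiers for illustration only, not real ontology IDs.
statement = EQStatement(
    taxon=Term("TAXON:0000001", "example species"),
    entity=Term("ANATOMY:0000001", "dorsal fin"),
    quality=Term("QUALITY:0000001", "absent"),
)
print(statement.describe())
```

Keeping the taxon, entity and quality as separate term references, rather than as one free-text string, is what would let such statements be matched against annotations from other databases.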

Relevance: 30.00%

Abstract:

Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of 'fitness for use' for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC's standard checklists for genomics and metagenomics and (b) TDWG's Darwin Core standard, used primarily in taxonomy and systematic biology.

Relevance: 30.00%

Abstract:

In chimpanzees, most females disperse from the community in which they were born to reproduce in a new community, thereby eliminating the risk of inbreeding with close kin. However, across sites, some females breed in their natal community, raising questions about the flexibility of dispersal, the costs and benefits of different strategies and the mitigation of costs associated with dispersal and integration. In this dissertation I address these questions by combining long-term behavioral data and recent field observations on maturing and young adult females in Gombe National Park with an experimental manipulation of relationship formation in captive apes in the Congo.

To assess the risk of inbreeding for females who do and do not disperse, 129 chimpanzees were genotyped and relatedness between each dyad was calculated. Natal females were more closely related to adult community males than were immigrant females. By examining the parentage of 58 surviving offspring, I found that natal females were not more related to the sires of their offspring than were immigrant females, despite three instances of close inbreeding. The sires of all offspring were less related to the mothers than non-sires regardless of the mother’s residence status. These results suggest that chimpanzees are capable of detecting relatedness and that, even when remaining natal, females can largely avoid, though not eliminate, inbreeding.

Next, I examined whether dispersal was associated with energetic, social, physiological and/or reproductive costs by comparing immigrant (n=10) and natal (n=9) females of similar age using 2358 hours of observational data. Natal and immigrant females did not differ in any energetic metric. Immigrant females received aggression from resident females more frequently than natal females. Immigrants spent less time in social grooming and more time self-grooming than natal females. Immigrant females primarily associated with resident males, had more social partners and lacked close social allies. There was no difference in levels of fecal glucocorticoid metabolites in immigrant and natal females. Immigrant females gave birth 2.5 years later than natal females, though the survival of their first offspring did not differ. These results indicate that immigrant females in Gombe National Park do not face energetic deficits upon transfer, but they do enter a hostile social environment and have a delayed first birth.

Next, I examined whether chimpanzees use condition- and phenotype-dependent cues in making dispersal decisions. I examined the effect of social and environmental conditions present at the time females of known age matured (n=25) on the females’ dispersal decisions. Females were more likely to disperse if they had more male maternal relatives and thus, a high risk of inbreeding. Females with a high ranking mother and multiple maternal female kin tended to disperse less frequently, suggesting that a strong female kin network provides benefits to the maturing daughter. Females were also somewhat less likely to disperse when fewer unrelated males were present in the group. Habitat quality and intrasexual competition did not affect dispersal decisions. Using a larger sample of 62 females observed as adults in Gombe, I also detected an effect of phenotypic differences in personality on the female’s dispersal decisions; extraverted, agreeable and open females were less likely to disperse.

Natural observations show that apes use grooming and play as social currency, but no experimental manipulations have been carried out to measure the effects of these behaviors on relationship formation, an essential component of integration. Thirty chimpanzees and 25 bonobos were given a choice between an unfamiliar human who had recently groomed or played with them and one who had not. Both species showed a preference for the human that had interacted with them, though the effect was driven by males. These results support the idea that grooming and play act as social currency in great apes that can rapidly shape social relationships between unfamiliar individuals. Further investigation is needed to elucidate the use of social currency in female apes.

I conclude that dispersal in female chimpanzees is flexible and the balance of costs and benefits varies for each individual. Females likely take into account social cues present at maturity and their own phenotype in choosing a settlement path and are especially sensitive to the presence of maternal male kin. The primary cost associated with philopatry is inbreeding risk and the primary cost associated with dispersal is delay in the age at first birth, presumably resulting from intense social competition. Finally, apes may strategically make use of affiliative behavior in pursuing particular relationships, something that should be useful in the integration process.

Relevance: 30.00%

Abstract:

A cross-domain workflow application may be constructed using a standard reference model such as the one by the Workflow Management Coalition (WfMC) [7], but the requirements for this type of application inherently differ from one organization to another. The existing models, and the systems built around them, meet some but not all of the requirements of the organizations involved in a collaborative process. Furthermore, the requirements change over time. This makes the applications difficult to develop and distribute. Service Oriented Architecture (SOA) based approaches such as BPEL (Business Process Execution Language) are intended to provide a solution but fail to address the problems sufficiently, especially where the expectations and skill levels of the users (e.g. the participants of the processes) in different organisations are likely to differ. In this paper, we discuss a design pattern that provides a novel approach towards a solution. In the solution, business users can design the applications at a high level of abstraction: the use cases and user interactions. The designs are documented and used, together with the data and events captured later that represent the user interactions with the systems, to feed an intermediate component local to the users, the IFM (InterFace Mapper), which bridges the gaps between the users and the systems. We discuss the main issues faced in the design and prototyping. The approach alleviates the need for re-programming with the APIs to any back-end service, thus easing the development and distribution of the applications.

Relevance: 30.00%

Abstract:

Remote sensing airborne hyperspectral data are routinely used for applications including algorithm development for satellite sensors, environmental monitoring and atmospheric studies. Single flight lines of airborne hyperspectral data are often in the region of tens of gigabytes in size. This means that a single aircraft can collect terabytes of remotely sensed hyperspectral data during a single year. Before these data can be used for scientific analyses, they need to be radiometrically calibrated, synchronised with the aircraft's position and attitude and then geocorrected. To enable efficient processing of these large datasets the UK Airborne Research and Survey Facility has recently developed a software suite, the Airborne Processing Library (APL), for processing airborne hyperspectral data acquired from the Specim AISA Eagle and Hawk instruments. The APL toolbox allows users to radiometrically calibrate, geocorrect, reproject and resample airborne data. Each stage of the toolbox outputs data in the common Band Interleaved Lines (BILs) format, which allows its integration with other standard remote sensing software packages. APL was developed to be user-friendly and suitable for use on a workstation PC as well as for the automated processing of the facility; to this end APL can be used under both Windows and Linux environments on a single desktop machine or through a Grid engine. A graphical user interface also exists. In this paper we describe the Airborne Processing Library software, its algorithms and approach. We present example results from using APL with an AISA Eagle sensor and we assess its spatial accuracy using data from multiple flight lines collected during a campaign in 2008 together with in situ surveyed ground control points.
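
The BIL layout mentioned in this abstract (commonly expanded as "band interleaved by line") stores, for each image line in turn, the full row of samples for every band. The sketch below illustrates the resulting index arithmetic on a tiny synthetic array. The function names are hypothetical and are not part of APL; real files also need the data type and byte order from their header.

```python
def bil_offset(line: int, band: int, sample: int, n_bands: int, n_samples: int) -> int:
    """Index of one value in a flat BIL array (counted in values, not bytes)."""
    return (line * n_bands + band) * n_samples + sample


def extract_band(flat, band, n_lines, n_bands, n_samples):
    """Return a single band as a list of rows, one row per image line."""
    rows = []
    for line in range(n_lines):
        start = bil_offset(line, band, 0, n_bands, n_samples)
        rows.append(flat[start:start + n_samples])
    return rows


# Tiny synthetic image: 2 lines, 3 bands, 4 samples per line.
# Each value encodes its own position as 100*line + 10*band + sample.
n_lines, n_bands, n_samples = 2, 3, 4
flat = [
    100 * line + 10 * band + sample
    for line in range(n_lines)
    for band in range(n_bands)
    for sample in range(n_samples)
]
band1 = extract_band(flat, 1, n_lines, n_bands, n_samples)
print(band1)
```

Because all bands of one line are adjacent, a reader can process an image line by line without loading the whole file, which suits flight lines that run to tens of gigabytes.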

Relevance: 30.00%

Abstract:

The detection of dense harmful algal blooms (HABs) by satellite remote sensing is usually based on analysis of chlorophyll-a as a proxy. However, this approach does not provide information about the potential harm of a bloom, nor can it identify the dominant species. The developed HAB risk classification method employs a fully automatic data-driven approach to identify key characteristics of water leaving radiances and derived quantities, and to classify pixels into “harmful”, “non-harmful” and “no bloom” categories using Linear Discriminant Analysis (LDA). Discrimination accuracy is increased through the use of spectral ratios of water leaving radiances, absorption and backscattering. To reduce the false alarm rate the data that cannot be reliably classified are automatically labelled as “unknown”. This method can be trained on different HAB species or extended to new sensors and then applied to generate independent HAB risk maps; these can be fused with other sensors to fill gaps or improve spatial or temporal resolution. The HAB discrimination technique has obtained accurate results on MODIS and MERIS data, correctly identifying 89% of Phaeocystis globosa HABs in the southern North Sea and 88% of Karenia mikimotoi blooms in the Western English Channel. A linear transformation of the ocean colour discriminants is used to estimate harmful cell counts, demonstrating greater accuracy than if based on chlorophyll-a; this will facilitate its integration into a HAB early warning system operating in the southern North Sea.
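
The abstract does not say how unreliable pixels are identified. One common way to obtain an "unknown" label is a reject option: keep the most probable class only when its posterior probability clears a threshold. The sketch below assumes that approach. The function name, the threshold of 0.8 and the example probabilities are all hypothetical, and the class posteriors are taken as given rather than computed by LDA.

```python
def classify_with_reject(posteriors: dict, min_confidence: float = 0.8) -> str:
    """Return the most probable class, or "unknown" if it is not confident enough."""
    label, probability = max(posteriors.items(), key=lambda item: item[1])
    return label if probability >= min_confidence else "unknown"


# Made-up per-pixel class posteriors, for illustration only.
pixels = [
    {"harmful": 0.92, "non-harmful": 0.05, "no bloom": 0.03},
    {"harmful": 0.45, "non-harmful": 0.40, "no bloom": 0.15},
    {"harmful": 0.02, "non-harmful": 0.08, "no bloom": 0.90},
]
labels = [classify_with_reject(p) for p in pixels]
print(labels)
```

Raising the threshold trades coverage for a lower false alarm rate: more pixels fall into "unknown", but fewer ambiguous pixels are reported as harmful.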

Relevance: 30.00%

Abstract:

One of the first attempts to develop a formal model of depth cue integration is to be found in Maloney and Landy's (1989) "human depth combination rule". They advocate that the combination of depth cues by the visual system is best described by a weighted linear model. The present experiments tested whether the linear combination rule applies to the integration of texture and shading. As would be predicted by a linear combination rule, the weight assigned to the shading cue did vary as a function of its curvature value. However, the weight assigned to the texture cue varied systematically as a function of the curvature value of both cues. Here we describe a non-linear model which provides a better fit to the data. Redescribing the stimuli in terms of depth rather than curvature reduced the goodness of fit for all models tested. These results support the hypothesis that the locus of cue integration is a curvature map, rather than a depth map. We conclude that the linear combination rule does not generalize to the integration of shading and texture, and that for these cues it is likely that integration occurs after the recovery of surface curvature.
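
A weighted linear model of the kind this abstract tests can be written as a weighted average of the per-cue estimates. The sketch below assumes two cues whose weights sum to one. The function name, parameter names and numbers are hypothetical illustrations, not values from the experiments.

```python
def combine_cues(texture: float, shading: float, w_texture: float) -> float:
    """Weighted linear combination of two cue estimates; the weights sum to 1."""
    if not 0.0 <= w_texture <= 1.0:
        raise ValueError("w_texture must lie in [0, 1]")
    w_shading = 1.0 - w_texture
    return w_texture * texture + w_shading * shading


# Made-up curvature estimates from each cue, with texture weighted 3:1.
combined = combine_cues(texture=2.0, shading=1.0, w_texture=0.75)
print(combined)
```

In this form each weight is a single number chosen independently of the other cue's value. The abstract's finding that the texture weight varied with the curvature values of both cues is the kind of dependence a model of this shape cannot express.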

Relevance: 30.00%

Abstract:

To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.

Relevance: 30.00%

Abstract:

Multicore computational accelerators such as GPUs are now commodity components for high-performance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed systems. We present these alternatives in the context of a capabilities-aware framework for data-intensive computing, which uses an enhanced implementation of the MapReduce programming model for accelerator-based clusters, compared to the state of the art. The framework can transparently utilize heterogeneous accelerators for deriving high performance with low programming effort. Our work is the first to compare heterogeneous types of accelerators, GPUs and Cell processors, in the same environment and the first to explore the trade-offs between compute-efficient and control-efficient accelerators on data-intensive systems. Our investigation shows that our framework scales well with the number of different compute nodes. Furthermore, it runs simultaneously on two different types of accelerators, successfully adapts to the resource capabilities, and performs 26.9% better on average than a static execution approach.
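
The abstract does not describe the framework's scheduling, so the following is only a toy illustration of the general idea of capability-aware work division in a MapReduce-style job: input is split in proportion to a per-node capability weight, each chunk is mapped, and the partial results are reduced. The node names and weights are hypothetical. The split here is fixed up front, which is simpler than the adaptive behaviour the abstract reports.

```python
from collections import Counter
from functools import reduce


def split_by_capability(items, capabilities):
    """Split items into contiguous chunks sized in proportion to each weight."""
    total = sum(capabilities.values())
    chunks, start, cumulative = {}, 0, 0
    for name, weight in capabilities.items():
        cumulative += weight
        end = round(len(items) * cumulative / total)
        chunks[name] = items[start:end]
        start = end
    return chunks


def map_phase(chunk):
    """Count the words in one chunk (the work a single node would do)."""
    return Counter(chunk)


def reduce_phase(left, right):
    """Merge two partial word counts."""
    return left + right


words = "a b a c b a d a".split()
capabilities = {"gpu_node": 3, "cell_node": 1}  # hypothetical relative speeds
chunks = split_by_capability(words, capabilities)
partials = [map_phase(chunk) for chunk in chunks.values()]
totals = reduce(reduce_phase, partials, Counter())
print({name: len(chunk) for name, chunk in chunks.items()}, dict(totals))
```

A framework that adapts at run time would revise the weights as it observes how quickly each node finishes, rather than fixing them before the job starts.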

Relevance: 30.00%

Abstract:

Analyses of voting in European Union referendums typically distinguish between ‘second-order’ effects and the impact of substantive ‘issues’. In order to explain change in referendum outcome, two types of substantive issues are distinguished in this article. Focusing on Irish voting in the Lisbon Treaty referendums and using data from post-referendum surveys, it is found that perceptions of treaty implications outperform underlying attitudes to EU integration in predicting vote choice at both referendums, and perceptions of treaty implications are strong predictors of vote change between the referendums. The findings have broadly positive implications for normative assessments of the usefulness of direct democracy as a tool for legitimising regional integration advance.

Relevance: 30.00%

Abstract:

We show how the architecture of two recently reported bit-level systolic array circuits - a single-bit coefficient correlator and a multibit convolver - may be modified to incorporate unidirectional data flow. This feature has advantages in terms of chip cascadability, fault tolerance and possible wafer-scale integration.
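
For readers unfamiliar with the operation such a circuit computes, the sketch below is a plain functional reference for correlation with single-bit coefficients, assuming each coefficient is 0 or 1. It models only the arithmetic result. It says nothing about the systolic cell structure, timing or data-flow direction discussed in the paper, and the function name is hypothetical.

```python
def correlate_single_bit(data, coefficients):
    """Correlate a data stream with coefficients restricted to 0 or 1."""
    if any(c not in (0, 1) for c in coefficients):
        raise ValueError("coefficients must be single bits")
    n_taps = len(coefficients)
    # One output per window position: the sum of the samples selected by a 1 bit.
    return [
        sum(c * data[i + k] for k, c in enumerate(coefficients))
        for i in range(len(data) - n_taps + 1)
    ]


outputs = correlate_single_bit([3, 1, 4, 1, 5], [1, 0, 1])
print(outputs)
```

A software reference like this is mainly useful for checking the outputs of a hardware model sample by sample.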

Relevance: 30.00%

Abstract:

Over the last two decades there has been ongoing debate about the impact of environmental practices on operational performance. In recent years, studies have started to move beyond assessing the direct impact of environmental management on different dimensions of performance to consider factors that might moderate or mediate this relationship. This study considers the extent to which environmental integration and environmental capabilities moderate the relationship between pollution prevention and environmental performance outcomes. The mediating influence of environmental performance on the relationship between pollution prevention and cost and flexibility performance is also considered. Data were collected from a sample of UK food manufacturers and analysed using multiple regression analysis. The findings indicate the existence of some moderated and mediated relationships suggesting that there is more to improving performance than implementing environmental practices.

Relevance: 30.00%

Abstract:

Bioenergy is a key component of the European Union's long-term energy strategy across all sectors, with a target contribution of up to 14% of the energy mix by 2020. It is estimated that there is the potential for 1 TWh of primary energy from biogas per million persons in Europe, derived from agricultural by-products and waste. With an agricultural sector that accounts for 75% of land area and a large number of advanced engineering firms, Northern Ireland is a region with considerable potential for an integrated biogas industry. Northern Ireland is also heavily reliant on imported fossil fuels. Despite this, the industry is underdeveloped and there is a need for a collaborative approach from research, business and policy-makers across all sectors to optimise Northern Ireland’s abundant natural resources. ‘Developing Opportunities in Bio-Energy’ (i.e. Do Bioenergy) is a recently completed project that involved both academic and specialist industrial partners. The aim was to develop a biogas research action plan for 2020 to define priorities for intersectoral regional development, co-operation and knowledge transfer in the field of production and use of biogas. Consultations were held with regional stakeholders and working groups were established to compile supporting data and to decide key objectives and implementation activities. Within the context of this study it was found that biogas from feedstocks including grass, agricultural slurry, and household and industrial waste has the potential to contribute from 2.5% to 11% of Northern Ireland’s total energy consumption. The economics of on-farm production were assessed, along with potential markets and alternative uses for biogas in sectors such as transport, heat and electricity. Arising from this baseline data, a Do Bioenergy Action Plan was developed. The plan sets out a strategic research agenda, and details priorities and targets for 2020.
The challenge for Northern Ireland is how best to utilise the biogas (as electricity, heat or vehicle fuel) and in what proportions. The research areas identified were: development of small scale solutions for biogas production and use; solutions for improved nutrient management; knowledge supporting and developing the integration of biogas into the rural economy; and future crops and bio-based products. The human resources and costs for the implementation were estimated as 80 person-years and £25 million respectively. It is also clear that the development of a robust biogas sector requires some reform of the regulatory regime, including a planning policy framework and a need to address social acceptance issues. The Action Plan was developed from a regional perspective but the results may be applicable to other regions in Europe and elsewhere. This paper presents the methodology, results and analysis, and discussion and key findings of the Do Bioenergy report for Northern Ireland.

Relevance: 30.00%

Abstract:

BACKGROUND: Understanding the heterogeneous genotypes and phenotypes of prostate cancer is fundamental to improving the way we treat this disease. As yet, there are no validated descriptions of prostate cancer subgroups derived from integrated genomics linked with clinical outcome.

METHODS: In a study of 482 tumour, benign and germline samples from 259 men with primary prostate cancer, we used integrative analysis of copy number alterations (CNA) and array transcriptomics to identify genomic loci that affect expression levels of mRNA in an expression quantitative trait loci (eQTL) approach, to stratify patients into subgroups that we then associated with future clinical behaviour, and compared with either CNA or transcriptomics alone.

FINDINGS: We identified five separate patient subgroups with distinct genomic alterations and expression profiles based on 100 discriminating genes in our separate discovery and validation sets of 125 and 103 men. These subgroups were able to consistently predict biochemical relapse (p = 0.0017 and p = 0.016 respectively) and were further validated in a third cohort with long-term follow-up (p = 0.027). We show the relative contributions of gene expression and copy number data on phenotype, and demonstrate the improved power gained from integrative analyses. We confirm alterations in six genes previously associated with prostate cancer (MAP3K7, MELK, RCBTB2, ELAC2, TPD52, ZBTB4), and also identify 94 genes not previously linked to prostate cancer progression that would not have been detected using either transcript or copy number data alone. We confirm a number of previously published molecular changes associated with high risk disease, including MYC amplification, and NKX3-1, RB1 and PTEN deletions, as well as over-expression of PCA3 and AMACR, and loss of MSMB in tumour tissue. A subset of the 100 genes outperforms established clinical predictors of poor prognosis (PSA, Gleason score), as well as previously published gene signatures (p = 0.0001). We further show how our molecular profiles can be used for the early detection of aggressive cases in a clinical setting, and inform treatment decisions.

INTERPRETATION: For the first time in prostate cancer this study demonstrates the importance of integrated genomic analyses incorporating both benign and tumour tissue data in identifying molecular alterations leading to the generation of robust gene sets that are predictive of clinical outcome in independent patient cohorts.